CN105046320A - Virtual sample generation method - Google Patents

Virtual sample generation method

Info

Publication number
CN105046320A
Authority
CN
China
Prior art keywords
vector
virtual sample
input vector
output vector
alternative
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510496474.4A
Other languages
Chinese (zh)
Inventor
汤健
孙春来
毛克峰
贾美英
李东
李立国
胡亚男
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CALCULATE OFFICE UNIT 94070 OF PLA
Original Assignee
CALCULATE OFFICE UNIT 94070 OF PLA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CALCULATE OFFICE UNIT 94070 OF PLA
Priority to CN201510496474.4A
Publication of CN105046320A
Legal status: Pending

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention discloses a virtual sample generation method comprising the following steps. First, a limited number of high-dimensional real samples are acquired by means of signal acquisition and corresponding devices, and a feasibility-based planning (FBP) model is constructed using the partial least squares (PLS) algorithm, a genetic algorithm (GA), and a back-propagation neural network (BPNN). Second, the inputs of the virtual samples are generated on the basis of prior knowledge of the known real samples. Third, the latent features of the virtual sample inputs, extracted by PLS, are fed into the FBP model, and the outputs of the virtual samples are obtained on the basis of the prior knowledge. Finally, the input vectors and output vectors of the virtual samples that conform to a preset rule are combined to obtain complete virtual samples. Virtual samples that can be used for predicting high-dimensional data are thereby generated relatively accurately.

Description

A virtual sample generation method
Technical field
The present invention relates to the technical field of machine learning, and specifically to a virtual sample generation method.
Background art
Machine learning techniques based on big data have been applied widely and successfully in many industries. For the medical records of many rare diseases and for the early stages of flexible manufacturing systems, however, only a small number of training samples are available for building a prediction model. In complex process industries, optimal control and operational optimization require that hard-to-detect process parameters of key mechanical equipment be measured or predicted. For example, the internal load parameters of a mill in a grinding process are difficult to detect directly or to calculate with a mechanistic model, so soft measurement methods based on the high-dimensional spectral data of the mill shell vibration and acoustic signals are mainly used. Effective modeling data, however, can be obtained only in two stages: (1) the experimental design stage carried out specifically to build the soft measurement model; (2) the stage in which the mill is shut down and restarted. In an actual industrial process, a sufficient number of usable training samples may be obtained only at the cost of economic benefit or after a very long wait. The medical records of rare diseases and the early stages of flexible manufacturing systems likewise suffer from the difficulty of obtaining enough modeling samples. In practice, therefore, modeling research must address high-dimensional small-sample data.
Research shows that a sufficient sample size is extremely important for building an effective learning model. Most existing research addresses classification problems; for example, document [1] studies the relationships among classification error, number of training samples, input dimensionality, and classifier complexity. To determine the minimum number of samples needed to reach a required prediction performance, researchers have proposed indices such as probably approximately correct (PAC) bounds and the ratio of training samples to input features [2,3]. At present, the definition of small-sample data remains relative and subjective.
The prior art proposed a new mega-trend-diffusion (MTD) technique to solve the scheduling problem of early-stage flexible manufacturing systems, mainly using virtual sample generation (VSG) to improve the classification accuracy of a back-propagation neural network (BPNN) model. Many types of VSG method now exist, for example using domain expert knowledge, adding noise to the input data, or using the distribution function of the original samples. Most of this research addresses classification problems based on high-dimensional small-sample data.
For virtual sample generation in regression problems, document [4] proposes a VSG method based on multilayer perceptron neural networks to improve the generalization performance of the learning model: the input of a virtual sample is produced by selecting points near the input of a real sample, and the output of the virtual sample is obtained by combining the outputs of different multilayer perceptron networks. Document [5] proposes a diffusion neural network (DNN) to generate virtual samples and to model small data sets; simulation results show that the DNN has stronger prediction performance than the BPNN. These methods usually process each input feature of the training samples separately. More recently, the VSG method based on a genetic algorithm (GA) proposed in document [6] can describe the combined effect of different input features.
The above methods use a traditional single model to generate virtual samples. For modeling data with complex distributions, or for high-dimensional small-sample training data, a traditional single-model approach has difficulty performing effective pattern recognition or regression modeling.
[1] S. J. Raudys, A. K. Jain, "Small Sample Size Effects in Statistical Pattern Recognition: Recommendations for Practitioners," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 3, pp. 252-265, 1991.
[2] J. Shawe-Taylor, M. Anthony, and N. L. Biggs, "Bounding Sample Size with the Vapnik-Chervonenkis Dimension," Discrete Applied Math., vol. 42, pp. 65-73, 1993.
[3] Y. Muto and Y. Hamamoto, "Improvement of the Parzen Classifier in Small Training Sample Size Situations," Intelligent Data Analysis, vol. 5, no. 6, pp. 477-490, 2001.
[4] S. Z. Cho, M. Jang, S. J. Chang, "Virtual sample generation using a population of networks," Neural Processing Letters, vol. 5, pp. 83-89, 1997.
[5] C. F. Huang and C. Moraga, "A Diffusion-Neural-Network for Learning from Small Samples," Int'l J. Approximate Reasoning, vol. 35, pp. 137-161, 2004.
[6] D. C. Li, I. H. Wen, "A genetic algorithm-based virtual sample generation technique to improve small data set learning," Neurocomputing, vol. 143, pp. 222-230, 2014.
Summary of the invention
In view of this, the invention provides a virtual sample generation method to solve the problem of insufficient sample size for high-dimensional small-sample data.
An embodiment of the present invention provides a virtual sample generation method for generating virtual samples based on a plurality of real samples, the method comprising:
S100: extracting the latent (potential) features of the input vectors of the real samples, and obtaining a latent feature extraction model;
S200: training, according to the latent features and the corresponding output vectors, to obtain a prediction model whose prediction performance meets a predetermined condition, the prediction model being used to calculate the corresponding output vector from the latent features of an input vector;
S300: interpolating between every pair of interpolable real samples to generate the alternative input vectors of the virtual samples, a pair of interpolable real samples being two real samples for which a predetermined number of elements are identical in the input vectors or in the vectors associated with the corresponding input vectors;
S400: extracting the latent features of the alternative input vectors according to the latent feature extraction model;
S500: calculating the corresponding alternative output vectors according to the prediction model and the latent features of the alternative input vectors, and retaining the alternative output vectors that meet the virtual sample screening conditions, together with the corresponding alternative input vectors, to obtain the virtual sample set corresponding to the plurality of real samples.
Further, step S100 comprises:
extracting the latent features of the input vectors of the real samples based on the partial least squares algorithm, with the goal of maximizing the covariance between the input vectors and the output vectors.
Further, step S200 comprises:
S210: taking the latent features of the input vectors of the real samples and the corresponding output vectors as a training data set;
S220: generating a plurality of training sub-samples from the training data set by the Bootstrap algorithm;
S230: building a plurality of BPNN-based candidate sub-models from the plurality of training sub-samples;
S240: selecting, according to the training data set, all candidate sub-models whose model selection weight parameter is greater than a model selection threshold to form an ensemble (integrated) model, wherein the model selection weight parameters are obtained by optimizing randomly generated initial weight parameters with a genetic algorithm, with the goal of minimizing the prediction error;
S250: calculating, based on the training data set, the sum of the output vectors of all sub-models in the ensemble model as the predicted output vector of the ensemble model, and calculating the prediction performance of the ensemble model based on the predicted output vector of the ensemble model and the output vectors of the training samples;
S260: when the prediction performance of the ensemble model meets the predetermined condition, taking the ensemble model as the prediction model.
Further, step S300 comprises generating the alternative input vectors of the virtual samples based on the following formula:
$$x_{l'}^{VS} = x_{low}^{VS} + \left(x_{high}^{VS} - x_{low}^{VS}\right)\frac{l'}{N_{VSG}}, \quad 1 \le l' < N_{VSG}$$
where $x_{l'}^{VS}$ is the alternative input vector generated by the $l'$-th interpolation, $x_{low}^{VS}$ is the input vector of the first interpolable real sample, $x_{high}^{VS}$ is the input vector of the second interpolable real sample, and $N_{VSG}$ is the predetermined number of interpolation segments.
Further, step S500 comprises:
S510: calculating the alternative output vector corresponding to the current alternative input vector;
S520: when the alternative output vector lies between the predetermined output vector lower limit and upper limit, retaining this alternative output vector and the corresponding alternative input vector as a virtual sample;
S530: when the alternative output vector does not lie between the predetermined output vector lower limit and upper limit, judging whether the number of calculations for the current alternative input vector exceeds a predetermined threshold; if so, performing step S540, otherwise performing step S510;
S540: discarding the current alternative input vector, taking the next alternative input vector as the current alternative input vector, and performing step S510;
S550: after all alternative input vectors have been traversed, taking all retained virtual samples as the virtual sample set.
Further, the virtual samples are used together with the real samples to train a high-dimensional vector prediction model, and the high-dimensional vector prediction model is used to predict the corresponding output vector from a high-dimensional input vector.
Further, the input vector is the sample vibration signal spectrum and the sample acoustic signal spectrum of the mill shell;
the output vector is a mill load parameter;
the virtual samples are used to train a soft measurement model of the mill load parameters.
The present invention first obtains a limited number of high-dimensional real samples by means of signal acquisition and related devices, and then builds a feasibility-based planning (FBP) model using the partial least squares (PLS) algorithm, a genetic algorithm (GA), and a back-propagation neural network (BPNN). It then generates the inputs of the virtual samples based on prior knowledge of the known real samples. Next, the latent features of the virtual sample input vectors, extracted by PLS, are fed into the FBP model, and the outputs of the virtual samples are obtained based on the prior knowledge. Finally, the input vectors and output vectors of the virtual samples that meet the preset rule are combined to obtain complete virtual samples. Virtual samples that can be used for high-dimensional data prediction are thereby generated relatively accurately.
Brief description of the drawings
The above and other objects, features, and advantages of the present invention will become clearer from the following description of the embodiments with reference to the accompanying drawings, in which:
Fig. 1 is a flowchart of the virtual sample generation method of an embodiment of the present invention;
Fig. 2 is a flowchart of step S500 of the virtual sample generation method of the embodiment of the present invention;
Fig. 3 is a schematic diagram of the hardware structure of a grinding system and its supporting soft measurement system, to which the virtual sample generation method of the embodiment of the present invention is applied for soft measurement;
Fig. 4 shows curves of the mill vibration spectrum and vibro-acoustic spectrum used as input vectors, and of the grinding concentration used as the output vector, in the real samples;
Fig. 5 shows curves of the vibration spectrum, vibro-acoustic spectrum, and grinding concentration of the virtual samples when the number of latent variables is set to 3 and the number of interpolation segments is set to 5;
Fig. 6 compares the error statistics of the virtual samples for different numbers of latent variables;
Fig. 7 compares the rates of change of the data variance of the input vectors and output vectors for different numbers of latent variables.
Detailed description of the embodiments
The present invention is described below based on embodiments, but the present invention is not limited to these embodiments. Some specific details are described at length in the following detailed description. Those skilled in the art can fully understand the present invention even without these details. To avoid obscuring the essence of the present invention, well-known methods, procedures, flows, elements, and circuits are not described in detail.
In addition, those of ordinary skill in the art should understand that the drawings provided here are for illustrative purposes and are not necessarily drawn to scale.
Unless the context clearly requires otherwise, words such as "comprise" and "include" throughout the specification and claims are to be construed in an inclusive sense rather than an exclusive or exhaustive sense; that is, in the sense of "including but not limited to".
In the description of the present invention, it should be understood that the terms "first", "second", and the like are used for descriptive purposes only and cannot be interpreted as indicating or implying relative importance. In addition, in the description of the present invention, unless otherwise stated, "a plurality of" means two or more.
Fig. 1 is a flowchart of the virtual sample generation method of the embodiment of the present invention. As shown in Fig. 1, the virtual sample generation method comprises the following steps:
Step S100: extracting the latent features of the input vectors of the real samples, and obtaining a latent feature extraction model.
In this embodiment, the latent features of the input vectors of the real samples are extracted by the partial least squares (PLS) algorithm, with the goal of maximizing the covariance between the input vectors and the output vectors. The PLS algorithm reduces the dimensionality of the high-dimensional real samples, replacing the original high-dimensional features with lower-dimensional latent features. For a real sample (x, y), where x is the input vector and y is the output vector, the features extracted from it can be expressed as:
$$Z = [t_1, t_2, \ldots, t_h]$$
where $t_i$ ($i = 1, 2, \ldots, h$) are the latent features and $h$ is the number of latent features, which can be determined empirically or from the variance contribution rate.
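The latent feature extraction of step S100 can be sketched as follows. This is a minimal illustration that assumes a single scalar output (often called PLS1) and the NIPALS iteration; the source states only that partial least squares is used, so the algorithm variant, the mean-centering step, and the function name `pls1_fit` are assumptions made for the example.

```python
import math

def pls1_fit(X, y, h):
    """Fit a single-output PLS model with up to h latent features (NIPALS).

    X: list of n rows, each a list of d floats; y: list of n floats.
    Returns (x_mean, W, P, T): column means of X, h weight vectors and
    h loading vectors (length d each), and h score vectors (length n each).
    """
    n, d = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(d)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(d)] for row in X]  # centered inputs
    f = [v - y_mean for v in y]                                # centered output
    W, P, T = [], [], []
    for _ in range(h):
        # Weight vector: the direction of maximal covariance between E and f.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:          # nothing left to explain
            break
        w = [v / norm for v in w]
        # Scores (latent feature values of the training samples).
        t = [sum(E[i][j] * w[j] for j in range(d)) for i in range(n)]
        tt = sum(v * v for v in t)
        # Loadings of X on t and regression coefficient of f on t.
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(d)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate: remove the part of E and f explained by this component.
        E = [[E[i][j] - t[i] * p[j] for j in range(d)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        W.append(w)
        P.append(p)
        T.append(t)
    return x_mean, W, P, T
```

The score vectors in `T` play the role of $t_1, \ldots, t_h$ above, and `W` and `P` are the quantities a projection of new inputs would need.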
Step S200: training, according to the latent features and the corresponding output vectors, to obtain a prediction model whose prediction performance meets a predetermined condition, the prediction model being used to calculate the corresponding output vector from the latent features of an input vector.
Specifically, step S200 comprises:
Step S210: taking the latent features of the input vectors of the real samples and the corresponding output vectors as a training data set.
The training data set is denoted by Z.
Step S220: generating $p_{GA}$ training sub-samples from the training data set by the Bootstrap algorithm.
The first issue in ensemble modeling is the construction of the ensemble. A "resample the training samples" approach based on the Bootstrap algorithm is used to generate the training sub-samples from the training data set Z, where $p_{GA}$ is the number of sub-samples, which is also the number of candidate sub-models and the population size in the GA.
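As an illustration of step S220, the sketch below draws bootstrap sub-samples by sampling training pairs with replacement. The function name `bootstrap_subsamples`, the choice to make each sub-sample the same size as the training set, and the seed parameter are assumptions for the example; the source does not specify them.

```python
import random

def bootstrap_subsamples(Z, y, n_sub, seed=0):
    """Draw n_sub bootstrap sub-samples from the training pairs (Z[i], y[i]).

    Each sub-sample has the same size as the training set and is drawn
    with replacement, so some pairs repeat and others are left out.
    """
    rng = random.Random(seed)
    n = len(Z)
    subsets = []
    for _ in range(n_sub):
        idx = [rng.randrange(n) for _ in range(n)]  # indices drawn with replacement
        subsets.append(([Z[i] for i in idx], [y[i] for i in idx]))
    return subsets
```

Each returned pair of lists would then be used to train one candidate sub-model.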
Step S230: building the BPNN-based candidate sub-models from the training sub-samples.
Step S240: selecting, according to the training data set Z, all candidate sub-models whose model selection weight parameter is greater than the model selection threshold to form the set of qualified sub-models (that is, the set of ensemble sub-models), wherein the model selection weight parameters are obtained by optimizing randomly generated initial weight parameters with a genetic algorithm, with the goal of minimizing the prediction error. Here $p_{GA}^{*}$ is the number of sub-models in the ensemble model.
Specifically, step S240 may comprise:
(1) using the training data set Z to calculate the output vectors of all candidate sub-models, obtaining the corresponding set of output vectors;
(2) calculating the prediction errors of all candidate sub-models based on the output vectors in the training data set;
(3) building a correlation matrix based on the prediction errors;
(4) randomly generating, for each candidate sub-model, an initial weight parameter for model selection;
(5) optimizing these random weight vectors with a standard genetic algorithm, with the goal of minimizing the prediction error, and recording the result as the optimized weight parameters;
(6) selecting the candidate sub-models whose optimized weight parameter is greater than the threshold, obtaining the set of qualified sub-models and their total number, where $\lambda_{GA}$ is the sub-model screening threshold; in this embodiment it may be set to $\lambda_{GA} = 1/p_{GA}$.
Step S240 in effect selects the ensemble sub-models used to build the final prediction model by selective ensemble (SEN) learning based on a genetic algorithm (GA).
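Steps (1) to (6) above can be sketched as follows. The source does not give the form of the correlation matrix or the details of the genetic algorithm, so this example makes several assumptions: the matrix entry for sub-models i and j is the mean product of their prediction errors, the fitness is the squared error of the weighted ensemble, and the GA is a deliberately simplified one (truncation selection, one-point crossover, Gaussian mutation) rather than any particular standard implementation. All function names are hypothetical.

```python
import random

def error_correlation(preds, y):
    """C[i][j] = mean over samples of e_i * e_j, with e_i = preds[i] - y."""
    k, m = len(y), len(preds)
    errs = [[p[s] - y[s] for s in range(k)] for p in preds]
    return [[sum(errs[i][s] * errs[j][s] for s in range(k)) / k
             for j in range(m)] for i in range(m)]

def ensemble_error(w, C):
    """Mean squared error of the weighted ensemble: sum_ij w_i * w_j * C_ij."""
    m = len(w)
    return sum(w[i] * w[j] * C[i][j] for i in range(m) for j in range(m))

def normalise(w):
    s = sum(w)
    return [v / s for v in w]

def ga_optimise_weights(C, generations=200, pop_size=30, seed=0):
    """Simplified GA over weight vectors that are positive and sum to 1."""
    rng = random.Random(seed)
    m = len(C)
    # Start from uniform weights plus random weight vectors.
    pop = [[1.0 / m] * m]
    pop += [normalise([rng.random() + 1e-9 for _ in range(m)])
            for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=lambda w: ensemble_error(w, C))
        parents = pop[: pop_size // 2]            # keep the better half (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, m) if m > 1 else 0
            child = a[:cut] + b[cut:]             # one-point crossover
            j = rng.randrange(m)
            child[j] = abs(child[j] + rng.gauss(0.0, 0.1)) + 1e-9  # mutation
            children.append(normalise(child))
        pop = parents + children
    return min(pop, key=lambda w: ensemble_error(w, C))

def select_submodels(weights):
    """Keep sub-models whose weight exceeds the threshold 1 / (number of candidates)."""
    threshold = 1.0 / len(weights)
    return [i for i, w in enumerate(weights) if w > threshold]
```

A typical call sequence would be `C = error_correlation(preds, y)`, then `w = ga_optimise_weights(C)`, then `select_submodels(w)`, where `preds[i]` holds the outputs of candidate sub-model i on the training data set.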
Step S250: based on the training data set Z, calculating the average of the output vectors of all sub-models in the ensemble as the output vector of the ensemble model, and calculating the prediction performance of the ensemble model from the output vector of the ensemble model and the output vectors of the training samples.
Specifically, in this embodiment the output vector of the ensemble model is calculated as follows:
$$\hat{y}_{GASEN} = \frac{1}{p_{GA}^{*}} \sum_{j_{sub}=1}^{p_{GA}^{*}} \hat{y}_{j_{sub}} = \frac{1}{p_{GA}^{*}} \sum_{j_{sub}=1}^{p_{GA}^{*}} f_{j_{sub}}^{BPNN}(Z)$$
Meanwhile, the prediction performance of the ensemble model is measured by the mean absolute percentage error (MAPE), calculated as follows:
$$MAPE = \frac{1}{k} \sum_{i=1}^{k} \left| \frac{\hat{y}_{GASEN} - y_i}{y_i} \right|$$
where $\hat{y}_{GASEN}$ is the output vector of the ensemble model (inside the sum, its value for the $i$-th training sample), $y_i$ is the $i$-th output vector in the training data set, and $k$ is the number of output vectors in the training data set.
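The two formulas above translate directly into code. In this sketch the sub-models are represented as plain Python callables that map a latent feature vector to a scalar output, which is an assumption made for illustration; the function names are hypothetical.

```python
def ensemble_predict(submodels, z):
    """Average the outputs of the selected sub-models for one latent feature vector z."""
    return sum(f(z) for f in submodels) / len(submodels)

def mape(y_true, y_pred):
    """Mean absolute percentage error; assumes no true value is zero."""
    return sum(abs((p - t) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

# Usage: two stand-in "sub-models" evaluated on two training points.
submodels = [lambda z: 2.0 * z[0], lambda z: 2.0 * z[0] + 0.2]
Z_train = [[1.0], [2.0]]
y_train = [2.0, 4.0]
y_hat = [ensemble_predict(submodels, z) for z in Z_train]
score = mape(y_train, y_hat)
```

Here `score` would be compared against the acceptance limit for the ensemble model.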
Step S260: when the prediction performance of the ensemble model meets the predetermined condition, taking the ensemble model as the prediction model.
In this embodiment, MAPE ≤ 0.1 is used as the evaluation condition for the ensemble model. When the prediction performance meets this condition, the ensemble model is taken as the prediction model; otherwise, the method returns to step S210 and rebuilds a new ensemble model, until a prediction model meeting the condition is obtained.
Step S300: interpolating between every pair of interpolable real samples to generate the alternative input vectors of the virtual samples, a pair of interpolable real samples being two real samples for which a predetermined number of elements are identical in the input vectors or in the vectors associated with the corresponding input vectors.
Specifically, step S300 comprises generating the alternative input vectors of the virtual samples based on the following formula:
$$x_{l'}^{VS} = x_{low}^{VS} + \left(x_{high}^{VS} - x_{low}^{VS}\right)\frac{l'}{N_{VSG}}, \quad 1 \le l' < N_{VSG}$$
where $x_{l'}^{VS}$ is the alternative input vector generated by the $l'$-th interpolation, $x_{low}^{VS}$ is the input vector of the first interpolable high-dimensional training sample, $x_{high}^{VS}$ is the input vector of the second interpolable high-dimensional training sample, and $N_{VSG}$ is the predetermined number of interpolation segments.
Preferably, two real training samples whose input variables (elements) are all identical except for one form a pair of interpolable real samples. For a specific physical or chemical process, the meaning of these input variables (elements) is usually known and explainable; for experiments or data acquisition carried out on such a process, the requirements on the spacing between these variables constitute the prior knowledge about the spacing of the real training samples. Pairs of interpolable real samples can of course also be obtained by screening for vectors associated with the input vectors in which most variables are identical. Between the input vectors of the two real samples, the required number of alternative input vectors of virtual samples can be obtained by linear interpolation. For example, if the interval between two adjacent real samples is divided into $N_{VSG}$ parts, a total of $N_{VSG} - 1$ virtual sample inputs can be generated, where $N_{VSG} \ge 2$.
All alternative input vectors are obtained by interpolating between every pair of interpolable real samples.
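The pairing rule and the interpolation formula of step S300 can be sketched as follows. The helper names are hypothetical, and the pairing check assumes the "all but one element identical" rule is applied to whichever vector describes the experimental conditions (the input vector itself or an associated vector).

```python
def is_interpolable_pair(v_a, v_b):
    """True if the two vectors differ in exactly one element."""
    return sum(1 for a, b in zip(v_a, v_b) if a != b) == 1

def interpolate_inputs(x_low, x_high, n_vsg):
    """Return the n_vsg - 1 interior points x_low + (x_high - x_low) * l / n_vsg."""
    if n_vsg < 2:
        raise ValueError("n_vsg must be at least 2")
    return [[lo + (hi - lo) * l / n_vsg for lo, hi in zip(x_low, x_high)]
            for l in range(1, n_vsg)]

# Usage: dividing the interval into 4 segments gives 3 alternative input vectors.
candidates = interpolate_inputs([0.0, 10.0], [4.0, 10.0], 4)
```

With `n_vsg = 4` the call above yields three points, which matches the count $N_{VSG} - 1$ stated in the text.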
Step S400: extracting the latent features of the alternative input vectors according to the latent feature extraction model.
Taking the input of a virtual sample as an example, the latent features are extracted using the following formula:
$$z_{l'}^{VS} = x_{l'}^{VS} W \left(P^{T} W\right)^{-1}$$
where $P = [p_1, p_2, \ldots, p_h]$ and $W = [w_1, w_2, \ldots, w_h]$ are the loading matrix and the coefficient matrix obtained when the partial least squares algorithm performs the dimension reduction; together they constitute the parameters of the latent feature extraction model.
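The projection formula above can be evaluated with a few small matrix helpers. This sketch stores W and P as d-by-h nested lists (one column per latent feature) and assumes the input vector has already been preprocessed in the same way as the training inputs; the helper names are hypothetical.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse(M):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(M)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        pv = aug[c][c]
        aug[c] = [v / pv for v in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def latent_features(x, W, P):
    """Compute z = x W (P^T W)^{-1} for one input vector x of length d."""
    R = matmul(W, inverse(matmul(transpose(P), W)))  # d x h projection matrix
    return matmul([x], R)[0]
```

The matrix `R` depends only on the fitted model, so in practice it would be computed once and reused for every alternative input vector.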
Step S500: calculating the corresponding alternative output vectors according to the prediction model and the latent features of the alternative input vectors, and retaining the alternative output vectors that meet the virtual sample screening conditions, together with the corresponding alternative input vectors, to obtain the virtual sample set corresponding to the set of real samples.
Fig. 2 is a flowchart of step S500 of the virtual sample generation method of the embodiment of the present invention. As shown in Fig. 2, step S500 comprises:
Step S510: calculating the alternative output vector corresponding to the current alternative input vector.
Specifically, the output vector $\hat{y}_{l'}^{VS}$ for the alternative input vector $x_{l'}^{VS}$ of a virtual sample can be calculated with the following formula:
$$\hat{y}_{l'}^{VS} = \frac{1}{p_{GA}^{*}} \sum_{j_{sub}=1}^{p_{GA}^{*}} \hat{y}_{j_{sub}}^{VS} = \frac{1}{p_{GA}^{*}} \sum_{j_{sub}=1}^{p_{GA}^{*}} f_{j_{sub}}^{BPNN}\left(z_{l'}^{VS}\right)$$
where $z_{l'}^{VS}$ is the latent feature vector corresponding to the alternative input vector $x_{l'}^{VS}$.
Step S520: when the alternative output vector lies between the predetermined output vector lower limit and upper limit, retaining the alternative output vector and the corresponding alternative input vector as a virtual sample.
Specifically, it is judged whether $\hat{y}_{l'}^{VS}$ meets the following virtual sample screening condition:
$$y_{low}^{VS} \le \hat{y}_{l'}^{VS} \le y_{high}^{VS}$$
where $y_{high}^{VS}$ is the predetermined output vector upper limit and $y_{low}^{VS}$ is the predetermined output vector lower limit. If the condition is met, the current alternative input vector $x_{l'}^{VS}$ is saved as the input of a virtual sample, and the corresponding output vector $\hat{y}_{l'}^{VS}$ is accepted as the output of the virtual sample.
Step S530: when the alternative output vector does not lie between the predetermined output vector lower limit and upper limit, judging whether the number of calculations for the current alternative input vector exceeds a predetermined threshold; if so, performing step S540, otherwise performing step S510.
When $\hat{y}_{l'}^{VS}$ does not meet the above condition, the current output vector cannot be used as a virtual sample, so steps S510 and S520 are repeated to calculate a new output vector and to judge whether it meets the requirement. If the number of repetitions exceeds the predetermined threshold, the current input vector is not suitable as a virtual sample.
Step S540: discarding the current alternative input vector, taking the next alternative input vector as the current alternative input vector, and performing step S510.
When the current input vector is judged unsuitable as a virtual sample, the method moves to the next alternative input vector and calculates the output vector corresponding to it.
Step S550: after all alternative input vectors have been traversed, taking all retained virtual samples as the virtual sample set.
After all alternative input vectors have been traversed, all retained virtual samples form the virtual sample set; combined with the original real training samples, they form the new training sample set.
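The screening loop of steps S510 to S550 can be sketched as below. The predictor is passed in as a callable that returns a scalar output; the source does not say what changes between repeated calculations for the same alternative input vector, so this sketch simply calls the supplied predictor again up to `max_tries` times. The function and parameter names are hypothetical.

```python
def screen_virtual_samples(candidates, predict, y_low, y_high, max_tries=3):
    """Keep (input, output) pairs whose predicted output lies in [y_low, y_high]."""
    kept = []
    for x in candidates:                      # S540/S550: walk through all candidates
        for _ in range(max_tries):            # S530: bounded number of calculations
            y_hat = predict(x)                # S510: calculate the alternative output
            if y_low <= y_hat <= y_high:      # S520: screening condition
                kept.append((x, y_hat))
                break
        # If every try failed, the candidate is discarded and the loop moves on.
    return kept

# Usage with a stand-in predictor that sums the input elements.
virtual = screen_virtual_samples([[1, 1], [5, 5], [2, 1]], sum, 0, 4)
```

In the usage line the second candidate is dropped because its predicted output falls outside the limits on every try.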
The present embodiment first obtains a limited number of high-dimensional training samples by means of signal acquisition and related devices, and then builds a feasibility-based planning (FBP) model using the partial least squares (PLS) algorithm, a genetic algorithm (GA), and a back-propagation neural network (BPNN). It then generates the inputs of the virtual samples based on prior knowledge of the known real training samples. Next, the latent features of the virtual samples, extracted by PLS, are fed into the FBP model, and the outputs of the virtual samples are obtained based on the prior knowledge. Finally, the virtual inputs and outputs that meet the preset rule are combined to obtain complete virtual samples. Virtual samples that can be used for high-dimensional data prediction are thereby generated relatively accurately.
The present embodiment can be applied to physical or chemical processes such as the soft measurement of mill load parameters, as well as to virtual sample generation for flexible manufacturing and to building prediction models for rare medical cases. When applied to the soft measurement of mill load parameters, the input vector of the high-dimensional training samples is the sample vibration signal spectrum and the sample acoustic signal spectrum of the mill shell, and the output vector is a mill load parameter.
Specifically, Fig. 3 is a schematic diagram of the hardware structure of a grinding system and its supporting soft measurement system, to which the virtual sample generation method of the embodiment of the present invention is applied for soft measurement. As shown in Fig. 3, a two-stage grinding circuit (GC) is widely used in mineral processing. The first stage of the grinding circuit (GCI) generally comprises, connected in sequence, a feed bin 1, a feeder 2, a wet pre-selector 3, a mill 4, and a pump sump 5. A hydrocyclone 6 is connected between the pump sump 5 and the wet pre-selector 3, so that the coarser fraction is returned to the mill as underflow for regrinding. Fresh ore, fresh water, and periodically added steel balls enter the mill 4 (usually a ball mill) together with the hydrocyclone underflow. In the mill 4 the ore is impacted and ground by the steel balls into finer particles and mixed with water to form a slurry, which flows continuously out of the mill into the pump sump 5. The slurry is diluted by injecting fresh water into the pump sump 5, and the diluted slurry is pumped into the hydrocyclone 6 at a certain pressure, where it is separated into two parts: the coarser fraction returns to the mill as underflow for regrinding, and the remainder enters the second grinding stage (GCII).
Meanwhile, to carry out the soft measurement of the load parameters, a vibration signal acquisition device 7 and an acoustic signal acquisition device 8 are each arranged on or near the mill 4 to obtain the vibration signal and the acoustic signal. A data processing device 9 processes the detected vibration and acoustic signals to obtain high-dimensional spectra, builds the soft measurement model, and obtains the load parameters.
The grinding production rate (that is, the mill output) is usually maximized by optimizing the circulating load, and the circulating load is often determined by the load of GCI. Mill overload can cause the mill to spit out material, coarsen the particle size at the mill outlet, block the mill, and even bring the grinding process to a halt. Mill underload can cause empty grinding, which wastes energy, increases steel ball consumption, and can even damage the mill. Mill load is therefore a very important parameter. Accurate measurement of the load parameters inside the ball mill is closely tied to the product quality and production efficiency of the grinding process and to the safety of production. On industrial sites, domain experts mostly rely on multi-source information and experience to monitor the mill load state. Data-driven soft measurement methods based on the spectra of the mill shell vibration and acoustic signals are often used to overcome the subjectivity and instability of expert inference of the mill load.
The mill load parameters include the material-to-ball volume ratio (MBVR), the pulp density (PD), and the charge volume ratio (CVR); these parameters are related to the mill load and the mill load state.
In practice, a mill contains tens of thousands of steel balls. These balls are arranged in layers and fall simultaneously with different impact forces. The vibrations caused by these impact forces of different frequencies and amplitudes are superimposed on one another. The mass imbalance of the mill itself and the installation offset of the ball mill also cause the mill shell to vibrate. These vibration signals couple with one another and finally form the measurable shell vibration signal.
The sound radiated by the shell vibration, i.e. structure-borne noise, is the main component of the acoustic signal. Because the mill shell is a strongly reflecting surface acoustically, the noise inside the mill is reflected continuously and forms a reverberant sound field; the part of it that is transmitted to the outside of the mill through the shell and the mill bolts is called airborne noise. The acoustic signal measured outside the grinding zone of the mill also includes noise from neighboring mills and other equipment.
By running the mill under predetermined load conditions and recording the shell vibration and acoustic signals, a limited number of real samples can be obtained.
Because the frequency spectra of the shell vibration and acoustic signals are closely related to the mill load parameters, these signals are transformed from the time domain to the frequency domain to obtain their power spectral density (PSD), which yields the corresponding high-dimensional input vectors. It should be understood that the PSD of the vibration and acoustic signals can be computed with any of various existing techniques or devices, which are not described further here. Based on these real samples, virtual samples can be generated with the virtual sample generation method of the embodiment of the present invention, and a soft-sensing model of the mill can then be built from the virtual samples obtained. The load of the mill can be soft-measured with this soft-sensing model.
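As an illustration of the time-to-frequency conversion mentioned above, the following is a minimal sketch of one possible PSD estimate (a plain periodogram computed with NumPy). The text does not prescribe any particular estimator; the sampling rate, the synthetic sine signals and the function name periodogram_psd are assumptions made only for this example.

```python
import numpy as np

def periodogram_psd(signal, fs):
    """One-sided periodogram PSD estimate of a real 1-D signal sampled at fs Hz."""
    x = np.asarray(signal, dtype=float)
    n = x.size
    x = x - x.mean()                      # remove the DC offset
    spectrum = np.fft.rfft(x)             # one-sided FFT of a real signal
    psd = (np.abs(spectrum) ** 2) / (fs * n)
    # Double every bin except DC (and Nyquist when n is even) to keep total power
    if n % 2 == 0:
        psd[1:-1] *= 2.0
    else:
        psd[1:] *= 2.0
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, psd

# Hypothetical signals standing in for the shell vibration and acoustic recordings
fs = 1000.0
t = np.arange(1000) / fs
vibration = np.sin(2 * np.pi * 50 * t)
acoustic = 0.5 * np.sin(2 * np.pi * 120 * t)

freqs, psd_v = periodogram_psd(vibration, fs)
_, psd_a = periodogram_psd(acoustic, fs)

# Concatenate the two spectra into one high-dimensional input vector
input_vector = np.concatenate([psd_v, psd_a])
```

In practice an averaged estimator (for example Welch's method) is often preferred for noisy signals; the plain periodogram is used here only to keep the sketch short.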
Specifically, the experiment was carried out on an XMQL-420 × 450 ball mill. The steel balls used were 30, 20 and 15 mm in diameter.
Table 1 shows the distribution of the real samples under four different experimental conditions.
Table 1
In the present embodiment, a vector associated with the input vector is used to screen for interpolable pairs of real samples. According to Table 1, the water load and the material load serve as the experimental conditions (the vector formed by these two elements is directly related to the acoustic and vibration spectra of the input vector); two samples for which one of these loads is held fixed while the other changes form an interpolable pair of real samples. For example, with the material load held constant at 10 kg and the water load varying from 5 kg to 15 kg, multiple virtual samples with a water load between 5 and 10 kg can be generated between No.1 and No.2. Thus, the alternative input vectors of the virtual samples can be obtained by interpolating within the intervals between the following real samples: No.1 and No.2, No.2 and No.3, No.4 and No.5, No.5 and No.6, No.7 and No.8, No.8 and No.9, No.10 and No.11, No.11 and No.12, and No.12 and No.13. When N_VSG = 2, 3, ..., 10, the number of alternative input vectors of the virtual samples is 9, 18, ..., 81, respectively, since each of the nine pairs yields N_VSG - 1 interpolated vectors.
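The interpolation and the resulting counts can be sketched as follows. This is only an illustrative sketch: the function name interpolate_inputs and the three-element toy vectors are assumptions for the example, and the real input vectors are high-dimensional spectra that are not reproduced here.

```python
import numpy as np

def interpolate_inputs(x_low, x_high, n_vsg):
    """Alternative input vectors between two real input vectors.

    Uses x_low + (x_high - x_low) * l / n_vsg for l = 1 .. n_vsg - 1,
    so n_vsg segments give n_vsg - 1 interior points.
    """
    x_low = np.asarray(x_low, dtype=float)
    x_high = np.asarray(x_high, dtype=float)
    return [x_low + (x_high - x_low) * l / n_vsg for l in range(1, n_vsg)]

# The nine interpolable pairs listed in the text (sample numbers only)
pairs = [(1, 2), (2, 3), (4, 5), (5, 6), (7, 8), (8, 9), (10, 11), (11, 12), (12, 13)]

# Number of alternative input vectors for each N_VSG from 2 to 10
counts = {n_vsg: len(pairs) * (n_vsg - 1) for n_vsg in range(2, 11)}

# Toy 3-element vectors standing in for real high-dimensional input vectors
demo = interpolate_inputs([0.0, 2.0, 4.0], [1.0, 4.0, 8.0], n_vsg=2)
```

With two segments, each pair contributes a single midpoint, which is why nine pairs give nine alternative input vectors.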
Thus, corresponding virtual samples can be generated with the virtual sample generation method of the embodiment of the present invention.
Fig. 4 shows curves of the mill vibration spectrum and acoustic spectrum, which serve as the input vectors, and of the pulp density, which serves as the output vector, for the real samples. Fig. 5 shows curves of the vibration spectrum, acoustic spectrum and pulp density of the virtual samples when the number of latent variables is set to 3 and the number of interpolation segments is set to 5. In the experiment, N_VSG = 2, 3, 4, 5 was used to generate virtual samples. The following parameters were used to build the pulp density (PD) prediction model: population size (P_GA) 20; sub-model selection threshold (λ_GA) 0.05; number of latent variables h = 1, ..., 5; number of hidden-layer neurons of each neural network sub-model 2h + 1; and predetermined threshold N_times = 10. The number of virtual samples obtained with different parameters (N_VSG, h) is shown in Table 2.
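To make the role of the sub-model selection threshold concrete, the following sketch averages the outputs of those sub-models whose selection weight exceeds the threshold, in the manner of steps S240 and S250 of the claims. The sub-model outputs and weights below are invented for illustration; in the method the weights come from genetic algorithm optimization, which is not reproduced here, and the function names are assumptions.

```python
import numpy as np

def hidden_neurons(h):
    """Hidden-layer size used in the text for h latent variables."""
    return 2 * h + 1

def ensemble_predict(sub_outputs, weights, threshold=0.05):
    """Average the outputs of sub-models whose selection weight exceeds threshold.

    sub_outputs: array-like of shape (n_submodels, n_samples)
    weights:     array-like of shape (n_submodels,), e.g. produced by a GA
    """
    sub_outputs = np.asarray(sub_outputs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    selected = weights > threshold
    if not selected.any():
        raise ValueError("no sub-model passes the selection threshold")
    return sub_outputs[selected].mean(axis=0), selected

# Hypothetical outputs of four sub-models on three samples
outputs = [[0.50, 0.60, 0.70],
           [0.54, 0.62, 0.74],
           [0.10, 0.90, 0.20],   # poor sub-model, given a small weight below
           [0.52, 0.58, 0.72]]
weights = [0.40, 0.30, 0.02, 0.28]

prediction, selected = ensemble_predict(outputs, weights)
```

Here the third sub-model falls below the threshold of 0.05 and is excluded, so the ensemble output is the plain mean of the remaining three.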
The newly generated virtual training samples were combined with the original real samples to build the pulp density (PD) soft-sensing model. The model parameters were set as follows: the number of latent variables (LV) ranges from 1 to 5; the population size is 20; the sub-model selection threshold (λ_GA) is 0.05; the number of hidden nodes in the BPNN algorithm is twice the number of input variables plus one; and the default number of training steps is 100. The root mean square relative error (RMSRE) is used to evaluate the prediction performance of the PD soft-sensing model. The model was run 20 times, and the statistics of the prediction performance obtained with different numbers of virtual samples are shown in Table 2.
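The text does not spell out the RMSRE formula. The sketch below assumes the common definition, the square root of the mean squared relative error, and also summarizes the errors of repeated runs, including the spread between the largest and smallest error. All numbers are invented for illustration and are not taken from Table 2.

```python
import numpy as np

def rmsre(y_true, y_pred):
    """Root mean square relative error; assumes no true value is zero."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    relative = (y_true - y_pred) / y_true
    return float(np.sqrt(np.mean(relative ** 2)))

def run_statistics(errors):
    """Summarize the errors of repeated runs, including the fluctuation range."""
    errors = np.asarray(errors, dtype=float)
    return {
        "min": float(errors.min()),
        "max": float(errors.max()),
        "mean": float(errors.mean()),
        "range": float(errors.max() - errors.min()),
    }

# Hypothetical pulp density values, not taken from Table 2
y_true = [0.50, 0.80]
y_pred = [0.55, 0.72]
error = rmsre(y_true, y_pred)

# Hypothetical RMSRE values from three repeated runs
stats = run_statistics([0.20, 0.30, 0.25])
```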
Table 2
Meanwhile, Fig. 6 compares the error statistics of the virtual samples for different numbers of latent variables. Fig. 7 compares the rate of change of the data variance of the input vectors and output vectors for different numbers of latent variables.
Table 2 and Figs. 6 and 7 show the following. The parameters chosen for virtual sample generation have a considerable influence on prediction performance. The prediction accuracy of the method of the embodiment is no worse than that of modeling with real samples only, and its fluctuation range of prediction performance (the difference between the maximum and minimum prediction errors over the 20 runs) is better than that of the non-VSG method. After virtual samples are added, the rate of change of the input data variance increases with the number of latent variables, whereas that of the output data increases when the number of latent variables is small.
Thus, the embodiment of the present invention generates, relatively accurately, virtual samples that can be used for high-dimensional data prediction.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention; to those skilled in the art, the present invention may have various modifications and changes. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present invention shall fall within the scope of protection of the present invention.

Claims (7)

1. A virtual sample generation method for generating virtual samples based on a plurality of real samples, the method comprising:
S100: extracting latent features of the input vectors of the real samples, and obtaining a latent feature extraction model;
S200: training, according to the latent features and the corresponding output vectors, a prediction model whose prediction performance satisfies a predetermined condition, the prediction model being used to compute the output vector corresponding to the latent features of an input vector;
S300: interpolating between every interpolable pair of real samples to generate alternative input vectors of virtual samples, an interpolable pair of real samples being two real samples for which a predetermined number of elements are identical in the corresponding input vectors or in vectors associated with the input vectors;
S400: extracting the latent features of the alternative input vectors according to the latent feature extraction model;
S500: computing the corresponding alternative output vectors according to the prediction model and the latent features of the alternative input vectors, and retaining the alternative output vectors that satisfy a virtual sample screening condition together with the corresponding alternative input vectors, so as to obtain the virtual sample set corresponding to the plurality of real samples.
2. The virtual sample generation method according to claim 1, characterized in that step S100 comprises:
extracting the latent features of the input vectors of the real samples based on a partial least squares algorithm, with the objective of maximizing the covariance between the input vectors and the output vectors.
3. The virtual sample generation method according to claim 2, characterized in that step S200 comprises:
S210: taking the latent features of the input vectors of the real samples and the corresponding output vectors as a training data set;
S220: producing a plurality of training sub-samples from the training data set by a Bootstrap algorithm;
S230: building a plurality of candidate sub-models from the plurality of training sub-samples based on a BPNN algorithm;
S240: selecting, according to the training data set, all candidate sub-models whose model selection weight parameter is greater than a model selection threshold to form an ensemble model, wherein the model selection weight parameters are obtained by optimizing randomly generated initial weight parameters with a genetic algorithm, with the objective of minimizing the prediction error;
S250: computing, based on the training data set, the mean of the output vectors of all sub-models in the ensemble model as the predicted output vector of the ensemble model, and computing the prediction performance of the ensemble model from the predicted output vector of the ensemble model and the output vectors of the training samples;
S260: when the prediction performance of the ensemble model satisfies the predetermined condition, taking the ensemble model as the prediction model.
4. The virtual sample generation method according to claim 1, characterized in that step S300 comprises generating the alternative input vectors of the virtual samples based on the following formula:
$$x_{l'}^{VS} = x_{low}^{VS} + \left(x_{high}^{VS} - x_{low}^{VS}\right)\frac{l'}{N_{VSG}}, \qquad 1 \le l' < N_{VSG}$$
where $x_{l'}^{VS}$ is the alternative input vector generated by the $l'$-th interpolation, $x_{low}^{VS}$ is the input vector of the first real sample of the interpolable pair, $x_{high}^{VS}$ is the input vector of the second real sample of the interpolable pair, and $N_{VSG}$ is the predetermined number of interpolation segments.
5. The virtual sample generation method according to claim 1, characterized in that step S500 comprises:
S510: computing the alternative output vector corresponding to the current alternative input vector;
S520: when the alternative output vector lies between a predetermined output vector upper limit and a predetermined output vector lower limit, retaining this alternative output vector and the corresponding alternative input vector as a virtual sample;
S530: when the alternative output vector does not lie between the predetermined output vector upper limit and output vector lower limit, judging whether the number of computations for the current alternative input vector exceeds a predetermined threshold; if so, performing S540, and otherwise performing S510;
S540: discarding the current alternative input vector, taking the next alternative input vector as the current alternative input vector, and performing S510;
S550: after all alternative input vectors have been traversed, taking all retained virtual samples as the virtual sample set.
6. The virtual sample generation method according to claim 1, characterized in that the virtual samples are used together with the real samples to train a high-dimensional vector prediction model, the high-dimensional vector prediction model being used to predict the corresponding output vector from a high-dimensional input vector.
7. The virtual sample generation method according to any one of claims 1-6, characterized in that the input vector is the sample vibration signal spectrum and the sample acoustic signal spectrum of the mill shell;
the output vector is a mill load parameter;
the virtual samples are used for training a mill load parameter soft-sensing model.
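The screening procedure of steps S510 to S550 can be sketched as the loop below. This is a minimal sketch under stated assumptions, not the claimed implementation: the output is treated as a scalar (such as the pulp density), the limits are treated as inclusive, the prediction function is assumed to return a possibly different value on each call (otherwise recomputation would be pointless), and the retry limit is interpreted as at most max_tries evaluations per candidate. The names screen_virtual_samples and noisy_model are hypothetical.

```python
import random

def screen_virtual_samples(candidates, predict, y_low, y_high, max_tries=10):
    """Keep (input, output) pairs whose predicted output lies within [y_low, y_high].

    predict is assumed to be non-deterministic (for example, a model that is
    re-trained or re-sampled on each call), so a rejected candidate is retried
    until max_tries evaluations have been made, then discarded.
    """
    kept = []
    for x in candidates:                      # S550: traverse all candidates
        for _ in range(max_tries):            # S530: bounded number of retries
            y = predict(x)                    # S510: compute alternative output
            if y_low <= y <= y_high:          # S520: accept if within limits
                kept.append((x, y))
                break
        # S540: leaving the inner loop without a break discards x
    return kept

rng = random.Random(0)

def noisy_model(x):
    """Hypothetical stand-in for a prediction model whose output varies per call."""
    return x + rng.uniform(-0.2, 0.2)

virtual_samples = screen_virtual_samples([0.1, 0.5, 0.9], noisy_model, 0.2, 0.8)
```

Bounding the number of retries guarantees that the loop terminates even when a candidate can never produce an output inside the limits.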
CN201510496474.4A 2015-08-13 2015-08-13 Virtual sample generation method Pending CN105046320A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510496474.4A CN105046320A (en) 2015-08-13 2015-08-13 Virtual sample generation method

Publications (1)

Publication Number Publication Date
CN105046320A true CN105046320A (en) 2015-11-11

Family

ID=54452849

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510496474.4A Pending CN105046320A (en) 2015-08-13 2015-08-13 Virtual sample generation method

Country Status (1)

Country Link
CN (1) CN105046320A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20151111