CN110348624A - Sandstorm intensity classification prediction method based on a Stacking ensemble strategy - Google Patents
- Publication number
- CN110348624A (application number CN201910598794.9A)
- Authority
- CN
- China
- Prior art keywords
- sample
- data
- classification
- sandstorm
- classifier
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N3/044 - Recurrent networks, e.g. Hopfield networks
- G06N3/045 - Combinations of networks
- G06N3/08 - Learning methods
- G06Q10/04 - Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
- G06Q50/26 - Government or public services
Abstract
A sandstorm intensity classification prediction method based on a Stacking ensemble strategy uses a recurrent neural network R and a convolutional neural network C as first-level classifiers: the original weather sample data are input into R and C respectively to obtain the corresponding first-level learned features. Following the Stacking ensemble strategy, a meta-classifier Q is introduced as the second-level classifier, and the first-level learned features are combined as its input; the output of the second-level classifier is taken as the finally predicted sandstorm intensity class. The invention fuses the time-series processing ability of the RNN with the high-dimensional feature-extraction ability of the CNN, giving a wider prediction perspective and better generalization ability. Choosing PReLU as the default activation function improves model flexibility and generalization; replacing fully connected layers with 1×1 convolution kernels integrates more features and provides better generalization performance; and applying L2 regularization together with Batch Normalization or Dropout improves the generalization ability of the classifiers at each level as well as the prediction accuracy and precision of the overall classifier.
Description
Technical field
The invention belongs to the field of computer technology, and in particular relates to a sandstorm intensity classification prediction method based on a Stacking ensemble strategy.
Background art
Sandstorms are a natural disaster that occurs frequently in arid and semi-arid regions; sandstorm phenomena existed on Earth as early as 70 million years ago. In modern times, owing to environmental causes such as soil erosion, land desertification and vegetation degradation, the number of sandstorms occurring in northern China, and especially in the northwest, has risen markedly, and the influence of sandstorms on people's production and daily life keeps growing.
Traditional weather forecasting makes qualitative or quantitative predictions about the weather conditions of a certain region or place over a coming period, based on meteorological observation data and on the principles and methods of synoptic meteorology, dynamic meteorology and statistics. Over the past decades, weather forecasting techniques and mechanisms have developed tremendously, but recently conventional methods have long failed to make a qualitative leap. As the informatization of weather services becomes more and more complete, improving the forecast accuracy for common and hazardous weather has gradually become a hot research direction in related fields. Because the causes of sandstorms are complex and the volume of meteorological data is enormous, an ordinary neural network either struggles to fit such data or struggles to generalize from it.
Summary of the invention
To overcome the above shortcomings of the prior art, the purpose of the present invention is to provide a sandstorm intensity classification prediction method based on a Stacking ensemble strategy, which models and predicts the sandstorm meteorological grade with an ensemble of deep neural networks. Specifically, a recurrent neural network R serves as one first-level classifier and a convolutional neural network C as another; the prediction performance of these first-level classifiers, i.e. their feature-extraction ability, is improved as far as possible; a second-level meta-classifier model is provided that can make good use of the temporal features extracted by first-level classifier R and the high-dimensional features extracted by first-level classifier C; and an effective ensemble method is provided that lets the extracted features fuse fully.
To achieve the above goals, the technical solution adopted by the present invention is as follows:
A sandstorm intensity classification prediction method based on a Stacking ensemble strategy comprises:
using a recurrent neural network R and a convolutional neural network C as first-level classifiers, inputting the original weather sample data into the recurrent neural network R and the convolutional neural network C respectively, and obtaining the corresponding first-level learned features;
following the Stacking ensemble strategy, introducing a meta-classifier Q as the second-level classifier, and combining the first-level learned features as the input of the second-level classifier;
taking the output of the second-level classifier as the finally predicted sandstorm intensity class.
The original weather sample data is obtained as follows:
the "China surface climate daily value data set" and the "China strong sandstorm sequence and its supporting data set" are merged by date into one whole data set;
the whole data set undergoes data preprocessing such as data cleaning and attribute screening;
the preprocessed data are arranged in time order, with attributes unfolded from left to right and time running from top to bottom, and each data item is given a sandstorm intensity grade label, finally yielding the original weather sample data.
The present invention uses the recurrent neural network R to extract the temporal features of the original weather sample data and the convolutional neural network C to extract its high-dimensional features, and fuses the advantages of the two deep neural networks with a meta-classifier to obtain good, well-generalizing prediction and classification performance.
Specifically, the m original weather samples can be randomly cut into k sample sets, each sample set S_i (i ≤ k) is enumerated in turn, and the remaining sample sets are used as the training set to train the two first-level classifiers separately; the trained first-level base models, denoted C_i and R_i, then perform sandstorm intensity classification prediction on their corresponding sample set S_i. The prediction of each base model for the i-th training sample set becomes one feature value of the i-th sample in a new sample set, all feature values are combined into new feature samples, and these feature samples finally serve as the training set of the second-level classifier. For the prediction process, all first-level classifiers first predict to form a feature sample set, which is finally predicted on once more, thereby obtaining a better prediction effect.
The feedforward neural networks of the first-level classifiers and the second-level classifier propagate information by the formula:

a^(l) = f_l(W^(l) · a^(l−1) + b^(l))

where a^(l) denotes the output of the layer-l neurons, f_l the activation function of the layer-l neurons, W^(l) the weight matrix from layer l−1 to layer l, and b^(l) the bias from layer l−1 to layer l.
The classification layers of the first-level classifiers and the second-level classifier use Softmax as the output function:

softmax(z_j) = e^(z_j) / Σ_{k=1..K} e^(z_k)

where j = 1, …, K, K is the number of classification categories, and z is the vector generated by the layer preceding the classification layer, i.e. the vector data fed into the Softmax function.
The probability that a sample vector x belongs to the j-th category is:

P(y = j | x) = e^(x^T w_j) / Σ_{k=1..K} e^(x^T w_k)
where x^T denotes the transpose of x, w denotes the weights, and k is the summation index running from 1 to K. The recurrent-unit activation function of the recurrent neural network R is tanh (the hyperbolic tangent), the classification layer of every classifier uses the Softmax activation function, and the remaining parts of every classifier (the fully connected layer of the second-level classifier; the convolutional layers, pooling layers and fully connected layer of first-level classifier C; and the fully connected layer of first-level classifier R) default to the parametric rectified linear unit (PReLU) activation function:

PReLU(x) = x, if x > 0; α·x, if x ≤ 0

where x is the input data and α is an adjustable coefficient obtained by neural network learning. If learning yields α = 0, PReLU degenerates into the rectified linear unit (ReLU); if α is a small fixed value, PReLU degenerates into the leaky rectified linear unit (Leaky ReLU, LReLU).
The first-level classifiers and the second-level classifier use cross entropy as the cost function for the overall training of the model:

H(P, Q) = −E_{x~P}[log Q(x)] = −Σ_x P(x) log Q(x)

where P and Q are two given probability distributions, namely the distributions of the predicted labels and the true labels; since sandstorm labels are discretely distributed, −E_{x~P} is equivalent to −Σ_x P(x); P(x) describes the true distribution of a sample, and Q(x) the predicted distribution.
In the neural networks of the first-level classifiers and the second-level meta-classifier the samples are independently and identically distributed, and the cross entropy follows the maximum-likelihood principle, i.e.

θ_ML = argmax_θ Σ_{i=1..n} log p(y^(i) | x^(i); θ)

where ŷ^(i) is the output for the i-th sample input data x^(i), i.e. the predicted label vector; n is the number of samples in each training batch, a subset of the m samples, each batch training on a part of them; y^(i) is the sandstorm label vector of the i-th sample; x^(i) is the input data of the i-th sample; θ denotes the distribution parameter of maximum-likelihood estimation, i.e. the parameter value estimated from the sample input data according to the sample label distribution; p(y^(i) | x^(i); θ) denotes the maximum likelihood of a single sample, which is accumulated into the overall maximum likelihood; and σ denotes the standard deviation of the sample label distribution to be estimated.
Accuracy, precision, recall and the F1 score are used together as the comprehensive performance metrics of the first-level and second-level classifier models, where the F1 score is the harmonic mean of precision and recall.
The recurrent neural network R is a multilayer deep RNN that uses gated recurrent units (GRU) to solve the long-term dependence problem of traditional RNNs; its fully connected layer is likewise replaced by a 1×1 convolutional layer for model stabilization and feature integration; the activation function inside the GRU units is tanh, the other layers of the model except the classification layer default to PReLU, and batch normalization (BN) together with the L2 regularization method is used to reduce overfitting and increase generalization.
The convolutional neural network C is a multilayer deep CNN that obtains local feature information through convolution kernels and performs down-sampling through pooling layers; down-sampling reduces the feature dimensionality, compresses the amount of data and parameters, reduces overfitting, and improves the fault tolerance of the model. Its fully connected layer is replaced by a 1×1 convolutional layer for model stabilization and feature integration, and batch normalization (BN) and the L2 regularization method are used to reduce overfitting and increase generalization.
The meta-classifier Q is a multilayer fully connected neural network that uses Dropout (DP) and the L2 regularization method to reduce overfitting; its fully connected layers are replaced by 1×1 convolutional layers for model stabilization and feature integration, i.e. the meta-classifier Q is in essence a convolutional neural network stacked from multiple convolutional layers whose kernel size is 1×1.
Replacing fully connected layers with 1×1 convolutional layers realizes cross-channel interaction and information integration and allows reducing or raising the number of convolution-kernel channels; the storage form of each sample is identical to a grayscale picture, i.e. each sample has one feature map.
The batch normalization (BN) method reduces overfitting by making the activation of each neuron follow a Gaussian distribution, i.e. a neuron is usually moderately active, sometimes somewhat active and rarely very active. The BN algorithm is:

μ_B = (1/m) Σ_{i=1..m} x_i
σ_B² = (1/m) Σ_{i=1..m} (x_i − μ_B)²
x̂_i = (x_i − μ_B) / sqrt(σ_B² + ε)
y_i = γ · x̂_i + β

where m is the number of samples in one batch; x_i represents an input sample; μ_B is the mean of the samples in this batch; σ_B² is the variance of the samples in this batch; x̂_i is the normalized sample data; γ is the scale factor; β is the shift factor; and y_i is the data finally obtained by the batch normalization (BN) operation.
The BN procedure thus divides into four steps:
(1) compute the mean of each training batch;
(2) compute the variance of each training batch;
(3) normalize the training data of the batch with the computed mean and variance, obtaining a standard normal (0-1) distribution;
(4) scale and shift: adjust the magnitude of x̂_i by multiplying by the scale factor γ, then add the shift factor β to obtain y_i. Because the normalized x̂_i is essentially confined to a standard normal distribution, the expressive power of the network would decline; to solve this problem, two new parameters γ and β are introduced, which the network learns by itself during training.
The L2 regularization method adds the squared sum of the weight parameters to the original loss function; the loss function after L2 regularization is expressed as:

L(w) = E_in(w) + λ · wᵀw

where w are the classifier network model parameters, E_in(w) is the training sample error without the regularization term, and λ is the regularization parameter.
From the above formula, the gradient of L(w) is expressed as:

∇L(w) = ∇E_in(w) + 2λw

The Dropout (DP) method lets the activation value of a neuron stop working with a certain probability p during the forward propagation of the neural network, so that in each training batch the model does not depend too heavily on particular local features of the batch data.
Compared with the prior art, the beneficial effects of the present invention are:
(1) Traditional meteorological forecasting methods use synoptic meteorology, meteorological dynamics, statistics and similar methods, and predict common weather such as precipitation and temperature rather well. But sandstorms are a special weather phenomenon that requires meteorological factors from many aspects to be considered, and predicting sandstorms with traditional weather forecasting methods consumes large computing and human resources. Since deep neural networks have advantages in feature extraction and time-series modeling, prediction by deep learning can use data and computing resources more flexibly and efficiently, and because its statistical prediction perspective is broader, it can also serve as an effective complement to traditional meteorological prediction.
(2) Compared with techniques using a single deep neural network, the present invention adopts the Stacking ensemble technique with an RNN and a CNN as its first-level classifiers, which fuses the time-series processing ability of the RNN with the high-dimensional feature-extraction ability of the CNN very well and gives a wider prediction perspective and better generalization ability.
(3) Using PReLU (the parametric rectified linear unit) as the default activation function, rather than ReLU, lets the network learn to either degenerate into ReLU automatically or retain a parameter α with relatively better classification effect, which improves the flexibility and generalization ability of the model.
(4) Using the technique of replacing fully connected layers with 1×1 convolution kernels deepens the network without increasing the receptive field and introduces more nonlinear neurons, so more features can be integrated, better generalization performance is provided, and the comprehensive prediction performance is improved.
(5) Using Batch Normalization and L2 regularization in the first-level classifiers and Dropout and L2 regularization in the second-level meta-classifier improves the generalization ability of the classifiers at each level and improves the prediction accuracy and precision of the overall classifier.
Brief description of the drawings
Fig. 1 is the flow chart of the Stacking-ensemble neural network of the present invention.
Fig. 2 is a schematic diagram of the grid-type time-series meteorological data.
Fig. 3 is a schematic diagram of a feedforward neural network.
Fig. 4 is a schematic diagram of the ReLU, PReLU and tanh activation functions.
Fig. 5 is the feature-extraction flow chart of first-level classifier R.
Fig. 6 is an unrolled view of an RNN recurrent unit.
Fig. 7 is a diagram of the GRU gating mechanism.
Fig. 8 is a diagram of the convolution and pooling operations.
Fig. 9 is a diagram of a 1×1 convolution kernel replacing a fully connected operation.
Fig. 10 is a comparison schematic of Dropout.
Specific embodiment
The present invention is described in detail below with reference to the accompanying drawings and embodiments.
The recurrent neural network (RNN) is one kind of deep learning model, commonly used for processing sequence data. Because meteorological data has spatiotemporal characteristics and periodicity, the present invention uses a recurrent neural network as one of its first-level classifiers, and uses gated recurrent units (GRU) to solve the long-term dependence problem of traditional RNNs, analyzing and predicting on the collected sandstorm meteorological sequence data.
Convolutional neural networks (CNN) generally perform better at high-dimensional feature extraction. A convolutional neural network is a neural network designed specifically to handle grid-like data, and thanks to its strong feature-extraction ability it performs excellently in many fields, such as image processing. Meteorological data as a whole is a time series, and each record contains multiple meteorological attributes, making it typical high-dimensional data. Facing meteorological data, the future sandstorm intensity grade can be predicted not only from the temporal context but also by extracting useful information from the high-dimensional meteorological factors. Based on the above advantages, the present invention uses a CNN as its second first-level classifier.
That is, the sandstorm intensity classification prediction method based on a Stacking ensemble neural network of the present invention uses the recurrent neural network to extract the temporal features of the original weather data and the convolutional neural network to extract its high-dimensional features, and, by the Stacking ensemble strategy, fuses the advantages of the two deep neural networks with a meta-classifier to obtain good, well-generalizing prediction and classification performance.
Performance metric requirement: for each classifier, the comprehensive performance composed of classification accuracy, recall, precision and F1 score should be as high as possible.
Specifically, as shown in Fig. 1, on the basis of the convolutional neural network C and the recurrent neural network R as first-level sandstorm classifiers, the present invention applies the Stacking ensemble strategy and introduces a second-level meta-classifier Q to obtain more generalizable and accurate prediction results.
The main function of the above first-level classifiers C and R is to take the original weather sample data, input it into their own networks respectively, obtain the first-level learned features, and combine these first-level features into the input of the second-level classifier Q; R focuses on the temporal features of the original weather sample data, and C on its high-dimensional features.
After data preprocessing, the raw sample data is grid-type meteorological data arranged by time order, with time running from top to bottom and attributes arranged from left to right, as shown in Fig. 2, where W represents the time span of the time series in days and L represents the number of attributes of the meteorological data; the figure therefore shows a time series of W days with L meteorological attributes.
The raw sample data is obtained specifically as follows:
the "China surface climate daily value data set" and the "China strong sandstorm sequence and its supporting data set" are merged by date into one whole data set;
the whole data set undergoes data preprocessing such as data cleaning and attribute screening, where attributes refer to meteorological attributes collected by the meteorological department, such as wind speed and sunshine duration;
the preprocessed data are arranged in time order, with attributes unfolded from left to right and time running from top to bottom, and each data item is given a sandstorm intensity grade label, finally yielding the original weather sample data.
Here, time order refers to the chronological order in which the meteorological attribute data were recorded under a unified metric. For example, if in some sample the data record starts on June 1 and ends on June 15, then the time order of this sample is the 15-day sequence from June 1 to June 15. The sandstorm intensity grade labels fall into five classes by visibility according to the national standard, plus the class in which no sandstorm occurs, and are defined as {0, 1, 2, 3, 4, 5}, where a smaller number means a higher grade: 0 is the most severe sandstorm class and 5 means no sandstorm occurred. For example, if in some sample the data record starts on June 1 and ends on June 15, its grade label is the sandstorm intensity grade of June 16; in this way the data of the preceding 15 days predict the sandstorm intensity grade of the 16th day (the label could also be set to the grade of a later date).
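To make the windowing concrete, the following Python sketch builds such samples. It is a minimal sketch under stated assumptions: the merged daily records are already in a NumPy array `data` of shape (days, L), `grade` holds each day's sandstorm grade, and the helper name `make_samples` and the 15-day window are illustrative choices, not part of the patent.

```python
import numpy as np

def make_samples(data: np.ndarray, grade: np.ndarray, W: int = 15):
    """Slide a W-day window over the series: each sample is the W x L grid of
    attributes (time top-down, attributes left-right) and its label is the
    sandstorm grade of the day right after the window."""
    X, y = [], []
    for t in range(len(data) - W):
        X.append(data[t:t + W])   # W days x L attributes
        y.append(grade[t + W])    # grade of the day following the window
    return np.stack(X), np.array(y)
```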
Specifically, the m original weather samples can be randomly cut into k sample sets, and each sample set S_i (i ≤ k) is enumerated in turn: the remaining sample sets serve as the training set on which the two first-level classifiers are trained separately, giving k base models {R} and {C} for the trained first-level classifiers R and C respectively, where each base model may be denoted R_i and C_i. Each base model thus has one sample set S_i that was not among its training samples and serves as its prediction samples; it performs sandstorm intensity classification prediction on this corresponding sample set S_i. The prediction of each base model for the i-th training sample set becomes one feature value of the i-th sample in the new sample set, and all feature values are combined into new feature samples. The combination keeps base-model feature values of the same type in the same row (base models of the same type means that all R_i are of one type and all C_i of the other), and unfolds the feature values predicted for the same sample set by column. These feature samples finally serve as the training set of the second-level meta-classifier.
For the prediction process, all first-level base models C_i and R_i first predict on the test set, which gives k prediction result sets; the prediction result sets are voted on (taking the majority class) to form the feature sample prediction set. The specific combination places the features voted by base models of the same type by row and unfolds the feature values predicted for the same samples by column, giving a feature sample prediction set of the same form as the above feature sample training set. Finally the second-level meta-classifier predicts on it, obtaining a better prediction and classification effect.
The feedforward neural networks of the first-level classifiers and the second-level classifier of the present invention are shown in Fig. 3; this structure has the following characteristics:
(1) the neurons of each layer are fully interconnected with the neurons of the next layer;
(2) there are no connections between neurons of the same layer;
(3) there are no cross-layer connections between neurons.
The units in the hidden layer and the output layer are neurons, and "feedforward" means that the network topology contains no rings or circuits.
The feedforward neural networks of the first-level classifiers and the second-level classifier propagate information by the formulas:

z^(l) = W^(l) · a^(l−1) + b^(l)
a^(l) = f_l(z^(l))

Merging the two formulas:

a^(l) = f_l(W^(l) · a^(l−1) + b^(l))

where a^(l) denotes the output of the layer-l neurons, f_l the activation function of the layer-l neurons, W^(l) the weight matrix from layer l−1 to layer l, and b^(l) the bias from layer l−1 to layer l.
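As an illustration of this propagation rule, here is a minimal NumPy sketch; the function name `forward` and the list-based parameter layout are assumptions made for illustration only.

```python
import numpy as np

def forward(a, weights, biases, activations):
    """Propagate a(0) through the layers: weights[l] and biases[l] map layer l-1
    to layer l, and activations[l] is the activation function f_l of layer l."""
    for W_l, b_l, f_l in zip(weights, biases, activations):
        z = W_l @ a + b_l      # z(l) = W(l) . a(l-1) + b(l)
        a = f_l(z)             # a(l) = f_l(z(l))
    return a
```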
The fully connected layer of the second-level meta-classifier, the convolutional layers, pooling layers and fully connected layer of first-level classifier C, and the fully connected layer of R all use PReLU (the parametric rectified linear unit) as their activation function; the recurrent units in R use tanh (the hyperbolic tangent); and the classification layer of every classifier uses the Softmax activation function. PReLU is defined as:

PReLU(x) = x, if x > 0; α·x, if x ≤ 0

where x is the input data and α is an adjustable coefficient obtained by neural network learning. If learning yields α = 0, PReLU degenerates into the rectified linear unit (ReLU); if α is a small fixed value (e.g. α = 0.01), PReLU degenerates into the leaky rectified linear unit (Leaky ReLU, LReLU). Compared with ReLU, the coefficient α of the negative part of PReLU is determined from the data rather than fixed at 0, which gives the model a higher fitting capability; compared with LReLU, PReLU obtains α more flexibly through training, while adding only a minimal number of parameters, which also means only a negligible extra amount of computation and risk of overfitting.
The tanh activation function is defined as:

tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))

where x is the input data.
PReLU is chosen because, compared with tanh, it converges faster under gradient descent and costs less to compute; compared with ReLU, PReLU can, through network learning, automatically either degenerate into ReLU or retain a parameter α with relatively better classification effect, which improves the flexibility and generalization ability of the model. tanh is the activation function generally used in RNN recurrent units. PReLU and tanh are illustrated in Fig. 4.
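The two activation functions compared above can be written in a few lines; this is a minimal sketch in which α is treated as a plain scalar argument (in the invention it is a coefficient learned by the network).

```python
import numpy as np

def prelu(x, alpha):
    """PReLU(x) = x for x > 0 and alpha * x otherwise; alpha = 0 recovers ReLU,
    and a small fixed alpha (e.g. 0.01) recovers Leaky ReLU."""
    return np.where(x > 0, x, alpha * x)

def tanh(x):
    """tanh(x) = (e^x - e^-x) / (e^x + e^-x), used inside the GRU units."""
    return np.tanh(x)
```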
When the above first-level classifiers R and C perform feature extraction, the input layer takes the original weather data x as the first-layer input a^(0) and substitutes it into f(w, a, b); the multilayer hidden layers produce the output a^(l) as the output vector of the entire function; this output vector is then taken as the input vector of the Softmax activation function, and the Softmax function outputs the normalized prediction probabilities.
That is, the classification layers of the first-level classifiers and the second-level classifier use Softmax as the output function; unlike the cost function, the classification layer yields the prediction result. The Softmax function is:

softmax(z_j) = e^(z_j) / Σ_{k=1..K} e^(z_k)

where j = 1, …, K, K is the number of classification categories (6 in the present invention), and z is the vector generated by the layer preceding the classification layer, i.e. the vector data fed into the Softmax function.
The Softmax function is in fact the gradient-log-normalization of a finite discrete probability distribution. In particular, in the multinomial logistic regression and linear discriminant analysis of the present invention, the input of the function is the result of K different linear functions, and the probability that a sample vector x belongs to the j-th category is:

P(y = j | x) = e^(x^T w_j) / Σ_{k=1..K} e^(x^T w_k)

This can be regarded as the composition of K linear functions with the Softmax function; x^T denotes the transpose of x, w denotes the weights, and k is the summation index running from 1 to K.
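A minimal sketch of the Softmax classification layer follows; subtracting the row maximum before exponentiating is a standard numerical-stability step assumed here, not something stated in the patent.

```python
import numpy as np

def softmax(z):
    """Normalize a vector (or batch of vectors) of K = 6 class scores into
    prediction probabilities that sum to 1."""
    z = z - z.max(axis=-1, keepdims=True)   # stability shift, leaves result unchanged
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)
```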
The first-level classifiers and the second-level classifier use cross entropy as the cost function (also called the loss function) for the overall training of the model:

H(P, Q) = −E_{x~P}[log Q(x)] = −Σ_x P(x) log Q(x)

where P and Q are two given probability distributions, namely the distributions of the predicted labels and the true labels. Since sandstorm labels are discretely distributed, −E_{x~P} is equivalent to −Σ_x P(x); P(x) describes the true distribution of a sample, e.g. [1, 0, 0, 0] indicates that sample x belongs to the first class, while Q(x) describes the predicted distribution, e.g. [0.7, 0.1, 0.1, 0.1] means the predicted probability that sample x belongs to the first class is 0.7.
In the neural networks of the first-level classifiers and the second-level meta-classifier the samples are independently and identically distributed, and the cross entropy follows the maximum-likelihood principle, i.e.

θ_ML = argmax_θ Σ_{i=1..n} log p(y^(i) | x^(i); θ)

where ŷ^(i) is the output for the i-th sample input data x^(i), i.e. the predicted label vector (labels use one-hot coding); n is the number of samples in each training batch, a subset of the m samples, each batch training on a part of them; y^(i) is the sandstorm label vector of the i-th sample (one-hot coded); x^(i) is the input data of the i-th sample; θ represents the distribution parameter in maximum-likelihood estimation, i.e. the parameter value estimated from the sample input data according to the sample label distribution; p(y^(i) | x^(i); θ) represents the maximum likelihood of a single sample (taking the logarithm for convenience of calculation does not affect the result), which is accumulated into the overall maximum likelihood; and σ represents the standard deviation of the sample label distribution to be estimated, i.e. in this formula the distribution parameter θ to be estimated on the left is the σ on the right.
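The batch cross-entropy cost can be sketched as below; averaging over the n samples of the batch and the small epsilon guard inside the logarithm are conventional assumptions, not taken verbatim from the patent.

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """y_true: (n, K) one-hot sandstorm labels; y_pred: (n, K) Softmax outputs.
    Returns the batch mean of -sum_x P(x) log Q(x)."""
    return -np.mean(np.sum(y_true * np.log(y_pred + eps), axis=1))
```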
For all the above classifiers, accuracy, precision, recall and the F1 score are used as the comprehensive performance metrics. Let:
TP (True Positive): judged a positive sample, and in fact a positive sample.
TN (True Negative): judged a negative sample, and in fact a negative sample.
FP (False Positive): judged a positive sample, but in fact a negative sample.
FN (False Negative): judged a negative sample, but in fact a positive sample.
The accuracy formula is:
Accuracy = (TP + TN) / (TP + TN + FN + FP)
The precision formula is:
Precision = TP / (TP + FP)
The recall formula is:
Recall = TP / (TP + FN)
The F1 score, also known as the balanced F score, is defined as the harmonic mean of precision and recall:
F1 = 2 · Precision · Recall / (Precision + Recall)
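These four metrics reduce to a few arithmetic lines; the sketch below treats one class against the rest, and the zero guards for empty denominators are an assumption added for robustness.

```python
def binary_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from the four confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, precision, recall, f1
```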
For the first-level classifiers R and C, their own performance metric is not a measurement of the overall model performance but of their feature-extraction ability; that is, the performance levels of R and C do not directly reflect the performance of the whole classification model.
Training and prediction by the first-level classifiers yields the weather meteorological data features based on the first-level classifiers, which become the input of the second-level meta-classifier. The specific feature-extraction flow of first-level classifier R is shown in Fig. 5; the flow for first-level classifier C is analogous. The overall feature-extraction process can be described as follows (a sketch of this procedure is given after the list):
(a) For model 1 (first-level classifier R in the present invention), divide the training set into k parts; for each part, train the model with the remaining data and then predict on this part.
(b) Repeat the previous step until every part has been predicted, obtaining one part of the training set of the second-level classifier model.
(c) Obtain k sets of test-set predictions; after averaging and rounding, obtain one part of the test set of the meta-classifier model.
(d) Repeat the above steps for model 2 (first-level classifier C), obtaining the entire training set and test set of the meta-classifier model.
(e) Train the meta-classifier Q model and predict.
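The following Python sketch implements steps (a) to (d) for generic base models; it is a minimal sketch assuming scikit-learn-style `fit`/`predict` objects standing in for the trained networks R and C, and it combines the k test-set predictions by majority vote (averaging and rounding, as in step (c), would be an equally simple variant).

```python
import numpy as np

def stacking_features(models, X, y, X_test, k=5):
    """models: e.g. [R, C]. Returns the meta-classifier training features
    (out-of-fold predictions) and test features (voted fold predictions)."""
    folds = np.array_split(np.arange(len(X)), k)
    train_meta = np.zeros((len(X), len(models)))
    test_meta = np.zeros((len(X_test), len(models)))
    for m, model in enumerate(models):
        fold_test_preds = []
        for hold in folds:                          # steps (a)-(b)
            rest = np.setdiff1d(np.arange(len(X)), hold)
            model.fit(X[rest], y[rest])
            train_meta[hold, m] = model.predict(X[hold])
            fold_test_preds.append(model.predict(X_test))
        stacked = np.stack(fold_test_preds)         # step (c): k predictions per test sample
        test_meta[:, m] = [np.bincount(col.astype(int)).argmax() for col in stacked.T]
    return train_meta, test_meta                    # step (d)
```

Step (e) then amounts to fitting the meta-classifier Q on `train_meta` against y and predicting on `test_meta`.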
A traditional RNN unrolled is shown in Fig. 6. The first-level classifier recurrent neural network R of the present invention is a multilayer deep RNN that uses gated recurrent units (GRU) to solve the long-term dependence problem of traditional RNNs. The GRU is a variant of the traditional RNN that introduces a gating mechanism, namely an update gate and a reset gate. The gating mechanism of the GRU is shown in Fig. 7, where z_t and r_t denote the update gate and the reset gate respectively. The update gate controls the degree to which the state information of the previous moment is brought into the current state; the larger the value of the update gate, the more state information of the previous moment is brought in. The reset gate controls how much information of the previous state is written to the current candidate set h̃_t; the smaller the reset gate, the less information of the previous state is written in.
The forward-propagation formulas of the GRU are:

r_t = σ(W_r · [h_{t−1}, x_t])   (1)
z_t = σ(W_z · [h_{t−1}, x_t])   (2)
h̃_t = tanh(W_h · [r_t * h_{t−1}, x_t])   (3)
h_t = z_t * h_{t−1} + (1 − z_t) * h̃_t   (4)
y_t = σ(W_o · h_t)   (5)

where [ ] indicates that two vectors are concatenated and * represents the element-wise product. The recurrent neural network R replaces its fully connected layer with a 1×1 convolutional layer for model stabilization and feature integration. The tanh in (3) is the default activation function of the GRU unit and can be replaced by other activation functions, such as the rectified linear unit (ReLU). The other layers of the model except the classification layer default to PReLU, and batch normalization (BN) and the L2 regularization method are used to reduce overfitting and increase generalization.
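One GRU step following formulas (1) to (5) can be sketched as below; the bias terms are omitted and the weight shapes are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, Wr, Wz, Wh, Wo):
    """x_t: input at time t; h_prev: previous hidden state h_{t-1}."""
    xh = np.concatenate([h_prev, x_t])                            # [h_{t-1}, x_t]
    r_t = sigmoid(Wr @ xh)                                        # (1) reset gate
    z_t = sigmoid(Wz @ xh)                                        # (2) update gate
    h_tilde = np.tanh(Wh @ np.concatenate([r_t * h_prev, x_t]))   # (3) candidate state
    h_t = z_t * h_prev + (1.0 - z_t) * h_tilde                    # (4) larger z_t keeps more h_{t-1}
    y_t = sigmoid(Wo @ h_t)                                       # (5) output
    return h_t, y_t
```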
The first-level classifier convolutional neural network C of the present invention is a multilayer deep CNN. The function of its convolutional layers is to extract features from the input data; internally a layer contains multiple convolution kernels, and each element composing a kernel corresponds to a weight coefficient and a bias vector, similar to a neuron of a feedforward neural network. Every neuron in a convolutional layer is connected to multiple neurons in a nearby region of the previous layer; the size of this region depends on the size of the convolution kernel and is called the "receptive field". Local feature information is obtained through the convolution kernels, and down-sampling is performed through the pooling layers; down-sampling reduces the feature dimensionality, compresses the amount of data and parameters, reduces overfitting, and improves the fault tolerance of the model. The fully connected layer is replaced by a 1×1 convolutional layer for model stabilization and feature integration, and batch normalization (BN) and the L2 regularization method are used to reduce overfitting and increase generalization.
Specifically, a convolution kernel regularly sweeps the input features, multiplying element-wise with the input features inside the receptive field, summing, and adding the bias. After the convolutional layer extracts features, the output feature map is passed to the pooling layer for feature selection and information filtering. The pooling layer contains a preset pooling function whose role is to replace the result of a single point in the feature map with a statistic of its neighboring region. The pooling layer selects pooling regions in the same way a convolution kernel scans the feature map, controlled by pooling size, stride and padding. The convolution and pooling operations are shown in Fig. 8.
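A minimal single-channel sketch of these two operations follows; valid padding, stride 1 and max pooling are assumptions chosen for brevity (the patent leaves the pooling statistic, stride and padding configurable).

```python
import numpy as np

def conv2d(x, kernel, bias=0.0):
    """Sweep the kernel over x, multiplying element-wise inside the receptive
    field, summing, and adding the bias."""
    kh, kw = kernel.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel) + bias
    return out

def max_pool(x, size=2):
    """Replace each size x size region with its maximum (down-sampling)."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))
```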
The meta-classifier Q is a multilayer fully connected neural network that uses Dropout (DP) and the L2 regularization method to reduce overfitting and replaces its fully connected layers with 1×1 convolutional layers for model stabilization and feature integration; that is, the meta-classifier Q is in essence a convolutional neural network stacked from multiple convolutional layers whose kernel size is 1×1.
The first-level classifiers R and C and the second-level classifier of the present invention all need fully connected layers to synthesize the features extracted earlier, after their convolution or GRU operations respectively. In the present invention, 1×1 convolution kernels replace the fully connected layers, so as to realize cross-channel interaction and information integration and to reduce or raise the number of convolution channels. In each convolutional layer the data exist in three dimensions, which can be viewed as a stack of many two-dimensional maps, each of which is called a feature map. For image data at the input layer, a grayscale picture has exactly one feature map, while a color image generally has three (RGB). Convolving a single-channel feature map with a single kernel amounts to multiplying by one parameter; multi-kernel convolution over multiple channels requires linear combinations of multiple feature maps. In the present invention the storage form of each sample is identical to a grayscale picture, i.e. each sample has one feature map.
From the standpoint of numerical operations, convolution and full connection are both dot-product operations; the difference is that convolution acts on a local region while full connection acts on the entire input. A 1×1 convolution kernel expands the region the convolution acts on to the entire input, effectively replacing the fully connected layer while providing better generalization performance and improving the comprehensive prediction performance.
As shown in Fig. 9, the first layer of the network has 5 neurons, a1 to a5, which become 3 neurons, b1 to b3, after full connection; that is, the 5 neurons of the first layer are fully connected to the 3 that follow (the figure only draws the full connection of a1 to a5 with b1). In the fully connected layer, b1 is in fact the weighted sum of the 5 preceding neurons, with corresponding weights W1 to W5. When a 1×1 convolution kernel replaces the full connection, the 5 neurons of the first layer in fact correspond to the number of channels of the input features (5), the 3 neurons of the second layer correspond to the number of new feature channels after the 1×1 convolution (3), and W1 to W5 can be regarded as the weight coefficients of the convolution kernel; from these values a 1×1 convolution kernel replacing the fully connected operation can be constructed. A sketch of this equivalence follows the list below.
The 1×1 convolution kernel is also known from Network in Network; relative to full connection, it has the following characteristics:
(1) it deepens the network without increasing the receptive field, introducing more nonlinear neurons;
(2) it realizes cross-channel interaction and information integration, so that in the present invention more features can be integrated;
(3) it raises or reduces the number of convolution-kernel channels.
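The equivalence of Fig. 9 can be checked numerically; in this minimal sketch the random feature maps and shapes are illustrative, and the einsum expresses that every spatial position receives the same weighted sums W1 to W5 across the 5 input channels.

```python
import numpy as np

rng = np.random.default_rng(0)
feature_maps = rng.normal(size=(5, 8, 8))   # 5 input channels (a1..a5) on an 8x8 map
W = rng.normal(size=(3, 5))                 # 3 output channels (b1..b3), kernel size 1x1

# Each output channel b is a linear combination of the 5 input channels: a fully
# connected layer applied independently at every pixel across the channel axis.
out = np.einsum('oc,chw->ohw', W, feature_maps)
print(out.shape)                            # (3, 8, 8)
```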
Here, batch normalization (BN) reduces overfitting by making the activation of each neuron follow a Gaussian distribution, i.e. a neuron is usually moderately active, sometimes somewhat active and rarely very active. Without BN's standardization, the covariate shift of layer inputs keeps changing, which fails the requirement of a Gaussian distribution, because subsequent layers must keep adapting to the changing distribution pattern. The essence of BN is to use optimization to alter the variance and mean location so that the new distribution better suits the true distribution of the data, guaranteeing the nonlinear expressive power of the model. The BN algorithm is as follows:

μ_B = (1/m) Σ_{i=1..m} x_i
σ_B² = (1/m) Σ_{i=1..m} (x_i − μ_B)²
x̂_i = (x_i − μ_B) / sqrt(σ_B² + ε)
y_i = γ · x̂_i + β

where m is the number of samples in one batch; x_i represents an input sample; μ_B is the mean of the samples in this batch; σ_B² is the variance of the samples in this batch; x̂_i is the normalized sample data; γ is the scale factor; β is the shift factor; and y_i is the data finally obtained by the batch normalization operation.
The BN procedure thus divides into four steps:
(1) compute the mean of each training batch;
(2) compute the variance of each training batch;
(3) normalize the training data of the batch with the computed mean and variance, obtaining a standard normal (0-1) distribution;
(4) scale and shift: adjust the magnitude of x̂_i by multiplying by the scale factor γ, then add the shift factor β to obtain y_i. Because the normalized x̂_i is essentially confined to a standard normal distribution, the expressive power of the network would decline; to solve this problem, two new parameters γ and β are introduced, which the network learns by itself during training.
The present invention applies batch normalization before the activation function of all hidden layers of the first-level classifiers (excluding the classification layer).
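The four BN steps for one training batch can be sketched as follows; the epsilon inside the square root is the standard guard against division by zero (an assumption, since the patent text omits it), and γ and β would be learned by the network during training.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """x: (m, features) batch. Returns gamma * x_hat + beta."""
    mu = x.mean(axis=0)                     # (1) batch mean
    var = x.var(axis=0)                     # (2) batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # (3) normalize to roughly N(0, 1)
    return gamma * x_hat + beta             # (4) scale and shift
```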
The L2 regularization method adds the squared sum of the weight parameters to the original loss function; the loss function after L2 regularization is expressed as:

L(w) = E_in(w) + λ · wᵀw

where w are the classifier network model parameters, E_in(w) is the training sample error without the regularization term, and λ is the regularization parameter.
From the above formula, the gradient of L(w) is expressed as:

∇L(w) = ∇E_in(w) + 2λw
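In code the penalty and its gradient are one line each; `loss_fn` and `grad_fn` below stand in for the unregularized error E_in and its gradient and are assumptions made for illustration.

```python
import numpy as np

def l2_loss(w, loss_fn, lam):
    """L(w) = E_in(w) + lambda * w^T w."""
    return loss_fn(w) + lam * np.dot(w, w)

def l2_grad(w, grad_fn, lam):
    """Gradient of L(w): the penalty contributes 2 * lambda * w."""
    return grad_fn(w) + 2.0 * lam * w
```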
The Dropout (DP) method, which solves the overfitting of the meta-classifier Q, lets the activation value of a neuron stop working with a certain probability p (the present invention takes p = 50%) during the forward propagation of the neural network, so that in each training batch the model does not depend too heavily on particular local features of the batch data; to a certain extent this reduces the interactions between neurons and makes the model generalize more strongly. The Dropout illustration is shown in Fig. 10: the left side is a neural network without the Dropout operation and the right side a neural network after the Dropout operation; it can be seen that some neurons of the network on the right are temporarily deactivated when the data of some batch pass through.
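A minimal sketch of dropout with p = 0.5 follows; the inverted 1/(1 − p) rescaling during training is a common convention assumed here (it keeps expected activations unchanged, so nothing needs rescaling at prediction time).

```python
import numpy as np

def dropout(a, p=0.5, training=True, rng=None):
    """During training, zero each activation with probability p; at test time
    all neurons stay active."""
    if not training:
        return a
    rng = rng or np.random.default_rng()
    mask = rng.random(a.shape) >= p      # keep with probability 1 - p
    return a * mask / (1.0 - p)          # inverted-dropout rescaling
```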
The invention is further explained below with an embodiment.
The collected original weather data undergo data preprocessing and are integrated into grid-type time-series data as shown in Fig. 2, then divided into a training set and a test set. The training set is fed in order into the above first-level classifiers R and C to train the models; following the feature-generation scheme of Fig. 5, the temporal features and high-dimensional features of the original weather data are extracted and used as input to train the second-level meta-classifier. In model construction, Batch Normalization, Dropout, L2 regularization, replacing fully connected operations with 1×1 convolution kernels and other methods that increase the generalization ability of the model are used, finally yielding an ensemble classifier that shares the advantages of C and R and has better generalization properties.
For classification prediction, the test set is input into R and C within the ensemble classifier, their features are extracted and input into the meta-classifier Q, and finally the predicted sandstorm intensity grade is obtained, with which the classification performance of the ensemble classifier is measured.
In actual prediction, the relevant weather attributes are collected and integrated into the corresponding sample form, input into the first-level classifiers to extract features, and the features are then input into the meta-classifier to obtain the sandstorm forecast grade of the future moment.
Claims (10)
1. a kind of classification of sandstorm intensity prediction technique based on Stacking Integrated Strategy characterized by comprising
Using Recognition with Recurrent Neural Network R and convolutional neural networks C as first-level class device, original weather sample data is inputted respectively and is followed
Ring neural network R and convolutional neural networks C obtains corresponding one level learning feature;
Using Stacking Integrated Strategy, a meta classifier Q is introduced as secondary classifier, by the one level learning feature group
It is incorporated as the input of secondary classifier;
Using the output of secondary classifier as the classification of sandstorm intensity amount finally predicted.
2. The sandstorm intensity classification prediction method based on a Stacking ensemble strategy according to claim 1, characterized in that the original weather sample data is obtained as follows:
the "China surface climate daily value data set" and the "China strong sandstorm sequence and its supporting data set" are merged by date into one whole data set;
the whole data set undergoes data preprocessing such as data cleaning and attribute screening;
the preprocessed data are arranged in time order, with attributes unfolded from left to right and time running from top to bottom, and each data item is given a sandstorm intensity grade label, finally yielding the original weather sample data.
3. The sandstorm intensity classification prediction method based on a Stacking ensemble strategy according to claim 1, characterized in that the recurrent neural network R is used to extract the temporal features of the original weather sample data, and the convolutional neural network C is used to extract the high-dimensional features of the original weather sample data.
4. The sandstorm intensity grade prediction method based on the Stacking integration strategy according to claim 1, characterized in that the m raw weather samples are randomly split into k sample subsets; each subset S_i, i ≤ k, is enumerated in turn, and the remaining subsets serve as the training set on which the two first-level classifiers are trained separately, the trained first-level base models being denoted C_i and R_i; each base model then performs sandstorm intensity grade prediction on its corresponding held-out subset S_i, and the predicted value of each base model for the i-th sample subset becomes one feature value of the i-th samples in a new sample set; all feature values are assembled into a new feature sample set, which is finally used as the training set for the second-level classifier. For the prediction process, all first-level classifiers first predict to form a feature sample set, and the feature sample set is then predicted again, thereby obtaining a better prediction result.
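The k-fold feature construction of claim 4 can be sketched with scikit-learn's KFold as below; the `make_R`/`make_C` factories and the fit/predict interface are assumptions standing in for the trained networks.

```python
import numpy as np
from sklearn.model_selection import KFold

def stacking_features(X, y, make_R, make_C, k=5):
    """Build the meta-classifier training set as in claim 4.

    X, y: the m original weather samples and their grade labels.
    make_R, make_C: hypothetical factories returning fresh, untrained
    first-level classifiers with fit/predict methods.
    For each held-out fold S_i, base models trained on the remaining
    folds predict S_i; those predictions become new feature columns.
    """
    meta_X = np.zeros((len(X), 2))  # one feature column per base model
    for train_idx, hold_idx in KFold(n_splits=k, shuffle=True).split(X):
        for col, make in enumerate((make_R, make_C)):
            model = make()
            model.fit(X[train_idx], y[train_idx])
            meta_X[hold_idx, col] = model.predict(X[hold_idx])
    return meta_X, y  # train the secondary classifier on (meta_X, y)
```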
5. The sandstorm intensity grade prediction method based on the Stacking integration strategy according to claim 4, characterized in that the feedforward neural networks of the first-level classifiers and the second-level classifier propagate information by the following formula:

$$a^{(l)} = f_l\left(W^{(l)} a^{(l-1)} + b^{(l)}\right)$$

where $a^{(l)}$ denotes the output of the layer-l neurons, $f_l$ the activation function of the layer-l neurons, $W^{(l)}$ the weight matrix from layer l-1 to layer l, and $b^{(l)}$ the bias from layer l-1 to layer l;
the classification layers of the first-level classifiers and the second-level classifier use Softmax as the output function, given by:

$$\mathrm{Softmax}(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}$$

where j = 1, ..., K, K is the number of classification categories, and $z_j$ is the j-th component of the vector produced by the layer preceding the classification layer, i.e. the vector data fed into the Softmax function;
the probability that a sample vector x belongs to the j-th category is:

$$P(y = j \mid x) = \frac{e^{x^{T} w_j}}{\sum_{k=1}^{K} e^{x^{T} w_k}}$$

where $x^{T}$ denotes the transpose of x, w denotes the weights, and k is the summation index running from 1 to K; the recurrent-unit activation function of the recurrent neural network R is tanh, the classification layer of each classifier uses the Softmax activation function, and the remaining parts of each classifier default to the parametric rectified linear unit (Parametric Rectified Linear Unit, PReLU), whose formula is:

$$\mathrm{PReLU}(x) = \begin{cases} x, & x > 0 \\ \alpha x, & x \le 0 \end{cases}$$

where x is the input data and α is an adjustable parameter obtained through neural network learning; if learning yields α = 0, PReLU degenerates into the rectified linear unit (Rectified Linear Unit, ReLU); if α is a small fixed value, PReLU degenerates into the leaky rectified linear unit (Leaky ReLU, LReLU);
the first-level classifiers and the second-level classifier use cross entropy as the cost function for training the model as a whole; the cross entropy is described as:

$$H(P, Q) = -\mathbb{E}_{x \sim P}[\log Q(x)] = -\sum_x P(x) \log Q(x)$$

where P and Q are the two given probability distributions, i.e. the probability distributions of the true labels and the predicted labels; because the sandstorm labels are discretely distributed, $-\mathbb{E}_{x \sim P}$ is equivalent to $-\sum_x P(x)$, with P(x) describing the true distribution of the samples and Q(x) representing the predicted distribution;
the samples in the neural networks of the first-level classifiers and the second-level meta-classifier are independently and identically distributed, so the cross entropy follows the maximum-likelihood principle, i.e.

$$\theta^{*} = \arg\max_{\theta} \sum_{i=1}^{n} \log p\left(y^{(i)} \mid x^{(i)}; \theta\right)$$

where $\hat{y}^{(i)}$ is the output on the i-th sample input $x^{(i)}$, i.e. the predicted label vector; n is the number of samples in each training batch, each batch of n samples being a subset of the m samples; $y^{(i)}$ is the sandstorm label vector of the i-th sample; $x^{(i)}$ is the input data of the i-th sample; θ denotes the distribution parameters in the maximum-likelihood estimate, i.e. the parameter values estimated from the sample input data according to the sample label distribution; $p(y^{(i)} \mid x^{(i)}; \theta)$ denotes the likelihood of a single sample, and accumulating these terms gives the overall maximum likelihood; σ denotes the standard deviation of the sample label distribution to be estimated;
accuracy, precision, recall, and F1 score are used as comprehensive performance measures for the first-level classifiers and the second-level classifier model, where the F1 score is the harmonic mean of precision and recall.
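A NumPy sketch of the activation and loss functions named in claim 5 (Softmax, PReLU, and cross entropy) is given below; the four-grade example values are invented for illustration.

```python
import numpy as np

def softmax(z):
    """Softmax over K class scores, as used by the classification layers."""
    e = np.exp(z - z.max())          # shift for numerical stability
    return e / e.sum()

def prelu(x, alpha):
    """Parametric ReLU: identity for x > 0, slope alpha for x <= 0."""
    return np.where(x > 0, x, alpha * x)

def cross_entropy(y_true, y_pred, eps=1e-12):
    """-sum_x P(x) log Q(x) for one-hot true labels and predicted probs."""
    return -np.sum(y_true * np.log(y_pred + eps))

# Example: 4 sandstorm grades, one sample with invented scores.
scores = np.array([1.0, 2.0, 0.5, -1.0])
probs = softmax(scores)
label = np.array([0.0, 1.0, 0.0, 0.0])   # true grade is the second class
loss = cross_entropy(label, probs)
```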
6. The sandstorm intensity grade prediction method based on the Stacking integration strategy according to claim 1, characterized in that the recurrent neural network R is a multi-layer deep RNN that uses gated recurrent units (Gated Recurrent Unit, GRU) to alleviate the long-term dependency problem of traditional RNNs; its fully connected layers are likewise replaced by 1*1 convolutional layers for model stabilization and feature integration; the activation function inside the GRU units is tanh, the other layers apart from the classification layer default to PReLU, and batch normalization (Batch-Normalization, BN) together with the L2 regularization method are used to reduce overfitting and increase generalization;
the convolutional neural network C is a multi-layer deep CNN that obtains local feature information through convolution kernels and performs down-sampling through pooling layers; down-sampling reduces feature dimensionality and compresses the amount of data and parameters, which reduces overfitting while improving the fault tolerance of the model; its fully connected layers are replaced by 1*1 convolutional layers for model stabilization and feature integration, and batch normalization (Batch-Normalization, BN) and the L2 regularization method are used to reduce overfitting and increase generalization;
the meta-classifier Q is a multi-layer fully connected network in which Dropout (DP) and the L2 regularization method reduce overfitting, and 1*1 convolutional layers replace the fully connected layers for model stabilization and feature integration; that is, the meta-classifier Q is in essence a convolutional neural network stacked from multiple convolutional layers whose kernel size is 1*1.
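One possible PyTorch realization of the networks in claim 6 is sketched below; all layer widths and sequence sizes are illustrative assumptions, and the convolutional network C (a standard Conv2d/pooling stack) is omitted for brevity.

```python
import torch
import torch.nn as nn

N_ATTR, N_STEPS, N_GRADES = 8, 24, 4   # illustrative sizes, not from the claim

class RNet(nn.Module):
    """Multi-layer GRU network R: temporal features, 1*1 conv instead of FC."""
    def __init__(self):
        super().__init__()
        self.gru = nn.GRU(N_ATTR, 32, num_layers=2, batch_first=True)
        self.mix = nn.Conv1d(32, 16, kernel_size=1)  # 1*1 conv replaces FC layer
        self.bn = nn.BatchNorm1d(16)                 # batch normalization
        self.act = nn.PReLU()
    def forward(self, x):                 # x: (batch, N_STEPS, N_ATTR)
        h, _ = self.gru(x)                # (batch, N_STEPS, 32)
        h = self.act(self.bn(self.mix(h.transpose(1, 2))))
        return h.mean(dim=2)              # pooled feature vector (batch, 16)

class QNet(nn.Module):
    """Meta-classifier Q: stacked 1*1 convolutions with Dropout."""
    def __init__(self, n_feat=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_feat, 32, kernel_size=1), nn.PReLU(), nn.Dropout(0.5),
            nn.Conv1d(32, N_GRADES, kernel_size=1),
        )
    def forward(self, f):                  # f: (batch, n_feat)
        return self.net(f.unsqueeze(-1)).squeeze(-1)  # grade scores

scores = QNet()(RNet()(torch.randn(5, N_STEPS, N_ATTR)))  # (5, N_GRADES)
```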
7. The sandstorm intensity grade prediction method based on the Stacking integration strategy according to claim 6, characterized in that the 1*1 convolutional layers replacing the fully connected layers realize cross-channel interaction and information integration and perform dimensionality reduction and raising of the convolution-kernel channel number; each sample is stored in the same form as a grayscale picture, i.e. each sample has one feature map.
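The cross-channel effect of a 1*1 convolution can be seen in a few lines of PyTorch; the tensor sizes are illustrative. A 1*1 kernel mixes information across channels at each spatial position without looking at spatial neighbours, which is exactly what makes it usable for channel dimensionality reduction and raising.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 7, 7)               # one sample stored like a grey image
reduce = nn.Conv2d(64, 16, kernel_size=1)  # 64 -> 16 channels (dim reduction)
expand = nn.Conv2d(16, 64, kernel_size=1)  # 16 -> 64 channels (dim raising)
y = expand(reduce(x))                       # spatial shape preserved: (1, 64, 7, 7)
```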
8. The sandstorm intensity grade prediction method based on the Stacking integration strategy according to claim 6, characterized in that batch normalization (Batch-Normalization, BN) reduces overfitting by making the activation of each neuron follow a Gaussian distribution, i.e. the neuron is usually moderately active, sometimes somewhat active, and rarely very active; the BN algorithm is described as follows:

$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i,\qquad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x_i - \mu_B\right)^2,\qquad \hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}},\qquad y_i = \gamma\,\hat{x}_i + \beta$$

where m is the number of samples in one batch, $x_i$ represents an input sample, $\mu_B$ the mean of the samples in the batch, $\sigma_B^2$ the variance of the samples in the batch, $\hat{x}_i$ the sample data after normalization, γ the scale factor, β the shift factor, and $y_i$ the data finally obtained by the batch-normalization (BN) operation.
The BN procedure is thus broadly divided into 4 steps:
(1) compute the mean of each training batch;
(2) compute the variance of each training batch;
(3) normalize the training data of the batch with the computed mean and variance, obtaining a zero-mean, unit-variance distribution;
(4) scale and shift: $\hat{x}_i$ is multiplied by γ to adjust its magnitude and then shifted by adding β, giving $y_i$. Because the normalized $\hat{x}_i$ is essentially confined to a standard normal distribution, the expressive power of the network would decline; to solve this problem, two new parameters γ and β are introduced, which the network learns by itself during training.
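A NumPy sketch of the four BN steps of claim 8 follows; the epsilon term is a standard numerical-stability addition assumed here, and the example sizes are invented.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """The four BN steps of claim 8 for one training batch.

    x: (m, features) activations for a batch of m samples.
    gamma, beta: learned scale and shift, one per feature.
    """
    mu = x.mean(axis=0)                    # (1) batch mean
    var = x.var(axis=0)                    # (2) batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # (3) zero-mean, unit-variance
    return gamma * x_hat + beta            # (4) scale and shift

x = np.random.randn(32, 16) * 3 + 5        # batch of 32 samples, 16 features
y = batch_norm(x, gamma=np.ones(16), beta=np.zeros(16))
```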
9. The sandstorm intensity grade prediction method based on the Stacking integration strategy according to claim 6, characterized in that the L2 regularization method adds the sum of squares of the weight parameters to the original loss function; the loss function after L2 regularization is expressed as:

$$L(w) = E_{in}(w) + \lambda \sum_j w_j^2$$

where w is the classifier network model parameter vector, $E_{in}(w)$ is the training-sample error without the regularization term, and λ is the regularization parameter;
according to the above formula, the gradient of L(w) is expressed as:

$$\nabla L(w) = \nabla E_{in}(w) + 2\lambda w$$
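The regularized loss and its gradient from claim 9 reduce to two lines of NumPy; the `e_in`/`grad_e_in` values are assumed to be supplied by the underlying network.

```python
import numpy as np

def l2_loss_and_grad(w, e_in, grad_e_in, lam):
    """Loss and gradient after L2 regularization (claim 9).

    e_in / grad_e_in: training error and its gradient at w, assumed
    to come from the underlying classifier network.
    """
    loss = e_in + lam * np.sum(w ** 2)     # L(w) = E_in(w) + lambda * sum(w^2)
    grad = grad_e_in + 2 * lam * w         # grad L(w) = grad E_in(w) + 2*lambda*w
    return loss, grad
```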
10. The sandstorm intensity grade prediction method based on the Stacking integration strategy according to claim 6, characterized in that the Dropout (DP) method stops the activation value of each neuron with a certain probability p during forward propagation of the neural network, so that in each training batch the model does not rely too heavily on particular local features of the batch data.
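A NumPy sketch of the forward-pass behaviour described in claim 10 follows; the inverted-dropout scaling by 1/(1-p) is an implementation choice assumed here, not stated in the claim.

```python
import numpy as np

def dropout_forward(a, p, training=True):
    """Inverted dropout: each activation is zeroed with probability p.

    Scaling the surviving activations by 1/(1-p) keeps their expected
    value unchanged between training and inference (an assumption of
    this sketch, not part of the claim).
    """
    if not training:
        return a
    mask = (np.random.rand(*a.shape) >= p) / (1.0 - p)
    return a * mask

activations = np.random.randn(4, 8)
dropped = dropout_forward(activations, p=0.5)
```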
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910598794.9A CN110348624B (en) | 2019-07-04 | 2019-07-04 | Sand storm grade prediction method based on Stacking integration strategy |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110348624A true CN110348624A (en) | 2019-10-18 |
CN110348624B CN110348624B (en) | 2020-12-29 |
Family
ID=68178292
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910598794.9A Expired - Fee Related CN110348624B (en) | 2019-07-04 | 2019-07-04 | Sand storm grade prediction method based on Stacking integration strategy |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110348624B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160372118A1 (en) * | 2015-06-19 | 2016-12-22 | Google Inc. | Context-dependent modeling of phonemes |
CN107330362A (en) * | 2017-05-25 | 2017-11-07 | 北京大学 | A kind of video classification methods based on space-time notice |
CN108596398A (en) * | 2018-05-03 | 2018-09-28 | 哈尔滨工业大学 | Time Series Forecasting Methods and device based on condition random field Yu Stacking algorithms |
CN109031421A (en) * | 2018-06-05 | 2018-12-18 | 广州海洋地质调查局 | A kind of stack velocity spectrum pick-up method and processing terminal based on deeply study |
Non-Patent Citations (1)
Title |
---|
Huang Jie et al.: "PM2.5 hourly concentration prediction based on an RNN-CNN ensemble deep learning model", Journal of Zhejiang University (Science Edition) * |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111062410A (en) * | 2019-11-05 | 2020-04-24 | 复旦大学 | Star information bridge weather prediction method based on deep learning |
CN111062410B (en) * | 2019-11-05 | 2023-05-30 | 复旦大学 | Star information bridge weather prediction method based on deep learning |
CN111008604A (en) * | 2019-12-09 | 2020-04-14 | 上海眼控科技股份有限公司 | Prediction image acquisition method and device, computer equipment and storage medium |
CN111178304A (en) * | 2019-12-31 | 2020-05-19 | 江苏省测绘研究所 | High-resolution remote sensing image pixel level interpretation method based on full convolution neural network |
CN111291832A (en) * | 2020-03-11 | 2020-06-16 | 重庆大学 | Sensor data classification method based on Stack integrated neural network |
CN112418267A (en) * | 2020-10-16 | 2021-02-26 | 江苏金智科技股份有限公司 | Motor fault diagnosis method based on multi-scale visual and deep learning |
CN112418267B (en) * | 2020-10-16 | 2023-10-24 | 江苏金智科技股份有限公司 | Motor fault diagnosis method based on multi-scale visual view and deep learning |
CN112508060B (en) * | 2020-11-18 | 2023-08-08 | 哈尔滨工业大学(深圳) | Landslide body state judging method and system based on graph convolution neural network |
CN112508060A (en) * | 2020-11-18 | 2021-03-16 | 哈尔滨工业大学(深圳) | Landslide mass state judgment method and system based on graph convolution neural network |
CN114692817A (en) * | 2020-12-31 | 2022-07-01 | 合肥君正科技有限公司 | Method for dynamically adjusting quantized feature clip value |
CN112801233A (en) * | 2021-04-07 | 2021-05-14 | 杭州海康威视数字技术股份有限公司 | Internet of things equipment honeypot system attack classification method, device and equipment |
CN113096814A (en) * | 2021-05-28 | 2021-07-09 | 哈尔滨理工大学 | Alzheimer disease classification prediction method based on multi-classifier fusion |
CN113820079A (en) * | 2021-07-28 | 2021-12-21 | 中铁工程装备集团有限公司 | Hydraulic cylinder leakage fault diagnosis method based on cyclostationary theory and Stacking model |
CN113820079B (en) * | 2021-07-28 | 2024-05-24 | 中铁工程装备集团有限公司 | Hydraulic cylinder leakage fault diagnosis method based on cyclostationary theory and Stacking model |
CN114220024B (en) * | 2021-12-22 | 2023-07-18 | 内蒙古自治区气象信息中心(内蒙古自治区农牧业经济信息中心)(内蒙古自治区气象档案馆) | Static satellite sand storm identification method based on deep learning |
CN114220024A (en) * | 2021-12-22 | 2022-03-22 | 内蒙古自治区气象信息中心(内蒙古自治区农牧业经济信息中心)(内蒙古自治区气象档案馆) | Static satellite sandstorm identification method based on deep learning |
CN117408167A (en) * | 2023-12-15 | 2024-01-16 | 四川省能源地质调查研究所 | Debris flow disaster vulnerability prediction method based on deep neural network |
Also Published As
Publication number | Publication date |
---|---|
CN110348624B (en) | 2020-12-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110348624A (en) | A kind of classification of sandstorm intensity prediction technique based on Stacking Integrated Strategy | |
CN108491970B (en) | Atmospheric pollutant concentration prediction method based on RBF neural network | |
CN109508360B (en) | Geographical multivariate stream data space-time autocorrelation analysis method based on cellular automaton | |
He et al. | Mining transition rules of cellular automata for simulating urban expansion by using the deep learning techniques | |
CN113537600B (en) | Medium-long-term precipitation prediction modeling method for whole-process coupling machine learning | |
CN106886846A (en) | A kind of bank outlets' excess reserve Forecasting Methodology that Recognition with Recurrent Neural Network is remembered based on shot and long term | |
CN109086799A (en) | A kind of crop leaf disease recognition method based on improvement convolutional neural networks model AlexNet | |
CN103489005B (en) | A kind of Classification of High Resolution Satellite Images method based on multiple Classifiers Combination | |
CN110135267A (en) | A kind of subtle object detection method of large scene SAR image | |
CN111665575B (en) | Medium-and-long-term rainfall grading coupling forecasting method and system based on statistical power | |
CN108171209A (en) | A kind of face age estimation method that metric learning is carried out based on convolutional neural networks | |
CN110648014A (en) | Regional wind power prediction method and system based on space-time quantile regression | |
CN107423820A (en) | The knowledge mapping of binding entity stratigraphic classification represents learning method | |
CN104252625A (en) | Sample adaptive multi-feature weighted remote sensing image method | |
CN102855486A (en) | Generalized image target detection method | |
CN113344045B (en) | Method for improving SAR ship classification precision by combining HOG characteristics | |
CN113032613B (en) | Three-dimensional model retrieval method based on interactive attention convolution neural network | |
CN114611608A (en) | Sea surface height numerical value prediction deviation correction method based on deep learning model | |
CN115099497A (en) | CNN-LSTM-based real-time flood forecasting intelligent method | |
CN113536373A (en) | Desensitization meteorological data generation method | |
Uğuz et al. | A hybrid CNN-LSTM model for traffic accident frequency forecasting during the tourist season | |
Zhang et al. | Atmospheric Environment Data Generation Method Based on Stacked LSTM-GRU | |
Yang et al. | Automatically adjustable multi-scale feature extraction framework for hyperspectral image classification | |
CN116778205A (en) | Citrus disease grade identification method, equipment, storage medium and device | |
CN110348311A (en) | A kind of intersection identifying system and method based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
| CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20201229 |