CN114004154A

CN114004154A - Penetration depth prediction method based on multi-partition model integration

Info

Publication number: CN114004154A
Application number: CN202111281357.8A
Authority: CN
Inventors: 王继民; 张晨楠; 王飞; 张新华
Original assignee: Hohai University HHU
Current assignee: Hohai University HHU
Priority date: 2021-11-01
Filing date: 2021-11-01
Publication date: 2022-02-01

Abstract

The invention discloses a penetration depth prediction method based on multi-partition model integration. The method comprises: dividing the parameter space into evaluation intervals for evaluating engineering algorithms; selecting several of the more accurate engineering algorithms in each interval; performing batch sample calculation with the selected engineering algorithms to obtain engineering calculation simulation data; in each evaluation interval, establishing a dimensionless penetration depth prediction model based on a random forest and BP neural networks from the test data and the engineering calculation simulation data; and weighting and integrating the outputs of the K adjacent partition models to form a fusion model. Input features pass through the partition prediction models to generate partition prediction results, and the fusion model then generates the final dimensionless penetration depth prediction. The invention keeps the data as accurate as possible and avoids the effect of scarce experimental data on the deep learning model; modeling by partition narrows the modeling range, so that each model reflects the physical characteristics and laws of a local region, yields a smaller prediction error, and gives a more accurate prediction model.

Description

Penetration depth prediction method based on multi-partition model integration

Technical Field

The invention belongs to the technical field of information processing, and particularly relates to a penetration depth prediction method based on multi-partition model integration.

Background

The penetration mechanism of protective materials is typically described by empirical algorithms fitted to large amounts of experimental data. Penetration damage is a very complex physical process, and existing methods have difficulty reproducing the actual situation accurately, so engineering algorithms still occupy an important position in practice. An engineering algorithm is simple to use: given only a few parameters, it returns an estimated dimensionless penetration depth. Each existing penetration depth engineering algorithm is a calculation formula that its authors obtained by fitting their own test data. Limited by the experimental environment, economic conditions and the like, the test data used cover only part of the parameter space, and most of the tests are scaled tests, so each engineering algorithm is valid only within part of the parameter range and cannot cover the full parameter region of penetration depth analysis. Moreover, the applicable range of each engineering algorithm has no clear measurement standard. Therefore, how to build a penetration depth prediction model that combines the test data held by individual researchers with the existing engineering algorithms, so as to improve prediction accuracy and ease the difficulty of choosing among many engineering algorithms in engineering practice, has become an important direction of penetration research.

The penetration process is a highly nonlinear and complex process, and may exhibit different physical laws and mechanisms in different parameter ranges, such as rigid body at low speed and fluid at ultra-high speed. Therefore, it is necessary to effectively partition the parameter space of the penetration effect, show different physical mechanisms in different intervals as much as possible, and independently establish models for different intervals for prediction, rather than predict with one model in the whole parameter interval.

Disclosure of Invention

The purpose of the invention is as follows: in order to overcome the defects in the prior art and application, the invention provides a penetration depth prediction method based on multi-partition model integration, which fully exerts the advantages of test data and the advantages of different engineering algorithms in each parameter interval, establishes evaluation partitions by reducing the analysis range, establishes prediction models in partitions, realizes refined prediction, and improves the penetration depth prediction precision.

The technical scheme is as follows: the invention provides a penetration depth prediction method based on multi-partition model integration, which specifically comprises the following steps:

(1) dividing evaluation intervals for evaluating the engineering algorithms by dimensionality reduction and clustering of the test data, combined with domain expert knowledge;

(2) selecting, in each divided evaluation interval, the k engineering algorithms with the highest calculation accuracy;

(3) in each evaluation interval, performing batch sample calculation with the selected engineering algorithms to obtain engineering calculation simulation data;

(4) in each evaluation interval, based on the engineering algorithm simulation data and the test data of the interval, integrating BP neural networks by the random forest ensemble method to establish the prediction model RF_BP_r of the interval and realize penetration depth prediction for the interval;

(5) generating the penetration depth predicted value over the full parameter range by K-neighbor fusion prediction.

Further, the step (1) includes the steps of:

(11) reducing the dimension of the test data with the manifold learning algorithm LLE (locally linear embedding): the input of penetration depth analysis comprises 8 characteristic quantities, namely target landing speed, projectile mass, projectile diameter, target compressive strength, target material density, CRH, shape factor and projectile head length; the output is the dimensionless penetration depth; SI units are used throughout, and the data are organized into the following matrix format:

$$M = \begin{bmatrix} d_{11} & d_{12} & \cdots & d_{1m} & y_1 \\ d_{21} & d_{22} & \cdots & d_{2m} & y_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ d_{n1} & d_{n2} & \cdots & d_{nm} & y_n \end{bmatrix} \tag{1}$$

wherein d_ij is an input quantity, m is the number of input characteristic quantities, y_i in the last column is the model output, namely the dimensionless penetration depth, the other columns are the model inputs, and n is the number of data items; the LLE algorithm reduces the sample data from the 9-dimensional space described by the matrix M to a d-dimensional space; locally linear embedding tries to preserve, in the low-dimensional space, the linear relations between samples within a neighborhood, and is suitable for nonlinear dimensionality reduction of high-dimensional data;

(12) clustering the d-dimensional spatial data subjected to dimensionality reduction by adopting a hierarchical clustering algorithm to determine a preliminary evaluation interval range, and analyzing different clustering results by field experts to determine a reasonable clustering hierarchy and result; determining an input characteristic quantity interval in each class according to the characteristic value range of the sample points contained in each class, wherein the value range of the characteristic quantity in each class forms an evaluation interval;

(13) the domain experts reasonably expand the characteristic quantity intervals, and the value ranges of the characteristic quantities in different evaluation intervals can be overlapped to a certain extent.

Further, the step (2) is realized as follows:

for an evaluation interval r, the test data falling within r are extracted to form a test sample set S_r. Suppose g engineering algorithms a₁, a₂, …, a_g are selected for evaluation. Each of a₁, a₂, …, a_g is applied to S_r to calculate the dimensionless penetration depth, the result is compared with the measured value, and the algorithm accuracy is computed. The mean absolute percentage error (MAPE) is used as the evaluation standard for the calculation accuracy of an engineering algorithm:

$$\mathrm{MAPE} = \frac{100\%}{m}\sum_{i=1}^{m}\left|\frac{y_i - p_i}{y_i}\right| \tag{2}$$

wherein y_i is the true dimensionless penetration depth value, p_i is the calculated dimensionless penetration depth value, and m is the total number of samples in the evaluation interval;

let MAPE₁, MAPE₂, …, MAPE_g denote the mean absolute percentage errors of the engineering algorithms a₁, a₂, …, a_g in the analyzed parameter interval; the MAPE_i (i = 1, …, g) are sorted from low to high, and the algorithms corresponding to the first k errors are taken as the better algorithms for that parameter interval.
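The ranking described in step (2) can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the three "algorithms" are hypothetical one-argument stand-ins for real engineering formulas (which would take the 8 characteristic quantities), and the sample values are invented.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and equally long")
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(y_true, y_pred)) / len(y_true)


def select_top_k(algorithms, samples, y_true, k):
    """Return the names of the k algorithms with the lowest MAPE, plus all errors."""
    errors = {
        name: mape(y_true, [fn(x) for x in samples])
        for name, fn in algorithms.items()
    }
    ranked = sorted(errors, key=errors.get)  # low to high
    return ranked[:k], errors


# Hypothetical stand-ins for engineering formulas (assumption, for illustration only).
algorithms = {
    "alg_a": lambda x: 1.10 * x,  # always about 10 % high
    "alg_b": lambda x: 0.95 * x,  # always about 5 % low
    "alg_c": lambda x: 1.50 * x,  # always 50 % high
}
samples = [1.0, 2.0, 4.0]   # invented inputs
y_true = [1.0, 2.0, 4.0]    # invented measured values
best, errors = select_top_k(algorithms, samples, y_true, k=2)
```

Here `best` holds the two lowest-error names in ascending order of MAPE; in practice the sample set would be S_r and the candidates the g engineering algorithms under evaluation.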

Further, the step (3) is realized as follows:

the parameters required by the engineering algorithm are discretized within the range of the evaluation interval and combined into a large number of input vectors, which are evaluated with the engineering algorithms selected for that interval to obtain engineering calculation simulation data; assuming that the current algorithm needs m parameters and that the numbers of discretized values of the parameters are p₁, p₂, …, p_m, the number of all combinations is

$$\prod_{i=1}^{m} p_i$$

so batch calculation of one engineering algorithm in the current evaluation interval generates $\prod_{i=1}^{m} p_i$ simulation data items; engineering calculation simulation data are obtained through such batch engineering calculation.

Further, the step (4) comprises the steps of:

(41) establishing the BP neural network models of an evaluation partition following the random forest idea: each time, m features are randomly selected from the 8 characteristic quantities as the input of a BP neural network, whose output is the dimensionless penetration depth; p% of the data, S_{r_p}, is randomly drawn from S_r, and the selected m feature dimensions and the dimensionless penetration depth dimension of S_{r_p} are extracted to form a training set S_{r_p_train}, which is used to train one tree, namely one BP neural network; m takes the value 6 or 7; p is set to 70-80;

(42) constructing f trees according to (41), namely f BP neural networks; each BP neural network has log₂ m hidden layers with 2m nodes per layer, and the activation function is ReLU; the training data are normalized using the following formula:

$$S_{r\_p\_train} = \frac{S_{r\_p\_train} - S_{r\_p\_train\_mean}}{S_{r\_p\_train\_std}} \tag{3}$$

wherein S_{r_p_train_mean} is the vector of the means of each dimension of S_{r_p_train}, and S_{r_p_train_std} is the vector of the standard deviations of each dimension of S_{r_p_train};

(43) averaging the outputs of the f trees to obtain the prediction output of the corresponding partition.

Further, the step (5) includes the steps of:

(51) judging the evaluation interval according to the input parameters, selecting the prediction model of the evaluation interval as the main prediction model, and obtaining the main prediction value p₁；

(52) searching for the intervals adjacent to the interval containing the parameters, and supposing that k adjacent intervals are found; each evaluation interval is a hyper-rectangular region, and for a main evaluation interval r, any evaluation interval that shares a boundary with r or touches it at a vertex is an adjacent interval of r;

(53) predicting the penetration depth for the input with the prediction model of each adjacent interval, and averaging the results to obtain the adjacent-interval predicted value p₂;

(54) the final prediction result is p = (1 − 1/(k+1)) × p₁ + p₂/(k+1); under this formula the main prediction model carries the weight 1 − 1/(k+1) and the averaged adjacent-interval prediction carries the weight 1/(k+1), where k+1 is the number of partitions used for the current prediction, so the share of the adjacent intervals is inversely proportional to the number of partitions used; the more evaluation intervals there are, the finer the interval division and the stronger the independence of each interval, and hence the higher the weight of the main prediction model.
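As a worked instance of the formula in (54), assume k = 8 adjacent intervals and hypothetical predictions p₁ = 2.0 and p₂ = 1.55; these numbers are illustrative only and do not come from the test data:

$$p = \left(1 - \frac{1}{8+1}\right)\times 2.0 + \frac{1.55}{8+1} = \frac{16 + 1.55}{9} = 1.95$$

With these assumed values the main model contributes 8/9 of the result and the averaged neighbors contribute 1/9.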

Advantageous effects: compared with the prior art, the invention has the following benefits. Compared with a single engineering algorithm, the method achieves accurate prediction over a wider parameter region; compared with existing models built from test data alone, the method exploits the strengths of the existing engineering algorithms through the pseudo data they generate, which improves the accuracy of the deep learning model; and because prediction models are built separately for a plurality of intervals, each partition model reflects the local physical laws, which effectively improves prediction accuracy.

Drawings

FIG. 1 is a flow chart of the present invention;

FIG. 2 is a neighborhood partition retrieval map;

FIG. 3 is a graph of the MAPE of the different experimental models on the test set.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings.

As shown in FIG. 1, in the penetration depth prediction method based on multi-partition model integration disclosed by the invention, evaluation intervals for evaluating the engineering algorithms are divided by dimensionality reduction and clustering of the test data, combined with domain expert knowledge; in each evaluation interval, the k engineering algorithms with the highest calculation accuracy in that interval are selected; in each evaluation interval, batch sample calculation is performed with the selected engineering algorithms to obtain engineering calculation simulation data; in each evaluation interval, based on the engineering algorithm simulation data and the test data of the interval, BP neural networks are integrated by the random forest ensemble method to establish the RF_BP_r model of the interval and realize penetration depth prediction for the interval; finally, the prediction results of the partitions are weighted and integrated through weight-based fusion to produce the dimensionless penetration depth prediction output. Because different engineering algorithms generate the simulation data in different parameter intervals, the data are kept as accurate as possible; the large amount of engineering algorithm simulation data avoids the effect of scarce experimental data on the deep learning model; and building separate models for separate intervals yields refined interval models and a prediction model that can exceed the accuracy of the existing engineering algorithms.

Step 1: dividing the evaluation intervals. The evaluation intervals for evaluating the engineering algorithms are divided by dimensionality reduction and clustering of the test data, combined with domain expert knowledge. This comprises the following steps:

(1) Reducing the dimension of the test data with the manifold learning algorithm LLE. The input of penetration depth analysis comprises 8 characteristic quantities (target landing speed, projectile mass, projectile diameter, target compressive strength, target material density, CRH, shape factor and projectile head length), the output is the dimensionless penetration depth, and SI units are used throughout. The data are organized in the following matrix format:

$$M = \begin{bmatrix} d_{11} & d_{12} & \cdots & d_{1m} & y_1 \\ d_{21} & d_{22} & \cdots & d_{2m} & y_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ d_{n1} & d_{n2} & \cdots & d_{nm} & y_n \end{bmatrix} \tag{1}$$

wherein d_ij is an input quantity, m is the number of input characteristic quantities, y_i in the last column is the model output, namely the dimensionless penetration depth, the other columns are the model inputs, and n is the number of data items.

The LLE (locally linear embedding) algorithm reduces the sample data from the 9-dimensional space described by the matrix M to a d-dimensional space. Locally linear embedding tries to preserve, in the low-dimensional space, the linear relations between samples within a neighborhood, and is suitable for nonlinear dimensionality reduction of high-dimensional data.

(2) Clustering the dimension-reduced d-dimensional data with a hierarchical clustering algorithm to determine the preliminary evaluation interval ranges. Domain experts analyze the different clustering results to determine a reasonable clustering level and result. The interval of each input characteristic quantity in a class is determined from the range of feature values of the sample points the class contains, and the value ranges of the characteristic quantities in each class form an evaluation interval. For example, (projectile mass (0-100), landing speed (0-200), projectile shape (0.72-0.84), target density (2.7-12.0), CRH (0.5-1.3), …) describes the high-dimensional space corresponding to one evaluation interval. The clustering result thus gives a preliminary segmentation into evaluation intervals.

(3) Adjusting the evaluation intervals by domain experts. The domain experts reasonably expand the characteristic quantity intervals, and the value ranges of the characteristic quantities in different evaluation intervals may overlap to a certain extent. The union of the spaces corresponding to all the evaluation intervals must cover the high-dimensional space formed by the reasonable values of the selected characteristic quantities. For example, if the characteristic quantities of the evaluation intervals are the 3 features projectile mass, landing speed and target density, the union of the ranges of all the evaluation intervals must cover the three-dimensional space formed by projectile mass, landing speed and target density.
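How cluster labels turn into evaluation intervals, as described in (2) and (3), can be sketched as follows. This is a minimal sketch under stated assumptions: the cluster labels are taken as given (the LLE and hierarchical clustering steps are not shown), the two features and the sample values are invented, and the fixed relative `margin` is a hypothetical stand-in for the expert's manual expansion.

```python
def cluster_intervals(samples, labels, feature_names):
    """Per-cluster [min, max] range of every feature: one evaluation interval per cluster."""
    intervals = {}
    for row, label in zip(samples, labels):
        box = intervals.setdefault(
            label, {name: [v, v] for name, v in zip(feature_names, row)}
        )
        for name, v in zip(feature_names, row):
            box[name][0] = min(box[name][0], v)
            box[name][1] = max(box[name][1], v)
    return intervals


def expand(intervals, margin):
    """Widen each range by a relative margin on both sides (stand-in for expert adjustment)."""
    out = {}
    for label, box in intervals.items():
        out[label] = {}
        for name, (lo, hi) in box.items():
            pad = margin * (hi - lo)
            out[label][name] = (lo - pad, hi + pad)
    return out


# Invented samples: (projectile mass, landing speed), with assumed cluster labels.
feature_names = ["mass", "velocity"]
samples = [(10, 100), (20, 150), (80, 400), (100, 600)]
labels = [0, 0, 1, 1]

intervals = cluster_intervals(samples, labels, feature_names)
expanded = expand(intervals, margin=0.1)
```

After expansion, neighboring intervals may overlap, which the text explicitly allows; checking that their union covers the whole feasible parameter space is left to the expert review.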

Step 2: selecting the engineering algorithms. In each divided evaluation interval, the k engineering algorithms with the highest calculation accuracy are selected. Two selection methods are used:

(1) Analyzing the applicable intervals of the engineering algorithms from existing summaries of application practice; for example, the Young formula for concrete penetration depth generally performs better in the medium-speed interval.

(2) When method (1) does not yield enough information, analyzing the calculation accuracy with the test data. For an evaluation interval r, the test data falling within r are extracted to form a test sample set S_r. Suppose g engineering algorithms a₁, a₂, …, a_g are selected for evaluation. Each of a₁, a₂, …, a_g is applied to S_r to calculate the dimensionless penetration depth, the result is compared with the measured value, and the algorithm accuracy is computed. The mean absolute percentage error (MAPE) is used as the evaluation standard for the calculation accuracy of an engineering algorithm:

$$\mathrm{MAPE} = \frac{100\%}{m}\sum_{i=1}^{m}\left|\frac{y_i - p_i}{y_i}\right| \tag{2}$$

wherein y_i is the true dimensionless penetration depth value, p_i is the calculated dimensionless penetration depth value, and m is the total number of samples in the evaluation interval.

Step 3: in each evaluation interval, performing batch sample calculation with the selected engineering algorithms to obtain engineering calculation simulation data.

Batch engineering calculation is carried out in each evaluation interval to obtain engineering calculation simulation data. The parameters required by the engineering algorithm are discretized within the range of the evaluation interval and combined into a large number of input vectors, which are evaluated with the engineering algorithms of that interval. Assuming that the current algorithm needs m parameters and that the numbers of discretized values of the parameters are p₁, p₂, …, p_m, the number of all combinations is

$$\prod_{i=1}^{m} p_i$$

which is also the number of simulation data items generated. The parameter ranges are generally of two types:

1) Real-valued ranges. For a parameter with a real-valued range, the maximum value, the minimum value and the step size are determined, and all values of the parameter are then generated automatically by discretization. For example, in the medium-speed interval [340, 650) the minimum value is 340 and the maximum value is 650; with a step size of 10, 31 values are generated.

2) Enumerated values. For example, the projectile nose shape (flat-nosed, ogive-nosed, pointed, etc.) is set as an enumerated type with the values 0.72, 0.8 and 1.14.

Engineering calculation simulation data are obtained through this batch engineering calculation. Because the engineering algorithms were themselves obtained by fitting large amounts of test data, simulation data calculated with an engineering algorithm in an interval where that algorithm is accurate are also highly accurate. The engineering simulation data calculated by the engineering algorithms of an interval are mixed to form the data set for training the model.
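The batch generation in Step 3 amounts to a Cartesian product over the discretized parameter values. The sketch below is illustrative only: the velocity range and the enumerated shape values come from the text, while the `mass` range and the `toy_formula` function are hypothetical placeholders, not a real engineering algorithm.

```python
from itertools import product
from math import ceil, prod


def real_range(lo, hi, step):
    """Discretize the half-open range [lo, hi) with a fixed step."""
    return [lo + i * step for i in range(ceil((hi - lo) / step))]


def simulate(grid, algorithm):
    """Evaluate the algorithm on every combination of the discretized parameter values."""
    names = list(grid)
    rows = []
    for combo in product(*(grid[name] for name in names)):
        x = dict(zip(names, combo))
        rows.append((x, algorithm(x)))
    return rows


grid = {
    "velocity": real_range(340, 650, 10),  # 31 values, as in the text
    "shape": [0.72, 0.8, 1.14],            # enumerated values from the text
    "mass": real_range(10, 30, 5),         # hypothetical range: 10, 15, 20, 25
}


def toy_formula(x):
    """Hypothetical placeholder, NOT a real penetration formula."""
    return x["shape"] * x["mass"] * x["velocity"] ** 0.5


rows = simulate(grid, toy_formula)
expected = prod(len(values) for values in grid.values())  # 31 * 3 * 4
```

The length of `rows` equals the product of the per-parameter value counts, so the step sizes directly control how much simulation data each interval contributes.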

Step 4: partitioning the data and establishing the RF_BP_r model of each evaluation partition. The test data are divided according to the value ranges of the characteristic quantities of the evaluation intervals; the data S_r of each evaluation interval comprise the engineering calculation simulation data and the test data belonging to that interval. A prediction model is then established for each evaluation partition based on the random forest algorithm, as follows:

(1) The BP neural network models of an evaluation partition are established following the random forest idea: each time, m features are randomly selected from the 8 characteristic quantities as the input of a BP neural network, whose output is the dimensionless penetration depth; p% of the data, S_{r_p}, is randomly drawn from S_r, and the selected m feature dimensions and the dimensionless penetration depth dimension of S_{r_p} are extracted to form a training set S_{r_p_train}, which is used to train one tree (i.e., one BP neural network). The value of m is generally 6-7: because the penetration process depends on many factors, a suitable model is difficult to establish if too few characteristic quantities are selected each time. The value of p is generally 70-80: if p is too small, each network sees little data and trains poorly, the models differ greatly, and the resulting error is large; if p is too high, for example 100%, every model is trained on the same data, and the trained BP neural networks differ little and lack diversity;

(2) f trees are constructed according to (1), namely f BP neural networks. Each BP neural network has log₂ m hidden layers with 2m nodes per layer, and the activation function is ReLU. The training data are normalized using the following formula:

$$S_{r\_p\_train} = \frac{S_{r\_p\_train} - S_{r\_p\_train\_mean}}{S_{r\_p\_train\_std}} \tag{3}$$

wherein S_{r_p_train_mean} is the vector of the means of each dimension of S_{r_p_train}, and S_{r_p_train_std} is the vector of the standard deviations of each dimension of S_{r_p_train}.

(3) The outputs of the f trees are averaged to obtain the prediction output of the corresponding partition.
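The ensemble scaffolding of Step 4 (random feature subset, random p% of the rows, normalization per formula (3), averaging) can be sketched in plain Python. This is a sketch under loud assumptions: the base learner `nearest_neighbor_fn` is a hypothetical stand-in used only to keep the example self-contained, and a real implementation would plug in a BP neural network trainer there; sampling without replacement and the population standard deviation are also assumptions, since the text does not specify them.

```python
import random
from statistics import mean, pstdev


def standardize(rows):
    """Column-wise z-score as in formula (3); returns scaled rows plus means and stds."""
    cols = list(zip(*rows))
    mu = [mean(c) for c in cols]
    sd = [pstdev(c) or 1.0 for c in cols]  # assumption: guard constant columns
    scaled = [[(v - m) / s for v, m, s in zip(r, mu, sd)] for r in rows]
    return scaled, mu, sd


def nearest_neighbor_fn(X, y):
    """Hypothetical stand-in learner; replace with a BP neural network trainer."""
    def model(z):
        i = min(range(len(X)), key=lambda k: sum((a - b) ** 2 for a, b in zip(X[k], z)))
        return y[i]
    return model


def train_member(data, n_features, m, p, train_fn, rng):
    """Train one 'tree': m random features and p % of the rows, scaled per formula (3)."""
    feats = sorted(rng.sample(range(n_features), m))
    subset = rng.sample(data, max(1, round(len(data) * p / 100)))
    rows = [[x[j] for j in feats] + [y] for x, y in subset]
    scaled, mu, sd = standardize(rows)
    model = train_fn([r[:-1] for r in scaled], [r[-1] for r in scaled])

    def predict(x):
        z = [(x[j] - mu[i]) / sd[i] for i, j in enumerate(feats)]
        return model(z) * sd[-1] + mu[-1]  # undo the scaling of the output dimension
    return predict


def rf_bp_predict(members, x):
    """Average the member outputs (the averaging strategy of the partition model)."""
    return mean(member(x) for member in members)


# Invented data: 50 samples with 8 features and a dimensionless depth in [0.5, 5.0].
rng = random.Random(0)
data = [([rng.uniform(0, 1) for _ in range(8)], rng.uniform(0.5, 5.0)) for _ in range(50)]
members = [train_member(data, 8, m=6, p=75, train_fn=nearest_neighbor_fn, rng=rng)
           for _ in range(10)]
estimate = rf_bp_predict(members, data[0][0])
```

Keeping the learner behind a `train_fn` callable means the sampling and normalization logic can be checked independently of the network architecture (log₂ m hidden layers of 2m ReLU nodes in the text).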

Step 5: predicting the penetration depth over the full parameter range by K-neighbor fusion prediction. The model of each partition can give relatively good results for predictions inside its own interval. However, since penetration is a very complicated process, in the boundary region between intervals the models of the adjacent partitions are all approximate simulations of the mechanism of that region and may give similar results. Therefore, prediction relies mainly on the evaluation partition that contains the input parameters, while the prediction results of the adjacent partitions are also fused in to obtain the final penetration depth prediction.

(1) The evaluation interval containing the input parameters is determined, the prediction model of that interval is selected as the main prediction model, and the input is predicted with the main prediction model to obtain the main predicted value p₁;

(2) The intervals adjacent to the interval containing the parameters are searched for; suppose k adjacent intervals are found. As shown in FIG. 2, there are 8 adjacent intervals around the main prediction interval, and each evaluation interval is a hyper-rectangular region. For an evaluation interval r, any evaluation interval that shares a boundary (face or edge) with r or touches it at a vertex is an adjacent interval of r. To improve efficiency, the list of adjacent intervals of each evaluation interval can be built and stored in advance;

(3) The penetration depth is predicted for the input with the prediction model of each adjacent interval, and the results are averaged to obtain the adjacent-interval predicted value p₂;

(4) The final prediction result is p = (1 − 1/(k+1)) × p₁ + p₂/(k+1). Under this formula the main prediction model carries the weight 1 − 1/(k+1) and the averaged adjacent-interval prediction carries the weight 1/(k+1), where k+1 is the number of partitions used for the current prediction, so the share of the adjacent intervals is inversely proportional to the number of partitions used. The more evaluation intervals there are, the finer the interval division and the stronger the independence of each interval, and hence the higher the weight of the main prediction model.
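The neighbor lookup and the fusion formula of Step 5 can be sketched as follows. This is a minimal illustration, assuming each evaluation interval is stored as a tuple of (low, high) pairs; the 3 × 3 grid of unit squares and the constant per-partition models are hypothetical, and returning p₁ when no neighbor exists is an assumption, since the text does not cover k = 0.

```python
def touches(a, b):
    """True if hyper-rectangles a and b share a face, an edge or a vertex (or overlap)."""
    return all(lo_a <= hi_b and lo_b <= hi_a
               for (lo_a, hi_a), (lo_b, hi_b) in zip(a, b))


def neighbor_table(boxes):
    """Precompute the list of adjacent intervals for every evaluation interval."""
    return {
        i: [j for j in boxes if j != i and touches(boxes[i], boxes[j])]
        for i in boxes
    }


def contains(box, x):
    """True if point x lies inside the (closed) hyper-rectangle."""
    return all(lo <= v <= hi for (lo, hi), v in zip(box, x))


def fuse(x, boxes, models, neighbors):
    """K-neighbor fusion: p = (1 - 1/(k+1)) * p1 + p2 / (k+1)."""
    main = next(i for i in boxes if contains(boxes[i], x))
    p1 = models[main](x)
    adjacent = neighbors[main]
    k = len(adjacent)
    if k == 0:
        return p1  # assumption: fall back to the main model alone
    p2 = sum(models[j](x) for j in adjacent) / k
    return (1 - 1 / (k + 1)) * p1 + p2 / (k + 1)


# Hypothetical layout: a 3 x 3 grid of unit squares in a 2-D parameter space.
boxes = {(i, j): ((i, i + 1), (j, j + 1)) for i in range(3) for j in range(3)}
# Hypothetical constant models: the center partition predicts 9.0, all others 0.0.
models = {key: (lambda x, v=(9.0 if key == (1, 1) else 0.0): v) for key in boxes}

neighbors = neighbor_table(boxes)
result = fuse((1.5, 1.5), boxes, models, neighbors)
```

In this layout the center square has 8 neighbors, matching the situation described for FIG. 2, so the main model's weight is 8/9. When intervals overlap, `fuse` simply takes the first interval that contains the input as the main one; a real system would need an explicit tie-breaking rule.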

In the prediction process, the input vector formed by the 8 characteristic quantities (target landing speed, projectile mass, projectile diameter, target compressive strength, target material density, CRH, shape factor and projectile head length) is passed through the RF_BP_r models of the partitions to generate prediction outputs, and the final dimensionless penetration depth value is generated by K-neighbor fusion prediction.

In order to verify the performance of the penetration depth prediction method based on multi-partition model integration, concrete penetration depth prediction is taken as an example for experiment, and the method is compared with the existing algorithm. The algorithm involved in the comparison includes: BP neural network algorithms and existing empirical algorithms.

For comparison, a test data set is divided into a training set and a test set according to the proportion of 7:3, the performance difference between the method provided by the invention and the existing method is compared on the test set, and the evaluation index adopts the average absolute percentage error MAPE.

First, the evaluation intervals are divided: the existing test data are reduced in dimension with the manifold learning algorithm LLE, with k = 4 for the neighbor search, and projected into a 3-dimensional space; the 3-dimensional data are clustered, and after adjustment by domain experts 12 evaluation partitions are finally obtained.

According to the existing algorithm documents and test data, the engineering algorithms are selected in 12 intervals, and four engineering algorithms are selected in each interval, as shown in table 1.

TABLE 1 candidate list of engineering algorithm for each evaluation interval

And carrying out batch engineering calculation in each interval, discretizing each parameter according to a mode of table 2 to generate a combined sample, and calculating by using a candidate engineering algorithm of each evaluation interval to generate engineering calculation simulation data.

TABLE 2 empirical algorithm pseudo data source generation parameter table

The MAPE of each model and of the common engineering algorithms on the test set is shown in FIG. 3, where BP-exp is the BP neural network model built from experimental data and BP-cand is the BP neural network model built from engineering calculation simulation data. The dashed line in the figure marks the lowest MAPE among the conventional empirical algorithms; the figure shows that the MAPE of the engineering algorithms on the test set is, overall, higher than that of the neural networks. The MAPE of the BP-exp model, based on the test data source, is slightly lower than that of the BP-cand model, based on the empirical-algorithm pseudo data source; the K-neighbor fusion model (K_Fusion) has the lowest MAPE.

Claims

1. A penetration depth prediction method based on multi-partition model integration is characterized by comprising the following steps:

2. The method for penetration depth prediction based on multi-partition model integration according to claim 1, wherein the step (1) comprises the steps of:

wherein d_ij is an input quantity, m is the number of input characteristic quantities, the last column is the model output, namely the dimensionless penetration depth, the other columns are the model inputs, and n is the number of data items; the manifold learning algorithm LLE (locally linear embedding) reduces the sample data from the 9-dimensional space described by the matrix M to a d-dimensional space; locally linear embedding tries to preserve, in the low-dimensional space, the linear relations between samples within a neighborhood, and is suitable for nonlinear dimensionality reduction of high-dimensional data;

3. The method for penetration depth prediction based on multi-partition model integration according to claim 1, wherein the step (2) is implemented as follows:

4. The method for penetration depth prediction based on multi-partition model integration according to claim 1, wherein the step (3) is implemented as follows:

5. The method for penetration depth prediction based on multi-partition model integration according to claim 1, wherein the step (4) comprises the steps of:

(41) establishing the BP neural network models of an evaluation partition following the random forest idea: each time, m features are randomly selected from the 8 characteristic quantities as the input of a BP neural network, whose output is the dimensionless penetration depth; p% of the data, S_{r_p}, is randomly drawn from S_r, and the selected m feature dimensions and the dimensionless penetration depth dimension of S_{r_p} are extracted to form a training set S_{r_p_train}, which is used to train one tree, namely one BP neural network; m takes the value 6 or 7; p is set to 70-80;

(42) constructing f trees according to (41), namely f BP neural networks; each BP neural network has log₂ m hidden layers with 2m nodes per layer, and the activation function is ReLU; the training data are normalized using the following formula:

$$S_{r\_p\_train} = \frac{S_{r\_p\_train} - S_{r\_p\_train\_mean}}{S_{r\_p\_train\_std}} \tag{3}$$

(43) averaging the outputs of the f trees to obtain the prediction output of the corresponding partition.

6. The method for penetration depth prediction based on multi-partition model integration according to claim 1, wherein the step (5) comprises the steps of: