CN113241132B

CN113241132B - Method and system for establishing machine learning model for predicting material irradiation damage

Info

Publication number: CN113241132B
Application number: CN202110496009.6A
Authority: CN
Inventors: 康俸溪; 柳彦博; 赵志杰; 王瑞; 赵晓锟; 李柔娟
Original assignee: Beijing Institute of Technology BIT
Current assignee: Beijing Institute of Technology BIT
Priority date: 2021-05-07
Filing date: 2021-05-07
Publication date: 2022-12-09
Anticipated expiration: 2041-05-07
Also published as: CN113241132A

Abstract

The invention relates to a method and a system for establishing a machine learning model for predicting material irradiation damage. The method comprises: dividing the sample data in an acquired laser assessment data set into a training set and a test set according to a set proportion; generating, from the sample data in the training set, as many new training sets as there are decision tree regression models; and training the acquired machine learning model, taking the independent variables of the sample data in the new training sets as input and the corresponding response variables as output, to obtain the material irradiation damage prediction machine learning model. A model constructed in this way addresses the problems of the prior art: low efficiency, high cost, long development cycles, applicability only to specific systems, heavy computation, and inability to describe material properties quantitatively.

Description

Method and system for establishing machine learning model for predicting material irradiation damage

Technical Field

The invention relates to the technical field of prediction model establishment, in particular to a method and a system for establishing a machine learning model for predicting material irradiation damage.

Background

With the development of science and technology, the variety of weapons has greatly increased. Mounting laser technology on carrier platforms such as ships improves the maneuverability of laser weapons, which pose a serious threat to equipment such as satellites, aircraft and missiles.

Such extreme service conditions place higher demands on material properties. To meet military requirements, it is necessary to develop new materials that perform better under extreme conditions such as high-temperature ablation and laser irradiation. Accelerating new-material development has therefore attracted close attention from governments and researchers worldwide.

Traditional development mostly relies on trial and error, which suffers from low efficiency, high cost and long cycles. The alternative is computational simulation, such as first-principles calculation, molecular dynamics and finite element analysis. These methods apply only to specific systems, require heavy computation, and cannot meet the need to describe material properties quantitatively.

Disclosure of Invention

In order to solve the problems in the prior art, the invention provides a method and a system for establishing a machine learning model for predicting the irradiation damage of a material.

In order to achieve the purpose, the invention provides the following scheme:

A method for establishing a machine learning model for predicting material irradiation damage comprises the following steps:

acquiring a laser assessment data set, the laser assessment data set comprising sample data of materials damaged by irradiation, each sample comprising independent variables and a response variable;

dividing sample data in the laser assessment data set into a training set and a test set according to a set proportion;

obtaining a machine learning model; the machine learning model comprises a plurality of decision tree regression models;

generating, from the sample data in the training set, as many new training sets as there are decision tree regression models;

training the machine learning model by taking the independent variables of the sample data in the new training sets as input and the response variables of the sample data in the training set as output, to obtain a trained machine learning model, the trained machine learning model being the material irradiation damage prediction machine learning model.

Preferably, generating, from the sample data in the training set, as many new training sets as there are decision tree regression models specifically includes:

forming each new training set by bootstrap sampling (random sampling with replacement) of the sample data in the training set.

Preferably, the sample data in the laser assessment data set is divided into a training set and a test set at a ratio of 8:2.

Preferably, training the machine learning model by taking the independent variables of the sample data in the new training sets as input and the response variables of the sample data in the training set as output, to obtain the trained machine learning model, specifically includes:

determining the features of each decision tree regression model according to the number of independent variables of the sample data in the new training set;

dividing the sample data in the new training set into data regions according to the features of the decision tree regression model;

obtaining the response variable values of the sample data falling into each data region and the average response variable value of each region;

determining the loss function of the machine learning model from the response variable values and their regional averages, thereby completing the training of the machine learning model.

Preferably, the method further comprises the following steps:

testing the material irradiation damage prediction machine learning model using the test set.

According to the specific embodiment provided by the invention, the invention discloses the following technical effects:

The invention provides a method for establishing a material irradiation damage prediction machine learning model with the following steps. First, the sample data in an acquired laser assessment data set are divided into a training set and a test set according to a set proportion. Second, as many new training sets as there are decision tree regression models are generated from the sample data in the training set. Third, the acquired machine learning model is trained, with the independent variables of the sample data in the new training sets as input and the corresponding response variables as output, to obtain the material irradiation damage prediction machine learning model. A model constructed in this way addresses the prior-art problems of low efficiency, high cost, long development cycles, applicability only to specific systems, heavy computation, and inability to describe material properties quantitatively.

Corresponding to the method for establishing the machine learning model for predicting the material irradiation damage, the invention also provides the following technical scheme:

A system for establishing a machine learning model for predicting material irradiation damage comprises:

a data set acquisition module, used for acquiring a laser assessment data set, the laser assessment data set comprising sample data of materials damaged by irradiation, each sample comprising independent variables and a response variable;

a data set dividing module, used for dividing the sample data in the laser assessment data set into a training set and a test set according to a set proportion;

a machine learning model acquisition module, used for acquiring a machine learning model comprising a plurality of decision tree regression models;

a training set generating module, used for generating, from the sample data in the training set, as many new training sets as there are decision tree regression models;

a learning model building module, used for training the machine learning model by taking the independent variables of the sample data in the new training sets as input and the response variables of the sample data in the training set as output, to obtain the trained machine learning model, the trained machine learning model being the material irradiation damage prediction machine learning model.

Preferably, the training set generating module specifically includes:

a training set generating unit, used for forming each new training set by bootstrap sampling (random sampling with replacement) of the sample data in the training set.

Preferably, the learning model building module specifically includes:

a feature determining unit, used for determining the features of each decision tree regression model according to the number of independent variables of the sample data in the new training set;

a region dividing unit, used for dividing the sample data in the new training set into data regions according to the features of the decision tree regression model;

a response variable value acquisition unit, used for obtaining the response variable values of the sample data falling into each data region and the average response variable value of each region;

a training unit, used for determining the loss function of the machine learning model from the response variable values and their regional averages, thereby completing the training of the machine learning model.

Preferably, the system further comprises:

a testing module, used for testing the material irradiation damage prediction machine learning model using the test set.

The technical effect achieved by the system for establishing the material irradiation damage prediction machine learning model provided by the invention is the same as the technical effect achieved by the method for establishing the material irradiation damage prediction machine learning model provided by the invention, so that the details are not repeated herein.

Drawings

The drawings used in the embodiments are briefly described below, to illustrate more clearly the embodiments of the present invention and the technical solutions in the prior art. These drawings show only some embodiments of the present invention. A person of ordinary skill in the art could derive other drawings from them without creative effort.

FIG. 1 is a flow chart of a method for establishing a machine learning model for predicting material irradiation damage according to the present invention;

FIG. 2 is a schematic diagram of a decision tree regression model dividing data regions when sample data has 2 attributes according to an embodiment of the present invention;

FIG. 3 is a comparison graph of the predicted results after transformation of the y values of the response variables provided by the embodiment of the present invention;

FIG. 4 is a comparison graph of the predicted results of response variable y values provided by the embodiment of the present invention without transformation;

FIG. 5 is a schematic structural diagram of a system for establishing a machine learning model for predicting material irradiation damage provided by the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The invention aims to provide a method and a system for establishing a machine learning model for predicting material irradiation damage. The established model is intended to solve the prior-art problems of low efficiency, high cost, long cycles, applicability only to specific systems, heavy computation, and inability to describe material properties quantitatively.

To make the above objects, features and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and specific embodiments.

As shown in fig. 1, the method for establishing a machine learning model for predicting radiation damage of a material provided by the present invention includes:

Step 100: acquire a laser assessment data set. The laser assessment data set comprises sample data of materials damaged by irradiation, and each sample comprises independent variables and a response variable. In the present invention, the number of samples is preferably 190. Each sample has 37 attributes. The 36 independent variables are power, time, thickness, inflame, organic_num, orgmass, inorgnum, inorganic_mass, yield_max, yield_min, yield_mean, yield_geom, yield_weighted, λ_max, tm_max, tb_max, hp_max, λ_min, tm_min, tb_min, hp_min, λ_mean, tm_mean, tb_mean, hp_mean, λ_geom, tm_geom, tb_geom, hp_geom, λ_weighted, tm_weighted, tb_weighted, hp_weighted, newsoild, nhc and ntm. The single response variable is back_temperature.

Step 101: divide the sample data in the laser assessment data set into a training set and a test set according to a set proportion, specifically 8:2.

Step 102: obtain a machine learning model comprising a plurality of decision tree regression models. The machine learning model adopted by the invention is a random forest model or a neural network model.

Step 103: generate, from the sample data in the training set, as many new training sets as there are decision tree regression models. Specifically, each new training set is formed by bootstrap sampling of the training set, i.e., random sampling with replacement.

Step 104: train the machine learning model, taking the independent variables of the sample data in the new training sets as input and the response variables of the sample data in the training set as output. The result is the trained machine learning model, which is the material irradiation damage prediction machine learning model. This step specifically comprises:

Determine the features of each decision tree regression model according to the number of independent variables of the sample data in the new training set.

Divide the sample data in the new training set into data regions according to the features of the decision tree regression model.

Obtain the response variable values of the sample data falling into each data region and the average response variable value of each region.

Determine the loss function of the machine learning model from the response variable values and their regional averages, thereby completing the training.
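Steps 101 and 103 can be sketched in plain Python as follows. This is a minimal illustration, not the inventors' code. The function names are hypothetical and the 190 samples are replaced by integer stand-ins. The shuffle seed is also arbitrary, so the sketch does not reproduce the split obtained with random_state = 8.

```python
import random


def train_test_split_8_2(samples, seed=0):
    """Shuffle a copy of the samples and split it 80% / 20% (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = round(0.8 * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def bootstrap_training_sets(train, n_trees, seed=0):
    """Draw one bootstrap sample per tree: len(train) draws with replacement."""
    rng = random.Random(seed)
    return [rng.choices(train, k=len(train)) for _ in range(n_trees)]


samples = list(range(190))                 # stand-ins for the 190 samples
train, test = train_test_split_8_2(samples)
new_sets = bootstrap_training_sets(train, n_trees=30)
print(len(train), len(test), len(new_sets), len(new_sets[0]))
```

Because the sampling is with replacement, each new training set typically repeats some samples and omits others. This is what makes the trees differ from one another.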

To verify the prediction accuracy of the constructed model, the invention also tests the material irradiation damage prediction machine learning model using the test set.

The method is explained in detail below, taking a random forest model as the selected machine learning model. Other models may be used in practical applications.

Step 1, acquiring a data set and preprocessing the data:

A laser assessment data set is first imported from the database. It contains 190 samples, each with 37 attributes. One attribute, back_temperature, is the back temperature value of the material after laser ablation. The other 36 attributes are physicochemical properties of the material (listed above). The established random forest model predicts the back temperature value from these 36 attributes.

Next, the imported laser assessment data set is divided into two parts: a training set (about 70% to 80% of the samples) and a test set (about 20% to 30%). The original data are skewed: there are many low-temperature samples (not higher than 300 ℃) and few high-temperature samples. The back temperature value y is therefore preprocessed before modeling by replacing it with ln(y). This makes the distribution of y approximately normal and improves the accuracy of model prediction.
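The ln(y) preprocessing and the back-transformation of predictions can be sketched as below. The sketch assumes all back temperature values are strictly positive, since ln(y) is undefined otherwise; the document does not state which temperature unit is transformed. The function names and sample temperatures are hypothetical.

```python
import math


def to_log_target(y_values):
    """Replace each back temperature value y by ln(y); assumes every y > 0."""
    if any(y <= 0 for y in y_values):
        raise ValueError("ln(y) requires strictly positive targets")
    return [math.log(y) for y in y_values]


def from_log_target(log_predictions):
    """Map predictions made on the ln(y) scale back to the original scale."""
    return [math.exp(v) for v in log_predictions]


temps = [35.0, 48.0, 60.0, 72.0, 950.0]   # hypothetical, right-skewed values
log_temps = to_log_target(temps)          # the model would be trained on these
restored = from_log_target(log_temps)     # equal to temps up to rounding
```

Suppose the trees are trained on ln(y) and their averaged output is exponentiated. The result is then the geometric mean of the per-tree predictions on the original scale, which is never larger than their arithmetic mean. This is a property of the transform and is not discussed in the document.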

Step 2, establishing a random forest model:

The idea of this embodiment is to build a random forest regression model from the training set. Random forest regression is an ensemble of decision tree regression models (regression trees for short). It builds a series of regression trees and combines their predictions into the final result of the model. For example, if a random forest contains 30 regression trees, its prediction is the average of the predictions of those 30 trees. Since random forests are built from regression trees, the principle of the regression tree is briefly described below.

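Consider a given data set (i.e., the imported laser assessment data set) with $n$ samples. Sample $i$ contains $p$ independent variables $x_{i1}, x_{i2}, \dots, x_{ip}$, collected as $x_i = (x_{i1}, \dots, x_{ip})$, and one response variable $y_i$, $i = 1, \dots, n$. Divide the $n$ samples into regions $R_1, R_2, \dots, R_J$, and let

$$\hat{y}_{R_j} = \frac{1}{|R_j|} \sum_{i:\, x_i \in R_j} y_i$$

denote the average value of the response variable of the samples in $R_j$. The loss function is then

$$\mathrm{loss} = \sum_{j=1}^{J} \sum_{i:\, x_i \in R_j} \left( y_i - \hat{y}_{R_j} \right)^2$$

Fig. 2 shows the case $p = 2$. Each sample is then a point $x_i = (x_{i1}, x_{i2})$ in the plane, and the regions $R_j$ are rectangles in that plane. The loss is the same sum of squared deviations, taken over these rectangular regions.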

Building a regression tree therefore amounts to finding a partition of the data set that makes the loss as small as possible. Once the partition is obtained, a new sample must fall into some region $R_j$. The predicted back temperature of the new sample is the average back temperature of all training samples in that region.
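The loss above, and one step of searching for a partition that lowers it, can be sketched as follows. This is an illustrative assumption rather than the inventors' procedure, since the document does not say how the partition is found. The sketch runs an exhaustive search for a single binary split on one feature. That search is one greedy step of the recursive binary splitting used by standard CART regression trees. The function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def partition_loss(regions: Sequence[Sequence[float]]) -> float:
    """Sum over regions R_j of (y_i - mean of R_j)^2."""
    loss = 0.0
    for ys in regions:
        if ys:
            mean = sum(ys) / len(ys)
            loss += sum((y - mean) ** 2 for y in ys)
    return loss


def best_split(X: List[Sequence[float]],
               y: List[float]) -> Tuple[float, Optional[int], Optional[float]]:
    """Try every (feature j, threshold t) and keep the split with the lowest loss.

    Returns (loss, j, t); j and t stay None if no split beats the unsplit data.
    """
    best = (partition_loss([y]), None, None)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if left and right:
                loss = partition_loss([left, right])
                if loss < best[0]:
                    best = (loss, j, t)
    return best


# Toy data with p = 2 independent variables, as in Fig. 2.
X = [[1.0, 5.0], [2.0, 6.0], [3.0, 1.0], [4.0, 2.0]]
y = [10.0, 10.0, 20.0, 20.0]
print(best_split(X, y))  # (0.0, 0, 2.0)
```

Repeating best_split inside each resulting region grows the tree. A new sample is then predicted by the mean response of the region it falls into.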

The randomness of the random forest model comes mainly from two sources: randomness of the training set and randomness of the features.

For the first, suppose 80% of the samples in the original data set are selected as the training set, giving 0.8n training samples and 0.2n test samples. Randomness of the training set means that 0.8n samples are drawn from the training set at random, with replacement (bootstrap sampling), to form a new training set. The training set of each regression tree is generated in this way. This prevents every regression tree from using the same training set. Because the draws are random, it also keeps the extracted training sets from being systematically biased, which improves training accuracy and real-time performance.

For the second, randomness of the features means that m features (m < M) are selected at random from the original M features, so that each regression tree uses m features.
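The feature randomness described above can be sketched as follows. The sketch assumes, as in this embodiment, M = 36 and m = round(sqrt(M)) = 6; the function name is hypothetical. It draws one feature subset per tree, as the text describes. By contrast, scikit-learn's RandomForestRegressor applies max_features afresh at each split within a tree. The last line counts the possible 6-of-36 subsets.

```python
import math
import random


def draw_feature_subsets(n_features: int, n_trees: int, seed: int = 0):
    """For each tree, choose m = round(sqrt(M)) distinct feature indices at random."""
    rng = random.Random(seed)
    m = round(math.sqrt(n_features))
    return [sorted(rng.sample(range(n_features), m)) for _ in range(n_trees)]


subsets = draw_feature_subsets(36, n_trees=5)  # M = 36 independent variables
print(subsets[0])         # six distinct indices in 0..35
print(math.comb(36, 6))   # 1947792 possible 6-of-36 subsets (Python 3.8+)
```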

Step 3, model evaluation:

In the invention, the model is evaluated by its performance on the test set, measured by the mean absolute percentage error (MAPE). A smaller MAPE indicates higher prediction accuracy. Suppose the test set contains $k$ samples. Let $y_i$ be the true response variable of sample $i$ (the true back temperature value) and $\hat{y}_i$ the predicted value (the predicted back temperature value). Then

$$\mathrm{MAPE} = \frac{100\%}{k} \sum_{i=1}^{k} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$$
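The MAPE formula translates directly into a small helper; the function name mape is illustrative. scikit-learn's mean_absolute_percentage_error returns the same quantity as a fraction (0.1 rather than 10). It must therefore be multiplied by 100 before comparing with the percentages reported in this document.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Raises ValueError for empty or mismatched inputs, or for a zero true value.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for yt, yp in zip(y_true, y_pred):
        if yt == 0:
            raise ValueError("MAPE is undefined when a true value is 0")
        total += abs((yt - yp) / yt)
    return 100.0 * total / len(y_true)


# Each prediction is off by 10% of its true value, so the MAPE is 10%.
print(mape([100.0, 200.0], [110.0, 180.0]))
```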

The random forest model built by the above process has three key parameters:

- the number of regression trees (n_estimators);
- the number of attributes used by each regression tree (max_features);
- the division into training and test sets, controlled by the parameter random_state in the Python code.

Different models and their performance are obtained by varying n_estimators. max_features is fixed at sqrt(p) = sqrt(36) = 6, where p = 36 is the number of independent variables. random_state was tuned in the same way as n_estimators, by comparing the results one by one, and random_state = 8 was finally selected.
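The parameters named above match those of scikit-learn's RandomForestRegressor and train_test_split. A minimal sketch is shown below. It assumes scikit-learn 0.24 or later (for mean_absolute_percentage_error) and NumPy. Because the laser assessment data set is not available, the sketch uses synthetic stand-in data. Its MAPE values therefore say nothing about the results reported in this document.

Two details differ from the description. In scikit-learn, max_features="sqrt" limits the features considered at each split rather than per tree. In addition, the forest's own random_state (set to 0 here as an assumption) is separate from the random_state = 8 that controls the train/test split.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 190 samples x 36 features, positive right-skewed target.
rng = np.random.default_rng(0)
X = rng.random((190, 36))
y = np.exp(1.0 + 3.0 * X[:, 0] + rng.normal(0.0, 0.2, 190))

# 8:2 split; random_state=8 selects one particular division.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=8)

results = {}
for n_estimators in (10, 50, 100):
    model = RandomForestRegressor(
        n_estimators=n_estimators,
        max_features="sqrt",   # int(sqrt(36)) = 6 features considered per split
        random_state=0,        # assumption: fixes the forest's own randomness
    )
    model.fit(X_train, np.log(y_train))         # train on ln(y)
    y_pred = np.exp(model.predict(X_test))      # back to the original scale
    results[n_estimators] = 100 * mean_absolute_percentage_error(y_test, y_pred)

for n, err in results.items():
    print(f"n_estimators={n}: MAPE = {err:.2f}%")
```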

Step 4, applying the model to the test set:

The random forest model trained on the training set in the above embodiment contains n_estimators regression trees. Each tree uses 6 features, by which it divides the training set into regions. When a new sample is fed into the model, it necessarily falls into one region of each regression tree, which gives that tree's prediction for the sample. The prediction of the random forest is the average of the trees' predictions. Applying this to every test sample gives the results of the model on the test set, and the error is evaluated with the MAPE.
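The prediction step can be illustrated with a toy forest of depth-1 regression trees (stumps). This is a simplified, hypothetical sketch. The Stump class and the forest_predict function are illustrative names, and the thresholds and region means are invented. Trees in the real model have many more regions.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Stump:
    """A depth-1 regression tree: two regions split on one feature."""
    feature: int
    threshold: float
    left_mean: float   # mean response of training samples with x[feature] <= threshold
    right_mean: float  # mean response of training samples with x[feature] > threshold

    def predict(self, x: Sequence[float]) -> float:
        # The sample falls into exactly one region; return that region's mean.
        return self.left_mean if x[self.feature] <= self.threshold else self.right_mean


def forest_predict(trees: List[Stump], x: Sequence[float]) -> float:
    """Random forest regression output: the average of the per-tree predictions."""
    return sum(tree.predict(x) for tree in trees) / len(trees)


forest = [Stump(0, 0.5, 100.0, 300.0),
          Stump(1, 2.0, 150.0, 250.0),
          Stump(0, 0.8, 120.0, 400.0)]
print(forest_predict(forest, [0.6, 3.0]))  # average of 300, 250 and 120
```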

The random forest model constructed by the above-mentioned embodiments of the present invention is described below with specific examples.

In a random forest, the 6 attributes used by each regression tree are selected at random from the 36 attributes. If the regression trees do not all use the same attributes, there are

$$\binom{36}{6} = 1\,947\,792$$

possible attribute combinations, i.e., nearly two million. The following shows the performance of the random forest model with numbers of regression trees of different orders of magnitude:

In all of the following examples, the original data set is divided into a training set and a test set at 8:2 with random_state = 8.

Example 1: n_estimators = 100000 regression trees, MAPE: 19.994%

Example 2: n_estimators = 10000 regression trees, MAPE: 19.971%

Example 3: n_estimators = 1000 regression trees, MAPE: 20.151%

Example 4: n_estimators = 100 regression trees, MAPE: 19.810%

Example 5: n_estimators = 50 regression trees, MAPE: 20.210%

Example 6: n_estimators = 10 regression trees, MAPE: 28.341%

More regression trees are not necessarily better. From one hundred thousand, ten thousand and one thousand trees down to fifty, the performance is similar, while the computational cost grows with the number of trees. Nor are fewer trees better: too few trees may cover the attributes insufficiently, which increases random error (the MAPE rises to 28.341% with 10 trees). From the above results, the effect is best when n_estimators is about 50 to 100.

Figs. 3 and 4 compare the prediction results with and without transforming the response variable y. With the ln(y) transformation and n_estimators = 100, the MAPE is 20.638%. Without the transformation and with n_estimators = 100, the MAPE is 27.944%.

The comparison shows that the transformation changes the distribution of the sample back temperature values from skewed to approximately normal, which helps improve the prediction accuracy of the model.

In conclusion, the method for establishing the machine learning model for predicting the irradiation damage of the material provided by the invention further has the following advantages:

1. No dimensionality reduction of the 36 attributes is needed during modeling. A random forest model comprises many regression trees, and the 6 attributes used by each tree are selected at random (as explained in Step 2 above), which achieves an effect similar to dimensionality reduction.

2. The raw data are skewed: there are many low-temperature samples (not higher than 300 ℃) and few high-temperature samples, which may cause the model to perform poorly on high-temperature samples. Therefore, before the model is established, the back temperature value y is replaced by ln(y). This makes its distribution approximately normal and improves the accuracy of model prediction.

Corresponding to the method provided above, the present invention also provides a system for establishing a machine learning model for predicting material irradiation damage. As shown in FIG. 5, the system comprises:

- a data set acquisition module 1;
- a data set dividing module 2;
- a machine learning model acquisition module 3;
- a training set generating module 4;
- a learning model building module 5.

The data set acquisition module 1 is used for acquiring a laser assessment data set. The laser assessment data set comprises sample data of materials damaged by irradiation, and each sample comprises independent variables and a response variable.

The data set dividing module 2 is used for dividing the sample data in the laser assessment data set into a training set and a test set according to a set proportion.

The machine learning model acquisition module 3 is used for acquiring a machine learning model, which comprises a plurality of decision tree regression models.

The training set generating module 4 is used for generating, from the sample data in the training set, as many new training sets as there are decision tree regression models.

The learning model building module 5 is configured to train the machine learning model by using the independent variable of the sample data in the new training set as an input and using the response variable of the sample data in the training set as an output, so as to obtain the trained machine learning model. The trained machine learning model is the material irradiation damage prediction machine learning model.

The training set generating module preferably includes a training set generating unit.

The training set generating unit is used for forming each new training set by bootstrap sampling (random sampling with replacement) of the sample data in the training set.

Further, to improve the accuracy of the prediction results, the learning model building module specifically includes a feature determining unit, a region dividing unit, a response variable value acquisition unit and a training unit.

The feature determination unit is used for determining features of the decision tree regression model according to the number of independent variables of the sample data in the new training set.

The region dividing unit is used for dividing the sample data in the new training set into data regions according to the features of the decision tree regression model.

The response variable value acquisition unit is used for obtaining the response variable values of the sample data falling into each data region and the average response variable value of each region.

To verify the prediction accuracy of the constructed material irradiation damage prediction machine learning model, the system preferably further comprises a testing module. The testing module is used for testing the material irradiation damage prediction machine learning model using the test set.
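One possible way to map the modules of the system onto code is sketched below. The class and method names are hypothetical. The data set acquisition module is represented simply by the list passed in. The function that fits a single regression tree is left as a pluggable argument. The usage example plugs in a trivial "tree" that always predicts the mean response, only to exercise the pipeline.

```python
import random
from typing import Callable, List, Sequence, Tuple

# One sample: (independent variables, response variable).
Sample = Tuple[Sequence[float], float]
Predictor = Callable[[Sequence[float]], float]


class ModelBuildingSystem:
    """Hypothetical mapping of the described modules onto methods."""

    def __init__(self, fit_tree: Callable[[List[Sample]], Predictor],
                 n_trees: int, seed: int = 0):
        # Machine learning model acquisition module: how one regression tree
        # is fitted, and how many trees the model contains.
        self.fit_tree = fit_tree
        self.n_trees = n_trees
        self.rng = random.Random(seed)

    def divide(self, data: List[Sample], train_fraction: float = 0.8):
        # Data set dividing module: shuffle, then split by the set proportion.
        shuffled = list(data)
        self.rng.shuffle(shuffled)
        k = round(train_fraction * len(shuffled))
        return shuffled[:k], shuffled[k:]

    def generate_training_sets(self, train: List[Sample]) -> List[List[Sample]]:
        # Training set generating module: one bootstrap sample per tree.
        return [self.rng.choices(train, k=len(train)) for _ in range(self.n_trees)]

    def build(self, train: List[Sample]) -> Predictor:
        # Learning model building module: fit one tree per new training set
        # and average the trees' predictions.
        trees = [self.fit_tree(s) for s in self.generate_training_sets(train)]
        return lambda x: sum(t(x) for t in trees) / len(trees)

    def test(self, model: Predictor, test: List[Sample]) -> float:
        # Testing module: MAPE in percent (true responses must be nonzero).
        return 100.0 * sum(abs((y - model(x)) / y) for x, y in test) / len(test)


def fit_mean_tree(samples: List[Sample]) -> Predictor:
    # Trivial stand-in "tree" that always predicts the mean response.
    mean = sum(y for _, y in samples) / len(samples)
    return lambda x: mean


data = [([float(i)], 100.0) for i in range(10)]
system = ModelBuildingSystem(fit_mean_tree, n_trees=5)
train, test = system.divide(data)
model = system.build(train)
print(system.test(model, test))
```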

The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the description of the method part.

The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the foregoing, the description is not to be taken in a limiting sense.

Claims

1. A method for establishing a machine learning model for predicting material irradiation damage is characterized by comprising the following steps:

generating, from the sample data in the training set, as many new training sets as there are decision tree regression models;

training the machine learning model by taking the independent variable of the sample data in the new training set as input and taking the response variable of the sample data in the training set as output to obtain the trained machine learning model, and specifically comprising the following steps:

carrying out data area division on the sample data in the new training set according to the characteristics of the decision tree regression model to obtain a data division area;

determining a loss function of a machine learning model according to the response variable value and the average value of the response variable, and finishing the training of the machine learning model; and the trained machine learning model is a material irradiation damage prediction machine learning model.

2. The method for establishing the machine learning model for predicting material irradiation damage according to claim 1, wherein generating, from the sample data in the training set, as many new training sets as there are decision tree regression models specifically comprises:

3. The method for establishing the material irradiation damage prediction machine learning model according to claim 1, wherein the sample data in the laser assessment data set is divided into a training set and a test set at a ratio of 8:2.

4. The method for establishing the machine learning model for predicting the material irradiation damage according to claim 1, further comprising:

5. A system for establishing a machine learning model for predicting material irradiation damage, characterized by comprising:

the machine learning model acquisition module is used for acquiring a machine learning model; the machine learning model comprises a plurality of decision tree regression models;

the learning model building module is used for training the machine learning model by taking the independent variable of the sample data in the new training set as input and taking the response variable of the sample data in the training set as output to obtain the trained machine learning model; the trained machine learning model is a material irradiation damage prediction machine learning model;

the learning model building module specifically comprises:

the response variable value acquisition unit is used for obtaining the response variable values of the sample data falling into each data region and the average response variable value of each region;

6. The system for establishing the material irradiation damage prediction machine learning model according to claim 5, wherein the training set generation module specifically comprises:

7. The system for establishing the material irradiation damage prediction machine learning model according to claim 5, further comprising: