CN116629128B

CN116629128B - Method for controlling arc additive forming based on deep reinforcement learning

Info

Publication number: CN116629128B
Application number: CN202310620763.5A
Authority: CN
Inventors: 邓路兵; 董博伦; 蔡笑宇; 林三宝
Original assignee: Harbin Institute of Technology
Current assignee: Harbin Institute of Technology
Priority date: 2023-05-30
Filing date: 2023-05-30
Publication date: 2024-03-29
Anticipated expiration: 2043-05-30
Also published as: CN116629128A

Abstract

The invention relates to a method for controlling arc additive forming based on deep reinforcement learning and belongs to the technical field of arc additive manufacturing. It addresses the difficulty of determining process parameters and of regulating forming for complex components. The method comprises the following steps: S1: performing numerical simulation of the arc additive process; S2: acquiring and processing the numerically simulated temperature field information of the arc additive process; S3: constructing a reinforcement learning environment and an agent for arc additive manufacturing; S4: building a value network and a decision network; S5: training the networks in the environment built in S3 using the temperature field information acquired in S2; S6: using the neural network trained in S5 to adjust the deposition parameters automatically during the arc additive process, keeping the melt width and penetration depth of the deposited layer stable. The method generalizes well and is suitable for components with complex shapes. Once the numerical simulation model has been calibrated, the parameters chosen by the agent can be applied to the actual arc additive process, which reduces the time and material cost of exploring process parameters.

Description

Method for controlling arc additive forming based on deep reinforcement learning

Technical Field

The invention relates to an additive forming method, and belongs to the technical field of arc additive manufacturing.

Background

Since the advent of AlphaGo, deep reinforcement learning has developed rapidly. Reinforcement learning is a branch of machine learning suited to tasks that must be completed through trial and error. An agent makes independent decisions based on its environment and continuously adjusts its strategy according to environmental feedback, so it adapts quickly to environmental changes and handles dynamic scenarios well. In deep reinforcement learning, the value function and the policy function are modeled with neural networks, which copes well with high-dimensional state spaces. The search space is optimized through the reward function, and the agent makes better decisions by maximizing the expected reward. As the technology develops, arc additive forming and reinforcement learning are becoming more closely integrated, which offers a possible way to solve existing problems in arc additive manufacturing.

Arc additive manufacturing is an emerging metal additive manufacturing technology that uses an electric arc as the heat source to melt wire and deposit it layer by layer into a three-dimensional component. At present, process parameters for arc additive manufacturing are determined mainly through repeated trial-and-error experiments. Because arc additive manufacturing is a coupled multi-physics process with constantly changing heat dissipation conditions and severe heat accumulation, it is difficult to determine the optimal parameters for complex components through practical experiments.

Therefore, a method for controlling arc additive forming based on deep reinforcement learning is needed to solve the above-mentioned problems.

Disclosure of Invention

The present invention addresses the difficulty of determining process parameters and of regulating forming for complex components, and provides a method for controlling arc additive forming based on deep reinforcement learning. A brief summary is given below to provide a basic understanding of certain aspects of the invention. This summary is not an exhaustive overview of the invention. It is not intended to identify key or critical elements of the invention or to delineate its scope.

The technical scheme of the invention is as follows:

A method for controlling arc additive forming based on deep reinforcement learning, comprising the following steps:

S1: performing numerical simulation of the arc additive process;

S2: acquiring and processing the numerically simulated temperature field information of the arc additive process;

S3: constructing a reinforcement learning environment and an agent for arc additive manufacturing;

S4: building a value network and a decision network;

S5: training the networks in the environment built in S3 using the temperature field information acquired in S2;

S6: using the neural network trained in S5 to adjust the deposition parameters automatically during the arc additive process, keeping the melt width and penetration depth of the deposited layer stable.

Preferably: in S1, the complex component is discretized and decomposed into multi-layer single-pass components, multi-layer multi-pass components and crossing structures, each of which is simulated numerically; the number of mesh elements of each model is controlled, and the shapes of the multi-layer single-pass components, multi-layer multi-pass components and crossing structures are kept diverse; the simulation of the arc additive process uses a step-by-step calculation method in which each weld bead is divided into several calculation conditions, the agent determines the optimal process parameters of each condition independently, and the model is calibrated.

Preferably: the number of mesh elements is kept within 50,000.

Preferably: in S2, the temperature field contour plot of the calculation result is opened, the temperature interval corresponding to each color in the plot is recorded, and the median value of each temperature interval is calculated; screenshots of the temperature field contour plot are taken for each deposited layer and for the substrate, all acquired temperature field pictures are stacked in order, and single-color pictures representing the ambient temperature state are placed below the substrate picture and above the picture of the current deposited layer; each color temperature field picture is converted into a grayscale picture without reducing the information it contains; because the temperature represented by each pixel value in the converted grayscale picture is not well defined, the grayscale picture is further processed to accelerate the training of the neural network: one pixel value is defined as the ambient temperature, another pixel value is defined as the melting point of the material, the temperature values obtained in the preceding calculation are normalized to pixel values within the range bounded by these two values, and the pixels corresponding to each temperature are replaced by the calculated pixel values to form a new grayscale picture; all processed grayscale pictures are stacked in order into a picture sequence containing temperature field information in the time and space dimensions, which serves as the input of the neural network.

Preferably: in S2, the positions below the substrate picture and above the picture of the current deposited layer are filled with 5 white pictures; pixel value 0 in the grayscale picture is defined as the ambient temperature of 25 °C, pixel value 255 is defined as the melting point of the material, and the temperature values obtained in the preceding calculation are normalized to the corresponding pixel values in the range 0-255.

Preferably: in S3, the temperature state map obtained in S2 is defined as the real-time state of the environment; the action space of the agent consists of adjustments to the deposition current, deposition voltage and deposition speed, and the range of the deposition parameters is determined; according to the environment state received in real time, the agent selects the deposition current, deposition voltage and deposition speed and submits them to the numerical simulation solver for the next step of the additive process, thereby regulating the melt width and penetration depth; a decaying ε-greedy strategy is used when selecting the deposition parameters, so that early in learning the agent selects deposition parameters at random to obtain a variety of temperature field states to learn from and thereby predicts more accurately, from the current temperature field state, the optimal parameters for keeping the melt width and penetration depth stable, while late in training the agent adopts the deposition parameters output by the decision network; after the current numerical simulation calculation is completed, the agent enters the next environment state and receives the corresponding reward.

Preferably: in S3, the reward function that gives the reward the agent receives on entering the next environment state is defined as follows:

[Reward formula: image not reproduced in the source text]

wherein: d is the optimal penetration depth, D is the real-time penetration depth of the actual additive process, w is the optimal melt width, W is the real-time melt width of the actual additive process, and the reward value is limited to the range [-10, 10].

Preferably: in S4 and S5, 3D convolution layers are used to extract the time-dimension and space-dimension features of the temperature state map obtained in S2; the value network and the decision network share weights and differ only in the neurons used at the output layer; the built model is optimized with the proximal policy optimization (PPO) algorithm and a multi-threaded synchronous update scheme.

Preferably: in the multi-threaded synchronous update scheme, 12 numerical simulation solving environments are built at the same time, each with a different additive component model; the agent interacts with the 12 environments through the same value network and decision network, and during the interaction it records the temperature field state before each interaction, the deposition parameters executed during the interaction, the temperature field state after the interaction and the reward obtained; after the agent has interacted 60 times with every environment, it uses the 720 recorded interactions as one batch of data to update the value network and the decision network, with 72 records input at each update and each batch used to train the neural network for three epochs; after training on a batch, the updated network is used to interact with the environments, and the next round of learning begins once that interaction is complete.

Preferably: in S6, the parameters of the neural network are fixed after training is completed, and the agent uses a greedy strategy when interacting with the environment, directly adopting the deposition parameters to which the current decision network assigns the highest output probability.

The invention has the following beneficial effects:

In the invention, the temperature field state of the additive process is obtained in real time through numerical simulation. In the built reinforcement learning environment, the agent interacts continuously with the temperature field information provided by the environment, so that the weights of the value network and the decision network are optimized continuously. After training, the agent adjusts the deposition parameters in real time using the optimal deposition parameters provided by the decision network, keeping the penetration depth and melt width of the deposition process stable.

The invention generalizes well and is suitable for a variety of materials and for components with complex shapes. The parameters chosen by the agent can be applied to the actual arc additive process, which reduces the time and material cost of exploring process parameters.

Drawings

FIG. 1 shows the interactive learning process between the agent and the environment;

FIG. 2 is a flow chart of the processing of the temperature field state map;

FIG. 3 is a structural diagram of the value network and the decision network.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the present invention is described below by means of specific embodiments shown in the accompanying drawings. It should be understood that the description is only illustrative and is not intended to limit the scope of the invention. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the present invention.

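Embodiment 1: This embodiment is described with reference to FIGS. 1-3. The method for controlling arc additive forming based on deep reinforcement learning according to this embodiment comprises the following steps:

S1: performing numerical simulation of the arc additive process;

S2: acquiring and processing the numerically simulated temperature field information of the arc additive process;

S3: constructing a reinforcement learning environment and an agent for arc additive manufacturing;

S4: building a value network and a decision network;

S5: training the networks in the environment built in S3 using the temperature field information acquired in S2;

S6: using the neural network trained in S5 to adjust the deposition parameters automatically during the arc additive process, keeping the melt width and penetration depth of the deposited layer stable.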

A plurality of numerical simulation solvers interact in depth with the built reinforcement learning environment. The processed numerical simulation temperature field results are fed to the value network and the decision network as environment states, so that the parameters of both networks are optimized continuously. The neural network thus evaluates the temperature field state of the current additive process more accurately, and the deposition parameters output by the decision network approach the theoretically optimal parameters, which realizes real-time adjustment of the arc additive manufacturing process parameters and control of forming.

Embodiment 2: In S1, to improve the computational efficiency of the numerical simulation, the complex component is discretized and decomposed into multi-layer single-pass components and/or multi-layer multi-pass components and/or crossing structures, each of which is simulated separately. The number of mesh elements of each model is kept within 50,000 to speed up the calculation. To ensure the generalization capability of the method, the shapes of the multi-layer single-pass components, multi-layer multi-pass components and crossing structures are kept diverse. The simulation of the arc additive process uses a step-by-step calculation method: each weld bead is divided into several calculation conditions, and the agent determines the optimal process parameters of each condition independently. The numerical simulation model is calibrated by adjusting the heat source parameters and the heat dissipation conditions so that the simulation results agree with actual experimental results.

Embodiment 3: This embodiment is described with reference to FIGS. 1-3. In S2, the temperature field contour plot of the calculation result is opened, the temperature interval corresponding to each color in the plot is recorded, and the median value of each temperature interval is calculated. Screenshots of the temperature field contour plot are taken for each deposited layer and for the substrate. To better reflect the temperature field state of the additive process, all acquired temperature field pictures are stacked in order, and single-color pictures representing the ambient temperature state are placed below the substrate picture and above the picture of the current deposited layer. To reduce the amount of data input to the neural network, each color temperature field picture is converted into a grayscale picture without reducing the information it contains. Because the temperature represented by each pixel value in the converted grayscale picture is not well defined, the grayscale picture is further processed to accelerate the training of the neural network: one pixel value is defined as the ambient temperature, another pixel value is defined as the melting point of the material, the temperature values obtained in the preceding calculation are normalized to pixel values within the range bounded by these two values, and the pixels corresponding to each temperature are replaced by the calculated pixel values to form a new grayscale picture. All processed grayscale pictures are then stacked in order into a picture sequence that contains temperature field information in both the time and space dimensions, and this sequence serves as the input of the neural network.
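The first part of this step, replacing each legend color by the median of its temperature interval, might be sketched as follows. This is only an illustration: the color names, the interval bounds and the function names are assumptions made for the example and do not come from the patent.

```python
# Hypothetical legend of a temperature field contour plot:
# color name -> (lower bound, upper bound) in degrees Celsius.
# Colors and bounds are illustrative assumptions, not values from the source.
LEGEND = {
    "blue": (25.0, 300.0),
    "green": (300.0, 700.0),
    "yellow": (700.0, 1100.0),
    "red": (1100.0, 1500.0),
}


def interval_medians(legend):
    """Map each legend color to the midpoint of its temperature interval."""
    return {color: (low + high) / 2.0 for color, (low, high) in legend.items()}


def picture_to_temperatures(color_picture, legend):
    """Replace every color label in a 2D picture by its interval midpoint."""
    medians = interval_medians(legend)
    return [[medians[color] for color in row] for row in color_picture]


# A 2 x 2 picture given as color labels instead of RGB values, for brevity.
picture = [["blue", "green"], ["yellow", "red"]]
temperatures = picture_to_temperatures(picture, LEGEND)
```

A real screenshot would first need its RGB pixels matched to legend colors; that matching step is left out here because the text does not describe it.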

Embodiment 4: This embodiment is described with reference to FIGS. 1-3. In S2, the positions below the substrate picture and above the picture of the current deposited layer are filled with 5 white pictures; pixel value 0 in the grayscale picture is defined as the ambient temperature of 25 °C, pixel value 255 is defined as the melting point of the material, and the temperature values obtained in the preceding calculation are normalized to the corresponding pixel values in the range 0-255.
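A minimal sketch of this normalization and padding is given below. Several points are assumptions rather than statements from the text: the melting point of 1450 °C is a placeholder, the mapping is taken to be linear, temperatures outside the range are clamped, "5 pictures" is read as 5 on each side of the stack, and the padding pictures are encoded with the ambient gray level 0 (adjust `pad_value` if the padding should stay white).

```python
AMBIENT_C = 25.0    # ambient temperature from the text, mapped to pixel value 0
MELTING_C = 1450.0  # assumed melting point (illustrative), mapped to pixel value 255
PAD_FRAMES = 5      # padding pictures; assumed here to mean 5 on each side


def temperature_to_pixel(temp_c, ambient=AMBIENT_C, melting=MELTING_C):
    """Linearly map a temperature to 0-255, clamping values outside the range."""
    fraction = (temp_c - ambient) / (melting - ambient)
    fraction = min(1.0, max(0.0, fraction))
    return round(255 * fraction)


def build_state(temperature_maps, pad_value=0):
    """Convert per-layer temperature maps (substrate first) to gray pictures
    and add PAD_FRAMES uniform pictures below and above the stack."""
    height = len(temperature_maps[0])
    width = len(temperature_maps[0][0])

    def blank():
        return [[pad_value] * width for _ in range(height)]

    frames = [[[temperature_to_pixel(t) for t in row] for row in temp_map]
              for temp_map in temperature_maps]
    below = [blank() for _ in range(PAD_FRAMES)]
    above = [blank() for _ in range(PAD_FRAMES)]
    return below + frames + above


# Substrate plus one deposited layer, each a 1 x 2 temperature map in deg C.
state = build_state([[[25.0, 1450.0]], [[2000.0, 25.0]]])
```

The resulting nested list has the shape (pictures, height, width), which is the layout a 3D convolution over the stacking and space dimensions would consume.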

Embodiment 5: This embodiment is described with reference to FIGS. 1-3. In S3, the temperature state map obtained in S2 is defined as the real-time state of the environment. The action space of the agent consists of adjustments to the deposition current, deposition voltage and deposition speed. The range of the deposition parameters is determined through practical experiments, so that the agent can select a suitable combination of deposition parameters within the action space for any temperature field state and is fully able to keep the penetration depth and melt width stable. According to the environment state received in real time, the agent selects the deposition current, deposition voltage and deposition speed and submits them to the numerical simulation solver for the next step of the additive process, thereby regulating the melt width and penetration depth. A decaying ε-greedy strategy is used when selecting the deposition parameters. Early in learning, the agent tends to select deposition parameters at random so as to obtain a variety of temperature field states to learn from. This allows the weights of the value network and the decision network to be optimized better, so that the value network predicts the value of the current temperature field state more accurately and the decision network predicts more accurately, from the current temperature field state, the optimal parameters for keeping the melt width and penetration depth stable. Late in training, the agent tends to adopt the deposition parameters output by the decision network. After the current numerical simulation calculation is completed, the agent enters the next environment state and receives the corresponding reward.
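The decaying ε-greedy selection described here could look roughly like the sketch below. The linear decay schedule, its start and end values, the number of decay steps and the function names are all assumptions; the text states only that exploration dominates early and the decision network's output dominates later. Actions are treated as indices into a discrete set of parameter combinations, and the non-random branch takes the most probable action, which is also an assumption.

```python
import random


def epsilon_at(step, eps_start=1.0, eps_end=0.05, decay_steps=10_000):
    """Linearly decay the exploration rate from eps_start to eps_end.

    The schedule shape and all three defaults are illustrative assumptions.
    """
    fraction = min(1.0, step / decay_steps)
    return eps_start + fraction * (eps_end - eps_start)


def select_action(policy_probs, epsilon, rng=random):
    """Return an action index.

    With probability epsilon a random index is drawn (exploration);
    otherwise the index with the highest policy probability is used.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(policy_probs))
    return max(range(len(policy_probs)), key=lambda i: policy_probs[i])


# Early in training almost every choice is random, later almost none is.
early = select_action([0.1, 0.7, 0.2], epsilon_at(0))
late = select_action([0.1, 0.7, 0.2], epsilon_at(50_000))
```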

Embodiment 6: This embodiment is described with reference to FIGS. 1-3. In S3, the reward function that gives the reward the agent receives on entering the next environment state is defined as follows:

[Reward formula: image not reproduced in the source text]

wherein: d is the optimal penetration depth, D is the real-time penetration depth of the actual additive process, w is the optimal melt width, and W is the real-time melt width of the actual additive process. To prevent problems such as unstable neural network training or wrong decisions that an excessively large reward might cause, the reward calculated by the above formula is clipped and limited to the range [-10, 10].
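The clipping step can be illustrated as below. Because the reward formula itself is not available in this text, `raw_reward` is a purely hypothetical stand-in based on relative deviations from the optimal penetration depth and melt width; it is not the formula of the patent. Only the clipping to [-10, 10] is taken from the description.

```python
def raw_reward(d_opt, d_real, w_opt, w_real, scale=10.0):
    """Hypothetical stand-in reward, NOT the formula of the patent.

    It is largest when the real-time penetration depth and melt width
    match their optimal values and decreases with the relative deviation.
    """
    depth_error = abs(d_real - d_opt) / d_opt
    width_error = abs(w_real - w_opt) / w_opt
    return scale * (1.0 - depth_error - width_error)


def clip_reward(reward, low=-10.0, high=10.0):
    """Limit the reward to [low, high], as stated in the description."""
    return max(low, min(high, reward))


# A large deviation gives a strongly negative raw value that is clipped.
reward = clip_reward(raw_reward(d_opt=2.0, d_real=8.0, w_opt=6.0, w_real=6.0))
```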

Embodiment 7: This embodiment is described with reference to FIGS. 1-3. In S4 and S5, 3D convolution layers are used to extract the time-dimension and space-dimension features of the temperature state map obtained in S2. To reduce the number of neural network parameters and improve the stability of the algorithm, the value network and the decision network share weights and differ only in the neurons used at the output layer. The built model is optimized with the proximal policy optimization (PPO) algorithm and a multi-threaded synchronous update scheme. PPO effectively prevents abrupt weight updates caused by poor data collected by the neural network and ensures that the learned policy is updated steadily, which makes training efficient and stable.
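The way proximal policy optimization keeps updates steady can be seen in its clipped surrogate objective, sketched below in plain Python. The clipping parameter 0.2 is a commonly used default and an assumption here, since the text gives no hyperparameters. The value loss and any entropy term that a complete implementation would add are omitted.

```python
import math


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (to be maximized) over a minibatch.

    ratio = pi_new(a|s) / pi_old(a|s). Taking the minimum of the clipped and
    unclipped terms removes the incentive to move the ratio outside
    [1 - clip_eps, 1 + clip_eps] in the direction that improves the objective.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)


# A large policy change on a good action is credited only up to 1 + clip_eps.
objective = ppo_clip_objective([1.0], [0.0], [1.0])
```

With a positive advantage the objective stops growing once the probability ratio exceeds 1 + `clip_eps`, so a single batch of unrepresentative data cannot pull the policy arbitrarily far.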

Embodiment 8: This embodiment is described with reference to FIGS. 1-3. In the multi-threaded synchronous update scheme, 12 numerical simulation solving environments are built at the same time, each with a different additive component model. The agent interacts with the 12 environments through the same value network and decision network, and during the interaction it records the temperature field state before each interaction, the deposition parameters executed during the interaction, the temperature field state after the interaction and the reward obtained. After the agent has interacted 60 times with every environment, it uses the 720 recorded interactions as one batch of data to update the value network and the decision network. At each update 72 records are input, and each batch is used to train the neural network for three epochs. After training on a batch, the updated network is used to interact with the environments, and the next round of learning begins once that interaction is complete.
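The batch arithmetic of this scheme works out as follows: 12 environments times 60 interactions give 720 records, which split into 10 minibatches of 72, and three epochs give 30 network updates per batch. The sketch below reproduces only this bookkeeping. Shuffling the records before each epoch is an assumption, as the text does not say how the 72 records are chosen.

```python
import random

NUM_ENVS = 12         # parallel simulation environments (from the text)
STEPS_PER_ENV = 60    # interactions per environment per batch (from the text)
MINIBATCH_SIZE = 72   # records per update (from the text)
EPOCHS = 3            # training epochs per batch (from the text)


def minibatch_indices(batch_size, minibatch_size, epochs, rng):
    """Yield index lists so that every record is visited once per epoch.

    Shuffling before each epoch is an assumption, not stated in the source.
    """
    for _ in range(epochs):
        order = list(range(batch_size))
        rng.shuffle(order)
        for start in range(0, batch_size, minibatch_size):
            yield order[start:start + minibatch_size]


batch_size = NUM_ENVS * STEPS_PER_ENV
updates = list(minibatch_indices(batch_size, MINIBATCH_SIZE, EPOCHS,
                                 random.Random(0)))
```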

Embodiment 9: This embodiment is described with reference to FIGS. 1-3. In S6, the parameters of the neural network are fixed after training is completed. When interacting with the environment, the agent uses a greedy strategy and directly adopts the deposition parameters to which the current decision network assigns the highest output probability, so that it can better keep the melt width and penetration depth of the deposited layer stable.
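Greedy selection at deployment might be sketched as below, assuming the decision network outputs one probability per discrete combination of deposition current, voltage and speed. The parameter levels are invented for illustration; the text says only that the ranges are determined by experiment.

```python
from itertools import product

# Hypothetical discrete parameter levels (assumptions, not from the source).
CURRENTS_A = (120.0, 140.0, 160.0)
VOLTAGES_V = (18.0, 20.0)
SPEEDS_MM_S = (5.0, 7.0)

# Every (current, voltage, speed) combination is one discrete action.
ACTIONS = list(product(CURRENTS_A, VOLTAGES_V, SPEEDS_MM_S))


def greedy_parameters(policy_probs):
    """Return the parameter combination with the highest output probability."""
    if len(policy_probs) != len(ACTIONS):
        raise ValueError("expected one probability per action")
    best = max(range(len(ACTIONS)), key=lambda i: policy_probs[i])
    return ACTIONS[best]


# Stand-in for a decision network output that strongly favors action 5.
probs = [0.0] * len(ACTIONS)
probs[5] = 1.0
current, voltage, speed = greedy_parameters(probs)
```

Unlike the ε-greedy rule used during training, no random exploration remains here, so the same temperature field state always yields the same deposition parameters.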

It should be noted that the technical solutions of the above embodiments may be combined in any way that is not contradictory. Those skilled in the art can enumerate all such combinations, so the combined technical solutions are not described one by one here, but they should be understood as having been disclosed by the present invention.

The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method for controlling arc additive forming based on deep reinforcement learning, characterized by comprising the following steps:

S1: performing numerical simulation of the arc additive process;

in S1, the complex component is discretized and decomposed into multi-layer single-pass components, multi-layer multi-pass components and crossing structures, each of which is simulated numerically; the number of mesh elements of each model is controlled, and the shapes of the multi-layer single-pass components, multi-layer multi-pass components and crossing structures are kept diverse; the simulation of the arc additive process uses a step-by-step calculation method in which each weld bead is divided into several calculation conditions, the agent determines the optimal process parameters of each condition independently, and the model is calibrated;

in S2, the temperature field contour plot of the calculation result is opened, the temperature interval corresponding to each color in the plot is recorded, and the median value of each temperature interval is calculated; screenshots of the temperature field contour plot are taken for each deposited layer and for the substrate, all acquired temperature field pictures are stacked in order, and single-color pictures representing the ambient temperature state are placed below the substrate picture and above the picture of the current deposited layer; each color temperature field picture is converted into a grayscale picture without reducing the information it contains; because the temperature represented by each pixel value in the converted grayscale picture is not well defined, the grayscale picture is further processed to accelerate the training of the neural network: one pixel value is defined as the ambient temperature, another pixel value is defined as the melting point of the material, the temperature values obtained in the preceding calculation are normalized to pixel values within the range bounded by these two values, and the pixels corresponding to each temperature are replaced by the calculated pixel values to form a new grayscale picture; all processed grayscale pictures are stacked in order into a picture sequence containing temperature field information in the time and space dimensions, which serves as the input of the neural network;

in S3, the temperature state map obtained in S2 is defined as the real-time state of the environment; the action space of the agent consists of adjustments to the deposition current, deposition voltage and deposition speed, and the range of the deposition parameters is determined; according to the environment state received in real time, the agent selects the deposition current, deposition voltage and deposition speed and submits them to the numerical simulation solver for the next step of the additive process, thereby regulating the melt width and penetration depth; a decaying ε-greedy strategy is used when selecting the deposition parameters, so that early in learning the agent selects deposition parameters at random to obtain a variety of temperature field states to learn from and thereby predicts more accurately, from the current temperature field state, the optimal parameters for keeping the melt width and penetration depth stable, while late in training the agent adopts the deposition parameters output by the decision network; after the current numerical simulation calculation is completed, the agent enters the next environment state and receives the corresponding reward;

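S4: building a value network and a decision network;

S5: training the networks in the environment built in S3 using the temperature field information acquired in S2;

S6: using the neural network trained in S5 to adjust the deposition parameters automatically during the arc additive process, keeping the melt width and penetration depth of the deposited layer stable.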

2. The method for controlling arc additive forming based on deep reinforcement learning of claim 1, wherein: in S2, the positions below the substrate picture and above the picture of the current deposited layer are filled with 5 white pictures; pixel value 0 in the grayscale picture is defined as the ambient temperature of 25 °C, pixel value 255 is defined as the melting point of the material, and the temperature values obtained in the preceding calculation are normalized to the corresponding pixel values in the range 0-255.

3. The method for controlling arc additive forming based on deep reinforcement learning of claim 1, wherein: in S4 and S5, 3D convolution layers are used to extract the time-dimension and space-dimension features of the temperature state map obtained in S2; the value network and the decision network share weights and differ only in the neurons used at the output layer; the built model is optimized with the proximal policy optimization (PPO) algorithm and a multi-threaded synchronous update scheme.

4. The method for controlling arc additive forming based on deep reinforcement learning of claim 3, wherein: in the multi-threaded synchronous update scheme, 12 numerical simulation solving environments are built at the same time, each with a different additive component model; the agent interacts with the 12 environments through the same value network and decision network, and during the interaction it records the temperature field state before each interaction, the deposition parameters executed during the interaction, the temperature field state after the interaction and the reward obtained; after the agent has interacted 60 times with every environment, it uses the 720 recorded interactions as one batch of data to update the value network and the decision network, with 72 records input at each update and each batch used to train the neural network for three epochs; after training on a batch, the updated network is used to interact with the environments, and the next round of learning begins once that interaction is complete.

5. The method for controlling arc additive forming based on deep reinforcement learning of claim 1 or 4, wherein: in S6, the parameters of the neural network are fixed after training is completed, and the agent uses a greedy strategy when interacting with the environment, directly adopting the deposition parameters to which the current decision network assigns the highest output probability.