CN117915540A - Beam current regulation method and device


Info

Publication number
CN117915540A
Authority
CN
China
Prior art keywords: action, model, DQN, simulation, state information
Prior art date
Legal status
Pending
Application number
CN202410200312.0A
Other languages
Chinese (zh)
Inventor
李延林
牟宏进
安石
刘小军
金东晖
张玮
Current Assignee
Institute of Modern Physics of CAS
Original Assignee
Institute of Modern Physics of CAS
Priority date
Filing date
Publication date
Application filed by Institute of Modern Physics of CAS
Priority to CN202410200312.0A
Publication of CN117915540A
Legal status: Pending


Landscapes

  • Particle Accelerators (AREA)

Abstract

The invention relates to the field of beam optimization, and in particular to a beam current regulation method and device, implemented by acquiring accelerator working-state information; inputting the accelerator working-state information into a trained DQN model; receiving a target regulation instruction from the DQN model; and sending the target regulation instruction to a controlled accelerator, so that the controlled accelerator adjusts the beam current according to the instruction. A simulation model of the controlled accelerator is used in advance to obtain a large number of simulation results for excitation power supplies in different working states subjected to different change instructions, and the original DQN model is trained on these results, finally yielding an agent that can automatically adjust the excitation power supplies and complete beam optimization.

Description

Beam current regulation method and device
Technical Field
The present invention relates to the field of beam optimization, and in particular to a beam current regulation method and device.
Background
An accelerator facility contains a large number of magnets, and hundreds of power supplies feed excitation currents to these magnets to change the magnetic field strength inside them, thereby focusing, deflecting, and otherwise shaping the beam. Technicians control the motion trajectories of the particles in the vacuum pipe by varying these power supplies. From a control point of view, getting as many particles as possible to pass through to the end of the track is an important part of beam tuning. This is a time-consuming and labor-intensive task that requires experience and specialized physics knowledge from the technician, which reduces tuning efficiency to some extent and wastes manpower.
Therefore, how to improve beam regulation efficiency, achieve automation, and reduce labor cost is an urgent problem for those skilled in the art.
Disclosure of Invention
The object of the present invention is to provide a beam current regulation method and device that solve the problems of low beam regulation efficiency and high labor cost in the prior art.
To solve the above technical problems, the present invention provides a beam current regulation method, including:
acquiring accelerator working-state information;
inputting the accelerator working-state information into a trained DQN model;
receiving a target regulation instruction from the DQN model; and
sending the target regulation instruction to a controlled accelerator, so that the controlled accelerator adjusts the beam current according to the target regulation instruction;
wherein the training method of the DQN model includes:
acquiring initial state information and an action information group, where the initial state information includes the initial working-state information of all excitation power supplies of the controlled accelerator, and each single action instruction in the action information group includes a change instruction for all of the excitation power supplies;
sending the initial state information and the action information group to an original DQN model, so that an agent of the original DQN model performs a first number of action-beam-optics simulations, using the initial state information and the action information group, through a pre-built simulation model of the controlled accelerator, to obtain a first number of simulation results, where a single simulation result includes pre-action state information, a single action instruction, post-action state information, end-arrival count information, and action-instruction evaluation information, and the value of the end-arrival count information is positively correlated with the value of the action-instruction evaluation information; and
training the original DQN model with the first number of simulation results as a training set to obtain the DQN model.
Optionally, in the beam current regulation method, sending the initial state information and the action information group to the original DQN model so that the agent of the original DQN model performs the first number of action-beam-optics simulations through the pre-built simulation model of the controlled accelerator to obtain the first number of simulation results includes:
sending the initial state information and the action information group to the original DQN model, so that the agent performs the first number of action-beam-optics simulations through the pre-built simulation model to obtain the first number of simulation results; and, every time a second number of action-beam-optics simulations have been performed, exporting the corresponding second number of simulation results from volatile memory as the stored data of the corresponding round.
Optionally, in the beam current regulation method, after each action-beam-optics simulation, the method further includes:
judging whether the post-action state information corresponding to the action-beam-optics simulation exceeds a control boundary value of the excitation power supplies; and
when the post-action state information exceeds the control boundary value, setting the instruction scoring information corresponding to the simulation to a negative value, ending the current round, and exporting the corresponding simulation results from volatile memory as the stored data of the round.
Optionally, in the beam current regulation method, a single action-beam-optics simulation includes:
the agent sampling, according to a uniform sampling strategy, a single action instruction corresponding to the simulation from the action information group, and sending the single action instruction to the simulation model; and
the simulation model updating its state according to the single action instruction and launching a large number of particles of a preset type, with a Gaussian initial distribution and preset energy and phase distribution, which travel from the starting end to the corresponding far end of the controlled accelerator in the simulation model, where the particles move subject to beam optics and the physical dimensions of the controlled accelerator, and particles exceeding the pipe dimensions of the controlled accelerator are removed at any time.
Optionally, in the beam current regulation method, training the original DQN model with the first number of simulation results as a training set to obtain the DQN model includes:
training the original DQN model with the first number of simulation results as a training set, and saving the model obtained after every third number of training iterations as a candidate model; and
determining the DQN model from the plurality of candidate models.
Optionally, in the beam current regulation method, the DQN model is a 4-layer neural network model.
Optionally, in the beam current regulation method, receiving the target regulation instruction from the DQN model includes:
receiving a target magnetic-field regulation instruction from the DQN model; and
determining a target voltage regulation instruction corresponding to the target magnetic-field regulation instruction according to a pre-stored magnetic-field/voltage correspondence.
Correspondingly, sending the target regulation instruction to the controlled accelerator so that the controlled accelerator adjusts the beam current according to the target regulation instruction includes:
sending the target voltage regulation instruction to the controlled accelerator, so that the controlled accelerator adjusts the beam current according to the target voltage regulation instruction.
Optionally, in the beam current regulation method, training the original DQN model further includes:
optimizing the neural network of the original DQN model with an Adam optimizer, with a training learning rate of 0.0001 and a discount rate of 0.9.
Optionally, in the beam current regulation method, the DQN model is deployed in an EPICS framework.
A beam current regulation device, comprising:
an acquisition module, configured to acquire accelerator working-state information;
an input module, configured to input the accelerator working-state information into a trained DQN model;
a receiving module, configured to receive a target regulation instruction from the DQN model; and
a sending module, configured to send the target regulation instruction to a controlled accelerator, so that the controlled accelerator adjusts the beam current according to the target regulation instruction;
wherein the modules for training the DQN model include:
an information acquisition module, configured to acquire initial state information and an action information group, where the initial state information includes the initial working-state information of all excitation power supplies of the controlled accelerator, and each single action instruction in the action information group includes a change instruction for all of the excitation power supplies;
a simulation module, configured to send the initial state information and the action information group to an original DQN model, so that an agent of the original DQN model performs a first number of action-beam-optics simulations through a pre-built simulation model of the controlled accelerator to obtain a first number of simulation results, where a single simulation result includes pre-action state information, a single action instruction, post-action state information, end-arrival count information, and action-instruction evaluation information, and the value of the end-arrival count information is positively correlated with the value of the action-instruction evaluation information; and
a training module, configured to train the original DQN model with the first number of simulation results as a training set to obtain the DQN model.
In the beam current regulation method provided by the invention, accelerator working-state information is acquired and input into a trained DQN model; a target regulation instruction is received from the DQN model and sent to the controlled accelerator, which adjusts the beam current accordingly. The DQN model is trained as follows: initial state information and an action information group are acquired, where the initial state information includes the initial working-state information of all excitation power supplies of the controlled accelerator and each single action instruction includes a change instruction for all of the excitation power supplies; the initial state information and the action information group are sent to an original DQN model, whose agent performs a first number of action-beam-optics simulations through a pre-built simulation model of the controlled accelerator to obtain a first number of simulation results, each simulation result including pre-action state information, a single action instruction, post-action state information, end-arrival count information, and action-instruction evaluation information, the value of the end-arrival count information being positively correlated with the value of the action-instruction evaluation information; and the original DQN model is trained with the first number of simulation results as a training set to obtain the DQN model. In this way, simulation results for excitation power supplies in a large variety of working states subjected to different change instructions are obtained in advance through the simulation model, the original DQN model is trained on these results, and an agent that automatically adjusts the excitation power supplies and completes beam optimization is finally obtained. The invention also provides a beam current regulation device with the same beneficial effects.
Drawings
For a clearer description of the embodiments of the invention or of the prior art, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; other drawings can be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a schematic diagram of the regulation workflow of a specific embodiment of the beam current regulation method provided by the present invention;
FIG. 2 is a schematic diagram of the DQN-model training method in an embodiment of the beam current regulation method provided by the present invention;
FIG. 3 is a schematic structural diagram of an embodiment of the beam current regulation device provided by the present invention.
100 - acquisition module, 200 - input module, 300 - receiving module, 400 - sending module, 500 - information acquisition module, 600 - simulation module, 700 - training module.
Detailed Description
In order to better understand the aspects of the present invention, the present invention will be described in further detail with reference to the accompanying drawings and detailed description. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The core of the present invention is to provide a beam current regulation method. A flow diagram of one specific embodiment is shown in FIGS. 1 and 2; the regulation workflow, shown in FIG. 1, includes:
s101: and acquiring accelerator working state information.
S102: and inputting the accelerator working state information into a trained DQN model.
S103: a target regulatory command is received from the DQN model.
The DQN model refers to a Deep Q Network model, hereinafter abbreviated as DQN model.
S104: and sending the target regulation and control instruction to a controlled accelerator, so that the controlled accelerator regulates and controls beam current according to the target regulation and control instruction.
It should be noted that steps S101 to S104 constitute the beam current regulation method proper: once S104 has been executed, the regulation result has been applied and the regulation process ends. Steps S201 to S203 describe the training method of the DQN model used in that process; there is no fixed order between the steps of the two methods.
The training method of the DQN model, shown in FIG. 2, includes:
S201: acquiring initial state information and an action information group; the initial state information includes the initial working-state information of all excitation power supplies of the controlled accelerator, and each single action instruction in the action information group includes a change instruction for all of the excitation power supplies.
The initial state is the working state of each excitation power supply in the simulation model at the starting time. If there are four excitation power supplies, the initial state information may be [50, -2000, 2100, -2300, 3000], meaning that the beam energy is 50 MeV and the magnetic flux densities set by the four excitation power supplies in the initial state are -2000 Gs, 2100 Gs, -2300 Gs, and 3000 Gs, respectively.
A single action instruction includes a change instruction for every excitation power supply. With four excitation power supplies in the controlled accelerator, a single action instruction may be [50, 50, -50, 50], i.e., the magnetic flux densities of the first, second, and fourth excitation power supplies are each increased by 50 Gs, and that of the third is changed by -50 Gs. A minimal sketch of this encoding follows.
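The sketch below illustrates the state and action vectors described above, assuming the layout [energy, B1, B2, B3, B4]; the helper name and the vector layout are illustrative assumptions, not taken verbatim from the patent.

```python
import numpy as np

# Initial state: 50 MeV beam energy plus the magnetic flux density (Gs)
# set by each of the four excitation power supplies.
state = np.array([50.0, -2000.0, 2100.0, -2300.0, 3000.0])

# One action instruction: a change increment for every excitation power supply.
action = np.array([50.0, 50.0, -50.0, 50.0])

def apply_action(state: np.ndarray, action: np.ndarray) -> np.ndarray:
    """Return the post-action state: the energy entry is unchanged, and each
    power-supply setting is shifted by the corresponding increment."""
    next_state = state.copy()
    next_state[1:] += action
    return next_state

print(apply_action(state, action))  # [  50. -1950.  2150. -2350.  3050.]
```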
The data used during training correspond to beam energies of 30 MeV, 35 MeV, 40 MeV, 45 MeV, and 50 MeV; in practical use, a beam of any energy between 30 MeV and 50 MeV can be optimized.
S202: transmitting the initial state information and the action information group to an original DQN model, so that an intelligent agent of the original DQN model performs a first number of action-beam optical simulation by using the initial state information and the action information group through a pre-built simulation model of a controlled accelerator to obtain a first number of simulation results; the single simulation result comprises pre-action state information, single action instructions, post-action state information, terminal arrival quantity information and action instruction evaluation information; wherein the value of the end arrival number information is positively correlated with the value of the action instruction evaluation information.
In actual operation, the simulation model is built to mimic the real device; the agent operates the simulation model, and all of its operations are recorded as the training set on which the agent is trained. After learning from the training set, the agent can output the best action for a given state. After the agent has operated the simulated device through many cycles, the number of particles reaching the end of the simulated device is optimized.
Having the agent operate the actual device directly is expensive, so building a simulated device is a necessary approach.
How well an agent that has learned the optimal control strategy on the simulated device controls the actual device depends on the simulation accuracy; with the current algorithm, an agent trained on the simulated device can adjust the beam to near the optimum. The actual device can then be regulated through later fine-tuning of the model.
The simulation principles of the pre-built simulation model of the controlled accelerator are: 1. particles move through the simulated device according to beam optics; 2. a large number of particles (at least 50,000) are used for Monte Carlo motion simulation; 3. the simulation follows the physical dimensions of the actual device. For example, with a drift-tube radius of 30 mm, the position coordinates of the simulated particles are checked after they pass through the drift tube, and particles more than 30 mm from the orbit center are removed.
Specifically, the corresponding single action instructions are executed in sequence; after execution, the number of particles reaching the preset area (that is, the end-arrival count information) is acquired and used as the evaluation information of the action, so a good action is one that increases the number of particles reaching the preset area. The end-arrival count information reflects the number of particles reaching the end of the device, which is the evaluation index in the present invention. To increase sensitivity during training, this data may be processed, for example scaled by a factor or log-transformed, as long as it remains positively correlated with the end-arrival particle count; whether to apply such processing can be decided according to the actual situation, as in the sketch below.
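A minimal sketch of the optional reward shaping just described. Both branches are monotonically increasing in the end-arrival count, preserving the required positive correlation; the specific scale factor and the log1p choice are illustrative assumptions.

```python
import math

def shape_reward(end_count: int, scale: float = 1e-3, use_log: bool = True) -> float:
    """Map the end-arrival particle count to an action-instruction score."""
    if use_log:
        return math.log1p(end_count)   # compresses large counts, boosts sensitivity at low counts
    return end_count * scale           # simple linear rescaling
```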
The first number can be chosen according to the actual situation; for example, 50 million action-beam-optics simulations may be performed, yielding 50 million corresponding simulation results.
The action-instruction evaluation information is also called the Q value in the DQN model.
When a particle passes through the magnet powered by an excitation supply, its transport is calculated from the external state at that moment (the magnetic flux density of the equipment), the particle's own state and attributes (velocity, position, energy, and type), and the dimensions of the quadrupole magnet; the particle's state (velocity, position, energy) on leaving the quadrupole is then obtained.
After leaving the focusing magnet, particles enter the drift tube, so the simulation model of the controlled accelerator also includes a drift-tube simulation. This segment does not consider interactions between particles, so after entering it a particle is no longer subject to electric or magnetic forces and its velocity and energy no longer change; motion in the vacuum tube is treated as uniform linear motion. At the end of the drift tube the particles are screened, and those whose position coordinates are greater than or equal to the drift-tube dimensions are removed. A simplified sketch of this transport follows.
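The numerical sketch below follows the flow described above: Gaussian launch, a magnet kick, field-free drift, aperture screening, and a survivor count. The thin-lens quadrupole approximation, the focusing strength, the drift length, and the beam sizes are illustrative assumptions; the patent does not publish its transfer maps.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 50_000               # the text calls for at least 50,000 particles
TUBE_RADIUS = 0.030      # 30 mm drift-tube radius, as in the example above

# Transverse phase space (x, x', y, y') with a Gaussian initial distribution.
particles = rng.normal(0.0, 1e-3, size=(N, 4))

def thin_lens_quad(p: np.ndarray, k: float) -> np.ndarray:
    """Thin-lens quadrupole kick: focusing in x, defocusing in y
    (a standard linear-optics simplification, assumed here)."""
    q = p.copy()
    q[:, 1] -= k * q[:, 0]
    q[:, 3] += k * q[:, 2]
    return q

def drift(p: np.ndarray, length: float) -> np.ndarray:
    """Uniform straight-line motion through a field-free drift tube."""
    q = p.copy()
    q[:, 0] += length * q[:, 1]
    q[:, 2] += length * q[:, 3]
    return q

def aperture_cut(p: np.ndarray, radius: float) -> np.ndarray:
    """Remove particles whose transverse offset exceeds the pipe radius."""
    r = np.hypot(p[:, 0], p[:, 2])
    return p[r < radius]

particles = thin_lens_quad(particles, k=2.0)
particles = drift(particles, length=1.5)
particles = aperture_cut(particles, TUBE_RADIUS)
print("particles reaching the end:", len(particles))  # the evaluation index
```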
The way the agent selects the single action instruction in each action-beam-optics simulation may be random or follow a preset rule; the invention is not limited here, and the choice can be adjusted to the actual situation, uniform sampling being preferred.
As a preferred embodiment, this step includes:
sending the initial state information and the action information group to the original DQN model, so that the agent of the original DQN model performs the first number of action-beam-optics simulations through the pre-built simulation model of the controlled accelerator to obtain the first number of simulation results; and, every time a second number of action-beam-optics simulations have been performed, exporting the corresponding second number of simulation results from volatile memory as the stored data of the corresponding round.
Given the initial state information and the action information group, the agent samples an action from the action group with a fixed uniform sampling strategy and sends it to the simulated device. After the simulated device updates the current equipment state according to the action, it launches a large number of particles of the specified type, with a Gaussian initial state and the designed energy and phase distribution, which travel from the starting end of the device to its far end, moving strictly according to beam optics and the physical dimensions of the device. Particles exceeding the pipe dimensions are removed at any time. The number of particles reaching the end of the device is used as the evaluation index for this operation.
This step is in effect the preparation of material for the subsequent neural-network learning, so a large amount of data must be generated (the first number is typically huge, on the order of tens of millions). In this embodiment the first number of action-beam-optics simulations is therefore not completed in one go: after every second number of simulations, the corresponding simulation results are exported from volatile memory and saved to nonvolatile memory; one round of action-beam-optics simulation is then said to be complete, and the exported batch constitutes the stored data of that round. Barring faults, the number of rounds is the quotient of the first number divided by the second number. In actual operation the simulation results are kept in the system RAM (volatile memory) for efficiency, and the full first number of simulations usually takes several tens of days; a power failure or program fault partway through would lose the data in RAM. In this preferred embodiment the RAM data are exported to nonvolatile storage after every second number of simulations, which reduces accidental data loss and improves the working stability of the system. A sketch of this periodic export follows.
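The sketch below flushes the in-RAM simulation results to disk at the end of each round. The pickle format, file layout, and function names are illustrative assumptions; the patent only requires that each round's results be exported from volatile memory.

```python
import pickle
from pathlib import Path

def flush_round(buffer: list, round_idx: int, second_number: int = 2000,
                out_dir: Path = Path("rounds")) -> None:
    """Export the last `second_number` action-beam-optics simulation results
    from the in-RAM buffer to nonvolatile storage as this round's stored data."""
    out_dir.mkdir(exist_ok=True)
    with open(out_dir / f"round_{round_idx:05d}.pkl", "wb") as f:
        pickle.dump(buffer[-second_number:], f)
    # The RAM copy can now be trimmed; a crash loses at most the current round.
    del buffer[-second_number:]
```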
Still further, a single action-beam-optics simulation includes:
A1: the agent samples a single action instruction for this simulation from the action information group according to a uniform sampling strategy and sends it to the simulation model.
Uniform sampling means that the single action instructions acquired by the agent are uniformly distributed over the action information group, equivalently sampled at equal intervals, so that they cover the whole action information group.
A2: the simulation model updates its state according to the single action instruction and launches a large number of particles of a preset type, with a Gaussian initial distribution and preset energy and phase distribution, which travel from the starting end to the corresponding far end of the controlled accelerator in the simulation model; the particles move subject to beam optics and the physical dimensions of the controlled accelerator, and particles exceeding the pipe dimensions of the controlled accelerator are removed at any time.
This embodiment details a single pass of the action-beam-optics simulation. Extracting the single action instruction with a uniform sampling strategy greatly improves the representativeness of the resulting training set and thus the accuracy of the final DQN model. Moreover, particles exceeding the pipe dimensions of the controlled accelerator are removed as soon as they do so: once a particle is outside the pipe it can never reach the far end, so it is discarded immediately without computing its subsequent trajectory, which greatly reduces the computational load, improves operating efficiency, and saves computing resources.
Preferably, the first number is not less than 20 million.
S203: and training the original DQN model by taking the first number of simulation results as a training set to obtain the DQN model.
Still further, after each action-beam-optics simulation the method includes:
S2021: judging whether the post-action state information corresponding to the action-beam-optics simulation exceeds a control boundary value of the excitation power supplies.
The control boundary value refers to the working range of an excitation power supply; beyond the control boundary, the setting cannot be realized by the supply.
S2022: when the post-action state information exceeds the control boundary value, setting the instruction scoring information corresponding to the simulation to a negative value, ending the current round, and exporting the corresponding simulation results from volatile memory as the stored data of the round.
A supply pushed beyond its control boundary cannot work normally, so such data are meaningless; in this case the program penalizes the corresponding single action instruction with a negative score and ends the current round, whose number of simulation results is then smaller than the second number. A sketch of the boundary check follows.
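A minimal sketch of the boundary check and penalty, assuming the [energy, B1, ..., Bn] state layout used earlier. The boundary values are illustrative placeholders; real limits come from the power-supply hardware.

```python
import numpy as np

def boundary_check(next_state: np.ndarray,
                   b_min: float = -3000.0, b_max: float = 3000.0):
    """Check the post-action state against the supplies' control boundary.

    Returns (reward_override, done): a negative score and an end-of-round
    flag when any setting leaves the realizable working range."""
    settings = next_state[1:]                       # skip the energy entry
    if (settings < b_min).any() or (settings > b_max).any():
        return -1.0, True    # negative score, end the current round
    return None, False       # within bounds: keep the simulated reward
```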
As a preferred embodiment, after each export of a round's stored data, the method further includes:
resetting the working state of the simulation model with the initial state information.
That is, in this preferred embodiment every round starts from the working state given by the initial state information, which prevents the simulation model from accumulating errors and improves simulation accuracy. Preferably, the second number is not less than 2,000: too few action-beam-optics simulations per round hampers data exploration, while too many increases the risk of data loss. This preferred parameter range, obtained after extensive theoretical calculation and practical testing, balances data exploration against data safety; the invention is of course not limited to it.
In addition, training the original DQN model with the first number of simulation results as a training set to obtain the DQN model includes:
S2031: training the original DQN model with the first number of simulation results as a training set, and saving the model obtained after every third number of training iterations as a candidate model.
S2032: determining the DQN model from the plurality of candidate models.
In this preferred embodiment each periodically trained model is saved as a candidate, and the final model is selected from all candidates. As the number of training iterations grows, the model may overfit, and the tuning accuracy of an overfitted model actually decreases; saving the model after every fixed number of iterations for later comparison avoids the accuracy loss caused by overfitting and improves the accuracy of the model output. The selection in step S2032 can be performed in various ways, such as comparison against actual data, and the invention is not limited here. The third number should of course be smaller than the first; for example, with a first number of about 10,000 iterations the third number may lie in the range of 100 to 500. A sketch of this checkpointing follows.
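The sketch below snapshots a candidate model every `third_number` training iterations. The callback structure and the suggestion to select among candidates by validation against held-out simulation data are assumptions, since the patent leaves the comparison method open.

```python
import copy

def train_with_checkpoints(model, one_training_iteration,
                           n_iterations: int, third_number: int) -> list:
    """Run training, saving a deep-copied candidate model every
    `third_number` iterations; the final DQN is later chosen from the
    returned candidates (e.g., by comparison against actual data)."""
    candidates = []
    for step in range(1, n_iterations + 1):
        one_training_iteration(model)
        if step % third_number == 0:
            candidates.append(copy.deepcopy(model))  # snapshot before overfitting sets in
    return candidates
```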
Moreover, the DQN model is a 4-layer neural network model. This structure simplifies the network as far as possible while keeping accuracy high, greatly shortening training time, lowering training difficulty, and reducing cost. In addition, the neural network may be configured as (5, 64, 128, 64, 16), with batch_size = 4000 during training; a sketch follows.
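A minimal PyTorch sketch of the (5, 64, 128, 64, 16) layout mentioned above: a 5-dimensional state in, Q-values for 16 discrete action instructions out, through four linear layers. The activation function (ReLU) and class name are assumptions; the patent specifies only the layer widths.

```python
import torch.nn as nn

class DQNNet(nn.Module):
    """4-layer Q-network: state (5,) -> Q-values (16,)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(5, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 16),          # one Q-value per candidate action instruction
        )

    def forward(self, x):
        return self.net(x)
```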
As a preferred embodiment, receiving the target regulation instruction from the DQN model includes:
S1031: receiving a target magnetic-field regulation instruction from the DQN model.
S1032: determining a target voltage regulation instruction corresponding to the target magnetic-field regulation instruction according to a pre-stored magnetic-field/voltage correspondence.
Correspondingly, sending the target regulation instruction to the controlled accelerator so that the controlled accelerator adjusts the beam current according to the target regulation instruction includes:
S1041: sending the target voltage regulation instruction to the controlled accelerator, so that the controlled accelerator adjusts the beam current according to the target voltage regulation instruction.
In this preferred embodiment a correspondence between magnetic flux density and voltage is established, so that during DQN training each controlled component, i.e., the magnetic flux density of each excitation power supply, can be used directly as input and output; before being sent to the controlled accelerator, the target magnetic-field regulation instruction output by the model is replaced, according to the pre-stored magnetic-field/voltage correspondence, with a target voltage regulation instruction that can be executed directly. This simplifies both the model-training process and the control flow of actual beam regulation and optimization, improving processing efficiency. A sketch of such a lookup follows.
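A minimal sketch of the pre-stored magnetic-field/voltage correspondence as an interpolation table for one power supply. The calibration points below are made-up placeholders; in practice the table would come from measurements of the actual excitation supply.

```python
import numpy as np

# Hypothetical calibration table: field setpoint (Gs) vs. supply voltage (V).
B_TABLE = np.array([-3000.0, -1500.0, 0.0, 1500.0, 3000.0])
V_TABLE = np.array([-9.0, -4.5, 0.0, 4.5, 9.0])

def field_to_voltage(b_target: float) -> float:
    """Translate a target magnetic-field regulation instruction into the
    voltage instruction the controlled accelerator can execute directly."""
    return float(np.interp(b_target, B_TABLE, V_TABLE))
```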
Still further, training the original DQN model further includes:
optimizing the neural network of the original DQN model with an Adam optimizer, with a training learning rate of 0.0001 and a discount rate of 0.9. The Adam (Adaptive Moment Estimation) optimizer is an adaptive optimization algorithm that adjusts the learning rate from historical gradient information and normalizes parameter updates so that each update has a similar order of magnitude, which improves the training effect. Adam performs well on many practical problems, especially when training deep neural networks on large data sets. These parameters are the preferred ranges obtained after extensive theoretical calculation and practical testing; they can of course be changed according to actual needs, and the invention is not further limited here. A sketch of one training step follows.
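The sketch below shows one optimization step under the stated hyperparameters (Adam, learning rate 0.0001, discount rate 0.9), reusing the DQNNet sketch above. The target-network bootstrapping and Huber-loss choice are standard-DQN assumptions; the patent names only the optimizer, learning rate, and discount rate.

```python
import torch
import torch.nn.functional as F

GAMMA = 0.9                                   # discount rate from the text
q_net, target_net = DQNNet(), DQNNet()        # DQNNet as sketched earlier
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-4)  # learning rate 0.0001

def train_step(batch):
    """One TD update on a sampled batch of tensors (s, a, r, s2, done)."""
    s, a, r, s2, done = batch
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)      # Q(s, a)
    with torch.no_grad():
        q_next = target_net(s2).max(dim=1).values          # max_a' Q_target(s', a')
        target = r + GAMMA * (1.0 - done) * q_next
    loss = F.smooth_l1_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```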
Preferably, the DQN model is placed in an EPICS (Experimental Physics and Industrial Control System) framework. The EPICS framework is flexible to configure and extend, highly compatible, and widely applicable; furthermore, the controlled equipment is operated through pyepics, which broadens compatibility further. A sketch of such channel access follows.
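A minimal pyepics sketch of reading working-state information and writing a voltage setpoint over EPICS channel access. caget and caput are the standard pyepics calls; the process-variable names are hypothetical placeholders, since real PV names depend on the facility's EPICS database.

```python
from epics import caget, caput  # pyepics

# Hypothetical PV names for illustration only.
STATE_PV = "ACC:PS:ALL:B_READBACK"   # readback of the supplies' working state
SET_PV = "ACC:PS:Q1:V_SET"           # voltage setpoint of one excitation supply

state = caget(STATE_PV)              # read accelerator working-state information
caput(SET_PV, 4.5, wait=True)        # send a target voltage regulation instruction
```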
Preferably, the original DQN model is built with PyTorch. PyTorch is an open-source Python machine-learning library based on Torch, used for applications such as natural language processing. PyTorch offers good compatibility and flexibility, provides powerful GPU-accelerated tensor computation, and works well with neural networks, i.e., it has good generality.
The system running the beam regulation program is implemented in Python; other languages may of course be chosen according to the actual situation, and the invention is not limited here.
In summary, the beam current regulation method provided by the invention acquires accelerator working-state information; inputs it into the trained DQN model; receives a target regulation instruction from the DQN model; and sends the instruction to the controlled accelerator, which adjusts the beam current accordingly. The DQN model is trained by acquiring initial state information and an action information group, the initial state information including the initial working-state information of all excitation power supplies of the controlled accelerator and each single action instruction including a change instruction for all of the excitation power supplies; performing a first number of rounds of action-beam-optics simulation through the pre-built simulation model of the controlled accelerator, using the initial state information and the action information group, to obtain the first number of simulation results, where a single simulation result comprises a second number of simulation items and a single simulation item includes pre-action state information, a single action instruction, post-action state information, end-arrival count information, and action-instruction evaluation information, the value of the end-arrival count information being positively correlated with the value of the action-instruction evaluation information; and training the original DQN model with the first number of simulation results as a training set to obtain the DQN model. Simulation results for excitation power supplies in different working states subjected to different change instructions are thus obtained in advance through the simulation model, the original DQN model is trained on them, and an agent that automatically adjusts the excitation power supplies and completes beam optimization is finally obtained.
The beam current regulation device provided by the embodiments of the invention is introduced below; the device described below and the method described above may be cross-referenced correspondingly.
FIG. 3 is a block diagram of a beam current regulation device according to an embodiment of the present invention. Referring to FIG. 3, the beam current regulation device may include:
an acquisition module 100, configured to acquire accelerator working-state information;
an input module 200, configured to input the accelerator working-state information into a trained DQN model;
a receiving module 300, configured to receive a target regulation instruction from the DQN model;
a sending module 400, configured to send the target regulation instruction to the controlled accelerator, so that the controlled accelerator adjusts the beam current according to the target regulation instruction;
wherein the modules for training the DQN model include:
an information acquisition module 500, configured to acquire initial state information and an action information group, where the initial state information includes the initial working-state information of all excitation power supplies of the controlled accelerator, and each single action instruction in the action information group includes a change instruction for all of the excitation power supplies;
a simulation module 600, configured to send the initial state information and the action information group to an original DQN model, so that an agent of the original DQN model performs a first number of action-beam-optics simulations through the pre-built simulation model of the controlled accelerator to obtain a first number of simulation results, where a single simulation result includes pre-action state information, a single action instruction, post-action state information, end-arrival count information, and action-instruction evaluation information, the value of the end-arrival count information being positively correlated with the value of the action-instruction evaluation information; and
a training module 700, configured to train the original DQN model with the first number of simulation results as a training set to obtain the DQN model.
As a preferred embodiment, the simulation module 600 includes:
a round simulation unit, configured to send the initial state information and the action information group to the original DQN model so that the agent performs the first number of action-beam-optics simulations through the pre-built simulation model to obtain the first number of simulation results, and, every time a second number of action-beam-optics simulations have been performed, to export the corresponding second number of simulation results from volatile memory as the stored data of the corresponding round.
As a preferred embodiment, the simulation module 600 further includes:
a boundary judging unit, configured to judge whether the post-action state information corresponding to an action-beam-optics simulation exceeds a control boundary value of the excitation power supplies; and
a simulation termination unit, configured to set the instruction scoring information corresponding to the simulation to a negative value when the post-action state information exceeds the control boundary value, end the current round, and export the corresponding simulation results from volatile memory as the stored data of the round.
As a preferred embodiment, the training module 700 includes:
a segmented training unit, configured to train the original DQN model with the first number of simulation results as a training set, saving the model obtained after every third number of training iterations as a candidate model; and
a selection unit, configured to determine the DQN model from the plurality of candidate models.
As a preferred embodiment, the receiving module 300 includes:
a magnetic-field regulation receiving unit, configured to receive a target magnetic-field regulation instruction from the DQN model; and
a correspondence unit, configured to determine a target voltage regulation instruction corresponding to the target magnetic-field regulation instruction according to the pre-stored magnetic-field/voltage correspondence.
Accordingly, the sending module 400 includes:
a voltage sending unit, configured to send the target voltage regulation instruction to the controlled accelerator, so that the controlled accelerator adjusts the beam current according to the target voltage regulation instruction.
In the beam current regulation device provided by the invention, the acquisition module 100 acquires accelerator working-state information; the input module 200 inputs it into the trained DQN model; the receiving module 300 receives a target regulation instruction from the DQN model; and the sending module 400 sends the instruction to the controlled accelerator, which adjusts the beam current accordingly. For training the DQN model, the information acquisition module 500 acquires the initial state information and the action information group; the simulation module 600 sends them to the original DQN model, whose agent performs the first number of action-beam-optics simulations through the pre-built simulation model of the controlled accelerator to obtain the first number of simulation results; and the training module 700 trains the original DQN model on these results to obtain the DQN model. Simulation results for excitation power supplies in different working states subjected to different change instructions are thus obtained in advance, and training the original DQN model on them finally yields an agent that automatically adjusts the excitation power supplies and completes beam optimization.
The beam current regulation device of this embodiment implements the beam current regulation method described above, so the specific implementation of each part can be found in the corresponding method sections: the acquisition module 100, input module 200, receiving module 300, sending module 400, information acquisition module 500, simulation module 600, and training module 700 implement steps S101, S102, S103, S104, S201, S202, and S203 of the method, respectively, and their details are not repeated here.
The invention also provides beam current regulation equipment, comprising:
a memory for storing a computer program; and
a processor, configured to implement the steps of any of the beam current regulation methods described above when executing the computer program. In the method thus implemented, accelerator working-state information is acquired and input into the trained DQN model; a target regulation instruction is received from the DQN model and sent to the controlled accelerator, which adjusts the beam current accordingly; and the DQN model is trained, as described above, on a first number of simulation results produced by the agent through the pre-built simulation model of the controlled accelerator. Simulation results for a large number of excitation power supplies in different working states subjected to different change instructions are obtained in advance through the simulation model, and training the original DQN model on them finally yields an agent that automatically adjusts the excitation power supplies and completes beam optimization.
The present invention also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of any of the beam current regulation methods described above, with the same beneficial effects: simulation results for excitation power supplies in different working states subjected to different change instructions are obtained in advance through the simulation model of the controlled accelerator, the original DQN model is trained on them, and an agent that automatically adjusts the excitation power supplies and completes beam optimization is finally obtained.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, so that the same or similar parts between the embodiments are referred to each other. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
It should be noted that in this specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software modules may be disposed in Random Access Memory (RAM), memory, Read-Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The beam current regulation method and device provided by the invention have been described in detail above. The principles and embodiments of the invention are explained herein with specific examples, whose description is intended only to help in understanding the method of the invention and its core ideas. It should be noted that those skilled in the art may make modifications to the present invention without departing from the spirit of the invention.

Claims (10)

1. A beam current regulation method, comprising:
acquiring accelerator working state information;
inputting the accelerator working state information into a trained DQN model;
receiving a target regulation and control instruction from the DQN model; and
sending the target regulation and control instruction to a controlled accelerator, so that the controlled accelerator regulates and controls beam current according to the target regulation and control instruction;
wherein the DQN model is trained by:
acquiring initial state information and an action information group, wherein the initial state information comprises initial working state information of all excitation power supplies of the controlled accelerator, and each single action instruction in the action information group comprises a change instruction for all the excitation power supplies;
sending the initial state information and the action information group to an original DQN model, so that an agent of the original DQN model performs, through a pre-built simulation model of the controlled accelerator, a first number of action-beam optical simulations using the initial state information and the action information group, to obtain a first number of simulation results, wherein a single simulation result comprises pre-action state information, a single action instruction, post-action state information, end arrival quantity information (the number of particles reaching the end of the beamline) and action instruction evaluation information, and the value of the end arrival quantity information is positively correlated with the value of the action instruction evaluation information; and
training the original DQN model with the first number of simulation results as a training set, to obtain the DQN model.
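For illustration only, the experience-collection loop that claim 1 describes might look as follows in Python. Every name here (ToySimulator, ACTIONS, N_SIMULATIONS) is a hypothetical stand-in; the patent discloses no source code, and the toy reward merely preserves the claimed property that the evaluation grows with the end arrival quantity.

```python
import random

class ToySimulator:
    """Stand-in for the pre-built simulation model of the controlled accelerator."""
    def __init__(self, n_supplies=4):
        self.state = [0.0] * n_supplies          # working states of all excitation power supplies

    def step(self, delta):
        # A single action instruction changes every excitation power supply at once.
        self.state = [s + d for s, d in zip(self.state, delta)]
        # Toy end arrival quantity: fewer particles survive as supplies drift off nominal.
        n_end = max(0, 1000 - int(sum(abs(s) for s in self.state) * 10))
        return list(self.state), n_end

ACTIONS = [[+0.1, 0, 0, 0], [-0.1, 0, 0, 0], [0, +0.1, 0, 0], [0, -0.1, 0, 0]]
N_SIMULATIONS = 100                              # the "first number"

sim = ToySimulator()
replay = []
state = list(sim.state)
for _ in range(N_SIMULATIONS):
    a = random.randrange(len(ACTIONS))           # uniform sampling over the action information group
    next_state, n_end = sim.step(ACTIONS[a])
    reward = float(n_end)                        # evaluation positively correlated with arrivals
    replay.append((state, a, next_state, n_end, reward))
    state = next_state
```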
2. The beam current regulation method of claim 1, wherein sending the initial state information and the action information group to the original DQN model, so that the agent of the original DQN model performs a first number of action-beam optical simulations using the initial state information and the action information group through the pre-built simulation model of the controlled accelerator, to obtain the first number of simulation results, comprises:
sending the initial state information and the action information group to the original DQN model, so that the agent of the original DQN model performs the first number of action-beam optical simulations using the initial state information and the action information group through the pre-built simulation model of the controlled accelerator, to obtain the first number of simulation results, wherein each time a second number of action-beam optical simulations have been completed, the corresponding second number of simulation results are exported from volatile memory as stored data of the corresponding round.
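A minimal sketch of the per-round export in claim 2, assuming one pickle file per round; the file naming, the pickle format, and the value of the "second number" are illustrative assumptions.

```python
import pickle

SECOND_NUMBER = 50                               # the "second number" of simulations per round
buffer, round_idx = [], 0

def record(result):
    """Buffer one simulation result; flush a full round out of volatile memory."""
    global round_idx
    buffer.append(result)
    if len(buffer) == SECOND_NUMBER:
        with open(f"round_{round_idx:04d}.pkl", "wb") as f:
            pickle.dump(buffer, f)               # stored data of the corresponding round
        buffer.clear()
        round_idx += 1
```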
3. The beam current regulation method of claim 2, further comprising, after each action-beam optical simulation:
judging whether the post-action state information corresponding to the action-beam optical simulation exceeds a control boundary value of the excitation power supplies; and
when the post-action state information exceeds the control boundary value, setting the action instruction evaluation information corresponding to the action-beam optical simulation to a negative value, ending the current round, and exporting the corresponding simulation results from volatile memory as stored data of the corresponding round.
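The boundary handling of claim 3 could be realized along these lines; the boundary values, the penalty of -1.0, and the helper name are assumptions made for the sketch.

```python
LOWER, UPPER = -5.0, 5.0                         # assumed control boundary values of a supply

def check_boundary(post_state, result, buffer):
    """Append the result; give a negative evaluation and signal round end
    when any excitation power supply leaves its allowed range."""
    if any(not (LOWER <= s <= UPPER) for s in post_state):
        state, action, next_state, n_end, _ = result
        buffer.append((state, action, next_state, n_end, -1.0))  # negative score
        return True                              # caller ends the round and exports the buffer
    buffer.append(result)
    return False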
4. The beam current regulation method of claim 1, wherein a single action-beam optical simulation comprises:
the agent sampling from the action information group according to a uniform sampling strategy to obtain the single action instruction corresponding to the single action-beam optical simulation, and sending the single action instruction to the simulation model; and
the simulation model updating its state according to the single action instruction and launching a large number of particles of a preset type whose initial states follow a Gaussian distribution, the particles travelling from the initial end to the corresponding end of the controlled accelerator in the simulation model according to a preset energy and phase distribution, wherein the particles travel subject to the constraints of beam optics and the physical dimensions of the controlled accelerator, and particles exceeding the beam-pipe aperture of the controlled accelerator are removed immediately.
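A toy numerical illustration of the particle transport in claim 4: Gaussian initial coordinates, drifts through successive segments, and culling of particles outside the pipe aperture. Real beam-optics tracking with transfer maps, energy spread, and RF phase is deliberately omitted, and all numbers are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
N_PARTICLES, N_SEGMENTS, APERTURE = 100_000, 50, 0.03   # aperture half-width in metres (assumed)

x = rng.normal(0.0, 0.005, N_PARTICLES)          # initial transverse positions (Gaussian)
xp = rng.normal(0.0, 0.001, N_PARTICLES)         # initial angles (Gaussian)

for _ in range(N_SEGMENTS):
    x = x + xp * 0.5                             # drift through one 0.5 m segment
    alive = np.abs(x) < APERTURE                 # remove particles beyond the pipe size
    x, xp = x[alive], xp[alive]

print("end arrival quantity:", x.size)
```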
5. The beam current regulation method of claim 1, wherein training the original DQN model with the first number of simulation results as a training set to obtain the DQN model comprises:
training the original DQN model with the first number of simulation results as a training set, and taking the model obtained after every third number of training iterations as a candidate model; and
determining the DQN model from among the plurality of candidate models.
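Claim 5's periodic snapshotting might be sketched as follows; evaluate() is a hypothetical validation routine, and the value of the "third number" is arbitrary here.

```python
import copy

THIRD_NUMBER = 500                               # snapshot interval (the "third number")
candidates = []

def maybe_snapshot(step, model):
    """Keep a deep copy of the network every THIRD_NUMBER training iterations."""
    if step and step % THIRD_NUMBER == 0:
        candidates.append(copy.deepcopy(model))  # a candidate model

def pick_final(evaluate):
    """Select the DQN model from the candidates by a validation score."""
    return max(candidates, key=evaluate)
```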
6. The beam current regulation method of claim 1, wherein the DQN model is a 4-layer neural network model.
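One plausible reading of the 4-layer network of claim 6, sketched in PyTorch; the layer widths are assumptions, since the patent specifies only the depth.

```python
import torch.nn as nn

class QNet(nn.Module):
    """Four linear layers mapping power-supply states to per-action Q-values."""
    def __init__(self, n_state, n_action):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_state, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, n_action),             # one Q-value per action instruction
        )

    def forward(self, s):
        return self.net(s)
```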
7. The beam current regulation method of claim 1, wherein receiving the target regulation and control instruction from the DQN model comprises:
receiving a target magnetic field regulation instruction from the DQN model; and
determining a target voltage regulation instruction corresponding to the target magnetic field regulation instruction according to a pre-stored magnetic field-voltage correspondence;
and correspondingly, sending the target regulation and control instruction to the controlled accelerator, so that the controlled accelerator regulates and controls the beam current according to the target regulation and control instruction, comprises:
sending the target voltage regulation instruction to the controlled accelerator, so that the controlled accelerator regulates and controls the beam current according to the target voltage regulation instruction.
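The pre-stored magnetic field-voltage correspondence of claim 7 could be a calibration table with interpolation between points; the numbers below are invented purely for illustration.

```python
import numpy as np

# Hypothetical calibration table: magnetic field (T) -> power-supply voltage (V).
FIELD_TABLE = np.array([0.0, 0.2, 0.4, 0.6, 0.8])
VOLT_TABLE = np.array([0.0, 12.5, 25.3, 38.4, 51.9])

def field_to_voltage(target_field):
    """Convert a target magnetic-field instruction into a voltage instruction."""
    return float(np.interp(target_field, FIELD_TABLE, VOLT_TABLE))
```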
8. The beam current regulation method of claim 1, wherein training the original DQN model further comprises:
optimizing the neural network of the original DQN model with an Adam optimizer, wherein the training learning rate is 0.0001 and the discount factor is 0.9.
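The optimizer settings of claim 8, expressed in PyTorch. The temporal-difference loss is the standard DQN form, simplified here with no separate target network; the placeholder network and batch shapes are assumptions.

```python
import torch

GAMMA = 0.9                                      # discount factor from claim 8
q_net = torch.nn.Linear(8, 4)                    # placeholder Q-network: 8 state dims, 4 actions
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-4)   # learning rate from claim 8

def td_loss(s, a, r, s_next):
    """s: [B, 8] float, a: [B] int64, r: [B] float, s_next: [B, 8] float."""
    with torch.no_grad():
        target = r + GAMMA * q_net(s_next).max(dim=1).values
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    return torch.nn.functional.mse_loss(q, target)
```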
9. The beam current regulation method of claim 1, wherein the DQN model is deployed in an EPICS framework.
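One way the trained agent might be attached to EPICS (claim 9) through the pyepics client library; the process-variable names and the agent interface are invented for illustration, and a real deployment would use the facility's own PVs.

```python
from epics import caget, caput

def regulate_once(agent, supply_pvs):
    """Read supply states over Channel Access, query the agent, write back."""
    state = [caget(pv) for pv in supply_pvs]     # accelerator working state information
    deltas = agent(state)                        # target regulation and control instruction
    for pv, s, d in zip(supply_pvs, state, deltas):
        caput(pv, s + d)
```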
10. A beam current regulation apparatus, comprising:
an acquisition module for acquiring accelerator working state information;
an input module for inputting the accelerator working state information into a trained DQN model;
a receiving module for receiving a target regulation and control instruction from the DQN model; and
a sending module for sending the target regulation and control instruction to a controlled accelerator, so that the controlled accelerator regulates and controls beam current according to the target regulation and control instruction;
wherein the DQN model is trained by means of the following modules:
an information acquisition module for acquiring initial state information and an action information group, wherein the initial state information comprises initial working state information of all excitation power supplies of the controlled accelerator, and each single action instruction in the action information group comprises a change instruction for all the excitation power supplies;
a simulation module for sending the initial state information and the action information group to an original DQN model, so that an agent of the original DQN model performs, through a pre-built simulation model of the controlled accelerator, a first number of action-beam optical simulations using the initial state information and the action information group, to obtain a first number of simulation results, wherein a single simulation result comprises pre-action state information, a single action instruction, post-action state information, end arrival quantity information and action instruction evaluation information, and the value of the end arrival quantity information is positively correlated with the value of the action instruction evaluation information; and
a training module for training the original DQN model with the first number of simulation results as a training set to obtain the DQN model.
CN202410200312.0A 2024-02-23 2024-02-23 Beam current regulating and controlling method and device Pending CN117915540A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410200312.0A CN117915540A (en) 2024-02-23 2024-02-23 Beam current regulating and controlling method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410200312.0A CN117915540A (en) 2024-02-23 2024-02-23 Beam current regulating and controlling method and device

Publications (1)

Publication Number Publication Date
CN117915540A 2024-04-19

Family

ID=90683923

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410200312.0A Pending CN117915540A (en) 2024-02-23 2024-02-23 Beam current regulating and controlling method and device

Country Status (1)

Country Link
CN (1) CN117915540A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118095401A * 2024-04-29 2024-05-28 Nanjing University of Posts and Telecommunications Method for accelerating afterstate off-policy reinforcement learning training for warehouse storage


Similar Documents

Publication Publication Date Title
CN117915540A (en) Beam current regulating and controlling method and device
CN111046581B (en) Power transmission line fault type identification method and system
Ou et al. Comparison between PSO and GA for parameters optimization of PID controller
CN109492059B (en) Multi-source heterogeneous data fusion and model correction process control method
CN111325223B (en) Training method and device for deep learning model and computer readable storage medium
CN112632860B (en) Power transmission system model parameter identification method based on reinforcement learning
Adánez et al. Multidimensional membership functions in T–S fuzzy models for modelling and identification of nonlinear multivariable systems using genetic algorithms
CN109344969B (en) Neural network system, training method thereof, and computer-readable medium
Rasouli et al. Identification and control of plasma vertical position using neural network in Damavand tokamak
Patrascu et al. Evolutionary Modeling of Industrial Plants and Design of PID Controllers: Case Studies and Practical Applications
CN114880806A (en) New energy automobile sales prediction model parameter optimization method based on particle swarm optimization
KR20190101677A (en) SYSTEM of DETERMINING CONTROL PARAMETER for OPTIMIZING ACCELERATOR PERFORMANCE WITH REINFORCEMENT LEARNING and MACHINE LEARNING
CN114066214A (en) Power quality analysis method based on multi-fusion convolutional neural network
CN117806170B (en) Microbeam focusing control method and device
CN111539508B (en) Generator excitation system parameter identification algorithm based on improved gray wolf algorithm
CN115795303A (en) Client state identification method and device for searching dynamic learning rate
Du et al. Reformative artificial bee colony algorithm based PID controller for radar servo system
CN112633516B (en) Performance prediction and machine learning compiling optimization method and device
CN115829717A (en) Wind control decision rule optimization method, system, terminal and storage medium
CN108875927B (en) Convergence method and device of high-dimensional deep learning model
CN108595816B (en) Electronic product modeling system and method based on artificial intelligence
CN113065693B (en) Traffic flow prediction method based on radial basis function neural network
Pinyan et al. Accelerated Calculation and Analysis of Multi-stage Coil Gun Model Based on Current Filament Method
CN115713099B (en) Model design method, device, equipment and storage medium
Wang et al. Lead ASR Models to Generalize Better Using Approximated Bias-Variance Tradeoff

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination