US20230034598A1 - Control device, lithography apparatus, and article manufacturing method - Google Patents
Control device, lithography apparatus, and article manufacturing method
- Publication number
- US20230034598A1 (application US 17/872,178; US202217872178A)
- Authority
- US
- United States
- Prior art keywords
- manipulated variable
- substrate
- probability distribution
- neural network
- stage
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/0265—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/092—Reinforcement learning
-
- G—PHYSICS
- G03—PHOTOGRAPHY; CINEMATOGRAPHY; ANALOGOUS TECHNIQUES USING WAVES OTHER THAN OPTICAL WAVES; ELECTROGRAPHY; HOLOGRAPHY
- G03F—PHOTOMECHANICAL PRODUCTION OF TEXTURED OR PATTERNED SURFACES, e.g. FOR PRINTING, FOR PROCESSING OF SEMICONDUCTOR DEVICES; MATERIALS THEREFOR; ORIGINALS THEREFOR; APPARATUS SPECIALLY ADAPTED THEREFOR
- G03F7/00—Photomechanical, e.g. photolithographic, production of textured or patterned surfaces, e.g. printing surfaces; Materials therefor, e.g. comprising photoresists; Apparatus specially adapted therefor
- G03F7/70—Microphotolithographic exposure; Apparatus therefor
- G03F7/70691—Handling of masks or workpieces
- G03F7/70716—Stages
-
- G—PHYSICS
- G03—PHOTOGRAPHY; CINEMATOGRAPHY; ANALOGOUS TECHNIQUES USING WAVES OTHER THAN OPTICAL WAVES; ELECTROGRAPHY; HOLOGRAPHY
- G03F—PHOTOMECHANICAL PRODUCTION OF TEXTURED OR PATTERNED SURFACES, e.g. FOR PRINTING, FOR PROCESSING OF SEMICONDUCTOR DEVICES; MATERIALS THEREFOR; ORIGINALS THEREFOR; APPARATUS SPECIALLY ADAPTED THEREFOR
- G03F7/00—Photomechanical, e.g. photolithographic, production of textured or patterned surfaces, e.g. printing surfaces; Materials therefor, e.g. comprising photoresists; Apparatus specially adapted therefor
- G03F7/70—Microphotolithographic exposure; Apparatus therefor
- G03F7/70691—Handling of masks or workpieces
- G03F7/70733—Handling masks and workpieces, e.g. exchange of workpiece or mask, transport of workpiece or mask
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G06N7/005—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
Definitions
- the present invention relates to a control device, a lithography apparatus, and an article manufacturing method.
- one of a continuous space and a discrete space can be selected as the action space in accordance with the constraints of the algorithm and the properties of the environment.
- an ε-greedy algorithm (non-patent literature 1, patent literature 1)
- a Softmax method (non-patent literature 1)
- a greedy algorithm is generally used as the action policy during searching.
- the performance of a controller that outputs a probability distribution used to determine a manipulated variable can be improved by performing learning using a method in which the manipulated variable is determined by sampling according to a random number.
- the stochastic behavior can affect quality assurance. Therefore, in general, the manipulated variable that maximizes the probability value is continuously selected.
- the control performance may deteriorate as compared to a case of determining the manipulated variable by sampling using a random number.
- the present invention provides a technique advantageous in suppressing a deterioration of the control performance during operation as compared to the control performance during learning.
- One of aspects of the present invention provides a control device for controlling an object to be controlled, the device comprising: a generator configured to generate a probability distribution used to determine a manipulated variable; and a determinator configured to determine the manipulated variable based on the probability distribution generated by the generator, wherein in an operation phase, the determinator determines the manipulated variable in accordance with an expectation value of the probability distribution.
- FIG. 1 is a view illustrating the configuration of a system according to an embodiment
- FIG. 2 is a view showing a configuration example of an object to be controlled in a case in which the system shown in FIG. 1 is applied to a stage control device;
- FIG. 3 is a block diagram showing a more specific configuration example of the stage control device shown in FIG. 2 ;
- FIG. 4 is a flowchart illustrating a method of determining a parameter value of a neural network by reinforcement learning
- FIG. 5 is a view showing a configuration example of the neural network
- FIG. 6 is a flowchart illustrating an operation of a neural network compensator
- FIG. 7 is a graph illustrating a probability distribution (probability mass function).
- FIG. 8 is a view illustrating a sampling method using an inverse transform method
- FIG. 9 is a graph illustrating responses of a stage
- FIG. 10 is a view showing another configuration example of the neural network
- FIG. 11 is a view showing another specific configuration example of the stage control device.
- FIG. 12 is a view showing a configuration example of an exposure apparatus as an example of a lithography apparatus.
- FIG. 13 is a flowchart showing an operation example of the exposure apparatus illustrated in FIG. 12 .
- FIG. 1 illustrates the configuration of a system according to an embodiment.
- This system can include an object 1 to be controlled, a control server 2 that controls the object 1 to be controlled, and a learning server 3 that performs learning by acquiring a control result from the object 1 to be controlled via the control server 2 .
- the learning server 3 can transmit, to a neural network formed inside the object 1 to be controlled, parameter information of the neural network via the control server 2 .
- the control server 2 can transmit a control instruction to the object 1 to be controlled, and acquire a control result from the object 1 to be controlled.
- the control result acquired by the control server 2 from the object 1 to be controlled can be transmitted from the control server 2 to the learning server 3 .
- the learning server 3 can calculate a reward indicating the quality of the parameter value of the neural network, and update the parameter value of the neural network based on the reward.
- Since the calculation cost related to updating the parameter values of the neural network is high, it is advantageous to configure the control server 2 and the learning server 3 independently. In this configuration, when there are a plurality of objects to be controlled, it is possible to operate with a plurality of the learning servers 3, each having a high calculation cost, and one control server 2 having a low calculation cost.
- FIG. 2 shows a configuration example of the object 1 to be controlled in a case in which the system shown in FIG. 1 is applied to a stage control device.
- the object 1 to be controlled can include a stage 5 , a sensor 6 , a control board 7 , and a driver 8 .
- the control board 7 can be configured to supply a current instruction to the driver 8 at predetermined time intervals.
- the driver 8 includes a current driver and an actuator.
- the current driver can supply a current corresponding to the current instruction to the actuator, and the actuator can drive the stage 5 .
- An operation of the stage 5 is observed (detected) by the sensor 6 , and an observation result can be supplied to the control board 7 .
- FIG. 3 shows a more specific configuration example of the stage control device shown in FIG. 2 .
- the control board (controller) 7 can include, for example, a subtractor 76 , a compensator 71 , a neural network compensator 72 , and an adder 75 .
- the control board 7 can receive a manipulation instruction supplied from the control server 2 , position information of the stage 5 supplied from the sensor 6 , and phase information supplied from the control server 2 .
- the position information of the stage 5 is an example of state information indicating the state of the stage 5 .
- the subtractor 76 can calculate the difference between the manipulation instruction supplied from the control server 2 and the position information supplied from the sensor 6 , that is, the deviation, and supply the deviation to the compensator 71 and the neural network compensator 72 .
- the compensator 71 generates a first manipulated variable based on the deviation supplied from the subtractor 76 , and supplies the first manipulated variable to the adder 75 .
- the neural network compensator 72 generates a second manipulated variable based on the difference supplied from the subtractor 76 , and supplies the second manipulated variable to the adder 75 .
- the neural network compensator 72 can include a neural network 73 , and a manipulated variable determinator 74 (determinator) that determines the second manipulated variable.
- the neural network 73 can output, based on the deviation supplied from the subtractor 76 , a probability distribution used to determine the second manipulated variable.
- the neural network 73 may be understood as a component that outputs, based on the deviation supplied from the subtractor 76 , a function which defines the probability distribution used to determine the second manipulated variable.
- the neural network 73 may be understood as a probability distribution generator (generator) that generates the probability distribution used to determine the second manipulated variable.
- the manipulated variable determinator 74 determines the second manipulated variable based on the probability distribution or the function, which defines the probability distribution, supplied from the neural network 73 and the phase information supplied from the control server 2 .
- a possible value of the phase information can include a value indicating a learning phase in which the parameter value of the neural network is learned, and a value indicating an operation phase in which control is performed using the parameter of the neural network for which learning is completed.
- a method of determining a manipulated variable by the manipulated variable determinator 74 will be described later.
- the compensator 71 and the neural network compensator 72 may be understood as a first compensator and a second compensator, respectively.
- the adder 75 adds the first manipulated variable supplied from the compensator 71 and the second manipulated variable supplied from the neural network compensator 72 , thereby generating a manipulated variable (combined manipulated variable).
- the adder 75 supplies the manipulated variable to the driver 8 as a current instruction.
- the driver 8 includes the current driver and the actuator.
- the current driver can supply a current corresponding to the current instruction to the actuator, and the actuator can drive the stage 5 .
- the deviation supplied to the neural network compensator 72 is not necessarily the deviation of the position information. For example, the deviation of the velocity, the acceleration, or the jerk may be used.
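One control cycle of the board in FIG. 3 can be sketched as follows. This is a minimal illustration with hypothetical names; the patent does not specify the compensators' internals, so a simple proportional gain and a zero-output stub stand in for the compensator 71 and the neural network compensator 72.

```python
def control_step(instruction, position, compensator, nn_compensator):
    """One control cycle of the control board 7 (hypothetical sketch).

    The subtractor 76 forms the deviation, the compensator 71 and the
    neural network compensator 72 each produce a manipulated variable,
    and the adder 75 combines them into the current instruction for
    the driver 8.
    """
    deviation = instruction - position   # subtractor 76
    u1 = compensator(deviation)          # first manipulated variable (compensator 71)
    u2 = nn_compensator(deviation)       # second manipulated variable (NN compensator 72)
    return u1 + u2                       # adder 75 -> current instruction

# Hypothetical stand-ins: a proportional compensator and a zero NN output.
p_gain = lambda e: 0.5 * e
nn_stub = lambda e: 0.0
print(control_step(10.0, 8.0, p_gain, nn_stub))  # deviation 2.0 -> 1.0
```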
- a neural network parameter value (to be simply referred to as a parameter value hereinafter) of the neural network 73 is required to be determined by some learning method in advance.
- An example of the learning method is reinforcement learning.
- FIG. 4 illustrates a method (learning sequence) of determining the parameter value of the neural network 73 by reinforcement learning.
- the learning server 3 initializes the parameter value of the neural network 73 .
- the learning server 3 changes the parameter value of the neural network 73 .
- In step S402, in accordance with predetermined manipulation instruction data (for example, time-series data of the manipulation instruction), the control board 7 manipulates the stage 5 serving as the object to be controlled.
- In step S403, the learning server 3 acquires the control result of the stage 5 serving as the object to be controlled, for example, deviation data (for example, time-series data of the deviation).
- the control board 7 can provide the control result to the learning server 3 via the control server 2 .
- the learning server 3 calculates a reward based on the deviation data of the object to be controlled. In an example, the smaller the deviation, the higher the reward.
- the learning server 3 determines whether learning is completed. If it is determined that learning is not completed, the process returns to step S 401 . If it is determined that learning is completed, the process advances to step S 406 .
- If the number of times of learning does not exceed a predetermined number, the learning server 3 can determine that learning is not completed; if the number of times of learning exceeds the predetermined number, the learning server 3 can determine that learning is completed.
- the learning server 3 can change the parameter value of the neural network 73 so as to increase the reward.
- the learning server 3 saves, as a learning result, the parameter value with which the maximum reward was obtained.
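The learning sequence of FIG. 4 (initialize, perturb parameters, drive the stage, compute a reward from the deviation, keep the best parameters) can be sketched as below. This is a toy random hill-climbing stand-in, not the PPO-style update the patent actually contemplates; `run_stage` is a hypothetical callback representing steps S402-S403.

```python
import random

def learn(run_stage, num_iterations=100, seed=0):
    """Toy sketch of the learning sequence of FIG. 4.

    run_stage(params) stands in for driving the stage with candidate
    parameters and returning the resulting deviation data.  The reward
    grows as the deviation shrinks, and the parameters that earned the
    best reward are saved as the learning result.
    """
    rng = random.Random(seed)
    params = [0.0, 0.0]                               # initialize parameter values
    best_params, best_reward = params, float("-inf")
    for _ in range(num_iterations):                   # repeat until learning completes
        candidate = [p + rng.gauss(0, 0.1) for p in params]  # change parameter values
        deviations = run_stage(candidate)             # manipulate stage, acquire result
        reward = -sum(d * d for d in deviations)      # smaller deviation -> higher reward
        if reward > best_reward:                      # change parameters to increase reward
            best_params, best_reward = candidate, reward
            params = candidate
    return best_params                                # save the best parameter values

# Toy plant: the deviation shrinks as the parameters approach (1, -1).
toy = lambda p: [p[0] - 1.0, p[1] + 1.0]
```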
- the learning server 3 functions as a setter that sets the parameter value, which defines the operation of the neural network 73 (probability distribution generator), based on the control result of the object to be controlled which is controlled in accordance with the second manipulated variable determined by the manipulated variable determinator 74 .
- FIG. 5 shows a configuration example of the neural network 73 .
- the neural network 73 can include an input layer 731 , one or a plurality of intermediate layers 732 , an output layer 733 , a function 734 , and an output layer 735 .
- the input layer 731 can input, as input data 736, the deviations for the past N_a control cycles including the current control cycle.
- output data 738 of the output layer 733 can be determined via the one or plurality of intermediate layers 732 .
- the output data 738 can have N_b numeric values.
- the function 734 is, for example, a Softmax function.
- the function 734 can generate, as output data 739 of the output layer 735, the probability mass function obtained by converting each of the N_b numeric values of the output data 738 into a normalized probability.
- the function 734 functions as a converter that converts the output of the neural network 73 into the probability mass function.
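The conversion performed by the function 734 can be sketched as a standard Softmax, which maps the N_b raw network outputs to a normalized probability mass function (names here are illustrative, not from the patent):

```python
import math

def softmax(scores):
    """Convert raw output values into a normalized probability mass
    function, as the function 734 does for the output data 739.

    Subtracting the maximum score first is a standard trick for
    numerical stability; it does not change the result.
    """
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
# The probabilities sum to 1 and preserve the ordering of the scores.
```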
- In the learning phase, learning is performed using a reinforcement learning method including a policy network, such as Proximal Policy Optimization (PPO hereinafter), and the manipulated variable can be determined by generating a sample according to the probability mass function of the output data 739.
- For the sampling, a pseudo-random number generation algorithm such as an inverse transform method or an MCMC method can be used. With this, it is possible to perform learning while performing a searching action.
- In the operation phase, the manipulated variable having the highest probability of the output data 739 after conversion is generally selected.
- the accumulated value of the manipulated variables can influence the stage response. Therefore, the reward obtained by continuing to select the manipulated variable having the maximum probability may decrease as compared to the reward obtained when performing sampling from the probability mass function in the learning phase.
- an effect similar to that in the learning phase can be obtained in the operation phase when an expectation value, which is a sum of products of each manipulated variable candidate and the probability thereof, is used as the output (that is, the second manipulated variable) of the neural network compensator 72 .
- FIG. 6 illustrates an operation of the neural network compensator 72 .
- the neural network 73 outputs, to the output layer 735 , the probability distribution which uses a manipulated variable candidate as a random variable, in other words, the probability distribution used to determine the second manipulated variable.
- the probability distribution can be, for example, a probability mass function, but may be a probability density function as will be described later.
- the manipulated variable determinator 74 receives the phase information included in the control instruction supplied from the control server 2 , and checks the current phase. If the received phase information indicates the learning phase, the manipulated variable determinator 74 advances the process to step S 603 . If the received phase information indicates the operation phase, the manipulated variable determinator 74 advances the process to step S 605 .
- In step S603, that is, in the learning phase, based on the probability distribution (temporarily set probability distribution) output to the output layer 735 of the neural network 73, the manipulated variable determinator 74 randomly determines the value of the random variable as the second manipulated variable.
- In step S605, that is, in the operation phase, the manipulated variable determinator 74 determines the second manipulated variable in accordance with the expectation value of the probability distribution output to the output layer 735 of the neural network 73.
- In step S604, the manipulated variable determinator 74 outputs the second manipulated variable determined in step S603 in the learning phase, or the second manipulated variable determined in step S605 in the operation phase.
- Steps S601, (S602), S603, and S604 of the process shown in FIG. 6 are performed in step S402.
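The phase-dependent determination described above can be sketched as follows. The function name and the phase strings are illustrative assumptions; the logic mirrors steps S602-S605: sample from the probability mass function while learning, return the expectation value while operating.

```python
import random

def determine_manipulated_variable(candidates, probs, phase, rng=random):
    """Determine the second manipulated variable from a probability
    mass function, switching behavior on the phase information.

    Learning phase: sample a candidate according to its probability,
    which preserves the searching action.
    Operation phase: return the expectation value, the sum of products
    of each candidate and its probability (deterministic).
    """
    if phase == "learning":
        return rng.choices(candidates, weights=probs, k=1)[0]
    # operation phase: expectation value of the probability distribution
    return sum(a * p for a, p in zip(candidates, probs))

u = determine_manipulated_variable([-1.0, 0.0, 1.0], [0.2, 0.5, 0.3], "operation")
# expectation: (-1)*0.2 + 0*0.5 + 1*0.3 = 0.1
```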
- a method (step S 605 ) of determining the manipulated variable in the operation phase will be exemplarily described below.
- a probability p_i assigned to each manipulated variable candidate a_i appears as the output data 739 of the output layer 735.
- FIG. 7 illustrates the relationship between the manipulated variable candidate a_i and the probability p_i, that is, the probability distribution (probability mass function).
- An expectation value E determined in step S 605 is the expectation value of the probability distribution output to the output layer 735 of the neural network 73 .
- the expectation value E is the sum of products of a_i and p_i, and is expressed by E = Σ_i a_i · p_i (i = 1, ..., N_b).
- a method (step S 603 ) of determining the manipulated variable in the learning phase will be exemplarily described below.
- an inverse transform method will be described with reference to FIG. 8 .
- a sample from the probability distribution expressed by the probability mass function can be obtained by computing the cumulative sums b[i] of the probabilities, drawing a continuous uniform random number r in the section [0, 1], and selecting the minimum i that satisfies r ≤ b[i]. That is, based on the probability distribution, the value of the random variable can be randomly determined as the second manipulated variable.
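The inverse transform method of FIG. 8 can be sketched directly from that description (names are illustrative):

```python
import random
from itertools import accumulate

def sample_inverse_transform(candidates, probs, rng=random):
    """Inverse transform sampling from a probability mass function.

    b[i] is the cumulative sum of the probabilities; a uniform random
    number r in [0, 1) selects the minimum i satisfying r <= b[i].
    """
    b = list(accumulate(probs))      # cumulative distribution b[i]
    r = rng.random()                 # continuous uniform random number
    for i, cum in enumerate(b):
        if r <= cum:
            return candidates[i]
    return candidates[-1]            # guard against rounding error in the sums

# Sampling many times approximates the original probability mass function.
```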
- Alternatively, a reinforcement learning method including no policy network, such as Deep Q-Network (DQN), may be used.
- the deviations of the manipulation instructions for the past N_a control cycles including the current control cycle are input as the input data 736 of the input layer 731.
- the scores of N_b manipulated variable candidates can be obtained as the output data 738 of the output layer 733 via the one or plurality of intermediate layers 732.
- In this case, a specific function 734 such as a Softmax function is unnecessary.
- FIG. 9 illustrates responses of the stage 5 .
- a solid line indicates the deviation of the stage 5 in the learning phase.
- a dotted line indicates the deviation of the stage 5 in a case in which the manipulated variable candidate having the highest probability is output as the second manipulated variable in the operation phase.
- a dashed line indicates the deviation of the stage 5 in a case in which the expectation value of the probability distribution output to the output layer 735 of the neural network 73 is output as the second manipulated variable according to this embodiment.
- when the manipulated variable candidate having the highest probability is output as the second manipulated variable in the operation phase, the waveform deteriorates as compared to the waveform in the learning phase.
- when the expectation value is output as the second manipulated variable in the operation phase, a waveform similar to the waveform in the learning phase can be obtained.
- the neural network 73 described above is merely an example, and may be replaced with a neural network 303 as illustrated in FIG. 10 .
- the neural network 303 can include an input layer 761 , one or a plurality of intermediate layers 762 , an output layer 763 , a function 764 , and an output layer 765 .
- the input layer 761 can input, as input data 766, the deviations for the past N_a control cycles including the current control cycle.
- coefficients α and β of the beta distribution, which is one kind of probability density function, can be determined.
- the beta distribution expressed by the coefficients α and β is scaled to the range [Fmin, Fmax] of the second manipulated variable.
- In the learning phase, learning is performed using a reinforcement learning method including a policy network, such as PPO, and the second manipulated variable can be determined by generating a sample according to the probability density function.
- an appropriate pseudo-random number generation algorithm such as an inverse transform method or an acceptance-rejection method can be used in accordance with the kind of the probability density function.
- the above-described scaling is performed on the manipulated variable candidate having the highest probability in the beta distribution expressed by the coefficients α and β, which is the output data 769, and the obtained value can be used as the output.
- the accumulated value of the manipulated variables influences the stage response. Therefore, the reward obtained by continuing to select the manipulated variable having the maximum probability may decrease as compared to the reward obtained when performing sampling from the probability density function in the learning phase.
- the second manipulated variable is determined in accordance with the expectation value E of the beta distribution, expressed by E = α / (α + β) and scaled to the range [Fmin, Fmax] of the second manipulated variable. With this, an effect similar to that in the learning phase can be obtained.
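The operation-phase computation for the beta distribution case can be sketched as below. The function name is illustrative; the mean of Beta(α, β) on [0, 1] is the standard result α / (α + β), mapped linearly onto the manipulated variable range.

```python
def beta_expectation(alpha, beta, f_min, f_max):
    """Expectation value of a beta distribution scaled to the range
    [Fmin, Fmax] of the second manipulated variable.

    The mean of Beta(alpha, beta) on [0, 1] is alpha / (alpha + beta);
    scaling it linearly to [f_min, f_max] gives the operation-phase
    output of the neural network compensator.
    """
    mean = alpha / (alpha + beta)
    return f_min + (f_max - f_min) * mean

# A symmetric distribution (alpha == beta) maps to the middle of the range.
print(beta_expectation(2.0, 2.0, -1.0, 1.0))  # 0.0
```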
- the manipulated variable determinator 74 operates as described above.
- a reinforcement learning method including no policy network may be used as the learning method used in the learning phase.
- FIG. 11 shows another specific configuration example of the stage control device.
- the difference (deviation) between the manipulation instruction and the position information is supplied to the neural network compensator 72 or the neural network 73 .
- the quality of a parameter value of the neural network can be determined from a reward calculated based on deviation data of the object to be controlled.
- the difference (deviation) between the manipulation instruction and the position information is not necessarily input to the neural network compensator 72 , but one or both of the manipulation instruction and the position information obtained from an output of the sensor 6 may be input.
- the position information is not necessarily input to the neural network compensator 72 .
- the velocity, the acceleration, or the jerk may be input.
- the second manipulated variable in the operation phase, can be determined in accordance with the expectation value of the probability distribution output from the neural network 73 .
- the difference (deviation) between the manipulation instruction and the position information is input to the neural network compensator 72 , by using the expectation value of the probability distribution as the second manipulated variable in the operation phase, a deviation suppression effect similar to that in the learning phase can be obtained.
- the manipulated variable to be supplied to the driver 8 is generated by adding the first manipulated variable output from the compensator 71 and the second manipulated variable output from the neural network compensator 72 , but the compensator 71 is not always necessary.
- the second manipulated variable output from the neural network compensator 72 may be supplied to the driver 8 intact.
- FIG. 12 shows an example in which the system described above is applied to a scanning exposure apparatus 800 which is an example of a lithography apparatus.
- the scanning exposure apparatus 800 is a step-and-scan exposure apparatus that performs scanning exposure of a substrate 14 by slit-shaped light shaped using a slit.
- the scanning exposure apparatus 800 can include an illumination optical system 23 , an original stage 12 , a projection optical system 13 , a substrate stage 15 , an original stage position measurement device 17 , a substrate stage position measurement device 18 , a substrate mark measurement device 21 , a substrate conveyor 22 , a controller 24 , and a temperature controller 25 .
- the controller 24 can control the illumination optical system 23 , the original stage 12 , the projection optical system 13 , the substrate stage 15 , the original stage position measurement device 17 , the substrate stage position measurement device 18 , the substrate mark measurement device 21 , and the substrate conveyor 22 .
- the controller 24 can control a process of transferring a pattern formed in an original 11 to the substrate 14 (a process of performing scanning exposure of the substrate 14 ).
- the controller 24 is formed by, for example, a PLD (Programmable Logic Device) such as an FPGA (Field Programmable Gate Array), an ASIC (Application Specific Integrated Circuit), a general-purpose computer in which a program is installed, or a combination of all or some of these components.
- the controller 24 also includes a driver that controls an actuator.
- the illumination optical system 23 illuminates the original 11 .
- the illumination optical system 23 can shape, using a light shielding member such as a masking blade, light emitted from a light source (not shown) into band-like or arcuate slit-shaped light long in the X direction, and illuminate a part of the original 11 with the slit-shaped light.
- the original 11 and the substrate 14 are held by the original stage 12 and substrate stage 15 , respectively, and are arranged in optically conjugate positions (the object plane and image plane of the projection optical system 13 ) via the projection optical system 13 .
- the projection optical system 13 has a predetermined projection magnification (for example, ½ or ¼), and projects the pattern of the original 11 onto the substrate 14 by using the slit-shaped light.
- a region (a region irradiated with the slit-shaped light) on the substrate 14 onto which the pattern of the original 11 is projected is referred to as an irradiation region.
- the original stage 12 and the substrate stage 15 are configured to be movable in a direction (Y direction) orthogonal to the optical axis direction (Z direction) of the projection optical system 13 .
- the original stage 12 and the substrate stage 15 are scanned and driven relative to each other, by respective drivers (not shown), in synchronization with each other at a velocity ratio corresponding to the projection magnification of the projection optical system 13 .
- the substrate 14 is scanned in the Y direction with respect to the irradiation region, and the pattern formed in the original 11 is transferred to a shot region on the substrate 14 .
- by repeating this for each shot region, an exposure process for one substrate 14 is completed.
- the original stage position measurement device 17 includes, for example, a laser interferometer, and measures the position of the original stage 12 .
- the laser interferometer emits a laser beam toward a reflector (not shown) provided on the original stage 12 , and detects a displacement (a displacement from a reference position) of the original stage 12 based on the interference between the laser beam reflected on the reflector and the laser beam reflected on a reference surface.
- the original stage position measurement device 17 can acquire the current position of the original stage 12 based on the displacement.
- the original stage position measurement device 17 measures the position of the original stage 12 by the interferometer using the laser beam, but the present invention is not limited to this.
- an encoder may measure the position of the original stage 12 .
- the substrate stage position measurement device 18 includes, for example, a laser interferometer, and measures the position of the substrate stage 15 .
- the laser interferometer emits a laser beam toward a reflector (not shown) provided on the substrate stage 15 , and detects a displacement (a displacement from a reference position) of the substrate stage 15 based on the interference between the laser beam reflected on the reflector and the laser beam reflected on a reference surface.
- the substrate stage position measurement device 18 can acquire the current position of the substrate stage 15 based on the displacement.
- the substrate stage position measurement device 18 measures the position of the substrate stage 15 by the interferometer using the laser beam, but the present invention is not limited to this.
- an encoder may measure the position of the substrate stage 15 .
- the substrate mark measurement device 21 includes, for example, an image sensor, and can detect the position of a mark provided on a substrate.
- the substrate mark measurement device 21 of this embodiment detects the mark by the image sensor, but the present invention is not limited to this.
- a transmissive sensor may detect the mark.
- the substrate conveyor 22 supplies a substrate to the substrate stage 15 and collects it therefrom.
- the temperature controller 25 keeps the temperature and humidity within the exposure apparatus constant.
- FIG. 13 shows an operation example of the exposure apparatus illustrated in FIG. 12 .
- the substrate conveyor 22 supplies the substrate 14 onto the substrate stage 15 .
- the substrate stage 15 is driven such that a mark on the substrate 14 designated in an exposure recipe enters the measurement field of view of the substrate mark measurement device 21 , and alignment of the substrate 14 is performed.
- in step S903, scanning exposure of the substrate 14 is performed for each shot region of the substrate 14. The exposure order and exposure angle of view follow the designation in the exposure recipe.
- the substrate conveyor 22 collects the substrate 14 from the substrate stage 15.
- the sensor 6 shown in FIG. 2 corresponds to the substrate stage position measurement device 18
- the control board 7 corresponds to the controller 24
- the driver 8 corresponds to a substrate stage driver (not shown)
- the stage 5 corresponds to the substrate stage 15 .
- a settling time, which is the time until the deviation converges after the substrate stage 15 is driven, can be shortened, so that the accuracy and throughput of the exposure apparatus can be improved.
- a deviation suppression effect similar to that in the learning phase can be obtained.
- the control board 7 shown in FIG. 2 corresponds to the controller 24
- the driver 8 corresponds to an original stage driver (not shown)
- the sensor 6 corresponds to the original stage position measurement device 17
- the stage 5 corresponds to the original stage 12 .
- a deviation suppression effect similar to that in the learning phase can be obtained.
- the control board 7 shown in FIG. 2 corresponds to the controller 24
- the driver 8 corresponds to a substrate conveyor driver (for example, AC servo motor) (not shown)
- the sensor 6 corresponds to a rotary encoder (not shown)
- the stage 5 corresponds to the substrate conveyor 22 .
- the present invention may be applied to another driving device in the scanning exposure apparatus.
- the present invention may also be applied to an exposure apparatus that performs exposure while stopping an original and a substrate, or may be applied to another lithography apparatus, for example, an imprint apparatus. Further, the present invention may be applied to another control device that controls an object to be controlled.
- the article manufacturing method can include a transfer step of transferring a pattern of an original to a substrate using the lithography apparatus, and a processing step of obtaining an article by processing the substrate to which the pattern has been transferred.
- the lithography apparatus is an exposure apparatus
- the article manufacturing method can include a transfer step of transferring a pattern of an original to a substrate (a wafer, a glass substrate, or the like) by exposing the substrate with a photosensitive agent applied thereto, and a processing step of obtaining an article by processing the substrate to which the pattern has been transferred.
- the processing step can include a step of developing the substrate (photosensitive agent).
- the processing step can further include other known steps, for example, steps for etching, resist removal, dicing, bonding, and packaging. According to this article manufacturing method, a higher-quality article than a conventional one can be manufactured.
- Embodiment(s) of the present invention can also be realized by a computer of a system or apparatus that reads out and executes computer executable instructions (e.g., one or more programs) recorded on a storage medium (which may also be referred to more fully as a ‘non-transitory computer-readable storage medium’) to perform the functions of one or more of the above-described embodiment(s) and/or that includes one or more circuits (e.g., application specific integrated circuit (ASIC)) for performing the functions of one or more of the above-described embodiment(s), and by a method performed by the computer of the system or apparatus by, for example, reading out and executing the computer executable instructions from the storage medium to perform the functions of one or more of the above-described embodiment(s) and/or controlling the one or more circuits to perform the functions of one or more of the above-described embodiment(s).
- the computer may comprise one or more processors (e.g., central processing unit (CPU), micro processing unit (MPU)) and may include a network of separate computers or separate processors to read out and execute the computer executable instructions.
- the computer executable instructions may be provided to the computer, for example, from a network or the storage medium.
- the storage medium may include, for example, one or more of a hard disk, a random-access memory (RAM), a read only memory (ROM), a storage of distributed computing systems, an optical disk (such as a compact disc (CD), digital versatile disc (DVD), or Blu-ray Disc (BD)TM), a flash memory device, a memory card, and the like.
Abstract
A control device controls an object to be controlled. The device includes a generator configured to generate a probability distribution used to determine a manipulated variable, and a determinator configured to determine the manipulated variable based on the probability distribution generated by the generator. In an operation phase, the determinator determines the manipulated variable in accordance with an expectation value of the probability distribution.
Description
- The present invention relates to a control device, a lithography apparatus, and an article manufacturing method.
- When learning a policy for maximizing a total reward by reinforcement learning, either a continuous space or a discrete space can be selected as the action space in accordance with the constraints of the algorithm and the properties of the environment. When a discrete action space is selected, an ε-greedy algorithm (non-patent literature 1, patent literature 1), a Softmax method (non-patent literature 1), or the like is generally used as the action policy during searching. As the action policy during operation, a greedy algorithm is generally used.
- The performance of a controller that outputs a probability distribution used to determine a manipulated variable can be improved by performing learning using a method in which the manipulated variable is determined by sampling according to a random number. However, in actual operation, if the manipulated variable is determined by sampling using a random number as in learning, the stochastic behavior can affect quality assurance. Therefore, in general, the manipulated variable that maximizes the probability value is selected continuously. On the other hand, when the manipulated variable having the maximum probability is continuously selected, the control performance may deteriorate as compared to a case of determining the manipulated variable by sampling using a random number.
-
- Patent literature 1: Japanese Patent Laid-Open No. 2020-98538
- Non-patent literature 1: Sutton, R. S., Barto, A. G.: “Reinforcement Learning: An Introduction.” MIT Press, Cambridge, Mass. (1998)
- The present invention provides a technique advantageous in suppressing a deterioration of the control performance during operation as compared to the control performance during learning.
- One of aspects of the present invention provides a control device for controlling an object to be controlled, the device comprising: a generator configured to generate a probability distribution used to determine a manipulated variable; and a determinator configured to determine the manipulated variable based on the probability distribution generated by the generator, wherein in an operation phase, the determinator determines the manipulated variable in accordance with an expectation value of the probability distribution.
- Further features of the present invention will become apparent from the following description of exemplary embodiments with reference to the attached drawings.
- FIG. 1 is a view illustrating the configuration of a system according to an embodiment;
- FIG. 2 is a view showing a configuration example of an object to be controlled in a case in which the system shown in FIG. 1 is applied to a stage control device;
- FIG. 3 is a block diagram showing a more specific configuration example of the stage control device shown in FIG. 2;
- FIG. 4 is a flowchart illustrating a method of determining a parameter value of a neural network by reinforcement learning;
- FIG. 5 is a view showing a configuration example of the neural network;
- FIG. 6 is a flowchart illustrating an operation of a neural network compensator;
- FIG. 7 is a graph illustrating a probability distribution (probability mass function);
- FIG. 8 is a view illustrating a sampling method using an inverse transform method;
- FIG. 9 is a graph illustrating responses of a stage;
- FIG. 10 is a view showing another configuration example of the neural network;
- FIG. 11 is a view showing another specific configuration example of the stage control device;
- FIG. 12 is a view showing a configuration example of an exposure apparatus as an example of a lithography apparatus; and
- FIG. 13 is a flowchart showing an operation example of the exposure apparatus illustrated in FIG. 12.
- Hereinafter, embodiments will be described in detail with reference to the attached drawings. Note, the following embodiments are not intended to limit the scope of the claimed invention. Multiple features are described in the embodiments, but limitation is not made to an invention that requires all such features, and multiple such features may be combined as appropriate. Furthermore, in the attached drawings, the same reference numerals are given to the same or similar configurations, and redundant description thereof is omitted.
- FIG. 1 illustrates the configuration of a system according to an embodiment. This system can include an object 1 to be controlled, a control server 2 that controls the object 1 to be controlled, and a learning server 3 that performs learning by acquiring a control result from the object 1 to be controlled via the control server 2. The learning server 3 can transmit, to a neural network formed inside the object 1 to be controlled, parameter information of the neural network via the control server 2. Then, the control server 2 can transmit a control instruction to the object 1 to be controlled, and acquire a control result from the object 1 to be controlled. The control result acquired by the control server 2 from the object 1 to be controlled can be transmitted from the control server 2 to the learning server 3. In accordance with the control result, the learning server 3 can calculate a reward indicating the quality of the parameter value of the neural network, and update the parameter value of the neural network based on the reward.
- Since the calculation cost related to the update of the parameter value of the neural network is high, it is advantageous to configure the control server 2 and the learning server 3 independently. In the configuration in which the control server 2 and the learning server 3 are independent, when there are a plurality of objects to be controlled, it is possible to perform an operation by preparing a plurality of the learning servers 3, each having a high calculation cost, and one control server 2 having a low calculation cost.
- FIG. 2 shows a configuration example of the object 1 to be controlled in a case in which the system shown in FIG. 1 is applied to a stage control device. The object 1 to be controlled can include a stage 5, a sensor 6, a control board 7, and a driver 8. The control board 7 can be configured to supply a current instruction to the driver 8 at predetermined time intervals. The driver 8 includes a current driver and an actuator. The current driver can supply a current corresponding to the current instruction to the actuator, and the actuator can drive the stage 5. An operation of the stage 5 is observed (detected) by the sensor 6, and an observation result can be supplied to the control board 7.
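The computation performed by the control board 7 in each control cycle (detailed with FIG. 3) can be sketched as below. This is a minimal illustration only: `pid` and `nn_policy` are hypothetical stand-ins for the compensator 71 and the neural network compensator 72, and the numeric values are arbitrary.

```python
# Sketch of one control cycle of the control board 7, under assumed interfaces:
# `pid` plays the role of the compensator 71 and `nn_policy` the role of the
# neural network compensator 72 (both names are hypothetical).
def control_cycle(instruction, position, pid, nn_policy):
    deviation = instruction - position  # subtractor 76: manipulation instruction minus measured position
    u1 = pid(deviation)                 # first manipulated variable (compensator 71)
    u2 = nn_policy(deviation)           # second manipulated variable (neural network compensator 72)
    return u1 + u2                      # adder 75: combined manipulated variable -> current instruction to driver 8

# Toy usage: a proportional compensator plus a fixed neural-network correction.
current = control_cycle(1.0, 0.8, pid=lambda e: 2.0 * e, nn_policy=lambda e: 0.05)
```

The combined value is what the adder 75 supplies to the driver 8 as the current instruction.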
- FIG. 3 shows a more specific configuration example of the stage control device shown in FIG. 2. The control board (controller) 7 can include, for example, a subtractor 76, a compensator 71, a neural network compensator 72, and an adder 75. The control board 7 can receive a manipulation instruction supplied from the control server 2, position information of the stage 5 supplied from the sensor 6, and phase information supplied from the control server 2. The position information of the stage 5 is an example of state information indicating the state of the stage 5. The subtractor 76 can calculate the difference between the manipulation instruction supplied from the control server 2 and the position information supplied from the sensor 6, that is, the deviation, and supply the deviation to the compensator 71 and the neural network compensator 72. The compensator 71 generates a first manipulated variable based on the deviation supplied from the subtractor 76, and supplies the first manipulated variable to the adder 75.
- The neural network compensator 72 generates a second manipulated variable based on the deviation supplied from the subtractor 76, and supplies the second manipulated variable to the adder 75. The neural network compensator 72 can include a neural network 73, and a manipulated variable determinator 74 (determinator) that determines the second manipulated variable. The neural network 73 can output, based on the deviation supplied from the subtractor 76, a probability distribution used to determine the second manipulated variable. The neural network 73 may be understood as a component that outputs, based on the deviation supplied from the subtractor 76, a function which defines the probability distribution used to determine the second manipulated variable. The neural network 73 may also be understood as a probability distribution generator (generator) that generates the probability distribution used to determine the second manipulated variable.
- The manipulated variable determinator 74 determines the second manipulated variable based on the probability distribution, or the function which defines the probability distribution, supplied from the neural network 73 and the phase information supplied from the control server 2. A possible value of the phase information can include a value indicating a learning phase in which the parameter value of the neural network is learned, and a value indicating an operation phase in which control is performed using the parameter of the neural network for which learning is completed. A method of determining the manipulated variable by the manipulated variable determinator 74 will be described later. The compensator 71 and the neural network compensator 72 may be understood as a first compensator and a second compensator, respectively.
- The adder 75 adds the first manipulated variable supplied from the compensator 71 and the second manipulated variable supplied from the neural network compensator 72, thereby generating a manipulated variable (combined manipulated variable). The adder 75 supplies the manipulated variable to the driver 8 as a current instruction. As has been described above, the driver 8 includes the current driver and the actuator. The current driver can supply a current corresponding to the current instruction to the actuator, and the actuator can drive the stage 5. Note that the deviation supplied to the neural network compensator 72 is not necessarily the deviation of the position information. For example, the deviation of the velocity, the acceleration, or the jerk may be used.
- A neural network parameter value (to be simply referred to as a parameter value hereinafter) of the neural network 73 is required to be determined by some learning method in advance. An example of the learning method is reinforcement learning. FIG. 4 illustrates a method (learning sequence) of determining the parameter value of the neural network 73 by reinforcement learning. First, in step S400, the learning server 3 initializes the parameter value of the neural network 73. Then, in step S401, the learning server 3 changes the parameter value of the neural network 73. In step S402, in accordance with predetermined manipulation instruction data (for example, the time-series data of the manipulation instruction), the control board 7 manipulates the stage 5 serving as the object to be controlled.
- In step S403, the learning server 3 acquires the control result of the stage 5 serving as the object to be controlled, for example, the deviation data (for example, the time-series data of the deviation). Here, the control board 7 can provide the control result to the learning server 3 via the control server 2. Then, the learning server 3 calculates a reward based on the deviation data of the object to be controlled. In an example, the smaller the deviation, the higher the reward. Then, the learning server 3 determines whether learning is completed. If it is determined that learning is not completed, the process returns to step S401. If it is determined that learning is completed, the process advances to step S406. In an example, if the number of times of learning is equal to or smaller than a predetermined number of times, the learning server 3 can determine that learning is not completed, and if the number of times of learning exceeds the predetermined number of times, the learning server 3 can determine that learning is completed. In step S401, the learning server 3 can change the parameter value of the neural network 73 so as to increase the reward. In step S406, the learning server 3 saves, as a learning result, the parameter value with which the maximum reward was obtained. In the learning phase, the learning server 3 functions as a setter that sets the parameter value, which defines the operation of the neural network 73 (probability distribution generator), based on the control result of the object to be controlled, which is controlled in accordance with the second manipulated variable determined by the manipulated variable determinator 74.
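The learning sequence of steps S400 to S406 can be sketched as below, under stated assumptions: `run_trajectory(params)` and `perturb(params)` are hypothetical placeholders for driving the stage with the manipulation instruction data and for the learning server's parameter update, and the quadratic reward is only one way to realize "the smaller the deviation, the higher the reward".

```python
# Sketch of the FIG. 4 learning sequence (hypothetical placeholder interfaces):
# `run_trajectory(params)` drives the object to be controlled and returns the
# deviation time-series data (steps S402/S403); `perturb(params)` stands in for
# the learning server's parameter update (step S401).
def learn(initial_params, run_trajectory, perturb, n_iterations):
    params = initial_params                       # step S400: initialize the parameter value
    best_params, best_reward = params, float("-inf")
    for _ in range(n_iterations):                 # repeat a predetermined number of times
        params = perturb(params)                  # step S401: change the parameter value
        deviations = run_trajectory(params)       # steps S402/S403: manipulate the stage, acquire deviations
        reward = -sum(d * d for d in deviations)  # smaller deviation -> higher reward
        if reward > best_reward:
            best_params, best_reward = params, reward
    return best_params                            # step S406: save the best-reward parameter value
```

In the real system the update in step S401 would be a reinforcement learning step rather than a blind perturbation; this sketch only shows the reward bookkeeping of the loop.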
- FIG. 5 shows a configuration example of the neural network 73. The neural network 73 can include an input layer 731, one or a plurality of intermediate layers 732, an output layer 733, a function 734, and an output layer 735. The input layer 731 can input, as input data 736, the deviations for the past Na control cycles including the current control cycle. In response to the input, output data 738 of the output layer 733 can be determined via the one or plurality of intermediate layers 732. The output data 738 can have Nb numeric values (scores). The function 734 is, for example, a Softmax function. The function 734 can generate, as output data 739 of the output layer 735, the probability mass function obtained by converting each of the Nb numeric values of the output data 738 into a normalized probability. The function 734 functions as a converter that converts the output of the neural network 73 into the probability mass function.
- In the learning phase, learning is performed using a reinforcement learning method such as Proximal Policy Optimization (hereinafter PPO) including a policy network, and the manipulated variable can be determined by generating a sample according to the probability mass function of the output data 739. For sampling from the probability distribution expressed by the probability mass function, for example, a pseudo-random number generation algorithm such as an inverse transform method or an MCMC method can be used. With this, it is possible to perform learning while performing a searching action.
- After the learning phase is completed, in the operation phase in which the parameter value for which the learning phase is completed, or the parameter value with which the maximum reward was obtained, is used, the manipulated variable having the highest probability in the output data 739 after conversion is generally selected. However, in a system that shows a transient response, such as a low-pass filter in stage control, the accumulated value of the manipulated variables can influence the stage response. Therefore, the reward obtained by continuing to select the manipulated variable having the maximum probability may decrease as compared to the reward obtained when performing sampling from the probability mass function in the learning phase.
- To prevent this, in this embodiment, an effect similar to that in the learning phase can be obtained in the operation phase when an expectation value, which is the sum of the products of each manipulated variable candidate and its probability, is used as the output (that is, the second manipulated variable) of the neural network compensator 72.
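The conversion performed by the function 734 — turning the Nb raw output values into a normalized probability mass function — can be sketched as a standard Softmax. This is a minimal illustration, not the apparatus's actual implementation.

```python
import math

# Softmax conversion: maps the Nb numeric values of the output data 738 onto a
# normalized probability mass function (the output data 739). Subtracting the
# maximum score first keeps the exponentials numerically stable.
def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

A uniform score vector maps to a uniform distribution, and larger scores receive proportionally larger probabilities, which is what makes sampling-based searching in the learning phase possible.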
- FIG. 6 illustrates an operation of the neural network compensator 72. First, in step S601, the neural network 73 outputs, to the output layer 735, the probability distribution which uses a manipulated variable candidate as a random variable, in other words, the probability distribution used to determine the second manipulated variable. The probability distribution can be, for example, a probability mass function, but may be a probability density function as will be described later. In step S602, the manipulated variable determinator 74 receives the phase information included in the control instruction supplied from the control server 2, and checks the current phase. If the received phase information indicates the learning phase, the manipulated variable determinator 74 advances the process to step S603. If the received phase information indicates the operation phase, the manipulated variable determinator 74 advances the process to step S605.
- In step S603, that is, in the learning phase, based on the probability distribution (temporarily set probability distribution) output to the output layer 735 of the neural network 73, the manipulated variable determinator 74 randomly determines the value of the random variable as the second manipulated variable. In step S605, that is, in the operation phase, the manipulated variable determinator 74 determines the second manipulated variable in accordance with the expectation value of the probability distribution output to the output layer 735 of the neural network 73. In step S604, the manipulated variable determinator 74 outputs the second manipulated variable determined in step S603 if it is in the learning phase, and outputs the second manipulated variable determined in step S605 if it is in the operation phase.
- Here, during execution of the process shown in FIG. 4, that is, the method (learning sequence) of determining the parameter value of the neural network 73, steps S601, (S602), S603, and S604 of the process shown in FIG. 6 are performed in step S402.
- A method (step S605) of determining the manipulated variable in the operation phase will be exemplarily described below. Here, Nb manipulated variable candidates ai (i=0 to Nb) are defined. A probability pi assigned to each manipulated variable candidate ai appears as the output data 739 of the output layer 735. FIG. 7 illustrates the relationship between the manipulated variable candidate ai and the probability pi, that is, the probability distribution (probability mass function). An expectation value E determined in step S605 is the expectation value of the probability distribution output to the output layer 735 of the neural network 73. The expectation value E is the sum of the products of ai and pi, and is expressed by:
- E=Σi ai pi (1)
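The phase-dependent behavior of the manipulated variable determinator 74 can be sketched as below. The candidate values and probabilities are illustrative, and `rng.choices` is used here merely as one convenient way to sample from the probability mass function in the learning phase.

```python
import random

# Sketch of the manipulated variable determinator 74: in the learning phase it
# samples randomly according to the probability mass function (step S603); in
# the operation phase it returns the deterministic expectation value, i.e. the
# sum of products of each candidate a_i and its probability p_i (step S605).
def determine(candidates, probs, phase, rng=random):
    if phase == "learning":
        return rng.choices(candidates, weights=probs, k=1)[0]
    return sum(a * p for a, p in zip(candidates, probs))
```

Because the operation-phase output is an expectation value, repeated calls with the same distribution always return the same second manipulated variable, avoiding the stochastic behavior that complicates quality assurance.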
FIG. 8 . Consider a probability mass function where a[i] indicates the probability of selecting the ith manipulated variable candidate. An accumulated distribution function b[i] is defined as: -
b[i]=Σj=0 i a[j] (2) - A sample from the probability distribution expressed by the probability mass function can be obtained by using a continuous uniform random number r in a section [0, 1] and selecting the minimum i that satisfies r≤b[i]. That is, based on the probability distribution, the value of the random variable can be randomly determined as the second manipulated variable.
- As the learning method used in the learning phase, in addition to the reinforcement learning method such as PPO including a policy network, a reinforcement learning method such as Deep Q Network (DQN) including no policy network may be used. In this case, the deviations of the manipulation instructions for past Na control cycles including the current control cycle are input as the input data 736 of the
input layer 731. The scores of Nb manipulated variable candidates can be obtained as theoutput data 738 of theoutput layer 733 via the one or plurality ofintermediate layers 732. By converting, using thespecific function 734 such as a Softmax function, the score of the manipulated variable candidate into the probability of the manipulated variable candidate, theoutput data 739 of the output layer 735 can be generated. -
- FIG. 9 illustrates responses of the stage 5. A solid line indicates the deviation of the stage 5 in the learning phase. A dotted line indicates the deviation of the stage 5 in a case in which the manipulated variable candidate having the highest probability is output as the second manipulated variable in the operation phase. A dashed line indicates the deviation of the stage 5 in a case in which the expectation value of the probability distribution output to the output layer 735 of the neural network 73 is output as the second manipulated variable according to this embodiment. As can be seen from FIG. 9, if the manipulated variable candidate having the highest probability is output as the second manipulated variable in the operation phase, the waveform deteriorates as compared to the waveform in the learning phase. On the other hand, as can be seen from FIG. 9, if the expectation value is output as the second manipulated variable in the operation phase, a waveform similar to the waveform in the learning phase can be obtained.
- As has been described above, in a system that shows a transient response, such as a low-pass filter in stage control or the like, by using the expectation value as the output in the operation phase of the neural network that performs discrete output, a deviation suppression effect similar to that in the learning phase can be obtained.
- The neural network 73 described above is merely an example, and may be replaced with a neural network 303 as illustrated in FIG. 10. The neural network 303 can include an input layer 761, one or a plurality of intermediate layers 762, an output layer 763, a function 764, and an output layer 765. The input layer 761 can input, as input data 766, the deviations for the past Na control cycles including the current control cycle. As output data 769 of the output layer 765, via the one or plurality of intermediate layers 762, the output layer 763, and the activation function 764, the coefficients α and β of the β distribution, which is one kind of probability density function, can be determined. When determining the second manipulated variable, the β distribution expressed by the coefficients α and β is scaled to the range [Fmin, Fmax] of the second manipulated variable.
output data 769, and the obtained value can be used as the output. However, as has been described above, in a system that shows a transient response, such as a low-pass filter in stage control or the like, the accumulated value of the manipulated variables influences the stage response. Therefore, the reward obtained by continuing to select the manipulated variable having the maximum probability may decrease as compared to the reward obtained when performing sampling from the probability density function in the learning phase. To prevent this, the second manipulated variable is determined in accordance with the expectation value E of the β distribution expressed by: -
- For example, by performing the above-described scaling on the expectation value E described above, the second manipulated variable can be determined. With this, an effect similar to that in the learning phase can be obtained. The manipulated
variable determinator 74 operates as described above. A reinforcement learning method including no policy network may be used as the learning method used in the learning phase. - As has been described above, even when a neural network that outputs continuous values is used in a system that shows a transient response, such as a low-pass filter in stage control or the like, by using the expectation value as the output in the operation phase, a deviation suppression effect similar to that in the learning phase can be obtained.
-
FIG. 11 shows another specific configuration example of the stage control device. In the example described above, the difference (deviation) between the manipulation instruction and the position information is supplied to theneural network compensator 72 or theneural network 73. However, the quality of a parameter value of the neural network can be determined from a reward calculated based on deviation data of the object to be controlled. Accordingly, the difference (deviation) between the manipulation instruction and the position information is not necessarily input to theneural network compensator 72, but one or both of the manipulation instruction and the position information obtained from an output of thesensor 6 may be input. Note that also in this case, the position information is not necessarily input to theneural network compensator 72. For example, the velocity, the acceleration, or the jerk may be input. Also in this configuration, in the operation phase, the second manipulated variable can be determined in accordance with the expectation value of the probability distribution output from theneural network 73. In this manner, even in a case in which the difference (deviation) between the manipulation instruction and the position information is input to theneural network compensator 72, by using the expectation value of the probability distribution as the second manipulated variable in the operation phase, a deviation suppression effect similar to that in the learning phase can be obtained. - In the above description, the manipulated variable to be supplied to the
driver 8 is generated by adding the first manipulated variable output from the compensator 71 and the second manipulated variable output from the neural network compensator 72, but the compensator 71 is not always necessary. For example, the second manipulated variable output from the neural network compensator 72 may be supplied to the driver 8 intact. -
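A minimal sketch of this adder configuration follows. The bare proportional gain standing in for the compensator 71 is a hypothetical placeholder for illustration, not the compensator actually used:

```python
def combined_manipulated_variable(deviation, second_manipulated_variable, kp=2.0):
    """Add the first manipulated variable from a conventional compensator
    (a simple proportional term stands in for the compensator 71 here) to the
    second manipulated variable from the neural network compensator 72."""
    first_manipulated_variable = kp * deviation
    return first_manipulated_variable + second_manipulated_variable

def direct_manipulated_variable(second_manipulated_variable):
    """Without the compensator 71, the second manipulated variable is
    supplied to the driver 8 intact."""
    return second_manipulated_variable
```

The combined value, or the second manipulated variable alone, is what the driver receives in the two variants described above.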
FIG. 12 shows an example in which the system described above is applied to a scanning exposure apparatus 800 which is an example of a lithography apparatus. The scanning exposure apparatus 800 is a step-and-scan exposure apparatus that performs scanning exposure of a substrate 14 by slit-shaped light shaped using a slit. The scanning exposure apparatus 800 can include an illumination optical system 23, an original stage 12, a projection optical system 13, a substrate stage 15, an original stage position measurement device 17, a substrate stage position measurement device 18, a substrate mark measurement device 21, a substrate conveyor 22, a controller 24, and a temperature controller 25. - The
controller 24 can control the illumination optical system 23, the original stage 12, the projection optical system 13, the substrate stage 15, the original stage position measurement device 17, the substrate stage position measurement device 18, the substrate mark measurement device 21, and the substrate conveyor 22. The controller 24 can control a process of transferring a pattern formed in an original 11 to the substrate 14 (a process of performing scanning exposure of the substrate 14). The controller 24 is formed by, for example, a PLD (Programmable Logic Device) such as an FPGA (Field Programmable Gate Array), an ASIC (Application Specific Integrated Circuit), a general-purpose computer installed with a program, or a combination of all or some of these components. The controller 24 also includes a driver that controls an actuator. - The illumination
optical system 23 illuminates the original 11. The illumination optical system 23 can shape, using a light shielding member such as a masking blade, light emitted from a light source (not shown) into band-like or arcuate slit-shaped light long in the X direction, and illuminate a part of the original 11 with the slit-shaped light. The original 11 and the substrate 14 are held by the original stage 12 and the substrate stage 15, respectively, and are arranged at optically conjugate positions (the object plane and image plane of the projection optical system 13) via the projection optical system 13. - The projection
optical system 13 has a predetermined projection magnification (for example, ½ or ¼), and projects the pattern of the original 11 onto the substrate 14 by using the slit-shaped light. A region on the substrate 14 onto which the pattern of the original 11 is projected (a region irradiated with the slit-shaped light) is referred to as an irradiation region. The original stage 12 and the substrate stage 15 are configured to be movable in a direction (Y direction) orthogonal to the optical axis direction (Z direction) of the projection optical system 13. The original stage 12 and the substrate stage 15 are relatively scanned and driven, by respective drivers (not shown), in synchronization with each other at a velocity ratio corresponding to the projection magnification of the projection optical system 13. Thus, the substrate 14 is scanned in the Y direction with respect to the irradiation region, and the pattern formed in the original 11 is transferred to a shot region on the substrate 14. By sequentially performing the scanning exposure as described above for each of a plurality of shot regions of the substrate 14 while moving the substrate stage 15, an exposure process for one substrate 14 is completed. - The original stage
position measurement device 17 includes, for example, a laser interferometer, and measures the position of the original stage 12. For example, the laser interferometer emits a laser beam toward a reflector (not shown) provided on the original stage 12, and detects a displacement (a displacement from a reference position) of the original stage 12 based on the interference between the laser beam reflected by the reflector and the laser beam reflected by a reference surface. The original stage position measurement device 17 can acquire the current position of the original stage 12 based on the displacement. Here, the original stage position measurement device 17 measures the position of the original stage 12 by the interferometer using the laser beam, but the present invention is not limited to this. For example, an encoder may measure the position of the original stage 12. - The substrate stage
position measurement device 18 includes, for example, a laser interferometer, and measures the position of the substrate stage 15. For example, the laser interferometer emits a laser beam toward a reflector (not shown) provided on the substrate stage 15, and detects a displacement (a displacement from a reference position) of the substrate stage 15 based on the interference between the laser beam reflected by the reflector and the laser beam reflected by a reference surface. The substrate stage position measurement device 18 can acquire the current position of the substrate stage 15 based on the displacement. Here, the substrate stage position measurement device 18 measures the position of the substrate stage 15 by the interferometer using the laser beam, but the present invention is not limited to this. For example, an encoder may measure the position of the substrate stage 15. - The substrate
mark measurement device 21 includes, for example, an image sensor, and can detect the position of a mark provided on a substrate. Here, the substrate mark measurement device 21 of this embodiment detects the mark by the image sensor, but the present invention is not limited to this. For example, a transmissive sensor may detect the mark. The substrate conveyor 22 supplies a substrate to the substrate stage 15 and collects it therefrom. The temperature controller 25 keeps the temperature and humidity within the exposure apparatus constant. -
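The synchronized scanning described above, in which the original stage 12 and the substrate stage 15 move at a velocity ratio corresponding to the projection magnification, can be sketched numerically. The velocities below are hypothetical examples, not values from the apparatus:

```python
def original_stage_velocity(substrate_stage_velocity, projection_magnification):
    # With a reduction projection (for example, 1/4), the original stage must
    # scan faster than the substrate stage by the inverse of the magnification
    # so that the projected image stays synchronized with the substrate.
    return substrate_stage_velocity / projection_magnification

# A substrate stage scanning at 0.25 m/s under a 1/4 projection magnification
# requires the original stage to scan at 1.0 m/s.
v_original = original_stage_velocity(0.25, 0.25)
```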
FIG. 13 shows an operation example of the exposure apparatus illustrated in FIG. 12. In step S901, the substrate conveyor 22 supplies the substrate 14 onto the substrate stage 15. In step S902, the substrate stage 15 is driven such that a mark on the substrate 14 designated in an exposure recipe enters the measurement field of view of the substrate mark measurement device 21, and alignment of the substrate 14 is performed. In step S903, scanning exposure of the substrate 14 is performed for each shot region of the substrate 14. The exposure order and the exposure angle of view follow the designation in the exposure recipe. In step S904, the substrate conveyor 22 collects the substrate 14 from the substrate stage. - An example in which the system described above is applied to control of the substrate stage (movable portion) 15 will be described below. The
sensor 6 shown in FIG. 2 corresponds to the substrate stage position measurement device 18, the control board 7 corresponds to the controller 24, the driver 8 corresponds to a substrate stage driver (not shown), and the stage 5 corresponds to the substrate stage 15. When the system described above is applied to control of the substrate stage 15, a settling time, which is the time until the deviation converges after the substrate stage 15 is driven, can be shortened, so that the accuracy and throughput of the exposure apparatus can be improved. Also in a system for controlling the substrate stage 15, by determining, in the operation phase, the manipulated variable in accordance with the expectation value of the probability distribution used to determine the manipulated variable, a deviation suppression effect similar to that in the learning phase can be obtained. - An example in which the system described above is applied to control of the original stage (movable portion) 12 will be described below. The
control board 7 shown in FIG. 2 corresponds to the controller 24, the driver 8 corresponds to an original stage driver (not shown), the sensor 6 corresponds to the original stage position measurement device 17, and the stage 5 corresponds to the original stage 12. Also in a system for controlling the original stage 12, by determining, in the operation phase, the manipulated variable in accordance with the expectation value of the probability distribution used to determine the manipulated variable, a deviation suppression effect similar to that in the learning phase can be obtained. - An example in which the system described above is applied to control of the substrate conveyor (movable portion) 22 will be described below. The
control board 7 shown in FIG. 2 corresponds to the controller 24, the driver 8 corresponds to a substrate conveyor driver (for example, an AC servo motor) (not shown), the sensor 6 corresponds to a rotary encoder (not shown), and the stage 5 corresponds to the substrate conveyor 22. When the system described above is applied to control of the substrate conveyor 22, a deviation during driving of the substrate conveyor 22 can be suppressed, so that the reproducibility of the supply position upon supplying the substrate 14 to the substrate stage 15 can be improved. Further, by suppressing the deviation while increasing the acceleration and the velocity, it is also possible to improve the throughput. Also in a system for controlling the substrate conveyor 22, by determining, in the operation phase, the manipulated variable in accordance with the expectation value of the probability distribution used to determine the manipulated variable, a deviation suppression effect similar to that in the learning phase can be obtained. - So far, an application to the driving device of each of the substrate stage, the original stage, and the substrate conveyor in the scanning exposure apparatus has been described, but the present invention may be applied to another driving device in the scanning exposure apparatus. The present invention may also be applied to an exposure apparatus that performs exposure while stopping an original and a substrate, or may be applied to another lithography apparatus, for example, an imprint apparatus. Further, the present invention may be applied to another control device that controls an object to be controlled.
- Next, an article manufacturing method of manufacturing an article (a semiconductor IC element, a liquid crystal display element, a MEMS, or the like) using the above-described lithography apparatus will be described. The article manufacturing method can include a transfer step of transferring a pattern of an original to a substrate using the lithography apparatus, and a processing step of obtaining an article by processing the substrate to which the pattern has been transferred. When the lithography apparatus is an exposure apparatus, the article manufacturing method can include a transfer step of transferring a pattern of an original to a substrate (a wafer, a glass substrate, or the like) by exposing the substrate with a photosensitive agent applied thereto, and a processing step of obtaining an article by processing the substrate to which the pattern has been transferred. The processing step can include a step of developing the substrate (photosensitive agent). The processing step can further include other known steps, for example, steps for etching, resist removal, dicing, bonding, and packaging. According to this article manufacturing method, a higher-quality article than a conventional one can be manufactured.
- Note that the series of embodiments have been described using a stage control device and an exposure apparatus, but a control device having another configuration may be used.
- Embodiment(s) of the present invention can also be realized by a computer of a system or apparatus that reads out and executes computer executable instructions (e.g., one or more programs) recorded on a storage medium (which may also be referred to more fully as a ‘non-transitory computer-readable storage medium’) to perform the functions of one or more of the above-described embodiment(s) and/or that includes one or more circuits (e.g., application specific integrated circuit (ASIC)) for performing the functions of one or more of the above-described embodiment(s), and by a method performed by the computer of the system or apparatus by, for example, reading out and executing the computer executable instructions from the storage medium to perform the functions of one or more of the above-described embodiment(s) and/or controlling the one or more circuits to perform the functions of one or more of the above-described embodiment(s). The computer may comprise one or more processors (e.g., central processing unit (CPU), micro processing unit (MPU)) and may include a network of separate computers or separate processors to read out and execute the computer executable instructions. The computer executable instructions may be provided to the computer, for example, from a network or the storage medium. The storage medium may include, for example, one or more of a hard disk, a random-access memory (RAM), a read only memory (ROM), a storage of distributed computing systems, an optical disk (such as a compact disc (CD), digital versatile disc (DVD), or Blu-ray Disc (BD)™), a flash memory device, a memory card, and the like.
- While the present invention has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.
- This application claims the benefit of Japanese Patent Application No. 2021-126047, filed Jul. 30, 2021, which is hereby incorporated by reference herein in its entirety.
Claims (16)
1. A control device for controlling an object to be controlled, the device comprising:
a generator configured to generate a probability distribution used to determine a manipulated variable; and
a determinator configured to determine the manipulated variable based on the probability distribution generated by the generator,
wherein in an operation phase, the determinator determines the manipulated variable in accordance with an expectation value of the probability distribution.
2. The device according to claim 1, wherein
in a learning phase, the determinator determines, as the manipulated variable, a value of a random variable randomly determined in accordance with a temporarily set probability distribution.
3. The device according to claim 2, further comprising
a setter configured to set, in the learning phase, a parameter value that defines an operation of the generator based on a control result of the object which is controlled in accordance with the manipulated variable determined by the determinator.
4. The device according to claim 1, wherein
the probability distribution is a probability mass function.
5. The device according to claim 4, wherein
the generator includes a neural network that generates scores of a plurality of manipulated variable candidates.
6. The device according to claim 5, wherein
the generator further includes a convertor configured to convert an output of the neural network into the probability mass function.
7. The device according to claim 6, wherein
the convertor converts the output of the neural network in accordance with a Softmax function.
8. The device according to claim 1, wherein
the probability distribution is a probability density function.
9. The device according to claim 1, wherein
the generator receives a difference between a control instruction and state information indicating a state of the object, and generates the probability distribution in accordance with the difference.
10. The device according to claim 1, wherein
the generator receives a control instruction and state information indicating a state of the object, and generates the probability distribution based on the control instruction and the state information.
11. The device according to claim 9, wherein
the state information is a position of the object.
12. The device according to claim 9, wherein
the state information is one of a velocity, an acceleration, and a jerk of the object.
13. The device according to claim 9, further comprising:
a first compensator configured to generate a first manipulated variable based on the difference between the control instruction and the state information; and
an adder configured to generate a combined manipulated variable obtained by adding the first manipulated variable and the manipulated variable determined by the determinator,
wherein the combined manipulated variable is supplied to a driver configured to drive the object.
14. A lithography apparatus for transferring a pattern of an original to a substrate, the apparatus comprising:
a movable portion; and
a control device defined in claim 1, and configured to control the movable portion.
15. The apparatus according to claim 14, wherein
the movable portion is one of a substrate stage, an original stage, and a substrate conveyor.
16. An article manufacturing method comprising:
transferring a pattern of an original to a substrate using a lithography apparatus defined in claim 15; and
obtaining an article by processing the substrate to which the pattern has been transferred.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2021-126047 | 2021-07-30 | ||
JP2021126047A JP2023020593A (en) | 2021-07-30 | 2021-07-30 | Control device, lithography device and article manufacturing method |
Publications (1)
Publication Number | Publication Date |
---|---|
US20230034598A1 true US20230034598A1 (en) | 2023-02-02 |
Family
ID=85038585
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/872,178 Pending US20230034598A1 (en) | 2021-07-30 | 2022-07-25 | Control device, lithography apparatus, and article manufacturing method |
Country Status (4)
Country | Link |
---|---|
US (1) | US20230034598A1 (en) |
JP (1) | JP2023020593A (en) |
KR (1) | KR20230019022A (en) |
CN (1) | CN115685692A (en) |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP7085140B2 (en) | 2018-12-19 | 2022-06-16 | オムロン株式会社 | Control device, control method and control program |
-
2021
- 2021-07-30 JP JP2021126047A patent/JP2023020593A/en active Pending
-
2022
- 2022-07-18 KR KR1020220087915A patent/KR20230019022A/en active Search and Examination
- 2022-07-25 US US17/872,178 patent/US20230034598A1/en active Pending
- 2022-07-29 CN CN202210902358.8A patent/CN115685692A/en active Pending
Also Published As
Publication number | Publication date |
---|---|
KR20230019022A (en) | 2023-02-07 |
CN115685692A (en) | 2023-02-03 |
JP2023020593A (en) | 2023-02-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
AS | Assignment |
Owner name: CANON KABUSHIKI KAISHA, JAPAN Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KIYOHARA, NAOKI;KITA, NAOKI;SIGNING DATES FROM 20220713 TO 20220719;REEL/FRAME:060961/0215 |