CROSS-REFERENCE TO RELATED APPLICATIONS

This is a continuation of International Application PCT/JP00/00855, with an international filing date of Feb. 16, 2000, the entire content of which is hereby incorporated herein by reference. [0001]
BACKGROUND OF THE INVENTION

1. Field of the Invention [0002]

The present invention relates to a position detection method and position detector, an exposure method and exposure apparatus, and a device and device manufacturing method and, more particularly, to a position detection method and position detector for obtaining arrangement information of divided areas on an object, an exposure method and exposure apparatus using the position detection method, and a device manufactured using the exposure method and a manufacturing method thereof. [0003]

2. Description of the Related Art [0004]

In a lithography process for making semiconductor devices, liquid crystal display devices, and the like, an exposure apparatus is used which transfers a pattern formed on a mask or reticle (to be generally referred to as a “reticle” hereinafter), via a projection optical system, onto a substrate such as a wafer or glass plate coated with a resist or the like (to be referred to as a “sensitive substrate” or “wafer” as needed hereinafter). As such an exposure apparatus, a stationary exposure type projection exposure apparatus such as a so-called stepper, or a scanning exposure type projection exposure apparatus such as a so-called scanning stepper, is mainly used. [0005]

In these exposure apparatuses, position adjustment (alignment) between a reticle and a wafer must be accurately performed prior to exposure. To achieve this alignment, position detection marks (alignment marks) are formed (transferred by exposure) on the wafer in each of the shot regions in the previous lithography process, and the position of the wafer (or of a circuit pattern on the wafer) can be detected by detecting the positions of the alignment marks. Alignment is performed on the basis of the detected position of the wafer (or of a circuit pattern on the wafer). [0006]

Such alignment schemes include a die-by-die scheme for performing alignment by detecting alignment marks in each shot, and an enhanced global alignment (to be abbreviated as “EGA” hereinafter) scheme in which alignment marks (position adjustment marks transferred together with a circuit pattern) are measured at several positions in a wafer, the arrangement coordinate position of each shot area is then computed by a statistical scheme such as a least square approximation and, upon exposure, the wafer is stepped using the accuracy of a wafer stage and the computation result. This EGA scheme is disclosed in, e.g., Japanese Patent Laid-Open No. 61-44429 and corresponding U.S. Pat. No. 4,780,617. Of these schemes, the EGA scheme is prevalently used nowadays in terms of the throughput of the apparatus. [0007]

In this EGA scheme, in order to determine a plurality of parameters that uniquely specify the actual arrangement coordinate positions of shot areas relative to the ideal arrangement coordinate positions in terms of design, the positions of more alignment marks than the minimum number required to obtain the plurality of parameters are measured. Then, statistically valid parameter values are determined using a statistical scheme such as a least square approximation. [0008]
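The overdetermined fit described above can be sketched as follows. This is a minimal illustration only: it assumes a six-parameter linear model (X = a*x + b*y + c, Y = d*x + e*y + f) relating design coordinates to measured coordinates, which is not stated in this section, and the names `solve3` and `fit_affine` are hypothetical. Three marks would be the minimum for six parameters; the fit accepts more and solves the normal equations of the least square approximation.

```python
def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def fit_affine(design, measured):
    """Least-squares fit of X = a*x + b*y + c and Y = d*x + e*y + f.

    design   -- list of (x, y) design positions of the measured marks
    measured -- list of (X, Y) measured positions of the same marks
    Every mark is given the same weight (the premise of "prior art 1").
    Returns [a, b, c, d, e, f].
    """
    N = [[0.0] * 3 for _ in range(3)]   # normal matrix, shared by X and Y
    bx = [0.0] * 3                      # right-hand side for the X equation
    by = [0.0] * 3                      # right-hand side for the Y equation
    for (x, y), (X, Y) in zip(design, measured):
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                N[i][j] += row[i] * row[j]
            bx[i] += row[i] * X
            by[i] += row[i] * Y
    return solve3(N, bx) + solve3(N, by)
```

With five non-collinear marks, for example, the six parameters are recovered from ten scalar measurements, so the four surplus measurements average out random measurement error.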

Upon applying such a statistical scheme, error analysis is performed under the premise that “all position measurement results of alignment marks have the same reliability” (hereafter, this case is referred to as “prior art 1”). [0009]

Also, as disclosed in Japanese Patent Publication No. 7-120621, a technique has been proposed for determining the predetermined parameters by executing a fuzzy process based on fuzzy inference using statistical values such as the average value and variance of the measured positions of the alignment marks, and thereby obtaining the arrangement coordinate positions of the shot areas (hereafter, this case is referred to as “prior art 2”). [0010]

When all alignment marks are equally formed, the premise of prior art 1 that “all position measurement results of alignment marks have the same reliability” is true, but it is not true when the shapes of alignment marks differ depending on their positions on a substrate. In the latter case, all position measurement results, whether their reliabilities are high or low, contribute equally to the determination of the arrangement coordinate positions of shot areas despite their different reliabilities. [0011]

The accuracy of the arrangement coordinate position determination of shot areas determined under the above premise suffices to achieve conventionally required exposure accuracy, but does not suffice for the increase of integration degree in recent years. [0012]

Unlike prior art 1, prior art 2 is free from the above problem of accuracy in determining the arrangement coordinate positions of shot areas. However, since prior art 2 requires a huge computation volume for fuzzy inference, a long period of time is required to determine the arrangement coordinate positions of shot regions, and this makes it difficult to improve the throughput of exposure. In order to prevent such low throughput, a large-scale computation resource is needed. However, the use thereof causes the whole exposure apparatus to be large and complicated. [0013]

Upon applying the conventional statistical scheme, the positions of alignment marks to be measured on a wafer are determined empirically or on a trial-and-error basis: after a pattern is transferred onto a wafer while the wafer is aligned using a temporarily selected sample set, the transferred patterns on the wafer are measured, and if the desired results are not obtained, another sample set is selected. [0014]

As described above, in the conventional method, a sample set of alignment marks as a subset of a set of all alignment marks is determined by an aleatory method, and the validity of determination of that sample set is not quantitatively evaluated. Therefore, it is not guaranteed that the error distribution of the positions of a plurality of alignment marks as elements of a sample set determined by the conventional method appropriately reflects the error distribution of positions of all alignment marks. [0015]

To address this problem, the following method has been used: when position control is performed using a plurality of parameters which are obtained using a provisional sample set determined empirically or arbitrarily and which uniquely specify the arrangement coordinate positions of shot regions, and the sample set includes shot areas (so-called “isolated shots”) having much larger alignment errors than those of other shot regions, the alignment marks contained in such isolated shots are excluded from the sample set. This method presupposes that only a few isolated shots exist, and that their alignment marks decrease the alignment accuracy for all shot regions. [0016]

However, if alignment marks of two isolated shot regions whose pattern shift directions are almost opposite to each other (i.e., two shot areas having negative correlation) are selected, high-accuracy alignment is possible. Therefore, exclusion of the measured position information of alignment marks contained in an isolated shot area may result in a decrease in alignment accuracy. [0017]

In the case where a wafer is aligned on the basis of the position measurement results of alignment marks included in a subset (sample set) selected from a large number of alignment marks, it makes little sense, when examining the validity of a method for selecting the desired sample set, to evaluate the individual alignment marks in the sample set separately. This is because the sample set should ideally reflect the entire set broadly, and the alignment marks in the sample set preferably have a position distribution that corresponds to that of the alignment marks in the entire set. For example, in the case where one of five alignment marks in a sample set is an alignment mark contained in an isolated shot region, if one fifth of all the alignment marks are contained in isolated shot regions, the sample set is more valid than a sample set excluding alignment marks of isolated shot regions. That is, position errors of measured alignment marks reflect, in some way, the position distribution of all the alignment marks, and should not be carelessly ignored. However, there have been no proposals concerning a method for selecting a sample set from the entire set of alignment marks formed on a substrate so as to perform statistically valid position control and alignment on the basis of the position measurement results of alignment marks in the sample set. [0018]

Furthermore, although it is not appropriate to exclude alignment marks in descending order of position error amounts in order to reduce the number of sample alignment marks and alignment measurement time, there have been no proposals concerning a method for reducing the number of sample alignment marks while maintaining alignment accuracy. [0019]

That is, an alignment technology that can meet recent requirements of improved exposure accuracy and throughput is needed. [0020]

The present invention has been made considering the above situation, and a first object of the present invention is to provide a position detection method and position detector which can accurately and efficiently detect arrangement information of divided areas on an object. [0021]

A second object of the present invention is to provide an exposure method and exposure apparatus that can transfer a predetermined pattern onto a substrate with high accuracy. [0022]

A third object of the present invention is to provide a device on which fine patterns are accurately formed. [0023]

A fourth object of the present invention is to provide a manufacturing method of manufacturing a device on which fine patterns are accurately formed. [0024]
SUMMARY OF THE INVENTION

According to the first aspect, there is provided a first position detection method for detecting position information of any area on an object provided with a plurality of position-measurement-points, the position detection method comprising a measurement step of selecting, from the plurality of position-measurement-points, more position-measurement-points than a minimum number of measurements required to calculate values of a predetermined number of parameters, which uniquely specify position information of any area on the object, and measuring pieces of position information of the respective selected position-measurement-points; an estimation step of calculating respective positions of the selected position-measurement-points, based on the measurement results of the pieces of position information, and estimating probability density functions which each represent occurrence probability of the calculated position for a respective one of the selected position-measurement-points; a probability density calculation step of calculating probability density of the calculated position of each of the position-measurement-points, based on a respective one of the probability density functions; and a parameter calculation step of evaluating an error of each calculated position relative to a respective reference position while using the value of the respective calculated probability density as a piece of weight information and calculating values of the predetermined number of parameters, based on the evaluated errors. [0025]

According to this method, pieces of position information of selected position-measurement-points are measured, and the positions and probability densities of the selected position-measurement-points are calculated on the basis of the measurement results. Upon calculating the statistically most valid values of a predetermined number of parameters, which uniquely specify position information of any area on an object, while using the calculated probability densities as pieces of information each representing the certainty of the position of a respective position-measurement-point, the error between each calculated position and a respective reference position is weighted in accordance with the certainty of the calculated position of the respective position-measurement-point, i.e., the probability density at the calculated position of the respective position-measurement-point. That is, if the probability density is large, the weight is large, and if the probability density is small, the weight is small. As a result, when the calculated position of a position-measurement-point has high certainty, the degree of influence of the error of the calculated position of the position-measurement-point relative to its reference position is high; when the calculated position of a position-measurement-point has low certainty, the degree of influence of the error of the calculated position of the position-measurement-point relative to its reference position is low. Therefore, since statistically valid values of a predetermined number of parameters which uniquely specify position information of any area on an object can be calculated while rationally reflecting the respective certainties of the calculated positions of position-measurement-points, the position of an area of interest on the object can be accurately detected. [0026]

In the first position detection method of the present invention, the reference positions can be determined in advance on the basis of design information. [0027]

The probability density of each calculated mark (position-measurement-point) position directly reflects the certainty of the respective mark position. Therefore, in the first position detection method of the present invention, the errors are evaluated by multiplying the errors of the calculated positions relative to the respective reference positions by the respective probability densities of the calculated positions. [0028]
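The weighting described above can be sketched as follows. This is a simplified, hypothetical illustration, not the method as claimed: it assumes a one-dimensional model X = s*x + t (scale s and offset t), assumes that each error is multiplied by its probability density before the squares are summed (so the effective least-squares weight is the density squared), and uses the names `normal_pdf` and `weighted_fit_1d`, none of which appear in this document.

```python
import math


def normal_pdf(x, mean, sigma):
    """Probability density of a normal distribution at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def weighted_fit_1d(reference, calculated, densities):
    """Fit X = s*x + t by minimizing sum_i (p_i * (X_i - s*x_i - t))**2.

    reference  -- reference (design) positions x_i
    calculated -- calculated mark positions X_i
    densities  -- probability density p_i at each calculated position
    Returns (s, t).
    """
    w = [p * p for p in densities]  # error times density, then squared
    sw = sum(w)
    sx = sum(wi * x for wi, x in zip(w, reference))
    sX = sum(wi * X for wi, X in zip(w, calculated))
    sxx = sum(wi * x * x for wi, x in zip(w, reference))
    sxX = sum(wi * x * X for wi, x, X in zip(w, reference, calculated))
    det = sw * sxx - sx * sx        # determinant of the 2x2 normal matrix
    s = (sw * sxX - sx * sX) / det
    t = (sxx * sX - sx * sxX) / det
    return s, t
```

In this sketch, a mark whose calculated position has a broad distribution (large standard deviation, hence small density at the calculated position) pulls the fitted parameters far less than a mark with a narrow distribution, which is the behavior the paragraph describes.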

In the first position detection method of the present invention, normal distributions can be adopted as the probability density functions. It is particularly valid to presume the occurrence probability distribution to be a normal distribution when variations of errors of the calculated mark positions relative to the respective reference positions are expected to be random like normal random numbers. When the occurrence probability distribution is known, the probability density function of that probability distribution can be used instead of a normal distribution. On the other hand, when the occurrence probability distribution is unknown, it is rational to presume the occurrence probability distribution to be a normal distribution, which is the most general probability distribution. [0029]

In the first position detection method of the present invention, position measurement marks can be formed at the position-measurement-points. In such a case, the position of each position-measurement-point can be measured by detecting a respective position measurement mark. Note that the position measurement mark can be, e.g., a line-and-space mark, a box-in-box mark, or the like. [0030]

In this case, a plurality of position detection marks formed at the plurality of position-measurement-points can include a first number of first marks, of which surface states change in a first direction, and the position information of each first mark measured in the measurement step can be position information of a plurality of feature portions in the first direction of each first mark. In such a case, the position of a first mark in the first direction can be calculated by measuring and processing the position information of the first mark. Also, a probability density function that represents the occurrence probability of the calculated position can be estimated based on its design reference position and measured position information. When the first mark periodically changes in the first direction like a line-and-space mark, the average value of positions of a plurality of feature portions in the first direction such as the boundaries between lines and spaces, which represents the central position of the first mark, can be used as the position of the first mark, and the probability density function that represents the occurrence probability of the central position can be estimated. [0031]
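A minimal sketch of this calculation for a line-and-space mark is given below. It assumes that each measured edge yields one estimate of the mark center (measured edge position minus the design offset of that edge from the center), that the mark position is the average of these estimates, and that the probability density function is a normal distribution whose standard deviation is the sample standard deviation of the estimates. The function names and the choice of estimator are assumptions for illustration.

```python
import math
import statistics


def estimate_mark_position(measured_edges, design_offsets):
    """Return (centre, sigma) of a mark from its edge measurements.

    measured_edges -- measured positions of the line/space boundaries
    design_offsets -- design offset of each boundary from the mark centre
    """
    # Each edge gives one independent estimate of the mark centre.
    estimates = [m - d for m, d in zip(measured_edges, design_offsets)]
    centre = statistics.mean(estimates)
    sigma = statistics.stdev(estimates)   # sample standard deviation
    return centre, sigma


def density_at_centre(sigma):
    """Normal probability density evaluated at its own mean."""
    return 1.0 / (sigma * math.sqrt(2.0 * math.pi))
```

When the design offsets are symmetric about the mark center, the average of the center estimates equals the plain average of the measured edge positions, which matches the description above. A mark with ragged edges produces a larger `sigma` and therefore a smaller density, i.e., a lower certainty.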

The position information of each selected first mark can be measured in the measurement step at a plurality of positions in a direction perpendicular to the first direction. In this case, since the number of pieces of position information to be processed increases, the position of the first mark in the first direction can be accurately calculated, and the probability density function that represents the occurrence probability of the calculated position can be accurately estimated. [0032]

The surface state of each first mark can also change in a second direction different from the first direction, and the position information of the first mark measured in the measurement step can include position information, in the first direction, of a plurality of feature portions lined up in the first direction of the first mark, and position information, in the second direction, of a plurality of feature portions lined up in the second direction of the first mark. In this case, the two-dimensional position of the first mark can be calculated based on the measurement result of the position information of the first mark, and the probability density function that represents the occurrence probability of the calculated position can be estimated based on the design reference position and measured position information. That is, information that pertains to the two-dimensional position of the object can be calculated. [0033]

In the measurement step, for each selected first mark, at least one of position information in the first direction of a plurality of feature portions in the first direction and position information in the second direction of a plurality of feature portions in the second direction can be measured. In this case, since the number of pieces of position information to be processed increases for at least one of the first and second directions, the position of the first mark in a direction in which the number of pieces of position information to be processed has increased can be accurately calculated, and the probability density function that represents the occurrence probability of the calculated position can be accurately estimated. [0034]

The plurality of marks can further include a second number of second marks, of which surface states change in a second direction different from the first direction, and the position information of each second mark measured in the measurement step can be position information of a plurality of feature portions in the second direction of the second mark. In this case, the position of the second mark in the second direction can be calculated by measuring and processing position information of the second mark in the same manner as for the first mark, and a probability density function that represents the occurrence probability of the calculated position can be estimated based on its design reference position and measured position information. That is, information that pertains to a two-dimensional position of the object can be calculated. [0035]

In the measurement step, the position information of each selected second mark can be measured at a plurality of positions in a direction perpendicular to the second direction. In this case, since the number of pieces of position information to be processed increases, the position of the second mark in the second direction can be accurately calculated, and the probability density function that represents the occurrence probability of the calculated position can be accurately estimated. [0036]

Furthermore, in the first position detection method of the present invention in which position measurement marks are formed at position-measurement-points, a plurality of divided areas can be arranged on an object, and position measurement marks can be contained in each of the plurality of divided areas. In this case, the arrangement coordinate position of each divided area on the object can be accurately detected. [0037]

In addition, the predetermined number of parameters can include parameters associated with representative points of the plurality of divided areas. In this case, the arrangement of the representative points, e.g. the central points, of the plurality of divided areas on the object can be calculated, the arrangement being referred to as an arrangement coordinate system. [0038]

Note that the predetermined number of parameters can further include parameters associated with points other than the representative points of the plurality of divided areas. In this case, in addition to the arrangement coordinate system of representative points of the plurality of divided areas on the object, a divided area coordinate system that specifies the direction of pattern transfer, scale, and the like on the divided areas can be calculated. [0039]

According to the second aspect, there is provided a second position detection method for detecting position information of any area on an object provided with a first number of position-measurement-points, the position detection method comprising a first step of selecting a plurality of measurement point subsets which each consist of a third number of position-measurement-points and are different from one another, the third number being larger than a second number and smaller than the first number, the second number being a minimum number of measurement points required to calculate a predetermined number of parameters that uniquely specify position information of any area on the object; and a second step of statistically calculating, for each of the plurality of measurement point subsets, estimations of the predetermined number of parameters and certainty of the estimations, based on measurement results of the third number of position-measurement-points. For each measurement point subset, the certainty of the estimations of the predetermined number of parameters is calculated using the calculated estimations, and is determined in accordance with position errors of the position-measurement-points, which are used to calculate the estimations, relative to respective expected positions. If the deviation of position errors of the position-measurement-points used to calculate the estimations is large, the certainty is low; if the deviation of the position errors is small, the certainty is high. [0040]

According to this method, a plurality of different measurement point subsets are selected, and for each of the measurement point subsets, estimations of the predetermined number of parameters, which uniquely specify position information of any area on an object, and the certainty of those estimations are statistically calculated. The estimations of the predetermined number of parameters (to be also referred to as “position parameters” hereinafter) and their certainty for each measurement point subset reflect the position distribution of all position-measurement-points. Therefore, the position distribution of all position-measurement-points can be accurately estimated based on the respective groups of the estimations and their certainty for the plurality of measurement point subsets that are selected empirically or arbitrarily. [0041]
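The per-subset calculation can be sketched as follows under strong simplifying assumptions: the only parameter is a single translation offset, its estimation is the mean position error of the points in the subset, and the certainty is taken as the reciprocal of the sample variance of those errors (so a large deviation gives a low certainty, as described above). The function names, the enumeration of all subsets, and the small constant `eps` are hypothetical choices for illustration.

```python
import itertools
import statistics


def estimate_with_certainty(errors, eps=1e-12):
    """Return (estimation, certainty) for a translation-only model.

    errors -- position errors of the points in one subset, each measured
              relative to its expected position (at least two values)
    eps    -- hypothetical guard against division by zero
    """
    estimation = statistics.mean(errors)
    spread = statistics.variance(errors)   # sample variance of the errors
    return estimation, 1.0 / (spread + eps)


def evaluate_subsets(all_errors, subset_size):
    """Evaluate every subset of the given size.

    Returns a list of (indices, estimation, certainty) tuples.
    """
    results = []
    indices = range(len(all_errors))
    for subset in itertools.combinations(indices, subset_size):
        est, cert = estimate_with_certainty([all_errors[i] for i in subset])
        results.append((subset, est, cert))
    return results
```

In practice only some of the possible subsets would be chosen; enumerating all combinations here simply keeps the sketch deterministic.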

The second position detection method can further comprise a third step of obtaining statistically valid values of the predetermined number of parameters, based on the respective groups of estimations and certainty for the plurality of measurement point subsets calculated in the second step. In this case, since statistically valid position parameter values are calculated on the basis of the respective groups of the estimations of the position parameters and their certainty for the measurement point subsets, and the groups of the estimations and certainty each statistically reflect the predetermined number of parameters that are statistically determined by broadly sampling from all position-measurement-points, statistically valid values of the predetermined number of parameters for all the position-measurement-points can be accurately calculated. [0042]

Furthermore, the statistically valid value of each of the predetermined number of parameters can be obtained by calculating an average of the corresponding estimations weighted with the respective certainties, each of the certainties representing a piece of weight information for the respective estimation. In this case, since the weighted mean of the estimations is calculated using the respective certainties of the estimations as their weights, the estimations can be rationally evaluated such that estimations with a low certainty contribute less and estimations with a high certainty contribute more. Thus, statistically valid values of the predetermined number of parameters for all the position-measurement-points can be accurately and easily calculated. [0043]
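The weighted mean described above amounts to the following short calculation. The function name is hypothetical, and the sketch handles one parameter at a time; it would be applied separately to each of the predetermined number of parameters.

```python
def certainty_weighted_mean(estimations, certainties):
    """Average the per-subset estimations of one parameter,
    using the certainty of each estimation as its weight."""
    total = sum(certainties)
    if total <= 0.0:
        raise ValueError("the certainties must sum to a positive value")
    weighted_sum = sum(e * c for e, c in zip(estimations, certainties))
    return weighted_sum / total
```

For example, two estimations 1.0 and 3.0 with certainties 3.0 and 1.0 combine to 1.5, closer to the more certain estimation than the plain average of 2.0.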

In the second position detection method of the present invention, in the second step, certainties of position measurement results of the position-measurement-points can be taken into account for calculating the estimations and certainty thereof. In this case, since the estimations of the predetermined number of parameters and their certainty are calculated considering the certainties of the position measurement results at the position-measurement-points, statistically more valid values of the predetermined number of parameters can be calculated. [0044]

In addition, the second step can comprise an estimation step of calculating, for each of the plurality of measurement point subsets, respective positions of the third number of position-measurement-points based on measurement results of the third number of position-measurement-points and estimating probability density functions that each represent occurrence probability of the calculated position of the respective selected position-measurement-point; a probability density calculation step of calculating respective probability densities of the calculated positions of the position-measurement-points, based on the probability density functions; and a parameter calculation step of evaluating an error of each of the calculated positions relative to a respective reference position using the value of the respective calculated probability density as a piece of weight information and calculating estimations of the predetermined number of parameters, based on the evaluated errors. In this case, since errors between the calculated positions and the respective reference positions are weighted in accordance with the information of the certainties of the calculated positions of the position-measurement-points, i.e., the probability densities at the calculated positions of the position-measurement-points, statistically valid estimations of the predetermined number of parameters can be calculated which uniquely specify position information of any area on an object and rationally reflect the certainties of the calculated positions of the position-measurement-points. [0045]

In the second position detection method of the present invention, position measurement marks can be formed at the position-measurement-points as in the first position detection method of the present invention. In addition, a plurality of divided areas can be arranged on the object, and position measurement marks can be contained in each of the plurality of divided areas. [0046]

According to the third aspect, there is provided a third position detection method for detecting position information of any area on an object provided with a first number of position-measurement-points, the position detection method comprising a first step of selecting a first measurement point subset which consists of a third number of position-measurement-points, the third number being larger than a second number and smaller than the first number, the second number being a minimum number of measurement points required to calculate a predetermined number of parameters that uniquely specify position information of any area on the object; a second step of selecting a plurality of second measurement point subsets which each consist of a fourth number of position-measurement-points and are different from one another, the fourth number being larger than the second number and smaller than the third number; and a third step of statistically evaluating possibility of replacing the first measurement point subset by one of the plurality of second measurement point subsets, based on measurement results of the third number of position-measurement-points composing the first measurement point subset and measurement results of sets of the fourth number of position-measurement-points each composing one of the second measurement point subsets, the first measurement point subset being used to calculate the predetermined number of parameters. [0047]

According to this method, for each of the plurality of second measurement point subsets, it is evaluated whether or not it is possible to replace the initially selected sample set (first measurement point subset) by a sample set including a smaller number of elements. That is, to determine whether or not it is possible to reduce the number of position-measurement-points used to calculate the predetermined number of parameters, it is evaluated, based on the position measurement results at the position-measurement-points of the first measurement point subset and those of each second measurement point subset, whether or not the position error distribution of the position-measurement-points in the second measurement point subset is similar to that of the position-measurement-points of the first measurement point subset, in other words, whether or not the second measurement point subset and the first measurement point subset equally reflect the entire set of all position-measurement-points. Therefore, upon reducing the number of position-measurement-points as elements of a sample set, statistical validity of the calculated values of the predetermined number of parameters can be maintained. [0048]

In the third position detection method of the present invention, the third step can comprise a fourth step of statistically calculating estimations of the predetermined number of parameters and certainty of the estimations, based on measurement results of the third number of position-measurement-points composing the first measurement point subset; a fifth step of statistically calculating estimations of the predetermined number of parameters and certainty of the estimations for each of the plurality of second measurement point subsets, based on measurement results of the fourth number of position-measurement-points; and a sixth step of comparing the estimations and certainty of the first measurement point subset with the estimations and certainty for each of the plurality of second measurement point subsets and evaluating possibility of replacing the first measurement point subset by one of the plurality of second measurement point subsets, the first measurement point subset being used to calculate the predetermined number of parameters. [0049]

In this case, the estimations of the predetermined number of parameters and their certainty calculated based on the position measurement results at the position-measurement-points of the first measurement point subset are compared with those calculated based on the position measurement results at the position-measurement-points of each second measurement point subset. In this comparison, the certainties of the respective groups of the estimations of the two measurement point subsets are compared as well as the groups of the estimations, the certainties each reflecting the deviation of the position error distribution of the position-measurement-points of the respective measurement point subset. By examining the two comparison results, the position error distribution of the position-measurement-points of the first measurement point subset is compared with that of the position-measurement-points of the second measurement point subset. Therefore, it can be determined whether or not one of the plurality of second measurement point subsets and the first measurement point subset equally reflect the entire set of all position-measurement-points. [0050]
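The comparison described above can be sketched as follows. The sketch is hypothetical in several respects: it uses a translation-only model, takes the standard deviation of the position errors as the (inverse) measure of certainty, draws the smaller second subsets from the points of the first subset, and accepts a second subset when both its estimation and its deviation lie within the tolerances `est_tol` and `spread_tol` of those of the first subset. None of these names, tolerances, or acceptance criteria is specified in this section.

```python
import itertools
import statistics


def summarize(errors):
    """Return (estimation, deviation) of a subset's position errors."""
    return statistics.mean(errors), statistics.stdev(errors)


def replaceable_subsets(all_errors, first_subset, second_size,
                        est_tol, spread_tol):
    """List the smaller subsets that could stand in for first_subset.

    all_errors   -- position error of every point, indexed by point number
    first_subset -- indices of the points in the first subset
    second_size  -- number of points in each candidate second subset
    est_tol      -- allowed difference between the two estimations
    spread_tol   -- allowed difference between the two deviations
    """
    est1, dev1 = summarize([all_errors[i] for i in first_subset])
    accepted = []
    for second in itertools.combinations(first_subset, second_size):
        est2, dev2 = summarize([all_errors[i] for i in second])
        # Both the estimation and the error deviation must be similar.
        if abs(est2 - est1) <= est_tol and abs(dev2 - dev1) <= spread_tol:
            accepted.append(second)
    return accepted
```

A candidate that omits every point with a large error would show a smaller deviation than the first subset and would be rejected by the second test, even if its estimation happened to agree, which reflects the requirement that the two subsets reflect the entire set equally.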

Furthermore, in the fourth step, certainties of position measurement results of the position-measurement-points can be taken into account in calculating the estimations and the certainty thereof. In this case, since the estimations of the predetermined number of parameters and their certainty are calculated considering the certainties of the position measurement results at the position-measurement-points, statistically more valid values of the predetermined number of parameters can be calculated. [0051]

Additionally, the fourth step can comprise an estimation step of calculating respective positions of the third number of positionmeasurementpoints, which compose the first measurement point subset, based on measurement results of the third number of positionmeasurementpoints and estimating probability density functions that each represent occurrence probability of the calculated position of a respective point of the third number of positionmeasurementpoints; a probability density calculation step of calculating respective probability densities of the calculated positions of the positionmeasurementpoints, based on the probability density functions; and a parameter calculation step of evaluating an error of each of the calculated positions relative to respective reference position using the respective calculated probability density's value as a piece of weight information and calculating estimations of the predetermined number of parameters, based on the evaluated errors. In this case, since, for the third number of positionmeasurementpoints composing the first measurement point subset, errors between the calculated positions and their reference positions are weighted in accordance with the information of the certainties of the calculated positions of the positionmeasurementpoints, i.e. the probability densities at the calculated positions of the positionmeasurementpoints, statistically valid estimations of the predetermined number of parameters can be calculated which uniquely specify position information of any area on an object and rationally reflect the certainties of the calculated positions of the positionmeasurementpoints. [0052]
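The weighting described above can be illustrated with a minimal sketch. It assumes a simplified one-axis model with only two parameters (a scale and an offset), and it takes the probability-density values as given inputs; the model, the function name `weighted_line_fit`, and the sample numbers are assumptions made for illustration and are not taken from this specification.

```python
def weighted_line_fit(design, measured, weights):
    """Return (scale, offset) minimizing sum(w * (m - (scale*d + offset))**2).

    design   : reference (design) positions of the measurement points
    measured : calculated positions of the same points
    weights  : probability-density value at each calculated position,
               used as the weight of that point's error (hypothetical input)
    """
    total_w = sum(weights)
    mean_d = sum(w * d for w, d in zip(weights, design)) / total_w
    mean_m = sum(w * m for w, m in zip(weights, measured)) / total_w
    # Weighted variance of the design positions and weighted covariance.
    s_dd = sum(w * (d - mean_d) ** 2 for w, d in zip(weights, design))
    s_dm = sum(w * (d - mean_d) * (m - mean_m)
               for w, d, m in zip(weights, design, measured))
    scale = s_dm / s_dd
    offset = mean_m - scale * mean_d
    return scale, offset


# Illustrative data: points with a higher density value pull the fit harder.
design = [0.0, 10.0, 20.0, 30.0]
measured = [1.0 + 1.001 * d for d in design]
weights = [0.4, 0.1, 0.3, 0.2]
scale, offset = weighted_line_fit(design, measured, weights)
```

A point whose calculated position has a low probability density contributes little to the sum being minimized, which is the sense in which the estimations reflect the certainties of the calculated positions.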

In addition, in the fifth step, certainties of position measurement results of the positionmeasurementpoints can be taken into account upon calculating the estimations and certainty thereof. Moreover, the fifth step can comprise an estimation step of calculating respective positions of the fourth number of positionmeasurementpoints, for each of the plurality of second measurement point subsets, based on measurement results of the fourth number of positionmeasurementpoints and estimating probability density functions that each represent occurrence probability of the calculated position of a respective point of the fourth number of positionmeasurementpoints; a probability density calculation step of calculating respective probability densities of the calculated positions of the positionmeasurementpoints, based on the probability density functions; and a parameter calculation step of evaluating an error of each of the calculated positions relative to respective reference position using the respective calculated probability density's value as a piece of weight information and calculating estimations of the predetermined number of parameters, based on the evaluated errors. [0053]

In addition, in the third position detection method according to this invention, the third step can comprise a fourth step of statistically calculating, for each of the second measurement point subsets, estimations of the predetermined number of parameters and certainty of the estimations, based on measurement results of the fourth number of position-measurement-points; and a fifth step of statistically calculating position errors of the position-measurement-points of the first measurement point subset through use of the estimations of the predetermined number of parameters calculated in the fourth step and evaluating possibility of replacing the first measurement point subset by one of the plurality of second measurement point subsets. [0054]

In this case, by calculating position errors of positionmeasurementpoints composing the first measurement point subset by using the estimations, of the predetermined number of parameters, calculated on the basis of the position measurement results at the positionmeasurementpoints of each second measurement point subset, the position error distribution of positionmeasurementpoints in the first measurement point subset can be obtained. Therefore, without calculating the estimations and their certainty of the predetermined number of parameters on the basis of the position measurement results at the positionmeasurementpoints of the first measurement point subset, it can be determined whether or not one of the plurality of second measurement point subsets and the first measurement point subset equally reflect the entire set of all positionmeasurementpoints. [0055]
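As a hedged sketch of this evaluation, the parameters estimated from a second measurement point subset can be applied to the points of the first measurement point subset and the spread of the resulting position errors examined. The two-parameter one-axis model, the RMS criterion, the `tolerance` threshold, and all function names below are assumptions made for illustration; the specification does not prescribe this particular criterion.

```python
import math


def position_errors(params, design, measured):
    """Errors of measured positions relative to positions predicted by params."""
    scale, offset = params
    return [m - (scale * d + offset) for d, m in zip(design, measured)]


def rms(values):
    """Root-mean-square of a non-empty list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def can_replace(params_from_second, first_design, first_measured, tolerance):
    """Hypothetical criterion: the second subset may replace the first subset
    when its parameters predict the first subset's points within tolerance."""
    errors = position_errors(params_from_second, first_design, first_measured)
    return rms(errors) <= tolerance
```

No parameter estimation over the first measurement point subset is needed here; only its measured positions are used, which matches the saving described above.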

Especially, in the case where a second measurement point subset to be compared is a subset of the first measurement point subset, by calculating position errors of positionmeasurementpoints which are included in the first measurement point subset but not included in the second measurement point subset, the estimations and their certainty of the predetermined number of parameters can be calculated for the case where the first measurement point subset is used as a sample set. Therefore, it can be quickly determined whether or not one of the plurality of second measurement point subsets and the first measurement point subset equally reflect the entire set of all positionmeasurementpoints. [0056]

Furthermore, in the fourth step, certainties of position measurement results of the position-measurement-points can be taken into account in calculating the estimations and the certainty thereof. In this case, since the estimations of the predetermined number of parameters and their certainty are calculated considering the certainties of the position measurement results at the position-measurement-points, statistically more valid values of the predetermined number of parameters can be calculated. [0057]

In addition, the fourth step can comprise an estimation step of calculating respective positions of the fourth number of positionmeasurementpoints, for each of the plurality of second measurement point subsets, based on measurement results of the fourth number of positionmeasurementpoints and estimating probability density functions that each represent occurrence probability of the calculated position of a respective point of the fourth number of positionmeasurementpoints; a probability density calculation step of calculating respective probability densities of the calculated positions of the positionmeasurementpoints, based on the probability density functions; and a parameter calculation step of evaluating an error of each of the calculated positions relative to respective reference position using the respective calculated probability density's value as a piece of weight information and calculating estimations of the predetermined number of parameters, based on the evaluated errors. In this case, since, for each of the second measurement point subsets, errors between the calculated positions and their reference positions are weighted in accordance with the information of the certainties of the calculated positions of the positionmeasurementpoints, i.e. the probability densities at the calculated positions of the positionmeasurementpoints, statistically valid estimations of the predetermined number of parameters can be calculated which uniquely specify position information of any area on an object and rationally reflect the certainties of the calculated positions of the positionmeasurementpoints. [0058]

Furthermore, the third position detection method according to this invention can further comprise a fourth step which, if the third step finds second measurement point subsets that can replace the first measurement point subset, selects the most valid second measurement point subset for replacement and adopts estimations of the predetermined number of parameters, calculated based on measurement results of the fourth number of position-measurement-points of the selected second measurement point subset, as values thereof, and which, if the third step finds no second measurement point subset that can replace the first measurement point subset, adopts estimations of the predetermined number of parameters, calculated based on measurement results of the third number of position-measurement-points of the first measurement point subset, as values thereof. [0059]

In this case, if the third step finds second measurement point subsets that can replace the first measurement point subset, i.e. if the number of position-measurement-points can be reduced while maintaining the statistical validity, the most valid second measurement point subset for replacement is adopted as the sample set. On the other hand, if the third step finds no second measurement point subset that can replace the first measurement point subset, i.e. if the number of position-measurement-points cannot be reduced while maintaining the statistical validity, the first measurement point subset is adopted as the sample set. Then, the estimations of the predetermined number of parameters calculated based on the position measurement results at the position-measurement-points of the sample set are adopted as the values of the predetermined number of parameters. Therefore, the number of position-measurement-points used to calculate the values of the predetermined number of parameters can be reduced while maintaining the statistical validity, and the position detection speed can be improved while maintaining the accuracy. [0060]
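The choice of the sample set can be sketched as follows. The representation of each evaluated candidate as a tuple of the subset, a replaceability flag, and a numeric validity score is an assumption made for illustration, as is the function name `choose_sample_set`; how the validity is scored is left open here.

```python
def choose_sample_set(first_subset, evaluated_candidates):
    """Pick the sample set used to calculate the parameter values.

    evaluated_candidates: list of (subset, replaceable, validity_score)
    tuples, one per second measurement point subset (hypothetical format).
    """
    replaceable = [c for c in evaluated_candidates if c[1]]
    if not replaceable:
        # No smaller subset preserves statistical validity: keep the first subset.
        return first_subset
    # Otherwise adopt the most valid of the replaceable second subsets.
    return max(replaceable, key=lambda c: c[2])[0]
```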

In the third position detection method according to this invention, position measurement marks can be formed at the position-measurement-points in the same manner as in the first position detection method, and a plurality of divided areas, each of which is provided with the position measurement marks, can be arranged on the object. [0061]

According to the fourth aspect of this invention, there is provided a fourth position detection method for detecting position information of any area on an object provided with a first number of positionmeasurementpoints, the position detection method comprising a first step of selecting a plurality of first measurement point subsets which each consist of a third number of positionmeasurementpoints and are different from one another, the third number being larger than a second number and smaller than the first number, the second number being a minimum number of measurement points required to calculate a predetermined number of parameters that uniquely specify position information of any area on the object; a second step of selecting a plurality of second measurement point subsets which each consist of a fourth number of positionmeasurementpoints and are different from one another, the fourth number being larger than the second number and smaller than the third number; and a third step of statistically evaluating possibility of replacing the plurality of first measurement point subsets by one of the plurality of second measurement point subsets, as a measurement point set used to calculate the predetermined number of parameters. [0062]
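The ordering of the four numbers in this method (second number < fourth number < third number < first number) and the requirement that the subsets differ from one another can be sketched as below. Exhaustive enumeration with `itertools.combinations` is used only for illustration; the function names and the example sizes are assumptions, and in practice the subsets may be selected in any other manner.

```python
from itertools import combinations


def check_sizes(first, second, third, fourth):
    """True when second < fourth < third < first, as the method requires."""
    return second < fourth < third < first


def candidate_subsets(points, size):
    """All mutually different subsets of the given size (illustrative only)."""
    return [frozenset(c) for c in combinations(points, size)]


# Hypothetical sizes: 10 points on the object, 3 points as the minimum,
# first subsets of 6 points, second subsets of 4 points.
sizes_ok = check_sizes(first=10, second=3, third=6, fourth=4)
second_subsets = candidate_subsets(range(10), 4)
```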

According to this method, for each of the plurality of second measurement point subsets, it is evaluated whether or not it is possible to replace the plurality of initially selected sample sets (the plurality of first measurement point subsets) by one sample set composed of a fewer number of elements. That is, to determine whether or not the number of positionmeasurementpoints used to calculate values of the predetermined number of parameters and the processing volume of the position measurement results can be reduced, it is evaluated whether or not the position error distribution of positionmeasurementpoints in one of the plurality of second measurement point subsets is similar to a position error distribution, for all positionmeasurementpoints, estimated based on the position measurement results at the positionmeasurementpoints of the plurality of first measurement point subsets. Therefore, upon reducing the number of positionmeasurementpoints as elements of a sample set and reducing the processing volume of the position measurement results, statistical validity of the calculated values of the predetermined number of parameters can be maintained. [0063]

In the fourth position detection method according to this invention, the third step can comprise a fourth step of statistically calculating, for each of the plurality of first measurement point subsets, estimations of the predetermined number of parameters and certainty of the estimations, based on measurement results of the third number of positionmeasurementpoints; a fifth step of calculating statistically valid estimations of the predetermined number of parameters and certainty of the estimations, based on groups of the estimations and certainty thereof for the plurality of first measurement point subsets, calculated in the fourth step; a sixth step of statistically calculating estimations of the predetermined number of parameters and certainty of the estimations for each of the plurality of second measurement point subsets, based on measurement results of the fourth number of positionmeasurementpoints; and a seventh step of comparing the statistically valid estimations and certainty with the estimations and certainty for each of the plurality of second measurement point subsets and evaluating possibility of adopting one of the plurality of second measurement point subsets as a measurement point set used to calculate the predetermined number of parameters. [0064]

In this case, the statistically valid estimations of the predetermined number of parameters and their certainty, calculated based on the position measurement results at the position-measurement-points of the plurality of first measurement point subsets, are compared with the estimations of the predetermined number of parameters and their certainty calculated based on the position measurement results at the position-measurement-points of each second measurement point subset. In this comparison, the certainties of the two groups of the estimations are compared as well as the groups of the estimations, the certainties each reflecting deviation of the position error distribution of the position-measurement-points of the respective measurement point subsets. By examining the two comparison results, the two position error distributions are compared. Therefore, it can be determined whether or not one of the plurality of second measurement point subsets and the plurality of first measurement point subsets equally reflect the entire set of all position-measurement-points. [0065]

Furthermore, in the fourth step, certainties of position measurement results of the position-measurement-points can be taken into account in calculating the estimations and the certainty thereof. In this case, since the estimations of the predetermined number of parameters and their certainty are calculated considering the certainties of the position measurement results at the position-measurement-points, statistically more valid values of the predetermined number of parameters can be calculated. [0066]

Furthermore, the fourth step can comprise an estimation step of calculating respective positions of the third number of positionmeasurementpoints, for each of the plurality of first measurement point subsets, based on measurement results of the third number of positionmeasurementpoints and estimating probability density functions that each represent occurrence probability of the calculated position of a respective point of the third number of positionmeasurementpoints; a probability density calculation step of calculating respective probability densities of the calculated positions of the positionmeasurementpoints, based on the probability density functions; and a parameter calculation step of evaluating an error of each of the calculated positions relative to respective reference position using the respective calculated probability density's value as a piece of weight information and calculating estimations of the predetermined number of parameters, based on the evaluated errors. In this case, since, for each of the first measurement point subsets, errors between the calculated positions and their reference positions are weighted in accordance with the information of the certainties of the calculated positions of the positionmeasurementpoints, i.e. the probability densities at the calculated positions of the positionmeasurementpoints, statistically valid estimations of the predetermined number of parameters can be calculated which uniquely specify position information of any area on an object and rationally reflect the certainties of the calculated positions of the positionmeasurementpoints. [0067]

In addition, in the sixth step, certainties of position measurement results of the positionmeasurementpoints can be taken into account upon calculating the estimations and certainty thereof. And the sixth step can comprise an estimation step of calculating respective positions of the fourth number of positionmeasurementpoints, for each of the plurality of second measurement point subsets, based on measurement results of the fourth number of positionmeasurementpoints and estimating probability density functions that each represent occurrence probability of the calculated position of a respective point of the fourth number of positionmeasurementpoints; a probability density calculation step of calculating respective probability densities of the calculated positions of the positionmeasurementpoints, based on the probability density functions; and a parameter calculation step of evaluating an error of each of the calculated positions relative to respective reference position using the respective calculated probability density's value as a piece of weight information and calculating estimations of the predetermined number of parameters, based on the evaluated errors. [0068]

Furthermore, the fourth position detection method can further comprise an eighth step which, if the third step finds second measurement point subsets that can replace the plurality of first measurement point subsets, selects the most valid second measurement point subset for replacement and adopts estimations of the predetermined number of parameters, calculated based on measurement results of the fourth number of position-measurement-points of the selected second measurement point subset, as values thereof, and which, if the third step finds no second measurement point subset that can replace the plurality of first measurement point subsets, adopts the statistically valid estimations as values of the predetermined number of parameters. Therefore, the number of position-measurement-points used to calculate the values of the predetermined number of parameters can be reduced while maintaining the statistical validity, and the position detection speed can be improved while maintaining the accuracy. [0069]

In addition, in the fourth position detection method, the third step can comprise a fourth step of statistically calculating, for each of the second measurement point subsets, estimations of the predetermined number of parameters and certainty of the estimations, based on measurement results of the fourth number of position-measurement-points; and a fifth step of calculating position errors of all the position-measurement-points of the plurality of first measurement point subsets through use of the estimations of the predetermined number of parameters calculated for each of the second measurement point subsets and evaluating possibility of replacing the plurality of first measurement point subsets by one of the plurality of second measurement point subsets. [0070]

In this case, by calculating position errors of positionmeasurementpoints of the plurality of first measurement point subsets by using the estimations, of the predetermined number of parameters, calculated based on the position measurement results at the positionmeasurementpoints of each second measurement point subset, the position error distribution for all positionmeasurementpoints, which will be estimated if the plurality of first measurement point subsets serve as the sample set, can be obtained. Therefore, without calculating groups of the estimations and their certainty of the predetermined number of parameters on the basis of the position measurement results at the positionmeasurementpoints of the plurality of first measurement point subsets and thus the statistically valid estimations and their certainty of the predetermined number of parameters, it can be determined whether or not one of the plurality of second measurement point subsets reflects the entire set of all positionmeasurementpoints. [0071]

Furthermore, in the fourth step, certainties of position measurement results of the position-measurement-points can be taken into account in calculating the estimations and the certainty thereof. In this case, since the estimations of the predetermined number of parameters and their certainty are calculated considering the certainties of the position measurement results at the position-measurement-points, statistically more valid values of the predetermined number of parameters can be calculated. [0072]

Furthermore, the fourth step can comprise an estimation step of calculating respective positions of the fourth number of positionmeasurementpoints, for each of the plurality of second measurement point subsets, based on measurement results of the fourth number of positionmeasurementpoints and estimating probability density functions that each represent occurrence probability of the calculated position of a respective point of the fourth number of positionmeasurementpoints; a probability density calculation step of calculating respective probability densities of the calculated positions of the positionmeasurementpoints, based on the probability density functions; and a parameter calculation step of evaluating an error of each of the calculated positions relative to respective reference position using the respective calculated probability density's value as a piece of weight information and calculating estimations of the predetermined number of parameters, based on the evaluated errors. In this case, since, for each of the second measurement point subsets, errors between the calculated positions and their reference positions are weighted in accordance with the information of the certainties of the calculated positions of the positionmeasurementpoints, i.e. the probability densities at the calculated positions of the positionmeasurementpoints, statistically valid estimations of the predetermined number of parameters can be calculated which uniquely specify position information of any area on an object and rationally reflect the certainties of the calculated positions of the positionmeasurementpoints. [0073]

In addition, the fourth position detection method according to this invention can further comprise the fourth step which, if the third step finds second measurement point subsets that can replace the first measurement point subset, selects the most valid one, for replacement, of the second measurement point subsets and adopts estimations of the predetermined number of parameters, calculated based on measurement results of the fourth number of positionmeasurementpoints of the selected second measurement point subset, as values thereof, and which, if the third step finds no second measurement point subsets that can replace the first measurement point subset, statistically calculates estimations of the predetermined number of parameters and certainty of the estimations, based on measurement results of the third number of positionmeasurementpoints for each of the plurality of first measurement point subsets, and adopts as values of the predetermined number of parameters statistically valid estimations thereof calculated based on groups of the estimations and certainty thereof for the plurality of first measurement point subsets. Therefore, the number of positionmeasurementpoints used to calculate the values of the predetermined number of parameters can be reduced while maintaining the statistical validity, and improvement of the position detection speed can be achieved maintaining the accuracy. [0074]

Note that in the fourth position detection method of this invention, in the same manner as in the second position detection method, the statistically valid value of each of the predetermined number of parameters can be obtained by calculating an average of the corresponding estimations weighted with the respective certainties, each of the certainties serving as a piece of weight information for the respective estimation. [0075]
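The certainty-weighted average described above can be written out as a short sketch. It assumes one scalar certainty per measurement point subset, applied to every parameter of that subset's estimation; the function name `combine_estimates` and the sample values are hypothetical.

```python
def combine_estimates(estimates, certainties):
    """Certainty-weighted average of per-subset parameter estimations.

    estimates   : one list of parameter estimations per measurement point subset
    certainties : one non-negative weight per subset (assumed scalar here)
    """
    total = sum(certainties)
    n_params = len(estimates[0])
    return [
        sum(c * est[k] for est, c in zip(estimates, certainties)) / total
        for k in range(n_params)
    ]


# Two subsets, two parameters each; the second subset is three times as certain.
combined = combine_estimates([[1.0, 2.0], [3.0, 6.0]], [1.0, 3.0])
```

An estimation with a larger certainty moves the combined value toward itself, so a subset whose position error distribution is tightly concentrated dominates the result.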

Furthermore, in the fourth position detection method of this invention, position measurement marks can be formed at the position-measurement-points in the same manner as in the first position detection method, and a plurality of divided areas, each of which is provided with the position measurement marks, can be arranged on the object. [0076]

According to the fifth aspect of this invention, there is provided a first position detector that detects position information of any area on an object provided with a plurality of positionmeasurementpoints, the position detector comprising a measurement unit that measures pieces of position information of more positionmeasurementpoints than a minimum number of measurements required to calculate values of a predetermined number of parameters, which uniquely specify position information of any area on the object, the positionmeasurementpoints being selected from the plurality of positionmeasurementpoints; an estimation unit that is electrically connected to the measurement unit and that detects respective positions of the selected positionmeasurementpoints, based on the measurement results of the pieces of position information, estimates probability density functions which each represent occurrence probability of the detected position for respective one of the selected positionmeasurementpoints, and calculates probability density of the detected position of each of the positionmeasurementpoints; and a parameter calculation unit that is electrically connected to the measurement unit and the estimation unit and that evaluates detection error of each of the detected positions while using the respective calculated probability density's value as a piece of weight information and calculates such values of the predetermined number of parameters that the detection errors become statistically minimum as a whole, based on the detection errors. [0077]

In this detector, in accordance with the first position detection method of the present invention, the estimation unit calculates the positions of the selected marks (position-measurement-points) and the respective probability densities at the calculated mark positions on the basis of the position information of the marks measured by the measurement unit, and the parameter calculation unit calculates the values of the predetermined number of parameters that uniquely specify position information of any area on the object. Therefore, the predetermined number of parameters can be accurately calculated, and the position information of any area on the object can be accurately detected. [0078]

In the first position detector of the present invention, the measurement unit can comprise an image pickup unit for picking up images of marks formed on the object. In this case, the position information of a selected mark can be measured on the basis of changes in light intensity according to position in the picked-up mark image. [0079]

According to the sixth aspect, there is provided a second position detector that detects position information of any area on an object provided with a first number of position-measurement-points, the position detector comprising a measurement unit that measures positions of the position-measurement-points; a set-selection unit that selects a plurality of measurement point subsets which each consist of a third number of position-measurement-points and are different from one another, the third number being larger than a second number and smaller than the first number, the second number being a minimum number of measurement points required to calculate a predetermined number of parameters that uniquely specify position information of any area on the object; and an estimation computing unit that is electrically connected to the measurement unit and the set-selection unit and that statistically calculates, for each of the plurality of measurement point subsets, estimations of the predetermined number of parameters and certainty of the estimations, based on measurement results of the third number of position-measurement-points. [0080]

In this detector, for each measurement point subset selected by the set-selection unit, in accordance with the second position detection method of the present invention, the estimation computing unit statistically estimates values of the predetermined number of parameters which uniquely specify position information of any area on the object and calculates the certainty of the estimations on the basis of the positions of the position-measurement-points measured by the measurement unit. Therefore, the position distribution of all position-measurement-points can be accurately estimated based on the respective groups of the estimations of the predetermined number of parameters and their certainty for the plurality of measurement point subsets that are selected empirically or arbitrarily. [0081]

In the second position detector according to this invention, the estimation computing unit can comprise an estimation unit that, for each of the plurality of measurement point subsets, detects respective positions of the third number of position-measurement-points, based on the measurement results of the pieces of position information of the third number of position-measurement-points, estimates probability density functions which each represent occurrence probability of the detected position for the respective point of the third number of position-measurement-points, and calculates probability density of the detected position of each of the position-measurement-points; and a parameter calculation unit that is electrically connected to the estimation unit and that evaluates detection error of each of the detected positions while using the respective calculated probability density's value as a piece of weight information and calculates such values of the predetermined number of parameters that the detection errors become statistically minimum as a whole, based on the detection errors. In this case, for each of the plurality of measurement point subsets, the parameter calculation unit weights errors between the calculated positions and their reference positions in accordance with the information of the certainties of the calculated positions of the position-measurement-points calculated by the estimation unit, i.e. the probability densities at the calculated positions of the position-measurement-points, and calculates estimations of the predetermined number of parameters which uniquely specify position information of any area on the object and rationally reflect the certainties of the calculated positions of the position-measurement-points. Therefore, statistically valid estimations of the predetermined number of parameters can be obtained. [0082]

In addition, the second position detector according to this invention can further comprise a parameter value determining unit that is electrically connected to the estimation computing unit and that calculates statistically valid estimations of the predetermined number of parameters based on groups of the estimations and certainty thereof, calculated by the estimation computing unit, for the plurality of measurement point subsets. In this case, the statistically valid values of the predetermined number of parameters for all position-measurement-points can be accurately obtained. [0083]

According to the seventh aspect of this invention, there is provided a third position detector that detects position information of any area on an object provided with a first number of position-measurement-points, the position detector comprising a measurement unit that measures positions of the position-measurement-points; a set-selection unit that selects a first measurement point subset, which consists of a third number of position-measurement-points, and a plurality of second measurement point subsets, which each consist of a fourth number of position-measurement-points and are different from one another, the third number being larger than a second number and smaller than the first number, the second number being a minimum number of measurement points required to calculate a predetermined number of parameters that uniquely specify position information of any area on the object, the fourth number being larger than the second number and smaller than the third number; and an evaluation computing unit that is electrically connected to the set-selection unit and that evaluates the possibility of replacing the first measurement point subset by one of the plurality of second measurement point subsets, the first measurement point subset being used to calculate the predetermined number of parameters. [0084]
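The ordering of the four numbers (second number < fourth number < third number < first number) can be made concrete with a short sketch of a set-selection step. The random selection strategy, the choice to draw the second subsets from all points, and the names select_subsets, min_points, and n_candidates are assumptions for illustration only; the text does not prescribe how the subsets are chosen.

```python
import math
import random
from typing import List, Sequence, Tuple


def select_subsets(points: Sequence,
                   min_points: int,
                   third: int,
                   fourth: int,
                   n_candidates: int,
                   seed: int = 0) -> Tuple[List, List[List]]:
    """Pick one first subset (third points) and several distinct
    second subsets (fourth points each) from all measurement points.

    Assumption: subsets are drawn at random (hypothetical strategy).
    """
    first = len(points)
    # second number < fourth number < third number < first number
    if not (min_points < fourth < third < first):
        raise ValueError("need min_points < fourth < third < len(points)")
    if n_candidates > math.comb(first, fourth):
        raise ValueError("not enough distinct subsets of that size")
    rng = random.Random(seed)
    first_subset = rng.sample(list(points), third)
    seen = set()
    candidates: List[List] = []
    while len(candidates) < n_candidates:
        pick = tuple(sorted(rng.sample(range(first), fourth)))
        if pick not in seen:  # keep the second subsets mutually different
            seen.add(pick)
            candidates.append([points[i] for i in pick])
    return first_subset, candidates


# Example: 20 points, at least 3 needed, first subset of 10,
# four different replacement candidates of 5 points each.
first_subset, candidates = select_subsets(list(range(20)), 3, 10, 5, 4)
```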

In this detector, according to the third position detection method of this invention, the evaluation computing unit statistically evaluates based on positions of positionmeasurementpoints measured by the measurement unit whether or not it is possible to replace the first measurement point subset as a sample set composed of positionmeasurementpoints to be measured to calculate values of the predetermined number of parameters by one of the plurality of second measurement point subsets each of which is composed of a fewer number of elements than the first measurement point subset. Therefore, upon reducing the number of positionmeasurementpoints as elements of a sample set, statistical validity of the calculated values of the predetermined number of parameters can be maintained. [0085]

In the position detector according to this invention, the evaluation computing unit can comprise an estimation calculation unit that is electrically connected to the measurement unit and that statistically calculates, for the specific measurement point subset, estimations of the predetermined number of parameters and certainty of the estimations, based on measurement results of position information of positionmeasurementpoints composing the specific measurement point subset which is selected from the first measurement point subset and the plurality of second measurement point subsets; and an evaluation unit that is electrically connected to the estimation calculation unit and that compares the estimations and certainty of the first measurement point subset with the estimations and certainty for each of the plurality of second measurement point subsets and evaluates possibility of replacing the first measurement point subset by one of the plurality of second measurement point subsets, the first measurement point subset being used to calculate the predetermined number of parameters. [0086]

In this case, the evaluation unit compares the estimations of the predetermined number of parameters and their certainty, calculated by the estimation calculation unit for the first measurement point subset, with those calculated for each second measurement point subset. In this comparison, the certainties of the respective groups of estimations of the two measurement point subsets are compared as well as the groups of estimations themselves, the certainties each reflecting the deviation of the position error distribution of the position-measurement-points of the respective measurement point subset. By examining the two comparison results, the position error distribution of the position-measurement-points of the first measurement point subset is compared with that of the position-measurement-points of the second measurement point subset. Therefore, it can be determined whether or not one of the plurality of second measurement point subsets and the first measurement point subset equally reflect the entire set of all position-measurement-points. [0087]
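The two comparisons named above (estimations against estimations, certainty against certainty) might be sketched as follows for a single parameter. The decision rule, the thresholds k and max_var_ratio, and the function name can_replace are hypothetical; the text does not specify the criterion used by the evaluation unit, and certainty is assumed here to be given as a variance.

```python
import math


def can_replace(first_est: float, first_var: float,
                second_est: float, second_var: float,
                k: float = 2.0, max_var_ratio: float = 2.0) -> bool:
    """Decide whether a smaller subset may stand in for the first subset.

    Assumptions (illustrative only): the estimations must agree within
    k combined standard deviations, and the variance of the smaller
    subset must not exceed max_var_ratio times that of the first subset.
    """
    if first_var <= 0 or second_var <= 0:
        raise ValueError("variances must be positive")
    # Comparison 1: do the two estimations agree?
    close = abs(first_est - second_est) <= k * math.sqrt(first_var + second_var)
    # Comparison 2: is the certainty of the smaller subset comparable?
    similar_spread = second_var <= max_var_ratio * first_var
    return close and similar_spread


# Example: similar estimation and comparable certainty.
ok = can_replace(1.0, 0.01, 1.05, 0.015)
```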

Furthermore, the estimation calculation unit can comprise an estimation unit that detects respective positions of position-measurement-points composing the specific measurement point subset, based on the measurement results of position information of the position-measurement-points composing the specific measurement point subset, estimates probability density functions which each represent the occurrence probability of the detected position for the respective one of the position-measurement-points of the specific measurement point subset, and calculates the probability density of the detected position of each of the position-measurement-points; and a parameter calculation unit that is electrically connected to the estimation unit and that evaluates the detection error of each of the detected positions while using the value of the respective calculated probability density as a piece of weight information, and calculates, based on the detection errors, such values of the predetermined number of parameters that the detection errors become statistically minimum as a whole. In this case, for the specific measurement point subset, the parameter calculation unit weights errors between the calculated positions and their reference positions in accordance with the information on the certainties of the calculated positions of the position-measurement-points, i.e., the probability densities at the calculated positions of the position-measurement-points, and calculates statistically valid estimations of the predetermined number of parameters which uniquely specify position information of any area on an object. Therefore, statistically valid estimations of the predetermined number of parameters that rationally reflect the certainties of the calculated positions of the position-measurement-points can be obtained. [0088]

In the third position detector according to this invention, the evaluation computing unit can comprise an estimation calculation unit that is electrically connected to the measurement unit and that statistically calculates, for the specific measurement point subset, estimations of the predetermined number of parameters and certainty of the estimations, based on measurement results of position information of position-measurement-points composing the specific measurement point subset, which is selected from the plurality of second measurement point subsets; and an evaluation unit that is electrically connected to the estimation calculation unit and that calculates position errors of the position-measurement-points composing the first measurement point subset through use of the estimations of the predetermined number of parameters for each of the plurality of second measurement point subsets, and evaluates the possibility of replacing the first measurement point subset by one of the plurality of second measurement point subsets. [0089]

In this case, the evaluation unit calculates position errors of the position-measurement-points composing the first measurement point subset by using the estimations of the predetermined number of parameters calculated by the estimation calculation unit for the second measurement point subset, so that the position error distribution of the position-measurement-points in the first measurement point subset can be obtained. Therefore, without calculating the estimations of the predetermined number of parameters and their certainty on the basis of the position measurement results at the position-measurement-points of the first measurement point subset, it can be determined whether or not one of the plurality of second measurement point subsets and the first measurement point subset equally reflect the entire set of all position-measurement-points. [0090]
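As a rough sketch of this residual-based evaluation, the position errors of the first subset can be computed with parameters that were estimated from a second subset only. The sketch again assumes a simplified one-dimensional offset-and-scale model; the root-mean-square criterion, the tolerance, and the names residual_rms and is_acceptable are illustrative assumptions, not taken from the text.

```python
import math
from typing import Sequence


def residual_rms(design: Sequence[float], detected: Sequence[float],
                 offset: float, scale: float) -> float:
    """RMS position error of the first subset's points under parameters
    (offset, scale) estimated from a second subset (1-D assumption)."""
    if len(design) != len(detected) or not design:
        raise ValueError("need equally many design and detected positions")
    residuals = [y - (offset + scale * x) for x, y in zip(design, detected)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


def is_acceptable(design: Sequence[float], detected: Sequence[float],
                  offset: float, scale: float, tolerance: float) -> bool:
    """Hypothetical criterion: accept the second subset when the RMS
    error over the first subset stays within the tolerance."""
    return residual_rms(design, detected, offset, scale) <= tolerance


# Example: parameters from a second subset reproduce the first subset.
rms = residual_rms([0.0, 10.0, 20.0], [0.5, 10.5, 20.5], 0.5, 1.0)
```

Only the measured positions of the first subset are needed here, not a separate parameter fit for it, which is the saving the paragraph above points out.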

In addition, the estimation calculation unit can comprise an estimation unit that detects respective positions of position-measurement-points composing the specific measurement point subset, based on the measurement results of position information of the position-measurement-points composing the specific measurement point subset, estimates probability density functions which each represent the occurrence probability of the detected position for the respective one of the position-measurement-points of the specific measurement point subset, and calculates the probability density of the detected position of each of the position-measurement-points; and a parameter calculation unit that is electrically connected to the estimation unit and that evaluates the detection error of each of the detected positions while using the value of the respective calculated probability density as a piece of weight information, and calculates, based on the detection errors, such values of the predetermined number of parameters that the detection errors become statistically minimum as a whole. In this case, for the specific measurement point subset, the parameter calculation unit weights errors between the calculated positions and their reference positions in accordance with the information on the certainties of the calculated positions of the position-measurement-points, i.e., the probability densities at the calculated positions of the position-measurement-points, and calculates statistically valid estimations of the predetermined number of parameters which uniquely specify position information of any area on an object. Therefore, statistically valid estimations of the predetermined number of parameters that rationally reflect the certainties of the calculated positions of the position-measurement-points can be obtained. [0091]

Furthermore, the third position detector according to this invention can further comprise a parameter value determining unit that is electrically connected to the evaluation computing unit and that calculates values of the predetermined number of parameters, based on evaluation results of the evaluation computing unit. Therefore, the number of position-measurement-points used to calculate the values of the predetermined number of parameters can be reduced while maintaining the statistical validity, and the position detection speed can be improved while maintaining the accuracy. [0092]

According to the eighth aspect of this invention, there is provided a fourth position detector that detects position information of any area on an object provided with a first number of positionmeasurementpoints, the position detector comprising a measurement unit that measures positions of the positionmeasurementpoints; a setselection unit that selects a plurality of first measurement point subsets, which each consist of a third number of positionmeasurementpoints and are different from one another, and a plurality of second measurement point subsets which each consist of a fourth number of positionmeasurementpoints and are different from one another, the third number being larger than a second number and smaller than the first number, the second number being a minimum number of measurement points required to calculate a predetermined number of parameters that uniquely specify position information of any area on the object, the fourth number being larger than the second number and smaller than the third number; and an evaluation computing unit that is electrically connected to the setselection unit and that evaluates possibility of adopting one of the plurality of second measurement point subsets as a measurement point subset to calculate the predetermined number of parameters. [0093]

In this detector, according to the fourth position detection method of this invention, the evaluation computing unit statistically evaluates, based on positions of position-measurement-points measured by the measurement unit, whether or not it is possible to replace the plurality of first measurement point subsets, as sample sets composed of position-measurement-points to be measured to calculate values of the predetermined number of parameters, by one of the plurality of second measurement point subsets, each of which is composed of fewer elements than each first measurement point subset. Therefore, upon reducing the number of position-measurement-points as elements of a sample set, statistical validity of the calculated values of the predetermined number of parameters can be maintained. [0094]

In the fourth position detector according to this invention, the evaluation computing unit can comprise an estimation calculation unit that is electrically connected to the measurement unit and that statistically calculates, for the specific measurement point subset, estimations of the predetermined number of parameters and certainty of the estimations, based on measurement results of position information of position-measurement-points composing the specific measurement point subset, which is selected from the plurality of first measurement point subsets and the plurality of second measurement point subsets, and calculates statistically valid estimations of the predetermined number of parameters and certainty of the estimations, based on groups of estimations of the predetermined number of parameters and certainty of the estimations for the plurality of first measurement point subsets; and an evaluation unit that is electrically connected to the estimation calculation unit and that compares the statistically valid estimations and certainty of the first measurement point subsets with the estimations and certainty for each of the plurality of second measurement point subsets, and evaluates the possibility of adopting one of the plurality of second measurement point subsets as a measurement point subset to calculate the predetermined number of parameters. [0095]

In this case, the evaluation unit calculates the statistically valid estimations and their certainty of the predetermined number of parameters based on sets of the predetermined number of parameters, for the plurality of first measurement point subsets, calculated by the estimation calculation unit, and compares the statistically valid estimations and their certainty of the predetermined number of parameters, for each second measurement point subset, calculated by the estimation calculation unit with the statistically valid estimations and their certainty of the predetermined number of parameters. In this comparison, the certainties of the two groups of the estimations are compared as well as the groups of the estimations, the certainties each reflecting deviation of the position error distribution of positionmeasurementpoints of the respective measurement point subset. And by examining the two comparison results, the two position error distributions are compared. Therefore, it can be determined whether or not one of the plurality of second measurement point subsets and the first measurement point subset equally reflect the entire set of all positionmeasurementpoints. [0096]

Furthermore, the estimation calculation unit can comprise an estimation unit that is electrically connected to the measurement unit and that detects respective positions of position-measurement-points composing the specific measurement point subset, based on the measurement results of position information of the position-measurement-points composing the specific measurement point subset, estimates probability density functions which each represent the occurrence probability of the detected position for the respective one of the position-measurement-points of the specific measurement point subset, and calculates the probability density of the detected position of each of the position-measurement-points; and a parameter calculation unit that is electrically connected to the estimation unit and that evaluates the detection error of each of the detected positions while using the value of the respective calculated probability density as a piece of weight information, and calculates, based on the detection errors, such values of the predetermined number of parameters that the detection errors become statistically minimum as a whole. In this case, for the specific measurement point subset, the parameter calculation unit weights errors between the calculated positions and their reference positions in accordance with the information on the certainties of the calculated positions of the position-measurement-points, i.e., the probability densities at the calculated positions of the position-measurement-points, and calculates statistically valid estimations of the predetermined number of parameters which uniquely specify position information of any area on an object. Therefore, statistically valid estimations of the predetermined number of parameters that rationally reflect the certainties of the calculated positions of the position-measurement-points can be obtained. [0097]

In the fourth position detector according to this invention, the evaluation computing unit can comprise an estimation calculation unit that is electrically connected to the measurement unit and that statistically calculates, for the specific measurement point subset, estimations of the predetermined number of parameters and certainty of the estimations, based on measurement results of position information of positionmeasurementpoints composing the specific measurement point subset which is selected from the plurality of second measurement point subsets; and an evaluation unit that is electrically connected to the estimation calculation unit and that calculates errors of all the positionmeasurementpoints of the plurality of first measurement point subsets through use of the estimations of the predetermined number of parameters calculated for each of the second measurement point subsets, and evaluates possibility of replacing the plurality of first measurement point subsets by one of the plurality of second measurement point subsets. [0098]

In this case, by the evaluation unit calculating position errors of positionmeasurementpoints of the plurality of first measurement point subsets by using the estimations, of the predetermined number of parameters for each second measurement point subset, calculated by the estimation calculation unit, the position error distribution for all positionmeasurementpoints, which will be estimated if the plurality of first measurement point subsets serve as the sample set, can be obtained. Therefore, without calculating groups of the estimations and their certainty of the predetermined number of parameters on the basis of the position measurement results at the positionmeasurementpoints of the plurality of first measurement point subsets and thus the statistically valid estimations and their certainty of the predetermined number of parameters, it can be determined whether or not one of the plurality of second measurement point subsets reflects the entire set of all positionmeasurementpoints. [0099]

In addition, the estimation calculation unit can comprise an estimation unit that detects respective positions of position-measurement-points composing the specific measurement point subset, based on the measurement results of position information of the position-measurement-points composing the specific measurement point subset, estimates probability density functions which each represent the occurrence probability of the detected position for the respective one of the position-measurement-points of the specific measurement point subset, and calculates the probability density of the detected position of each of the position-measurement-points; and a parameter calculation unit that is electrically connected to the estimation unit and that evaluates the detection error of each of the detected positions while using the value of the respective calculated probability density as a piece of weight information, and calculates, based on the detection errors, such values of the predetermined number of parameters that the detection errors become statistically minimum as a whole. In this case, for the specific measurement point subset, the parameter calculation unit weights errors between the calculated positions and their reference positions in accordance with the information on the certainties of the calculated positions of the position-measurement-points, i.e., the probability densities at the calculated positions of the position-measurement-points, and calculates statistically valid estimations of the predetermined number of parameters which uniquely specify position information of any area on an object. Therefore, statistically valid estimations of the predetermined number of parameters that rationally reflect the certainties of the calculated positions of the position-measurement-points can be obtained. [0100]

The fourth position detector according to this invention can further comprise a parameter value determining unit that is electrically connected to the parameter calculation unit and that calculates values of the predetermined number of parameters, based on evaluation results of the evaluation computing unit. Therefore, the number of position-measurement-points used to calculate the values of the predetermined number of parameters can be reduced while maintaining the statistical validity, and the position detection speed can be improved while maintaining the accuracy. [0101]

According to the ninth aspect of this invention, there is provided an exposure method for transferring a predetermined pattern onto divided areas on a substrate, comprising an arrangement information calculation step of calculating a predetermined number of parameters that pertain to positions of the divided areas by a position detection method according to this invention and calculating arrangement information of the divided areas on the substrate; and a transfer step of transferring the pattern onto the divided areas while aligning the substrate based on the arrangement information of the divided areas calculated in the arrangement information calculation step. [0102]

According to this method, a pattern is transferred onto divided areas while accurately detecting the arrangement of the divided areas on a substrate using a detection method of the present invention and aligning the substrate on the basis of the detection results. Therefore, a pattern can be accurately transferred onto the divided areas. [0103]

According to the tenth aspect of this invention, there is provided an exposure apparatus that transfers a predetermined pattern onto divided areas on a substrate, comprising a stage unit that moves the substrate along a movement plane; and a position detector according to this invention that calculates arrangement information of the divided areas on the substrate mounted on the stage unit. [0104]

This apparatus transfers a pattern onto divided areas while moving and aligning the substrate through the stage unit on the basis of the arrangement of the divided areas detected by the position detector of this invention. Therefore, a pattern can be accurately transferred onto the divided areas. [0105]

In addition, in a lithography process, by performing exposure using an exposure apparatus according to the present invention, it is possible to form a multilayer pattern on a substrate with high accuracy of superposition, and therefore it is possible to manufacture a more highly integrated micro device with high yield, and the productivity can be improved. Accordingly, another aspect of the present invention is a device manufactured by using an exposure apparatus of the present invention and a method of manufacturing a device using an exposure method of the present invention.[0106]
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic view showing the arrangement of an exposure apparatus according to an embodiment; [0107]

FIGS. 2A and 2B are views for explaining exemplary alignment marks; [0108]

FIG. 3 is a schematic block diagram showing the arrangement of the main control system in the apparatus shown in FIG. 1; [0109]

FIG. 4 is a view for explaining a design value of a transferred mark; [0110]

FIG. 5 is a flow chart (part 1) for explaining a position detection operation; [0111]

FIG. 6 is a flow chart (part 2) for explaining the position detection operation; [0112]

FIG. 7 is a view for explaining a measurement result of a mark; [0113]

FIGS. 8A and 8B are views showing an example of a mark measurement result and probability density function; [0114]

FIGS. 9A and 9B are views showing another example of a mark measurement result and probability density function; [0115]

FIG. 10 is a flow chart (part 3) for explaining the position detection operation; [0116]

FIG. 11 is a flow chart for explaining a method of manufacturing devices using the exposure apparatus shown in FIG. 1; [0117]

FIG. 12 is a flow chart showing a process of the wafer process step of FIG. 11; [0118]

FIG. 13 is a view for explaining a modification (No.1); [0119]

FIG. 14 is a view for explaining another modification (No.2); and [0120]

FIGS. 15A to 15D are views showing exemplary patterns of two-dimensional marks used in the modification (No.2). [0121]
DESCRIPTION OF THE PREFERRED EMBODIMENTS

An embodiment of the present invention will be described hereinafter with reference to FIGS. 1 to 12. [0122]

FIG. 1 shows a schematic arrangement of an exposure apparatus 100 according to an embodiment of the present invention. This exposure apparatus 100 is a step-and-scan projection exposure apparatus. The exposure apparatus 100 comprises an illumination system 10, a reticle stage RST for holding a reticle R as a mask, a projection optical system PL, a wafer stage WST on which a wafer W as a substrate (object) is placed, an alignment microscope AS as an image pickup unit, a main control system 20 for systematically controlling the overall apparatus, and the like. [0123]

The illumination system 10 includes a light source unit, an illuminance uniforming optical system comprising a fly-eye lens as an optical integrator, a relay lens, a variable ND filter, a reticle blind, a dichroic mirror, and the like (none of which are shown). The arrangement of such an illumination system is disclosed in, e.g., Japanese Patent Laid-Open No. 10-112433. Note that the light source unit uses a KrF excimer laser light source (oscillation wavelength=248 nm), an ArF excimer laser light source (oscillation wavelength=193 nm), an F_{2} laser light source (oscillation wavelength=157 nm), a Kr_{2} (krypton dimer) laser light source (oscillation wavelength=146 nm), an Ar_{2} (argon dimer) laser light source (oscillation wavelength=126 nm), a harmonic generator of a copper vapor laser light source or YAG laser, an ultra-high-pressure mercury lamp (g-line, i-line, or the like), or the like. Note that X-rays or a charged particle beam such as an electron beam may be used in place of the light sent out from the aforementioned light source unit. [0124]

The operation of the illumination system 10 with this arrangement will be briefly explained below. Illumination light emitted by the light source unit enters the illuminance uniforming optical system when a shutter is open. As a result, a large number of secondary light sources are formed at the exit end of the illuminance uniforming optical system, and illumination light components sent out from these secondary light sources reach the reticle blind. Illumination light transmitted through the reticle blind is output via an imaging lens system. The illumination system 10 thus illuminates, with illumination light IL of nearly uniform illuminance, slit-like illumination area portions that are defined by the reticle blind on the reticle R, on which circuit patterns and the like are formed. [0125]

The reticle R is fixed on the reticle stage RST by, e.g., vacuum chucking. So as to align the reticle R, the reticle stage RST can be finely driven in the XY plane perpendicular to the optical axis of the illumination system (which coincides with an optical axis AX of the projection optical system PL, described later), and can be driven at a designated scan velocity in a predetermined scan direction (Y-direction in this case), by a reticle stage driver (not shown) comprising a magnetic float type two-dimensional linear actuator. Furthermore, in this embodiment, since the magnetic float type two-dimensional linear actuator includes a Z-drive coil in addition to an X-drive coil, Y-drive coil, and the like, the reticle stage RST can also be finely driven in the Z-direction. [0126]

The position of the reticle stage RST within a stage moving surface is constantly detected by a reticle laser interferometer (to be referred to as a “reticle interferometer” hereinafter) 16 via a movable mirror 15 at a resolution of around 0.5 to 1 nm. The position information of the reticle stage RST from the reticle interferometer 16 is sent to a stage control system 19, which drives the reticle stage RST via the reticle stage driver (not shown) on the basis of the position information of the reticle stage RST. [0127]

The projection optical system PL is disposed below the reticle stage RST in FIG. 1, and the direction of its optical axis AX is defined as the Z-axis direction. As the projection optical system PL, for example, a refraction optical system which is both-side telecentric and has a predetermined reduction ratio (e.g., 1/5, 1/4, or 1/6) is used. For this reason, when the illumination area of the reticle R is illuminated with illumination light IL coming from the illumination system 10, a reduced-scale image (partial inverted image) of the circuit pattern on the reticle R within that illumination area is formed, via the projection optical system PL, on the wafer W, the surface of which is coated with a resist (photosensitive agent). [0128]

The wafer stage WST is disposed on a base BS below the projection optical system PL in FIG. 1, and a wafer holder 25 is mounted on the wafer stage WST. The wafer W is fixed on the wafer holder 25 by, e.g., vacuum chucking or the like. The wafer holder 25 can be tilted by a driver (not shown) in any direction with respect to a plane perpendicular to the optical axis of the projection optical system PL, and can also be finely moved in the direction of the optical axis (Z-direction) of the projection optical system PL. Also, the wafer holder 25 is finely rotatable about the optical axis AX. [0129]

The wafer stage WST is movable not only in the scan direction (Y-direction) but also in a direction (X-direction) perpendicular to the scan direction so as to locate a plurality of shot areas on the wafer W on an exposure area conjugate to the illumination area, and performs a step-and-scan operation in which an operation of scanning and exposing each shot area on the wafer W and an operation of moving the wafer to the exposure start position of the next shot are repeated. The wafer stage WST is two-dimensionally driven by a wafer stage driver 24 including a motor and the like. [0130]

The position of the wafer stage WST in the XY plane is constantly detected by a wafer laser interferometer 18 via a movable mirror 17 at a resolution of around 0.5 to 1 nm. Position information (or velocity information) WPV of the wafer stage WST is sent to the stage control system 19, which controls the wafer stage WST on the basis of this position information (or velocity information) WPV. [0131]

The alignment microscope AS is an off-axis alignment detector disposed on the side surface of the projection optical system PL. This alignment microscope AS outputs image pickup results of alignment marks (wafer marks) contained in each shot area on the wafer W. As the alignment marks, an X-position detection mark MX(i, j) and Y-position detection mark MY(i, j), which are formed on street lines around a shot area SA(i, j) on the wafer W, as shown in, e.g., FIG. 2A, are used. As the marks MX(i, j) and MY(i, j), a line-and-space mark having a periodic structure in the detecting direction can be used, as represented by, e.g., the mark MX(i, j) shown on a larger scale in FIG. 2B. Note that FIG. 2B illustrates a line-and-space mark having three lines. However, the number of lines in a line-and-space mark used as the mark MX(i, j) (or mark MY(i, j)) can be three or more. The alignment microscope AS outputs picked-up image data IMD as its image pickup result to the main control system 20 (see FIG. 1). [0132]

As shown in FIG. 3, the main control system 20 comprises a main control unit 30 and a storage unit 40. The main control unit 30 comprises a control unit 39, which controls the operation of the exposure apparatus 100 by, e.g., supplying stage control data SCD to the stage control system 19 and which serves as a set selection unit for selecting a sample set and a replacement candidate set, and a position arithmetic unit 37. The position arithmetic unit 37 comprises a picked-up image data acquisition unit 31, a mark information calculation unit 32 for calculating position information of picked-up marks MX and MY on the basis of the picked-up image data acquired by the picked-up image data acquisition unit 31, a parameter calculation unit 33 for calculating estimations of position parameters which uniquely determine the arrangement of shot areas SA, a valid value calculation unit 34 for calculating statistically valid position parameter values, and an evaluation unit 35 for evaluating the possibility of replacing a sample set by another one containing fewer elements. The storage unit 40 has a picked-up image data storage area 41, a sample set information storage area 42, a mark information storage area 43, a valid value storage area 44, and an estimation storage area 45 therein. [0133]

Note that the aforementioned alignment microscope AS, control unit 39, and position arithmetic unit 37 constitute a position detector. Also, the mark information calculation unit 32, parameter calculation unit 33, and valid value calculation unit 34 constitute an estimation calculation unit, and the estimation calculation unit and evaluation unit 35 constitute an evaluation arithmetic unit. Furthermore, the alignment microscope AS and picked-up image data acquisition unit 31 constitute a measurement unit. Moreover, the parameter calculation unit 33 and valid value calculation unit 34 constitute a parameter determination unit. In FIG. 3, the flow of data is indicated by the solid arrows, and the flow of control is indicated by the dotted arrow. The respective operations of the units in the main control system 20 will be described later. [0134]

In this embodiment, the main control system [0135] 20 is constituted by combining various units. Alternatively, the main control system 20 may be constituted as a computer system, and the respective functions of the units that constitute the main control unit 30 may be implemented by programs installed in the computer.

Referring back to FIG. 1, in the exposure apparatus 100, an oblique-incident-type multi-point focus detection system is fixed to a support portion (not shown) that supports the projection optical system PL. This system comprises an illumination optical system 13, which directs imaging light beams used to form a plurality of slit images toward the best imaging surface of the projection optical system PL in a direction oblique to the optical axis AX, and a light-receiving optical system 14, which receives these imaging light beams reflected by the surface of the wafer W via slits. The stage control system 19 drives the wafer holder 25 in the Z-direction and an oblique direction on the basis of wafer position information from this multi-point focus detection system (13, 14). The detailed arrangement and the like of this multi-point focus position detection system are disclosed in, e.g., Japanese Patent Laid-Open No. 6-283403, its corresponding U.S. Pat. No. 5,448,332, and the like. The disclosures in the above Japanese Patent Laid-Open and U.S. patent are incorporated herein by reference as long as the national laws in designated states or elected states, to which this international application is applied, permit. [0136]

In the exposure apparatus [0137] 100 with the above arrangement, the arrangement coordinate position of each shot area on the wafer W is detected as follows. As a precondition for detecting the arrangement coordinate position of each shot region, assume that marks MX(i, j) and MY(i, j) are already formed on the wafer W in wafer processing up to the previous layer (e.g., in a process for the first layer).

Also, Xpositions {DX[0138] _{1}(i, j), DX_{2}(i, j), DX_{3}(i, j), DX_{4}(i, j), DX_{5}(i, j), DX_{6}(i, j)} of boundaries (to be referred to as “edges” hereinafter) between lines and spaces in a mark DMX(i, j), ideal in terms of design, corresponding to the mark MX(i, j) are known, as shown in FIG. 4. In FIG. 4, edge positions DX_{k}(i, j) (k=1 to 6) are expressed by DX_{k}. That is, assume

DX _{k+1}(i, j)−DX _{k}(i, j)=ΔX (1)

for the edge positions DX_{k}(i, j) (k=1 to 6). Also, the X-position DX_{X} of the mark DMX(i, j) is defined by [0139]

$\begin{array}{cc}\mathrm{DX}_{X}(i,j)=\left\{\sum_{k=1}^{6}\mathrm{DX}_{k}(i,j)\right\}/6 & (2)\end{array}$

and is known. Furthermore, a Y-position DY_{X}(i, j) of the mark DMX(i, j) is determined upon design, and is known. [0140]

Likewise, Y-positions {DY_{1}(i, j), DY_{2}(i, j), DY_{3}(i, j), DY_{4}(i, j), DY_{5}(i, j), DY_{6}(i, j)} of the edges in a mark DMY(i, j), ideal in terms of design, corresponding to the mark MY(i, j) are known, as in the mark DMX(i, j). That is, assume [0141]

DY _{k+1}(i, j)−DY _{k}(i, j)=ΔY (3)

for edge positions DY_{k}(i, j) (k=1 to 6). Also, the Y-position DY_{Y} of the mark DMY(i, j) is defined by [0142]

$\begin{array}{cc}\mathrm{DY}_{Y}(i,j)=\left\{\sum_{k=1}^{6}\mathrm{DY}_{k}(i,j)\right\}/6 & (4)\end{array}$

and is known. Furthermore, an X-position DX_{Y}(i, j) of the mark DMY(i, j) is determined upon design, and is known. [0143]

Detection of arrangement coordinate positions of shot areas on a plurality of wafers W (e.g., wafers for one lot) on which similar patterns (including the first number of marks) are formed by wafer processing up to the previous layer will be described below based on the flow chart shown in FIG. 5 with reference to other drawings as needed. [0144]

In step 201 in FIG. 5, the control unit 39 selects P (>1) sample sets S_{p}{MX(i_{pm}, j_{pm}), MY(i_{ps}, j_{ps})} (p=1 to P, m=1 to M (third number), s=1 to M, M>4), and also Q (>1) replacement candidate sets R_{q}{MX(i_{qn}, j_{qn}), MY(i_{qt}, j_{qt})} (q=1 to Q, n=1 to N (fourth number), t=1 to N, 4≦N≦M), and stores element information of the sample sets S_{p} and replacement candidate sets R_{q} in the sample set information storage area 42 in FIG. 3. [0145]

Note that M marks MX(i[0146] _{pm}, j_{pm}) and M marks MY(i_{ps}, j_{ps}) as elements of each sample set S_{p }are respectively selected not to line up upon design. Also, when comparing any two of the sample sets S_{p}, each set includes at least one element that the other does not include.

N marks MX(i[0147] _{qn}, j_{qn}) and N marks MY(i_{qt}, j_{qt}) as elements of each replacement candidate set R_{q }are also respectively selected not to line up upon design. Furthermore, when comparing any two of the replacement candidate sets R_{q}, each set includes at least one element that the other does not include.

In this embodiment, the numbers of marks MX(i[0148] _{pm}, j_{pm}) and marks MY (i_{ps}, j_{ps}) as elements of the sample set S_{p }are equal to each other (M), but may be different from each other. In such case, each of the numbers of marks MX(i_{pm}, j_{pm}) and marks MY(i_{ps}, j_{ps}) as elements of the sample set S_{p }must be three or more, and the total of them must be larger than 6. Also, in this embodiment, the numbers of marks MX(i_{qn}, j_{qn}) and marks MY(i_{qt}, j_{qt}) as elements of the replacement candidate set R_{q }are equal to each other (N), but may be different from each other. In such case as well, each of the numbers of marks MX(i_{qn}, j_{qn}) and marks MY(i_{qt}, j_{qt}) as elements of the replacement candidate set R_{q }must be three or more, and the total of them must be larger than 6.
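The selection conditions stated above can be sketched as a small check. This is only an illustration: the function names are hypothetical, marks are represented by their design (x, y) coordinates, and "not to line up" is interpreted here as "not all lying on one straight line", which is an assumption about the intended meaning.

```python
def all_collinear(points, tol=1e-9):
    """Return True if every (x, y) point lies on a single straight line."""
    x0, y0 = points[0]
    # Pick any point distinct from the first to define the line direction.
    base = next((p for p in points[1:] if p != points[0]), None)
    if base is None:
        return True
    bx, by = base[0] - x0, base[1] - y0
    # A zero cross product means the point is on the line through the first two.
    return all(abs(bx * (y - y0) - by * (x - x0)) <= tol for x, y in points)


def is_valid_mark_set(x_marks, y_marks):
    """Check the count and non-collinearity conditions for one set of marks."""
    # Each of the MX and MY groups must contain three or more marks.
    if len(x_marks) < 3 or len(y_marks) < 3:
        return False
    # The total number of marks must be larger than 6.
    if len(x_marks) + len(y_marks) <= 6:
        return False
    return not all_collinear(x_marks) and not all_collinear(y_marks)
```

The same check applies to a sample set S_{p} and to a replacement candidate set R_{q}, since the text imposes the same lower bounds on both.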

The first wafer W is loaded onto the wafer holder [0149] 25 by a wafer loader (not shown), and alignment with coarse accuracy (prealignment) is done by the main control system 20 moving the wafer via the stage control system 19 so that marks MX(i, j) and MY(i, j) are placed within the observation field of view of the alignment microscope AS. Such prealignment is done by the main control system 20 (more specifically, control unit 39) via the stage control system 19 on the basis of observation of the outer shape of the wafer W, the observation result of marks MX(i, j) and MY(i, j) in a broader field of view, and position information (or velocity information) from the wafer interferometer 18.

Referring back to FIG. 5, the positions of the marks MX(i[0150] _{pm}, j_{pm}), MY(i_{ps}, j_{ps}), MX(i_{qn}, j_{qn}), and MY(i_{qt}, j_{qt}) as elements of the sample sets S_{p }or replacement candidate sets R_{q }are measured in subroutine 202.

In subroutine [0151] 202, the wafer W is moved to locate the first mark (Xposition detection mark MX(i_{11}, j_{11})) at the image pickup position of the alignment microscope AS in step 211 in FIG. 6. Such movement is done under the control of the main control system 20 via the stage control system 19.

Subsequently, the alignment microscope AS picks up an image of the mark MX(i[0152] _{11}, j_{11}) in step 212. Then, the pickedup image data acquisition unit 31 stores inputted pickedup image data IMD in the pickedup image data storage area 41 in accordance with an instruction from the control unit 39, thus acquiring pickedup image data IMD.

In step [0153] 213, the mark information calculation unit 32 reads out pickedup image data associated with the mark MX(i_{11}, j_{11}) from the pickedup image data storage area 41, and extracts the Xpositions of six edges as position information in the mark MX(i_{11}, j_{11}) on the basis of the pickedup image data and position information (or velocity information) WPV from the wafer interferometer 18 in accordance with an instruction from the control unit 39. In this manner, (FX_{1}(i_{11}, j_{11}), FX_{2}(i_{11}, j_{11}), FX_{3}(i_{11}, j_{11}), FX_{4}(i_{11}, j_{11}), FX_{5}(i_{11}, j_{11}), FX_{6}(i_{11}, j_{11})) shown in FIG. 7 are extracted as the Xpositions of the six edges.

Such extraction of the Xpositions of the edge can be implemented by analyzing a waveform obtained by scanning the pickedup image data along an XS(i[0154] _{11}, j_{11}) axis which passes all the three line portions of the mark MX(i_{11}, j_{11}) and is parallel to the Xaxis, as shown in FIG. 7, or by analyzing a waveform obtained by integrating the pickedup image data in the Ydirection. Note that the latter method requires a larger arithmetic volume but can accurately extract the Xpositions of the edges.
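The second approach, integrating the picked-up image data in the Y-direction before locating edges, might look like the sketch below. The text does not say how the waveform is analyzed, so the threshold-crossing rule with linear interpolation is an assumed stand-in, and the function names and the list-of-rows image layout are hypothetical.

```python
def integrate_columns(image):
    """Sum pixel intensities along Y so each X column yields one waveform sample."""
    return [sum(column) for column in zip(*image)]


def edge_positions(waveform, threshold):
    """Return sub-pixel X positions where the waveform crosses the threshold."""
    edges = []
    for x in range(len(waveform) - 1):
        a, b = waveform[x], waveform[x + 1]
        # A sign change means the threshold lies strictly between the two samples.
        # Samples that sit exactly on the threshold are ignored in this sketch.
        if (a - threshold) * (b - threshold) < 0:
            # Linear interpolation between the two straddling samples.
            edges.append(x + (threshold - a) / (b - a))
    return edges
```

For a three-line mark this yields six crossings, one per line/space boundary, in pixel units; converting them to stage coordinates would additionally need the interferometer reading, which is outside this sketch.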

Referring back to FIG. 6, in step 214 the mark information calculation unit 32 calculates a position FX(i_{11}, j_{11}) of the mark MX(i_{11}, j_{11}) and a probability density pFX(i_{11}, j_{11}) of that position as follows on the basis of the edge positions FX_{k}(i_{11}, j_{11}) (k=1 to 6) of the mark MX(i_{11}, j_{11}) extracted in step 213, and edge positions DX_{k}(i_{11}, j_{11}) of an ideal mark DMX(i_{11}, j_{11}) corresponding to the mark MX(i_{11}, j_{11}). [0155]

The mark information calculation unit 32 calculates an X-position FX(i_{11}, j_{11}) of the mark MX(i_{11}, j_{11}) by [0156]

$\begin{array}{cc}\mathrm{FX}(i_{pm},j_{pm})=\left\{\sum_{k=1}^{6}\mathrm{FX}_{k}(i_{pm},j_{pm})\right\}/6 & (5)\end{array}$

More specifically, the average value of the edge positions FX[0157] _{k}(i_{11}, j_{11}) (k=1 to 6) of the mark MX(i_{11}, j_{11}) is calculated as the Xposition FX(i_{11}, j_{11}) of the mark MX(i_{11}, j_{11}).

The mark information calculation unit [0158] 32 then calculates errors dFX_{k}(i_{11}, j_{11}) of the measured edge positions FX_{k}(i_{11}, j_{11}) corresponding to ideal edge positions DX_{k}(i_{11}, j_{11}) by

dFX _{k}(i _{11} , j _{11})=FX _{k}(i _{11} , j _{11})−DX _{k}(i _{11} , j _{11}) (6)

and then calculates an average value dFX(i_{11}, j_{11}) and standard deviation σX(i_{11}, j_{11}) of the errors dFX_{k}(i_{11}, j_{11}) by [0159]

$\begin{array}{cc}\mathrm{dFX}(i_{pm},j_{pm})=\left\{\sum_{k=1}^{6}\mathrm{dFX}_{k}(i_{pm},j_{pm})\right\}/6 & (7)\end{array}$

$\begin{array}{cc}\sigma X(i_{pm},j_{pm})=\left[\left\{\sum_{k=1}^{6}\left(\mathrm{dFX}_{k}(i_{pm},j_{pm})-\mathrm{dFX}(i_{pm},j_{pm})\right)^{2}\right\}/5\right]^{1/2} & (8)\end{array}$

Note that the X-position FX(i_{11}, j_{11}) of the mark MX(i_{11}, j_{11}) given by equation (5), the X-position DX_{X}(i_{11}, j_{11}) of the aforementioned mark DMX(i_{11}, j_{11}), and the average value dFX(i_{11}, j_{11}) of the errors dFX_{k}(i_{11}, j_{11}) satisfy: [0160]

FX(i_{11}, j_{11})=DX_{X}(i_{11}, j_{11})+dFX(i_{11}, j_{11}) (6A)

Hence, upon calculating the Xposition FX(i[0161] _{11}, j_{11}) of the mark MX(i_{11}, j_{11}), the value dFX(i_{11}, j_{11}) may be calculated by equation (7), and the Xposition may be calculated by equation (6A) using this value in place of a calculation given by equation (5).

Upon calculating the value dFX(i[0162] _{11}, j_{11}), in place of a calculation given by equation (7), after the value FX(i_{11}, j_{11}) has been calculated by equation (5), the value dFX(i_{11}, j_{11}) can be calculated by

dFX(i_{11}, j_{11})=FX(i_{11}, j_{11})−DX_{X}(i_{11}, j_{11}) (6B)
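The relation in equation (6A), of which equation (6B) is a rearrangement, follows directly from the definitions in equations (2), (5), (6), and (7). Written out for a six-edge mark, with the shot indices abbreviated to (i, j):

```latex
\begin{aligned}
\mathrm{FX}(i,j)
  &= \frac{1}{6}\sum_{k=1}^{6}\mathrm{FX}_{k}(i,j)
   = \frac{1}{6}\sum_{k=1}^{6}\left\{\mathrm{DX}_{k}(i,j)+\mathrm{dFX}_{k}(i,j)\right\} \\
  &= \frac{1}{6}\sum_{k=1}^{6}\mathrm{DX}_{k}(i,j)
   + \frac{1}{6}\sum_{k=1}^{6}\mathrm{dFX}_{k}(i,j)
   = \mathrm{DX}_{X}(i,j)+\mathrm{dFX}(i,j)
\end{aligned}
```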

Since the generation factors of the errors dFX_{k}(i_{11}, j_{11}) are considered to be random, the mark information calculation unit 32 assumes that their distribution is a normal distribution, and that a probability density function f_{X11}(dx) of the errors dFX_{k}(i_{11}, j_{11}) is given by [0163]

$\begin{array}{cc}f_{X11}(dx)=\frac{1}{\sqrt{2\pi}\cdot\sigma X(i_{11},j_{11})}\cdot\exp\left[-\frac{\left\{dx-\mathrm{dFX}(i_{11},j_{11})\right\}^{2}}{2\left(\sigma X(i_{11},j_{11})\right)^{2}}\right] & (9)\end{array}$

Based on such estimation, the mark information calculation unit [0164] 32 calculates a probability density that the value of a variable dx is dFX(i_{11}, j_{11}), i.e., an occurrence probability pFX(i_{11}, j_{11}) that the Xposition of the mark MX(i_{11}, j_{11}) takes a value FX(i_{11}, j_{11}) by

pFX(i_{11}, j_{11})=f_{X11}(dFX(i_{11}, j_{11}))={(2π)^{½}·σX(i_{11}, j_{11})}^{−1} (10)

The mark information calculation unit [0165] 32 stores the position FX(i_{11}, j_{11}) of the mark MX(i_{11}, j_{11}) and its occurrence probability pFX(i_{11}, j_{11}) calculated in this way in the mark information storage area 43. In this manner, calculations of the mark information that pertains to the first mark MX(i_{11}, j_{11}) are completed.
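The calculations of equations (5) to (10) for a single mark can be sketched as follows. The function name and the plain-list inputs are hypothetical, and the guard against a zero standard deviation is an addition of this sketch, not something the text discusses.

```python
import math


def mark_statistics(measured_edges, design_edges):
    """Return (position, mean_error, sigma, density) for one mark."""
    n = len(measured_edges)
    if n != len(design_edges) or n < 2:
        raise ValueError("need matching edge lists with at least two edges")
    # Equation (5): the mark position is the mean of the measured edge positions.
    position = sum(measured_edges) / n
    # Equation (6): per-edge error relative to the design edge positions.
    errors = [f - d for f, d in zip(measured_edges, design_edges)]
    # Equation (7): mean error.
    mean_error = sum(errors) / n
    # Equation (8): standard deviation with divisor n - 1 (5 for six edges).
    sigma = math.sqrt(sum((e - mean_error) ** 2 for e in errors) / (n - 1))
    if sigma == 0.0:
        raise ValueError("zero spread: the density of equation (10) is unbounded")
    # Equation (10): the normal density of equation (9) evaluated at its mean,
    # where the exponential factor equals 1.
    density = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return position, mean_error, sigma, density
```

Because the density depends only on the standard deviation, a mark whose edge errors are nearly uniform receives a larger value than one whose edge errors scatter widely.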

FIGS. 8A and 8B show an example wherein the Xposition FX(i[0166] _{11}, j_{11}) of the mark MX(i_{11}, j_{11}) and its occurrence probability pFX(i_{11}, j_{11}) are calculated on the basis of the measured edge positions FX_{k}(i_{11}, j_{11}), and FIGS. 9A and 9B show another example.

In the example shown in FIGS. 8A and 8B, the errors dFX[0167] _{k}(i_{11}, j_{11}) of the measured edge positions FX_{k}(i_{11}, j_{11}) corresponding to the ideal edge positions DX_{k}(i_{11}, j_{11}) are relatively uniform, as shown in FIG. 8A. Note that FIG. 8A illustrates the edge positions DX_{k}(i_{11}, j_{11}) as elements on the Xaxis in the mark DMX(i_{11}, j_{11}), and has symbols DX_{k }attached onto those elements. Also, FIG. 8A illustrates the edge positions FX_{k}(i_{11}, j_{11}) as elements on the Xaxis in the mark MX(i_{11}, j_{11}), and has symbols FX_{k }attached onto those elements.

In the mark MX(i_{11}, j_{11}) for which the edge positions FX_{k}(i_{11}, j_{11}) shown in FIG. 8A are measured, the probability density function f_{X11}(dx) of the error distribution is steep, as shown in FIG. 8B. That is, the standard deviation σX(i_{11}, j_{11}) is small. As a result, the probability density pFX(i_{11}, j_{11}) that an error takes the value dFX(i_{11}, j_{11}), i.e., that the X-position of the mark MX(i_{11}, j_{11}) takes the value FX(i_{11}, j_{11}), becomes larger than that in FIG. 9B to be described below. [0168]

Meanwhile, in the example shown in FIGS. 9A and 9B, the errors dFX[0169] _{k}(i_{11}, j_{11}) of the measured edge positions FX_{k}(i_{11}, j_{11}) corresponding to the ideal edge positions DX_{k}(i_{11}, j_{11}) greatly vary, as shown in FIG. 9A. Note that FIG. 9A uses the same expression method as FIG. 8A.

In the mark MX(i_{11}, j_{11}) for which the edge positions FX_{k}(i_{11}, j_{11}) shown in FIG. 9A are measured, the probability density function f_{X11}(dx) of the error distribution is broad, as shown in FIG. 9B. That is, the standard deviation σX(i_{11}, j_{11}) of the error distribution takes a large value. As a result, the probability density pFX(i_{11}, j_{11}) that an error takes the value dFX(i_{11}, j_{11}), i.e., that the X-position of the mark MX(i_{11}, j_{11}) takes the value FX(i_{11}, j_{11}), becomes smaller than that in FIG. 8B mentioned above. [0170]

Referring back to FIG. 6, it is checked in step [0171] 215 if mark information calculations for all the selected marks are complete. In the aforementioned process, since the calculations of mark information of only one mark MX(i_{11}, j_{11}), i.e., the mark position FX(i_{11}, j_{11}) of the mark MX(i_{11}, j_{11}) and its probability density pFX(i_{11}, j_{11}) are complete, the answer in step 215 is NO, and the sequence advances to step 216.

In step [0172] 216, the control unit 39 moves the wafer W to a position where the next mark falls within the image pickup field of view of the alignment microscope AS. Such movement of the wafer W is done by moving the wafer stage WST when the control unit 39 controls the wafer drive unit 24 via the stage control system 19 on the basis of the prealignment result.

After that, the X-positions FX(i_{pm}, j_{pm}) of other marks MX(i_{pm}, j_{pm}) and their probability densities pFX(i_{pm}, j_{pm}), the Y-positions FY(i_{ps}, j_{ps}) of marks MY(i_{ps}, j_{ps}) and their probability densities pFY(i_{ps}, j_{ps}), the X-positions FX(i_{qn}, j_{qn}) of marks MX(i_{qn}, j_{qn}) and their probability densities pFX(i_{qn}, j_{qn}), and the Y-positions FY(i_{qt}, j_{qt}) of marks MY(i_{qt}, j_{qt}) and their probability densities pFY(i_{qt}, j_{qt}) are computed in the same manner as in the case of the aforementioned mark MX(i_{11}, j_{11}) until it is determined in step 215 that the mark information (mark positions and probability densities) of all the selected marks has been calculated. When the mark information of all the selected marks has been calculated and the answer in step 215 is YES, subroutine 202 ends, and the sequence advances to step 203 in FIG. 5. [0173]

In step 203, the parameter calculation unit 33 reads out the X-positions FX(i_{pm}, j_{pm}) (m=1 to M) of the marks MX(i_{pm}, j_{pm}) and their probability densities pFX(i_{pm}, j_{pm}), and the Y-positions FY(i_{ps}, j_{ps}) (s=1 to M) of the marks MY(i_{ps}, j_{ps}) and their probability densities pFY(i_{ps}, j_{ps}) from the mark information storage area 43 for each sample set S_{p} in accordance with an instruction from the control unit 39. The parameter calculation unit 33 then calculates the estimations of parameters which uniquely specify the arrangement of shot areas SA(i, j). [0174]

The marks MX(i, j) and MY(i, j) formed on the wafer W deviate from their ideal positions due to a mismatch between a stage coordinate system (X, Y) which specifies the position of the wafer stage WST, and the arrangement coordinate system of shot areas as a design coordinate system, i.e., a wafer coordinate system (α, β), and such a mismatch occurs due to the following four main factors. [0175]

① Rotation of wafer: This is expressed by a residual rotation error θ of the wafer coordinate system (α, β) with respect to the stage coordinate system (X, Y). [0176]

② Orthogonality of the stage coordinate system (X, Y): This occurs when the X-axis and Y-axis feed directions of the wafer stage WST are not accurately orthogonal to each other, and is expressed by an orthogonality error w. [0177]

③ Linear expansion/shrinkage (wafer scaling values) in the α- and β-directions of the wafer coordinate system (α, β): This occurs when the wafer W entirely expands/contracts due to wafer processing or the like. This expansion/shrinkage amount is expressed by wafer scaling values R_{X} and R_{Y} in the α- and β-directions. Note that the wafer scaling value R_{X} in the α-direction is represented by the ratio between the actually measured value and design value of the distance between two points in the α-direction on the wafer W, and the wafer scaling value R_{Y} in the β-direction is represented by the ratio between the actually measured value and design value of the distance between two points in the β-direction. [0178]

④ Offset of the wafer coordinate system (α, β) with respect to the stage coordinate system (X, Y): This occurs when the wafer W has entirely deviated by an infinitesimal amount with respect to the wafer stage WST, and is expressed by offset amounts O_{X} and O_{Y}. [0179]

When the aforementioned error factors ① to ④ are added, a pattern to be transferred to a target position (DX, DY), in terms of design, on the wafer coordinate system (α, β) is transferred to a position (EX, EY) on the stage coordinate system (X, Y), which is given by [0180]

$\begin{array}{cc}\begin{pmatrix}\mathrm{EX}\\ \mathrm{EY}\end{pmatrix}=\begin{pmatrix}R_{X}&0\\ 0&R_{Y}\end{pmatrix}\begin{pmatrix}\cos\theta&-\sin\theta\\ \sin\theta&\cos\theta\end{pmatrix}\begin{pmatrix}1&-\tan w\\ 0&1\end{pmatrix}\begin{pmatrix}\mathrm{DX}\\ \mathrm{DY}\end{pmatrix}+\begin{pmatrix}O_{X}\\ O_{Y}\end{pmatrix} & (11)\end{array}$

Note that various other error factors are also present in addition to the aforementioned ones upon actual transfer, and the position (EX, EY) is considered as an expected transfer position. [0181]

In general, since the orthogonality error w and residual rotation error θ can be considered as infinitesimal amounts, the target transfer position (DX, DY) and expected transfer position (EX, EY) are related by [0182]

$\begin{array}{cc}\begin{pmatrix}\mathrm{EX}\\ \mathrm{EY}\end{pmatrix}=\begin{pmatrix}R_{X}&-R_{X}\cdot(w+\theta)\\ R_{Y}\cdot\theta&R_{Y}\end{pmatrix}\begin{pmatrix}\mathrm{DX}\\ \mathrm{DY}\end{pmatrix}+\begin{pmatrix}O_{X}\\ O_{Y}\end{pmatrix} & (12)\end{array}$

which expresses a first-order approximation of the trigonometric functions in equation (11). [0183]
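How close the first-order form of equation (12) stays to the full product of equation (11) can be checked numerically. The sketch below assumes the sign convention implied by equations (15) and (16), i.e. a rotation matrix with −sin θ in its upper-right element and a shear matrix with −tan w in its upper-right element; the function names are hypothetical.

```python
import math


def exact_matrix(rx, ry, theta, w):
    """2x2 matrix of equation (11): scaling times rotation times orthogonality shear."""
    c, s, t = math.cos(theta), math.sin(theta), math.tan(w)
    # rotation * shear = [[c, -c*t - s], [s, -s*t + c]]; scaling multiplies each row.
    return [[rx * c, rx * (-c * t - s)],
            [ry * s, ry * (-s * t + c)]]


def linear_matrix(rx, ry, theta, w):
    """First-order form of equation (12), valid for small theta and w."""
    return [[rx, -rx * (w + theta)],
            [ry * theta, ry]]


def max_difference(a, b):
    """Largest absolute element-wise difference between two 2x2 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))
```

For angles of the order of 10 microradians, the neglected terms are second order in θ and w, so the two matrices agree to well below one part in 10^9.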

In the following description, equation (12) can also be expressed by [0184]

$\begin{array}{cc}\begin{pmatrix}\mathrm{EX}\\ \mathrm{EY}\end{pmatrix}=\begin{pmatrix}A_{11}&A_{12}\\ A_{21}&A_{22}\end{pmatrix}\begin{pmatrix}\mathrm{DX}\\ \mathrm{DY}\end{pmatrix}+\begin{pmatrix}O_{X}\\ O_{Y}\end{pmatrix} & (13)\end{array}$

where [0185]

A_{11}=R_{X} (14)

A_{12}=−R_{X}·(w+θ) (15)

A_{21}=R_{Y}·θ (16)

A_{22}=R_{Y} (17)

That is, parameters which uniquely specify the arrangement of shot areas SA(i, j) are six parameters A[0186] _{11}, A_{12}, A_{21}, A_{22}, O_{X}, and O_{Y}. In step 203, the parameter calculation unit 33 calculates the estimations of these six parameters of each sample set S_{p }as follows.

The parameter calculation unit [0187] 33 calculates an expected transfer Xposition EX(i_{pm}, j_{pm}) of each mark MX(i_{pm}, j_{pm}) from the ideal transfer position (DX_{X}(i_{pm}, j_{pm}), DY_{X}(i_{pm}, j_{pm})) of the mark MX(i_{pm}, j_{pm}) using equation (13), the EX(i_{pm}, j_{pm}) containing the parameters. Subsequently, the parameter calculation unit 33 calculates an expected transfer Yposition EY(i_{ps}, j_{ps}) of each mark MY(i_{ps}, j_{ps}) from the ideal transfer position (DX_{Y}(i_{ps}, j_{ps}), DY_{Y}(i_{ps}, j_{ps})) of the mark MY(i_{ps}, j_{ps}) using equation (13), the EY(i_{ps}, j_{ps}) containing the parameters.

Then, the parameter calculation unit 33 calculates an error δX_{pm} of the X-position FX(i_{pm}, j_{pm}), calculated based on the measurement result, relative to the expected transfer X-position EX(i_{pm}, j_{pm}) for each mark MX(i_{pm}, j_{pm}) by [0188]

δX_{pm}=FX(i_{pm}, j_{pm})−EX(i_{pm}, j_{pm}) (18)

The parameter calculation unit 33 also calculates an error δY_{ps} of the Y-position FY(i_{ps}, j_{ps}), calculated based on the measurement result, relative to the expected transfer Y-position EY(i_{ps}, j_{ps}) for each mark MY(i_{ps}, j_{ps}) by [0189]

δY_{ps}=FY(i_{ps}, j_{ps})−EY(i_{ps}, j_{ps}) (19)

Normally, it is said that the more identical the ideal mark shape and actually transferred mark shape are, the more accurate mark transfer onto the wafer W has been done (except for transfer position accuracy). Therefore, the mark position is more reliable as it is calculated from the measurement results of the edge positions of the mark having a shape more identical to the ideal mark's shape. Also, the reliability level of the calculated mark position depends on the occurrence probability of the mark position, i.e., the probability density. [0190]

Hence, the parameter calculation unit 33 evaluates the errors δX_{pm} and δY_{ps} calculated by equations (18) and (19) using the probability densities pFX(i_{pm}, j_{pm}) and pFY(i_{ps}, j_{ps}), and calculates evaluated errors εX_{pm} and εY_{ps} by [0191]

εX _{pm} =pFX(i _{pm} , j _{pm})·δX _{pm} (20)

εY _{ps} =pFY(i _{ps} , j _{ps})·δY _{ps} (21)

The parameter calculation unit 33 then calculates the estimations of the six parameters A_{11}, A_{12}, A_{21}, A_{22}, O_{X}, and O_{Y} which minimize a variation S_{p} given by [0192]

$\begin{array}{cc}S_{p}=\sum_{m=1}^{M}\left(\varepsilon X_{pm}\right)^{2}+\sum_{s=1}^{M}\left(\varepsilon Y_{ps}\right)^{2} & (22)\end{array}$

by applying the method of least squares on the basis of the evaluated errors εX[0193] _{pm }and εY_{ps}. More specifically, the parameter calculation unit 33 calculates estimations A_{11p}, A_{12p}, A_{21p}, A_{22p}, O_{Xp}, and O_{Yp }of the six parameters A_{11}, A_{12}, A_{21}, A_{22}, O_{X}, and O_{Y }by solving simultaneous equations made up of six equations obtained by setting partial differentials of the variation S_{p }given by equation (22) by the six parameters A_{11}, A_{12}, A_{21}, A_{22}, O_{X}, and O_{Y }to zero.
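The minimization of equation (22) can be sketched in plain Python as follows. Under equation (13), EX depends only on (A_{11}, A_{12}, O_{X}) and EY only on (A_{21}, A_{22}, O_{Y}), so the six simultaneous equations separate into two independent three-parameter weighted fits; the sketch uses that observation rather than assembling one six-by-six system. The function names, the tuple-based inputs, and the singularity tolerance are assumptions of this illustration.

```python
def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        # Absolute tolerance chosen for illustration only.
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: the marks may line up")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def fit_axis(design_xy, measured, densities):
    """Minimize sum((p * (measured - (a*DX + b*DY + o)))**2) over (a, b, o)."""
    ata = [[0.0] * 3 for _ in range(3)]
    atb = [0.0] * 3
    for (dx, dy), f, p in zip(design_xy, measured, densities):
        row = (dx, dy, 1.0)
        w = p * p  # squaring the evaluated error squares the density weight
        for i in range(3):
            atb[i] += w * row[i] * f
            for j in range(3):
                ata[i][j] += w * row[i] * row[j]
    # Setting the partial derivatives to zero gives the normal equations.
    return solve3(ata, atb)
```

Calling `fit_axis` with the design positions of the marks MX, their measured X-positions, and their densities pFX would give (A_{11p}, A_{12p}, O_{Xp}); the same call with the marks MY, their measured Y-positions, and pFY would give (A_{21p}, A_{22p}, O_{Yp}).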

The parameter calculation unit [0194] 33 stores P groups of estimations (A_{11p}, A_{12p}, A_{21p}, A_{22p}, O_{Xp}, O_{Yp}) calculated in this way in the estimation storage area 45.

The valid value calculation unit [0195] 34 reads out the P sets of calculated estimations (A_{11p}, A_{12p}, A_{21p}, A_{22p}, O_{Xp}, O_{Yp}) from the estimation storage area 45, and calculates an Xposition deviation σ_{p}X, Yposition deviation σ_{p}Y, and covariance σ_{p}XY associated with each sample set S_{p }upon adopting each group of estimations as follows.

The valid value calculation unit 34 calculates the expected transfer X-position EX_{p}(i_{pm}, j_{pm}) of each mark MX(i_{pm}, j_{pm}) from the ideal transfer position (DX_{X}(i_{pm}, j_{pm}), DY_{X}(i_{pm}, j_{pm})) of the mark MX(i_{pm}, j_{pm}) using the estimations (A_{11p}, A_{12p}, A_{21p}, A_{22p}, O_{Xp}, O_{Yp}) as the values of the parameters (A_{11}, A_{12}, A_{21}, A_{22}, O_{X}, O_{Y}) in equation (13). Subsequently, the valid value calculation unit 34 calculates the expected transfer Y-position EY_{p}(i_{ps}, j_{ps}) of each mark MY(i_{ps}, j_{ps}) from the ideal transfer position (DX_{Y}(i_{ps}, j_{ps}), DY_{Y}(i_{ps}, j_{ps})) of the mark MY(i_{ps}, j_{ps}). [0196]

The valid value calculation unit 34 then calculates an error δ_{p}X_{pm} of the X-position FX(i_{pm}, j_{pm}), calculated based on the measurement result, relative to the expected transfer X-position EX_{p}(i_{pm}, j_{pm}) for each mark MX(i_{pm}, j_{pm}) by [0197]

δ_{p} X _{pm} =FX(i _{pm} , j _{pm})−EX _{p}(i _{pm} , j _{pm}) (23)

Subsequently, the valid value calculation unit [0198] 34 calculates an error δ_{p}Y_{ps }of the Yposition FY(i_{ps}, j_{ps}) calculated based on the measurement result from the expected transfer Yposition EY_{p}(i_{ps}, j_{ps}) for each mark MY(i_{ps}, j_{ps}) by

δ_{p}Y_{ps}=FY(i_{ps}, j_{ps})−EY_{p}(i_{ps}, j_{ps}) (24)

The valid value calculation unit [0199] 34 then evaluates the errors δ_{p}X_{pm }and δ_{p}Y_{ps }calculated by equations (23) and (24) using the probability densities pFX(i_{pm}, j_{pm}) and pFY(i_{ps}, j_{ps}), and calculates evaluated errors ε_{p}X_{pm }and ε_{p}Y_{ps }by

ε_{p}X_{pm}=pFX(i_{pm}, j_{pm})·δ_{p}X_{pm} (25)

ε_{p} Y _{ps} =pFY(i _{ps} , j _{ps})·δ_{p} Y _{ps} (26)

The valid value calculation unit 34 calculates an X-position deviation σ_{p}X, Y-position deviation σ_{p}Y, and covariance σ_{p}XY respectively by [0200]

$\begin{array}{cc}\sigma_{p}X=\left[\left\{\sum_{m=1}^{M}\left(\varepsilon_{p}X_{pm}\right)^{2}\right\}/(M-1)\right]^{1/2} & (27)\\ \sigma_{p}Y=\left[\left\{\sum_{s=1}^{M}\left(\varepsilon_{p}Y_{ps}\right)^{2}\right\}/(M-1)\right]^{1/2} & (28)\\ \sigma_{p}XY=\left[\left\{\sum_{m=1}^{M}\left(\varepsilon_{p}X_{pm}\cdot\varepsilon_{p}Y_{pm}\right)\right\}/(M-1)\right]^{1/2} & (29)\end{array}$

The valid value calculation unit [0201] 34 calculates a deviation σ_{p }upon adopting the estimations (A_{11p}, A_{12p}, A_{21p}, A_{22p}, O_{Xp}, O_{Yp}) by

σ_{p}={(σ_{p} X)^{2}·(σ_{p} Y)^{2}−(σ_{p} XY)^{2}}^{½} (30)

The calculated deviations σ_{p} indicate the certainties of the respective estimations (A_{11p}, A_{12p}, A_{21p}, A_{22p}, O_{Xp}, O_{Yp}), i.e., the degrees to which the respective sample sets S_{p} represent all of the marks MX and MY. [0202]

The parameter calculation unit [0203] 33 checks in step 204 if there are a plurality of sample sets S_{p}. In the above, since the number of sample sets S_{p }is P (>1), the answer in step 204 is YES, and the sequence advances to step 205.

In step 205, on the basis of the P groups of estimations (A_{11p}, A_{12p}, A_{21p}, A_{22p}, O_{Xp}, O_{Yp}) and deviations σ_{p} that reflect the certainty, the valid value calculation unit 34 calculates a weighted mean of each parameter of the estimations (A_{11p}, A_{12p}, A_{21p}, A_{22p}, O_{Xp}, O_{Yp}) using the values (1/σ_{p}) as respective weight coefficients of the estimations, and obtains a set of statistically valid parameter values (A_{11O}, A_{12O}, A_{21O}, A_{22O}, O_{XO}, O_{YO}). That is, the valid value calculation unit 34 computes [0204]

$\begin{array}{cc}A_{11O}=\left\{\sum_{p=1}^{P}\left(A_{11p}/\sigma_{p}\right)\right\}/\left\{\sum_{p=1}^{P}\left(1/\sigma_{p}\right)\right\} & (31)\\ A_{12O}=\left\{\sum_{p=1}^{P}\left(A_{12p}/\sigma_{p}\right)\right\}/\left\{\sum_{p=1}^{P}\left(1/\sigma_{p}\right)\right\} & (32)\\ A_{21O}=\left\{\sum_{p=1}^{P}\left(A_{21p}/\sigma_{p}\right)\right\}/\left\{\sum_{p=1}^{P}\left(1/\sigma_{p}\right)\right\} & (33)\\ A_{22O}=\left\{\sum_{p=1}^{P}\left(A_{22p}/\sigma_{p}\right)\right\}/\left\{\sum_{p=1}^{P}\left(1/\sigma_{p}\right)\right\} & (34)\\ O_{XO}=\left\{\sum_{p=1}^{P}\left(O_{Xp}/\sigma_{p}\right)\right\}/\left\{\sum_{p=1}^{P}\left(1/\sigma_{p}\right)\right\} & (35)\\ O_{YO}=\left\{\sum_{p=1}^{P}\left(O_{Yp}/\sigma_{p}\right)\right\}/\left\{\sum_{p=1}^{P}\left(1/\sigma_{p}\right)\right\} & (36)\end{array}$

Also, the valid value calculation unit 34 calculates a deviation σ_{O} upon adopting the statistically valid parameter values (A_{11O}, A_{12O}, A_{21O}, A_{22O}, O_{XO}, O_{YO}) by [0205]
$\begin{array}{cc}\sigma_{O}=P/\sum_{p=1}^{P}\left(1/\sigma_{p}\right) & (37)\end{array}$

The valid value calculation unit 34 stores the statistically valid parameter values (A_{11O}, A_{12O}, A_{21O}, A_{22O}, O_{XO}, O_{YO}) and the deviation σ_{O} calculated in this way in the valid value storage area 44 as the parameter values (AU_{11}, AU_{12}, AU_{21}, AU_{22}, OU_{X}, OU_{Y}) and the deviation σU used upon exposure of the wafer W. [0206]
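As an illustration only, and not as part of the disclosed apparatus, the weighted averaging of equations (31) to (37) can be sketched in Python as follows. The function name `combine_estimates` and the representation of each estimation as a six-element tuple are assumptions introduced here for the example.

```python
def combine_estimates(estimates, sigmas):
    """Weighted mean of P parameter estimates with weights 1/sigma_p.

    estimates: list of P tuples (A11p, A12p, A21p, A22p, OXp, OYp)
    sigmas:    list of P positive deviations sigma_p
    Returns (valid_parameters, sigma_o) following eqs. (31)-(37).
    """
    if len(estimates) != len(sigmas) or not estimates:
        raise ValueError("need one positive sigma per estimate")
    weights = [1.0 / s for s in sigmas]          # weight coefficients 1/sigma_p
    weight_sum = sum(weights)
    n_params = len(estimates[0])
    # Eqs. (31)-(36): the same weighted mean is applied to every parameter.
    valid = tuple(
        sum(w * est[k] for w, est in zip(weights, estimates)) / weight_sum
        for k in range(n_params)
    )
    # Eq. (37): sigma_O = P / sum(1/sigma_p), the harmonic mean of the sigma_p.
    sigma_o = len(sigmas) / weight_sum
    return valid, sigma_o


# Two hypothetical sample sets; the second is twice as certain as the first.
valid, sigma_o = combine_estimates(
    [(1.0, 0.0, 0.0, 1.0, 0.0, 0.0), (1.2, 0.0, 0.0, 1.0, 0.3, 0.0)],
    [1.0, 0.5],
)
```

Because the weights are 1/σ_{p}, an estimation with a small deviation pulls the combined value toward itself, and σ_{O} is the harmonic mean of the individual deviations.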

In subroutine 207, the evaluation unit 35 evaluates, in accordance with an instruction from the control unit 39, the possibility of replacing the sample sets S_{p} with any of the replacement candidate sets R_{q}. [0207]

In subroutine 207, the evaluation unit 35 reads out the X-positions FX(i_{qn}, j_{qn}) (n=1 to N) of marks MX(i_{qn}, j_{qn}) and their probability densities pFX(i_{qn}, j_{qn}), and the Y-positions FY(i_{qt}, j_{qt}) (t=1 to N) of marks MY(i_{qt}, j_{qt}) and their probability densities pFY(i_{qt}, j_{qt}), from the mark information storage area 43 in step 221 of FIG. 10 in the same manner as in the aforementioned case of the sample sets S_{p}. The evaluation unit 35 calculates an estimation (A_{11q}, A_{12q}, A_{21q}, A_{22q}, O_{Xq}, O_{Yq}) for the parameters A_{11}, A_{12}, A_{21}, A_{22}, O_{X}, and O_{Y}, which uniquely specify the arrangement of shot areas SA(i, j), in the same manner as in the case of the sample sets S_{p}. [0208]

The evaluation unit 35 then calculates the X-position deviations σ_{q}X, Y-position deviations σ_{q}Y, and covariances σ_{q}XY of the respective replacement candidate sets R_{q} upon adopting the Q sets of calculated estimations (A_{11q}, A_{12q}, A_{21q}, A_{22q}, O_{Xq}, O_{Yq}), and also the deviations σ_{q} upon adopting those estimations, in the same manner as in the case of the sample sets S_{p}. The calculated deviations σ_{q} indicate the certainties of the respective estimations (A_{11q}, A_{12q}, A_{21q}, A_{22q}, O_{Xq}, O_{Yq}), i.e., the degrees to which the respective replacement candidate sets R_{q} represent the entire set of marks MX and MY. [0209]

In step 222, the evaluation unit 35 reads out the statistically valid parameter values (AU_{11}, AU_{12}, AU_{21}, AU_{22}, OU_{X}, OU_{Y}) and deviation σU from the valid value storage area 44, and evaluates their similarities with the estimations (A_{11q}, A_{12q}, A_{21q}, A_{22q}, O_{Xq}, O_{Yq}) and deviations σ_{q} associated with the replacement candidate sets R_{q}. Such evaluation is done by comprehensively considering a similarity F of parameter values given by [0210]
$\begin{array}{cc}\begin{array}{c}
F_{q}=\left|A_{11q}-AU_{11}\right|+\left|A_{12q}-AU_{12}\right|+\left|A_{21q}-AU_{21}\right|+\\
\left|A_{22q}-AU_{22}\right|+\left|O_{Xq}-OU_{X}\right|+\left|O_{Yq}-OU_{Y}\right|
\end{array} & (38)\end{array}$

and a similarity G of deviation values given by [0211]

G _{q}=σ_{q} −σU (39)

For example, if [0212]

C1<F _{q} ×G _{q} (40)

for a predetermined value C1, it may be determined that the two sets are similar. In such a case, the similarity F of the parameter values and the similarity G of deviation values are equally handled. [0213]

On the other hand, if [0214]

C2<F _{q} +G _{q} (41)

for a predetermined value C2, it may be determined that the two sets are similar. In such a case as well, the similarity F of the parameter values and the similarity G of deviation values are equally handled. [0215]

Also, if [0216]

C3<F _{q}+3G _{q} (42)

for a predetermined value C3, it may be determined that the two sets are similar. In such a case, the similarity G of deviation values is given three times the weight of the similarity F of the parameter values. [0217]
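For illustration, the quantities entering equations (38) to (42) can be computed as in the following Python sketch. The function names and the dictionary keys are assumptions made here; the sketch only evaluates F_{q}, G_{q}, and the three combined values, and leaves the comparison against the predetermined values C1, C2, and C3 to the conditions stated in the text.

```python
def similarity_f(candidate, reference):
    """Eq. (38): sum of absolute differences over the six parameters."""
    return sum(abs(c - r) for c, r in zip(candidate, reference))


def similarity_g(sigma_q, sigma_u):
    """Eq. (39): difference between candidate and reference deviations."""
    return sigma_q - sigma_u


def combined_values(f_q, g_q):
    """Right-hand sides of the conditions (40), (41), and (42)."""
    return {
        "product": f_q * g_q,             # compared with C1 in eq. (40)
        "sum": f_q + g_q,                 # compared with C2 in eq. (41)
        "weighted_sum": f_q + 3.0 * g_q,  # compared with C3 in eq. (42)
    }


# Hypothetical values for one replacement candidate set R_q.
reference = (1.25, 0.0, 0.0, 1.0, 0.25, -0.25)   # (AU11, ..., OUY)
candidate = (1.0, 0.0, 0.0, 1.0, 0.5, -0.5)      # (A11q, ..., OYq)
f_q = similarity_f(candidate, reference)
g_q = similarity_g(0.5, 0.25)
values = combined_values(f_q, g_q)
```

Keeping F_{q} and G_{q} separate until the last step makes it straightforward to change how the two similarities are weighted against each other.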

The evaluation unit 35 checks in step 223 whether there is a replacement candidate set R_{q} that has been found to be similar as a result of the evaluation in step 222. If NO in step 223, the evaluation unit 35 sends a report indicating this to the control unit 39. The control unit 39 receives the report, selects Q new replacement candidate sets, and stores them in the sample set information storage area 42 in step 225. In this fashion, the process of subroutine 207 executed if NO in step 223 ends. [0218]

On the other hand, if YES in step 223, the evaluation unit 35 replaces the replacement candidate set information in the sample set information storage area 42 with information that pertains only to the replacement candidate set or sets found to be similar. [0219]

The evaluation unit 35 checks in step 226 whether there is a replacement candidate set that has been found to be similar a predetermined number of times in succession. For example, when the predetermined number of times is three, since similarity evaluation has been done only once, for only one wafer, the answer in step 226 is NO, and the process of subroutine 207 ends. [0220]

If the predetermined number of times is one, the answer in step 226 is YES, and the evaluation unit 35 selects the replacement candidate set having the highest similarity from those found to be similar as a new sample set, and replaces the sample set information in the sample set information storage area 42 with information that pertains to the new sample set in step 227, thus ending the process of subroutine 207. In such a case, the number of sample sets is one from here on. [0221]

In parallel with the aforementioned process of subroutine 207, the control unit 39 calculates the arrangement coordinate positions of shot areas SA(i, j) using the parameter values (AU_{11}, AU_{12}, AU_{21}, AU_{22}, OU_{X}, OU_{Y}) calculated in step 205 in FIG. 5. Then, the reticle R and wafer W are aligned on the basis of the calculated shot area arrangement under the control of the control unit 39, and the wafer W and reticle R are synchronously moved at a velocity ratio corresponding to the projection magnification in opposite directions along the scan direction (Y-direction) while a slit-like illumination area (the center of which nearly coincides with the optical axis AX) is illuminated with illumination light IL, thus transferring a pattern on a pattern area of the reticle R onto each shot area SA(i, j) at a reduced scale. Upon completion of pattern transfer onto all the shot areas SA(i, j), the wafer is unloaded under the control of the control unit 39. [0222]

If exposure of the first wafer W is complete, and the process of the subroutine has ended in this way, the control unit 39 checks in step 208 in FIG. 5 whether exposure of all wafers (e.g., wafers for one lot) is complete. In the above process, since exposure of only the first wafer is complete, the answer in step 208 is NO. The next wafer is loaded onto the wafer holder 25 by the wafer loader (not shown) in the same manner as the first wafer, and alignment with coarse accuracy (prealignment) is done by the main control system 20 moving the wafer W via the stage control system 19 so that marks MX(i, j) and MY(i, j) can be placed within the observation field of view of the alignment microscope AS. After that, steps 202 to 207 in FIG. 5 are repeated for each wafer W in the same manner as for the first wafer until YES is determined in step 208. [0223]

If it is determined in step 208 in FIG. 5 that exposure of all wafers is complete, the exposure process ends. [0224]

If replacement candidate sets that have been determined to be similar to the sample sets a predetermined number of times in succession are found in step 226 in FIG. 10, the most similar one of those sets is used as the subsequent sample set, so the number of sample sets becomes one. As a result, NO is determined in step 204 in FIG. 5 executed thereafter, and step 206 is executed in place of step 205 mentioned above. That is, the estimations of position parameters calculated based on the position measurement results of marks MX and MY contained in the single sample set are directly adopted as the parameter values (AU_{11}, AU_{12}, AU_{21}, AU_{22}, OU_{X}, OU_{Y}) used upon exposure of the wafer W. [0225]

According to the exposure apparatus of this embodiment with the aforementioned arrangement and operations, a plurality of sample sets are selected initially, and position parameters are calculated based on the result of estimating the position distribution of the entire marks MX and MY according to respective groups of the estimations and their certainties of position parameters calculated for the sample sets. Accordingly, statistically valid position parameter values can be obtained. Therefore, the wafer W can be aligned very accurately, and the pattern transfer accuracy can be improved. [0226]

Furthermore, if a set appropriate to replace the sample sets is found when searching for a replacement candidate set that reflects the position distribution of the entire marks MX and MY as much as the sample sets used to calculate the statistically valid position parameter values and contains fewer elements than any sample set, the appropriate set is used as a new sample set. Therefore, since the time required for aligning a wafer W can be shortened while maintaining statistical validity of the obtained position parameters, the throughput can be improved. [0227]

In addition, a mark position is calculated on the basis of the position information of each position measurement mark (alignment mark) obtained by measuring the alignment marks associated with each sample set, i.e., the measurement results of a plurality of edge positions of each alignment mark; the certainty of that mark position is calculated from the errors of the edge positions relative to their design values; and parameter values (estimations) that uniquely specify the arrangement of shot areas SA(i, j) on the wafer W are calculated for each sample set using the certainty as the weight. Therefore, because the arrangement of the shot areas SA(i, j) on the wafer W is calculated using the finally obtained accurate parameter values to align the wafer W, accurate alignment can be performed, and pattern transfer with high overlay accuracy can be achieved. [0228]

The manufacture of a device using the exposure apparatus and method of this embodiment will be described below. [0229]

FIG. 11 is a flow chart of production of devices (semiconductor chips such as ICs or LSIs, liquid crystal panels, CCDs, thin-film magnetic heads, micromachines, or the like) in this embodiment. As shown in FIG. 11, in step 301 (design step), function/performance design for the devices (e.g., circuit design for semiconductor devices) is performed, and pattern design is also performed. In step 302 (mask fabrication step), a mask formed with the designed circuit pattern is fabricated. On the other hand, in step 303 (wafer preparation step), a wafer is prepared using a material such as silicon. [0230]

In step 304 (wafer process step), an actual circuit and the like are formed on the wafer by lithography, as will be described later, using the mask and wafer prepared in steps 301 to 303. In step 305 (device assembly step), devices are assembled using the wafer processed in step 304. This step 305 includes processes such as an assembly process (dicing, bonding), a packaging process (chip encapsulation), and the like. [0231]

Finally, in step 306 (inspection step), inspections such as an operation confirmation test, a durability test, and the like of the devices manufactured in step 305 are performed. After these steps, the process is complete, and the devices are shipped out. [0232]

FIG. 12 shows a detailed, exemplary flow of step 304 for manufacturing semiconductor devices. As shown in FIG. 12, the wafer surface is oxidized in step 311 (oxidation step). In step 312 (CVD step), an insulation film is formed on the wafer surface. In step 313 (electrode formation step), electrodes are formed on the wafer by deposition. In step 314 (ion implantation step), ions are implanted into the wafer. Steps 311 to 314 mentioned above constitute the preprocess of each step in the wafer process, and are selectively executed in accordance with the process required in each step. [0233]

Upon completion of the preprocess in each step of the wafer process, the postprocess steps are performed as follows. In this postprocess, the wafer is coated with a photosensitive agent in step 315 (resist formation step), and the above exposure apparatus transfers the circuit pattern on the mask onto the wafer, aligned using the aforementioned scheme, in step 316 (exposure step). The exposed wafer is developed in step 317 (development step), and the exposed member in portions other than those where the resist remains is removed by etching in step 318 (etching step). Then, the resist that has become unnecessary after etching is removed in step 319 (resist removal step). [0234]

By repeating the preprocess and postprocess steps, multilayer circuit patterns are formed on the wafer. [0235]

In this way, devices having a micropattern accurately formed thereon are manufactured with high mass-productivity. [0236]

In the above embodiment, upon detecting the edge positions of the marks MX(i_{pm}, j_{pm}) and MY(i_{ps}, j_{ps}), one edge position is extracted for each edge using the picked-up image data on a single axis parallel to the direction of the mark pattern change or the integrated data in the direction perpendicular to the direction of the mark pattern change, as described above. Alternatively, as shown in FIG. 13, which representatively shows a mark MX(i_{pm}, j_{pm}), edge positions can be extracted along each of a plurality of (=H) axes XS_{h}(i_{pm}, j_{pm}) (h=1 to H) parallel to the direction in which the pattern of the mark MX(i_{pm}, j_{pm}) changes. In such a case, edge positions FX_{kh}(i_{pm}, j_{pm}) (k=1 to 6, h=1 to H) are extracted. [0237]

Then, an X-position FX(i_{pm}, j_{pm}) of the mark MX(i_{pm}, j_{pm}) is calculated by [0238]
$\begin{array}{cc}FX\left(i_{pm},j_{pm}\right)=\left\{\sum_{k=1}^{6}\sum_{h=1}^{H}FX_{kh}\left(i_{pm},j_{pm}\right)\right\}/\left(6H\right) & (43)\end{array}$

Also, an error dFX_{kh}(i_{pm}, j_{pm}) of each edge position FX_{kh}(i_{pm}, j_{pm}) from the ideal edge position DX_{k}(i_{pm}, j_{pm}) is calculated by [0239]

dFX _{kh}(i _{pm} , j _{pm})=FX _{kh}(i _{pm} , j _{pm})−DX _{k}(i _{pm} , j _{pm}) (44)

After that, an average value dFX(i_{pm}, j_{pm}) and a standard deviation σX(i_{pm}, j_{pm}) of the errors dFX_{kh}(i_{pm}, j_{pm}) are respectively calculated by [0240]
$\begin{array}{cc}
dFX\left(i_{pm},j_{pm}\right)=\left\{\sum_{k=1}^{6}\sum_{h=1}^{H}dFX_{kh}\left(i_{pm},j_{pm}\right)\right\}/\left(6H\right) & (45)\\
\sigma X\left(i_{pm},j_{pm}\right)=\left[\left\{\sum_{k=1}^{6}\sum_{h=1}^{H}\left(dFX_{kh}\left(i_{pm},j_{pm}\right)-dFX\left(i_{pm},j_{pm}\right)\right)^{2}\right\}/\left(6H-1\right)\right]^{1/2} & (46)
\end{array}$

The X-position FX(i_{pm}, j_{pm}) and standard deviation σX(i_{pm}, j_{pm}) of the mark MX(i_{pm}, j_{pm}) calculated in this way are statistically more valid than those in the aforementioned embodiment, since the number of extracted edge positions is larger than that in the above embodiment. [0241]
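The multi-axis calculation of equations (43) to (46) can be illustrated by the following Python sketch. The function name `mark_position_stats` and the nested-list layout `edges[k][h]` are assumptions made for this example, and the number of edges is taken from the input rather than fixed at six.

```python
import math


def mark_position_stats(edges, design):
    """Mark position and edge-error statistics over K edges and H axes.

    edges:  edges[k][h] is the edge position FX_kh measured on axis h
    design: design[k] is the ideal edge position DX_k
    Returns (position, mean_error, std_error) per eqs. (43), (45), (46).
    """
    count = sum(len(row) for row in edges)            # K*H (6H in the text)
    if count < 2:
        raise ValueError("at least two edge positions are required")
    position = sum(sum(row) for row in edges) / count  # eq. (43)
    # Eq. (44): error of each measured edge from its design position.
    errors = [fx - dx for row, dx in zip(edges, design) for fx in row]
    mean_error = sum(errors) / count                   # eq. (45)
    # Eq. (46): sample standard deviation with divisor (K*H - 1).
    variance = sum((e - mean_error) ** 2 for e in errors) / (count - 1)
    return position, mean_error, math.sqrt(variance)


# Hypothetical mark with two edges measured along two axes each.
position, mean_error, std_error = mark_position_stats(
    [[10.0, 10.5], [20.0, 20.5]], [10.0, 20.0]
)
```

With more axes H, the divisor grows and the estimated standard deviation becomes a more reliable measure of the certainty of the mark position.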

Note that the Y-position FY(i_{ps}, j_{ps}) of each mark MY(i_{ps}, j_{ps}) can be calculated in the same manner as for the mark MX(i_{pm}, j_{pm}). In such a case as well, the calculated Y-position FY(i_{ps}, j_{ps}) and standard deviation σY(i_{ps}, j_{ps}) of the mark MY(i_{ps}, j_{ps}) are statistically more valid than those in the above embodiment, since the number of extracted edge positions is larger than that in the above embodiment. [0242]

After that, optimal values of six parameters that uniquely specify the arrangement of shot areas SA(i, j) on the wafer W are calculated in the same manner as in the above embodiment. Since the optimal values of the six parameters are calculated based on statistically more valid mark positions and their certainties than the above case, higher accuracy than in the above embodiment can be assured. [0243]

In the above embodiment, the aforementioned six parameters are used as those for uniquely specifying the arrangement of shot areas SA(i, j) on the wafer W. Alternatively, even when the arrangement of shot areas SA(i, j) is uniquely specified by more parameters than in the above embodiment, their optimal values can be accurately calculated as in the above embodiment. [0244]

More specifically, in the above embodiment, upon aligning shot areas SA(i, j) on the wafer, even when appropriate values of parameters that specify the arrangement of representative points, e.g., central points of the shot areas SA(i, j), are calculated, and the central point of a given shot area SA(i, j) is aligned using those parameters, sufficiently high accuracy of superposition cannot always be obtained. Such a problem is caused by the following three major factors, as disclosed in, e.g., Japanese Patent Laid-Open Nos. 6-275496 and 6-349705. [0245]

{circle over (5)} Rotation of shot region: This is caused when a reticle R has rotated with respect to the stage coordinate system (X, Y) upon transferring a pattern formed on the reticle R onto a wafer W, or when yawing is accidentally mixed into the movement of a wafer stage WST upon scanning exposure, and is expressed by a rotation error φ of the wafer coordinate system (α, β) with respect to a shot coordinate system having coordinate axes parallel to the α- and β-axes. [0246]

{circle over (6)} Orthogonality of shot region: This is caused by distortion of a pattern formed on a reticle R, distortion (distortion error) of a projection optical system PL, and the like, and is expressed by an orthogonality error χ. [0247]

{circle over (7)} Linear expansion/shrinkage of shot region: This is caused by an error of the projection magnification upon projecting a pattern formed on a reticle R onto a wafer W or by the wafer entirely or partially expanding or shrinking due to a formation process or the like. This expansion/shrinkage amount is expressed by wafer scaling values r_{X} and r_{Y} in the coordinate axis directions (i.e., α- and β-directions) of the shot coordinate system. Note that the wafer scaling value r_{X} in the α-direction is represented by the ratio between the actually measured value and design value of the distance between two points in the α-direction on the wafer W, and that the wafer scaling value r_{Y} in the β-direction is represented by the ratio between the actually measured value and design value of the distance between two points in the β-direction. [0248]

The ideal position (DX, DY) in a shot area is expressed by [0249]

DX=CX+SX (47)

DY=CY+SY (48)

using a central coordinate position (CX, CY) of that shot region, and a coordinate position (SX, SY) of the position (DX, DY) relative to that coordinate position (CX, CY) on the shot coordinate system. Therefore, when error factors {circle over (5)} to {circle over (7)} are added to the aforementioned error factors {circle over (1)} to {circle over (4)}, a pattern to be transferred at the ideal position (DX, DY) on the wafer coordinate system (α, β) is transferred to a position (EX, EY) on the stage coordinate system (X, Y) given by [0250]
$\begin{array}{cc}\begin{array}{c}
\left(\begin{array}{c}EX\\ EY\end{array}\right)=\left(\begin{array}{cc}R_{X} & 0\\ 0 & R_{Y}\end{array}\right)\left(\begin{array}{cc}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{array}\right)\left(\begin{array}{cc}1 & -\tan w\\ 0 & 1\end{array}\right)\left(\begin{array}{c}CX\\ CY\end{array}\right)+\\
\left(\begin{array}{cc}r_{X} & 0\\ 0 & r_{Y}\end{array}\right)\left(\begin{array}{cc}\cos\phi & -\sin\phi\\ \sin\phi & \cos\phi\end{array}\right)\left(\begin{array}{cc}1 & -\tan\chi\\ 0 & 1\end{array}\right)\left(\begin{array}{c}SX\\ SY\end{array}\right)+\left(\begin{array}{c}O_{X}\\ O_{Y}\end{array}\right)
\end{array} & (49)\end{array}$

Note that various other error factors are also present in addition to the aforementioned ones upon actual transfer, and that the position (EX, EY) is considered as an expected transfer position. [0251]

In general, since the orthogonality errors w and χ, and the residual rotation errors θ and φ, can be considered as infinitesimal amounts, the designed transfer position (DX, DY) and expected transfer position (EX, EY) are related by [0252]
$\begin{array}{cc}\begin{array}{c}
\left(\begin{array}{c}EX\\ EY\end{array}\right)=\left(\begin{array}{cc}R_{X} & -R_{X}\left(w+\theta\right)\\ R_{Y}\cdot\theta & R_{Y}\end{array}\right)\left(\begin{array}{c}CX\\ CY\end{array}\right)+\\
\left(\begin{array}{cc}r_{X} & -r_{X}\left(\chi+\phi\right)\\ r_{Y}\cdot\phi & r_{Y}\end{array}\right)\left(\begin{array}{c}SX\\ SY\end{array}\right)+\left(\begin{array}{c}O_{X}\\ O_{Y}\end{array}\right)
\end{array} & (50)\end{array}$

which expresses a first-order approximation of the trigonometric functions in equation (49). [0253]
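As a numerical check of this first-order approximation, the following Python sketch compares the exact matrix product of equation (49) with the linearized matrix of equation (50) for small angles. The function names are assumptions introduced here, and the sign convention (negative upper-right entries in the rotation and orthogonality matrices) is the one implied by equations (53) and (54).

```python
import math


def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def exact_matrix(scale_x, scale_y, rot, orth):
    """Scaling * rotation * orthogonality matrix, as in eq. (49)."""
    scale = [[scale_x, 0.0], [0.0, scale_y]]
    rotation = [[math.cos(rot), -math.sin(rot)],
                [math.sin(rot), math.cos(rot)]]
    shear = [[1.0, -math.tan(orth)], [0.0, 1.0]]
    return matmul2(scale, matmul2(rotation, shear))


def first_order_matrix(scale_x, scale_y, rot, orth):
    """Linearized matrix, as in eq. (50): terms of second order dropped."""
    return [[scale_x, -scale_x * (orth + rot)],
            [scale_y * rot, scale_y]]


# Hypothetical error factors of the order of parts per million.
exact = exact_matrix(1.00001, 0.99999, 2e-6, -1e-6)
approx = first_order_matrix(1.00001, 0.99999, 2e-6, -1e-6)
max_diff = max(abs(exact[i][j] - approx[i][j])
               for i in range(2) for j in range(2))
```

For angles of this size the two matrices differ only in terms of second order in the angles, which is why equation (50) can replace equation (49) in practice.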

In the following description, equation (50) is also expressed by [0254]
$\begin{array}{cc}\left(\begin{array}{c}EX\\ EY\end{array}\right)=\left(\begin{array}{cc}A_{11} & A_{12}\\ A_{21} & A_{22}\end{array}\right)\left(\begin{array}{c}CX\\ CY\end{array}\right)+\left(\begin{array}{cc}B_{11} & B_{12}\\ B_{21} & B_{22}\end{array}\right)\left(\begin{array}{c}SX\\ SY\end{array}\right)+\left(\begin{array}{c}O_{X}\\ O_{Y}\end{array}\right) & (51)\end{array}$

where [0255]

B_{11}=r_{X} (52)

B _{12}=−r_{X}(χ+φ) (53)

B _{21} =r _{Y}·φ (54)

B_{22}=r_{Y} (55)

That is, the parameters which uniquely specify the arrangement of shot areas SA(i, j) are the ten parameters A_{11}, A_{12}, A_{21}, A_{22}, B_{11}, B_{12}, B_{21}, B_{22}, O_{X}, and O_{Y}. In order to obtain optimal values of the ten parameters, four two-dimensional marks WMp(i, j) (p=1 to 4), which by design do not lie on a single line and the patterns of which change in the X- and Y-directions, are formed on each shot area SA(i, j), as shown in, e.g., FIG. 14. Note that the number of wafer marks is not limited to four, and can be three or more. As the two-dimensional marks WMp, marks having the patterns shown in, e.g., FIGS. 15A to 15D, can be used. [0256]

Then, more than five two-dimensional marks WMp(i_{m}, j_{m}) (m>5), including three out of the four two-dimensional marks WMp(i, j) on any shot area, are measured. After that, the X- and Y-positions of the two-dimensional marks WMp(i_{m}, j_{m}), and the probability densities of those X- and Y-positions, are calculated in the same manner as in the above embodiment, thus calculating optimal values of the ten parameters A_{11}, A_{12}, A_{21}, A_{22}, B_{11}, B_{12}, B_{21}, B_{22}, O_{X}, and O_{Y}. [0257]

In this way, an arrangement of shot areas can be obtained that takes into account the rotation, orthogonality, and expansion/shrinkage of each shot region. Furthermore, optimal values of the controllable error factors (r_{X}, r_{Y}, χ, φ) are calculated using equations (52) to (55), which relate those error factors to the four parameters (B_{11}, B_{12}, B_{21}, B_{22}) of the ten parameters whose optimal values have been calculated. In accordance with those error factors, the magnification of the projection optical system PL, the synchronous velocity ratio upon scanning exposure, the synchronous moving directions upon scanning exposure, and the like are corrected. Thus, the accuracy of superposition is further improved. [0258]
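The recovery of the error factors from the four parameters B_{11}, B_{12}, B_{21}, and B_{22} amounts to inverting equations (52) to (55). A minimal Python sketch of that inversion follows; the function names are assumptions made here, and nonzero scaling values are assumed.

```python
def b_from_shot_factors(r_x, r_y, chi, phi):
    """Forward relations, eqs. (52)-(55): returns (B11, B12, B21, B22)."""
    return (r_x, -r_x * (chi + phi), r_y * phi, r_y)


def shot_factors_from_b(b11, b12, b21, b22):
    """Inverse relations: returns (r_X, r_Y, chi, phi)."""
    r_x = b11                   # eq. (52): B11 = r_X
    r_y = b22                   # eq. (55): B22 = r_Y
    phi = b21 / r_y             # eq. (54): B21 = r_Y * phi
    chi = -b12 / r_x - phi      # eq. (53): B12 = -r_X * (chi + phi)
    return r_x, r_y, chi, phi


# Round trip with hypothetical error factors.
b = b_from_shot_factors(1.00001, 0.99999, 2e-6, -1e-6)
r_x, r_y, chi, phi = shot_factors_from_b(*b)
```

The scaling values are read off directly, the shot rotation φ follows from B_{21}, and the orthogonality error χ is what remains of B_{12} after φ is removed.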

In the above embodiment, a plurality of initial sample sets are used, but if one sample set from which accurate position parameters can be calculated is known, that sample set may be used as the only initial sample set. Even in such a case, the processing flow of the above embodiment can be applied. Since the number of sample sets is one from the beginning, in the processing flow of the above embodiment, the answer in step 204 is always NO. [0259]

In the above embodiment, whether or not it is possible to reduce the number of elements of a sample set is evaluated by comparing the position parameter values and their certainties calculated from the sample set with the estimations of position parameters and their certainties calculated from each of a plurality of replacement candidate sets. Alternatively, such evaluation can be performed by so-called cross-verification. [0260]

More specifically, the position errors of marks MX and MY included in a sample set are calculated using the estimations of the position parameters calculated for each replacement candidate set, and the position error distribution of the marks MX and MY included in the sample set is calculated based on that calculation result, thus evaluating whether each replacement candidate set has statistical characteristics equivalent to those of the sample set. In this case, replacement possibility can be evaluated without calculating the position parameter values and their certainties from the sample set. Such cross-verification is particularly effective when the replacement candidate set is a subset of the sample set. [0261]
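The cross-verification described above can be sketched in Python as follows. This is an illustration under stated assumptions: the six-parameter model is taken to be equation (51) without the shot-internal term, i.e., the predicted position is A·(DX, DY) + (O_{X}, O_{Y}); each mark is treated as having both an X- and a Y-measurement; and the names `predict` and `residual_spread` are hypothetical.

```python
import math


def predict(params, design):
    """Predicted mark position from (A11, A12, A21, A22, OX, OY)."""
    a11, a12, a21, a22, o_x, o_y = params
    d_x, d_y = design
    return (a11 * d_x + a12 * d_y + o_x, a21 * d_x + a22 * d_y + o_y)


def residual_spread(params, designs, measured):
    """Spread of sample-set position errors under candidate parameters.

    designs:  design positions (DX, DY) of the marks in the sample set
    measured: measured positions (FX, FY) of the same marks
    Returns (sigma_x, sigma_y), root sums of squared errors over (M - 1).
    """
    errors = []
    for design, (f_x, f_y) in zip(designs, measured):
        p_x, p_y = predict(params, design)
        errors.append((f_x - p_x, f_y - p_y))
    m = len(errors)
    if m < 2:
        raise ValueError("at least two marks are required")
    sigma_x = math.sqrt(sum(e[0] ** 2 for e in errors) / (m - 1))
    sigma_y = math.sqrt(sum(e[1] ** 2 for e in errors) / (m - 1))
    return sigma_x, sigma_y


# Hypothetical candidate parameters and three sample-set marks whose
# measured X-positions are all 0.1 larger than the prediction.
params = (1.0, 0.0, 0.0, 1.0, 0.5, -0.5)
designs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
measured = [(0.6, -0.5), (1.6, -0.5), (0.6, 0.5)]
sigma_x, sigma_y = residual_spread(params, designs, measured)
```

A candidate set whose parameters leave a residual spread comparable to that of the sample set itself would be regarded as statistically equivalent; the threshold for "comparable" is not specified here.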

In the above embodiment, upon reducing the number of elements of a sample set, a plurality of initial sample sets are replaced by one new sample set. Alternatively, the number of elements of each of the plurality of initial sample sets may be reduced. In this case, for each sample set, a new sample set is searched for that has characteristics equivalent thereto in the statistical characteristics of position errors of marks and consists of fewer elements. [0262]

In the above embodiment, one-dimensional marks MX and MY are used. Alternatively, the two-dimensional marks shown in FIGS. 15A to 15D may be used. Also, upon calculating the ten parameters, one-dimensional marks MX and MY may be used. As the two-dimensional mark, for example, a box-in-box mark may be used in addition to those shown in FIGS. 15A to 15D. Upon detecting the two-dimensional position of such a two-dimensional mark, the aforementioned one-dimensional position detection process may be performed twice, or two-dimensional template matching, obtained by extending the above one-dimensional template matching for one-dimensional signals, may be performed on the two-dimensional signal waveforms of the two-dimensional marks. [0263]

In the above embodiment, the off-axis alignment method, which measures the positions of alignment marks on a wafer without the intervention of a projection optical system, is adopted. Alternatively, a TTL (through-the-lens) scheme that measures the positions of alignment marks on a wafer via a projection optical system, or a TTR (through-the-reticle) scheme that simultaneously observes a wafer and reticle via a projection optical system, may be adopted. In the case of the TTR scheme, upon observation, sample alignment senses the position of a wafer mark where the deviation between a reticle mark formed on the reticle and a wafer mark formed on the wafer is zero. [0264]

In the above embodiment, the coordinate positions of the shot areas are calculated. Alternatively, the step pitch of each shot may be calculated. [0265]

In the above embodiment, the fly-eye lens is used as an optical integrator (homogenizer). In place of the fly-eye lens, a rod integrator may be used. In an illumination optical system using the rod integrator, the rod integrator is disposed such that its exit surface is nearly conjugate with the pattern surface of a reticle R. Such an illumination optical system using the rod integrator is disclosed in, e.g., U.S. Pat. No. 5,675,401, and the disclosure in the above U.S. patent is incorporated herein by reference as long as the national laws in designated states or elected states, to which this international application is applied, permit. Also, the fly-eye lens and rod integrator may be combined, or two fly-eye lenses or rod integrators may be disposed in series to build double optical integrators. [0266]

In the above embodiment, the present invention is applied to the step-and-scan scanning exposure apparatus. However, the application range of the present invention is not limited to such a specific apparatus, and the present invention can be suitably applied to a stationary exposure apparatus such as a stepper or the like. [0267]

Even an exposure apparatus using, e.g., ultraviolet rays may adopt as a projection optical system a reflection system consisting of reflective optical elements alone or a reflection/refraction system (a catadioptric system) having both reflective and refractive optical elements. As the catadioptric projection optical system, a reflection/refraction system which is disclosed in, e.g., Japanese Patent Laid-Open No. 8-171054 and corresponding U.S. Pat. No. 5,668,672, Japanese Patent Laid-Open No. 10-20195 and corresponding U.S. Pat. No. 5,835,275, and the like, and has a beam splitter and concave mirror as reflective optical elements, or a reflection/refraction system which is disclosed in Japanese Patent Laid-Open No. 8-334695 and corresponding U.S. Pat. No. 5,689,377, Japanese Patent Laid-Open No. 10-3039 and corresponding U.S. patent application Ser. No. 873,605 (application date: Jun. 12, 1997), and the like, and has a concave mirror and the like as reflective optical elements without using any beam splitter, can be used. The disclosures in the above Japanese Patent Laid-Opens and U.S. patents are incorporated herein by reference as long as the national laws in designated states or elected states, to which this international application is applied, permit. [0268]

In addition, a reflection/refraction system can be employed which comprises a plurality of refractive optical elements and two mirrors (a main mirror being a concave mirror and a sub-mirror that is a back-surface mirror of which the reflection surface is formed on the opposite side to the incident surface of a refractive element or plane-parallel plate) that are disposed along one axis, and has the intermediate image, formed by those refractive optical elements, of a reticle pattern again imaged on a wafer using the main mirror and sub-mirror, the reflection/refraction system being disclosed in Japanese Patent Laid-Open No. 10-104513 and U.S. Pat. No. 5,488,229 corresponding thereto. In this reflection/refraction system, the main mirror and sub-mirror are disposed in series with the plurality of refractive optical elements, and illumination light passes through a portion of the main mirror, is reflected by the sub-mirror and the main mirror in turn, passes through a portion of the sub-mirror and reaches the wafer. The disclosures in the above Japanese Patent Laid-Open publication and U.S. patent are incorporated herein by reference as long as the national laws in designated states or elected states, to which this international application is applied, permit. [0269]

Furthermore, as the reflection/refraction-type projection optical system, a reduction system may be employed which has, e.g., a circular image field, is telecentric on both the object plane side and image plane side, and has a reduction ratio of, e.g., 1/4 or 1/5. Also, in a scanning exposure apparatus comprising this reflection/refraction-type projection optical system, the illumination area of the illumination light may be a rectangular-slit-shaped area whose center almost coincides with the optical axis of the projection optical system and which extends along a direction almost perpendicular to the scanning direction of a reticle or wafer. By using a scanning exposure apparatus comprising such a reflection/refraction-type projection optical system, it is possible to accurately transfer a fine pattern, such as a line-and-space (L/S) pattern of about 100 nm, onto wafers even with F2 laser light having a wavelength of, for example, 157 nm as exposure light. [0270]

Furthermore, ArF excimer laser light or F2 laser light is used as the vacuum ultraviolet light. However, in a case where only a beam-monitor mechanism and a reference wavelength light source are contained in the same environment-controlled chamber as the projection optical system, a higher harmonic wave may be used, which is obtained by amplifying single-wavelength laser light in the infrared or visible range, emitted from a DFB semiconductor laser device or a fiber laser, with a fiber amplifier doped with, for example, erbium (or both erbium and ytterbium), and then converting its wavelength into the ultraviolet range using a nonlinear optical crystal. [0271]

For example, considering that the oscillation wavelength of a single-wavelength laser is in the range of 1.51 to 1.59 µm, an eighth harmonic wave whose wavelength is in the range of 189 to 199 nm or a tenth harmonic wave whose wavelength is in the range of 151 to 159 nm is emitted. Especially, when the oscillation wavelength is in the range of 1.544 to 1.553 µm, an eighth harmonic wave whose wavelength is in the range of 193 to 194 nm, that is, almost the same as ArF excimer laser light (ultraviolet light), is obtained, and when the oscillation wavelength is in the range of 1.57 to 1.58 µm, a tenth harmonic wave whose wavelength is in the range of 157 to 158 nm, that is, almost the same as F2 laser light (ultraviolet light), is obtained. [0272]

Furthermore, when the oscillation wavelength is in the range of 1.03 to 1.12 µm, a seventh harmonic wave whose wavelength is in the range of 147 to 160 nm is emitted, and, especially, when the oscillation wavelength is in the range of 1.099 to 1.106 µm, a seventh harmonic wave whose wavelength is in the range of 157 to 158 nm, that is, almost the same as F2 laser light (ultraviolet light), is obtained. In this case, for example, an ytterbium-doped fiber laser can be employed as the single-wavelength laser. [0273]
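The harmonic wavelength ranges quoted above follow from dividing the fundamental oscillation wavelength by the harmonic order. The following is only a minimal illustrative sketch of that arithmetic and is not part of the disclosed apparatus; the function name `harmonic_range_nm` and the table `CASES` are hypothetical names introduced here for illustration.

```python
# Illustrative sketch only: an n-th harmonic has a wavelength equal to the
# fundamental wavelength divided by n. The names below are hypothetical.

def harmonic_range_nm(fundamental_min_um, fundamental_max_um, order):
    """Return (min, max) wavelength in nm of the harmonic of the given order
    for a fundamental oscillation wavelength range given in micrometers."""
    if order < 1:
        raise ValueError("harmonic order must be a positive integer")
    # 1 um = 1000 nm, so convert before dividing by the harmonic order.
    return (fundamental_min_um * 1000.0 / order,
            fundamental_max_um * 1000.0 / order)

# (fundamental min in um, fundamental max in um, harmonic order), taken from
# the ranges stated in the text above.
CASES = [
    (1.51, 1.59, 8),     # eighth harmonic, broad range
    (1.51, 1.59, 10),    # tenth harmonic, broad range
    (1.544, 1.553, 8),   # close to ArF excimer laser light
    (1.57, 1.58, 10),    # close to F2 laser light
    (1.03, 1.12, 7),     # seventh harmonic, broad range
    (1.099, 1.106, 7),   # close to F2 laser light
]

if __name__ == "__main__":
    for lo_um, hi_um, order in CASES:
        lo_nm, hi_nm = harmonic_range_nm(lo_um, hi_um, order)
        print(f"{lo_um}-{hi_um} um, order {order}: "
              f"{lo_nm:.1f}-{hi_nm:.1f} nm")
```

The computed values agree with the stated ranges to within the rounding used in the text.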

Moreover, the present invention can be applied not only to an exposure apparatus for producing micro devices such as semiconductor devices but also to an exposure apparatus that transfers a circuit pattern onto a glass substrate or silicon wafer so as to produce reticles or masks used by a light exposure apparatus, EUV (Extreme Ultraviolet) exposure apparatus, X-ray exposure apparatus, electron beam exposure apparatus, etc. Incidentally, in an exposure apparatus using DUV (far ultraviolet) light or VUV (vacuum ultraviolet) light, a transmission-type reticle is generally employed, and as the substrate of the reticle, quartz glass, fluorine-doped quartz glass, fluorite, magnesium fluoride, or quartz crystal is employed. A proximity-type X-ray exposure apparatus or an electron beam exposure apparatus employs a transmission-type mask (stencil mask, membrane mask), whereas an EUV exposure apparatus employs a reflection-type mask, and as the substrate of the mask, a silicon wafer or the like is employed. [0274]

Note that the present invention can be applied not only to a wafer exposure apparatus used in the production of semiconductor devices but also to an exposure apparatus that transfers a device pattern onto a glass plate and is used in the production of displays such as liquid crystal display devices and plasma displays, an exposure apparatus that transfers a device pattern onto a ceramic plate and is used in the production of thin magnetic heads, and an exposure apparatus used in the production of pickup devices (CCD, etc.). In addition, in the above embodiment, positional detection of the alignment marks on a wafer and alignment of the wafer have been described. However, positional detection and alignment according to the present invention can also be applied to positional detection of the alignment marks on a reticle and alignment of the reticle, and to units other than exposure apparatuses, such as a unit that observes objects and a unit that detects the positions of objects and aligns them in an assembly line, process line, or inspection line. [0275]

As has been described in detail above, according to the position detection method and position detection apparatus of the present invention, since a statistical process is executed on the basis of position information of position detection points obtained by measurement of the position detection points on an object, position information of any area on the object is accurately and efficiently detected. Therefore, the position detection method and position detection apparatus of the present invention are suitable for detecting the position of any area on an object. [0276]

Also, according to the exposure method and exposure apparatus of the present invention, the positions of a predetermined number of alignment marks formed on a substrate are detected using the position detection method of the present invention, and a predetermined pattern is transferred onto divided areas while aligning the substrate on the basis of the detection result. Therefore, the exposure method and exposure apparatus of the present invention are suitable for performing multi-exposure for forming a multilayer pattern with improved accuracy of superposition between layers. For this reason, the exposure method and exposure apparatus of the present invention are suitable for mass production of devices having a fine pattern. [0277]

While the above-described embodiment of the present invention is the presently preferred embodiment thereof, those skilled in the art of lithography systems will readily recognize that numerous additions, modifications, and substitutions may be made to the above-described embodiment without departing from the spirit and scope thereof. It is intended that all such modifications, additions, and substitutions fall within the scope of the present invention, which is best defined by the claims appended below. [0278]