Disclosure of Invention
The invention aims to provide a similarity detection method and a similarity detection device so as to improve the accuracy of similarity detection.
In order to solve the above problem, the present invention provides a similarity detection method, including:
acquiring two groups of detection data, wherein the detection data are used for representing the properties of the product, and the detection data conform to the log-normal distribution;
establishing a linear regression model corresponding to the lognormal distribution function of each group of detection data;
calculating the sum of squared errors and the mean square error corresponding to the linear regression model of each group of detection data;
setting a confidence band corresponding to a linear regression model of each group of detection data, wherein the confidence band is associated with the calculated mean square error;
and if the linear regression model of each group of detection data is in the confidence band corresponding to the linear regression model of the other group of detection data, determining that the two groups of detection data are similar.
Correspondingly, the invention also provides a similarity detection device, which comprises:
the acquisition unit is used for acquiring two groups of detection data, the detection data are used for representing the properties of the product, and the detection data conform to the log-normal distribution;
the establishing unit is used for establishing a linear regression model corresponding to the lognormal distribution function of each group of detection data;
the calculation unit is used for calculating the sum of squared errors and the mean square error corresponding to the linear regression model of each group of detection data;
the setting unit is used for setting a confidence band corresponding to a linear regression model of each group of detection data, and the confidence band is associated with the calculated mean square error;
and the determining unit is used for determining that the two groups of detection data are similar if the linear regression model of each group of detection data is in a confidence band corresponding to the linear regression model of the other group of detection data.
Compared with the prior art, the similarity detection method and the similarity detection device have the following advantages: because a strict similarity rule is set, namely the linear regression models of each group of detection data are required to be respectively in the corresponding confidence bands of the linear regression models of the other group of detection data, the accuracy of similarity detection can be ensured even if the number of each group of detection data is less than 30. In addition, in the production process, by applying the similarity detection method and the similarity detection device, the number of each group of detection data can be less than 30, so that the cost and time consumed for obtaining the detection data can be saved, the test cost in the production process can be further reduced, and the test period in the production process can be shortened.
Detailed Description
According to the embodiment of the invention, Confidence Bands (CB) are set for the linear regression models of the detection data, and the similarity of two groups of detection data is determined according to the set similarity rule that the linear regression model of each group of detection data is in the Confidence Band corresponding to the linear regression model of the other group of detection data.
Fig. 1 is a flowchart of a similarity detection method for detecting similarity of properties of a product according to an embodiment of the present invention, including the steps of:
Step S11: two groups of detection data are acquired, the detection data conforming to a lognormal distribution. The acquired detection data are used to characterize a property of the product.
Step S12: a linear regression model (regression line) corresponding to the lognormal distribution function of each group of detection data is established.
Step S13: the Sum of Squared Errors (SSE) and the Mean Square Error (MSE) corresponding to the linear regression model of each group of detection data are calculated.
Step S14: a confidence band corresponding to the linear regression model of each group of detection data is set, wherein the confidence band is associated with the calculated mean square error.
Step S15: if the linear regression model of each group of detection data is within the confidence band corresponding to the linear regression model of the other group of detection data, the two groups of detection data are determined to be similar.
The following description will be made in detail with reference to the accompanying drawings by taking the similarity of the life spans of two batches of semiconductor devices as an example. The two batches of semiconductor devices may be produced on different production sites or lines or under different process conditions, but they are required to have similar service lives and therefore they are subjected to similarity testing.
Referring to Fig. 1, in step S11, two groups of detection data conforming to a lognormal distribution are acquired. A certain number of devices are selected from each of the two batches of semiconductor devices (in this embodiment, fewer than 30 devices per batch), and the service lives of the selected devices are tested. To measure the service life of a semiconductor device in a short time, an accelerated test is usually used: stress conditions that accelerate degradation of device performance (that is, a higher ambient temperature, humidity, voltage, current, pressure, or the like than under normal operating conditions) are applied to the device, and the performance parameters of the device are measured to obtain its service life under an operating environment harsher than normal operating conditions. A lifetime model is then used to calculate the service life of the product under normal operating conditions.
Fig. 2 is a schematic diagram of an example of the data distribution obtained by testing the service lives of a selected group of semiconductor devices, wherein the abscissa represents lifetime and the ordinate represents probability density. The lifetimes shown in the figure conform to a lognormal distribution, whose distribution function can be expressed as: <math>
<mrow>
<mi>f</mi>
<mrow>
<mo>(</mo>
<mi>t</mi>
<mo>)</mo>
</mrow>
<mo>=</mo>
<mfrac>
<mn>1</mn>
<mrow>
<msqrt>
<mn>2</mn>
<mi>π</mi>
</msqrt>
<mi>σt</mi>
</mrow>
</mfrac>
<msup>
<mi>e</mi>
<mrow>
<mo>-</mo>
<mfrac>
<msup>
<mrow>
<mo>(</mo>
<mi>ln</mi>
<mi>t</mi>
<mo>-</mo>
<mi>μ</mi>
<mo>)</mo>
</mrow>
<mn>2</mn>
</msup>
<mrow>
<mn>2</mn>
<msup>
<mi>σ</mi>
<mn>2</mn>
</msup>
</mrow>
</mfrac>
</mrow>
</msup>
<mo>,</mo>
</mrow>
</math> wherein t is the service life (0 < t < +∞), μ is the mean, and σ is the standard deviation ( <math>
<mrow>
<mi>σ</mi>
<mo>=</mo>
<msqrt>
<munderover>
<mi>Σ</mi>
<mrow>
<mi>i</mi>
<mo>=</mo>
<mn>1</mn>
</mrow>
<mi>n</mi>
</munderover>
<mfrac>
<msup>
<mrow>
<mo>(</mo>
<msub>
<mi>X</mi>
<mi>i</mi>
</msub>
<mo>-</mo>
<mover>
<mi>X</mi>
<mo>‾</mo>
</mover>
<mo>)</mo>
</mrow>
<mn>2</mn>
</msup>
<mrow>
<mi>n</mi>
<mo>-</mo>
<mn>1</mn>
</mrow>
</mfrac>
</msqrt>
</mrow>
</math> ), and n is the number of selected semiconductor devices.
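The density and the standard deviation formula above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: it assumes that X_i in the standard deviation formula denotes the natural logarithm of the i-th measured lifetime, and the function names and sample lifetimes are hypothetical.

```python
import math
from statistics import mean, stdev


def lognormal_pdf(t, mu, sigma):
    """Lognormal density f(t) for t > 0, as in the formula above."""
    if t <= 0:
        raise ValueError("t must be positive")
    z = (math.log(t) - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma * t)


def estimate_parameters(lifetimes):
    """Estimate mu and sigma from ln(lifetime).

    Assumption: X_i = ln(t_i); statistics.stdev uses the (n - 1) divisor,
    matching the standard deviation formula above.
    """
    logs = [math.log(t) for t in lifetimes]
    return mean(logs), stdev(logs)


# Hypothetical lifetimes (arbitrary time units) of five selected devices.
lifetimes = [120.0, 150.0, 200.0, 260.0, 340.0]
mu, sigma = estimate_parameters(lifetimes)
```

Working on the logarithm of the lifetime is what makes the later straight-line model possible, since ln t is normally distributed when t is lognormal.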
In step S12, a linear regression model corresponding to the lognormal distribution function of each group of detection data is established. The linear regression model corresponding to the lognormal distribution function of the service life is expressed as Y = β_0 + β_1 X, wherein β_1 = 1/σ, β_0 = Y - β_1 X = 50% - T_50/σ, and T_50 is the median, which in this embodiment may be referred to as the median lifetime and is expressed as T_50 = exp(μ).
Fig. 3 is a schematic diagram of an example of a linear regression model corresponding to the lognormal distribution function of the service life of a group of semiconductor devices, wherein the abscissa represents lifetime and the ordinate represents cumulative failure rate. The illustrated line l is the linear regression model corresponding to the lognormal distribution function of the service life; the slope of the line l is 1/σ, and the intercept on the abscissa is (50% - T_50/σ). At a cumulative failure rate of 50%, the corresponding value on the abscissa is the median lifetime T_50. The discrete points (x_i, y_i) (i = 1, 2, ..., n-1, n) around the line l are actual values of the detection data, and the point on the line l having the same abscissa (x_i) as an actual value of the detection data is the estimated value of the detection data.
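One way to build the points (x_i, y_i) and the line l of Fig. 3 is sketched below. Several choices here are assumptions and are not stated in the text: the abscissa is taken as ln(lifetime), the ordinate as the standard normal quantile (probit) of the cumulative failure rate, and the actual cumulative failure rate of the i-th ordered device is estimated with the median-rank approximation (i - 0.3)/(n + 0.4). On that scale a 50% cumulative failure rate maps to 0 and the line crosses it at ln T_50 = μ, so the intercept -μ/σ plays the role of 50% - T_50/σ.

```python
import math
from statistics import NormalDist, mean, stdev


def probability_plot_points(lifetimes):
    """Return the actual values (x_i, y_i) for a lognormal probability plot.

    Assumptions: x = ln(lifetime), y = probit of the median-rank estimate
    of the cumulative failure rate.
    """
    n = len(lifetimes)
    std_normal = NormalDist()
    points = []
    for i, t in enumerate(sorted(lifetimes), start=1):
        cumulative_failure = (i - 0.3) / (n + 0.4)  # median-rank approximation
        points.append((math.log(t), std_normal.inv_cdf(cumulative_failure)))
    return points


def regression_line(lifetimes):
    """Return (beta0, beta1) with slope 1/sigma, passing through (mu, 0)."""
    logs = [math.log(t) for t in lifetimes]
    mu, sigma = mean(logs), stdev(logs)
    beta1 = 1.0 / sigma
    beta0 = -mu / sigma  # y = 0 (50% failure) is reached at x = mu = ln(T_50)
    return beta0, beta1


# Hypothetical lifetimes (arbitrary time units) of five selected devices.
lifetimes = [120.0, 150.0, 200.0, 260.0, 340.0]
points = probability_plot_points(lifetimes)
beta0, beta1 = regression_line(lifetimes)
# Estimated values on the line at the same abscissas as the actual values.
estimates = [beta0 + beta1 * x for x, _ in points]
```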
Step S13, the sum of the squares of errors and the mean square of errors corresponding to the linear regression model for each set of detected data are calculated. The sum of the squares of errors corresponding to the linear regression model of the test data is expressed as: <math>
<mrow>
<mi>SSE</mi>
<mo>=</mo>
<munderover>
<mi>Σ</mi>
<mrow>
<mi>i</mi>
<mo>=</mo>
<mn>1</mn>
</mrow>
<mi>n</mi>
</munderover>
<msup>
<mrow>
<mo>(</mo>
<msub>
<mi>y</mi>
<mi>i</mi>
</msub>
<mo>-</mo>
<msub>
<mover>
<mi>y</mi>
<mo>^</mo>
</mover>
<mi>i</mi>
</msub>
<mo>)</mo>
</mrow>
<mn>2</mn>
</msup>
<mo>,</mo>
</mrow>
</math> wherein, referring to Fig. 3, y_i is an actual value of the detection data, i.e., the actual cumulative failure rate; <math>
<mrow>
<msub>
<mover>
<mi>y</mi>
<mo>^</mo>
</mover>
<mi>i</mi>
</msub>
<mo>=</mo>
<msub>
<mi>β</mi>
<mn>0</mn>
</msub>
<mo>+</mo>
<msub>
<mi>β</mi>
<mn>1</mn>
</msub>
<msub>
<mi>x</mi>
<mi>i</mi>
</msub>
</mrow>
</math> is an estimated value of the detection data, i.e., the estimated cumulative failure rate. The mean square error corresponding to the linear regression model of the detection data is derived from the sum of squared errors and is expressed as MSE = SSE/(n - 2).
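The two quantities of step S13 can be computed directly from the actual values and the fitted line. The following is a minimal sketch; the function name and the sample points are hypothetical.

```python
def sse_and_mse(points, beta0, beta1):
    """Return (SSE, MSE) for points (x_i, y_i) and the line y = beta0 + beta1 * x."""
    n = len(points)
    if n <= 2:
        raise ValueError("at least three points are needed, since MSE divides by n - 2")
    sse = sum((y - (beta0 + beta1 * x)) ** 2 for x, y in points)
    return sse, sse / (n - 2)


# Hypothetical actual values scattered around the line y = x.
points = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.1), (3.0, 2.9)]
sse, mse = sse_and_mse(points, beta0=0.0, beta1=1.0)
```

The divisor n - 2 reflects the two parameters (β_0 and β_1) of the line, and it is the same number of degrees of freedom used for the t value of the confidence band.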
In step S14, confidence bands corresponding to the linear regression models of the respective sets of detection data are set. The confidence band CB corresponding to the linear regression model of the detection data is associated with the mean square error MSE calculated in step S13, which can be expressed by the following formula:
<math>
<mrow>
<mi>CB</mi>
<mo>=</mo>
<msub>
<mover>
<mi>y</mi>
<mo>^</mo>
</mover>
<mi>i</mi>
</msub>
<mo>±</mo>
<msub>
<mi>t</mi>
<mrow>
<mo>(</mo>
<mi>n</mi>
<mo>-</mo>
<mn>2,1</mn>
<mo>-</mo>
<mi>α</mi>
<mo>/</mo>
<mn>2</mn>
<mo>)</mo>
</mrow>
</msub>
<msqrt>
<mi>MSE</mi>
<mrow>
<mo>(</mo>
<mfrac>
<mn>1</mn>
<mi>n</mi>
</mfrac>
<mo>+</mo>
<mfrac>
<msup>
<mrow>
<mo>(</mo>
<msub>
<mi>x</mi>
<mi>i</mi>
</msub>
<mo>-</mo>
<msub>
<mi>T</mi>
<mn>50</mn>
</msub>
<mo>)</mo>
</mrow>
<mn>2</mn>
</msup>
<mrow>
<munderover>
<mi>Σ</mi>
<mrow>
<mi>i</mi>
<mo>=</mo>
<mn>1</mn>
</mrow>
<mi>n</mi>
</munderover>
<msup>
<mrow>
<mo>(</mo>
<msub>
<mi>x</mi>
<mi>i</mi>
</msub>
<mo>-</mo>
<msub>
<mi>T</mi>
<mn>50</mn>
</msub>
<mo>)</mo>
</mrow>
<mn>2</mn>
</msup>
</mrow>
</mfrac>
<mo>)</mo>
</mrow>
</msqrt>
</mrow>
</math>
wherein t_(n-2, 1-α/2) can be obtained by looking up a table of the t statistic, (n-2) is the number of degrees of freedom, and α is the significance level; in this example, α = 0.05. The range of the confidence band CB is determined by a first threshold CB_U and a second threshold CB_L, namely:
<math>
<mrow>
<mi>C</mi>
<msub>
<mi>B</mi>
<mi>U</mi>
</msub>
<mo>=</mo>
<msub>
<mover>
<mi>y</mi>
<mo>^</mo>
</mover>
<mi>i</mi>
</msub>
<mo>+</mo>
<msub>
<mi>t</mi>
<mrow>
<mo>(</mo>
<mi>n</mi>
<mo>-</mo>
<mn>2,1</mn>
<mo>-</mo>
<mi>α</mi>
<mo>/</mo>
<mn>2</mn>
<mo>)</mo>
</mrow>
</msub>
<msqrt>
<mi>MSE</mi>
<mrow>
<mo>(</mo>
<mfrac>
<mn>1</mn>
<mi>n</mi>
</mfrac>
<mo>+</mo>
<mfrac>
<msup>
<mrow>
<mo>(</mo>
<msub>
<mi>x</mi>
<mi>i</mi>
</msub>
<mo>-</mo>
<msub>
<mi>T</mi>
<mn>50</mn>
</msub>
<mo>)</mo>
</mrow>
<mn>2</mn>
</msup>
<mrow>
<munderover>
<mi>Σ</mi>
<mrow>
<mi>i</mi>
<mo>=</mo>
<mn>1</mn>
</mrow>
<mi>n</mi>
</munderover>
<msup>
<mrow>
<mo>(</mo>
<msub>
<mi>x</mi>
<mi>i</mi>
</msub>
<mo>-</mo>
<msub>
<mi>T</mi>
<mn>50</mn>
</msub>
<mo>)</mo>
</mrow>
<mn>2</mn>
</msup>
</mrow>
</mfrac>
<mo>)</mo>
</mrow>
</msqrt>
</mrow>
</math>
<math>
<mrow>
<mi>C</mi>
<msub>
<mi>B</mi>
<mi>L</mi>
</msub>
<mo>=</mo>
<msub>
<mover>
<mi>y</mi>
<mo>^</mo>
</mover>
<mi>i</mi>
</msub>
<mo>-</mo>
<msub>
<mi>t</mi>
<mrow>
<mo>(</mo>
<mi>n</mi>
<mo>-</mo>
<mn>2,1</mn>
<mo>-</mo>
<mi>α</mi>
<mo>/</mo>
<mn>2</mn>
<mo>)</mo>
</mrow>
</msub>
<msqrt>
<mi>MSE</mi>
<mrow>
<mo>(</mo>
<mfrac>
<mn>1</mn>
<mi>n</mi>
</mfrac>
<mo>+</mo>
<mfrac>
<msup>
<mrow>
<mo>(</mo>
<msub>
<mi>x</mi>
<mi>i</mi>
</msub>
<mo>-</mo>
<msub>
<mi>T</mi>
<mn>50</mn>
</msub>
<mo>)</mo>
</mrow>
<mn>2</mn>
</msup>
<mrow>
<munderover>
<mi>Σ</mi>
<mrow>
<mi>i</mi>
<mo>=</mo>
<mn>1</mn>
</mrow>
<mi>n</mi>
</munderover>
<msup>
<mrow>
<mo>(</mo>
<msub>
<mi>x</mi>
<mi>i</mi>
</msub>
<mo>-</mo>
<msub>
<mi>T</mi>
<mn>50</mn>
</msub>
<mo>)</mo>
</mrow>
<mn>2</mn>
</msup>
</mrow>
</mfrac>
<mo>)</mo>
</mrow>
</msqrt>
</mrow>
</math>
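The upper and lower thresholds of the confidence band can be sketched as follows, with the upper threshold taken as the estimate plus the half-width and the lower threshold as the estimate minus it. The t value is passed in as a parameter because the text obtains it from a table; the parameter `x_center` stands for the T_50 term of the formula on whatever abscissa scale is used, and all names and sample values are hypothetical.

```python
import math


def confidence_band(x, points, beta0, beta1, x_center, t_crit):
    """Return (CB_L, CB_U) of the line y = beta0 + beta1 * x at abscissa x.

    points   -- actual values (x_i, y_i) of one group of detection data
    x_center -- the T_50 term of the formula (assumed to be on the x scale)
    t_crit   -- t value for (n - 2) degrees of freedom, taken from a table
    """
    n = len(points)
    sse = sum((yi - (beta0 + beta1 * xi)) ** 2 for xi, yi in points)
    mse = sse / (n - 2)
    sxx = sum((xi - x_center) ** 2 for xi, _ in points)
    half_width = t_crit * math.sqrt(mse * (1.0 / n + (x - x_center) ** 2 / sxx))
    y_hat = beta0 + beta1 * x
    return y_hat - half_width, y_hat + half_width


# Hypothetical data: n = 4 points, so 2 degrees of freedom; for alpha = 0.05
# the tabulated two-sided t value is about 4.303.
points = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.1), (3.0, 2.9)]
lower_mid, upper_mid = confidence_band(1.5, points, 0.0, 1.0, x_center=1.5, t_crit=4.303)
lower_end, upper_end = confidence_band(3.0, points, 0.0, 1.0, x_center=1.5, t_crit=4.303)
```

Because of the (x - x_center)² term, the band is narrowest at the center and widens toward both ends of the line, which is why the band edges appear as curves rather than straight lines.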
In step S15, if the linear regression model of each group of detection data is within the confidence band corresponding to the linear regression model of the other group of detection data, the two groups of detection data are determined to be similar. A similarity rule is set according to the confidence bands set in step S14 in order to judge whether the two groups of detection data are similar.
Specifically, referring to Fig. 4, the straight line l_1 is the linear regression model of the first group of detection data, and the range bounded by the curves CB_1U and CB_1L is the confidence band CB(l_1) corresponding to the linear regression model of the first group of detection data (i.e., line l_1); the straight line l_2 is the linear regression model of the second group of detection data, and the range bounded by the curves CB_2U and CB_2L is the confidence band CB(l_2) corresponding to the linear regression model of the second group of detection data (i.e., line l_2). In Fig. 4, line l_1 lies entirely within the confidence band CB(l_2) corresponding to line l_2, and line l_2 lies entirely within the confidence band CB(l_1) corresponding to line l_1, that is, l_1 ∈ CB(l_2) and l_2 ∈ CB(l_1). It can therefore be determined that the two groups of detection data are similar, that is, the service lives of the devices selected from the two batches of semiconductor devices are similar, from which it can be inferred that the two batches of semiconductor devices have similar service lives.
A confidence band is set for the linear regression model of the service lives of each of the two groups of devices, and each confidence band takes into account not only the mean service life but also the overall distribution of the service lives. A strict similarity rule is set on the basis of the established linear regression models and the set confidence bands: the linear regression model of the service lives measured for each group of devices is required to lie within the confidence band corresponding to the linear regression model of the service lives measured for the other group of devices. As a result, the conclusion that the two batches of devices have similar service lives, inferred from the similarity of the service lives of the selected devices, has a higher confidence level, that is, the accuracy of the similarity detection is improved.
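The mutual-containment rule l_1 ∈ CB(l_2) and l_2 ∈ CB(l_1) can be sketched as below. The check is performed on a finite grid of abscissa values covering the range of interest, which is an assumption of this sketch; the class, its field names, and the numeric values are hypothetical and only illustrate the rule.

```python
import math
from dataclasses import dataclass


@dataclass
class FittedModel:
    """Line y = beta0 + beta1 * x plus the quantities that define its band."""
    beta0: float
    beta1: float
    mse: float       # mean square error of the fit
    n: int           # number of detection data in the group
    x_center: float  # the T_50 term of the band formula, on the x scale
    sxx: float       # sum over i of (x_i - x_center)^2
    t_crit: float    # tabulated t value for (n - 2) degrees of freedom

    def predict(self, x):
        return self.beta0 + self.beta1 * x

    def band(self, x):
        half = self.t_crit * math.sqrt(
            self.mse * (1.0 / self.n + (x - self.x_center) ** 2 / self.sxx)
        )
        y_hat = self.predict(x)
        return y_hat - half, y_hat + half


def is_similar(a, b, xs):
    """True if each line lies inside the other's band at every x in xs."""
    for x in xs:
        lower_a, upper_a = a.band(x)
        lower_b, upper_b = b.band(x)
        if not (lower_b <= a.predict(x) <= upper_b):
            return False
        if not (lower_a <= b.predict(x) <= upper_a):
            return False
    return True


# Hypothetical fitted models for groups of 20 devices (18 degrees of freedom,
# tabulated t value about 2.101 for alpha = 0.05).
first = FittedModel(-10.0, 2.0, 0.02, 20, 5.0, 8.0, 2.101)
second = FittedModel(-10.05, 2.01, 0.03, 20, 5.02, 7.5, 2.101)
shifted = FittedModel(-11.0, 2.0, 0.02, 20, 5.0, 8.0, 2.101)
grid = [4.0 + 0.1 * k for k in range(21)]

close_pair = is_similar(first, second, grid)
far_pair = is_similar(first, shifted, grid)
```

Requiring containment in both directions is what makes the rule strict: a group with large scatter has a wide band that could contain the other line, but its own line must still fall inside the other group's narrower band.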
It should be noted that the above embodiment is described with the service life of a semiconductor device as the detection data. In practice, the detection data may also be other measured product properties; in a semiconductor production process, for example, the detection data may be the growth thickness of a thin film, the etching depth, the photolithography overlay precision, and the like. By analyzing the similarity of such detection data, it is possible to determine whether each process and the manufactured devices are performing normally, so that problems occurring in the process can be found as early as possible.
In addition, although the number of detection data in each group is less than 30 in the above embodiment, the method is equally applicable to similarity detection in which the number of detection data in each group is greater than 30.
Correspondingly, the similarity detection device of the embodiment of the invention is used for detecting the similarity of the properties of a product and, as shown in Fig. 5, comprises: an acquisition unit 51, an establishing unit 52, a calculation unit 53, a setting unit 54, and a determining unit 55.
The acquisition unit 51 is used for acquiring two groups of detection data, which conform to a lognormal distribution and are used to characterize a property of the product, such as the service life of a semiconductor device.
The establishing unit 52 is used for establishing, based on each group of detection data acquired by the acquisition unit 51, a linear regression model corresponding to the lognormal distribution function of each group of detection data. The linear regression model of the detection data is expressed as Y = β_0 + β_1 X, wherein β_1 = 1/σ and β_0 = 50% - T_50/σ.
The calculation unit 53 is used for calculating, based on the linear regression model of each group of detection data established by the establishing unit 52, the sum of squared errors and the mean square error corresponding to the linear regression model of each group of detection data. The sum of squared errors corresponding to the linear regression model of the detection data is expressed as: <math>
<mrow>
<mi>SSE</mi>
<mo>=</mo>
<munderover>
<mi>Σ</mi>
<mrow>
<mi>i</mi>
<mo>=</mo>
<mn>1</mn>
</mrow>
<mi>n</mi>
</munderover>
<msup>
<mrow>
<mo>(</mo>
<msub>
<mi>y</mi>
<mi>i</mi>
</msub>
<mo>-</mo>
<msub>
<mover>
<mi>y</mi>
<mo>^</mo>
</mover>
<mi>i</mi>
</msub>
<mo>)</mo>
</mrow>
<mn>2</mn>
</msup>
<mo>,</mo>
</mrow>
</math> and the mean square error is expressed as MSE = SSE/(n - 2).
The setting unit 54 is used for setting a confidence band corresponding to the linear regression model of each group of detection data, based on the linear regression model of each group of detection data established by the establishing unit 52 and the mean square error calculated by the calculation unit 53, wherein the confidence band is associated with the calculated mean square error and is set according to the following formula:
<math>
<mrow>
<mi>CB</mi>
<mo>=</mo>
<msub>
<mover>
<mi>y</mi>
<mo>^</mo>
</mover>
<mi>i</mi>
</msub>
<mo>±</mo>
<msub>
<mi>t</mi>
<mrow>
<mo>(</mo>
<mi>n</mi>
<mo>-</mo>
<mn>2,1</mn>
<mo>-</mo>
<mi>α</mi>
<mo>/</mo>
<mn>2</mn>
<mo>)</mo>
</mrow>
</msub>
<msqrt>
<mi>MSE</mi>
<mrow>
<mo>(</mo>
<mfrac>
<mn>1</mn>
<mi>n</mi>
</mfrac>
<mo>+</mo>
<mfrac>
<msup>
<mrow>
<mo>(</mo>
<msub>
<mi>x</mi>
<mi>i</mi>
</msub>
<mo>-</mo>
<msub>
<mi>T</mi>
<mn>50</mn>
</msub>
<mo>)</mo>
</mrow>
<mn>2</mn>
</msup>
<mrow>
<munderover>
<mi>Σ</mi>
<mrow>
<mi>i</mi>
<mo>=</mo>
<mn>1</mn>
</mrow>
<mi>n</mi>
</munderover>
<msup>
<mrow>
<mo>(</mo>
<msub>
<mi>x</mi>
<mi>i</mi>
</msub>
<mo>-</mo>
<msub>
<mi>T</mi>
<mn>50</mn>
</msub>
<mo>)</mo>
</mrow>
<mn>2</mn>
</msup>
</mrow>
</mfrac>
<mo>)</mo>
</mrow>
</msqrt>
<mo>.</mo>
</mrow>
</math>
The determining unit 55 is used for determining the similarity of the two groups of detection data by applying a similarity rule to the linear regression model of each group of detection data established by the establishing unit 52 and the confidence band set by the setting unit 54 for that linear regression model; that is, if the linear regression model of each group of detection data is within the confidence band corresponding to the linear regression model of the other group of detection data, the two groups of detection data are determined to be similar, and the properties of the two groups of products are thereby determined to be similar.
In summary, in the similarity detection method and apparatus, the linear regression models of the two sets of detection data are respectively provided with corresponding confidence bands, the set confidence bands take into account both the mean value of the data and the overall distribution condition of the data, and a strict similarity rule is set according to the established linear regression model and the set confidence bands, that is, the linear regression models of each set of detection data are required to be respectively in the confidence bands corresponding to the linear regression models of the other set of detection data, so that the similarity of the detection data is inferred to have higher reliability according to the similarity rule, that is, the accuracy of the similarity detection is improved.
In the production process, by applying the similarity detection method and the similarity detection device, the number of each group of detection data can be less than 30, so that the cost and time consumed for obtaining the detection data can be saved, the test cost in the production process is further reduced, and the test period in the production process is shortened.
Although the present invention has been described with reference to the preferred embodiments, it is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.