CN117313937A - Method and device for predicting wafer yield

Info

Publication number: CN117313937A
Application number: CN202311288704.9A
Authority: CN
Inventors: 易丛文; 薛司悦; 夏敏; 管健
Original assignee: Shenzhen Zhixian Future Industrial Software Co ltd
Current assignee: Shenzhen Zhixian Future Industrial Software Co ltd
Priority date: 2023-10-07
Filing date: 2023-10-07
Publication date: 2023-12-29

Abstract

The embodiment of the specification provides a method and a device for predicting wafer yield. One embodiment of the method comprises the following steps: acquiring a plurality of detection data generated by each die on a target wafer in different production stages, wherein each detection data comprises a detection stage, a detection type and a detection result; for each die on the target wafer, inputting a detection result of the die into a target model according to detection stages and detection types of a plurality of detection data of the die to obtain failure probability of the die, wherein the target model is a linear regression model or a naive Bayesian model, and a plurality of inputs of the target model are respectively set to receive detection results of different detection types of different detection stages of the die; determining whether the corresponding die is a failed die according to the comparison of the failure probability of each die on the target wafer and a target threshold value; and determining the yield of the target wafer according to the number of the failed dies on the target wafer.

Description

Method and device for predicting wafer yield

Technical Field

Embodiments of the present disclosure relate to the field of semiconductor integrated circuit manufacturing, and more particularly, to a method and apparatus for predicting wafer yield.

Background

In the manufacture of semiconductor integrated circuits, wafer yield is a very important index. It is defined as: yield = number of available dies on the wafer / total number of dies on the wafer. In actual production, a wafer goes through hundreds of precise processing steps. If some of these steps are performed improperly, subsequent steps and product quality may be affected, resulting in poor wafer yield or even scrapping of the wafer, so that all the preceding work is wasted. Yield has a critical effect on cost budget and output value, so it is necessary to analyze and predict wafer yield.

Disclosure of Invention

The embodiments of the present disclosure describe a method and apparatus for predicting a wafer yield, where the method learns a relationship between detection data of a die and a failure probability of the die through a model, so as to determine the failure probability of the die according to the detection data of the die generated in different production stages, and further determine the wafer yield.

According to a first aspect, there is provided a method of predicting wafer yield, comprising: acquiring a plurality of detection data generated by each die on a target wafer in different production stages, wherein each detection data comprises a detection stage, a detection type and a detection result; for each die on the target wafer, inputting a detection result of the die into a target model according to detection stages and detection types of a plurality of detection data of the die to obtain failure probability of the die, wherein the target model is a linear regression model or a naive Bayesian model, and a plurality of inputs of the target model are respectively set to receive detection results of different detection types of different detection stages of the die; determining whether the corresponding die is a failed die according to the comparison of the failure probability of each die on the target wafer and a target threshold value; and determining the yield of the target wafer according to the number of the failed dies on the target wafer.

According to a second aspect, there is provided an apparatus for predicting wafer yield, comprising: an acquisition unit configured to acquire a plurality of detection data generated by each die on a target wafer in different production stages, wherein each detection data comprises a detection stage, a detection type and a detection result; an input unit configured to, for each die on the target wafer, input a detection result of the die into a target model according to detection stages and detection types of a plurality of detection data of the die to obtain failure probability of the die, wherein the target model is a linear regression model or a naive Bayesian model, and a plurality of inputs of the target model are respectively set to receive detection results of different detection types of different detection stages of the die; and a determining unit configured to determine whether the corresponding die is a failed die according to the comparison of the failure probability of each die on the target wafer and a target threshold value, and to determine the yield of the target wafer according to the number of the failed dies on the target wafer.

According to a third aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform a method as described in any of the implementations of the first aspect.

According to a fourth aspect, there is provided a computing device comprising a memory and a processor, wherein the memory has executable code stored therein, and wherein the processor, when executing the executable code, implements a method as described in any of the implementations of the first aspect.

According to the method and the device for predicting wafer yield provided by the embodiments of this specification, the detection data of each detection type at each detection stage of a die are packed together and input as a whole into a trained target model. The model fits the mapping relationship between the detection data set and the failure probability and directly outputs the final failure probability of the die, which avoids determining the kill rate detection process by detection process as in the conventional technology. In the yield calculation stage, the failure probability of the die is compared with a threshold to determine whether it is a failed die. Dies with a low failure probability are thus passed, and inaccurate calculation of the number of failures caused by the accumulation of very small failure probabilities is avoided. This scheme improves the accuracy of yield prediction.

Drawings

FIG. 1 shows a schematic diagram of one application scenario in which embodiments of the present description may be applied;

FIG. 2 illustrates a flow diagram of a method of predicting wafer yield in accordance with one embodiment;

FIG. 3 illustrates a schematic diagram of one example of determining the number and location of failed dies on a wafer based on the probability of failure of the dies and a threshold value;

FIG. 4 shows a schematic block diagram of an apparatus for predicting wafer yield in accordance with one embodiment.

Detailed Description

The technical scheme provided in the present specification is further described in detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings. It should be noted that, without conflict, the embodiments of the present specification and features in the embodiments may be combined with each other.

As described above, it is necessary to analyze and predict the yield of wafers. A wafer carries many dies, and the yield of the wafer depends closely on the failure conditions of those dies. In the conventional algorithm in the industry, first, for each inspection process of each inspection type that the die undergoes at different stages, the probability that the inspected anomaly causes the die to fail, that is, the kill rate, is calculated, and these probabilities are assumed to be independent of one another. Then, the probability (Edi) of the final failure of the die is calculated according to the types and the number of detection anomalies of the single die in each detection process. Specifically, the die final failure probability can be expressed as:

Edi = 1 - (1 - p1)(1 - p2)…(1 - pn)    (1)

where p1 to pn are the kill rates of the die in the n detection processes, respectively. A die ends up not failing only if it "survives" every detection process i, and the probability of surviving process i is 1-pi. Thus, the final failure probability is 1 minus the product of the "survival" probabilities of the individual detection processes.
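Formula (1) can be sketched in a few lines of Python. This is a minimal illustration only; the function name and the three kill-rate values are hypothetical and do not come from the specification.

```python
from math import prod


def die_failure_probability(kill_rates):
    """Formula (1): Edi = 1 - (1 - p1)(1 - p2)...(1 - pn).

    Assumes the kill rates of the n detection processes are independent.
    """
    return 1.0 - prod(1.0 - p for p in kill_rates)


# Hypothetical kill rates for three detection processes of one die.
edi = die_failure_probability([0.1, 0.2, 0.05])
print(round(edi, 3))
```

With these assumed values the survival product is 0.9 × 0.8 × 0.95 = 0.684, so Edi is 0.316.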

On the basis of obtaining the failure probability Edi of a single die according to the formula (1), according to the conventional scheme, the failure probabilities Edi of all the dies on one wafer are added, and the obtained value is used as the number of dies that may fail for the whole wafer. Further, a wafer yield is determined based on the number of failed dies.

However, the above conventional scheme has some disadvantages. First, in the die failure probability prediction stage, the kill rate of each of the various tests has to be derived separately. Because the detection principles and detection objects differ, the kill rate algorithms differ as well, and the evaluation process is complex. Second, in the yield calculation stage, simply adding the failure probabilities Edi of the individual dies is relatively coarse, resulting in a large deviation between the predicted yield and the actual yield. For example, when the number of dies is large, even extremely small Edi values can accumulate into a counted failure. Suppose a wafer includes 100 dies, each with a failure probability of 0.01. In theory, a die with a failure probability of 0.01 is unlikely to be a failed die and should not influence the yield calculation. However, in the manner described above, summing 100 values of 0.01 gives 1, indicating one failed die, which obviously affects the yield calculation.
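The accumulation effect in the 100-die example can be reproduced with a short sketch of the conventional summation. The function name is hypothetical; `math.fsum` is used only to keep floating-point rounding out of the illustration.

```python
import math


def expected_failed_dies(failure_probabilities):
    """Conventional scheme: add up the per-die failure probabilities Edi."""
    return math.fsum(failure_probabilities)


# 100 dies, each with a failure probability of 0.01 (the example in the text).
probabilities = [0.01] * 100
failed = expected_failed_dies(probabilities)
conventional_yield = (len(probabilities) - failed) / len(probabilities)
print(failed, conventional_yield)
```

Although no single die is likely to fail, the sum counts one failed die and the conventional yield drops to 99%.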

In view of this, a new yield prediction scheme is proposed in the present invention. In the die failure probability prediction stage, the detection data of each detection type at each detection stage of a die are packed together and input as a whole into a trained target model, and the model fits the mapping relationship between the detection data and the failure probability so as to directly output the final failure probability of the die (instead of determining the kill rate detection process by detection process). In the yield calculation stage, the failure probability of the die is compared with a threshold to determine whether it is a failed die. Dies with a low failure probability are thus passed, and inaccurate calculation of the number of failures caused by the accumulation of very small Edi values is avoided. This yield prediction scheme can improve the accuracy of yield prediction.

Fig. 1 shows a schematic diagram of one application scenario in which embodiments of the present description may be applied. The application scenario shown in fig. 1 is explained by taking a linear regression model as an example. In the process of producing wafer A, the dies on wafer A can be inspected at different production stages, so that detection data of each die at different production stages can be obtained, where the detection data can include a detection stage, a detection type and a detection result. For each die on wafer A, a plurality of detection results of the die may be input into the linear regression model 101 according to the detection stages and detection types of the plurality of detection data of the die, and the failure probability P of the die is output by the linear regression model 101. Here, the linear regression model 101 is used to predict the failure probability of the die based on the input detection results of the die, and the plurality of inputs of the linear regression model 101 are respectively set to receive the detection results of different detection types at different detection stages of the die. In this example, the formula of the linear regression model 101 may omit the bias term; for example, the formula may be f(x) = k1x1 + k2x2 + k3x3 + … + knxn, where n represents the number of detection results that the model can take as input, and k1, k2, …, kn represent the weights. After the failure probability of each die on wafer A is obtained, the number of failed dies can be obtained by comparing the failure probability of each die on wafer A with the threshold, and the yield of wafer A is then calculated.

With continued reference to fig. 2, fig. 2 shows a flow diagram of a method of predicting wafer yield in accordance with one embodiment. It is understood that the method may be performed by any apparatus, device, platform, cluster of devices having computing, processing capabilities. As shown in fig. 2, the method for predicting wafer yield may include the following steps:

Step 201, acquiring a plurality of inspection data generated at different production stages for each die on a target wafer.

In general, the production flow of a semiconductor integrated circuit is relatively long, and some inspection steps may be inserted in some production stages during the long production flow. The wafer is the carrier for designing the integrated circuit, and the analog circuit or the digital circuit designed by people are finally realized on the wafer. Each small cell on each wafer is a complete chip circuit unit, such as a CPU or a memory, known as a Die (Die). Inspection of the wafer may include inspection of die (die) on the wafer.

In this embodiment, a plurality of inspection data generated at different production stages for each die on the target wafer may be acquired. Each detection data may include a detection stage, a detection type, and a detection result. Here, the detection stage refers to the stage at which the inspection takes place. For example, assuming that M inspection steps are included in the entire production flow, each inspection step may be counted as one stage, so that the entire production flow includes M detection stages. The detection type refers to the type of inspection, such as defect inspection or electrical inspection. The detection result refers to the result of the inspection. Taking defect detection as an example of the detection type, the detection result may include a defect type, such as scratch, and the number of scratches. Taking resistance measurement as an example of the detection type, the detection result may include a resistance value.
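One possible in-memory representation of such detection data is sketched below. The class, field names, die identifiers and values are all hypothetical, and the sketch assumes each detection result has already been reduced to a single number (for example a scratch count or a resistance value).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionData:
    die_id: str          # hypothetical identifier of the die on the wafer
    stage: str           # detection stage, e.g. "S1"
    detection_type: str  # detection type, e.g. "defect" or "etest"
    result: float        # detection result reduced to one number (assumption)


def group_by_die(records):
    """Collect each die's results, keyed by (stage, detection type)."""
    grouped = defaultdict(dict)
    for record in records:
        grouped[record.die_id][(record.stage, record.detection_type)] = record.result
    return dict(grouped)


records = [
    DetectionData("die-0", "S1", "defect", 2.0),   # two scratches found in S1
    DetectionData("die-0", "S2", "etest", 47.5),   # resistance value in S2
    DetectionData("die-1", "S1", "defect", 0.0),
]
per_die = group_by_die(records)
print(per_die["die-0"])
```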

In some implementations, the plurality of inspection data may include inspection data of Defect (Defect) inspection and inspection data of electrical test (ETest). Here, defect detection may be used to detect defects on the die surface and within the material, such as crystal defects, metal impurities, chemical contaminants, structural defects, and the like. Defect detection is typically performed using techniques such as microscopy, electron microscopy, laser scanning, and the like. Electrical testing is a method used to test the performance of a die by measuring the electrical properties of the die to detect defects. Electrical testing may detect defects associated with electrical properties such as resistance, capacitance, leakage, etc.

Step 202, for each die on the target wafer, inputting the detection result of the die into a target model according to the detection stages and the detection types of the multiple detection data of the die, so as to obtain the failure probability of the die, wherein the target model can be a linear regression model or a naive Bayesian model.

In this embodiment, the linear regression model or the naive Bayes model may include a plurality of input nodes, each of which may be configured to receive the detection result of a particular detection type at a particular detection stage of the die. For example, input node 1 may be set to receive the detection result of detection type T1 in detection stage S1, input node 2 may be set to receive the detection result of detection type T2 in detection stage S1, input node 3 may be set to receive the detection result of detection type T1 in detection stage S2, and so on. The linear regression model or the naive Bayes model may be used to predict the failure probability of the die according to the input detection results of the die. In practice, according to the setting of the plurality of input nodes, the detection results of the plurality of detection data of each die can be input into the linear regression model or the naive Bayes model to obtain the failure probability of the die. As an example, assuming that the wafer production process includes 10 inspection stages, the wafer may be inspected in all 10 inspection stages or only in some of them, as set according to actual needs. In a scenario where the wafer is inspected only in some of the inspection stages, the detection data for the remaining stages is empty; when input into the linear regression model or the naive Bayes model, the detection results corresponding to such empty detection data may be set to empty or to a default value.
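The mapping from (detection stage, detection type) to input nodes, including the handling of empty detection data, might look like the following sketch. The layout mirrors the S1/T1, S1/T2, S2/T1 example above; the default value of 0.0 and all names are assumptions made for illustration.

```python
# Fixed order of input nodes: node 1 = (S1, T1), node 2 = (S1, T2), node 3 = (S2, T1).
INPUT_LAYOUT = [("S1", "T1"), ("S1", "T2"), ("S2", "T1")]

# Value used when a die has no detection data for a node (an assumed default).
DEFAULT_VALUE = 0.0


def build_inputs(results, layout=INPUT_LAYOUT, default=DEFAULT_VALUE):
    """Place each detection result on its input node; fill gaps with the default."""
    return [results.get(key, default) for key in layout]


# Hypothetical die that was not inspected for type T2 in stage S1.
die_results = {("S1", "T1"): 3.0, ("S2", "T1"): 1.5}
x = build_inputs(die_results)
print(x)  # [3.0, 0.0, 1.5]
```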

As an example, when the target model employs a linear regression model, the linear regression model may include no bias term, but only weight terms. For example, taking n (n is a positive integer) as an example of the number of detection results that can be input by the model, the formula corresponding to the linear regression model may be:

f(x) = k1x1 + k2x2 + k3x3 + … + knxn,

where k1, k2, …, kn represent the weights, x1, x2, x3, …, xn represent the detection results of each detection type at each detection stage, and f(x) represents the failure probability. The weights k1, k2, …, kn are the trainable parameters of the linear regression model.
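The bias-free linear model can be written directly as a weighted sum. The weights and inputs below are hypothetical values chosen for illustration, not parameters from the specification.

```python
def predict_failure_probability(weights, inputs):
    """f(x) = k1*x1 + k2*x2 + ... + kn*xn, with no bias term."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    return sum(k * x for k, x in zip(weights, inputs))


weights = [0.1, 0.2, 0.05]   # hypothetical trained weights k1..k3
inputs = [1.0, 2.0, 0.0]     # hypothetical detection results x1..x3
p = predict_failure_probability(weights, inputs)
print(p)
```

Note that a plain weighted sum is not guaranteed to lie between 0 and 1. The specification does not say how out-of-range outputs are handled, so the sketch leaves the value unclipped.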

In practice, the execution subject used to train the above linear regression model or naive bayes model may be any apparatus, device, platform, cluster of devices, etc. with computing, processing capabilities. It will be appreciated that the execution bodies used to train the linear regression model or the naive bayes model described above may be the same as or different from the execution bodies used to execute the method shown in fig. 2. As an example, the above linear regression model or naive bayes model (hereinafter, model) may be trained by:

S1, acquiring a first training sample set.

In this example, each first training sample in the first training sample set may include a plurality of sample detection data corresponding to each die on the first sample wafer and a true failure condition of each die. Here, the sample inspection data may be inspection data generated at different stages of production for each die on the first sample wafer. The sample detection data may include a sample detection stage, a sample detection type, and a sample detection result.

S2, for each die on the first sample wafer, obtaining the predicted failure probability of the die according to the sample detection data and the model of the die. Specifically, according to the sample detection stage and sample detection type of the sample detection data of the die, the sample detection result is input into the model, and the model outputs the prediction failure probability.

S3, determining a first difference loss between the predicted failure probability and the true failure condition of each die on the first sample wafer. For example, a loss function may be set according to actual needs, and the difference loss may be calculated based on the loss function.

S4, adjusting the model parameters of the model with minimization of the determined first difference loss as the goal. For example, the BP (back propagation) algorithm, SGD (stochastic gradient descent), or the like may be employed.
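Steps S1 to S4 can be sketched for the bias-free linear model as follows. The specification leaves the loss function open, so the squared error used here is an assumption, as are the learning rate, epoch count and the four training samples (labels 1.0 for a failed die and 0.0 for a good die).

```python
import random


def predict(weights, inputs):
    """Bias-free linear model f(x) = k1*x1 + ... + kn*xn."""
    return sum(k * x for k, x in zip(weights, inputs))


def train(samples, n_inputs, learning_rate=0.05, epochs=500, seed=0):
    """Fit the weights by stochastic gradient descent on a squared-error loss."""
    rng = random.Random(seed)
    weights = [0.0] * n_inputs
    samples = list(samples)
    for _ in range(epochs):
        rng.shuffle(samples)
        for inputs, label in samples:
            # S2: predicted failure probability; S3: difference from the true condition.
            error = predict(weights, inputs) - label
            # S4: gradient of error**2 with respect to k_i is 2 * error * x_i.
            for i, x in enumerate(inputs):
                weights[i] -= learning_rate * 2.0 * error * x
    return weights


# S1: hypothetical training samples (detection results, true failure condition).
samples = [
    ([1.0, 0.0], 1.0),
    ([0.0, 1.0], 0.0),
    ([1.0, 1.0], 1.0),
    ([0.0, 0.0], 0.0),
]
weights = train(samples, n_inputs=2)
print([round(k, 3) for k in weights])
```

In this toy data set only the first input determines failure, so the learned weights approach 1 and 0.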

The processing and training of the model is described above. The model thus obtained can process a plurality of pieces of detection data of the die input, and output the failure probability of the die. In this way, by adopting the model, the relation between the detection data set of the die in each type detection at each stage and the failure probability of the die can be learned, and the kill rate of the die is not required to be determined from detection stage to detection stage and from detection type to detection type.

Step 203, determining whether the corresponding die is a failed die according to the comparison of the failure probability of each die on the target wafer and the target threshold; and determining the yield of the target wafer according to the number of the failed dies on the target wafer.

In this embodiment, the number of failed dies on the target wafer may be determined first according to the comparison between the failure probability of each die on the target wafer and the target threshold T, where the failed dies may refer to dies that are not available. Specifically, if the failure probability of a certain die is greater than the target threshold T, the die is determined to be a failed die. Conversely, if the failure probability of the die is equal to or less than the target threshold T, the die is determined to be a normal die or an available die that is not failed.

As an example, the above-described target threshold T may be determined in various ways, and may be, for example, a value manually set according to actual experience. More preferably, the threshold T may be a value learned using training data.

By judging one by one whether each die on the target wafer is a failed die, the number of failed dies on the wafer is obtained. The yield of the target wafer is then determined according to the number of failed dies on the target wafer and the total number of dies on the target wafer. Specifically, the yield of the target wafer can be calculated using the formula: yield = number of available dies on the wafer / total number of dies on the wafer.
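Step 203 reduces to a comparison and a ratio, as the following sketch shows. The function names, the threshold of 0.3 and the probability lists are illustrative assumptions.

```python
def count_failed_dies(failure_probabilities, threshold):
    """A die fails only if its failure probability is strictly greater than T."""
    return sum(1 for p in failure_probabilities if p > threshold)


def wafer_yield(failure_probabilities, threshold):
    """yield = number of available dies / total number of dies."""
    total = len(failure_probabilities)
    if total == 0:
        raise ValueError("the wafer must contain at least one die")
    failed = count_failed_dies(failure_probabilities, threshold)
    return (total - failed) / total


# A probability equal to the threshold does not count as a failure.
print(wafer_yield([0.1, 0.5, 0.3, 0.8], threshold=0.3))   # 0.5

# The 100-die case from the background discussion: no die exceeds T.
print(wafer_yield([0.01] * 100, threshold=0.3))           # 1.0
```

Unlike the summation in the conventional scheme, the 100 dies with a failure probability of 0.01 contribute no failed die here.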

According to this implementation, only dies whose failure probability is greater than the threshold T are determined to be failed dies, so that dies whose failure probability is less than or equal to the threshold T no longer influence the yield calculation, and the yield calculation is more accurate.

In some implementations, the target threshold T may be trained by:

1) A second training sample set is obtained.

In this example, each second training sample in the second training sample set may include a failure probability of each die on the second sample wafer and a sample yield of the second sample wafer.

2) And determining the predicted yield of the second sample wafer according to the failure probability of each die on the second sample wafer and the threshold to be trained.

For example, first, it may be determined whether each die fails according to the threshold to be trained and the failure probability of each die on the second sample wafer. Specifically, if the failure probability is greater than the threshold to be trained, the die fails; if the failure probability is less than or equal to the threshold to be trained, the die does not fail. The number of failed dies on the second sample wafer is then counted. Finally, the yield of the second sample wafer is calculated from this count, and this yield is the predicted yield.

3) A second differential loss between the predicted yield and the sample yield of the second sample wafer is determined. For example, a loss function may be set according to actual needs, and the difference loss may be calculated based on the loss function.

4) The threshold to be trained is adjusted with the determined second difference loss minimized as a goal. After the adjustment is completed, the target threshold T is obtained.
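Steps 1) to 4) can be sketched as a search over candidate thresholds. The specification does not state how the threshold is adjusted; because the failed-die count is a step function of the threshold, this sketch substitutes a simple grid search with a squared-error loss. The candidate grid, the loss and the two sample wafers are all assumptions.

```python
def predicted_yield(failure_probabilities, threshold):
    """Step 2): count dies with probability above the threshold as failed."""
    failed = sum(1 for p in failure_probabilities if p > threshold)
    return (len(failure_probabilities) - failed) / len(failure_probabilities)


def fit_threshold(samples, candidates):
    """Steps 3) and 4): pick the candidate with the smallest total squared error."""
    def loss(threshold):
        return sum(
            (predicted_yield(probabilities, threshold) - sample_yield) ** 2
            for probabilities, sample_yield in samples
        )
    return min(candidates, key=loss)


# Step 1): hypothetical second training samples (die failure probabilities, sample yield).
samples = [
    ([0.1, 0.2, 0.6, 0.9], 0.5),
    ([0.05, 0.4, 0.45, 0.7], 0.75),
]
candidates = [i / 20 for i in range(1, 20)]   # 0.05, 0.10, ..., 0.95
threshold = fit_threshold(samples, candidates)
print(threshold)
```

For these two wafers any threshold from 0.45 up to, but not including, 0.6 reproduces both sample yields, and the search returns a value in that range.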

The learning process of the target threshold is described above. The target threshold T thus obtained may be used to determine whether the die fails according to the probability of failure of the die. Compared with manual setting, the threshold value obtained through training of a large number of samples is more accurate.

In some implementations, the method for predicting wafer yield may further include the following steps, not shown in fig. 2: in response to determining that the die fails, information of the failed die is output. Here, the information of the failed die may include location information of the failed die.

In practice, the dies on the wafer are arranged in a matrix, and each die has a unique position on the wafer. Thus, after a die is determined to have failed, the location of the failed die may be further determined. Fig. 3 illustrates a schematic diagram of one example of determining the number and location of failed dies on a wafer based on the failure probability of the dies and a threshold value. In the example shown in fig. 3, it is assumed that the threshold T = 0.3. For ease of understanding, the failure probability of each die is labeled on that die in this example. The failure probability of each die on the wafer is compared with the threshold T = 0.3: if it is greater than 0.3, the die is a failed die; if it is less than or equal to 0.3, the die does not fail. In this example, a failed die is represented by the symbol "x". The 4 dies with failure probabilities of 0.5, 0.6, 0.7, and 0.8 are judged to be failed dies. The 3 dies with a failure probability of 0.1 are not judged to be failed dies. The results shown in fig. 3 make clear which dies on the wafer are failed dies. It will be appreciated that the number of dies on the wafer, the failure probabilities, etc. shown in fig. 3 are merely illustrative and not limiting.
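The fig. 3 example can be sketched as a small grid of failure probabilities. The seven probabilities and T = 0.3 come from the text; the arrangement of the dies in the grid, the use of `None` for positions without a die, and the "o" symbol for good dies are assumptions, since the figure itself is not reproduced here.

```python
THRESHOLD = 0.3

# Hypothetical layout: None marks a grid position that holds no die.
wafer = [
    [None, 0.1, 0.5],
    [0.1, 0.6, 0.7],
    [None, 0.8, 0.1],
]


def failed_positions(wafer, threshold):
    """Return the (row, column) position of every die whose probability exceeds T."""
    return [
        (r, c)
        for r, row in enumerate(wafer)
        for c, p in enumerate(row)
        if p is not None and p > threshold
    ]


def render(wafer, threshold):
    """Draw the wafer map: 'x' for a failed die, 'o' for a good die."""
    def symbol(p):
        if p is None:
            return " "
        return "x" if p > threshold else "o"
    return "\n".join("".join(symbol(p) for p in row) for row in wafer)


positions = failed_positions(wafer, THRESHOLD)
print(positions)
print(render(wafer, THRESHOLD))
```

The list of positions is the location information that could be output for the failed dies.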

Referring back to the above procedure, in the embodiment of the present specification, first, a plurality of inspection data generated at different production stages for each die on the target wafer is acquired, each inspection data including an inspection stage, an inspection type, and an inspection result. Then, for each die on the target wafer, according to the detection stages and the detection types of the plurality of detection data of the die, the detection result of the die is input into a linear regression model or a naive Bayesian model to obtain the failure probability of the die. And finally, determining whether the dies fail or not according to comparison of the failure probability of each die on the target wafer and the threshold value, and further determining the yield of the target wafer. Therefore, the relation between the detection data of the die and the failure probability of the die is learned through the model, so that the failure probability of the die is determined according to the detection data of the die generated in different production stages, and the yield of the wafer is further determined.

According to another aspect, an apparatus for predicting wafer yield is provided. The apparatus for predicting wafer yield may be deployed in any device, platform, or cluster of devices having computing and processing capabilities.

Fig. 4 shows a schematic block diagram of an apparatus for predicting wafer yield in accordance with one embodiment. As shown in fig. 4, the apparatus 400 for predicting wafer yield includes: an obtaining unit 401, configured to obtain a plurality of detection data generated by each die on the target wafer in different production stages, where the detection data includes a detection stage, a detection type, and a detection result; an input unit 402 configured to, for each die on the target wafer, input a detection result of the die into a linear regression model or a naive bayes model according to detection stages and detection types of a plurality of detection data of the die, so as to obtain a failure probability of the die, where a plurality of inputs of the linear regression model or the naive bayes model are respectively set to receive detection results of different detection types of different detection stages of the die; a determining unit 403 configured to determine whether the corresponding die is a failed die according to a comparison between the failure probability of each die on the target wafer and a target threshold; and determining the yield of the target wafer according to the number of the failed dies on the target wafer.

In some optional implementations of this embodiment, the above linear regression model or naive bayes model is trained by: acquiring a first training sample set, wherein the first training sample comprises sample detection data and sample failure probability corresponding to each die on a first sample wafer; for each die on the first sample wafer, obtaining the predicted failure probability of the die according to the sample detection data and the model of the die; determining a first difference loss between the predicted failure probability and the sample failure probability of each die on the sample wafer; the model parameters of the model are adjusted with the determined first difference loss minimization as a goal.

In some optional implementations of this embodiment, the determining unit 403 is further configured to: determine that the die fails in response to determining that the failure probability of the die is greater than the target threshold; if the failure probability of the die is less than or equal to the target threshold, determine the die to be a non-failed, available die; and determine the yield of the target wafer according to the number of failed dies on the target wafer and the total number of dies on the target wafer.

In some alternative implementations of the present embodiment, the threshold is trained by: acquiring a second training sample set, wherein the second training sample comprises the failure probability of each die on a second sample wafer and the sample yield of the second sample wafer; determining the predicted yield of the second sample wafer according to the failure probability of each die on the second sample wafer and the threshold to be trained; determining a second difference loss between the predicted yield and the sample yield on the second sample wafer; and adjusting the threshold to be trained with the aim of minimizing the determined second difference loss.

In some optional implementations of this embodiment, the apparatus 400 further includes: an output unit (not shown in the figure) configured to output information of the failed die in response to determining that the die fails, wherein the information includes location information of the failed die.

In some optional implementations of this embodiment, the plurality of inspection data includes inspection data for defect inspection and inspection data for electrical testing.

According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in fig. 2.

According to an embodiment of yet another aspect, there is also provided a computing device including a memory and a processor, wherein the memory has executable code stored therein, and the processor, when executing the executable code, implements the method described in fig. 2.

Those of ordinary skill in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, in computer software, or in a combination of the two. To clearly illustrate the interchangeability of hardware and software, the components and steps of the examples have been described above generally in terms of their function. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the solution. Those of ordinary skill in the art may implement the described functionality in different ways for each particular application, but such implementations should not be considered to go beyond the scope of the present application.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be implemented in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The foregoing description of the embodiments has been provided to illustrate the general principles of the invention and is not intended to limit the scope of the invention to the particular embodiments described. Any modifications, equivalents, improvements, and the like that fall within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims

1. A method of predicting wafer yield, comprising:

acquiring a plurality of detection data generated by each die on a target wafer in different production stages, wherein each detection data comprises a detection stage, a detection type and a detection result;

for each die on the target wafer, inputting a detection result of the die into a target model according to detection stages and detection types of a plurality of detection data of the die to obtain failure probability of the die, wherein the target model is a linear regression model or a naive Bayesian model, and a plurality of inputs of the target model are respectively set to receive detection results of different detection types of different detection stages of the die;

determining whether the corresponding die is a failed die according to the comparison of the failure probability of each die on the target wafer and a target threshold value; and determining the yield of the target wafer according to the number of the failed dies on the target wafer.

2. The method of claim 1, wherein the target model is trained by:

acquiring a first training sample set, wherein the first training sample comprises sample detection data and real failure conditions corresponding to each die on a first sample wafer;

for each die on the first sample wafer, obtaining the predicted failure probability of the die according to the sample detection data of the die and the target model;

determining a first difference loss between the predicted failure probability and the real failure condition of each die on the first sample wafer;

adjusting model parameters of the target model with the goal of minimizing the determined first difference loss.

3. The method of claim 1, wherein determining whether the corresponding die is a failed die based on a comparison of a failure probability of each die on the target wafer to a target threshold comprises:

if the failure probability of the die is greater than the target threshold, determining the die as a failed die;

if the failure probability of the die is less than or equal to the target threshold, the die is determined to be a non-failed available die.

4. The method of claim 3, wherein the target threshold is trained by:

acquiring a second training sample set, wherein the second training sample comprises the failure probability of each die on a second sample wafer and the sample yield of the second sample wafer;

determining the predicted yield of the second sample wafer according to the failure probability of each die on the second sample wafer and the threshold to be trained;

determining a second difference loss between the predicted yield and the sample yield of the second sample wafer;

adjusting the threshold to be trained with the goal of minimizing the determined second difference loss, to obtain the target threshold.

5. The method of claim 3, wherein the method further comprises:

in response to determining a failed die, outputting information of the failed die, wherein the information includes location information of the failed die.

6. The method of claim 1, wherein the plurality of inspection data includes inspection data for defect inspection and inspection data for electrical testing.

7. An apparatus for predicting wafer yield, comprising:

an acquisition unit configured to acquire a plurality of detection data generated by each die on a target wafer in different production stages, wherein each detection data comprises a detection stage, a detection type and a detection result;

an input unit configured to, for each die on the target wafer, input a detection result of the die into a target model according to detection stages and detection types of a plurality of detection data of the die to obtain failure probability of the die, wherein the target model is a linear regression model or a naive Bayesian model, and a plurality of inputs of the target model are respectively set to receive detection results of different detection types of different detection stages of the die;

a determining unit configured to determine whether the corresponding die is a failed die according to the comparison of the failure probability of each die on the target wafer and a target threshold value, and to determine the yield of the target wafer according to the number of the failed dies on the target wafer.

8. The apparatus of claim 7, wherein the linear regression model or naive Bayesian model is trained by:

for each die on the first sample wafer, obtaining the predicted failure probability of the die according to the sample detection data of the die and the target model;

adjusting model parameters of the target model with the goal of minimizing the determined first difference loss.

9. A computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any of claims 1-6.

10. A computing device comprising a memory and a processor, wherein the memory has executable code stored therein, which when executed by the processor, implements the method of any of claims 1-6.