CN113222031A

CN113222031A - Photolithographic hot zone detection method based on federal personalized learning

Info

Publication number: CN113222031A
Application number: CN202110545686.2A
Authority: CN
Inventors: 卓成; 林学忠; 徐金明; 孟文超; 朱建新; 黄炎; 朱泽晗
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2021-05-19
Filing date: 2021-05-19
Publication date: 2021-08-06
Anticipated expiration: 2041-05-19
Also published as: CN113222031B

Abstract

The invention discloses a lithography hotspot detection method based on federated personalized learning. A central server aggregates the global model parameters returned by each node in order to fuse the features the nodes have in common, updates the global model parameters, and feeds the latest global model parameters back to each node. Each node downloads the global model parameters from the central server and then trains its local model parameters on local data to find the best local model parameters under the current global model parameters, which overcomes the model heterogeneity and data heterogeneity of different nodes. After the local model parameters have been fine-tuned, each node trains all parameters on local data to find the optimum of the current parameters, which serves to identify the features common to different nodes. The method addresses the model overfitting caused by too little local data; it protects the data of individual chip design manufacturers and so preserves privacy; and it improves the stability and overall accuracy of the federated personalized learning model in heterogeneous environments.

Description

Lithography hotspot detection method based on federated personalized learning

Technical Field

The invention belongs to the field of machine learning, and particularly relates to a lithography hotspot detection method based on federated personalized learning.

Background

A lithography hotspot is a region of an integrated circuit layout that is prone to manufacturing defects. Detecting lithography hotspots quickly and accurately remains an open problem. Current hotspot detection methods fall into the following four categories:

1. Lithography simulation. Fast planar lithography simulation is performed on a one-dimensional chip layout by exploiting the partial coherence of the light source in the lithography system and the characteristics of one-dimensional chip patterns. The simulation method consists of a table look-up method for one-dimensional basic patterns, a minimal look-up table with its edge extension, and large-area layout simulation without cutting. Traditional lithography hotspot detection relies heavily on lithography simulation, which detects hotspots in a layout with very high accuracy. However, its computational complexity is high and it is time-consuming, so it is ill-suited to fast hotspot detection during the test stage and is generally reserved for the final verification stage of manufacturing.

2. Pattern recognition. The detection framework uses important design rules to characterize the topological features of lithography hotspots and a tangent-space distance metric to analyze and classify hotspot patterns. Pattern recognition detects known hotspot patterns quickly and accurately, but its accuracy is unsatisfactory for previously unseen hotspot patterns.

3. Centralized machine learning. Large amounts of data from every node are collected on a central server, and all of the training data are pooled to train a convolutional neural network until the model converges, yielding a lithography hotspot detection model; the training scheme is shown in fig. 1. The method extracts deep features of the layout and markedly improves detection efficiency, but it needs a large amount of training data. Because of privacy concerns, chip design manufacturers do not share data with one another, so the available data are limited and the model easily overfits.

4. Federated learning. A basic federated learning framework has a central server and several nodes (clients). Each node stores its own unshared local lithography hotspot data, uses it to train a convolutional neural network hotspot detection model, and uploads the model to the central server. The central server organizes the local training of each node, aggregates the models obtained from the nodes, and sends the aggregated model back to the nodes; this process constitutes one round. The nodes then train on the aggregated model, and the rounds repeat until the final model converges, yielding a single unified lithography hotspot detection model; the training scheme is shown in fig. 2. The method removes the data-island problem caused by manufacturers not sharing data, but federated learning often performs poorly under highly heterogeneous data and asynchronous participation, and it struggles to meet the required detection accuracy.

Disclosure of Invention

The invention aims to provide a lithography hotspot detection method based on federated personalized learning that addresses the shortcomings of the prior art.

The purpose of the invention is achieved by the following technical scheme: a lithography hotspot detection method based on federated personalized learning, comprising the following steps:

S1, constructing, for each node, a convolutional neural network with the same architecture but different parameters, based on the lithography hotspot data of that node;

S2, determining the global model parameters and the local model parameters according to the parameter similarity of the convolutional neural networks across nodes:

comparing the parameter distances of the same layer across the convolutional neural networks trained by different nodes: for the j-th layer of the convolutional neural network, calculating the difference between the j-th layer parameters of each node and the mean of the j-th layer parameters over all nodes; if the sum of the 2-norms of all the differences is less than or equal to a distance threshold δ, regarding the j-th layer parameters as common features of the different nodes and designating them global model parameters to be aggregated; otherwise, regarding the j-th layer parameters as incompatible features of the different nodes and designating them local model parameters to be fine-tuned locally;

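S3, establishing the lithography hotspot federated personalized learning model:

$$\min_{w_{global},\,W_{local}} F(w_{global}, W_{local}) = \sum_{k=1}^{N} p_k F_k(w_{global}, w_{local}^{k})$$

wherein $w_{global}$ is the global model parameter shared by all nodes; $W_{local} = [w_{local}^{1}, \dots, w_{local}^{N}]$ is a matrix whose k-th column is the local model parameter of the k-th node; N is the number of nodes; $p_k$ is the probability that node k is selected, with $p_k \ge 0$ and $\sum_{k=1}^{N} p_k = 1$; $n_k$ is the number of samples of node k; and $n = \sum_{k=1}^{N} n_k$ is the total number of samples over all nodes. In this model, F is the overall empirical loss function and $F_k(\cdot)$ is the local empirical loss function of node k over its lithography hotspot data distribution; $F_k(\cdot)$ is non-convex. Assuming that the k-th node holds $n_k$ lithography hotspot training samples $x_{k,1}, x_{k,2}, \dots, x_{k,n_k}$, $F_k(\cdot)$ can be defined as:

$$F_k(w_{global}, w_{local}^{k}) = \frac{1}{n_k} \sum_{j=1}^{n_k} l(w_{global}, w_{local}^{k};\, x_{k,j})$$

where $l(\cdot)$ is the loss function evaluated on a single sample;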

S4, iteratively updating the parameters of the lithography hotspot federated personalized learning model, wherein round t proceeds as follows:

first, the central server broadcasts the latest lithography hotspot global model parameters $w_{t,global}$ to all nodes;

second, letting the lithography hotspot federated personalized learning model of the k-th node be $(w_{t,global}, w_{t,local}^{k})$, performing $E_{local}$ ($\ge 1$) fine-tuning steps on the local model parameters, each of the form:

$$w_{local}^{k} \leftarrow w_{local}^{k} - \eta_t \nabla_{w_{local}}\, l(w_{t,global}, w_{local}^{k};\, \xi_k)$$

wherein $\eta_t$ is the learning rate and $\xi_k$ is a sample drawn uniformly from the local lithography hotspot data; after these steps, the model of node k consists of the unchanged global parameters $w_{t,global}$ and the fine-tuned local parameters $w_{local}^{k}$;

then performing E ($\ge 1$) update steps on all parameters of the lithography hotspot model, each of the form:

$$(w_{global}^{k}, w_{local}^{k}) \leftarrow (w_{global}^{k}, w_{local}^{k}) - \eta_t \nabla\, l(w_{global}^{k}, w_{local}^{k};\, \xi_k)$$

where $w_{global}^{k}$ is initialized to $w_{t,global}$ and the resulting global parameters of node k are denoted $w_{t+1,global}^{k}$;

finally, the central server aggregates the global model parameters of the nodes' lithography hotspot models to generate new global model parameters $w_{t+1,global}$;

S5, repeating the iterative update for multiple rounds until the lithography hotspot federated personalized learning model converges, and using the converged model to detect lithography hotspots.

Further, the convolutional neural network of each node is constructed as follows:

the convolutional neural network of each node consists of two convolution units and two fully connected layers connected in sequence;

each convolution unit comprises two convolutional layers, a ReLU layer and a max pooling layer connected in sequence; in each convolution, a set of convolution kernels is applied to the underlying lithography hotspot data tensor; the ReLU layer activates the output of the convolutional layer, ensuring that the whole neural network is non-linear and sparse; the max pooling layer down-samples the output of the ReLU layer by 2 × 2 and serves as the output layer of the current convolution unit;

the two convolution units are followed by two fully connected layers; during training, dropout is applied to the first fully connected layer to mitigate overfitting; the second fully connected layer is the output layer of the whole neural network and has two output channels, which are the predicted probabilities of hotspot and non-hotspot, respectively.

Further, in S4, when all nodes participate in the aggregation, i.e. synchronous aggregation, the process is as follows:

a new global parameter $w_{t+1,global}$ is generated from the global parameters $w_{t+1,global}^{k}$ ($k = 1, \dots, N$) of the lithography hotspot models of all nodes; after every node has finished its parameter update for the round, all nodes send their global parameters to the central server for aggregation, and the aggregation formula is:

$$w_{t+1,global} = \sum_{k=1}^{N} \frac{n_k}{n}\, w_{t+1,global}^{k}$$

where $n_k$ is the number of samples of node k and $n = \sum_{k=1}^{N} n_k$;

Further, in S4, when only some of the nodes participate in the aggregation, i.e. asynchronous aggregation, the process is as follows:

a threshold K ($1 \le K < N$) on the number of aggregated nodes is set, and the central server collects the outputs of the first K responding nodes; once the outputs of K nodes have been collected, the central server stops waiting for the remaining nodes; in this update, the (K+1)-th to N-th nodes are regarded as stragglers; letting $S_t$ ($|S_t| = K$) be the set of the first K responding nodes in the t-th iteration, the aggregation formula is:

$$w_{t+1,global} = \sum_{k \in S_t} \frac{n_k}{n_K}\, w_{t+1,global}^{k}$$

wherein $n_K = \sum_{k \in S_t} n_k$ is the total number of samples held by the first K nodes.

The invention has the following beneficial effects:

1. Because chip design manufacturers do not share data, researchers find it difficult to obtain enough data for model training. The method breaks through this data-island problem and addresses the model overfitting caused by too little local data.

2. The data of each chip design manufacturer are protected, so privacy is preserved.

3. The low accuracy of traditional federated learning models is addressed; the method provided by the invention effectively improves model accuracy.

4. The data heterogeneity and asynchrony problems faced by traditional federated learning are addressed. The method effectively overcomes data heterogeneity and supports online dynamic updating of each node.

Drawings

FIG. 1 is a schematic diagram of centralized machine learning;

FIG. 2 is a schematic diagram of conventional federated learning;

FIG. 3 is a diagram illustrating relative distances between neural network layers in an embodiment of the present invention;

FIG. 4 is a schematic diagram of a lithography hotspot detection method based on federated personalized learning in an embodiment of the present invention;

FIGS. 5 and 6 show, respectively, the experimental results and the accuracy comparison for synchronous training with the two data sets divided into 2 nodes;

FIGS. 7 and 8 show, respectively, the experimental results and the accuracy comparison for synchronous training with the two data sets divided into 4 nodes;

FIGS. 9 and 10 show, respectively, the experimental results and the accuracy comparison for synchronous training with the two data sets divided into 10 nodes;

FIGS. 11 and 12 show, respectively, the experimental results and the accuracy comparison for asynchronous training with the two data sets divided into 4 nodes;

FIGS. 13 and 14 show, respectively, the experimental results and the accuracy comparison for asynchronous training with the two data sets divided into 10 nodes.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

When lithographic adjustments are made to an integrated circuit design layout, some layout regions are less robust to such adjustments and are more likely to cause open-circuit or short-circuit failures during manufacturing; such failure-prone regions are defined as lithography hotspots. The main goal of lithography hotspot detection is to maximize detection accuracy and minimize the detection error rate. Training a lithography hotspot detection model that performs well typically requires a large amount of data. However, because the data are private, factories that hold lithography hotspot data do not share it with one another. Each factory therefore finds it difficult to obtain enough data for model training, which easily leads to overfitting. Federated learning is introduced to lithography hotspot detection so that the data characteristics of every node can be learned while data privacy is protected. Individual factories, however, often generate and collect lithography hotspot data in a highly heterogeneous fashion, and the amount of hotspot data may vary greatly from factory to factory. Data generated this way violate the independent and identically distributed (IID) assumption commonly made in federated learning, and conventional federated learning cannot cope with such statistical heterogeneity. The invention therefore introduces a lithography hotspot detection method based on federated personalized learning, whose goal is factory-specific personalized modeling, generally a more effective way of handling statistical heterogeneity in data.

First, the neural network architecture of each node

The neural network architecture local to each node is described first. A convolutional neural network (CNN) is adopted, since CNNs perform very well in image classification. A convolutional neural network consists of several convolution units that perform feature extraction and several fully connected layers that produce sample classification probabilities.

In the invention, the convolutional neural network of each node consists of two convolution units and two fully connected layers connected in sequence, and each convolution unit comprises two convolutional layers, a ReLU layer and a max pooling layer connected in sequence. In each convolution, a set of convolution kernels applies the following operation to the underlying lithography hotspot data tensor X:

$$Y = \mathrm{conv}(W, X) + b$$

where W is the weight matrix of the convolutional layer, b is the bias parameter, and Y is the output of the convolutional layer. In this embodiment all convolution kernels are 3 × 3, and the two convolutional layers of each convolution unit have 16 and 32 output channels, respectively. ReLU is an activation function applied to each element y of the convolutional layer output; it ensures that the entire neural network is non-linear and sparse, and its expression is:

$$\mathrm{ReLU}(y) = \max(0, y)$$
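As a minimal illustration of the convolution and activation steps, the following pure-Python sketch applies a single-channel convolution followed by ReLU. It assumes a stride of 1 with no padding and uses a 2 × 2 kernel for brevity, whereas the embodiment uses 3 × 3 kernels; as in most deep learning frameworks, the kernel is not flipped. The function names `conv2d` and `relu` and the toy values are illustrative and do not come from the patent.

```python
def conv2d(x, w, b):
    """Single-channel convolution (stride 1, no padding) plus a scalar bias b."""
    kh, kw = len(w), len(w[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [
        [
            sum(w[i][j] * x[r + i][c + j] for i in range(kh) for j in range(kw)) + b
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(y):
    """Element-wise max(0, y)."""
    return [[max(0.0, v) for v in row] for row in y]


# Toy 3 x 3 input and 2 x 2 kernel (hypothetical values).
x = [[1.0, 0.0, 2.0],
     [0.0, 3.0, 0.0],
     [4.0, 0.0, 1.0]]
w = [[1.0, 0.0],
     [0.0, -1.0]]

y = conv2d(x, w, b=0.0)   # [[-2.0, 0.0], [0.0, 2.0]]
a = relu(y)               # negative responses are zeroed: [[0.0, 0.0], [0.0, 2.0]]
```

The zeroed entries show where the sparsity attributed to ReLU comes from: every negative response is replaced by exactly zero.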

The max pooling layer down-samples the output of the previous layer by 2 × 2 and serves as the output layer of the current convolution unit. The two convolution units are followed by two fully connected layers with 250 and 2 output channels, respectively. During training, dropout with a probability of 50% is applied to the first fully connected layer to mitigate overfitting. The second fully connected layer is the output layer of the entire neural network; its two output channels are the predicted probabilities of hotspot and non-hotspot, respectively. The model configuration is detailed in table 1.

TABLE 1 node neural network model configuration
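The layer sizes given in the text can be traced with a short Python sketch that lists the parameter count of each layer. The input size is not stated in the text, so a single-channel 32 × 32 clip and stride-1 convolutions with padding 1 are assumed here purely for illustration; the function names are hypothetical, and the totals change with the real input size.

```python
def conv_params(c_in, c_out, k=3):
    """Weights plus biases of one k x k convolutional layer."""
    return c_in * c_out * k * k + c_out


def linear_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def describe_network(in_channels=1, height=32, width=32):
    """List (layer name, parameter count) for the two-unit CNN.

    Assumed, not stated in the text: a single-channel 32 x 32 input and
    stride-1 convolutions with padding 1, so only pooling shrinks H and W.
    """
    layers = []
    c, h, w = in_channels, height, width
    for unit in (1, 2):
        for idx, c_out in enumerate((16, 32), start=1):
            layers.append((f"conv{unit}_{idx}", conv_params(c, c_out)))
            c = c_out
        h, w = h // 2, w // 2  # 2 x 2 max pooling halves each side
    layers.append(("fc1", linear_params(c * h * w, 250)))
    layers.append(("fc2", linear_params(250, 2)))
    return layers


layers = describe_network()
total_params = sum(count for _, count in layers)
for name, count in layers:
    print(f"{name:8s} {count:8d}")
print("total   ", total_params)
```

Whatever the input size, fc2 always has 250 × 2 + 2 = 502 parameters, so it remains a very small part of the network.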

Second, determining global model parameters and local model parameters

Starting from the neural network model parameters of each node, the similarity of the nodes' data can be inferred from the similarity of their model parameters. Following this idea, the neural network model parameters are used as a means of extracting common features: given the models trained by the different nodes, the parameter distances of the same layer are compared across nodes. For a given layer (say the j-th layer), the difference between the j-th layer parameters of each node and the mean of the j-th layer parameters is calculated. If the sum of the 2-norms of all the differences is less than or equal to a distance threshold δ, the j-th layer parameters are regarded as common features of the different nodes and are designated global model parameters to be fused, namely:

$$\sum_{k=1}^{N} \left\| W^{k,j} - \bar{W}^{j} \right\|_2 \le \delta$$

wherein $W^{k,j}$ is the j-th layer parameter of the k-th node model and $\bar{W}^{j} = \frac{1}{N} \sum_{k=1}^{N} W^{k,j}$ is the mean of the j-th layer parameters of the N models.

If the sum of the 2-norms of all the differences is greater than δ, the parameters of that layer are regarded as incompatible features of the different nodes. They are treated as local parameters and updated locally: after the global model parameters have been aggregated, the local model parameters are fine-tuned to improve the performance of the local model, namely:

$$\sum_{k=1}^{N} \left\| W^{k,j} - \bar{W}^{j} \right\|_2 > \delta$$

Using the above method, the distance of every neural network layer across all nodes is obtained. In this embodiment, the distance of the first fully connected layer fc1 is the smallest; taking it as the reference gives the relative distances of the other layers, shown in fig. 3. The distance of the last layer fc2 is significantly higher than that of the other layers, so the last layer fc2 of the convolutional neural network is chosen as the locally fine-tuned layer.
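The layer-selection rule can be sketched in a few lines of Python. The sketch flattens each layer to a list of floats, computes the sum over nodes of the 2-norm of the difference from the layer mean, and labels the layer by comparing that sum with the threshold. The function names, the toy parameter values and the threshold 0.5 are hypothetical.

```python
import math


def layer_distance(layer_params):
    """Sum over nodes of the 2-norm of (node parameters - mean parameters).

    layer_params: one flat list of floats per node, all of the same length.
    """
    n_nodes = len(layer_params)
    mean = [sum(column) / n_nodes for column in zip(*layer_params)]
    return sum(
        math.sqrt(sum((p - m) ** 2 for p, m in zip(params, mean)))
        for params in layer_params
    )


def split_layers(models, delta):
    """Label each layer 'global' (distance <= delta) or 'local' (distance > delta).

    models: one dict per node mapping layer name -> flat parameter list.
    """
    labels = {}
    for name in models[0]:
        distance = layer_distance([model[name] for model in models])
        labels[name] = "global" if distance <= delta else "local"
    return labels


# Toy parameters for two nodes (hypothetical values).
models = [
    {"conv1": [0.10, 0.20], "fc2": [1.0, -1.0]},
    {"conv1": [0.12, 0.18], "fc2": [-1.0, 1.0]},
]
labels = split_layers(models, delta=0.5)
print(labels)  # {'conv1': 'global', 'fc2': 'local'}
```

In this toy case the two nodes nearly agree on conv1 but disagree on fc2, which mirrors the pattern reported for the embodiment.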

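Third, the lithography hotspot federated personalized learning model is established as follows:

$$\min_{w_{global},\,W_{local}} F(w_{global}, W_{local}) = \sum_{k=1}^{N} p_k F_k(w_{global}, w_{local}^{k})$$

wherein $w_{global}$ is the global model parameter shared by all nodes; $W_{local} = [w_{local}^{1}, \dots, w_{local}^{N}]$ is a matrix whose k-th column is the local model parameter of the k-th node; $p_k \ge 0$ and $\sum_{k=1}^{N} p_k = 1$; $n_k$ is the number of samples of node k; and $n = \sum_{k=1}^{N} n_k$ is the total number of samples over all nodes. In this model, F is the overall empirical loss function and $F_k(\cdot)$ is the local empirical loss function of node k over its lithography hotspot data distribution; $F_k(\cdot)$ is non-convex. Assuming that the k-th node holds $n_k$ lithography hotspot training samples $x_{k,1}, x_{k,2}, \dots, x_{k,n_k}$, $F_k(\cdot)$ can be defined as:

$$F_k(w_{global}, w_{local}^{k}) = \frac{1}{n_k} \sum_{j=1}^{n_k} l(w_{global}, w_{local}^{k};\, x_{k,j})$$

where $l(\cdot)$ is the loss function evaluated on a single sample.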
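To make the structure of the objective concrete, the following Python sketch evaluates a local loss as the mean per-sample loss on one node and the overall loss as a weighted sum of the local losses. It assumes $p_k = n_k / n$, and it uses a toy scalar model y = w_global · x + w_local with a squared-error loss in place of the CNN and its classification loss; all names and values are hypothetical.

```python
def local_loss(w_global, w_local, samples, loss):
    """F_k: mean per-sample loss over one node's own data."""
    return sum(loss(w_global, w_local, s) for s in samples) / len(samples)


def overall_loss(w_global, w_locals, node_samples, loss):
    """F: sum over nodes of p_k * F_k, with p_k = n_k / n (an assumption)."""
    n = sum(len(samples) for samples in node_samples)
    return sum(
        (len(samples) / n) * local_loss(w_global, w_k, samples, loss)
        for w_k, samples in zip(w_locals, node_samples)
    )


def squared_error(w_global, w_local, sample):
    """Toy per-sample loss for the scalar model y = w_global * x + w_local."""
    x, y = sample
    return (w_global * x + w_local - y) ** 2


# Two nodes share w_global but keep their own w_local (hypothetical data).
node_samples = [[(1.0, 3.0), (2.0, 5.0)], [(1.0, 0.0)]]
value = overall_loss(2.0, [1.0, -1.0], node_samples, squared_error)
print(value)  # node 1 fits exactly; node 2 contributes (1/3) * 1
```

Each node is scored with the shared global part and its own local part, which is what separates this objective from one with a single shared parameter vector.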

Fourth, in the lithography hotspot federated personalized learning model, the parameters are updated iteratively as follows:

In round t, the central server first broadcasts the latest lithography hotspot global model parameters $w_{t,global}$ to all nodes. Next, let the lithography hotspot federated personalized learning model of the k-th node be $(w_{t,global}, w_{t,local}^{k})$. The node then performs $E_{local}$ ($\ge 1$) update steps on the local model parameters, each of the form:

$$w_{local}^{k} \leftarrow w_{local}^{k} - \eta_t \nabla_{w_{local}}\, l(w_{t,global}, w_{local}^{k};\, \xi_k)$$

wherein $\eta_t$ is the learning rate (also called the step size) and $\xi_k$ is a sample drawn uniformly from the local lithography hotspot data. After these steps, the model of node k consists of the unchanged global parameters $w_{t,global}$ and the fine-tuned local parameters $w_{local}^{k}$. The node then performs E ($\ge 1$) update steps on all parameters of the lithography hotspot model, each of the form:

$$(w_{global}^{k}, w_{local}^{k}) \leftarrow (w_{global}^{k}, w_{local}^{k}) - \eta_t \nabla\, l(w_{global}^{k}, w_{local}^{k};\, \xi_k)$$

where $w_{global}^{k}$ is initialized to $w_{t,global}$ and the resulting global parameters of node k are denoted $w_{t+1,global}^{k}$.

Finally, the central server aggregates the global model parameters of the nodes' lithography hotspot models to generate new global model parameters $w_{t+1,global}$. Two cases are distinguished: all nodes participate in the aggregation, or only some nodes participate. Full participation is the ideal case; in real scenarios, often only some of the nodes can participate (asynchronous aggregation).
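The node-side part of a round, local fine-tuning followed by the full update, can be sketched in Python as follows. The sketch reuses a toy scalar model y = w_global · x + w_local with a squared-error loss rather than the CNN, so that the gradients can be written by hand; the function names, learning rate and data are hypothetical.

```python
import random


def grad(w_global, w_local, sample):
    """Gradient of (w_global * x + w_local - y)**2 w.r.t. (w_global, w_local)."""
    x, y = sample
    residual = w_global * x + w_local - y
    return 2 * residual * x, 2 * residual


def node_update(w_global, w_local, samples, lr, e_local, e_full, rng):
    """One round on one node: e_local local-only steps, then e_full full steps."""
    for _ in range(e_local):
        _, g_local = grad(w_global, w_local, rng.choice(samples))
        w_local -= lr * g_local          # the global part stays frozen here
    for _ in range(e_full):
        g_global, g_local = grad(w_global, w_local, rng.choice(samples))
        w_global -= lr * g_global        # now every parameter is updated
        w_local -= lr * g_local
    return w_global, w_local


rng = random.Random(0)
samples = [(1.0, 3.0)]                   # one (x, y) pair keeps the run deterministic
result = node_update(2.0, 0.0, samples, lr=0.25, e_local=1, e_full=1, rng=rng)
print(result)  # (2.25, 0.75)
```

Only the first element of the returned pair would be sent to the central server; the second stays on the node as its personalized part.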

All nodes participate in the aggregation. New global model parameters $w_{t+1,global}$ are generated from the global model parameters $w_{t+1,global}^{k}$ ($k = 1, \dots, N$) of the lithography hotspot models of all nodes. After every node has finished its parameter update for the round, all nodes send their global model parameters to the central server for aggregation, and the aggregation formula is:

$$w_{t+1,global} = \sum_{k=1}^{N} \frac{n_k}{n}\, w_{t+1,global}^{k}$$

where $n_k$ is the number of samples of node k and $n = \sum_{k=1}^{N} n_k$.

Unfortunately, in practical deployments, requiring all nodes to participate in the aggregation suffers from a severe straggler effect, in which every node waits for the slowest one. For example, if a federated learning system contains thousands of user devices, a small percentage of them is always offline. Full device participation means that the central server must wait for these stragglers, which is clearly impractical.

Only some of the nodes participate in the aggregation. This strategy is more practical because it does not require all nodes to be online at the same time (asynchronous). A threshold K ($1 \le K < N$) on the number of aggregated nodes is set, and the central server collects the outputs of the first K responding nodes. Once the outputs of K nodes have been collected, the central server stops waiting for the remaining nodes; in this update, the (K+1)-th to N-th nodes are regarded as stragglers. Letting $S_t$ ($|S_t| = K$) be the set of the first K responding nodes in the t-th iteration, the aggregation formula is:

$$w_{t+1,global} = \sum_{k \in S_t} \frac{n_k}{n_K}\, w_{t+1,global}^{k}$$

wherein $n_K = \sum_{k \in S_t} n_k$ is the total number of samples held by the first K nodes.
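Both aggregation cases reduce to a sample-weighted average and can be sketched with one Python function. The sketch assumes that each node's weight is its sample count divided by the total sample count of the participating nodes, and that the parameter lists arrive in order of response; the function name `aggregate` and the toy values are hypothetical.

```python
def aggregate(global_params, sample_counts, k=None):
    """Sample-weighted average of the nodes' global parameters.

    global_params: one flat parameter list per node, in order of response.
    sample_counts: n_k for each node, in the same order.
    k: number of earliest responders to use; None means all nodes (synchronous).
    """
    k = len(global_params) if k is None else k
    used = list(zip(global_params[:k], sample_counts[:k]))
    n_used = sum(n for _, n in used)
    dim = len(global_params[0])
    return [sum(n * params[i] for params, n in used) / n_used for i in range(dim)]


# Three nodes; the third responds last and is treated as a straggler when k=2.
params = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
counts = [1, 3, 5]

sync_result = aggregate(params, counts)         # all three nodes contribute
async_result = aggregate(params, counts, k=2)   # [2.5, 3.5]
```

With `k=2` the straggler's parameters are ignored for this round, so the server never has to wait for it.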

As shown in fig. 4, the lithography hotspot detection method based on federated personalized learning consists mainly of the following three parts:

Central server aggregation: the central server aggregates the global model parameters returned by each node in order to fuse the features common to the nodes, updates the global model parameters, and feeds the latest global model parameters back to each node.

Node local model parameter fine-tuning: each node downloads the global model parameters from the central server and then trains its local model parameters on local data to find the best local model parameters under the current global model parameters, which overcomes the model heterogeneity and data heterogeneity of different nodes.

Node full-parameter update: after the local model parameters have been fine-tuned, each node trains all parameters on local data to find the optimum of the current parameters, which serves to identify the features common to different nodes.

The lithography hotspot detection method based on federated personalized learning provided by the invention addresses the heterogeneity challenge both in theory and in experiment. The key idea is to federate the global model parameters, which represent the features common to all nodes, so that these common features are fused, and to fine-tune locally the local model parameters, which represent the features specific to each node, so that each node can adapt to its own heterogeneity, thereby improving the stability of the model. These improvements raise the stability and overall accuracy of federated personalized learning in heterogeneous (non-IID) environments.

This embodiment uses 2 data sets: the ICCAD 2012 contest data set and an industry data set, asml1. Both are highly representative data sets in the field of lithography hotspot detection, and their basic information is shown in table 2. The four columns of the table list the total numbers of lithography hotspots (hotspots) and non-hotspots (non-hotspots) in the training set and the test set, respectively.

TABLE 2 data set basic information

The experimental details of the implementation are as follows:

The ICCAD and asml1 data sets are each divided into the same number of subsets, each subset corresponding to one node, and synchronous or asynchronous training is carried out. Specifically, ICCAD and asml1 are each divided into 1, 2, or 5 subsets, giving 2, 4, or 10 subsets in total, which correspond to 2, 4, or 10 nodes; each configuration is tested separately.
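The node layout can be sketched in Python as follows. The text does not say how the subsets are drawn, so a contiguous split into near-equal parts is assumed here; the function name `shard` and the placeholder sample identifiers are hypothetical.

```python
def shard(samples, parts):
    """Split a list into `parts` contiguous shards of near-equal size."""
    size, extra = divmod(len(samples), parts)
    shards, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        shards.append(samples[start:end])
        start = end
    return shards


# Placeholder sample ids standing in for layout clips.
iccad = [f"iccad_{i}" for i in range(10)]
asml1 = [f"asml1_{i}" for i in range(6)]

# 2 shards per data set gives the 4-node configuration.
nodes = shard(iccad, 2) + shard(asml1, 2)
print([len(node) for node in nodes])  # [5, 5, 3, 3]
```

Because every node holds data from only one source, nodes built this way differ in both data distribution and sample count.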

The algorithms compared are:

FedAvg (traditional federated learning)

FedProx (a modified version of traditional federated learning)

Local train (a convolutional neural network trained and tested on local data only)

Federated Personalized Learning (FPL, the method of the invention, with the last layer fc2 as the locally fine-tuned parameters)

First, the synchronous case (all nodes participate in training and parameter aggregation):

The experimental results of synchronous training with the two data sets divided into 2 nodes are shown in fig. 5, and the overall accuracy (acc) comparison is shown in fig. 6. For each of the 4 algorithms, the left plot shows the true positive rate (TPR, the proportion of correctly identified positive samples among all positive samples) and the right plot shows the false positive rate (FPR, the proportion of negative samples incorrectly identified as positive among all negative samples). A higher TPR is better, and a lower FPR is better. Comparing overall accuracy, the FPL of the invention performs clearly better.
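The three metrics can be computed from predicted and true labels as in the following Python sketch, where label 1 denotes a hotspot and 0 a non-hotspot. The function name `detection_metrics` and the toy labels are hypothetical.

```python
def detection_metrics(y_true, y_pred):
    """Return (TPR, FPR, accuracy); label 1 = hotspot, 0 = non-hotspot.

    Assumes y_true contains at least one sample of each class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    tpr = tp / (tp + fn)          # share of hotspots that were detected
    fpr = fp / (fp + tn)          # share of non-hotspots flagged by mistake
    acc = (tp + tn) / len(pairs)  # share of all samples classified correctly
    return tpr, fpr, acc


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
tpr, fpr, acc = detection_metrics(y_true, y_pred)
print(tpr, fpr, acc)  # 2 of 3 hotspots found, 1 of 5 non-hotspots flagged, 6 of 8 correct
```

When non-hotspots greatly outnumber hotspots, accuracy alone can look high even with a poor TPR, which is why TPR and FPR are reported alongside it.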

The experimental results of synchronous training with the two data sets divided into 4 nodes are shown in fig. 7, and the overall accuracy (acc) comparison is shown in fig. 8. The FPL of the invention still outperforms the other methods.

The experimental results of synchronous training with the two data sets divided into 10 nodes are shown in fig. 9, and the overall accuracy (acc) comparison is shown in fig. 10. The FPL of the invention still performs significantly better than conventional federated learning and FedProx. Compared with purely local training, the FPL achieves both a higher true positive rate and a lower false positive rate, and clearly better performance overall. The results show that when a node has few samples, the FPL can successfully use information from other nodes to improve that node's performance.

Second, the asynchronous case (in each round only some of the nodes participate in training and parameter aggregation):

The experimental results of asynchronous training with the two data sets divided into 4 nodes are shown in fig. 11, and the overall accuracy (acc) comparison is shown in fig. 12. In each round, half of the nodes are selected at random for training and aggregation. The results are similar to those of 4-node synchronous training; overall, the FPL of the invention performs better than the other methods.

The experimental results of asynchronous training with the two data sets divided into 10 nodes are shown in fig. 13, and the overall accuracy (acc) comparison is shown in fig. 14. The FPL of the invention still converges faster than the other methods and performs clearly better. Compared with the other methods, it again achieves both a higher true positive rate and a lower false positive rate. These experiments show that the FPL can still use information from other nodes to improve local performance in the asynchronous case.

From the above experimental results, the following can be observed:

1. The invention adopts federated personalized learning. The nodes do not need to share their lithography hotspot data; by sharing only the common feature parameters, each node can learn the knowledge it needs from the global model and achieve satisfactory results.

2. TPR and FPR in lithography hotspot detection are well balanced, and the final detection accuracy is improved.

3. The locally fine-tuned layer (fc2) has very few parameters, only 0.68% of the number of global model parameters.

4. The global model parameters are very good pre-trained parameters: they capture the features common to the lithography hotspot data of multiple nodes, and the target model accuracy can be reached with only a few local updates on top of them.

5. The algorithm learns global knowledge and also adapts to each node's local lithography hotspot data set, so it tolerates data heterogeneity in the lithography hotspot domain well.

6. Compared with existing methods (FL, local train, FedProx), the method of the invention handles model heterogeneity and data heterogeneity in lithography hotspot detection well. As long as the neural network parameters used for global aggregation are kept consistent, it achieves better and more stable personalized performance, and it improves accuracy by about 10% over FL.

7. The method supports asynchronous operation: nodes can join or leave dynamically during training.

Hereinbefore, specific embodiments of the present invention are described with reference to the drawings. However, those skilled in the art will appreciate that various modifications and substitutions can be made to the specific embodiments of the present invention without departing from the spirit and scope of the invention. Such modifications and substitutions are intended to be included within the scope of the present invention as defined by the appended claims.

Claims

1. A lithography hotspot detection method based on federated personalized learning, characterized by comprising the following steps:

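S3, establishing the lithography hotspot federated personalized learning model:

$$\min_{w_{global},\,W_{local}} F(w_{global}, W_{local}) = \sum_{k=1}^{N} p_k F_k(w_{global}, w_{local}^{k})$$

wherein $w_{global}$ is the global model parameter shared by all nodes, $W_{local} = [w_{local}^{1}, \dots, w_{local}^{N}]$ collects the local model parameters of the N nodes, $p_k \ge 0$ and $\sum_{k=1}^{N} p_k = 1$, and $n_k$ is the number of samples of node k;

then $F_k(\cdot)$ can be defined as:

$$F_k(w_{global}, w_{local}^{k}) = \frac{1}{n_k} \sum_{j=1}^{n_k} l(w_{global}, w_{local}^{k};\, x_{k,j})$$

where $l(\cdot)$ is the loss function evaluated on a single sample;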

2. The lithography hotspot detection method based on federated personalized learning according to claim 1, wherein the convolutional neural network of each node is constructed as follows:

3. The lithography hotspot detection method based on federated personalized learning according to claim 1, wherein in step S4, when all nodes participate in the aggregation, i.e. synchronous aggregation, the process is as follows:

a new global parameter $w_{t+1,global}$ is generated from the global parameters $w_{t+1,global}^{k}$ ($k = 1, \dots, N$) of the lithography hotspot models of all nodes.

4. The lithography hotspot detection method based on federated personalized learning according to claim 1, wherein in step S4, when only some of the nodes participate in the aggregation, i.e. asynchronous aggregation, the process is as follows:

a threshold K ($1 \le K < N$) on the number of aggregated nodes is set, and the central server collects the outputs of the first K responding nodes; once the outputs of K nodes have been collected, the central server stops waiting for the remaining nodes; in this update, the (K+1)-th to N-th nodes are regarded as stragglers; letting $S_t$ ($|S_t| = K$) be the set of the first K responding nodes in the t-th iteration, the aggregation formula is:

$$w_{t+1,global} = \sum_{k \in S_t} \frac{n_k}{n_K}\, w_{t+1,global}^{k}$$

wherein $n_K = \sum_{k \in S_t} n_k$ is the total number of samples held by the first K nodes.