WO2022052523A1 - Method, device, system and storage medium for processing wafer inspection tasks

Method, device, system and storage medium for processing wafer inspection tasks

Info

Publication number
WO2022052523A1
WO2022052523A1 (PCT/CN2021/097390, CN2021097390W)
Authority
WO
WIPO (PCT)
Prior art keywords
node
working
nodes
wafer inspection
load
Prior art date
Application number
PCT/CN2021/097390
Other languages
English (en)
French (fr)
Inventor
瞿德清
Original Assignee
长鑫存储技术有限公司
Application filed by 长鑫存储技术有限公司
Priority to EP21769327.4A (EP3992878B1)
Priority to US17/400,431 (US12085916B2)
Publication of WO2022052523A1


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 - Multiprogramming arrangements
    • G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 - Allocation of resources to service a request
    • G06F 9/5027 - Allocation of resources to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F 9/5038 - Allocation of resources to service a request, considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 - Administration; Management
    • G06Q 10/06 - Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q 10/063 - Operations research, analysis or management
    • G06Q 10/0631 - Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q 10/06316 - Sequencing of tasks or work
    • H - ELECTRICITY
    • H01 - ELECTRIC ELEMENTS
    • H01L - SEMICONDUCTOR DEVICES NOT COVERED BY CLASS H10
    • H01L 21/00 - Processes or apparatus adapted for the manufacture or treatment of semiconductor or solid state devices or of parts thereof
    • H01L 21/67 - Apparatus specially adapted for handling semiconductor or electric solid state devices during manufacture or treatment thereof; Apparatus specially adapted for handling wafers during manufacture or treatment of semiconductor or electric solid state devices or components; Apparatus not specifically provided for elsewhere
    • H01L 21/67005 - Apparatus not specifically provided for elsewhere
    • H01L 21/67242 - Apparatus for monitoring, sorting or marking
    • H01L 21/67276 - Production flow monitoring, e.g. for increasing throughput

Definitions

  • the present application relates to the technical field of information processing, and in particular, to a method, device, system and storage medium for processing wafer inspection tasks.
  • Wafers are the basic raw material for manufacturing semiconductor devices. Ultra-high-purity semiconductor material is made into wafers through crystal pulling, slicing and other processes. The wafers undergo a series of semiconductor manufacturing processes to form extremely tiny circuit structures, and are then cut, packaged, and tested to become chips, which are widely used in various electronic equipment.
  • wafer appearance inspection equipment is often used for quality inspection.
  • wafer appearance inspection equipment based on deep-learning intelligent defect detection algorithms can effectively improve the accuracy of wafer inspection compared with traditional image-recognition detection algorithms.
  • the algorithm based on deep learning has high complexity, and the wafer appearance inspection equipment that relies on the central processing unit (CPU) to provide computing power can no longer meet the real-time requirements of wafer defect inspection.
  • the inspection capacity of the equipment is limited and the inspection efficiency is low.
  • the present application provides a processing method, device, system and storage medium for a wafer inspection task, so as to improve the inspection efficiency of a wafer inspection system.
  • an embodiment of the present application provides a method for processing a wafer inspection task, which is applied to a resource management node, where the resource management node is connected to a plurality of working nodes, and the method includes:
  • a target work node is determined according to the weight value of each work node; wherein the target work node is the work node with the largest weight value among the plurality of work nodes, and the weight value of each work node is a parameter, determined according to the load information of each work node, for assigning wafer inspection tasks;
  • the wafer inspection task is sent to the target worker node.
  • before the target working node is determined according to the weight value of each of the working nodes, the method further includes:
  • the weight value of each working node is determined; wherein, the load size of the working node is negatively correlated with the weight value of the working node.
  • the obtaining of the load value of each of the worker nodes includes:
  • receiving a working parameter set from each of the worker nodes, where the working parameter set includes at least one of: the GPU utilization and available video memory of the multiple GPUs of the working node, the central processing unit (CPU) utilization of the working node, and the available memory;
  • the load value of each of the working nodes is determined according to the set of working parameters of each of the working nodes.
  • the obtaining of the load value of each of the worker nodes includes:
  • receiving a load value from each of the worker nodes, the load value of each of the worker nodes being determined by that worker node according to its set of working parameters.
  • the determining of the load value of each of the working nodes according to the set of working parameters of each of the working nodes includes:
  • the receiving of a set of working parameters from each of the working nodes includes:
  • the determining of the weight value of each of the working nodes according to the multiple load values of the multiple working nodes includes:
  • the load threshold value is used to indicate the average load size of the plurality of working nodes;
  • a weight value of each of the working nodes is determined according to the load threshold and the multiple load values of the multiple working nodes.
  • the determining of the weight value of each of the working nodes according to the load threshold and the multiple load values of the multiple working nodes includes:
  • the weight value of a worker node whose load value is less than or equal to the load threshold value is set to the number of GPUs connected to that worker node.
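A minimal sketch of this weight-update rule in Python (illustrative only; the patent does not give an implementation, and the halved weight for nodes whose load exceeds the threshold is an assumption, since only the below-threshold case is specified above):

```python
def update_weights(nodes, loads):
    """Sketch of the weight-update rule.

    `nodes` maps node id -> number of GPUs; `loads` maps node id -> load value.
    The load threshold is the average load of all nodes, as described above.
    Handling of above-threshold nodes (halving, floor of 1) is an assumption.
    """
    threshold = sum(loads.values()) / len(loads)  # average load of all nodes
    weights = {}
    for node_id, gpu_count in nodes.items():
        if loads[node_id] <= threshold:
            weights[node_id] = gpu_count               # lightly loaded: full weight
        else:
            weights[node_id] = max(gpu_count // 2, 1)  # assumption: reduced weight
    return weights
```

A node's weight thus never exceeds its GPU count, keeping weights proportional to available hardware.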
  • an embodiment of the present application provides a processing device for a wafer inspection task, the processing device is connected to a plurality of working nodes, and the processing device includes:
  • a receiving module configured to receive a wafer inspection task from the storage server, where the wafer inspection task includes at least one wafer picture;
  • a processing module configured to determine a target work node according to the weight value of each work node; wherein the target work node is the work node with the largest weight value among the plurality of work nodes, and the weight value of each work node is a parameter, determined according to the load information of each work node, for allocating wafer inspection tasks;
  • a sending module configured to send the wafer detection task to the target working node.
  • an embodiment of the present application provides a processing device for a wafer inspection task, including:
  • the memory stores instructions executable by the at least one processor to enable the processing device to perform the method of any one of the first aspects.
  • an embodiment of the present application provides a wafer inspection system, including:
  • each of the working nodes includes a plurality of graphics processing units (GPUs);
  • the resource management node is configured to perform the method according to any one of the first aspects.
  • after receiving the wafer inspection task sent by the resource management node, each of the worker nodes selects an idle GPU from the multiple GPUs in the worker node, and allocates the wafer inspection task to the idle GPU for execution.
  • an embodiment of the present application provides a computer-readable storage medium, where computer-executable instructions are stored in the computer-readable storage medium, and when the computer-executable instructions are executed by a processor, the processor performs the method of any one of the first aspect.
  • Embodiments of the present application provide a method, device, system, and storage medium for processing a wafer inspection task.
  • the method includes: a resource management node receives a wafer inspection task from a storage server, selects a target work node from a plurality of work nodes according to a weight value of each work node connected to the resource management node, and allocates the wafer inspection task to the target worker node.
  • the target worker node selects an idle GPU from the resource pool, and assigns the received wafer inspection task to the idle GPU for execution.
  • the GPU preprocesses the wafer images in the wafer inspection task, and inputs the processed wafer images into the wafer inspection model to obtain inspection results.
  • the wafer inspection tasks are distributed to the GPUs of the working nodes for execution, achieving load balancing both between working nodes and between GPUs; this meets the real-time requirements for defect detection of massive numbers of wafer images and increases the throughput of the system for wafer inspection tasks.
  • FIG. 1 is a schematic structural diagram of a wafer inspection system provided by an embodiment of the present application.
  • FIG. 2 is an execution flowchart of a resource management node provided by an embodiment of the present application.
  • FIG. 3 is an execution flowchart of a worker node provided by an embodiment of the present application.
  • FIG. 4 is an interaction diagram of a method for processing a wafer inspection task provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a wafer inspection model training method provided by an embodiment of the present application.
  • FIG. 6 is an interaction diagram of a method for processing a wafer inspection task provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a processing device for a wafer inspection task provided by an embodiment of the present application.
  • FIG. 8 is a hardware schematic diagram of a processing apparatus for a wafer inspection task provided by an embodiment of the present application.
  • the technical solutions provided by the embodiments of the present application relate to the field of semiconductor wafer production, and in particular, to the field of defect detection in the wafer production process.
  • the chip (ie integrated circuit) industry is a strategic, basic and leading industry for national economic and social development, and plays a key role in several major fields such as computers, consumer electronics, network communications, and automotive electronics.
  • the production and manufacturing process of chips is very complex.
  • wafers are the main material for manufacturing chips, and their surface defects are the main obstacles that affect the product yield.
  • By detecting wafer surface defects, not only can defective dies be found, but faults in the process flow can also be judged from the distribution pattern of the defective dies, helping engineers improve the process.
  • wafer defect detection is mainly divided into two categories: one is to detect the electrical properties of the die through probe testing, and the other is to detect defects on the wafer surface through manual visual inspection. Both methods require experienced engineers to analyze and judge, which is labor-intensive and prone to errors. As factory production capacity continues to increase, detection by manual methods is inefficient.
  • detection methods based on image recognition mainly include the following steps: first, feature extraction is performed on the wafer image; then the extracted features are input into a machine learning model for judgment, and the wafer defects in the wafer image are classified and identified. Subsequently, detection methods based on deep learning emerged. As the most popular machine learning method at present, deep learning requires a large amount of training data; compared with the above-mentioned image-recognition detection methods, it can further improve the accuracy of wafer detection and reduce the false alarm rate.
  • the detection method based on deep learning has higher algorithm complexity, and devices that rely on the central processing unit (CPU) to provide computing power cannot meet the real-time requirements of the algorithm.
  • the inspection machine will generate a large number of wafer images every day. At present, the inspection capacity of the equipment is limited and the inspection efficiency is low.
  • the embodiment of the present application provides a distributed solution based on a graphics processing unit (GPU) cluster, which distributes the detection of wafer images to the GPUs of each working node in the cluster for execution, thereby improving the throughput of intelligent defect detection.
  • the embodiment of the present application also provides a distributed system architecture based on a GPU cluster; tailored to the characteristics of intelligent wafer defect detection, a customized GPU cluster scheduling algorithm optimizes GPU resource utilization and improves the throughput of intelligent wafer defect detection.
  • FIG. 1 is a schematic diagram of the architecture of a wafer inspection system provided by an embodiment of the present application.
  • the wafer inspection system provided by this embodiment includes a plurality of image acquisition devices, a storage server, at least one resource management node (FIG. 1 shows one resource management node), and a plurality of worker nodes.
  • each image acquisition device is connected to a storage server
  • the storage server is connected to a resource management node
  • the resource management node is connected to each of the plurality of working nodes.
  • Each worker node includes multiple GPUs, which are used to actually perform wafer inspection tasks.
  • the image acquisition device of this embodiment is used to collect pictures of each wafer on the production line, and the image acquisition device stores the collected wafer pictures on a storage server.
  • the image acquisition device can be set on the inspection machine of the production line.
  • the storage server in this embodiment is used to store wafer pictures from different image acquisition devices, and trigger the GPU cluster to perform intelligent defect detection on the wafer pictures.
  • the GPU cluster includes a Resource Manager Node (RMN for short) and a Work Node (WN for short).
  • the resource management node is responsible for the scheduling of wafer inspection tasks, and the worker nodes are responsible for the execution of wafer inspection tasks.
  • the storage server sends the wafer inspection task to the resource management node.
  • the wafer inspection task is sent to the GPU that actually executes the task.
  • the resource management node assigns the wafer inspection task to the worker node, and the worker node assigns the wafer inspection task to the GPU.
  • the resource management node may use a dynamic weight-based polling algorithm to assign wafer inspection tasks to the worker nodes, and periodically check the health status of each worker node connected to the resource management node.
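Dynamic weight-based polling is commonly implemented as smooth weighted round-robin; the sketch below uses that well-known variant, and whether the patent uses exactly this form is an assumption:

```python
def weighted_round_robin(weights, rounds):
    """Smooth weighted round-robin over worker nodes.

    `weights` maps node id -> weight; over `rounds` selections, each node is
    chosen in proportion to its weight, interleaved rather than in bursts.
    """
    current = {node: 0 for node in weights}  # running score per node
    total = sum(weights.values())
    order = []
    for _ in range(rounds):
        for node in current:
            current[node] += weights[node]   # credit each node by its weight
        best = max(sorted(current), key=lambda n: current[n])
        current[best] -= total               # charge the chosen node
        order.append(best)
    return order
```

With weights {a: 2, b: 1}, three rounds interleave the selections as a, b, a instead of a, a, b.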
  • FIG. 2 is an execution flowchart of a resource management node provided by an embodiment of the present application.
  • after receiving a wafer inspection task from the storage server, the resource management node first determines whether the weights of the working nodes need to be updated. If so, the load of each working node is calculated and the weight of each working node is updated. Based on the updated weights, a working node is selected from the multiple working nodes, and the wafer inspection task is assigned to the selected working node.
  • the resource management node may update the weights of the worker nodes periodically, for example, every 5 minutes.
  • the resource management node may update the weight of the worker node after receiving the load information reported by the worker node (usually when the load information of the worker node changes).
  • the load information of a working node can be the load value of the working node (representing its load size) or the working parameter set of the working node; the working parameter set includes at least one of the following: the CPU utilization of the working node, its available memory, and the utilization and available video memory of each GPU of the working node.
  • the worker node includes multiple GPUs, and allocates wafer inspection tasks to idle GPUs by maintaining a task queue (Task Queue) and a resource pool (Resource Pool).
  • the task queue is responsible for maintaining the wafer inspection tasks that need to be executed, and the wafer inspection tasks are executed in the order of "first-in, first-out" in the task queue.
  • the resource pool is responsible for maintaining the idle/busy state of the GPU, and the resource pool includes idle GPUs.
  • FIG. 3 is an execution flowchart of a worker node provided by an embodiment of the present application.
  • the worker node includes two parts, a CPU part and a GPU part, and the execution flow of the worker node runs on these two hardware devices.
  • the CPU part is responsible for the scheduling of wafer inspection tasks from the working node to the GPU, and the wafer inspection tasks are executed in a first-in, first-out order.
  • the GPU part is responsible for the defect detection task of the wafer image in the wafer inspection task, including the preprocessing of the wafer image, the wafer defect detection based on the wafer inspection model, and the post-processing of the inspection results.
  • the detection result includes a label value for indicating whether a wafer defect exists in the wafer image, multiple defect categories, a confidence level corresponding to each defect category, and a defect location.
  • the post-processing of the inspection results may remove defect categories with low confidence, retaining only the defect category with the highest confidence and the defect position of that category.
  • the worker node obtains wafer inspection tasks from the task queue and determines whether there is an idle GPU in the resource pool. If there is an idle GPU in the resource pool, the worker node randomly selects one, assigns the wafer inspection task to it, and updates that GPU's state to busy; after the wafer inspection task finishes executing, the GPU's state is updated back to idle. If there is no idle GPU in the resource pool, the worker node waits until one becomes available.
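The task-queue and resource-pool behavior described above can be sketched as follows (a single-threaded illustration; class and method names are not from the patent):

```python
from collections import deque

class WorkerNode:
    """Minimal sketch of the task-queue / resource-pool dispatch on a worker node."""

    def __init__(self, gpu_ids):
        self.task_queue = deque()      # FIFO queue of pending inspection tasks
        self.idle_gpus = set(gpu_ids)  # resource pool: GPUs not currently busy

    def submit(self, task):
        self.task_queue.append(task)   # tasks run in first-in, first-out order

    def dispatch(self):
        """Assign queued tasks to idle GPUs; returns (task, gpu) pairs."""
        assignments = []
        while self.task_queue and self.idle_gpus:
            task = self.task_queue.popleft()
            gpu = self.idle_gpus.pop()  # take any idle GPU and mark it busy
            assignments.append((task, gpu))
        return assignments

    def finish(self, gpu):
        self.idle_gpus.add(gpu)        # task done: GPU returns to the pool
```

In a real worker node the dispatch loop would run concurrently with task execution; here `finish` stands in for the completion callback.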
  • a wafer inspection model is preset in the GPU, and the model can be trained based on any deep learning model.
  • Using the wafer inspection model to detect defects in wafer images requires a lot of computing resources.
  • By hardware acceleration on the GPU, the performance can be improved by more than 10 times compared with the CPU, which meets the real-time requirements of wafer inspection.
  • FIG. 4 is an interaction diagram of a processing method for a wafer inspection task provided by an embodiment of the present application. As shown in FIG. 4, the processing method provided by this embodiment includes the following steps:
  • Step 101 The storage server sends a wafer inspection task to the resource management node, where the wafer inspection task includes at least one wafer picture.
  • the storage server can be connected to one resource management node, or it can be connected to multiple resource management nodes.
  • the steps in this embodiment take a resource management node as an example to introduce the solution.
  • the storage server may simultaneously send different wafer inspection tasks to multiple resource management nodes, and each resource management node is responsible for allocating the received wafer inspection tasks to a worker node connected to it.
  • the storage server may be a network storage server, such as a network attached storage (Network Attached Storage, NAS for short) server.
  • Step 102 The resource management node determines the target working node according to the weight value of each working node.
  • the target work node is the work node with the largest weight value among the plurality of work nodes, and the weight value of each work node is a parameter, determined according to the load information of each work node, for allocating the wafer inspection task.
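Selecting the target work node then amounts to taking the argmax over the weight values, for example (the deterministic tie-break by node id is an assumption):

```python
def pick_target_node(weights):
    """Return the worker node with the largest weight value (step 102).

    `weights` maps node id -> weight. Iterating over sorted ids makes ties
    resolve to the lexicographically smallest id, for determinism.
    """
    return max(sorted(weights), key=lambda node: weights[node])
```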
  • the load information of a working node can be a load value, which is used to indicate the load size of the working node within a preset time period, or working parameters used to indicate the load condition of the working node; the working parameters include at least one of the CPU utilization of the working node, its available memory, and the GPU utilization and available video memory of its multiple GPUs.
  • the available memory refers to the available memory of the CPU
  • the available video memory refers to the available video memory of the GPU.
  • the weight of a worker node can be set by default to the number of GPUs of that worker node, which can be expressed as W_x = G_x, where W_x is the weight of the x-th worker node, G_x is the number of GPUs of the x-th worker node, x ∈ {1, 2, ..., n}, n is a positive integer greater than or equal to 2, and n corresponds to the number of working nodes in FIG. 1.
  • Step 103 The resource management node sends the wafer inspection task to the target worker node.
  • Step 104 The target worker node determines an idle GPU.
  • Step 105 The target worker node sends a wafer inspection task to the GPU.
  • the target worker node receives the wafer inspection task sent by the resource management node, determines whether there is an idle GPU in the resource pool, and if there is an idle GPU in the resource pool, assigns the wafer inspection task to the idle GPU. If there is no idle GPU in the resource pool, wait for an idle GPU in the resource pool, and then assign the wafer inspection task to the idle GPU.
  • the target worker node can randomly assign the wafer inspection task to one of the idle GPUs.
  • the target worker node can allocate the wafer inspection task to the idle GPU with the largest video memory according to the size of the video memory of the GPU.
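The largest-video-memory strategy can be sketched in one line (the mapping shape is illustrative):

```python
def pick_idle_gpu(idle_gpus):
    """Select the idle GPU with the most available video memory.

    `idle_gpus` maps gpu id -> available video memory (e.g. in MB).
    """
    return max(idle_gpus, key=idle_gpus.get)
```

Compared with random selection, this favors the GPU most likely to fit the inspection model and image batch in memory.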
  • Step 106 the GPU executes the wafer inspection task.
  • the GPU receives the wafer inspection task sent by the working node, and firstly preprocesses the wafer image in the wafer inspection task to obtain the processed wafer image.
  • the preprocessing includes at least one of rotation, cropping, scaling, and numerical normalization of the wafer image.
  • numerical normalization refers to normalizing the RGB values and position information of each pixel of the wafer image to [0, 1].
  • the advantage of normalizing to [0, 1] is that data values of different dimensions (RGB values, positions) can be compared on the same scale, so that each feature contributes comparably to the result.
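A minimal sketch of this per-pixel normalization, assuming 8-bit RGB values and zero-based pixel coordinates (the divisors are illustrative):

```python
def normalize_pixel(r, g, b, x, y, width, height):
    """Normalize one pixel's RGB values and position to [0, 1].

    8-bit channels are divided by 255; coordinates are divided by the
    largest valid index so that the far edge maps to exactly 1.0.
    """
    return (r / 255.0, g / 255.0, b / 255.0,
            x / (width - 1), y / (height - 1))
```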
  • the processed wafer image meets the requirements of the wafer inspection model for the input image.
  • the processed wafer images are input into the pre-trained wafer inspection model to obtain inspection results.
  • the wafer inspection model is obtained by training an arbitrary deep learning model.
  • the detection result output by the model includes at least one item of a label used to indicate whether a wafer defect exists in the wafer image, a defect category, and a defect location.
  • the label may be a label value, for example, 0 indicates that there is no wafer defect in the wafer picture, and 1 indicates that there is a wafer defect in the wafer picture.
  • the defect category may be indicated by the ID of the defect category, for example, wafer defects include scratch defects, particle defects, poor coating, poor edge coverage and other defects.
  • the defect position indicates the area of the wafer defect, and the area may be a rectangular area.
  • the rectangular area may be represented by diagonal vertex coordinates or four vertex coordinates.
  • the detection result also includes a confidence level (which can be understood as a probability value) corresponding to the defect category.
  • a resource management node receives a wafer inspection task from a storage server, selects a target worker node from a plurality of worker nodes according to the weight value of each worker node connected to the resource management node, and assigns the wafer inspection task to the target worker node.
  • the target worker node selects an idle GPU from the resource pool, and assigns the received wafer inspection task to the idle GPU for execution.
  • the GPU preprocesses the wafer images in the wafer inspection task, and inputs the processed wafer images into the wafer inspection model to obtain inspection results.
  • the wafer inspection tasks are distributed to the GPUs of the working nodes for execution, achieving load balancing both between working nodes and between GPUs; this meets the real-time requirements for defect detection of massive numbers of wafer images and increases the throughput of the system for wafer inspection tasks.
  • each GPU of the working node includes a wafer inspection model
  • the training process of the wafer inspection model is briefly introduced below with reference to the accompanying drawings.
  • FIG. 5 is a flowchart of a training process of a wafer inspection model provided by an embodiment of the present application. As shown in FIG. 5 , the training method of a wafer inspection model includes the following steps:
  • Step 201 establishing an initial wafer inspection model.
  • Step 202 Acquire an image sample and an annotation result of the image sample, wherein the image sample includes a first image sample of different wafer defect categories and a second image sample without wafer defects, and the annotation result includes a label used to indicate whether a wafer defect exists in the image sample.
  • the first image samples include wafer pictures taken at different shooting angles, with different wafer defect types, different wafer defect locations, different surfaces (the front and back of the wafer), and different environmental conditions (such as lighting conditions, temperature environments, and humidity environments).
  • the second image samples include images of wafers without wafer defects at different shooting angles, different surfaces, and different environmental conditions.
  • Step 203 using the image sample as the input of the wafer inspection model, using the labeling result of the image sample as the output of the wafer inspection model, and training the initial wafer inspection model to obtain the wafer inspection model.
  • the GPU of this embodiment uses the above-mentioned wafer detection model to perform defect detection on wafer images, which can greatly improve detection accuracy and detection efficiency.
  • FIG. 6 is an interaction diagram of a processing method for a wafer inspection task provided by an embodiment of the present application.
  • the processing method provided by this embodiment includes the following steps:
  • Step 301 The storage server sends a wafer inspection task to the resource management node, and the wafer inspection task includes at least one wafer picture.
  • Step 302 The resource management node acquires the load value of each worker node, where the load value is used to indicate the load size of the worker node within a preset time period.
  • the above step 302 includes: the resource management node receives a working parameter set from each working node, and determines the load value of each working node according to the working parameter set of each working node.
  • the working parameter set includes GPU utilization and available video memory of multiple GPUs of the working node, and CPU utilization and available memory of the working node.
  • the resource management node can receive the set of work parameters from the work node in the following ways:
  • the resource management node periodically receives the set of working parameters from each working node.
  • the resource management node may periodically send a query request to multiple worker nodes, where the query request is used to request the current working status of the worker nodes.
  • each worker node sends a query response to the resource management node, where the query response includes the current set of working parameters of the worker node.
  • the resource management node and the working node agree on a reporting period of the working parameter set, and each working node actively reports the working parameter set to the resource management node according to the reporting period.
  • In another possible implementation, when a wafer inspection task of a working node starts or finishes execution, the resource management node receives the working parameter set from that working node.
  • For example, worker node 1 assigns the wafer inspection task to GPU1, and when GPU1 starts executing the wafer inspection task, worker node 1 reports its working parameter set to the resource management node.
  • For another example, when GPU1 of worker node 1 finishes executing the wafer inspection task, worker node 1 reports its working parameter set to the resource management node.
  • As an example, the resource management node determining the load value of each working node according to the working parameter set of each working node includes: the resource management node determines the load value of each working node according to a preset weight coefficient set and the working parameter set of each working node.
  • The preset weight coefficient set includes weight coefficients used to indicate the GPU utilization, the available video memory, the CPU utilization, and the available memory.
  • The resource management node can determine the load value of each working node by a weighted combination of the form:
  • L_x = f_GPU · Ū_GPU + f_VRAM · Ā_VRAM + f_CPU · U_CPU + f_RAM · A_RAM
  • where L_x represents the load value of the x-th working node, x ∈ (1, 2, …, n), n being a positive integer greater than or equal to 2; Ū_GPU and Ā_VRAM represent the average GPU utilization and the average available video memory over the node's multiple GPUs; U_CPU represents the CPU utilization of the x-th working node; A_RAM represents the available memory capacity of the x-th working node; and f_GPU, f_VRAM, f_CPU, and f_RAM represent the weight coefficients of GPU utilization, available video memory, CPU utilization, and available memory, respectively.
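The formula image itself is not reproduced in the source. As a minimal sketch only: assuming the load is a weighted combination in which utilization raises the load and free resources lower it (consistent with the negative correlation between load and weight described in this section), the computation could look like the following. The function name `load_value`, the default coefficients, and the use of free-resource fractions are illustrative assumptions, not values from the patent.

```python
def load_value(gpu_utils, gpu_free_vram, cpu_util, free_ram,
               f_gpu=0.4, f_vram=0.2, f_cpu=0.2, f_ram=0.2):
    """Weighted load estimate for one worker node (illustrative coefficients).

    All inputs are assumed pre-normalized to [0, 1]: utilizations as
    fractions, free VRAM/RAM as fractions of total capacity. Load rises
    with utilization and falls with free resources -- an assumption, since
    the formula image is not reproduced in the source text.
    """
    avg_gpu_util = sum(gpu_utils) / len(gpu_utils)          # mean over the node's GPUs
    avg_free_vram = sum(gpu_free_vram) / len(gpu_free_vram)
    return (f_gpu * avg_gpu_util
            + f_cpu * cpu_util
            + f_vram * (1.0 - avg_free_vram)                # less free VRAM -> higher load
            + f_ram * (1.0 - free_ram))                     # less free RAM  -> higher load
```

With these illustrative coefficients, a node sitting at 50% on every metric yields a load of 0.5.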
  • the above step 302 includes: the resource management node receives a load value from each worker node, and the load value of each worker node is determined by each worker node according to the set of working parameters.
  • In the above example, the working node reports its load value directly to the resource management node, which reduces the computation burden on the resource management node, saves part of its computing resources, and can improve its processing performance.
  • Step 303 The resource management node determines the weight value of each working node according to the multiple load values of the multiple working nodes.
  • the load value of the work node is negatively correlated with the weight value of the work node, that is, the larger the load value of the work node, the smaller the weight value of the work node, and the smaller the load value of the work node, the larger the weight value of the work node.
  • In an embodiment of the present application, the above step 303 includes: the resource management node obtains the multiple load values of the multiple working nodes, determines a load threshold according to the multiple load values, and determines the weight value of each working node according to the load threshold and the multiple load values of the multiple working nodes.
  • the load threshold is used to indicate the average load of multiple worker nodes.
  • the load threshold is a dynamically changing threshold.
  • The resource management node can calculate the load threshold T through a formula of the form:
  • T = λ · (1/n) · Σ_{x=1}^{n} L_x
  • where λ is a coefficient, λ ∈ [1, +∞), and λ is set to 2 by default.
  • In one possible case, the resource management node determines that the load value of a working node is greater than the load threshold T and sets the weight value of that working node to 1, reducing the probability that the node is assigned a task to 1 / Σ_{x=1}^{n} W_x. A working node in this situation is considered to be in a sub-health state, indicating that its load is too high; new tasks are therefore avoided for this node as far as possible, which reduces the probability that it is assigned a new task.
  • In one possible implementation, the resource management node may periodically calculate the above load threshold to obtain the latest load threshold and adjust the weight value of each working node based on it. This approach occupies few resources of the resource management node, but the weight values of the working nodes are not updated promptly enough.
  • In another possible implementation, after receiving load information reported by the working nodes (whenever any working node reports updated load information), the resource management node recalculates the load threshold to obtain an updated load threshold and adjusts the weight value of each working node based on it. With this approach the weight values are updated promptly, but resource consumption is higher.
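Both update strategies above reduce to the same recomputation step. A minimal sketch, assuming T = λ · mean(load) as described (λ ≥ 1, default 2) and the weight rules from this section; the function name `update_weights` and its argument names are illustrative assumptions:

```python
def update_weights(loads, gpu_counts, lam=2.0):
    """Recompute the dynamic load threshold and the per-node weights.

    Nodes whose load exceeds the threshold are treated as sub-healthy and
    get weight 1; the others get a weight equal to their GPU count. The
    weight-0 case for disconnected nodes is not modeled here.
    """
    threshold = lam * sum(loads) / len(loads)   # T = lam * mean(load)
    weights = [1 if load > threshold else k
               for load, k in zip(loads, gpu_counts)]
    return threshold, weights
```

Whether this function runs on a timer or on every load report is exactly the trade-off between the two implementations described above.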
  • Step 304 The resource management node determines the target working node according to the weight value of each working node.
  • Step 305 The resource management node sends the wafer inspection task to the target worker node.
  • Step 306 The target worker node determines an idle GPU.
  • Step 307 The target worker node sends a wafer inspection task to the GPU.
  • Step 308 the GPU executes the wafer inspection task.
  • Steps 304 to 308 in this embodiment are the same as steps 102 to 106 in the foregoing embodiment; for details, refer to the foregoing embodiment, which are not repeated here.
  • In the processing method provided by this embodiment, the resource management node receives the wafer inspection task from the storage server, obtains the load value of each working node and the current load threshold, and adjusts the weight value of each working node by comparing each node's load value with the current load threshold. It then selects a target working node from the multiple working nodes according to the weight value of each working node and assigns the wafer inspection task to the target working node. The target working node selects an idle GPU from its multiple GPUs and assigns the wafer inspection task to the idle GPU for execution.
  • Through the above two-level task scheduling, wafer inspection tasks are distributed to the GPUs of the individual working nodes for execution, achieving load balancing both between working nodes and between GPUs, which can meet the real-time requirements of defect detection on massive wafer pictures and increases the system's throughput for wafer inspection tasks.
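The first level of the scheduling summarized above — choosing the working node with the largest weight — can be sketched as follows. The tie-breaking rule is not specified in the source, so random choice among equally weighted nodes is an assumption:

```python
import random

def pick_target_node(weights):
    """First-level scheduling: return the worker node with the largest weight.

    `weights` maps node identifiers to their current weight values; ties
    are broken at random (an assumption, not specified in the source).
    """
    best = max(weights.values())
    candidates = [node for node, w in weights.items() if w == best]
    return random.choice(candidates)
```

Usage: `pick_target_node({"wn1": 2, "wn2": 5, "wn3": 1})` returns `"wn2"`.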
  • the processing apparatus may be divided into functional modules according to the above method embodiments.
  • each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module.
  • the above-mentioned integrated modules can be implemented in the form of hardware, or can be implemented in the form of software function modules. It should be noted that, the division of modules in the embodiments of the present application is schematic, and is only a logical function division, and there may be other division manners in actual implementation. The following description will be given by using the division of each function module corresponding to each function as an example.
  • FIG. 7 is a schematic structural diagram of a processing apparatus for a wafer inspection task provided by an embodiment of the present application. As shown in FIG. 7 , the processing device 400 of the wafer inspection task in this embodiment is connected to a plurality of working nodes, and the processing device 400 includes:
  • the receiving module 401 is configured to receive a wafer inspection task from a storage server, where the wafer inspection task includes at least one wafer picture;
  • the processing module 402 is configured to determine a target working node according to the weight value of each working node; wherein the target working node is the working node with the largest weight value among the plurality of working nodes, and the weight value of each working node is a parameter, determined according to the load information of each working node, used for allocating wafer inspection tasks;
  • the sending module 403 is configured to send the wafer detection task to the target working node.
  • In an embodiment of the present application, before determining the target working node according to the weight value of each of the working nodes, the processing module 402 is further configured to:
  • obtain the load value of each working node, where the load value is used to indicate the load size of the working node within a preset time period; and determine the weight value of each working node according to the multiple load values of the multiple working nodes; wherein the load size of a working node is negatively correlated with its weight value.
  • In an embodiment of the present application, the receiving module 401 is further configured to receive a work parameter set from each of the worker nodes, where the work parameter set includes at least one of: the GPU utilization and available video memory of the multiple GPUs of the worker node, and the central processing unit (CPU) utilization and available memory of the worker node;
  • the processing module 402 is further configured to determine the load value of each of the working nodes according to the set of working parameters of each of the working nodes.
  • In an embodiment of the present application, the receiving module 401 is further configured to receive a load value from each of the working nodes, where the load value of each working node is determined by that working node according to its working parameter set.
  • In an embodiment of the present application, the processing module 402 is specifically configured to: determine the load value of each working node according to a preset weight coefficient set and the working parameter set of each working node, where the preset weight coefficient set includes weight coefficients for the GPU utilization, the available video memory, the CPU utilization, and the available memory.
  • In an embodiment of the present application, the receiving module 401 is specifically configured to: periodically receive the working parameter set from each working node; or receive the working parameter set from each working node when a wafer inspection task of the working node starts or finishes execution.
  • In an embodiment of the present application, the processing module 402 is specifically configured to: obtain the multiple load values of the multiple working nodes; determine a load threshold according to the multiple load values, where the load threshold is used to indicate the average load size of the plurality of working nodes; and determine the weight value of each working node according to the load threshold and the multiple load values of the multiple working nodes.
  • In an embodiment of the present application, the processing module 402 is specifically configured to: set the weight value of a working node whose load value is greater than the load threshold to 1; or set the weight value of a worker node whose load value is less than or equal to the load threshold as the number of GPUs of the worker node.
  • the processing apparatus provided by the embodiment of the present application is configured to execute each step of the resource management node in any of the foregoing method embodiments, and the implementation principle and technical effect thereof are similar, and details are not described herein again.
  • FIG. 8 is a hardware schematic diagram of a processing apparatus for a wafer inspection task provided by an embodiment of the present application. As shown in FIG. 8 , the processing apparatus 500 for the wafer inspection task of this embodiment includes:
  • at least one processor 501 (only one processor is shown in FIG. 8); and
  • a memory 502 in communication with the at least one processor;
  • the memory 502 stores instructions executable by the at least one processor 501, and the instructions are executed by the at least one processor 501, so that the processing device 500 can perform the individual steps of the resource management node in any of the foregoing method embodiments.
  • the memory 502 may be independent or integrated with the processor 501 .
  • the processing apparatus 500 further includes: a bus 503 for connecting the memory 502 and the processor 501 .
  • The present application further provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the technical solution of the resource management node in any of the foregoing method embodiments.
  • Embodiments of the present application also provide a wafer inspection system, as shown in FIG. 1 , the system includes:
  • At least one resource management node and multiple worker nodes connected to the resource management node, wherein each worker node includes multiple graphics processing units (GPUs), and the resource management node is configured to perform each step of the resource management node in any of the foregoing method embodiments .
  • each worker node after receiving the wafer inspection task sent by the resource management node, each worker node selects an idle GPU from multiple GPUs in the worker node, and assigns the wafer inspection task to the idle one GPU execution.
  • the GPU is used to perform wafer inspection tasks.
  • the GPU includes a wafer inspection model, and the wafer inspection model is obtained by training a deep learning model, and is used to detect whether the wafer in each wafer picture in the wafer inspection task has defects, Defect category and defect location.
  • The wafer inspection system provided by the embodiments of the present application is an inspection system with a distributed architecture based on a GPU cluster. It can realize real-time inspection of the massive wafer pictures generated by defect inspection machines and can assist the production department in quickly locating problematic defect inspection machines, helping to identify process problems as early as possible, respond to production deviations in time, and greatly reduce the cost of inspections carried out to locate problem machines, thereby improving process yield and reducing production cost.
  • The processor mentioned in the embodiments of the present application may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the memory mentioned in the embodiments of the present application may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or flash memory.
  • Volatile memory may be Random Access Memory (RAM), which acts as an external cache.
  • By way of example but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DR RAM).
  • It should be noted that when the processor is a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, the memory (storage module) is integrated into the processor.
  • memory described herein is intended to include, but not be limited to, these and any other suitable types of memory.
  • It should be understood that, in the various embodiments of the present application, the sequence numbers of the above processes do not imply an order of execution; the execution order of each process should be determined by its functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.


Abstract

The present application provides a processing method, apparatus and system for wafer inspection tasks, and a storage medium. The method includes: a resource management node receives a wafer inspection task from a storage server, selects a target working node from multiple working nodes according to the weight value of each working node connected to the resource management node, and assigns the wafer inspection task to the target working node. The target working node selects an idle GPU from a resource pool and assigns the wafer inspection task to the idle GPU for execution. The GPU preprocesses the wafer pictures in the wafer inspection task, inputs the processed wafer pictures into a wafer inspection model, and obtains the inspection result. Through this two-level task scheduling, wafer inspection tasks are distributed to the GPUs of the individual working nodes for execution, achieving load balancing both between working nodes and between GPUs, which can meet the real-time requirements of defect detection on massive wafer pictures and improves the system's throughput for processing wafer inspection tasks.

Description

Processing method, apparatus and system for wafer inspection tasks, and storage medium
This application claims priority to Chinese Patent Application No. 202010955527.5, entitled "Processing method, apparatus and system for wafer inspection tasks, and storage medium" and filed with the Chinese Patent Office on September 11, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of information processing, and in particular to a processing method, apparatus and system for wafer inspection tasks, and a storage medium.
Background
A wafer is a basic raw material for manufacturing semiconductor devices. Semiconductor material of extremely high purity is prepared into wafers through processes such as crystal pulling and slicing. A wafer undergoes a series of semiconductor manufacturing processes to form extremely fine circuit structures, and is then diced, packaged and tested to become chips, which are widely used in all kinds of electronic equipment.
When wafers are mass-produced, appearance inspection cannot possibly be handled manually; therefore, wafer appearance inspection equipment is currently often used for quality inspection. Based on intelligent defect detection algorithms using deep learning, wafer appearance inspection equipment can effectively improve wafer inspection accuracy compared with traditional image recognition algorithms.
However, deep-learning-based algorithms have high computational complexity, and wafer appearance inspection equipment relying on a central processing unit (CPU) for computing power can no longer meet the real-time requirements of wafer defect detection. Especially as factory capacity keeps rising, a massive number of wafer pictures await inspection every day, while the inspection capability of current equipment is limited and its inspection efficiency is low.
Summary
The present application provides a processing method, apparatus and system for wafer inspection tasks, and a storage medium, to improve the inspection efficiency of a wafer inspection system.
In a first aspect, an embodiment of the present application provides a processing method for wafer inspection tasks, applied to a resource management node connected to multiple working nodes, the method including:
receiving a wafer inspection task from a storage server, the wafer inspection task including at least one wafer picture;
determining a target working node according to the weight value of each working node, where the target working node is the working node with the largest weight value among the multiple working nodes, and the weight value of each working node is a parameter, determined according to the load information of each working node, used for allocating wafer inspection tasks; and
sending the wafer inspection task to the target working node.
In an embodiment of the present application, before determining the target working node according to the weight value of each working node, the method further includes:
obtaining the load value of each working node, the load value indicating the load size of the working node within a preset time period; and
determining the weight value of each working node according to the multiple load values of the multiple working nodes, where the load size of a working node is negatively correlated with its weight value.
In an embodiment of the present application, obtaining the load value of each working node includes:
receiving a working parameter set from each working node, the working parameter set including at least one of: the GPU utilization and available video memory of the multiple GPUs of the working node, and the CPU utilization and available memory of the working node; and
determining the load value of each working node according to the working parameter set of each working node.
In an embodiment of the present application, obtaining the load value of each working node includes:
receiving a load value from each working node, where the load value of each working node is determined by that working node according to its working parameter set.
In an embodiment of the present application, determining the load value of each working node according to the working parameter set of each working node includes:
determining the load value of each working node according to a preset weight coefficient set and the working parameter set of each working node, where the preset weight coefficient set includes weight coefficients for the GPU utilization, the available video memory, the CPU utilization and the available memory.
In an embodiment of the present application, receiving the working parameter set from each working node includes:
periodically receiving the working parameter set from each working node; or
receiving the working parameter set from each working node when a wafer inspection task of the working node starts or finishes execution.
In an embodiment of the present application, determining the weight value of each working node according to the multiple load values of the multiple working nodes includes:
obtaining the multiple load values of the multiple working nodes;
determining a load threshold according to the multiple load values, the load threshold indicating the average load size of the multiple working nodes; and
determining the weight value of each working node according to the load threshold and the multiple load values of the multiple working nodes.
In an embodiment of the present application, determining the weight value of each working node according to the load threshold and the multiple load values includes:
setting the weight value of a working node whose load value is greater than the load threshold to 1; or
setting the weight value of a working node whose load value is less than or equal to the load threshold to the number of GPUs connected to the working node.
In a second aspect, an embodiment of the present application provides a processing apparatus for wafer inspection tasks, the processing apparatus being connected to multiple working nodes and including:
a receiving module, configured to receive a wafer inspection task from a storage server, the wafer inspection task including at least one wafer picture;
a processing module, configured to determine a target working node according to the weight value of each working node, where the target working node is the working node with the largest weight value among the multiple working nodes, and the weight value of each working node is a parameter, determined according to the load information of each working node, used for allocating wafer inspection tasks; and
a sending module, configured to send the wafer inspection task to the target working node.
In a third aspect, an embodiment of the present application provides a processing apparatus for wafer inspection tasks, including:
at least one processor; and
a memory communicatively connected to the at least one processor, where
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the processing apparatus to perform the method of any item of the first aspect.
In a fourth aspect, an embodiment of the present application provides a wafer inspection system, including:
at least one resource management node and multiple working nodes connected to the resource management node, where each working node includes multiple graphics processing units (GPUs); and
the resource management node is configured to perform the method of any item of the first aspect.
In an embodiment of the present application, after receiving the wafer inspection task sent by the resource management node, each working node selects an idle GPU from the multiple GPUs in the working node and assigns the wafer inspection task to the idle GPU for execution.
In a fifth aspect, an embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, enable the processor to perform the method of any item of the first aspect.
Embodiments of the present application provide a processing method, apparatus and system for wafer inspection tasks, and a storage medium. In the method, a resource management node receives a wafer inspection task from a storage server, selects a target working node from multiple working nodes according to the weight value of each working node connected to the resource management node, and assigns the wafer inspection task to the target working node. The target working node selects an idle GPU from a resource pool and assigns the received wafer inspection task to the idle GPU for execution. The GPU preprocesses the wafer pictures in the wafer inspection task, inputs the processed pictures into a wafer inspection model, and obtains the inspection result. Through this two-level task scheduling, wafer inspection tasks are distributed to the GPUs of the individual working nodes for execution, achieving load balancing both between working nodes and between GPUs, which can meet the real-time requirements of defect detection on massive wafer pictures and improves the system's throughput for processing wafer inspection tasks.
Brief Description of the Drawings
FIG. 1 is a schematic architectural diagram of a wafer inspection system provided by an embodiment of the present application;
FIG. 2 is an execution flowchart of a resource management node provided by an embodiment of the present application;
FIG. 3 is an execution flowchart of a working node provided by an embodiment of the present application;
FIG. 4 is an interaction diagram of a processing method for wafer inspection tasks provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of a wafer inspection model training method provided by an embodiment of the present application;
FIG. 6 is an interaction diagram of a processing method for wafer inspection tasks provided by an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a processing apparatus for wafer inspection tasks provided by an embodiment of the present application;
FIG. 8 is a hardware schematic diagram of a processing apparatus for wafer inspection tasks provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
The terms "first", "second" and the like in the specification, claims and accompanying drawings of the present application are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments of the present application described here can be implemented, for example, in orders other than those illustrated or described here.
In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.
The technical solutions provided by the embodiments of the present application relate to the field of semiconductor wafer production, and in particular to the field of defect detection during wafer production.
The chip (i.e., integrated circuit) industry is a strategic, foundational and leading industry for national economic and social development, playing a key role in major fields such as computing, consumer electronics, network communication and automotive electronics. The manufacturing process of chips is extremely complex. As the main material for manufacturing chips, wafers have surface defects that are the main obstacle to product yield. By detecting wafer surface defects, not only can defective dies be found, but faults in the process flow can also be inferred from the distribution patterns of the defective dies, so that engineers can improve the process.
At present, wafer defect detection falls mainly into two categories: testing the electrical performance of dies with probes, and detecting wafer surface defects by manual visual inspection. Both approaches require experienced engineers for analysis and judgment, which is labor-intensive, strenuous and error-prone. As factory capacity keeps rising, manual inspection is inefficient.
With the continuous development of detection technology, detection methods based on image recognition have emerged, which can improve the efficiency and accuracy of wafer inspection to a certain extent. Such a method mainly includes the following steps: first, features are extracted from the wafer picture; the extracted features are then input into a machine learning model for judgment, classifying and identifying the wafer defects in the picture. Subsequently, detection methods based on deep learning appeared. Deep learning, currently the most popular machine learning approach, requires a large amount of training data and, compared with the image recognition methods above, can further improve wafer inspection accuracy and reduce the false alarm rate. However, deep-learning-based detection methods have higher algorithmic complexity, and equipment relying on a CPU for computing power cannot meet their real-time requirements. As factory capacity rises, inspection machines produce massive numbers of wafer pictures every day, while the detection capability of current equipment is limited and its efficiency is low.
To address the above technical problems, embodiments of the present application provide a distributed solution based on a GPU cluster, which distributes the inspection of wafer pictures to the GPUs of the individual working nodes in the cluster for execution, improving the throughput of intelligent defect detection and meeting the real-time requirements of intelligent defect detection on massive wafer pictures. The powerful computing capability of GPUs is used to hardware-accelerate the intelligent wafer defect detection process and reduce latency. In addition, embodiments of the present application further provide a distributed system architecture based on a GPU cluster which, tailored to the characteristics of intelligent wafer defect detection, optimizes GPU resource utilization and improves the throughput of intelligent wafer defect detection through a customized GPU cluster scheduling algorithm.
Before introducing the processing method for wafer inspection tasks provided by the embodiments of the present application, the system architecture of the method is briefly introduced.
FIG. 1 is a schematic architectural diagram of the wafer inspection system provided by an embodiment of the present application. As shown in FIG. 1, the wafer inspection system of this embodiment includes: multiple image acquisition devices, a storage server, at least one resource management node (FIG. 1 shows one resource management node), and multiple working nodes. Each image acquisition device is connected to the storage server, the storage server is connected to the resource management node, and the resource management node is connected to each of the multiple working nodes. Each working node includes multiple GPUs, and the GPUs actually execute the wafer inspection tasks.
The image acquisition devices of this embodiment are used to capture a picture of each wafer on the production line; an image acquisition device stores the captured wafer pictures on the storage server. As an example, an image acquisition device may be installed on an inspection machine of the production line.
The storage server of this embodiment is used to store wafer pictures from the different image acquisition devices and to trigger the GPU cluster to perform intelligent defect detection on the wafer pictures. The GPU cluster includes a resource management node (RMN) and working nodes (WN). The resource management node is responsible for scheduling wafer inspection tasks, and the working nodes are responsible for executing them.
In an embodiment of the present application, the storage server sends a wafer inspection task to the resource management node. Through a two-level scheduling algorithm, the wafer inspection task is delivered to the GPU that actually executes it. Specifically, the resource management node assigns the wafer inspection task to a working node, and the working node in turn assigns the wafer inspection task to a GPU.
In an embodiment of the present application, the resource management node may use a round-robin algorithm based on dynamic weights to assign wafer inspection tasks to working nodes, and periodically check the health of the working nodes connected to it.
Exemplarily, FIG. 2 is an execution flowchart of the resource management node provided by an embodiment of the present application. As shown in FIG. 2, after receiving a wafer inspection task from the storage server, the resource management node first determines whether the weights of the working nodes need to be updated. If so, it calculates the loads of the working nodes and updates their weights, selects one working node from the multiple working nodes based on the updated weights, and assigns the wafer inspection task to the selected working node.
As an example, the resource management node may update the weights of the working nodes periodically, for example every 5 minutes. As another example, the resource management node may update the weight of a working node upon receiving load information reported by that node (usually when the node's load information changes). The load information of a working node may be its load value (characterizing the size of its load), or its working parameter set, which includes at least one of: the CPU utilization and available memory of the working node, and the utilization and available video memory of each GPU of the working node.
In an embodiment of the present application, a working node includes multiple GPUs and assigns wafer inspection tasks to idle GPUs by maintaining a task queue and a resource pool. The task queue maintains the wafer inspection tasks to be executed, and the tasks in the queue are executed in first-in-first-out order. The resource pool maintains the idle/busy state of the GPUs and contains the idle GPUs.
Exemplarily, FIG. 3 is an execution flowchart of the working node provided by an embodiment of the present application. As shown in FIG. 3, a working node comprises a CPU part and a GPU part, and its execution flow runs on both kinds of hardware. The CPU part is responsible for scheduling wafer inspection tasks from the working node to the GPUs, with tasks executed in first-in-first-out order. The GPU part is responsible for the defect detection of the wafer pictures in the wafer inspection task, including preprocessing of the wafer pictures, wafer defect detection based on the wafer inspection model, and post-processing of the detection results. As an example, a detection result includes a label value indicating whether a wafer defect exists in the wafer picture, multiple defect categories, and the confidence and defect location corresponding to each defect category. Post-processing of the detection result may discard the low-confidence defect categories and keep only the defect category with the highest confidence together with its defect location.
As an example, the working node obtains a wafer inspection task from the task queue and determines whether there is an idle GPU in the resource pool. If there is, the working node randomly selects one idle GPU from the resource pool, assigns the wafer inspection task to that GPU, and updates the GPU's state to busy; after the wafer inspection task finishes, it updates the GPU's state back to idle. If there is no idle GPU in the resource pool, the working node waits until one becomes available.
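The task-queue/resource-pool behavior just described can be sketched as follows. The class and method names are illustrative assumptions; only the FIFO order, the busy/idle bookkeeping, and the wait-when-no-idle-GPU behavior from the source are modeled:

```python
from collections import deque

class WorkerNode:
    """Second-level scheduling sketch: a FIFO task queue plus a pool of idle GPUs."""

    def __init__(self, gpu_ids):
        self.task_queue = deque()        # tasks execute first-in, first-out
        self.idle_gpus = set(gpu_ids)    # resource pool of idle GPUs

    def submit(self, task):
        self.task_queue.append(task)

    def dispatch(self):
        """Assign the oldest queued task to some idle GPU and mark it busy.

        Returns (gpu, task), or None when there is no task or no idle GPU
        (the caller would wait and retry, as in the flow of FIG. 3).
        """
        if not self.task_queue or not self.idle_gpus:
            return None
        gpu = self.idle_gpus.pop()       # arbitrary idle GPU, now busy
        return gpu, self.task_queue.popleft()

    def finish(self, gpu):
        self.idle_gpus.add(gpu)          # task done: the GPU is idle again
```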
As an example, a wafer inspection model is preset in the GPU; the model may be obtained by training any deep learning model. Detecting defects in wafer pictures with the wafer inspection model consumes substantial computing resources; through hardware acceleration on the GPU, performance can be improved by more than a factor of 10 compared with a CPU, meeting the real-time requirements of wafer inspection.
The information processing flow based on the system architecture shown in FIG. 1 has been described above as a whole. The technical solutions of the present application are described in detail below with specific embodiments. The following specific embodiments may be combined with one another, and identical or similar concepts or processes may not be repeated in some embodiments.
FIG. 4 is an interaction diagram of the processing method for wafer inspection tasks provided by an embodiment of the present application. As shown in FIG. 4, the processing method of this embodiment includes the following steps:
Step 101: The storage server sends a wafer inspection task to the resource management node, the wafer inspection task including at least one wafer picture.
The storage server may be connected to one resource management node or to multiple resource management nodes. The steps of this embodiment are described taking one resource management node as an example.
In some embodiments, the storage server may send different wafer inspection tasks to multiple resource management nodes at the same time, and each resource management node is responsible for assigning the wafer inspection task it receives to one of the working nodes connected to it.
Optionally, the storage server may be a network storage server, for example a network attached storage (NAS) server.
Step 102: The resource management node determines a target working node according to the weight value of each working node.
The target working node is the working node with the largest weight value among the multiple working nodes, and the weight value of each working node is a parameter, determined according to the load information of each working node, used for allocating wafer inspection tasks.
The load information of a working node may be a load value, which indicates the load size of the working node within a preset time period, or may be working parameters indicating the load condition of the working node, including at least one of: the CPU utilization and available memory of the working node, and the GPU utilization and available video memory of its multiple GPUs. Here, available memory refers to the memory available to the CPU, and available video memory refers to the video memory available to a GPU.
In this embodiment, the larger the weight value of a working node, the higher the probability that it is assigned a wafer inspection task; the smaller the weight value, the lower the probability; a working node with a weight value of 0 is not assigned any wafer inspection task.
Optionally, the weight value of a working node may default to the number of GPUs of that working node, expressed as W_x = k_x, where k_x is the number of GPUs of the x-th working node, x ∈ (1, 2, …, n), n is a positive integer greater than or equal to 2, and n corresponds to the number of working nodes in FIG. 1.
Step 103: The resource management node sends the wafer inspection task to the target working node.
Step 104: The target working node determines an idle GPU.
Step 105: The target working node sends the wafer inspection task to the GPU.
The target working node receives the wafer inspection task sent by the resource management node and determines whether there is an idle GPU in the resource pool. If there is, it assigns the wafer inspection task to that idle GPU. If not, it waits until an idle GPU appears in the resource pool and then assigns the wafer inspection task to the idle GPU.
As one example, if there are multiple idle GPUs in the resource pool, the target working node may assign the wafer inspection task to one of them at random. As another example, if there are multiple idle GPUs in the resource pool, the target working node may assign the wafer inspection task to the idle GPU with the largest video memory.
Step 106: The GPU executes the wafer inspection task.
The GPU receives the wafer inspection task sent by the working node and first preprocesses the wafer pictures in the task to obtain processed wafer pictures. The preprocessing includes at least one of rotation, cropping, scaling and numerical normalization of the wafer pictures. Numerical normalization means normalizing the RGB value and position information of every pixel of the wafer picture into [0, 1]; the benefit of normalizing to [0, 1] is that data values of different dimensions (RGB values, positions) can be compared on the same scale, so that each feature contributes equally to the result. The processed wafer pictures satisfy the input requirements of the wafer inspection model.
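The numerical normalization step just described can be sketched as below; only the [0, 1] scaling of 8-bit RGB values is shown, and the rotation, cropping and scaling steps are omitted. The function name `normalize_pixels` and the nested-list image representation are illustrative assumptions:

```python
def normalize_pixels(image):
    """Scale 8-bit RGB values into [0, 1] so all features share one scale.

    `image` is a nested list of rows of [R, G, B] pixels; the position
    normalization mentioned in the text would be handled analogously.
    """
    return [[[channel / 255.0 for channel in pixel] for pixel in row]
            for row in image]
```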
Then, the processed wafer pictures are input into the pre-trained wafer inspection model to obtain the detection result. The wafer inspection model is obtained by training any deep learning model. The detection result output by the model includes at least one of: a label indicating whether a wafer defect exists in the wafer picture, the defect category, and the defect location.
Exemplarily, the label may be a label value; for example, 0 indicates that no wafer defect exists in the wafer picture, and 1 indicates that a wafer defect exists. The defect category may be indicated by a category ID; exemplarily, wafer defects include scratch defects, particle defects, poor coating, poor edge coverage and other defects. The defect location indicates the region of the wafer defect; the region may be rectangular and, accordingly, may be represented by the coordinates of two diagonal vertices or of all four vertices.
Optionally, the detection result further includes the confidence (which can be understood as a probability value) corresponding to the defect category.
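The post-processing described earlier — keeping only the highest-confidence defect category together with its location — can be sketched as follows; the dictionary keys `category_id` and `confidence` are assumed names, not from the source:

```python
def keep_best_detection(detections):
    """Return the detection entry with the highest confidence, or None.

    Mirrors the described post-processing: low-confidence defect
    categories are discarded and only the top one (with its defect
    location) is kept.
    """
    if not detections:
        return None
    return max(detections, key=lambda d: d["confidence"])
```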
In the processing method for wafer inspection tasks provided by this embodiment, the resource management node receives a wafer inspection task from the storage server, selects a target working node from the multiple working nodes according to the weight value of each working node connected to the resource management node, and assigns the wafer inspection task to the target working node. The target working node selects an idle GPU from the resource pool and assigns the received wafer inspection task to the idle GPU for execution. The GPU preprocesses the wafer pictures in the wafer inspection task, inputs the processed pictures into the wafer inspection model, and obtains the detection result. Through this two-level task scheduling, wafer inspection tasks are distributed to the GPUs of the individual working nodes for execution, achieving load balancing both between working nodes and between GPUs, which can meet the real-time requirements of defect detection on massive wafer pictures and improves the system's throughput for processing wafer inspection tasks.
In the above embodiments, every GPU of a working node includes the wafer inspection model. The training process of the wafer inspection model is briefly introduced below with reference to the drawings. Exemplarily, FIG. 5 is a flowchart of the wafer inspection model training process provided by an embodiment of the present application. As shown in FIG. 5, the training method of the wafer inspection model includes the following steps:
Step 201: Establish an initial wafer inspection model.
Step 202: Obtain image samples and the labeling results of the image samples, where the image samples contain first image samples with different wafer defect categories and second image samples without wafer defects, and a labeling result includes a label indicating whether a wafer defect exists in the image sample, the defect category ID, and the defect location.
In this embodiment, the first image samples include wafer pictures with different shooting angles, different wafer defect categories, different wafer defect locations, different surfaces (the front and back of the wafer), and different environmental conditions (such as lighting, temperature and humidity). Likewise, the second image samples include defect-free wafer pictures with different shooting angles, different surfaces and different environmental conditions.
Step 203: Train the initial wafer inspection model using the image samples as the input of the wafer inspection model and the labeling results of the image samples as the output, obtaining the trained wafer inspection model.
The GPU of this embodiment performs defect detection on wafer pictures with the above wafer inspection model, which can greatly improve detection accuracy and detection efficiency.
On the basis of the above embodiments, the processing method for wafer inspection tasks is described in detail below with reference to the drawings. Exemplarily, FIG. 6 is an interaction diagram of the processing method for wafer inspection tasks provided by an embodiment of the present application. As shown in FIG. 6, the processing method of this embodiment includes the following steps:
Step 301: The storage server sends a wafer inspection task to the resource management node, the wafer inspection task including at least one wafer picture.
Step 302: The resource management node obtains the load value of each working node, the load value indicating the load size of the working node within a preset time period.
In an embodiment of the present application, the above step 302 includes: the resource management node receives a working parameter set from each working node, and determines the load value of each working node according to the working parameter set of each working node. The working parameter set includes the GPU utilization and available video memory of the multiple GPUs of the working node, and the CPU utilization and available memory of the working node.
The resource management node may receive the working parameter sets from the working nodes in the following ways:
In one possible implementation, the resource management node periodically receives the working parameter set from each working node.
Optionally, the resource management node may periodically send a query request to the multiple working nodes, the query request requesting the current working state of the working nodes. In response to the query request, each working node sends a query response to the resource management node, the query response including the node's current working parameter set.
Optionally, the resource management node and the working nodes agree on a reporting period for the working parameter set, and each working node actively reports its working parameter set to the resource management node according to the reporting period.
In another possible implementation, when a wafer inspection task of a working node starts or finishes execution, the resource management node receives the working parameter set from that working node. For example, working node 1 assigns a wafer inspection task to GPU1, and when GPU1 starts executing the wafer inspection task, working node 1 reports its working parameter set to the resource management node. For another example, when GPU1 of working node 1 finishes executing the wafer inspection task, working node 1 reports its working parameter set to the resource management node.
As an example, the resource management node determining the load value of each working node according to the working parameter set of each working node includes: the resource management node determines the load value of each working node according to a preset weight coefficient set and the working parameter set of each working node, where the preset weight coefficient set includes weight coefficients for the GPU utilization, available video memory, CPU utilization and available memory.
Specifically, the resource management node may determine the load value of each working node by a weighted combination of the form:
L_x = f_GPU · Ū_GPU + f_VRAM · Ā_VRAM + f_CPU · U_CPU + f_RAM · A_RAM
where L_x denotes the load value of the x-th working node, x ∈ (1, 2, …, n), n being a positive integer greater than or equal to 2; U_GPU^(i) denotes the GPU utilization of the i-th GPU of the x-th working node, and Ū_GPU denotes the average GPU utilization over all GPUs of the x-th working node, i ∈ (1, 2, …, k), k being a positive integer greater than or equal to 2; A_VRAM^(i) denotes the available video memory capacity of the i-th GPU of the x-th working node, and Ā_VRAM the corresponding average; U_CPU denotes the CPU utilization of the x-th working node; A_RAM denotes the available memory capacity of the x-th working node; and f_GPU, f_VRAM, f_CPU and f_RAM denote the weight coefficients of GPU utilization, available video memory, CPU utilization and available memory, respectively.
In an embodiment of the present application, the above step 302 includes: the resource management node receives a load value from each working node, the load value of each working node being determined by that working node according to its working parameter set. In this example, the working nodes report their load values to the resource management node directly, which reduces the computation burden on the resource management node, saves part of its computing resources, and can improve its processing performance.
Step 303: The resource management node determines the weight value of each working node according to the multiple load values of the multiple working nodes. The load value of a working node is negatively correlated with its weight value: the larger the load value, the smaller the weight value, and the smaller the load value, the larger the weight value.
In an embodiment of the present application, the above step 303 includes: the resource management node obtains the multiple load values of the multiple working nodes, determines a load threshold according to the multiple load values, and determines the weight value of each working node according to the load threshold and the multiple load values of the multiple working nodes. The load threshold indicates the average load size of the multiple working nodes. The load threshold changes dynamically, and the resource management node may calculate the load threshold T by a formula of the form:
T = λ · (1/n) · Σ_{x=1}^{n} L_x
where λ is a coefficient, λ ∈ [1, +∞), and λ is set to 2 by default.
In one possible case, the resource management node determines that the load value of a certain working node is greater than the load threshold T and sets the weight value of that working node to 1; the probability that this working node is assigned a task is reduced to 1 / Σ_{x=1}^{n} W_x. A working node in this situation is regarded as being in a sub-health state, indicating that its load is too high; new tasks are therefore avoided for it as far as possible, which reduces the probability that it is assigned a new task.
In another possible case, the resource management node determines that the load value of a certain working node is less than or equal to the load threshold T and sets the weight value of that working node to the node's number of GPUs, k_x; the probability that this working node is assigned a task is raised to k_x / Σ_{x=1}^{n} W_x. A working node in this situation is regarded as healthy, indicating that its load is small; new tasks are therefore preferentially assigned to it, which increases the probability that it is assigned a new task.
Besides the above two cases, there is a special case: a working node originally connected to the resource management node is disconnected from it due to a network or equipment problem. Such a working node is regarded as unhealthy; the resource management node no longer considers assigning new tasks to it and may set its weight value to 0.
In one possible implementation, the resource management node may calculate the above load threshold periodically to obtain the latest load threshold and adjust the weight value of each working node based on it. This approach occupies few resources of the resource management node, but the weight values of the working nodes are not updated promptly enough.
In another possible implementation, after receiving load information reported by the working nodes (whenever any working node reports updated load information), the resource management node recalculates the load threshold to obtain an updated load threshold and adjusts the weight value of each working node based on it. With this approach the weight values are updated promptly, but resource consumption is higher.
Step 304: The resource management node determines the target working node according to the weight value of each working node.
Step 305: The resource management node sends the wafer inspection task to the target working node.
Step 306: The target working node determines an idle GPU.
Step 307: The target working node sends the wafer inspection task to the GPU.
Step 308: The GPU executes the wafer inspection task.
Steps 304 to 308 of this embodiment are the same as steps 102 to 106 of the foregoing embodiment; for details, refer to the foregoing embodiment, which are not repeated here.
In the processing method for wafer inspection tasks provided by this embodiment, the resource management node receives the wafer inspection task from the storage server, obtains the load value of each working node and the current load threshold, and adjusts the weight value of each working node by comparing each node's load value with the current load threshold. It then selects a target working node from the multiple working nodes according to the weight value of each working node and assigns the wafer inspection task to the target working node. The target working node selects an idle GPU from its multiple GPUs and assigns the wafer inspection task to the idle GPU for execution. Through this two-level task scheduling, wafer inspection tasks are distributed to the GPUs of the individual working nodes for execution, achieving load balancing both between working nodes and between GPUs, which can meet the real-time requirements of defect detection on massive wafer pictures and improves the system's throughput for processing wafer inspection tasks.
In the embodiments of the present application, the processing apparatus may be divided into functional modules according to the above method embodiments. For example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The above integrated module may be implemented in the form of hardware or in the form of a software functional module. It should be noted that the division of modules in the embodiments of the present application is schematic and is only a logical functional division; in actual implementation there may be other division manners. The following description takes the division of functional modules corresponding to the individual functions as an example.
FIG. 7 is a schematic structural diagram of the processing apparatus for wafer inspection tasks provided by an embodiment of the present application. As shown in FIG. 7, the processing apparatus 400 for wafer inspection tasks of this embodiment is connected to multiple working nodes and includes:
a receiving module 401, configured to receive a wafer inspection task from a storage server, the wafer inspection task including at least one wafer picture;
a processing module 402, configured to determine a target working node according to the weight value of each working node, where the target working node is the working node with the largest weight value among the multiple working nodes, and the weight value of each working node is a parameter, determined according to the load information of each working node, used for allocating wafer inspection tasks; and
a sending module 403, configured to send the wafer inspection task to the target working node.
In an embodiment of the present application, before determining the target working node according to the weight value of each working node, the processing module 402 is further configured to:
obtain the load value of each working node, the load value indicating the load size of the working node within a preset time period; and
determine the weight value of each working node according to the multiple load values of the multiple working nodes, where the load size of a working node is negatively correlated with its weight value.
In an embodiment of the present application, the receiving module 401 is further configured to receive a working parameter set from each working node, the working parameter set including at least one of: the GPU utilization and available video memory of the multiple GPUs of the working node, and the CPU utilization and available memory of the working node; and
the processing module 402 is further configured to determine the load value of each working node according to the working parameter set of each working node.
In an embodiment of the present application, the receiving module 401 is further configured to receive a load value from each working node, the load value of each working node being determined by that working node according to its working parameter set.
In an embodiment of the present application, the processing module 402 is specifically configured to:
determine the load value of each working node according to a preset weight coefficient set and the working parameter set of each working node, where the preset weight coefficient set includes weight coefficients for the GPU utilization, the available video memory, the CPU utilization and the available memory.
In an embodiment of the present application, the receiving module 401 is specifically configured to:
periodically receive the working parameter set from each working node; or
receive the working parameter set from each working node when a wafer inspection task of the working node starts or finishes execution.
In an embodiment of the present application, the processing module 402 is specifically configured to:
obtain the multiple load values of the multiple working nodes;
determine a load threshold according to the multiple load values, the load threshold indicating the average load size of the multiple working nodes; and
determine the weight value of each working node according to the load threshold and the multiple load values of the multiple working nodes.
In an embodiment of the present application, the processing module 402 is specifically configured to:
set the weight value of a working node whose load value is greater than the load threshold to 1; or
set the weight value of a working node whose load value is less than or equal to the load threshold to the number of GPUs of the working node.
The processing apparatus provided by the embodiments of the present application is configured to perform the individual steps of the resource management node in any of the foregoing method embodiments; its implementation principles and technical effects are similar and are not repeated here.
FIG. 8 is a hardware schematic diagram of the processing apparatus for wafer inspection tasks provided by an embodiment of the present application. As shown in FIG. 8, the processing apparatus 500 for wafer inspection tasks of this embodiment includes:
at least one processor 501 (only one processor is shown in FIG. 8); and
a memory 502 communicatively connected to the at least one processor, where
the memory 502 stores instructions executable by the at least one processor 501, and the instructions are executed by the at least one processor 501 to enable the processing apparatus 500 to perform the individual steps of the resource management node in any of the foregoing method embodiments.
Optionally, the memory 502 may be independent of the processor 501 or integrated with it.
When the memory 502 is a device independent of the processor 501, the processing apparatus 500 further includes: a bus 503 for connecting the memory 502 and the processor 501.
The present application further provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the technical solution of the resource management node in any of the foregoing method embodiments.
An embodiment of the present application further provides a wafer inspection system; referring to FIG. 1, the system includes:
at least one resource management node and multiple working nodes connected to the resource management node, where each working node includes multiple graphics processing units (GPUs), and the resource management node is configured to perform the individual steps of the resource management node in any of the foregoing method embodiments.
In an embodiment of the present application, after receiving the wafer inspection task sent by the resource management node, each working node selects an idle GPU from the multiple GPUs in the working node and assigns the wafer inspection task to the idle GPU for execution. The GPU is used to execute wafer inspection tasks.
In an embodiment of the present application, the GPU includes a wafer inspection model, obtained by training a deep learning model and used to detect whether the wafer in each wafer picture of a wafer inspection task has defects, as well as the defect category and defect location.
The wafer inspection system provided by the embodiments of the present application is an inspection system with a distributed architecture based on a GPU cluster. It can realize real-time inspection of the massive wafer pictures generated by defect inspection machines and can assist the production department in quickly locating problematic defect inspection machines, helping to identify process problems as early as possible, respond to production deviations in time, and greatly reduce the cost of inspections carried out to locate problem machines, thereby improving process yield and reducing production cost.
It should be understood that the processor mentioned in the embodiments of the present application may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
It should also be understood that the memory mentioned in the embodiments of the present application may be volatile memory or non-volatile memory, or may include both. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or flash memory. The volatile memory may be a random access memory (RAM), used as an external cache. By way of example but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DR RAM).
It should be noted that when the processor is a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, the memory (storage module) is integrated into the processor.
It should be noted that the memory described herein is intended to include, but not be limited to, these and any other suitable types of memory.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the above processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (13)

  1. A processing method for wafer inspection tasks, applied to a resource management node connected to multiple working nodes, the method comprising:
    receiving a wafer inspection task from a storage server, the wafer inspection task comprising at least one wafer picture;
    determining a target working node according to the weight value of each working node, wherein the target working node is the working node with the largest weight value among the multiple working nodes, and the weight value of each working node is a parameter, determined according to the load information of each working node, used for allocating wafer inspection tasks; and
    sending the wafer inspection task to the target working node.
  2. The method according to claim 1, wherein before determining the target working node according to the weight value of each working node, the method further comprises:
    obtaining the load value of each working node, the load value indicating the load size of the working node within a preset time period; and
    determining the weight value of each working node according to the multiple load values of the multiple working nodes, wherein the load size of a working node is negatively correlated with its weight value.
  3. The method according to claim 2, wherein obtaining the load value of each working node comprises:
    receiving a working parameter set from each working node, the working parameter set comprising at least one of: the GPU utilization and available video memory of the multiple GPUs of the working node, and the CPU utilization and available memory of the working node; and
    determining the load value of each working node according to the working parameter set of each working node.
  4. The method according to claim 2, wherein obtaining the load value of each working node comprises:
    receiving a load value from each working node, the load value of each working node being determined by that working node according to a working parameter set.
  5. The method according to claim 3, wherein determining the load value of each working node according to the working parameter set of each working node comprises:
    determining the load value of each working node according to a preset weight coefficient set and the working parameter set of each working node, wherein the preset weight coefficient set comprises weight coefficients for the GPU utilization, the available video memory, the CPU utilization and the available memory.
  6. The method according to claim 3, wherein receiving the working parameter set from each working node comprises:
    periodically receiving the working parameter set from each working node; or
    receiving the working parameter set from each working node when a wafer inspection task of the working node starts or finishes execution.
  7. The method according to claim 2, wherein determining the weight value of each working node according to the multiple load values of the multiple working nodes comprises:
    obtaining the multiple load values of the multiple working nodes;
    determining a load threshold according to the multiple load values, the load threshold indicating the average load size of the multiple working nodes; and
    determining the weight value of each working node according to the load threshold and the multiple load values of the multiple working nodes.
  8. The method according to claim 7, wherein determining the weight value of each working node according to the load threshold and the multiple load values of the multiple working nodes comprises:
    setting the weight value of a working node whose load value is greater than the load threshold to 1; or
    setting the weight value of a working node whose load value is less than or equal to the load threshold to the number of GPUs of the working node.
  9. A processing apparatus for wafer inspection tasks, the processing apparatus being connected to multiple working nodes and comprising:
    a receiving module, configured to receive a wafer inspection task from a storage server, the wafer inspection task comprising at least one wafer picture;
    a processing module, configured to determine a target working node according to the weight value of each working node, wherein the target working node is the working node with the largest weight value among the multiple working nodes, and the weight value of each working node is a parameter, determined according to the load information of each working node, used for allocating wafer inspection tasks; and
    a sending module, configured to send the wafer inspection task to the target working node.
  10. A processing apparatus for wafer inspection tasks, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor, wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the processing apparatus to perform the method according to any one of claims 1-8.
  11. A wafer inspection system, comprising:
    at least one resource management node and multiple working nodes connected to the resource management node, each working node comprising multiple graphics processing units (GPUs); and
    the resource management node being configured to perform the method according to any one of claims 1-8.
  12. The system according to claim 11, wherein after receiving the wafer inspection task sent by the resource management node, each working node selects an idle GPU from the multiple GPUs in the working node and assigns the wafer inspection task to the idle GPU for execution.
  13. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, enable the processor to perform the method according to any one of claims 1-8.
PCT/CN2021/097390 2020-09-11 2021-05-31 Method and apparatus for processing wafer inspection task, system, and storage medium WO2022052523A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21769327.4A EP3992878B1 (en) 2020-09-11 2021-05-31 Method and apparatus for processing wafer inspection task, system, and storage medium
US17/400,431 US12085916B2 (en) 2020-09-11 2021-08-12 Method and device for processing wafer detection tasks, system, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010955527.5 2020-09-11
CN202010955527.5A CN114168310A (zh) Method and apparatus for processing wafer inspection task, system, and storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/400,431 Continuation US12085916B2 (en) 2020-09-11 2021-08-12 Method and device for processing wafer detection tasks, system, and storage medium

Publications (1)

Publication Number Publication Date
WO2022052523A1 true WO2022052523A1 (zh) 2022-03-17

Family

ID=78806445

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/097390 WO2022052523A1 (zh) 2020-09-11 2021-05-31 Method and apparatus for processing wafer inspection task, system, and storage medium

Country Status (2)

Country Link
CN (1) CN114168310A (zh)
WO (1) WO2022052523A1 (zh)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103946966A (zh) * 2011-11-18 2014-07-23 Fuji Machine Mfg. Co., Ltd. Wafer-related data management method and wafer-related data generation device
CN108933822A (zh) * 2018-06-28 2018-12-04 Beijing Baidu Netcom Science and Technology Co., Ltd. Method and apparatus for processing information
CN110969598A (zh) * 2018-09-28 2020-04-07 Taiwan Semiconductor Manufacturing Co., Ltd. Wafer inspection method and wafer inspection system


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3992878A4

Also Published As

Publication number Publication date
CN114168310A (zh) 2022-03-11


Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2021769327

Country of ref document: EP

Effective date: 20210921

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21769327

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE