JP2016014975A - Data calculation device, data calculation method, and defect inspection device

Info

Publication number: JP2016014975A
Application number: JP2014136056A
Authority: JP
Inventors: 拓矢安田; Takuya Yasuda
Original assignee: Screen Holdings Co Ltd
Current assignee: Screen Holdings Co Ltd
Priority date: 2014-07-01
Filing date: 2014-07-01
Publication date: 2016-01-28
Anticipated expiration: 2034-07-01
Also published as: JP6460660B2

Abstract

PROBLEM TO BE SOLVED: To reduce the time required to perform calculation processing on run data, whether the number of pieces of run data is small or large. SOLUTION: Provided is a data calculation device that performs calculation processing on a run data group comprising a plurality of pieces of run data obtained by run-length encoding binary image data, the data calculation device including: a plurality of first thread processing parts, each of which performs calculation processing on a single piece of run data; and a processing control part that can execute first parallel calculation processing in which mutually different first thread processing parts perform calculations in parallel on the respective pieces of run data forming the run data group.

Description

The present invention relates to a data calculation technique for performing calculation processing on run data, and to a defect inspection apparatus that uses the data calculation technique.

In manufacturing fields such as semiconductor substrates and printed circuit boards, in order to detect defects contained in a product and to analyze and evaluate them, the object under evaluation is imaged with a microscope or the like, and an image containing a defective portion is extracted from the captured image. A defect image is then obtained accurately by applying mask processing, using a mask image, to the extracted image. To perform this mask processing, for example, the logical operation described in Patent Document 1 is executed on the run data of the image.

Patent Document 1: Japanese Unexamined Patent Application Publication No. H7-203178 (JP-A-7-203178)

The invention described in Patent Document 1 obtains the logical product (AND), logical sum (OR), and exclusive logical sum (XOR) of two images without converting the run data groups of the two images into bitmap data. More specifically, the two run data groups to be processed are checked for connection in order, starting from the run at one end, and the logical product, logical sum, and exclusive logical sum are determined according to the overlapping portions. Performing the calculation directly on the run data, without conversion to bitmap data, shortens the time required for the calculation processing.
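The sequential approach can be pictured with a merge-style scan over two sorted run lists. The sketch below is not taken from Patent Document 1; it is a generic illustration that assumes each row is given as sorted, non-overlapping `(start, end)` column pairs with an inclusive end, and the function name `sequential_and` is hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start column, end column), end inclusive (assumed layout)


def sequential_and(row_a: List[Span], row_b: List[Span]) -> List[Span]:
    """AND two sorted run lists of one row by walking both lists from one end."""
    i = j = 0
    out = []
    while i < len(row_a) and j < len(row_b):
        start = max(row_a[i][0], row_b[j][0])
        end = min(row_a[i][1], row_b[j][1])
        if start <= end:              # the two current runs overlap
            out.append((start, end))
        # Advance whichever run finishes first; the other may still overlap the next run.
        if row_a[i][1] < row_b[j][1]:
            i += 1
        else:
            j += 1
    return out


overlap = sequential_and([(0, 3), (6, 9)], [(2, 7)])
```

Because the scan visits the runs one after another, its running time grows with the total number of runs, which is the limitation discussed next.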

However, because the runs are processed in order from one end, the connection determination takes longer as the number of runs increases. The same problem arises in other calculation processing that the defect inspection apparatus performs directly on run data, for example labeling-related processing (area calculation, circumscribed rectangle calculation, centroid calculation, and the like). It also arises in calculation processing performed directly on run data in apparatuses other than the defect inspection apparatus, for example dilation and erosion processing, coordinate inversion processing, rotation processing, and run inversion processing.

The present invention has been made in view of the above problem. Its object is to provide a data calculation technique that shortens the time required to perform calculation processing on run data, not only when the number of runs is small but also when it is large, and a defect inspection apparatus that uses the data calculation technique to shorten the time required for defect inspection.

A first aspect of the present invention is a data calculation device that performs calculation processing on a run data group composed of a plurality of pieces of run data obtained by run-length encoding binary image data, the device comprising: a plurality of first thread processing units, each of which performs calculation processing on a single piece of run data; and a processing control unit capable of executing first parallel calculation processing in which the pieces of run data constituting the run data group are processed in parallel by mutually different first thread processing units.

A second aspect of the present invention is a data calculation method for performing calculation processing on a plurality of pieces of run data obtained by run-length encoding binary image data, the method comprising a first parallel calculation step of executing in parallel first thread processing that performs the calculation processing for each piece of run data.

A third aspect of the present invention is a defect inspection apparatus comprising: an image acquisition unit that acquires an inspection target image; an image extraction unit that inspects the inspection target image and extracts an extracted image containing a defective portion; and a data calculation unit that obtains defect image data by performing a logical operation between an extracted run data group, which has a plurality of pieces of run data obtained by run-length encoding the extracted image, and a mask run data group, which has a plurality of pieces of run data obtained by run-length encoding a mask image, thereby masking the portions of the extracted image other than the defective portion with the mask image. The data calculation unit has a plurality of first thread processing units, each of which performs the logical operation on a single piece of run data, and a processing control unit capable of executing first parallel calculation processing in which the pieces of run data constituting the extracted run data group are processed in parallel by mutually different first thread processing units.

As described above, in the present invention, calculation processing on a run data group composed of a plurality of pieces of run data obtained by run-length encoding binary image data is executed as follows: first thread processing, which performs calculation processing (logical product processing, logical sum processing, labeling processing, and the like) for each piece of run data, is executed in parallel. Consequently, the time required to perform calculation processing on the run data group can be shortened stably, regardless of the number of run data elements constituting the run data group.

FIG. 1 is a diagram showing a first embodiment of the data calculation device according to the present invention.
FIG. 2 is a flowchart showing the data calculation operation of the data calculation device of FIG. 1.
FIG. 3 is a diagram schematically showing the data calculation operation of the data calculation device of FIG. 1.
FIG. 4 is a graph showing how the time required for calculation processing changes with the number of run elements in the data calculation device of FIG. 1.
FIG. 5 is a diagram schematically showing the calculation operation in a comparative example.
FIG. 6 is a graph showing how the time required for calculation processing changes with the number of run elements in the comparative example.
FIG. 7 is a diagram showing a second embodiment of the data calculation device according to the present invention.
FIG. 8 is a flowchart showing the data calculation operation of the data calculation device of FIG. 7.
FIG. 9 is a flowchart showing the data calculation operation of the data calculation device of FIG. 7.
FIG. 10 is a diagram showing an example of a defect inspection apparatus equipped with the data calculation device of FIG. 1 or FIG. 7.
FIG. 11 is a block diagram showing the schematic configuration of the image processing unit of the defect inspection apparatus shown in FIG. 10.

A. First Embodiment
FIG. 1 is a diagram showing a first embodiment of the data calculation device according to the present invention. FIG. 2 is a flowchart showing the data calculation operation of the data calculation device of FIG. 1, and FIG. 3 is a diagram schematically showing that operation. The data calculation device 100 run-length encodes binary image data supplied from an external device to generate a run data group composed of a plurality of pieces of run data, and performs calculation processing on such a run data group, including logical product, logical sum, exclusive logical sum, labeling-related processing (area calculation, circumscribed rectangle calculation, centroid calculation, and the like), dilation and erosion processing, coordinate inversion processing, rotation processing, and run inversion processing.

In the present embodiment, the data calculation device 100 is configured as a GPU (Graphics Processing Unit) having a plurality of processor cores 110, a storage unit 120, a run generation unit 130, a parallel processing control unit 140, and a data initial setting unit 150. When binary image data is supplied from outside the device, the data calculation device 100 stores it in the storage unit 120. The run generation unit 130 reads the binary image data stored in the storage unit 120 as needed, run-length encodes it to generate a run data group composed of a plurality of pieces of run data, and then writes the run data group to the storage unit 120. Many well-known techniques have been proposed for run-length encoding, and the present embodiment uses ordinary run-length encoding as is. Each piece of run data obtained by the run-length encoding contains a column index indicating the start pixel position of the run in the binary image data (start coordinate), a column index indicating the end pixel position of the run in the binary image data (end coordinate), and the row index of the run in the binary image data.
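The run data layout described here (start coordinate, end coordinate, row index) can be sketched as follows. This is a minimal illustration, not the device's implementation: the names `Run` and `run_length_encode` are hypothetical, indices are assumed to be 0-based, the end coordinate is assumed to be inclusive, and runs are assumed to be taken over pixels with value 1.

```python
from typing import List, NamedTuple


class Run(NamedTuple):
    row: int    # row index of the run in the binary image
    start: int  # column index of the first pixel of the run
    end: int    # column index of the last pixel of the run (inclusive, assumed)


def run_length_encode(image: List[List[int]]) -> List[Run]:
    """Collect every horizontal run of 1-pixels, row by row."""
    runs = []
    for r, line in enumerate(image):
        start = None
        for c, pixel in enumerate(line):
            if pixel and start is None:
                start = c                          # a run begins here
            elif not pixel and start is not None:
                runs.append(Run(r, start, c - 1))  # the run ended at the previous column
                start = None
        if start is not None:                      # the run reaches the right edge
            runs.append(Run(r, start, len(line) - 1))
    return runs


image = [
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 1],
]
runs = run_length_encode(image)
```

For the 3 × 4 image above the encoder yields three runs: two in the first row and one in the third row.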

Suppose, for example, that binary image data Da and Db, each having p rows × q columns of pixels, are given as shown in the top row of FIG. 3, and that their logical product and logical sum are to be calculated. The data calculation device 100 temporarily stores the two sets of binary image data Da and Db in the storage unit 120, run-length encodes each of them to generate two run data groups RDa and RDb, and stores these in the storage unit 120. To make the device configuration and operation easier to follow, the run data group RDa obtained by run-length encoding the binary image data Da is called the "first run data group", and the run data group RDb obtained by run-length encoding the binary image data Db is called the "second run data group".

Each processor core 110 functions as a first thread processing unit 111 that performs calculation processing on a single piece of run data. In this embodiment, each first thread processing unit 111 has a calculation function that obtains, for each piece of run data constituting the first run data group RDa, the logical product and logical sum with the second run data group RDb. The maximum number of runs in each row obtained by run-length encoding the binary image data Da and Db shown in the top row of FIG. 3 is (q/2) when q is even and ((q/2) + 1) when q is odd, where q/2 is rounded down to an integer. The maximum value M of the number of run data elements obtained by the run-length encoding is therefore:

M = (q/2) × p when q is even
M = ((q/2) + 1) × p when q is odd
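The bound M can be checked with a few lines of code. The sketch below is only an illustration: the function name `max_run_count` is hypothetical, and q/2 is taken as integer division, which matches the worst case of alternating pixels within a row.

```python
def max_run_count(p: int, q: int) -> int:
    """Upper bound M on the number of runs in a binary image of p rows by q columns."""
    # A row holds the most runs when its pixels alternate 1, 0, 1, 0, ...
    per_row = q // 2 if q % 2 == 0 else q // 2 + 1
    return per_row * p


m_even = max_run_count(p=4, q=6)  # 3 runs per row at most
m_odd = max_run_count(p=4, q=7)   # 4 runs per row at most
```

With p = 4, an even width q = 6 gives M = 12 and an odd width q = 7 gives M = 16.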

In the present embodiment, therefore, a GPU having M or more processor cores 110 is used as the data calculation device 100, so that the above calculation processing can be performed in parallel for all the run data constituting the first run data group RDa, as described later. The calculation result (run data) of each processor core 110 is written to the storage unit 120. FIG. 1 shows the first processor core 110 through the M-th processor core 110 and omits the remaining processor cores 110. Needless to say, the number of processor cores 110 in the data calculation device 100 may also be set to exactly M.

To operate the plurality of processor cores 110 in parallel, the present embodiment provides a parallel processing control unit 140. The parallel processing control unit 140 controls the thread processing executed by each processor core 110 and causes it to run in parallel. The data initial setting unit 150 clears to zero the binary image data stored in the storage unit 120 and the various pieces of information contained in each piece of run data (start coordinate, end coordinate, and row index). In the present embodiment, the device receives the binary image data Da and Db and run-length encodes them to generate the first run data group RDa and the second run data group RDb. Alternatively, at least one of the run data groups may be received from outside the device and stored in the storage unit 120, in which case the run-length encoding for that group is omitted.

Next, the data calculation operation of the data calculation device 100 configured as described above will be explained. When two sets of binary image data Da and Db to be subjected to a logical operation are designated from outside the device, the data calculation device 100 acquires them from outside the device and stores them in the storage unit 120 (step S1). The run generation unit 130 then reads the binary image data Da from the storage unit 120, run-length encodes it to obtain run data ra[1] to ra[m], and stores the first run data group RDa (see FIG. 3) consisting of the run data ra[1] to ra[m] in the storage unit 120 (step S2). As described above, the maximum number of elements of the run data ra obtained by run-length encoding binary image data of p rows × q columns is M, so the values m and M satisfy m ≤ M.

The run generation unit 130 also reads the binary image data Db from the storage unit 120, run-length encodes it to obtain run data rb[1] to rb[n], and stores the second run data group RDb (see FIG. 3) consisting of the run data rb[1] to rb[n] in the storage unit 120 (step S3). Steps S2 and S3 are not limited to the order used in the present embodiment and may be performed in any order.

Once the two run data groups RDa and RDb to be processed have been obtained, calculation processing by the processor cores 110 (logical product processing and logical sum processing in the present embodiment) is executed in parallel for each piece of run data ra constituting the first run data group RDa (steps S4-1 to S4-m). More specifically, for the first run data ra[1], the first thread processing unit 111 of the first processor core 110 performs the logical operation with the run data rb[1] to rb[n] constituting the second run data group RDb (step S4-1). The first thread processing units 111 of the second to m-th processor cores 110 execute the same processing for ra[2] to ra[m] (steps S4-2 to S4-m). When there is no (m+1)-th or later run data ra, the first thread processing units 111 of the (m+1)-th to M-th processor cores 110 perform no calculation and wait until the parallel calculation processing by the first thread processing units 111 of the first to m-th processor cores 110 is complete.
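The decomposition of steps S4-1 to S4-m, one thread per run ra[i], can be sketched on a CPU as follows. This is an illustration under stated assumptions, not the GPU implementation: Python threads stand in for the processor cores 110 and give no real speed-up, runs are assumed to be `(row, start, end)` tuples with an inclusive end, only the logical product is shown, and the names `and_one_run` and `parallel_and` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Run = Tuple[int, int, int]  # (row, start column, end column), end inclusive (assumed)


def and_one_run(ra: Run, rdb: List[Run]) -> List[Run]:
    """Work of one thread: AND a single run ra against every run of RDb."""
    row, a_start, a_end = ra
    out = []
    for b_row, b_start, b_end in rdb:
        if b_row != row:
            continue                      # runs on different rows never overlap
        start, end = max(a_start, b_start), min(a_end, b_end)
        if start <= end:                  # the two runs overlap
            out.append((row, start, end))
    return out


def parallel_and(rda: List[Run], rdb: List[Run]) -> List[Run]:
    """Launch one task per run of RDa and concatenate the partial results in order."""
    if not rda:
        return []
    with ThreadPoolExecutor(max_workers=len(rda)) as pool:
        parts = pool.map(lambda ra: and_one_run(ra, rdb), rda)
        return [run for part in parts for run in part]


rda = [(0, 0, 3), (0, 6, 9), (1, 2, 5)]
rdb = [(0, 2, 7), (1, 0, 1), (1, 4, 8)]
result = parallel_and(rda, rdb)
```

Each task touches exactly one run of RDa, so the amount of work per thread does not grow with the number of runs m; only the number of threads does.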

When the parallel calculation processing (steps S4-1 to S4-m) is complete, the data calculation device 100 writes the calculation result (run data) to the storage unit 120 and ends the series of processing (step S5).

As described above, according to the first embodiment of the present invention, the calculation processing by the processor cores 110 is executed in parallel for each piece of run data ra constituting the first run data group RDa, as shown in FIGS. 2 and 3, so the time required for calculation processing on the run data group can be shortened. In particular, the present embodiment provides in advance at least as many processor cores 110 as the maximum value M of the number of run data elements obtainable by run-length encoding the binary image data Da and Db. As a result, the first thread processing for each of the run data ra[1] to ra[m] can be performed in parallel not only when the number m of run data ra constituting the first run data group RDa is relatively small but also when it equals the maximum value M, and the processing time required for calculation on the binary image data can be shortened stably, as shown for example in FIG. 4.

FIG. 4 is a graph showing how the time required for calculation processing changes with the number of run elements in the data calculation device of FIG. 1. Here, two mutually different 256-gradation images (file size 32 MB) were created by random number generation, and for each image the threshold used to binarize it was changed in multiple steps to create three kinds of binary image data with different numbers of run data elements. More specifically, binary image data with about 8 million (8M), about 5.6 million (5.6M), and about 0.5 million (0.5M) run data elements were created, namely:

binary image data Da(8), Db(8),
binary image data Da(5.6), Db(5.6), and
binary image data Da(0.5), Db(0.5).

For each number of run data elements, the data calculation device 100 according to the first embodiment calculated the logical product and logical sum of the binary image data Da and Db, and the processing time required for the calculation was measured. FIG. 4 summarizes the results. As FIG. 4 shows, the logical product and logical sum of the binary image data Da and Db can be calculated in a short time almost independently of the number of run data elements. The reduction in processing time achieved by the first embodiment is explained below with reference to a comparative example (FIGS. 5 and 6).

FIG. 5 is a diagram schematically showing the calculation operation in the comparative example, and FIG. 6 is a graph showing how the time required for calculation processing changes with the number of run elements in the comparative example. The comparative example differs from the first embodiment mainly in the unit of parallel calculation: the run data constituting a run data group is divided by row or by column into a plurality of data areas, and calculation processing is performed on each data area. More specifically, the run data ra[1] to ra[m] constituting the run data group RDa are divided by row into p data areas, and thread processing that obtains the logical product and logical sum with the second run data group RDb is performed in parallel for each data area. For example, when the run data groups RDa and RDb shown in FIG. 3 are processed in parallel, the logical product and logical sum of the first-row run data ra (ra[1], ra[2]) and run data rb (rb[1], rb[2]) are calculated (first thread). The second to p-th rows are processed in the same way as the first row, in parallel with the first thread (second to p-th threads). Threads for the (p+1)-th row onward wait until the parallel calculation processing of the first to p-th threads is complete.
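The row-wise decomposition of the comparative example can be sketched in the same style. Again this is only an illustration under assumptions: Python threads stand in for GPU threads, runs are `(row, start, end)` tuples with an inclusive end and 0-based rows, only the logical product is shown, and the helper names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Run = Tuple[int, int, int]  # (row, start column, end column), end inclusive (assumed)


def group_by_row(runs: List[Run]) -> Dict[int, List[Run]]:
    """Split a run data group into per-row data areas."""
    rows = defaultdict(list)
    for run in runs:
        rows[run[0]].append(run)
    return rows


def and_one_row(row_a: List[Run], row_b: List[Run]) -> List[Run]:
    """Work of one thread: AND all runs of one row of RDa with the same row of RDb."""
    out = []
    for row, a_start, a_end in row_a:
        for _, b_start, b_end in row_b:
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start <= end:
                out.append((row, start, end))
    return out


def row_parallel_and(rda: List[Run], rdb: List[Run], p: int) -> List[Run]:
    """Launch one task per image row (p tasks in total, p >= 1)."""
    rows_a, rows_b = group_by_row(rda), group_by_row(rdb)
    with ThreadPoolExecutor(max_workers=p) as pool:
        parts = pool.map(
            lambda r: and_one_row(rows_a.get(r, []), rows_b.get(r, [])), range(p)
        )
        return [run for part in parts for run in part]


rda = [(0, 0, 3), (0, 6, 9), (1, 2, 5)]
rdb = [(0, 2, 7), (1, 0, 1), (1, 4, 8)]
result = row_parallel_and(rda, rdb, p=2)
```

Here the number of threads is fixed at p, so a thread's workload grows with the number of runs in its row.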

Because the comparative example also processes run data in parallel, it can shorten the calculation time compared with the invention described in Patent Document 1. However, each thread handles one or more pieces of run data ra, so the processing time increases as the number of elements of the run data ra constituting the run data group RDa increases. When the logical product and logical sum of the binary image data Da and Db were calculated in the comparative example in the same manner as in the first embodiment and the processing time was measured, the results shown in FIG. 6 were obtained. The dotted line in FIG. 6 indicates the processing time measured for the first embodiment.

As FIG. 6 shows, the first embodiment outperforms the comparative example. That is, the first embodiment is less affected by the number of run data elements than the comparative example (which runs row-by-row thread processing in parallel) and can perform high-speed calculation stably.

B. Second Embodiment
FIG. 6 also shows a reversal in processing time. When the number of run data elements becomes extremely small and the parallelism of the thread processing performed in the first embodiment drops, the processing time of the comparative example becomes shorter than that of the first embodiment. This means that when an index value indicating the parallelism of the thread processing exceeds the index value at which the reversal occurs (hereinafter the "boundary value"), the parallel calculation processing of the first embodiment is advantageous, whereas when the index value is equal to or less than the boundary value, the parallel calculation processing of the comparative example is advantageous. The second embodiment of the present invention is therefore configured to execute both kinds of parallel calculation processing and selects between them according to whether the index value exceeds the boundary value, which widens the range over which high-speed calculation is possible. The second embodiment is described below with reference to FIGS. 7 to 9.

FIG. 7 is a diagram showing the second embodiment of the data calculation device according to the present invention. The second embodiment differs from the first embodiment mainly in three respects. First, each processor core 110 functions not only as the first thread processing unit 111 but also as a second thread processing unit 112 that performs calculation processing row by row, as in the comparative example. Second, an occurrence frequency calculation unit 160 is newly provided; it calculates, as the index value, the occurrence frequency of run data from the image size of the binary image data and the number of run data elements generated by run-length encoding that binary image data. Third, the parallel calculation processing is switched according to the occurrence frequency obtained by the occurrence frequency calculation unit 160. The other configurations and operations are basically the same as in the first embodiment, so identical components are given the same reference numerals and their description is omitted.

FIGS. 8 and 9 are flowcharts showing the data calculation operation of the data calculation device of FIG. 7. In the second embodiment as well, when two sets of binary image data Da and Db to be processed are designated from outside the device, the data calculation device 100 stores the binary image data Da and Db in the storage unit 120 (step S1), obtains and stores the first run data group RDa consisting of run data ra[1] to ra[m] (step S2), and obtains and stores the second run data group RDb consisting of run data rb[1] to rb[n] (step S3).

In the next step S6, the occurrence frequency calculation unit 160 of the data calculation device 100 calculates the occurrence frequency of run data by dividing the value m, which is the number of elements of the run data ra[1] to ra[m], by the image size of the binary image data Da. For example, the occurrence frequency tends to be low when the binary image data Da represents a line drawing and high when it represents a photographic image.
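Step S6 amounts to a single division. The sketch below assumes that "image size" means the pixel count p × q, which the source does not state explicitly; the function name `occurrence_frequency` and the sample numbers are hypothetical.

```python
def occurrence_frequency(num_runs: int, rows: int, cols: int) -> float:
    """Runs per pixel: the number of runs m divided by the image size p x q (assumed)."""
    if rows <= 0 or cols <= 0:
        raise ValueError("image dimensions must be positive")
    return num_runs / (rows * cols)


# Illustrative values only: a sparse line drawing versus a dense photo-like image.
line_drawing = occurrence_frequency(num_runs=40, rows=100, cols=100)
photo_like = occurrence_frequency(num_runs=2500, rows=100, cols=100)
```

Under this definition the frequency can never exceed roughly 0.5, because a row of q pixels holds at most about q/2 runs.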

After obtaining the occurrence frequency of run data as the index value in this way, the data calculation device 100 determines whether the occurrence frequency exceeds a boundary value (step S7). When it determines that the frequency exceeds the boundary value, it performs the calculation processing in parallel for each run data ra, as in the first embodiment (steps S4-1 to S4-m).

On the other hand, when the data calculation device 100 determines that the occurrence frequency is at or below the boundary value, it performs parallel calculation processing using the second thread processing units 112 (steps S8-1 to S8-p). Specifically, the second thread processing unit 112 of the first processor core 110 has a calculation function for obtaining the logical AND and logical OR of the first run data group RDa and the second run data group RDb for the first row. Likewise, the second thread processing units 112 of the subsequent second, third, ..., M-th processor cores 110 have a calculation function for obtaining the logical AND and logical OR for the second through p-th rows, respectively. In steps S8-1 to S8-p, the calculation processing (logical AND processing and logical OR processing in this embodiment) by the first through p-th processor cores 110 is executed in parallel, row by row. Because no run data ra, rb exist for the (p+1)-th and subsequent rows, the second thread processing units 112 of the (p+1)-th through M-th processor cores 110 perform no calculation processing and wait until the parallel calculation processing by the second thread processing units 112 of the first through p-th processor cores 110 is complete.
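The row-by-row processing of steps S8-1 to S8-p can be sketched as below. This is only an illustration under stated assumptions: a Python thread pool stands in for the processor cores 110, runs are assumed to be `(row, start_col, end_col)` tuples with an exclusive end column, and all function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_by_row(runs):
    """Collect (start, end) intervals per row from (row, start, end) runs."""
    rows = defaultdict(list)
    for y, start, end in runs:
        rows[y].append((start, end))
    return rows


def and_row(a, b):
    """Logical AND: intersect two sorted lists of [start, end) intervals."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def or_row(a, b):
    """Logical OR: merge two lists of [start, end) intervals into a union."""
    out = []
    for start, end in sorted(a + b):
        if out and start <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], end))
        else:
            out.append((start, end))
    return out


def row_parallel(rda, rdb, height, workers=4):
    """One task per row, mirroring steps S8-1 to S8-p (sketch only)."""
    rows_a, rows_b = group_by_row(rda), group_by_row(rdb)

    def process(y):
        a, b = sorted(rows_a.get(y, [])), sorted(rows_b.get(y, []))
        return y, and_row(a, b), or_row(a, b)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(process, range(height)))
    and_runs = [(y, s, e) for y, r, _ in results for s, e in r]
    or_runs = [(y, s, e) for y, _, r in results for s, e in r]
    return and_runs, or_runs


rda = [(0, 1, 5), (1, 0, 3)]   # hypothetical first run data group RDa
rdb = [(0, 3, 8), (1, 4, 6)]   # hypothetical second run data group RDb
and_runs, or_runs = row_parallel(rda, rdb, height=2)
```

Each row is independent of the others, so no synchronization is needed between tasks; the only barrier is the wait for all rows to finish, which corresponds to the idle cores waiting for the first through p-th cores.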

When one of the two types of parallel calculation processing is complete, the data calculation device 100 writes the calculation result (run data) to the storage unit 120 and ends the series of processes (step S5).

As described above, in the second embodiment of the present invention, two types of parallel calculation processing are prepared in advance, the occurrence frequency of run data is obtained as an index value indicating the parallelism of thread processing (step S6), and the parallel calculation processing is selected according to whether that frequency exceeds the boundary value. Because the parallel calculation processing is thus matched to the occurrence frequency of run data, which indicates the parallelism of calculation by the first thread processing units 111, high-speed calculation is possible in either case. The range over which high-speed calculation is possible is wider than in the first embodiment, and high versatility is obtained.
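The selection between the two kinds of parallel processing (steps S6 and S7) might be sketched as follows for the logical AND. The code is an assumption-laden illustration: the boundary value is a free parameter because the text gives no concrete figure, runs are assumed to be `(row, start_col, end_col)` tuples with an exclusive end column, a thread pool stands in for the processor cores, and every function name is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def and_one_run(run, rdb):
    """Sketch of the first thread processing: AND one run of RDa with RDb."""
    y, s, e = run
    return [(y, max(s, bs), min(e, be))
            for by, bs, be in rdb
            if by == y and max(s, bs) < min(e, be)]


def and_per_run(rda, rdb, workers=4):
    """First parallel processing (sketch): one task per run data ra[i]."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda r: and_one_run(r, rdb), rda))
    return sorted(run for part in parts for run in part)


def and_per_row(rda, rdb, height, workers=4):
    """Second parallel processing (sketch): one task per image row."""
    def process(y):
        row_a = [r for r in rda if r[0] == y]
        return [out for r in row_a for out in and_one_run(r, rdb)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(process, range(height)))
    return sorted(run for part in parts for run in part)


def masked_and(rda, rdb, height, width, boundary):
    """Pick a strategy from the occurrence frequency, then run it."""
    frequency = len(rda) / (height * width)   # step S6 (pixel count assumed)
    if frequency > boundary:                   # step S7
        return "per_run", and_per_run(rda, rdb)
    return "per_row", and_per_row(rda, rdb, height)


rda = [(0, 1, 5), (1, 0, 3)]
rdb = [(0, 3, 8), (1, 4, 6)]
# Frequency is 2 / 16 = 0.125 for a hypothetical 2 x 8 image.
chosen_hi, result_hi = masked_and(rda, rdb, height=2, width=8, boundary=0.1)
chosen_lo, result_lo = masked_and(rda, rdb, height=2, width=8, boundary=0.5)
```

Both paths return the same run data; only the granularity of the parallel tasks differs, which is the point of switching on the index value.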

C. Defect Inspection Apparatus
FIG. 10 is a diagram showing an example of a defect inspection apparatus equipped with the data calculation device according to the second embodiment. The defect inspection apparatus 1 inspects a semiconductor substrate S to be inspected (hereinafter "substrate") for defects such as pinholes and foreign matter that appear on its exterior. The defect inspection apparatus 1 has an imaging device 2 that images an inspection target region on the substrate S, and a control device 3 that performs defect inspection based on image data from the imaging device 2.

In an inspection system equipped with the defect inspection apparatus 1, when a defect is found on the substrate S by a substrate detection apparatus 200 provided in the production line of the substrate S separately from the defect inspection apparatus 1, the position coordinates of that defect are given to the defect inspection apparatus 1. The substrate detection apparatus 200 incorporated in the production line inspects the entire substrate S with a predetermined processing algorithm and, if the substrate surface contains a region that meets the requirements for a defect, acquires and outputs its position coordinates. Accordingly, the imaging unit of the substrate detection apparatus 200 has a relatively low resolution, and its processing algorithm is fixed.

The defect inspection apparatus 1, on the other hand, is connected to the substrate detection apparatus 200 via an interface (not shown). It uses the higher-resolution imaging device 2 to image the region whose position coordinates the substrate detection apparatus 200 reported as a defect. The control device 3 then examines that image closely to determine in more detail whether a defect is present and what type it is, and displays an image of the defective portion on the display unit.

The imaging device 2 has an imaging unit 21 that acquires image data by imaging the inspection target region on the substrate S, a stage 22 that holds the substrate S, and a stage drive unit 23 that moves the stage 22 relative to the imaging unit 21. The imaging unit 21 has an illumination unit 211 that emits illumination light, an optical system 212 that guides the illumination light to the substrate S and receives light from the substrate S, and an imaging device 213 that converts the image of the substrate S formed by the optical system 212 into an electrical signal. The stage drive unit 23 consists of a ball screw, guide rails, and a motor. The device control unit 4 provided in the control device 3 controls the stage drive unit 23 and the imaging unit 21 so that the inspection target region on the substrate S is imaged.

The control device 3 has a device control unit 4, which executes a control program loaded in advance and thereby operates each part of the control device shown in FIG. 10 as follows. Besides the device control unit 4, the control device 3 includes an image acquisition unit 5 and an image processing unit 6. The image acquisition unit 5 digitizes the electrical signal output from the imaging unit 21 and acquires image data corresponding to the captured image. The image processing unit 6 applies appropriate image processing to the image data acquired by the image acquisition unit 5 to detect defects contained in the image and to create an image of the defective portion (hereinafter "defect image"). The image processing unit 6 includes a data calculation unit having the same configuration as the data calculation device 100 shown in FIG. 7, and can derive information on the defective portion by applying mask processing to an image extracted from the image captured by the imaging device 2 (the inspection target image). The configuration and operation of the image processing unit 6, in particular the data calculation unit, are described in detail below.

In addition, the control device 3 includes a storage unit 7 for storing various data, an input receiving unit 8 such as a keyboard and mouse for receiving operation input from the user, and a display unit 9 for displaying visual information for the user such as operating procedures and processing results. Although not illustrated, the control device 3 also has a reading device that reads information from a computer-readable recording medium such as an optical disk, magnetic disk, or magneto-optical disk, and a communication unit that exchanges signals with the other components of the defect inspection apparatus 1 is connected as appropriate, for example via an interface (I/F).

FIG. 11 is a block diagram showing the schematic configuration of the image processing unit of the defect inspection apparatus shown in FIG. 10. The image processing unit 6 has a filtering unit 61, a difference extraction unit 62, a binarization processing unit 63, and a data calculation unit 64. The filtering unit 61 receives a captured image from the image acquisition unit 5 and a reference image from the storage unit 7. Of these two images, the captured image is the image of the substrate S taken by the imaging device 2 and corresponds to the inspection target image that is subject to defect detection. The reference image corresponds to an ideal, defect-free substrate; in this embodiment, as described next, defects are detected in the inspection target image by comparing it with the reference image. These defect images and reference images are stored in the storage unit 7 and referred to as needed, although image data stored on an external storage medium may instead be read in as needed.

The filtering unit 61 applies, to each of the inspection target image and the reference image, filtering that removes image noise and minor image differences unrelated to defects, and sends each image to the difference extraction unit 62. The difference extraction unit 62 obtains the difference between the filtered inspection target image and the filtered reference image to extract the regions where the image content differs, and sends that difference image to the binarization processing unit 63 as the "extracted image" of the present invention. The binarization processing unit 63 binarizes the difference image with an appropriate threshold to generate binary image data Da and sends it to the data calculation unit 64. In addition to the binary image data Da of the difference image supplied in this way, binary image data Db of a mask image is sent from the storage unit 7 to the data calculation unit 64 as appropriate. As in the second embodiment (FIGS. 8 and 9), the data calculation unit 64 run-length encodes the binary image data Da to obtain the first run data group RDa as the "extracted run data group" of the present invention, and run-length encodes the binary image data Db to obtain the second run data group RDb as the "mask run data group" of the present invention. The data calculation unit 64 then calculates the logical AND of the two run data groups RDa and RDb. This removes portions other than the defective portion from the extracted image and yields run data representing good defect image data.
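The chain from difference extraction to the masked result can be sketched in a few lines. This is a serial, simplified illustration: the filtering step is omitted, the pixel values, threshold, and mask are made-up example data, the mask is assumed to hold 1 where a defect may be reported, and runs are assumed to be `(row, start_col, end_col)` tuples with an exclusive end column.

```python
def binarize_difference(inspected, reference, threshold):
    """Difference extraction followed by binarization (filtering omitted)."""
    return [[1 if abs(a - b) > threshold else 0 for a, b in zip(row_i, row_r)]
            for row_i, row_r in zip(inspected, reference)]


def run_length_encode(image):
    """Encode runs of 1-pixels as (row, start_col, end_col), end exclusive."""
    runs = []
    for y, row in enumerate(image):
        start = None
        for x, pixel in enumerate(row):
            if pixel and start is None:
                start = x
            elif not pixel and start is not None:
                runs.append((y, start, x))
                start = None
        if start is not None:
            runs.append((y, start, len(row)))
    return runs


def and_runs(extracted, mask):
    """Logical AND of the extracted run data group and the mask run data group."""
    out = []
    for y, s, e in extracted:
        for my, ms, me in mask:
            lo, hi = max(s, ms), min(e, me)
            if my == y and lo < hi:
                out.append((y, lo, hi))
    return sorted(out)


# Hypothetical grey-level images: two regions differ from the reference.
inspected = [[10, 10, 90, 90, 10, 10],
             [10, 80, 10, 10, 10, 10]]
reference = [[10] * 6, [10] * 6]
# Hypothetical mask image Db: only columns 3 to 5 are of interest.
mask_image = [[0, 0, 0, 1, 1, 1],
              [0, 0, 0, 1, 1, 1]]

da = binarize_difference(inspected, reference, threshold=30)
rda = run_length_encode(da)          # extracted run data group
rdb = run_length_encode(mask_image)  # mask run data group
defect_runs = and_runs(rda, rdb)
```

In this example the difference in row 1 lies entirely outside the mask and is dropped, while the run in row 0 is clipped to the part that overlaps the mask.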

As described above, a defect inspection apparatus equipped with the second embodiment of the data calculation device can perform the calculation for obtaining defect run data at high speed and can shorten the time required for defect inspection. The data calculation unit 64 may instead be configured with the same configuration as the data calculation device according to the first embodiment rather than the second embodiment.

Thus, the parallel calculation processing by the first thread processing units 111 corresponds to an example of the "first parallel calculation processing" and "first parallel calculation step" of the present invention, and the parallel calculation processing by the second thread processing units 112 corresponds to an example of the "second parallel calculation processing" and "second parallel calculation step" of the present invention. The parallel processing control unit 140 corresponds to an example of the "processing control unit" of the present invention, and the difference extraction unit 62 corresponds to an example of the "image extraction unit" of the present invention.

D. Others
The present invention is not limited to the embodiments described above, and various modifications other than those described can be made without departing from its spirit. For example, in the above embodiments each processor core 110 is configured to execute one thread, but each processor core 110 may be configured to execute a plurality of threads. Also, in the above embodiments a GPU having a plurality of processor cores 110 is used to execute the calculation processing (logical AND processing and logical OR processing), but a plurality of CPUs may be provided in place of the GPU, with each CPU executing one or more threads.

Also, in the above embodiments the binary image data Da and Db sent from an external device are run-length encoded by the run generation unit 130 in the device to generate a plurality of run data, but both or either of them may be acquired in the form of an already run-length-encoded run data group instead of binary image data.

Also, in the above embodiments logical AND processing and logical OR processing are performed as the calculation processing, but the present invention can be applied to a data calculation device or method that performs only one of these, or that performs other calculation processing. Examples of such other calculation processing include exclusive OR, labeling-related processing (area calculation, circumscribed rectangle calculation, centroid calculation, and the like), dilation and erosion processing, coordinate inversion processing, rotation processing, and run inversion processing.
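As an illustration of the labeling-related calculations mentioned above, area, circumscribed rectangle, and centroid can all be computed directly from run data without decoding it back to pixels. The sketch below is a serial example under assumptions: runs are `(row, start_col, end_col)` tuples with an exclusive end column, all runs passed in are taken to belong to one labeled region, and the function names are hypothetical.

```python
def area(runs):
    """Pixel count of a region: the sum of the run lengths."""
    return sum(end - start for _, start, end in runs)


def circumscribed_rect(runs):
    """Bounding box as (top, left, bottom, right), all inclusive."""
    top = min(y for y, _, _ in runs)
    bottom = max(y for y, _, _ in runs)
    left = min(start for _, start, _ in runs)
    right = max(end for _, _, end in runs) - 1
    return top, left, bottom, right


def centroid(runs):
    """Centre of gravity (cx, cy), weighting each run by its length."""
    total = area(runs)
    # The mean column of a run [start, end) is (start + end - 1) / 2.
    cx = sum((s + e - 1) / 2 * (e - s) for _, s, e in runs) / total
    cy = sum(y * (e - s) for y, s, e in runs) / total
    return cx, cy


region = [(0, 2, 4), (1, 1, 5)]   # hypothetical run data of one region
region_area = area(region)
region_rect = circumscribed_rect(region)
region_centroid = centroid(region)
```

Each quantity is a sum, minimum, or maximum over per-run terms, so the per-run terms could be computed by separate threads and then reduced; that parallel variant is not shown here.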

Also, in the second embodiment the regions obtained by dividing row by row are used as the "data areas", but the "data areas" are not limited to this; for example, regions obtained by dividing column by column may be used as the "data areas".

Furthermore, in the above embodiments the data calculation technique that performs logical operations on a plurality of run data groups is applied to a defect inspection apparatus, but the application of the data calculation method and device according to the present invention is not limited to this; they can be applied to any apparatus that uses the above data calculation technique.

The present invention can be suitably applied to data calculation techniques that perform calculation processing on run data.

1 ... defect inspection apparatus,
5 ... image acquisition unit,
6 ... image processing unit,
62 ... difference extraction unit (image extraction unit),
64 ... data calculation unit (data calculation device),
100 ... data calculation device,
110 ... processor core,
111 ... first thread processing unit,
112 ... second thread processing unit,
130 ... run generation unit,
140 ... parallel processing control unit,
160 ... occurrence frequency calculation unit,
Da, Db ... binary image data,
ra, rb ... run data,
RDa ... first run data group,
RDb ... second run data group,
S ... substrate

Claims

A data calculation device that performs calculation processing on a run data group composed of a plurality of run data obtained by run-length encoding binary image data, the device comprising:
a plurality of first thread processing units, each of which performs the calculation processing on a single run data; and
a processing control unit capable of executing first parallel calculation processing that causes mutually different first thread processing units to perform the calculation in parallel on the respective run data constituting the run data group.

The data calculation device according to claim 1, further comprising
a plurality of second thread processing units that perform the calculation processing on each of data areas formed by dividing the plurality of run data,
wherein the processing control unit is capable of executing second parallel calculation processing that causes mutually different second thread processing units to perform the calculation in parallel on the respective data areas, and executes one of the first parallel calculation processing and the second parallel calculation processing according to an index value related to the parallelism of calculation in the first parallel calculation processing.

The data calculation device according to claim 2,
wherein the index value is the occurrence frequency of the run data generated by run-length encoding the binary image data.

The data calculation device according to claim 3,
further comprising an occurrence frequency calculation unit that calculates the occurrence frequency based on the number of elements of the run data constituting the run data group and the image size of the binary image data.

The data calculation device according to any one of claims 1 to 4,
wherein the binary image data is image data of pixels in p rows × q columns, and
the number M of the first thread processing units is
M = (q / 2) × p when q is an even number, and
M = ((q / 2) + 1) × p when q is an odd number.

The data calculation device according to claim 5,
wherein the plurality of run data are divided row by row or column by column and grouped into the plurality of data areas.

A data calculation method for performing calculation processing on a plurality of run data obtained by run-length encoding binary image data,
the method comprising a first parallel calculation step of performing, in parallel, first thread processing that performs the calculation processing for each run data.

The data calculation method according to claim 7,
further comprising a second parallel calculation step of dividing the plurality of run data into a plurality of groups and performing, in parallel, second thread processing that performs the calculation processing for each group,
wherein one of the first parallel calculation step and the second parallel calculation step is executed according to an index value related to the parallelism of calculation in the first parallel calculation step.

A defect inspection apparatus comprising:
an image acquisition unit that acquires an inspection target image;
an image extraction unit that inspects the inspection target image and extracts an extracted image including a defective portion; and
a data calculation unit that obtains defect image data, in which portions of the extracted image other than the defective portion are masked by a mask image, by performing a logical operation between an extracted run data group having a plurality of run data obtained by run-length encoding the extracted image and a mask run data group having a plurality of run data obtained by run-length encoding the mask image,
wherein the data calculation unit includes:
a plurality of first thread processing units, each of which performs the logical operation on a single run data; and
a processing control unit capable of executing first parallel calculation processing that causes mutually different first thread processing units to perform the operation in parallel on the respective run data constituting the extracted run data group.