JP7293813B2

JP7293813B2 - Semiconductor device

Info

Publication number: JP7293813B2
Application number: JP2019070146A
Authority: JP
Inventors: 康智桜井
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2019-04-01
Filing date: 2019-04-01
Publication date: 2023-06-20
Anticipated expiration: 2039-04-01
Also published as: JP2020170577A

Description

The present invention relates to a semiconductor device.

When testing a memory that outputs serial data in synchronization with a clock, the test time of a timing margin test can be shortened by changing the clock phase among the first data, the intermediate data, and the last data (see, for example, Patent Document 1).

A memory controller with a memory monitoring function sequentially reads data from the memory in response to a monitoring start instruction, determines whether an error is present and whether the error is correctable, and, if it is correctable, corrects the data and rewrites it to the memory (see, for example, Patent Document 2).

In a nonvolatile memory having a redundant area for repairing defective memory cells, storing the defective addresses used for repair in the cell array, instead of programming them into fuses, reduces the chip area and improves the yield (see, for example, Patent Document 3).

Patent Document 1: Japanese Laid-Open Patent Publication No. H06-36598 (JP-A-6-36598)
Patent Document 2: International Publication No. WO 2006/100776
Patent Document 3: Japanese Laid-Open Patent Publication No. 2000-260198

In recent years, the amount of data handled by processors such as CPUs (Central Processing Units) has tended to increase, and so has the storage capacity of the memory accessed by the processor. To suppress the increase in memory latency that accompanies larger storage capacity, stacked memories, in which a plurality of memory chips are stacked to reduce the load capacitance of the wiring, have come into use.

However, in a stacked memory, a plurality of memory chips are integrated by TSVs (Through-Silicon Vias) or the like, and the integrated memory chips are further connected to a substrate such as a silicon interposer using bumps. It is therefore difficult to replace only the memory chip in which a defect has occurred. In addition, a defect in a memory chip can cause a serious failure such as a system stop, so it is preferable to detect the possibility of a defect in advance and take preventive measures.

In one aspect, an object of the present invention is to suppress failure of a stacked memory by detecting a memory block in which a defect may occur and stopping the use of that memory block.

According to one aspect, a semiconductor device includes: a stacked memory including a plurality of stacked memories; a test unit that executes a test with a reduced read margin for read data read from each of a plurality of memory blocks included in the plurality of memories; a first counter unit that holds, for each memory block, the number of times an error has occurred in the test by the test unit; an error detection/correction unit that corrects correctable errors occurring in the stacked memory; a second counter unit that holds, for each memory block, the number of error corrections performed by the error detection/correction unit; and a memory management unit that prohibits the use of a memory block for which the number of times held by the first counter unit exceeds a predetermined first threshold, and prohibits the use of a memory block for which the number of error corrections held by the second counter unit exceeds a predetermined second threshold.

In one aspect, the present invention can suppress failure of a stacked memory by detecting a memory block in which a defect may occur and stopping the use of that memory block.

FIG. 1 is a diagram showing an example of a semiconductor device according to one embodiment.
FIG. 2 is a diagram showing an example of a semiconductor device according to another embodiment.
FIG. 3 is a diagram showing an example of a main part of the test unit in FIG. 2.
FIG. 4 is a diagram showing an example of the set signal generated by the test unit in FIG. 3.
FIG. 5 is a diagram showing an example of a read test performed by the test unit in FIG. 3.
FIG. 6 is a diagram showing an example of the flow of user program processing executed by the CPU in FIG. 2.
FIG. 7 is a diagram showing an example of the memory patrol startup flow executed by the CPU in FIG. 2.
FIG. 8 is a diagram showing an example of the memory patrol processing flow executed by the CPU in FIG. 2.
FIG. 9 is a diagram showing an example of the processing flow, executed by the CPU in FIG. 2, for prohibiting the use of a page.
FIG. 10 is a diagram showing an example of a package in which the semiconductor device of FIG. 2 is mounted.

Embodiments are described below with reference to the drawings.

FIG. 1 shows an example of a semiconductor device according to one embodiment. The semiconductor device 100 shown in FIG. 1 has a stacked memory MEM including a plurality of stacked memory chips 10, a test unit 12, a first counter unit 14, and a memory management unit 16. The memory chip 10 is an example of a memory. Each memory chip 10 has, for example, a plurality of memory blocks 10a (memory areas), each of which is a unit of allocation to a user program. The test unit 12 executes a test with a reduced read margin for the read data read from each memory block 10a based on the address output to the stacked memory MEM. For example, the test unit 12 reduces the read margin by making the timing at which the read data is captured earlier than the standard timing.

The first counter unit 14 has a plurality of counter areas that hold, for each memory block 10a, the number of times an error has occurred in the read test by the test unit 12. Although FIG. 1 depicts eight counter areas, the first counter unit 14 actually has counter areas corresponding to all the memory blocks 10a of the four memory chips 10. The test unit 12 updates the counter value held in the counter area corresponding to a memory block 10a in which an error occurred in the read test with the reduced read margin. The memory management unit 16 prohibits the use of a memory block 10a for which the number of times (counter value) held by the first counter unit 14 exceeds a predetermined first threshold.

Because the test unit 12 runs the test with a reduced read margin, a memory block 10a in which an error occurred in the test still operates normally as long as the read margin is not reduced. Nevertheless, the read margin of such a memory block 10a has decreased. A decrease in read margin can be caused by soft (transient) factors such as α-ray incidence and power supply noise, but it can also be caused by degradation of characteristics inside the memory, such as a rise in transistor threshold voltage or an increase in the resistance of internal wiring or plug portions.

If the decrease in read margin is due to degradation of characteristics inside the memory, the memory block 10a in which the error occurred may become defective in the future. For this reason, it is preferable to regard a memory block 10a in which errors due to insufficient read margin have exceeded a predetermined count (the first threshold) as showing a sign of failure. In other words, regardless of whether the cause of the errors is a soft factor or a change in characteristics inside the memory, prohibiting the use of the block is preferable in order to maintain the reliability of the semiconductor device 100.

In this embodiment, prohibiting the use of a memory block 10a whose read margin has decreased prevents continued access to a memory block 10a with insufficient read margin, and thus prevents the memory block 10a from becoming defective. As a result, the reliability of the semiconductor device 100 can be maintained.

Even when the stacked memory MEM including the prohibited memory block 10a continues to be used, the memory block 10a in which a defect may occur is not accessed, so no defect arises in the stacked memory MEM. Consequently, maintenance to replace a defective stacked memory MEM is not needed, maintainability improves, and the operating cost of a system including the semiconductor device 100 can be reduced. For example, by catching a sign of a defect before an error occurs, the stacked memory MEM can be replaced with ample time before an error brings the system down.

FIG. 2 shows an example of a semiconductor device according to another embodiment. Elements similar to those in FIG. 1 are denoted by the same reference numerals, and their detailed description is omitted. The semiconductor device 102 shown in FIG. 2 has a stacked memory MEM similar to that of FIG. 1 and a CPU. The CPU is an example of an arithmetic processing unit. Although not particularly limited, the stacked memory MEM may be, for example, an HBM (High Bandwidth Memory). The semiconductor device 102 may have another type of processor instead of the CPU. Because the configuration of the stacked memory MEM is the same as in FIG. 1, its description is omitted.

The CPU has a test unit 12, a first counter unit 14, a memory management unit 17, an output buffer 20, an error detection/correction unit 22, and a second counter unit 24. The output buffer 20 latches the read data read from the stacked memory MEM in synchronization with the set signal setX, and outputs the latched read data to the test unit 12 and the error detection/correction unit 22.

As described with reference to FIG. 1, the test unit 12 executes a test with a reduced read margin for the read data read from each memory block 10a. The test unit 12 reduces the read margin of the read data by generating a set signal setX whose transition edge is shifted relative to the set signal set. The test unit 12 updates (for example, increments) the counter value held in the counter area of the first counter unit 14 corresponding to a memory block 10a in which an error occurred in the read test with the reduced read margin. Counter areas are provided for all the memory blocks 10a of the memory chips 10.

The error detection/correction unit 22 detects errors in the read data from the output buffer 20 using, for example, an error detection/correction code, corrects any correctable error, and writes the corrected data back to the memory chip 10. Like the first counter unit 14, the second counter unit 24 has counter areas corresponding to the respective memory blocks 10a. When it detects an error, the error detection/correction unit 22 updates (for example, increments) the counter value held in the counter area of the second counter unit 24 for the corresponding memory block 10a.

Like the memory management unit 16 shown in FIG. 1, the memory management unit 17 has a function of prohibiting the use of a memory block 10a for which the number of times held by the first counter unit 14 exceeds a predetermined first threshold. In addition, the memory management unit 17 has a function of prohibiting the use of a memory block 10a for which the number of times held by the second counter unit 24 exceeds a predetermined second threshold. The first and second thresholds may be the same value, or the second threshold may be greater than the first. In the following description, the memory block 10a is also referred to as a page 10a. A page 10a has the minimum memory size that the CPU can allocate to a user program, for example 4 kB or 64 kB.
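The two-threshold rule applied by the memory management unit 17 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `BlockCounters` and `blocks_to_prohibit` and the threshold values are hypothetical and do not appear in the patent, and "exceeds" is read as a strict comparison.

```python
from dataclasses import dataclass


@dataclass
class BlockCounters:
    """Per-block counters (hypothetical names for counter units 14 and 24)."""
    margin_test_errors: int = 0  # first counter unit 14: reduced-margin test errors
    ecc_corrections: int = 0     # second counter unit 24: corrected errors


def blocks_to_prohibit(counters, first_threshold, second_threshold):
    """Return indices of blocks whose use should be prohibited.

    A block is prohibited when either counter strictly exceeds its threshold.
    """
    return [
        index
        for index, c in enumerate(counters)
        if c.margin_test_errors > first_threshold
        or c.ecc_corrections > second_threshold
    ]


# Illustrative values only; the patent does not give concrete thresholds.
counters = [
    BlockCounters(0, 0),
    BlockCounters(4, 0),  # exceeds the first threshold (3)
    BlockCounters(3, 5),  # equals the thresholds, so it is still allowed
    BlockCounters(0, 6),  # exceeds the second threshold (5)
]
prohibited = blocks_to_prohibit(counters, first_threshold=3, second_threshold=5)
print(prohibited)  # [1, 3]
```

Keeping the two counters separate lets the second threshold be set higher than the first, which the description explicitly allows.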

FIG. 3 shows an example of a main part of the test unit 12 of FIG. 2. The test unit 12 includes a register 32 that holds a selection number, a register unit 34 that holds a plurality of selection values sel (sel0, sel1, ..., seln; n is, for example, 5) corresponding to the respective selection numbers, a delay adjustment circuit 36, and a multiplexer 38. For example, the selection number indicates one of the areas of the register unit 34 in which the selection values sel are held. The register unit 34 outputs to the multiplexer 38 the selection value sel held in the area corresponding to the value of the selection number.

The delay adjustment circuit 36 has an inverter that inverts the logic of the set signal set and a plurality of buffers that successively delay the output signal of the inverter. The delay adjustment circuit 36 outputs to the multiplexer 38 the undelayed set signal set and a plurality of signals obtained by inverting the logic of the set signal set and successively delaying it.

The multiplexer 38 selects the signal (the set signal set, or a signal obtained by inverting and delaying the set signal set) that corresponds to the selection value sel output from the register unit 34 according to the selection number held in the register 32, and outputs the selected signal as the set signal setX. For example, when the multiplexer 38 receives the selection value sel0, it selects the undelayed set signal set; when it receives the selection value sel1, it selects the inverted set signal set delayed by one buffer stage. Likewise, when it receives the selection value sel2, it selects the inverted set signal set delayed by two buffer stages, and when it receives the selection value seln, it selects the inverted set signal set delayed by the maximum number of buffer stages. With the simple logic circuit of FIG. 3, the timing of the set signal setX used as the latch timing of the output buffer 20 can be generated easily.

FIG. 4 shows an example of the set signal setX generated by the test unit 12 of FIG. 3. For example, when the selection value sel0 is selected, a set signal setX delayed from the set signal set by the delay time of the multiplexer 38 is output. When any of the selection values sel1 to seln is selected, a set signal setX is output that has the inverted logic of the set signal set and is delayed from the set signal set by the delay time of the corresponding buffers.

The delay of the set signal setX relative to the set signal set increases as the number of the selection value sel increases. As a result, the phase of the rising edge of the set signal setX, which is the latch timing of the output buffer 20 shown in FIG. 2, is latest for the selection value sel0 and, among the selection values sel1 to seln, is earlier the smaller the number.
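The ordering of the latch edges can be checked with a small timing model. The sketch below assumes a set signal with a 50% duty cycle, a rising edge at time 0, and uniform buffer delays; the function name `latch_edge_ps` and all numeric values are hypothetical and chosen only for illustration. Under these assumptions, the rising edge of the inverted signal coincides with the falling edge of set at half a period, which is why sel1 to seln latch earlier than sel0.

```python
def latch_edge_ps(sel, period_ps, buf_delay_ps, mux_delay_ps=0):
    """Time (ps) of the first setX rising edge after the set rising edge at t = 0.

    sel == 0: setX is set itself, so the next rising edge is one period later.
    sel >= 1: setX is inverted set delayed by `sel` buffer stages, so its rising
              edge follows the falling edge of set (at period / 2, assuming a
              50% duty cycle) by sel * buf_delay_ps.
    """
    if sel == 0:
        return period_ps + mux_delay_ps
    return period_ps // 2 + sel * buf_delay_ps + mux_delay_ps


# Illustrative numbers: 1 ns period, 50 ps per buffer, 20 ps multiplexer delay.
edges = [latch_edge_ps(k, 1000, 50, 20) for k in range(6)]
print(edges)  # [1020, 570, 620, 670, 720, 770]
```

With these numbers sel0 gives the latest latch edge, and among sel1 to sel5 a smaller number gives an earlier edge, that is, a smaller read margin.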

FIG. 5 shows an example of a read test performed by the test unit 12 of FIG. 3. The test unit 12 executes a normal read test so that the error detection/correction unit 22 can correct errors, and executes a read test with a reduced read margin to determine whether the read margin has degraded.

For example, the normal read test and the read test with a reduced read margin are executed periodically on all memory areas (memory cells) of all pages 10a. The read tests shown in FIG. 5 are executed in the background of the user program during user program processing. That is, the read tests shown in FIG. 5 are executed repeatedly by what is called memory patrol.

In a read test, the test unit 12 outputs a read address to the stacked memory MEM in synchronization with the rising edge of the normal-mode set signal setX. For example, the output period of the read address is equal to the period of the set signal setX, and the period of the set signal setX is the read cycle.

Based on the read address, the stacked memory MEM accesses the memory area indicated by the read address and outputs the read data from the accessed memory area. The output buffer 20 latches the read data in synchronization with the rising edge of the set signal setX.

In a normal read test, in which the read margin is not reduced, the time from the output of the read data to the rising edge of the set signal setX is long enough for the output buffer 20 to latch the read data with a predetermined margin (setup time).

In contrast, in a read test with a reduced read margin, the time from the output of the read data to the rising edge of the set signal setX is shorter than in a normal read test. The timing margin with which the output buffer 20 latches the read data is therefore reduced.

Note that the timing of the rising edge of the set signal setX in the read test with a reduced read margin is set so that the output buffer 20 can still latch the read data output from a memory chip 10 whose read characteristics have not degraded. A memory chip 10 whose read characteristics have not degraded therefore passes the read test with a reduced read margin.

For example, in a test performed before the stacked memory MEM is mounted on the semiconductor device 102, the output timing of the set signal setX at which the read data output from a normal memory chip 10 can be latched into the output buffer 20 is evaluated. The selection number corresponding to the output timing of the set signal setX determined by this evaluation is then decided. In the read test with a reduced read margin, the test unit 12 sets the selection number decided by the evaluation in the register 32.

Suppose, for example, that in a memory chip 10 the output timing of the read data is delayed by a rise in transistor threshold voltage or a local increase in the resistance of internal wiring or plug portions. In this case, in the read test with a reduced read margin, the output buffer 20 may fail to latch the read data and an error may occur. When an error occurs, the test unit 12 increments the counter value in the counter area of the first counter unit 14 corresponding to the page 10a that output the read data. Then, as described later, the memory management unit 17 prohibits the use of a page 10a for which the number of times held by the first counter unit 14 exceeds the predetermined first threshold.
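The pass/fail condition of the two read tests can be illustrated with a simple setup-time check. This is a sketch under assumptions, not the circuit behavior described in the patent: the function names, the per-page data-valid times, the latch edge times, and the setup time are all hypothetical values, and a read is treated as failing whenever the data becomes valid less than one setup time before the latch edge.

```python
def latch_succeeds(data_valid_ps, latch_edge_ps, setup_ps):
    """True if the data is stable at least `setup_ps` before the latch edge."""
    return latch_edge_ps - data_valid_ps >= setup_ps


def failing_pages(page_data_valid_ps, latch_edge_ps, setup_ps):
    """Return the pages whose read data cannot be latched at the given edge."""
    return [
        page
        for page, data_valid_ps in enumerate(page_data_valid_ps)
        if not latch_succeeds(data_valid_ps, latch_edge_ps, setup_ps)
    ]


# Hypothetical per-page data output times; page 2 has slowed down with age.
data_valid_ps = [400, 420, 640, 410]
first_counter = [0, 0, 0, 0]  # stands in for the first counter unit 14

normal_failures = failing_pages(data_valid_ps, latch_edge_ps=1020, setup_ps=100)
reduced_failures = failing_pages(data_valid_ps, latch_edge_ps=720, setup_ps=100)

# Only errors from the reduced-margin test are counted in the first counter.
for page in reduced_failures:
    first_counter[page] += 1

print(normal_failures, reduced_failures, first_counter)  # [] [2] [0, 0, 1, 0]
```

In this toy run the slow page still passes the normal read test, so the degradation is visible only when the latch edge is moved earlier, which is the point of the reduced-margin test.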

FIG. 6 shows an example of the flow of user program processing executed by the CPU in FIG. 2. For example, the semiconductor device 102 employs a virtual memory system. Each page 10a of the stacked memory MEM allocated to the physical address space is therefore linked to the logical address space (virtual address space) by the logical address information stored in a page table PTBL that manages the usage status of the memory areas. For example, the page table PTBL has an entry for each page 10a. Each entry has fields for storing information such as a failure flag, a valid flag, access permission, access restrictions, and the correspondence with a logical address. Each entry also includes the first counter unit 14 and the second counter unit 24 of FIG. 2.

The failure flag is set when the test unit 12 determines that the read margin of the page 10a has degraded. The valid flag is set when the page 10a is used by a user program or the OS (Operating System). The access permission field is set when use of the page 10a by a user program is permitted. The access restriction field stores information such as whether the page is read-only or both readable and writable. The field indicating the correspondence with a logical address stores, for example, a predetermined number of high-order bits of the logical address to be linked.

In the processing flow shown in FIG. 6, first, in step S10, the OS executed by the CPU secures the memory area to be used by the user program before starting user program processing. At this time, the memory area is secured so as to avoid pages 10a whose failure flag is set. Next, in step S11, the OS updates the page table PTBL in accordance with the secured memory area and determines the pages 10a to be used by the user program. Next, in step S12, user program processing is started using the secured pages 10a.

When user program processing ends, in step S13 the OS sets the failure flag based on the counter value of the first counter unit 14 and the counter value of the second counter unit 24 corresponding to each page 10a used by the user program. The counter values of the first counter unit 14 and the second counter unit 24 are updated by the memory patrol that the test unit 12 executes during user program processing.

Note that the failure flag may be set during user program processing, or after the initialization of the page table PTBL in step S15. In subsequent user program processing flows, the OS does not secure a page 10a whose failure flag is set as a memory area.

Next, in step S14, the OS releases the memory area that was used by the user program. Next, in step S15, the OS initializes the entries of the page table PTBL corresponding to the pages 10a that had been allocated for use by the user program, completing the series of operations for user program processing. Because the failure flag is normally not cleared by the initialization of an entry, a failure flag that has been set is prevented from being reset.
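The allocation flow of steps S10 to S15 can be sketched as follows. This is a minimal model under assumptions: `PageEntry`, `allocate`, `mark_failures`, and `release` are hypothetical names, the entry holds only a subset of the fields listed above, and the threshold values are illustrative. The one behavior taken directly from the description is that allocation skips pages whose failure flag is set and that entry initialization leaves that flag untouched.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageEntry:
    """Simplified page table entry (subset of the fields described for PTBL)."""
    failure: bool = False
    valid: bool = False
    logical_page: Optional[int] = None
    margin_test_errors: int = 0  # first counter unit 14
    ecc_corrections: int = 0     # second counter unit 24


def allocate(table, n_pages, logical_base):
    """S10/S11: pick free pages, avoiding any whose failure flag is set."""
    free = [i for i, e in enumerate(table) if not e.failure and not e.valid]
    if len(free) < n_pages:
        raise MemoryError("not enough usable pages")
    chosen = free[:n_pages]
    for offset, i in enumerate(chosen):
        table[i].valid = True
        table[i].logical_page = logical_base + offset
    return chosen


def mark_failures(table, pages, first_threshold, second_threshold):
    """S13: set the failure flag on pages whose counters exceed a threshold."""
    for i in pages:
        e = table[i]
        if e.margin_test_errors > first_threshold or e.ecc_corrections > second_threshold:
            e.failure = True


def release(table, pages):
    """S14/S15: free the pages and initialize entries; the failure flag is kept."""
    for i in pages:
        table[i].valid = False
        table[i].logical_page = None


table = [PageEntry() for _ in range(4)]
first_run = allocate(table, 2, logical_base=0x100)   # pages 0 and 1
table[1].margin_test_errors = 4                      # updated by memory patrol
mark_failures(table, first_run, first_threshold=3, second_threshold=5)
release(table, first_run)
second_run = allocate(table, 2, logical_base=0x200)  # page 1 is now skipped
print(first_run, second_run)  # [0, 1] [0, 2]
```

Whether the counters are also cleared at entry initialization is not stated in this part of the description, so the sketch leaves them unchanged.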

FIG. 7 shows an example of the memory patrol startup flow executed by the CPU in FIG. 2. First, in step S20, the OS instructs the start of memory patrol. In step S21, the CPU starts memory patrol based on the start instruction from the OS.

In step S22, upon the start of memory patrol, the memory access controller mounted in the CPU outputs to the stacked memory MEM a read access request, including an address and control signals, in order to read data from the stacked memory MEM. In step S23, the stacked memory MEM executes a read operation that reads data from the target page 10a based on the read access request.

Thereafter, steps S22 and S23 are executed repeatedly, so that memory patrol is executed repeatedly while the access area is changed. An example of memory patrol processing is shown in FIG. 8. To stop memory patrol, the OS instructs the CPU to stop, and the CPU stops memory patrol. Because the OS only issues the instructions to start and stop memory patrol, the load that memory patrol places on the OS is small. In addition, because control of the stacked memory MEM during memory patrol is performed by the memory access controller, the load that memory patrol places on the CPU is also small.

FIG. 8 shows an example of the memory patrol processing flow executed by the CPU in FIG. 2. Steps S30, S31, S32, and S33 show memory patrol processing by the normal read test shown in FIG. 5. Steps S34, S35, S36, and S37 show memory patrol processing by the read test with a reduced read margin shown in FIG. 5. For example, in memory patrol, 1-bit error correction or 2-bit error detection is performed using an error detection/correction code.

First, in step S30, the CPU issues a read access request to the stacked memory MEM and receives the read data output from the stacked memory MEM. Before issuing the read access request, the CPU (test unit 12) sets the selection number for selecting the selection value sel0 in the register 32 (FIG. 3). The output buffer 20 therefore latches the read data in synchronization with the set signal setX generated from the undelayed set signal set.

Next, in step S31, if the error detection and correction unit 22 has detected an error in the read data, the CPU proceeds to step S32; if no error has been detected, it proceeds to step S34. To simplify the description, this example assumes that no uncorrectable error occurs.

In step S32, the error detection and correction unit 22 corrects the error in the read data and writes the corrected data back to the stacked memory MEM. Next, in step S33, the error detection and correction unit 22 increments, in the second counter unit 24, the counter value of the counter area corresponding to the page 10a in which the error was detected, and the processing proceeds to step S34.

In step S34, to set the set signal setX to a timing that reduces the read margin, the CPU (test unit 12) rewrites the register 32, which holds the selection number for the standard state, with a selection number for reducing the read margin. Providing the register 32 to store the selection number makes it easy to adjust the timing of the set signal setX.

In step S35, the CPU issues a read access request to the stacked memory MEM and receives the read data output from the stacked memory MEM. The output buffer 20 latches the read data in synchronization with the set signal setX, which is generated from a signal obtained by inverting the set signal set and delaying it by a predetermined amount.

Next, in step S36, if the test unit 12 detects an error in the read data, the processing proceeds to step S37; if no error is detected, the processing ends. The error detected in step S36 may be a single-bit error or a multi-bit error.

In step S37, the counter value of the counter area in the first counter unit 14 corresponding to the page 10a in which the error was detected is incremented, and the processing ends.
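The flow of steps S30 to S37 can be sketched in Python as follows. This is only an illustration under stated assumptions: the callbacks `read_normal`, `read_reduced_margin`, and `write_back` are hypothetical stand-ins for the hardware read paths and the write-back path, and the two dictionaries stand in for the per-page counter areas of the first counter unit 14 and the second counter unit 24.

```python
from collections import defaultdict


def patrol_page(page, read_normal, read_reduced_margin, write_back,
                first_counter, second_counter):
    """Run one patrol pass over a single page, following steps S30 to S37."""
    # S30: read at the standard latch timing. The hypothetical callback returns
    # the (already corrected) data and whether an error was corrected.
    data, corrected = read_normal(page)
    if corrected:                        # S31: error detected at normal timing
        write_back(page, data)           # S32: write the corrected data back
        second_counter[page] += 1        # S33: count the correction for this page
    # S34/S35: switch to the reduced-margin latch timing and read again.
    _, error = read_reduced_margin(page)
    if error:                            # S36: error under reduced read margin
        first_counter[page] += 1         # S37: count the error for this page


# Usage with stub callbacks: page 7 has a correctable error at normal timing
# and no error in the reduced-margin read.
first_counter = defaultdict(int)
second_counter = defaultdict(int)
written = []
patrol_page(
    page=7,
    read_normal=lambda p: (0xAB, True),
    read_reduced_margin=lambda p: (0xAB, False),
    write_back=lambda p, d: written.append((p, d)),
    first_counter=first_counter,
    second_counter=second_counter,
)
```

Keeping the two counters separate, as in the description, lets the later evaluation apply a different threshold to each kind of error.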

In this embodiment, the number of errors is counted separately for the normal read test and for the read test with a reduced read margin. As a result, a sign of abnormality can be detected even when an error other than one caused by an insufficient read margin occurs (for example, an error caused by variation in characteristics inside the memory), and the page 10a in which the error occurred can be degraded (taken offline) before it becomes defective.

FIG. 9 shows an example of the processing flow, executed by the CPU of FIG. 2, for prohibiting the use of a page. The processing shown in FIG. 9 corresponds, for example, to step S13 of FIG. 6. It is executed by the memory management unit 17 for each page 10a.

First, in step S40, the memory management unit 17 reads the counter value from the counter area of the second counter unit 24 corresponding to the target page 10a. In step S41, if the read counter value exceeds the second threshold, the memory management unit 17 determines that there is a sign of some abnormality, such as deterioration, and moves the processing to step S44. If the read counter value does not exceed the second threshold, the memory management unit 17 moves the processing to step S42.

In step S42, the memory management unit 17 reads the counter value from the counter area of the first counter unit 14 corresponding to the target page 10a. In step S43, if the read counter value exceeds the first threshold, the memory management unit 17 determines that there is a sign of some abnormality, such as deterioration, and moves the processing to step S44. If the read counter value does not exceed the first threshold, the memory management unit 17 moves the processing to step S45.

In step S44, the memory management unit 17 sets the failure flag of the entry corresponding to the target page 10a in the page table PTBL, and ends the processing. That is, when one or both of the evaluated counter values of the first and second counter units 14 and 24 exceed their respective thresholds, a degradation reservation is made to stop the use of the page 10a concerned. In this embodiment, the storage area managed by the OS is allocated in units of pages 10a, so the degradation reservation is a page-offline operation. Setting the failure flag prohibits user programs from using the page 10a corresponding to that flag. After step S44, the processing of step S45 may be executed instead of ending the processing.

In step S45, the memory management unit 17 clears (for example, to "0") the counter value of the first counter unit 14 corresponding to the target page 10a and the counter value of the second counter unit 24 corresponding to the target page 10a, and ends the processing.
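The per-page evaluation of steps S40 to S45 can be sketched as follows. This is a simplified illustration, not the patented implementation: the dictionaries stand in for the counter areas, the set `failure_flags` stands in for the failure flags in the page table PTBL, and the function name and the threshold values passed in the usage lines are assumptions made for the example.

```python
def evaluate_page(page, first_counter, second_counter, failure_flags,
                  first_threshold, second_threshold):
    """Decide whether to reserve a page for degradation, following steps S40 to S45."""
    # S40/S41: check the count of corrections made at normal read timing.
    exceeded = second_counter.get(page, 0) > second_threshold
    if not exceeded:
        # S42/S43: check the count of errors from the reduced-margin test.
        exceeded = first_counter.get(page, 0) > first_threshold
    if exceeded:
        failure_flags.add(page)          # S44: set the failure flag for this page
        return True                      # base flow ends here without clearing
    first_counter[page] = 0              # S45: clear both counters for this page
    second_counter[page] = 0
    return False


# Usage: page 1 exceeds the first threshold, page 2 exceeds the second threshold,
# and page 0 exceeds neither, so only its counters are cleared.
first_counter = {0: 2, 1: 11, 2: 0}
second_counter = {0: 1, 1: 0, 2: 4}
failure_flags = set()
for page in (0, 1, 2):
    evaluate_page(page, first_counter, second_counter, failure_flags,
                  first_threshold=10, second_threshold=3)
```

The sketch follows the base flow, in which the processing ends after step S44; the variant in which step S45 also runs after step S44 would clear the counters in both branches.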

The counter value of the first counter unit 14 and the counter value of the second counter unit 24 may be cleared at regular intervals. For example, the counter values may be cleared once a day, or when a swap of the page 10a occurs. This makes it possible to determine whether a counter value exceeded its threshold because soft errors accumulated over a long period or because the error frequency increased owing to a sign of abnormality such as deterioration.
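The effect of periodic clearing can be illustrated with a small simulation. The function name `flagged_days`, the daily granularity, and the error sequences are assumptions made for this example and do not come from the description.

```python
def flagged_days(errors_per_day, threshold, clear_daily):
    """Return the day indices on which the running count exceeds the threshold."""
    count = 0
    flagged = []
    for day, errors in enumerate(errors_per_day):
        if clear_daily:
            count = 0                    # counter cleared at the start of each day
        count += errors
        if count > threshold:
            flagged.append(day)
    return flagged


slow = [1] * 10          # one error per day, accumulating slowly
burst = [0, 0, 5, 0]     # a sudden rise in error frequency on day 2

slow_without_clear = flagged_days(slow, threshold=3, clear_daily=False)
slow_with_clear = flagged_days(slow, threshold=3, clear_daily=True)
burst_with_clear = flagged_days(burst, threshold=3, clear_daily=True)
```

Without clearing, the slow sequence eventually crosses the threshold even though the error rate never rises. With daily clearing, only the burst crosses it, so a threshold crossing then points to an increased error frequency rather than long-term accumulation.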

For example, the first threshold is "10" and the second threshold is "3"; that is, the first threshold is larger than the second threshold. The read test with a reduced read margin is more prone to errors than the normal read test, so if the first and second thresholds were set to the same value, the failure flag in the flow of FIG. 9 would most likely always be set because the first threshold was exceeded. Taking the error frequency into account and making the first threshold larger than the second threshold therefore makes the frequency at which the failure flag is set because the first threshold is exceeded comparable to the frequency at which it is set because the second threshold is exceeded.

FIG. 10 shows an example of a package in which the semiconductor device 102 of FIG. 2 is mounted. The semiconductor device 102 has a structure in which the stacked memory MEM, in which a plurality of memory chips 10 and a logic chip 50 are interconnected through TSVs, and a CPU chip are connected through a silicon interposer. The chips are connected to one another, and to the silicon interposer, by bumps. The silicon interposer is connected to a package substrate through bumps. The package substrate is in turn connected through bumps to a system board or the like (not shown).

As described above, the embodiment shown in FIGS. 2 to 10 provides the same effects as the embodiment shown in FIG. 1. That is, by executing, during memory patrol, a patrol operation with a reduced read margin separately from the normal patrol operation, a sign of abnormality caused by, for example, deterioration of transistors in a memory chip can be detected for each page 10a. By degrading (taking offline) a page 10a in which a sign of abnormality has been detected, the page can be kept from being used in user program processing started thereafter. As a result, the reliability of the semiconductor device 102 can be improved.

Furthermore, executing the memory patrol with the normal read test in addition to the read test with a reduced read margin allows the failure flag to be set in response to errors that occur at the normal read timing. As a result, a sign of abnormality can be detected even when an error other than one caused by an insufficient read margin occurs (for example, an error caused by variation in characteristics inside the memory), and the page 10a in which the error occurred can be degraded before it becomes defective.

Making the first threshold larger than the second threshold makes the frequency at which the failure flag is set because the first threshold is exceeded comparable to the frequency at which it is set because the second threshold is exceeded. That is, the frequencies at which the failure flag is set on the basis of the two read tests can be balanced rather than skewed toward one test.

When the semiconductor device 102 employs a virtual memory system and the CPU allocates pages 10a each time it performs user program processing, subsequent use of a page whose error count has exceeded a threshold can be prohibited, which improves the reliability of the semiconductor device 102.

Clearing the counter values of the first counter unit 14 and the second counter unit 24 at regular intervals makes it possible to determine why a counter value exceeded its threshold.

The features and advantages of the embodiments will be apparent from the detailed description above. The claims are intended to cover such features and advantages of the embodiments without departing from their spirit and scope. Moreover, a person having ordinary skill in the art should readily conceive of any improvements and modifications. Accordingly, there is no intention to limit the scope of the inventive embodiments to those described above, and suitable improvements and equivalents falling within the scope disclosed in the embodiments may be relied upon.

10 memory chip
10a memory block (page)
12 test unit
14 first counter unit
16, 17 memory management unit
20 output buffer
22 error detection and correction unit
24 second counter unit
32 register
34 register unit
36 delay adjustment circuit
100, 102 semiconductor device
MEM stacked memory
sel selection value
set, setX set signal

Claims

1. A semiconductor device comprising:
a stacked memory including a plurality of memories that are stacked;
a test unit that performs a test with a reduced read margin on read data read from a plurality of memory blocks of the plurality of memories;
a first counter unit that holds, for each memory block, the number of times an error has occurred in the test by the test unit;
an error detection and correction unit that corrects a correctable error that occurs in the stacked memory;
a second counter unit that holds, for each memory block, the number of error corrections made by the error detection and correction unit; and
a memory management unit that prohibits use of a memory block for which the number of times held by the first counter unit exceeds a predetermined first threshold, and prohibits use of a memory block for which the number of error corrections held by the second counter unit exceeds a predetermined second threshold.

2. The semiconductor device according to claim 1, wherein said first threshold is greater than said second threshold.

3. The semiconductor device according to claim 2, wherein said memory management unit clears the counter value of said second counter unit at a predetermined frequency.

4. The semiconductor device according to claim 1, further comprising a buffer that holds read data read from the stacked memory, wherein the test unit shortens the read margin by advancing the timing at which the read data is loaded into the buffer.

5. The semiconductor device according to any one of claims 1 to 4, further comprising a processor that executes a user program using a predetermined number of the memory blocks, wherein the memory management unit prohibits, upon termination of the user program, use of a memory block for which prohibition of use has been determined.

6. The semiconductor device according to claim 1, wherein said memory management unit clears the counter value of said first counter unit at a predetermined frequency.