JP7058132B2

JP7058132B2 - Systems and methods for maximized deduplication memory

Info

Publication number: JP7058132B2
Application number: JP2018010614A
Authority: JP
Inventors: 冬岩姜; 強彭; 宏忠鄭
Original assignee: Samsung Electronics Co Ltd
Current assignee: Samsung Electronics Co Ltd
Priority date: 2017-01-25
Filing date: 2018-01-25
Publication date: 2022-04-21
Anticipated expiration: 2038-01-25
Also published as: TWI761419B; TW201830249A; JP2018120594A; CN108345433B; KR20180087838A; CN108345433A; KR102509913B1

Description

The present invention relates generally to memory, and more particularly to systems and methods for maximizing deduplicated memory.

Deduplicated (or deduplication) memory provides a more efficient mechanism for storing data. In conventional memory solutions, each data object is written to its own location in memory. The same data object may be stored at any number of locations in memory, each a separate copy of that data object. The memory system has no way to identify or prevent such repeated storage of data. For large data objects, such repeated storage is wasteful. Deduplicated memory, which stores only one copy of any data object, attempts to solve this problem.

Some deduplicated memories use a hash table to store data objects. However, a hash table can grow only by a mechanism that doubles its size. Such a coarse growth granularity often leaves a large portion of memory that cannot be used as deduplicated memory and is simply treated as an overflow area. Because memory in the overflow area is not deduplicated, the overall deduplication ratio falls when a large portion of memory cannot be deduplicated.

Therefore, a need remains for a way to increase the proportion of memory that is subject to deduplication.

U.S. Patent No. 9639274; U.S. Patent Application Publication No. 2015/0370835; U.S. Patent Application Publication No. 2017/0161329

The present invention has been made in view of the conventional problems described above, and an object of the present invention is to provide a system and a method for increasing the portion of memory that is subject to deduplication.

A memory system according to one aspect of the present invention, made to achieve the above object, comprises: a memory that stores data; a large hash table stored in the memory, the large hash table including a predetermined number of buckets and a first number of ways, and including a first portion of the memory containing a first number of bytes that is a first power of two; a small hash table stored in the memory, the small hash table including the predetermined number of buckets and a second number of ways, and including a second portion of the memory containing a second number of bytes that is a second power of two; an overflow area stored in the memory and including a third portion of the memory; and a translation table that maps a logical address to a PLID (Physical Line Identifier) including a region identifier and a physical address.

A method for a memory system according to one aspect of the present invention, made to achieve the above object, comprises: receiving a logical address from a processor; mapping the logical address, using a translation table, to a PLID (Physical Line Identifier) that includes a region identifier and a physical address; determining, using the region identifier, whether the physical address is in a large hash table, a small hash table, or an overflow area in memory; and accessing data in the memory using the physical address.

A computer-readable recording medium according to one aspect of the present invention, made to achieve the above object, records a program that causes a computer to execute a method comprising: receiving a logical address from a processor; mapping the logical address, using a translation table, to a PLID (Physical Line Identifier) that includes a region identifier and a physical address; determining, using the region identifier, whether the physical address is in a large hash table, a small hash table, or an overflow area in memory; and accessing data in the memory using the physical address.

According to the present invention, deduplicated memory provides a more effective mechanism for storing data. The invention also improves memory utilization and reduces the deduplication ratio required for deduplication to be effective.

A diagram showing a machine operative to use deduplicated memory according to an embodiment of the present invention.
A diagram showing further details of the machine shown in FIG. 1.
A diagram showing the use of a conventional hash table for deduplicated memory in the machine of FIG. 1.
A diagram showing the use of a scalable hash table according to an embodiment of the present invention.
A diagram showing the use of a scalable hash table according to an embodiment of the present invention.
A diagram showing the use of the translation table of FIG. 4 to map logical addresses to various memory destinations.
A flowchart showing an example procedure for using the scalable hash table of FIG. 4 for deduplicated memory according to an embodiment of the present invention.
A flowchart showing an example procedure for using the scalable hash table of FIG. 4 for deduplicated memory according to an embodiment of the present invention.
A flowchart showing an example procedure for determining the PLID (Physical Line Identifier) for a logical address in a memory read request according to an embodiment of the present invention.
A flowchart showing an example procedure for determining the PLID for a logical address in a memory write request according to an embodiment of the present invention.
A flowchart showing an example procedure for determining the PLID for a logical address in a memory write request according to an embodiment of the present invention.
A flowchart showing an example procedure for determining the PLID for a logical address in a memory write request according to an embodiment of the present invention.
A flowchart showing an example procedure for determining whether to increase the size of the small hash table according to an embodiment of the present invention.

Specific examples of embodiments of the present invention are described in detail below with reference to the drawings. The detailed description sets out numerous specific details so that the inventive concept can be fully understood. However, a person of ordinary skill in the art can practice the invention without these specific details. Well-known methods, devices, components, circuits, and networks are not described in detail, so as not to obscure aspects of the invention unnecessarily.

Terms such as first and second are used herein to describe various components, but the components are not limited by these terms. For example, without departing from the technical scope of the present invention, a first module could be termed a second module, and similarly a second module could be termed a first module.

The terminology used in the description of the present invention serves only to describe particular embodiments and does not limit the technical idea of the invention. Unless the context clearly indicates otherwise, singular forms include plural forms. The term "and/or" includes any and all possible combinations of one or more of the associated listed items. The terms "comprises" and/or "comprising" specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. The components and features shown in the drawings are not necessarily drawn to scale.

In a conventional hash table, the size of the hash table is expressed as m × n, where m gives the number of hash buckets (rows) and n gives the number of ways (columns), each as a power of two. For example, a hash table may have 2^m = 2^26 hash buckets and 2^n = 2^5 ways.

When a conventional hash table grows, it grows to twice its current size (incrementing the exponent n doubles that dimension of the hash table). The number of hash buckets does not change; only the number of ways changes. Therefore, depending on the available memory capacity and the current size of the hash table, doubling the hash table may not be possible. This leaves a large portion of memory that is not deduplicated, and that portion is used as the overflow area.

As used herein, a scalable hash table includes a large hash table (Big Hash Table), which is a conventional hash table, and a small hash table (Little Hash Table), which has the same number of hash buckets but fewer ways. Thus, for example, if the large hash table has 2^m = 2^26 buckets and 2^n = 2^5 ways (that is, m = 26 and n = 5), the small hash table has 2^m = 2^26 buckets and 2^n' ways, where n' is any number between 1 and n-1. The scalable hash table allows the hash table size to be adjusted at a finer granularity to maximize the size of the deduplicated memory, providing the flexibility to achieve a high deduplication ratio. The names "large hash table" and "small hash table" serve only to make clear which hash table is being referenced. The hash tables could just as easily be called the "first hash table" and the "second hash table" without any loss of function.
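The granularity argument above can be made concrete with a short sketch. It is only an illustration under stated assumptions: memory is counted in whole ways (one way being the storage for one column across all 2^m buckets), and the names `largest_power_of_two_at_most`, `split_ways`, and the parameter `ways_that_fit` are hypothetical rather than taken from the specification.

```python
from typing import Tuple


def largest_power_of_two_at_most(x: int) -> int:
    """Return the largest power of two that is <= x (x must be >= 1)."""
    if x < 1:
        raise ValueError("x must be at least 1")
    return 1 << (x.bit_length() - 1)


def split_ways(ways_that_fit: int) -> Tuple[int, int, int]:
    """Split the ways that fit in memory into (large, small, overflow).

    Assumption: both tables share the same bucket count, so their sizes
    differ only in the number of ways, and each way count is a power of two.
    """
    large = largest_power_of_two_at_most(ways_that_fit)
    leftover = ways_that_fit - large
    # The small table takes the largest power of two that fits in what is
    # left; because leftover < large, the small table is always smaller.
    small = largest_power_of_two_at_most(leftover) if leftover >= 1 else 0
    overflow = leftover - small
    return large, small, overflow


if __name__ == "__main__":
    # Hypothetical input: room for 50 ways in total.
    print(split_ways(50))
```

With room for 50 ways (an invented figure), a single power-of-two table covers 32 ways and would leave 18 ways as overflow; adding a small table of 16 ways leaves only 2 ways as overflow.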

The mapping of a logical address to physical memory (also known as a physical line ID, or PLID) is managed by a translation table. User data (also known as a physical line, or PL) is stored in one of the large hash table, the small hash table, and the overflow area.

A translation table entry includes a field indicating whether the user data is in the large hash table or in one of the small hash table and the overflow area. Thus, for example, if this field stores the value 0, the user data is found in the large hash table. Otherwise, the user data is found in the small hash table or the overflow area.

If the user data is not stored in the large hash table, the translation table entry also includes a subfield indicating which of the small hash table and the overflow area stores the user data. Thus, for example, if the subfield stores the value 0, the user data is found in the small hash table. Otherwise, the user data is found in the overflow area.
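The two-level check described above can be sketched as follows. This is a minimal illustration that uses the example values from the text (0 in the field for the large hash table, 0 in the subfield for the small hash table); the names `Region` and `decode_region` are hypothetical.

```python
from enum import Enum


class Region(Enum):
    LARGE_HASH_TABLE = "large hash table"
    SMALL_HASH_TABLE = "small hash table"
    OVERFLOW = "overflow area"


def decode_region(field: int, subfield: int = 0) -> Region:
    """Decide where the user data lives from the entry's field and subfield.

    Assumption: the example encoding from the text is used, in which 0 in
    the field means the large hash table and 0 in the subfield means the
    small hash table. The subfield is only consulted when the field is not 0.
    """
    if field == 0:
        return Region.LARGE_HASH_TABLE
    if subfield == 0:
        return Region.SMALL_HASH_TABLE
    return Region.OVERFLOW


if __name__ == "__main__":
    for field, subfield in [(0, 0), (1, 0), (1, 1)]:
        print(field, subfield, decode_region(field, subfield).value)
```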

FIG. 1 shows a machine operative to use deduplicated memory according to an embodiment of the present invention. FIG. 1 shows machine 105. Machine 105 may be, without limitation, a desktop computer, a laptop computer, a server (either a standalone server or a rack server), or any other device that can benefit from embodiments of the invention. Machine 105 may also include specialized portable computing devices, tablet computers, smartphones, and other computing devices. Machine 105 may run any desired application. Database applications are a good example, but embodiments of the invention extend to any desired application.

Machine 105, regardless of its specific form, includes processor 110, memory 115, and storage device 120. Processor 110 may be any variety of processor: for example, an Intel® Xeon®, Celeron®, Itanium®, or Atom® processor, an AMD® Opteron® processor, an ARM® processor, and so on. Although FIG. 1 shows a single processor, machine 105 may include any number of processors, each of which may be a single-core or multi-core processor. Memory 115 may be any variety of memory, such as flash memory, SRAM (Static Random Access Memory), PRAM (Persistent Random Access Memory), FeRAM (Ferroelectric Random Access Memory), or NVRAM (Non-Volatile Random Access Memory) such as MRAM (Magnetoresistive Random Access Memory). Memory 115 may also be any desired combination of different memory types. Memory 115 is controlled by memory controller 125, which is also part of machine 105.

Storage device 120 may be any variety of storage device. Storage device 120 is controlled by device driver 130, which resides in memory 115.

FIG. 2 shows further details of machine 105 of FIG. 1. Referring to FIG. 2, machine 105 typically includes one or more processors 110, which may include memory controller 125 and clock 205, the latter used to coordinate the operation of the components of machine 105. Processors 110 are coupled to memory 115, which may include, for example, RAM (random access memory), ROM (read-only memory), or other state-preserving media. Processors 110 are also coupled to storage device 120 and to network connector 210, which may be, for example, an Ethernet connector or a wireless connector. Processors 110 are also connected to bus 215, to which are attached, among other components, user interface 220 and input/output interface ports managed using input/output engine 225.

In FIGS. 1 and 2, memory 115 is deduplicated memory. The implementation of deduplicated memory differs from conventional forms of memory such as DRAM (Dynamic Random Access Memory), but these differences are not relevant to the implementation of deduplicated memory. Whether other hardware components of machine 105, such as processor 110, are aware of the specific implementation of memory 115 depends on whether those components need to know the physical structure of memory 115. This "lack of knowledge" about the specific implementation of memory 115 also extends to software components, such as application programs running on machine 105. An application program sends read and write requests to memory 115 without knowing whether memory 115 includes DRAM, deduplicated memory, or any other form of memory.

FIG. 3 shows the use of a conventional hash table for deduplicated memory in machine 105 of FIG. 1, as described in U.S. Patent Application No. 15/498371, filed April 26, 2017, and U.S. Patent Application Publication No. 2017/0286010, published October 5, 2017, which are incorporated herein by reference for all purposes. As shown in FIG. 3, memory 115 includes hash table 305, translation table 310, signature table 315, and overflow area 320. Hash table 305 is organized to include 2^m rows (buckets) and 2^n ways (columns). Hash table 305 is used to store user data, with each item of user data stored in a particular way within a particular hash bucket. Although FIG. 3 shows hash table 305 occupying about one third of the total memory, in practice hash table 305 may be any size and is often made as large as will fit within the available memory (to maximize the deduplicated memory). Overflow area 320 represents the portion of memory 115 that is not used as deduplicated memory (because there is more memory than hash table 305 uses, but not enough additional memory to double the number of ways in hash table 305).

FIG. 4 shows the use of a scalable hash table according to an embodiment of the present invention. In contrast to FIG. 3, FIG. 4 includes large hash table 305, translation table 310, signature table 315, overflow area 320, and small hash table 405. Like large hash table 305, small hash table 405 contains 2^m buckets. However, small hash table 405 contains 2^n' ways, where n' is less than n. In some embodiments of the invention, n' varies, and small hash table 405 grows dynamically over time. Thus, for example, n' may start at 0 or 1 (depending on the implementation) and increase by 1 when small hash table 405 is full enough that a new entry cannot be placed in its bucket. Small hash table 405 may also shrink dynamically, for example when memory 115 performs relatively little deduplication. In other embodiments of the invention, small hash table 405 is set statically (as large as can fit in memory).

For a given row and column value in either large hash table 305 or small hash table 405, the hash table contains an entry such as entry 410. Entry 410 includes data 415 and frequency counter 420. Data 415 stores the actual data. Frequency counter 420 tracks the number of distinct references to the data. When an application indicates interest in using data 415, frequency counter 420 is incremented. When an application is no longer interested in data 415, frequency counter 420 is decremented.
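The pairing of data with a reference count can be sketched as below. This is an illustrative model only: the class name `Entry` and the methods `acquire` and `release` are hypothetical, and the text does not say what happens when the counter reaches zero, so the sketch merely refuses to go below zero.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One hash table entry: the stored data plus its frequency counter."""

    data: bytes
    frequency: int = 0  # number of current references to `data`

    def acquire(self) -> None:
        """An application indicates interest in the data."""
        self.frequency += 1

    def release(self) -> None:
        """An application is no longer interested in the data."""
        if self.frequency == 0:
            raise ValueError("release() called with no outstanding references")
        self.frequency -= 1


if __name__ == "__main__":
    entry = Entry(data=b"example line")
    entry.acquire()   # first reference
    entry.acquire()   # second reference shares the same stored copy
    entry.release()
    print(entry.frequency)
```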

n' must be smaller than n. After all, if the memory had enough room for n' to be as large as n, large hash table 305 could have been created at twice its size from the start, and there would be no need for small hash table 405.

The description above suggests that large hash table 305 is set statically, regardless of whether small hash table 405 is static or dynamic. In some embodiments of the invention, large hash table 305 is set statically, but in other embodiments large hash table 305 grows dynamically as needed (up to the point where it can grow no further within physical memory). Furthermore, nothing requires large hash table 305 and small hash table 405 to be related in terms of which is static or dynamic. That is, both tables may be static, one may be static and the other dynamic, or both may be dynamic.

By using both large hash table 305 and small hash table 405, more of memory 115 is used as deduplicated memory and less of memory 115 is allocated to overflow area 320. This improves the utilization of memory 115 and reduces the deduplication ratio required for deduplication to be effective.

A few examples are instructive. Consider a situation in which the total memory capacity is 274,877,906,944 bytes (approximately 256 GB), large hash table 305 has 32 (2^5) ways, and small hash table 405 has 16 (2^4) ways. Table 1 below shows relevant data on hash table usage, comparing the use of hash table 305 alone with the use of hash table 305 together with small hash table 405. As Table 1 shows, to achieve an effective deduplication ratio of 3.0, the raw deduplication ratio required with hash table 305 alone is 5.4. That is, approximately 5.4% of the data stored in memory 115 must represent deduplicated data to achieve an effective deduplication ratio of 3.0. In contrast, when large hash table 305 and small hash table 405 are used together, a raw deduplication ratio of only 3.9 is needed to achieve an effective deduplication ratio of 3.0. This is a considerable improvement.

A lower raw deduplication ratio suffices when both large hash table 305 and small hash table 405 are used because more of memory 115 is used as deduplicated memory. That is, overflow area 320 is smaller. Because less of memory 115 is used for overflow area 320, a lower raw deduplication ratio is needed, and memory 115 is used more efficiently overall.

As a second example, consider the same hash tables in the same physical memory. This example considers the effective deduplication ratio given a fixed raw deduplication ratio. Table 2 shows this situation. As Table 2 shows, the effective deduplication ratio when hash table 305 is used alone is lower than the effective deduplication ratio when large hash table 305 and small hash table 405 are used together.

The embodiments described above show one large hash table 305 and one small hash table 405. However, there is no reason to limit the design to a single small hash table 405. FIG. 5 shows the use of a scalable hash table according to another embodiment of the present invention. This embodiment supports multiple small hash tables, at the cost of diminishing returns. For example, in FIG. 5, memory 115 is shown as including large hash table 305, small hash table 405, and small hash table 505. Small hash table 505 is identical in form and function to small hash table 405, except that it includes N'' ways, where N'' is a power of two less than N' (just as N' is a power of two less than N).

Translation table 310 is responsible for mapping a logical address to the address where the desired user data is stored. Specifically, translation table 310 stores either the row and column (that is, the bucket and way) of the hash table location where the user data is stored, if the user data is stored in one of the hash tables, or the physical address in overflow area 320, if the user data is not stored in one of the hash tables. FIG. 6 illustrates this process. That is, FIG. 6 shows the use of the translation table of FIG. 4 to map logical addresses to various memory destinations.

In FIG. 6, translation table 310 receives logical address 605 from the host computer (logical address 605 ultimately originates from an application, the operating system, or any other software or hardware that needs to access data in memory 115 of FIG. 1). Logical address 605 is part of a data request, which is either a read request or a write request. Logical address 605 can be thought of as including two components: a translation table index and a granularity. The translation table index identifies the particular page (or cache line) in which the requested data is found. The granularity identifies the particular byte of the data to be retrieved. Specifically, the translation table index is generated by masking off the least significant bits of logical address 605. How many bits are masked to generate the translation table index depends on the size of the translation table index (which in turn depends on the size of memory 115 of FIG. 1 and the size of the cache line used in the computer system).
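The split of a logical address into a translation table index and a byte-level granularity can be sketched as below. This is an illustration under assumptions: the 64-byte line size (6 offset bits) is a hypothetical choice, the index is obtained by shifting the masked bits away so that it can be used directly as a table index, and the name `split_logical_address` does not come from the specification.

```python
from typing import Tuple

# Assumption: 64-byte lines, so the low 6 bits select a byte within the line.
OFFSET_BITS = 6


def split_logical_address(logical_address: int,
                          offset_bits: int = OFFSET_BITS) -> Tuple[int, int]:
    """Return (translation_table_index, byte_offset) for a logical address.

    The low `offset_bits` bits are removed to form the index; those same
    bits, kept separately, identify the byte within the line.
    """
    if logical_address < 0:
        raise ValueError("logical address must be non-negative")
    index = logical_address >> offset_bits
    byte_offset = logical_address & ((1 << offset_bits) - 1)
    return index, byte_offset


if __name__ == "__main__":
    index, offset = split_logical_address(0x12345)
    print(hex(index), offset)  # prints: 0x48d 5
```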

The translation table index is then used as an index into the translation table, from which the PLID (Physical Line Identifier) 610 is read. The PLID 610 takes different forms depending on where the user data is actually stored. In all cases, however, the PLID 610 includes a region identifier 615 and a physical address 620.

If the user data is stored in the large hash table 305 of FIG. 4, the PLID 610 looks like entry 625. In entry 625, the region identifier consists of one bit, which indicates that the user data is stored in the large hash table 305 of FIG. 4. The physical address consists of m bits for the row index (identifying the hash bucket) and n bits for the column index (identifying the way). Because m bits are enough to select among 2^m hash buckets and n bits are enough to select among 2^n ways, the user data is uniquely identified within the large hash table 305.

If the user data is stored in the small hash table 405 of FIG. 4, the PLID 610 looks like entry 630. In entry 630, the region identifier consists of two bits: a first bit indicating that the user data is not stored in the large hash table 305 of FIG. 4, and a second bit indicating that the user data is stored in the small hash table 405 of FIG. 4. The physical address consists of m bits for the row index (identifying the hash bucket) and n' bits for the column index (identifying the way). Because n' is always smaller than n, entry 630 needs no more bits than entry 625, even though two bits are used to identify the region in which the user data is stored.

If the user data is stored in the overflow region 320 of FIG. 4, the PLID 610 looks like entry 635. In entry 635, the region identifier again consists of two bits: a first bit indicating that the user data is not stored in the large hash table 305 of FIG. 4, and a second bit indicating that the user data is stored in the overflow region 320 of FIG. 4. The physical address may include row and column indices (as in entries 625 and 630) or may be formatted in any other desired manner.
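The three PLID layouts described above (entries 625, 630, and 635) can be sketched as a fixed-width bit field. The concrete bit values are assumptions made for illustration: a leading 1 for the large hash table, 01 for the small hash table, and 00 for the overflow region; the description specifies only what each bit indicates, not its value. The widths `M_BITS`, `N_BITS`, and `N_SMALL_BITS` are likewise hypothetical.

```python
M_BITS = 4         # row-index (hash bucket) bits, assumed
N_BITS = 3         # column-index (way) bits in the large table, assumed
N_SMALL_BITS = 2   # column-index bits in the small table; n' < n, assumed
PLID_BITS = 1 + M_BITS + N_BITS   # width of entry 625, reused for all entries


def encode_large(row: int, col: int) -> int:
    """Entry 625: 1-bit region identifier, m row bits, n column bits."""
    assert 0 <= row < (1 << M_BITS) and 0 <= col < (1 << N_BITS)
    return (1 << (PLID_BITS - 1)) | (row << N_BITS) | col


def encode_small(row: int, col: int) -> int:
    """Entry 630: 2-bit region identifier, m row bits, n' column bits."""
    assert 0 <= row < (1 << M_BITS) and 0 <= col < (1 << N_SMALL_BITS)
    return (0b01 << (PLID_BITS - 2)) | (row << N_SMALL_BITS) | col


def encode_overflow(address: int) -> int:
    """Entry 635: 2-bit region identifier followed by a free-form address."""
    assert 0 <= address < (1 << (PLID_BITS - 2))
    return address  # the two leading region bits are both 0


def decode(plid: int) -> tuple:
    """Return (region, row, col) or ("overflow", address)."""
    if plid >> (PLID_BITS - 1):                      # first bit set: large table
        payload = plid & ((1 << (PLID_BITS - 1)) - 1)
        return ("large", payload >> N_BITS, payload & ((1 << N_BITS) - 1))
    payload = plid & ((1 << (PLID_BITS - 2)) - 1)
    if (plid >> (PLID_BITS - 2)) & 1:                # second bit set: small table
        return ("small", payload >> N_SMALL_BITS,
                payload & ((1 << N_SMALL_BITS) - 1))
    return ("overflow", payload)
```

Because n' is smaller than n, the extra region bit in the small-table and overflow entries fits inside the same `PLID_BITS` width as the large-table entry, which is the point the description makes about entry 630.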

Entries 625, 630, and 635 show how to distinguish among entries in the large hash table 305, the small hash table 405, and the overflow region 320 of FIG. 4, but embodiments of the present invention support other forms as well. For example, in an embodiment that includes the multiple small hash tables 405 and 505 shown in FIG. 5, the region identifier 615 may include one bit to distinguish between the large hash table 305 of FIG. 5 and the other regions, plus two bits to select among the small hash tables 405 and 505 and the overflow region 320 of FIG. 5. Alternatively, the region identifier 615 may always use two bits to select among up to four different regions, or always use three bits to select among up to eight different regions, and so on. This approach has the advantage of not requiring a varying number of bits to select the region, but it may require a varying number of bits to store the entire PLID 610, and it may leave some bit combinations of the region identifier 615 unused.

FIG. 6 describes the translation table 310 as a table whose entries are keyed off a number of bits of the logical address, but the translation table 310 may be implemented using other techniques, such as a hash function. When a hash function is used, a particular logical address (or the relevant high-order bits of the logical address) may be the input to the hash function. The result of the hash function is then used to determine both the location of the user data (the large hash table 305, the small hash table 405, or the overflow region 320 of FIG. 4) and the physical address.

Returning to FIG. 4, one problem in using deduplicated memory is determining whether the user data is in fact already stored somewhere in the memory 115. For example, different applications may request access to the same data but use different logical addresses (because neither application is aware of the other application or of its interest in the user data). The signature table 315 is used to determine whether given user data is a duplicate of other user data already present in the memory 115, and thereby to prevent an unnecessary copy of the user data from being stored.

When user data is newly stored in the memory 115, a hash function is applied to the user data to generate a signature. This hash function may be the same as the hash function used to determine where the user data is actually stored in memory, or it may be a different one. Unlike the hash function used to determine where the user data is stored, the hash function used to generate the signature hashes the user data itself rather than its logical address. The signature table 315 is then searched to see whether the signature is present.

A signature is usually shorter (that is, has fewer bits) than the user data itself. It is therefore possible for different user data to generate the same signature. In other words, if a matching signature is found in the signature table 315, the match does not automatically mean that the user data is already stored in the memory 115. To determine whether the user data is actually stored in the memory 115, the user data is compared with the identified data in the memory 115. If the full comparison shows a match, the user data is already stored in the memory 115. In that case, the translation table 310 is set so that the PLID 610 of FIG. 6 points to the location where the user data is stored. Note that the converse does hold: if no matching signature is found in the signature table 315, the user data is not yet stored in the memory 115 (because identical data cannot generate different signatures under the same hash function). In that case, a new entry, mapped back to the logical address, is added to the signature table 315.
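The signature check followed by a full comparison can be sketched as below. This is an illustrative sketch, not the patented implementation: the class name `DedupStore`, the use of BLAKE2b, and the 2-byte signature length are all assumptions, and the hash tables are reduced to a plain list so that only the signature logic is shown.

```python
import hashlib


def signature(data: bytes) -> bytes:
    """Short signature of the data itself (assumed: 2-byte BLAKE2b digest)."""
    return hashlib.blake2b(data, digest_size=2).digest()


class DedupStore:
    """Toy store: a list of data lines plus a signature table."""

    def __init__(self) -> None:
        self.lines: list[bytes] = []                  # stands in for the hash tables
        self.signature_table: dict[bytes, int] = {}   # signature -> index in lines

    def write(self, data: bytes) -> tuple[str, int]:
        sig = signature(data)
        slot = self.signature_table.get(sig)
        if slot is None:
            # No signature match: the data cannot already be stored.
            self.lines.append(data)
            slot = len(self.lines) - 1
            self.signature_table[sig] = slot
            return ("stored", slot)
        if self.lines[slot] == data:
            # Signature match confirmed by a full comparison: a true duplicate.
            return ("duplicate", slot)
        # Same signature, different data: the match was a false positive.
        return ("collision", slot)
```

The short signature keeps the table small at the cost of false positives, which is why a matching signature must be confirmed by comparing the data itself.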

The signature table 315 is generally used only for user data stored in the large hash table 305 and the small hash table 405. That is, the signature table 315 is not used for data stored in the overflow region 320. The reason is simple: the overflow region 320 is not subject to deduplication, so duplicate data may be stored there.

FIGS. 7A and 7B are flowcharts showing an example procedure for using the expandable hash table of FIG. 4 in deduplicated memory according to an embodiment of the present invention. In FIG. 7A, at step 705, the memory 115 of FIG. 4 receives the logical address 605 of FIG. 6 from the machine 105 of FIG. 1 as part of a data request (from an application running on the machine 105, the operating system, or some other component). At step 710, the memory 115 of FIG. 4 determines the PLID 610 of FIG. 6 corresponding to the logical address 605 of FIG. 6. How the PLID 610 is determined depends on whether data is being read or written, because a different approach is used in each case. Flowcharts showing how the PLID 610 of FIG. 6 is determined for read requests and for write requests are described below with reference to FIG. 8 and FIGS. 9A to 9C, respectively.

Once the PLID 610 of FIG. 6 has been determined, at step 715 the memory 115 of FIG. 4 uses the region identifier of the PLID 610 to determine where the data is stored. If the data is stored in the overflow region 320 of FIG. 4, then at step 720 the user data is accessed using the physical address from the PLID 610 of FIG. 6 (either reading the data from the overflow region 320 or writing the data to it, depending on the type of request issued), after which processing ends. Otherwise, at step 725, the memory 115 of FIG. 4 determines the row index and column index (that is, the hash bucket and way) within one of the large hash table 305 and the small hash table 405 of FIG. 4 (apart from where the hash table is located in the memory 115 of FIG. 4, access to the data is the same).

At step 730 (FIG. 7B), the memory 115 of FIG. 4 determines whether the requested data is "found" at the particular row index and column index. For example, when data is written to one of the hash tables, the row index and column index may identify a location that already holds data (this is unlikely, but it can happen). In that case, the user data is written to a nearby location (for example, another way of the same hash bucket, within some predetermined delta of the column index), and so the data must be looked up at that location instead when it is read. What counts as "nearby" in a hash table is discussed further below with reference to FIGS. 9A to 9C. For a write request, data being "found" means that the hash table has an available entry to which the data can be written.

When the logical address 605 of FIG. 6 is part of a read request, the memory 115 of FIG. 4 has no way to determine that the data at the particular row index and column index is not the requested data. In this situation, step 730 generally always returns "Yes" and processing proceeds automatically to step 735 to access the data.

If the data is found at the particular row index and column index, or if that location can store the data for a write request, then at step 735 the memory 115 of FIG. 4 accesses the data, after which processing is complete. Otherwise, if data is to be written but the row index and column index identify a location that already holds data, then at step 740 the memory 115 of FIG. 4 searches nearby entries for a location in which to write the user data. If at step 745 there is no nearby location in which the user data can be written, then at step 750 the memory 115 of FIG. 4 reports an error. Alternatively, particularly when data was to be written to one of the hash tables but for some reason cannot be, the PLID 610 of FIG. 6 may be changed to point to the overflow region 320 of FIG. 4 instead of the hash table (in which case processing proceeds to step 720 of FIG. 7A). Otherwise, the data has been "found" nearby, and at step 735 the user data is accessed in the hash table and processing is complete.

FIG. 8 is a flowchart showing an example procedure for determining the PLID (Physical Line Identifier) for a logical address in a memory read request according to an embodiment of the present invention. For a read request, the process is simple. At step 805, the logical address is used to access the translation table 310 of FIG. 4. The translation table 310 provides the PLID 610 of FIG. 6, from which the physical address used to access the data in the appropriate region of the memory 115 of FIG. 4 is determined (including the row index and column index, if the data is in one of the hash tables).

FIGS. 9A to 9C are flowcharts showing an example procedure for determining the PLID for a logical address in a memory write request according to an embodiment of the present invention. At step 905 of FIG. 9A, the memory 115 of FIG. 4 generates a signature of the data to be written. At step 910, the signature table 315 of FIG. 4 is checked to see whether the signature is present. If a matching signature is found, then at step 915 the memory 115 of FIG. 4 checks whether the data matches the data stored in the hash table (one of the large hash table 305 and the small hash table 405 of FIG. 4). If no matching signature is found, then at step 920 the memory 115 of FIG. 4 checks whether the signature table 315 of FIG. 4 has room for a new entry.

If at step 915 the data matches the data stored in the hash table, then at step 925 (FIG. 9B) the memory 115 of FIG. 4 checks whether incrementing the frequency counter 420 of FIG. 4 would cause it to overflow (which happens when the frequency counter 420 has already reached its maximum value). If so, or if the signature table has no room for a new entry at step 920 of FIG. 9A, the user data is written to the overflow region 320 of FIG. 4, and at step 930 the memory 115 generates a PLID 610 of FIG. 6 for the overflow region 320. Otherwise, if the signature was found in the signature table 315 of FIG. 4, the data matches the entry in the hash table, and the frequency counter 420 of FIG. 4 would not overflow, then at step 935 the memory 115 of FIG. 4 increments the frequency counter 420 and processing ends. When the frequency counter 420 is incremented without overflowing, the data does not need to be written to the memory 115 again (it is already stored in the hash table), so the processing of FIG. 7A ends as well.
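The branching of FIGS. 9A to 9C described so far can be condensed into one decision function. This is a sketch under stated assumptions: the function name, its boolean inputs, and the returned labels are hypothetical, and the step numbers in the comments only point back to the flowchart steps named in the text.

```python
def decide_write(signature_found: bool,
                 data_matches: bool,
                 counter: int,
                 counter_max: int,
                 table_has_room: bool) -> str:
    """Return the action taken for a write request."""
    if signature_found:
        if not data_matches:
            # Same signature, different data: handled as a hash collision.
            return "hash_collision"            # steps 940, 930, or 945
        if counter >= counter_max:
            # Incrementing would overflow the frequency counter.
            return "write_to_overflow"         # step 930
        return "increment_counter"             # step 935
    if table_has_room:
        return "add_signature_and_store"       # steps 950 and 955
    # No room for a new signature entry.
    return "write_to_overflow"                 # step 930
```

Only the "increment_counter" outcome avoids writing the data again, which is where the deduplication saving comes from.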

If the signature is found in the signature table 315 of FIG. 4 but the data does not match at step 915, then the particular combination of hash bucket and way to which the logical address would normally map is already in use. This situation is called a "hash collision". There are several ways to respond when a hash collision occurs. One possibility, shown at step 940, is to find a new available location in the hash table and update the PLID 610 of FIG. 6 to point to the new location. A second possibility is to pass control to step 930 and write the data to the overflow region 320 of FIG. 4 instead of the hash table, again updating the PLID 610 of FIG. 6 accordingly. A third possibility, shown at step 945, is to leave the PLID 610 of FIG. 6 unchanged and leave it to the memory 115 of FIG. 4 to work out that the data is actually stored somewhere else. For example, there are existing solutions to hash collisions, such as open addressing, in which the data is not stored at the exact location identified but in the first free location after the location where the collision occurred. When open addressing is used, the data may be stored at any address after the location identified by the PLID 610 of FIG. 6, or the data may be stored within a fixed, predetermined number of locations (the fixed, predetermined number may be set to any desired value). In other embodiments of the present invention, the data is stored before the location identified by the PLID 610 of FIG. 6, again within some fixed, predetermined number of locations. Whichever approach is used, the memory 115 of FIG. 4 uses that approach both to generate the PLID 610 of FIG. 6 and to access the data starting from the location identified by the PLID 610. Embodiments of the present invention also support other solutions to hash collisions.
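Open addressing limited to a fixed, predetermined number of locations can be sketched as a bounded probe over the ways of one hash bucket. The helper name `find_free_way`, the use of `None` to mark a free way, and the wrap-around at the end of the bucket are assumptions made for illustration; the description does not specify them.

```python
from typing import Optional, Sequence


def find_free_way(bucket: Sequence[Optional[bytes]],
                  start: int,
                  max_probes: int) -> Optional[int]:
    """Return the first free way at or after `start`, probing at most
    `max_probes` further locations; return None if none is free."""
    ways = len(bucket)
    for step in range(max_probes + 1):
        way = (start + step) % ways    # assumed: probing wraps around the bucket
        if bucket[way] is None:
            return way
    return None
```

A reader must apply the same probe order and the same `max_probes` bound as the writer, which reflects the statement that one approach is used both to generate the PLID and to locate the data later.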

If the signature is not found in the signature table 315 of FIG. 4 and the signature table 315 has room for a new entry, then at step 950 (FIG. 9C) the signature is added to the signature table 315 of FIG. 4, and at step 955 the memory 115 generates a PLID 610 of FIG. 6 for one of the large hash table 305 and the small hash table 405 of FIG. 4, depending on where an available entry is found.

FIG. 10 is a flowchart showing an example procedure for determining whether to increase the size of the small hash table according to an embodiment of the present invention. At step 1005 of FIG. 10, it is determined whether the small hash table 405 of FIG. 4 is approaching capacity. The memory 115 of FIG. 4 also checks whether the overflow region 320 of FIG. 4 has enough available storage to be repurposed for use by the small hash table 405. Because the small hash table 405 doubles in size, the overflow region 320 must have at least as much space as the small hash table 405 already uses. More space than that may be needed, because some data may already be stored in the overflow region 320 and must remain there. If the small hash table 405 of FIG. 4 still has sufficient space, or if the overflow region 320 of FIG. 4 does not have enough storage to support growing the small hash table 405, processing ends. Otherwise (and assuming that the value n' for the small hash table 405 of FIG. 4 is at least 2 less than the value n for the large hash table 305 of FIG. 4), at step 1010 the small hash table 405 is increased in size at the expense of the overflow region 320, whose size is reduced. Then, at step 1015, the small hash table 405 of FIG. 4 doubles its number of ways (by incrementing the value n') to make use of the newly added memory, and processing ends.
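The growth decision of FIG. 10 can be sketched as a check followed by a resize. All names here are hypothetical, the sizes are taken to be in bytes, and the 90% fill threshold for "approaching capacity" is an assumption, since the description gives no figure.

```python
def can_grow_small_table(small_used: int, small_capacity: int,
                         overflow_free: int,
                         n_small: int, n_large: int,
                         threshold: float = 0.9) -> bool:
    """Step 1005: decide whether the small hash table should double."""
    nearly_full = small_used >= threshold * small_capacity
    # Doubling needs at least as much free overflow space as the table uses now.
    enough_room = overflow_free >= small_capacity
    # n' must be at least 2 less than n, so n' + 1 stays below n.
    stays_smaller = n_small + 1 < n_large
    return nearly_full and enough_room and stays_smaller


def grow_small_table(small_capacity: int, overflow_size: int,
                     n_small: int) -> tuple[int, int, int]:
    """Steps 1010 and 1015: take space from the overflow region and
    double the number of ways by incrementing n'."""
    return small_capacity * 2, overflow_size - small_capacity, n_small + 1
```

Incrementing n' by one doubles the number of ways, so the table grows only in power-of-two steps while the overflow region shrinks by the same amount.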

FIGS. 7A to 10 disclose several embodiments of the present invention. A person of ordinary skill in the art will recognize, however, that other embodiments of the present invention are possible by changing the order of the steps, by omitting steps, or by including links not shown in the drawings. All such variations of the flowcharts are considered embodiments of the present invention, whether or not they are expressly described.

The following discussion provides a brief, general description of a suitable machine in which certain aspects of the technical idea of the present invention may be implemented. The machine may be controlled, at least in part, by input from conventional input devices such as keyboards and mice, as well as by directives received from another machine, interaction with a virtual reality (VR) environment, biometric feedback, or other input signals. As used herein, the term "machine" is intended broadly to encompass a single machine, a virtual machine, a system of communicatively coupled machines, or devices operating together. Examples of machines include computing devices such as personal computers, workstations, servers, portable computers, handheld devices, telephones, and tablets, as well as transportation devices such as private or public transportation, for example automobiles, trains, and taxis.

The machine may include embedded controllers such as programmable or non-programmable logic devices or arrays, Application Specific Integrated Circuits (ASICs), embedded computers, smart cards, and the like. The machine may use one or more connections to one or more remote machines, for example through a network interface, modem, or other communicative coupling. Machines may be interconnected by way of a physical and/or logical network, such as an intranet®, the Internet, a local area network (LAN), a wide area network (WAN), and so on. One skilled in the art will appreciate that network communication can use a variety of wired and/or wireless short-range or long-range carriers and protocols, including radio frequency (RF), satellite, microwave, IEEE 802.11, Bluetooth®, optical, infrared, cable, laser, and the like.

Embodiments of the present invention may be described by reference to, or in conjunction with, associated data including functions, procedures, data structures, application programs, and the like, which, when accessed by a machine, result in the machine performing tasks or defining abstract data types or low-level hardware contexts. Associated data may be stored in, for example, volatile and/or non-volatile memory (for example, RAM, ROM, and the like) or in other storage devices and their associated storage media, including hard drives, floppy disks, optical storage, tapes, flash memory, memory sticks, digital video disks, biological storage, and the like. Associated data may be delivered over transmission environments, including physical and/or logical networks, in the form of packets, serial data, parallel data, propagated signals, and the like, and may be used in a compressed or encrypted format. Associated data may be used in a distributed environment and stored locally and/or remotely for machine access.

Embodiments of the present invention may include a tangible, non-transitory machine-readable medium comprising instructions executable by one or more processors, the instructions including instructions to perform the elements of the inventive concepts described herein.

Embodiments of the present invention have been described above with reference to the drawings, but the present invention may be modified without departing from its technical scope and may be combined in any other desired manner. Although the foregoing description has focused on particular embodiments, other configurations are also contemplated. In particular, although expressions such as "according to an embodiment of the present invention" are used herein, such phrases are meant to refer generally to the possibilities of embodiments and are not intended to limit the present invention to particular embodiment configurations. As used herein, these terms may refer to the same or different embodiments, which are combinable into other embodiments.

The present invention is not limited to the embodiments described above, and various modifications may be made to those embodiments. Accordingly, all such modifications are included within the technical scope of the present invention.

Embodiments of the present invention extend, without limitation, to the following explanations.

Explanation 1. A memory system according to an embodiment of the present invention comprises:
a memory to store data;
a large hash table stored in the memory, the large hash table including a predetermined number of buckets and a first number of ways, and occupying a first portion of the memory containing a first number of bytes that is a first power of two;
a small hash table stored in the memory, the small hash table including the predetermined number of buckets and a second number of ways, and occupying a second portion of the memory containing a second number of bytes that is a second power of two;
an overflow region stored in the memory and occupying a third portion of the memory; and
a translation table to map a logical address to a PLID (Physical Line Identifier) that includes a region identifier and a physical address.

Explanation 2. An embodiment of the present invention includes the memory system according to Explanation 1, wherein the region identifier includes a first bit indicating that the PLID identifies data in the large hash table.

Explanation 3. An embodiment of the present invention includes the memory system according to Explanation 2, wherein the physical address includes a row index and a column index.

Explanation 4. An embodiment of the present invention includes the memory system according to Explanation 2, wherein:
the first bit indicates that the PLID does not identify data in the large hash table; and
the region identifier includes a second bit indicating whether the PLID identifies data in the small hash table or in the overflow region.

Explanation 5. An embodiment of the present invention includes the memory system according to Explanation 4, wherein:
the second bit indicates that the PLID identifies data in the small hash table; and
the physical address includes a row index and a column index.
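One possible bit layout for such a PLID is sketched below. The field widths (`ROW_BITS`, `COL_BITS`), the placement of the two region bits at the top of the word, and all function names are hypothetical. The bit polarity follows the method statements later in this section: first bit clear means the large hash table, first bit set with second bit clear means the small hash table, and both bits set means the overflow region.

```python
from enum import Enum

# Hypothetical field widths; the text does not fix any of them.
ROW_BITS = 20
COL_BITS = 4
ADDR_BITS = ROW_BITS + COL_BITS
FIRST_BIT = 1 << (ADDR_BITS + 1)   # clear -> large hash table
SECOND_BIT = 1 << ADDR_BITS        # examined only when FIRST_BIT is set


class Region(Enum):
    LARGE = "large hash table"
    SMALL = "small hash table"
    OVERFLOW = "overflow region"


def region_of(plid: int) -> Region:
    """Decode the region identifier from the two top bits of the PLID."""
    if not plid & FIRST_BIT:
        return Region.LARGE
    if not plid & SECOND_BIT:
        return Region.SMALL
    return Region.OVERFLOW


def make_table_plid(region: Region, row: int, col: int) -> int:
    """Pack a PLID that points into one of the two hash tables."""
    if region is Region.OVERFLOW:
        raise ValueError("overflow PLIDs carry a plain address, not row/column")
    if not (0 <= row < (1 << ROW_BITS) and 0 <= col < (1 << COL_BITS)):
        raise ValueError("row or column out of range")
    flags = FIRST_BIT if region is Region.SMALL else 0
    return flags | (row << COL_BITS) | col


def row_and_column(plid: int) -> tuple:
    """Split the physical address of a hash-table PLID into (row, column)."""
    addr = plid & ((1 << ADDR_BITS) - 1)
    return addr >> COL_BITS, addr & ((1 << COL_BITS) - 1)


plid = make_table_plid(Region.SMALL, row=5, col=3)
print(region_of(plid).value, row_and_column(plid))
```

A fixed two-bit region field is used here for simplicity; an implementation could equally omit the second bit whenever the first bit already selects the large hash table.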

Explanation 6. An embodiment of the present invention includes the memory system according to Explanation 2, wherein the small hash table grows dynamically.

Explanation 7. An embodiment of the present invention includes the memory system according to Explanation 6, wherein the large hash table grows dynamically.

Explanation 8. An embodiment of the present invention includes the memory system according to Explanation 1, wherein a first effective minimum deduplication ratio for the memory system is lower than a second effective minimum deduplication ratio for the large hash table without the small hash table.

Explanation 9. An embodiment of the present invention includes the memory system according to Explanation 1, further comprising a signature table stored in the memory, the signature table storing signatures of a plurality of data stored in the large hash table and the small hash table,
wherein the signature table prevents multiple data with a common signature from being stored in a row of the large hash table or the small hash table.

Explanation 10. A method according to an embodiment of the present invention comprises:
receiving a logical address from a processor;
mapping the logical address, using a translation table, to a PLID (Physical Line Identifier) that includes a region identifier and a physical address;
determining, using the region identifier, whether the physical address is in a large hash table, a small hash table, or an overflow region in a memory; and
accessing data in the memory using the physical address.

Explanation 11. An embodiment of the present invention includes the method according to Explanation 10, wherein determining, using the region identifier, whether the physical address is in the large hash table, the small hash table, or the overflow region in the memory includes determining that the physical address is in the large hash table if a first bit of the region identifier is not set.

Explanation 12. An embodiment of the present invention includes the method according to Explanation 10, wherein determining, using the region identifier, whether the physical address is in the large hash table, the small hash table, or the overflow region in the memory includes determining that the physical address is in the small hash table if a first bit of the region identifier is set and a second bit of the region identifier is not set.

Explanation 13. An embodiment of the present invention includes the method according to Explanation 10, wherein determining, using the region identifier, whether the physical address is in the large hash table, the small hash table, or the overflow region in the memory includes determining that the physical address is in the overflow region if a first bit of the region identifier is set and a second bit of the region identifier is set.

Explanation 14. An embodiment of the present invention includes the method according to Explanation 10, wherein accessing the data in the memory using the physical address includes:
determining a row index and a column index from the physical address; and
accessing the data in one of the large hash table and the small hash table using the row index and the column index.

Explanation 15. An embodiment of the present invention includes the method according to Explanation 14, wherein accessing the data in one of the large hash table and the small hash table using the row index and the column index includes searching nearby entries in the small hash table if the data is not found at the row index and the column index in the small hash table.

Explanation 16. An embodiment of the present invention includes the method according to Explanation 10, wherein accessing the data in the memory using the physical address includes accessing data in the overflow region using the physical address.
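The read path described in Explanations 10 to 16 can be sketched as follows. This is a minimal model under stated assumptions, not the patented implementation: the class and field names (`PLID`, `DedupMemory`, `translation_table`) are hypothetical, the region identifier is kept as a plain string instead of bits, and the nearby-entry search of Explanation 15 is left out because the text does not say how "nearby" is defined.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class PLID:
    region: str                            # "large", "small", or "overflow"
    address: Union[Tuple[int, int], int]   # (row, column) or overflow offset


class DedupMemory:
    """Toy model: two hash tables, an overflow region, a translation table."""

    def __init__(self, large_shape, small_shape, overflow_lines):
        self.tables = {
            name: [[None] * ways for _ in range(rows)]
            for name, (rows, ways) in (("large", large_shape),
                                       ("small", small_shape))
        }
        self.overflow = [None] * overflow_lines
        self.translation_table = {}        # logical address -> PLID

    def read(self, logical_address: int):
        # Step 1: map the logical address to a PLID.
        plid = self.translation_table[logical_address]
        # Step 2: use the region identifier to pick the region.
        if plid.region == "overflow":
            # Overflow data is addressed directly by the physical address.
            return self.overflow[plid.address]
        # Step 3: hash-table data is addressed by row and column.
        row, column = plid.address
        return self.tables[plid.region][row][column]


mem = DedupMemory(large_shape=(8, 4), small_shape=(4, 2), overflow_lines=4)
mem.tables["large"][3][1] = b"shared line"
mem.overflow[2] = b"unique line"
# Two logical addresses share one stored copy: this is the deduplication.
mem.translation_table[0x1000] = PLID("large", (3, 1))
mem.translation_table[0x2000] = PLID("large", (3, 1))
mem.translation_table[0x3000] = PLID("overflow", 2)
print(mem.read(0x1000), mem.read(0x2000), mem.read(0x3000))
```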

Explanation 17. An embodiment of the present invention includes the method according to Explanation 10, further comprising:
determining that the small hash table is approaching capacity; and
increasing the size of the small hash table while decreasing the size of the overflow region.

Explanation 18. An embodiment of the present invention includes the method according to Explanation 17, wherein increasing the size of the small hash table includes:
doubling the size of the small hash table; and
decreasing the size of the overflow region.

Explanation 19. An embodiment of the present invention includes the method according to Explanation 17, wherein increasing the size of the small hash table includes increasing the number of ways, which correspond to the columns of the small hash table.
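The growth step in Explanations 17 to 19 amounts to simple bookkeeping, sketched below. The `SmallTableLayout` class, the 90% fill threshold, and the 64-byte line size are assumptions chosen for illustration; the text only says that the table grows when it approaches capacity, that it may double, and that it may grow by adding ways at the expense of the overflow region.

```python
from dataclasses import dataclass


@dataclass
class SmallTableLayout:
    buckets: int          # rows; unchanged by growth in this sketch
    ways: int             # columns; doubled on each growth step
    line_bytes: int       # hypothetical size of one stored data line
    overflow_bytes: int   # memory currently assigned to the overflow region

    def table_bytes(self) -> int:
        return self.buckets * self.ways * self.line_bytes

    def is_near_capacity(self, used_entries: int, threshold: float = 0.9) -> bool:
        """Hypothetical trigger: the table is at least `threshold` full."""
        return used_entries >= threshold * self.buckets * self.ways

    def grow(self) -> bool:
        """Double the table by doubling its ways, taking bytes from overflow.

        Returns False, leaving the layout unchanged, if the overflow region
        is too small to donate the extra memory.
        """
        extra = self.table_bytes()   # doubling needs as many bytes again
        if extra > self.overflow_bytes:
            return False
        self.ways *= 2
        self.overflow_bytes -= extra
        return True


layout = SmallTableLayout(buckets=1024, ways=4, line_bytes=64,
                          overflow_bytes=1 << 20)
if layout.is_near_capacity(used_entries=3700):
    layout.grow()
print(layout)
```

Doubling the ways keeps the table size a power of two, consistent with Explanation 1, and leaves the bucket count at its predetermined value.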

Explanation 20. An embodiment of the present invention includes the method according to Explanation 10, wherein:
accessing the data in the memory using the physical address includes writing the data to the memory; and
mapping the logical address, using the translation table, to the PLID that includes the region identifier and the physical address includes selecting, using the translation table, one of the large hash table, the small hash table, and the overflow region in which to write the data.

Explanation 21. An embodiment of the present invention includes the method according to Explanation 20, wherein determining, using the region identifier, whether the physical address is in the large hash table, the small hash table, or the overflow region in the memory includes:
applying a hash function to the data to produce a signature;
checking a signature table for the signature; and
writing the data to the overflow region if the signature is in the signature table.

Explanation 22. An embodiment of the present invention includes the method according to Explanation 21, wherein checking the signature table for the signature includes checking the signature table for the signature in the row of the physical address.
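A minimal sketch of the signature check on the write path follows. Several details are assumptions made only for this example: the signature is a truncated SHA-256 digest, a row and its slice of the signature table are modeled together as one dictionary keyed by signature, and the function names are hypothetical. The sketch also adds one step the statements above do not spell out: if the colliding entry holds identical data, the write is treated as a duplicate rather than sent to overflow. Row capacity (the number of ways) is not modeled.

```python
import hashlib

SIGNATURE_BYTES = 2   # hypothetical signature width


def signature(data: bytes) -> bytes:
    """Hypothetical signature: the first bytes of a SHA-256 digest."""
    return hashlib.sha256(data).digest()[:SIGNATURE_BYTES]


def write_line(data: bytes, row: dict, overflow: list,
               signature_fn=signature) -> str:
    """Place `data` in its hash-table row or in the overflow region.

    `row` maps signature -> stored data for one row, so no two entries in
    the row can share a signature.
    """
    sig = signature_fn(data)
    if sig not in row:
        row[sig] = data           # signature is new in this row: store it
        return "hash table"
    if row[sig] == data:          # assumption: identical data is deduplicated
        return "deduplicated"
    overflow.append(data)         # different data, same signature
    return "overflow"


row, overflow = {}, []
print(write_line(b"line A", row, overflow))   # first copy goes to the table
print(write_line(b"line A", row, overflow))   # second copy is a duplicate
```

Passing a deliberately weak `signature_fn`, such as one that returns only the first byte of the data, is an easy way to exercise the overflow branch.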

Explanation 23. A computer-readable recording medium according to an embodiment of the present invention records a program for causing a computer to execute a method comprising:
receiving a logical address from a processor;
mapping the logical address, using a translation table, to a PLID (Physical Line Identifier) that includes a region identifier and a physical address;
determining, using the region identifier, whether the physical address is in a large hash table, a small hash table, or an overflow region in a memory; and
accessing data in the memory using the physical address.

Explanation 24. An embodiment of the present invention includes the recording medium according to Explanation 23, wherein determining, using the region identifier, whether the physical address is in the large hash table, the small hash table, or the overflow region in the memory includes determining that the physical address is in the large hash table if a first bit of the region identifier is not set.

Explanation 25. An embodiment of the present invention includes the recording medium according to Explanation 23, wherein determining, using the region identifier, whether the physical address is in the large hash table, the small hash table, or the overflow region in the memory includes determining that the physical address is in the small hash table if a first bit of the region identifier is set and a second bit of the region identifier is not set.

Explanation 26. An embodiment of the present invention includes the recording medium according to Explanation 23, wherein determining, using the region identifier, whether the physical address is in the large hash table, the small hash table, or the overflow region in the memory includes determining that the physical address is in the overflow region if a first bit of the region identifier is set and a second bit of the region identifier is set.

Explanation 27. An embodiment of the present invention includes the recording medium according to Explanation 23, wherein accessing the data in the memory using the physical address includes:
determining a row index and a column index from the physical address; and
accessing the data in one of the large hash table and the small hash table using the row index and the column index.

Explanation 28. An embodiment of the present invention includes the recording medium according to Explanation 27, wherein accessing the data in one of the large hash table and the small hash table using the row index and the column index includes searching nearby entries in the small hash table if the data is not found at the row index and the column index in the small hash table.

Explanation 29. An embodiment of the present invention includes the recording medium according to Explanation 23, wherein accessing the data in the memory using the physical address includes accessing data in the overflow region using the physical address.

Explanation 30. An embodiment of the present invention includes the recording medium according to Explanation 23, wherein the method further comprises:
determining that the small hash table is approaching capacity; and
increasing the size of the small hash table while decreasing the size of the overflow region.

Explanation 31. An embodiment of the present invention includes the recording medium according to Explanation 30, wherein increasing the size of the small hash table includes:
doubling the size of the small hash table; and
decreasing the size of the overflow region.

Explanation 32. An embodiment of the present invention includes the recording medium according to Explanation 30, wherein increasing the size of the small hash table includes increasing the number of ways, which correspond to the columns of the small hash table.

Explanation 33. An embodiment of the present invention includes the recording medium according to Explanation 23, wherein:
accessing the data in the memory using the physical address includes writing the data to the memory; and
mapping the logical address, using the translation table, to the PLID that includes the region identifier and the physical address includes selecting, using the translation table, one of the large hash table, the small hash table, and the overflow region in which to write the data.

Explanation 34. An embodiment of the present invention includes the recording medium according to Explanation 33, wherein determining, using the region identifier, whether the physical address is in the large hash table, the small hash table, or the overflow region in the memory includes:
applying a hash function to the data to produce a signature;
checking a signature table for the signature; and
writing the data to the overflow region if the signature is in the signature table.

Explanation 35. An embodiment of the present invention includes the recording medium according to Explanation 34, wherein checking the signature table for the signature includes checking the signature table for the signature in the row of the physical address.

105 Machine
110 Processor
115 Memory
120 Storage device
125 Memory controller
130 Device driver
205 Clock
210 Network connector
215 Bus
220 User interface
225 Input/output engine
305 (Large) hash table
310 Translation table
315 Signature table
320 Overflow region
405, 505 Small hash table
410, 625, 630, 635 Entry
415 Data
420 Frequency counter
605 Logical address
610 PLID (Physical Line Identifier)
615 Region identifier
620 Physical address

Claims

A memory system comprising:
a memory for storing data;
a large hash table stored in the memory, the large hash table including a predetermined number of buckets and a first number of ways, and including a first portion of the memory containing a first number of bytes that is a first power of two;
a small hash table stored in the memory, the small hash table including a predetermined number of buckets and a second number of ways, and including a second portion of the memory containing a second number of bytes that is a second power of two;
an overflow region stored in the memory and including a third portion of the memory; and
a translation table for mapping a logical address to a PLID (Physical Line Identifier) that includes a region identifier and a physical address.

The memory system according to claim 1, wherein the region identifier includes a first bit indicating that the PLID identifies data in the large hash table.

The memory system according to claim 2, wherein the physical address includes a row index and a column index.

The memory system according to claim 2, wherein:
the first bit indicates that the PLID does not identify data in the large hash table; and
the region identifier includes a second bit indicating whether the PLID is in the small hash table or in the overflow region.

The memory system according to claim 1, wherein the small hash table grows dynamically.

The memory system according to claim 1, wherein a first effective minimum deduplication ratio for the memory system is lower than a second effective minimum deduplication ratio for the large hash table without the small hash table.

The memory system according to claim 1, further comprising a signature table stored in the memory, the signature table storing signatures of a plurality of data stored in the large hash table and the small hash table,
wherein the signature table prevents multiple data with a common signature from being stored in a row of the large hash table or the small hash table.

A method comprising:
receiving a logical address from a processor;
mapping the logical address, using a translation table, to a PLID (Physical Line Identifier) that includes a region identifier and a physical address;
determining, using the region identifier, whether the physical address is in a large hash table, a small hash table, or an overflow region in a memory; and
accessing data in the memory using the physical address.

The method according to claim 8, wherein determining, using the region identifier, whether the physical address is in the large hash table, the small hash table, or the overflow region in the memory includes determining that the physical address is in the large hash table if a first bit of the region identifier is not set.

The method according to claim 8, wherein determining, using the region identifier, whether the physical address is in the large hash table, the small hash table, or the overflow region in the memory includes determining that the physical address is in the small hash table if a first bit of the region identifier is set and a second bit of the region identifier is not set.

The method according to claim 8, wherein determining, using the region identifier, whether the physical address is in the large hash table, the small hash table, or the overflow region in the memory includes determining that the physical address is in the overflow region if a first bit of the region identifier is set and a second bit of the region identifier is set.

The method according to claim 8, wherein accessing the data in the memory using the physical address includes:
determining a row index and a column index from the physical address; and
accessing the data in one of the large hash table and the small hash table using the row index and the column index.

The method according to claim 12, wherein accessing the data in one of the large hash table and the small hash table using the row index and the column index includes searching nearby entries in the small hash table if the data is not found at the row index and the column index in the small hash table.

The method according to claim 8, further comprising:
determining that the small hash table is approaching capacity; and
increasing the size of the small hash table while decreasing the size of the overflow region.

The method according to claim 14, wherein increasing the size of the small hash table includes increasing the number of ways, which correspond to the columns of the small hash table.

A computer-readable recording medium on which a program for causing a computer to execute the method according to any one of claims 8 to 12 is recorded.