JP6243884B2

JP6243884B2 - Information processing apparatus, processor, and information processing method

Info

Publication number: JP6243884B2
Application number: JP2015197189A
Authority: JP
Inventors: 英幸斎藤
Original assignee: Sony Interactive Entertainment Inc
Current assignee: Sony Interactive Entertainment Inc
Priority date: 2015-10-02
Filing date: 2015-10-02
Publication date: 2017-12-06
Anticipated expiration: 2035-10-02
Also published as: JP2017068805A

Description

The present invention relates to an information processing apparatus and an information processing method that use flash memory as a secondary storage device.

As the capacity of NAND flash memory has grown, solid state drives (SSDs) have come into use as storage devices that replace conventional hard disk drives (HDDs) (see, for example, Patent Document 1). An SSD generally includes a storage device in which a plurality of NAND flash memory devices are arranged in parallel. The performance of the individual devices, combined with this parallelism, gives data access that is far faster than that of an HDD.

Patent Document 1: WO 2014/132346 A1

Various procedures are required between the time an application requests access to a file stored on an SSD and the time processing using that file's data actually begins. Typically, metadata must be consulted based on the requested file name to obtain a logical address, the logical address must be converted into a physical address representing the area where the data is actually stored, and the data read out must undergo tampering checks and decryption. These processes tend to grow more complex as flash memory capacity increases. The resulting processing load adds latency, so the high transfer rate of the SSD cannot be fully exploited.

The present invention has been made in view of these problems, and its object is to provide a technique that makes data access more efficient in an information processing apparatus that uses an SSD.

One aspect of the present invention relates to an information processing apparatus. The apparatus comprises: a main processor that issues an access request for a file stored in a secondary storage device; a sub processor that accepts the access request, divides it into access requests for each of a plurality of data blocks constituting the file, issues those requests, and notifies the main processor when all the data blocks constituting the file have been read and have become accessible; and a controller that accepts the access requests from the sub processor and notifies the sub processor each time a requested data block is read from the secondary storage device.

Another aspect of the present invention relates to a processor. The processor comprises: an access request dividing unit that accepts, from another processor in an information processing apparatus, an access request for a file stored in a secondary storage device and divides it into access requests for each of a plurality of data blocks constituting the file; an access request issuing unit that issues the divided access requests to a controller that controls access to the secondary storage device; and a notification unit that notifies the other processor when all the data blocks constituting the file have been read by the controller and have become accessible.

Yet another aspect of the present invention relates to an information processing method. The method includes: a step in which a main processor issues an access request for a file stored in a secondary storage device; a step in which a sub processor accepts the access request, divides it into access requests for each of a plurality of data blocks constituting the file, and issues those requests; a step in which a controller accepts the access requests from the sub processor and notifies the sub processor each time a requested data block is read from the secondary storage device; and a step in which the sub processor notifies the main processor when all the data blocks constituting the file have been read and have become accessible.
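The division of roles among the main processor, the sub processor, and the controller can be illustrated with a minimal sketch. Everything below is hypothetical and only models the notification pattern: the class names, the `BLOCK_SIZE` value, and the use of synchronous callbacks in place of hardware notifications are assumptions, not details given in the text.

```python
from typing import Callable, Dict, List, Tuple

BLOCK_SIZE = 4096  # assumed data-block size, chosen only for illustration


class Controller:
    """Stands in for the controller: serves one data block per request."""

    def __init__(self, storage: bytes) -> None:
        self.storage = storage
        self.blocks_read = 0

    def read_block(self, offset: int, length: int,
                   on_done: Callable[[int, bytes], None]) -> None:
        data = self.storage[offset:offset + length]
        self.blocks_read += 1
        on_done(offset, data)  # notify the sub processor for every block


class SubProcessor:
    """Splits a file request into block requests and gathers completions."""

    def __init__(self, controller: Controller) -> None:
        self.controller = controller

    def read_file(self, offset: int, length: int,
                  notify_main: Callable[[bytes], None]) -> None:
        end = offset + length
        requests: List[Tuple[int, int]] = [
            (o, min(BLOCK_SIZE, end - o)) for o in range(offset, end, BLOCK_SIZE)
        ]
        if not requests:  # empty file: nothing to wait for
            notify_main(b"")
            return
        blocks: Dict[int, bytes] = {}

        def on_block(block_offset: int, data: bytes) -> None:
            blocks[block_offset] = data
            # The main processor is notified once, after the last block.
            if len(blocks) == len(requests):
                notify_main(b"".join(blocks[o] for o, _ in requests))

        for o, n in requests:
            self.controller.read_block(o, n, on_block)


# The "main processor" issues a single request and receives a single notification.
storage = bytes(range(256)) * 40
controller = Controller(storage)
received: List[bytes] = []
SubProcessor(controller).read_file(100, 9000, received.append)
```

In this sketch the main processor sees one completion for the whole file, while the per-block completions are absorbed by the sub processor.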

Any combination of the above components, and any conversion of the expression of the present invention among a method, an apparatus, a system, a computer program, a recording medium on which a computer program is recorded, and the like, are also valid as aspects of the present invention.

According to the present invention, an information processing apparatus using an SSD can be made more efficient in terms of both resources and processing time.

FIG. 1 is a diagram showing the internal configuration of the information processing apparatus in the present embodiment.
FIG. 2 is a diagram schematically showing the relationship between the data stored in the flash memory and the structure of the address conversion table in the present embodiment.
FIG. 3 is a diagram for explaining the method of obtaining the storage area of data in the second address space in the present embodiment.
FIG. 4 is a diagram showing the internal configuration of the information processing apparatus in the present embodiment.
FIG. 5 is a diagram showing the configuration of the software stack in the present embodiment.
FIG. 6 is a diagram schematically showing the procedure by which the file archive and the flash controller store file data to be processed in the flash memory in the present embodiment.
FIG. 7 is a diagram schematically showing the processing procedure up to accessing a requested file using the file archive in the present embodiment.
FIG. 8 is a diagram schematically showing the processing procedure up to completing the read of a requested file using the file archive in the present embodiment.

FIG. 1 shows the internal configuration of the information processing apparatus of the present embodiment. The information processing apparatus illustrated here may be any general information device such as a portable game machine, a personal computer, a mobile phone, a tablet terminal, or a PDA. The information processing apparatus 10 includes a host unit 12 that includes a CPU, a system memory 14, a NAND flash memory 20 (hereinafter simply called the flash memory 20), and a flash controller 18.

The host unit 12 loads programs and data stored in the flash memory 20 into the system memory 14 and uses them to perform information processing. It also reads application programs and data from a recording medium driven by a recording medium drive unit (not shown), or downloads them from a server connected over a network by a communication unit, and stores them in the flash memory 20. In doing so, the host unit 12 issues access requests for the flash memory 20 to the flash controller 18, and the flash controller 18 performs read/write processing on the flash memory 20 in response.

A plurality of NAND flash memories are connected in the flash memory 20, and data is stored distributed across a plurality of channels as illustrated (four channels, "ch0" to "ch3", in the figure). The flash controller 18 includes a host controller 22 that provides the interface to the host unit 12, a memory controller 28 that provides the interface to the flash memory 20, and an SRAM (Static Random Access Memory) 24. The operations of the host controller 22 and the memory controller 28 can be implemented in hardware by various circuits and devices, and in software by internally held programs. Those skilled in the art will therefore understand that they can be realized in various forms by hardware alone, software alone, or a combination of the two, and they are not limited to any one of these.

The host unit 12 generates access requests for the flash memory 20 as information processing progresses and stores them in the system memory 14. Each access request includes the logical address of the access destination (LBA: Logical Block Address). The host controller 22 of the flash controller 18 reads the access request stored in the system memory 14 and converts the LBA into a physical address in the flash memory 20. For this purpose, at least part of the required address conversion table, which is originally stored in the flash memory 20, is loaded into the SRAM 24 in advance.

The host controller 22 refers to the address conversion table and supplies the physical address obtained from the LBA to the memory controller 28. The memory controller 28 reads or writes data by accessing the corresponding area of the flash memory 20 based on that physical address. In general, data is read from and written to the flash memory 20 in access units such as 4096 bytes.

Rewriting data involves erasing data in the target area of the flash memory 20, and this erasure is performed in block units of several MiB (1 MiB = 2^20 bytes). Because the flash memory 20 wears out as erasures are repeated, measures are needed to limit the number of erasures. Specifically, when data is rewritten, the pre-rewrite data is left unerased as far as possible; the rewritten data is stored in a different area, and the address conversion table is updated to point to that area. Meanwhile, used areas are erased periodically so that the areas available for new allocation are not exhausted even when rewrites are frequent.
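The rewrite policy described here (store the new data elsewhere, repoint the table, erase stale areas later) can be sketched as follows. This is a simplified model under stated assumptions: the class `RemappingStore` and its methods are hypothetical, each list slot stands for one write unit, and erasure is modeled per unit rather than per multi-MiB block as in real NAND.

```python
from typing import Dict, List, Optional


class RemappingStore:
    """Toy out-of-place update store; not the patent's implementation."""

    def __init__(self, num_units: int) -> None:
        self.flash: List[Optional[bytes]] = [None] * num_units
        self.table: Dict[int, int] = {}      # LBA -> physical unit index
        self.free: List[int] = list(range(num_units))
        self.stale: List[int] = []           # superseded units awaiting erase

    def write(self, lba: int, data: bytes) -> None:
        if not self.free:
            self.erase_stale()               # reclaim only when space runs out
        if not self.free:
            raise RuntimeError("no free unit available")
        new_unit = self.free.pop(0)
        self.flash[new_unit] = data          # write to a different area
        old_unit = self.table.get(lba)
        if old_unit is not None:
            self.stale.append(old_unit)      # old data is not erased yet
        self.table[lba] = new_unit           # repoint the conversion table

    def read(self, lba: int) -> Optional[bytes]:
        return self.flash[self.table[lba]]

    def erase_stale(self) -> None:
        for unit in self.stale:
            self.flash[unit] = None
            self.free.append(unit)
        self.stale.clear()


store = RemappingStore(num_units=2)
store.write(0, b"v1")
store.write(0, b"v2")   # lands in unit 1; unit 0 becomes stale
store.write(0, b"v3")   # no free unit left, so stale unit 0 is erased and reused
```

Deferring the erase until free space runs out is one possible policy; the text only says that used areas are erased periodically.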

When data is read and written in access units of 4096 bytes as described above, and each entry of the address conversion table is 4 bytes, the address conversion table as a whole has a data size of about 0.1% of the total capacity of the flash memory 20. For example, if the total capacity of the flash memory 20 is 1 terabyte, the address conversion table is 1 gigabyte. Each time the host unit 12 requests data access, the flash controller 18 must first refer to the address conversion table in order to convert the designated LBA into a physical address.

If the address conversion table to be referenced is kept in the flash memory 20, the flash memory 20 is accessed more often for address conversion, which lowers processing throughput and increases latency. Caching a large part of the address conversion table in an external DRAM or the like improves efficiency, but the larger the flash memory 20 becomes, the larger the DRAM that is required. Moreover, the data transfer rate of the DRAM becomes the dominant factor, so a sufficient improvement in throughput and latency may still not be achievable.

In the present embodiment, therefore, the size of the address conversion table is kept small by enlarging the unit of data processing for write requests, that is, the granularity, for at least some of the data. For example, if the write granularity is 128 MiB and each entry of the address conversion table is 4 bytes as above, the data size of the whole address conversion table is 1/2^25 of the capacity of the flash memory 20. For example, a 32 KiB (32 × 2^10 bytes) address conversion table can represent an area of 1 TiB (2^40 bytes).
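The table sizes quoted for the two granularities can be checked with a few lines of arithmetic. The helper name `table_size_bytes` is hypothetical; the 4-byte entry size and the 4096-byte and 128 MiB granularities come from the text, and binary units are assumed throughout.

```python
KIB, MIB, GIB, TIB = 2**10, 2**20, 2**30, 2**40


def table_size_bytes(capacity_bytes: int, granularity_bytes: int,
                     entry_bytes: int = 4) -> int:
    """Size of a flat table with one entry per granularity-sized unit."""
    entries = capacity_bytes // granularity_bytes
    return entries * entry_bytes


fine = table_size_bytes(TIB, 4 * KIB)       # 4096-byte access unit
coarse = table_size_bytes(TIB, 128 * MIB)   # 128 MiB write granularity

print(fine // GIB, "GiB for the fine-grained table")
print(coarse // KIB, "KiB for the coarse-grained table")
print("coarse table / capacity = 1 / 2^%d" % (TIB // coarse).bit_length().__sub__(1))
```

With these assumptions the fine-grained table is 4/4096 of the capacity, roughly 0.1%, and the coarse-grained table is small enough to sit entirely in on-chip SRAM.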

Storing an address conversion table of such a sufficiently small size in the SRAM 24 inside the flash controller 18 allows address conversion without involving an external DRAM. Coarsening the write granularity is particularly effective for game programs and the like that are loaded from an optical disc or a network, stored in the flash memory 20, and thereafter only read repeatedly. This is because the stored data is never rewritten, so there is no need to secure a new area of that unit size to hold rewritten data.

Even when data is written at such a coarse granularity, data continuity is preserved within each write unit, so data to be read can be designated at random in finer units. On the other hand, for data that must be rewritten, such as save data, it is desirable that writes can also be performed at a fine granularity. A plurality of conversion tables with different write granularities are therefore defined according to the characteristics of the data. Because a fine-grained address conversion table is large, as noted above, only part of it is cached in the SRAM 24.

FIG. 2 schematically shows the relationship between the data stored in the flash memory 20 and the structure of the address conversion table. A plurality of address spaces with different granularities are defined in the address conversion table; in the illustrated example these are a fine-grained first address space and a coarse-grained second address space, although three or more granularities may be used. In the figure, the LBA designated by the host unit 12 is written in the form "(address space number)-(address within the space)". For example, "1-1" denotes address "1" in the first address space, and "2-2" denotes address "2" in the second address space. The data that describes an LBA, 4 bytes for example, may also contain other information.

Physical addresses obtained by address conversion are basically written in the form "(channel number)-(address)" or simply "address". The storage area of the flash memory 20 is drawn as a tall rectangle for each channel. Among the smaller rectangles into which each is divided, those marked "T" are areas where the address conversion table is stored, and those marked with an LBA such as "1-1" are areas where the corresponding data is stored. The write granularity of data defined in the first address space is typically equal to the read granularity, for example 4 KiB, and one LBA is defined for each area of that size.

In the second address space, by contrast, a single LBA can define, as one unit, an area larger than the areas defined in the first address space. In the illustrated example this is a contiguous area spanning the four channels ch0 to ch3. The write granularity of the data covered by this address space is, for example, 128 MiB, but it may be chosen as appropriate according to, for example, the upper limit on the size of the address conversion table given the capacities of the flash memory 20 and the SRAM 24. Because this data is not rewritten by information processing, its storage area and data layout are basically maintained.

However, when a bit error is detected, for example because of aging of the flash memory 20, and the data has to be moved, a new contiguous area of that size is allocated. Also, if a block of the flash memory 20 becomes defective as the erase count increases, then for data in the second address space the neighboring blocks included in the same write unit also become unavailable for allocation. In contrast, when data defined in the first address space is rewritten or moved, a new area can be allocated at a fine granularity such as 4 KiB. The finer the granularity, the fewer blocks are made unavailable by a nearby defect.

Accordingly, varying the write granularity according to the characteristics of the data, as described above, makes it possible to both reduce the size of the address conversion table and use the data storage area of the flash memory 20 efficiently. The address conversion tables stored in the "T" areas of the flash memory 20 are read into the SRAM 24 separately for the first address space and the second address space. Because the address conversion table of the second address space is small, as described above, preloading all of it at startup guarantees that every lookup hits the cache.

Part of the address conversion table of the first address space is cached, according to the capacity of the SRAM 24. Conventional techniques can be applied to the caching procedure. When the address conversion table held in the SRAM 24 is updated because data has been moved or rewritten, it is written back to the original table stored in the flash memory 20 at a suitable time.

Suppose that the host unit 12 requests a read or write designating LBA "1-1", which belongs to the first address space. The flash controller 18 refers to the address conversion table and obtains the associated physical address "ch0-C" in the flash memory 20. As described above, in the first address space an LBA is assigned to each write unit, and hence to each read unit. The flash controller 18 therefore reads one unit of data whose head address is the physical address "ch0-C" shown in the address conversion table, that is, address C of channel ch0.

For a write request, the data read out is updated as appropriate and written to a different area of the flash memory 20, and the physical address in the address conversion table is updated to point to that area. If, on the other hand, the host unit 12 requests a read designating LBA "2-1", which belongs to the second address space, the flash controller 18 calculates the storage area of the data in the read unit from the physical address "A" recorded in the address conversion table.

FIG. 3 is a diagram for explaining the method of obtaining the storage area of data in the second address space. The upper bits of an LBA in the second address space represent the logical address uniquely assigned to each write-unit area, as described so far. This address corresponds to "2-1" and the like in FIG. 2. For example, if the write access granularity of the second address space is 128 MiB (2^27 bytes), the size of the address conversion table is 512 B (2^9 bytes), and each address space is 1 TiB (2^40 bytes), then bits 31:19 of the 32-bit LBA are the upper bits.

The flash controller 18 uses these upper bits as an index into the address conversion table 100 and obtains the associated physical address (PA). This address corresponds to "A" and the like in FIG. 2 and represents the head physical address of the write-unit area. FIG. 3 shows the upper bits "index" of the LBA matching "index3" in the address conversion table 100, so that the corresponding physical address "PA3" is obtained.

In the case of the first address space, data is simply read, as described above, from the read-unit area whose head address is "PA3". In the case of the second address space, the flash controller 18 adds the lower bits of the LBA, labeled "offset" in the figure, to the obtained physical address "PA3" and takes the sum as the final physical address 102. Data is then read from the read-unit area whose head address is this physical address 102. By varying the lower bits of the LBA, the host unit 12 can read any part of a coarse-grained write unit.
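The second-address-space lookup amounts to an index lookup followed by an addition, which can be sketched as below. The function name `translate_coarse` and the table contents are hypothetical; the split at bit 19 follows the "bits 31:19" example given for the 32-bit LBA.

```python
from typing import Sequence

OFFSET_BITS = 19                      # LBA bits 18:0 are the offset
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def translate_coarse(lba: int, table: Sequence[int]) -> int:
    """Map a 32-bit LBA in the coarse-grained space to a physical address."""
    index = (lba >> OFFSET_BITS) & 0x1FFF   # upper 13 bits select the entry
    offset = lba & OFFSET_MASK              # lower bits address inside the unit
    head_pa = table[index]                  # head physical address of the write unit
    return head_pa + offset


# Hypothetical table: entry 3 plays the role of "PA3" in FIG. 3.
table = [0x0000_0000, 0x0008_0000, 0x0010_0000, 0x0030_0000]
lba = (3 << OFFSET_BITS) | 0x42
pa = translate_coarse(lba, table)
```

Because only the head address of each write unit is stored, moving a unit changes a single table entry while every LBA that refers into that unit stays valid.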

For data stored in an area defined in the second address space, the head address changes occasionally because of relocation and the like, but the data layout inside the write unit does not change. The host unit 12 can therefore always read the same data by presenting the same LBA, lower bits included. If the flash memory 20 is configured with 4 channels, 1 chip select, a block size of 4 MiB, a page size of 16 KiB, LUN (Logical Unit Number) = 1, and 2 planes, the bits of the physical address PA[31:0] are assigned as follows.

Offset[13:0] = {PA[4:0], 9'b0}
Channel[1:0] = PA[6:5]
Plane = PA[7]
Block = PA[31:8] / (4 * 1024 / 16)
Page = PA[31:8] % (4 * 1024 / 16)
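The bit assignment above can be written out as a small decoder. The function name `decode_pa` and the returned dictionary are hypothetical; the field positions and the divisor, 4 MiB per block divided by 16 KiB per page, are taken from the list.

```python
from typing import Dict

PAGES_PER_BLOCK = 4 * 1024 // 16   # 4 MiB block / 16 KiB page = 256 pages


def decode_pa(pa: int) -> Dict[str, int]:
    """Split a 32-bit physical address into NAND addressing fields."""
    pa &= 0xFFFF_FFFF
    upper = pa >> 8                          # PA[31:8]
    return {
        "offset": (pa & 0x1F) << 9,          # {PA[4:0], 9'b0}: byte offset in the page
        "channel": (pa >> 5) & 0x3,          # PA[6:5]
        "plane": (pa >> 7) & 0x1,            # PA[7]
        "block": upper // PAGES_PER_BLOCK,
        "page": upper % PAGES_PER_BLOCK,
    }


# Example address built from block 5, page 7, plane 1, channel 2, offset unit 3.
example = ((5 * PAGES_PER_BLOCK + 7) << 8) | (1 << 7) | (2 << 5) | 3
fields = decode_pa(example)
```

Placing the channel bits just above the in-page offset means that consecutive addresses rotate across the four channels, which fits the text's description of data being distributed over ch0 to ch3.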

Next, the host unit 12 in the present embodiment is described. Each of the NAND devices constituting the flash memory 20 can achieve a read transfer rate higher than that of a single HDD, with a latency one tenth or less of that of an HDD. Because a large-capacity SSD carries many NAND devices, it can achieve a dramatically higher transfer rate than an HDD. In most SSDs, however, the host interface of the flash controller becomes a bottleneck, and the high transfer rate of the devices themselves cannot be fully exploited.

In general, data stored on an HDD is divided into blocks of 512 bytes or 4096 bytes and recorded in a scattered manner. The file system holds metadata for presenting the scattered data as one continuous piece of data, and converts an access command for a continuous region of a file into access commands for a plurality of scattered blocks. The metadata for converting the name of the file to be accessed into the LBAs corresponding to the individual HDD blocks is also recorded on the HDD, so the metadata must be read first in order to read a file.

The metadata itself may also be scattered over multiple areas of the HDD, so still higher-level metadata may have to be read in order to read that metadata. Such layering of metadata can cause frequent small data accesses to the HDD. During that time the logical block address of the area holding the target data cannot be obtained, so the CPU cannot issue the next read request. If this data access procedure is applied to an SSD as is, the high transfer rate offered by parallel access to multiple NAND devices cannot be achieved either.

In addition, a typical HDD has no encryption or tamper-prevention function, so the host CPU must perform encryption and tampering checks. These may be done at the BIOS level or at the file system level, but in either case the CPU does the work, so this processing can become a bottleneck relative to the high transfer rate of an SSD. The load could be distributed by using accelerators, but that requires dividing the file read out into processing units and issuing a correspondingly large number of processing requests, which does little to reduce the processing load on the CPU.

Furthermore, the many interrupts that signal completion of such a large number of processing requests can stall CPU processing. Some file systems also support data compression. In that case the file system compresses data when a file is written and decompresses it when the file is read. When the interface to the storage destination is slow, the reduced data volume may raise the effective transfer rate, but relative to the high transfer rate of an SSD the compression and decompression processing can become a bottleneck.

Thus the transfer rate, which is dramatically higher when a NAND flash device is considered on its own, often cannot be fully exploited because of the various bottlenecks that arise when the device is built into a system designed for HDDs. To alleviate these bottlenecks, the present embodiment provides a software stack for high-speed access in addition to the conventional file system. A conventional file system is accessed through a virtual file system so that it can support a variety of storage devices and network file systems. As a result, the metadata is organized in multiple layers, as described above, and metadata may have to be read many times before the target file is read.

In the present embodiment, providing a software stack for high-speed access that is specialized for flash memory simplifies the metadata. In addition to the conventional CPU, the embodiment provides an auxiliary processor that mainly executes and controls this software stack. The auxiliary processor also controls the hardware accelerators for encryption/decryption, tampering checks, and data decompression, which distributes the processing. Further, the unit of data read from the flash memory is enlarged and made uniform, which makes read processing efficient.

FIG. 4 shows the internal configuration of the information processing apparatus of the present embodiment. The figure shows in detail the configuration of the host unit 12 within the internal configuration of the information processing apparatus 10 shown in FIG. 1. The flash memory 20, the flash controller 18, and the system memory 14 may therefore be the same as those shown in FIG. 1. However, the address conversion table that the flash controller 18 uses to convert an LBA into a physical address may either consist of a plurality of address spaces with different granularities, as described above, or use a single unified granularity.

The host unit 12 has a configuration in which a main CPU 30, a sub CPU 32, and a memory controller 34 are interconnected by a coherent bus 36. An IO bus 38, to which an IO controller 40 and an accelerator 42 are connected, is further connected to the coherent bus 36. The main CPU 30 loads programs and data stored in the flash memory 20 into the system memory 14 and uses them to perform information processing.

As described above, the sub CPU 32 is an auxiliary processor mainly responsible for the processing involved in data access to the flash memory 20. The sub CPU 32 may be a processor core of the kind used in so-called embedded processors, whose computing performance is lower than that of the main CPU 30 but whose chip area is small. The main CPU 30 and the sub CPU 32 need not share the same instruction set architecture or operating system, but so that both can share data stored in the system memory 14, they are connected by the coherent bus 36 and use a common page size.

The sub CPU 32 divides a file read request issued by the main CPU 30 into read requests for data of a predetermined size and stores them in the system memory 14. In the present embodiment, hardware other than the main CPU 30 thus carries out the main part of data access to the flash memory 20, and the read unit is made fine-grained immediately after the file access request is issued. This enables parallel access to a plurality of NAND devices and achieves a high transmission rate. It also matches the data size well to the buffering of read data in the built-in SRAM and to the processing performed by the accelerator, such as encryption and tampering checks, so that processing does not stall partway through.

The IO bus 38 carries the accelerator 42, which performs cryptographic processing, data tampering checks, and data decompression. The accelerator reads data stored in the system memory 14 through a DMAC (not shown), applies decryption, a tampering check, and decompression, and stores the result back in the system memory 14 through the DMAC. The flash controller 18 reads from the system memory 14 the data access commands issued by the host unit 12 and performs read/write processing on the flash memory 20.

The flash controller 18 temporarily stores data read from the flash memory 20 in its built-in SRAM 24, performs an ECC check, and then transfers the data to the system memory 14. The memory controller 34 and the IO controller 40 of the host unit 12 provide general interface functions to the system memory 14 and the flash controller 18, respectively.

FIG. 5 shows the configuration of the software stack in the present embodiment. In a general technique, when the application 50 in the top layer issues a command, a system call causes the virtual file system 48 to process it. This invokes a local file system 46, such as a network file system or a disk file system, and access to the corresponding device driver 44 is realized. In other words, the virtual file system 48 is an abstraction layer that lets the application 50 handle the local file systems 46 for various devices in a common way.

The virtual file system 48 manages the directory entry information that makes up the metadata, and interprets file names and paths to work out where the data resides on each device. Because this involves complex processing such as directory tree searches, exclusive control, and cache management, processing is especially prone to stall when, for example, a large number of small files are opened. The present embodiment therefore defines a layer called a file archive 52, separate from the virtual file system 48. The application 50 accesses files through an API (Application Programming Interface) specific to the file archive.

The file archive 52 is an interface between the application 50 and two drivers: the NAND flash driver that operates the flash memory 20 and the accelerator driver that operates the accelerator 42. It passes access requests from the application directly to those drivers. This simplifies acquisition of the data storage area corresponding to a file. So that access requests divided into fine units can be processed smoothly in parallel, the target data is stored in the flash memory 20 in a dedicated format.

Specifically, a file accessed through the file archive 52 is divided in advance into fixed-length blocks, such as 64 KiB, and compressed before being stored. These files are also made read-only, so that consistency is maintained even when multiple processes of the application 50 access files at the same time. Multiple file accesses can thus be processed in parallel without synchronization, and together with the reduction in data size from compression this yields a higher transmission rate.

For example, files that are only ever read repeatedly, like the game program mentioned above, are well suited to handling by the file archive. Whether a file is handled by the file archive 52 is thus decided as appropriate according to the characteristics of the data. FIG. 6 schematically shows the procedure by which the file archive 52 and the flash controller 18 store the file data to be processed in the flash memory 20. At this stage the file archive is executed by the main CPU 30 or the sub CPU 32. First, the file archive 52 writes the file 112 to be processed into a continuous area of the flash memory 20.

Although the figure shows only one file 112, in practice multiple files are stored together in a continuous area. For example, a write request is issued so that multiple program files read from an optical disc or the like are written to a continuous area at a coarse granularity such as 128 MiB. The actual write is performed by the flash controller 18. Here the file archive 52 generates a hash list so that the logical address of a file's storage area can be looked up directly from its file name. That is, it generates a fixed-length hash value from the file name using a predetermined hash function, and sorts the entries indicating each file's logical address by hash value to form the hash list. The address lookup mechanism is not, however, limited to this.
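The hash list construction described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the text does not name the hash function or the entry layout, so truncated SHA-256 and `(hash, logical_address)` tuples are assumptions, as are the names `name_hash` and `build_hash_list` and the example addresses.

```python
import hashlib

def name_hash(path: str) -> int:
    """Fixed-length (64-bit) hash of a file path. SHA-256 is an assumed stand-in."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def build_hash_list(files: dict[str, int]) -> list[tuple[int, int]]:
    """Return (hash, logical_address) entries sorted by hash value."""
    return sorted((name_hash(path), addr) for path, addr in files.items())

# Hypothetical files and logical addresses, for illustration only.
hash_list = build_hash_list({
    "/map/001/dat01.bin": 0x0000_0000,
    "/map/001/dat02.bin": 0x0100_0000,
    "/map/002/dat01.bin": 0x0200_0000,
})
```

Sorting by hash value is what later allows the entry for a given file name to be found by binary search rather than by walking a directory tree.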

Next, the file archive 52 divides the target file 112, stored in the continuous area of the flash memory 20 in this way, into blocks of a predetermined size (S10). For example, dividing a 16 MiB file 112 into 64 KiB units forms a block group 114 of 256 blocks. The file archive 52 then compresses each block and requests the flash controller 18 to store the data 116, consisting of the compressed blocks, in another area of the flash memory 20 (S12). Here too, the data is written to a continuous area at a coarse granularity such as 128 MiB.

Along with performing this storage, the flash controller 18 generates an address conversion table that associates the logical addresses of the compressed data with the physical addresses of the areas where the data is actually stored. Associating the compressed data with logical addresses at a coarse granularity defines the second address space shown in FIG. 2 and keeps the address conversion table 64 small. The file archive 52, for its part, generates a compression table that associates each block's logical address before compression with its logical address after compression.
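Steps S10 and S12 and the compression table can be illustrated with a short sketch. The compression algorithm is not specified in the text, so `zlib` is an assumption here; recording the compressed length alongside the post-compression address is likewise an assumption, and `archive_file` and the base addresses are hypothetical.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # 64 KiB, the example block size in the text

def archive_file(data: bytes, src_base: int, dst_base: int):
    """Split data into fixed-length blocks, compress each, and build a compression table.

    Returns (packed, table), where table maps each block's pre-compression
    logical address to (post-compression logical address, compressed length).
    """
    packed = bytearray()
    table = {}
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        compressed = zlib.compress(block)
        table[src_base + offset] = (dst_base + len(packed), len(compressed))
        packed += compressed
    return bytes(packed), table

data = bytes(16 * 1024 * 1024)  # a 16 MiB file of zeros, for illustration
packed, table = archive_file(data, src_base=0x0000_0000, dst_base=0x1000_0000)
print(len(table))  # 256
```

Because every block is compressed independently, any single block can later be read and decompressed without touching its neighbours, which is what makes the block-level parallel reads possible.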

FIG. 7 schematically shows the processing procedure up to accessing a requested file using the file archive. First, when the main CPU 30, which runs the application, needs to read a file in the course of processing, it calls the file archive API, specifying the file name. In the figure, the path and file name "/map/001/dat01.bin" is specified. Through this API, the main CPU 30 obtains the logical address corresponding to the specified file using the hash list 60 described above. The hash list 60 is loaded from the flash memory 20 into the system memory 14 in advance.

Then, from the file name specified by the application, a hash value is derived using the same hash function that was used to create the hash list (S20), and the corresponding logical address is obtained by, for example, a binary search of the hash list 60 (S22). Once the obtained logical address has been passed to the sub CPU 32, the sub CPU 32 takes over, and the main CPU 30 is released from file archive 52 processing for the time being. The sub CPU 32 refers to the compression table 62, which has been loaded into the system memory 14, and from the file's logical address obtains the post-compression logical addresses of the blocks into which the file was divided (S24, S26).
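The lookup in S20 and S22 might look like the sketch below. It assumes a hash list of `(hash, logical_address)` tuples sorted by hash, uses truncated SHA-256 as a stand-in for the unspecified hash function, and ignores hash collisions. The names `name_hash` and `lookup` are hypothetical.

```python
import bisect
import hashlib

def name_hash(path: str) -> int:
    """Same (assumed) hash function that was used when the hash list was built."""
    return int.from_bytes(hashlib.sha256(path.encode("utf-8")).digest()[:8], "big")

def lookup(hash_list: list[tuple[int, int]], path: str):
    """Binary-search the sorted hash list; return the logical address or None."""
    h = name_hash(path)
    # (h,) sorts before every (h, addr), so this finds the first entry with hash >= h.
    i = bisect.bisect_left(hash_list, (h,))
    if i < len(hash_list) and hash_list[i][0] == h:
        return hash_list[i][1]
    return None

# Hypothetical hash list with three entries.
hash_list = sorted((name_hash(p), a) for p, a in [
    ("/map/001/dat01.bin", 0x0100_0000),
    ("/map/001/dat02.bin", 0x0200_0000),
    ("/map/002/dat01.bin", 0x0300_0000),
])
print(hex(lookup(hash_list, "/map/001/dat01.bin")))  # 0x1000000
```

The cost on the main CPU is one hash computation plus a logarithmic number of comparisons, with no metadata reads from storage.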

The sub CPU 32 then generates, for each compressed block, a read request that specifies the obtained logical address in the LBA format described above, and issues it to the flash controller 18 (S28). In other words, the sub CPU 32 converts the read request that the main CPU 30 issued for one file into multiple block-level read requests. For the 16 MiB file above, 256 read requests in 64 KiB units are issued. When issuing the read requests, the sub CPU 32 reserves a storage area for the read data in the kernel area of the system memory 14.
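The conversion of one file-level request into block-level requests can be sketched as below. The request fields, the table layout (pre-compression address mapped to post-compression address and length), and the 40 KiB compressed size are all assumptions made for illustration.

```python
from dataclasses import dataclass

BLOCK_SIZE = 64 * 1024  # 64 KiB blocks, as in the text's example

@dataclass
class ReadRequest:
    lba: int          # post-compression logical address of the block
    length: int       # number of bytes to read (compressed length)
    dest_offset: int  # offset in the reserved kernel-area buffer

def split_request(file_addr: int, file_size: int, compression_table: dict) -> list:
    """Turn one file read into one read request per fixed-length block."""
    requests = []
    dest = 0
    for offset in range(0, file_size, BLOCK_SIZE):
        lba, length = compression_table[file_addr + offset]
        requests.append(ReadRequest(lba, length, dest))
        dest += length
    return requests

# Hypothetical compression table: every block compresses to 40 KiB.
table = {i * BLOCK_SIZE: (0x1000_0000 + i * 40 * 1024, 40 * 1024) for i in range(256)}
requests = split_request(0, 16 * 1024 * 1024, table)
print(len(requests))  # 256
```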

In response, the flash controller 18 converts each LBA into a physical address using the address conversion table 64, as described with reference to FIG. 2, and reads the data at that address from the flash memory 20 (S30). For a read request originating from the file archive, the flash controller 18 looks up the address conversion table 64 using the upper bits of the LBA as an index, as described above, and adds the lower bits of the LBA to the physical address found there to obtain the physical address of each requested block.
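The upper-bits index plus lower-bits offset scheme can be written out as follows. The sketch treats the LBA as a byte address and assumes a 128 MiB granule (27 offset bits), matching the coarse write granularity mentioned in the text. The table contents and the name `translate` are hypothetical.

```python
GRANULE_BITS = 27  # 128 MiB = 2**27 bytes per table entry (assumed granularity)

def translate(lba: int, table: list) -> int:
    """Coarse-grained translation: upper bits pick the entry, lower bits are the offset."""
    index = lba >> GRANULE_BITS
    offset = lba & ((1 << GRANULE_BITS) - 1)
    return table[index] + offset

# Hypothetical table: physical base address of each 128 MiB logical region.
table = [0x4000_0000, 0x0800_0000, 0x2000_0000]
print(hex(translate((1 << GRANULE_BITS) + 0x1_0000, table)))  # 0x8010000
```

With one entry per 128 MiB region, the table stays small enough to be held in on-chip SRAM even for a large flash capacity, which is the benefit the description attributes to coarse granularity.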

FIG. 8 schematically shows the processing procedure up to completion of reading a requested file using the file archive. First, the flash controller 18 reads the requested data from the flash memory 20 as described with reference to FIG. 7 and places it in the built-in SRAM 24 (S40). This data is a unit produced by dividing the original file into blocks and compressing them, and is small enough to fit comfortably in the SRAM 24. The flash controller 18 then performs an ECC check on that unit of data (S42).

If the data passes the ECC check, the flash controller 18 stores it, by a DMAC (not shown) or the like, in the kernel area 70 of the system memory 14 that the sub CPU 32 reserved beforehand, and notifies the sub CPU 32 (S44). If the ECC check detects an error, the flash controller 18 generates a retransmission request internally and reads the data again. The flash controller 18 repeats this processing until all of the fine-grained read requests issued by the sub CPU 32 have been handled.

In response to the notification from the flash controller 18, the sub CPU 32 issues to the accelerator 42 a request to perform a tampering check, decryption, and decompression on the data read into the kernel area 70 (S46). The accelerator 42 performs that processing on each stored piece of data, stores the processed data, that is, the data of the blocks making up the file, in the user buffer 72 of the system memory 14, and notifies the sub CPU 32 (S48).

When the data of all blocks making up the requested file is in place, the sub CPU 32 notifies the main CPU 30 that reading is complete, using an interrupt or inter-process communication. The main CPU 30 then performs any post-processing associated with the file archive API call and returns control to the application. In the present embodiment, as shown in FIG. 5, a file stack is formed in which the file archive 52 and the virtual file system 48 coexist. As described above, for requests from the file archive 52, the flash controller 18 performs address conversion using the second address space, which is written at a coarse granularity.
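The completion handling on the sub CPU, where the main CPU is notified only once after every block is ready, can be sketched as a simple tracker. The class name `FileRead` and the callback standing in for the interrupt or inter-process message are assumptions.

```python
class FileRead:
    """Tracks the outstanding blocks of one file read and notifies once when all are done."""

    def __init__(self, block_count: int, notify_main):
        self.pending = set(range(block_count))
        self.notify_main = notify_main  # stands in for an interrupt or IPC message

    def block_done(self, index: int) -> None:
        """Called when the accelerator reports that block `index` is in the user buffer."""
        if index in self.pending:
            self.pending.remove(index)
            if not self.pending:
                self.notify_main()

events = []
read = FileRead(256, lambda: events.append("read complete"))
for i in reversed(range(256)):  # completion order need not match issue order
    read.block_done(i)
print(events)  # ['read complete']
```

The main CPU thus sees a single completion event per file instead of one interrupt per block, which addresses the interrupt load discussed earlier in the description.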

For requests from the virtual file system 48, on the other hand, address conversion may be performed using the first address space, which is written at a finer granularity. In that case, as shown in FIG. 2, two address conversion tables, one for each address space, are stored separately in the SRAM 24, and the table consulted is switched according to whether the request comes from the file archive 52 or the virtual file system 48. The request source can be identified from the upper bits of the LBA included in the access request.
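Selecting the table from the upper bits of the LBA could be as simple as the sketch below. The text does not say which bits mark the request source, so the single flag bit at position 40 is purely an assumption, as is the name `select_table`.

```python
ARCHIVE_FLAG = 1 << 40  # hypothetical: this LBA bit marks file-archive requests

def select_table(lba: int, first_table, second_table):
    """Pick the coarse (second) table for file-archive LBAs, else the fine (first) table."""
    return second_table if lba & ARCHIVE_FLAG else first_table

print(select_table(ARCHIVE_FLAG | 0x1000, "first", "second"))  # second
print(select_table(0x1000, "first", "second"))                 # first
```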

To guarantee a high transmission rate for read requests from the file archive 52, such requests may be processed at a higher priority than read/write requests from the virtual file system 48 and requests from the file archive 52 to write compressed data. Priority control and transmission rate management are performed by the sub CPU 32, which issues requests to the flash controller 18, or by the flash controller 18 that receives them. While data is being read in response to a request from the file archive 52, prohibiting other read requests ensures that the data to be read is not erased, and data erasure by garbage collection can also be prohibited unless an error occurs.

Next, the performance of the information processing apparatus 10 with the above configuration is examined. The processing time allowed in order to achieve a desired transmission rate is as shown in the following table. For example, to achieve a transmission rate of 1 GB/s with a data granularity (processing unit) of 4 KiB per request, processing must complete in 4.1 μs per request. If completion takes longer than that, the transmission rate naturally falls.
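The per-request time budget follows directly from dividing the request size by the target rate. The sketch below reproduces the figures quoted in the description, assuming that GB means 10^9 bytes and KiB means 1024 bytes. The function name `time_budget_us` is hypothetical.

```python
KIB = 1024
GB = 10**9  # assumed decimal gigabyte

def time_budget_us(request_bytes: float, rate_bytes_per_s: float) -> float:
    """Time available per request, in microseconds, to sustain the given rate."""
    return request_bytes / rate_bytes_per_s * 1e6

print(round(time_budget_us(4 * KIB, 1 * GB), 1))    # 4.1
print(round(time_budget_us(4 * KIB, 10 * GB), 1))   # 0.4
print(round(time_budget_us(64 * KIB, 10 * GB), 1))  # 6.6
```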

As described so far, the main CPU 30 performs the following processing in the present embodiment.
1. Calculating a hash value from the file name
2. Searching the hash list
3. Issuing a request to the sub CPU 32

When the file size is as small as about 4 KiB, achieving a transmission rate of 10 GB/s requires processing to complete within 0.4 μs per file. Issuing read requests frequently in such fine units yields little benefit from parallelization and is easily affected by the transmission rates of the sub CPU 32, the flash controller 18, and the flash memory 20, so latency readily increases and the transmission rate seen by the main CPU 30 may fall. Therefore, if data access requests of such small size occur about once per millisecond, combining multiple files into one and requesting about 10 MiB of data at a time achieves a 10 GB/s transmission rate with high robustness.

In the present embodiment, the sub CPU 32 performs the following processing.
1. Dividing the request issued by the main CPU 30 into fixed-length data blocks
2. Referring to the compression table to obtain the logical addresses where the compressed data is stored
3. Issuing read requests to the flash controller 18
4. Requesting the accelerator to perform processing such as tampering checks on the read data
5. Notifying the main CPU 30 when the blocks making up the original file are ready

The sub CPU 32 always uses fixed-length data as its processing unit. With a processing unit of 64 KiB, for example, achieving 10 GB/s requires completing processing in 6.6 μs per request. When the main CPU 30 issues a read request for a 16 MiB file, for example, the sub CPU 32 divides it into 64 KiB units and issues 256 read requests, as described above. Rather than processing the requests one at a time in 6.6 μs each, it processes multiple requests together.

In the present embodiment, only simplified operations are involved, namely issuing commands to the flash controller 18 and the various accelerators and receiving completion notifications, and these can be performed in parallel at high frequency. It is therefore possible, for example, to process about 32 requests together and complete them within 211 μs, which yields a transmission rate of 10 GB/s. Depending on the transmission rate that must be achieved, processing such as issuing requests to the flash controller 18 and receiving completion notifications may be distributed further across multiple CPU cores.
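The 211 μs figure appears to be 32 times the rounded per-request budget of 6.6 μs. The sketch below shows both that product and the unrounded value, assuming 64 KiB blocks and a decimal 10 GB/s target. The function name `batch_budget_us` is hypothetical.

```python
def batch_budget_us(batch_size: int, block_bytes: int, rate_bytes_per_s: float) -> float:
    """Time available to finish a whole batch of block requests, in microseconds."""
    return batch_size * block_bytes / rate_bytes_per_s * 1e6

print(round(32 * 6.6))                                   # 211 (using the rounded 6.6)
print(round(batch_budget_us(32, 64 * 1024, 10e9), 1))    # 209.7 (unrounded)
```

Either way, batching relaxes the deadline from a few microseconds per request to roughly 0.2 ms per group of 32, which is far easier for a small embedded core to meet.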

In the present embodiment, the flash controller 18 performs the following processing.
1. Reading the requests issued by the sub CPU 32
2. Referring to the address conversion table to convert the LBA into a physical address
3. Reading the data from the corresponding area of the flash memory 20
4. Storing the data in the system memory 14 and notifying the sub CPU 32

In address conversion, which requires high-speed processing, the address conversion table to be consulted is stored in the built-in SRAM 24. If requests from the sub CPU 32 are in 64 KiB units, achieving 10 GB/s requires completing processing in 6.6 μs per request. This corresponds to 151,515 IOPS (Input Output Per Second), so, as with the sub CPU 32, it is desirable to distribute the processing across multiple processor cores. As described above, the flash controller 18 also has multiple interface channels to the flash memory 20, and read requests are further divided per channel. For example, if a 64 KiB read request is divided into 16 KiB processing units, four times the IOPS is needed, but because those units are processed in parallel on multiple channels the effect on the transmission rate is small.
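The IOPS figure is the reciprocal of the per-request time. The sketch below shows that 151,515 corresponds to the rounded 6.6 μs, gives the unrounded value for 64 KiB at a decimal 10 GB/s, and shows the fourfold increase for 16 KiB units. The function name `required_iops` is hypothetical.

```python
def required_iops(rate_bytes_per_s: float, request_bytes: int) -> float:
    """Requests per second needed to sustain the given rate at the given request size."""
    return rate_bytes_per_s / request_bytes

print(round(1e6 / 6.6))                         # 151515 (from the rounded 6.6 us)
print(round(required_iops(10e9, 64 * 1024)))    # 152588 (unrounded)
print(round(required_iops(10e9, 16 * 1024)))    # 610352 (four times the 64 KiB figure)
```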

In the present embodiment, the accelerator 42 performs the following processing.
1. Receiving a processing request from the sub CPU 32
2. Reading data from the system memory 14
3. Performing the tampering check, decryption, and decompression
4. Storing the processed data in the system memory 14 and notifying the sub CPU 32

To operate at an average throughput of 10 GB/s or more, the accelerator needs a higher peak throughput than that once overhead between the various processing stages is taken into account. A circuit that processes 128 bits per cycle at 1 GHz has a peak throughput of 16 GB/s, which is sufficient. If the circuit instead needs 16 cycles to process 128 bits, measures such as running 16 accelerators in parallel may be taken.
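The accelerator figures can be checked with a one-line formula: bytes per operation, divided by cycles per operation, times clock frequency, times the number of parallel units. The function name `peak_throughput_gb_s` is hypothetical, and GB is assumed to mean 10^9 bytes.

```python
def peak_throughput_gb_s(bits_per_op: int, cycles_per_op: int,
                         clock_hz: float, units: int = 1) -> float:
    """Peak throughput in GB/s (10**9 bytes per second) of `units` parallel circuits."""
    return bits_per_op / 8 / cycles_per_op * clock_hz * units / 1e9

print(peak_throughput_gb_s(128, 1, 1e9))            # 16.0
print(peak_throughput_gb_s(128, 16, 1e9))           # 1.0
print(peak_throughput_gb_s(128, 16, 1e9, units=16)) # 16.0
```

A circuit taking 16 cycles per 128 bits delivers only 1 GB/s on its own, so 16 such units in parallel are needed to match the single-cycle design.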

According to the present embodiment described above, the granularity of write access to the flash memory is made larger than the write granularity of the prior art, such as page units. The entire address conversion table can then be kept small enough to fit in the built-in SRAM, and repeated access to the flash memory for address conversion is no longer needed. Even as flash memory capacity grows, there is no need for a large external DRAM to cache the address conversion table. As a result, increased latency and reduced throughput caused by access to the flash memory or external DRAM are avoided, and manufacturing cost and chip area are reduced.

Further, dividing the address conversion table into multiple address spaces, each with a different write-access granularity, allows an address conversion table suited to the characteristics of the data to be selected. Part of the address conversion table that defines the fine-grained address space is cached in the SRAM. For example, a coarse granularity is used for game programs and other data that is never updated, and a fine granularity for user data and other data that is updated often. This permits waste-free area allocation that takes account of the flash memory characteristic that a new area must be secured whenever data is updated, and it is compatible with the benefits of coarse granularity described above.

A processor responsible for access requests to the flash memory is also provided separately from the main processor. This processor divides each per-file access request into fine units so that subsequent processing is carried out in those units, in parallel as far as possible. Correspondingly, the flash memory stores data compressed block by block, where the blocks are obtained by dividing each file. With this configuration, the processing speed for a single access request increases substantially. Because the data read is also smaller, it can be buffered in the SRAM built into the flash controller, and no external DRAM is needed in the path.

Further, in the software stack, an interface layer that connects the application directly to the SSD is provided in addition to the conventional file system, which simplifies the processing from file name to address acquisition. Read accesses are also prioritized according to whether the request source is this interface layer or the conventional file system, with the former given higher priority, and the read unit is made larger than before, which makes read processing more efficient. Reads of data that must be read quickly, such as program files, can thereby be differentiated from reads of general files such as user data.

Simplifying the metadata needed to get from a file name to an address removes the need to traverse hierarchical metadata step by step, and also reduces the size of the metadata itself enough that all of it can be held in the system memory. As a result, the load of the memory accesses needed for address acquisition is greatly reduced. With the above configuration, high-speed data access that takes full advantage of the high transfer rate of the flash memory becomes possible while coexisting with conventional data access procedures for various kinds of storage.
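As a rough illustration of flat metadata, the sketch below resolves a file name to a logical address with a single in-memory lookup and then derives per-block addresses from it. The file names, addresses, `BLOCK_SIZE`, and the functions `lookup` and `block_requests` are all hypothetical. It also assumes fixed-size uncompressed blocks for simplicity, whereas compressed blocks would vary in length.

```python
BLOCK_SIZE = 1024 * 1024  # hypothetical block size

# Hypothetical flat metadata: file name -> (logical start address, size).
# One entry per file, small enough to keep entirely in system memory.
FLAT_METADATA = {
    "game/level1.bin": (0x0000_0000, 2 * BLOCK_SIZE + 512 * 1024),
    "game/level2.bin": (0x0040_0000, BLOCK_SIZE),
}


def lookup(name, metadata=FLAT_METADATA):
    """Resolve a file name with one lookup, with no hierarchy to walk."""
    return metadata[name]


def block_requests(name, metadata=FLAT_METADATA, block_size=BLOCK_SIZE):
    """Derive (logical address, length) for every block of the file."""
    start, size = lookup(name, metadata)
    return [(start + offset, min(block_size, size - offset))
            for offset in range(0, size, block_size)]
```

The design point is that the cost of address acquisition no longer depends on directory depth, since no intermediate metadata has to be fetched before the file's address is known.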

The present invention has been described above on the basis of an embodiment. Those skilled in the art will understand that the embodiment is illustrative, that various modifications can be made to the combinations of its constituent elements and processes, and that such modifications also fall within the scope of the present invention.

Description of reference symbols: 10 information processing apparatus, 12 host unit, 14 system memory, 16 external DRAM, 18 flash controller, 20 flash memory, 22 host controller, 24 SRAM, 26 DRAM controller, 28 memory controller, 30 main CPU, 32 sub CPU, 34 memory controller, 36 coherent bus, 40 IO controller, 42 accelerator.

Claims

An information processing apparatus comprising:
a main processor that issues an access request for a file stored in a secondary storage device;
a sub-processor that receives the access request, divides it into a plurality of access requests, one for each data block obtained by dividing the file, issues the divided access requests, and notifies the main processor when all of the data blocks have been read and are accessible; and
a controller that receives the access requests from the sub-processor, reads the data blocks individually by processing the requests in parallel using a plurality of channels provided in the secondary storage device, and notifies the sub-processor each time a data block is read.

The information processing apparatus according to claim 1, wherein, when a file needs to be accessed during processing of an application program, the main processor calls an API (Application Programming Interface) that derives, from an attribute of the target file, the logical address of the area in the secondary storage device where the file is stored, and notifies the sub-processor of the derived logical address, and
the sub-processor generates the plurality of access requests by obtaining a logical address for each data block on the basis of that logical address.

The information processing apparatus according to claim 1, wherein the file to be accessed by the main processor is compressed in units of the data blocks of a predetermined size, and the compressed data blocks of a plurality of files are stored in a large-unit continuous area whose size is larger than the read unit of the secondary storage device.

The information processing apparatus according to claim 3, wherein the controller refers to an address conversion table for converting the logical address of each data block, included in an access request from the sub-processor, into a physical address in the secondary storage device, and
the address conversion table defines a coarse-grained address space in which the physical addresses of each large-unit continuous area are collectively associated with a logical address, and is stored in its entirety in a memory built into the controller.

The information processing apparatus according to claim 3, wherein the controller further accepts a file access request from the main processor via a file system unique to the controller, determines the path of the access request based on the logical address of the access destination included in the request, and switches the size of the write unit according to the result.

The information processing apparatus according to any one of claims 1 to 5, further comprising an accelerator that performs a tampering check, decryption, and decompression on each data block read by the controller,
wherein, each time the sub-processor is notified by the controller that a data block has been read, the sub-processor issues a processing request to the accelerator and receives a processing completion notification from the accelerator.

A processor comprising:
an access request dividing unit that receives, from another processor in an information processing apparatus, an access request for a file stored in a secondary storage device, and divides it into a plurality of access requests, one for each data block obtained by dividing the file;
an access request issuing unit that issues the divided access requests to a controller so that they are processed in parallel using a plurality of channels provided in the secondary storage device; and
a notification unit that acquires a notification from the controller each time a data block is read, and notifies the other processor when all of the data blocks have been read and are accessible.

An information processing method comprising:
a step in which a main processor issues an access request for a file stored in a secondary storage device;
a step in which a sub-processor receives the access request, divides it into a plurality of access requests, one for each data block obtained by dividing the file, and issues the divided access requests;
a step in which a controller receives the access requests from the sub-processor, reads the data blocks individually by processing the requests in parallel using a plurality of channels provided in the secondary storage device, and notifies the sub-processor each time a data block is read; and
a step in which the sub-processor notifies the main processor when all of the data blocks have been read and are accessible.