WO2015015727A1 - Storage device, data access method, and program recording medium - Google Patents
- Publication number
- WO2015015727A1 (PCT/JP2014/003733; JP2014003733W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- access
- access request
- storage
- unit
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0877—Cache access modes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/10—Address translation
- G06F12/1027—Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24552—Database cache management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/60—Details of cache memory
Definitions
- The present invention relates to a distributed data storage and distribution system, a storage apparatus, a data distribution method, a divided data management apparatus, a host terminal, and a data distribution program, and in particular to a technique for storing, in distributed storage devices, data generated by a plurality of geographically distant information sources.
- Systems are known that collect data from such information sources via a wide area network such as the Internet and process the stored data.
- As information from sensors, numerical data such as position data from GPS (Global Positioning System), temperature readings from a thermometer, acceleration and speed from an acceleration sensor, and power consumption from a smart meter can be considered.
- Complex binary data such as audio acquired by a microphone, still images acquired by a camera, and moving image streams can also be considered.
- As information from user terminals, data such as posts to a microblogging service and logs of telephone call information can be considered.
- With the spread of cloud computing, which processes data using computer resources connected via the Internet, such data has come to be aggregated at geographically distant data centers via the Internet, public wireless networks, and the like.
- To transmit the collected data (hereinafter, "collected data") to the data center system, the data is sent to a gateway server (or application server) provided at the entrance of the data center system.
- In this description, the gateway server or application server provided at the entrance of the data center is referred to as a "storage client".
- The collected data that reaches the network inside the data center is received and processed by the storage client, stored in the storage system to make it persistent, and used for analysis and the like.
- Here, making data "persistent" means holding data so that it is retained without being lost.
- One example of persistence is storing, on a nonvolatile storage medium, a copy or a code that satisfies the redundancy defined for the system.
- A storage system is a system that holds data and provides the held data. Specifically, a storage system provides basic access functions such as CREATE (INSERT), READ, WRITE (UPDATE), and DELETE for individual pieces of data. In addition, a storage system may provide various functions such as authority management and data structure organization.
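The basic access functions listed above (CREATE, READ, WRITE, DELETE) can be sketched, purely for illustration, as a minimal in-memory store; the class and method names below are hypothetical and not part of the publication:

```python
class SimpleStore:
    """Minimal illustrative sketch of the basic storage access
    functions; all names here are hypothetical."""

    def __init__(self):
        self._data = {}

    def create(self, key, value):   # CREATE (INSERT)
        if key in self._data:
            raise KeyError("key already exists: " + key)
        self._data[key] = value

    def read(self, key):            # READ
        return self._data[key]

    def write(self, key, value):    # WRITE (UPDATE)
        self._data[key] = value

    def delete(self, key):          # DELETE
        del self._data[key]
```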
- A distributed storage system has a large number of computers connected via a communication network and an interconnect, and realizes a storage system using the storage devices included in those computers.
- In a distributed storage system, data is distributed and stored across multiple storage nodes. Therefore, when a storage client accesses data, it must determine which storage node holds that data; and when multiple storage nodes hold the data to be accessed, the storage client must determine which of them to access.
- Stored data is accessed in meaningful units.
- In a relational database, data is written in units called records or tuples.
- In a file system, data is written as a set of blocks.
- In a key-value store, data is written as an object. Data written in these ways is read by the user's computer in the same units.
- In this description, such a unit of data is referred to as a "data object".
- Conventionally, an HDD (Hard Disk Drive) or magnetic tape has been used as the storage device.
- In recent years, an SSD (Solid State Drive) built on nonvolatile semiconductor memory such as flash memory, which can be read and written at higher speed than an HDD, has also come into use.
- A distributed storage system can also use volatile storage devices, by holding replicas on multiple storage nodes.
- In recent years, the use of in-memory storage, which uses DRAM (Dynamic Random Access Memory), the main storage device of a computer, and can be read and written even faster than an SSD, has been increasing.
- When the size of each data object is as small as several tens to several hundreds of bytes, access in the 4-Kbyte units normally used with HDDs is inefficient. It is therefore preferable to use in-memory storage.
- The CPU (Central Processing Unit) of a storage node reads and processes the data in the node's main memory in order to store, acquire, scan, and identify data.
- However, access to DRAM is generally several hundred times slower than the CPU's operating clock. The CPU is therefore equipped with a cache memory composed of SRAM (Static Random Access Memory), which is faster and has lower latency (that is, a shorter time from when a data transfer is requested until the result is returned).
- The cache memory typically has a multi-level configuration: a cache with relatively long access latency that is shared by multiple cores is used together with a low-latency cache that is held in each core and kept consistent between cores, or a primary cache that operates at almost the same speed as the CPU.
- The CPU may also have a function called an MMU (Memory Management Unit) for handling main memory efficiently.
- A program (or process) running on a computer accesses memory through a contiguous memory address space (virtual memory space) that is private to that program.
- Each process's accesses to main memory are specified by addresses in the virtual memory space (logical addresses), and each logical address is translated into an address in physical memory (physical address) by a logical-to-physical address translation function.
- This logical-to-physical address translation can be implemented in OS (Operating System) software, but performing it in software alone is slow. The MMU therefore performs part of the logical-to-physical address translation.
- The MMU is equipped with a small cache memory called a TLB (Translation Look-aside Buffer), and can perform logical-to-physical translation at high speed by recording frequently used translation entries in the TLB.
- In recent years, the amount of data that can be held in main memory as in-memory storage, that is, the amount of memory installed in each computer, has increased, and CPUs have become faster relative to DRAM. It is therefore known that the long access times (penalties) caused by cache misses and TLB misses become a performance problem when main memory is used as in-memory storage.
- A cache miss is a DRAM access that occurs when the necessary data is not present in the cache memory.
- A TLB miss is a DRAM access that occurs when the logical-to-physical translation information needed for a data access is not present in the MMU's TLB.
- Non-Patent Document 2 proposes a memory index structure that takes into account the penalty of cache misses and TLB misses.
- Recent computers have a multi-core configuration with a plurality of CPUs, and processing is preferably divided into units called threads in order to make use of all the cores.
- If the number of threads equals the number of cores, no context switches occur on the cores; in general, however, the number of threads is made much larger than the number of cores. This is done to simplify programs (ease of design), to hide idle core resources when a cache miss occurs, and to reuse the same software on various hardware.
- Non-Patent Document 1 discloses a technique related to thread allocation for OLTP (On-Line Transaction Processing) in consideration of the influence of this context switch.
- When stream data is a list of objects arranged in time series, each object includes a unique primary key and one or more properties (metadata 1, metadata 2, ...). For example, stream data containing the values of two properties (name1, name2) in addition to the primary key has the form {key: hogehoge, name1: value1, name2: value2}.
- Data newly acquired by a sensor is stored as a unit of data of this kind.
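The object layout above can be sketched directly; the values follow the example in the text, and treating a stream as a Python list is an assumption made only for illustration:

```python
# One stream data object: a unique primary key plus properties,
# using the example values given in the text.
record = {"key": "hogehoge", "name1": "value1", "name2": "value2"}

# Stream data: a list of such objects arranged in time series
# (modelled here as a plain list, purely for illustration).
stream = [record]
```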
- As an index structure, a tree structure called the B+-Tree, which allows data to be searched and identified quickly, is known.
- For in-memory storage, structures better suited to memory access, such as the T-Tree, are known.
- Consider storing stream data in distributed storage that uses in-memory storage. If the frequency at which stream data occurs and the frequency at which the data is used differ from sensor to sensor, locality of data usage arises, which helps the cache memory provide its speedup.
- Conversely, access requests for using data (hereinafter, "data use accesses") with little bias (little locality) may be generated even for streams that occur infrequently.
- Various applications of this kind have come to be considered.
- An example of such a system is one in which all video data acquired by surveillance cameras or the like is stored as stream data, and all of that data is used for facial image recognition.
- In a distributed storage system, a method is used in which the node that stores each data object is determined according to the range of the primary key value or according to its hash value.
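Hash-based placement, one of the two methods mentioned above, could be sketched as follows; the hash function, node count, and function name are illustrative assumptions, not the scheme of the publication:

```python
import hashlib

NUM_NODES = 4  # illustrative cluster size

def node_for_key(primary_key: str) -> int:
    """Choose the storing node from the hash value of the
    primary key (sketch; the real scheme may differ)."""
    digest = hashlib.md5(primary_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_NODES
```

Because placement depends only on the key, a client that knows the key can compute the node directly; a query that does not use the key must be sent to every node, which is exactly the situation described next.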
- In that case, the desired data may be stored on any node, so data use accesses must be issued to all nodes. The number of data use accesses to each storage node therefore increases dramatically, and the decrease in access performance caused by cache misses, described above, becomes more significant.
- Data usage that generates such a large volume of data use accesses is characterized by tolerating larger access delays, unlike the distributed storage systems used by financial institutions and companies. Since the delay of public wireless lines is as large as several tens of milliseconds, a distributed storage system that communicates via such lines need not necessarily provide data on the order of microseconds, and the impact on performance of not doing so is small. In such a data usage environment, throughput performance, that is, how many accesses can be handled per second, is more important than response time.
- The present invention has been made in view of the above problems, and its main object is to provide a storage device, a data access method, and a program recording medium capable of providing data with higher throughput performance for in-memory storage access in access environments with little locality.
- A storage device according to the present invention includes: a data storage unit including a main memory that stores data in units of blocks and a cache memory that can hold the data stored in the main memory in units of blocks; an access request accumulation unit that accumulates access requests for the data stored in the data storage unit; and an access search unit that, in response to a predetermined condition being satisfied, sequentially reads the data stored in the main memory of the data storage unit in units of blocks, writes the data to the cache memory and scans it, reads from the access request accumulation unit the access requests for the data identified by the scan, and returns, to the source of each access request, information with which the identified data can be identified.
- A data access method according to the present invention includes: accumulating, in an access request accumulation unit, access requests for data stored in a data storage unit that includes a main memory storing data in units of blocks and a cache memory that can hold the data stored in the main memory in units of blocks; and, in response to a predetermined condition being satisfied, having a data scanning unit sequentially read the data stored in the main memory of the data storage unit in units of blocks, write the data to the cache memory, and scan it, and having an access search unit read from the access request accumulation unit the access requests for the data identified by the scan and return, to the source of each access request, information with which the identified data can be identified.
- The object is also achieved by a computer program that realizes the storage device or the data access method having the above configurations on a computer, and by a computer-readable recording medium storing that computer program.
- FIG. 1 is a block diagram showing the configuration of a distributed storage system according to a first embodiment of the present invention. FIG. 2 is a block diagram showing the configuration of a storage node according to the first embodiment. FIG. 3 is a flowchart explaining the operation of that storage node.
- FIG. 1 is a block diagram showing a configuration of a distributed storage system 100 according to a first embodiment of the present invention.
- the distributed storage system 100 includes a device 200 and a distributed storage apparatus 400 that can communicate with each other via an internal network 300.
- the device 200 is a device equipped with, for example, a GPS, an acceleration sensor, a camera, and the like, and acquires position information, acceleration, image data, and the like, and transmits them to the distributed storage apparatus 400 via the internal network 300.
- The internal network 300 uses, for example, Ethernet (registered trademark), Fibre Channel, FCoE (Fibre Channel over Ethernet (registered trademark)), InfiniBand, QsNet, Myrinet, PCI Express, or Thunderbolt, or an upper-layer protocol running over these, such as TCP/IP (Transmission Control Protocol / Internet Protocol) or RDMA (Remote Direct Memory Access).
- the distributed storage device 400 includes a plurality of storage nodes 40.
- the storage node 40 includes a data transmission / reception unit 41 that transmits / receives stream data via the internal network 300 and a data storage unit 42 that stores the received stream data.
- Note that the distributed storage apparatus 400 is not limited to receiving the stream data directly from the device 200; a computer (not shown) may receive the stream data, and the distributed storage apparatus 400 may then receive the stream data from that computer.
- Storage node 40 transmits / receives stream data to / from each other via internal network 300.
- Here, a storage node 40 that accesses another storage node 40 acts as a client terminal.
- the client terminal may be a computer different from its own node, or a software instance (process, thread, fiber, etc.) operating on the computer.
- the client terminal may be a software instance that operates on the storage node 40 or another device constituting the distributed storage device 400.
- a plurality of software programs that operate on one or more computers may be regarded as one virtual client terminal.
- the client terminal can acquire stream data from each of the plurality of storage nodes 40 that distribute and store the stream data transmitted from the device 200.
- FIG. 2 is a block diagram showing the configuration of the storage node 40 according to the first embodiment of the present invention.
- the storage node 40 includes a data transmission / reception unit 41, a data storage unit 42, a control unit 43, a data use access buffer 44, a data scanning unit 45, a data acquisition unit 46, and a data search unit 47.
- the data storage unit 42 includes a main memory 42a and a cache memory 42b.
- the access request for data use is called “data use access”, and the access request for data storage (write access) is called “data storage access”.
- a terminal that transmits data use access or data storage access to the storage node 40 is referred to as a client terminal 40a.
- When the storage node 40 receives an access request from the client terminal 40a, it performs processing according to the request and returns a response to the client terminal 40a.
- the storage node 40 may return a response including the success or failure of the storage to the data storage access.
- the storage node 40 may return a response including whether or not there is data that matches the requested access condition for the data use access.
- For a data use access, when the corresponding data exists, the storage node 40 may return a response including part or all of the data or, instead of the data itself, a response including the handle information necessary for acquiring the data.
- the client terminal 40a can acquire data from the storage node 40, another storage node, or another information system using the handle information.
- the data transmission / reception unit 41 transmits / receives stream data and access requests to / from the client terminal 40a.
- the data storage unit 42 stores the stream data received via the data transmission / reception unit 41.
- the control unit 43 stores the stream data in the data storage unit 42 or stores the access request in the data use access buffer 44 based on the type of access request received by the data transmission / reception unit 41.
- the data use access buffer 44 stores the data use access acquired from the control unit 43.
- the data scanning unit 45 scans the stream data stored in the data storage unit 42 based on the data usage access stored in the data usage access buffer 44.
- the data acquisition unit 46 acquires the stream data scanned by the data scanning unit 45.
- the data search unit 47 reads the data use access corresponding to the stream data acquired by the data acquisition unit 46 by searching the data use access buffer 44.
- FIG. 3 is a flowchart for explaining the operation of the storage node 40. Details of the operation of the storage node 40 will be described with reference to FIG.
- When the data transmission / reception unit 41 receives an access request from the client terminal 40a, it notifies the control unit 43 of the access request.
- the control unit 43 determines the type of the received access request (Step S101).
- When the access request is a data storage access, the control unit 43 stores the stream data (hereinafter also simply "data") acquired together with the data storage access in the data storage unit 42 (steps S102 and S103).
- The control unit 43 may physically store the stream data in the data storage unit 42 and perform processing to make the stream data persistent. That is, the control unit 43 may create and store a copy of the stream data, or may calculate an error correction code and attach it to the stream data. The control unit 43 may also change not only the stream data itself but also structure data for managing the data.
- After storing the stream data, the control unit 43 returns an appropriate response to the client terminal 40a (step S104).
- the appropriate response includes information indicating that the stream data has been normally stored in the data storage unit 42.
- The response may be returned before or after the data is stored. If replying before storing the data is permitted, the storage node 40 is configured for higher speed; if replying only after storing the data is permitted, the storage node 40 is configured to be more resilient to failures.
- When the access request is a data use access, the control unit 43 stores the access request in the data use access buffer 44 (step S105).
- A data use access includes a data specifying condition.
- The data specifying condition is a condition such as containing (or not containing) the key value of the stream data, a specific value of a part of the stream data, or a specific range of values.
- For example, the data specifying condition may require that "key" is "hogehoge", or that "name1" is between "value0" and "value2".
- the data specifying conditions in the present invention described using the present embodiment of the present application as an example are not limited to the above.
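The example conditions above (an exact match on "key", or "name1" falling in a value range) could be evaluated as in this sketch; the function and parameter names are hypothetical:

```python
def matches(record, prop, equals=None, lo=None, hi=None):
    """Sketch of evaluating a data specifying condition: an exact
    match on one property, or a value-range test (names hypothetical)."""
    if prop not in record:
        return False
    value = record[prop]
    if equals is not None:
        return value == equals
    return lo <= value <= hi

record = {"key": "hogehoge", "name1": "value1", "name2": "value2"}
```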
- the control unit 43 stores the received data use access in the data use access buffer 44 until a predetermined access trigger condition is satisfied (step S106).
- the access trigger condition may be, for example, a case where the number of access requests stored in the data use access buffer 44 becomes a certain number or more.
- the access trigger condition may be a case where the amount of access requests stored in the data use access buffer 44 is equal to or greater than a certain amount.
- the access trigger condition may be a case where a predetermined time has elapsed from the issue time of the oldest data use access stored in the data use access buffer 44.
- the access trigger condition may be a combination of the above examples.
- the access trigger condition in the present invention that is described taking the present embodiment of the present application as an example is not limited to the above.
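A combined access trigger condition of the kind listed above (request count, buffered amount, or age of the oldest request) might look like the following sketch; the thresholds and names are illustrative assumptions:

```python
import time

MAX_REQUESTS = 100   # illustrative count threshold
MAX_AGE_SEC = 0.5    # illustrative age threshold for the oldest request

def trigger_fired(buffered_requests, oldest_issue_time):
    """Fire when the buffer holds enough data use accesses OR the
    oldest buffered access is old enough (sketch)."""
    return (len(buffered_requests) >= MAX_REQUESTS
            or time.time() - oldest_issue_time >= MAX_AGE_SEC)
```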
- the control unit 43 instructs the data scanning unit 45 to scan the stream data.
- The data scanning unit 45 refers to the access requests stored in the data use access buffer 44 and sequentially scans the data stored in the data storage unit 42 (step S107). It is desirable that the scanning order be one that allows the data storage unit 42 to be accessed at higher speed.
- Here, scanning means reading the data stored in the main memory 42a, writing it to the cache memory 42b, and sequentially identifying the written data in the cache memory 42b.
- For example, the data scanning unit 45 may scan all data in the order of the memory addresses of the main memory 42a of the data storage unit 42. Alternatively, the data scanning unit 45 may first scan the data already held in the cache memory 42b, and then scan the remaining unscanned data (details are described later).
- the data to be scanned is all the necessary data stored in the data storage unit 42 in advance.
- The necessary data is, for example, all of the stored data, the stored data that has been updated since the previous scan, or the stored data that has been updated within the last second.
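The sequential, block-unit scan of step S107 can be sketched as follows; modelling main memory as a list and a block as a slice is an illustrative simplification:

```python
def scan_blocks(main_memory, block_size):
    """Read main memory sequentially in block units (each slice
    standing in for one block written to the cache memory) and
    yield every data object in turn (sketch)."""
    for start in range(0, len(main_memory), block_size):
        block = main_memory[start:start + block_size]  # one block read
        for obj in block:
            yield obj

memory = ["A", "B", "C", "D", "E", "F", "G", "H"]
scanned = list(scan_blocks(memory, block_size=4))
```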
- the data scanning unit 45 sequentially sends the identified data to the data acquisition unit 46 (step S108).
- the data scanning unit 45 may send all the specified data to the data acquisition unit 46, or may send only a part of the data necessary for data access.
- the above data is sent from the data acquisition unit 46 to the data search unit 47.
- Upon receiving the data, the data search unit 47 reads the access requests (data use accesses) for that data from the data use access buffer 44 (step S109).
- The data search unit 47 then inserts, into the response area of each access request, the data acquired from the data acquisition unit 46, a part of that data, or handle information that identifies the data (step S111).
- For example, the data specifying condition included in an access request "X" stored in the data use access buffer 44 is that "key" is "hogehoge".
- The data specifying condition included in an access request "Y" stored in the data use access buffer 44 is the "name1" of the data whose "key" is "hogehoge".
- the data search unit 47 sends to the control unit 43 an access request in which data is inserted into the response area.
- When the control unit 43 receives notification from the data scanning unit 45 that all necessary data has been scanned (step S112), the control unit 43 returns responses to the access requests to the client terminal 40a via the data transmission / reception unit 41, based on the information stored in the data use access buffer 44 (step S113).
- the control unit 43 deletes the returned access request from the data use access buffer 44 (step S114).
- As described above, the storage node 40 accumulates data use accesses in the data use access buffer 44 and, when the access trigger condition is satisfied, sequentially scans the data stored in the data storage unit 42. With this operation, the storage node 40 can return appropriate data when it receives a data use access from another terminal.
- the data scanning unit 45 scans all data in the order of the memory addresses of the main memory 42a of the data storage unit 42. That is, first, the data scanning unit 45 reads the block 1 of the main memory 42a, writes it into the cache memory 42b, and scans the cache memory 42b (this scanning is referred to as “first scanning”). As a result of the first scan, data “A”, “B”, “C”, and “D” are specified.
- the data scanning unit 45 reads the block 2 of the main memory 42a, writes it into the cache memory 42b, and scans the cache memory 42b (this scanning is referred to as “second scanning”). As a result of the second scan, data “E”, “F”, “G”, and “H” are specified. Subsequently, the data scanning unit 45 reads the block 3 of the main memory 42a, writes it into the cache memory 42b, and scans the cache memory 42b (this scan is referred to as “third scan”). As a result of the third scan, data “I”, “J”, “K”, and “L” are specified.
- The sequence of cache misses and cache hits for the data use accesses described above is: A (miss), B (hit), C (hit), D (hit), E (miss), F (hit), G (hit), H (hit), I (miss), J (hit), K (hit), L (hit).
- Here, A (miss) indicates that the access to data A missed the cache (cache miss), and B (hit) indicates that data B was read from the cache memory 42b (cache hit).
- When data C, G, L, and D are the targets of the access requests, a cache miss thus occurs three times in this scan.
- By contrast, if the data scanning unit 45 does not scan all the data in the main memory 42a but instead accesses data C, G, L, and D each time an access request is received, the access result in the illustrated example is C (miss), G (miss), L (miss), D (miss). That is, in this case a cache miss occurs four times.
- In the conventional technique, N cache misses and N TLB misses occur in the worst case for N accesses.
- By contrast, if one page of the cache memory 42b of the data storage unit 42 can hold 100 objects, the data can be acquired with N/100 cache misses in the worst case, so the cache hit rate can be improved.
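The worst-case comparison above (N misses for per-request access versus roughly N/100 for a sequential scan with 100 objects per cache page) is simple arithmetic, sketched here with hypothetical helper names:

```python
def sequential_scan_misses(n_objects, objects_per_page=100):
    """Worst-case cache misses for one sequential scan when
    objects_per_page objects fit in one cache page (ceiling division)."""
    return -(-n_objects // objects_per_page)

def per_request_misses(n_accesses):
    """Worst-case cache misses when each access touches memory
    independently: one miss per access."""
    return n_accesses
```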
- As described above, in the first embodiment, the control unit 43 accumulates received access requests in the data use access buffer 44, and when the access trigger condition is satisfied, the data scanning unit 45 scans all the data in the data storage unit 42 in order.
- the data search unit 47 reads an access request for the data specified by scanning from the data use access buffer 44.
- the control unit 43 inserts information related to the data into the read access request and returns it to the client terminal 40a.
- As a result, the execution of data use accesses is sequential with respect to the data storage unit 42, and the number of cache misses and TLB misses relative to the number of accesses per unit time can be reduced.
- FIG. 5 is a block diagram showing the configuration of a storage node 50 according to a second embodiment of the present invention. As illustrated in FIG. 5, the storage node 50 includes a data decomposition unit 51 in addition to the components of the storage node 40 according to the first embodiment.
- the data decomposing unit 51 decomposes the stream data sent from the control unit 43 into a plurality of fragments and stores them in the data storage unit 42 in a decomposed state.
- For example, the data decomposition unit 51 stores three data objects,
{Key: "key1", uid: "101", temp: 3},
{Key: "key2", uid: "102", temp: 10},
{Key: "key3", uid: "103", temp: 1},
in a column-oriented format, with each object decomposed as follows:
Memory-area1: {"key1", "key2", "key3", ...},
Memory-area2: {"101", "102", "103", ...},
Memory-area3: {3, 10, 1, ...}.
- the above storage format is an example, and the storage format in the present invention that is described taking the present embodiment of the present application as an example is not limited to the above.
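The column-oriented decomposition described above can be sketched like this; the function name and the use of Python dicts and lists for the memory areas are illustrative assumptions:

```python
def decompose(records):
    """Split each record into per-property arrays (column-oriented
    storage): one memory area per property (sketch)."""
    columns = {}
    for rec in records:
        for prop, value in rec.items():
            columns.setdefault(prop, []).append(value)
    return columns

records = [
    {"Key": "key1", "uid": "101", "temp": 3},
    {"Key": "key2", "uid": "102", "temp": 10},
    {"Key": "key3", "uid": "103", "temp": 1},
]
columns = decompose(records)  # e.g. columns["Key"] holds all key values
```

Stored this way, a scan that only needs one property (as in the data specifying condition) touches only that property's memory area.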
- the storage node 50 according to the second embodiment of the present invention further speeds up data use access by further increasing the efficiency of data storage.
- the data decomposition unit 51 stores the data acquired from the control unit 43 in the data storage unit 42 in a column-oriented format.
- the data scanning unit 45 scans only the property part included in the data specifying condition among the data stored in the data storage unit 42. Since the other components operate in the same manner as described in the first embodiment, description thereof is omitted.
- the data decomposing unit 51 stores the data in the data storage unit 42 in a decomposed state, and the data scanning unit 45 scans only the property part included in the data specifying condition.
- As a result, the number of cache misses relative to the number of accesses per unit time can be further reduced.
- For example, the configuration described in the first embodiment incurs N/100 cache misses and N TLB misses in the worst case.
- By contrast, if the capacity of the property to be accessed is 10% of the capacity of the whole data object, the data can be acquired with N/1000 cache misses and TLB misses in the worst case.
- the data decomposition unit 51 stores the data acquired from the control unit 43 in the data storage unit 42 in a state of being decomposed based on, for example, a column-oriented format.
- the data scanning unit 45 scans only the property part included in the data specifying condition among the data stored in the data storage unit 42.
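A minimal sketch of this column-oriented decomposition follows. Names such as `ColumnStore` are hypothetical; the patent does not prescribe an implementation:

```python
class ColumnStore:
    """Toy column-oriented store: one memory area (list) per property,
    so scanning one property touches only that area."""

    def __init__(self):
        self.columns = {}  # property name -> list of values (one memory area each)

    def store(self, record):
        # Decompose the record: append each property value to its own column.
        for prop, value in record.items():
            self.columns.setdefault(prop, []).append(value)

    def scan(self, prop, predicate):
        # Scan only the memory area of the property named in the
        # data specifying condition.
        return [i for i, v in enumerate(self.columns[prop]) if predicate(v)]

store = ColumnStore()
for rec in [{"key": "key1", "uid": "101", "temp": 3},
            {"key": "key2", "uid": "102", "temp": 10},
            {"key": "key3", "uid": "103", "temp": 1}]:
    store.store(rec)
```

Here `store.columns` holds the three memory areas from the example above, and a scan such as `store.scan("temp", lambda v: v >= 3)` visits only the temp area.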
- FIG. 6 is a block diagram showing a configuration of a storage node 60 according to a third embodiment of the present invention.
- compared with the storage node 40 according to the first embodiment, the storage node 60 differs in that the control unit 61 includes an access sorting unit 62 and the data use access buffer 63 includes a first buffer 63a and a second buffer 63b.
- Other configurations are the same as those of the storage node 40 according to the first embodiment.
- the access sorting unit 62 sorts the access request according to the access buffer condition.
- the access sorting unit 62 stores the access request in one or both of the first buffer 63a and the second buffer 63b according to the sorting.
- the access buffer condition is, for example, a condition for sorting access requests for each property used for data specification by data use access.
- for example, the access buffer condition specifies that, for the data {key: hogehoge, name1: value1, name2: value2}, an access request that specifies data by "key" is stored in the first buffer 63a, and an access request that specifies data by "name1" is stored in the second buffer 63b.
- the data search unit 47 decomposes the specified data received from the data acquisition unit 46 based on the access buffer condition as necessary (in this case, into a "key" portion and a "name1" portion).
- the data search unit 47 then uses the "key" portion of the data to search the first buffer 63a and the "name1" portion to search the second buffer 63b.
- the storage node 60 may include the data decomposition unit 51, and the data storage unit 42 may store the data in a decomposed state.
- the data scanning unit 45 scans in parallel the region of the data storage unit 42 storing the "key" portion and the region storing the "name1" portion.
- the data search unit 47 uses the "key" portion of the data for searching the first buffer 63a and the "name1" portion for searching the second buffer 63b.
- as another access buffer condition, a condition for sorting access requests according to the range of some values of the data can be considered.
- for example, for the data {key: hogehoge, name1: value1, name2: value2}, an access request for data whose initial letter of "key" is "a" is stored in the first buffer 63a.
- an access request for data whose initial letter of "key" is "b" is stored in the second buffer 63b.
- as described above, the access sorting unit 62 sorts the access requests according to the access buffer conditions and stores them in one or both of the first buffer 63a and the second buffer 63b.
- the data search unit 47 searches the first buffer 63a and the second buffer 63b in parallel for access to the data received from the data acquisition unit 46 based on the access buffer condition.
- a plurality of access buffer means may be arranged in caches of different cores. With this configuration, the cache utilization efficiency can be further improved and the system throughput can be improved.
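The sorting described above can be sketched as follows. This is a toy illustration under the assumption that each request carries the property it uses for data specification; the class and field names are hypothetical:

```python
class AccessSorter:
    """Toy access sorting unit: one buffer per property used for data
    specification, so a scan hit searches only the matching buffer."""

    def __init__(self, buffer_names):
        # e.g. one buffer for "key" requests, one for "name1" requests
        self.buffers = {name: [] for name in buffer_names}

    def sort_request(self, request):
        # Store the request in the buffer for the property it specifies.
        self.buffers[request["property"]].append(request)

    def match(self, prop, value):
        # Search only the buffer for this property; buffers for different
        # properties could be searched in parallel, e.g. one per core cache.
        return [r for r in self.buffers[prop] if r["value"] == value]

sorter = AccessSorter(["key", "name1"])
sorter.sort_request({"property": "key", "value": "hogehoge"})
sorter.sort_request({"property": "name1", "value": "value1"})
```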
- FIG. 7 is a block diagram showing a configuration of a storage node 70 according to a fourth embodiment of the present invention.
- the storage node 70 differs from the storage node 40 according to the first embodiment in that the control unit 71 includes an access compression unit 72; the other components are the same as those of the storage node 40 according to the first embodiment.
- the access compression unit 72 compresses the data use access. That is, the access compression unit 72 extracts the minimum information (access specifying information) that can specify the data from the data use access. For example, the access compression unit 72 extracts a set of a data access identifier of several bits and a data specifying condition of several bits. Thereby, the storage node 70 can execute one access based on about 2 bytes of information.
- the access compression unit 72 stores the access specifying information and information representing the entire access in the data use access buffer 44 in different areas.
- the data search unit 47 searches only the access specifying information stored in the data use access buffer 44, and thereby reads the access request corresponding to the specified data.
- the access compressing unit 72 extracts the access specifying information from the data use access and stores the access specifying information in the data use access buffer 44.
- the data retrieval unit 47 retrieves access specifying information from the data use access buffer 44 to read an access request corresponding to the specified data.
- the data search unit 47 searches only the area of the data use access buffer 44 that stores the access specifying information, so that the effect of speeding up the data use access can be obtained.
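A sketch of this compaction follows. The two-byte packing and the one-byte hash of the specifying condition are illustrative assumptions consistent with the "about 2 bytes" figure above, not the patent's prescribed encoding:

```python
import struct

class CompressedAccessBuffer:
    """Toy access compression unit 72: a compact spec area is searched
    first; the full requests live in a separate area."""

    def __init__(self):
        self.specs = bytearray()  # packed (access id, condition hash) pairs
        self.full = {}            # access id -> (condition, full request)

    def add(self, access_id, request, condition):
        # ~2 bytes of access specifying information per access:
        # a 1-byte id plus a 1-byte hash of the data specifying condition.
        self.specs += struct.pack("BB", access_id, hash(condition) & 0xFF)
        self.full[access_id] = (condition, request)

    def find(self, condition):
        # Scan only the compact spec area; confirm candidates against the
        # full record, since the 1-byte hash can collide.
        target = hash(condition) & 0xFF
        hits = []
        for offset in range(0, len(self.specs), 2):
            access_id, h = struct.unpack_from("BB", self.specs, offset)
            if h == target and self.full[access_id][0] == condition:
                hits.append(self.full[access_id][1])
        return hits

buf = CompressedAccessBuffer()
buf.add(1, {"op": "get", "key": "key1"}, "key1")
buf.add(2, {"op": "get", "key": "key2"}, "key2")
```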
- FIG. 8 is a block diagram showing a configuration of a storage node 80 according to a fifth embodiment of the present invention.
- the storage node 80 includes a data storage unit 81, an access request storage unit (data use access buffer) 82, a data scanning unit 83, and an access search unit 84.
- the data storage unit 81 includes a main memory that stores data in units of blocks, and a cache memory that can store data stored in the main memory in units of blocks.
- the access request storage unit 82 stores access requests for data stored in the data storage unit 81.
- the data scanning unit 83, in response to the access requests accumulated in the access request storage unit 82 satisfying a predetermined condition, sequentially reads the data stored in the main memory included in the data storage unit 81 in units of blocks, writes the data to the cache memory, and scans the data.
- the access search unit 84 reads an access request for data specified by scanning from the access request storage unit 82 and returns information that can specify the specified data to the transmission source of the access request.
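The flow of the fifth embodiment can be sketched end to end as follows. The block size, the count threshold standing in for the "predetermined condition", and the record layout are hypothetical; the point is that one sequential block-wise scan answers many buffered requests:

```python
BLOCK_SIZE = 4  # records per block; stands in for a cache-line-sized block

def scan_store(main_memory, pending_requests, threshold=2):
    """Toy storage node 80: buffer requests, then answer them all with one
    sequential block-wise scan once the predetermined condition is met."""
    if len(pending_requests) < threshold:
        return {}  # condition not yet satisfied: keep accumulating requests
    wanted = {req["key"]: req["source"] for req in pending_requests}
    replies = {}
    for start in range(0, len(main_memory), BLOCK_SIZE):
        block = main_memory[start:start + BLOCK_SIZE]  # read one block "into cache"
        for record in block:                           # scan the block
            if record["key"] in wanted:
                # return information identifying the specified data to the
                # request's transmission source
                replies[wanted[record["key"]]] = record["key"]
    return replies

store = [{"key": "key%d" % i} for i in range(10)]
pending = [{"key": "key3", "source": "client-a"},
           {"key": "key7", "source": "client-b"}]
```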
- when each unit of the storage node (storage device) shown in FIGS. 2, 5, 6, 7, and 8 is realized by a computer, it is realized by the hardware resources illustrated in FIG. 9. That is, the configuration shown in FIG. 9 includes a CPU 10, a RAM (Random Access Memory) 11, a ROM (Read Only Memory) 12, a network interface 13, and a storage medium 14.
- the CPU 10 of the storage node controls the overall operation of the storage node by reading various software programs (computer programs) stored in the ROM 12 or the storage medium 14, writing them to the RAM 11, and executing them. That is, in each of the above embodiments, the CPU 10 executes software programs that perform the functions (units) of the storage node while referring to the ROM 12 or the storage medium 14 as appropriate.
- in each embodiment above, the case where the functions of the storage node (storage device) shown in FIGS. 2, 5, 6, 7 and 8 are realized by software programs executed by the CPU 10 shown in FIG. 9 has been described as an example. However, some or all of the functions shown in the blocks in the above drawings may be realized as hardware.
- the present invention, described taking each embodiment as an example, is achieved by supplying to the storage node (storage device) a computer program capable of realizing the functions of the flowchart (FIG. 3) referred to in the description, and then by the CPU 10 writing that computer program to the RAM 11 and executing it.
- the supplied computer program may be stored in a computer-readable storage device such as a readable / writable memory (temporary storage medium) or a hard disk device.
- the present invention can also be understood as being configured by code representing the computer program, or by a recording medium storing the computer program.
- the present invention can be applied to, for example, a system for storing and processing sensor information from mobile phones and smartphones, and a system for storing and processing log information of computer systems. The present invention can also be applied to a system for storing and processing power generation and usage information, such as a smart grid or a digital grid, and to a system for storing and processing vehicle sensor and car navigation information, such as ITS (Intelligent Transport Systems). Further, the present invention can be applied to, for example, an M2M (Machine-To-Machine) system that sequentially collects purchase information and operation information of machines such as vending machines through a network.
- M2M Machine-To-Machine
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
Description
FIG. 1 is a block diagram showing the configuration of a distributed storage system 100 according to the first embodiment of the present invention. As shown in FIG. 1, the distributed storage system 100 includes a device 200 and a distributed storage apparatus 400 that can communicate with each other via an internal network 300.
FIG. 5 is a block diagram showing the configuration of a storage node 50 according to the second embodiment of the present invention. As shown in FIG. 5, the storage node 50 includes a data decomposition unit 51 in addition to the components of the storage node 40 according to the first embodiment.
For example, when the data decomposition unit 51 stores the three data items
- {key: "key1", uid: "101", temp: 3},
- {key: "key2", uid: "102", temp: 10},
- {key: "key3", uid: "103", temp: 1},
in a column-oriented format, it stores each data item in a decomposed state as follows. That is, the data decomposition unit 51 stores the data as
- memory-area1{ "key1", "key2", "key3", ...},
- memory-area2{ "101", "102", "103", ...},
- memory-area3{ 3, 10, 1, ...}.
Here, the above storage format is an example, and the storage format in the present invention, described taking this embodiment of the present application as an example, is not limited to the above.
FIG. 6 is a block diagram showing the configuration of a storage node 60 according to the third embodiment of the present invention. As shown in FIG. 6, the storage node 60 differs from the storage node 40 according to the first embodiment in that the control unit 61 includes an access sorting unit 62 and the data use access buffer 63 includes a first buffer 63a and a second buffer 63b. The other components are the same as those of the storage node 40 according to the first embodiment.
FIG. 7 is a block diagram showing the configuration of a storage node 70 according to the fourth embodiment of the present invention. As shown in FIG. 7, the storage node 70 differs from the storage node 40 according to the first embodiment in that the control unit 71 includes an access compression unit 72; the other components are the same as those of the storage node 40 according to the first embodiment.
FIG. 8 is a block diagram showing the configuration of a storage node 80 according to the fifth embodiment of the present invention. As shown in FIG. 8, the storage node 80 includes a data storage unit 81, an access request storage unit (data use access buffer) 82, a data scanning unit 83, and an access search unit 84.
41 Data transmission/reception unit
42 Data storage unit
43 Control unit
44 Data use access buffer
45 Data scanning unit
46 Data acquisition unit
47 Data search unit
51 Data decomposition unit
62 Access sorting unit
72 Access compression unit
Claims (10)
- A storage device comprising: data storage means including a main memory that stores data in units of blocks, and a cache memory capable of storing the data stored in the main memory in the units of blocks; access request accumulation means for accumulating access requests for the data stored in the data storage means; data scanning means for, in response to the access requests accumulated in the access request accumulation means satisfying a predetermined condition, sequentially reading the data stored in the main memory included in the data storage means in the units of blocks, writing the data to the cache memory, and scanning the data; and access search means for reading, from the access request accumulation means, an access request for data specified by the scanning, and for returning, to the transmission source of the access request, information capable of specifying the specified data.
- The storage device according to claim 1, wherein the data scanning means reads the data in the units of blocks in order of the memory addresses of the main memory and writes the data to the cache memory.
- The storage device according to claim 1, wherein, when data is stored in the cache memory, the data scanning means scans that data and then sequentially reads unscanned data from the main memory in the units of blocks and writes it to the cache memory.
- The storage device according to claim 1, further comprising data decomposition means for decomposing data including a key value and a property associated with the key value into the key value and the property, and for storing the data in the data storage means in the decomposed state, wherein the data scanning means writes the property of the data stored in the main memory included in the data storage means to the cache memory and scans the property.
- The storage device according to claim 1, further comprising access sorting means for accumulating the access requests in different areas of the access request accumulation means for each target used for specifying the data, wherein the access search means reads an access request for the data specified by the scanning from the area in which access requests including the target for specifying that data are accumulated.
- The storage device according to claim 1, further comprising access compression means for extracting, from the access request, information capable of specifying the data, and for accumulating the extracted information and the access request in different areas of the access request accumulation means, wherein the access search means reads an access request for the data specified by the scanning from the area in which the information capable of specifying the data is accumulated.
- The storage device according to claim 1, wherein the data scanning means reads all the data in the units of blocks in order of the memory addresses of the main memory and writes the data to the cache memory.
- A data access method comprising: accumulating, in access request accumulation means, access requests for data stored in data storage means including a main memory that stores data in units of blocks and a cache memory capable of storing the data stored in the main memory in the units of blocks; in response to the access requests accumulated in the access request accumulation means satisfying a predetermined condition, sequentially reading the data stored in the main memory included in the data storage means in the units of blocks by data scanning means, writing the data to the cache memory, and scanning the data; and reading, by access search means, an access request for data specified by the scanning from the access request accumulation means, and returning, to the transmission source of the access request, information capable of specifying the specified data.
- The data access method according to claim 8, wherein, in the scanning, the data is written to the cache memory in the units of blocks in order of the memory addresses of the main memory.
- A program recording medium recording a data access program that causes a computer to execute: a process of accumulating, in access request accumulation means, access requests for data stored in data storage means including a main memory that stores data in units of blocks and a cache memory capable of storing the data stored in the main memory in the units of blocks; a process of, in response to the access requests accumulated in the access request accumulation means satisfying a predetermined condition, sequentially reading the data stored in the main memory included in the data storage means in the units of blocks, writing the data to the cache memory, and scanning the data; and a process of reading, from the access request accumulation means, an access request for data specified by the scanning, and returning, to the transmission source of the access request, information capable of specifying the specified data.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2015529343A JP6406254B2 (ja) | 2013-07-30 | 2014-07-15 | ストレージ装置、データアクセス方法およびデータアクセスプログラム |
US14/908,161 US20160210237A1 (en) | 2013-07-30 | 2014-07-15 | Storage device, data access method, and program recording medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2013157346 | 2013-07-30 | ||
JP2013-157346 | 2013-07-30 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2015015727A1 true WO2015015727A1 (ja) | 2015-02-05 |
Family
ID=52431290
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2014/003733 WO2015015727A1 (ja) | 2013-07-30 | 2014-07-15 | ストレージ装置、データアクセス方法およびプログラム記録媒体 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20160210237A1 (ja) |
JP (1) | JP6406254B2 (ja) |
WO (1) | WO2015015727A1 (ja) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10168901B2 (en) * | 2015-03-12 | 2019-01-01 | Toshiba Memory Corporation | Memory system, information processing apparatus, control method, and initialization apparatus |
JP6241449B2 (ja) * | 2015-05-21 | 2017-12-06 | 横河電機株式会社 | データ管理システム及びデータ管理方法 |
CN114428707B (zh) * | 2022-01-12 | 2024-08-09 | 武汉美和易思数字科技有限公司 | 一种基于资源的分布式存储方法、系统、设备及存储介质 |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2006227688A (ja) * | 2005-02-15 | 2006-08-31 | Hitachi Ltd | ストレージシステム |
JP2011008705A (ja) * | 2009-06-29 | 2011-01-13 | Toshiba Corp | ファイル共有システム |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6182121B1 (en) * | 1995-02-03 | 2001-01-30 | Enfish, Inc. | Method and apparatus for a physical storage architecture having an improved information storage and retrieval system for a shared file environment |
US7162550B2 (en) * | 2003-07-21 | 2007-01-09 | Intel Corporation | Method, system, and program for managing requests to an Input/Output device |
US9734198B2 (en) * | 2007-11-30 | 2017-08-15 | Red Hat, Inc. | Query processing |
JP4798672B2 (ja) * | 2009-06-29 | 2011-10-19 | 東芝ストレージデバイス株式会社 | 磁気ディスク装置 |
US9104629B2 (en) * | 2009-07-09 | 2015-08-11 | International Business Machines Corporation | Autonomic reclamation processing on sequential storage media |
JP5999645B2 (ja) * | 2009-09-08 | 2016-10-05 | ロンギチュード エンタープライズ フラッシュ エスエイアールエル | ソリッドステート記憶デバイス上にデータをキャッシングするための装置、システム、および方法 |
US20140372607A1 (en) * | 2010-03-15 | 2014-12-18 | Cleversafe, Inc. | Adjusting allocation of dispersed storage network resources |
US20120311271A1 (en) * | 2011-06-06 | 2012-12-06 | Sanrad, Ltd. | Read Cache Device and Methods Thereof for Accelerating Access to Data in a Storage Area Network |
US8595267B2 (en) * | 2011-06-27 | 2013-11-26 | Amazon Technologies, Inc. | System and method for implementing a scalable data storage service |
US9063974B2 (en) * | 2012-10-02 | 2015-06-23 | Oracle International Corporation | Hardware for table scan acceleration |
2014
- 2014-07-15 JP JP2015529343A patent/JP6406254B2/ja active Active
- 2014-07-15 US US14/908,161 patent/US20160210237A1/en not_active Abandoned
- 2014-07-15 WO PCT/JP2014/003733 patent/WO2015015727A1/ja active Application Filing
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20220027634A (ko) * | 2020-08-27 | 2022-03-08 | 주식회사 아미크 | 인 메모리 데이터베이스의 데이터를 처리하는 방법 및 장치 |
KR102529704B1 (ko) | 2020-08-27 | 2023-05-09 | 주식회사 아미크 | 인 메모리 데이터베이스의 데이터를 처리하는 방법 및 장치 |
Also Published As
Publication number | Publication date |
---|---|
JP6406254B2 (ja) | 2018-10-17 |
JPWO2015015727A1 (ja) | 2017-03-02 |
US20160210237A1 (en) | 2016-07-21 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 14831542 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2015529343 Country of ref document: JP Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: 14908161 Country of ref document: US |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 14831542 Country of ref document: EP Kind code of ref document: A1 |