CN108920110A - Parallel-processing big data storage system and method based on an in-memory computing model - Google Patents


Info

Publication number
CN108920110A
CN108920110A (application CN201810826423.7A)
Authority
CN
China
Prior art keywords
data
memory
vector
big data
parallel processing
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810826423.7A
Other languages
Chinese (zh)
Inventor
吴勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Mechanical and Electrical Polytechnic
Original Assignee
Hunan Mechanical and Electrical Polytechnic
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Hunan Mechanical and Electrical Polytechnic
Priority claimed from CN201810826423.7A
Publication of CN108920110A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614 Improving the reliability of storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629 Configuration or reconfiguration of storage systems
    • G06F3/0631 Configuration or reconfiguration of storage systems by allocating resources to storage systems
    • G06F3/0634 Configuration or reconfiguration of storage systems by changing the state or mode of one or more devices
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0644 Management of space entities, e.g. partitions, extents, pools
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Abstract

The invention belongs to the technical field of information retrieval and database structures, and discloses a parallel-processing big data storage system and method based on an in-memory computing model. The invention designs a novel hybrid memory hierarchy combining storage-class memory (SCM) with conventional DRAM, substantially increasing memory capacity while preserving advantages in cost and energy demand. Computation can then proceed not only in DRAM but also in SCM, giving big data processing a data-centric processing model built on this hybrid memory architecture and markedly improving its timeliness. Building a large-capacity hybrid memory hierarchy on novel non-volatile memory devices to accelerate data processing in this way, and thereby significantly improve the timeliness of big data processing, is referred to as in-memory computing. From an architectural standpoint, the emergence of the in-memory computing model makes it possible to support big data processing with strong timeliness, high performance, and high throughput.

Description

Parallel-processing big data storage system and method based on an in-memory computing model
Technical field
The invention belongs to the technical field of information retrieval and database structures, and in particular relates to a parallel-processing big data storage system and method based on an in-memory computing model.
Background technique
In the current state of the art, big data poses the "4V" challenges. Volume: data sets keep growing, from terabytes to petabytes and beyond. Variety: data types are diverse, spanning traditional structured data as well as unstructured data such as text, video, images, and audio, with the share of unstructured data rising rapidly. Value: value density is low, making computations such as predictive analysis, operational intelligence, and decision support difficult. Velocity: the speed of big data processing is an increasingly prominent problem, and timeliness is hard to guarantee. In essence, the challenges of big data processing arise from the mismatch between the processing capacity of computer systems and the scale of the data. The fast growth rate and low temporal locality exhibited by big data objectively aggravate this mismatch, so that the traditional computation-centric model faces limited memory capacity, heavy input/output (I/O) pressure, low cache hit rates, and poor overall processing performance; it is difficult to obtain an optimal balance among performance, energy consumption, and cost, and current computer systems cannot effectively process big data at the petabyte scale and above. On the distributed-systems side, frameworks such as MapReduce (Hadoop) have been proposed for big data processing. By exposing just two functions, Map and Reduce, over key-value data, MapReduce easily achieves good scalability and fault tolerance in distributed systems. However, MapReduce must fetch data from disk and write intermediate result data back to disk; this disk-based design makes it inefficient, imposes heavy I/O overhead, and makes it unsuitable for applications with online and real-time demands. Although processing data on multiple nodes simultaneously can relieve some of the challenges facing big data processing, such distributed systems rely mainly on coarse-grained parallelism and do not fully exploit the resource capabilities of existing computing units. In short, current optimizations of big data processing are all built on the traditional memory-disk access model; despite various optimizations, the key "data I/O bottleneck" of data processing always remains.
In summary, the problems in the prior art are:
(1) Current computer systems cannot effectively process big data at the petabyte scale and above. For PB-scale data, memory space is limited, and pages must constantly be swapped between external storage and memory, degrading the efficiency of data processing. Although present big data systems decompose large jobs through technical means such as sharding, Mappers, and Reducers, memory limits mean that during processing the data involved is repeatedly read from and written to external storage, so the real-time performance of big data processing suffers badly.
(2) MapReduce must fetch data from disk. Data produced by a Mapper is not written straight to disk but is first written to memory and flushed to disk once it reaches a certain volume; that is, when the data volume being processed is too large, intermediate result data is written back to disk. This disk-based design makes it inefficient, imposes heavy I/O overhead, and makes it unsuitable for applications with online and real-time demands.
The difficulties and significance of solving the above technical problems are:
1. How to coordinate the unified addressing and use of the novel storage SCM and the DRAM: once they are treated as a single whole, how should they be addressed?
2. How to place data with different write-operation frequencies in SCM and DRAM respectively. Only SCM's read speed is comparable to DRAM's; its write speed is 10 to 100 times slower or worse, and SCM suffers permanent failure after on the order of millions of writes. Reducing the number of writes, so as to maximize SCM's favourable factors and minimize its unfavourable ones, is the difficult point of this scheme.
3. In this hybrid memory hierarchy, how to guarantee the accuracy of data reads and writes.
By designing a novel hybrid SCM/DRAM in-memory computing system, the invention on the one hand increases memory capacity, improves computational efficiency, and reduces power consumption, and on the other hand prevents data loss on power failure, thus protecting the data.
Summary of the invention
In view of the problems in the prior art, the present invention provides a parallel-processing big data storage system and method based on an in-memory computing model.
The invention is realized as follows. In the parallel-processing big data storage method based on the in-memory computing model, novel storage-class memory (SCM) and DRAM are combined into one block serving collectively as main memory; SCM mainly stores raw data for read operations, while DRAM stores data that is frequently read and written.
The parallel-processing big data storage method based on the in-memory computing model comprises: addressing the novel SCM and the DRAM as a unified space; storing raw data read in from external storage in SCM; storing the frequently accessed intermediate data and check data produced during program execution in DRAM; and writing the buffered intermediate data into SCM once it reaches a certain volume.
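The placement policy described above can be sketched as follows. This is an illustrative toy model, not the patent's implementation: the class name, the page representation, and the flush threshold are all assumptions made for the example.

```python
# Toy sketch of a unified SCM+DRAM address space (illustrative assumptions):
# raw pages loaded from external storage go to SCM (read-mostly); frequently
# written intermediate/check pages stay in DRAM and are migrated to SCM only
# in batches, which limits the number of wear-inducing SCM writes.
class HybridMemory:
    def __init__(self, flush_threshold=4):
        self.scm = {}                      # large capacity, slow to write
        self.dram = {}                     # small capacity, fast read/write
        self.flush_threshold = flush_threshold

    def load_raw(self, addr, page):
        """Raw data read in from external storage is placed in SCM."""
        self.scm[addr] = page

    def write_intermediate(self, addr, page):
        """Hot intermediate data is buffered in DRAM first."""
        self.dram[addr] = page
        if len(self.dram) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Batch-migrate buffered pages into SCM, then clear DRAM."""
        self.scm.update(self.dram)
        self.dram.clear()

    def read(self, addr):
        """Unified addressing: check DRAM first, then fall back to SCM."""
        return self.dram.get(addr, self.scm.get(addr))
```

The batching in `flush()` is the point of the design: SCM writes happen once per batch rather than once per intermediate update.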
The data processing of the parallel-processing big data storage method based on the in-memory computing model further comprises:
(1) Let G_{r·m} be any binary encoder matrix with entries in {0, 1}; the matrix G_{r·m} is used to generate the redundant data.
(2) From the number of "1"s in each row vector l_1, l_2, ..., l_{r·m} of the binary encoder matrix, determine the number of XOR operations required to compute the check bits from that vector, and compute the number of differing bits between any two vectors l_a and l_b.
(3) If the number of elements equal to "1" in a vector l_a is k, then generating the redundant data from that vector requires k-1 XOR operations.
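Steps (1) to (3) can be illustrated with a short sketch. The function names are assumptions for the example; blocks are modelled as integers so that `^` is the per-bit XOR.

```python
# Illustrative sketch: generating one redundant (check) block from a 0/1
# row of the encoder matrix. A row with k ones selects k data blocks, so
# producing the check block costs k-1 pairwise XOR operations.
from functools import reduce

def check_block(row, data_blocks):
    """XOR together the data blocks selected by the 0/1 row vector."""
    selected = [b for bit, b in zip(row, data_blocks) if bit == 1]
    return reduce(lambda x, y: x ^ y, selected)

def xor_cost(row):
    """XOR count for this row: k-1, where k is the number of ones."""
    return sum(row) - 1

def diff_bits(row_a, row_b):
    """Number of positions at which two row vectors differ."""
    return sum(a != b for a, b in zip(row_a, row_b))
```

Here `xor_cost` corresponds to the k-1 count of step (3) and `diff_bits` to the pairwise comparison of step (2).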
Further, in the parallel-processing big data storage method based on the in-memory computing model, the optimization of the encoding computation that the entire encoder matrix G_{r·m} applies to the original file comprises:
(1) From the number of "1"s in each row vector of G_{r·m}, determine the number of XORs needed to compute the check bits from that row vector: if the number of "1"s in the row vector is k, the number of XORs required to compute the check bits using that row vector is (k-1)·m, where m is the size of each original data block participating in the check computation.
(2) For every pair of row vectors in the encoder matrix, compare the number of positions at which the elements are identical and the number at which they differ, denoted (e/d), where e is the number of positions at which the elements of the two vectors agree and d the number at which they differ.
(3) If the number of XORs required for some row vector l_i (1 ≤ i ≤ r·m) is less than or equal to the differing-bit count d from step (2), compute the check data block corresponding to that row directly from the vector, and denote the vector l_j.
(4) Using the vector l_j determined in (3) and the ratio of identical to differing bits from step (2), determine the next row vector to compute: when some row vector l_k differs from l_j in fewer positions than it agrees, and the number of positions at which l_k differs from l_j is the minimum over all remaining vectors, derive the check data determined by l_k from the check data already computed from l_j.
(5) If check bits remain uncomputed, apply the rule of (4) with l_k as the base vector to find the next vector to compute, and return to (4).
(6) Determine whether the complete check-bit computation order has been established; if so, save the order in which the check bits are computed; if not, compute according to the original correspondence.
Further, the storage and index processing method for the check data comprises:
(1) First store each row, i.e., each record, into HBase with its primary key as the rowkey; each attribute name serves as a column-family name, every column family has exactly one column with a fixed column name, and the attribute value is stored as the cell value. Because each attribute is stored in its own column family, when a check rule matches a certain attribute value by primary key, unrelated attribute values need not be read in.
(2) Then build an index table using the attribute-field value involved in the check rule as the rowkey. The rowkey format of the index table is {main-table indexed column value}, and the value format is {main-table rowkey 1, main-table rowkey 2, ...}; each main-table rowkey is stored under its own column name, so adding a main-table rowkey only requires adding a column. When a check rule needs to match other attribute values against some attribute value, all records with the same attribute value can be found quickly and checked.
(3) An index table based on timestamps allows data within a fixed time interval to be queried rapidly for checking: the rowkey is the timestamp and the value is the primary key. Full-volume data processing is the data storage and index processing performed when quality checks are run over the large volumes of historically accumulated data; its input differs from incremental data processing in that, after the incremental data and indexes are loaded into HBase by the data storage and indexing method, the attribute fields relevant to the full-volume check rules are additionally extracted and stored into an HDFS index file.
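The layout in steps (1) and (2) can be mimicked with plain dictionaries. This is a schematic model only; the record fields and function names are invented for the example and do not use the HBase API.

```python
# Schematic model of the main table plus secondary index table:
# the main table is keyed by primary key; the index table is keyed by an
# attribute value and holds the set of matching main-table rowkeys, so a
# check rule can find all records sharing a value without a full scan.
def build_tables(records, indexed_attr):
    main = {}    # rowkey (primary key) -> {attribute name: value}
    index = {}   # attribute value -> set of main-table rowkeys
    for pk, attrs in records:
        main[pk] = dict(attrs)
        index.setdefault(attrs[indexed_attr], set()).add(pk)
    return main, index

def lookup_by_attr(index, value):
    """All main-table rowkeys whose indexed attribute equals `value`."""
    return sorted(index.get(value, set()))
```

Adding a new matching record only adds one entry under the existing index rowkey, mirroring the "add a column" behaviour described in step (2).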
Further, the reuse-based query method over the frequently read and written stored data comprises:
For the data warehouse D = {S_1}, before loading the target table T, obtain the triple M' using the SchemaMatching() and Filter() algorithms; then save the relevant information of M' into the RTable; finally, the DataLoad_Reusing() algorithm completes the loading of the original data.
(1) Query matching: by querying the RTable it is known that no reuse information can be found for A'_4 in T, while reusable data for A'_1, A'_2, and A'_3 can be found in S_1.
(2) Query rewriting: first ensure the query statements are equivalent. In Q_1, for A'_1 > const1, since A'_1 = A_11 the data of the two columns are equivalent and the selection condition need not change. For A'_2 > const2, there is a conversion relation between the data of A'_2 and A_12, and whether the selection condition must change depends on the source of the target data. Let sum1 and sum2 be the record totals of T and S_1 respectively. If sum1 = sum2, A'_2 completely reuses A_12, and the original query statement is rewritten per f' as Q'_1: SELECT A'_1, A'_2, A'_3, A'_4 FROM T WHERE A'_1 > const1 AND A'_2 > (const2/0.1). Otherwise, when sum1 > sum2, the data come both from the reusable data set in A_12 and from externally imported data; for the externally imported data the selection condition remains A'_2 > const2 and the query Q_1 is unchanged, while for the reusable data the query is rewritten as Q'_1.
(3) Query execution: for complete reuse, execute Q_1 directly. Otherwise the query is decomposed by data source: the start and end data blocks of each reuse relation are obtained from the blk_id_list corresponding to <col'_t, col'_s>; Q'_1 is executed over the reusable data, while Q_1 is still executed over the externally imported data.
(4) Result integration: the data items satisfying the conditions in the reusable data are read, each converted according to col'_t = f(col'_s), and finally the converted data are merged and the final result is output.
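The rewrite in step (2) can be illustrated for the scaling case implied by the const2/0.1 term. This is a hedged sketch under the assumption that the conversion f is a linear scaling A'_2 = A_12 · factor; the function names and data are invented for the example.

```python
# Sketch of query rewriting over reusable source data: when the target
# column equals a source column times a known factor, the predicate
# "target > const" is pushed to the source as "source > const / factor",
# and matching rows are converted to the target schema on output.
def rewrite_threshold(const, factor):
    """Equivalent source-side threshold for 'target > const'."""
    return const / factor

def query_reused(source_rows, const1, const2, factor=0.1):
    """Answer the target query from reusable (A11, A12) source rows."""
    out = []
    for a11, a12 in source_rows:
        # A'_1 = A11, so the first condition is unchanged.
        if a11 > const1 and a12 > rewrite_threshold(const2, factor):
            out.append((a11, a12 * factor))   # convert A12 -> A'_2
    return out
```

With factor = 0.1, the rewritten condition is A_12 > const2 / 0.1, matching the rewritten query Q'_1 above.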
Another object of the present invention is to provide a parallel-processing big data storage system based on the in-memory computing model that realizes the above storage method. The system comprises:
a user program module, connected to the multi-core module, the memory module, and the disk module respectively, for outputting the processed data;
a disk module, connected to the memory module; the memory module obtains data from disk, i.e., via the traditional memory-disk access model;
a memory module, connected to the multi-core module, for processing the stored data through the multi-core module.
Another object of the present invention is to provide a computer program implementing the parallel-processing big data storage method based on the in-memory computing model.
Another object of the present invention is to provide an information data processing terminal implementing the parallel-processing big data storage method based on the in-memory computing model.
Another object of the present invention is to provide a computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to execute the parallel-processing big data storage method based on the in-memory computing model.
In conclusion advantages of the present invention and good effect are:The present invention is based on novel storage level memory (Storage Class Memory, SCM) and the novel mixing memory hierarchy of traditional DRAM design, under the premise of keeping cost and energy demand advantages Memory size is substantially improved, making to calculate can not only carry out on DRAM memory, can also carry out on SCM, be at big data Reason provide it is a kind of based on mixing memory architecture data-centered tupe, significantly promoted big data processing when Effect property.One is provided effectively based on novel memory devices part and traditional novel mixing memory hierarchy of DRAM design for big data processing Support technology;Large capacity mixing memory hierarchy is constructed based on novel non-volatile memory devices to accelerate data processing Mode, to significantly promote the timeliness of big data processing, referred to as memory is calculated.From the point of view of architecturally, memory is calculated The appearance of mode provides strong timeliness, high-performance, the high architecture handled up for big data processing and supports to bring possibility.Based on interior The parallel processing system (PPS) for depositing calculating mode is mainly faced with the challenge of three key technical problems:Isomery collaboration, efficient parallel and Adaptive scheduling management.Isomery collaboration refers to how architecture and operating system level realize isomery level memory hierarchy Coordinated management, transparent service data processing back-up environment;Efficient parallel refers in programming model and parallel processing level how It calculates based on memory, realizes the efficient parallel processing environment of big data;Adaptive scheduling management refers to that memory calculates parallel ring It, can be dynamically using different suitable how according to calculate node structure and characteristic in 
border, and the characteristics of application data processing Resource scheduling management strategy, with realize big data parallel processing system (PPS) resource load stabilization and efficiently utilize.
Compared with the prior art, the present invention optimizes the encoding process and reduces its computational load. When the storage system encodes data for storage, the computation order of the original check data blocks can be changed according to the characteristics of each row vector of the encoder matrix, thereby reducing the number of operations in the encoding process. After the computation order of the encoder matrix has been optimized with the proposed method, the optimized order can be stored in the computer and followed in every subsequent computation. The encoding-optimization method proposed by the invention is applicable to all binary matrices; in particular, it can be applied to any procedure computed from a binary matrix, not only the encoding process at storage time but also the reconstruction of lost data blocks from the binary check matrix when data blocks are lost, and it therefore has value for wide adoption.
The check-data processing method of the invention was tested in a single-node, single-rule checking experiment over the full marketing and GIS tables, using the HDFS index with the data loaded into memory: the test took 42 s, of which 40 s was spent loading the entire HDFS index data from HDFS into memory and only 2 s scanning the in-memory data for the full-volume check. By contrast, the existing database-based data-checking production system takes about 40 min for a single rule of full-volume checking; when the Hadoop-based checking system performs single-rule full-volume checks on the GIS and marketing tables, it is therefore roughly 50 times faster than the existing database-based production system. If a certain number of Hadoop nodes are deployed to execute multi-rule full-volume checks in parallel, then even allowing for some performance degradation from shared HDFS access during multi-rule execution, the total full-volume checking time is expected to improve on the current database-based production system by at least an order of magnitude.
Brief description of the drawings
Fig. 1 is a schematic diagram of the architecture of the parallel-processing big data storage system based on the in-memory computing model provided by an embodiment of the present invention.
In the figure: 1, user program module; 2, multi-core module; 3, memory module; 4, disk module.
Specific embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is further elaborated below with reference to the embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
The present invention aims to address the facts that current computer systems cannot effectively process big data at the petabyte scale and above, and that MapReduce must fetch data from disk and write intermediate result data back to disk, a disk-based design that is inefficient, imposes heavy I/O overhead, and is unsuitable for applications with online and real-time demands. Architecturally, the emergence of the in-memory computing model makes it possible to support big data processing with strong timeliness, high performance, and high throughput.
The application principle of the invention is explained in detail below with reference to the drawings.
As shown in Fig. 1, the parallel-processing big data storage system based on the in-memory computing model provided by an embodiment of the present invention comprises:
a user program module 1, connected to the multi-core module 2, the memory module 3, and the disk module 4 respectively, for outputting the processed data;
a disk module 4, connected to the memory module 3; the memory module 3 obtains data from disk, i.e., via the traditional memory-disk access model;
a memory module 3, connected to the multi-core module 2, for processing the stored data through the multi-core module 2.
The storage method of the parallel-processing big data storage system based on the in-memory computing model is to design a novel memory system mixing the novel storage-class memory SCM with traditional DRAM.
In the parallel-processing big data storage system and method based on the in-memory computing model provided by the embodiment of the present invention, the novel storage-class memory SCM and the DRAM are combined into one block serving collectively as main memory: SCM mainly stores raw data for read operations, while DRAM stores the data that is frequently read and written. The novel SCM and the DRAM are addressed as a unified space; raw data read in from external storage is stored in SCM; the frequently accessed intermediate data and check data produced during program execution are stored in DRAM; and the buffered intermediate data is written into SCM once it reaches a certain volume, improving the efficiency of data processing.
The data processing of parallel processing big data storage method for calculating mode based on memory further comprises:
(1) if being arbitrarily G by the binary system encoder matrix that " 0,1 " determinesr·m, Gr·mServe as reasons " 0,1 " composition two into Matrix processed, matrix are embodied as generating redundant data:
(2) according to the row vector l of binary coded matrix1, l2..., lr·mIn the number of " 1 " determine according to the vector Required XOR calculation times when check bit are calculated, and calculate any two vectors la, lbBetween different digit;
(3) if vector laMiddle element is that the digit of " 1 " is k, then system carries out generating redundant data needs using the vector Carry out k-1 XOR operation.
Further, the parallel processing big data storage method for calculating mode based on memory is directed to entire encoder matrix Gr·mTo original document carry out coding calculating optimization method include:
(1) according to G in encoder matrixr·mEach row vector in " 1 " number, determine according to the row vector calculate school XOR number required for position is tested, the number of " 1 " is marked with k in row vector, then is calculated required for check bit using the row vector XOR number be (k-1) m, wherein m be it is each participate in verification calculate original data block size;
(2) compare the element identical bits in encoder matrix between any two row vector and the number of element difference position, remember For (e/d), wherein e indicates the identical position number of element in two vectors;D indicates the position number that element is different in two vectors;
(3) if a certain row vector liXOR number required for (1≤i≤rm) is less than or equal in step B not isotopic number D, then directly according to the vector calculate the row corresponding to verification data block, and the vector is denoted as lj
(4) the vector l for utilizing (3) to determinej, according to digit identical in step B and not the ratio between isotopic number, determine next meter Row vector is calculated, as certain row vector lkWith vector ljIsotopic number is not less than identical digit, and lkWith vector ljIsotopic number is not each with remaining A vector is not when isotopic number reaches minimum, then according to vector ljThe verification data that have calculated that are calculated by lkDetermining check number According to;
(5) check bit is not calculated if still having, according to (4) computation rule, with lkFor basic vector, find next to be calculated Vector, and return to (4);
(6) complete verification position calculating process whether is had determined that, if so, check bit successively calculating process is saved, if it is not, then It is calculated according to original corresponding relationship.
Further, the storage of the inspection data includes with index process method:
(1) First, store each row, i.e. each record, into HBase with the primary key as the rowkey and the attribute name as the column family name; every column family holds exactly one column with a fixed column name, and the attribute value is stored as the cell value. Since each attribute is stored in its own column family, when a verification rule matches a certain attribute value by primary key, the irrelevant attribute values need not be read in;
(2) then build an index table with the attribute field values involved in the verification rules as the rowkey; the row-key format of the index table is {main-table indexed column value}, and the value format is {main table row key 1, main table row key 2, ...}. Each main-table row key is stored as a column name, so adding a main-table row key only requires adding one column; when a verification rule needs to match other attribute values by some attribute value, all records with the same attribute value can be found quickly and verified;
(3) a timestamp-based index table allows the data within a fixed time interval to be queried and verified quickly; the row key is the timestamp and the key value is the primary key. Full-volume data processing is the data storage and index processing performed when quality verification is carried out on the large batches of historically accumulated data; what its data storage and index processing shares with incremental data processing is that, as the incremental data and indexes are loaded into HBase by the data storage and indexing method, the attribute fields relevant to the full-volume verification rules are simultaneously extracted and stored in an HDFS index file.
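The row-key layout of steps (1)-(2) can be modelled with plain dictionaries standing in for HBase tables (a sketch only: the table and function names are assumptions, and a real deployment would go through an HBase client).

```python
main_table = {}    # rowkey = primary key -> {column_family: {column: value}}
index_table = {}   # rowkey = attribute value -> {main-table row key: ''}

def put_record(primary_key, record, indexed_attrs):
    # each attribute goes into its own single-column column family,
    # so a rule that matches one attribute never touches the others
    main_table[primary_key] = {
        attr: {attr: value} for attr, value in record.items()
    }
    # index table: attribute value as rowkey, each main-table row key
    # stored as a column name (adding a key = adding one column)
    for attr in indexed_attrs:
        index_table.setdefault(record[attr], {})[primary_key] = ''

def find_by_attr(value):
    """All main-table records sharing an indexed attribute value."""
    return [main_table[rk] for rk in index_table.get(value, {})]
```

With two records indexed on the same attribute value, `find_by_attr` fans out from one index row to both main-table rows, which is the lookup the verification rules rely on.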
Further, the data reuse query method for the frequently read and written storage comprises:
Given the data warehouse D={S1}, before loading the target table T, the SchemaMatching() algorithm and the Filter() algorithm are applied to obtain the triple M'; the relevant information of M' is then saved in the RTable table, and finally the Dataload_Reusing() algorithm completes the loading of the original data;
(1) query matching: by querying the RTable table it is found that A'4 in T has no reuse information, while A'1, A'2 and A'3 can all find reusable data in S1;
(2) query rewriting: first ensure that the query statements are equivalent. In Q1, for A'1 > const1, A'1 = A11, i.e. the two columns of data are equivalent, so the selection condition needs no change; for A'2 > const2, there is a transformation relation between the data of A'2 and A12, and whether the selection condition must change depends on the source of the target data. Let sum1 and sum2 be the record totals of T and S1 respectively. If sum1 = sum2, A'2 fully reuses A12, and the original query statement is rewritten according to f' as Q'1: SELECT A'1, A'2, A'3, A'4 FROM T WHERE A'1 > const1 AND A'2 > (const2/0.1); otherwise, when sum1 > sum2, the data come partly from the reusable data set in A12 and partly from externally imported data; for the externally imported data the selection condition remains A'2 > const2 and the Q1 query statement is unchanged, while for the reusable data the query is rewritten as Q'1;
(3) query execution: for full reuse, Q1 is executed directly; otherwise the query is decomposed by data source, the start and end data blocks of the two parts of each reuse relation being obtained from the blk_id_list corresponding to <col't, col's>; Q'1 is executed on the reusable data, while Q1 is still executed on the externally imported data;
(4) result integration: the data items satisfying the condition in the reusable data are read and each converted according to col't = f(col's); finally the converted data are integrated and the final result is output.
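A toy version of the rewrite-and-integrate flow in steps (2)-(4), under the assumption that the column transformation is f(v) = 0.1·v (matching the const2/0.1 rewrite above); the function names and data are illustrative, not from the patent.

```python
def f(v):
    # assumed transformation between the source and target columns
    return v * 0.1

def reuse_query(reusable_src, imported, const2):
    """Run the rewritten query on reusable source data and the original
    query on externally imported data, then integrate the results."""
    # Q'1 on reusable data: condition rewritten onto the source column
    hits = [f(v) for v in reusable_src if v > const2 / 0.1]
    # Q1 on externally imported data: original condition on the target column
    hits += [v for v in imported if v > const2]
    return sorted(hits)   # result integration
```

Only source values exceeding const2/0.1 survive the rewritten predicate, and after conversion they satisfy the original condition, so the integrated result matches what running Q1 over fully materialized data would return.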
In the above embodiments, implementation may be carried out in whole or in part by software, hardware, firmware or any combination thereof. When implemented in whole or in part in the form of a computer program product, the computer program product comprises one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the processes or functions described in the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one web site, computer, server or data center to another web site, computer, server or data center by wired (e.g. coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g. infrared, radio, microwave) means. The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or data center, integrating one or more usable media. The usable medium may be a magnetic medium (e.g. floppy disk, hard disk, magnetic tape), an optical medium (e.g. DVD), or a semiconductor medium (e.g. a solid state disk (SSD)).
The foregoing is merely a description of preferred embodiments of the present invention and is not intended to limit the invention; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (8)

1. A parallel processing big data storage method based on the in-memory computing mode, characterized in that the parallel processing big data storage method based on the in-memory computing mode combines novel storage-class memory (SCM) and DRAM into a single hybrid memory, the SCM mainly storing original data for read operations and the DRAM storing the data of frequent read and write operations;
the parallel processing big data storage method based on the in-memory computing mode comprises: addressing the novel SCM and the DRAM uniformly; storing in the SCM the original data read in from external storage; storing in the DRAM the intermediate data frequently read and written during program operation together with the check data; and writing the written intermediate data back to the SCM once it reaches a certain amount;
the data processing of the parallel processing big data storage method based on the in-memory computing mode further comprises:
(1) letting Gr·m be an arbitrary binary encoder matrix determined by "0, 1", i.e. a binary matrix composed of "0"s and "1"s, the matrix being embodied in generating redundant data;
(2) determining, according to the number of "1"s in the row vectors l1, l2, ..., lr·m of the binary encoder matrix, the number of XOR calculations required when computing the check bits from each vector, and calculating the number of differing bits between any two vectors la and lb;
(3) if the number of elements equal to "1" in vector la is k, the system needing to perform k-1 XOR operations to generate redundant data with that vector.
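Claim 1 combines two ideas: a DRAM write buffer in front of SCM, and XOR-cost accounting over the encoder matrix rows. A minimal sketch of both (class names, the flush threshold, and the dict-based "memories" are illustrative assumptions, not the patent's implementation):

```python
class HybridMemory:
    """DRAM buffers frequently written data; a full buffer flushes to SCM."""
    def __init__(self, flush_threshold=4):
        self.scm = {}                    # read-mostly original + flushed data
        self.dram = {}                   # hot intermediate / check data
        self.flush_threshold = flush_threshold

    def load_original(self, key, value):
        self.scm[key] = value            # data read in from external storage

    def write_intermediate(self, key, value):
        self.dram[key] = value           # frequent writes land in DRAM
        if len(self.dram) >= self.flush_threshold:
            self.scm.update(self.dram)   # batch write-back at "a certain amount"
            self.dram.clear()

    def read(self, key):
        return self.dram.get(key, self.scm.get(key))

def xor_cost(row):
    """k ones in a row vector -> k-1 XOR operations for its redundant block."""
    return sum(row) - 1

def differing_bits(la, lb):
    """Number of positions where two row vectors differ."""
    return sum(1 for x, y in zip(la, lb) if x != y)
```

Reads check DRAM first and fall back to SCM, so data remains visible before and after a flush; the two helpers are exactly the quantities steps (2)-(3) of the claim ask for.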
2. The parallel processing big data storage method based on the in-memory computing mode according to claim 1, characterized in that the encoding calculation optimization method for encoding the original file with the entire encoder matrix Gr·m comprises:
(1) according to the number of "1"s in each row vector of the encoder matrix Gr·m, determining the number of XOR operations required to compute the check bit from that row vector: if the number of "1"s in a row vector is k, the number of XOR operations required to compute the check bit with that row vector is (k-1)m, where m is the size of each original data block participating in the check computation;
(2) comparing, for every pair of row vectors in the encoder matrix, the number of positions where the elements are identical and the number where they differ, denoted (e/d), where e is the number of identical positions and d the number of differing positions in the two vectors;
(3) if the number of XOR operations required by some row vector li (1≤i≤r·m) is less than or equal to the differing-position count d from step (2), computing the check data block corresponding to that row directly from the vector, and denoting the vector lj;
(4) using the vector lj determined in (3) and the ratio of identical to differing positions from step (2), determining the next row vector to compute: when a row vector lk differs from lj in fewer positions than it agrees, and the number of differing positions between lk and lj is the minimum over all remaining vectors, computing the check data determined by lk from the check data already computed from lj;
(5) if check bits remain uncomputed, applying the rule of (4) with lk as the base vector to find the next vector to compute, and returning to (4);
(6) determining whether a computation process has been established for all check bits; if so, saving the sequential check-bit computation process; if not, computing according to the original correspondence.
3. The parallel processing big data storage method based on the in-memory computing mode according to claim 2, characterized in that the storage and index processing method of the check data comprises:
(1) first storing each row, i.e. each record, into HBase with the primary key as the rowkey and the attribute name as the column family name, every column family holding exactly one column with a fixed column name and the attribute value stored as the cell value; each attribute being stored in its own column family, so that when a verification rule matches a certain attribute value by primary key, the irrelevant attribute values need not be read in;
(2) then building an index table with the attribute field values involved in the verification rules as the rowkey, the row-key format of the index table being {main-table indexed column value} and the value format being {main table row key 1, main table row key 2, ...}; each main-table row key being stored as a column name, so that adding a main-table row key only requires adding one column; when a verification rule needs to match other attribute values by some attribute value, all records with the same attribute value being found quickly and verified;
(3) a timestamp-based index table allowing the data within a fixed time interval to be queried and verified quickly, the row key being the timestamp and the key value being the primary key; full-volume data processing being the data storage and index processing performed when quality verification is carried out on the large batches of historically accumulated data, and sharing with incremental data processing the feature that, as the incremental data and indexes are loaded into HBase by the data storage and indexing method, the attribute fields relevant to the full-volume verification rules are simultaneously extracted and stored in an HDFS index file.
4. The parallel processing big data storage method based on the in-memory computing mode according to claim 2, characterized in that the data reuse query method for the frequently read and written storage comprises:
given the data warehouse D={S1}, before loading the target table T, applying the SchemaMatching() algorithm and the Filter() algorithm to obtain the triple M', then saving the relevant information of M' in the RTable table, and finally completing the loading of the original data with the Dataload_Reusing() algorithm;
(1) query matching: by querying the RTable table it is found that A'4 in T has no reuse information, while A'1, A'2 and A'3 can all find reusable data in S1;
(2) query rewriting: first ensuring that the query statements are equivalent; in Q1, for A'1 > const1, A'1 = A11, i.e. the two columns of data are equivalent, so the selection condition needs no change; for A'2 > const2, there is a transformation relation between the data of A'2 and A12, and whether the selection condition must change depends on the source of the target data; sum1 and sum2 being the record totals of T and S1 respectively, if sum1 = sum2, A'2 fully reuses A12 and the original query statement is rewritten according to f' as Q'1: SELECT A'1, A'2, A'3, A'4 FROM T WHERE A'1 > const1 AND A'2 > (const2/0.1); otherwise, when sum1 > sum2, the data come partly from the reusable data set in A12 and partly from externally imported data; for the externally imported data the selection condition remains A'2 > const2 and the Q1 query statement is unchanged, while for the reusable data the query is rewritten as Q'1;
(3) query execution: for full reuse, executing Q1 directly; otherwise decomposing the query by data source, the start and end data blocks of the two parts of each reuse relation being obtained from the blk_id_list corresponding to <col't, col's>; executing Q'1 on the reusable data and still executing Q1 on the externally imported data;
(4) result integration: reading the data items satisfying the condition in the reusable data, converting each according to col't = f(col's), finally integrating the converted data and outputting the final result.
5. A parallel processing big data storage system based on the in-memory computing mode, implementing the parallel processing big data storage method based on the in-memory computing mode according to claim 1, characterized in that the parallel processing big data storage system based on the in-memory computing mode comprises:
a user program module, connected to the multi-core module, the memory module and the disk module respectively, for outputting the processed data;
a disk module, connected to the memory module, the memory module obtaining data from the disk, i.e. based on the traditional memory-disk access mode;
a memory module, connected to the multi-core module, for processing the stored data through the multi-core module.
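The module relationships of claim 5 can be wired up as a toy object graph (all class names and the sorting "workload" are illustrative assumptions): the memory module pulls data from the disk module and hands it to the multi-core module for processing, and the user program module outputs the result.

```python
class DiskModule:
    def read(self):
        return [3, 1, 2]          # stand-in for data held on disk

class MultiCoreModule:
    def process(self, data):
        return sorted(data)       # stand-in for parallel processing

class MemoryModule:
    """Connected to both the disk module (data source) and the multi-core
    module (data processing), as in the traditional memory-disk mode."""
    def __init__(self, disk, cores):
        self.disk, self.cores = disk, cores
    def load_and_process(self):
        return self.cores.process(self.disk.read())

class UserProgramModule:
    """Outputs the processed data to the user."""
    def __init__(self, memory):
        self.memory = memory
    def run(self):
        return self.memory.load_and_process()
```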
6. A computer program implementing the parallel processing big data storage method based on the in-memory computing mode according to any one of claims 1 to 4.
7. An information data processing terminal implementing the parallel processing big data storage method based on the in-memory computing mode according to any one of claims 1 to 4.
8. A computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to execute the parallel processing big data storage method based on the in-memory computing mode according to any one of claims 1 to 4.
CN201810826423.7A 2018-07-25 2018-07-25 A kind of parallel processing big data storage system and method calculating mode based on memory Pending CN108920110A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810826423.7A CN108920110A (en) 2018-07-25 2018-07-25 A kind of parallel processing big data storage system and method calculating mode based on memory


Publications (1)

Publication Number Publication Date
CN108920110A true CN108920110A (en) 2018-11-30

Family

ID=64416718

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810826423.7A Pending CN108920110A (en) 2018-07-25 2018-07-25 A kind of parallel processing big data storage system and method calculating mode based on memory

Country Status (1)

Country Link
CN (1) CN108920110A (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6599132B1 (en) * 1999-11-30 2003-07-29 The Board Of Trustees Of The Leland Stanford Junior University Scanning capacitance sample preparation technique
CN101349979A (en) * 2008-09-05 2009-01-21 清华大学 Method for updating double-magnetic head user data of large scale fault-tolerant magnetic disk array storage system
CN103838649A (en) * 2014-03-06 2014-06-04 中国科学院成都生物研究所 Method for reducing calculation amount in binary coding storage system
CN105446899A (en) * 2015-11-09 2016-03-30 上海交通大学 Memory data quick persistence method based on storage-class memory
CN105930356A (en) * 2016-04-08 2016-09-07 上海交通大学 Method for implementing log type heterogeneous hybrid memory file system
CN105938458A (en) * 2016-04-13 2016-09-14 上海交通大学 Software-defined heterogeneous hybrid memory management method


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Zhang Zhiliang et al.: "Hadoop-based power grid data quality verification method and verification system", Journal of Computer Research and Development *
Wang Mei et al.: "A data reuse strategy in column-store data warehouses", Chinese Journal of Computers *
Luo Le et al.: "A survey of in-memory computing technology", Journal of Software *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110968269A (en) * 2019-11-18 2020-04-07 华中科技大学 SCM and SSD-based key value storage system and read-write request processing method
WO2021169635A1 (en) * 2020-02-27 2021-09-02 华为技术有限公司 Data processing method for memory device, apparatus, and system
CN112308328A (en) * 2020-11-09 2021-02-02 中国科学院计算技术研究所 Top-Down network measurement system-oriented parallel measurement task optimization method and system
CN112308328B (en) * 2020-11-09 2023-06-06 中国科学院计算技术研究所 Top-Down network measurement system-oriented parallel measurement task optimization method and system

Similar Documents

Publication Publication Date Title
Chambi et al. Better bitmap performance with roaring bitmaps
US10296498B2 (en) Coordinated hash table indexes to facilitate reducing database reconfiguration time
CN104794123B (en) A kind of method and device building NoSQL database indexes for semi-structured data
US10353923B2 (en) Hadoop OLAP engine
CN103177027B (en) Obtain the method and system of dynamic Feed index
Kepner et al. Achieving 100,000,000 database inserts per second using Accumulo and D4M
CN103812939B (en) Big data storage system
US20130191523A1 (en) Real-time analytics for large data sets
Sethi et al. RecShard: statistical feature-based memory optimization for industry-scale neural recommendation
CN112269792B (en) Data query method, device, equipment and computer readable storage medium
CN108595664B (en) Agricultural data monitoring method in hadoop environment
CN108920110A (en) A kind of parallel processing big data storage system and method calculating mode based on memory
CN106570113B (en) Mass vector slice data cloud storage method and system
CN104036029A (en) Big data consistency comparison method and system
Mizrahi et al. Blockchain state sharding with space-aware representations
Hu et al. Trix: Triangle counting at extreme scale
CN105706092A (en) Methods and systems of four-valued simulation
CN111104457A (en) Massive space-time data management method based on distributed database
Hashem et al. An Integrative Modeling of BigData Processing.
Xiong et al. HaDaap: a hotness‐aware data placement strategy for improving storage efficiency in heterogeneous Hadoop clusters
CN110019299A (en) A kind of method and apparatus for creating or refreshing the off-line data set of analytic type data warehouse
Purdilă et al. Single‐scan: a fast star‐join query processing algorithm
Ma et al. Blockchain retrieval model based on elastic bloom filter
Bai et al. An efficient skyline query algorithm in the distributed environment
Kanojia et al. IT Infrastructure for Smart City: Issues and Challenges in Migration from Relational to NoSQL Databases

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181130