CN103246498A - Memory storage structures supporting relational data parallel processing and achieving method thereof - Google Patents

Publication number
CN103246498A
Authority
CN
China
Prior art keywords
data
row
parallel processing
storage organization
supporting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2013101742470A
Other languages
Chinese (zh)
Inventor
陆大鹏
孙立新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Group Shandong General Software Co Ltd
Original Assignee
Inspur Group Shandong General Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Group Shandong General Software Co Ltd filed Critical Inspur Group Shandong General Software Co Ltd
Priority to CN2013101742470A priority Critical patent/CN103246498A/en
Publication of CN103246498A publication Critical patent/CN103246498A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of in-memory storage structures, and in particular to an in-memory storage structure supporting parallel processing of relational data and a method of implementing it. The storage structure uses column storage, and one storage structure is composed of a plurality of columns. Each storage structure contains a hidden column whose data type is a 32-bit integer; the hidden column records the data version of the row it belongs to, for use in optimistic concurrency control. The storage structure and its implementation method perform well in data operations such as data loading, read and write access, sorting, projection, and join, and are especially suitable for parallel computation on multiprocessor or multi-core computers.

Description

An in-memory storage structure supporting parallel processing of relational data, and a method of implementing it
Technical field
The present invention relates to the technical field of in-memory storage structures, and in particular to an in-memory storage structure supporting parallel processing of relational data and a method of implementing it.
Background
With the rapid development of manufacturing processes, microprocessor clock frequencies rose steadily. By 2005, however, as clock frequencies approached 4 GHz, Intel and AMD (Advanced Micro Devices) found that speed runs into a limit of its own: simply raising the clock frequency no longer noticeably improved overall system performance. Higher clock frequencies increase power consumption, and heat dissipation became an increasingly insurmountable obstacle.
Microprocessors therefore began to develop toward multi-core designs, that is, integrating several complete computing cores in a single processor. Today computers with multi-core microprocessors are everywhere, and many mobile phones also use dual-core or even quad-core processors.
This has in turn driven the development of parallel computing, which raises processing speed by making full use of multi-core resources. Many programming languages have added support for parallel computing and have enhanced their basic data structures accordingly. Relational structures, however, are usually poorly supported, which causes a number of problems:
1. The storage structure is not designed for parallel processing;
2. Data loading is not parallelized, and the performance of data operations is unsatisfactory;
3. Exclusive locks are taken when data is modified, which causes concurrency performance problems.
The present invention proposes an effective solution to the above problems.
Summary of the invention
To solve the problems of the prior art, the invention provides an in-memory storage structure supporting parallel processing of relational data and a method of implementing it. The structure performs well in data operations such as data loading, read and write access, sorting, projection, and join, and is especially suitable for parallel computation on multiprocessor or multi-core computers.
The technical solution adopted by the invention is as follows:
An in-memory storage structure supporting parallel processing of relational data, in which the structure uses column storage and one storage structure is made up of a plurality of columns. Each storage structure contains one hidden column whose data type is a 32-bit integer and which records the data version of the row it belongs to, for optimistic concurrency control.
The invention describes this storage structure from the following three aspects:
1. Structure.
The storage uses column storage: data is stored in columns (Column). A storage structure is made up of a plurality of columns and is also called a table (Table). Every storage structure contains a hidden column named Version, whose data type is a 32-bit integer and which records the data version of the row it belongs to, for optimistic concurrency control.
For relational data, the values in one column share the same data type and are stored as an array. Compared with the traditional approach of storing data by row, column storage makes changes to the storage structure, such as adding or deleting a column, very fast. Many column-based operations, such as join and projection, also perform well. On this basis, generic columns are preferably used to support the various data types. As is well known, conversion between value types and reference types frequently incurs the performance cost of boxing and unboxing. With generic columns, the data type is specified when the column is initialized, the data type of each column is fixed, and no operation on the column incurs boxing or unboxing overhead.
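The structure described above can be sketched in C# roughly as follows. This is only an illustration under assumptions: the patent names Column, Table, and Version, but the class names `IColumn`, `Column<T>`, and `ColumnTable`, the method names, and the choice to start versions at 1 are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Non-generic view of a column so that columns of different types can share one table.
public interface IColumn
{
    string Name { get; }
    Type DataType { get; }
    int Count { get; }
    void AppendDefault();   // grow the column by one row holding default(T)
}

// A generic column keeps its values in a typed list, so value types such as
// int are stored and read without boxing.
public sealed class Column<T> : IColumn
{
    private readonly List<T> _values = new List<T>();
    public Column(string name) { Name = name; }
    public string Name { get; }
    public Type DataType => typeof(T);
    public int Count => _values.Count;
    public T this[int rowIndex]
    {
        get => _values[rowIndex];
        set => _values[rowIndex] = value;
    }
    public void Add(T value) => _values.Add(value);
    public void AppendDefault() => _values.Add(default(T));
}

public sealed class ColumnTable
{
    private readonly Dictionary<string, IColumn> _columns = new Dictionary<string, IColumn>();

    // Hidden 32-bit integer column holding the data version of each row.
    public Column<int> Version { get; } = new Column<int>("Version");

    public int RowCount => Version.Count;

    public Column<T> AddColumn<T>(string name)
    {
        var column = new Column<T>(name);
        // Existing columns are not touched; only the new column is backfilled.
        for (int i = 0; i < RowCount; i++) column.AppendDefault();
        _columns.Add(name, column);
        return column;
    }

    public bool RemoveColumn(string name) => _columns.Remove(name);

    public Column<T> GetColumn<T>(string name) => (Column<T>)_columns[name];

    // Appends one row of default values and returns its index.
    public int AppendRow()
    {
        foreach (var column in _columns.Values) column.AppendDefault();
        Version.Add(1);   // assumption: versions start at 1
        return RowCount - 1;
    }
}
```

Adding or removing a column here only creates or drops one list, which is the property the text attributes to column storage; a row store would have to rewrite every row.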
The storage structure also includes the concept of a row (Row). Parallel worker threads prepare data in advance, and a row index column is added for each row, to speed up traversal by row. This removes the disadvantage of accessing data by row in a column store and gives an effect similar to access in a row store.
Developers can therefore read and modify data by row. The row data a developer obtains is a snapshot; after modifying it, the developer calls the row's Save method to save the changes to the column storage structure.
2. Access interface.
The storage structure exposes the following access interfaces:
Data loading: load data from a data source;
Table structure modification: add columns, delete columns;
Data reading and writing: read and write by row, read and write by column.
3. Parallel support.
The column storage structure of the invention uses parallel processing to improve performance in several operations.
First, during data loading, the column storage structure automatically starts one thread per column, and each thread is responsible only for loading the data of one column. Data loaded from a data source is normally row data, so a further thread is started to split the row data by column; the split data is handed to the column-loading threads and loaded asynchronously into each column. Loading from the data source, splitting the row data, and loading the column data run as a parallel pipeline, which minimizes data loading time. Because the threads execute in parallel, loading time is reduced effectively, especially for large data volumes. A shorter loading time also reduces long-term occupation of resources such as databases, threads, and memory, which improves overall performance.
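The loading pipeline described above (one splitter thread, one loader thread per column) might look roughly like the following C# sketch. The class and method names are hypothetical, an `IEnumerable<object[]>` stands in for the real data source, and values are held as `object` only to keep the example short; a generic-column implementation would avoid the boxing this implies.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class PipelinedLoader
{
    // Loads row-oriented data into one list per column.
    // rows is a stand-in for a data source such as a DbDataReader (assumption).
    public static List<object>[] Load(IEnumerable<object[]> rows, int columnCount)
    {
        var queues = new BlockingCollection<object>[columnCount];
        var columns = new List<object>[columnCount];
        var loaders = new Task[columnCount];

        for (int c = 0; c < columnCount; c++)
        {
            int col = c; // per-iteration copy captured by the closure below
            queues[col] = new BlockingCollection<object>(boundedCapacity: 1024);
            columns[col] = new List<object>();
            // One loader per column; it is the only writer to its own list.
            loaders[col] = Task.Factory.StartNew(() =>
            {
                foreach (var value in queues[col].GetConsumingEnumerable())
                    columns[col].Add(value);
            }, TaskCreationOptions.LongRunning);
        }

        // Splitter: reads rows and hands each field to the queue of its column.
        var splitter = Task.Factory.StartNew(() =>
        {
            try
            {
                foreach (var row in rows)
                    for (int c = 0; c < columnCount; c++)
                        queues[c].Add(row[c]);
            }
            finally
            {
                // Always signal completion so the loaders can finish.
                foreach (var q in queues) q.CompleteAdding();
            }
        }, TaskCreationOptions.LongRunning);

        splitter.Wait();
        Task.WaitAll(loaders);
        return columns;
    }
}
```

Each queue has a single producer and a single consumer, so values arrive in each column in row order. The bounded capacity keeps the splitter from running far ahead of slow loaders.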
Second, for relational data, access by row is common. The column storage structure therefore has to fetch a value from each column and combine the values into a row for external use. This usually adds time to the operation, and the impact is greater for operations such as traversing a whole table. During traversal by row, the storage structure automatically starts a thread that combines subsequent data into rows in advance, which effectively improves efficiency.
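A minimal sketch of this read-ahead traversal, assuming a bounded buffer between one background task and the consumer. The names `RowPrefetcher`, `Rows`, and `lookahead` are hypothetical, and columns are plain `object[]` arrays for brevity.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class RowPrefetcher
{
    // columns[c][r] is the value of column c at row r; all columns have equal length.
    public static IEnumerable<object[]> Rows(object[][] columns, int lookahead = 256)
    {
        int rowCount = columns.Length == 0 ? 0 : columns[0].Length;
        var buffer = new BlockingCollection<object[]>(lookahead);
        var cts = new CancellationTokenSource();

        // Producer: assembles upcoming rows while the caller processes earlier ones.
        Task.Run(() =>
        {
            try
            {
                for (int r = 0; r < rowCount; r++)
                {
                    var row = new object[columns.Length];
                    for (int c = 0; c < columns.Length; c++) row[c] = columns[c][r];
                    buffer.Add(row, cts.Token);
                }
            }
            catch (OperationCanceledException)
            {
                // The consumer stopped enumerating early; nothing more to do.
            }
            finally
            {
                buffer.CompleteAdding();
            }
        });

        try
        {
            foreach (var row in buffer.GetConsumingEnumerable())
                yield return row;
        }
        finally
        {
            // Releases the producer if the caller breaks out of the loop early.
            cts.Cancel();
        }
    }
}
```

The `lookahead` bound limits how many assembled rows are held in memory at once; the cancellation in the `finally` block prevents the producer from blocking forever when a `foreach` over the rows exits early.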
Third, the storage structure handles concurrency with optimistic locking, to guarantee data consistency under concurrent modification. The row snapshot a developer obtains includes the value of the hidden Version column, that is, the version number of the current data. The developer calls the row's GetItem and SetItem to read and modify data in the row snapshot, and calls the Save method when finished. The column storage structure checks whether the Version value in the row snapshot equals the Version value of the same row in the structure. If they are equal, the data modified in the snapshot is merged into the column storage structure and the Version value of that row is incremented by 1. If they differ, the row is considered to have been modified by another thread, and a StaleDataException is thrown. After catching this exception, the client program should re-read the data and process it again. Using optimistic locking in place of pessimistic locking effectively addresses problems such as performance and deadlock.
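The version check on Save can be sketched as follows. GetItem, SetItem, Save, Version, and StaleDataException are names taken from the text; everything else (`VersionedTable`, `RowSnapshot`, `AddRow`, `GetRow`, versions starting at 1) is hypothetical. The patent does not say how the compare-and-merge step is made atomic, so the short internal lock used here is an assumption; it is held only during that step, never while the caller edits the snapshot.

```csharp
using System;
using System.Collections.Generic;

public class StaleDataException : Exception
{
    public StaleDataException(string message) : base(message) { }
}

public sealed class VersionedTable
{
    private readonly object _sync = new object();
    private readonly string[] _names;
    private readonly List<object>[] _columns;
    private readonly List<int> _version = new List<int>(); // hidden Version column

    public VersionedTable(params string[] names)
    {
        _names = names;
        _columns = new List<object>[names.Length];
        for (int c = 0; c < names.Length; c++) _columns[c] = new List<object>();
    }

    public void AddRow(params object[] values)
    {
        lock (_sync)
        {
            for (int c = 0; c < _columns.Length; c++) _columns[c].Add(values[c]);
            _version.Add(1); // assumption: versions start at 1
        }
    }

    // Returns a copy of the row together with the version it was read at.
    public RowSnapshot GetRow(int index)
    {
        lock (_sync)
        {
            var values = new Dictionary<string, object>();
            for (int c = 0; c < _names.Length; c++) values[_names[c]] = _columns[c][index];
            return new RowSnapshot(this, index, _version[index], values);
        }
    }

    internal void Save(RowSnapshot row)
    {
        lock (_sync)
        {
            if (_version[row.Index] != row.Version)
                throw new StaleDataException("Row " + row.Index + " was modified by another thread.");
            for (int c = 0; c < _names.Length; c++)
                _columns[c][row.Index] = row.GetItem(_names[c]);
            _version[row.Index] = row.Version + 1;
        }
    }
}

public sealed class RowSnapshot
{
    private readonly VersionedTable _table;
    private readonly Dictionary<string, object> _values;

    internal RowSnapshot(VersionedTable table, int index, int version, Dictionary<string, object> values)
    {
        _table = table; Index = index; Version = version; _values = values;
    }

    public int Index { get; }
    public int Version { get; }   // version at read time; never changes in this sketch
    public object GetItem(string column) => _values[column];
    public void SetItem(string column, object value) => _values[column] = value;
    public void Save() => _table.Save(this);
}
```

In this sketch a snapshot can be saved successfully only once, because its Version is fixed at read time; a caller that wants to edit the row again reads a fresh snapshot.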
In summary, the storage structure of the invention uses column storage, encapsulates parallel algorithms internally, provides an optimistic-locking mechanism for concurrent processing, and at the same time exposes an access interface that is simple and easy to use.
In the in-memory storage structure supporting parallel processing of relational data and its implementation method, data is stored in columns rather than in rows. This has an inherent advantage when the table structure is modified, and such operations are very fast: adding or removing a column involves no modification of existing data, whereas if the data were stored by row, adding a column would require modifying every row. In addition, to read a whole column, only the array in which that column is stored needs to be read, which avoids the row store's problem of reading data row by row. Many operations based on column storage, such as join and projection, also perform well.
Generic columns are used to support the various data types. As is well known, conversion between value types and reference types frequently incurs the performance cost of boxing and unboxing. With generic columns, the data type is specified when the column is initialized, the data type of each column is fixed, and no operation on the column incurs boxing or unboxing overhead.
The drawbacks of the .NET DataTable, which can only be traversed by row and does not support concurrent operation, are avoided. The algorithms for common operations such as data loading, computation, association, sorting, projection, and grouping are designed around parallel computing, to make full use of the advantages and performance of multi-core hardware. In addition, data and parallel operations are encapsulated together internally, which both meets object-oriented development requirements and achieves high cohesion and low coupling.
Parallel worker threads prepare data in advance, and a row index column is added for each row, to speed up traversal by row. Row-based data access is a disadvantage for a column storage structure, so parallel worker threads are used to prepare data ahead of time and a row index column is added as an identifier, in order to remove this disadvantage and achieve an effect similar to access in a row store.
An optimistic locking mechanism based on an integer data version column (the Version column) is used: each row of data has an integer version, and data access is handled on the basis of the version currently held. This effectively avoids the concurrency problems caused by exclusive locks applied during data operations.
The beneficial effects of the technical solution provided by the invention are as follows:
Relational data is stored in columns in the form of arrays, and a whole table is made up of a plurality of columns. Generic columns are used internally to support the various data types, reducing the performance cost of boxing and unboxing. The drawbacks of the .NET DataTable, which can only be traversed by row and does not support concurrent operation, are avoided. Inside the data structure, the algorithms for operations such as data loading, computation, association, sorting, projection, and grouping are based on parallel computing, in order to obtain the best performance. Parallel worker threads prepare row data in advance to speed up traversal by row. Concurrent access conflicts are handled with an optimistic locking mechanism, which guarantees data consistency.
  
Description of drawings
Fig. 1 is a UML class diagram of the in-memory storage structure supporting parallel processing of relational data and its implementation method according to the invention;
Fig. 2 is a schematic diagram of the column storage of employee data in Embodiment 1 of the in-memory storage structure supporting parallel processing of relational data and its implementation method according to the invention.
Detailed description
To make the purpose, technical solution, and advantages of the invention clearer, embodiments of the invention are described in further detail below with reference to the drawings.
Embodiment 1
As shown in Fig. 1, an in-memory storage structure supporting parallel processing of relational data uses column storage, and one storage structure is made up of a plurality of columns. Each storage structure contains one hidden column whose data type is a 32-bit integer and which records the data version of the row it belongs to, for optimistic concurrency control.
In this embodiment, generic columns are used to support the various data types.
In this embodiment, the storage structure also includes the concept of a row; parallel worker threads prepare data in advance, and a row index column is added for each row.
The storage structure of this embodiment exposes the following access interfaces:
Data loading: used to load data from a data source;
Table structure modification: used to add columns and delete columns;
Data reading and writing: used to read and write by row and to read and write by column.
As shown in Fig. 2, with the relational storage structure of the invention as the carrier, a typical use is as follows:
The developer writes code to execute a data query and obtains a DbDataReader for the employee table Employee, then passes this DbDataReader to the Load method of an instance of the relational storage structure. The calling code is as follows:
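// Declare each column and its fixed data type before loading.
var table = new DsmTable();
table.AddColumn("ID", typeof(string));
table.AddColumn("Name", typeof(string));
table.AddColumn("Sex", typeof(int));
table.AddColumn("age", typeof(int));
table.AddColumn("Department", typeof(string));
// Run the query and pass the reader to Load; the reader is disposed afterwards.
using (var reader = db.ExcuteReader("select * from employee"))
{
    table.Load(reader);
}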
Before the data is loaded, the data type of each column of the table is determined, that is, the data type of each column in the table is fixed. Internally, the Load method uses the parallel computing features provided by .NET to load the data of each column in parallel. This effectively reduces data loading time and reduces the performance cost of boxing and unboxing. The data as stored after loading is shown in Fig. 2.
The developer writes code to read employee information row by row. Internally, the storage structure instance automatically starts a thread that merges column data into rows in advance, which effectively removes the drawback of row-wise access in a column store and improves execution speed. Sample traversal code is as follows:
foreach (var row in table.Rows) {
    //…
}
Developer A and developer B each obtain the first row of data from the same storage structure, with a Version value of 1. A changes the City column value to NewYork, and B changes the City column value to Tokyo. A calls the row's Save to save the data. The storage structure checks that the Version value in the row held by A is the same as the Version of the first row in the storage structure, merges A's modification into the first row of the storage structure, and changes the Version value of the first row to 2; the save succeeds. B then calls Save to save its modification. The storage structure finds that the Version values differ and throws a StaleDataException. B catches this exception, learns that the data has been modified, and must fetch the data again and reprocess it. Optimistic locking of this kind handles concurrency effectively while avoiding the various problems caused by external locking. The code for handling concurrency is as follows:
try {
    row["City"] = "Tokyo";
    //…
    row.Save();
}
catch (StaleDataException) {
    // Show a prompt window in the UI to tell the user that the data has been modified
}
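Where the client program prefers to re-read and retry automatically rather than prompt the user, the handling could be wrapped in a small helper such as the sketch below. This is not part of the patent: `OptimisticRetry`, `Run`, and `maxAttempts` are hypothetical names, and the helper is generic over the conflict exception type so that it does not depend on any particular table class.

```csharp
using System;

public static class OptimisticRetry
{
    // Runs attempt until it completes without throwing TConflict,
    // up to maxAttempts times. Returns the number of attempts used.
    // On the last attempt the conflict exception propagates to the caller.
    public static int Run<TConflict>(Action attempt, int maxAttempts)
        where TConflict : Exception
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        for (int i = 1; ; i++)
        {
            try
            {
                attempt();
                return i;
            }
            catch (TConflict) when (i < maxAttempts)
            {
                // Conflict detected: loop so that attempt() re-reads fresh data.
            }
        }
    }
}
```

The `attempt` delegate is expected to fetch a fresh row snapshot, apply the change, and call Save, so that each retry starts from the current version.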
Finally, as the above description shows, the storage structure uses column storage, can load data quickly, and effectively reduces the performance cost of boxing and unboxing. Its internal parallel processing effectively improves the performance of operations on relational data. The optimistic locking mechanism used during data operations largely avoids the concurrency problems caused by external locking. At the same time, the simple interface it exposes lowers the technical demands on developers.
The above are only preferred embodiments of the invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (8)

1. An in-memory storage structure supporting parallel processing of relational data, wherein the structure uses column storage, one storage structure is made up of a plurality of columns, and each storage structure contains one hidden column whose data type is a 32-bit integer and which records the data version of the row it belongs to, for optimistic concurrency control.
2. The in-memory storage structure supporting parallel processing of relational data according to claim 1, characterized in that generic columns are used to support the various data types.
3. The in-memory storage structure supporting parallel processing of relational data according to claim 2, characterized in that the storage structure also includes the concept of a row, parallel worker threads are used to prepare data in advance, and a row index column is added for each row.
4. The in-memory storage structure supporting parallel processing of relational data according to any one of claims 1-3, characterized in that the storage structure exposes the following access interfaces:
Data loading: used to load data from a data source;
Table structure modification: used to add columns and delete columns;
Data reading and writing: used to read and write by row and to read and write by column.
5. A method of implementing an in-memory storage structure supporting parallel processing of relational data, comprising the following step:
During data loading, the column storage structure automatically starts one thread per column, and each thread is responsible only for loading the data of one column; if the data loaded from the data source is row data, a further thread is started to split the row data by column, and the split data is handed to the column-loading threads and loaded asynchronously into each column.
6. The method of implementing an in-memory storage structure supporting parallel processing of relational data according to claim 5, characterized in that, for relational data, during traversal by row the storage structure automatically starts a thread that combines subsequent data into rows in advance.
7. The method of implementing an in-memory storage structure supporting parallel processing of relational data according to claim 5, characterized in that the storage structure handles concurrency with optimistic locking, to guarantee data consistency under concurrent modification.
8. The method of implementing an in-memory storage structure supporting parallel processing of relational data according to claim 7, characterized in that handling concurrency with optimistic locking specifically comprises the following steps:
A. A row snapshot is obtained; the row snapshot includes the value of the hidden column, that is, the version number of the current data;
B. The row's GetItem and SetItem are called to read and modify data in the row snapshot, and the Save method is called when finished;
C. The column storage structure checks whether the value of the hidden column in the row snapshot equals the value of the hidden column of the same row in the structure; if they are equal, the data modified in the snapshot is merged into the column storage structure and the value of the hidden column of that row is incremented by 1; if they differ, the row is considered to have been modified by another thread and a StaleDataException is thrown; after catching this exception, the client program re-reads the data and processes it again.
CN2013101742470A 2013-05-13 2013-05-13 Memory storage structures supporting relational data parallel processing and achieving method thereof Pending CN103246498A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2013101742470A CN103246498A (en) 2013-05-13 2013-05-13 Memory storage structures supporting relational data parallel processing and achieving method thereof

Publications (1)

Publication Number Publication Date
CN103246498A true CN103246498A (en) 2013-08-14

Family

ID=48926033

Country Status (1)

Country Link
CN (1) CN103246498A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104573833A (en) * 2014-12-31 2015-04-29 普天新能源有限责任公司 Charging ordering method
CN106294683A (en) * 2016-08-05 2017-01-04 中国银行股份有限公司 A kind of file declustering method and device
CN108182599A (en) * 2017-12-27 2018-06-19 五八有限公司 One kind is registered bonusing method, equipment and computer readable storage medium
CN104750727B (en) * 2013-12-30 2019-03-26 沈阳亿阳计算机技术有限责任公司 A kind of column memory storage inquiry unit and column memory storage querying method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH05108431A (en) * 1991-10-11 1993-04-30 Hitachi Ltd Quick access method for relational data base in hierarchical structure
CN102129458A (en) * 2011-03-09 2011-07-20 胡劲松 Method and device for storing relational database
CN102156714A (en) * 2011-03-22 2011-08-17 清华大学 Method for realizing self-adaptive vertical divided relational database and system thereof

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20130814
