CN103246498A - Memory storage structures supporting relational data parallel processing and achieving method thereof

Info

Publication number: CN103246498A
Application number: CN2013101742470A
Authority: CN
Inventors: 陆大鹏; 孙立新
Original assignee: Inspur Group Shandong General Software Co Ltd
Current assignee: Inspur Group Shandong General Software Co Ltd
Priority date: 2013-05-13
Filing date: 2013-05-13
Publication date: 2013-08-14

Abstract

The invention relates to the technical field of in-memory storage structures, and in particular to an in-memory storage structure supporting parallel processing of relational data and a method for implementing it. The storage structure uses column storage: one storage structure is composed of a plurality of columns. Each storage structure contains a hidden column whose data type is a 32-bit integer; it records the data version of the row it belongs to and is used for optimistic concurrency control. The storage structure and implementation method perform well on data operations such as data loading, read-write access, sorting, projection, and joins, and are especially suited to parallel computation on multiprocessor or multi-core computers.

Description

An in-memory storage structure supporting parallel processing of relational data, and a method for implementing it

Technical field

The present invention relates to the technical field of in-memory storage structures, and in particular to an in-memory storage structure supporting parallel processing of relational data and a method for implementing it.

Background technology

With the rapid development of manufacturing processes, microprocessor clock frequencies rose steadily. By 2005, however, as clock frequencies approached 4 GHz, Intel and AMD (Advanced Micro Devices) found that speed was reaching its own limit: simply raising the clock frequency could no longer noticeably improve overall system performance. Higher clock frequencies also increase power consumption, and heat dissipation became an increasingly insurmountable obstacle.

Microprocessors therefore began to develop in the multi-core direction, that is, integrating several complete computing cores in one processor. Today computers with multi-core microprocessors are everywhere, and many mobile phones also use dual-core or even quad-core processors.

This in turn has driven the development of parallel computing, which improves processing speed by making full use of multi-core resources. Many programming languages have added support for parallel computing, and the basic data structures they already offered have been strengthened accordingly. Relational structures, however, are usually poorly supported, which causes a number of problems:

1. The storage structure is not designed for parallel processing;

2. Data loading is not parallelized, and data operation performance is unsatisfactory;

3. An exclusive lock is taken when data is modified, which causes concurrency performance problems.

The present invention proposes an effective solution to the above problems.

Summary of the invention

To solve the problems of the prior art, the present invention provides an in-memory storage structure supporting parallel processing of relational data and a method for implementing it. It performs well on data operations such as data loading, read-write access, sorting, projection, and joins, and is especially suited to parallel computation on multiprocessor or multi-core computers.

The technical solution adopted by the present invention is as follows:

An in-memory storage structure supporting parallel processing of relational data uses column storage. One storage structure is composed of a plurality of columns, and each storage structure contains one hidden column whose data type is a 32-bit integer and which records the data version of the row it belongs to, for optimistic concurrency control.

The present invention describes this storage structure from the following three aspects:

1. Structure.

The storage uses column storage: data is stored in columns (Column). One storage structure is composed of a plurality of columns and is also called a table (Table). Each storage structure contains a hidden column named Version, whose data type is a 32-bit integer; it records the data version of the row it belongs to and is used for optimistic concurrency control.

For relational data, the values in one column share the same data type and are stored as an array. Compared with the traditional approach of storing data in rows, column storage makes changes to the storage structure, such as adding or deleting a column, quick and easy. Many column-based operations, such as joins and projections, also perform well. On this basis, generic columns are preferably used to support the various data types. It is well known that conversion between value types and reference types frequently causes boxing and unboxing performance problems. With generic columns, the data type is specified when the column is initialized, so the data type of each column is fixed and no operation on the column incurs boxing or unboxing overhead.
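The generic-column idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names IColumn, Column<T>, and ColumnTable are hypothetical and are not the classes of the invention, and the sketch uses a generic AddColumn<T> method instead of passing a Type object.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical names throughout: the patent does not publish its class definitions.
public interface IColumn
{
    string Name { get; }
    Type DataType { get; }
    int Count { get; }
}

// One column is one typed, array-backed list. Because T is fixed when the column
// is created, reading or writing an int never goes through object (no boxing).
public sealed class Column<T> : IColumn
{
    private readonly List<T> _values = new List<T>();

    public Column(string name) { Name = name; }

    public string Name { get; }
    public Type DataType => typeof(T);
    public int Count => _values.Count;

    public T this[int rowIndex]
    {
        get => _values[rowIndex];
        set => _values[rowIndex] = value;
    }

    public void Add(T value) => _values.Add(value);
}

// A table is a set of named columns plus the hidden 32-bit Version column.
public sealed class ColumnTable
{
    private readonly Dictionary<string, IColumn> _columns = new Dictionary<string, IColumn>();

    public ColumnTable()
    {
        _columns["Version"] = new Column<int>("Version");
    }

    // Adding or removing a column only touches the dictionary;
    // the data held by the other columns is not rewritten.
    public Column<T> AddColumn<T>(string name)
    {
        var column = new Column<T>(name);
        _columns.Add(name, column);
        return column;
    }

    // The hidden Version column cannot be removed in this sketch.
    public bool RemoveColumn(string name) => name != "Version" && _columns.Remove(name);

    public Column<T> GetColumn<T>(string name) => (Column<T>)_columns[name];
}
```

In this sketch, a call such as `table.AddColumn<int>("age")` returns a column whose indexer reads and writes `int` directly, which is the property the paragraph above attributes to generic columns.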

The storage structure also includes the concept of a row (Row). Parallel worker threads prepare data in advance, and a row-index column is added for each column to speed up row-wise traversal. This removes the disadvantage a column store has when data is accessed by row and achieves an effect similar to row-oriented data access.

The developer can therefore read and modify data by row. The row data the developer obtains is a snapshot; after modifying it, the developer can call the row's Save method to write the changes back to the column storage structure.

2. Access interfaces.

The storage structure exposes the following access interfaces:

Data loading: load data from a data source;

Table structure modification: add columns, delete columns;

Data reading and writing: read and write by row, read and write by column.

3. Parallel support.

The column storage structure of the present invention uses parallel processing in several operations to improve performance.

First, when data is loaded, the column storage structure automatically starts one thread per column, and each thread is responsible only for loading the data of its own column. Data loaded from a data source is usually row data, so one more thread is started to split the row data by column; the split data is handed to the column loading threads, which write it into their columns asynchronously. Reading from the data source, splitting rows, and loading columns run as a parallel pipeline, which minimizes the data loading time. Because several threads run in parallel, the loading time is reduced effectively, especially for large data volumes. A shorter loading time also reduces how long resources such as database connections, threads, and memory are held, which improves overall performance.
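A minimal sketch of such a loading pipeline is given below, assuming one splitter task and one loader task per column connected by bounded queues. PipelinedLoader and its Load signature are hypothetical, and values are kept as object only to keep the example short; typed columns would be used to avoid boxing.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper; not the Load method of the invention.
public static class PipelinedLoader
{
    // rows: row-oriented records from a data source.
    // Returns one list per column, in the original row order.
    public static List<object>[] Load(IEnumerable<object[]> rows, int columnCount)
    {
        var queues = new BlockingCollection<object>[columnCount];
        var columns = new List<object>[columnCount];
        var loaders = new Task[columnCount];

        for (int c = 0; c < columnCount; c++)
        {
            int col = c; // per-iteration copy captured by the lambda
            queues[col] = new BlockingCollection<object>(boundedCapacity: 1024);
            columns[col] = new List<object>();

            // One loader per column. It is the only writer of columns[col],
            // so the list itself needs no lock.
            loaders[col] = Task.Factory.StartNew(() =>
            {
                foreach (var value in queues[col].GetConsumingEnumerable())
                    columns[col].Add(value);
            }, TaskCreationOptions.LongRunning);
        }

        // Splitter: cuts each row into cells and hands every cell to its column's queue.
        var splitter = Task.Factory.StartNew(() =>
        {
            try
            {
                foreach (var row in rows)
                    for (int c = 0; c < columnCount; c++)
                        queues[c].Add(row[c]);
            }
            finally
            {
                // Always signal end-of-data so the loaders can finish.
                foreach (var q in queues) q.CompleteAdding();
            }
        }, TaskCreationOptions.LongRunning);

        splitter.Wait();
        Task.WaitAll(loaders);
        return columns;
    }
}
```

Each queue has a single producer and a single consumer, so the values of one column arrive in row order and all columns stay aligned. The bounded capacity keeps the splitter from running arbitrarily far ahead of a slow loader.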

Second, for relational data, access by row is very common. The column storage structure therefore has to take a value from each column and combine the values into a row for external use. This usually makes the operation slower, and the effect is most noticeable when the whole table is traversed. During row-wise traversal, the storage structure automatically starts a thread that assembles the following rows in advance, which improves efficiency.
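The read-ahead behaviour can be illustrated with the following sketch, in which a background task assembles rows while the caller consumes them. RowPrefetcher, Rows, and the lookahead parameter are hypothetical names chosen for this example, and the columns are plain object lists for brevity.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper; not the row enumerator of the invention.
public static class RowPrefetcher
{
    // columns[c][r] is the value of column c at row r; all columns have the same length.
    public static IEnumerable<object[]> Rows(
        IReadOnlyList<IReadOnlyList<object>> columns, int lookahead = 256)
    {
        int rowCount = columns.Count == 0 ? 0 : columns[0].Count;
        var ready = new BlockingCollection<object[]>(lookahead);
        var cts = new CancellationTokenSource();

        // Background producer: builds rows ahead of the consumer.
        var producer = Task.Run(() =>
        {
            try
            {
                for (int r = 0; r < rowCount; r++)
                {
                    var row = new object[columns.Count];
                    for (int c = 0; c < columns.Count; c++)
                        row[c] = columns[c][r];
                    // Blocks once the consumer is `lookahead` rows behind.
                    ready.Add(row, cts.Token);
                }
            }
            catch (OperationCanceledException)
            {
                // The consumer stopped enumerating early; nothing more to do.
            }
            finally
            {
                ready.CompleteAdding();
            }
        });

        try
        {
            foreach (var row in ready.GetConsumingEnumerable())
                yield return row;
        }
        finally
        {
            cts.Cancel(); // releases the producer if the caller stopped early
            try { producer.Wait(); } // surfaces real producer failures
            finally { ready.Dispose(); cts.Dispose(); }
        }
    }
}
```

A caller simply writes `foreach (var row in RowPrefetcher.Rows(columns)) { ... }`; while one row is being processed, the producer is already combining the next ones.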

Third, the storage structure handles concurrency with optimistic locking, which guarantees data consistency under concurrent modification. The row snapshot a developer obtains includes the value of the hidden Version column, that is, the version number of the current data. The developer calls the row's GetItem and SetItem to read and modify data in the row snapshot, and calls the Save method when finished. The column storage structure checks whether the Version value in the row snapshot equals the Version value of the same row in the structure. If they are equal, the changes in the snapshot are merged into the column storage structure and the row's Version value is incremented by 1. If they differ, the row is considered to have been modified by another thread and a StaleDataException is thrown. After catching this exception, the client program should re-read the data and process it again. Using optimistic locking instead of pessimistic locking avoids problems such as poor performance and deadlock.
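The version check can be sketched as below. Only the names Version, Save, and StaleDataException come from the text; VersionedStore, RowSnapshot, and the single City column are hypothetical simplifications, and the short internal lock is an assumption made here so that the comparison and the write happen atomically.

```csharp
using System;
using System.Collections.Generic;

public class StaleDataException : Exception
{
    public StaleDataException(string message) : base(message) { }
}

// Hypothetical one-column store, just large enough to show the version check.
public sealed class VersionedStore
{
    private readonly object _gate = new object();
    private readonly List<string> _city = new List<string>();
    private readonly List<int> _version = new List<int>(); // the hidden Version column

    public int AddRow(string city)
    {
        lock (_gate)
        {
            _city.Add(city);
            _version.Add(1); // new rows start at version 1
            return _city.Count - 1;
        }
    }

    // Returns a detached copy of the row together with its current version.
    public RowSnapshot GetRow(int index)
    {
        lock (_gate)
        {
            return new RowSnapshot(this, index, _city[index], _version[index]);
        }
    }

    // Compare, then write. Callers hold no lock while they edit a snapshot;
    // the lock here only covers the check and the merge.
    internal void Save(RowSnapshot snapshot)
    {
        lock (_gate)
        {
            if (_version[snapshot.Index] != snapshot.Version)
                throw new StaleDataException(
                    "Row " + snapshot.Index + " was modified by another writer.");

            _city[snapshot.Index] = snapshot.City;
            _version[snapshot.Index] = snapshot.Version + 1;
        }
    }
}

public sealed class RowSnapshot
{
    private readonly VersionedStore _owner;

    internal RowSnapshot(VersionedStore owner, int index, string city, int version)
    {
        _owner = owner;
        Index = index;
        City = city;
        Version = version;
    }

    public int Index { get; }
    public int Version { get; }
    public string City { get; set; }

    public void Save() => _owner.Save(this);
}
```

If two callers take a snapshot of the same row and both change City, the first Save succeeds and raises the stored version, and the second Save throws StaleDataException because its snapshot still carries the old version.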

In summary, the storage structure of the present invention uses column storage, encapsulates parallel algorithms internally, provides an optimistic locking mechanism for concurrency, and at the same time exposes simple, easy-to-use access interfaces.

In the in-memory storage structure and implementation method of the present invention, data is stored in columns rather than in rows. This has an inherent advantage when the table structure is modified, and such operations are very fast: adding or removing a column does not modify existing data, whereas if the data were stored in rows, adding a column would require modifying every row. In addition, reading an entire column only requires reading the array that stores that column, which avoids the row-by-row reads that row storage needs. Many operations based on column storage, such as joins and projections, also perform well.

Generic columns are used to support the various data types. It is well known that conversion between value types and reference types frequently causes boxing and unboxing performance problems. With generic columns, the data type is specified when the column is initialized, so the data type of each column is fixed and no operation on the column incurs boxing or unboxing overhead.

This avoids the drawbacks of the .NET DataTable, which can only be traversed by row and does not support concurrent operations. The algorithms for common operations such as data loading, computation, joins, sorting, projection, and grouping are designed for parallel computation, so as to make full use of multi-core hardware. In addition, data and parallel operations are encapsulated together internally, which both meets object-oriented development requirements and achieves high cohesion and low coupling.

Parallel worker threads prepare data in advance, and a row-index column is added for each column to speed up row-wise traversal. Row-based data access is a weak point of a column storage structure, so parallel worker threads prepare the data in advance and a row-index column is added to identify each column, in order to remove this weakness and achieve an effect similar to row-oriented data access.

An optimistic locking mechanism based on an integer data version column (the Version column) is used: each row of data carries an integer version, and data access is handled against the version currently held. This avoids the concurrency problems caused by the exclusive locks otherwise taken during data operations.

The beneficial effects of the technical solution provided by the present invention are:

Relational data is stored in columns in the form of arrays, and a table is composed of a plurality of columns. Generic columns are used internally to support the various data types, reducing the overhead of boxing and unboxing. The drawbacks of the .NET DataTable, which can only be traversed by row and does not support concurrent operations, are avoided. Inside the data structure, the algorithms for operations such as loading, computation, joins, sorting, projection, and grouping are based on parallel computation to obtain the best performance. Parallel worker threads prepare row data in advance to speed up row-wise traversal. Concurrent access conflicts are handled with optimistic locking, which guarantees data consistency.

Description of drawings

Fig. 1 is a UML class diagram of the in-memory storage structure supporting parallel processing of relational data and its implementation method according to the present invention;

Fig. 2 is a schematic diagram of the column storage of employee data in embodiment 1 of the in-memory storage structure supporting parallel processing of relational data and its implementation method according to the present invention.

Embodiment

To make the purpose, technical solution, and advantages of the present invention clearer, embodiments of the present invention are described in further detail below with reference to the accompanying drawings.

Embodiment one

As shown in Fig. 1, an in-memory storage structure supporting parallel processing of relational data uses column storage. One storage structure is composed of a plurality of columns, and each storage structure contains one hidden column whose data type is a 32-bit integer and which records the data version of the row it belongs to, for optimistic concurrency control.

In this embodiment, generic columns are used to support the various data types.

In this embodiment, the storage structure also includes the concept of a row; parallel worker threads prepare data in advance, and a row-index column is added for each column.

The storage structure of this embodiment exposes the following access interfaces:

Data loading: used to load data from a data source;

Table structure modification: used to add columns and delete columns;

Data reading and writing: used to read and write by row and to read and write by column.

As shown in Fig. 2, a typical use of the relational storage structure of the present invention as a data carrier is as follows:

The developer writes code that runs a data query and obtains a DbDataReader over the employee table Employee, then passes this DbDataReader to the Load method of an instance of the relational storage structure. The calling code is as follows:

var table = new DsmTable();
table.AddColumn("ID", typeof(string));
table.AddColumn("Name", typeof(string));
table.AddColumn("Sex", typeof(int));
table.AddColumn("age", typeof(int));
table.AddColumn("Department", typeof(string));
using (var reader = db.ExcuteReader("select * from employee"))
    table.Load(reader);

Before data is loaded, the data type of every column in the table has been determined, that is, the data type of each column is fixed. Internally, the Load method uses the parallel computing features provided by .NET to load the data of each column in parallel. This reduces the data loading time and the overhead of boxing and unboxing. The data as stored after loading is shown in Fig. 2.

The developer writes code to read the employee information row by row. The storage structure instance automatically starts an internal thread that merges column data into rows in advance, which removes the weakness of row-wise access in a column store and improves execution speed. Sample traversal code is as follows:

foreach (var row in table.Rows) {
    // …
}

Developer A and developer B each obtain the first row from the same storage structure; its Version value is 1. A changes the City value to NewYork, and B changes the City value to Tokyo. A calls the row's Save method. The storage structure checks that the Version value in the row held by A equals the Version value of the first row in the storage structure, merges A's changes into the first row, sets the first row's Version value to 2, and the save succeeds. B then calls Save. The storage structure finds that the Version values differ and throws a StaleDataException. B catches the exception, learns that the data has been modified, and must obtain the data again and redo the processing. Optimistic locking of this kind handles concurrency effectively while avoiding the problems caused by external locking. The code for concurrent processing is as follows:

try {
    row["City"] = "Tokyo";
    // …
    row.Save();
}
catch (StaleDataException) {
    // Show a prompt in the UI to tell the user that the data has been modified.
}

Finally, as the above description shows, the storage structure uses column storage, can load data quickly, and reduces the overhead of boxing and unboxing. Its internal parallel processing improves the performance of operations on relational data. The optimistic locking mechanism used during data operations largely avoids the concurrency problems caused by external locking. At the same time, the simple interfaces it exposes lower the technical demands placed on developers.

The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. An in-memory storage structure supporting parallel processing of relational data, wherein the structure uses column storage, one storage structure is composed of a plurality of columns, and each storage structure contains one hidden column whose data type is a 32-bit integer and which records the data version of the row it belongs to, for optimistic concurrency control.

2. The in-memory storage structure supporting parallel processing of relational data according to claim 1, characterized in that generic columns are used to support the various data types.

3. The in-memory storage structure supporting parallel processing of relational data according to claim 2, characterized in that the storage structure also includes the concept of a row, parallel worker threads prepare data in advance, and a row-index column is added for each column.

4. The in-memory storage structure supporting parallel processing of relational data according to any one of claims 1-3, characterized in that the storage structure exposes the following access interfaces:

Data loading: used to load data from a data source;

Table structure modification: used to add columns and delete columns;

5. A method for implementing an in-memory storage structure supporting parallel processing of relational data, comprising the following steps:

When data is loaded, the column storage structure automatically starts one thread per column, and each thread is responsible only for loading the data of its own column; if the data loaded from the data source is row data, one more thread is started to split the row data by column, and the split data is handed to the column loading threads, which write it into their columns asynchronously.

6. The method for implementing an in-memory storage structure supporting parallel processing of relational data according to claim 5, characterized in that, for relational data, during row-wise traversal the storage structure automatically starts a thread that combines the following data into rows in advance.

7. The method for implementing an in-memory storage structure supporting parallel processing of relational data according to claim 5, characterized in that the storage structure handles concurrency with optimistic locking, guaranteeing data consistency under concurrent modification.

8. The method for implementing an in-memory storage structure supporting parallel processing of relational data according to claim 7, characterized in that handling concurrency with optimistic locking specifically comprises the following steps:

A. Obtaining row snapshot data, the row snapshot data including the value of the hidden column, that is, the version number of the current data;

B. Calling the row's GetItem and SetItem to read and modify data in the row snapshot, and calling the Save method when finished;

C. The column storage structure checking whether the value of the hidden column in the row snapshot equals the value of the hidden column of the same row in the structure; if they are equal, merging the data modified in the snapshot into the column storage structure and incrementing the value of the row's hidden column by 1; if they differ, considering the row to have been modified by another thread and throwing a StaleDataException, after which the client program catches the exception, re-reads the data, and processes it again.