CN101329676B - Data paralleling abstracting method and apparatus and database system - Google Patents

Data paralleling abstracting method and apparatus and database system

Info

Publication number
CN101329676B
Authority
CN
China
Prior art keywords
data
subregion
sign
subprocess
storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN2007101233422A
Other languages
Chinese (zh)
Other versions
CN101329676A (en
Inventor
陆春义
谭力夫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN2007101233422A priority Critical patent/CN101329676B/en
Publication of CN101329676A publication Critical patent/CN101329676A/en
Application granted granted Critical
Publication of CN101329676B publication Critical patent/CN101329676B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data parallel extraction method, a device and a database system for improving the extraction performance of existing parallel extraction techniques. In the technical solution provided by the embodiments of the invention, storage area distribution information of the database to be extracted is obtained according to identification information of that database; the storage area of the database is divided into a set number of storage sub-areas according to the storage area distribution information; the set number of data extraction subprocesses is created; and the data extraction subprocesses run in parallel, each extracting the data of the database table to be extracted from one of the storage sub-areas. Data in adjacent or nearby physical storage areas are thereby assigned to the same extraction subprocess as far as possible, so that while each subprocess extracts data, its pointer moves only within a comparatively small storage area. Pointer movement is reduced and extraction performance is improved, especially for large databases holding more than gigabytes of data.

Description

A data parallel extraction method, device and database system
Technical field
The present invention relates to computer technology, and in particular to a data parallel extraction technique.
Background
With the rapid development of modern enterprises, the volume of data they accumulate has grown sharply. To manage and analyze this mass of data scientifically and effectively, data warehouse technology has come into widespread use. ETL (Extract, Transform, Load), an important tool in a data warehouse, is mainly responsible for extracting, transforming and loading business data. In enterprise-level applications, users often need to extract large or very large database tables from a database; a single table can hold several GB, tens of GB or even hundreds of GB of data. Traditional single-process extraction cannot meet customer needs at this scale. Adopting a high-performance, multi-process parallel extraction scheme has therefore become the inevitable way to remove the performance bottleneck of extracting large tables.
In the existing technical solutions in the industry, the values of one or more fields of the database table are used as the splitting constraint, according to the degree of parallelism required by the user. The data stored in the table are divided into several parts, and each part is assigned to an independent extraction process. During execution, each extraction process extracts the data records assigned to it according to its splitting constraint, and the data extracted by all processes are finally merged into one complete database table. For example, a table may be split on its age field, with age ranges such as 0-10, 10-20, 20-30, 30-40 and so on each serving as a splitting constraint; each extraction process is responsible for the data extraction defined by at least one splitting constraint.
This parallel extraction technique relies on the values of one or more fields of the database table as the data splitting condition. The fields must be chosen manually, and the splitting condition must be entered manually for each process, so the technique cannot be automated. Moreover, when splitting relies on field values, it is difficult to split the table evenly and balance the amount of data extracted by each process unless the user knows the value range of the selected field very well.
In addition, because of the way a database stores its data, when field values are used as the data splitting constraint, the data assigned to one extraction process are spread over different physical storage locations. While reading the data, the database server has to move the pointer frequently, which wastes a great deal of time and greatly reduces extraction performance. Thus, although this parallel extraction technique improves extraction performance to some extent, it can hardly satisfy customers' demand for high-performance extraction.
Summary of the invention
The embodiments of the invention provide a data parallel extraction method, device and database system for improving the performance of existing parallel extraction techniques.
A data parallel extraction method provided by an embodiment of the invention comprises:
obtaining storage area distribution information of the database to be extracted according to identification information of the database to be extracted;
dividing the storage area of the database into a set number of storage sub-areas according to the storage area distribution information;
creating the set number of data extraction subprocesses, each of which extracts, in parallel with the others, the data of the database table to be extracted from one of the storage sub-areas.
An embodiment of the invention also provides a data parallel extraction device, comprising:
an acquiring unit, configured to obtain storage area distribution information of the database to be extracted according to identification information of the database to be extracted;
a splitting unit, configured to divide the storage area of the database into a set number of storage sub-areas according to the storage area distribution information;
an extracting unit, configured to create and start in parallel the set number of data extraction subprocesses, each of which extracts, in parallel with the others, the data of the database table to be extracted from one of the storage sub-areas.
An embodiment of the invention also provides a database system comprising a database and the data parallel extraction device described above.
According to the storage area distribution information of the data units, the embodiments divide the storage area of the database table to be extracted into a plurality of storage sub-areas, and each data extraction subprocess is responsible for extracting the data in one storage sub-area. Data units in adjacent or nearby physical storage areas are thus assigned to the same extraction subprocess as far as possible, so that during extraction the pointer of each subprocess moves only within a relatively small storage area. This reduces the pointer's range of movement and improves extraction performance, especially for large databases holding more than gigabytes of data.
Description of drawings
Fig. 1 is a schematic diagram of the physical storage layout of an existing database;
Fig. 2 is a schematic flowchart of the data parallel extraction method provided by an embodiment of the invention;
Fig. 3 is a schematic flowchart of a specific embodiment of the data parallel extraction method provided by an embodiment of the invention;
Fig. 4 is a schematic structural diagram of the data parallel extraction device provided by an embodiment of the invention;
Fig. 5 is a schematic structural diagram of the database system provided by an embodiment of the invention.
Detailed description of the embodiments
Fig. 1 shows the physical storage layout of a database. A database generally contains a plurality of database tables. Each table has unique identification information, consisting of a table name and a user name, that uniquely identifies it; a user accesses a table, and extracts its data, by table name and user name. The data of a database are stored in partitions of a storage medium, for example the partitions of a disk, and are managed in a manner similar to data files and data blocks: each database contains at least one data file, each data file contains at least one data block, and each data block contains at least one data unit. The data unit is the smallest unit of data managed by the database; depending on the database management model, a data unit may be a record or a field.
All data units of a database are stored and managed in a unified way. As a result, the data units of one database table may belong to different data blocks, the data blocks may belong to different data files, and the data files may in turn be stored in different partitions.
In database storage management, a partition is usually represented by a data object identifier, a data file by a data file identifier, and a data block by a data block identifier. The data unit identifier of a data unit is generated from the data object identifier of the partition, the data file identifier of the data file and the data block identifier of the data block in which the unit resides, and each data unit identifier maps uniquely to the physical storage location of its data unit. The way a data unit identifier is generated from this information may differ between database management systems, but in every case the information reflects the physical storage location of the data units; it is therefore referred to collectively as the storage area distribution information of the database table.
Because the data units of one database table may belong to several partitions, data files and data blocks, their data unit identifiers are not contiguous, and neither are the physical storage locations they map to. However, given the way a database stores data, the data units of a table that belong to the same partition, the same data file or the same data block are stored adjacent or close to one another. Therefore, if the storage area distribution information of the table is used to assign adjacent or nearby data units to the same data extraction subprocess, the pointer of each subprocess moves only within a relatively small storage area during extraction. This reduces the pointer's range of movement and improves extraction performance, especially for large databases holding more than gigabytes of data.
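The effect of locality on pointer movement can be illustrated with a toy model. The sketch below is not part of the patent: it assumes that block addresses are plain integers, that the cost of reading is the sum of the jumps between consecutively read addresses, and that the sample rows and the `age` field are hypothetical.

```python
def pointer_travel(addresses):
    """Total distance moved when blocks are read in the given order."""
    return sum(abs(b - a) for a, b in zip(addresses, addresses[1:]))


def split_by_field(rows):
    """Field-value splitting: one group of block addresses per age decade."""
    groups = {}
    for address, age in rows:
        groups.setdefault(age // 10, []).append(address)
    return list(groups.values())


def split_by_location(rows, n):
    """Location-based splitting: n contiguous runs of block addresses."""
    addresses = sorted(address for address, _ in rows)
    size = -(-len(addresses) // n)  # ceiling division
    return [addresses[i:i + size] for i in range(0, len(addresses), size)]


# Hypothetical (block_address, age) rows; ages are scattered across the disk.
ROWS = [(0, 35), (1, 7), (2, 22), (3, 38), (4, 3), (5, 27),
        (6, 31), (7, 9), (8, 25), (9, 33), (10, 5), (11, 21)]

travel_by_field = [pointer_travel(g) for g in split_by_field(ROWS)]
travel_by_location = [pointer_travel(g) for g in split_by_location(ROWS, 3)]
```

In this toy data each field-based group travels three times as far as each location-based group, which is the kind of difference the analysis above relies on; real storage devices and database engines behave in more complex ways.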
As shown in Fig. 2, based on this analysis of how a database manages its storage, an embodiment of the invention provides a data parallel extraction method. The method uses the storage area distribution information of the database to divide the storage area of the database table into a plurality of storage sub-areas, and each data extraction subprocess is then responsible for extracting the data units in one of the storage sub-areas. The method comprises the following steps:
S201: obtain the storage area distribution information of the database to be extracted according to the identification information of the database to be extracted;
The storage area distribution information of each database table is kept in the storage management file of the database, which may be called the database dictionary. The data extraction process can obtain all storage area distribution information of the database to be extracted by querying the storage management file with the database identification information.
S202: divide the storage area of the database into a set number of storage sub-areas according to the storage area distribution information;
S203: create the set number of data extraction subprocesses; the subprocesses run in parallel, each extracting the data of the database table to be extracted from one of the storage sub-areas.
In step S202, the storage area of the database can be divided into the set number of storage sub-areas in many ways. Depending on the way chosen, each data extraction subprocess in step S203 extracts data according to a corresponding data splitting constraint, for example:
1. Splitting by data object identifier
The storage sub-areas are formed as follows: the partitions are divided, in order of their data object identifiers, into the set number of groups, and each group forms one storage sub-area; each storage sub-area contains at least one partition;
Accordingly, each data extraction subprocess may use the data object identifiers of the partitions in its storage sub-area directly as its data splitting constraint, pass the constraint to the database, and extract from the database the data of the table to be extracted that are stored in that storage sub-area; or
the extraction process first generates, from the data object identifiers of the partitions in each storage sub-area, the overall maximum and overall minimum data unit identifiers of that storage sub-area (the extreme values over the whole sub-area). Each data extraction subprocess then uses the overall maximum and overall minimum of its storage sub-area as its data splitting constraint, passes the constraint to the database, and extracts the data of the table to be extracted from the database.
2. Splitting by data file identifier
The storage sub-areas are formed as follows: the data files are divided, in order of their data file identifiers, into the set number of groups, and each group forms one storage sub-area; each storage sub-area holds at least one data file;
Accordingly, each data extraction subprocess uses the data object identifiers of the partitions in its storage sub-area, together with the associated data file identifiers, directly as its data splitting constraint, passes the constraint to the database, and extracts from the database the data of the table to be extracted that are stored in that storage sub-area; or
the extraction process first generates, from the data object identifiers of the partitions in each storage sub-area and the data file identifiers associated with each data object identifier, the maximum and minimum data unit identifiers within each data file. It then determines, from all the maximum and minimum data unit identifiers of each storage sub-area, the overall maximum and overall minimum data unit identifiers of that sub-area. Each data extraction subprocess then uses the overall maximum and overall minimum of its storage sub-area as its data splitting constraint, passes the constraint to the database, and extracts the data of the table to be extracted from the database.
3. Splitting by data block identifier
The storage sub-areas are formed as follows: the data blocks are divided, in order of their data block identifiers, into the set number of groups, and each group forms one storage sub-area; each storage sub-area holds at least one data block;
Accordingly, each data extraction subprocess may use the data object identifiers of the partitions in its storage sub-area, the data file identifiers associated with each data object identifier and the data block identifiers associated with each data file identifier directly as its data splitting constraint, pass the constraint to the database, and extract from the database the data of the table to be extracted that are stored in that storage sub-area; or
the data extraction process first generates, from the data object identifiers of the partitions in each storage sub-area, the data file identifiers associated with each data object identifier and the data block identifiers associated with each data file identifier, the maximum and minimum data unit identifiers of each data block. It then determines, from all the maximum and minimum data unit identifiers of each storage sub-area, the overall maximum and overall minimum data unit identifiers of that sub-area. Each data extraction subprocess then uses the overall maximum and overall minimum of its storage sub-area as its data splitting constraint, passes the constraint to the database, and extracts the data of the table to be extracted from the database.
In each of the above embodiments, the identifiers are generally numbered sequentially, so grouping them in order naturally places data units whose storage locations are adjacent or close in the same storage sub-area.
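The grouping step common to all three splitting modes can be sketched as follows. This is a minimal illustration under assumptions, not the patent's implementation: identifiers are modeled as integers, the helper name `group_in_order` is hypothetical, and the groups are made as even in size as possible, which the text does not require.

```python
def group_in_order(identifiers, n):
    """Split identifiers, in ascending order, into at most n contiguous groups.

    Every group is non-empty, so each resulting storage sub-area contains
    at least one partition, data file or data block.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    ordered = sorted(identifiers)
    if not ordered:
        return []
    n = min(n, len(ordered))              # never create empty groups
    base, extra = divmod(len(ordered), n)
    groups, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        groups.append(ordered[start:start + size])
        start += size
    return groups


# Seven hypothetical data file identifiers split into three sub-areas.
file_groups = group_in_order([7, 3, 5, 4, 9, 8, 6], 3)
```

Because each group is a contiguous run of the sorted identifiers, neighbouring identifiers end up in the same group, which is what keeps nearby storage locations together.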
In each of the above embodiments, a data extraction subprocess may use the identifiers associated with its storage sub-area directly as its data splitting constraint, or may use the identifiers of the data units stored there. If data unit identifiers are used, the corresponding overall maximum and overall minimum data unit identifiers must be determined according to the rule by which data unit identifiers are generated.
Data unit identifiers follow a definite generation rule. For example, a data unit identifier may have the form:
data object identifier + data file identifier + data block identifier + data unit sequence number
Thus, when the overall maximum and overall minimum data unit identifiers are generated from a data object identifier, every digit position occupied by "data file identifier + data block identifier + data unit sequence number" is set to 9 for the overall maximum and to 0 for the overall minimum;
When they are generated from "data object identifier + data file identifier", every digit position occupied by "data block identifier + data unit sequence number" is set to 9 for the overall maximum and to 0 for the overall minimum;
Other ways of generating the data unit identifiers are similar and are not described one by one.
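The padding rule can be sketched in a few lines. The sketch assumes that identifiers are fixed-width strings of decimal digits; the helper name `bounds_from_prefix` and the widths used in the example are hypothetical.

```python
def bounds_from_prefix(prefix, total_digits):
    """Return (minimum, maximum) identifiers sharing the given prefix.

    The positions after the prefix are filled with 0 for the minimum
    and with 9 for the maximum.
    """
    pad = total_digits - len(prefix)
    if pad < 0:
        raise ValueError("prefix is longer than the identifier")
    return prefix + "0" * pad, prefix + "9" * pad


# Known part: a 4-digit data object identifier plus a 2-digit data file
# identifier, in a hypothetical 12-digit data unit identifier.
lower, upper = bounds_from_prefix("0012" + "03", 12)
```

With fixed-width digit strings, ordinary string comparison orders identifiers the same way as numeric comparison, so the two bounds can be used directly in a range test.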
Preferably, the storage sub-areas can be split by data file, with the overall maximum and overall minimum data unit identifiers used as the data splitting constraint. In that case the maximum data unit identifier of each data file is generated from the maximum data block identifier associated with that data file identifier, and the minimum data unit identifier from the minimum data block identifier. When the storage distribution information is obtained, only the maximum and minimum data block identifiers of each data file identifier need to be retrieved. In this mode, the data files may first be grouped in order of their data file identifiers and the maximum and minimum data block identifiers of each then generated; alternatively, the pairs of maximum and minimum data block identifiers may first be sorted in order of data file identifier and then grouped in order of maximum data block identifier, with the overall maximum and overall minimum data block identifiers in each group representing one storage sub-area. The data parallel extraction method provided by the embodiments is described in detail below, taking this preferred mode as an example.
Fig. 3 is a schematic flowchart of a data parallel extraction method provided by an embodiment of the invention. The method comprises:
Step S301: obtain each data object identifier of the database table to be extracted, each data file identifier associated with each data object identifier, and the maximum and minimum data block identifiers associated with each data file identifier;
The data file identifiers of each database table and the associated maximum and minimum data block identifiers are stored in the database management system, for example in the database dictionary. The data extraction process can query the data dictionary by table identification information and user name for the data file identifiers of the table to be extracted and the associated maximum and minimum data block identifiers.
Step S302: in order of data file identifier, generate the maximum and minimum data unit identifiers of each data file in each storage sub-area;
Once the data object identifier, the data file identifier and the maximum and minimum data block identifiers have been obtained, the maximum and minimum data unit identifiers of each data file can be generated according to the specific format of the data unit identifier.
For example, if the data unit identifier has the form "data object identifier + data file identifier + data block identifier + data unit sequence number" and the sequence number has 3 digits, the maximum data unit identifier is:
data object identifier + data file identifier + maximum data block identifier + 999
and the minimum data unit identifier is:
data object identifier + data file identifier + minimum data block identifier + 000
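As a worked instance of this example, the sketch below concatenates the parts for one data file. It assumes every part is a digit string already zero-padded to its fixed width; the function name `file_unit_bounds` and the sample values are hypothetical.

```python
def file_unit_bounds(object_id, file_id, min_block, max_block, seq_digits=3):
    """Return (minimum, maximum) data unit identifiers for one data file.

    The minimum uses the smallest block identifier followed by all zeros,
    the maximum uses the largest block identifier followed by all nines.
    """
    prefix = object_id + file_id
    minimum = prefix + min_block + "0" * seq_digits
    maximum = prefix + max_block + "9" * seq_digits
    return minimum, maximum


# Hypothetical data file 007 of data object 0001234, blocks 00010 to 00250.
example = file_unit_bounds("0001234", "007", "00010", "00250")
```

Only the smallest and largest block identifiers of the file are needed, which is why step S301 retrieves just those two values per data file.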
Step S303: divide the pairs of maximum and minimum data unit identifiers into the set number of groups; each group forms one storage sub-area, and each storage sub-area holds at least one data file;
Step S304: for each storage sub-area, determine the overall maximum and overall minimum data unit identifiers from all the maximum and minimum data unit identifiers of that storage sub-area;
The overall maximum and overall minimum data unit identifiers of a storage sub-area can be set as the parameters of one data splitting constraint. One data splitting constraint in effect defines a set of data unit identifiers: every identifier in the set is less than or equal to the overall maximum and greater than or equal to the overall minimum. Some of the data unit identifiers of the table to be extracted fall within this set; the case in which the table occupies every data unit identifier within the range delimited by the overall maximum and overall minimum is of course not excluded.
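Steps S303 and S304 together can be sketched as one function that groups the per-file pairs and reduces each group to a single constraint. The sketch assumes fixed-width digit strings, so that string comparison matches identifier order; the name `subarea_constraints` and the even-sized grouping are illustrative choices, not taken from the patent.

```python
def subarea_constraints(file_bounds, n):
    """Group per-file (min, max) pairs into at most n sub-areas, in order,
    and return one (overall_min, overall_max) pair per sub-area."""
    if n < 1:
        raise ValueError("n must be at least 1")
    ordered = sorted(file_bounds)
    if not ordered:
        return []
    n = min(n, len(ordered))              # at least one data file per group
    base, extra = divmod(len(ordered), n)
    constraints, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        group = ordered[start:start + size]
        start += size
        overall_min = min(lo for lo, _ in group)
        overall_max = max(hi for _, hi in group)
        constraints.append((overall_min, overall_max))
    return constraints


# Five hypothetical data files reduced to two splitting constraints.
FILE_BOUNDS = [("100000", "100999"), ("200000", "205999"),
               ("300000", "300999"), ("400000", "402999"),
               ("500000", "501999")]
constraints = subarea_constraints(FILE_BOUNDS, 2)
```

Each resulting pair covers a closed range of identifiers, and the ranges of different sub-areas do not overlap as long as the per-file ranges do not overlap.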
Step S305: create as many data extraction subprocesses as there are storage sub-areas, and assign to each data extraction subprocess the data splitting constraint of one storage sub-area;
Step S306: each data extraction subprocess, according to its own data splitting constraint, extracts the data of the table to be extracted from the database in parallel with the others;
The identifier of every data unit extracted by a data extraction subprocess is less than or equal to the overall maximum data unit identifier of the corresponding storage sub-area and greater than or equal to its overall minimum;
The overall maximum and overall minimum of one data splitting constraint delimit all data unit identifiers that lie between them, inclusive. The data extraction subprocess calls a query function and passes the overall maximum and overall minimum to the database as query parameters. The query interface of the database management system finds the data units whose identifiers belong to the set so defined and returns the query result to the data extraction subprocess.
In effect, once the storage sub-areas have been split by data file, data in adjacent or nearby physical storage areas are assigned to the same extraction subprocess as far as possible. During a query, the pointer of the database management system therefore moves only within the storage area of each data file, which reduces the pointer's range of movement and improves extraction performance.
After all data of the table to be extracted have been extracted, they can be collected and merged to obtain a complete database table.
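The extract-then-merge flow of steps S305 and S306 can be sketched against an in-memory stand-in for the database. Everything here is an assumption made for illustration: the `TABLE` dictionary replaces the real query interface, worker threads stand in for the data extraction subprocesses, and the identifiers are hypothetical 4-digit strings.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "table": data unit identifier -> record.
TABLE = {f"{i:04d}": f"record-{i}" for i in range(20)}


def extract(constraint):
    """Return the records whose identifier lies within [lower, upper]."""
    lower, upper = constraint
    return [(uid, rec) for uid, rec in sorted(TABLE.items())
            if lower <= uid <= upper]


def parallel_extract(constraints):
    """Run one worker per constraint and merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(constraints)) as pool:
        parts = list(pool.map(extract, constraints))  # keeps constraint order
    return [row for part in parts for row in part]


CONSTRAINTS = [("0000", "0006"), ("0007", "0013"), ("0014", "0019")]
RESULT = parallel_extract(CONSTRAINTS)
```

Because the constraints are closed, non-overlapping ranges that together cover every identifier, the merged result contains each record exactly once.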
Those skilled in the art will appreciate that all or some of the steps of the data parallel extraction method and of the method of obtaining the database table provided by the embodiments can be carried out by hardware under the control of program instructions. The program may be stored in a readable and writable storage medium such as a random access memory, a magnetic disk or an optical disc. When run by a computer, the program can extract the data of a database table in parallel and further merge the extracted data into one database table.
The method is described in further detail below using an Oracle database as an example. Implementations for other databases are entirely analogous, and those skilled in the art can readily implement them by reference to this example.
A data unit in an Oracle database is called a record. Every record has a default row identifier (ROWID) field, which uniquely identifies the record. Each ROWID maps to one piece of physical storage information, and each piece of physical storage information points to one physical storage address.
A ROWID has 18 characters in total, encoded as follows:
7-character data object identifier + 3-character data file identifier + 5-character data block identifier + 3-character record row number
Suppose the user specifies that n subprocesses are to extract the data of table A in parallel, and that table A belongs to user B. The parallel extraction proceeds as follows:
1. Using the name of table A (the database table identifier) and the name of user B (the user identification information), query the data dictionary view DBA_OBJECTS for all data object identifiers (DATA_OBJECT_ID) of table A;
2. Using the name of table A and the name of user B, query the data dictionary view DBA_EXTENTS for all data file identifiers (FILE_ID) of table A and the maximum and minimum data block identifiers (BLOCK_ID) of each FILE_ID, sorted by FILE_ID in ascending order. The query statement is as follows:
SELECT FILE_ID, MAX(BLOCK_ID), MIN(BLOCK_ID)
FROM DBA_EXTENTS
WHERE OWNER = 'B'
AND SEGMENT_NAME = 'A'
GROUP BY FILE_ID
ORDER BY FILE_ID ASC
3. With DATA_OBJECT_ID, FILE_ID and the minimum and maximum BLOCK_ID of the data file as input parameters, call the DBMS_ROWID.ROWID_CREATE() function repeatedly to generate the maximum ROWID and minimum ROWID for each combination of DATA_OBJECT_ID and FILE_ID;
The prototype of the DBMS_ROWID.ROWID_CREATE() function is as follows:
DBMS_ROWID.ROWID_CREATE (
    ROWID_TYPE,      // ROWID type
    OBJECT_NUMBER,   // data object identifier
    RELATIVE_FNO,    // data file identifier
    BLOCK_NUMBER,    // data block identifier
    ROW_NUMBER       // record row number
);
To obtain the maximum ROWID, the BLOCK_NUMBER parameter is the maximum BLOCK_ID of the data file and ROW_NUMBER is 999;
To obtain the minimum ROWID, the BLOCK_NUMBER parameter is the minimum BLOCK_ID of the data file and ROW_NUMBER is 000;
If step 1 returns x DATA_OBJECT_IDs and step 2 returns y FILE_IDs, then z = x*y pairs of maximum and minimum ROWID values are computed in total;
4. According to the number n of parallel extraction processes set by the user, divide these pairs of maximum and minimum ROWID values into n groups, and take the overall maximum ROWID and overall minimum ROWID of each group as a data splitting constraint;
5. Create n data extraction subprocesses and assign to each one overall maximum ROWID and overall minimum ROWID as its data splitting constraint;
6. Start all data extraction subprocesses; each extracts from the database the data records for which it is responsible, within the range delimited by its overall maximum ROWID and overall minimum ROWID;
The data extraction statement is as follows:
SELECT * FROM A
WHERE ROWID >= overall minimum ROWID
AND ROWID <= overall maximum ROWID
7. Collect and merge the records extracted by all data extraction subprocesses to obtain the complete data of table A.
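Steps 5 and 6 amount to handing each subprocess one range-restricted statement. The sketch below only builds the statement text and its bind values; it does not connect to a database. The bind-variable names, the use of `CHARTOROWID` to convert string bounds, the function name `build_extract_statements` and the placeholder ROWID strings are all assumptions made for illustration.

```python
import re


def build_extract_statements(table_name, rowid_ranges):
    """Return one (sql, bind_params) pair per (min_rowid, max_rowid) range."""
    # Accept only a plain identifier, since the table name is concatenated
    # into the statement text rather than bound.
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_$#]*", table_name):
        raise ValueError("unexpected table name: %r" % table_name)
    sql = ("SELECT * FROM " + table_name +
           " WHERE ROWID >= CHARTOROWID(:min_rowid)"
           " AND ROWID <= CHARTOROWID(:max_rowid)")
    return [(sql, {"min_rowid": lo, "max_rowid": hi})
            for lo, hi in rowid_ranges]


# Placeholder bounds; real values would come from DBMS_ROWID.ROWID_CREATE.
RANGES = [("MIN_ROWID_1", "MAX_ROWID_1"), ("MIN_ROWID_2", "MAX_ROWID_2")]
statements = build_extract_statements("A", RANGES)
```

Passing the bounds as bind values rather than pasting them into the statement lets every subprocess share the same statement text and differ only in its parameters.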
According to the disclosed content of the embodiment of the invention, those skilled in the art can be used for the data pick-up of other types of database easily, are example here with the oracle database, do not limit protection scope of the present invention.
As shown in Figure 4, an embodiment of the present invention further provides a parallel data extraction apparatus, which mainly comprises:
An acquiring unit 401, configured to obtain storage area distribution information of the database to be extracted according to identification information of the database to be extracted;
A dividing unit 402, configured to divide the storage area of the database into a set number of storage subareas according to the storage area distribution information;
An extracting unit 403, configured to create the set number of data extraction subprocesses, each data extraction subprocess extracting, in parallel, data of the database table to be extracted from one of the storage subareas.
As shown in Figure 5, an embodiment of the present invention further provides a database system comprising a database 51 and a parallel data extraction apparatus 52. The structure of the parallel data extraction apparatus 52 is as shown in Figure 4 and comprises an acquiring unit 401, a dividing unit 402 and an extracting unit 403, which are used to extract data of the database table to be extracted from the database.
For the implementation details of the dividing unit 402 and the extracting unit 403 of the parallel data extraction apparatus, refer to the parallel data extraction method provided by the embodiments of the present invention; they are not repeated here.
The parallel data extraction apparatus provided by the embodiments of the present invention, when applied in a database management system, can improve parallel data extraction performance.
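The cooperation of the three units can be sketched in Python as follows. This is only an illustration under stated assumptions, not the apparatus of the embodiment: SQLite's rowid stands in for the Oracle ROWID, worker threads stand in for the data extraction subprocesses, the table is assumed to be non-empty, and the function names acquire_range, split_range, extract_range and parallel_extract are hypothetical.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from contextlib import closing


def acquire_range(db_path, table):
    """Acquiring unit (stand-in): smallest and largest rowid of the table."""
    with closing(sqlite3.connect(db_path)) as conn:
        return conn.execute(
            f"SELECT MIN(rowid), MAX(rowid) FROM {table}").fetchone()


def split_range(lo, hi, n):
    """Dividing unit: cut [lo, hi] into at most n contiguous sub-ranges."""
    step = -(-(hi - lo + 1) // n)  # ceiling division
    return [(s, min(s + step - 1, hi)) for s in range(lo, hi + 1, step)]


def extract_range(db_path, table, bounds):
    """One extraction worker: read only the rows inside its own range."""
    lo, hi = bounds
    with closing(sqlite3.connect(db_path)) as conn:  # one connection per worker
        return conn.execute(
            f"SELECT * FROM {table} WHERE rowid >= ? AND rowid <= ? "
            "ORDER BY rowid", (lo, hi)).fetchall()


def parallel_extract(db_path, table, n):
    """Extracting unit: run the workers in parallel and merge their results."""
    lo, hi = acquire_range(db_path, table)
    ranges = split_range(lo, hi, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        parts = pool.map(lambda r: extract_range(db_path, table, r), ranges)
        return [row for part in parts for row in part]
```

Each worker opens its own connection, so no worker reads outside its assigned range, and the merge step simply concatenates the per-range results in range order.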
In summary, the technical solution provided by the embodiments of the present invention groups the data of the database table to be extracted according to data files, so that data in adjacent or nearby physical storage areas is assigned to the same extraction subprocess as far as possible. During extraction, the movement range of each subprocess's pointer is thereby confined to the storage area of a data file, which reduces pointer movement and improves extraction performance, especially for databases with large data volumes.
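The effect of grouping by physical adjacency can be made concrete with a small Python sketch. It is an illustration under assumptions, not a measurement: block numbers are hypothetical, the span between the lowest and highest block a worker touches is used as a crude proxy for pointer movement, and the round-robin assignment is introduced here only for contrast.

```python
def contiguous_groups(blocks, n):
    """Assign physically adjacent blocks to the same worker."""
    ordered = sorted(blocks)
    size = -(-len(ordered) // n)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def round_robin_groups(blocks, n):
    """Contrast case: deal blocks out to the workers one at a time."""
    ordered = sorted(blocks)
    return [ordered[i::n] for i in range(n)]


def span(group):
    """Distance between the lowest and highest block a worker touches."""
    return max(group) - min(group) if group else 0


blocks = range(100)  # hypothetical block numbers 0..99
widest_contiguous = max(span(g) for g in contiguous_groups(blocks, 4))
widest_round_robin = max(span(g) for g in round_robin_groups(blocks, 4))
```

For 100 blocks and 4 workers, each contiguous group covers 25 neighbouring blocks, whereas every round-robin group stretches across almost the whole storage area.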
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.

Claims (18)

1. A parallel data extraction method, characterized by comprising:
Obtaining storage area distribution information of a database to be extracted according to identification information of the database to be extracted;
Dividing the storage area of the database into a set number of storage subareas according to the storage area distribution information;
Creating the set number of data extraction subprocesses, each data extraction subprocess extracting, in parallel, data of the database table to be extracted from one of the storage subareas.
2. the method for claim 1 is characterized in that:
Described storage area distributed intelligence comprises: store described each data object tag for the treatment of extracted data storehouse table data; And
The dividing method of described storage subregion is: according to the size order of data object tag, each subregion is divided into the grouping of setting number, each is grouped into a storage subregion, comprises a subregion at least in each storage subregion.
3. method as claimed in claim 2, it is characterized in that, each data pick-up subprocess extracts the method for the treatment of extracted data storehouse table data: each partition data object identity that each data pick-up subprocess comprises with the corresponding stored subregion respectively is that data are cut apart constraint condition, extracts the extracted data storehouse for the treatment of that is stored in this storage subregion and shows data.
4. method as claimed in claim 2 is characterized in that, each data pick-up subprocess extracts the method for the treatment of extracted data storehouse table data and comprises:
According to each partition data object identity in each storage subregion, generate the data cell sign maximum value and the minimal value of respectively storing the subregion correspondence respectively;
Each data pick-up subprocess is that data are cut apart constraint condition with the data cell sign maximum value and the minimal value of corresponding stored subregion respectively, extracts the data for the treatment of extracted data storehouse table.
5. the method for claim 1 is characterized in that:
Described storage area distributed intelligence comprises: store described each data object tag of extracted data storehouse table data and the data file sign of each data object tag correspondence treated; And
The dividing method of described storage subregion is: according to the size order of data file sign, each data file is divided into the grouping of setting number, each is grouped into a storage subregion, stores a data file at least in each storage subregion.
6. method as claimed in claim 5, it is characterized in that, each data pick-up subprocess is designated data with the data file of each partition data object identity, each data object tag correspondence in the corresponding stored subregion respectively and cuts apart constraint condition, extracts the extracted data storehouse for the treatment of that is stored in this storage subregion and shows data.
7. method as claimed in claim 5 is characterized in that, each data pick-up subprocess extracts the method for the treatment of extracted data storehouse table data and comprises:
According to the data object tag of each subregion in each storage subregion, the data file sign of each data object tag correspondence, generate data cell sign maximal value and minimum value in each data file respectively;
According to all data cell sign maximal value and minimum value of each storage subregion correspondence, determine respectively to store the data cell sign maximum value and the minimal value of subregion correspondence respectively;
Each data pick-up subprocess is that data are cut apart constraint condition with the data cell sign maximum value and the minimal value of storage subregion correspondence respectively, extracts the data for the treatment of extracted data storehouse table.
8. the method for claim 1 is characterized in that:
Described storage area distributed intelligence comprises: stores described each data object tag of extracted data storehouse table data, the data file of each data object tag correspondence treated and identifies, and each corresponding data block sign of each data file sign; And
The dividing method of described storage subregion is: the size order according to described data block sign, each data block is divided into the grouping of setting number, and each is grouped into a storage subregion, stores a data block at least in each storage subregion.
9. method as claimed in claim 8, it is characterized in that, each data pick-up subprocess is designated data with the data file sign of the data object tag of each subregion in the corresponding stored subregion, each data object tag correspondence and the corresponding data block of each data file sign respectively and cuts apart constraint condition, extracts the extracted data storehouse for the treatment of that is stored in this storage subregion and shows data.
10. method as claimed in claim 8 is characterized in that, the data pick-up subprocess extracts the method for the treatment of extracted data storehouse table data and comprises:
According to the data object tag of each subregion in each storage subregion, the data file sign and the corresponding data block sign of each data file sign of each data object tag correspondence, generate data cell sign maximal value and minimum value in each data block respectively;
According to all data cell sign maximal value and minimum value of each storage subregion correspondence, determine respectively to store the data cell sign maximum value and the minimal value of subregion correspondence respectively;
Each data pick-up subprocess is that data are cut apart constraint condition with the data cell sign maximum value and the minimal value of corresponding stored subregion correspondence respectively, extracts the data for the treatment of extracted data storehouse table.
11. the method for claim 1 is characterized in that:
Described storage area distributed intelligence comprises: stores described each data object tag of extracted data storehouse table data, the data file of each data object tag correspondence treated and identifies, and corresponding data block sign maximal value and the minimum value of each data file sign.
12. method as claimed in claim 11 is characterized in that:
The dividing method of described storage subregion is: the size order according to described data file sign, each data file is divided into the grouping of setting number, and each is grouped into a storage subregion, stores a data file at least in each storage subregion; And
Described data pick-up subprocess extracts the method for the treatment of extracted data storehouse table data and comprises:
Respectively according to the data object tag of each subregion in each storage subregion, the data file sign and the corresponding data block sign maximal value and the minimum value of each data file sign of each data object tag correspondence, corresponding data cell sign maximal value and the minimum value that generates in each data file;
According to all data cell sign maximal value and minimum value of each storage subregion correspondence, determine respectively to store the data cell sign maximum value and the minimal value of subregion correspondence respectively;
Each data pick-up subprocess is that data are cut apart constraint condition with the data cell sign maximum value and the minimal value of corresponding stored subregion respectively, extracts the data for the treatment of extracted data storehouse table.
13. method as claimed in claim 11 is characterized in that:
Describedly according to the storage area distributed intelligence database storing zone is divided into the method for setting number storage subregion and comprises:
Respectively according to the data object tag of each subregion in each storage subregion, the data file sign and the corresponding data block sign maximal value and the minimum value of each data file sign of each data object tag correspondence, corresponding data cell sign maximal value and the minimum value that generates in each data file;
Identify peaked size order according to described data file sign or data cell, each is divided into the grouping of setting number to data unit marks maximal value and minimum value, and determine data cell sign maximum value and minimal value in each grouping, wherein: each is grouped into a storage subregion; And
Described data pick-up subprocess extracts the method for the treatment of extracted data storehouse table data and comprises:
Each data pick-up subprocess is that data are cut apart constraint condition with the data cell sign maximum value and the minimal value of corresponding stored subregion respectively, extracts the data for the treatment of extracted data storehouse table.
14. The method of claim 4, 7, 10, 12 or 13, characterized in that the data of the database table to be extracted is a record or a field of the database table.
15. The method of claim 1, characterized in that the storage area distribution information is obtained from a database storage management file according to the database identification information.
16. The method of claim 15, characterized in that the database identification information comprises the table name and the user name of the database table.
17. A parallel data extraction apparatus, characterized by comprising:
An acquiring unit, configured to obtain storage area distribution information of a database to be extracted according to identification information of the database to be extracted;
A dividing unit, configured to divide the storage area of the database into a set number of storage subareas according to the storage area distribution information;
An extracting unit, configured to create and start in parallel the set number of data extraction subprocesses, each data extraction subprocess extracting, in parallel, data of the database table to be extracted from one of the storage subareas.
18. A database system comprising a database and a parallel data extraction apparatus, characterized in that the parallel data extraction apparatus comprises:
An acquiring unit, configured to obtain storage area distribution information of a database to be extracted according to identification information of the database to be extracted;
A dividing unit, configured to divide the storage area of the database into a set number of storage subareas according to the storage area distribution information;
An extracting unit, configured to create and start in parallel the set number of data extraction subprocesses, each data extraction subprocess extracting from the database, in parallel, data of the database table to be extracted from one of the storage subareas.
CN2007101233422A 2007-06-20 2007-06-20 Data paralleling abstracting method and apparatus and database system Active CN101329676B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2007101233422A CN101329676B (en) 2007-06-20 2007-06-20 Data paralleling abstracting method and apparatus and database system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2007101233422A CN101329676B (en) 2007-06-20 2007-06-20 Data paralleling abstracting method and apparatus and database system

Publications (2)

Publication Number Publication Date
CN101329676A CN101329676A (en) 2008-12-24
CN101329676B true CN101329676B (en) 2010-04-14

Family

ID=40205489

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2007101233422A Active CN101329676B (en) 2007-06-20 2007-06-20 Data paralleling abstracting method and apparatus and database system

Country Status (1)

Country Link
CN (1) CN101329676B (en)

Also Published As

Publication number Publication date
CN101329676A (en) 2008-12-24

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant