CN102999519B - Read-write method and system for database - Google Patents

Info

Publication number
CN102999519B
Authority
CN
China
Prior art keywords
data
write
node
replica node
reading
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201110273321.5A
Other languages
Chinese (zh)
Other versions
CN102999519A (en)
Inventor
邓明
潘佳伟
邢钦华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANGHAI SHENG PAY THROUGH E-COMMERCE CO LTD
Original Assignee
SHANGHAI SHENG PAY THROUGH E-COMMERCE CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHANGHAI SHENG PAY THROUGH E-COMMERCE CO LTD
Priority to CN201110273321.5A
Publication of CN102999519A
Application granted
Publication of CN102999519B
Legal status: Expired - Fee Related
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention discloses a read-write method and system for a database. The method horizontally partitions record data by primary key into multiple data segments, and each data segment is stored as one write copy and several corresponding read copies. The write copy uses row storage, which optimizes the write performance of the database. The read copies use column storage, and the data in each read copy is organized differently, which optimizes the read performance of the database. The method also builds a global index and local indexes, so that the position to operate on can be located quickly when data is written or read. With the method or system of the embodiment, data can be both written quickly and read quickly.

Description

A read-write method and system for a database
Technical Field
The present invention relates to the field of computer data processing, and more particularly to a read-write method and system for a database.
Background
With the development of society and technology, computers are increasingly widely used in daily life and work. Users now rely on computers to store and process large amounts of data, and a database system is an application system built for exactly this purpose: it is a core mechanism of data processing that developed in response to data-processing needs. In practice, users need to be able to store data in a database, query it, and perform other access operations conveniently.
A distributed database system is one kind of database system. It developed from the centralized database system and is the product of combining computer technology with network technology. It comprises clients, metadata nodes, and data storage nodes. The client is deployed on a business application server, and the business application sends data access requests to the distributed storage cluster through the client. The metadata node stores metadata, and the data storage nodes store data blocks.
In prior-art database read-write methods, the database being read and written uses one of two storage formats. The first is row storage. A row-oriented database can speed up reads and writes only by relying on external indexes, and maintaining those indexes consumes a large amount of time and space, so reading and writing such a database is costly in both. The second is column storage. Column storage is self-indexing and compresses well, so it does not consume large amounts of time and space, but writing a single row requires multiple disk operations, so write performance is poor.
Therefore, given today's very large data volumes, how to provide a new database read-write method that achieves both fast writes and fast reads is a technical problem that the prior art urgently needs to solve.
Summary of the Invention
In view of this, the present invention provides a read-write method and system for a database, to overcome the poor write performance or poor read performance that results in the prior art from using a single storage format.
To achieve the above object, the present invention provides the following technical solution:
A read-write method for a database, in which a metadata node horizontally partitions record data by primary key into multiple data segments, and each data segment is saved as one write copy and multiple corresponding read copies, wherein the write copy is stored in row storage form, the read copies are stored in column storage form, and the data in each read copy is organized in a different way, the method comprising:
receiving an access request initiated by a client;
when the access request is a data write operation:
determining, according to the global index stored in the metadata node and the primary key in the access request, the data interval into which the data must be written and the write replica node corresponding to that data interval, the global index indicating the correspondence between primary keys, data intervals, and the write replica nodes corresponding to the data intervals;
initiating an operation request to the write replica node to which the data must be written, whereupon the write replica node appends the update data to its increment block, the increment block being a disk file that records update data, and the update data being a set of data to be written containing up to a preset number of records;
when the access request is a data read operation:
judging whether the access request contains a primary key; if it does, determining, according to the global index stored in the metadata node and the primary key, the data interval in which the data to be read is located and the read replica nodes corresponding to that data interval, and, if there are other filter conditions, having the read replica nodes determine the local index that best matches the other filter conditions, or, if there are no other filter conditions, selecting a local index arbitrarily, the local index indicating the correspondence between keys and memory blocks, a memory block being the smallest unit of data storage;
if it does not, sending the access request to all data intervals under the current metadata node, and, if there are other filter conditions, having the read replica nodes corresponding to all the data intervals determine the local index that best matches the other filter conditions, or, if there are no other filter conditions, selecting a local index arbitrarily;
determining the memory blocks in which the data to be read may be located and the read replica nodes corresponding to those memory blocks, and sending the access request to each such read replica node;
the read replica node judging whether update data for the data to be read exists in its increment block; if so, reading the data to be read from the increment block; if not, reading the data to be read from the memory block indicated by the local index, the increment block being a disk file that records update data.
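The final step of the read path, checking the increment block before falling back to the memory block named by the local index, can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the increment block is modelled as an in-memory dict rather than a disk file, the local index is a sorted list of the first key of each memory block, and the names `read_record`, `first_keys`, and `pending` are hypothetical, not taken from the patent.

```python
from bisect import bisect_right

def read_record(key, increment_block, local_index, memory_blocks):
    """Look up `key`, preferring pending updates over sorted block data.

    increment_block: dict of key -> latest value (stands in for the disk file)
    local_index:     sorted list holding the first key of each memory block
    memory_blocks:   list of dicts, one per memory block, in index order
    """
    if key in increment_block:          # update data exists for this key
        return increment_block[key]
    pos = bisect_right(local_index, key) - 1
    if pos < 0:                         # key sorts before every memory block
        return None
    return memory_blocks[pos].get(key)

# Hypothetical data: two memory blocks and two pending updates.
blocks = [{1: "a", 2: "b"}, {10: "c", 11: "d"}]
first_keys = [1, 10]
pending = {2: "b-updated", 42: "new"}
```

Reading the increment block first means a record that was updated after the memory blocks were sorted is still returned in its latest form.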
Wherein the method of establishing the global index comprises:
sampling all write copies, and sorting the sampled data by primary key;
dividing the sorted data into intervals and assigning a corresponding initial node to each resulting data interval, to form an initial global index;
sending the initial global index to each write replica node, so that each write replica node distributes data according to the correspondence between data intervals and initial nodes;
the metadata node receiving the data distribution results returned by the write replica nodes, dividing memory blocks according to those results, drawing up a data interval balancing schedule and sending it to each write replica node, and notifying each write replica node to sort according to the schedule, a memory block being the smallest unit of data storage;
each write replica node beginning to sort within the node;
receiving from each write replica node the results of the sorting and scheduling carried out in units of memory blocks, and establishing the relationship between primary keys, data intervals, and the write replica nodes corresponding to the data intervals.
Wherein the method of establishing the local index comprises:
once the global index has been established, assigning corresponding read replica nodes to the data intervals in the global index;
sending this assignment to each write replica node, whereupon each write replica node sends the data records it stores to the corresponding read replica nodes for storage;
triggering each read replica node to sort the data in its read copy by the specified sort column;
saving the sorted data in units of memory blocks, and establishing the relationship between keys, memory blocks, and the read replica nodes corresponding to the memory blocks.
Wherein the method further comprises an operation of creating a filter, the filter being used to judge whether the data to be read is located in a given memory block.
Wherein the method further comprises:
the write replica node sending the update data to the corresponding read replica nodes;
the read replica nodes writing the update data to their increment blocks in batches.
Wherein obtaining the data to be read from the memory block indicated by the local index further comprises:
when the access request is a specific-value query and the queried column is a key, the read replica node applying the filter to filter the specified memory blocks.
A read-write system for a database, in which a metadata node horizontally partitions record data by primary key into multiple data segments, and each data segment is saved as one write copy and multiple corresponding read copies, wherein the write copy is stored in row storage form, the read copies are stored in column storage form, and the data in each read copy is organized in a different way, the system comprising:
a metadata node, configured to judge the access request initiated by a client, and, when the access request is a data write operation, to:
determine, according to the global index it stores and the primary key in the access request, the data interval to which the data to be written belongs and the write replica node corresponding to that data interval;
initiate an operation request to the write replica node to which the data must be written;
and, when the access request is a data read operation, to:
judge whether the access request contains a primary key; if it does, determine, according to the global index it stores and the primary key, the data interval in which the data to be read is located and the read replica nodes corresponding to that data interval;
if it does not, send the access request to all data intervals under the current metadata node;
determine the memory blocks in which the data to be read may be located and the read replica nodes corresponding to those memory blocks, and send the access request to each such read replica node;
a write replica node, configured to append the update data to its increment block after receiving the operation request initiated by the metadata node;
a read replica node, configured to determine, if there are other filter conditions, the local index that best matches them, or, if there are none, to select a local index arbitrarily; to determine the memory blocks in which the data to be read may be located; and to judge whether update data for the data to be read exists in its increment block, reading the data from the increment block if so, and from the memory block indicated by the local index if not.
Wherein the metadata node is further configured to:
sample all write copies, and sort the sampled data by primary key;
divide the sorted data into intervals and assign a corresponding initial node to each resulting data interval, to form an initial global index;
send the initial global index to each write replica node, so that each write replica node distributes data according to the correspondence between data intervals and initial nodes;
receive the data distribution results returned by the write replica nodes, divide memory blocks according to those results, draw up a data interval balancing schedule, and send it to each write replica node;
receive from each write replica node the results of the sorting and scheduling carried out in units of memory blocks, and establish the relationship between primary keys, data intervals, and the write replica nodes corresponding to the data intervals;
The write replica node is further configured to:
distribute data according to the initial global index sent by the metadata node;
sort within the node and schedule between nodes according to the data interval balancing schedule drawn up by the metadata node.
Wherein the metadata node is further configured to:
assign corresponding read replica nodes to the data intervals in the global index;
send this assignment to each write replica node;
trigger each read replica node to sort the data in its read copy by the specified sort column;
The write replica node is further configured to:
after receiving the assignment sent by the metadata node, send the data records it stores to the corresponding read replica nodes for storage;
The read replica node is further configured to:
save the sorted data in units of memory blocks, and establish the relationship between keys, memory blocks, and the read replica nodes corresponding to the memory blocks.
Wherein the read replica node is further configured to:
create a filter.
Wherein the write replica node is further configured to:
send the update data to the corresponding read replica nodes;
The read replica node is further configured to:
write the update data to its increment block in batches.
Wherein the read replica node is further configured to:
when the access request is a specific-value query and the queried column is a key, apply the filter to filter the specified memory blocks.
As the above technical solution shows, compared with the prior art, the invention discloses a read-write method and system for a database. The method horizontally partitions record data into multiple data segments, and each data segment is saved as one write copy and multiple corresponding read copies. The write copy is stored in row storage form, which optimizes the write performance of the database, and the read copies are stored in column storage form, which optimizes the read performance of the database. The method also establishes a global index and local indexes, which are used to quickly locate the position that an access request needs to access. With this system, data can be both written quickly and read quickly.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below are merely embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the system structure disclosed in an embodiment of the present invention;
Fig. 2 is a schematic flow chart of the data write operation disclosed in an embodiment of the present invention;
Fig. 3 is a schematic flow chart of establishing the global index disclosed in an embodiment of the present invention;
Fig. 4 is a schematic flow chart of the data read operation disclosed in an embodiment of the present invention;
Fig. 5 is a schematic flow chart of establishing the local index disclosed in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments that a person of ordinary skill in the art obtains from these embodiments without creative effort fall within the scope of protection of the invention.
Embodiment 1
Fig. 1 is a schematic diagram of the structure of the nodes of the embodiment in a practical application. One metadata node has n write replica nodes under it, and each write replica node corresponds to m read replica nodes, where n and m are natural numbers. For example, the m read replica nodes corresponding to write replica node 1 are read replica node 1-1, read replica node 1-2, ..., read replica node 1-m. In a practical application, the metadata node in Fig. 1 first horizontally partitions the record data in the database into multiple data segments and stores each data segment in one write replica node. The write replica nodes store the data in row storage form, which lets users write to the database quickly. Then, according to the characteristics of the data, the metadata node assigns multiple read replica nodes to the data segment in each write replica node and copies that data segment to the read replica nodes corresponding to the write replica node. The read replica nodes store the data in column storage form, and the read replica nodes under one write replica node each organize the data in a different order, which lets users read from the database quickly.
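The difference between the row-oriented write copy and the differently ordered column-oriented read copies can be made concrete with a small Python sketch. This is an illustrative model only: the record fields and the helper names `to_row_store` and `to_column_store` are hypothetical, and real storage would be on disk rather than in Python lists.

```python
# Hypothetical records; the field names are illustrative only.
records = [
    {"id": 3, "user": "carol", "amount": 70},
    {"id": 1, "user": "alice", "amount": 50},
    {"id": 2, "user": "bob", "amount": 20},
]

def to_row_store(rows):
    """Row layout: each record is kept whole, in arrival order."""
    return [tuple(r.values()) for r in rows]

def to_column_store(rows, sort_by):
    """Column layout: one list per field, with rows ordered by `sort_by`."""
    ordered = sorted(rows, key=lambda r: r[sort_by])
    return {name: [r[name] for r in ordered] for name in rows[0]}

write_copy = to_row_store(records)                       # one write copy
read_copy_by_id = to_column_store(records, "id")         # read copy 1
read_copy_by_amount = to_column_store(records, "amount") # read copy 2
```

Appending to `write_copy` touches one place per record, while each read copy is already sorted on the column a query is likely to filter on.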
Referring to Fig. 2, which is a flow chart of an embodiment of the data writing method of the present invention, the specific steps may be as follows:
Step 201: Receive an access request initiated by a client, the access request being a data write operation.
In this step, the metadata node receives the data write operation initiated by the client.
Step 202: According to the global index stored in the metadata node and the primary key in the access request, determine the data interval into which the data must be written and the write replica node corresponding to that data interval. The global index indicates the correspondence between primary keys, data intervals, and the write replica nodes corresponding to the data intervals.
In this step, the metadata node itself stores the global index. Because the global index records the correspondence between primary keys and data intervals, when the metadata node receives a data write operation that carries a primary key, it can use that correspondence to determine which data interval the data should be written to. The metadata node then uses the correspondence between data intervals and write replica nodes, also recorded in the global index, to determine on which write replica node the write operation should be carried out.
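The lookup in step 202 can be sketched as a binary search over sorted interval boundaries. The sketch below is an assumption-laden illustration, not the patented implementation: it assumes each data interval is identified by its lowest primary key, and the class name `GlobalIndex`, the method `locate`, and the node names are hypothetical.

```python
from bisect import bisect_right

class GlobalIndex:
    """Maps primary-key ranges (data intervals) to write replica nodes."""

    def __init__(self, lower_bounds, nodes):
        # lower_bounds[i] is the smallest key of interval i, sorted ascending.
        assert len(lower_bounds) == len(nodes)
        self.lower_bounds = lower_bounds
        self.nodes = nodes

    def locate(self, primary_key):
        """Return (interval number, write replica node) for a primary key."""
        # Keys below the first bound are assigned to the first interval.
        i = max(bisect_right(self.lower_bounds, primary_key) - 1, 0)
        return i, self.nodes[i]

# Hypothetical index with three data intervals.
index = GlobalIndex([0, 1000, 2000], ["write-1", "write-2", "write-3"])
```

With this layout, `index.locate(1500)` selects the second interval and its write replica node in logarithmic time.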
The method of establishing the global index is shown in Fig. 3, and its steps are as follows:
Step 301: The metadata node samples all write copies and sorts the sampled data by primary key.
In this step, the metadata node samples the record data in all write replica nodes. The sampling ratio can be defined by the user. The sampled data is then sorted using the primary key as the comparison element.
Step 302: Divide the sorted data into intervals and assign a corresponding initial node to each resulting data interval, forming an initial global index.
In this step, the sorted data is divided into intervals and one initial node is assigned to each interval. Each initial node stores the sampled data that falls within its interval. This forms an initial, unoptimized global index.
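Steps 301 and 302 amount to choosing interval boundaries from a sorted sample. The following Python sketch shows one way to do that, assuming evenly spaced cut points in the sample; the patent does not state how intervals are cut, so the even spacing, the function name `build_initial_index`, and its parameters are assumptions made for illustration.

```python
import random

def build_initial_index(all_keys, sample_ratio, num_intervals, seed=0):
    """Sample keys, sort them, and pick interval lower bounds from the sample."""
    rng = random.Random(seed)
    k = max(num_intervals, int(len(all_keys) * sample_ratio))
    sample = sorted(rng.sample(all_keys, k))
    step = len(sample) / num_intervals
    # The first interval starts at the smallest sampled key; the others
    # start at evenly spaced positions in the sorted sample.
    return [sample[int(i * step)] for i in range(num_intervals)]

keys = list(range(10_000))   # hypothetical primary keys held by write copies
bounds = build_initial_index(keys, sample_ratio=0.05, num_intervals=4)
```

Because the boundaries come from a sample, the resulting intervals are only approximately balanced, which is why the later steps collect real distribution counts and rebalance.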
Step 303: Send the initial global index to each write replica node, so that each write replica node distributes data according to the correspondence between data intervals and initial nodes.
In this step, the initial global index formed in step 302 is sent to each write replica node. After receiving it, each write replica node compares the primary key of every record it holds with the intervals in the initial global index, determines which data interval the record belongs to, and sends the record to the write replica node corresponding to that data interval for storage. At the same time, each write replica node records the number of data records in each data interval.
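The distribution in step 303 can be sketched as bucketing local records by interval and counting each bucket. The sketch is illustrative only: it keeps the buckets in memory instead of sending them over the network, and the function name `distribute` and the sample records are hypothetical.

```python
from bisect import bisect_right

def distribute(records, lower_bounds):
    """Group (key, value) records by data interval and count each group."""
    buckets = {i: [] for i in range(len(lower_bounds))}
    for key, value in records:
        # Find the interval whose lower bound is the largest one <= key.
        i = max(bisect_right(lower_bounds, key) - 1, 0)
        buckets[i].append((key, value))
    counts = {i: len(rows) for i, rows in buckets.items()}
    return buckets, counts

# Hypothetical records held by one write replica node.
local_records = [(5, "a"), (120, "b"), (250, "c"), (130, "d")]
buckets, counts = distribute(local_records, lower_bounds=[0, 100, 200])
```

The per-interval counts are the kind of distribution result that the metadata node could use in the next step to plan balancing.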
Step 304: The metadata node receives the data distribution results returned by the write replica nodes, divides memory blocks according to those results, draws up a data interval balancing schedule and sends it to each write replica node, and notifies each write replica node to sort according to the schedule. A memory block is the smallest unit of data storage.
In this step, each write replica node sends the result of distributing data according to the initial global index to the metadata node. The metadata node computes statistics on the distribution of data records across the data intervals. Based on these statistics and the memory block size configured by the administrator, it divides memory blocks and draws up a data interval balancing schedule in units of memory blocks. It sends the schedule to each write replica node and notifies each of them to sort and schedule accordingly.
Step 305: Each write replica node begins sorting within the node and scheduling between nodes.
In this step, each write replica node sorts its data in units of memory blocks according to the data interval balancing schedule, and sends the memory blocks that must go to other write replica nodes to the destinations the schedule specifies.
Step 306: The metadata node receives from each write replica node the results of the sorting and scheduling carried out in units of memory blocks, and establishes the correspondence between primary keys, data intervals, and the write replica nodes corresponding to the data intervals.
In this step, each write replica node sends the memory-block-level sorting and scheduling results generated in step 305 to the metadata node. The metadata node records the correspondence between the primary keys of each memory block, the data intervals, and the write replica nodes corresponding to the data intervals, forming the global index.
Step 203: Initiate an operation request to the write replica node to which the data must be written. The write replica node appends the update data to its increment block. The increment block is a disk file that records update data, and the update data is a set of data to be written containing up to a preset number of records.
In this step, the metadata node initiates a data write request to the write replica node determined in step 202. After receiving the request, the write replica node appends the update data to its increment block, which completes the write.
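The append-only write in step 203 can be sketched as a small batching writer. This is a minimal illustration under assumptions: records are buffered until a preset count is reached and then appended as JSON lines, which is one possible reading of "a set of data to be written containing up to a preset number of records". The class name `IncrementBlock`, the file name, and the JSON encoding are hypothetical.

```python
import json
import os
import tempfile

class IncrementBlock:
    """Append-only disk file that receives update data in batches."""

    def __init__(self, path, batch_size):
        self.path = path
        self.batch_size = batch_size
        self.pending = []

    def write(self, record):
        """Buffer one record; flush once the preset count is reached."""
        self.pending.append(record)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Append all buffered records to the end of the file."""
        if not self.pending:
            return
        with open(self.path, "a", encoding="utf-8") as f:
            for record in self.pending:
                f.write(json.dumps(record) + "\n")
        self.pending = []

# Hypothetical usage: three writes with a preset batch size of two.
path = os.path.join(tempfile.mkdtemp(), "increment.log")
block = IncrementBlock(path, batch_size=2)
for i in range(3):
    block.write({"id": i})
```

Appending never rewrites existing data, so each batch costs one sequential write regardless of how much data the node already holds.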
The method also includes an operation in which the write replica node sends the update data to the corresponding read replica nodes.
In this embodiment, the metadata node first receives the data write request, then uses the global index to determine the data interval into which the data must be written, and then sends the access request to the write replica node corresponding to that data interval, which carries out the write. Because the data records in the write replica node are stored in row storage form, which optimizes the write performance of the database, the data writing method of this embodiment achieves fast data writes.
Embodiment 2
Fig. 1 is a schematic diagram of the structure of the nodes of the embodiment in a practical application. The functions and features of each node are as described for Fig. 1 in Embodiment 1.
Referring to Fig. 4, which is a flow chart of an embodiment of the data reading method of the present invention, the specific steps may be as follows:
Step 401: Receive an access request initiated by a client, the access request being a data read operation.
In this step, the metadata node receives the data read operation initiated by the client.
Step 402: Judge whether the access request contains a primary key. If it does, go to step 403; if it does not, go to step 404.
In this step, the metadata node judges whether the access request contains a primary key.
Step 403: According to the global index stored in the metadata node and the primary key, determine the data interval in which the data to be read is located and the read replica nodes corresponding to that data interval.
In this step, the metadata node itself stores the global index. Because the global index records the correspondence between primary keys and data intervals, when the metadata node receives a data read operation that carries a primary key, it can use that correspondence to determine which data interval holds the data to be read. The metadata node then uses the correspondence between data intervals and read replica nodes recorded in the global index to determine on which replica nodes the read operation should be carried out.
Step 404: Send the access request to all data intervals under the current metadata node.
In this step, when the access request contains no primary key, the metadata node cannot determine which data interval holds the data to be read, so it sends the access request to all data intervals.
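The branch in steps 402 to 404 can be sketched as a routing function that returns either one data interval or all of them. The request shape (a dict with an optional `primary_key` entry) and the function name `route_read` are assumptions made for illustration.

```python
from bisect import bisect_right

def route_read(request, lower_bounds):
    """Return the numbers of the data intervals a read request must visit."""
    key = request.get("primary_key")
    if key is None:
        # No primary key: every data interval has to be consulted.
        return list(range(len(lower_bounds)))
    # Primary key present: exactly one interval can hold the record.
    return [max(bisect_right(lower_bounds, key) - 1, 0)]

bounds = [0, 1000, 2000]   # hypothetical lower bounds of three intervals
```

A request with a primary key therefore touches a single interval, while a request with only filter conditions fans out to all of them.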
Step 405: Judge whether the access request contains other filter conditions. If it does, go to step 406; if it does not, go to step 407.
In this step, the read replica node judges whether the access request contains other filter conditions.
Step 406: The read replica node determines the local index that best matches the other filter conditions. The local index indicates the correspondence between keys and memory blocks, and a memory block is the smallest unit of data storage.
In this step, once it is determined that the access request contains other filter conditions, the read replica node selects the local index that best matches the content of those filter conditions and uses it for filtering.
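The text does not define what "best matches" means, so the following Python sketch uses one plausible scoring rule purely as an assumption: an index scores one point for each leading sort column that appears in the filter conditions. The function name `choose_local_index` and the index names are hypothetical.

```python
def choose_local_index(local_indexes, filter_columns):
    """Pick the local index whose leading sort columns are best covered.

    local_indexes:  dict of index name -> tuple of sort columns
    filter_columns: set of column names used in the filter conditions
    """
    if not filter_columns:
        # No filter conditions: any local index will do (step 407).
        return next(iter(local_indexes))

    def score(name):
        matched = 0
        for column in local_indexes[name]:
            if column not in filter_columns:
                break
            matched += 1
        return matched

    return max(local_indexes, key=score)

# Hypothetical local indexes held by the read replica nodes.
indexes = {"by_user": ("user", "date"), "by_amount": ("amount",)}
```

Under this rule a filter on `amount` selects `by_amount`, and a filter on both `user` and `date` selects `by_user`.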
The method of establishing the local index is shown in Fig. 5, and its steps are as follows:
Step 501: Judge whether the global index has been established. If it has, go to step 503; if it has not, go to step 502.
In this step, the metadata node judges whether the global index has been established.
Step 502: Establish the global index.
In this step, if the global index has not been established, the metadata node must first establish it. The method for doing so is the global index establishment method described in Embodiment 1.
Step 503: The metadata node assigns corresponding read replica nodes to the data intervals in the global index.
In this step, once the global index has been established, the metadata node assigns multiple read replica nodes to each data interval in the global index according to the characteristics of the record data.
Step 504: Send this assignment to each write replica node, whereupon each write replica node sends the data records it stores to the corresponding read replica nodes for storage.
In this step, the metadata node sends the assignment made in step 503 to each write replica node. After receiving it, each write replica node copies the data records it stores to each of its corresponding read replica nodes.
Step 505: The metadata node triggers each read replica node to sort the data in the read replica node by the specified sort column.
In this step, the metadata node triggers each read replica node to sort its data records by a different key.
Step 506: The read replica node saves the sorted data in units of memory blocks and establishes the relationship between keys, memory blocks, and the read replica nodes corresponding to the memory blocks.
In this step, each read replica node stores the sorting results generated in step 505 in units of memory blocks and records the correspondence between the key of each memory block, the data interval, and the read replica node corresponding to the data interval, forming a local index.
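Steps 505 and 506 can be sketched as sorting the records on one column, cutting the sorted run into fixed-size memory blocks, and recording the first key of each block. The sketch assumes a local index of `(first key, block number)` pairs; that representation, the function name `build_local_index`, and the sample rows are hypothetical.

```python
def build_local_index(rows, sort_column, block_size):
    """Sort rows, split them into memory blocks, and index each block."""
    ordered = sorted(rows, key=lambda r: r[sort_column])
    blocks = [ordered[i:i + block_size]
              for i in range(0, len(ordered), block_size)]
    # Each index entry pairs the smallest key in a block with its number.
    index = [(block[0][sort_column], n) for n, block in enumerate(blocks)]
    return blocks, index

# Hypothetical records; amounts are 0, 37, 74, 11, 48, 85 for ids 0..5.
rows = [{"id": i, "amount": (i * 37) % 100} for i in range(6)]
blocks, index = build_local_index(rows, "amount", block_size=2)
```

Because the blocks are sorted, a lookup by `amount` needs to open only the block whose first key is the largest one not exceeding the requested value.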
While the local index is being built, a filter is also created; the filter is used to judge whether the data to be read is located in a given storage block.
Each filter corresponds to a single storage block, and the filter can determine whether the data to be read exists in that storage block. The insertion method is as follows:
Prepare in advance a bit array of length m, where m is expected to be about 20 times the number of elements in the storage block, with every element initialized to 0. When a row record is added to the storage block, the index column of the record is computed with k different hash functions whose results fall in the range [0, m]; the k results are used as indexes, and the corresponding elements of the bit array are set to 1.
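The insertion method above, together with the matching membership test, resembles a Bloom filter. The following Python sketch is illustrative only: the class name `BlockFilter`, the salted SHA-256 hashing, and the default of four hash functions are assumptions, and the computed positions fall in [0, m) so that every result is a valid index into a bit array of length m.

```python
import hashlib


class BlockFilter:
    """Per-storage-block filter: a bit array of length m and k hash functions."""

    def __init__(self, expected_rows, k=4, bits_per_row=20):
        self.m = max(1, expected_rows * bits_per_row)  # about 20 bits per row
        self.k = k
        self.bits = bytearray(self.m)  # one byte per bit, all initialised to 0

    def _positions(self, value):
        # Derive k hash values by salting one digest with the function number.
        data = repr(value).encode("utf-8")
        for seed in range(self.k):
            digest = hashlib.sha256(bytes([seed]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, value):
        """Set the k positions computed from the index-column value to 1."""
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value):
        """False means the value is definitely absent; True means it may be
        present and a further comparison is still required."""
        return all(self.bits[pos] for pos in self._positions(value))
```

A filter of this kind never reports a stored value as absent, so storage blocks whose filter returns False can be skipped safely.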
Step 407: Arbitrarily select a local index.
In this step, when there are no other filter conditions, the read replica node cannot determine which local index to use for filtering, so it arbitrarily selects one local index for filtering.
Step 408: Determine the storage blocks in which the data to be read may be located and the read replica nodes corresponding to those storage blocks, and send the access request to each of those read replica nodes.
Through the above steps, the storage blocks in which the data to be read may be located are determined from the data interval and the local index already determined, and the read replica node corresponding to each storage block is then determined from the local index. Once the read replica node has been determined, the metadata node sends the access request to it.
After receiving the access request from the metadata node, if the access request is a query for a specific value and the queried column is an index column, the read replica node performs the following steps:
(A) The filter is applied, for the access request, to the specified storage blocks and the increment block, excluding every storage block that does not contain the data to be read;
The method for judging whether the data to be read exists in a given storage block is:
Compute the data to be read with the above k different hash functions to obtain k results;
Use these k results as indexes to look up the bit array; if any of the elements found is 0, the data to be read does not exist in this storage block;
If none of the elements found in the bit array is 0, the data to be read may exist in this storage block, and a further comparison is needed to decide.
(B) Locate the data according to the other filter conditions and the local index, obtaining the sequence number of the data to be read within the storage block, denoted R;
(C) From the sequence number R obtained by querying the local index, obtain the position of the storage block that holds the data to be read, and prepare to read the data;
(D) Perform the following operations in parallel: open the file that holds the data to be read, look up in the index the largest sequence number less than R, obtain the file offset corresponding to that sequence number and seek to that position, then read the elements in the file one by one until the R-th element is reached.
Step 409: The read replica node judges whether updated data for the data to be read exists in its increment block; if so, step 410 is performed; if not, step 411 is performed. The increment block is a disk file that records updated data.
The updated data in the increment block of the read replica node is sent by the write replica node corresponding to that read replica node.
When the number of accumulated update records reaches a threshold, the read replica node performs a row-to-column conversion on the data, changing it from row storage to column storage, applies partial compression to each column of data, and writes the converted batch of data to disk. Because query performance degrades as the increment blocks of the read replica node accumulate, all data blocks in the node need to be merge-sorted periodically to keep the data ordered.
The row-to-column conversion may specifically be as follows:
Each attribute field in a row record is split off and stored in the corresponding column file; an attribute field is a piece of information in a row record that has an independent attribute.
The attribute field used for sorting in a row record is called the "sort column"; the attribute used for sorting is specified when the local index is created. All other attribute fields in the row record are called "non-sort columns". Sort columns and non-sort columns are organized differently when written to disk.
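The threshold-triggered row-to-column conversion might be sketched as follows in Python. The names `IncrementBuffer` and `rows_to_columns` are hypothetical, the `flushed` list stands in for the batch write to disk, and per-column compression and the periodic merge sort are left out of this illustration.

```python
def rows_to_columns(rows, column_names):
    """Split every row record into its attribute fields, one list per column."""
    return {name: [row[name] for row in rows] for name in column_names}


class IncrementBuffer:
    """Collects row-format updates and converts them in batches."""

    def __init__(self, column_names, threshold):
        self.column_names = column_names
        self.threshold = threshold
        self.pending = []   # row-format updates not yet converted
        self.flushed = []   # column-format batches (stand-in for disk files)

    def append(self, row):
        self.pending.append(row)
        if len(self.pending) >= self.threshold:
            # Threshold reached: convert the whole batch from rows to columns.
            batch = rows_to_columns(self.pending, self.column_names)
            self.flushed.append(batch)
            self.pending = []
```

Converting in batches rather than per record keeps each column file append large and sequential, which is the usual motivation for buffering updates before a columnar write.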
The sort column is stored on disk in order. To locate data quickly, an auxiliary index file is needed, for which a B+tree can be used. The index of the sort column stores the value, the sequence number, and the file offset of the starting element of each storage block. To read a specific sort-column element value, sort-column elements are read from the file one by one, starting at the offset of the starting element of the storage block, until the queried sort-column element is reached. Adding the number of sort-column elements skipped during the read to the sequence number of the starting element gives the global sequence number of the sort-column element to be read, denoted k. The global sequence number indicates the position of the data record to be read within the storage block.
Non-sort columns have no fixed storage order on disk and likewise use a B+tree. The index of a non-sort column stores the file offset and the sequence number of the starting element of each storage block. To read a specific non-sort-column element value, once the global sequence number of the sort-column element to be read has been determined, non-sort-column elements are read from the file one by one, starting at the offset of the starting element, until the k-th non-sort-column element is reached.
Reading the specific sort-column element value and the non-sort-column elements corresponding to it in this way completes the read of the full data to be read.
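The two-stage read described above can be sketched in Python under simplifying assumptions: each column file is modelled as an in-memory list, sequence numbers are zero-based list positions, sort-column values are unique, and a sorted list of block starts stands in for the B+tree. The names `locate_sequence` and `read_record` are hypothetical.

```python
from bisect import bisect_right


def locate_sequence(sort_column, block_starts, target):
    """block_starts: ascending list of (first_value, first_sequence_number),
    one entry per storage block. Returns the global sequence number of target."""
    first_values = [value for value, _ in block_starts]
    block = bisect_right(first_values, target) - 1
    if block < 0:
        return None
    seq = block_starts[block][1]
    if block + 1 < len(block_starts):
        end = block_starts[block + 1][1]
    else:
        end = len(sort_column)
    # Read forward from the starting element of the block until target is met.
    while seq < end:
        if sort_column[seq] == target:
            return seq
        seq += 1
    return None


def read_record(columns, sort_name, block_starts, target):
    """Find the sequence number in the sort column, then fetch the element
    with the same sequence number from every column."""
    k = locate_sequence(columns[sort_name], block_starts, target)
    if k is None:
        return None
    return {name: values[k] for name, values in columns.items()}
```

In this sketch the non-sort columns are indexed directly by k; with real column files the same position would be reached by seeking to the block's starting offset and reading forward, as the text describes.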
Step 410: Read the data to be read from the increment block.
In this step, when it is determined that updated data for the data to be read exists in the increment block of the read replica node, the updated data is read directly from that increment block.
Step 411: Read the data to be read from the storage block indicated by the local index.
In this step, when it is determined that no updated data for the data to be read exists in the increment block of the read replica node, the data to be read is read from the storage block determined in step 408.
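Steps 409 to 411 amount to checking the increment block before falling back to the storage block. A minimal Python sketch, assuming the increment block is an append-ordered list of (key, row) pairs and the storage block is a dictionary, both hypothetical stand-ins for the disk files:

```python
def read_value(key, increment_block, storage_block):
    """Return the most recent update for key if the increment block has one,
    otherwise the row held in the storage block (or None if absent)."""
    # Scan newest-first so that a later update wins over an earlier one.
    for updated_key, row in reversed(increment_block):
        if updated_key == key:
            return row
    return storage_block.get(key)
```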
In this embodiment, the metadata node first receives the data read request and judges whether the access request contains a primary key and other filter conditions. If there is a primary key, the data interval holding the data to be read can be determined from the primary key and the global index; if there is no primary key, the access request is sent to all data intervals. If the access request contains other filter conditions, the local index that best matches them can be determined and used for filtering; if it contains no other filter conditions, a local index is selected arbitrarily for filtering. Once the data interval and storage block holding the data to be read have been determined, the access request is sent to the read replica node corresponding to the determined data interval, and that read replica node performs the data read operation. When the access request is a query for a specific value and the queried column is an index column, the filter in the local index can also be used to quickly locate the storage block holding the data to be read. The data records in the read replica node are stored in column storage form, which optimizes the read performance of the database, so the data reading method of this embodiment of the invention can read data quickly.
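The routing decision made by the metadata node can be illustrated with a short Python sketch. It assumes the global index is an ascending list of (lower bound, read replica nodes) pairs and that keys below the first bound belong to the first interval; the function name `route_read` and this layout are hypothetical.

```python
from bisect import bisect_right


def route_read(global_index, primary_key=None):
    """Return the read replica node groups that must receive the request.
    With a primary key, only the matching data interval is addressed;
    without one, the request goes to every data interval."""
    if primary_key is None:
        return [nodes for _, nodes in global_index]
    bounds = [lower for lower, _ in global_index]
    pos = max(bisect_right(bounds, primary_key) - 1, 0)
    return [global_index[pos][1]]
```

For example, with intervals starting at 0, 100, and 200, a request for primary key 150 is routed only to the replicas of the second interval, while a request without a primary key reaches all three.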
Embodiment three
A read-write system for a database is shown in Fig. 1, which is a structural diagram of the system disclosed in the embodiment of the invention. The metadata node horizontally partitions the record data by primary key into multiple data segments; each data segment is saved as one write copy and multiple corresponding read copies, where the write copy is stored in row storage form, the read copies are stored in column storage form, and the data in each read copy is organized in a different way. The system may include:
A metadata node, configured to judge the access request initiated by the client and, when the access request is a data write operation:
determine, from the global index it holds and the primary key in the access request, the data interval to which the data to be written belongs and the write replica node corresponding to that data interval;
initiate an operation request to the write replica node to which the data to be written must be written;
and, when the access request is a data read operation:
judge whether the access request contains a primary key and, if it does, determine from the global index it holds and the primary key the data interval holding the data to be read and the read replica nodes corresponding to that data interval;
if there is no primary key, send the access request to all data intervals in the current metadata node;
determine the storage blocks in which the data to be read may be located and the read replica nodes corresponding to those storage blocks, and send the access request to each read replica node;
A write replica node, configured to append the updated data to its increment block after receiving the operation request initiated by the metadata node;
A read replica node, configured to: when there are other filter conditions, determine the local index that best matches them; when there are no other filter conditions, arbitrarily select a local index; determine the storage blocks in which the data to be read may be located; and judge whether updated data for the data to be read exists in its increment block, reading the data to be read from the increment block if so, and otherwise obtaining it from the storage block indicated by the local index.
In practice, during establishment of the global index held in the metadata node, the metadata node may be configured to:
sample all write copies and sort the sampled data by primary key;
divide the sorted data into intervals and assign a corresponding start node to each resulting data interval, forming an initial global index;
send the initial global index to each write replica node, so that each write replica node distributes data according to the correspondence between data intervals and start nodes;
receive the data distribution results returned by each write replica node, divide the data into blocks according to those results, formulate a data interval balance scheduling plan, and send the plan to each write replica node;
receive from each write replica node the results of the sorting and scheduling carried out in units of storage blocks, and establish the relation between the primary key, the data interval, and the write replica node corresponding to that data interval;
Likewise, during establishment of the global index held in the metadata node, the write replica node may also be configured to:
distribute data according to the initial global index sent by the metadata node;
sort within the node and schedule between nodes according to the data interval balance scheduling plan formulated by the metadata node.
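The first two operations of the metadata node, sampling and interval division, might look like the following Python sketch. It is an illustration under stated assumptions: each write copy is a non-empty list of primary keys, intervals are cut at evenly spaced positions of the sorted sample, and the function name `build_initial_global_index` is hypothetical. The later redistribution and balance scheduling are not modelled.

```python
import random


def build_initial_global_index(write_copies, start_nodes, sample_size, seed=0):
    """Sample every write copy, sort the sample by primary key, and cut it
    into one data interval per start node.
    Returns [(interval_lower_bound, start_node), ...] in ascending order."""
    rng = random.Random(seed)
    sample = []
    for keys in write_copies:
        sample.extend(rng.sample(keys, min(sample_size, len(keys))))
    sample.sort()
    step = len(sample) / len(start_nodes)
    return [(sample[int(i * step)], node)
            for i, node in enumerate(start_nodes)]
```

Cutting at evenly spaced sample positions gives each start node roughly the same share of sampled keys, which is one simple way to approximate balanced data intervals before the scheduling plan refines them.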
In practice, during establishment of the local index, the metadata node may also be configured to:
assign corresponding read replica nodes to the data intervals in the global index;
send the above assignment to each write replica node;
trigger each read replica node to sort the data in its read copy according to the specified sort column;
During establishment of the local index, the write replica node may also be configured to:
after receiving the assignment sent by the metadata node, send the data records it stores to the corresponding read replica nodes for storage;
During establishment of the local index, the read replica node may also be configured to:
save the sorted data in units of storage blocks, and establish the relation between the key, the storage block, and the read replica node corresponding to that storage block.
In other embodiments, the read replica node may also be configured to:
create a filter.
In other embodiments, the write replica node may also be configured to:
send the updated data to the corresponding read replica node;
The read replica node may also be configured to:
write the updated data to the increment block in batches.
In other embodiments, the read replica node may also be configured to:
when the access request is a query for a specific value and the queried column is a key column, filter the specified storage blocks using the filter.
The data read-write system disclosed in this embodiment optimizes the read and write performance of the database and enables both fast writing and fast reading of data.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The foregoing description of the disclosed embodiments enables those skilled in the art to make or use the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the invention is not to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (12)

1. A read-write method for a database, characterized in that record data is horizontally partitioned by primary key into multiple data segments, each data segment is saved as one write copy and multiple corresponding read copies, wherein the write copy is stored in row storage form, the read copies are stored in column storage form, and the data in each read copy is organized in a different way, the method comprising:
receiving an access request initiated by a client;
when the access request is a data write operation:
determining, from the global index held in the metadata node and the primary key in the access request, the data interval to which the data to be written must be written and the write replica node corresponding to that data interval, the global index indicating the correspondence between the primary key, the data interval, and the write replica node corresponding to that data interval;
initiating an operation request to the write replica node to which the data to be written must be written, the write replica node appending the updated data to its increment block, the increment block being a disk file that records updated data, and the updated data being the set of data to be written within a preset number of records;
when the access request is a data read operation:
judging whether the access request contains a primary key; if it does, determining from the global index held in the metadata node and the primary key the data interval holding the data to be read and the read replica nodes corresponding to that data interval, and, when there are other filter conditions, determining by the read replica node the local index that best matches the other filter conditions, or, when there are no other filter conditions, arbitrarily selecting a local index, the local index indicating the correspondence between keys and storage blocks, and a storage block being the smallest unit block of data storage;
if there is no primary key, sending the access request to all data intervals in the current metadata node, and, when there are other filter conditions, determining by the read replica nodes corresponding to all the data intervals the local index that best matches the other filter conditions, or, when there are no other filter conditions, arbitrarily selecting a local index;
determining the storage blocks in which the data to be read may be located and the read replica nodes corresponding to those storage blocks, and sending the access request to each read replica node;
the read replica node judging whether updated data for the data to be read exists in its increment block; if so, reading the data to be read from the increment block; if not, reading the data to be read from the storage block indicated by the local index, the increment block being a disk file that records updated data.
2. The method according to claim 1, characterized in that the method of establishing the global index comprises:
sampling all write copies and sorting the sampled data by primary key;
dividing the sorted data into intervals and assigning a corresponding start node to each resulting data interval, forming an initial global index;
sending the initial global index to each write replica node, so that each write replica node distributes data according to the correspondence between data intervals and start nodes;
the metadata node receiving the data distribution results returned by each write replica node, dividing storage blocks according to those results, formulating a data interval balance scheduling plan and sending it to each write replica node, and notifying each write replica node to sort according to the data interval balance scheduling plan, a storage block being the smallest unit block of data storage;
each write replica node starting the sort within the node;
receiving from each write replica node the results of the sorting and scheduling carried out in units of storage blocks, and establishing the relation between the primary key, the data interval, and the write replica node corresponding to that data interval.
3. The method according to claim 1, characterized in that the method of establishing the local index comprises:
once the global index has been built, assigning corresponding read replica nodes to the data intervals in the global index;
sending the above assignment to each write replica node, each write replica node sending the data records it stores to the corresponding read replica nodes for storage;
triggering each read replica node to sort the data in its read copy according to the specified sort column;
saving the sorted data in units of storage blocks, and establishing the relation between the key, the storage block, and the read replica node corresponding to that storage block.
4. The method according to claim 3, characterized in that the method further comprises creating a filter, the filter being used to judge whether the data to be read is located in a given storage block.
5. The method according to claim 1, characterized in that the method further comprises:
the write replica node sending the updated data to the corresponding read replica node;
the read replica node writing the updated data to the increment block in batches.
6. The method according to claim 1, characterized in that obtaining the data to be read from the storage block indicated by the local index further comprises:
when the access request is a query for a specific value and the queried column is a key column, the read replica node applying the filter to filter the specified storage blocks.
7. A read-write system for a database, characterized in that record data is horizontally partitioned by primary key into multiple data segments, each data segment is saved as one write copy and multiple corresponding read copies, wherein the write copy is stored in row storage form, the read copies are stored in column storage form, and the data in each read copy is organized in a different way, the system comprising:
a metadata node, configured to judge the access request initiated by the client and, when the access request is a data write operation:
determine, from the global index it holds and the primary key in the access request, the data interval to which the data to be written belongs and the write replica node corresponding to that data interval;
initiate an operation request to the write replica node to which the data to be written must be written;
and, when the access request is a data read operation:
judge whether the access request contains a primary key and, if it does, determine from the global index it holds and the primary key the data interval holding the data to be read and the read replica nodes corresponding to that data interval;
if there is no primary key, send the access request to all data intervals in the current metadata node;
determine the storage blocks in which the data to be read may be located and the read replica nodes corresponding to those storage blocks, and send the access request to each read replica node;
a write replica node, configured to append the updated data to its increment block after receiving the operation request initiated by the metadata node;
a read replica node, configured to: when there are other filter conditions, determine the local index that best matches them; when there are no other filter conditions, arbitrarily select a local index; determine the storage blocks in which the data to be read may be located; and judge whether updated data for the data to be read exists in its increment block, reading the data to be read from the increment block if so, and otherwise obtaining it from the storage block indicated by the local index.
8. The system according to claim 7, characterized in that the metadata node is further configured to:
sample all write copies and sort the sampled data by primary key;
divide the sorted data into intervals and assign a corresponding start node to each resulting data interval, forming an initial global index;
send the initial global index to each write replica node, so that each write replica node distributes data according to the correspondence between data intervals and start nodes;
receive the data distribution results returned by each write replica node, divide the data into blocks according to those results, formulate a data interval balance scheduling plan, and send the plan to each write replica node;
receive from each write replica node the results of the sorting and scheduling carried out in units of storage blocks, and establish the relation between the primary key, the data interval, and the write replica node corresponding to that data interval;
the write replica node is further configured to:
distribute data according to the initial global index sent by the metadata node;
sort within the node and schedule between nodes according to the data interval balance scheduling plan formulated by the metadata node.
9. The system according to claim 7, characterized in that the metadata node is further configured to:
assign corresponding read replica nodes to the data intervals in the global index;
send the above assignment to each write replica node;
trigger each read replica node to sort the data in its read copy according to the specified sort column;
the write replica node is further configured to:
after receiving the assignment sent by the metadata node, send the data records it stores to the corresponding read replica nodes for storage;
the read replica node is further configured to:
save the sorted data in units of storage blocks, and establish the relation between the key, the storage block, and the read replica node corresponding to that storage block.
10. The system according to claim 7, characterized in that the read replica node is further configured to:
create a filter.
11. The system according to claim 7, characterized in that the write replica node is further configured to:
send the updated data to the corresponding read replica node;
the read replica node is further configured to:
write the updated data to the increment block in batches.
12. The system according to claim 7, characterized in that the read replica node is further configured to:
when the access request is a query for a specific value and the queried column is a key column, filter the specified storage blocks using the filter.
CN201110273321.5A 2011-09-15 2011-09-15 Read-write method and system for database Expired - Fee Related CN102999519B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110273321.5A CN102999519B (en) 2011-09-15 2011-09-15 Read-write method and system for database

Publications (2)

Publication Number Publication Date
CN102999519A CN102999519A (en) 2013-03-27
CN102999519B true CN102999519B (en) 2017-05-17

Family

ID=47928093

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110273321.5A Expired - Fee Related CN102999519B (en) 2011-09-15 2011-09-15 Read-write method and system for database

Country Status (1)

Country Link
CN (1) CN102999519B (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103345518B (en) * 2013-07-11 2016-08-10 清华大学 Self-adapting data memory management method based on data block and system
US10311154B2 (en) 2013-09-21 2019-06-04 Oracle International Corporation Combined row and columnar storage for in-memory databases for OLTP and analytics workloads
CN103745008B (en) * 2014-01-28 2016-08-31 河海大学 A kind of sort method of big data directory
CN105718484A (en) * 2014-12-04 2016-06-29 中兴通讯股份有限公司 File writing method, file reading method, file deletion method, file query method and client
CN105740295B (en) * 2014-12-12 2019-06-14 中国移动通信集团公司 A kind of processing method and processing device of distributed data
CN104598652B (en) * 2015-02-14 2017-11-24 广州华多网络科技有限公司 A kind of data base query method and device
US11403318B2 (en) * 2015-10-01 2022-08-02 Futurewei Technologies, Inc. Apparatus and method for managing storage of a primary database and a replica database
CN107368490A (en) * 2016-05-12 2017-11-21 中国移动通信集团河北有限公司 Data processing method and device
US10719446B2 (en) 2017-08-31 2020-07-21 Oracle International Corporation Directly mapped buffer cache on non-volatile memory
US11675761B2 (en) 2017-09-30 2023-06-13 Oracle International Corporation Performing in-memory columnar analytic queries on externally resident data
US11061924B2 (en) * 2017-11-22 2021-07-13 Amazon Technologies, Inc. Multi-region, multi-master replication of database tables
CN110765125B (en) * 2018-07-25 2022-09-20 杭州海康威视数字技术股份有限公司 Method and device for storing data
CN109325031B (en) * 2018-09-13 2021-08-03 上海达梦数据库有限公司 Data statistical method, device, equipment and storage medium
US11170002B2 (en) 2018-10-19 2021-11-09 Oracle International Corporation Integrating Kafka data-in-motion with data-at-rest tables
CN109783571B (en) * 2018-12-13 2023-10-27 平安科技(深圳)有限公司 Data processing method, device, computer equipment and storage medium for isolated environment
CN110162563B (en) * 2019-05-28 2023-11-17 深圳市网心科技有限公司 Data warehousing method and system, electronic equipment and storage medium
WO2022257685A1 (en) * 2021-06-07 2022-12-15 华为技术有限公司 Storage system, network interface card, processor, and data access method, apparatus, and system
CN114064588B (en) * 2021-11-24 2023-04-25 建信金融科技有限责任公司 Storage space scheduling method and system
CN114238362A (en) * 2022-03-01 2022-03-25 广州观必达数据技术有限责任公司 Water conservancy data management system
CN115438114B (en) * 2022-11-09 2023-03-24 浪潮电子信息产业股份有限公司 Storage format conversion method, system, device, electronic equipment and storage medium
CN115544321B (en) * 2022-11-28 2023-03-21 厦门渊亭信息科技有限公司 Method and device for realizing graph database storage and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1965316A (en) * 2004-04-09 2007-05-16 甲骨文国际公司 Index for accessing XML data
CN101496005A (en) * 2005-12-29 2009-07-29 亚马逊科技公司 Distributed replica storage system with web services interface
CN101727465A (en) * 2008-11-03 2010-06-09 中国移动通信集团公司 Methods for establishing and inquiring index of distributed column storage database, device and system thereof
US7761460B1 (en) * 2004-02-04 2010-07-20 Rockwell Automation Technologies, Inc. Systems and methods that utilize a standard database interface to access data within an industrial device
CN101996250A (en) * 2010-11-15 2011-03-30 中国科学院计算技术研究所 Hadoop-based mass stream data storage and query method and system

Also Published As

Publication number Publication date
CN102999519A (en) 2013-03-27

Similar Documents

Publication Publication Date Title
CN102999519B (en) Read-write method and system for database
CN103020204B (en) A kind of method and its system carrying out multi-dimensional interval query to distributed sequence list
CN102521405B (en) Massive structured data storage and query methods and systems supporting high-speed loading
CN105528367B (en) Storage and near-real-time query method for time-sensitive data based on open-source big data
CN102521406B (en) Distributed query method and system for complex task of querying massive structured data
CN107423422B (en) Spatial data distributed storage and search method and system based on grid
CN103154935B (en) System and method for querying a data stream
CN102646130B (en) Method for storing and indexing mass historical data
US7886124B2 (en) Method and mechanism for implementing dynamic space management for large objects
CN108694195B (en) Management method and system of distributed data warehouse
US8301588B2 (en) Data storage for file updates
JP5233233B2 (en) Information search system, information search index registration device, information search method and program
CN110162528A (en) Massive big data retrieval method and system
CN102016789A (en) Data processing apparatus and method of processing data
CN109284069A (en) Distributed storage system and method for storing backup data
CN102819586B (en) Cache-based URL classification method and device
CN101452487B (en) Data loading method and system, and data loading unit
CN103176754A (en) Reading and storing method for massive amounts of small files
CN105956123A (en) Local updating software-based data processing method and apparatus
CN104239377A (en) Cross-platform data retrieval method and device
CN106951375A (en) Method and device for deleting a snapshot volume in a storage system
CN107209768A (en) Method and apparatus for scalable sorting of a data set
CN106991190A (en) System for automatically creating sub-databases in a database
CN107491495A (en) Method for storing spatial-attribute-first spatio-temporal trajectory data files in an auxiliary storage device
CN105512325B (en) Method and device for updating, deleting, and building a multi-version data index

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170517

Termination date: 20180915