CN106354831A - Method and device for loading segmented data blocks - Google Patents
Method and device for loading segmented data blocks
- Publication number
- CN106354831A
- Authority
- CN
- China
- Prior art keywords
- data
- newline
- url
- offset address
- set space
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/06—Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
- G06F16/278—Data partitioning, e.g. horizontal or vertical partitioning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Computing Systems (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention provides a method and a device for loading segmented data blocks. The method comprises: judging whether the offset address of a received data block is equal to 0, and reading the data specified in the URL (uniform resource locator) if it is equal to 0; if the offset address is greater than 0, searching for a line separator within the range from the first line separator before the offset address to the preset space after the offset address; discarding the data before the line separator if one is found; and discarding all data within the preset space after the offset address if none is found. With the method and device, each loading node can determine its data content from its own data block, so data can be loaded in parallel, the load among the loading nodes can be balanced, and the overall loading speed can be increased.
Description
Technical field
The invention belongs to the technical field of distributed databases, and in particular relates to a method and a device for loading segmented data blocks.
Background
A distributed database system usually uses relatively small computer systems. Each computer can be placed at a separate site, and each may hold a complete or partial copy of the DBMS together with its own local database. The computers at the different sites are interconnected by a network and together form a large database that is complete, logically centralized as a whole, and physically distributed.
In a large-scale distributed analytical database cluster system, large volumes of data often need to be loaded from external data sources. Given a large number of database cluster nodes and massive external data, having as many cluster nodes as possible load data in parallel is an effective way to balance the load among the loading nodes and to raise the overall loading speed. How to segment continuous data efficiently, and how to load the segmented data, are key factors in improving the overall loading speed.
Summary of the invention
In view of this, embodiments of the present invention provide a method and a device for loading segmented data blocks, so that loading nodes can load data quickly.
In one aspect, an embodiment of the present invention provides a method for loading segmented data blocks, comprising:
Judging whether the offset address of a received data block is equal to 0, and if it is equal to 0, reading the data specified in the URL;
If it is greater than 0, searching for a line separator within the range from the first line separator before the offset address to the preset space after the offset address;
If a line separator is found, discarding the data before the line separator;
Otherwise, discarding all data within the preset space after the offset address.
Further, the data specified in the URL comprises:
The data at the specified offset position and within the specified space.
Further, the method also comprises:
After reading the data specified in the URL, continuing to read the data within the preset space;
Searching for a line separator from the last line separator in the data specified in the URL to the end of the preset space;
If a line separator is found, discarding the data after the line separator;
Otherwise, retaining all data.
Further, the method also comprises:
Scanning the cached data within a preset range and determining the line separators within the preset range.
Further, the preset space is 4 MB.
In another aspect, an embodiment of the present invention further provides a device for loading segmented data blocks, comprising:
A judging unit, configured to judge whether the offset address of a received data block is equal to 0, and if it is equal to 0, to read the data specified in the URL;
A first searching unit, configured to search, if the offset address is greater than 0, for a line separator within the range from the first line separator before the offset address to the preset space after the offset address;
A first discarding unit, configured to discard the data before the line separator if a line separator is found;
A second discarding unit, configured to discard all data within the preset space after the offset address.
Further, the data specified in the URL comprises:
The data at the specified offset position and within the specified space.
Further, the device also comprises:
A reading unit, configured to continue reading the data within the preset space after the data specified in the URL has been read;
A second searching unit, configured to search for a line separator from the last line separator in the data specified in the URL to the end of the preset space;
A third discarding unit, configured to discard the data after the line separator if a line separator is found;
A retaining unit, configured to retain all data.
Further, the device also comprises:
A scanning unit, configured to scan the cached data within a preset range and determine the line separators within the preset range.
Further, the preset space is 4 MB.
The embodiments of the present invention obtain the data specified in the URL by receiving the offset address of a data block, so that each loading node can determine the data content according to its own data block and data can be loaded in parallel. The load among the loading nodes can be balanced and the overall loading speed improved.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of the method for loading segmented data blocks provided by Embodiment one of the present invention;
Fig. 2 is a schematic flowchart of the method for loading segmented data blocks provided by Embodiment two of the present invention;
Fig. 3 is a schematic flowchart of the method for loading segmented data blocks provided by Embodiment three of the present invention;
Fig. 4 is a schematic structural diagram of the device for loading segmented data blocks provided by Embodiment four of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Embodiment one
Fig. 1 is a schematic flowchart of the method for loading segmented data blocks provided by Embodiment one of the present invention. This embodiment is applicable to the case where the loading nodes of a distributed database system load data in parallel. The method can be executed by a device for loading segmented data blocks; the device can be implemented in software and/or hardware and can be integrated in a loading node of the distributed database system.
Referring to Fig. 1, the method for loading segmented data blocks comprises:
S110: judge whether the offset address of the received data block is equal to 0; if it is equal to 0, read the data specified in the URL.
In the embodiments of the present invention, a single computer node that executes file loading tasks in the database cluster is a loading node. When loading a data file of GB size or larger, the database cluster can, according to a load-balancing strategy, perform coarse-grained logical segmentation of the large data file and distribute the resulting file fragment information to different cluster nodes. The file fragment information that the database cluster passes to a loading node contains two parameters: the file fragment offset address and the file fragment length. These parameters are expressed as a URL suffix of the form #offset=value&length=value.
File fragment offset address: called the offset address or offset for short; it is the distance from the beginning of the file to the first byte of the file fragment, in bytes.
File fragment length: called the fragment length or length for short; it is the size of the file fragment, in bytes.
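As an illustration of the URL suffix form described above, the following Python sketch extracts the two fragment parameters. The function name `parse_fragment_info` and the example URL are hypothetical; the source specifies only the `#offset=value&length=value` suffix.

```python
from typing import Tuple
from urllib.parse import parse_qs, urlparse


def parse_fragment_info(url: str) -> Tuple[int, int]:
    """Return (offset, length) in bytes from a '#offset=..&length=..' suffix.

    Hypothetical helper: the name and the error behaviour (KeyError when a
    parameter is missing) are assumptions made for this sketch.
    """
    # Everything after '#' is the URL fragment; it has query-string syntax.
    params = parse_qs(urlparse(url).fragment)
    return int(params["offset"][0]), int(params["length"][0])


# Example URL invented for illustration only.
offset, length = parse_fragment_info(
    "http://example.com/data/big.csv#offset=1048576&length=4194304"
)
```

Here `offset` is the distance in bytes from the start of the file, and `length` is the fragment size in bytes, matching the two definitions above.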
Data is loaded row by row, and each data row is added to a database table as one record. Because the cluster segments the file at coarse granularity, using only file size as the measure, the file position indicated by offset is not necessarily the first byte of a data row in the data file, and the file position indicated by offset+length is not necessarily just after the last byte of a data row. To ensure that the data loaded by all loading nodes together covers all the file data, and that the data loaded by different nodes does not overlap, each loading node is required to relocate the boundaries of its file fragment with a stable algorithm. The processed file fragment information is called effective file fragment information; its parameters are the effective offset address offset' and the effective fragment length length'. For a loading node, if the offset address is 0, the data block to be loaded is the first data block, and the node directly reads the data specified in the URL.
S120: if it is greater than 0, search for a line separator within the range from the first line separator before the offset address to the preset space after the offset address.
During data loading, the maximum supported length of a single data row is specified as 4 MB; correct loading cannot be guaranteed beyond this length. This ensures that data rows of less than 4 MB (including the line separator) can be processed correctly. If the offset address of the data block is greater than 0, the data block is not the first data block, and the first line separator must be searched for within the range from the first line separator before the offset address to the preset space after the offset address, to ensure that the loaded data is complete. The preset space is 4 MB, and this 4 MB space can be regarded as the head block.
S130: if a line separator is found, discard the data before the line separator.
If a line separator is found, the boundary of the corresponding data row is determined. The data before the line separator is data loaded by another node, so it is discarded.
S140: if no line separator is found, discard all data within the preset space after the offset address.
If no line separator is found, the data within the preset space is not data to be loaded by this node, and all data within the preset space must be discarded.
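Steps S110 to S140 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the whole file is held in memory as `data`, the name `effective_start` is hypothetical, and the search is assumed to begin one separator length before the offset address, mirroring the `sep_len` convention the description gives for the tail block.

```python
PRESET_SPACE = 4 * 1024 * 1024  # 4 MB head block, as stated in the description


def effective_start(data: bytes, offset: int, sep: bytes = b"\n",
                    preset: int = PRESET_SPACE) -> int:
    """Return the position at which this node should start loading.

    Assumption: the search window is [offset - len(sep), offset + preset].
    """
    if offset == 0:
        # S110: first data block, read the specified data directly.
        return 0
    lo = max(offset - len(sep), 0)
    hi = min(offset + preset, len(data))
    pos = data.find(sep, lo, hi)  # S120: first separator in the window
    if pos == -1:
        # S140: nothing found, all data in the preset space is discarded.
        return hi
    # S130: data before (and including) the separator belongs to another node.
    return pos + len(sep)


rows = b"aaa\nbbb\nccc\n"
start = effective_start(rows, 5, preset=8)  # offset 5 lies inside row "bbb"
```

With this choice of window, a separator that ends exactly at the offset address is also found, so a row that starts exactly at the offset is kept by the node that receives that offset.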
The embodiment of the present invention obtains the data specified in the URL by receiving the offset address of the data block, so that each loading node can determine the data content according to its own data block. Data is loaded in parallel while the accuracy and uniqueness of the loaded data are guaranteed, the load among the loading nodes can be balanced, and the overall loading speed is improved.
Embodiment two
Fig. 2 is a schematic flowchart of the method for loading segmented data blocks provided by Embodiment two of the present invention. This embodiment builds on the above embodiment. Further, the method also comprises: after reading the data specified in the URL, continuing to read the data within the preset space; searching for a line separator from the last line separator in the data specified in the URL to the end of the preset space; and, if a line separator is found, discarding the data after the line separator.
Referring to Fig. 2, the method for loading segmented data blocks comprises:
S210: judge whether the offset address of the received data block is equal to 0; if it is equal to 0, read the data specified in the URL.
S220: if it is greater than 0, search for a line separator within the range from the first line separator before the offset address to the preset space after the offset address.
S230: if a line separator is found, discard the data before the line separator.
S240: if no line separator is found, discard all data within the preset space after the offset address.
S250: after reading the data specified in the URL, continue to read the data within the preset space.
For example, after the data of the length specified in the URL has been read, a further 4 MB of data is read; this 4 MB of data can be regarded as the tail block.
S260: search for a line separator from the last line separator in the data specified in the URL to the end of the preset space.
The first line separator is searched for from front to back in the interval from the position one line-separator length (sep_len) before the tail block to the end position of the tail block. That is, the scanning range can be regarded as [offset1+length1-sep_len, offset1+length1+tail].
S270: if a line separator is found, discard the data after the line separator; otherwise retain all data.
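Steps S250 to S270 can be sketched in the same way. The Python below is a hedged illustration, not the patented implementation: it assumes the file is available in memory as `data`, the name `effective_end` is hypothetical, and the scan interval follows the range [offset1+length1-sep_len, offset1+length1+tail] given in the description.

```python
TAIL_SPACE = 4 * 1024 * 1024  # 4 MB tail block read past the fragment end


def effective_end(data: bytes, offset: int, length: int, sep: bytes = b"\n",
                  tail: int = TAIL_SPACE) -> int:
    """Return the position just past the last byte this node should load."""
    end = offset + length
    lo = max(end - len(sep), 0)         # one separator length before the tail block
    hi = min(end + tail, len(data))     # end of the tail block (or end of file)
    pos = data.find(sep, lo, hi)        # S260: first separator, front to back
    if pos == -1:
        # S270, "otherwise" branch: retain all data that was read.
        return hi
    # S270: keep up to and including the separator, discard what follows.
    return pos + len(sep)


rows = b"aaa\nbbb\nccc\n"
end = effective_end(rows, 0, 5, tail=8)  # fragment ends inside row "bbb"
```

In this sketch the node whose fragment ends inside a row reads on into the tail block and finishes that row, which is the counterpart of the next node discarding the same bytes at the head of its fragment.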
This embodiment adds the following steps: after reading the data specified in the URL, continuing to read the data within the preset space; searching for a line separator from the last line separator in the data specified in the URL to the end of the preset space; and, if a line separator is found, discarding the data after the line separator. In this way the data to be loaded can be determined effectively and the integrity of the loaded data is guaranteed.
Embodiment three
Fig. 3 is a schematic flowchart of the method for loading segmented data blocks provided by Embodiment three of the present invention. This embodiment builds on the above embodiments. Further, the method also comprises: scanning the cached data within a preset range and determining the line separators within the preset range.
Referring to Fig. 3, the method for loading segmented data blocks comprises:
S310: judge whether the offset address of the received data block is equal to 0; if it is equal to 0, read the data specified in the URL.
S320: if it is greater than 0, search for a line separator within the range from the first line separator before the offset address to the preset space after the offset address.
S330: if a line separator is found, discard the data before the line separator.
S340: if no line separator is found, discard all data within the preset space after the offset address.
S350: after reading the data specified in the URL, continue to read the data within the preset space.
S360: search for a line separator from the last line separator in the data specified in the URL to the end of the preset space.
S370: if a line separator is found, discard the data after the line separator; otherwise retain all data.
S380: scan the cached data within the preset range and determine the line separators within the preset range.
When the line separator is longer than 1 byte and its two characters \r and \n fall on the front or rear boundary of a 4 MB head or tail block, a cut between \r and \n would cause node 1 to produce erroneous data because of the incomplete separator at the end of its data, and node 2 to produce erroneous data because of the superfluous incomplete separator at the head of its data. In this embodiment, a suitably sized lookup box (shown as a red dashed box in the figure) is set up, which enables cache scanning and improves the efficiency of searching for line separators. The minimum size of the lookup box is (sep_len+4 MB), so it can hold a complete data row at once and prevents the line separator from being cut in the middle, so that node 1 and node 2 can segment and load the data correctly.
This embodiment adds the step of scanning the cached data within the preset range and determining the line separators within it. This avoids erroneous data between nodes caused by incomplete line separators, so that the data can be segmented and loaded correctly.
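The split-separator case can be illustrated with a small Python sketch. It is an assumption-laden illustration rather than the patented implementation: the names `boundary` and `effective_range` are hypothetical, the file is held in memory, and the lookup box is modelled as a search window of `len(sep) + box` bytes that starts one separator length before the cut position.

```python
def boundary(data: bytes, pos: int, sep: bytes, box: int) -> int:
    """Move a cut position to just after the first complete separator
    found in [pos - len(sep), pos + box]; position 0 is left unchanged."""
    if pos == 0:
        return 0
    lo = max(pos - len(sep), 0)
    hi = min(pos + box, len(data))
    hit = data.find(sep, lo, hi)
    return hi if hit == -1 else hit + len(sep)


def effective_range(data: bytes, offset: int, length: int,
                    sep: bytes = b"\r\n", box: int = 4 * 1024 * 1024):
    """Return (start, end) of the rows one node loads for its fragment."""
    return (boundary(data, offset, sep, box),
            boundary(data, offset + length, sep, box))


data = b"a,1\r\nb,2\r\nc,3\r\n"
cut = 4  # the coarse cut falls between "\r" and "\n" of the first row
s1, e1 = effective_range(data, 0, cut)                # node 1
s2, e2 = effective_range(data, cut, len(data) - cut)  # node 2
# Node 1 keeps the first row; node 2 keeps the second and third rows.
node1_rows, node2_rows = data[s1:e1], data[s2:e2]
```

Because both nodes search from one separator length before the same cut position, they find the same complete "\r\n" and agree on the boundary, so no row is loaded twice or dropped in this example.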
Embodiment four
Fig. 4 is a schematic structural diagram of the device for loading segmented data blocks provided by Embodiment four of the present invention. As shown in Fig. 4, the device comprises:
A judging unit 410, configured to judge whether the offset address of a received data block is equal to 0, and if it is equal to 0, to read the data specified in the URL;
A first searching unit 420, configured to search, if the offset address is greater than 0, for a line separator within the range from the first line separator before the offset address to the preset space after the offset address;
A first discarding unit 430, configured to discard the data before the line separator if a line separator is found;
A second discarding unit 440, configured to discard all data within the preset space after the offset address.
Further, the data specified in the URL comprises:
The data at the specified offset position and within the specified space.
Further, the device also comprises:
A reading unit 450, configured to continue reading the data within the preset space after the data specified in the URL has been read;
A second searching unit 460, configured to search for a line separator from the last line separator in the data specified in the URL to the end of the preset space;
A third discarding unit 470, configured to discard the data after the line separator if a line separator is found;
A retaining unit 480, configured to retain all data.
Further, the device also comprises:
A scanning unit 490, configured to scan the cached data within a preset range and determine the line separators within the preset range.
Further, the preset space is 4 MB.
The embodiment of the present invention obtains the data specified in the URL by receiving the offset address of the data block, so that each loading node can determine the data content according to its own data block and data can be loaded in parallel. The load among the loading nodes can be balanced and the overall loading speed improved.
A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments can be completed by hardware under the control of program instructions. The program can be stored in a computer-readable storage medium; when executed, the program performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as ROM, RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A method for loading segmented data blocks, characterized by comprising:
Judging whether the offset address of a received data block is equal to 0, and if it is equal to 0, reading the data specified in the URL;
If it is greater than 0, searching for a line separator within the range from the first line separator before the offset address to the preset space after the offset address;
If a line separator is found, discarding the data before the line separator;
Otherwise, discarding all data within the preset space after the offset address.
2. The method according to claim 1, characterized in that the data specified in the URL comprises:
The data at the specified offset position and within the specified space.
3. The method according to claim 2, characterized in that the method further comprises:
After reading the data specified in the URL, continuing to read the data within the preset space;
Searching for a line separator from the last line separator in the data specified in the URL to the end of the preset space;
If a line separator is found, discarding the data after the line separator;
Otherwise, retaining all data.
4. The method according to claim 3, characterized in that the method further comprises:
Scanning the cached data within a preset range and determining the line separators within the preset range.
5. The method according to claim 1 or 3, characterized in that the preset space is 4 MB.
6. A device for loading segmented data blocks, characterized by comprising:
A judging unit, configured to judge whether the offset address of a received data block is equal to 0, and if it is equal to 0, to read the data specified in the URL;
A first searching unit, configured to search, if the offset address is greater than 0, for a line separator within the range from the first line separator before the offset address to the preset space after the offset address;
A first discarding unit, configured to discard the data before the line separator if a line separator is found;
A second discarding unit, configured to discard all data within the preset space after the offset address.
7. The device according to claim 6, characterized in that the data specified in the URL comprises:
The data at the specified offset position and within the specified space.
8. The device according to claim 7, characterized in that the device further comprises:
A reading unit, configured to continue reading the data within the preset space after the data specified in the URL has been read;
A second searching unit, configured to search for a line separator from the last line separator in the data specified in the URL to the end of the preset space;
A third discarding unit, configured to discard the data after the line separator if a line separator is found;
A retaining unit, configured to retain all data.
9. The device according to claim 8, characterized in that the device further comprises:
A scanning unit, configured to scan the cached data within a preset range and determine the line separators within the preset range.
10. The device according to claim 6 or 8, characterized in that the preset space is 4 MB.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610777791.8A CN106354831A (en) | 2016-08-31 | 2016-08-31 | Method and device for loading segmented data blocks |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610777791.8A CN106354831A (en) | 2016-08-31 | 2016-08-31 | Method and device for loading segmented data blocks |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106354831A true CN106354831A (en) | 2017-01-25 |
Family
ID=57857134
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610777791.8A Pending CN106354831A (en) | 2016-08-31 | 2016-08-31 | Method and device for loading segmented data blocks |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106354831A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102663090A (en) * | 2012-04-10 | 2012-09-12 | 华为技术有限公司 | Method and device for inquiry metadata |
CN102841860A (en) * | 2012-08-17 | 2012-12-26 | 珠海世纪鼎利通信科技股份有限公司 | Large data volume information storage and access method |
CN103164538A (en) * | 2013-04-11 | 2013-06-19 | 深圳市华力特电气股份有限公司 | Method and device for analyzing data |
CN103544285A (en) * | 2013-10-28 | 2014-01-29 | 华为技术有限公司 | Data loading method and device |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115292373A (en) * | 2022-10-09 | 2022-11-04 | 天津南大通用数据技术股份有限公司 | Method and device for segmenting data block |
CN115292373B (en) * | 2022-10-09 | 2023-01-24 | 天津南大通用数据技术股份有限公司 | Method and device for segmenting data block |
CN115292420A (en) * | 2022-10-10 | 2022-11-04 | 天津南大通用数据技术股份有限公司 | Method and device for rapidly loading data in distributed database |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3678346A1 (en) | Blockchain smart contract verification method and apparatus, and storage medium | |
CN103699585B (en) | Methods, devices and systems for file metadata storage and file recovery | |
TWI499909B (en) | Hierarchical immutable content-addressable memory processor | |
CN105630955B (en) | A kind of data acquisition system member management method of high-efficiency dynamic | |
US9871727B2 (en) | Routing lookup method and device and method for constructing B-tree structure | |
CN105095116A (en) | Cache replacing method, cache controller and processor | |
CN107122130B (en) | Data deduplication method and device | |
CN103729303A (en) | Data writing and data reading methods of Flash | |
CN103106158A (en) | Memory system including key-value store | |
CN105117351A (en) | Method and apparatus for writing data into cache | |
CN105677904B (en) | Small documents storage method and device based on distributed file system | |
CN104238962A (en) | Method and device for writing data into cache | |
CN106407224A (en) | Method and device for file compaction in KV (Key-Value)-Store system | |
CN108241632A (en) | A kind of data verification method of data base-oriented Data Migration | |
CN106354831A (en) | Method and device for loading segmented data blocks | |
CN103914483A (en) | File storage method and device and file reading method and device | |
CN111159140B (en) | Data processing method, device, electronic equipment and storage medium | |
CN103699435B (en) | Load balancing method and device | |
CN101741708A (en) | Method, device and system for storing data | |
CN114297368A (en) | Efficient keyword filtering method realized in FPGA (field programmable Gate array) way | |
CN106254270A (en) | A kind of queue management method and device | |
CN110018794A (en) | A kind of rubbish recovering method, device, storage system and readable storage medium storing program for executing | |
CN104750846A (en) | Method and device for finding substring | |
CN105279166A (en) | File management method and system | |
CN104077555A (en) | Method and device for identifying badcase in image search |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170125 | |