CN107480205A - Method and apparatus for data partitioning - Google Patents
- Publication number
- CN107480205A (application number CN201710606901.9A)
- Authority
- CN
- China
- Prior art keywords
- data
- partition
- split
- subregion
- data partition
- Legal status (as listed by Google Patents; not a legal conclusion)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method and apparatus for data partitioning, relating to the field of computer technology. One embodiment of the method includes the following steps:

- Step S101: obtain the size of each data partition.
- Step S102: determine whether there is a data partition that needs to be split.
- Step S103: when such a data partition exists, perform a split operation on it according to a predefined partition field, to obtain multiple new data partitions.
- Step S104: repeat steps S101 to S103 until step S102 determines that no data partition needs to be split.

The embodiment solves the problems of high error rate, wasted resources and low efficiency caused by manual operation. It can also record the details of each split operation, which provides a basis for later lookup of how the data was partitioned and facilitates system administration.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a method and apparatus for data partitioning.
Background
When the data volume is very large and the data is frequently filtered or grouped by a certain field, partitioning the data can be considered. Consider the sales of a certain commodity, where the sales details or totals for a particular month or quarter are often checked. Here the data can be partitioned by sales date, one partition per month. Preferably, different partitions are stored on different physical hard disks. A query can then go directly to the specified disk, which scans less data and runs faster. A query over all months can run on several disks in parallel, which also significantly improves speed.
Database partitioning is a physical database design technique. Partitioning can achieve many effects, such as improved performance and simplified management. Its main purpose, however, is to reduce the total amount of data read and written in specific database operations, thereby reducing response time and improving query efficiency.
At present, the conventional process of database partitioning mainly includes:
1. Determining from experience, by querying the data, whether partitioning is needed. For example, when the response time of a data query exceeds a certain threshold, one may judge from experience that the data volume of that partition is too large;
2. Setting a partition identification based on professional knowledge, and partitioning according to that identification;
3. Creating the partitions by executing the partitioning instruction of the database language that corresponds to the database type.
In the course of realizing the present invention, the inventors found that the prior art has at least the following problems:
1. The prior art requires manual intervention and judgment based on experience. When a table has already been partitioned but some partition is still too large, it must be partitioned again manually;
2. The partition identification must be set manually, based on professional knowledge, during the operation;
3. Partition records and operating guidance have to be compiled manually for later lookup.
In summary, the prior art relies too heavily on manual intervention, which leads to a high probability of error, wasted resources and low efficiency. Moreover, there is no written execution standard or record, which hinders system administration.
Summary of the invention
In view of this, embodiments of the present invention provide a method and apparatus for data partitioning. They solve the problems of high error rate, wasted resources and low efficiency caused by manual operation. They can also record the details of each split operation, providing a basis for later lookup of how the data was partitioned and facilitating system administration.
To achieve the above object, according to one aspect of the embodiments of the present invention, a method for data partitioning is provided.
A method for data partitioning includes the following steps:

- Step S101: obtain the size of each data partition.
- Step S102: determine whether there is a data partition that needs to be split.
- Step S103: when such a data partition exists, perform a split operation on it according to a predefined partition field, to obtain multiple new data partitions. The storage path of each new data partition consists of the storage path of the data partition to be split plus the partition identification of the new data partition, and the partition identification is generated from the partition field.
- Step S104: repeat steps S101 to S103 until step S102 determines that no data partition needs to be split.
Optionally, step S102 includes determining whether a data partition needs to be split according to whether the size of each data partition exceeds a preset threshold. If the size of some data partition exceeds the preset threshold, a data partition needing to be split exists; otherwise, none exists.
Optionally, the step of performing the split operation on the data partition to be split according to the predefined partition field includes:

- appending the partition field to the end of each row of the data partition to be split;
- determining, from the partition field, the storage path of each row for the current split operation;
- creating new data partitions according to those storage paths;
- saving each row to its corresponding new data partition according to that storage path.
Optionally, before the step of performing the split operation on the data partition to be split according to the predefined partition field, the method further includes recording the partition name of the data partition to be split.
Optionally, the partition field includes characters and a special character, and every two adjacent characters are separated by the special character.
According to another aspect of the embodiments of the present invention, an apparatus for data partitioning is provided.
An apparatus for data partitioning includes an acquisition module, a judging module and a splitting module:

- The acquisition module obtains the size of each data partition.
- The judging module determines whether there is a data partition that needs to be split. If so, it invokes the splitting module and then checks again, until no data partition needs to be split.
- The splitting module performs a split operation on each data partition that needs to be split, according to a predefined partition field, to obtain multiple new data partitions. The storage path of each new data partition consists of the storage path of the data partition to be split plus the partition identification of the new data partition, and the partition identification is generated from the partition field.
Optionally, the judging module is further configured to determine whether a data partition needs to be split according to whether the size of each data partition exceeds a preset threshold. If the size of some data partition exceeds the preset threshold, a data partition needing to be split exists; otherwise, none exists.
Optionally, the splitting module is configured to:

- append the partition field to the end of each row of the data partition to be split;
- determine, from the partition field, the storage path of each row for the current split operation;
- create new data partitions according to those storage paths;
- save each row to its corresponding new data partition according to that storage path.
Optionally, the apparatus further includes a partition record module. This module records the partition name of the data partition to be split before the split operation is performed on it according to the predefined partition field.
Optionally, the partition field includes characters and a special character, and every two adjacent characters are separated by the special character.
According to yet another aspect of the embodiments of the present invention, a terminal for data partitioning is provided.
A terminal for data partitioning includes one or more processors and a storage device that stores one or more programs. When the one or more programs are executed by the one or more processors, they cause the processors to implement the data partitioning method provided by the embodiments of the present invention.
According to still another aspect of the embodiments of the present invention, a computer-readable medium is provided.
A computer-readable medium stores a computer program which, when executed by a processor, implements the data partitioning method provided by the embodiments of the present invention.
An embodiment of the above invention has the following advantages or beneficial effects:
Any data partition exceeding the predetermined threshold is split recursively into multiple new data partitions with a hierarchical relationship. This removes the uncertainty of manual operation and solves the problems of high error rate, wasted resources and low efficiency that it causes. In addition, the details of each split operation can be recorded, providing a basis for later lookup of how the data was partitioned and facilitating system administration.
Further effects of the above optional implementations are described below in connection with the embodiments.
Brief description of the drawings
The drawings are provided for a better understanding of the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a schematic diagram of the main steps of the data partitioning method according to an embodiment of the present invention;
Fig. 2 is an implementation flowchart according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the main modules of the data partitioning apparatus according to an embodiment of the present invention;
Fig. 4 is an exemplary system architecture to which embodiments of the present invention can be applied;
Fig. 5 is a schematic structural diagram of a computer system suitable for implementing the terminal of an embodiment of the present invention.
Detailed description of embodiments
Exemplary embodiments of the present invention are described below with reference to the drawings. Their various details are included to aid understanding and should be regarded as merely exemplary. Those of ordinary skill in the art will therefore recognize that various changes and modifications can be made to the embodiments described here without departing from the scope and spirit of the invention. Likewise, descriptions of well-known functions and structures are omitted below for clarity and conciseness.
Fig. 1 is a schematic diagram of the main steps of the data partitioning method according to an embodiment of the present invention. As shown in Fig. 1, the method mainly includes steps S101 to S104.
Step S101: Obtain the size of each data partition.
When a data table, or a data partition of a table, needs to be partitioned, the size of each current data partition must first be obtained. When the table is partitioned for the first time, the whole table is one data partition. The sizes can be obtained with the Hadoop command "hadoop fs -du -s -h <HDFS address>". The command only needs the address of the data table and returns the sizes of the data partitions at that path. For example, "hadoop fs -du -s -h hdfs://<node>/user/<user name>/<database name>/<table name>" returns the sizes of the data partitions of that table.
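Step S101 can be sketched in Python as follows, assuming a Hadoop client is installed and on PATH.

Per the Hadoop FileSystem shell documentation, "-du -s" prints one aggregate total per path argument. By contrast, "-du" without "-s" lists each file and directory directly under the given path. The sketch therefore calls "hadoop fs -du <path>" without "-h", so that sizes stay in bytes, and parses the output.

The function names parse_du_output and partition_sizes are hypothetical. The parser also assumes that paths contain no whitespace.

```python
import subprocess
from typing import Dict


def parse_du_output(text: str) -> Dict[str, int]:
    """Map each listed path to its size in bytes.

    Assumes the output of `hadoop fs -du <dir>` without -h: every line starts
    with the size in bytes and ends with the path. Depending on the Hadoop
    version, a middle column (disk space consumed including replicas) may
    also be printed; it is ignored here.
    """
    sizes: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # skip blank or unexpected lines
        sizes[parts[-1]] = int(parts[0])
    return sizes


def partition_sizes(table_path: str) -> Dict[str, int]:
    """Run `hadoop fs -du` on a table directory and parse the result."""
    result = subprocess.run(
        ["hadoop", "fs", "-du", table_path],
        capture_output=True, text=True, check=True,
    )
    return parse_du_output(result.stdout)
```

The parsing is separated from the shell call so that it can be checked without a cluster.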
Step S102: Determine whether there is a data partition that needs to be split.
In an embodiment of the present invention, this can be determined by whether the size of each data partition exceeds a preset threshold:
If the size of some data partition exceeds the preset threshold, it is determined that a data partition needing to be split exists;
Otherwise, it is determined that no data partition needs to be split.
Before each partitioning of a data table, or of a data partition of a table, the threshold for data partitions must be preset, for example 100 GB. The threshold can be set manually, or the default value from the Hadoop configuration can be used.

The threshold can be adjusted to the actual application, balancing search efficiency against maintenance workload. If partitions are too small, data maintenance may involve more work; if they are too large, search efficiency drops. However, the threshold must be larger than the size of any single record, because a single record cannot be split further.

After the size of each data partition is obtained, each size is compared with the threshold to determine whether any data partition needs to be split.
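The comparison in step S102 can be sketched as follows. The 100 GB value is the example from the text, taken here as binary gigabytes (an assumption). The function name partitions_to_split is hypothetical, and "exceeds" is read as "strictly greater than".

```python
from typing import Dict, List

GB = 1024 ** 3                 # binary gigabyte (assumption)
DEFAULT_THRESHOLD = 100 * GB   # example value from the text; tune per workload


def partitions_to_split(sizes: Dict[str, int],
                        threshold: int = DEFAULT_THRESHOLD) -> List[str]:
    """Return, sorted, the partitions whose size exceeds the threshold.

    An empty list means no data partition needs to be split (step S102).
    """
    return sorted(path for path, size in sizes.items() if size > threshold)
```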
Step S103: When a data partition needing to be split exists, perform a split operation on it according to the predefined partition field to obtain multiple new data partitions. The storage path of each new data partition consists of the storage path of the data partition to be split plus the partition identification of the new data partition. The partition identification is generated from the partition field.
Before each partitioning of a data table, or of a data partition of a table, three further parameters must be preset: the partition field identification for this split operation, its position, and the field separator (for example \001). From these three parameters, the partition field used by this split operation can be obtained.

For example, suppose a data table and its data partitions contain the field identifications "name", "sex", "age" and "height". Joined by the field separator, they read "name\001sex\001age\001height". Given the position of the partition field identification (for example 3), the corresponding partition field identification is "age". From it, the partition field of each row can be obtained.
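A minimal sketch of how the preset position and separator select the partition field. It makes two assumptions:

- the separator written \001 above is the control character U+0001, which is Hive's default field delimiter;
- the position is 1-based, as in the "age" example.

The function name field_at is hypothetical.

```python
FIELD_SEPARATOR = "\x01"  # the \001 separator (Ctrl-A), assumed


def field_at(line: str, position: int, sep: str = FIELD_SEPARATOR) -> str:
    """Return the value at a 1-based column position of a delimited line."""
    columns = line.split(sep)
    if not 1 <= position <= len(columns):
        raise ValueError(f"position {position} outside 1..{len(columns)}")
    return columns[position - 1]
```

Applied to the header line "name\x01sex\x01age\x01height" with position 3, it yields "age". Applied to a data row, it yields that row's value of the partition field.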
The process of performing the split operation on the data partition to be split according to the predefined partition field is mainly as follows:
Step S1031: append the partition field to the end of each row of the data partition to be split;
Step S1032: determine, from the partition field, the storage path of each row for the current split operation;
Step S1033: create new data partitions according to the storage paths;
Step S1034: save each row to its corresponding new data partition according to its storage path for the current split operation.
Step S1031 can use a prewritten MR (MapReduce) program to append one partition-field column to the end of each row of the data partition to be split. The partition field consists of characters and a special character, and every two adjacent characters are separated by the special character.

For example, let the partition field identification be event_id, and suppose that for a certain row of the data partition to be split, the value of that field is "ABC123". The characters of the partition field appended to that row are then "ABC123". Inserting the special character "@@" between every two characters gives the partition field "A@@B@@C@@1@@2@@3". The special character is generally one that never occurs in the data, such as @ or &.
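Step S1031 can be sketched per row as below. The patent performs it inside a MapReduce program; this plain Python version only illustrates the transformation. It assumes "@@" as the special character and \001 as the column separator, and its function names are hypothetical.

```python
SPECIAL = "@@"            # special character, assumed never to occur in the data
FIELD_SEPARATOR = "\x01"  # column separator (assumed \001)


def build_partition_field(value: str, special: str = SPECIAL) -> str:
    """Insert the special character between every two characters of value."""
    # Any character of the special sequence inside the data would make the
    # later split on `special` ambiguous, so reject such values.
    if set(special) & set(value):
        raise ValueError("the special character must not occur in the data")
    return special.join(value)


def append_partition_field(row: str, value: str,
                           sep: str = FIELD_SEPARATOR) -> str:
    """Append the partition field as an extra trailing column of the row."""
    return row + sep + build_partition_field(value)
```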
In step S1032, the storage path of each row for the current split operation is determined from the characters and special characters in the partition field. The storage path consists of the storage path of the data partition to be split plus the partition identification of the new data partition. The partition identification is obtained from the corresponding character.
Define a split count j, a natural number greater than 0 with initial value 1. It represents how many split operations have been performed on a given data partition to be split. Because that partition may be very large, the new data partitions produced by one split operation may still exceed the predetermined threshold. In that case they must be split again, and this second split is regarded as the second split operation performed on the original data partition to be split.
The partition identification of a new data partition is obtained from a character. The characters in the partition field have sequence numbers that correspond to the split count j. In other words, the partition identification of the new data partition is the character whose sequence number equals j.

Take the characters "ABC123" as an example. Each character is assigned a sequence number, a natural number starting at 1: character "A" has sequence number 1, "B" has 2, "1" has 4, and so on. Thus:

- when j = 1, the partition identification of the new data partition is "A";
- when j = 2, it is "B";
- when j = 4, it is "1";

and so on.
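The sequence-number rule above can be sketched as a single lookup, assuming "@@" as the special character. The function name partition_identification is hypothetical.

```python
def partition_identification(partition_field: str, j: int,
                             special: str = "@@") -> str:
    """Return the character whose 1-based sequence number equals j."""
    characters = partition_field.split(special)
    if not 1 <= j <= len(characters):
        raise ValueError(f"split count {j} outside 1..{len(characters)}")
    return characters[j - 1]
```

With the partition field "A@@B@@C@@1@@2@@3", j = 1, 2 and 4 give "A", "B" and "1", matching the example.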
The storage path of a new data partition consists of the storage path of the data partition to be split plus the partition identification of the new data partition. Suppose the following:

- the storage path of the data partition to be split is X;
- the partition field identification is test_id;
- for a certain row, the value of the partition field is "ABC123", so the partition field appended to that row is "A@@B@@C@@1@@2@@3".

The storage path of the corresponding new data partition is then:

- when j = 1: X/test_id_partition=A;
- when j = 2: X/test_id_partition=A/test_id_partition=B;
- when j = 3: X/test_id_partition=A/test_id_partition=B/test_id_partition=C;

and so on, which gives the storage path of this row for each split operation.

As another example, suppose a data partition with path X contains 4 rows, and the second split operation (j = 2) is performed according to the predetermined partition field identification "event_id". The storage path of the new data partition for each row is shown in Table 1.
Table 1
event_id | Partition field | Storage path when j = 2 |
ABC | A@@B@@C | X/event_id_partition=A/event_id_partition=B |
ADC | A@@D@@C | X/event_id_partition=A/event_id_partition=D |
ABD | A@@B@@D | X/event_id_partition=A/event_id_partition=B |
BDC | B@@D@@C | X/event_id_partition=B/event_id_partition=D |
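The path rule above, and the rows of Table 1, can be reproduced with the sketch below. It makes these assumptions:

- each level is named after the partition field identification followed by "_partition=", as in the examples;
- base is the storage path of the partition before its first split.

The name storage_path is hypothetical.

```python
def storage_path(base: str, field_id: str, partition_field: str, j: int,
                 special: str = "@@") -> str:
    """Storage path of a row after the j-th split operation.

    Each split appends one level named after the character whose sequence
    number equals the split count, so the path after split j holds the
    first j characters of the partition field.
    """
    characters = partition_field.split(special)
    if not 1 <= j <= len(characters):
        raise ValueError(f"split count {j} outside 1..{len(characters)}")
    levels = [f"{field_id}_partition={c}" for c in characters[:j]]
    return "/".join([base, *levels])
```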
Step S1032 determines the storage path of each row for the current split operation. Step S1033 is then performed: create new data partitions according to the storage paths. Different rows may share the same storage path for the current split operation; in that case, only one new data partition is created for that path. New data partitions can also be named when they are created. The naming rule can be set as needed, for example by simply using the path of the new data partition as its partition name.
After the new data partitions are created, step S1034 is performed: save each row to its corresponding new data partition according to its storage path for the current split operation.

For example, suppose a data partition to be split has storage path X, the partition field identification is test_id, and the partition contains 3 rows:

- the partition field of row 1 contains the characters "ABC123";
- that of row 2 contains "AED56";
- that of row 3 contains "CD24".

When the first split operation is performed on this partition, the partition fields of the 3 rows give two storage paths: X/test_id_partition=A and X/test_id_partition=C. Two new data partitions are created from these paths; if the storage path is used as the partition name, they are named X/test_id_partition=A and X/test_id_partition=C. Following the storage paths for this split operation, rows 1 and 2 are saved to X/test_id_partition=A, and row 3 is saved to X/test_id_partition=C.

In this way, the data partition to be split is divided into the two new data partitions X/test_id_partition=A and X/test_id_partition=C. Its data has been distributed between them, and the original data partition is replaced by the new data partitions resulting from the split.
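Steps S1033 and S1034 amount to grouping rows by their new storage path, creating one partition per distinct path. The in-memory sketch below reproduces the three-row example under these assumptions:

- rows are (row name, partition field) pairs;
- a dictionary stands in for the HDFS directories.

The name split_partition is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def split_partition(path: str, field_id: str, rows: List[Tuple[str, str]],
                    j: int, special: str = "@@") -> Dict[str, List[str]]:
    """Distribute the rows of the partition at `path` for split operation j.

    The new storage path is the path of the partition being split plus one
    level named after the row's j-th character. Rows sharing a path land in
    the same new partition, so each distinct path is created only once.
    """
    new_partitions: Dict[str, List[str]] = defaultdict(list)
    for name, partition_field in rows:
        characters = partition_field.split(special)
        if j > len(characters):
            raise ValueError(f"{name} has no character for split {j}")
        new_path = f"{path}/{field_id}_partition={characters[j - 1]}"
        new_partitions[new_path].append(name)
    return dict(new_partitions)


rows = [("data 1", "A@@B@@C@@1@@2@@3"),
        ("data 2", "A@@E@@D@@5@@6"),
        ("data 3", "C@@D@@2@@4")]
print(split_partition("X", "test_id", rows, 1))
# {'X/test_id_partition=A': ['data 1', 'data 2'], 'X/test_id_partition=C': ['data 3']}
```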
Step S104: Repeat steps S101 to S103 until step S102 determines that no data partition needs to be split.
As described above, after a split operation is performed on a data partition to be split, steps S101 and S102 are executed again to determine whether any data partition still needs to be split. If one does, step S103 is executed again. If none does, the partitioning of the data table, or of the data partition of the table, ends.
In addition, after the partitioning process described above is completed, the small files produced by different reduce nodes of the MR program can be merged. These small files hold the data of the data partition to be split, which was scattered across different files during the split operation. Merging them can improve query efficiency.
To make it easier to look up later how the data was partitioned, the partition name of the data partition to be split can be recorded before step S103 splits it according to the predefined partition field. These records form a standardized "partition list". Depending on query needs, the partition list can also record:

- the split time of each data partition to be split;
- its split count;
- the data partition paths obtained after each split.
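A "partition list" could be kept as simple records written just before each split. The sketch below is an in-memory illustration only, and its class and field names are hypothetical. A real deployment would persist the list, for example to a table or file.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class PartitionRecord:
    partition_name: str       # name of the data partition to be split
    split_time: datetime      # when the split operation started
    split_count: int          # j, the number of this split operation
    new_partitions: List[str] = field(default_factory=list)  # resulting paths


class PartitionList:
    """Standardized record of every split operation, for later lookup."""

    def __init__(self) -> None:
        self.records: List[PartitionRecord] = []

    def record_before_split(self, name: str, split_count: int) -> PartitionRecord:
        """Log a partition name before it is split; fill in results later."""
        record = PartitionRecord(name, datetime.now(), split_count)
        self.records.append(record)
        return record

    def history(self, name: str) -> List[PartitionRecord]:
        """All recorded splits of the partition with the given name."""
        return [r for r in self.records if r.partition_name == name]
```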
Through steps S101 to S104, any data partition exceeding the predetermined threshold is split recursively into multiple new data partitions with a hierarchical relationship. This removes the uncertainty of manual operation and solves the problems of high error rate, wasted resources and low efficiency that it causes. It also records the details of each split operation, providing a basis for later lookup of how the data was partitioned and facilitating system administration.
Fig. 2 is an implementation flowchart according to an embodiment of the present invention. As shown in Fig. 2, the flow is as follows:

1. Before the program starts, the partition parameters are set (step S201): the partition threshold, the partition field identification and its position, the field separator, and so on.
2. The size of each data partition is obtained (step S202).
3. Each size is compared with the preset partition threshold to determine whether there is a data partition to be split (step S203). If none exists, the program ends (step S208).
4. If one exists, a split operation is performed. It mainly includes:
   - appending the partition field to the end of each row of the data partition to be split (step S204);
   - determining the storage paths for this split operation from the partition field (step S205);
   - dynamically creating new data partitions according to the storage paths (step S206);
   - saving the data in the data partition to be split into the new data partitions (step S207), which completes this split operation.
5. The flow then returns to step S202 and repeats until no data partition to be split exists, at which point the program ends (step S208).
Fig. 3 is a schematic diagram of the main modules of the data partitioning apparatus according to an embodiment of the present invention. As shown in Fig. 3, the data partitioning apparatus 300 mainly includes an acquisition module 301, a judging module 302 and a splitting module 303.
The acquisition module 301 obtains the size of each data partition;
The judging module 302 determines whether there is a data partition that needs to be split. If so, it invokes the splitting module and then checks again, until no data partition needs to be split;
The splitting module 303 performs a split operation on each data partition that needs to be split, according to the predefined partition field, to obtain multiple new data partitions. The storage path of each new data partition consists of the storage path of the data partition to be split plus the partition identification of the new data partition. The partition identification is generated from the partition field.
According to an embodiment of the present invention, the judging module 302 can further be configured to:
determine whether a data partition needs to be split according to whether its size exceeds a preset threshold;
if the size of some data partition exceeds the preset threshold, determine that a data partition needing to be split exists;
otherwise, determine that no data partition needs to be split.
According to an embodiment of the present invention, the splitting module 303 can further be configured to perform the split operation through the following process:
append the partition field to the end of each row of the data partition to be split;
determine, from the partition field, the storage path of each row for the current split operation;
create new data partitions according to the storage paths;
save each row to its corresponding new data partition according to its storage path for the current split operation.
In addition, the apparatus 300 for data partitioning of the embodiments of the present invention may further include a partition recording module (not shown in the figure). Before the split operation is performed on a data partition according to the predefined partition field, this module records the partition name of the data partition that needs to be split.
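A partition recording module could be as simple as an append-only history that is written before each split. In the following hypothetical sketch, the class names, the in-memory list, and the timestamp field are illustrative choices. A real deployment would more likely persist the records, for example in a database table.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field


@dataclass
class SplitRecord:
    partition_name: str
    recorded_at: float  # seconds since the epoch


@dataclass
class PartitionRecorder:
    """Hypothetical partition recording module: keeps an in-memory history
    of every partition name recorded just before it was split."""
    history: list[SplitRecord] = field(default_factory=list)

    def record(self, partition_name: str) -> None:
        self.history.append(SplitRecord(partition_name, time.time()))

    def names(self) -> list[str]:
        return [r.partition_name for r in self.history]


recorder = PartitionRecorder()
for name in ["/orders/3", "/orders/3/f"]:
    recorder.record(name)  # record first, then perform the split on `name`
print(recorder.names())  # ['/orders/3', '/orders/3/f']
```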
According to the technical solution of the embodiments of the present invention, the partition field may consist of characters and a special character, with every two adjacent characters separated by the special character.
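One reading of this field format is that the separator makes each character individually addressable, so that each character can serve as the identifier for one level of the partition hierarchy. Both that mapping and the choice of "_" as the special character are assumptions in this sketch. The check rejects characters that collide with the separator, because such a field could not be split back into levels unambiguously.

```python
from __future__ import annotations

SEP = "_"  # hypothetical choice of special character


def build_partition_field(chars: str, sep: str = SEP) -> str:
    """Join characters so that every two adjacent characters are separated by sep."""
    if sep in chars:
        raise ValueError("the special character must not appear among the field characters")
    return sep.join(chars)


def field_levels(field: str, sep: str = SEP) -> list[str]:
    """Recover the per-level characters, one per level of the partition hierarchy."""
    return field.split(sep)


f = build_partition_field("3f0a")
print(f)                # 3_f_0_a
print(field_levels(f))  # ['3', 'f', '0', 'a']
```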
According to the technical solution of the embodiments of the present invention, a data partition that exceeds the predetermined threshold is recursively divided into multiple new data partitions with a hierarchical relationship. This removes the high error rate, resource waste and low efficiency of manual partitioning. The details of each split operation can also be recorded, which later makes it possible to look up how the data was partitioned and simplifies system management.
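Steps S101 to S104 can be combined into a single loop. This is an in-memory sketch under stated assumptions:
- Each partition holds only row keys.
- A partition's size is its row count.
- The partition field is a hypothetical MD5-derived string, as in the earlier sketches.

Because that field has a fixed number of characters, the loop also stops at the deepest level the field supports. Without this guard, a partition full of identical keys would never shrink below the threshold and the loop would not terminate.

```python
from __future__ import annotations

import hashlib
import posixpath

SEP = "_"  # hypothetical special character between field characters


def partition_field(key: str, levels: int = 4) -> str:
    """Hypothetical partition field: first `levels` hex chars of MD5(key), joined by SEP."""
    return SEP.join(hashlib.md5(key.encode("utf-8")).hexdigest()[:levels])


def repartition(root: str, keys: list[str], threshold: int,
                levels: int = 4) -> tuple[dict[str, list[str]], list[str]]:
    """In-memory sketch of steps S101-S104.

    `partitions` maps each leaf storage path to its row keys, and a partition's
    size is simply its row count. Returns the final leaf partitions and the
    names of the partitions that were split, in the order they were split.
    """
    partitions: dict[str, list[str]] = {root: list(keys)}
    depth_of = {root: 0}
    split_log: list[str] = []
    while True:
        # S101: obtain the size of each data partition.
        sizes = {path: len(rows) for path, rows in partitions.items()}
        # S102: partitions above the threshold still need splitting, unless the
        # partition field has no characters left for a deeper level.
        to_split = [p for p, s in sizes.items() if s > threshold and depth_of[p] < levels]
        if not to_split:
            return partitions, split_log
        # S103: split each oversized partition one level deeper.
        for path in to_split:
            split_log.append(path)  # record the partition name before splitting
            depth = depth_of[path]
            for key in partitions.pop(path):
                child = posixpath.join(path, partition_field(key, levels).split(SEP)[depth])
                partitions.setdefault(child, []).append(key)
                depth_of[child] = depth + 1
        # S104: repeat from S101 until S102 finds nothing to split.


parts, log = repartition("/orders", [f"order-{i}" for i in range(200)], threshold=20)
print(len(parts), "leaf partitions; split order starts with", log[:3])
```

Only leaf partitions are kept in the dictionary, since a split partition's rows move entirely into its children. The hierarchy is therefore visible purely through the paths.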
Fig. 4 shows an exemplary system architecture 400 to which the method or the apparatus for data partitioning of the embodiments of the present invention may be applied (it may be adjusted to suit the specific case).
As shown in Fig. 4, the system architecture 400 may include terminal devices 401, 402 and 403, a network 404 and a server 405. This architecture is only an example, and the components of a specific architecture may be adjusted to the application. The network 404 is the medium that provides communication links between the terminal devices 401, 402, 403 and the server 405. It may include various connection types, such as wired or wireless communication links or fiber optic cables.
A user may use the terminal devices 401, 402, 403 to interact with the server 405 over the network 404, for example to receive or send messages. Various communication client applications may be installed on the terminal devices 401, 402, 403. Examples include shopping applications, web browser applications, search applications, instant messaging tools, email clients and social platform software.
The terminal devices 401, 402, 403 may be any electronic devices that have a display screen and support web browsing. These include, but are not limited to, smartphones, tablet computers, laptop computers and desktop computers.
The server 405 may be a server that provides various services. One example is a back-end management server supporting the shopping websites that users browse with the terminal devices 401, 402, 403. The back-end management server may analyze and otherwise process received data such as information query requests. It then feeds the processing results back to the terminal devices, for example targeted push information or product information.
It should be noted that the method for data partitioning provided by the embodiments of the present invention is generally executed by the server 405. Accordingly, the apparatus for data partitioning is generally disposed in the server 405.
It should be understood that the numbers of terminal devices, networks and servers in Fig. 4 are merely illustrative. There may be any number of terminal devices, networks and servers, as the implementation requires.
Fig. 5 is a schematic structural diagram of a computer system 500 suitable for implementing a terminal of the embodiments of the present invention. The terminal shown in Fig. 5 is only an example and imposes no limitation on the functions or scope of use of the embodiments of the present invention.
As shown in Fig. 5, the computer system 500 includes a central processing unit (CPU) 501. The CPU performs various actions and processing according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage section 508 into a random access memory (RAM) 503. The RAM 503 also stores the programs and data that the system 500 needs to operate. The CPU 501, the ROM 502 and the RAM 503 are connected to one another by a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
The following components are connected to the I/O interface 505:
- an input section 506, including a keyboard, a mouse, etc.;
- an output section 507, including a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, etc.;
- a storage section 508, including a hard disk, etc.;
- a communication section 509, including a network interface card such as a LAN card or a modem. The communication section 509 performs communication processing over a network such as the Internet.

A drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 510 as needed, so that a computer program read from it can be installed into the storage section 508.
In particular, according to the embodiments disclosed by the present invention, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, a disclosed embodiment includes a computer program product comprising a computer program carried on a computer-readable medium. The computer program contains program code for performing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 509, and/or installed from the removable medium 511. When the central processing unit (CPU) 501 executes the computer program, the functions defined in the system of the present invention are performed.
It should be noted that the computer-readable medium shown in the present invention may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two.

A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of these. More specific examples include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of these. In the present invention, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by, or in connection with, an instruction execution system, apparatus or device.

In the present invention, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of these. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by, or in connection with, an instruction execution system, apparatus or device.

Program code contained on a computer-readable medium may be transmitted over any appropriate medium, including but not limited to wireless links, electric wires, optical cables and RF, or any suitable combination of these.
The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions and operations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment or a portion of code containing one or more executable instructions for implementing the specified logical function.

In some alternative implementations, the functions noted in the blocks may occur in a different order from that shown in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in reverse order, depending on the functions involved.

Each block in the block diagrams or flowcharts, and each combination of such blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or operations. Alternatively, it can be implemented by a combination of dedicated hardware and computer instructions.
The modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor. For example, the processor may be described as including an acquisition module, a judging module and a segmentation module. In some cases, the names of these modules do not limit the modules themselves. For example, the acquisition module may also be described as "a module for obtaining the size of each data partition".
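If data partitions are stored as directories on a local file system, the acquisition module (step S101, obtaining the size of each data partition) could look roughly like the following. The patent does not specify a storage layout, so this sketch rests on two assumptions: each leaf directory (one with no subdirectories) is one data partition, and a partition's size is the total size of the files directly inside it.

```python
from __future__ import annotations

import os


def partition_sizes(root: str) -> dict[str, int]:
    """Return {leaf partition directory: total bytes of the files directly in it}.

    Assumes a hypothetical on-disk layout in which every leaf directory under
    `root` is one data partition; directories that have subdirectories are
    treated as already split and are skipped.
    """
    sizes: dict[str, int] = {}
    for dirpath, dirnames, filenames in os.walk(root):
        if dirnames:  # not a leaf: this partition has already been split
            continue
        sizes[dirpath] = sum(
            os.path.getsize(os.path.join(dirpath, name)) for name in filenames
        )
    return sizes
```

The resulting dictionary can be passed straight to a threshold check such as the one for step S102.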
In another aspect, the present invention further provides a computer-readable medium. The medium may be included in the device described in the above embodiments, or it may exist separately without being assembled into that device. It carries one or more programs that, when executed by the device, cause the device to perform the following steps:
Step S101: obtain the size of each data partition.
Step S102: judge whether there is a data partition that needs to be split.
Step S103: if there is, perform a split operation on that data partition according to a predefined partition field to obtain multiple new data partitions. The data storage path of each new data partition includes the data storage path of the partition being split and the partition identifier of the new partition, and the partition identifier is generated from the partition field.
Step S104: repeat steps S101 to S103 until step S102 determines that no data partition needs to be split.
According to the technical solution of the embodiments of the present invention, a data partition that exceeds the predetermined threshold is recursively divided into multiple new data partitions with a hierarchical relationship. This removes the uncertainty of manual operation and the resulting high error rate, resource waste and low efficiency. The details of each split operation can also be recorded, which later makes it possible to look up how the data was partitioned and simplifies system management.
The specific embodiments above do not limit the scope of protection of the present invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations and substitutions may be made depending on design requirements and other factors. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (12)
- 1. A method for data partitioning, characterized by comprising: Step S101, obtaining the size of each data partition; Step S102, judging whether there is a data partition that needs to be split; Step S103, when there is a data partition that needs to be split, performing a split operation on that data partition according to a predefined partition field to obtain multiple new data partitions, wherein the data storage path of each new data partition comprises the data storage path of the data partition being split and the partition identifier of the new data partition, and the partition identifier is generated according to the partition field; Step S104, repeating steps S101 to S103 until step S102 determines that there is no data partition that needs to be split.
- 2. The method according to claim 1, characterized in that step S102 comprises: judging whether there is a data partition that needs to be split according to whether the size of the data partition exceeds a preset threshold; if there is a data partition whose size exceeds the preset threshold, determining that there is a data partition that needs to be split; otherwise, determining that there is no data partition that needs to be split.
- 3. The method according to claim 1, characterized in that the step of performing the split operation on the data partition to be split according to the predefined partition field comprises: appending the partition field to the end of each row of data in the data partition to be split; determining, according to the partition field, the data storage path of each row of data for the current split operation; creating new data partitions according to the data storage paths; and saving each row of data to its corresponding new data partition according to that row's data storage path for the current split operation.
- 4. The method according to claim 1, characterized by further comprising, before the step of performing the split operation on the data partition to be split according to the predefined partition field: recording the partition name of the data partition that needs to be split.
- 5. The method according to claim 1, characterized in that the partition field comprises characters and a special character, and every two adjacent characters are separated by the special character.
- 6. An apparatus for data partitioning, characterized by comprising an acquisition module, a judging module and a segmentation module, wherein: the acquisition module is configured to obtain the size of each data partition; the judging module is configured to judge whether there is a data partition that needs to be split, to invoke the segmentation module if there is, and to return to judging whether a data partition that needs to be split still exists, until there is no data partition that needs to be split; and the segmentation module is configured, when there is a data partition that needs to be split, to perform a split operation on that data partition according to a predefined partition field to obtain multiple new data partitions, wherein the data storage path of each new data partition comprises the data storage path of the data partition being split and the partition identifier of the new data partition, and the partition identifier is generated according to the partition field.
- 7. The apparatus according to claim 6, characterized in that the judging module is further configured to: judge whether there is a data partition that needs to be split according to whether the size of the data partition exceeds a preset threshold; if there is a data partition whose size exceeds the preset threshold, determine that there is a data partition that needs to be split; otherwise, determine that there is no data partition that needs to be split.
- 8. The apparatus according to claim 6, characterized in that the segmentation module is further configured to: append the partition field to the end of each row of data in the data partition to be split; determine, according to the partition field, the data storage path of each row of data for the current split operation; create new data partitions according to the data storage paths; and save each row of data to its corresponding new data partition according to that row's data storage path for the current split operation.
- 9. The apparatus according to claim 6, characterized by further comprising a partition recording module configured to record the partition name of the data partition that needs to be split, before the split operation is performed on that data partition according to the predefined partition field.
- 10. The apparatus according to claim 6, characterized in that the partition field comprises characters and a special character, and every two adjacent characters are separated by the special character.
- 11. A terminal for data partitioning, characterized by comprising: one or more processors; and a storage device for storing one or more programs, wherein when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1-5.
- 12. A computer-readable medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710606901.9A CN107480205B (en) | 2017-07-24 | 2017-07-24 | Method and device for partitioning data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107480205A (en) | 2017-12-15 |
CN107480205B CN107480205B (en) | 2020-06-05 |
Family ID: 60595810
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710606901.9A Active CN107480205B (en) | 2017-07-24 | 2017-07-24 | Method and device for partitioning data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107480205B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090245665A1 (en) * | 2008-03-31 | 2009-10-01 | Konica Minolta Systems Laboratory, Inc. | Systems and methods for resolution switching |
CN101546294A (en) * | 2009-04-30 | 2009-09-30 | 青岛海信宽带多媒体技术有限公司 | Method for storing data in Flash memory |
CN105009110A (en) * | 2012-11-30 | 2015-10-28 | 华为技术有限公司 | Method for automated scaling of massive parallel processing (mpp) database |
CN103902544A (en) * | 2012-12-25 | 2014-07-02 | 中国移动通信集团公司 | Data processing method and system |
Non-Patent Citations (1)
Title |
---|
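孙雷刚 (Sun Leigang) et al.: "Application of Data Partitioning in Geoscience Spatial Data Query" (数据分区在地学空间数据查询中的应用), 《计算机应用》 (Journal of Computer Applications) * |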
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110519319A (en) * | 2018-05-22 | 2019-11-29 | 杭州海康威视数字技术股份有限公司 | Method and device for dividing partitions |
CN110519319B (en) * | 2018-05-22 | 2022-02-11 | 杭州海康威视数字技术股份有限公司 | Method and device for splitting partitions |
CN109101531A (en) * | 2018-06-22 | 2018-12-28 | 联想(北京)有限公司 | Document handling method, apparatus and system |
CN109101531B (en) * | 2018-06-22 | 2022-05-31 | 联想(北京)有限公司 | File processing method, device and system |
CN109542898A (en) * | 2018-10-30 | 2019-03-29 | 天津字节跳动科技有限公司 | Data storage method and device for database tables, electronic equipment and storage medium |
CN110750515A (en) * | 2019-09-25 | 2020-02-04 | 浙江大华技术股份有限公司 | Database query method and processing device |
CN111061738A (en) * | 2019-12-16 | 2020-04-24 | 中国建设银行股份有限公司 | Data table pre-grouping method, device, equipment and storage medium |
CN113778657A (en) * | 2020-09-24 | 2021-12-10 | 北京沃东天骏信息技术有限公司 | Data processing method and device |
CN113778657B (en) * | 2020-09-24 | 2024-04-16 | 北京沃东天骏信息技术有限公司 | Data processing method and device |
CN112905596A (en) * | 2021-03-05 | 2021-06-04 | 北京中经惠众科技有限公司 | Data processing method and device, computer equipment and storage medium |
CN112905596B (en) * | 2021-03-05 | 2024-02-02 | 北京中经惠众科技有限公司 | Data processing method, device, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN107480205B (en) | 2020-06-05 |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |