CN107480205A - Method and apparatus for data partitioning - Google Patents

Info

Publication number
CN107480205A
Authority
CN
China
Prior art keywords
data
partition
split
subregion
data partition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710606901.9A
Other languages
Chinese (zh)
Other versions
CN107480205B (en)
Inventor
屠志强
季健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201710606901.9A
Publication of CN107480205A
Application granted
Publication of CN107480205B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21: Design, administration or maintenance of databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and apparatus for data partitioning and relates to the field of computer technology. One embodiment of the method includes: step S101, obtaining the size of each data partition; step S102, determining whether any data partition needs to be split; step S103, when a data partition that needs to be split exists, performing a partitioning operation on that partition according to a predefined partition field to obtain multiple new data partitions; and step S104, repeating steps S101 to S103 until step S102 determines that no data partition needs to be split. The embodiment addresses the high error rate, wasted resources, and low efficiency caused by manual operation. It can also record the details of each partitioning operation, which provides a basis for later lookups of how the data was partitioned and simplifies system administration.

Description

Method and apparatus for data partitioning
Technical field
The present invention relates to the field of computer technology, and in particular to a method and apparatus for data partitioning.
Background
When the data volume is very large and the data is frequently filtered or grouped by a particular field, partitioning the data is worth considering. For example, to track the sales of a product, one often needs to view the sales details or totals for a given month or quarter. The data can then be partitioned by sale date, one partition per month, preferably with the partitions stored on different physical disks. A query can then go directly to the relevant disk, where the data volume is small and the query is fast. A query over all months can run in parallel across multiple disks, which also improves speed significantly.
Database partitioning is a physical database design technique. Although partitioning can achieve many effects, such as improved performance and simpler management, its main purpose is to reduce the total amount of data read and written in a specific database operation, which shortens response time and improves query efficiency.
At present, the conventional database partitioning process mainly includes:
1. Querying the data and judging from experience whether partitioning is needed. For example, when the response time of a data query exceeds a certain threshold, the operator may judge from experience that the partition holds too much data;
2. Setting a partition identifier based on domain knowledge, so that the data can be partitioned according to that identifier;
3. Creating the partitions with the partitioning instruction of the database language that corresponds to the database type in use.
In the course of realizing the present invention, the inventors found at least the following problems in the prior art:
1. The prior art requires manual intervention and judgment from experience. When data has already been partitioned but a partition is still too large, it has to be partitioned again manually;
2. During the operation, the partition identifier is set manually based on domain knowledge;
3. Later lookups of partition records and operating guidance require manual collation.
In summary, the prior art involves too much manual intervention, which leads to a high probability of error, wasted resources, and low efficiency. It also lacks a written standard and execution records, which hinders system administration.
Summary of the invention
In view of this, embodiments of the present invention provide a method and apparatus for data partitioning that can address the high error rate, wasted resources, and low efficiency caused by manual operation, and that can record the details of each partitioning operation, providing a basis for later lookups of how the data was partitioned and simplifying system administration.
To achieve the above object, according to one aspect of the embodiments of the present invention, a method for data partitioning is provided.
A method for data partitioning, including: step S101, obtaining the size of each data partition; step S102, determining whether any data partition needs to be split; step S103, when a data partition that needs to be split exists, performing a partitioning operation on that partition according to a predefined partition field to obtain multiple new data partitions, where the data storage path of each new data partition includes the data storage path of the partition to be split and the partition identifier of the new data partition, and the partition identifier is generated from the partition field; and step S104, repeating steps S101 to S103 until step S102 determines that no data partition needs to be split.
Optionally, step S102 includes: determining whether a data partition that needs to be split exists according to whether the size of a data partition exceeds a preset threshold; if some data partition has a size exceeding the preset threshold, determining that a data partition that needs to be split exists; otherwise, determining that no data partition needs to be split.
Optionally, the step of performing a partitioning operation on the partition to be split according to the predefined partition field includes: appending the partition field to the end of each row of data in the partition to be split; determining, from the partition field, the data storage path of each row for the current partitioning operation; creating new data partitions according to the data storage paths; and saving each row to the corresponding new data partition according to its data storage path for the current partitioning operation.
Optionally, before the step of performing a partitioning operation on the partition to be split according to the predefined partition field, the method further includes: recording the partition name of the data partition that needs to be split.
Optionally, the partition field includes characters and a special symbol, and every two adjacent characters are separated by the special symbol.
According to another aspect of the embodiments of the present invention, an apparatus for data partitioning is provided.
An apparatus for data partitioning, including an acquisition module, a judging module, and a splitting module. The acquisition module is configured to obtain the size of each data partition. The judging module is configured to determine whether any data partition needs to be split, to invoke the splitting module if such a partition exists, and then to check again, until no data partition needs to be split. The splitting module is configured to, when a data partition that needs to be split exists, perform a partitioning operation on that partition according to a predefined partition field to obtain multiple new data partitions, where the data storage path of each new data partition includes the data storage path of the partition to be split and the partition identifier of the new data partition, and the partition identifier is generated from the partition field.
Optionally, the judging module is further configured to: determine whether a data partition that needs to be split exists according to whether the size of a data partition exceeds a preset threshold; if some data partition has a size exceeding the preset threshold, determine that a data partition that needs to be split exists; otherwise, determine that no data partition needs to be split.
Optionally, the splitting module is further configured to: append the partition field to the end of each row of data in the partition to be split; determine, from the partition field, the data storage path of each row for the current partitioning operation; create new data partitions according to the data storage paths; and save each row to the corresponding new data partition according to its data storage path for the current partitioning operation.
Optionally, the apparatus further includes a partition recording module configured to record the partition name of the data partition that needs to be split before the partitioning operation is performed on it according to the predefined partition field.
Optionally, the partition field includes characters and a special symbol, and every two adjacent characters are separated by the special symbol.
According to another aspect of the embodiments of the present invention, a terminal for data partitioning is provided.
A terminal for data partitioning, including: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the data partitioning method provided by the embodiments of the present invention.
According to yet another aspect of the embodiments of the present invention, a computer-readable medium is provided.
A computer-readable medium storing a computer program which, when executed by a processor, implements the data partitioning method provided by the embodiments of the present invention.
One embodiment of the above invention has the following advantages or beneficial effects:
By recursively splitting data partitions that exceed a predetermined threshold into multiple new data partitions with a hierarchical relationship, the embodiment overcomes the uncertainty of manual operation and addresses the high error rate, wasted resources, and low efficiency that manual operation causes. It can also record the details of each partitioning operation, which provides a basis for later lookups of how the data was partitioned and simplifies system administration.
Further effects of the above non-conventional optional implementations are described below in conjunction with the specific embodiments.
Brief description of the drawings
The accompanying drawings are provided for a better understanding of the present invention and do not constitute an undue limitation of it. In the drawings:
Fig. 1 is a schematic diagram of the main steps of the data partitioning method according to an embodiment of the present invention;
Fig. 2 is an implementation flowchart according to one embodiment of the present invention;
Fig. 3 is a schematic diagram of the main modules of the data partitioning apparatus according to an embodiment of the present invention;
Fig. 4 is an exemplary system architecture diagram to which embodiments of the present invention can be applied;
Fig. 5 is a structural schematic diagram of a computer system suitable for implementing the terminal of an embodiment of the present invention.
Detailed description
Exemplary embodiments of the present invention are explained below with reference to the accompanying drawings. The explanation includes various details of the embodiments to aid understanding, and these details should be regarded as merely exemplary. Those of ordinary skill in the art should therefore recognize that various changes and modifications can be made to the embodiments described here without departing from the scope and spirit of the present invention. Likewise, descriptions of well-known functions and structures are omitted below for clarity and conciseness.
Fig. 1 is a schematic diagram of the main steps of the data partitioning method according to an embodiment of the present invention. As shown in Fig. 1, the data partitioning method of the embodiment mainly includes steps S101 to S104.
Step S101: Obtain the size of each data partition.
When a data table, or a data partition of a table, needs to be partitioned, the size of each data partition in the data table must be obtained first. When the data table is being partitioned for the first time, the whole table counts as one data partition. The size of a data partition can be obtained with the hadoop command "hadoop fs -du -s -h <HDFS address>". To run the command, only the address of the data table needs to be known, and it returns the data partition size for every data partition address under the current path. For example, the command "hadoop fs -du -s -h hdfs://<node>/user/<user name>/<database name>/<table name>" returns the sizes of all data partitions of that data table.
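As a rough illustration of step S101, the sketch below shells out to `hadoop fs -du` and parses the result. It is not taken from the patent: the function names are hypothetical, the `-s` and `-h` flags are dropped on the assumption that one raw byte count per child path is easier to compare against a threshold, and the parser assumes each output line starts with a size and ends with a path that contains no spaces.

```python
import subprocess


def parse_du_output(text):
    """Map each listed path to its size in bytes.

    Assumes every useful line begins with an integer size and ends with the
    path; any middle column (for example, space consumed by replicas) is ignored.
    """
    sizes = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # skip blank lines and anything that is not a size row
        sizes[parts[-1]] = int(parts[0])
    return sizes


def get_partition_sizes(table_path):
    """Run `hadoop fs -du <table_path>` and return {partition path: bytes}.

    Requires a hadoop client on PATH; table_path is a placeholder such as
    an hdfs:// address of a table directory.
    """
    result = subprocess.run(
        ["hadoop", "fs", "-du", table_path],
        capture_output=True, text=True, check=True,
    )
    return parse_du_output(result.stdout)
```

Keeping the parser separate from the subprocess call means the parsing logic can be checked without a cluster.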
Step S102: Determine whether any data partition needs to be split.
In an embodiment of the present invention, whether a data partition that needs to be split exists can be determined according to whether the size of a data partition exceeds a preset threshold:
If some data partition has a size exceeding the preset threshold, it is determined that a data partition that needs to be split exists;
Otherwise, it is determined that no data partition needs to be split.
Before each partitioning of a data table or a data partition of a table, a threshold for the data partitions must be set in advance, for example 100 GB (gigabytes). The threshold can be set manually, or the default value of the hadoop configuration can be used. The threshold can be adjusted flexibly to suit the application and should balance data lookup efficiency against data maintenance workload: if partitions are too small, the maintenance workload may be large; if partitions are too large, lookup efficiency drops. In any case, the threshold must be larger than the size of any single record. Once the size of each data partition has been obtained, comparing each size with the threshold determines whether a data partition that needs to be split exists.
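The threshold check of step S102 reduces to a simple comparison. The following minimal sketch assumes sizes are held in a dictionary keyed by partition path; the function name and the use of binary gigabytes for the 100 GB example are illustrative choices, not details from the patent.

```python
GB = 1024 ** 3  # assumption: binary gigabytes


def partitions_to_split(sizes, threshold=100 * GB):
    """Return the paths whose size strictly exceeds the threshold."""
    return [path for path, size in sizes.items() if size > threshold]


sizes = {"X/p=A": 150 * GB, "X/p=B": 20 * GB}
oversized = partitions_to_split(sizes)
# An empty result corresponds to "no data partition needs to be split".
```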
Step S103: When a data partition that needs to be split exists, perform a partitioning operation on it according to a predefined partition field to obtain multiple new data partitions. The data storage path of each new data partition includes the data storage path of the partition to be split and the partition identifier of the new data partition, and the partition identifier is generated from the partition field.
Before each partitioning of a data table or a data partition of a table, the partition field identifier for this partitioning operation and its position, as well as the file separator (for example, /001), must also be set in advance. These three preset parameters are enough to obtain the partition field for this partitioning operation. For example, suppose a data table and its data partitions contain the field identifiers "name", "sex", "age", and "height". With the file separator between them, the four field identifiers are written as "name/001sex/001age/001height". Given the position of the partition field identifier (for example, "3"), the corresponding partition field identifier is "age", and the partition field of each row of data can then be obtained from the partition field identifier.
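The lookup by position and separator can be sketched as follows. The helper names and the sample data row are hypothetical, the separator string is used exactly as written in the text, and the position is treated as 1-based because position "3" selects "age" in the example.

```python
def partition_field_identifier(header, position, separator="/001"):
    """Return the field identifier at the given 1-based position."""
    fields = header.split(separator)
    if not 1 <= position <= len(fields):
        raise ValueError(f"position {position} is outside 1..{len(fields)}")
    return fields[position - 1]


def partition_field_value(row, position, separator="/001"):
    """Return the value a data row holds at the same 1-based position."""
    return partition_field_identifier(row, position, separator)


header = "name/001sex/001age/001height"
print(partition_field_identifier(header, 3))  # age

# Hypothetical data row laid out like the header above.
print(partition_field_value("Zhang/001M/00130/001175", 3))  # 30
```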
The process of performing a partitioning operation on the partition to be split according to the predefined partition field is mainly as follows:
Step S1031: Append the partition field to the end of each row of data in the partition to be split;
Step S1032: Determine, from the partition field, the data storage path of each row for the current partitioning operation;
Step S1033: Create new data partitions according to the data storage paths;
Step S1034: Save each row to the corresponding new data partition according to its data storage path for the current partitioning operation.
Step S1031 can use a prewritten MR (MapReduce) program to append a partition field column to the end of each row of data in the partition to be split. The partition field includes characters and a special symbol, and every two adjacent characters are separated by the special symbol. For example, suppose the partition field identifier is event_id and, for a given row in the partition to be split, the value of that field is "ABC123". The characters in the partition field appended to that row are then "ABC123", and after the special symbol "@@" is inserted between every two adjacent characters, the partition field appended to the row is "A@@B@@C@@1@@2@@3". Here, the special symbol generally means a character that does not occur in the data, for example @ or &.
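Outside MapReduce, the transformation in step S1031 is a one-line string join. The sketch below is a plain-Python stand-in for what a mapper might do per row; the function names are hypothetical and the column separator default simply reuses the example separator from the text.

```python
def build_partition_field(value, symbol="@@"):
    """Insert the special symbol between every two adjacent characters."""
    return symbol.join(value)


def append_partition_field(row, value, separator="/001", symbol="@@"):
    """Append the partition field as an extra trailing column of the row."""
    return row + separator + build_partition_field(value, symbol)


print(build_partition_field("ABC123"))  # A@@B@@C@@1@@2@@3
```

A single-character value yields the character itself with no symbol, since there is no adjacent pair to separate.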
In step S1032, the data storage path of each row for the current partitioning operation is determined from the characters and the special symbol contained in the partition field. The data storage path consists of the data storage path of the partition to be split and the partition identifier of the new data partition, where the partition identifier is obtained from the corresponding character.
Define a partition count j, a natural number greater than 0 with initial value 1, which represents the number of partitioning operations performed on a given partition to be split. Because a partition to be split may be very large, a new data partition produced by one partitioning operation may still exceed the preset threshold. In that case the new data partition must be partitioned again, and that operation counts as the second partitioning operation performed on the original partition to be split.
The partition identifier of a new data partition is obtained from a character. The characters in the partition field have sequence numbers, and the sequence number corresponds to the partition count j; that is, the partition identifier of the new data partition is obtained from the character whose sequence number matches j. Taking the characters "ABC123" as an example, each character is assigned a sequence number, a natural number starting from 1: the sequence number of "A" is 1, that of "B" is 2, that of "1" is 4, and so on. Thus, when j=1 the partition identifier of the corresponding new data partition is "A"; when j=2 it is "B"; when j=4 it is "1"; and so on.
The data storage path of a new data partition consists of the data storage path of the partition to be split and the partition identifier of the new data partition. Suppose the data storage path of the partition to be split is X and the partition field identifier is test_id. For a row in the partition to be split whose field value is "ABC123", the partition field appended to the row is "A@@B@@C@@1@@2@@3". When j=1, the data storage path of the corresponding new data partition is X/test_id_partition=A; when j=2, it is X/test_id_partition=A/test_id_partition=B; when j=3, it is X/test_id_partition=A/test_id_partition=B/test_id_partition=C; and so on, which gives the data storage path of that row for each partitioning operation. As another example, suppose the path of a partition to be split is X and it contains 4 rows of data. For the second partitioning operation (j=2) by the predetermined partition field identifier "Event_id", the data storage path of the new data partition for each row is shown in Table 1.
Table 1
Event_id | Partition field | Data storage path when j=2
ABC | A@@B@@C | X/event_id_partition=A/event_id_partition=B
ADC | A@@D@@C | X/event_id_partition=A/event_id_partition=D
ABD | A@@B@@D | X/event_id_partition=A/event_id_partition=B
BDC | B@@D@@C | X/event_id_partition=B/event_id_partition=D
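The path rule of step S1032 can be written as a small function that reproduces the rows of Table 1. This is a sketch under the stated naming pattern `<field>_partition=<character>`; the function name and its error handling for an out-of-range j are assumptions added for illustration.

```python
def storage_path(base, field_id, partition_field, j, symbol="@@"):
    """Build the data storage path for the j-th partitioning operation.

    The first j characters of the partition field each contribute one
    path level of the form '<field_id>_partition=<character>'.
    """
    chars = partition_field.split(symbol)
    if not 1 <= j <= len(chars):
        raise ValueError(f"j={j} is outside 1..{len(chars)}")
    levels = [f"{field_id}_partition={c}" for c in chars[:j]]
    return "/".join([base] + levels)


for field in ("A@@B@@C", "A@@D@@C", "A@@B@@D", "B@@D@@C"):
    print(storage_path("X", "event_id", field, 2))
```

Rows ABC and ABD map to the same path at j=2, which is why they land in the same new data partition.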
Once step S1032 has determined the data storage path of each row for the current partitioning operation, step S1033 is performed: create new data partitions according to the data storage paths. Different rows may have the same data storage path for this partitioning operation, in which case only one new data partition needs to be created. A new data partition can also be named when it is created. The naming rule can be chosen as needed, for example simply using the path of the new data partition as its partition name.
After the new data partitions have been created, step S1034 can be performed: save each row to the corresponding new data partition according to its data storage path for the current partitioning operation. For example, suppose the data storage path of a partition to be split is X, the partition field identifier is test_id, and the partition contains 3 records, where the characters in the partition field of record 1 are "ABC123", those of record 2 are "AED56", and those of record 3 are "CD24". When the first partitioning operation is performed on this partition, the partition fields of the 3 records give two data storage paths for this (first) operation: X/test_id_partition=A and X/test_id_partition=C. Two new data partitions can be created from these two paths; if the data storage path is used as the partition name, the two new data partitions are named X/test_id_partition=A and X/test_id_partition=C. According to the data storage paths of the 3 records for this operation, records 1 and 2 are saved to the new data partition X/test_id_partition=A and record 3 is saved to the new data partition X/test_id_partition=C. In this way the partition to be split is divided into the two new data partitions X/test_id_partition=A and X/test_id_partition=C. Once the data in the partition to be split has been distributed to the new data partitions in this way, the partition to be split has been replaced by the new data partitions produced by the split.
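Steps S1033 and S1034 amount to grouping rows by their computed path, creating each distinct path once. The sketch below does this in memory with a dictionary standing in for the new data partitions; the function name and the (key, record) tuple layout are hypothetical, and real storage would write to the file system instead.

```python
from collections import defaultdict


def split_once(parent_path, field_id, rows, j):
    """Distribute rows into new partitions for the j-th partitioning operation.

    rows is a list of (partition field characters, record) pairs. Each new
    partition is keyed by its data storage path, which also serves as its name.
    """
    new_partitions = defaultdict(list)
    for key, record in rows:
        path = f"{parent_path}/{field_id}_partition={key[j - 1]}"
        new_partitions[path].append((key, record))  # one partition per distinct path
    return dict(new_partitions)


rows = [("ABC123", "record 1"), ("AED56", "record 2"), ("CD24", "record 3")]
result = split_once("X", "test_id", rows, j=1)
# Records 1 and 2 share X/test_id_partition=A; record 3 goes to X/test_id_partition=C.
```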
Step S104: Repeat steps S101 to S103 until step S102 determines that no data partition needs to be split.
As described above, after one partitioning operation has been performed on the partition to be split, steps S101 and S102 are performed again to determine whether any data partition still needs to be split. If one does, step S103 is performed again; if none does, the partitioning process for the data table or the data partition of the table ends.
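The repeat-until-done behaviour of steps S101 to S104 can be simulated in memory. The sketch below is an illustration under simplifying assumptions, not the patented implementation: partition size is approximated by row count, a worklist replaces re-measuring every partition each round, and a partition is left unsplit when any of its keys has no j-th character, which is a guard added here to guarantee termination.

```python
from collections import defaultdict


def partition_recursively(root_path, field_id, keys, threshold):
    """Split partitions until none holds more than `threshold` rows.

    keys holds the partition field characters of each row. Returns a dict
    mapping each final data storage path to the keys stored under it.
    """
    done = {}
    pending = [(root_path, list(keys), 1)]  # (path, rows, partition count j)
    while pending:
        path, rows, j = pending.pop()
        too_big = len(rows) > threshold                 # steps S101 and S102
        can_split = all(len(key) >= j for key in rows)  # added termination guard
        if not (too_big and can_split):
            done[path] = rows
            continue
        children = defaultdict(list)                    # step S103
        for key in rows:
            children[f"{path}/{field_id}_partition={key[j - 1]}"].append(key)
        for child_path, child_rows in children.items():
            pending.append((child_path, child_rows, j + 1))  # step S104
    return done


final = partition_recursively("X", "event_id", ["ABC", "ADC", "ABD", "BDC"], threshold=2)
```

With a threshold of 2 rows, the partition under A is split a second time while the partition under B is left alone, giving the hierarchical layout the text describes.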
In addition, after the data partitioning process described above is complete, the small files produced by different reduce nodes while the MR program ran can be merged. These small files hold data from the partition to be split that was scattered across different files during the partitioning operation; merging them can improve query efficiency.
To make it easier to query how the data was partitioned later, the partition name of the data partition that needs to be split can also be recorded before step S103 performs the partitioning operation on it according to the predefined partition field, so as to form a standardized "partition list". Depending on query needs, the "partition list" can also record information such as the time at which the partition was split, the number of splits, and the data partition paths obtained after each split.
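One possible shape for an entry in the "partition list" is sketched below. The patent names the kinds of information to record but not a schema, so the class, its field names, and the helper function are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class PartitionRecord:
    """One hypothetical entry of the partition list."""
    partition_name: str            # name of the partition that was split
    split_time: datetime           # when the partitioning operation ran
    split_count: int               # partition count j for this operation
    new_partition_paths: list = field(default_factory=list)


def record_split(partition_list, name, split_count, new_paths, when=None):
    """Append a record for one partitioning operation and return it."""
    entry = PartitionRecord(name, when or datetime.now(), split_count, list(new_paths))
    partition_list.append(entry)
    return entry


partition_list = []
record_split(
    partition_list, "X", 1,
    ["X/test_id_partition=A", "X/test_id_partition=C"],
)
```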
With steps S101 to S104 as described above, data partitions that exceed the predetermined threshold are recursively split into multiple new data partitions with a hierarchical relationship. This overcomes the uncertainty of manual operation and addresses the high error rate, wasted resources, and low efficiency that manual operation causes. The details of each partitioning operation can also be recorded, which provides a basis for later lookups of how the data was partitioned and simplifies system administration.
Fig. 2 is an implementation flowchart according to one embodiment of the present invention. As shown in Fig. 2, before the program starts, the partitioning parameters must be set (step S201), for example the partition threshold, the partition field identifier and its position, and the file separator. Once they are set, the size of each data partition is obtained first (step S202). Whether a partition to be split exists is then determined according to whether the size of a data partition exceeds the preset partition threshold (step S203). If no partition to be split exists, the program ends (step S208). If a partition to be split exists, the partitioning operation is performed, which mainly includes: appending the partition field to the end of each row of data in the partition to be split (step S204); determining the data storage path for this partitioning operation from the partition field (step S205); dynamically creating new data partitions according to the data storage paths (step S206); and saving the data in the partition to be split to the new data partitions (step S207), which completes this partitioning operation. Finally, the flow jumps back to step S202 and repeats as described in this paragraph until no partition to be split remains, at which point the program ends (step S208).
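The parameters set in step S201 could be grouped into a single immutable object so that every later step reads the same values. The class below is a hypothetical sketch: its name, field names, and validation rules are assumptions, and only the four kinds of parameters come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionParams:
    """Partitioning parameters fixed before the program starts (step S201)."""
    threshold_bytes: int           # partition threshold
    field_identifier: str          # partition field identifier, e.g. "age"
    field_position: int            # 1-based position of that field in a row
    file_separator: str = "/001"   # separator between fields, as in the example

    def __post_init__(self):
        if self.threshold_bytes <= 0:
            raise ValueError("threshold_bytes must be positive")
        if self.field_position < 1:
            raise ValueError("field_position is 1-based")


params = PartitionParams(
    threshold_bytes=100 * 1024 ** 3,
    field_identifier="age",
    field_position=3,
)
```

Freezing the dataclass keeps the parameters from changing between rounds of the loop in steps S202 to S207.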
Fig. 3 is the schematic diagram of the main modular of the device of progress data partition according to embodiments of the present invention.Such as Fig. 3 institutes Show, the device 300 of the progress data partition of the embodiment of the present invention mainly includes acquisition module 301, judge module 302 and segmentation mould Block 303.
Acquisition module 301 is used for the size for obtaining each data partition;
Judge module 302 is used to judge whether to need the data partition split, if being split in the presence of needs Data partition then perform segmentation module, and return and continue to determine whether the data partition for needing to be split be present, until not The data partition split in the presence of needs;
Segmentation module 303 is used for when in the presence of the data partition for needing to be split, according to predefined subregion field pair The data partition to be split for needing to be split performs division operation, to obtain multiple new data subregions, the number of new data subregion According to the data storage path in storage path including data partition to be split and the partition identification of new data subregion, partition identification according to Subregion field generates.
According to an embodiment of the present invention, the judging module 302 may be further configured to:
Judge whether a data partition that needs to be split exists according to whether the size of a data partition exceeds a predetermined threshold;
If a data partition exists whose size exceeds the predetermined threshold, determine that a data partition that needs to be split exists;
Otherwise, determine that no data partition that needs to be split exists.
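A minimal Python sketch of this judgment is given below, assuming the partition sizes have already been collected into a mapping from partition name to size. The function names and the choice of unit are hypothetical; "exceeds" is read as strictly greater than the threshold.

```python
from typing import Dict, List


def needs_split(partition_sizes: Dict[str, int], threshold: int) -> bool:
    """Return True if at least one partition is strictly larger than the threshold."""
    return any(size > threshold for size in partition_sizes.values())


def oversized_partitions(partition_sizes: Dict[str, int], threshold: int) -> List[str]:
    """List the partitions that exceed the threshold, sorted by name."""
    return sorted(name for name, size in partition_sizes.items() if size > threshold)
```

A partition whose size equals the threshold is left alone under this reading.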
According to an embodiment of the present invention, the splitting module 303 may be further configured to perform the splitting operation through the following procedure:
Append the partition field to the tail of each row of data in the data partition to be split;
Determine, according to the partition field, the data storage path corresponding to each row of data for this splitting operation;
Create new data partitions according to the data storage paths;
Save each row of data to its corresponding new data partition according to the data storage path determined for that row in this splitting operation.
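The four steps above can be sketched in Python on in-memory rows. This is a simplified illustration under stated assumptions: rows are delimited text lines, the partition field value is taken from an existing column at `field_position`, the partition identifier uses a `key=value` form, and new partitions are represented as dictionary entries instead of directories. None of these names or choices come from the embodiment.

```python
import posixpath
from collections import defaultdict
from typing import Dict, List


def split_rows(
    rows: List[str],
    parent_path: str,
    field_name: str,
    field_position: int,
    separator: str = "\t",
) -> Dict[str, List[str]]:
    """Split one partition's rows into new partitions keyed by storage path."""
    new_partitions: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        value = row.split(separator)[field_position]
        tagged = row + separator + value  # append the partition field at the row tail
        path = posixpath.join(parent_path, f"{field_name}={value}")  # storage path for this row
        new_partitions[path].append(tagged)  # create the partition on first use, then save the row
    return dict(new_partitions)
```

For example, splitting rows of a hypothetical `/warehouse/orders` partition on the column at position 1 groups the rows by that column's value, one new partition per distinct value. Writing each group to storage is left out of the sketch.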
In addition, the apparatus 300 for partitioning data according to the embodiment of the present invention may further include a partition recording module (not shown in the figure), configured to record the partition name of the data partition that needs to be split before the splitting operation is performed on that data partition according to the predefined partition field.
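One possible way to keep such a record is sketched below in Python. The embodiment does not specify where or in what format the partition name is recorded; the JSON line format, the `recorded_at` field and the function name `record_split` are assumptions made for this illustration.

```python
import json
import time
from typing import List, Optional


def record_split(log: List[str], partition_name: str, timestamp: Optional[float] = None) -> None:
    """Append one JSON line naming the partition that is about to be split."""
    entry = {
        "partition": partition_name,
        "recorded_at": time.time() if timestamp is None else timestamp,
    }
    log.append(json.dumps(entry, sort_keys=True))
```

Calling `record_split` immediately before each splitting operation leaves a trail that can later be read back to find out how a given partition was divided.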
According to the technical solution of the embodiments of the present invention, the partition field may include characters and special characters, with every two characters separated by a special character.
According to the technical solution of the embodiments of the present invention, data partitions that exceed the predetermined threshold are recursively split into multiple new data partitions with a hierarchical relationship. This solves the problems of high error rates, wasted resources and low efficiency caused by manual operation. In addition, the details of each splitting operation can be recorded, which provides a basis for later lookups of how the data was partitioned and facilitates system administration.
Fig. 4 shows an exemplary system architecture 400 to which the method or the apparatus for partitioning data according to the embodiments of the present invention may be applied (the architecture can be adjusted to the specific case).
As shown in Fig. 4, the system architecture 400 may include terminal devices 401, 402, 403, a network 404 and a server 405 (this architecture is only an example, and the components included in a specific architecture can be adjusted to the specific application). The network 404 serves as the medium that provides communication links between the terminal devices 401, 402, 403 and the server 405. The network 404 may include various connection types, such as wired or wireless communication links or fiber optic cables.
A user may use the terminal devices 401, 402, 403 to interact with the server 405 through the network 404, for example to receive or send messages. Various communication client applications may be installed on the terminal devices 401, 402, 403, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients and social platform software (examples only).
The terminal devices 401, 402, 403 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers and desktop computers.
The server 405 may be a server that provides various services, for example a back-end management server (example only) that supports shopping websites browsed by users on the terminal devices 401, 402, 403. The back-end management server may analyze and otherwise process received data such as information query requests, and feed the processing results (for example targeted push information or product information, examples only) back to the terminal devices.
It should be noted that the method for partitioning data provided by the embodiments of the present invention is generally executed by the server 405; accordingly, the apparatus for partitioning data is generally arranged in the server 405.
It should be understood that the numbers of terminal devices, networks and servers in Fig. 4 are merely illustrative. There may be any number of terminal devices, networks and servers, as required by the implementation.
Referring now to Fig. 5, it shows a schematic structural diagram of a computer system 500 suitable for implementing a terminal of the embodiments of the present invention. The terminal shown in Fig. 5 is only an example and should not impose any limitation on the function or scope of use of the embodiments of the present invention.
As shown in Fig. 5, the computer system 500 includes a central processing unit (CPU) 501, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage section 508 into a random access memory (RAM) 503. The RAM 503 also stores the various programs and data required for the operation of the system 500. The CPU 501, the ROM 502 and the RAM 503 are connected to one another via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse and the like; an output section 507 including a cathode ray tube (CRT) or liquid crystal display (LCD) and a speaker; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card or a modem. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disc, a magneto-optical disk or a semiconductor memory, is mounted on the drive 510 as needed, so that a computer program read from it can be installed into the storage section 508 as needed.
In particular, according to the embodiments disclosed herein, the process described above with reference to the flowchart may be implemented as a computer software program. For example, the disclosed embodiments include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511. When the computer program is executed by the central processing unit (CPU) 501, the functions defined above in the system of the present invention are performed.
It should be noted that the computer-readable medium described in the present invention may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of a computer-readable storage medium include but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present invention, a computer-readable storage medium may be any tangible medium that contains or stores a program, where the program can be used by or in connection with an instruction execution system, apparatus or device. Also in the present invention, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and that medium can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device. Program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.
The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functions and operations of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams or flowcharts, and combinations of such blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor, which may be described, for example, as: a processor including an acquisition module, a judging module and a splitting module. The names of these modules do not, in some cases, limit the modules themselves; for example, the acquisition module may also be described as "a module for obtaining the size of each data partition".
In another aspect, the present invention further provides a computer-readable medium, which may be included in the device described in the above embodiments, or may exist separately without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to perform: step S101, obtaining the size of each data partition; step S102, judging whether a data partition that needs to be split exists; step S103, when a data partition that needs to be split exists, performing a splitting operation on the data partition to be split according to a predefined partition field, so as to obtain multiple new data partitions, where the data storage path of each new data partition includes the data storage path of the data partition to be split and the partition identifier of the new data partition, and the partition identifier is generated according to the partition field; and step S104, repeating steps S101 to S103 until step S102 determines that no data partition that needs to be split exists.
According to the technical solution of the embodiments of the present invention, data partitions that exceed the predetermined threshold are recursively split into multiple new data partitions with a hierarchical relationship. This overcomes the uncertainty of manual operation and solves the problems of high error rates, wasted resources and low efficiency that manual operation causes. In addition, the details of each splitting operation can be recorded, which provides a basis for later lookups of how the data was partitioned and facilitates system administration.
The above specific embodiments do not limit the scope of protection of the present invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations and substitutions may occur depending on design requirements and other factors. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (12)

  1. A method for partitioning data, characterized by comprising:
    Step S101: obtaining the size of each data partition;
    Step S102: judging whether a data partition that needs to be split exists;
    Step S103: when a data partition that needs to be split exists, performing a splitting operation on the data partition to be split according to a predefined partition field, so as to obtain multiple new data partitions, wherein the data storage path of each new data partition includes the data storage path of the data partition to be split and the partition identifier of the new data partition, and the partition identifier is generated according to the partition field;
    Step S104: repeating steps S101 to S103 until step S102 determines that no data partition that needs to be split exists.
  2. The method according to claim 1, characterized in that step S102 comprises:
    judging whether a data partition that needs to be split exists according to whether the size of a data partition exceeds a predetermined threshold;
    if a data partition exists whose size exceeds the predetermined threshold, determining that a data partition that needs to be split exists;
    otherwise, determining that no data partition that needs to be split exists.
  3. The method according to claim 1, characterized in that the step of performing a splitting operation on the data partition to be split according to a predefined partition field comprises:
    appending the partition field to the tail of each row of data in the data partition to be split;
    determining, according to the partition field, the data storage path corresponding to each row of data for this splitting operation;
    creating new data partitions according to the data storage paths;
    saving each row of data to its corresponding new data partition according to the data storage path determined for that row in this splitting operation.
  4. The method according to claim 1, characterized in that, before the step of performing a splitting operation on the data partition to be split according to a predefined partition field, the method further comprises: recording the partition name of the data partition that needs to be split.
  5. The method according to claim 1, characterized in that the partition field includes characters and special characters, and every two characters are separated by a special character.
  6. An apparatus for partitioning data, characterized by comprising an acquisition module, a judging module and a splitting module, wherein:
    the acquisition module is configured to obtain the size of each data partition;
    the judging module is configured to judge whether a data partition that needs to be split exists, and, if such a data partition exists, to execute the splitting module and then continue to judge whether a data partition that needs to be split exists, until no such data partition remains;
    the splitting module is configured to, when a data partition that needs to be split exists, perform a splitting operation on the data partition to be split according to a predefined partition field, so as to obtain multiple new data partitions, wherein the data storage path of each new data partition includes the data storage path of the data partition to be split and the partition identifier of the new data partition, and the partition identifier is generated according to the partition field.
  7. The apparatus according to claim 6, characterized in that the judging module is further configured to:
    judge whether a data partition that needs to be split exists according to whether the size of a data partition exceeds a predetermined threshold;
    if a data partition exists whose size exceeds the predetermined threshold, determine that a data partition that needs to be split exists;
    otherwise, determine that no data partition that needs to be split exists.
  8. The apparatus according to claim 6, characterized in that the splitting module is further configured to:
    append the partition field to the tail of each row of data in the data partition to be split;
    determine, according to the partition field, the data storage path corresponding to each row of data for this splitting operation;
    create new data partitions according to the data storage paths;
    save each row of data to its corresponding new data partition according to the data storage path determined for that row in this splitting operation.
  9. The apparatus according to claim 6, characterized by further comprising a partition recording module configured to: record the partition name of the data partition that needs to be split before the splitting operation is performed on that data partition according to the predefined partition field.
  10. The apparatus according to claim 6, characterized in that the partition field includes characters and special characters, and every two characters are separated by a special character.
  11. A terminal for partitioning data, characterized by comprising:
    one or more processors;
    a storage device for storing one or more programs,
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1-5.
  12. A computer-readable medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-5.
CN201710606901.9A 2017-07-24 2017-07-24 Method and device for partitioning data Active CN107480205B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710606901.9A CN107480205B (en) 2017-07-24 2017-07-24 Method and device for partitioning data

Publications (2)

Publication Number Publication Date
CN107480205A true CN107480205A (en) 2017-12-15
CN107480205B CN107480205B (en) 2020-06-05

Family

ID=60595810

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant