CN107480205B - Method and device for partitioning data - Google Patents

Publication number
CN107480205B
Authority
CN
China
Prior art keywords
partition
data
partitioned
needing
partitioning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710606901.9A
Other languages
Chinese (zh)
Other versions
CN107480205A (en)
Inventor
屠志强
季健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201710606901.9A
Publication of CN107480205A
Application granted
Publication of CN107480205B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for partitioning data, and relates to the technical field of computers. One embodiment of the method comprises: step S101, acquiring the size of each data partition; step S102, judging whether a data partition that needs to be divided exists; step S103, when such a data partition exists, performing a partitioning operation on it according to a predefined partition field to obtain a plurality of new data partitions; and step S104, repeating steps S101 to S103 until step S102 determines that no data partition needs to be divided. The embodiment addresses the high error rate, wasted resources and low efficiency caused by manual operation, can record the detailed information of each partitioning operation, provides a basis for later looking up how the data was partitioned, and facilitates system management.

Description

Method and device for partitioning data
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for partitioning data.
Background
When the data volume is large and filtering or grouping is frequently performed on a certain field, partitioning the data can be considered. For example, if the sales of a certain commodity are often checked by month or by quarter, as details or as totals, the data can be partitioned by sales date, with each month forming one partition. Preferably, the data of different partitions is stored on different physical hard disks. When the data of a single month is queried, the data volume is small, the query can be made directly on a specific hard disk, and the speed is high; when the data of all months is queried, multiple hard disks can be queried in parallel, which also improves the speed significantly.
Database partitioning is a physical database design technique, and although the partitioning technique can achieve many effects, such as performance improvement, management simplification, etc., the main purpose is to reduce the total amount of data read and write in a specific database operation to reduce the response time and improve the efficiency of data query.
At present, the commonly used process of partitioning a database mainly includes:
1. determining from experience, by querying the data, whether partitioning is required, for example: when the response time of a data query exceeds a certain threshold, the data volume of the partition is judged from experience to be too large;
2. setting a partition identifier based on business knowledge, so that partitioning is performed according to the partition identifier;
3. creating partitions through the partition statements of the corresponding database language, according to the database type.
In the process of implementing the invention, the inventor finds that at least the following problems exist in the prior art:
1. the prior art requires manual intervention and judgment from experience, and when data has already been partitioned but a certain partition is still too large, that partition has to be manually partitioned again;
2. in the specific operation process, the partition identifier has to be set manually based on business knowledge;
3. partition records and operation guidelines have to be compiled manually when they are looked up later.
In conclusion, the prior art involves excessive manual intervention, a high probability of error, wasted resources and low efficiency; and because there is no uniform standard or execution record, it is not conducive to system management.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for partitioning data, which can solve the problems of high error rate, resource waste and low efficiency caused by manual operations, and can record detailed information of each partitioning operation, provide a basis for subsequently searching for a data partitioning condition, and facilitate system management.
To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a method of partitioning data.
A method of partitioning data, comprising: step S101, acquiring the size of each data partition; step S102, judging whether a data partition that needs to be divided exists; step S103, when a data partition that needs to be divided exists, performing a partitioning operation on that data partition according to a predefined partition field to obtain a plurality of new data partitions, wherein the data storage path of each new data partition comprises the data storage path of the data partition that needs to be divided and the partition identifier of the new data partition, and the partition identifier is generated according to the partition field; and step S104, repeating steps S101 to S103 until step S102 determines that no data partition needs to be divided.
Optionally, step S102 includes: judging whether a data partition that needs to be divided exists according to whether the size of a data partition exceeds a preset threshold; if the size of any data partition exceeds the preset threshold, judging that a data partition that needs to be divided exists; otherwise, judging that no data partition needs to be divided.
Optionally, the step of performing the partitioning operation on the data partition that needs to be divided according to the predefined partition field includes: appending the partition field to the end of each row of data of that data partition; determining, according to the partition field, the data storage path corresponding to the partitioning operation for each row of data; establishing new data partitions according to the data storage paths; and storing each row of data in the corresponding new data partition according to its data storage path.
Optionally, before the step of performing the partitioning operation on the data partition that needs to be divided according to the predefined partition field, the method further includes: recording the partition name of the data partition that needs to be divided.
Optionally, the partition field comprises characters and special symbols, and every two adjacent characters are separated by a special symbol.
According to another aspect of the embodiments of the present invention, an apparatus for data partitioning is provided.
An apparatus for data partitioning, comprising an acquisition module, a judgment module and a partitioning module, wherein the acquisition module is configured to acquire the size of each data partition; the judgment module is configured to judge whether a data partition that needs to be divided exists, to invoke the partitioning module if one exists, and then to judge again, until no data partition needs to be divided; and the partitioning module is configured to, when a data partition that needs to be divided exists, perform a partitioning operation on that data partition according to a predefined partition field to obtain a plurality of new data partitions, wherein the data storage path of each new data partition comprises the data storage path of the data partition that needs to be divided and the partition identifier of the new data partition, and the partition identifier is generated according to the partition field.
Optionally, the determining module is further configured to: judging whether the data partition needing to be segmented exists according to whether the size of the data partition exceeds a preset threshold value; if a certain data partition exists and the size of the certain data partition exceeds the preset threshold, judging that the data partition needing to be segmented exists; otherwise, judging that the data partition needing to be segmented does not exist.
Optionally, the partitioning module is further configured to: append the partition field to the end of each row of data of the data partition that needs to be divided; determine, according to the partition field, the data storage path corresponding to the partitioning operation for each row of data; establish new data partitions according to the data storage paths; and store each row of data in the corresponding new data partition according to its data storage path.
Optionally, the apparatus further comprises a partition recording module, configured to record the partition name of the data partition that needs to be divided before the partitioning operation is performed on it according to the predefined partition field.
Optionally, the partition field comprises characters and special symbols, and every two adjacent characters are separated by a special symbol.
According to another aspect of the embodiments of the present invention, there is provided a terminal for performing data partitioning.
A terminal for data partitioning, comprising: one or more processors; the storage device is used for storing one or more programs, and when the one or more programs are executed by the one or more processors, the one or more processors implement the method for partitioning data provided by the embodiment of the invention.
According to yet another aspect of embodiments of the present invention, a computer-readable medium is provided.
A computer readable medium, on which a computer program is stored, which when executed by a processor implements the method for data partitioning provided by an embodiment of the present invention.
One embodiment of the above invention has the following advantages or benefits:
A data partition exceeding the preset threshold is recursively divided into a plurality of new data partitions with hierarchical relationships, which removes the uncertainty of manual operation, addresses the high error rate, wasted resources and low efficiency caused by manual operation, allows the detailed information of each partitioning operation to be recorded, provides a basis for later looking up how the data was partitioned, and facilitates system management.
Further effects of the above optional implementations are described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of the main steps of a method of data partitioning according to an embodiment of the present invention;
FIG. 2 is a flow diagram of an implementation according to one embodiment of the invention;
FIG. 3 is a schematic diagram of the main modules of an apparatus for data partitioning according to an embodiment of the present invention;
FIG. 4 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
FIG. 5 is a block diagram of a computer system suitable for use with a terminal implementing an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram of main steps of a method for partitioning data according to an embodiment of the present invention. As shown in fig. 1, the method for partitioning data according to the embodiment of the present invention mainly includes the following steps S101 to S104.
Step S101: the size of each data partition is obtained.
When a data table or a data partition of the table needs to be partitioned, the size of each current data partition of the data table must be acquired first. When the data table is partitioned for the first time, the whole data table is one data partition. To obtain the sizes, only the address of the data table needs to be known when the command is executed; the sizes of the data partitions at all data partition addresses under that path are then returned. For example, the sizes of all data partitions of the data table are returned by the command "hadoop fs -du -s -h hdfs://node/user name/library name/table name".
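As an illustration of step S101, the following minimal sketch turns the text output of a size listing into a mapping from partition path to size. It is not part of the patent: the function name `parse_du_output`, the sample paths, and the assumed column layout (size in bytes first, path last, no human-readable `-h` units) are all assumptions made for the example.

```python
def parse_du_output(text):
    """Map each path to its size in bytes, given size-listing text.

    Assumption: every data line starts with the size in bytes and ends
    with the path; any column in between is ignored.
    """
    sizes = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # skip blank lines and lines that are not size rows
        sizes[parts[-1]] = int(parts[0])
    return sizes


# Hypothetical listing for two partitions of one table.
sample = """\
1073741824  3221225472  hdfs://node/db/table/dt=2017-01
2048        6144        hdfs://node/db/table/dt=2017-02
"""
partition_sizes = parse_du_output(sample)
```

Working in plain bytes keeps the later comparison against the threshold a simple integer comparison.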
Step S102: judging whether a data partition that needs to be divided exists.
In the embodiment of the present invention, whether there is a data partition that needs to be partitioned may be determined according to whether the size of the data partition exceeds a predetermined threshold;
if a certain data partition exists and the size of the certain data partition exceeds a preset threshold value, judging that the data partition needing to be divided exists;
otherwise, judging that no data partition needing to be divided exists.
Before a data table or a data partition of the table is partitioned, a threshold for the data partitions needs to be set in advance, for example 100 GB (gigabytes). The threshold may be set manually, or a default value configured in Hadoop may be used. The threshold can be adjusted flexibly according to the practical application, balancing the efficiency of data lookup against the workload of data maintenance: if partitions are too small, the maintenance workload may be greater, and if partitions are too large, the efficiency of data lookup may be lower. However, the threshold must be greater than the size of any single piece of data. After the size of each data partition is obtained, it is compared with the threshold to determine whether a data partition that needs to be divided exists.
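The comparison in step S102 can be sketched as follows. The helper name `partitions_needing_split`, the sample paths, and the choice of 1 GB = 1024^3 bytes are assumptions for illustration; only the 100 GB figure comes from the text.

```python
GB = 1024 ** 3                # assumption: binary gigabytes
THRESHOLD_BYTES = 100 * GB    # the 100 GB example threshold from the text


def partitions_needing_split(partition_sizes, threshold=THRESHOLD_BYTES):
    """Return the paths whose size strictly exceeds the threshold (step S102)."""
    return [path for path, size in partition_sizes.items() if size > threshold]


# Hypothetical sizes: only the first partition exceeds the threshold.
sizes = {"X/a": 150 * GB, "X/b": 20 * GB, "X/c": 100 * GB}
to_split = partitions_needing_split(sizes)
```

A partition whose size equals the threshold is left alone here, because the text speaks of a size that "exceeds" the threshold.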
Step S103: when the data partition needing to be divided exists, the data partition needing to be divided is subjected to partition operation according to the predefined partition field to obtain a plurality of new data partitions, the data storage path of each new data partition comprises the data storage path of the data partition needing to be divided and the partition identification of the new data partition, and the partition identification is generated according to the partition field.
Before the data table or a data partition of the table is partitioned, three parameters need to be preset: the partition field identifier used by the partitioning operation, its position, and the file separator (e.g. /001). The partition field for the partitioning operation can be obtained from these 3 preset parameters. For example, assume that a data table and its data partitions have the field identifiers "name", "gender", "age" and "height". With the file separator between them, the 4 field identifiers are expressed as "name/001gender/001age/001height". When the position of the partition field identifier is known (for example "3"), the corresponding partition field identifier is "age", and the partition field corresponding to each row of data can be obtained according to that identifier.
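The lookup by position described above can be sketched as a split on the file separator. The helper name `field_at` and the sample data row are hypothetical; the separator string is taken as written in the example.

```python
DELIMITER = "/001"  # file separator as written in the example above


def field_at(line, position, delimiter=DELIMITER):
    """Return the field at a 1-based position in a delimited line."""
    return line.split(delimiter)[position - 1]


# The four field identifiers from the example, joined by the separator.
header = DELIMITER.join(["name", "gender", "age", "height"])
# A hypothetical data row with the same layout.
row = DELIMITER.join(["Li", "F", "30", "165"])

partition_field_id = field_at(header, 3)   # identifier at position 3
partition_value = field_at(row, 3)         # value of that field in the row
```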
The process of performing the partitioning operation on the data partition that needs to be divided according to the predefined partition field mainly comprises the following steps:
step S1031: adding a partition field at the tail of each row of data of a data partition to be partitioned;
step S1032: determining a data storage path corresponding to the partitioning operation of each row of data according to the partitioning field;
step S1033: establishing a new data partition according to the data storage path;
step S1034: and storing each row of data to a corresponding new data partition according to the data storage path corresponding to the partition operation of each row of data.
Step S1031 may use a purpose-written MR (MapReduce) program to append a column containing the partition field to the end of each row of data to be partitioned, where the partition field comprises characters and special symbols, and every two adjacent characters are separated by a special symbol. For example, suppose the partition field identifier is event_id and, for a certain row of data in the data partition to be divided, the value of the partition field is "ABC123". The characters of the partition field appended to that row are then "ABC123", and after the special symbol "@@" is inserted between every two adjacent characters, the partition field appended to the end of the row is "A@@B@@C@@1@@2@@3". Here, the special symbol is generally a symbol that does not occur in the data, built for example from "@" or "&".
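Outside of MapReduce, the construction of the partition field in step S1031 reduces to a string join. The sketch below assumes the special symbol is "@@", as in Table 1, and uses the hypothetical helper names `build_partition_field` and `append_partition_field`; it is not the MR program the text refers to.

```python
SPECIAL_SYMBOL = "@@"  # assumption: the separator shown in Table 1


def build_partition_field(value, symbol=SPECIAL_SYMBOL):
    """Insert the special symbol between every two adjacent characters."""
    return symbol.join(value)


def append_partition_field(row, value, delimiter="/001", symbol=SPECIAL_SYMBOL):
    """Append the partition field as a new last column of a delimited row."""
    return row + delimiter + build_partition_field(value, symbol)


field = build_partition_field("ABC123")
extended_row = append_partition_field("Li/001F", "ABC123")  # hypothetical row
```

Because the special symbol does not occur in the data, splitting the partition field on it recovers the individual characters in order.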
In step S1032, the data storage path corresponding to the partitioning operation for each row of data is determined from the characters and special symbols contained in the partition field. The data storage path consists of the data storage path of the data partition to be divided and the partition identifier of the new data partition, where the partition identifier of the new data partition is obtained from the corresponding character.
Define a partition count j, a natural number greater than 0 with an initial value of 1, which represents the number of times a partitioning operation has been executed on the data partition to be divided. Because a data partition to be divided may be large, a new data partition produced by one partitioning operation may still exceed the preset threshold. In that case a partitioning operation must be performed on that new data partition, which can be regarded as the second partitioning operation on the original data partition.
The characters in the partition field are numbered in order, and the sequence numbers correspond to the partition count j; that is, the partition identifier of the new data partition is obtained from the character whose sequence number equals j. Taking the characters "ABC123" as an example, each character is given a sequence number that is a natural number starting at 1: the character "A" has sequence number 1, the character "B" has sequence number 2, the character "1" has sequence number 4, and so on. Therefore, when the partition count j is 1, the partition identifier of the corresponding new data partition is "A"; when j is 2, it is "B"; when j is 4, it is "1"; and so on.
The data storage path of the new data partition consists of the data storage path of the data partition to be divided and the partition identifier of the new data partition. Assume that the data storage path of a data partition to be divided is X, the partition field identifier is test_id, and for a certain row of data in that partition the value of the partition field is "ABC123". The partition field appended to the end of the row is then "A@@B@@C@@1@@2@@3". When the partition count j is 1, the data storage path of the corresponding new data partition is X/test_id_partition=A; when j is 2, it is X/test_id_partition=A/test_id_partition=B; when j is 3, it is X/test_id_partition=A/test_id_partition=B/test_id_partition=C; and so on, which gives the data storage path for each partitioning operation on that row. As another example, when the path of a partition to be divided is X, the partition contains 4 rows of data, and the second partitioning operation (i.e. j equals 2) is performed according to the predetermined partition field identifier "Event_id", the data storage path of the new data partition for each row is as shown in Table 1.
TABLE 1
Event_id | Partition field | Data storage path when j equals 2
ABC      | A@@B@@C         | X/event_id_partition=A/event_id_partition=B
ADC      | A@@D@@C         | X/event_id_partition=A/event_id_partition=D
ABD      | A@@B@@D         | X/event_id_partition=A/event_id_partition=B
BDC      | B@@D@@C         | X/event_id_partition=B/event_id_partition=D
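The path rule behind Table 1 can be sketched as below: the first j characters of the partition field each contribute one path level. The function name `storage_path` and the error handling for an out-of-range j are assumptions; the text does not say what happens when j exceeds the number of characters.

```python
def storage_path(base_path, field_id, partition_field, j, symbol="@@"):
    """Build the data storage path after the j-th partitioning operation.

    base_path is the path of the original partition (X in the text) and
    partition_field is the appended field, e.g. "A@@B@@C".
    """
    chars = partition_field.split(symbol)
    if j < 1 or j > len(chars):
        # Assumption: treated as an error here; the source leaves it open.
        raise ValueError("partition count j is outside the partition field")
    levels = [f"{field_id}_partition={c}" for c in chars[:j]]
    return "/".join([base_path] + levels)


# Reproduce the four rows of Table 1 (j equals 2).
table_1 = {
    value: storage_path("X", "event_id", "@@".join(value), 2)
    for value in ["ABC", "ADC", "ABD", "BDC"]
}
```

Rows "ABC" and "ABD" map to the same path, which is why only one new data partition is needed for both.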
Through step S1032, the data storage path corresponding to the partition operation of each row of data may be determined, and then step S1033 is executed: and establishing a new data partition according to the data storage path. When a new data partition is established, for different data, the data storage paths corresponding to the partition operation may be the same, and only one new data partition needs to be established at this time. Meanwhile, when a new data partition is established, the new data partition may also be named, and the naming rule may be set by itself according to the requirement, for example, the path of the new data partition may be simply used as its partition name, and so on.
After the new data partitions are established, step S1034 may be performed: storing the data in the corresponding new data partition according to the data storage path corresponding to the partitioning operation. For example, the data storage path of a data partition to be divided is X, the partition field identifier is test_id, and the partition contains 3 pieces of data: the partition field of data 1 contains the characters "ABC123", that of data 2 contains "AED56", and that of data 3 contains "CD24". When the first partitioning operation is performed on this partition, the data storage paths for the (first) partitioning operation are obtained from the partition fields of the 3 pieces of data: X/test_id_partition=A and X/test_id_partition=C. Two new data partitions can be established from these two data storage paths, and if the data storage path is used as the partition name, the two new data partitions are named X/test_id_partition=A and X/test_id_partition=C. According to the data storage paths of the 3 pieces of data, data 1 and data 2 are stored in the new data partition X/test_id_partition=A, and data 3 is stored in the new data partition X/test_id_partition=C, so the data partition to be divided is split into the two new data partitions X/test_id_partition=A and X/test_id_partition=C.
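The assignment in this example can be sketched as a grouping by the j-th character of the partition field value. The function name `assign_rows` and the use of an in-memory dictionary in place of real storage are assumptions for illustration.

```python
from collections import defaultdict


def assign_rows(partition_path, field_id, values, j=1):
    """Group rows by the new partition that the j-th operation sends them to.

    partition_path is the path of the partition being divided, and values
    maps a row name to the characters of its partition field.
    """
    new_partitions = defaultdict(list)
    for name, value in values.items():
        path = f"{partition_path}/{field_id}_partition={value[j - 1]}"
        new_partitions[path].append(name)
    return dict(new_partitions)


# The three pieces of data from the example, first partitioning operation.
rows = {"data 1": "ABC123", "data 2": "AED56", "data 3": "CD24"}
result = assign_rows("X", "test_id", rows)
```

Only two keys appear in `result`, matching the two new data partitions named in the text.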
During data partitioning, the data in the data partition to be divided is allocated to the two new data partitions as described above, at which point the data partition to be divided has become the new data partitions produced by the division.
Step S104: the steps S101 to S103 are repeatedly executed until step S102 determines that there is no data partition that needs to be divided.
As described above, after a partitioning operation has been performed once on the data partition to be divided, steps S101 and S102 are executed again to determine whether a data partition that needs to be divided still exists. If so, step S103 is executed again; if not, the partitioning process for the data table or the data partition of the table ends.
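The loop formed by steps S101 to S104 can be simulated in memory as follows. This is a sketch under several assumptions that are not in the patent: partition size is measured as a row count rather than bytes, each row is represented only by its partition field value, and a partition is left undivided once any of its values has no character left to split on.

```python
def partition_until_small(root_path, values, field_id, threshold):
    """Repeat S101 to S103 until no partition exceeds the threshold (S104)."""
    partitions = {root_path: list(values)}  # path -> partition field values
    depth = {root_path: 0}                  # path -> operations applied so far
    while True:
        # S101 and S102: measure every partition and pick the oversized ones.
        oversized = [
            path for path, rows in partitions.items()
            if len(rows) > threshold
            and all(len(value) > depth[path] for value in rows)
        ]
        if not oversized:
            return partitions
        # S103: divide each oversized partition on the next character.
        for path in oversized:
            rows = partitions.pop(path)
            level = depth.pop(path)
            for value in rows:
                child = f"{path}/{field_id}_partition={value[level]}"
                partitions.setdefault(child, []).append(value)
                depth[child] = level + 1


# The four Event_id values of Table 1, with at most 2 rows per partition.
final = partition_until_small("X", ["ABC", "ADC", "ABD", "BDC"], "event_id", 2)
```

In this run the partition for "A" is divided a second time while the partition for "B" is not, which gives the hierarchical paths described in the text.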
In addition, after the data partitioning process is completed, the small files generated by different reduce nodes during execution of the MR program may be merged. The small files hold data of the partition being divided that was written to separate files during the partitioning operation; merging them can improve query efficiency.
To make it easier to look up later how the data was partitioned, the partition name of the data partition to be divided may also be recorded, before the partitioning operation of step S103 is performed on it according to the predefined partition field, to form a standardized "partition list". Depending on the query requirements, the partition list can also record information such as the partition time and partition count of the data partition to be divided and the data partition paths obtained after each partitioning operation.
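One possible shape for an entry of such a partition list is sketched below. The class name `PartitionRecord`, its field names, and the in-memory list are hypothetical; the text only names the kinds of information that may be recorded.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class PartitionRecord:
    """One entry of the partition list (field names are assumptions)."""
    partition_name: str        # name of the partition to be divided
    partition_count: int       # which partitioning operation this is (j)
    partition_time: datetime   # when the operation was started
    new_paths: list = field(default_factory=list)  # paths obtained afterwards


partition_list = []


def record_partition(name, count, when):
    """Record the partition name before the partitioning operation runs."""
    record = PartitionRecord(name, count, when)
    partition_list.append(record)
    return record


# Record the first operation on partition X, then note the resulting paths.
entry = record_partition("X", 1, datetime(2017, 7, 24, 12, 0))
entry.new_paths.extend(["X/test_id_partition=A", "X/test_id_partition=C"])
```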
According to steps S101 to S104, a data partition exceeding the preset threshold can be recursively divided into a plurality of new data partitions with hierarchical relationships, which removes the uncertainty of manual operation, addresses the high error rate, wasted resources and low efficiency caused by manual operation, allows the detailed information of each partitioning operation to be recorded, provides a basis for later looking up how the data was partitioned, and facilitates system management.
FIG. 2 is a flow chart of an implementation according to one embodiment of the present invention. As shown in fig. 2, before starting to execute the program, partition parameters need to be set (step S201), for example: partition thresholds, partition field identification and its location, file separators, and the like. After the setting is completed, the size of the data partition needs to be acquired first (step S202); then judging whether the data partition to be divided exists or not according to whether the size of the data partition exceeds a preset partition threshold value or not (step S203); when there is no data partition to be partitioned, the process ends (step S208), and when there is a data partition to be partitioned, the partition operation is executed, which mainly includes: adding a partition field at the tail of each row of data of a data partition to be partitioned (step S204); determining a data storage path of the partitioning operation according to the partition field (step S205); dynamically creating a new data partition according to the data storage path (step S206); storing the data in the data partition to be partitioned into a new data partition (step S207), so as to complete the partitioning operation; finally, the process goes to step S202 again, and the process is repeatedly executed according to the procedure described in the previous paragraph until the program is ended when there is no data partition to be divided (step S208).
Fig. 3 is a schematic diagram of main blocks of an apparatus for performing data partitioning according to an embodiment of the present invention. As shown in fig. 3, the apparatus 300 for data partitioning according to the embodiment of the present invention mainly includes an obtaining module 301, a determining module 302, and a dividing module 303.
The obtaining module 301 is configured to obtain a size of each data partition;
the judging module 302 is configured to judge whether there is a data partition that needs to be partitioned, execute the partitioning module if there is a data partition that needs to be partitioned, and return to continuously judge whether there is a data partition that needs to be partitioned until there is no data partition that needs to be partitioned;
the partitioning module 303 is configured to, when there is a data partition to be partitioned, perform partitioning operation on the data partition to be partitioned according to a predefined partition field to obtain a plurality of new data partitions, where a data storage path of a new data partition includes a data storage path of the data partition to be partitioned and a partition identifier of the new data partition, and the partition identifier is generated according to the partition field.
According to an embodiment of the present invention, the judging module 302 may further be configured to:
judge whether a data partition needing to be partitioned exists according to whether the size of any data partition exceeds a preset threshold;
if the size of at least one data partition exceeds the preset threshold, judge that a data partition needing to be partitioned exists;
otherwise, judge that no data partition needing to be partitioned exists.
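The size acquisition and threshold judgment can be sketched together. The sketch assumes, purely for illustration, that each data partition is a subdirectory of a local directory and that size is measured in bytes; the source does not specify the storage system. `partition_sizes` and `partitions_to_split` are hypothetical names.

```python
import os
from typing import Dict, List


def partition_sizes(root: str) -> Dict[str, int]:
    """Map each immediate subdirectory of root to the total bytes it holds."""
    sizes: Dict[str, int] = {}
    with os.scandir(root) as entries:
        for entry in entries:
            if not entry.is_dir():
                continue
            total = 0
            # Sum every file below the partition directory, including nested ones.
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    total += os.path.getsize(os.path.join(dirpath, name))
            sizes[entry.path] = total
    return sizes


def partitions_to_split(sizes: Dict[str, int], threshold: int) -> List[str]:
    """Return the partitions whose size strictly exceeds the threshold."""
    return sorted(path for path, size in sizes.items() if size > threshold)
```

An empty result from `partitions_to_split` corresponds to the judgment that no data partition needing to be partitioned exists.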
According to an embodiment of the invention, the partitioning module 303 may be further configured to perform the partitioning operation by:
adding a partition field at the tail of each row of data of a data partition to be partitioned;
determining a data storage path corresponding to the partitioning operation of each row of data according to the partitioning field;
establishing a new data partition according to the data storage path;
and storing each row of data to a corresponding new data partition according to the data storage path corresponding to the partition operation of each row of data.
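The four steps of the partitioning operation can be sketched in memory as follows. How the partition field of a row is computed, and how an identifier is derived from it, are not specified in this passage, so both are passed in as caller-supplied functions; `split_partition`, `field_for`, `identifier_for`, and the tab separator are assumptions of this sketch.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def split_partition(
    parent_path: str,
    rows: Iterable[str],
    field_for: Callable[[str], str],
    identifier_for: Callable[[str], str],
    separator: str = "\t",
) -> Dict[str, List[str]]:
    """Distribute rows of one partition into new partitions keyed by path."""
    new_partitions: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        field = field_for(row)
        # Step 1: append the partition field at the tail of the row.
        tagged_row = f"{row}{separator}{field}"
        # Step 2: derive the row's storage path from its partition field.
        path = f"{parent_path}/{identifier_for(field)}"
        # Steps 3 and 4: the new partition is created on first use,
        # then the row is stored in it.
        new_partitions[path].append(tagged_row)
    return dict(new_partitions)
```

Rows whose partition fields yield the same identifier land in the same new data partition; a real implementation would write to storage rather than to a dictionary.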
In addition, the apparatus 300 for data partitioning according to the embodiment of the present invention may further include a partition recording module (not shown in the figure), configured to record the partition name of the data partition to be partitioned before the partitioning operation is performed on it according to the predefined partition field.
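A partition recording module of this kind might look like the following sketch. The class name `SplitLog`, the recorded fields, and the prefix lookup are assumptions made for illustration; the source only states that the partition name is recorded before the partitioning operation.

```python
from typing import Dict, List, Union


class SplitLog:
    """Keeps an ordered record of partitions that were about to be partitioned."""

    def __init__(self) -> None:
        self.entries: List[Dict[str, Union[int, str]]] = []

    def record(self, partition_name: str) -> None:
        """Record a partition name; call this before partitioning it."""
        self.entries.append(
            {"seq": len(self.entries) + 1, "partition": partition_name}
        )

    def recorded_under(self, prefix: str) -> List[str]:
        """Return recorded partition names that start with the given prefix."""
        return [
            str(entry["partition"])
            for entry in self.entries
            if str(entry["partition"]).startswith(prefix)
        ]
```

Such a record gives later lookups a way to tell which data partitions were replaced by nested ones.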
According to the technical solution of the embodiment of the present invention, the partition field may comprise characters and special symbols, with every two adjacent characters separated by a special symbol.
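As an illustration of this field format, the sketch below splits a partition field on its special symbol and picks the character for a given partitioning operation, following the rule in the claims that the identifier is generated from the character whose sequence number corresponds to the number of partitioning operations. The underscore symbol, 1-based numbering, and the name `partition_identifier` are assumptions of this sketch.

```python
def partition_identifier(partition_field: str, times: int, symbol: str = "_") -> str:
    """Return the character used as identifier for the given partitioning operation.

    partition_field: characters separated by a special symbol, e.g. "a_b_c".
    times: which partitioning operation this is, counted from 1 (assumed).
    """
    characters = partition_field.split(symbol)
    if not 1 <= times <= len(characters):
        raise ValueError("partition field has no character for this operation")
    return characters[times - 1]
```

With the field `a_b_c`, the first operation would use `a`, the second `b`, and the third `c`; a fourth operation has no character left, which this sketch reports as an error.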
According to the technical solution of the embodiment of the present invention, a data partition that exceeds the preset threshold is recursively divided into a plurality of new data partitions with a hierarchical relationship. This avoids the high error rate, wasted resources, and low efficiency of manual operation. The details of each partitioning operation can also be recorded, which provides a basis for subsequently looking up data partitions and facilitates system management.
FIG. 4 illustrates an exemplary system architecture 400 to which the method for data partitioning or the apparatus for data partitioning of embodiments of the present invention may be applied (the architecture may be adapted to the specific case).
As shown in FIG. 4, the system architecture 400 may include terminal devices 401, 402, 403, a network 404, and a server 405 (this architecture is merely an example, and the components included in a particular architecture may be adapted to the circumstances of the application). The network 404 serves as a medium for providing communication links between the terminal devices 401, 402, 403 and the server 405. The network 404 may include various types of connections, such as wired or wireless communication links, or fiber optic cables.
A user may use the terminal devices 401, 402, 403 to interact with the server 405 over the network 404, for example to receive or send messages. Various communication client applications may be installed on the terminal devices 401, 402, 403, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, and social platform software (by way of example only).
The terminal devices 401, 402, 403 may be various electronic devices that have a display screen and support web browsing, including but not limited to smart phones, tablet computers, laptop computers, desktop computers, and the like.
The server 405 may be a server providing various services, such as a backend management server (by way of example only) that supports shopping websites browsed by users on the terminal devices 401, 402, 403. The backend management server may analyze and otherwise process received data, such as a product information query request, and feed the processing result (for example, targeted push information or product information, by way of example only) back to the terminal device.
It should be noted that the method for partitioning data provided in the embodiment of the present invention is generally executed by the server 405, and accordingly, the apparatus for partitioning data is generally disposed in the server 405.
It should be understood that the number of terminal devices, networks, and servers in fig. 4 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 5, a block diagram of a computer system 500 suitable for use with a terminal implementing an embodiment of the invention is shown. The terminal shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 5, the computer system 500 includes a Central Processing Unit (CPU)501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the system 500 are also stored. The CPU 501, ROM 502, and RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the internet. A drive 510 is also connected to the I/O interface 505 as necessary. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 510 as necessary, so that a computer program read from it can be installed into the storage section 508 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511. The computer program performs the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 501.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented in software or hardware. The described modules may also be provided in a processor, which may for example be described as: a processor including an obtaining module, a judging module, and a partitioning module. The names of these modules do not in some cases limit the modules themselves; for example, the obtaining module may also be described as a "module for obtaining the size of each data partition".
As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform: step S101, acquiring the size of each data partition; step S102, judging whether a data partition needing to be partitioned exists; step S103, when a data partition needing to be partitioned exists, performing a partitioning operation on the data partition needing to be partitioned according to a predefined partition field to obtain a plurality of new data partitions, wherein the data storage path of each new data partition comprises the data storage path of the data partition needing to be partitioned and the partition identifier of the new data partition, and the partition identifier is generated according to the partition field; and step S104, repeatedly executing step S101 to step S103 until step S102 judges that no data partition needing to be partitioned exists.
According to the technical solution of the embodiment of the present invention, a data partition that exceeds the preset threshold is recursively divided into a plurality of new data partitions with a hierarchical relationship. This overcomes the uncertainty of manual operation and avoids the high error rate, wasted resources, and low efficiency that manual operation causes. The details of each partitioning operation can also be recorded, which provides a basis for subsequently looking up data partitions and facilitates system management.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (12)

1. A method for data partitioning, comprising:
step S101, acquiring the size of each data partition;
step S102, judging whether a data partition needing to be divided exists or not;
step S103, when a data partition needing to be partitioned exists, performing a partitioning operation on the data partition needing to be partitioned according to a predefined partition field to obtain a plurality of new data partitions, wherein the data storage path of each new data partition comprises the data storage path of the data partition needing to be partitioned and the partition identifier of the new data partition, the partition field comprises characters with sequence numbers, and the partition identifier is generated from the character in the partition field whose sequence number corresponds to the number of partitioning operations performed;
and step S104, repeatedly executing the step S101 to the step S103 until the step S102 judges that the data partition needing to be divided does not exist.
2. The method according to claim 1, wherein the step S102 comprises:
judging whether the data partition needing to be partitioned exists according to whether the size of any data partition exceeds a preset threshold;
if the size of at least one data partition exceeds the preset threshold, judging that the data partition needing to be partitioned exists;
otherwise, judging that the data partition needing to be partitioned does not exist.
3. The method according to claim 1, wherein the step of performing partition operation on the data partition to be partitioned according to the predefined partition field comprises:
adding the partition field at the tail of each row of data of the data partition to be partitioned;
determining a data storage path corresponding to the partition operation of each row of data according to the partition field;
establishing a new data partition according to the data storage path;
and storing each row of data to a corresponding new data partition according to the data storage path corresponding to the partition operation of each row of data.
4. The method according to claim 1, further comprising, before the step of performing a partitioning operation on the data partition needing to be partitioned according to the predefined partition field: recording the partition name of the data partition needing to be partitioned.
5. The method of claim 1, wherein the partition field comprises characters and special symbols, and wherein every two of the characters are separated by the special symbols.
6. An apparatus for partitioning data, comprising: an obtaining module, a judging module, and a partitioning module, wherein,
the obtaining module is used for acquiring the size of each data partition;
the judging module is used for judging whether a data partition needing to be partitioned exists, invoking the partitioning module if the data partition needing to be partitioned exists, and then judging again whether a data partition needing to be partitioned exists, until no data partition needing to be partitioned exists;
the partitioning module is used for performing, when the data partition needing to be partitioned exists, a partitioning operation on the data partition needing to be partitioned according to a predefined partition field to obtain a plurality of new data partitions, wherein the data storage path of each new data partition comprises the data storage path of the data partition needing to be partitioned and the partition identifier of the new data partition, the partition field comprises characters with sequence numbers, and the partition identifier is generated from the character in the partition field whose sequence number corresponds to the number of partitioning operations performed.
7. The apparatus of claim 6, wherein the judging module is further configured to:
judging whether the data partition needing to be partitioned exists according to whether the size of any data partition exceeds a preset threshold;
if the size of at least one data partition exceeds the preset threshold, judging that the data partition needing to be partitioned exists;
otherwise, judging that the data partition needing to be partitioned does not exist.
8. The apparatus of claim 6, wherein the partitioning module is further configured to:
adding the partition field at the tail of each row of data of the data partition to be partitioned;
determining a data storage path corresponding to the partition operation of each row of data according to the partition field;
establishing a new data partition according to the data storage path;
and storing each row of data to a corresponding new data partition according to the data storage path corresponding to the partition operation of each row of data.
9. The apparatus of claim 6, further comprising a partition recording module configured to record the partition name of the data partition needing to be partitioned before the partitioning operation is performed on it according to the predefined partition field.
10. The apparatus of claim 6, wherein the partition field comprises characters and special symbols, and wherein every two of the characters are separated by the special symbols.
11. A terminal for data partitioning, comprising:
one or more processors;
a storage device for storing one or more programs which,
when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.
12. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-5.
CN201710606901.9A 2017-07-24 2017-07-24 Method and device for partitioning data Active CN107480205B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710606901.9A CN107480205B (en) 2017-07-24 2017-07-24 Method and device for partitioning data

Publications (2)

Publication Number Publication Date
CN107480205A (en) 2017-12-15
CN107480205B (en) 2020-06-05

Family

ID=60595810

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710606901.9A Active CN107480205B (en) 2017-07-24 2017-07-24 Method and device for partitioning data

Country Status (1)

Country Link
CN (1) CN107480205B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110519319B (en) * 2018-05-22 2022-02-11 杭州海康威视数字技术股份有限公司 Method and device for splitting partitions
CN109101531B (en) * 2018-06-22 2022-05-31 联想(北京)有限公司 File processing method, device and system
CN109542898A (en) * 2018-10-30 2019-03-29 天津字节跳动科技有限公司 Date storage method, device, electronic equipment and the storage medium of data bank table
CN110750515A (en) * 2019-09-25 2020-02-04 浙江大华技术股份有限公司 Database query method and processing device
CN111061738A (en) * 2019-12-16 2020-04-24 中国建设银行股份有限公司 Data table pre-grouping method, device, equipment and storage medium
CN113778657B (en) * 2020-09-24 2024-04-16 北京沃东天骏信息技术有限公司 Data processing method and device
CN112905596B (en) * 2021-03-05 2024-02-02 北京中经惠众科技有限公司 Data processing method, device, computer equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101546294A (en) * 2009-04-30 2009-09-30 青岛海信宽带多媒体技术有限公司 Method for storing data in Flash memory
CN103902544A (en) * 2012-12-25 2014-07-02 中国移动通信集团公司 Data processing method and system
CN105009110A (en) * 2012-11-30 2015-10-28 华为技术有限公司 Method for automated scaling of massive parallel processing (mpp) database

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8121435B2 (en) * 2008-03-31 2012-02-21 Konica Minolta Laboratory U.S.A., Inc. Systems and methods for resolution switching

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
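Application of data partitioning in geoscience spatial data query; Sun Leigang et al.; Journal of Computer Applications (《计算机应用》); 2010-12-31; pp. 148-151 *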

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant