CN111339133A - Data segmentation method and device, computer equipment and storage medium - Google Patents

Data segmentation method and device, computer equipment and storage medium

Info

Publication number
CN111339133A
Authority
CN
China
Prior art keywords
data
tables
sub
segmenting
data table
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811559134.1A
Other languages
Chinese (zh)
Other versions
CN111339133B (en)
Inventor
熊友军
方曦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Youbixuan Intelligent Robot Co ltd
Shenzhen Ubtech Technology Co ltd
Original Assignee
Ubtech Robotics Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ubtech Robotics Corp
Priority to CN201811559134.1A (patent CN111339133B)
Priority to PCT/CN2018/122380 (patent WO2020124491A1)
Publication of CN111339133A
Application granted
Publication of CN111339133B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention discloses a data segmentation method and device, computer equipment, and a storage medium. The method comprises: acquiring a data set and a service type of the data in the data set; segmenting the data of the data set according to the service type to obtain a plurality of data tables; acquiring the number of pieces of data in each data table; and segmenting each data table according to its number of pieces of data to obtain a plurality of data sub-tables corresponding to that data table. Segmenting the data in this manner can improve query efficiency when the data needs to be queried.

Description

Data segmentation method and device, computer equipment and storage medium
Technical Field
The invention relates to the technical field of databases, in particular to a data segmentation method, a data segmentation device, computer equipment and a storage medium.
Background
With the advent of the cloud era, big data has attracted more and more attention of people in related fields. Big data is a collection of a large amount of data, and user behaviors, product price trends and the like can be predicted through the big data.
As big data technology continues to develop, servers store ever larger volumes of data. When a query runs against tens of millions or even hundreds of millions of records, the server database cannot respond in time because there are too many objects to search.
Disclosure of Invention
In view of the foregoing, it is necessary to provide a data segmentation method, device, computer device, and storage medium for improving query efficiency.
A method of segmenting data, the method comprising:
acquiring a data set and a service type of the data in the data set;
segmenting the data of the data set according to the service type to obtain a plurality of data tables;
acquiring the number of pieces of data in each data table;
and segmenting the corresponding data tables according to the number of pieces of data to obtain a plurality of data sub-tables corresponding to each data table.
In one embodiment, after the segmenting the data of the data set according to the service type to obtain a plurality of data tables, the method further includes: determining the number of databases according to the number of the data tables; and storing the plurality of data tables in a plurality of determined databases.
In one embodiment, the segmenting the corresponding data table according to the number of the data pieces to obtain a plurality of data sub-tables corresponding to each data table includes: acquiring the occupied space corresponding to each piece of data in each data table; determining the number of data sub-tables corresponding to the data table according to the number of the data in the data table and the occupied space corresponding to each piece of data in the data table; and segmenting the data tables according to the number of the data sub-tables to obtain a plurality of data sub-tables corresponding to each data table.
In one embodiment, after the segmenting the data tables according to the number of the data sub-tables to obtain a plurality of data sub-tables corresponding to each data table, the method further includes: determining the number of databases according to the number of the data sub-tables; and storing the plurality of data sub-tables in a plurality of determined databases.
In one embodiment, the segmenting the corresponding data table according to the number of pieces of data to obtain a plurality of data sub-tables corresponding to each data table includes: acquiring a field number identifier corresponding to each piece of data in each data table; performing a modulo operation on the field number identifier to obtain a modulo result; and segmenting the corresponding data tables according to the modulo result and the number of pieces of data to obtain a plurality of data sub-tables corresponding to each data table.
In one embodiment, after the segmenting the corresponding data table according to the number of the data pieces to obtain a plurality of data sub-tables corresponding to each data table, the method further includes: and sending the corresponding relation between the service type and the data table, the corresponding relation between the data table and the corresponding data sub-table and the data identifier corresponding to each piece of data in the data sub-table to a data table management center, so that the data table management center searches the data corresponding to the data identifier from the corresponding data sub-table according to the service type and the data identifier in the received data processing request.
An apparatus for slicing data, the apparatus comprising:
the first acquisition module is used for acquiring a data set and the service type of the data in the data set;
the first segmentation module is used for segmenting the data of the data set according to the service type to obtain a plurality of data tables;
the second acquisition module is used for acquiring the number of pieces of data in each data table;
and the second segmentation module is used for segmenting the corresponding data tables according to the number of pieces of data to obtain a plurality of data sub-tables corresponding to each data table.
In one embodiment, the apparatus for segmenting data further includes: the first number determining module is used for determining the number of the databases according to the number of the data tables; the first database storage module is used for storing the data tables in the determined databases.
In one embodiment, the second slicing module includes: the occupied space acquisition module is used for acquiring the occupied space corresponding to each piece of data in each data table; the occupied space determining module is used for determining the number of the data sub-tables corresponding to the data table according to the number of the data in the data table and the occupied space corresponding to each piece of data in the data table; and the occupied space segmentation module is used for segmenting the data tables according to the number of the data sub-tables to obtain a plurality of data sub-tables corresponding to each data table.
In one embodiment, the apparatus for segmenting data further includes: the second number determining module is used for determining the number of the databases according to the number of the data sub-tables; and the second database storage module is used for storing the plurality of data sub-tables in the determined plurality of databases.
In one embodiment, the second segmentation module further includes: the field number identifier acquisition module is used for acquiring a field number identifier corresponding to each piece of data in each data table; the modulo operation module is used for performing a modulo operation on the field number identifier to obtain a modulo result; and the modulo segmentation module is used for segmenting the corresponding data tables according to the modulo result and the number of pieces of data to obtain a plurality of data sub-tables corresponding to each data table.
A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of:
acquiring a data set and a service type of the data in the data set;
segmenting the data of the data set according to the service type to obtain a plurality of data tables;
acquiring the number of pieces of data in each data table;
and segmenting the corresponding data tables according to the number of pieces of data to obtain a plurality of data sub-tables corresponding to each data table.
A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
acquiring a data set and a service type of the data in the data set;
segmenting the data of the data set according to the service type to obtain a plurality of data tables;
acquiring the number of pieces of data in each data table;
and segmenting the corresponding data tables according to the number of pieces of data to obtain a plurality of data sub-tables corresponding to each data table.
The invention provides a data segmentation method, device, equipment, and storage medium. A data set and a service type of the data in the data set are first acquired; the data of the data set is segmented according to the service type to obtain a plurality of data tables; the number of pieces of data in each data table is then acquired; and finally each data table is segmented according to its number of pieces of data to obtain a plurality of data sub-tables corresponding to that data table. The original data set is thus stored across a plurality of data sub-tables rather than in a single data table, so a later query can run directly against one data sub-table instead of one large table holding the whole data set. Because fewer records have to be searched, query efficiency and response efficiency improve. In addition, the data is first segmented by service, and each resulting table is then segmented again, so the final data sub-tables are smaller than those produced by a single segmentation, which further improves query efficiency and response efficiency.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Wherein:
FIG. 1 is a schematic diagram illustrating an implementation flow of a data segmentation method according to an embodiment;
FIG. 2 is a diagram of a piece of data in one embodiment;
FIG. 3 is a diagram of a piece of data in one embodiment;
FIG. 4 is a diagram of a piece of data in one embodiment;
FIG. 5 is a flow chart illustrating an implementation of a method for data segmentation in one embodiment;
FIG. 6 is a schematic diagram illustrating a process for slicing data according to the data slicing method in one embodiment;
FIG. 7 is a flow chart illustrating an implementation of a method for data segmentation in one embodiment;
FIG. 8 is a diagram illustrating an embodiment in which a data table management center determines data sub-tables based on service types and data identifiers;
FIG. 9 is a block diagram showing a structure of a data slicing apparatus according to an embodiment;
FIG. 10 is a block diagram showing a configuration of a computer device according to an embodiment.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in FIG. 1, in one embodiment, a method of slicing data is provided. The method can be applied to a server and also can be applied to a terminal. The server is a high-performance computer or a high-performance computer cluster composed of a plurality of high-performance computers, and the terminal may be a desktop terminal or a mobile terminal, for example, the desktop terminal is a desktop computer and the mobile terminal is a notebook computer.
As shown in fig. 1, the method for segmenting data according to the embodiment of the present invention specifically includes the following steps:
Step 102, acquiring a data set and a service type of the data in the data set.
The data set is a collection of a plurality of data.
The service type of the data indicates the category to which the data belongs. For example, for a given data set, the service types may be an order service type and a user service type. The data table of the order service type may contain data of a user field type (user information data), data of an order amount field type (order amount data), data of an order placing time field type (order placing time data), data of a coupon field type (coupon data) and data of a purchase date field type (purchase date data); the data table of the user service type may contain data of the user field type (user information data), data of the user gender field type (user gender data), data of the user age field type (user age data), and data of the user shipping address field type (user shipping address data).
Step 104, segmenting the data of the data set according to the service type to obtain a plurality of data tables.
The first segmentation of the data set segments the data according to its service type, with one service type corresponding to one data table. For example, the order type corresponds to an order data table, the user type corresponds to a user data table, and the address type corresponds to an address data table. Data belonging to the user service type is stored in the user data table; that is, user information data, user gender data, user age data and user shipping address data are stored in the user data table.
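The first segmentation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation of the embodiment: the record layout (a dict carrying its service type under the key "service_type"), the function name split_by_service_type, and the sample records are all hypothetical.

```python
from collections import defaultdict


def split_by_service_type(data_set):
    """Group records into one data table per service type.

    Assumption for illustration: every record is a dict that stores its
    service type under the key "service_type".
    """
    tables = defaultdict(list)
    for record in data_set:
        tables[record["service_type"]].append(record)
    return dict(tables)


# Hypothetical data set mixing order data and user data.
data_set = [
    {"service_type": "order", "user": "user 1", "order_amount": 120},
    {"service_type": "user", "user": "user 1", "age": 30},
    {"service_type": "order", "user": "user 2", "order_amount": 75},
]

data_tables = split_by_service_type(data_set)
print(sorted(data_tables))        # names of the resulting data tables
print(len(data_tables["order"]))  # pieces of data in the order data table
```

Each key of the returned dict plays the role of one data table, so the order data and the user data end up in separate tables before any further segmentation.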
As an embodiment of the present invention, after the step 104 of segmenting the data of the data set according to the service type to obtain a plurality of data tables, the method further includes: determining the number of databases according to the number of the data tables; and storing the plurality of data tables in a plurality of determined databases.
For example, if 3 data tables are obtained through segmentation, the 3 data tables are respectively placed into 3 databases; or, 3 data tables are obtained through segmentation, two data tables are put into the database 1, and the third data table is put into the database 2.
Step 106, acquiring the number of pieces of data in each data table.
A piece of data is one row or one column of a data table. Specifically, in the order data table shown in fig. 2, each row represents one piece of data; in the order data table shown in fig. 3, each column represents one piece of data.
It should be noted that a data table contains data of a plurality of fields, one of which may be set as the main field. For example, the fields in fig. 2 or fig. 3 are a user field, an order amount field, an order placing time field, a coupon field, and a goods type field. To divide the data more effectively, the user field is set as the main field; as shown in fig. 2, this divides the data in the order data table into 3 pieces of data. Other fields may also be set as the main field; for example, in fig. 4 the order placing time is the main field. Which field to use as the main field can be decided according to actual requirements. For an order table, the user field is the more natural choice, because it makes the data of a single user easier to analyze; setting the order placing time as the main field instead makes it easier to analyze purchases in each time period.
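One possible reading of the main field is that it acts as a grouping key for the rows of a data table. The sketch below illustrates that reading only; the function name group_by_main_field, the field names, and the sample rows are hypothetical and do not reproduce the contents of fig. 2 to fig. 4.

```python
from collections import defaultdict


def group_by_main_field(rows, main_field):
    """Collect the rows of a data table under each value of the main field."""
    pieces = defaultdict(list)
    for row in rows:
        pieces[row[main_field]].append(row)
    return dict(pieces)


# Hypothetical order data table with three users and two time periods.
order_table = [
    {"user": "user 1", "order_amount": 120, "order_time": "morning"},
    {"user": "user 2", "order_amount": 75, "order_time": "morning"},
    {"user": "user 1", "order_amount": 40, "order_time": "evening"},
    {"user": "user 3", "order_amount": 15, "order_time": "evening"},
]

by_user = group_by_main_field(order_table, "user")
by_time = group_by_main_field(order_table, "order_time")
print(len(by_user))  # groups when the user field is the main field
print(len(by_time))  # groups when the order placing time is the main field
```

Grouping by user keeps all orders of one user together, which suits per-user analysis; grouping by order placing time keeps each time period together instead.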
Step 108, segmenting the corresponding data tables according to the number of pieces of data to obtain a plurality of data sub-tables corresponding to each data table.
Here, the corresponding data table may be segmented according to the number of pieces of data in any of several ways to obtain the plurality of data sub-tables corresponding to each data table.
Optionally, the data table is segmented according to a preset segmentation ratio. Specifically, the segmenting the corresponding data table according to the number of pieces of data to obtain a plurality of data sub-tables corresponding to each data table includes: acquiring a preset segmentation ratio; and segmenting the corresponding data tables according to the preset segmentation ratio and the number of pieces of data to obtain a plurality of data sub-tables corresponding to each data table.
If the preset segmentation ratio divides each data table uniformly into 3 data sub-tables, a data table holding, for example, 60 pieces of data is divided into 3 data sub-tables of 20 pieces of data each. If the preset segmentation ratio divides each data table into 1/6, 1/3 and 1/2, the same 60 pieces of data are divided into data sub-table 1 containing 10 pieces of data, data sub-table 2 containing 20 pieces of data, and data sub-table 3 containing 30 pieces of data.
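The ratio-based segmentation of the 60-piece example can be reproduced with the following sketch. The function name split_by_ratio is hypothetical, and the handling of rounding remainders (appended to the last data sub-table) is an assumption, since the text does not say what happens when the ratios do not divide the number of pieces of data evenly.

```python
from fractions import Fraction


def split_by_ratio(records, ratios):
    """Split records into consecutive sub-tables sized by the given ratios.

    The ratios must sum to 1. Sizes are rounded down; any remainder left
    by rounding goes to the last sub-table (an assumption of this sketch).
    """
    ratios = [Fraction(r) for r in ratios]
    if sum(ratios) != 1:
        raise ValueError("ratios must sum to 1")
    total = len(records)
    sizes = [int(total * r) for r in ratios]
    sizes[-1] += total - sum(sizes)
    sub_tables, start = [], 0
    for size in sizes:
        sub_tables.append(records[start:start + size])
        start += size
    return sub_tables


records = list(range(60))  # stand-in for 60 pieces of data

uniform = split_by_ratio(records, [Fraction(1, 3)] * 3)
weighted = split_by_ratio(
    records, [Fraction(1, 6), Fraction(1, 3), Fraction(1, 2)]
)
print([len(t) for t in uniform])   # [20, 20, 20]
print([len(t) for t in weighted])  # [10, 20, 30]
```

Exact fractions are used instead of floating-point ratios so that a ratio such as 1/3 does not introduce rounding error into the sub-table sizes.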
Optionally, the data table is segmented according to occupied space. Specifically, the segmenting the corresponding data table according to the number of pieces of data to obtain a plurality of data sub-tables corresponding to each data table includes: acquiring the occupied space corresponding to each piece of data in each data table; determining the number of data sub-tables corresponding to the data table according to the number of pieces of data in the data table and the occupied space corresponding to each piece of data in the data table; and segmenting the data tables according to the number of the data sub-tables to obtain a plurality of data sub-tables corresponding to each data table.
As shown in fig. 2, the occupied space of each piece of data can be calculated. Assuming that one data item occupies 1 KB and one piece of data includes 4 data items, one piece of data occupies 4 KB.
The number of data sub-tables corresponding to the data table may be determined from the number of pieces of data in the data table and the occupied space of each piece of data in either of two ways. First, the occupied space of the data table is determined from the number of pieces of data and the occupied space of each piece of data, and the number of data sub-tables is determined from the occupied space of the data table. Second, the number of pieces of data stored in each data sub-table is determined from the occupied space of each piece of data, and the number of data sub-tables is determined from that number and the number of pieces of data in the data table.
For the first case, for example, the occupied space of each piece of data is 0.4M, the number of the pieces of data is 1000, and then the total occupied space is 400M, and since the space is large, two large data sub-tables can be used for storing, that is, the number of the data sub-tables is 2, and each data sub-table stores 200M of data, or more data sub-tables with small storage capacity can be used for storing, for example, 10 data sub-tables store 400M of data, and each data sub-table stores 40M of data, that is, 100 pieces of data. For the second case, for example, the occupied space of each piece of data is 0.4M, and the preset amount of data stored in each data sub-table is 100M (of course, the storage space of each preset data sub-table may be different, and this is only for convenience of description), so that it may be determined that each data sub-table stores 250 pieces of data, and assuming that the data table has 1000 pieces of data, 4 data sub-tables are required.
After the number of the data sub-tables is determined, the data in the data tables can be uniformly stored in each data sub-table, and the data in the data tables can be non-uniformly stored in the data sub-tables according to the specific size of the data required to be stored in each data sub-table.
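The two ways of deriving the number of data sub-tables from occupied space can be checked against the figures given above with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and sizes are expressed as whole kilobytes on the assumption that "M" means megabytes of 1000 KB, which keeps the arithmetic in integers.

```python
import math

RECORD_KB = 400      # 0.4M per piece of data (assuming 1M = 1000 KB)
RECORD_COUNT = 1000  # pieces of data in the data table


def plan_from_total_space(record_kb, record_count, sub_table_count):
    """Case 1: pick a sub-table count, derive what each sub-table holds."""
    total_kb = record_kb * record_count
    kb_each = math.ceil(total_kb / sub_table_count)
    records_each = math.ceil(record_count / sub_table_count)
    return total_kb, kb_each, records_each


def plan_from_capacity(record_kb, record_count, capacity_kb):
    """Case 2: fix the capacity of a sub-table, derive how many are needed."""
    records_each = capacity_kb // record_kb
    if records_each == 0:
        raise ValueError("capacity is smaller than one piece of data")
    return records_each, math.ceil(record_count / records_each)


print(plan_from_total_space(RECORD_KB, RECORD_COUNT, 2))
print(plan_from_total_space(RECORD_KB, RECORD_COUNT, 10))
print(plan_from_capacity(RECORD_KB, RECORD_COUNT, 100_000))
```

With 10 data sub-tables the first function reports 40,000 KB (40M) and 100 pieces of data per sub-table; with a 100M capacity the second reports 250 pieces of data per sub-table and 4 data sub-tables, matching the two cases in the text.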
Optionally, the data table is segmented according to a field. Specifically, the segmenting the corresponding data table according to the number of pieces of data to obtain a plurality of data sub-tables corresponding to each data table includes: acquiring a field number identifier corresponding to each piece of data in each data table; performing a modulo operation on the field number identifier to obtain a modulo result; and segmenting the corresponding data tables according to the modulo result and the number of pieces of data to obtain a plurality of data sub-tables corresponding to each data table.
The field number identifier is used for uniquely identifying the number of one field. For example, a user field, each user is assigned a field number identifier, the field number identifier of user 1 is 1256, the field number identifier of user 2 is 1058, and the field number identifier of user 3 is 2002.
A modulo operation is then performed on the field number identifier to obtain a modulo result. For example, the modulo result of the field number identifier 1256 of user 1 is 1, the modulo result of the field number identifier 1058 of user 2 is 3, and the modulo result of the field number identifier 2002 of user 3 is 2.
Finally, the corresponding data table is segmented according to the modulo result and the number of pieces of data. For example, data with a modulo result of 1 is put into data sub-table 1, data with a modulo result of 2 into data sub-table 2, and data with a modulo result of 3 into data sub-table 3. Furthermore, if one modulo result covers noticeably more data, for example a modulo result of 1, two data sub-tables may be used to store the data with that modulo result, so that the finally obtained data sub-tables hold roughly the same amount of data.
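The modulo-based segmentation can be sketched as follows. The text does not state the modulus; the value 5 used here is an inference, being the modulus for which the identifiers 1256, 1058 and 2002 give the results 1, 3 and 2. The function name split_by_modulo and the record layout are hypothetical, and the sketch does not cover the further step of spreading an overfull modulo result across two data sub-tables.

```python
from collections import defaultdict

MODULUS = 5  # assumption: inferred from the example values, not stated in the text


def split_by_modulo(records, id_field, modulus=MODULUS):
    """Place each record in the sub-table keyed by its identifier modulo `modulus`."""
    sub_tables = defaultdict(list)
    for record in records:
        sub_tables[record[id_field] % modulus].append(record)
    return dict(sub_tables)


# Field number identifiers from the example in the text.
users = [
    {"user": "user 1", "field_id": 1256},
    {"user": "user 2", "field_id": 1058},
    {"user": "user 3", "field_id": 2002},
]

sub_tables = split_by_modulo(users, "field_id")
for result in sorted(sub_tables):
    print(result, [r["user"] for r in sub_tables[result]])
```

Because the target sub-table is computed from the identifier alone, a later lookup can recompute the same modulo result and go straight to the right data sub-table.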
Further, as shown in fig. 5, after the step 108 of segmenting the data tables according to the number of the data sub-tables to obtain a plurality of data sub-tables corresponding to each data table, the method further includes:
Step 109, determining the number of databases according to the number of data sub-tables.
And step 110, storing the plurality of data sub-tables in a plurality of determined databases.
For example, suppose data table 1 has 6 data sub-tables, data table 2 has 2 data sub-tables, and data table 3 has 15 data sub-tables. Data table 1 is given 2 databases, each holding 3 data sub-tables. Data table 2 is given one database that stores both of its data sub-tables. Data table 3 has many data sub-tables, so several databases may be provided and its data sub-tables spread across them, for example 5 databases that each store 3 data sub-tables. The number of data sub-tables stored in each database does not have to be the same. For example, of the 5 databases of data table 3, the first may store 2 data sub-tables, the second 4, the third 3, the fourth 2, and the fifth 4.
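The even distribution in this example can be sketched with a round-robin assignment. Round-robin is a choice made for this illustration; the text only requires that the data sub-tables be stored in the determined databases and explicitly allows uneven distributions. The function name assign_to_databases and the sub-table names are hypothetical.

```python
def assign_to_databases(sub_tables, database_count):
    """Spread sub-tables over databases as evenly as possible (round-robin)."""
    databases = [[] for _ in range(database_count)]
    for index, sub_table in enumerate(sub_tables):
        databases[index % database_count].append(sub_table)
    return databases


table_1 = [f"table1_sub{i}" for i in range(1, 7)]   # 6 data sub-tables
table_3 = [f"table3_sub{i}" for i in range(1, 16)]  # 15 data sub-tables

print([len(db) for db in assign_to_databases(table_1, 2)])  # [3, 3]
print([len(db) for db in assign_to_databases(table_3, 5)])  # [3, 3, 3, 3, 3]
```

An uneven layout such as 2, 4, 3, 2 and 4 data sub-tables per database would need an explicit placement plan instead of the index-based rule used here.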
To more clearly illustrate the data splitting in the embodiment of the present invention, as shown in fig. 6, the data is first split according to the service type to obtain a plurality of data tables, and then each data table is split to obtain a plurality of data sub-tables.
The data segmentation method first acquires a data set and a service type of the data in the data set; segments the data of the data set according to the service type to obtain a plurality of data tables; then acquires the number of pieces of data in each data table; and finally segments each data table according to its number of pieces of data to obtain a plurality of data sub-tables corresponding to that data table. The original data set is thus stored across a plurality of data sub-tables rather than in a single data table, so a later query can run directly against one data sub-table instead of one large table holding the whole data set. Because fewer records have to be searched, query efficiency and response efficiency improve. In addition, the data is first segmented by service, and each resulting table is then segmented again, so the final data sub-tables are smaller than those produced by a single segmentation, which further improves query efficiency and response efficiency.
As shown in fig. 7, an embodiment of the present invention provides a data segmentation method, which specifically includes the following steps:
Step 702, acquiring a data set and a service type of the data in the data set.
Step 704, segmenting the data of the data set according to the service type to obtain a plurality of data tables.
Step 706, acquiring the number of pieces of data in each data table.
Step 708, segmenting the corresponding data tables according to the number of pieces of data to obtain a plurality of data sub-tables corresponding to each data table.
Step 710, sending the corresponding relationship between the service type and the data table, the corresponding relationship between the data table and the corresponding data sub-table, and the data identifier corresponding to each piece of data in the data sub-table to a data table management center, so that the data table management center searches the data corresponding to the data identifier from the corresponding data sub-table according to the service type and the data identifier in the received data processing request.
The data identifier may be a serial number that indicates the position of a piece of data in the data sub-table; for example, the data identifier 0001 identifies the first piece of data in the data sub-table and 0002 identifies the second. Alternatively, the data identifier may be a field identifier that identifies the data of a certain field in a data sub-table; for example, if the main field of a data sub-table is the user, the data identifier 0910 identifies user 1 and 0920 identifies user 2.
The correspondence makes it possible to know, once the service type is known, which data table stores the data of that service type and, once the data table is known, which data sub-tables belong to it. The correspondence also gives the location where each table is stored. The location reflects the database and the server where the table resides; that is, different tables can be stored not only in different databases but also on different servers. An example is shown in Table 1.
TABLE 1
[Table 1 appears as an image (BDA0001912794800000111) in the original publication and is not reproduced here.]
As shown in fig. 8, the data table management center receives a data processing request sent by the terminal and obtains a service type and a data identifier from the request. From the service type it first determines that the data of that service type is stored in data table 3; from the data identifier it then determines that the corresponding data is stored in data sub-table 2, and it searches data sub-table 2 for the data corresponding to the data identifier. Because the search is confined to data sub-table 2, data query efficiency can be greatly improved.
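The lookup performed by the data table management center can be sketched as two dictionary lookups followed by a search confined to one data sub-table. Everything in this sketch is hypothetical: the class name, its methods, the in-memory dictionaries standing in for databases, and the sample identifiers and data. It illustrates the routing idea only and is not the management center of the embodiment.

```python
class DataTableManagementCenter:
    """Toy registry that routes a request to a single data sub-table."""

    def __init__(self):
        self.table_for_service = {}  # service type -> data table
        self.sub_table_for_id = {}   # (data table, data identifier) -> data sub-table
        self.sub_tables = {}         # data sub-table -> {data identifier: data}

    def register(self, service_type, table, sub_table, rows):
        """Record the correspondences sent after segmentation."""
        self.table_for_service[service_type] = table
        store = self.sub_tables.setdefault(sub_table, {})
        for data_id, data in rows.items():
            self.sub_table_for_id[(table, data_id)] = sub_table
            store[data_id] = data

    def lookup(self, service_type, data_id):
        """Resolve service type -> data table -> data sub-table -> data."""
        table = self.table_for_service[service_type]
        sub_table = self.sub_table_for_id[(table, data_id)]
        return sub_table, self.sub_tables[sub_table][data_id]


center = DataTableManagementCenter()
center.register("order", "data table 3", "data sub-table 1", {"0001": "order A"})
center.register("order", "data table 3", "data sub-table 2", {"0002": "order B"})
print(center.lookup("order", "0002"))  # ('data sub-table 2', 'order B')
```

Only the selected data sub-table is touched by the final search, which is the source of the efficiency gain described above.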
As shown in fig. 9, an embodiment of the present invention provides a data slicing apparatus 900, where the apparatus 900 includes:
a first obtaining module 902, configured to obtain a data set and a service type of data in the data set;
a first segmentation module 904, configured to segment the data of the data set according to the service type to obtain a plurality of data tables;
a second obtaining module 906, configured to obtain the number of data pieces in each data table;
the second segmentation module 908 is configured to segment the corresponding data table according to the number of pieces of data, so as to obtain a plurality of data sub-tables corresponding to each data table.
The data segmentation device first acquires a data set and a service type of the data in the data set; segments the data of the data set according to the service type to obtain a plurality of data tables; then acquires the number of pieces of data in each data table; and finally segments each data table according to its number of pieces of data to obtain a plurality of data sub-tables corresponding to that data table. The original data set is thus stored across a plurality of data sub-tables rather than in a single data table, so a later query can run directly against one data sub-table instead of one large table holding the whole data set. Because fewer records have to be searched, query efficiency and response efficiency improve. In addition, the data is first segmented by service, and each resulting table is then segmented again, so the final data sub-tables are smaller than those produced by a single segmentation, which further improves query efficiency and response efficiency.
In one embodiment, the data segmentation apparatus 900 further includes:
the first number determining module is used for determining the number of the databases according to the number of the data tables;
the first database storage module is used for storing the data tables in the determined databases.
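One way to derive a database count from the number of data tables is sketched below. The `tables_per_db` parameter, the `db_N` naming, and the round-robin placement are assumptions made for illustration; the text states only that the number of databases is determined from the number of data tables.

```python
import math
from typing import Dict, List


def plan_databases(table_names: List[str], tables_per_db: int = 2) -> Dict[str, List[str]]:
    """Choose a database count from the table count, then place tables round-robin."""
    # Assumed rule: enough databases so that none holds more than tables_per_db tables.
    db_count = max(1, math.ceil(len(table_names) / tables_per_db))
    placement: Dict[str, List[str]] = {f"db_{i}": [] for i in range(db_count)}
    for index, name in enumerate(table_names):
        placement[f"db_{index % db_count}"].append(name)
    return placement
```

The same idea would apply to the later embodiment that determines the number of databases from the number of data sub-tables.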
In one embodiment, the second segmentation module 908 comprises:
the occupied space acquisition module is used for acquiring the occupied space corresponding to each piece of data in each data table;
the occupied space determining module is used for determining the number of the data sub-tables corresponding to the data table according to the number of the data in the data table and the occupied space corresponding to each piece of data in the data table;
and the occupied space segmentation module is used for segmenting the data tables according to the number of the data sub-tables to obtain a plurality of data sub-tables corresponding to each data table.
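A sketch of the occupied-space variant follows. It assumes a hypothetical per-sub-table capacity in bytes (`capacity_bytes`) and takes the sub-table count as the total occupied space divided by that capacity, rounded up; the text does not give a formula, so this rule is only one plausible reading.

```python
import math
from typing import List


def sub_table_count(row_sizes_bytes: List[int], capacity_bytes: int) -> int:
    """Number of sub-tables needed so the total occupied space fits the assumed capacity."""
    total = sum(row_sizes_bytes)  # combines the row count with the per-row occupied space
    return max(1, math.ceil(total / capacity_bytes))


def split_into(rows: list, count: int) -> List[list]:
    """Split rows into exactly `count` contiguous sub-tables of near-equal length."""
    base, extra = divmod(len(rows), count)
    parts, start = [], 0
    for i in range(count):
        end = start + base + (1 if i < extra else 0)
        parts.append(rows[start:end])
        start = end
    return parts
```

For example, ten rows of 100 bytes each with a 400-byte capacity give three sub-tables in this sketch.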
In one embodiment, the data segmentation apparatus 900 further includes:
the second number determining module is used for determining the number of the databases according to the number of the data sub-tables;
and the second database storage module is used for storing the plurality of data sub-tables in the determined plurality of databases.
In one embodiment, the second segmentation module 908 further comprises:
the field number identifier acquisition module is used for acquiring a field number identifier corresponding to each piece of data in each data table;
the modulo calculation module is used for performing a modulo operation on the field number identifier to obtain a modulo result;
and the modulo segmentation module is used for segmenting the corresponding data tables according to the modulo result and the number of data pieces to obtain a plurality of data sub-tables corresponding to each data table.
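The modulo-based variant can be sketched as below. The `field_no` key and the `max_rows_per_sub_table` parameter are hypothetical; the sketch assumes the divisor is the sub-table count derived from the number of data pieces, which the text implies but does not spell out.

```python
import math
from typing import List


def split_by_modulo(rows: List[dict], max_rows_per_sub_table: int = 2,
                    id_field: str = "field_no") -> List[List[dict]]:
    """Assign each row to sub-table (field number identifier mod sub-table count)."""
    # Assumed rule: the sub-table count follows from the number of data pieces.
    n = max(1, math.ceil(len(rows) / max_rows_per_sub_table))
    sub_tables: List[List[dict]] = [[] for _ in range(n)]
    for row in rows:
        sub_tables[row[id_field] % n].append(row)
    return sub_tables
```

Note that a modulo assignment balances the sub-tables only when the identifiers are spread evenly; it does not by itself enforce a maximum size per sub-table.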
FIG. 10 is a diagram illustrating the internal structure of a computer device in one embodiment. The computer device may specifically be a server. As shown in fig. 10, the computer device includes a processor, a memory, and a network interface connected by a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program that, when executed by the processor, causes the processor to implement the method for segmenting data. The internal memory may also store a computer program that, when executed by the processor, causes the processor to perform the method for segmenting data. The network interface is used for communicating with external devices. Those skilled in the art will appreciate that the architecture shown in fig. 10 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer devices to which the solution applies; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, the method for segmenting data provided by the present application may be implemented in the form of a computer program that runs on a computer device as shown in fig. 10. The memory of the computer device can store the individual program modules that make up the data segmentation apparatus, such as the first obtaining module 902, the first segmentation module 904, the second obtaining module 906, and the second segmentation module 908.
Specifically, the computer device according to the embodiment of the present invention includes a memory and a processor, where the memory stores a computer program, and when the computer program is executed by the processor, the processor executes the following steps:
acquiring a data set and a service type of the data in the data set;
segmenting the data of the data set according to the service type to obtain a plurality of data tables;
acquiring the number of data pieces in each data table;
and segmenting the corresponding data tables according to the number of data pieces to obtain a plurality of data sub-tables corresponding to each data table.
The computer device first acquires a data set and the service type of the data in the data set, and segments the data of the data set according to the service type to obtain a plurality of data tables. It then acquires the number of data pieces in each data table and segments the corresponding data table according to that number to obtain a plurality of data sub-tables corresponding to each data table. The original data set is thus stored in a plurality of data sub-tables rather than in a single data table, so a subsequent query can be run directly against one data sub-table instead of one large data table holding the whole data set. Because less data has to be searched, query efficiency and response efficiency can be improved. In addition, the data is first segmented by service type and each resulting table is then segmented again, so the final data sub-tables are smaller than those obtained by a single segmentation, which further improves query efficiency and response efficiency.
In one embodiment, the computer program is further configured to, when executed by the processor, perform the following steps:
determining the number of databases according to the number of the data tables;
and storing the plurality of data tables in a plurality of determined databases.
In one embodiment, the segmenting the corresponding data table according to the number of the data pieces to obtain a plurality of data sub-tables corresponding to each data table includes:
acquiring the occupied space corresponding to each piece of data in each data table;
determining the number of data sub-tables corresponding to the data table according to the number of the data in the data table and the occupied space corresponding to each piece of data in the data table;
and segmenting the data tables according to the number of the data sub-tables to obtain a plurality of data sub-tables corresponding to each data table.
In one embodiment, the computer program is further configured to, when executed by the processor, perform the following steps:
determining the number of databases according to the number of the data sub-tables;
and storing the plurality of data sub-tables in a plurality of determined databases.
In one embodiment, the segmenting the corresponding data table according to the number of the data pieces to obtain a plurality of data sub-tables corresponding to each data table includes:
acquiring a field number identifier corresponding to each piece of data in each data table;
performing a modulo operation on the field number identifier to obtain a modulo result;
and segmenting the corresponding data tables according to the modulo result and the number of data pieces to obtain a plurality of data sub-tables corresponding to each data table.
In one embodiment, the computer program is further configured to, when executed by the processor, perform the following steps:
and sending the corresponding relation between the service type and the data table, the corresponding relation between the data table and the corresponding data sub-table and the data identifier corresponding to each piece of data in the data sub-table to a data table management center, so that the data table management center searches the data corresponding to the data identifier from the corresponding data sub-table according to the service type and the data identifier in the received data processing request.
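The three correspondences sent to the data table management center can be pictured as a small registry built from the segmentation result. The function name, the `table_...` and `..._sub_N` naming scheme, and the `data_id` key are hypothetical; the sketch shows only the shape of the information, not how it is transmitted.

```python
from typing import Dict, List, Tuple


def build_registry(segmented: Dict[str, List[List[dict]]]) -> dict:
    """Build the mappings the management center needs to route a request.

    `segmented` maps each service type to its list of data sub-tables,
    where every sub-table is a list of records carrying a "data_id".
    """
    service_to_table: Dict[str, str] = {}
    table_to_sub_tables: Dict[str, List[str]] = {}
    id_to_sub_table: Dict[Tuple[str, str], str] = {}
    for service_type, sub_tables in segmented.items():
        table = f"table_{service_type}"
        service_to_table[service_type] = table
        table_to_sub_tables[table] = []
        for index, rows in enumerate(sub_tables):
            sub_table = f"{table}_sub_{index}"
            table_to_sub_tables[table].append(sub_table)
            for row in rows:
                # Data identifier -> sub-table, scoped by the owning data table.
                id_to_sub_table[(table, row["data_id"])] = sub_table
    return {
        "service_to_table": service_to_table,
        "table_to_sub_tables": table_to_sub_tables,
        "id_to_sub_table": id_to_sub_table,
    }
```

Given such a registry, the management center can resolve a request's service type to a data table and its data identifier to a single data sub-table before searching.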
A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
acquiring a data set and a service type of the data in the data set;
segmenting the data of the data set according to the service type to obtain a plurality of data tables;
acquiring the number of data pieces in each data table;
and segmenting the corresponding data tables according to the number of data pieces to obtain a plurality of data sub-tables corresponding to each data table.
When the computer program stored in the computer-readable storage medium is executed, a data set and the service type of the data in the data set are first acquired, and the data of the data set is segmented according to the service type to obtain a plurality of data tables. The number of data pieces in each data table is then acquired, and the corresponding data table is segmented according to that number to obtain a plurality of data sub-tables corresponding to each data table. The original data set is thus stored in a plurality of data sub-tables rather than in a single data table, so a subsequent query can be run directly against one data sub-table instead of one large data table holding the whole data set. Because less data has to be searched, query efficiency and response efficiency can be improved. In addition, the data is first segmented by service type and each resulting table is then segmented again, so the final data sub-tables are smaller than those obtained by a single segmentation, which further improves query efficiency and response efficiency.
In one embodiment, the computer program is further configured to, when executed by the processor, perform the following steps:
determining the number of databases according to the number of the data tables;
and storing the plurality of data tables in a plurality of determined databases.
In one embodiment, the segmenting the corresponding data table according to the number of the data pieces to obtain a plurality of data sub-tables corresponding to each data table includes:
acquiring the occupied space corresponding to each piece of data in each data table;
determining the number of data sub-tables corresponding to the data table according to the number of the data in the data table and the occupied space corresponding to each piece of data in the data table;
and segmenting the data tables according to the number of the data sub-tables to obtain a plurality of data sub-tables corresponding to each data table.
In one embodiment, the computer program is further configured to, when executed by the processor, perform the following steps:
determining the number of databases according to the number of the data sub-tables;
and storing the plurality of data sub-tables in a plurality of determined databases.
In one embodiment, the segmenting the corresponding data table according to the number of the data pieces to obtain a plurality of data sub-tables corresponding to each data table includes:
acquiring a field number identifier corresponding to each piece of data in each data table;
performing a modulo operation on the field number identifier to obtain a modulo result;
and segmenting the corresponding data tables according to the modulo result and the number of data pieces to obtain a plurality of data sub-tables corresponding to each data table.
In one embodiment, the computer program is further configured to, when executed by the processor, perform the following steps:
and sending the corresponding relation between the service type and the data table, the corresponding relation between the data table and the corresponding data sub-table and the data identifier corresponding to each piece of data in the data sub-table to a data table management center, so that the data table management center searches the data corresponding to the data identifier from the corresponding data sub-table according to the service type and the data identifier in the received data processing request.
It should be noted that, the embodiment of the method for splitting data, the embodiment of the apparatus for splitting data, the embodiment of the computer device, and the embodiment of the computer-readable storage medium belong to the same inventive concept, and the contents in the embodiment of the method, the embodiment of the apparatus, the embodiment of the computer device, and the embodiment of the computer-readable storage medium are mutually applicable.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a non-volatile computer-readable storage medium and which, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
The technical features of the above embodiments can be arbitrarily combined. For brevity, not all possible combinations of these technical features are described; however, any combination should be considered within the scope of this specification as long as the combined features do not contradict one another.
The above-mentioned embodiments express only several embodiments of the present application, and although their description is specific and detailed, they should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A method for segmenting data, the method comprising:
acquiring a data set and a service type of the data in the data set;
segmenting the data of the data set according to the service type to obtain a plurality of data tables;
acquiring the number of data pieces in each data table;
and segmenting the corresponding data tables according to the number of data pieces to obtain a plurality of data sub-tables corresponding to each data table.
2. The method of claim 1, wherein after the segmenting the data of the dataset according to the service type to obtain a plurality of data tables, the method further comprises:
determining the number of databases according to the number of the data tables;
and storing the plurality of data tables in a plurality of determined databases.
3. The method of claim 1, wherein the segmenting the corresponding data table according to the number of data pieces to obtain a plurality of data sub-tables corresponding to each data table comprises:
acquiring the occupied space corresponding to each piece of data in each data table;
determining the number of data sub-tables corresponding to the data table according to the number of the data in the data table and the occupied space corresponding to each piece of data in the data table;
and segmenting the data tables according to the number of the data sub-tables to obtain a plurality of data sub-tables corresponding to each data table.
4. The method of claim 3, wherein after the segmenting the data tables according to the number of the data sub-tables to obtain a plurality of data sub-tables corresponding to each data table, the method further comprises:
determining the number of databases according to the number of the data sub-tables;
and storing the plurality of data sub-tables in a plurality of determined databases.
5. The method of claim 1, wherein the segmenting the corresponding data table according to the number of data pieces to obtain a plurality of data sub-tables corresponding to each data table comprises:
acquiring a field number identifier corresponding to each piece of data in each data table;
performing a modulo operation on the field number identifier to obtain a modulo result;
and segmenting the corresponding data tables according to the modulo result and the number of data pieces to obtain a plurality of data sub-tables corresponding to each data table.
6. The method according to any one of claims 1 to 5, wherein after the segmenting the corresponding data table according to the number of data pieces to obtain a plurality of data sub-tables corresponding to each data table, the method further comprises:
and sending the corresponding relation between the service type and the data table, the corresponding relation between the data table and the corresponding data sub-table and the data identifier corresponding to each piece of data in the data sub-table to a data table management center, so that the data table management center searches the data corresponding to the data identifier from the corresponding data sub-table according to the service type and the data identifier in the received data processing request.
7. An apparatus for segmenting data, the apparatus comprising:
the first acquisition module is used for acquiring a data set and the service type of the data in the data set;
the first segmentation module is used for segmenting the data of the data set according to the service type to obtain a plurality of data tables;
the second acquisition module is used for acquiring the number of data pieces in each data table;
and the second segmentation module is used for segmenting the corresponding data tables according to the number of data pieces to obtain a plurality of data sub-tables corresponding to each data table.
8. The apparatus of claim 7, further comprising:
the first number determining module is used for determining the number of the databases according to the number of the data tables;
the first database storage module is used for storing the data tables in the determined databases.
9. A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 6.
10. A computer-readable storage medium, storing a computer program which, when executed by a processor, causes the processor to carry out the steps of the method according to any one of claims 1 to 6.
CN201811559134.1A 2018-12-19 2018-12-19 Data segmentation method and device, computer equipment and storage medium Active CN111339133B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811559134.1A CN111339133B (en) 2018-12-19 2018-12-19 Data segmentation method and device, computer equipment and storage medium
PCT/CN2018/122380 WO2020124491A1 (en) 2018-12-19 2018-12-20 Method and device for segmenting data, computer device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811559134.1A CN111339133B (en) 2018-12-19 2018-12-19 Data segmentation method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111339133A true CN111339133A (en) 2020-06-26
CN111339133B CN111339133B (en) 2022-08-05

Family

ID=71100999

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811559134.1A Active CN111339133B (en) 2018-12-19 2018-12-19 Data segmentation method and device, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN111339133B (en)
WO (1) WO2020124491A1 (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050108184A1 (en) * 2001-11-09 2005-05-19 Turbo Data Laboratories, Inc Data joining/displaying method
US20090106299A1 (en) * 2005-08-15 2009-04-23 Turbo Data Laboratories, Inc. Shared-memory multiprocessor system and information processing method
CN102004804A (en) * 2010-12-31 2011-04-06 西北大学 Method for storing and inquiring range data
CN103605755A (en) * 2013-11-23 2014-02-26 华中科技大学 Hangul database, Hangul database construction method and Hangul database retrieval system
CN105488231A (en) * 2016-01-22 2016-04-13 杭州电子科技大学 Self-adaption table dimension division based big data processing method
US20160232159A1 (en) * 2015-02-09 2016-08-11 Ca, Inc. System and method of reducing data in a storage system
CN106294740A (en) * 2016-08-10 2017-01-04 北京创锐文化传媒有限公司 Data processing method, device and server
CN108090225A (en) * 2018-01-05 2018-05-29 腾讯科技(深圳)有限公司 Operation method, device, system and the computer readable storage medium of database instance
US10108669B1 (en) * 2014-03-21 2018-10-23 Xactly Corporation Partitioning data stores using tenant specific partitioning strategies

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9600553B1 (en) * 2014-05-31 2017-03-21 Veritas Technologies Llc Distributed replication in cluster environments

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陈世敏: "大数据分析与高速数据更新", 《计算机研究与发展》 *

Also Published As

Publication number Publication date
CN111339133B (en) 2022-08-05
WO2020124491A1 (en) 2020-06-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 518000 16th and 22nd Floors, C1 Building, Nanshan Zhiyuan, 1001 Xueyuan Avenue, Nanshan District, Shenzhen City, Guangdong Province

Patentee after: Shenzhen UBTECH Technology Co.,Ltd.

Address before: 518000 16th and 22nd Floors, C1 Building, Nanshan Zhiyuan, 1001 Xueyuan Avenue, Nanshan District, Shenzhen City, Guangdong Province

Patentee before: Shenzhen UBTECH Technology Co.,Ltd.

TR01 Transfer of patent right

Effective date of registration: 20231213

Address after: Room 601, 6th Floor, Building 13, No. 3 Jinghai Fifth Road, Beijing Economic and Technological Development Zone (Tongzhou), Tongzhou District, Beijing, 100176

Patentee after: Beijing Youbixuan Intelligent Robot Co.,Ltd.

Address before: 518000 16th and 22nd Floors, C1 Building, Nanshan Zhiyuan, 1001 Xueyuan Avenue, Nanshan District, Shenzhen City, Guangdong Province

Patentee before: Shenzhen UBTECH Technology Co.,Ltd.