CN113742343A - Data splitting method, device and storage medium based on large amount of service data scenes - Google Patents

Data splitting method, device and storage medium based on large amount of service data scenes Download PDF

Info

Publication number
CN113742343A
CN113742343A CN202111018512.7A CN202111018512A CN113742343A CN 113742343 A CN113742343 A CN 113742343A CN 202111018512 A CN202111018512 A CN 202111018512A CN 113742343 A CN113742343 A CN 113742343A
Authority
CN
China
Prior art keywords
user
database
data
sub
phone number
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111018512.7A
Other languages
Chinese (zh)
Inventor
侯亚君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Pudong Development Bank Co Ltd
Original Assignee
Shanghai Pudong Development Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Pudong Development Bank Co Ltd filed Critical Shanghai Pudong Development Bank Co Ltd
Priority to CN202111018512.7A priority Critical patent/CN113742343A/en
Publication of CN113742343A publication Critical patent/CN113742343A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2282Tablespace storage structures; Management thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data splitting method, a data splitting device and a storage medium based on a large number of service data scenes. A data splitting method based on a large number of service data scenes comprises the following steps: performing database and table dividing processing on a data table related to a user, wherein a fragment key used for the database and table dividing is determined according to the unique user identifier; and storing the plurality of data tables after the sub-database and sub-table processing into at least one database according to the sub-database and sub-table result. The invention discloses a data splitting method, a data splitting device and a storage medium based on a large number of service data scenes, which solve the problem of access bottleneck caused by overlarge pressure of a database when a large number of data services are stored.

Description

Data splitting method, device and storage medium based on large amount of service data scenes
Technical Field
The embodiment of the invention relates to a data processing technology, in particular to a data splitting method, a data splitting device and a storage medium based on a large number of service data scenes.
Background
The traditional data storage mode is to store data into a single data node in a centralized manner, namely single-table storage. And then when the data volume is increased to a certain amount, optimizing by adding indexes, rewriting SQL and the like. With the development of the internet and the rapid expansion of the user scale, the requirements on the system are higher and higher. When the data volume of the table reaches more than ten million levels and has a continuously increasing trend, and the concurrent access volume of the system reaches a certain scale, the average response time of the database is too long due to the increase, deletion, modification and check operations of the service.
Although the traditional optimization mode can improve the system performance to a certain extent and alleviate the problems caused by the data volume and the access volume, various bottleneck problems of a server CPU, a disk IO and a memory still exist. The maintenance cost of the index is high at this point, and data backup and restore times, costs and risks become uncontrollable as the amount of data increases.
Disclosure of Invention
The invention provides a data splitting method, a data splitting device and a storage medium based on a large number of service data scenes, and solves the problem of access bottleneck caused by overlarge pressure of a database when a large number of data services are stored.
In a first aspect, an embodiment of the present invention provides a data splitting method based on a large number of service data scenes, including:
performing database and table dividing processing on a data table related to a user, wherein a fragment key used for the database and table dividing is determined according to the unique user identifier;
and storing the plurality of data tables after the sub-database and sub-table processing into at least one database according to the sub-database and sub-table result.
In a second aspect, an embodiment of the present invention further provides a data splitting apparatus based on a large number of service data scenes, including:
the table dividing module is used for carrying out database and table dividing processing on the data table related to the user, wherein the fragment key used for the database and table dividing is determined according to the unique user identifier;
and the storage module is used for storing the plurality of data tables subjected to the sub-database and sub-table processing into at least one database according to the sub-database and sub-table result.
In a third aspect, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the data splitting method based on a large number of service data scenes as in the first aspect.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the data splitting method based on a large number of service data scenes according to the first aspect.
The data splitting method, the device and the storage medium based on a large number of service data scenes, provided by the embodiment of the invention, are used for performing database-based and table-based processing on the data tables related to the user, wherein the fragment keys used for the database-based and table-based processing are determined according to the unique user identification, and then a plurality of data tables processed by the database-based and table-based processing are stored in at least one database according to the database-based and table-based result, so that the access bottleneck caused by overlarge pressure of the database when a large number of data services are stored is solved.
Drawings
FIG. 1 illustrates a scenario for vertically sharding a user table and an order table into different databases according to business needs;
FIG. 2 illustrates a scheme for horizontally sharding user tables and order tables into different databases based on primary key parity values;
fig. 3 is a flowchart of a data splitting method based on a large number of service data scenes according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a data splitting apparatus based on a large number of service data scenes according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Summarizing several aspects affecting database performance 1) data volume. The single-library data size has good performance within 5000 ten thousand, and the performance becomes weaker along with the increase of the data size after the threshold value is exceeded. The data size of the single table is 500w-1000w, the performance is better, and the performance is reduced when the data size exceeds 1000 w. 2) A magnetic disk. Because the disk space of a single service is limited, if all requests access the same node under concurrent pressure, the disk IO is certainly affected greatly. 3) And connecting the databases. Database connections are very rare resources, and if data related to existing users, commodities and orders exist in one database, the database connections are likely to become bottlenecks when massive users operate simultaneously.
In order to solve the above problem, an embodiment of the present invention provides a data splitting method based on a large number of service data scenes, based on a library/table splitting function under a data slicing function point in Sharding-JDBC. Data fragmentation refers to the decentralized storage of data stored in a single database into multiple databases or tables according to a certain dimension to achieve the effect of improving performance bottleneck and usability. The effective means of data fragmentation is to perform database partitioning and table partitioning on the relational database. The sub-base and the sub-table can effectively avoid the query bottleneck caused by the data volume exceeding the bearable threshold. In addition, the sub-database can be used for effectively dispersing the access amount of single points of the database; while not alleviating database stress, the sub-tables provide the potential to try to translate distributed transactions into local transactions, which often complicate matters once cross-library update operations are involved. The embodiment of the invention assumes that the rules of database division and table division are unified, namely, the condition of database division transaction is avoided.
The method has the advantages that the data size of each table is kept below a threshold value by splitting data through the sub-base and the sub-table, and the method is effective for dealing with high concurrency and mass data systems by dredging the flow and dealing with high access capacity. The splitting mode of the data shards is divided into a vertical shard and a horizontal shard.
The method is called vertical fragmentation, also called longitudinal fragmentation, according to the mode of service splitting, and the core concept of the method is special for a special database. Before splitting, a database is composed of a plurality of data tables, each table corresponding to a different service. After splitting, the tables are classified according to the services and distributed into different databases, so that the pressure is dispersed to the different databases. Fig. 1 shows a scheme for vertically fragmenting user tables and order tables into different databases according to service needs. Wherein the user table (t-user) and the order table (t _ order) originally stored in the database a are stored in the database a and the database B, respectively.
Vertical slicing often requires adjustments to architecture and design. Generally speaking, it is not time to cope with the rapid change of the internet service demand; moreover, it does not really solve the single-point bottleneck. The vertical splitting can alleviate the problems caused by data volume and access volume, but cannot radically cure the problems. If the amount of data in the table still exceeds the threshold that a single node can carry after vertical splitting, then horizontal splitting is required for further processing.
Horizontal slices are also known as lateral splits. With respect to vertical sharding, it no longer sorts data according to business logic, but rather disperses the data into multiple libraries or tables according to some rule through some field (or some few fields), each shard containing only a portion of the data. For example: the records for even primary keys are placed in the 0 bank (or table) and the records for odd primary keys are placed in the 1 bank (or table) according to the primary key shard, as shown in fig. 2, and fig. 2 illustrates a scheme for horizontally sharding user tables and order tables into different databases according to the primary key parity values. The user table (t-user) originally stored in the database A is respectively stored in the database A and the database B according to the parity value of the user identification (id). The horizontal slicing breaks through the bottleneck of single machine data volume processing theoretically, is relatively free in expansion, and is a standard solution for sub-base and sub-table.
Although data fragmentation solves the problems of performance, availability, single point backup recovery and the like, a distributed architecture also introduces new problems while gaining benefits. In the face of such scattered data after partitioning, it is one of the important challenges that application development engineers and database administrators become abnormally heavy to operate on the database. They need to know from which particular database sub-table the data needs to be retrieved. Another challenge is that SQL that can run correctly in a single-node database does not necessarily run correctly in a database after fragmentation. For example, the sub-tables result in modification of table names, or incorrect handling of operations such as paging, sorting, aggregating packets, and the like. Cross-library transactions are also a troublesome task to be faced by distributed database clusters. By reasonably adopting the sub-tables, local transactions can be used as much as possible under the condition of reducing the data volume of the single table, and the trouble caused by distributed transactions can be effectively avoided by using different tables in the same library. Therefore, the embodiment of the invention only uses the functions of database division and table division under the data fragmentation function point in Sharding-JDBC. Meanwhile, the influence of the Sharding-JDBC on the database partitioning is made transparent as much as possible, so that a user can use a database cluster after horizontal partitioning as much as possible like using one database, and the method is a main design target of a Sharding sphere data partitioning module.
Fig. 3 is a flowchart of a data splitting method based on a large number of service data scenes according to an embodiment of the present invention, and as shown in fig. 3, the data splitting method based on a large number of service data scenes according to the embodiment includes:
step S310, performing database and table dividing processing on the data table related to the user, wherein the fragment key used for the database and table dividing is determined according to the unique user identifier.
The data splitting method based on a large number of service data scenes provided by the embodiment is applied to a data storage server, the data storage server is connected with one or more databases, and the data is split and then stored in the one or more databases. The data splitting method provided by the embodiment of the invention is based on a database and table splitting idea in a Sharding-JDBC architecture, but is not limited to the Sharding-JDBC architecture.
In this embodiment, the sharding key determined by the user unique identifier is used to perform database-sharing and table-sharing processing on the data table related to the user. The user unique identifier is any identifier capable of uniquely determining user identity information, such as a user number allocated to the user during user registration, a mobile phone number used during user registration, a user certificate number and the like. Each data stored in the database is data associated with a user and includes at least one user unique identifier. According to a preset database-dividing rule, a database related to a user can be divided into a plurality of data tables, and a corresponding database is distributed for each data table.
Wherein the data table associated with the user may be a multidimensional table associated with the user and an order-related table, including, for example: a user table, an order information record table, a user binding table and the like, and a plurality of dimension tables related to the user. The data table is divided into a plurality of tables according to a certain rule and then stored in a single or a plurality of databases, the sub-tables are uniformly managed by a sub-database sub-table management server (in the embodiment, a shading-JDBC architecture is taken as an example), and when a user or a system administrator performs adding, deleting, changing and checking operations, a route finds the corresponding sub-database sub-table according to a fragment key and a fragment rule to perform the operation. The system does not operate all data any more, and the access pressure of the single table and the occupation of the disk IO are reduced. Database maintenance personnel can also migrate data or delete redundancy aiming at only one sub-table, so that the maintenance risk is reduced, and the operation and management efficiency of the system on the data is effectively improved.
Step S320, storing the plurality of data tables after the sub-database and sub-table processing in at least one database according to the sub-database and sub-table result.
After the database-dividing and table-dividing processing is finished on the data packet related to the user, a plurality of data tables obtained after the database-dividing and table-dividing processing can be stored in at least one database according to the database-dividing and table-dividing result. The number of the databases can be determined according to the data scale of the data tables and the storage capacity of the databases, and the data tables are stored in one or more databases respectively.
When there are at least two databases, the multiple databases may be stored in the at least two databases on average to equalize the access pressure of the multiple databases. That is, the database-dividing and table-dividing rule can perform corresponding database-dividing and table-dividing processing according to the number of the databases, and evenly distribute the fragmented multiple data tables in the multiple databases.
In addition, when a plurality of databases are provided, the plurality of databases after being divided into the database and the table are averagely stored in the plurality of databases, and in order to further balance the access pressure of each database and each table, a shard key with a more even distribution value can be selected. The space size of each table after being partitioned is about equal, and the load of the disk IO is more balanced. In the embodiment of the invention, the fragment key can be determined according to the user number or the mobile phone number of the user. Further, the fragment key can be determined according to the last two digits of the user number or the user mobile phone number, the user number is the same as the last two digits of the user mobile phone number, the last digit of the user number or the user mobile phone number is used for sorting, and the last digit of the user number or the user mobile phone number is used for sorting. The last two digits of the mobile phone number of the user have randomness, and the last two digits of the mobile phone number are used as the fragment keys, so that the data can be uniformly distributed. Specifically, the last two digits of the mobile phone number or the user number are used as fragment keys, the first digit of each fragment key is used for dividing libraries, the sequence of the libraries is from 0-9 libraries to 10 libraries in total, the second digit of each fragment key is used for dividing tables, the sequence of the tables is from 0-9 tables, one library comprises 10 data tables, and 100 data tables in total. And preferentially taking the user number as a sub-table key, and if the user number does not exist, taking the mobile phone number as a sub-table key. Thus, 10 databases each storing 10 data tables are required.
According to the data splitting method based on a large number of service data scenes, the data tables related to the user are subjected to database-splitting table-splitting processing, the fragment keys used for the database-splitting table-splitting processing are determined according to the unique user identification, then the multiple data tables subjected to the database-splitting table-splitting processing are stored in at least one database according to the database-splitting table-splitting result, and the access bottleneck caused by overlarge pressure of the database when a large number of data services are stored is solved.
The data splitting method based on a large number of service data scenes provided by the invention is further described in detail below by taking the database-partitioning and table-partitioning processing using the Sharding-JDBC architecture as a specific embodiment.
The core part of the Sharding-JDBC consists of six parts, namely SQL analysis, executor optimization, SQL routing, SQL rewriting, SQL execution and result merging, wherein the SQL analysis is divided into lexical analysis and syntactic analysis. SQL is firstly split into non-separable lexical units, namely, Token, through a lexical parser. SQL is then converted to an abstract syntax tree using a syntax parser. And traversing the abstract syntax tree to extract the analysis context for the fragment. And optimizing merging and optimizing slicing conditions such as OR and the like by an executor. And the SQL routing engine is matched with the fragmentation strategy configured by the user according to the analysis context and generates a broadcast route. And next, rewriting SQL written facing the logic library and the logic table into SQL which can be normally executed in a real database. And finally, asynchronously executing through a multi-thread executor, merging a plurality of execution result sets, and outputting the merged execution result sets to a user through a unified JDBC interface.
And according to the estimated traffic of the current system, performing sub-table design on the user correlation table and the order information table. When the sharing-JDBC is integrated for performing the database and table division, the fragment table needs to be configured with fragment keys, data nodes and the like, and a specific fragment algorithm is implemented in java code, and a user table (ft _ container _ user) is taken as an example to perform the sharing-JDBC database and table division integration description. As shown in table 1.
TABLE 1ft _ Cashier _ user section field
Name of field Type (B) Note
userid varchar User number
idno varchar Certificate number
name varchar Name (I)
mobileno varchar Mobile phone number
datetime timestamp Registration time
ip varchar Login ip
regstatus char(1) Registration status
Channel char(2) Channel number
Logical tables are a general term for the same logical and data structure tables of a horizontally split database (table). Example (c): the user data is divided into 10 tables according to the user number mantissas, namely ft _ case _ user _0 to ft _ case _ user _9, and the logical table names of the tables are ft _ case _ user. Real tables are physical tables that actually exist in a sharded database. Namely ft _ case _ user _0 to ft _ case _ user _9 in the previous example. A data node is the smallest unit of data fragmentation. Consists of data source names and data tables, example: ds _0.ft _ case _ user _0.
The sharding key, i.e. the database field for sharding, is the key field for horizontally splitting the database (table) and is also the key place of the database-splitting scheme. The more evenly the values of the sharded keys are distributed, the more equally the space size of each table is, and the more balanced the disk IO load is. The selection of the slicing key also determines the selection of the slicing scheme.
And slicing the data through a slicing algorithm, and supporting the slicing through BeTWEEN and IN. The fragmentation algorithm needs to be realized by an application developer, and the achievable flexibility is very high. Currently 4 fragmentation algorithms are provided. Because the fragmentation algorithm is closely related to the service implementation, a built-in fragmentation algorithm is not provided, but various scenes are extracted through a fragmentation strategy, higher-level abstraction is provided, and an interface is provided to enable an application developer to realize the fragmentation algorithm by himself. 1) And (4) a precise slicing algorithm. Corresponding to PreciseShardingAlgorithm, for handling a scenario of slicing with IN using a single key as a slicing key. It is used in combination with StandardShardingStrategy. 2) And (4) a range slicing algorithm. Corresponding to RangeShardingAlgorithm, for handling scenarios for sharding by betweeen AND using a single key as the sharding key. It is used in combination with StandardShardingStrategy. 3) And (4) a composite slicing algorithm. Corresponding to complexkeyshardingalgorithm, the method is used for processing a scene that uses multiple keys as slicing keys for slicing, the logic of the multiple slicing keys is complex, and an application developer is required to process the complexity of the multiple slicing keys. It is required to be used with complexsurdingstrand. 4) The Hint slicing strategy. Corresponding to hintshardingstrand. Policies that are sharded by way of Hint rather than SQL parsing. In the embodiment of the invention, a composite flat algorithm is adopted.
In a data packet related to a user, data related to the user is generally related to a mobile phone number and the user number, so that the user number of the user is kept consistent with the last two digits of the mobile phone number, the last two digits of the mobile phone number and the user number are used as fragment keys, the first digit of each fragment key is used for dividing a library, the sequence of the libraries is from 0-9 libraries to 10 libraries, the second digit of each fragment key is used for dividing a table, the sequence of the tables is from 0-9 tables, one library comprises 10 user tables, and the total number of the user tables is 100. The user number is preferentially taken as a table dividing key, if the user number does not exist, the mobile phone number is taken as a table dividing key, and the last two digits of the mobile phone number have randomness, so that the data can be uniformly distributed.
The specific algorithm is implemented in the service layer. Only the path of the implementation class is configured in the configuration file, and the configuration is as follows:
1. fragmented column field configuration
<sharding:complex-strategy id="databaseComplexShardingStrategy"sharding-columns="userid,mobileno,uniquekey"algorithm-ref="ComplexDatabaseShardingAlgorithm"/>
2. Composite fragmentation algorithm path configuration
<bean id="ComplexDatabaseShardingAlgorithm"class="com.spdb.shardingsphere.DatabaseComplexSharding"/>
3. Data node configuration
10 libraries are built in a database server, 10 user tables are newly built in each library, and 100 user tables are shared. Then, the data nodes are configured by using the row expressions and used for mapping the data table routing nodes, the final result of the whole expression is subjected to Cartesian combination according to the result of each sub-expression, then routing processing is carried out by a fragmentation algorithm, and the configuration is as follows:
<sharding:table-rule logic-table="ft_cashier_user"actual-data-nodes="epaydev$->{0..9}.ft_cashier_user_$->{0..99}"database-strategy-ref="databaseComplexShardingStrategy"table-strategy-ref="tableShardingStrategy"/>
in order to verify the data splitting scheme provided by the embodiment, a test environment is set up for verification. Because the existing PE micro-service framework uses spring boot + mybatis + dry, these tools are also used for the test environment setup of the embodiment, and the tool list is as follows:
1.Jdk1.8
2.STS
3.sharding-jdbc 3.0.0
4.springboot 1.5.7
5.mysql
6.mybatis
7.druid
for the user table, 10 epaydev banks are established, each bank has 10 user tables, and the final node condition is as follows: the logic table structure of epaydev _0.ft _ case _ user _ 00-epaydev _0.ft _ case _ user _00, … …, epaydev _9.ft _ case _ user _ 90-epaydev _9.ft _ case _ user _99user is as follows:
Figure BDA0003240861130000121
the user table has three fields: userid, idno, name, mobileno, datetime, ip, regstatus, Channel. Where the userid field is the unique index.
The starting dependence of sharding-JDBC is introduced, and the description of the starting dependence of other sharding-JDBC such as MySQL and mybatis is omitted:
Figure BDA0003240861130000122
rule configuration based on spring boot
Data source name, multiple data sources separated by commas
sharding.jdbc.datasource.names=epaydev_0,epaydev_1,epaydev_2,epaydev_3,epaydev_4,epaydev_5,epaydev_6,epaydev_7,epaydev_8,epaydev_9。
Tables not configured with fragmentation rules will be located by default data source-applicable to single-library list tables that do not need to be configured with fragmentation rules
sharding.jdbc.config.sharding.defaultDataSourceName=epaydev_0
Data node configuration
sharding.jdbc.config.sharding.tables.user.actualDataNodes=epaydev_$->{0..9}.ft_cashier_user_0$->{0..9}
Data source detailed configuration (0 to 9 library)
The # # # # # # # # # # # # # # # # # # # # # # # # # # # # ## # ### # #### # ########### # is configured in the library
sharding.jdbc.datasource.epaydev_0.type=com.alibaba.druid.pool.DruidDataSource
sharding.jdbc.datasource.epaydev_0.driver-class-name=com.mysql.jdbc.Driversharding.jdbc.datasource.epaydev_0.url=jdbc:mysql://localhost:3306/epaydev_0characterEncoding=UTF-8&useSSL=true
……
The # # # # # # # # # # # # # # # # # # # # # # # # # # ## # ### # ###### # ########### # is configured in the library
sharding.jdbc.datasource.epaydev_9.type=com.alibaba.druid.pool.DruidDataSource
sharding.jdbc.datasource.epaydev_9.driver-class-name=com.mysql.jdbc.Driversharding.jdbc.datasource.epaydev_9.url=jdbc:mysql://localhost:3306/epaydev_0characterEncoding=UTF-8&useSSL=true
Subdivision logic
The DoSharding method for realizing ComplexKeysShardingAlgorithm interface can
Figure BDA0003240861130000141
Figure BDA0003240861130000151
Figure BDA0003240861130000161
Sub-table logic
The DoSharding method for realizing ComplexKeysShardingAlgorithm interface can
Figure BDA0003240861130000162
Figure BDA0003240861130000171
Figure BDA0003240861130000181
Figure BDA0003240861130000191
Directly taking the last two bits of the userid field, such as: and when the userid is 000013735900, taking the last two bits as 00, allocating the userid to a 0 table in the 0 library: ft _ case _ user _00 table, so the record with userid of 000013735900 is eventually routed to epaydev _0.ft _ case _ user _00 table.
Figure BDA0003240861130000192
In the test environment, the total amount of data of the user table is 3000 ten thousand, and the client side performs read-write operation on the database through the interface. The service interface is realized by java code, and a Jmeter performance test tool is used for testing the functional interface. Under the condition of not adding indexes or performing other optimization, the database and the table are obviously improved relative to each performance of a single table. On an e-commerce payment platform, the database and table division is added with indexes to achieve a better optimization effect. In addition, the maintenance cost of indexes after the database and the table are divided is more convenient than that of a single table. Ensuring sufficient server resources, setting the same concurrency amount and the same test time, comparing the operation performance of the list table and the sub-database sub-tables, and comparing the service scenes, wherein 1) data is stored in a database; 2) updating according to the user number; 3) and inquiring according to the user number. The SQL performance after the database division and the table division is well improved, and more requests can be processed under the same time and concurrent conditions. When the sharding-jdbc is used as the intermediate component of the database and the table, only jar packets need to be introduced, the database and the table can be divided without intruding service codes, meanwhile, the database and the table dividing rule conforming to the actual service scene can be freely realized, for example, a plurality of database dividing fields are defined, the priority is set, the database is divided according to the field priority, and in short, when the database and the table dividing rule is realized by self, the use effect of conforming to the actual service scene can be achieved.
The data splitting method based on a large number of service data scenes provided by the embodiment of the invention designs a storage scheme based on a Sharding-JDBC branch table according to service requirements and data storage characteristics, and improves the manageability and usability of a data table related to a user. The user number and the mobile phone number of the user are combined, the latter two digits are used as a strategy of database and table division, data of each table in each database can be averaged better, management and operation efficiency of the data tables related to the user is improved on the premise of ensuring user experience, and operation and maintenance cost of the database is reduced.
Fig. 4 is a schematic structural diagram of a data splitting device based on a large number of service data scenes according to an embodiment of the present invention, and as shown in fig. 4, the data splitting device based on a large number of service data scenes according to this embodiment includes:
the table dividing module 41 is used for performing database and table dividing processing on the data table related to the user, wherein the fragment key used for the database and table dividing is determined according to the unique user identifier;
and the storage module 42 is configured to store the multiple data tables after the sub-database and sub-table processing in at least one database according to the sub-database and sub-table result.
The data splitting device based on a large number of service data scenes provided by the embodiment of the invention can execute the data splitting method based on a large number of service data scenes provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
Further, on the basis of the embodiment shown in fig. 4, the fragment key is determined according to the user number or the mobile phone number of the user.
Further, on the basis of the embodiment shown in fig. 4, the fragment key is determined according to the last two digits of the user number or the user mobile phone number, and the user number is the same as the last two digits of the user mobile phone number, where the last digit of the user number or the user mobile phone number is used for sorting, and the last digit of the user number or the user mobile phone number is used for sorting.
Further, on the basis of the embodiment shown in fig. 4, after the databases are sorted and tabulated, when at least one database is greater than or equal to two databases, the number of data tables stored in each database is the same.
Further, on the basis of the embodiment shown in fig. 4, the data table includes: user table, order information record table, user card binding table.
Fig. 5 is a schematic structural diagram of a computer apparatus according to an embodiment of the present invention, as shown in fig. 5, the computer apparatus includes a processor 51, a memory 52, an input device 53, and an output device 54; the number of the processors 51 in the computer device may be one or more, and one processor 51 is taken as an example in fig. 5; the processor 51, the memory 52, the input device 53 and the output device 54 in the computer apparatus may be connected by a bus or other means, and the connection by the bus is exemplified in fig. 5.
The memory 52 is a computer-readable storage medium, and can be used for storing software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the data splitting method based on a large number of service data scenarios in the embodiment of the present invention (for example, the sub-table module 41, the storage module 42 in the data splitting apparatus based on a large number of service data scenarios). The processor 51 executes various functional applications and data processing of the computer device by running software programs, instructions and modules stored in the memory 52, that is, implements the data splitting method based on a large number of service data scenarios.
The memory 52 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the memory 52 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the memory 52 may further include memory located remotely from the processor 51, which may be connected to a computer device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 53 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function controls of the computer apparatus. The output device 54 may include a display device such as a display screen.
Embodiments of the present invention further provide a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform a data splitting method based on a large number of service data scenarios, where the method includes:
performing database and table dividing processing on a data table related to a user, wherein a fragment key used for the database and table dividing is determined according to the unique user identifier;
and storing the plurality of data tables after the sub-database and sub-table processing into at least one database according to the sub-database and sub-table result.
Of course, the storage medium provided by the embodiment of the present invention includes computer-executable instructions, where the computer-executable instructions are not limited to the method operations described above, and may also perform related operations in the data splitting method based on a large number of service data scenarios provided by any embodiment of the present invention.
From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
It should be noted that, in the embodiment of the data splitting apparatus based on a large number of service data scenes, the included units and modules are only divided according to functional logic, but are not limited to the above division, as long as corresponding functions can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (12)

1. A data splitting method based on a large number of service data scenes is characterized by comprising the following steps:
performing database and table dividing processing on a data table related to a user, wherein a fragment key used for the database and table dividing is determined according to the unique user identifier;
and storing the plurality of data tables after the sub-database and sub-table processing into at least one database according to the sub-database and sub-table result.
2. The method of claim 1, wherein the shard key is determined according to a user number or a user phone number.
3. The method according to claim 2, wherein the fragment key is determined according to the last two digits of a user number or a user mobile phone number, the user number being the same as the last two digits of the user mobile phone number, wherein the last digit of the user number or the user mobile phone number is used for sorting, and the last digit of the user number or the user mobile phone number is used for sorting.
4. The method according to any one of claims 1 to 3, wherein after the database partitioning, when the at least one database is greater than or equal to two, the number of data tables stored in each database is the same.
5. The method according to any one of claims 1 to 3, wherein the data table comprises: user table, order information record table, user card binding table.
6. A data splitting device based on a large number of service data scenes is characterized by comprising:
the table dividing module is used for carrying out database and table dividing processing on the data table related to the user, wherein the fragment key used for the database and table dividing is determined according to the unique user identifier;
and the storage module is used for storing the plurality of data tables subjected to the sub-database and sub-table processing into at least one database according to the sub-database and sub-table result.
7. The apparatus of claim 6, wherein the fragment key is determined according to a user number or a user phone number.
8. The apparatus of claim 7, wherein the fragment key is determined according to a last two digits of a user number or a user phone number, the user number being the same as the last two digits of the user phone number, wherein the last digit of the user number or the user phone number is used for sorting, and the last digit of the user number or the user phone number is used for sorting.
9. The device according to any one of claims 6 to 8, wherein after the database division, when the at least one database is greater than or equal to two, the number of the data tables stored in each database is the same.
10. The apparatus of any one of claims 6 to 8, wherein the data table comprises: user table, order information record table, user card binding table.
11. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the data splitting method based on a plurality of service data scenes according to any one of claims 1 to 5 when executing the program.
12. A computer-readable storage medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the data splitting method based on a plurality of service data scenarios according to any one of claims 1 to 5.
CN202111018512.7A 2021-09-01 2021-09-01 Data splitting method, device and storage medium based on large amount of service data scenes Pending CN113742343A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111018512.7A CN113742343A (en) 2021-09-01 2021-09-01 Data splitting method, device and storage medium based on large amount of service data scenes

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111018512.7A CN113742343A (en) 2021-09-01 2021-09-01 Data splitting method, device and storage medium based on large amount of service data scenes

Publications (1)

Publication Number Publication Date
CN113742343A true CN113742343A (en) 2021-12-03

Family

ID=78734659

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111018512.7A Pending CN113742343A (en) 2021-09-01 2021-09-01 Data splitting method, device and storage medium based on large amount of service data scenes

Country Status (1)

Country Link
CN (1) CN113742343A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114238333A (en) * 2021-12-17 2022-03-25 中国邮政储蓄银行股份有限公司 Data splitting method, device and equipment
CN114327951A (en) * 2021-12-30 2022-04-12 上海众人智能科技有限公司 Modularized data management system based on multi-semantic expression
CN114510481A (en) * 2022-02-15 2022-05-17 中国农业银行股份有限公司 User credit data information storage method, device and equipment
CN115168409A (en) * 2022-09-05 2022-10-11 金蝶软件(中国)有限公司 Data query method and device for database sub-tables and computer equipment
CN115982176A (en) * 2023-03-17 2023-04-18 苏州阿基米德网络科技有限公司 Database storage method and system based on Sharding-JDBC
CN116166660A (en) * 2022-12-08 2023-05-26 广州纬纶信息科技有限公司 Method and device for compressing Cartesian product data combination

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111611249A (en) * 2020-06-29 2020-09-01 平安银行股份有限公司 Data management method, device, equipment and storage medium
CN111737228A (en) * 2020-06-23 2020-10-02 平安医疗健康管理股份有限公司 Database and table dividing method and device
CN113094262A (en) * 2021-03-29 2021-07-09 四川新网银行股份有限公司 Method for testing production data based on database sub-base sub-table

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111737228A (en) * 2020-06-23 2020-10-02 平安医疗健康管理股份有限公司 Database and table dividing method and device
CN111611249A (en) * 2020-06-29 2020-09-01 平安银行股份有限公司 Data management method, device, equipment and storage medium
CN113094262A (en) * 2021-03-29 2021-07-09 四川新网银行股份有限公司 Method for testing production data based on database sub-base sub-table

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114238333A (en) * 2021-12-17 2022-03-25 中国邮政储蓄银行股份有限公司 Data splitting method, device and equipment
CN114327951A (en) * 2021-12-30 2022-04-12 上海众人智能科技有限公司 Modularized data management system based on multi-semantic expression
CN114510481A (en) * 2022-02-15 2022-05-17 中国农业银行股份有限公司 User credit data information storage method, device and equipment
CN115168409A (en) * 2022-09-05 2022-10-11 金蝶软件(中国)有限公司 Data query method and device for database sub-tables and computer equipment
CN116166660A (en) * 2022-12-08 2023-05-26 广州纬纶信息科技有限公司 Method and device for compressing Cartesian product data combination
CN116166660B (en) * 2022-12-08 2023-11-07 广州纬纶信息科技有限公司 Method and device for compressing Cartesian product data combination
CN115982176A (en) * 2023-03-17 2023-04-18 苏州阿基米德网络科技有限公司 Database storage method and system based on Sharding-JDBC

Similar Documents

Publication Publication Date Title
CN113742343A (en) Data splitting method, device and storage medium based on large amount of service data scenes
AU2013271538B2 (en) Data management and indexing across a distributed database
US8943103B2 (en) Improvements to query execution in a parallel elastic database management system
CN110147407B (en) Data processing method and device and database management server
CN107085570B (en) Data processing method, application server and router
US9208186B2 (en) Indexing technique to deal with data skew
CN112579606A (en) Workflow data processing method and device, computer equipment and storage medium
US8812645B2 (en) Query optimization in a parallel computer system with multiple networks
CN101916280A (en) Parallel computing system and method for carrying out load balance according to query contents
US20210390089A1 (en) Code dictionary generation based on non-blocking operations
CN111581234B (en) RAC multi-node database query method, device and system
US10831737B2 (en) Method and device for partitioning association table in distributed database
CN110399368B (en) Method for customizing data table, data operation method and device
CN111723148A (en) Data storage method and device, storage medium and electronic device
US11221890B2 (en) Systems and methods for dynamic partitioning in distributed environments
CN101916281B (en) Concurrent computational system and non-repetition counting method
US10162830B2 (en) Systems and methods for dynamic partitioning in distributed environments
WO2021027331A1 (en) Graph data-based full relationship calculation method and apparatus, device, and storage medium
CN109451069B (en) Network data file library storage and query method based on distributed storage
CN107644025B (en) Method and device for distributing WAL records of distributed database
US8930345B2 (en) Query optimization in a parallel computer system to reduce network traffic
KR20210066004A (en) Cosharding and Randomized Cosharding
US10657126B2 (en) Meta-join and meta-group-by indexes for big data
US11442792B2 (en) Systems and methods for dynamic partitioning in distributed environments
Li et al. A partition model and strategy based on the Stoer–Wagner algorithm for SaaS multi-tenant data

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination