CN110147407B - Data processing method and device and database management server - Google Patents

Info

Publication number
CN110147407B
Authority
CN
China
Prior art keywords
data table
field
association relationship
data
query
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710917504.3A
Other languages
Chinese (zh)
Other versions
CN110147407A (en)
Inventor
陈开济
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN201710917504.3A
Publication of CN110147407A
Application granted
Publication of CN110147407B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the invention disclose a data processing method, a data processing device, and a database management server. The method includes: obtaining at least one association relationship of a first data table; if a first conflicting association relationship exists in the at least one association relationship, generating at least one first candidate scheme for resolving it, where each first candidate scheme is based on a global secondary index; determining a first target candidate scheme from the at least one first candidate scheme, and adjusting the at least one association relationship with the first target candidate scheme to obtain the adjusted at least one association relationship; and partitioning the first data table and at least two second data tables, or the first data table alone, according to the adjusted at least one association relationship and a database partitioning rule. Conflicting association relationships of data tables are thus resolved automatically on the basis of global secondary indexes, so that the data tables can be partitioned and partitioning efficiency is improved.

Description

Data processing method and device and database management server
Technical Field
The invention relates to the field of computer technology, and in particular to a data processing method and device and a database management server.
Background
With the development of computing technology, more and more data must be stored and processed in database systems. Data management based on a single database system can no longer meet users' growing data service requirements, and distributed database systems for storing and processing massive amounts of data have therefore emerged.
A distributed database system is a logically unified database formed by connecting multiple physically dispersed database units (data units) over a computer network. To provide the same data storage and query functions as a single database, a distributed database system must partition the data tables in the database so that they can be distributed across the data units. Partitioning based on the association relationships between data tables is currently one of the most widely used partitioning schemes. However, data tables often have conflicting association relationships: a data table is self-associated, or a data table is associated with multiple data tables based on different fields, where those multiple data tables are parent tables of the data table. At present, such conflicts are handled by manually deleting association relationships between data tables, after which the data tables can be partitioned according to their remaining association relationships.
Disclosure of Invention
The invention provides a data processing method and device and a database management server, which can automatically resolve conflicting association relationships of data tables based on a global secondary index scheme, so as to partition the data tables and improve partitioning efficiency.
In a first aspect, an embodiment of the present invention provides a data processing method. In the method, at least one association relationship of a first data table is obtained. If a first conflicting association relationship exists in the at least one association relationship, at least one first candidate scheme for resolving it is generated, where each first candidate scheme is based on a global secondary index. A first target candidate scheme is determined from the at least one first candidate scheme; the at least one association relationship is adjusted with the first target candidate scheme to obtain the adjusted at least one association relationship; and the first data table and at least two second data tables, or the first data table alone, are partitioned according to the adjusted at least one association relationship and a database partitioning rule.
The at least one association relationship includes association relationships between the first data table and at least two second data tables, or a self-association relationship of the first data table. Each second data table is a parent table of the first data table, and each association relationship corresponds to one field.
In this technical solution, when a first conflicting association relationship exists in the at least one association relationship of the first data table, the database management server can generate at least one first candidate scheme for resolving it, determine a first target candidate scheme from among them, adjust the first conflicting association relationship with the first target candidate scheme to obtain the adjusted at least one association relationship, and partition the first data table, or the first data table and the at least two second data tables, according to the adjusted at least one association relationship and the database partitioning rule. The conflicting association relationship is thus resolved automatically, the data tables can be partitioned based on the adjusted association relationships, and the efficiency of partitioning the data tables is improved.
As an optional implementation, if the first data table is self-associated, it is determined that a first conflicting association relationship exists in the at least one association relationship; and if the first data table is associated with the at least two second data tables based on different fields, it is likewise determined that a first conflicting association relationship exists.
In this technical solution, if the first data table is self-associated, or is associated with the at least two second data tables based on different fields, the database management server can determine that a conflicting association relationship exists in the at least one association relationship.
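The two conflict conditions can be checked mechanically. The following Python sketch is illustrative only: the `Association` record, the table and field names, and the reading of "associated based on different fields" as "two or more distinct fields toward parent tables" are assumptions modelled on this description, not the patented implementation.

```python
from typing import NamedTuple


class Association(NamedTuple):
    """Hypothetical model: `child` references `parent` through column `field`."""
    child: str
    parent: str
    field: str


def has_conflict(table: str, associations: list[Association]) -> bool:
    """True if `table` is self-associated, or reaches its parent tables
    through two or more different fields (assumed conflict rule)."""
    own = [a for a in associations if a.child == table]
    if any(a.parent == table for a in own):
        return True  # self-association, e.g. employees.manager_id -> employees
    parent_fields = {a.field for a in own if a.parent != table}
    return len(parent_fields) >= 2


schema = [
    Association("orders", "customers", "customer_id"),
    Association("orders", "products", "product_id"),
    Association("employees", "employees", "manager_id"),
    Association("order_items", "orders", "order_id"),
]
for name in ("orders", "employees", "order_items"):
    print(name, has_conflict(name, schema))
```

Here `orders` conflicts because its two parents use different fields, `employees` conflicts because it references itself, and `order_items` has a single parent field and is conflict-free.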
As an optional implementation, if the first data table has a self-association relationship, a candidate scheme based on a global secondary index on the self-association field is generated; and if the first data table is associated with the at least two second data tables based on different fields, at least two candidate schemes based on global secondary indexes are generated, each on a different field.
In this technical solution, if the first data table has a self-association relationship, the database management server can generate a candidate scheme based on a global secondary index on the self-association field; and if the first data table is associated with the at least two second data tables based on different fields, it can generate at least two candidate schemes based on global secondary indexes. These candidate schemes can then be used to resolve the conflicting association relationship.
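A minimal sketch of candidate generation, assuming a candidate is fully described by the table, the field to index, and whether that field is a self-association field. The `Candidate` type and the `"self"`/`"parent"` labels are hypothetical.

```python
from typing import NamedTuple


class Association(NamedTuple):
    child: str
    parent: str
    field: str


class Candidate(NamedTuple):
    """Hypothetical: build a global secondary index (GSI) on `table.field`."""
    table: str
    field: str
    kind: str  # "self" for a self-association field, "parent" otherwise


def generate_candidates(table: str, associations: list[Association]) -> list[Candidate]:
    own = [a for a in associations if a.child == table]
    # One candidate per self-association field.
    candidates = [Candidate(table, f, "self")
                  for f in sorted({a.field for a in own if a.parent == table})]
    parent_fields = sorted({a.field for a in own if a.parent != table})
    if len(parent_fields) >= 2:
        # One GSI candidate per distinct field used to reach a parent table.
        candidates += [Candidate(table, f, "parent") for f in parent_fields]
    return candidates


schema = [
    Association("orders", "customers", "customer_id"),
    Association("orders", "products", "product_id"),
    Association("employees", "employees", "manager_id"),
]
print(generate_candidates("orders", schema))
print(generate_candidates("employees", schema))
```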
As an optional implementation, the first target candidate scheme is the candidate scheme based on a global secondary index on the self-association field. The database management server adjusts the at least one association relationship with the first target candidate scheme as follows: it removes the self-association relationship of the first data table, derives a first auxiliary table from the first data table, and establishes an association relationship between the first auxiliary table and the first data table based on the self-association field, thereby obtaining the adjusted at least one association relationship.
In this technical solution, the database management server removes the self-association relationship of the first data table and replaces it with an association, based on the self-association field, between the first data table and a first auxiliary table derived from it, so that the adjusted at least one association relationship no longer contains the self-association.
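The self-association rewrite can be sketched as an edit on the association list. This is an assumption-laden illustration: the auxiliary-table name `<table>_gsi_<field>` is hypothetical, and modelling the auxiliary table as the *child* of the first data table (its rows keyed by the self-association field, such as `manager_id`, so they can be co-located with the referenced row) is one plausible reading, not something the text specifies.

```python
from typing import NamedTuple


class Association(NamedTuple):
    child: str
    parent: str
    field: str


def resolve_self_association(table: str, field: str,
                             associations: list[Association]):
    """Drop `table`'s self-association on `field` and link a new auxiliary
    (GSI) table to `table` through the same field instead."""
    aux = f"{table}_gsi_{field}"  # hypothetical name for the first auxiliary table
    adjusted = [a for a in associations
                if not (a.child == table and a.parent == table and a.field == field)]
    adjusted.append(Association(aux, table, field))  # assumed direction: aux -> table
    return aux, adjusted


schema = [
    Association("employees", "employees", "manager_id"),
    Association("employees", "departments", "dept_id"),
]
aux, adjusted = resolve_self_association("employees", "manager_id", schema)
print(aux, adjusted)
```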
As an optional implementation, the first target candidate scheme is a candidate scheme based on a global secondary index on a first field, where the first field is any one of the fields corresponding to the at least one association relationship. The database management server adjusts the at least one association relationship with the first target candidate scheme as follows: according to the at least one association relationship, it determines at least one data table, among the at least two second data tables, that is associated with the first data table based on the first field; it removes the association relationship between the first data table and the determined at least one data table, and obtains at least one second auxiliary table from the first data table based on the first field, where the second auxiliary table includes at least one field of the first data table and that at least one field includes the first field; and it establishes an association relationship between the at least one second auxiliary table and the determined at least one data table based on the first field, thereby obtaining the adjusted at least one association relationship.
In this technical solution, the database management server can resolve the first conflicting association relationship with a candidate scheme based on a global secondary index. Because the adjusted at least one association relationship contains no conflicting association relationship, the first data table and the at least two second data tables can be partitioned based on their association relationships, which improves partitioning efficiency.
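The adjustment for a GSI on a first field moves every parent association that uses that field from the first data table onto a second auxiliary table. A minimal sketch under the same hypothetical `Association` model and `<table>_gsi_<field>` naming. The column list of the auxiliary table (the first field plus, presumably, key columns) is not modelled here.

```python
from typing import NamedTuple


class Association(NamedTuple):
    child: str
    parent: str
    field: str


def resolve_with_gsi(table: str, field: str, associations: list[Association]):
    """Re-home `table`'s parent associations on `field` to an auxiliary table."""
    aux = f"{table}_gsi_{field}"  # hypothetical name for the second auxiliary table
    moved = [a for a in associations
             if a.child == table and a.parent != table and a.field == field]
    adjusted = [a for a in associations if a not in moved]
    # The auxiliary table takes over the associations released by `table`.
    adjusted += [Association(aux, a.parent, field) for a in moved]
    return aux, adjusted


schema = [
    Association("orders", "customers", "customer_id"),
    Association("orders", "products", "product_id"),
    Association("order_items", "orders", "order_id"),
]
aux, adjusted = resolve_with_gsi("orders", "product_id", schema)
print(adjusted)
```

After the rewrite, `orders` reaches its parents through `customer_id` only, so the conflict is gone while `products` is still reachable through `orders_gsi_product_id`.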
As an optional implementation, a first partitioning field is determined according to the adjusted at least one association relationship; the first data table and the first auxiliary table are divided according to the first partitioning field and the database partitioning rule to obtain a plurality of first sub-data tables; and the plurality of first sub-data tables are allocated to at least one data unit.
In this technical solution, the database management server can determine a first partitioning field according to the adjusted at least one association relationship, divide the first data table and the first auxiliary table according to the first partitioning field and the database partitioning rule to obtain a plurality of first sub-data tables, and allocate them to at least one data unit. The database management server can thus partition data tables automatically based on their association relationships and meet more of the users' service requirements.
As an optional implementation, at least one second partitioning field is determined according to the adjusted at least one association relationship; the first data table, the at least two second data tables, and the at least one second auxiliary table are divided according to the at least one second partitioning field and the database partitioning rule to obtain a plurality of second sub-data tables; and the plurality of second sub-data tables are allocated to at least one data unit.
In this technical solution, the database management server can determine at least one second partitioning field according to the adjusted at least one association relationship, divide the first data table, the at least two second data tables, and the at least one second auxiliary table according to the at least one second partitioning field and the database partitioning rule to obtain a plurality of second sub-data tables, and allocate them to at least one data unit. The database management server can thus partition data tables automatically based on their association relationships and meet more of the users' service requirements.
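The description does not specify the database partitioning rule. The sketch below assumes hash partitioning on the partitioning field (using `zlib.crc32`, which is deterministic across runs, unlike Python's salted `hash()` for strings) and round-robin placement of sub-data tables onto data units. The point it shows is that a parent table and a child (or auxiliary) table partitioned on the same field place matching rows in the same sub-data table index.

```python
import zlib


def shard_of(value, num_shards: int) -> int:
    """Assumed rule: crc32 of the value's string form, modulo shard count."""
    return zlib.crc32(str(value).encode("utf-8")) % num_shards


def partition(rows, key, num_shards):
    """Split one table's rows into `num_shards` sub-data tables by `key`."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[shard_of(row[key], num_shards)].append(row)
    return shards


def allocate(num_shards, data_units):
    """Assumed placement: round-robin of sub-data tables onto data units."""
    return {i: data_units[i % len(data_units)] for i in range(num_shards)}


# Parent and child share the partitioning field, so matching rows co-locate.
customers = [{"customer_id": c} for c in range(1, 9)]
orders = [{"order_id": o, "customer_id": o % 8 + 1} for o in range(100, 120)]
N = 4
customer_shards = partition(customers, "customer_id", N)
order_shards = partition(orders, "customer_id", N)
placement = allocate(N, ["unit-a", "unit-b"])
print(placement)
```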
As an optional implementation, at least one first routing field of the first data table is determined according to a historical query log; attribute information of the at least one first routing field is determined according to the database partitioning rule, where the attribute information of a first routing field includes at least one of: the data table to which it belongs, the sub-data table to which it belongs, or the data unit in which that sub-data table resides; and a mapping relationship between the at least one first routing field and its attribute information is established.
In this technical solution, the database management server can determine at least one first routing field of the first data table according to the historical query log, determine its attribute information according to the database partitioning rule, and establish the mapping relationship between each first routing field and its attribute information. The routing information (that is, the mapping relationship) of the first data table is thus set dynamically according to historical queries, which improves the utilization of the routing information and allows the data being searched for to be located quickly.
As an optional implementation, at least one second routing field of the first data table and the at least two second data tables is determined according to the historical query log; attribute information of the at least one second routing field is determined according to the database partitioning rule, where the attribute information of a second routing field includes at least one of: the data table to which it belongs, the sub-data table to which it belongs, or the data unit in which that sub-data table resides; and a mapping relationship between the at least one second routing field and its attribute information is established.
In this technical solution, the database management server can determine at least one second routing field of the first data table and the at least two second data tables according to the historical query log, determine its attribute information according to the database partitioning rule, and establish the mapping relationship between each second routing field and its attribute information. The routing information (that is, the mapping relationship) of the first data table and the at least two second data tables is thus set dynamically according to historical queries, which improves the utilization of the routing information and allows the data being searched for to be located quickly.
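A routing field lets a query on a non-partitioning column go straight to the right sub-data table. The sketch below builds the mapping from each routing-field value to its attribute information (data table, sub-data table, data unit). The `RouteTarget` layout, the crc32 hash rule, and the round-robin unit placement are assumptions carried over for illustration.

```python
import zlib
from collections import defaultdict
from typing import NamedTuple


class RouteTarget(NamedTuple):
    """Attribute information of one routing-field value (hypothetical layout)."""
    table: str
    sub_table: int
    data_unit: str


def build_routing_map(table, rows, partition_key, routing_field,
                      num_shards, data_units):
    """Map each routing-field value to where its rows live."""
    routing = defaultdict(set)
    for row in rows:
        # Rows are placed by the partitioning field, not the routing field.
        shard = zlib.crc32(str(row[partition_key]).encode("utf-8")) % num_shards
        unit = data_units[shard % len(data_units)]
        routing[row[routing_field]].add(RouteTarget(table, shard, unit))
    return dict(routing)


orders = [
    {"order_id": 1, "customer_id": 7, "tracking_no": "T-100"},
    {"order_id": 2, "customer_id": 3, "tracking_no": "T-200"},
]
routes = build_routing_map("orders", orders, "customer_id", "tracking_no",
                           4, ["unit-a", "unit-b"])
print(routes["T-100"])
```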
As an optional implementation, the database management server may determine, according to the historical query log, how frequently each field of the first data table and of each of the at least two second data tables is queried; determine, from that frequency, the query performance obtained by querying on each field; and calculate a second cost of establishing a routing field on each field. If a second field satisfies the rule for establishing a routing field in terms of the query performance and the second cost, the second field is determined to be a second routing field, where the second field is any field of the first data table or of the at least two second data tables.
In this technical solution, the database management server selects second routing fields from the fields of the first data table and the at least two second data tables by weighing, for each field, the query performance derived from its query frequency in the historical query log against the second cost of establishing a routing field on it. Because routing fields are set according to the query performance and cost of each field, the system avoids the storage pressure that a large amount of routing information would otherwise create; and because routing fields are selected from the historical query log, the routing information better matches users' query patterns.
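The selection rule is not given in detail, so the following sketch substitutes a simple, explicitly hypothetical cost model: the benefit of a routing field is the number of logged queries on it times the shard probes each query would save, and its second cost is the number of rows whose routing entries must be maintained times a per-row weight. A field is chosen when benefit is at least cost.

```python
from collections import Counter


def select_routing_fields(query_log, row_counts, num_shards, cost_per_row=0.01):
    """Pick (table, field) pairs whose estimated benefit covers their cost.

    Hypothetical model:
      benefit = queries on the field * shard probes saved per query
      cost    = rows in the table * cost of maintaining one routing entry
    """
    frequency = Counter(query_log)
    chosen = []
    for (table, field), count in sorted(frequency.items()):
        benefit = count * (num_shards - 1)
        cost = row_counts[table] * cost_per_row
        if benefit >= cost:
            chosen.append((table, field))
    return chosen


log = ([("orders", "tracking_no")] * 5 + [("orders", "note")]
       + [("customers", "email")] * 2)
print(select_routing_fields(log, {"orders": 1000, "customers": 100}, num_shards=4))
# [('customers', 'email'), ('orders', 'tracking_no')]
```

The rarely queried `orders.note` is rejected because its estimated benefit does not cover the cost of keeping routing entries for every `orders` row.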
As an optional implementation, the database management server calculates, for each of the at least one first candidate scheme, a first cost of resolving the first conflicting association relationship with that scheme, and determines the candidate scheme with the smallest first cost as the first target candidate scheme.
In this technical solution, by choosing the candidate scheme with the smallest first cost, the database management server resolves the conflicting association relationship at a controllable cost in storage space and processing time, so that the data tables can be partitioned based on their association relationships.
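Choosing the target candidate is an argmin over the first cost. The text names storage space and processing time as the cost dimensions but gives no formula; the weighted sum, the weights, and the `CandidateStats` fields below are assumptions for illustration.

```python
from typing import NamedTuple


class CandidateStats(NamedTuple):
    """Hypothetical statistics for one GSI-based candidate scheme."""
    field: str
    index_bytes: int          # storage needed by the auxiliary table
    write_overhead_ms: float  # extra time per write to maintain it


def first_cost(stats: CandidateStats, bytes_weight=1e-6, time_weight=1.0) -> float:
    """Weighted sum of storage and processing cost (weights are assumptions)."""
    return stats.index_bytes * bytes_weight + stats.write_overhead_ms * time_weight


def pick_target(candidates: list[CandidateStats]) -> CandidateStats:
    # Ties are broken by field name so the choice is deterministic.
    return min(candidates, key=lambda c: (first_cost(c), c.field))


candidates = [
    CandidateStats("customer_id", index_bytes=8_000_000, write_overhead_ms=0.5),
    CandidateStats("product_id", index_bytes=2_000_000, write_overhead_ms=0.4),
]
print(pick_target(candidates).field)  # product_id
```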
As an optional implementation, the database management server may receive a query request and parse it to obtain a queried data table and a query field. If the queried data table is a first target data table, which is the first data table or any one of the at least two second data tables, the server determines whether the query field matches a routing field of the first target data table. If it does not, the server obtains at least one updated field, where each updated field combines the query field with a target routing field (any one of the routing fields of the first target data table) and no two updated fields are the same; calculates the query cost of querying on each updated field; and outputs the at least one updated field together with the query cost of each, prompting that a target updated field be selected for the query based on those costs, where the target updated field is one of the at least one updated field.
In this technical solution, if the received query is a single-table query, the server determines whether the query field carried in the query request is a routing field of the data table. If it is not, the query request is not optimal and can be optimized: the database management server optimizes it by adding a routing field to the query fields (yielding a query scheme), calculates the cost of each query scheme, outputs the optimized query schemes, and prompts the user to select one according to cost.
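A minimal sketch of the single-table check and suggestion step. The cost estimate is a hypothetical stand-in: `rows_per_value[f]` is assumed to be a known statistic (average rows matching one value of routing field `f`) and is used directly as the query cost of the updated field combination.

```python
def suggest_single_table(query_fields, routing_fields, rows_per_value):
    """Return None if the query already uses a routing field; otherwise a
    list of (updated_fields, estimated_cost) sorted by cost.

    `rows_per_value` is a hypothetical statistic used as the cost estimate.
    """
    if set(query_fields) & set(routing_fields):
        return None  # already routable; nothing to optimize
    suggestions = [
        (tuple(query_fields) + (r,), rows_per_value[r]) for r in routing_fields
    ]
    return sorted(suggestions, key=lambda s: (s[1], s[0]))


stats = {"customer_id": 12, "tracking_no": 1}
print(suggest_single_table(["status"], ["customer_id", "tracking_no"], stats))
print(suggest_single_table(["customer_id"], ["customer_id", "tracking_no"], stats))
```

Returning the suggestions rather than rewriting the query mirrors the text: the server only prompts, and the user decides which routing field to add.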
As an optional implementation, if the queried data tables are the first data table and a second target data table, where the second target data table is any one of the at least two second data tables and the at least one second auxiliary table, the database management server may determine, according to the adjusted at least one association relationship, whether the first data table and the second target data table are associated based on the query field. If they are not, the server determines at least one query scheme for querying on the query field according to the database partitioning rule, calculates a third cost for each query scheme, and outputs the query schemes with their third costs, prompting that a query scheme be selected based on the third cost and used for the query.
In this technical solution, if the received query is a multi-table query, the database management server determines whether the query field carried in the query request is the association field between the queried tables. If it is not, the query is not optimal and can be optimized: the server determines query schemes according to the database partitioning rule, calculates the cost of each, outputs the optimized query schemes to the user, and prompts the user to select one according to cost.
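When two tables are joined on a field other than their association field, matching rows are no longer guaranteed to share a data unit. The sketch below enumerates two generic distributed-join strategies with rough third-cost estimates in rows moved. Both strategies and their cost formulas are assumptions chosen for illustration; the description does not name specific query schemes.

```python
def plan_join(join_field, assoc_field, rows_left, rows_right, num_units):
    """Return None when the join is co-located; otherwise (plan, third_cost)
    pairs sorted by cost, using hypothetical rows-moved estimates."""
    if join_field == assoc_field:
        return None  # tables are partitioned together on this field
    small = min(rows_left, rows_right)
    plans = [
        # Copy the smaller table to every other data unit.
        ("broadcast smaller table", small * (num_units - 1)),
        # Re-hash both tables on the join field.
        ("redistribute both tables on join field", rows_left + rows_right),
    ]
    return sorted(plans, key=lambda p: p[1])


print(plan_join("region", "customer_id", rows_left=1000, rows_right=50, num_units=4))
```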
As an optional implementation, a request is received to establish an association relationship, based on a third field, between a third data table and the first data table, where the third data table is a parent table of the first data table; the association relationship between the third data table and the first data table based on the third field is established; and the association relationships among the third data table, the first data table, and a third target data table are updated based on the adjusted at least one association relationship to obtain a first local association relationship, where the third target data table is the data table, among the at least two second data tables, that is associated with the first data table.
In this technical solution, the database management server can receive a request to add a new association relationship, namely one between a third data table and the first data table based on a third field, establish it, and update only the association relationships among the third data table, the first data table, and the third target data table, based on the adjusted at least one association relationship, to obtain a first local association relationship.
As an optional implementation, if a second conflicting association relationship exists in the first local association relationship, at least one second candidate scheme based on a global secondary index is generated to resolve it; a second target candidate scheme is determined from the at least one second candidate scheme; and a fourth cost of resolving the second conflicting association relationship with the second target candidate scheme is calculated. If the fourth cost is less than a cost threshold, the first local association relationship is adjusted with the second target candidate scheme to obtain the adjusted first local association relationship, and the first data table, the third target data table, and the third data table are partitioned again according to the adjusted first local association relationship and the database partitioning rule.
In this technical solution, if a second conflicting association relationship exists in the first local association relationship, the database management server can resolve it with a candidate scheme based on a global secondary index, and then partition the first data table, the third data table, and the third target data table again.
As an optional implementation, if the first data table, the third data table, and the third target data table are associated based on different fields, it is determined that a second conflicting association relationship exists in the first local association relationship.
In this technical solution, if the first data table, the third data table, and the third target data table are associated based on different fields, the database management server can determine that a second conflicting association relationship exists in the first local association relationship.
As an optional implementation, if the fourth cost is greater than or equal to the cost threshold, the at least one association relationship is updated with the third-field-based association relationship between the third data table and the first data table to obtain a global association relationship. If a third conflicting association relationship exists in the global association relationship, at least one third candidate scheme based on a global secondary index is generated to resolve it; a third target candidate scheme is determined from the at least one third candidate scheme; the global association relationship is adjusted with the third target candidate scheme to obtain the adjusted global association relationship; and the first data table, the third data table, and the at least two second data tables are partitioned again according to the adjusted global association relationship and the database partitioning rule.
In this technical solution, if the fourth cost is greater than or equal to the cost threshold, that is, updating the association relationships locally is too costly, the association relationships of all data tables may be updated globally to obtain the global association relationship. Any conflicting association relationship in it can be resolved with a candidate scheme based on a global secondary index, so that the partitioning scheme of the data tables is updated automatically to reflect the added association relationship.
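The local-versus-global decision can be sketched as follows. The fourth-cost formula is a hypothetical placeholder (rows of the first data table that would have to be copied into auxiliary tables, i.e. row count times the number of extra distinct parent fields); only the comparison against a cost threshold comes from the description.

```python
from typing import NamedTuple


class Association(NamedTuple):
    child: str
    parent: str
    field: str


def plan_new_association(associations, new, row_counts, threshold):
    """Return (scope, fourth_cost) for adding association `new`.

    Local scope: `new` plus the child's current parent associations.
    Hypothetical cost: child rows * (distinct parent fields - 1).
    """
    child = new.child
    local = [a for a in associations
             if a.child == child and a.parent != child] + [new]
    fields = {a.field for a in local}
    if len(fields) < 2:
        return "local", 0  # no second conflicting association relationship
    fourth_cost = row_counts[child] * (len(fields) - 1)
    return ("local", fourth_cost) if fourth_cost < threshold else ("global", fourth_cost)


# Associations after an earlier GSI adjustment on orders.product_id.
adjusted = [
    Association("orders", "customers", "customer_id"),
    Association("orders_gsi_product_id", "products", "product_id"),
]
new = Association("orders", "warehouses", "warehouse_id")
print(plan_new_association(adjusted, new, {"orders": 1000}, threshold=5000))
print(plan_new_association(adjusted, new, {"orders": 1000}, threshold=500))
```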
As an optional implementation manner, if it is determined that the first data table is associated with the at least two second data tables and the third data table based on different fields according to the global association relationship, it is determined that a third conflicting association relationship exists in the global association relationship.
In this technical solution, if it is determined that the first data table is associated with the at least two second data tables and the third data table based on different fields according to the global association relationship, the database management server may determine that a third conflicting association relationship exists in the global association relationship.
As an optional implementation, an instruction to add a fourth data table is received. If the fourth data table is associated with the first data table based on a fourth field, where the fourth data table is a parent table of the first data table, the association relationship between the fourth data table and the first data table based on the fourth field is established, and a fourth target data table associated with the first data table is determined according to the adjusted at least one association relationship. The association relationship among the first data table, the fourth data table, and the fourth target data table is referred to as a second local association relationship. If a fourth conflicting association relationship exists in the second local association relationship, at least one fourth candidate scheme based on a global secondary index is generated to resolve it; a fourth target candidate scheme is determined from the at least one fourth candidate scheme; the second local association relationship is adjusted with the fourth target candidate scheme to obtain the adjusted second local association relationship; and the first data table, the fourth data table, and the fourth target data table are partitioned according to the adjusted second local association relationship and the database partitioning rule.
In this technical solution, if a request to add a new data table (the fourth data table) is received, the association relationships between the new data table and the first data table, and between the new data table and the fourth target data table (the data table associated with the first data table), can be updated locally to obtain the second local association relationship. Any conflicting association relationship in it can be resolved with a candidate scheme based on a global secondary index, so that the fourth data table, the fourth target data table, and the first data table can be partitioned based on their association relationships.
As an optional implementation, if it is determined according to the adjusted at least one association relationship that the field on which the first data table is associated with the fourth target data table differs from the fourth field, the fourth conflicting association relationship exists in the second local association relationship.
In this technical solution, if the field associating the first data table with the fourth target data table differs from the fourth field, the first data table is associated with the fourth target data table and with the fourth data table based on different fields, and it can therefore be determined that the fourth conflicting association relationship exists in the second local association relationship.
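The conflict test can be expressed in a few lines under the same hypothetical `Assoc(child, parent, field)` model. The sketch assumes that self-associations have already been resolved in the adjusted relationships, so it ignores self-loops; the function name is illustrative.

```python
from typing import NamedTuple


class Assoc(NamedTuple):
    """Hypothetical record: `child` is associated with parent table `parent` on `field`."""
    child: str
    parent: str
    field: str


def has_fourth_conflict(adjusted, first, fourth_field):
    """True if the first table is associated with any of its (fourth) target tables
    on a field other than `fourth_field`, the field shared with the new parent table."""
    fields = {
        a.field
        for a in adjusted
        if first in (a.child, a.parent) and a.child != a.parent
    }
    return any(f != fourth_field for f in fields)


adjusted = [Assoc("table1", "table3", "field2")]
print(has_fourth_conflict(adjusted, "table1", "field2"))  # False: same field, co-location possible
print(has_fourth_conflict(adjusted, "table1", "field1"))  # True: candidate schemes are needed
```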
In a second aspect, a data processing apparatus is provided, which is applied to a database management apparatus. The database management apparatus has the function of implementing the behavior in the first aspect or any possible implementation of the first aspect. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the foregoing function, and each module may be software and/or hardware.
In a third aspect, a database management server is provided, including: a memory configured to store one or more programs; and a processor configured to invoke the programs stored in the memory to implement the solution in the method design of the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, storing a computer program that, when executed by at least one processor, implements the method of the first aspect and any of its possible implementations, with the corresponding beneficial effects.
In a fifth aspect, an embodiment of the present invention provides a computer program product, including a non-volatile computer-readable storage medium that stores a computer program. When executed, the computer program causes a computer to implement the steps of the method of the first aspect. For the problem-solving implementations and beneficial effects of the computer program product, refer to the first aspect and its possible method implementations; details are not repeated here.
Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings used in the embodiments are described below.
FIG. 1a is an architectural diagram of a distributed database system according to an embodiment of the present invention;
FIG. 1b is an architectural diagram of another distributed database system according to an embodiment of the present invention;
FIG. 2a is a flowchart of a data processing method according to an embodiment of the present invention;
FIG. 2b is a diagram of the self-association relationship of a first data table according to an embodiment of the present invention;
FIG. 2c is a diagram of the at least one association relationship of a first data table according to an embodiment of the present invention;
FIG. 2d is a diagram of the adjusted at least one association relationship according to an embodiment of the present invention;
FIG. 2e (1) is a diagram of the adjusted at least one association relationship when the candidate scheme based on field 1 is selected, according to an embodiment of the present invention;
FIG. 2e (2) is a diagram of the adjusted at least one association relationship when the candidate scheme based on field 2 is selected, according to an embodiment of the present invention;
FIG. 2f is a schematic diagram of an allocation manner of a plurality of first sub-data tables according to an embodiment of the present invention;
FIG. 3 is a flowchart of another data processing method according to an embodiment of the present invention;
FIG. 4a is a flowchart of yet another data processing method according to an embodiment of the present invention;
FIG. 4b is a diagram of the at least one association relationship after further adjustment according to an embodiment of the present invention;
FIG. 4c is a diagram of a first local association relationship according to an embodiment of the present invention;
FIG. 4d is a diagram of the adjusted first local association relationship according to an embodiment of the present invention;
FIG. 4e is a diagram of a global association relationship according to an embodiment of the present invention;
FIG. 4f is a diagram of the adjusted global association relationship according to an embodiment of the present invention;
FIG. 4g is a diagram of a second local association relationship according to an embodiment of the present invention;
FIG. 4h is a diagram of the adjusted second local association relationship according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a database management server according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are merely some rather than all of the embodiments of the present invention.
The technical solutions of the embodiments of the present invention are typically applicable to a distributed database system (DDBS), and more specifically to a distributed database management system (DDBMS) or a database management server on which a DDBMS is deployed.
The embodiments of the present invention are applicable to online transaction processing (OLTP) scenarios, such as banking transactions or billing systems.
The architecture of the distributed database system to which the embodiments of the present invention apply is shown in FIG. 1a. The distributed database system includes a client 10, a distributed database management system 11, and a distributed database 12.
The client 10 may be a device or application configured to interact with the distributed database management system 11. In some examples, the client 10 includes one or more application servers. The client 10 may be configured to send at least one data table, entity-relationship (ER) diagram information about the data tables, historical query logs of the data tables, and query requests to the distributed database management system 11.
The distributed database management system 11 is used to create, use, and maintain the distributed database 12, and to manage and control it in a unified manner to ensure the security and integrity of the database. Users access the data in the distributed database 12 through the distributed database management system 11, and database administrators also maintain the distributed database 12 through it.
The distributed database management system 11 may provide the client 10 with a variety of functions for creating, modifying, and querying the distributed database 12, including but not limited to: (1) a data definition function: the distributed database management system 11 provides a data definition language (DDL) for defining the database structure; DDL statements describe the framework of the distributed database and may be stored in a data dictionary; (2) a data access function: the distributed database management system 11 provides a data manipulation language (DML) to implement basic access operations on the data tables in the distributed database 12, such as query, insertion, update, and deletion; (3) a distributed database operation management function: the distributed database management system 11 provides data control functions, namely security, integrity, and concurrency control of data, which effectively control and manage the operation of the distributed database to ensure the correctness and validity of the data; (4) establishment and maintenance functions for the distributed database 12, including loading of initial data, dumping, recovery, and reorganization of the distributed database 12, and system performance monitoring and analysis; (5) a data transmission function: the distributed database management system 11 implements communication between the client 10 and the distributed database management system 11, usually in coordination with the operating system.
In some embodiments, the distributed database management system 11 may include a data processing device 111, an automatic database partitioning component 112, and distributed database middleware 113, where the automatic database partitioning component 112 includes a scheme searching device 112-1, a cost estimation device 112-2, a field selection device 112-3, and a deployment device 112-4.
The data processing device 111 is configured to obtain the association relationships between the data tables according to the ER diagram information of the data tables and their historical query logs, and to send information indicating these association relationships to the automatic database partitioning component 112.
The scheme searching device 112-1 is configured to obtain at least one association relationship of the first data table according to the information indicating the association relationship between the data tables, and if a first conflicting association relationship exists in the at least one association relationship, generate at least one first candidate scheme for solving the first conflicting association relationship.
The at least one association relationship includes an association relationship between the first data table and at least two second data tables, or a self-association relationship of the first data table, where the second data tables are parent tables of the first data table and each association relationship corresponds to one field.
The first conflicting association relationship may be a self-association of the first data table, or the first data table being associated with the at least two second data tables based on different fields.
The scheme searching device 112-1 may use local search or a greedy method to generate candidate schemes based on global secondary indexes, and can generate candidate schemes within seconds. The partitioning process can therefore be completed without users noticing, and the partitioning scheme can be modified while the system is running. The distribution of the data tables can also be adjusted dynamically according to performance monitoring results of the data units in the distributed database, which prevents the failure of a single data unit from degrading the overall service performance for users.
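The source does not detail the local search or greedy procedure. As one possible reading, a greedy step could keep the field used by the most historical join queries as the first table's co-location field and propose a global secondary index for every other conflicting field. The function name, the `join_counts` input, and the tie-breaking rule are assumptions for illustration.

```python
def greedy_gsi_plan(parent_fields, join_counts):
    """Greedy sketch (hypothetical).

    parent_fields: parent table -> field on which the first table joins it.
    join_counts:   field -> number of historical join queries using that field.
    Returns (field kept for co-location, fields that receive a global secondary index).
    """
    fields = sorted(set(parent_fields.values()))  # sorted so ties resolve deterministically
    keep = max(fields, key=lambda f: join_counts.get(f, 0))
    return keep, [f for f in fields if f != keep]


plan = greedy_gsi_plan({"table2": "field1", "table3": "field2"}, {"field1": 10, "field2": 3})
print(plan)  # ('field1', ['field2'])
```

A greedy choice like this is fast enough to rerun whenever monitoring data changes, which fits the dynamic adjustment described above; a local search could then try swapping the kept field and compare costs.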
The cost estimation device 112-2 is configured to calculate a first cost for solving the first conflict association relationship by using each first candidate scheme of the at least one first candidate scheme, determine a candidate scheme with a minimum cost according to the first cost, and determine the candidate scheme with the minimum cost as the first target candidate scheme.
It should be noted that, because different candidate schemes incur different first costs for resolving the first conflicting association relationship, to improve the stability and availability of the distributed database system the cost estimation device 112-2 calculates, according to a cost model, the first cost of resolving the first conflicting association relationship with each first candidate scheme of the at least one first candidate scheme, determines the candidate scheme with the minimum first cost, and determines it as the first target candidate scheme. For example, when the at least one candidate scheme is a candidate scheme based on the global secondary index of the self-association field, the cost estimation device 112-2 may calculate the first cost of that candidate scheme and, if the first cost is less than a preset cost value, determine it as the first target candidate scheme.
The cost model may be set by a user, by the manufacturer of the cost estimation device 112-2, or by the cost estimation device 112-2 according to attributes of the distributed database, such as the number of data units or the capacity of each data unit. The cost model may be a time cost model, a system overhead cost model, or a comprehensive cost model that combines time, system overhead, and other factors.
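The source names time, system overhead, and comprehensive cost models without giving formulas. One plausible comprehensive model is a weighted sum of the two component costs; the `CostWeights` type, its default weights, and the function below are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostWeights:
    """Hypothetical weights of a comprehensive cost model."""
    time: float = 1.0
    overhead: float = 1.0


def comprehensive_cost(time_cost, overhead_cost, weights=CostWeights()):
    """Weighted sum of time cost and system-overhead cost (one possible combination)."""
    return weights.time * time_cost + weights.overhead * overhead_cost


print(comprehensive_cost(2.0, 3.0, CostWeights(time=0.5, overhead=1.0)))  # 4.0
```

Setting one weight to zero recovers a pure time or pure overhead model, so a single function can cover all three model types mentioned above.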
The deployment device 112-4 is configured to adjust the at least one association relationship by using the first target candidate scheme to obtain the adjusted at least one association relationship, and to partition the first data table and the at least two second data tables according to the adjusted at least one association relationship and the database partitioning rule, so as to allocate them to the data units in the distributed database 12; or to partition the first data table so as to allocate it to the data units in the distributed database 12.
The distributed database 12 may include at least one data unit for storing data tables (FIG. 1a shows four data units as an example: data unit 12-1, data unit 12-2, data unit 12-3, and data unit 12-4). A data unit may be a physical machine, such as a database server, or a virtual machine running on virtualized hardware resources.
It should be noted that the data units in the distributed database 12 may have connection channels for exchanging information, or they may be independent units unaware of each other. Data units may store data in various forms, including but not limited to relational databases, in-memory databases, file storage units, and object storage units.
The field selection device 112-3 is configured to generate routing information of the first data table, or of the first data table and each of the at least two second data tables, according to the historical query log and the database partitioning rule, and to send the routing information to the distributed database middleware 113-1.
The deployment device 112-4 is further configured to generate a partitioning scheme association diagram according to the adjusted at least one association relationship, generate table access metadata and a table partitioning rule modification script according to the database partitioning rule, and send the partitioning scheme association diagram and the table partitioning rule modification script to the distributed database middleware 113-2. The distributed database middleware 113-2 may update the corresponding runtime, development, and design configuration files according to the table partitioning rule modification script, and maintain all data units in the distributed database 12, and the objects in them, according to the target candidate scheme. The distributed database middleware 113-2 uses the partitioning scheme association diagram to assist data queries and table design. The deployment device 112-4 is also configured to send the routing information and the table access metadata to the distributed database middleware 113-1, which uses them to let users access the distributed database 12 it manages correctly, efficiently, and transparently.
The routing information includes the routing field, the data table to which the routing field belongs, the sub-data tables of that data table, and the data units in which those sub-data tables are stored. The routing information is used to quickly locate the fields to be queried during data queries.
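A sketch of how one routing-information entry might be represented and used for a lookup. The `RouteEntry` type, the sub-table and data unit labels, and the use of a CRC-32 hash modulo the number of sub-data tables are all assumptions; the source does not state the partitioning function.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteEntry:
    """Hypothetical routing-information entry, mirroring the four items listed above."""
    routing_field: str
    table: str
    sub_tables: tuple[str, ...]  # sub-data tables of `table`
    units: tuple[str, ...]       # units[i] is the data unit storing sub_tables[i]


def route(entry, value):
    """Locate the sub-data table and data unit holding rows whose routing field equals `value`.

    Assumes hash partitioning with CRC-32; the actual partitioning rule is not specified.
    """
    idx = zlib.crc32(str(value).encode("utf-8")) % len(entry.sub_tables)
    return entry.sub_tables[idx], entry.units[idx]


entry = RouteEntry("ID", "table1", ("table1_0", "table1_1"), ("unit12-1", "unit12-2"))
print(route(entry, 42))
```

Because the lookup is a pure function of the entry and the value, entries like this are easy to keep in an in-memory cache, as the next paragraphs suggest.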
The table access metadata is mainly information describing data properties, and supports functions such as indicating storage locations, recording historical data, resource lookup, and file recording.
It should be noted that, to improve routing performance and thus query efficiency, the distributed database middleware 113-1 may store the routing information in an in-memory database or a data cache.
It should be noted that the distributed database management system 11 may also support elastic scaling, that is, dynamically updating the database partitioning scheme as data units are added to or removed from the distributed database, or as the patterns of user service access to the database change, so as to maximize resource utilization.
In the embodiments of the present invention, if a first conflicting association relationship exists in the at least one association relationship of a first data table, the database management server can generate at least one candidate scheme for resolving it, select a first target candidate scheme from the at least one candidate scheme, and use the first target candidate scheme to adjust the first conflicting association relationship, obtaining the adjusted at least one association relationship. The server can then partition the first data table, or the first data table and the at least two second data tables, according to the adjusted at least one association relationship and the database partitioning rule. Conflicting association relationships are thus resolved automatically, association-based database partitioning of the data tables is achieved, and partitioning efficiency is improved.
Based on the foregoing description of the distributed database system architecture, an embodiment of the present invention provides another distributed database system architecture, shown in FIG. 1b. It differs from the architecture in FIG. 1a in that it has no distributed database middleware that manages the distributed database in a unified manner. The table access metadata and the partitioning scheme association diagram, which assist system development and design, therefore need to be output to other external devices 13, such as terminal devices or database servers, for maintenance. Because the routing information is necessary for providing the query service, it may be stored in the data unit 12-5 of the distributed database 12, or in a query routing device independent of the distributed database 12.
Based on the foregoing description of the distributed database system architecture, an embodiment of the present invention provides a data processing method applied to a database management server. As shown in FIG. 2a, the data processing method may include the following steps:
S201, acquiring at least one association relationship of the first data table.
The at least one association relationship includes an association relationship between the first data table and at least two second data tables, or a self-association relationship of the first data table, where the second data tables are parent tables of the first data table, each association relationship corresponds to one field, and an association relationship refers to a join relationship between data tables.
In the embodiment of the present invention, the database management server may obtain the at least one association relationship of the first data table according to the ER diagram information of the data tables and historical query records, where the historical query records may include structured query language (SQL) query records.
For example, as shown in fig. 2b, fig. 2b is a self-association relationship diagram of a first data table, the first data table is table 1 in the diagram, and table 1 is self-associated based on an ID field, that is, the at least one association relationship includes the self-association relationship of the first data table.
Alternatively, as shown in fig. 2c, fig. 2c is at least one association relationship diagram of the first data table, the first data table is table 1 in the diagram, the at least two second data tables include tables 2 and 3 in the diagram, and tables 2 and 3 are parent tables of table 1, table 1 is associated with table 2 based on field 1, and table 1 is associated with table 3 based on field 2, that is, the at least one association relationship includes an association relationship between the first data table and the at least two second data tables.
S202, if a first conflicting association relationship exists in the at least one association relationship, generating at least one first candidate scheme for resolving the first conflicting association relationship.
In the embodiment of the present invention, a first conflicting association relationship in the at least one association relationship may mean that the first data table is self-associated, that is, other fields of the first data table are queried through the self-association field. If the self-association of the first data table were simply removed and the first data table distributed across data units, other fields of the first data table could no longer be queried through the self-association field, which would reduce query performance.
Alternatively, because a partitioning scheme based on association relationships places data tables that are associated on the same field in the same data unit, if the first conflicting association relationship means that the first data table is associated with the at least two second data tables based on different fields, the first data table and the at least two second data tables cannot be partitioned directly according to the at least one association relationship.
As an optional embodiment, the database management server may determine that a first conflicting association relationship exists in the at least one association relationship as follows: if the first data table is self-associated, a first conflicting association relationship exists; and if the first data table is associated with the at least two second data tables based on different fields, a first conflicting association relationship exists in the at least one association relationship.
For example, as shown in FIG. 2b, because table 1 is self-associated based on the ID field, the database management server may determine that a first conflicting association relationship exists.
Alternatively, as shown in fig. 2c above, since table 1 is associated with table 2 based on field 1 and table 1 is associated with table 3 based on field 2, the database management server may determine that the first data table is associated with the at least two second data tables based on different fields and may determine that the first conflicting association relationship exists in the at least one association relationship.
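The two determination rules can be sketched as follows, modelling each association relationship as a hypothetical `Assoc(child, parent, field)` record; the FIG. 2b and FIG. 2c examples are encoded with the table and field names used in the text. The record type and function name are illustrative.

```python
from typing import NamedTuple


class Assoc(NamedTuple):
    """Hypothetical record: `child` is associated with parent table `parent` on `field`."""
    child: str
    parent: str
    field: str


def has_first_conflict(assocs, first):
    """Apply the two rules above to the association list of the first data table."""
    # Rule 1: the first data table is self-associated.
    if any(a.child == first and a.parent == first for a in assocs):
        return True
    # Rule 2: the first data table joins its parent tables on more than one distinct field.
    parent_fields = {a.field for a in assocs if a.child == first and a.parent != first}
    return len(parent_fields) > 1


fig_2b = [Assoc("table1", "table1", "ID")]
fig_2c = [Assoc("table1", "table2", "field1"), Assoc("table1", "table3", "field2")]
print(has_first_conflict(fig_2b, "table1"), has_first_conflict(fig_2c, "table1"))  # True True
```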
As an optional embodiment, the database management server may generate the at least one first candidate scheme for resolving the first conflicting association relationship as follows: if the first data table has a self-association relationship, a candidate scheme based on a global secondary index of the self-association field is generated; and if the first data table is associated with the at least two second data tables based on different fields, at least two candidate schemes based on global secondary indexes are generated, each based on a different field.
For example, as shown in FIG. 2b above, since Table 1 is self-associated based on the ID field, the database management server may generate a candidate for a global secondary index based on the ID field.
Alternatively, as shown in FIG. 2c, table 1 is associated with table 2 based on field 1 and with table 3 based on field 2, that is, the first data table is associated with the at least two second data tables based on different fields. The database management server may therefore generate at least two candidate schemes based on global secondary indexes: a candidate scheme based on the global secondary index of field 1 and a candidate scheme based on the global secondary index of field 2.
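Candidate generation, sketched under the same hypothetical `Assoc(child, parent, field)` model: each candidate scheme is represented only by the field its global secondary index would be built on. The source does not describe a table that is both self-associated and joined to parents on different fields; this sketch simply returns the self-association candidates in that case.

```python
from typing import NamedTuple


class Assoc(NamedTuple):
    """Hypothetical record: `child` is associated with parent table `parent` on `field`."""
    child: str
    parent: str
    field: str


class Candidate(NamedTuple):
    """A candidate scheme: build a global secondary index (GSI) on `field`."""
    field: str


def generate_candidates(assocs, first):
    self_fields = sorted({a.field for a in assocs if a.child == first and a.parent == first})
    if self_fields:
        # Self-association: one GSI candidate on the self-association field.
        return [Candidate(f) for f in self_fields]
    parent_fields = sorted({a.field for a in assocs if a.child == first and a.parent != first})
    if len(parent_fields) > 1:
        # Parents joined on different fields: one GSI candidate per field.
        return [Candidate(f) for f in parent_fields]
    return []  # no first conflicting association relationship


fig_2c = [Assoc("table1", "table2", "field1"), Assoc("table1", "table3", "field2")]
print(generate_candidates(fig_2c, "table1"))
```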
S203, determining a first target candidate scheme from the at least one first candidate scheme, and adjusting the at least one association relationship by using the first target candidate scheme to obtain the adjusted at least one association relationship.
In the embodiment of the present invention, the database management server may randomly select one of the at least one first candidate scheme as the first target candidate scheme, determine the first target candidate scheme according to the cost of each first candidate scheme for resolving the first conflicting association relationship, or determine the first target candidate scheme according to an objective function provided by the user.
The first target candidate scheme may be the candidate scheme with the minimum cost, or the candidate scheme that best satisfies the objective function provided by the user.
As an optional implementation manner, the database management server calculates a first cost for solving the first conflict association relationship by using each first candidate in the at least one first candidate, determines a candidate with the smallest cost according to the first cost, and determines the candidate with the smallest cost as the first target candidate.
In the embodiment of the present invention, because different candidate schemes incur different first costs for resolving the first conflicting association relationship, to improve the stability and availability of the distributed database system the database management server may calculate, according to a cost model, the first cost of resolving the first conflicting association relationship with each first candidate scheme of the at least one first candidate scheme, determine the candidate scheme with the minimum first cost, and determine it as the first target candidate scheme. Specifically, when the at least one candidate scheme is a candidate scheme based on the global secondary index of the self-association field, the database management server may calculate the first cost of that candidate scheme and, if the first cost is smaller than a preset cost value, determine it as the first target candidate scheme.
The cost model may be set by a user, by the manufacturer of the database management server, or by the database management server according to attributes of the distributed database, such as the number of data units or the capacity of each data unit. The cost model may take into account CPU consumption, the number of network data transmissions, the volume of data transmitted over the network, and the like, where the number of transmissions and the transmitted volume refer to those generated when a given field is queried.
For example, if the first cost is the number of network data transmissions, the database management server calculates this number for the candidate scheme based on the global secondary index of the ID field, and if it is smaller than the preset cost value, determines this candidate scheme as the first target candidate scheme.
Alternatively, the database management server calculates the number of network data transmissions needed to resolve the first conflicting association relationship with the candidate scheme based on the global secondary index of field 1 and with the candidate scheme based on the global secondary index of field 2. If the number for field 1 is greater than the number for field 2, the candidate scheme based on the global secondary index of field 2 is determined as the first target candidate scheme; otherwise, the candidate scheme based on the global secondary index of field 1 is determined as the first target candidate scheme.
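The minimum-cost selection, including the preset-cost check used for the self-association case, can be sketched as below. Costs are passed in as precomputed network transmission counts because the source does not specify how they are estimated; the function name and the tie-breaking rule (ties favor field 1, matching the "otherwise" branch above) are assumptions.

```python
from typing import Dict, Optional


def pick_target(costs: Dict[str, int], preset: Optional[int] = None) -> Optional[str]:
    """Return the GSI field of the cheapest candidate scheme.

    costs:  candidate GSI field -> estimated number of network data transmissions.
    preset: optional preset cost value; the chosen candidate must stay below it.
    Ties keep the alphabetically first field, so field 1 wins when costs are equal.
    """
    best = min(sorted(costs), key=costs.__getitem__)
    if preset is not None and costs[best] >= preset:
        return None  # no candidate is acceptable under the preset cost value
    return best


print(pick_target({"field1": 120, "field2": 80}))  # field2
print(pick_target({"ID": 30}, preset=100))         # ID
```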
As an optional implementation, the first target candidate scheme is a candidate scheme based on the global secondary index of the self-association field. The database management server may adjust the at least one association relationship by using the first target candidate scheme as follows: release the self-association relationship of the first data table, obtain a first secondary table from the first data table, and establish an association relationship between the first secondary table and the first data table based on the self-association field, thereby obtaining the adjusted at least one association relationship.
It should be noted that the first secondary table includes the self-association field and may also include other fields. The database management server may configure the first secondary table according to the historical query records, for example by adding to it fields of the first data table that are frequently used in association (join) queries, or may set the fields of the first secondary table according to a user's selection. Because the first secondary table generally contains fewer fields than the first data table, the candidate scheme based on the global secondary index saves more storage space than a global replicated-table scheme.
For example, if the first target candidate scheme is the candidate scheme based on the ID field, the database management server releases the self-association relationship of table 1, obtains from table 1 shown in FIG. 2b a first secondary table containing at least the ID field, and establishes an association relationship between the first secondary table and table 1 based on the ID field, thereby obtaining the adjusted at least one association relationship shown in FIG. 2d.
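A sketch of the self-association adjustment, together with the frequency-based choice of secondary-table fields described above. The naming scheme for the secondary table, the `min_freq` threshold, and recording the new association with the secondary table as the child are all assumptions, not details from the source.

```python
from typing import NamedTuple


class Assoc(NamedTuple):
    """Hypothetical record: `child` is associated with parent table `parent` on `field`."""
    child: str
    parent: str
    field: str


def secondary_fields(self_field, join_freq, min_freq):
    """Fields of the first secondary table: the self-association field plus any field
    whose join-query frequency (from historical query records) reaches `min_freq`."""
    extra = sorted(f for f, n in join_freq.items() if n >= min_freq and f != self_field)
    return [self_field] + extra


def resolve_self_association(assocs, first, self_field):
    """Release the self-association and link a new secondary table to the first table."""
    secondary = f"{first}_gsi_{self_field}"  # hypothetical naming scheme
    kept = [a for a in assocs if not (a.child == a.parent == first and a.field == self_field)]
    # Modelled here as secondary table (child) -> first data table (parent).
    kept.append(Assoc(secondary, first, self_field))
    return secondary, kept


name, adjusted = resolve_self_association([Assoc("table1", "table1", "ID")], "table1", "ID")
print(name, adjusted)
```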
As an optional implementation, the first target candidate scheme is a candidate scheme based on the global secondary index of a first field, where the first field is any one of the fields corresponding to the at least one association relationship. The database management server may adjust the at least one association relationship by using the first target candidate scheme as follows: determine, according to the at least one association relationship, at least one data table associated with the first data table based on the first field, the determined data table(s) being among the at least two second data tables; release the association between the first data table and the determined data table(s) based on the first field; obtain at least one second secondary table from the first data table, where the second secondary table includes at least one field of the first data table, including the first field; and establish an association relationship between the at least one second secondary table and the determined data table(s) based on the first field, thereby obtaining the adjusted at least one association relationship.
For example, suppose the first target candidate scheme is based on a global secondary index of the first field, and the first field is field 1 in fig. 2c. According to the at least one association relationship of the first data table shown in fig. 2c, the database management server determines that the data table associated with table 1 based on field 1 is table 2, removes the association between table 1 and table 2 based on field 1, obtains from table 1 a second secondary table that includes at least field 1, and establishes an association between the second secondary table and table 2 based on field 1. The resulting adjusted association relationship is shown in fig. 2e (1).
If instead the first field is field 2 in fig. 2c, the database management server determines, according to the at least one association relationship of table 1 shown in fig. 2c, that the data table associated with table 1 based on field 2 is table 3, removes the association between table 1 and table 3 based on field 2, obtains from table 1 a second secondary table that includes at least field 2, and establishes an association between the second secondary table and table 3 based on field 2. The resulting adjusted association relationship is shown in fig. 2e (2).
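The four steps of the global-secondary-index candidate scheme can be sketched as follows. This is an illustrative Python sketch, not the patented implementation; it assumes associations are stored as (left table, right table, field) triples and that the caller chooses the name of the derived secondary table. Self-associations are deliberately left untouched, since the source handles them with a separate scheme.

```python
from typing import List, NamedTuple, Sequence


class Edge(NamedTuple):
    """An association between two tables based on one field (hypothetical model)."""
    left: str
    right: str
    field: str


def _links(edge: Edge, table: str, field: str) -> bool:
    # True for a non-self association of `table` based on `field`.
    return edge.field == field and table in (edge.left, edge.right) and edge.left != edge.right


def apply_gsi_candidate(
    edges: Sequence[Edge], first_table: str, field: str, secondary: str
) -> List[Edge]:
    """Adjust associations with a global-secondary-index candidate on `field`."""
    # 1. Find the tables associated with first_table based on `field`.
    partners = [
        e.right if e.left == first_table else e.left
        for e in edges
        if _links(e, first_table, field)
    ]
    # 2. Remove those associations.
    adjusted = [e for e in edges if not _links(e, first_table, field)]
    # 3-4. The secondary table (derived from first_table, containing `field`)
    #      takes over the associations based on `field`.
    adjusted.extend(Edge(secondary, p, field) for p in partners)
    return adjusted


fig_2c = [Edge("table 1", "table 2", "field 1"), Edge("table 1", "table 3", "field 2")]
print(apply_gsi_candidate(fig_2c, "table 1", "field 1", "secondary table"))
```

Running it on field 1 and on field 2 reproduces the two adjusted graphs described for fig. 2e (1) and fig. 2e (2): only the association on the chosen field moves to the secondary table.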
S204, partitioning the first data table and the at least two second data tables, or only the first data table, according to the adjusted at least one association relationship and the database partitioning rule.
In the embodiment of the invention, the database management server can partition the first data table and the at least two second data tables, or only the first data table, according to the adjusted at least one association relationship and the database partitioning rule, so that data tables are partitioned automatically and partitioning efficiency is improved.
The database partitioning rule may be set by a user; by the database management server according to the total number of data units, the memory capacity of each data unit, the size of the data table, how quickly the fields of the data table change, or the like; or by the manufacturer of the database management server. Partitioning rules include range (interval) partitioning, list partitioning, hash partitioning, and the like.
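The three rule types named above can each be viewed as a function that maps a partition-field value to a partition index. The Python sketch below is a generic illustration, not the patent's implementation: range partitioning takes inclusive upper bounds (matching intervals such as 0-1000 and 1001-2000 used in the example that follows), and CRC-32 is an assumed hash choice, used because Python's built-in `hash()` of a string changes between processes.

```python
import bisect
import zlib
from typing import Hashable, Mapping, Sequence, Set


def range_partition(value: int, upper_bounds: Sequence[int]) -> int:
    """Range (interval) partitioning: partition i holds values in
    (upper_bounds[i-1], upper_bounds[i]]; bounds must be sorted ascending."""
    i = bisect.bisect_left(upper_bounds, value)
    if i == len(upper_bounds):
        raise ValueError(f"{value} is above the last bound {upper_bounds[-1]}")
    return i


def list_partition(value: Hashable, lists: Mapping[int, Set[Hashable]]) -> int:
    """List partitioning: each partition owns an explicit set of values."""
    for partition, values in lists.items():
        if value in values:
            return partition
    raise ValueError(f"{value!r} is not listed in any partition")


def hash_partition(value: object, num_partitions: int) -> int:
    """Hash partitioning with CRC-32, so the result is stable across processes."""
    return zlib.crc32(str(value).encode("utf-8")) % num_partitions


print(range_partition(1001, [1000, 2000, 2500]))
print(list_partition("US", {0: {"CN", "JP"}, 1: {"US"}}))
print(hash_partition("user-42", 4))
```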
As an optional implementation manner, the database management server may execute step S204 as follows: determining a first partition field according to the adjusted at least one association relationship; partitioning the first data table and the first secondary table according to the first partition field and the database partitioning rule to obtain a plurality of first sub-data tables; and allocating the plurality of first sub-data tables to at least one data unit.
For example, in the adjusted association relationship diagram shown in fig. 2d, the first data table is table 1, the partitioning rule is range partitioning, and each range covers 1000 values. The database management server determines the first partition fields according to the adjusted at least one association relationship: the ID1 field for table 1 and the ID2 field for the first secondary table. If table 1 and the first secondary table each hold 2500 records, the server partitions them according to the first partition fields and the partitioning rule into the first sub-data tables table 1-P1, table 1-P2, table 1-P3, table 1s-P1, table 1s-P2, and table 1s-P3, allocates table 1-P1 and table 1s-P1 to data unit 1, allocates table 1s-P2 to data unit 2, and so on, as shown in fig. 2f.
Table 1-P1 contains all data records in table 1 whose ID1 value lies in the interval 0-1000, table 1-P2 those whose ID1 value lies in 1001-2000, and table 1-P3 those whose ID1 value lies in 2001-2500. Likewise, table 1s-P1, table 1s-P2, and table 1s-P3 contain all data records in the first secondary table whose ID2 value lies in 0-1000, 1001-2000, and 2001-2500, respectively.
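The range-partitioning example can be reproduced with a short Python sketch. It assumes hypothetical data (2500 records with IDs 1 to 2500 in each table) and assumes that sub-data tables covering the same interval are placed in the same data unit; the text only states the placement of table 1-P1, table 1s-P1, and table 1s-P2 explicitly.

```python
import bisect
from typing import Dict, List, Tuple


def split_by_range(records: List[dict], key: str, bounds: List[int]) -> List[List[dict]]:
    """Split records into one partition per inclusive upper bound."""
    parts: List[List[dict]] = [[] for _ in bounds]
    for rec in records:
        i = bisect.bisect_left(bounds, rec[key])
        if i == len(bounds):
            raise ValueError(f"{key}={rec[key]} is outside the last interval")
        parts[i].append(rec)
    return parts


def partition_and_allocate(
    table1: List[dict], table1s: List[dict], bounds: List[int]
) -> Tuple[List[List[dict]], List[List[dict]], Dict[str, List[str]]]:
    """Partition table 1 on ID1 and its secondary table on ID2, then place the
    sub-data tables for the same interval in the same data unit (assumed)."""
    parts1 = split_by_range(table1, "ID1", bounds)
    parts1s = split_by_range(table1s, "ID2", bounds)
    units = {
        f"data unit {n}": [f"table 1-P{n}", f"table 1s-P{n}"]
        for n in range(1, len(bounds) + 1)
    }
    return parts1, parts1s, units


BOUNDS = [1000, 2000, 2500]  # intervals 0-1000, 1001-2000, 2001-2500
table1 = [{"ID1": i} for i in range(1, 2501)]   # hypothetical 2500 records
table1s = [{"ID2": i} for i in range(1, 2501)]
parts1, parts1s, units = partition_and_allocate(table1, table1s, BOUNDS)
print([len(p) for p in parts1])
print(units["data unit 1"])
```

Co-locating same-interval sub-data tables is one way to keep the association between table 1 and its secondary table local to a data unit; it is shown here as a design option, not as the only allocation the method permits.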
A data table may include multiple fields, each of which can take many values; a data record is the complete set of information in one row of the table. For example, if a user information table contains a user name and a date of birth, all of the information about a given user in that table forms one data record.
As an optional implementation manner, the database management server may execute step S204 as follows: determining at least one second partition field according to the adjusted at least one association relationship; partitioning the first data table, the at least two second data tables, and the at least one second secondary table according to the at least one second partition field and the database partitioning rule to obtain a plurality of second sub-data tables; and allocating the plurality of second sub-data tables to at least one data unit.
For example, in the adjusted association relationship diagram shown in fig. 2e (1), the database management server may determine at least one second partition field according to the adjusted association relationships, for example field 1 and field 2. It may then partition the first data table, the at least two second data tables, and the at least one second secondary table according to field 1, field 2, and the partitioning rule to obtain a plurality of second sub-data tables, and allocate these to at least one data unit.
In the embodiment of the invention, if a first conflicting association relationship exists in the at least one association relationship of the first data table, the database management server can generate at least one candidate scheme for resolving it, select a first target candidate scheme from them, and adjust the association relationships with that scheme to obtain the adjusted at least one association relationship. It can then partition the first data table, or the first data table and the at least two second data tables, according to the adjusted at least one association relationship and the database partitioning rule. In this way, conflicting association relationships are resolved automatically, data tables can be partitioned based on their association relationships, and partitioning efficiency is improved.
Based on the above description, an embodiment of the present invention provides another data processing method applied to a database management server. As shown in the flowchart of fig. 3, the method may include:
S301, acquiring at least one association relationship of the first data table.
S302, if a first conflicting association relationship exists in the at least one association relationship, generating at least one first candidate scheme for resolving it.
S303, determining a first target candidate scheme from the at least one first candidate scheme, and adjusting the at least one association relationship with it to obtain the adjusted at least one association relationship.
S304, partitioning the first data table and the at least two second data tables, or only the first data table, according to the adjusted at least one association relationship and the database partitioning rule.
S305, determining at least one second routing field of the first data table and the at least two second data tables according to the historical query logs.
In the embodiment of the invention, the database management server can determine at least one second routing field of the first data table and the at least two second data tables according to the historical query logs, so that the fields to be queried can be located quickly by means of the routing fields.
As an alternative embodiment, the database management server may execute step S305 as follows: determining, according to the historical query log, how frequently each field of the first data table and of each of the at least two second data tables is queried; determining, from that frequency, the query performance of querying based on each field; calculating the second cost of establishing a routing field on each field; and, if a second field satisfies the rule for establishing a routing field given its query performance and second cost, determining the second field as a second routing field, where the second field is any field of the first data table or the at least two second data tables.
In the embodiment of the invention, the database management server can set routing fields for each data table. The more routing fields are set, the faster data queries become, that is, read performance improves. However, more routing fields also increase the time needed to scan them during a query, require more storage space to hold them, and raise the cost of data updates and writes. The database management server therefore selects the routing fields of a data table by weighing query performance against cost.
Because the more frequently a field is queried, the greater the query performance gained by making it a routing field (that is, the higher the query efficiency), the database management server may determine from the historical query log how often each field of the first data table and of each of the at least two second data tables is queried, and derive from that frequency the query performance of querying based on each field.
The database management server may further calculate the second cost of establishing a routing field on each field and, if it determines from the query performance and the second cost that a second field satisfies the rule for establishing a routing field, determine the second field as a second routing field, where the second field is any field of the first data table or the at least two second data tables.
The second cost may include a time cost, a storage space cost, and CPU consumption. The time cost is the time needed to scan the routing field during a data query, the storage space cost is the storage space needed to store the routing field, and the CPU consumption is the amount of data updating and writing incurred by establishing the routing field.
The rule for establishing a routing field may be set by the database management server according to the storage space of the device that stores the routing fields or the speed at which the database management server scans routing fields, or it may be set by the user; the present invention does not limit this.
S306, determining attribute information of the at least one second routing field according to the database partitioning rule, where the attribute information of a second routing field includes at least one of: the data table to which it belongs, the sub-data table to which it belongs, or the data unit in which that sub-data table is located.
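Step S305 can be sketched as frequency counting over the query log followed by a cost/benefit test. Everything quantitative here is a hypothetical choice for illustration: query performance is modelled as proportional to query frequency, the second cost is a weighted sum of the three components listed above, and the routing-field rule is "performance minus cost exceeds a threshold". The patent leaves the concrete rule open.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple

Field = Tuple[str, str]  # (table, field)


def second_cost(time_cost: float, storage_cost: float, cpu_cost: float,
                weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Combine the three cost components named in the text (weights assumed)."""
    wt, ws, wc = weights
    return wt * time_cost + ws * storage_cost + wc * cpu_cost


def select_routing_fields(
    query_log: Iterable[Field],
    cost_of: Callable[[Field], float],
    gain_per_query: float = 1.0,
    min_net_gain: float = 0.0,
) -> List[Field]:
    """Pick routing fields from a historical query log.

    Query performance is modelled as proportional to how often a field is
    queried; a field qualifies when performance minus its second cost exceeds
    `min_net_gain` (a hypothetical routing-field rule).
    """
    frequency = Counter(query_log)
    chosen = []
    for field, count in frequency.most_common():
        performance = count * gain_per_query
        if performance - cost_of(field) > min_net_gain:
            chosen.append(field)
    return chosen


def flat_cost(field: Field) -> float:
    # Hypothetical: every field costs the same to turn into a routing field.
    return second_cost(time_cost=4.0, storage_cost=4.0, cpu_cost=2.0)


log = [("table 1", "name")] * 50 + [("table 1", "age")] * 3 + [("table 2", "field 1")] * 20
print(select_routing_fields(log, flat_cost))
```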
S307, establishing a mapping relation between the at least one second routing field and the attribute information of the at least one second routing field.
In the embodiment of the invention, the database management server can establish the mapping relation between the at least one second routing field and the attribute information of the at least one second routing field, and can quickly locate the field to be queried according to the mapping relation during data query, thereby improving the efficiency of data query.
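One way to picture the mapping between routing fields and their attribute information is a lookup table from a routing-field value to the (data table, sub-data table, data unit) locations that hold it. This Python sketch interprets the mapping at the level of individual field values, which is an assumption; the partition layout (dicts with `table`, `sub_table`, `data_unit`, and `records` keys) is also hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Location = Tuple[str, str, str]  # (data table, sub-data table, data unit)


def build_routing_map(
    partitions: Iterable[dict], routing_fields: Set[Tuple[str, str]]
) -> Dict[Tuple[str, str, object], Set[Location]]:
    """Map each (table, routing field, value) to where matching rows live."""
    routing: Dict[Tuple[str, str, object], Set[Location]] = defaultdict(set)
    for part in partitions:
        location = (part["table"], part["sub_table"], part["data_unit"])
        for row in part["records"]:
            for field, value in row.items():
                if (part["table"], field) in routing_fields:
                    routing[(part["table"], field, value)].add(location)
    return dict(routing)


def locate(routing, table: str, field: str, value) -> Set[Location]:
    """Return the locations holding `value`; an empty set means it is unknown."""
    return routing.get((table, field, value), set())


partitions = [
    {"table": "table 1", "sub_table": "table 1-P1", "data_unit": "data unit 1",
     "records": [{"ID1": 7, "name": "alice"}]},
    {"table": "table 1", "sub_table": "table 1-P2", "data_unit": "data unit 2",
     "records": [{"ID1": 1500, "name": "bob"}]},
]
routing = build_routing_map(partitions, {("table 1", "name")})
print(locate(routing, "table 1", "name", "bob"))
```

A query on a routing field then touches only the listed data units instead of scanning every partition, which is the efficiency gain the text describes; the price is keeping this map current on every write.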
As an alternative embodiment, the database management server may establish the routing fields of the first data table by performing the following steps: determining at least one first routing field of the first data table according to the historical query log; determining attribute information of the at least one first routing field according to the database partitioning rule, where the attribute information of a first routing field includes at least one of: the data table to which it belongs, the sub-data table to which it belongs, or the data unit in which that sub-data table is located; and establishing a mapping relationship between the at least one first routing field and its attribute information.
In the embodiment of the present invention, by determining the first routing fields of the first data table from the historical query log, determining their attribute information according to the database partitioning rule, and mapping each first routing field to its attribute information, the database management server can quickly locate the field to be queried according to the mapping relationship during a data query, improving query efficiency.
As an alternative embodiment, after executing step S307 the database management server may further perform the following steps: receiving a query request and parsing it to obtain the queried data table and the query field; if the queried data table is a first target data table, which is the first data table or any one of the at least two second data tables, judging whether the query field matches a routing field of the first target data table; if not, obtaining at least one updated field, where each updated field consists of the query field plus a target routing field (any one of the routing fields of the first target data table) and the updated fields differ from one another; calculating the query cost of querying based on each updated field; and outputting the updated fields together with their query costs, prompting a target updated field, one of the at least one updated field, to be selected for the query based on those costs.
In the embodiment of the invention, if the received query request is a single-table query and the query field is a routing field, the database management server can quickly locate the data to be queried according to the established mapping relationship, so the query is determined to be optimal. If the query field is not a routing field, the database management server can add a routing field to the query fields to optimize the query request and improve query efficiency.
The database management server may receive a query request and parse it to obtain the queried data table and query field. If the queried data table is the first target data table, the database management server determines that the request is a single-table query and judges whether the query field matches a routing field of the first target data table. If not, it determines that the request is not an optimal query and obtains at least one updated field, each consisting of the query field and a target routing field.
Because the first target data table has at least one routing field, at least one updated field can be obtained by adding a routing field to the query field. Since queries based on different fields have different costs, the database management server calculates the query cost of querying based on each updated field and outputs the updated fields together with their query costs, prompting a target updated field to be selected on that basis. Adding a routing field to the query field thus optimizes the query and improves query efficiency.
The query cost includes the time cost of the query, the cost of the memory the query occupies, and the like.
As an alternative embodiment, after executing step S307 the database management server may further perform the following steps: if the queried data tables are the first data table and a second target data table, judging, according to the adjusted at least one association relationship, whether the first data table and the second target data table are associated based on the query field, where the second target data table is any one of the at least two second data tables and the at least one second secondary table; if they are not associated based on the query field, determining at least one query plan for querying on the query field according to the database partitioning rule; calculating a third cost for each query plan; and outputting the query plans and their third costs, prompting a query plan to be selected based on the third cost and used for the query.
In the embodiment of the present invention, if the queried data tables are the first data table and the second target data table, the database management server may determine that the query is a multi-table query and judge, according to the adjusted at least one association relationship, whether the two tables are associated based on the query field. If they are, the query request is determined to be optimal and the data can be located quickly based on the query field. If they are not, the query is determined not to be optimal; the database management server may then determine at least one query plan for querying on the query field according to the database partitioning rule, calculate a third cost for each query plan, output the query plans and their third costs, prompt a query plan to be selected based on the third cost, and query using the selected plan.
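The single-table check can be sketched as: if some query field already is a routing field, the query is treated as optimal; otherwise each "updated field" is the query fields plus one routing field, ranked by estimated cost. In this Python sketch, reading "matches a routing field" as "contains a routing field" is an interpretation, and the cost function is a hypothetical stand-in for whatever cost model (time, memory) the server uses.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Candidate = Tuple[Tuple[str, ...], float]


def suggest_updated_fields(
    query_fields: Sequence[str],
    routing_fields: Sequence[str],
    query_cost: Callable[[Tuple[str, ...]], float],
) -> Optional[List[Candidate]]:
    """Single-table query check.

    Returns None when a query field already is a routing field (treated as
    optimal). Otherwise returns every "updated field" (the query fields plus
    one routing field) with its estimated query cost, cheapest first.
    """
    query = tuple(query_fields)
    if any(f in routing_fields for f in query):
        return None
    candidates = [query + (r,) for r in routing_fields]
    return sorted(((c, query_cost(c)) for c in candidates), key=lambda item: item[1])


# Hypothetical cost estimates, e.g. from a query optimizer.
costs = {("age", "name"): 5.0, ("age", "ID1"): 2.0}
print(suggest_updated_fields(["age"], ["name", "ID1"], costs.__getitem__))
```

Returning the ranked list rather than silently rewriting the query mirrors the text: the server outputs the options and their costs and prompts for a choice.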
It should be noted that the at least one query plan may include a global scanning plan or a plan that adds the query field in the adjusted association relationship.
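The multi-table check can be sketched against the adjusted association relationship, stored here (as an assumption) as a set of (table A, table B, field) triples treated as unordered pairs. When the two tables are not associated on the query field, the sketch lists the two plan types mentioned above with a caller-supplied, hypothetical third-cost function.

```python
from typing import Callable, List, Set, Tuple

Assoc = Tuple[str, str, str]  # (table A, table B, field)


def plan_join_query(
    associations: Set[Assoc],
    first: str,
    second: str,
    field: str,
    third_cost: Callable[[str], float],
) -> List[Tuple[str, float]]:
    """Multi-table query check against the adjusted association relationship.

    Returns [] when the two tables are already associated on `field` (the
    query is treated as optimal); otherwise lists the two plan types named in
    the text with their third cost, cheapest first.
    """
    if (first, second, field) in associations or (second, first, field) in associations:
        return []
    plans = [
        "global scan",
        f"add association {first}-{second} on {field}",
    ]
    return sorted(((p, third_cost(p)) for p in plans), key=lambda item: item[1])


assoc = {("table 1", "table 2", "field 1"), ("secondary table", "table 3", "field 2")}
costs = {"global scan": 9.0, "add association table 1-table 3 on field 2": 3.0}
print(plan_join_query(assoc, "table 1", "table 3", "field 2", costs.__getitem__))
```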
It should be noted that, for the explanation of steps S301 to S304 in the embodiment of the present invention, reference may be made to the explanation of steps S201 to S204 in the embodiment of fig. 2a, and repeated descriptions are omitted.
In the embodiment of the invention, the database management server can calculate, for each field, the query performance gained by querying on that field and the cost of establishing a routing field on it, and, weighing performance against cost, establish as many routing fields as possible while keeping the cost low, that is, optimize single-table queries as far as possible. It then determines the attribute information of each routing field according to the database partitioning rule and establishes a mapping relationship between the routing field and its attribute information, so that during a data query the field to be queried can be located quickly according to the mapping relationship, improving query efficiency.
Based on the above description, an embodiment of the present invention provides another data processing method applied to a database management server. As shown in the flowchart of fig. 4a, the method may include:
S401, acquiring at least one association relationship of the first data table.
S402, if a first conflicting association relationship exists in the at least one association relationship, generating at least one first candidate scheme for resolving it.
S403, determining a first target candidate scheme from the at least one first candidate scheme, and adjusting the at least one association relationship with it to obtain the adjusted at least one association relationship.
S404, partitioning the first data table and the at least two second data tables, or only the first data table, according to the adjusted at least one association relationship and the database partitioning rule.
S405, receiving a request to establish an association relationship between a third data table and the first data table based on a third field, where the third data table is a parent table of the first data table.
In the embodiment of the invention, when an association relationship needs to be added, that is, a connection between two data tables needs to be added, the database management server can judge, after adding it, whether a new conflicting association relationship has been introduced; if so, the conflict must be resolved and the data tables re-partitioned. The database management server may receive a request to establish an association relationship between the third data table and the first data table based on the third field.
Note that the third data table already exists in the database management server before step S405 is executed and has no association with the first data table.
The request to establish an association relationship between the third data table and the first data table based on the third field may be sent by a user, or sent automatically by the database management server according to the user's query records. That is, the database management server may support dynamic elastic scaling, updating the database partitioning scheme as the user's query patterns change so as to maximize resource utilization.
It should be noted that the database management server may also receive an instruction to add an association relationship that is a self-association of the third data table. Because a self-association relationship is a conflicting association relationship, the database management server may resolve it with a candidate scheme based on the global secondary index, so that the third data table can be partitioned based on its association relationships.
S406, establishing an association relation between the third data table and the first data table based on the third field.
S407, based on the adjusted at least one association relationship, updating the association relationship among the third data table, the first data table, and a third target data table to obtain a first local association relationship, where the third target data table is a data table associated with the first data table in the at least two second data tables.
In the embodiment of the present invention, when an instruction to add an association relationship is received, the database management server may update the association relationships of the third data table, the first data table, and the third target data table based on the adjusted at least one association relationship to obtain a first local association relationship. In other words, it updates the relationships between the data tables locally and then determines, from the locally updated association relationship, whether the data tables need to be re-partitioned.
For example, suppose the adjusted association relationship diagram is as shown in fig. 4b: the first data table is table 1, the second secondary table is associated with table 3 based on field 2, and table 1 is associated with table 2 based on field 1. If a request is received to establish an association relationship between a third data table, table 4, and the first data table based on a third field, field 3, the database management server may establish the association between table 4 and table 1 based on field 3 and update the association relationships among the third data table, the first data table, and the third target data table based on the adjusted at least one association relationship, obtaining the first local association relationship diagram shown in fig. 4c.
As an alternative embodiment, after step S407 the database management server may further perform the following steps: if a second conflicting association relationship exists in the first local association relationship, generating at least one second candidate scheme for resolving it, where each second candidate scheme is based on a global secondary index; determining a second target candidate scheme from the at least one second candidate scheme; calculating a fourth cost of resolving the second conflicting association relationship with the second target candidate scheme; and, if the fourth cost is less than a cost threshold, adjusting the first local association relationship with the second target candidate scheme to obtain the adjusted first local association relationship, and re-partitioning the first data table, the third target data table, and the third data table according to the adjusted first local association relationship and the database partitioning rule.
The fourth cost may be the query time cost or the query error cost caused by resolving the second conflicting association relationship with the second target candidate scheme.
For example, in the first local association relationship diagram of fig. 4c, table 1 is associated with table 2 based on field 1 and with table 4 based on field 3, that is, the first data table is associated with the third data table and the third target data table based on different fields. It is therefore determined that a second conflicting association relationship exists in the first local association relationship, and at least one second candidate scheme for resolving it is generated, including a candidate scheme based on a global secondary index of field 1 and one based on a global secondary index of field 3. A second target candidate scheme is selected from these. If the selected scheme is the one based on the global secondary index of field 3, the database management server calculates the fourth cost of resolving the second conflicting association relationship with it and, if the fourth cost is less than the cost threshold, adjusts the first local association relationship with that scheme to obtain the adjusted first local association relationship shown in fig. 4d. The database management server can then partition table 1, table 2, table 4, and the third secondary table according to the adjusted first local association relationship and the database partitioning rule.
The third secondary table includes at least one field taken from table 1, including field 3.
In the embodiment of the present invention, if no second conflicting association relationship exists in the first local association relationship, the database management server may store a diagram representing the first local association relationship to assist in querying the data tables. If a second conflicting association relationship does exist, the database management server can resolve it and re-partition the first data table and the at least two second data tables, which improves partitioning efficiency.
If a second conflicting association relationship exists in the first local association relationship, the database management server may generate at least one second candidate scheme for resolving it, determine a second target candidate scheme, and calculate the fourth cost of resolving the conflict with that scheme. If the fourth cost is less than the cost threshold, the server determines that updating the association relationships locally is relatively cheap, adjusts the first local association relationship with the second target candidate scheme, and re-partitions the first data table, the third target data table, and the third data table according to the adjusted first local association relationship and the database partitioning rule. Thus, when a request to add an association relationship is received, the association relationships of the data tables are updated locally, which improves the efficiency of resolving, with a global-secondary-index-based scheme, any conflict present in the locally updated association relationship.
As an optional implementation manner, if the first data table, the third data table, and the third target data table are associated based on different fields, it is determined that a second conflicting association relationship exists in the first local association relationship.
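The conflict test described above (a table associated with other tables based on different fields), together with the earlier note that a self-association is itself a conflict, can be sketched as a scan over association triples. The stated reason in the docstring, that a single partition field cannot co-locate a table with partners joined on different fields, is an interpretation offered for illustration rather than a quotation of the source.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Assoc = Tuple[str, str, str]  # (table A, table B, field)


def find_conflicts(associations: Iterable[Assoc]) -> Set[str]:
    """Return the tables involved in a conflicting association relationship.

    A table conflicts when it is self-associated, or when it is associated
    with other tables based on more than one distinct field, since a single
    partition field cannot co-locate it with all of them.
    """
    fields_of: Dict[str, Set[str]] = defaultdict(set)
    conflicts: Set[str] = set()
    for a, b, field in associations:
        if a == b:
            conflicts.add(a)
            continue
        fields_of[a].add(field)
        fields_of[b].add(field)
    conflicts.update(t for t, fields in fields_of.items() if len(fields) > 1)
    return conflicts


fig_4b = [("table 1", "table 2", "field 1"), ("secondary table", "table 3", "field 2")]
fig_4c = [("table 4", "table 1", "field 3"), ("table 1", "table 2", "field 1")]
print(find_conflicts(fig_4b))
print(find_conflicts(fig_4c))
```

On the fig. 4b relationship nothing conflicts, and adding the field-3 association from table 4 makes table 1 the conflicting table, which matches the fig. 4c walkthrough.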
As an optional implementation manner, if the four costs are greater than or equal to the cost threshold, updating the at least one association relationship by using the association relationship between the third data table and the first data table based on the third field to obtain a global association relationship; if a third conflict incidence relation exists in the global incidence relation, generating at least one third candidate scheme for solving the third conflict incidence relation, wherein the at least one third candidate scheme is a scheme based on a global secondary index; and determining a third target scheme to be selected from the at least one third scheme to be selected, adjusting the global incidence relation by adopting the third target scheme to be selected to obtain the adjusted global incidence relation, and reseeding the first data table, the third data table and the at least two second data tables according to the adjusted global incidence relation and the reseeding rule.
For example, if the fourth cost is greater than or equal to the cost threshold, the at least one association relationship is updated based on the association relationship between the third data table and the first data table, yielding the global association relationship shown in fig. 4e. Because table 1 is associated with table 2, table 3 and table 4 based on field 1, field 2 and field 3 respectively, that is, the first data table is associated with the at least two second data tables and the third data table based on different fields, it is determined that a third conflicting association relationship exists in the global association relationship. At least one third candidate scheme for resolving it is then generated, including a candidate scheme based on the global secondary indexes of field 1 and field 2 and a candidate scheme based on the global secondary indexes of field 2 and field 3. A third target candidate scheme is determined from these; here it is the candidate scheme based on the global secondary indexes of field 1 and field 2. The global association relationship is adjusted with this scheme, giving the adjusted global association relationship shown in fig. 4f, and the first data table, the third data table and the at least two second data tables are re-partitioned according to the adjusted global association relationship and the database partitioning rule.
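One plausible way to enumerate the third candidate schemes in this example is sketched below. It assumes, without confirmation from the text, that each candidate keeps one conflicting field as the partitioning field and serves every other field through a global secondary index; under that assumption three candidates arise, two of which are the ones named above. The function name is hypothetical, and the sketch covers only the multi-field case, not a self-association.

```python
from typing import FrozenSet, List, Sequence


def generate_gsi_candidates(conflict_fields: Sequence[str]) -> List[FrozenSet[str]]:
    """Enumerate GSI-based candidate schemes for a multi-field conflict.

    Assumption: each candidate keeps one field as the partitioning field and
    serves the remaining fields through global secondary indexes, so a
    candidate is represented by the set of fields that get a GSI.
    """
    fields = sorted(set(conflict_fields))
    return [frozenset(f for f in fields if f != kept) for kept in fields]


for candidate in generate_gsi_candidates(["field1", "field2", "field3"]):
    print(sorted(candidate))
# ['field2', 'field3']
# ['field1', 'field3']
# ['field1', 'field2']
```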
In this embodiment of the present invention, if the fourth cost is greater than or equal to the cost threshold, the database management server may determine that updating the association relationships of the data tables by the local update method is relatively expensive, and may instead update the association relationships among all data tables by the global update method. Specifically, the server updates the at least one association relationship with the association relationship between the third data table and the first data table based on the third field to obtain a global association relationship; if a third conflicting association relationship exists in the global association relationship, it generates at least one third candidate scheme for resolving it, determines a third target candidate scheme from the at least one third candidate scheme, adjusts the global association relationship with the third target candidate scheme to obtain the adjusted global association relationship, and re-partitions the first data table, the third data table and the at least two second data tables according to the adjusted global association relationship and the database partitioning rule.
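The branch between the two update methods reduces to a single threshold comparison. The sketch below, with hypothetical names, makes the boundary explicit: as the text states, a fourth cost equal to the threshold already selects the global update method. How the fourth cost itself is computed is not specified here and is left to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdateDecision:
    method: str         # "local" or "global"
    fourth_cost: float  # cost of resolving the conflict with the local scheme


def choose_update_method(fourth_cost: float, cost_threshold: float) -> UpdateDecision:
    """Choose between the local and global update methods.

    Local update is kept only while the fourth cost stays strictly below the
    threshold; a cost equal to or above it falls back to the global method.
    """
    method = "local" if fourth_cost < cost_threshold else "global"
    return UpdateDecision(method, fourth_cost)


print(choose_update_method(3.0, 10.0).method)   # local
print(choose_update_method(10.0, 10.0).method)  # global
```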
When an instruction to add an association relationship is received and the database management server updates only the association relationships of the affected data tables by the local update method, the resulting data update and data migration costs are lower than those of updating the association relationships of all data tables by the global update method, but data query performance is lower. An appropriate update method can therefore be chosen according to requirements.
As an optional implementation manner, if it is determined that the first data table is associated with the at least two second data tables and the third data table based on different fields according to the global association relationship, it is determined that a third conflicting association relationship exists in the global association relationship.
As an optional implementation, an add instruction for a fourth data table is received. If the fourth data table is associated with the first data table based on a fourth field, an association relationship between the fourth data table and the first data table based on the fourth field is established, where the fourth data table is a parent table of the first data table, and a fourth target data table associated with the first data table is determined according to the adjusted at least one association relationship. If a fourth conflicting association relationship exists in a second local association relationship, at least one fourth candidate scheme for resolving it is generated, where the at least one fourth candidate scheme is a scheme based on a global secondary index and the second local association relationship is the association relationship among the first data table, the fourth data table and the fourth target data table. A fourth target candidate scheme is determined from the at least one fourth candidate scheme, the second local association relationship is adjusted with the fourth target candidate scheme to obtain the adjusted second local association relationship, and the first data table, the fourth data table and the fourth target data table are partitioned according to the adjusted second local association relationship and the database partitioning rule.
For example, suppose the adjusted at least one association relationship is as shown in fig. 4f and an add instruction is received for a fourth data table, table 5. According to the adjusted at least one association relationship shown in fig. 4f, the database management server may determine that the fourth target data table associated with table 1 is table 4. If table 5 is associated with the first data table (table 1) based on the fourth field (field 4), the association relationship between table 1 and table 5 based on field 4 is established, and the association relationship among table 5, table 4 and table 1 is updated to obtain the second local association relationship shown in fig. 4g. In fig. 4g, table 1 is associated with table 4 based on field 3 and with table 5 based on field 4; that is, the first data table is associated with the fourth data table and the fourth target data table based on different fields, so a fourth conflicting association relationship exists in the second local association relationship among table 1, table 4 and table 5.
The database management server may generate at least one fourth candidate scheme for resolving the fourth conflicting association relationship, including a candidate scheme based on the global secondary index of field 3 and a candidate scheme based on the global secondary index of field 4, and determine a fourth target candidate scheme from them. If the fourth target candidate scheme is the one based on the global secondary index of field 3, the server adjusts the second local association relationship with it to obtain the adjusted second local association relationship shown in fig. 4h, and partitions table 1, table 4 and table 5 according to the adjusted second local association relationship and the database partitioning rule.
The sixth secondary table in fig. 4h is obtained from table 1 and contains at least one field of table 1, including field 3.
In this embodiment of the present invention, if an instruction to add a data table (the fourth data table) is received, the database management server can update the association relationship between the new data table and the existing data tables (the first data table and the at least two second data tables) by the local update method and judge whether a conflicting association relationship exists in the locally updated association relationship. If it does, the server resolves it with a candidate scheme based on the global secondary index and partitions the new data table and the data tables associated with it.
As an optional implementation, if it is determined according to the adjusted at least one association relationship that the field on which the first data table and the fourth target data table are associated differs from the fourth field, a fourth conflicting association relationship exists in the second local association relationship.
In this embodiment of the present invention, if it is determined according to the adjusted at least one association relationship that the field on which the first data table and the fourth target data table are associated differs from the fourth field, then the first data table is associated with the fourth target data table and the fourth data table based on different fields, and it can therefore be determined that a fourth conflicting association relationship exists in the second local association relationship.
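The field comparison above, together with the lookup of the fourth target data table, can be sketched as follows. The tuple model `(child_table, parent_table, field)`, the function name and the secondary-table names in the fig. 4f example are hypothetical; the example reproduces the outcome described for table 5 being added on field 4.

```python
from typing import List, Tuple

Association = Tuple[str, str, str]  # hypothetical (child_table, parent_table, field)


def check_fourth_conflict(
    adjusted: List[Association], first_table: str, new_table: str, new_field: str
) -> Tuple[List[str], bool]:
    """Return the fourth target data tables and whether a fourth conflict exists.

    The fourth target tables are the tables still directly associated with
    `first_table` in the adjusted relationships. A conflict exists if any of
    them is associated with `first_table` on a field other than `new_field`.
    """
    links = [(parent, field) for child, parent, field in adjusted
             if child == first_table and parent not in (first_table, new_table)]
    conflict = any(field != new_field for _, field in links)
    return [parent for parent, _ in links], conflict


# Fig. 4f (secondary-table names are hypothetical): only table 4 remains
# directly associated with table 1, on field 3.
fig_4f = [
    ("table1", "table4", "field3"),
    ("table1_gsi_field1", "table2", "field1"),
    ("table1_gsi_field2", "table3", "field2"),
]
print(check_fourth_conflict(fig_4f, "table1", "table5", "field4"))
# (['table4'], True)
```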
It should be noted that if the newly added fourth data table has a self-association relationship, the database management server may establish the association relationship based on the self-association field of that data table. Because a self-association relationship is a conflicting association relationship, the server may resolve it with the candidate scheme based on the global secondary index, so that the fourth data table can be partitioned based on the association relationship; the specific implementation process is shown in fig. 2a.
It should be noted that, for the explanation of steps S401 to S404 in the embodiment of the present invention, reference may be made to the explanation of steps S201 to S204 in the embodiment of fig. 2a, and repeated descriptions are omitted.
In this embodiment of the present invention, if an instruction to add an association relationship (that is, a new join relationship between data tables) is received, the database management server may adjust the association relationships of only part of the data tables (those involved in the new join) by the local update method. If a conflicting association relationship exists in the adjusted local association relationship, it is resolved with a candidate scheme based on the global secondary index, and those data tables are partitioned based on the association relationship. When the cost of the local update method is greater than or equal to the cost threshold, the association relationships among all data tables can instead be adjusted by the global update method; if a conflicting association relationship exists in the globally updated association relationship, it can be resolved with a candidate scheme based on the global secondary index, and all data tables are re-partitioned based on the association relationship. In this way, the partitioning of each data table is updated automatically whenever an instruction to add an association relationship is received.
Based on the above description of the data processing method, the present invention provides a data processing apparatus applied to a database management server. Referring to fig. 5, the data processing apparatus may include an obtaining module 501, a generating module 502, a determining module 503, an adjusting module 504, a database partitioning module 505, a first establishing module 506, a second establishing module 507, a first receiving module 508, a parsing module 509, a judging module 510, a calculating module 511, an output module 512, a second receiving module 513, a third establishing module 514, an updating module 515, a third receiving module 516 and a fourth establishing module 517, where:
an obtaining module 501, configured to obtain at least one association relationship of a first data table, where the at least one association relationship includes an association relationship between the first data table and at least two second data tables or includes a self-association relationship of the first data table, where the second data tables are parent tables of the first data table, and one association relationship corresponds to one field.
A generating module 502, configured to generate at least one first candidate scheme for solving a first conflicting association relationship if the first conflicting association relationship exists in the at least one association relationship, where the at least one first candidate scheme is a candidate scheme based on a global secondary index.
A determining module 503, configured to determine a first target candidate scheme from the at least one first candidate scheme.
An adjusting module 504, configured to adjust the at least one association relationship by using the first target candidate scheme, so as to obtain the adjusted at least one association relationship.
A database partitioning module 505, configured to partition the first data table and the at least two second data tables according to the adjusted at least one association relationship and the database partitioning rule, or, where the at least one association relationship is a self-association relationship, to partition the first data table.
Optionally, the determining module 503 is further configured to determine that a first conflicting association relationship exists in the at least one association relationship if the first data table has a self-association relationship, or if the first data table is associated with the at least two second data tables based on different fields.
Optionally, the generating module 502 is specifically configured to generate a candidate scheme of a global secondary index based on a self-association field if the first data table has a self-association relationship; and if the first data table is associated with the at least two second data tables based on different fields, generating at least two candidate schemes based on the global secondary index, wherein the at least two candidate schemes are based on different fields.
Optionally, the first target candidate scheme is a candidate scheme based on the global secondary index of the self-association field. The adjusting module 504 adjusts the at least one association relationship of the first data table with the first target candidate scheme to obtain the adjusted at least one association relationship as follows: releasing the self-association relationship of the first data table; obtaining a first secondary table from the first data table, where the first secondary table includes at least one field of the first data table and the at least one field includes the self-association field; and establishing an association relationship between the first secondary table and the first data table based on the self-association field, thereby adjusting the at least one association relationship to obtain the adjusted at least one association relationship.
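The three adjustment steps for a self-association can be sketched on the same hypothetical `(child_table, parent_table, field)` tuple model. The secondary-table naming scheme and the choice to copy the primary key into the secondary table are assumptions made for illustration; the text only requires that the secondary table contain the self-association field.

```python
from typing import List, Tuple

Association = Tuple[str, str, str]  # hypothetical (child_table, parent_table, field)


def resolve_self_association(
    table: str, self_field: str, primary_key: str, associations: List[Association]
) -> Tuple[List[Association], Tuple[str, List[str]]]:
    """Apply a candidate based on a GSI of the self-association field.

    Returns the adjusted associations and (secondary table name, columns).
    Copying the primary key into the secondary table, and the naming scheme,
    are assumptions made for illustration.
    """
    # Step 1: release the self-association of `table`.
    adjusted = [a for a in associations if not (a[0] == table and a[1] == table)]
    # Step 2: derive the first secondary table, which carries the self-association field.
    secondary = f"{table}_gsi_{self_field}"
    columns = [self_field] if self_field == primary_key else [self_field, primary_key]
    # Step 3: associate the secondary table with `table` on the self-association field.
    adjusted.append((secondary, table, self_field))
    return adjusted, (secondary, columns)


adjusted, secondary = resolve_self_association(
    "employee", "manager_id", "emp_id", [("employee", "employee", "manager_id")]
)
print(adjusted)   # [('employee_gsi_manager_id', 'employee', 'manager_id')]
print(secondary)  # ('employee_gsi_manager_id', ['manager_id', 'emp_id'])
```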
Optionally, the first target candidate scheme is a candidate scheme based on the global secondary index of a first field. The adjusting module 504 adjusts the at least one association relationship of the first data table with the first target candidate scheme to obtain the adjusted at least one association relationship as follows: determining, according to the at least one association relationship, at least one data table associated with the first data table based on the first field, where the determined at least one data table belongs to the at least two second data tables and the first field is any one of the fields corresponding to the at least one association relationship; disassociating the first data table from the determined at least one data table on the first field; obtaining at least one second secondary table from the first data table, where the second secondary table includes at least one field of the first data table and the at least one field includes the first field; and establishing an association relationship between the at least one second secondary table and the determined at least one data table based on the first field, thereby adjusting the at least one association relationship to obtain the adjusted at least one association relationship.
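The same tuple model can illustrate the adjustment for a global secondary index of an ordinary field, applied twice to turn fig. 4e into fig. 4f. Using one shared secondary table per field, its name, and the copied primary key column are assumptions; the text allows "at least one" second secondary table.

```python
from typing import Dict, List, Tuple

Association = Tuple[str, str, str]  # hypothetical (child_table, parent_table, field)


def resolve_on_field(
    first_table: str, first_field: str, primary_key: str, associations: List[Association]
) -> Tuple[List[Association], Dict[str, List[str]]]:
    """Apply a candidate based on a GSI of `first_field`.

    Parent tables joined to `first_table` on `first_field` are detached from it
    and re-attached to one secondary table derived from `first_table`.
    Using a single shared secondary table is an assumption.
    """
    parents = [p for c, p, f in associations
               if c == first_table and f == first_field and p != first_table]
    if not parents:
        return list(associations), {}
    secondary = f"{first_table}_gsi_{first_field}"
    adjusted = [a for a in associations
                if not (a[0] == first_table and a[2] == first_field and a[1] in parents)]
    adjusted += [(secondary, p, first_field) for p in parents]
    columns = [first_field] if first_field == primary_key else [first_field, primary_key]
    return adjusted, {secondary: columns}


# Fig. 4e -> fig. 4f: apply the GSIs of field 1 and field 2 in turn.
assoc = [("table1", "table2", "field1"), ("table1", "table3", "field2"),
         ("table1", "table4", "field3")]
assoc, _ = resolve_on_field("table1", "field1", "id", assoc)
assoc, _ = resolve_on_field("table1", "field2", "id", assoc)
print([a for a in assoc if a[0] == "table1"])  # [('table1', 'table4', 'field3')]
```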
Optionally, the database partitioning module 505 is specifically configured to: determine a first partitioning field according to the adjusted at least one association relationship; divide the first data table and the first secondary table according to the first partitioning field and the database partitioning rule to obtain a plurality of first sub-data tables; and allocate the plurality of first sub-data tables to at least one data unit.
Optionally, the database partitioning module 505 is specifically configured to: determine at least one second partitioning field according to the adjusted at least one association relationship; divide the first data table, the at least two second data tables and the at least one second secondary table according to the at least one second partitioning field and the database partitioning rule to obtain a plurality of second sub-data tables; and allocate the plurality of second sub-data tables to at least one data unit.
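The database partitioning rule itself is not specified in this section. The sketch below assumes a hash-modulo rule purely for illustration: rows are split into sub-data tables by hashing the partitioning field, and shard i of every table is placed on the same data unit, so associated tables split on the same field stay co-located. Python's built-in `hash()` on strings is salted per process, so `zlib.crc32` is used to keep the assignment stable across runs. All function names are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, object]


def shard_index(value: object, num_shards: int) -> int:
    """Stable hash-modulo rule (CRC32), assumed in place of the partitioning rule."""
    return zlib.crc32(str(value).encode("utf-8")) % num_shards


def partition_table(rows: List[Row], shard_field: str, num_shards: int) -> Dict[int, List[Row]]:
    """Split a data table into sub-data tables keyed by shard index."""
    shards: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        shards[shard_index(row[shard_field], num_shards)].append(row)
    return dict(shards)


def data_unit_for(shard: int, data_units: List[str]) -> str:
    """Shard i of every table goes to the same data unit, so tables split on the
    same partitioning field can be joined on that field without moving data."""
    return data_units[shard % len(data_units)]


orders = [{"order_id": 1, "customer_id": 7}, {"order_id": 2, "customer_id": 9}]
customers = [{"customer_id": 7, "name": "a"}, {"customer_id": 9, "name": "b"}]
units = ["unit0", "unit1"]
for table, rows in (("orders", orders), ("customers", customers)):
    for shard, part in sorted(partition_table(rows, "customer_id", 4).items()):
        print(table, shard, data_unit_for(shard, units), [r["customer_id"] for r in part])
```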
Optionally, the determining module 503 is further configured to determine at least one first routing field of the first data table according to a historical query log, and to determine attribute information of the at least one first routing field according to the database partitioning rule, where the attribute information of a first routing field includes at least one of: the data table to which the first routing field belongs, the sub-data table to which it belongs, or the data unit in which that sub-data table is located.
Optionally, the first establishing module 506 is configured to establish a mapping relationship between the at least one first routing field and the attribute information of the at least one first routing field.
Optionally, the determining module 503 is further configured to determine at least one second routing field of the first data table and the at least two second data tables according to a historical query log, and to determine attribute information of the at least one second routing field according to the database partitioning rule, where the attribute information of a second routing field includes at least one of: the data table to which the second routing field belongs, the sub-data table to which it belongs, or the data unit in which that sub-data table is located.
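The mapping between routing fields and their attribute information (data table, sub-data table, data unit) might be held as sketched below. Keying the map by `(data table, field)` and the type and function names are assumptions; a real mapping would presumably also record which hash buckets or value ranges fall into each sub-data table, which is omitted here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class RoutingAttribute:
    data_table: str      # data table the routing field belongs to
    sub_data_table: str  # sub-data table holding the field
    data_unit: str       # data unit where that sub-data table is located


def build_routing_map(
    routing_fields: Dict[str, List[str]],  # data table -> its routing fields
    sub_tables: Dict[str, List[str]],      # data table -> its sub-data tables
    placement: Dict[str, str],             # sub-data table -> data unit
) -> Dict[Tuple[str, str], List[RoutingAttribute]]:
    """Map each (data table, routing field) pair to its attribute information."""
    mapping: Dict[Tuple[str, str], List[RoutingAttribute]] = {}
    for table, fields in routing_fields.items():
        for field in fields:
            mapping[(table, field)] = [
                RoutingAttribute(table, sub, placement[sub]) for sub in sub_tables.get(table, [])
            ]
    return mapping


routing = build_routing_map(
    {"orders": ["customer_id"]},
    {"orders": ["orders_0", "orders_1"]},
    {"orders_0": "unit0", "orders_1": "unit1"},
)
for attr in routing[("orders", "customer_id")]:
    print(attr)
```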
Optionally, the second establishing module 507 is configured to establish a mapping relationship between the at least one second routing field and the attribute information of the at least one second routing field.
Optionally, the determining module 503 is specifically configured to: determine, according to the historical query log, how frequently each field of each of the first data table and the at least two second data tables is queried; determine, according to that query frequency, the query performance obtained by querying based on each field; calculate a second cost of establishing a routing field based on each field; and, if a second field satisfies the rule for establishing a routing field given its query performance and second cost, determine the second field as a second routing field, where the second field is any field of the first data table or the at least two second data tables.
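The text leaves both the query-performance model and the routing-field rule open. The sketch below assumes the simplest choices: query performance grows linearly with query frequency (a fixed saving per query), and a field becomes a routing field when that benefit exceeds its second cost. Function and parameter names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

FieldKey = Tuple[str, str]  # (data table, field)


def select_routing_fields(
    query_log: Iterable[FieldKey],
    saving_per_query: float,
    second_cost: Dict[FieldKey, float],
) -> List[FieldKey]:
    """Pick fields whose estimated query benefit exceeds the cost of routing on them.

    Both the linear benefit model (frequency x saving) and the rule
    "benefit > second cost" are assumptions standing in for the unspecified rule.
    """
    frequency = Counter(query_log)
    selected = []
    for key, count in frequency.items():
        query_performance = count * saving_per_query
        if query_performance > second_cost.get(key, float("inf")):
            selected.append(key)
    return sorted(selected)


log = [("orders", "customer_id")] * 5 + [("orders", "status")]
costs = {("orders", "customer_id"): 3.0, ("orders", "status"): 3.0}
print(select_routing_fields(log, 1.0, costs))  # [('orders', 'customer_id')]
```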
Optionally, the determining module 503 is specifically configured to calculate a first cost of resolving the first conflicting association relationship with each first candidate scheme in the at least one first candidate scheme, and to determine the candidate scheme with the lowest first cost as the first target candidate scheme.
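Selecting the target candidate scheme is a minimum-cost choice. In the sketch below the cost function is supplied by the caller because the text does not define how the first cost is computed; the per-field index costs in the example are made-up numbers, chosen so that the selection matches the candidate picked in the fig. 4f example. The function name is hypothetical.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def pick_target_candidate(candidates: Sequence[T], cost_of: Callable[[T], float]) -> Tuple[T, float]:
    """Return the candidate scheme with the lowest cost, together with that cost."""
    if not candidates:
        raise ValueError("no candidate scheme to choose from")
    best = min(candidates, key=cost_of)
    return best, cost_of(best)


# Hypothetical per-field costs of building a global secondary index.
gsi_cost = {"field1": 2.0, "field2": 1.0, "field3": 5.0}
candidates = [frozenset({"field2", "field3"}), frozenset({"field1", "field3"}),
              frozenset({"field1", "field2"})]
best, cost = pick_target_candidate(candidates, lambda c: sum(gsi_cost[f] for f in c))
print(sorted(best), cost)  # ['field1', 'field2'] 3.0
```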
Optionally, the first receiving module 508 is configured to receive the query request.
Optionally, the parsing module 509 is configured to parse the query request to obtain a query data table or a query field.
Optionally, the judging module 510 is configured to judge, if the query data table is a first target data table, whether the query field matches a routing field of the first target data table, where the first target data table is any one of the first data table and the at least two second data tables.
Optionally, the obtaining module 501 is further configured to obtain at least one updated field if the query field does not match, where each updated field includes the query field and a target routing field, the target routing field is any one of the routing fields of the first target data table, and the updated fields all differ from one another.
Optionally, the calculating module 511 is configured to calculate a query cost generated by performing a query based on each updated field of the at least one updated field.
Optionally, the output module 512 is configured to output the at least one updated field and the query cost of each updated field, so that a target updated field can be selected for the query based on those query costs, where the target updated field is one of the at least one updated field.
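The updated-field suggestion flow (pair the unmatched query field with each routing field, cost each pair, output them for selection) can be sketched as below. The cost model, rows read from the single targeted sub-data table under a uniform value distribution, is an assumption; the text does not say how the query cost is computed. Names are hypothetical.

```python
from typing import Dict, List, Tuple


def suggest_updated_fields(
    query_field: str,
    routing_fields: List[str],
    total_rows: int,
    distinct_values: Dict[str, int],
) -> List[Tuple[Tuple[str, str], float]]:
    """List (query field, routing field) combinations with an estimated query cost.

    Assumed cost model: with an equality condition on a routing field, the query
    is sent to one sub-data table and reads about total_rows / distinct values of
    that routing field, assuming values are uniformly distributed. Results are
    sorted from cheapest to most expensive.
    """
    if query_field in routing_fields:
        return []  # the query field already matches a routing field
    suggestions = [
        ((query_field, rf), total_rows / max(distinct_values.get(rf, 1), 1))
        for rf in routing_fields
    ]
    return sorted(suggestions, key=lambda s: s[1])


for fields, cost in suggest_updated_fields(
    "status", ["region", "customer_id"], 100_000, {"customer_id": 1_000, "region": 4}
):
    print(fields, cost)
# ('status', 'customer_id') 100.0
# ('status', 'region') 25000.0
```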
Optionally, the judging module 510 is further configured to judge, if the query data tables are the first data table and a second target data table, whether the first data table and the second target data table are associated based on the query field according to the adjusted at least one association relationship, where the second target data table is any one of the at least two second data tables and the at least one second secondary table.
Optionally, the determining module 503 is further configured to determine, if the first data table and the second target data table are not associated based on the query field, at least one query scheme for querying the query field according to the database partitioning rule.
Optionally, the calculating module 511 is further configured to calculate a third cost of each query scheme in the at least one query scheme.
Optionally, the output module 512 is further configured to output the at least one query scheme and its third cost, so that a query scheme can be selected from the at least one query scheme based on the third cost and used for the query.
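When two tables are not associated on the query field, their rows for a given value may sit on different data units, so each query scheme must move data. The sketch below ranks two common distributed-join strategies by an estimated third cost measured in rows shipped between data units. Both the choice of schemes and the cost formulas are illustrative assumptions, not schemes named in the text.

```python
from typing import List, Tuple


def rank_query_schemes(rows_a: int, rows_b: int, num_units: int) -> List[Tuple[str, int]]:
    """Estimate a third cost (rows shipped between data units) for two join schemes.

    Both schemes and the cost formulas are illustrative assumptions:
    - broadcast: copy the smaller table to every other data unit;
    - redistribute: re-hash both tables on the query field, moving roughly
      every row whose new shard lives on a different data unit.
    """
    broadcast = min(rows_a, rows_b) * (num_units - 1)
    redistribute = (rows_a + rows_b) * (num_units - 1) // num_units
    schemes = [("broadcast smaller table", broadcast),
               ("redistribute both tables on the query field", redistribute)]
    return sorted(schemes, key=lambda s: s[1])


for name, cost in rank_query_schemes(1_000, 1_000_000, 4):
    print(name, cost)
# broadcast smaller table 3000
# redistribute both tables on the query field 750750
```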
Optionally, the second receiving module 513 is configured to receive a request to establish an association relationship between a third data table and the first data table based on a third field, where the third data table is a parent table of the first data table.
Optionally, the third establishing module 514 is configured to establish an association relationship between the third data table and the first data table based on the third field.
Optionally, the updating module 515 is configured to update, based on the adjusted at least one association relationship, the association relationship among the third data table, the first data table, and a third target data table to obtain a first local association relationship, where the third target data table is a data table associated with the first data table in the at least two second data tables.
Optionally, the generating module 502 is further configured to generate at least one second candidate scheme for resolving the second conflicting association relationship if the second conflicting association relationship exists in the first local association relationship, where the at least one second candidate scheme is a scheme based on a global secondary index.
Optionally, the determining module 503 is further configured to determine a second target candidate scheme from the at least one second candidate scheme.
Optionally, the calculating module 511 is further configured to calculate a fourth cost for solving the second conflict association relationship by using the second target candidate scheme.
Optionally, the adjusting module 504 is further configured to adjust the first local association relationship by using the second target candidate scheme if the fourth cost is smaller than a cost threshold, so as to obtain the adjusted first local association relationship.
Optionally, the database partitioning module 505 is further configured to perform database partitioning again on the first data table, the third data table, and the third target data table according to the adjusted first local association relationship and the database partitioning rule.
Optionally, the determining module 503 is configured to determine that a second conflict association relationship exists in the first local association relationship if the first data table, the third data table, and the third target data table are associated based on different fields.
Optionally, the updating module 515 is configured to update, if the fourth cost is greater than or equal to the cost threshold, the at least one association relationship with the association relationship between the third data table and the first data table based on the third field, to obtain a global association relationship.
Optionally, the generating module 502 is further configured to generate at least one third candidate scheme for solving a third conflicting association relation if the third conflicting association relation exists in the global association relation, where the at least one third candidate scheme is a scheme based on a global secondary index.
Optionally, the determining module 503 is further configured to determine a third target candidate scheme from the at least one third candidate scheme.
Optionally, the adjusting module 504 is further configured to adjust the global association relationship by using the third target candidate scheme, so as to obtain the adjusted global association relationship.
Optionally, the database partitioning module 505 is further configured to perform database partitioning again on the first data table, the third data table, and the at least two second data tables according to the adjusted global association relationship and the database partitioning rule.
Optionally, the determining module 503 is further configured to determine that a third conflicting association relationship exists in the global association relationship if it is determined, according to the global association relationship, that the first data table is associated with the at least two second data tables and the third data table based on different fields.
Optionally, the third receiving module 516 is configured to receive an add instruction for the fourth data table.
Optionally, the fourth establishing module 517 is configured to, if the fourth data table is associated with the first data table based on a fourth field, establish an association relationship between the fourth data table and the first data table based on the fourth field, where the fourth data table is a parent table of the first data table.
Optionally, the determining module 503 is further configured to determine a fourth target data table associated with the first data table according to the adjusted at least one association relationship.
Optionally, the generating module 502 is further configured to generate at least one fourth candidate scheme for solving a fourth conflicting association relationship if the second local association relationship has a fourth conflicting association relationship, where the at least one fourth candidate scheme is a scheme based on a global secondary index, and the second local association relationship is an association relationship among the first data table, the fourth data table, and the fourth target data table.
Optionally, the determining module 503 is further configured to determine a fourth target candidate scheme from the at least one fourth candidate scheme.
Optionally, the adjusting module 504 is further configured to adjust the second local association relationship by using the fourth target candidate scheme, so as to obtain the adjusted second local association relationship.
Optionally, the database partitioning module 505 is further configured to perform database partitioning on the first data table, the fourth data table, and the fourth target data table according to the adjusted second local association relationship and the database partitioning rule.
Optionally, the determining module 503 is configured to determine that the fourth conflicting association relationship exists in the second local association relationship if, according to the adjusted at least one association relationship, the field on which the first data table and the fourth target data table are associated differs from the fourth field.
In this embodiment of the present invention, the data processing apparatus implements the steps performed by the database management server in the data processing method of the embodiments corresponding to fig. 2a, fig. 3 and fig. 4a. These functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions described above, and the modules may be software and/or hardware.
Based on the same inventive concept, the principle by which the data processing apparatus solves the problem, and its beneficial effects, are the same as those of the data processing method embodiments described with reference to fig. 2a, fig. 3 and fig. 4a. The implementation of the data processing apparatus can therefore refer to those method embodiments, and repeated details are omitted.
Based on the above description of the data processing apparatus, the present invention provides a database management server. Referring to fig. 6, the database management server may include a processor 601, a memory 602, a communication interface 603 and a power supply 604, which are connected to one another through a bus.
The processor 601 may be one or more Central Processing Units (CPUs), and in the case that the processor 601 is one CPU, the CPU may be a single-core CPU or a multi-core CPU.
The memory 602 includes, but is not limited to, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM) or a compact disc read-only memory (CD-ROM), and is used for storing instructions and data, such as an identifier of a gateway connected to the control device.
The communication interface 603 is connected to other network devices. For example, the communication interface 603 includes a plurality of interfaces respectively connected to a plurality of gateways. The communication interface 603 may be a wired interface, a wireless interface, or a combination thereof. The wired interface may be, for example, an Ethernet interface, which may be an optical interface, an electrical interface, or a combination thereof. The wireless interface may be, for example, a wireless local area network (WLAN) interface, a cellular network interface, or a combination thereof. The communication interface 603 is used for receiving or sending data under the control of the controller, such as receiving a query request sent by a client or sending a data table.
The power supply 604 supplies power to the database management server.
The memory 602 is also used to store program instructions. The processor 601 may call the program instructions stored in the memory 602 to implement the data processing method according to the embodiments of the present application.
Optionally, the processor 601 in the embodiment of the present invention may implement functions of the obtaining module 501, the generating module 502, the determining module 503, the adjusting module 504, the library dividing module 505, the first establishing module 506, the second establishing module 507, the analyzing module 509, the judging module 510, the calculating module 511, the third establishing module 514, the updating module 515, and the fourth establishing module 517 in fig. 5, and the communication interface 603 may implement functions of the first receiving module 508, the second receiving module 513, the third receiving module 516, and the output module 512 in fig. 5, which is not limited in the embodiment of the present invention.
Based on the same inventive concept, the principle by which the database management server provided in the embodiments of the present invention solves the problem is similar to that of the method embodiments of the present invention. Therefore, for the implementation and beneficial effects of the database management server, refer to the above method embodiments; details are not repeated here for brevity.
The present invention further provides a computer-readable storage medium storing a computer program. For the implementation and beneficial effects of the program, refer to the implementation and beneficial effects of the data processing method shown in fig. 2a, fig. 3, and fig. 4a; details are not repeated here.
The present invention further provides a computer program product. The computer program product includes a non-volatile computer-readable storage medium storing a computer program, and when executed, the computer program causes a computer to perform the steps of the data processing method in the embodiments corresponding to fig. 2a, fig. 3, and fig. 4a. For the implementation and beneficial effects of the computer program product, refer to the implementation and beneficial effects of the data processing method in fig. 2a, fig. 3, and fig. 4a; details are not repeated here.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by instructing relevant hardware by a computer program, and the program may be stored in a computer-readable storage medium, and when executed, may include the processes of the embodiments of the methods described above.

Claims (42)

1. A data processing method, comprising:
acquiring at least one association relationship of a first data table, wherein the at least one association relationship comprises an association relationship between the first data table and at least two second data tables or comprises a self-association relationship of the first data table, the second data tables are parent tables of the first data table, and one association relationship corresponds to one field;
if a first conflicting association relationship exists in the at least one association relationship, generating at least one first candidate scheme for resolving the first conflicting association relationship, wherein the at least one first candidate scheme is a candidate scheme based on a global secondary index;
determining a first target candidate scheme from the at least one first candidate scheme, and adjusting the at least one association relationship by using the first target candidate scheme to obtain the adjusted at least one association relationship;
and performing database partitioning on the first data table and the at least two second data tables according to the adjusted at least one association relationship and a database partitioning rule, or performing database partitioning on the first data table.
2. The method of claim 1, wherein before the generating at least one first candidate scheme for resolving the first conflicting association relationship if the first conflicting association relationship exists in the at least one association relationship, the method further comprises:
if the first data table has a self-association relationship, determining that a first conflicting association relationship exists;
and if the first data table is associated with the at least two second data tables based on different fields, determining that a first conflicting association relationship exists in the at least one association relationship.
3. The method of claim 2, wherein the generating at least one first candidate scheme for resolving the first conflicting association relationship comprises:
if the first data table has a self-association relationship, generating a candidate scheme based on a global secondary index on the self-association field;
and if the first data table is associated with the at least two second data tables based on different fields, generating at least two candidate schemes based on the global secondary index, wherein the at least two candidate schemes are based on different fields.
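The conflict test of claim 2 and the candidate generation of claim 3 can be illustrated with a short Python sketch. It assumes each association is a (parent table, child table, field) triple, with a self-association modeled as a parent equal to the child; the class and function names and the sample schema are hypothetical, not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    parent: str  # parent table; equal to `child` for a self-association
    child: str   # the table being partitioned (the "first data table")
    field: str   # the single field this association is based on


def find_conflict(child, associations):
    """Return 'self', 'multi-field', or None for the given child table."""
    rels = [a for a in associations if a.child == child]
    if any(a.parent == child for a in rels):
        return "self"
    parent_fields = {a.field for a in rels if a.parent != child}
    # Two or more parents on *different* fields cannot all be co-located.
    return "multi-field" if len(parent_fields) >= 2 else None


def candidate_schemes(child, associations):
    """One global-secondary-index (GSI) candidate per conflicting field."""
    kind = find_conflict(child, associations)
    rels = [a for a in associations if a.child == child]
    if kind == "self":
        fields = {a.field for a in rels if a.parent == child}
    elif kind == "multi-field":
        fields = {a.field for a in rels if a.parent != child}
    else:
        return []
    return [("gsi", f) for f in sorted(fields)]


# Hypothetical schema: orders has two parents on different fields,
# employees references itself through manager_id.
assocs = [
    Association("customers", "orders", "customer_id"),
    Association("products", "orders", "product_id"),
    Association("employees", "employees", "manager_id"),
]
print(find_conflict("orders", assocs))         # multi-field
print(candidate_schemes("orders", assocs))     # [('gsi', 'customer_id'), ('gsi', 'product_id')]
print(candidate_schemes("employees", assocs))  # [('gsi', 'manager_id')]
```

Two parents associated on the same field produce no conflict in this sketch, matching the "based on different fields" condition of claim 2.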
4. The method of claim 3, wherein the first target candidate is a candidate based on a global secondary index of the self-associated field;
the adjusting at least one association relationship of the first data table by using the first target candidate scheme to obtain the adjusted at least one association relationship includes:
releasing the self-association relation of the first data table;
obtaining a first secondary table from the first data table, the first secondary table comprising at least one field in the first data table, the at least one field comprising the self-associated field;
and establishing an association relationship between the first secondary table and the first data table based on the self-association field to adjust the at least one association relationship, so as to obtain the adjusted at least one association relationship.
5. The method of claim 3, wherein the first target candidate is a candidate based on a global secondary index of a first field;
the adjusting at least one association relationship of the first data table by using the first target candidate scheme to obtain the adjusted at least one association relationship includes:
determining at least one data table associated with the first data table based on the first field according to the at least one association relationship, wherein the determined at least one data table is one of the at least two second data tables, and the first field is any one of fields corresponding to the at least one association relationship;
disassociating the first data table from the determined at least one data table based on the first field;
obtaining at least one second secondary table from the first data table, wherein the second secondary table comprises at least one field in the first data table, and the at least one field comprises the first field;
and establishing an association relationship between the at least one second secondary table and the determined at least one data table based on the first field to adjust the at least one association relationship, thereby obtaining the adjusted at least one association relationship.
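As an illustration of the adjustment steps in claims 4 and 5, the following Python sketch releases the associations based on the chosen field and re-attaches them to a secondary table. The naming scheme `<table>__gsi_<field>`, the extra `id` key column, and the direction recorded for the new associations are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    parent: str
    child: str
    field: str


def apply_gsi(child, field, associations, key="id"):
    """Resolve a conflict by moving `field` into a secondary (GSI) table.

    Returns the adjusted associations and a description of the secondary
    table. The table name and the `key` column are illustrative assumptions.
    """
    secondary = f"{child}__gsi_{field}"
    adjusted, moved_parents, was_self = [], [], False
    for a in associations:
        if a.child == child and a.field == field:
            # Release the association that is based on `field`.
            if a.parent == child:
                was_self = True
            else:
                moved_parents.append(a.parent)
            continue
        adjusted.append(a)
    if was_self:
        # Self-association case: link the secondary table back to the first data table.
        adjusted.append(Association(child, secondary, field))
    for parent in moved_parents:
        # Multi-parent case: the parent now associates with the secondary table instead.
        adjusted.append(Association(parent, secondary, field))
    return adjusted, {"name": secondary, "columns": [field, key]}


assocs = [
    Association("customers", "orders", "customer_id"),
    Association("products", "orders", "product_id"),
]
adjusted, gsi = apply_gsi("orders", "product_id", assocs)
for a in adjusted:
    print(a)
```

After the adjustment, `orders` keeps a single association on `customer_id`, so the multi-field conflict no longer exists and `orders` can be partitioned together with `customers`.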
6. The method of claim 4, wherein the performing database partitioning on the first data table according to the adjusted at least one association relationship and a database partitioning rule comprises:
determining a first database partitioning field according to the adjusted at least one association relationship;
partitioning the first data table and the first secondary table according to the first database partitioning field and the database partitioning rule to obtain a plurality of first sub-data tables;
allocating the plurality of first sub data tables into at least one data unit.
7. The method according to claim 5, wherein the performing database partitioning on the first data table and the at least two second data tables according to the adjusted at least one association relationship and the database partitioning rule comprises:
determining at least one second database partitioning field according to the adjusted at least one association relationship;
partitioning the first data table, the at least two second data tables, and the at least one second secondary table according to the at least one second database partitioning field and the database partitioning rule to obtain a plurality of second sub-data tables;
allocating the plurality of second sub data tables into at least one data unit.
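The partitioning steps of claims 6 and 7 amount to splitting tables on a partitioning field and placing the resulting sub-data tables into data units. The patent does not fix the partitioning rule here, so this Python sketch assumes a CRC32 hash modulo the shard count (deterministic across processes, unlike Python's salted `hash()` for strings) and round-robin placement; both choices, and the sample tables, are illustrative.

```python
import zlib
from collections import defaultdict


def shard_of(value, num_shards):
    """Assumed partitioning rule: CRC32 of the value's text, modulo the shard count."""
    return zlib.crc32(str(value).encode("utf-8")) % num_shards


def partition_table(rows, field, num_shards):
    """Split one table's rows into sub-data tables keyed by shard index."""
    shards = defaultdict(list)
    for row in rows:
        shards[shard_of(row[field], num_shards)].append(row)
    return dict(shards)


def allocate(num_shards, data_units):
    """Place shard indices onto data units round-robin."""
    return {s: data_units[s % len(data_units)] for s in range(num_shards)}


customers = [{"customer_id": i} for i in range(10)]
orders = [{"order_id": i, "customer_id": i % 10} for i in range(30)]

# Parent and child are partitioned on the same field with the same rule.
cust_shards = partition_table(customers, "customer_id", 4)
order_shards = partition_table(orders, "customer_id", 4)
placement = allocate(4, ["unit-a", "unit-b"])
```

Because the parent (`customers`) and child (`orders`) are partitioned on the same field, each order lands in the same shard index as its customer, which is what allows joins on that field to stay inside one data unit.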
8. The method of claim 6, wherein after the allocating the plurality of first sub-data tables into at least one data unit, the method further comprises:
determining at least one first routing field of the first data table according to a historical query log;
determining attribute information of the at least one first routing field according to the database partitioning rule, wherein the attribute information of the first routing field comprises at least one of the data table to which the first routing field belongs, the sub-data table to which the first routing field belongs, or the data unit in which that sub-data table is located;
and establishing a mapping relation between the at least one first routing field and the attribute information of the at least one first routing field.
9. The method of claim 7, wherein after the allocating the plurality of second sub-data tables into at least one data unit, the method further comprises:
determining at least one second routing field of the first data table and the at least two second data tables according to a historical query log;
determining attribute information of the at least one second routing field according to the database partitioning rule, wherein the attribute information of the second routing field comprises at least one of the data table to which the second routing field belongs, the sub-data table to which the second routing field belongs, or the data unit in which that sub-data table is located;
and establishing a mapping relation between the at least one second routing field and the attribute information of the at least one second routing field.
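One possible reading of the mapping in claims 8 and 9 is a lookup from each routing-field value to the data table, sub-data table, and data unit that hold it. The sketch below assumes a per-value mapping, a hypothetical CRC32-modulo partitioning rule, and `<table>_<shard>` names for sub-data tables; none of these details is specified by the claims.

```python
import zlib


def shard_of(value, num_shards):
    """Assumed partitioning rule: CRC32 of the value's text, modulo the shard count."""
    return zlib.crc32(str(value).encode("utf-8")) % num_shards


def build_routing_map(table, rows, routing_field, partition_field, num_shards, units):
    """Map each routing-field value to (data table, sub-data table, data unit)."""
    route = {}
    for row in rows:
        shard = shard_of(row[partition_field], num_shards)
        location = (table, f"{table}_{shard}", units[shard % len(units)])
        route.setdefault(row[routing_field], set()).add(location)
    return route


def lookup(route, value):
    """Locations holding `value`; an empty set means the value is unknown."""
    return route.get(value, set())


orders = [{"order_id": i, "customer_id": i % 10} for i in range(30)]
# orders is partitioned on customer_id, but queries often filter on order_id.
route = build_routing_map("orders", orders, "order_id", "customer_id", 4, ["unit-a", "unit-b"])
print(lookup(route, 5))
```

A query filtering on `order_id` can then be sent to one data unit instead of being broadcast to all of them.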
10. The method of claim 9, wherein said determining at least one second routing field of the first data table and the at least two second data tables from historical query logs comprises:
determining, from the historical query log, a frequency with which respective fields of each of the first data table and the at least two second data tables are queried;
determining query performance generated by querying based on each field according to the queried frequency of each field;
calculating a second cost generated by establishing a routing field based on each field;
and if, according to the query performance and the second cost, a second field satisfies a rule for establishing a routing field, determining the second field as the second routing field, wherein the second field is any one field of the first data table or the at least two second data tables.
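Claim 10 weighs query performance against the cost of establishing a routing field. The patent gives no formulas, so the following Python sketch uses an assumed model: the benefit of a routing field is the number of shard scans avoided (query frequency times `num_shards - 1`), the second cost is a per-row maintenance charge, and a field qualifies when the benefit exceeds the cost. The function name, parameters, and sample log are hypothetical.

```python
from collections import Counter


def pick_routing_fields(query_log, row_counts, num_shards, maintain_cost_per_row=0.01):
    """Return the (table, field) pairs worth promoting to routing fields.

    query_log: iterable of (table, field) pairs, one per historical query.
    """
    freq = Counter(query_log)
    chosen = []
    for (table, field), hits in freq.items():
        # Assumed query-performance gain: shard scans avoided by routing directly.
        benefit = hits * (num_shards - 1)
        # Assumed second cost: keeping a routing entry for every row of the table.
        cost = maintain_cost_per_row * row_counts[table]
        if benefit > cost:
            chosen.append((table, field))
    return sorted(chosen)


log = ([("orders", "order_id")] * 50 + [("orders", "note")]
       + [("customers", "email")] * 40)
print(pick_routing_fields(log, {"orders": 10_000, "customers": 1_000}, num_shards=4))
# [('customers', 'email'), ('orders', 'order_id')]
```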
11. The method of any one of claims 1-10, wherein said determining a first target candidate from among said at least one first candidate comprises:
calculating a first cost of resolving the first conflicting association relationship by using each first candidate scheme in the at least one first candidate scheme;
and determining, according to the first cost, the candidate scheme with the minimum cost as the first target candidate scheme.
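The minimum-cost selection of claim 11 reduces to an argmin over the candidate schemes. The cost model below (an extra lookup hop for every join that loses co-location, plus a per-row charge for mirroring rows into the secondary table) is a hypothetical stand-in for the first cost, which the claim does not define; the join frequencies are made-up sample data.

```python
def gsi_cost(field, join_freq, child_rows, hop_cost=1.0, write_cost=0.001):
    """Hypothetical first cost of resolving the conflict with a GSI on `field`."""
    # Joins on `field` lose co-location and need one extra GSI lookup each.
    lost_colocation = join_freq.get(field, 0) * hop_cost
    # Every child row is mirrored into the secondary table.
    maintenance = child_rows * write_cost
    return lost_colocation + maintenance


def choose_target(candidates, join_freq, child_rows):
    """Return the minimum-cost candidate (ties broken by field name) and all costs."""
    costs = {f: gsi_cost(f, join_freq, child_rows) for f in candidates}
    best = min(candidates, key=lambda f: (costs[f], f))
    return best, costs


best, costs = choose_target(
    ["customer_id", "product_id"],
    join_freq={"customer_id": 900, "product_id": 100},
    child_rows=10_000,
)
print(best)  # product_id
```

Under this model the GSI goes on the less frequently joined field, so the table stays partitioned on the field that is joined most often.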
12. The method of claim 8 or 9, further comprising:
receiving a query request, and analyzing the query request to obtain a query data table or a query field;
if the query data table is a first target data table, judging whether the query field is matched with a routing field of the first target data table, wherein the first target data table is the first data table or any one of the at least two second data tables;
if not, acquiring at least one updated field, wherein each updated field in the at least one updated field comprises the query field and a target routing field, the target routing field is any one of the routing fields of the first target data table, and each updated field in the at least one updated field is different;
calculating a query cost generated by querying based on each updated field in the at least one updated field;
and outputting the at least one updated field and the query cost of each updated field to prompt that a target updated field is selected for querying based on the query cost of each updated field, wherein the target updated field is a field in the at least one updated field.
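The suggestion step of claim 12 can be sketched as follows: when the query fields contain no routing field, each routing field is appended to the query fields to form an updated field, and the options are ranked by an estimated query cost. The estimate used here (table rows divided by the number of distinct values of the routing field) and the statistics passed in are assumptions for illustration.

```python
def suggest_updated_fields(query_fields, routing_fields, distinct_counts, table_rows):
    """Rank (query fields + one routing field) combinations by estimated rows scanned."""
    if set(query_fields) & set(routing_fields):
        return []  # the query already matches a routing field
    suggestions = []
    for rf in routing_fields:
        updated = tuple(query_fields) + (rf,)
        # Assumed query cost: average rows per distinct routing-field value.
        est_rows = table_rows / max(distinct_counts.get(rf, 1), 1)
        suggestions.append((updated, est_rows))
    return sorted(suggestions, key=lambda s: s[1])


print(suggest_updated_fields(
    ["status"], ["order_id", "customer_id"],
    distinct_counts={"order_id": 10_000, "customer_id": 500},
    table_rows=10_000,
))
# [(('status', 'order_id'), 1.0), (('status', 'customer_id'), 20.0)]
```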
13. The method of claim 12, further comprising:
if the query data table comprises the first data table and a second target data table, judging, according to the adjusted at least one association relationship, whether the first data table and the second target data table are associated based on the query field, wherein the second target data table is any one of the at least two second data tables and the at least one second secondary table;
if they are not associated based on the query field, determining at least one query scheme for querying the query field according to the database partitioning rule;
calculating a third cost of each query scheme in the at least one query scheme;
and outputting the at least one query scheme and the third cost to prompt that a query scheme is selected from the at least one query scheme based on the third cost, and querying by adopting the selected query scheme.
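For claim 13, when two tables are queried on a field they are not associated by, several distributed join strategies are possible. The Python sketch below enumerates three common ones (broadcasting either side, or repartitioning both) and uses the estimated number of rows moved across data units as a stand-in for the third cost; the scheme names and the cost formula are assumptions, not the patent's definitions.

```python
def join_schemes(left_rows, right_rows, num_units, associated_on_query_field):
    """Enumerate query schemes with an estimated rows-moved cost, cheapest first."""
    if associated_on_query_field:
        return [("co-located", 0.0)]  # tables already partitioned on the join field
    # Under hash repartitioning, roughly (n - 1) / n of each table changes unit.
    moved_fraction = (num_units - 1) / num_units
    schemes = {
        "broadcast_left": float(left_rows * (num_units - 1)),
        "broadcast_right": float(right_rows * (num_units - 1)),
        "repartition_both": (left_rows + right_rows) * moved_fraction,
    }
    return sorted(schemes.items(), key=lambda kv: (kv[1], kv[0]))


print(join_schemes(1_000, 100_000, num_units=4, associated_on_query_field=False))
# [('broadcast_left', 3000.0), ('repartition_both', 75750.0), ('broadcast_right', 300000.0)]
```

Returning every scheme with its cost, rather than only the cheapest, mirrors the claim's step of outputting the schemes so that one can be selected.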
14. The method of claim 11, further comprising:
receiving a request for establishing an association relation based on a third field aiming at a third data table and the first data table, wherein the third data table is a parent table of the first data table;
establishing an association relationship between the third data table and the first data table based on the third field;
and updating the association relationship among the third data table, the first data table and a third target data table based on the adjusted at least one association relationship to obtain a first local association relationship, where the third target data table is a data table associated with the first data table in the at least two second data tables.
15. The method of claim 14, wherein after obtaining the first local association relationship, further comprising:
if a second conflicting association relationship exists in the first local association relationship, generating at least one second candidate scheme for resolving the second conflicting association relationship, wherein the at least one second candidate scheme is a scheme based on a global secondary index;
determining a second target candidate scheme from the at least one second candidate scheme;
calculating a fourth cost of resolving the second conflicting association relationship by using the second target candidate scheme;
if the fourth cost is smaller than a cost threshold, adjusting the first local association relationship by using the second target candidate scheme to obtain the adjusted first local association relationship;
and performing database partitioning again on the first data table, the third data table, and the third target data table according to the adjusted first local association relationship and the database partitioning rule.
16. The method of claim 15, wherein before the generating at least one second candidate scheme for resolving the second conflicting association relationship if the second conflicting association relationship exists in the first local association relationship, the method further comprises:
if the first data table, the third data table, and the third target data table are associated based on different fields, determining that a second conflicting association relationship exists in the first local association relationship.
17. The method of claim 15, further comprising:
if the fourth cost is greater than or equal to the cost threshold, updating the at least one association relationship by using the association relationship between the third data table and the first data table based on the third field, to obtain a global association relationship;
if a third conflicting association relationship exists in the global association relationship, generating at least one third candidate scheme for resolving the third conflicting association relationship, wherein the at least one third candidate scheme is a scheme based on a global secondary index;
determining a third target candidate scheme from the at least one third candidate scheme, and adjusting the global association relationship by using the third target candidate scheme to obtain the adjusted global association relationship;
and performing database partitioning again on the first data table, the third data table, and the at least two second data tables according to the adjusted global association relationship and the database partitioning rule.
18. The method of claim 17, wherein before the generating at least one third candidate scheme for resolving the third conflicting association relationship if the third conflicting association relationship exists in the global association relationship, the method further comprises:
if it is determined, according to the global association relationship, that the first data table, the at least two second data tables, and the third data table are associated based on different fields, determining that a third conflicting association relationship exists in the global association relationship.
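The decision flow of claims 14 to 18 can be summarized in a small Python function: a new parent association on the first data table's existing partitioning field causes no conflict; otherwise the fourth cost of a local GSI is compared with the cost threshold, and a cost at or above the threshold triggers re-planning from the global association relationship. The function name, the `gsi_cost` callback, and the returned labels are hypothetical.

```python
from typing import Callable, Optional, Tuple


def handle_new_parent(
    partition_field: str,
    new_field: str,
    gsi_cost: Callable[[str], float],
    threshold: float,
) -> Tuple[str, Optional[str]]:
    """Decide how to absorb a new parent association of the first data table."""
    if new_field == partition_field:
        # All parents share one field: no second conflicting association.
        return ("co-locate", new_field)
    fourth_cost = gsi_cost(new_field)
    if fourth_cost < threshold:
        # Resolve locally: GSI on the new field, re-partition only the affected tables.
        return ("local-gsi", new_field)
    # Too expensive locally: rebuild and re-resolve the global association relationship.
    return ("global-replan", None)


print(handle_new_parent("customer_id", "store_id", lambda f: 5.0, threshold=10.0))
# ('local-gsi', 'store_id')
```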
19. The method of any one of claims 13-18, further comprising:
receiving an addition instruction for a fourth data table;
if the fourth data table is associated with the first data table based on a fourth field, establishing an association relationship between the fourth data table and the first data table based on the fourth field, wherein the fourth data table is a parent table of the first data table;
determining a fourth target data table associated with the first data table according to the adjusted at least one association relation;
if a second local association relationship has a fourth conflicting association relationship, generating at least one fourth candidate scheme for resolving the fourth conflicting association relationship, wherein the at least one fourth candidate scheme is a scheme based on a global secondary index, and the second local association relationship is an association relationship among the first data table, the fourth data table, and the fourth target data table;
determining a fourth target candidate scheme from the at least one fourth candidate scheme, and adjusting the second local association relationship by using the fourth target candidate scheme to obtain the adjusted second local association relationship;
and performing database partitioning on the first data table, the fourth data table, and the fourth target data table according to the adjusted second local association relationship and the database partitioning rule.
20. The method of claim 19, wherein before the generating at least one fourth candidate scheme for resolving the fourth conflicting association relationship if the second local association relationship has the fourth conflicting association relationship, the method further comprises:
if it is determined, according to the adjusted at least one association relationship, that the field on which the first data table and the fourth target data table are associated is different from the fourth field, determining that the second local association relationship has the fourth conflicting association relationship.
21. A data processing apparatus applied to a database management server, comprising:
an obtaining module, configured to obtain at least one association relationship of a first data table, where the at least one association relationship includes an association relationship between the first data table and at least two second data tables or includes a self-association relationship of the first data table, the second data tables are parent tables of the first data table, and one association relationship corresponds to one field;
a generating module, configured to generate at least one first candidate scheme for solving a first conflicting association relationship if the first conflicting association relationship exists in the at least one association relationship, where the at least one first candidate scheme is a candidate scheme based on a global secondary index;
the determining module is used for determining a first target candidate scheme from the at least one first candidate scheme;
the adjusting module is used for adjusting the at least one association relationship by using the first target candidate scheme to obtain the adjusted at least one association relationship;
and the database partitioning module is used for partitioning the first data table and the at least two second data tables according to the adjusted at least one association relationship and the database partitioning rule, or partitioning the first data table.
22. The apparatus of claim 21,
the determining module is further configured to determine that a first conflicting association relationship exists if the first data table has a self-association relationship, and to determine that a first conflicting association relationship exists in the at least one association relationship if the first data table is associated with the at least two second data tables based on different fields.
23. The apparatus of claim 22,
the generating module is specifically configured to generate a candidate scheme of a global secondary index based on a self-association field if the first data table has a self-association relationship; and if the first data table is associated with the at least two second data tables based on different fields, generating at least two candidate schemes based on the global secondary index, wherein the at least two candidate schemes are based on different fields.
24. The apparatus of claim 23, wherein the first target candidate is a candidate based on a global secondary index of the self-associated field;
the adjusting module adjusts at least one association relationship of the first data table by using the first target candidate scheme, and a specific manner of obtaining the adjusted at least one association relationship includes:
releasing the self-association relationship of the first data table; obtaining a first secondary table from the first data table, wherein the first secondary table comprises at least one field in the first data table, and the at least one field comprises the self-association field; and establishing an association relationship between the first secondary table and the first data table based on the self-association field to adjust the at least one association relationship, so as to obtain the adjusted at least one association relationship.
25. The apparatus of claim 23, wherein the first target candidate is a candidate based on a global secondary index of a first field;
the adjusting module adjusts at least one association relationship of the first data table by using the first target candidate scheme, and the specific manner of obtaining the adjusted at least one association relationship includes:
determining at least one data table associated with the first data table based on the first field according to the at least one association relationship, wherein the determined at least one data table is one of the at least two second data tables, and the first field is any one of fields corresponding to the at least one association relationship; disassociating the first data table from the determined at least one data table based on the first field; obtaining at least one second secondary table from the first data table, wherein the second secondary table comprises at least one field in the first data table, and the at least one field comprises the first field; and establishing an association relationship between the at least one second secondary table and the determined at least one data table based on the first field to adjust the at least one association relationship, thereby obtaining the adjusted at least one association relationship.
26. The apparatus of claim 24,
the database partitioning module is specifically configured to determine a first database partitioning field according to the adjusted at least one association relationship; partition the first data table and the first secondary table according to the first database partitioning field and the database partitioning rule to obtain a plurality of first sub-data tables; and allocate the plurality of first sub-data tables into at least one data unit.
27. The apparatus of claim 25,
the database partitioning module is specifically configured to determine at least one second database partitioning field according to the adjusted at least one association relationship; partition the first data table, the at least two second data tables, and the at least one second secondary table according to the at least one second database partitioning field and the database partitioning rule to obtain a plurality of second sub-data tables; and allocate the plurality of second sub-data tables into at least one data unit.
28. The apparatus of claim 26,
the determining module is further configured to determine at least one first routing field of the first data table according to a historical query log, and to determine attribute information of the at least one first routing field according to the database partitioning rule, wherein the attribute information of the first routing field comprises at least one of the data table to which the first routing field belongs, the sub-data table to which the first routing field belongs, or the data unit in which that sub-data table is located;
the device further comprises:
a first establishing module, configured to establish a mapping relationship between the at least one first routing field and the attribute information of the at least one first routing field.
29. The apparatus of claim 27,
the determining module is further configured to determine at least one second routing field of the first data table and the at least two second data tables according to a historical query log, and to determine attribute information of the at least one second routing field according to the database partitioning rule, wherein the attribute information of the second routing field comprises at least one of the data table to which the second routing field belongs, the sub-data table to which the second routing field belongs, or the data unit in which that sub-data table is located;
the device further comprises:
a second establishing module, configured to establish a mapping relationship between the at least one second routing field and the attribute information of the at least one second routing field.
30. The apparatus of claim 29,
the determining module is specifically configured to determine, according to the historical query log, a frequency with which each field of each of the first data table and the at least two second data tables is queried; determine query performance generated by querying based on each field according to the queried frequency of each field; calculate a second cost generated by establishing a routing field based on each field; and if, according to the query performance and the second cost, a second field satisfies a rule for establishing a routing field, determine the second field as the second routing field, wherein the second field is any one field of the first data table or the at least two second data tables.
31. The apparatus of any one of claims 21-30,
the determining module is specifically configured to calculate a first cost of resolving the first conflicting association relationship by using each of the at least one first candidate scheme, and to determine, according to the first cost, the candidate scheme with the minimum cost as the first target candidate scheme.
32. The apparatus of claim 28 or 29, further comprising:
the first receiving module is used for receiving the query request;
the analysis module is used for analyzing the query request to obtain a query data table or a query field;
a determining module, configured to determine whether the query field is matched with a routing field of a first target data table if the query data table is the first target data table, where the first target data table is any one of the first data table or the at least two second data tables;
the obtaining module is further configured to obtain at least one updated field if the fields are not matched, where each updated field in the at least one updated field includes the query field and a target routing field, the target routing field is any one of the routing fields in the first target data table, and each updated field in the at least one updated field is different;
a calculation module, configured to calculate a query cost generated by performing a query based on each updated field of the at least one updated field;
and the output module is used for outputting the at least one updated field and the query cost of each updated field to prompt that a target updated field is selected for querying based on the query cost of each updated field, wherein the target updated field is a field in the at least one updated field.
33. The apparatus of claim 32,
the determining module is further configured to determine, if the query data table comprises the first data table and a second target data table, whether the first data table and the second target data table are associated based on the query field according to the adjusted at least one association relationship, where the second target data table is any one of the at least two second data tables and the at least one second secondary table;
the determining module is further configured to determine at least one query scheme for querying the query field according to the database partitioning rule if the first data table and the second target data table are not associated based on the query field;
the calculating module is further configured to calculate a third cost of each query scheme in the at least one query scheme;
the output module is further configured to output the at least one query scheme and the third cost, so as to prompt that a query scheme is selected from the at least one query scheme based on the third cost, and query by using the selected query scheme.
34. The apparatus of claim 31,
the device further comprises:
a second receiving module, configured to receive a request for establishing an association relationship based on a third field for a third data table and the first data table, where the third data table is a parent table of the first data table;
a third establishing module, configured to establish an association relationship between the third data table and the first data table based on the third field;
and the third establishing module is further configured to update the association relationship among the third data table, the first data table, and a third target data table based on the adjusted at least one association relationship to obtain a first local association relationship, where the third target data table is a data table associated with the first data table in the at least two second data tables.
35. The apparatus of claim 34,
the generating module is further configured to generate at least one second candidate scheme for solving a second conflicting association relationship if the first local association relationship has the second conflicting association relationship, where the at least one second candidate scheme is a scheme based on a global secondary index;
the determining module is further configured to determine a second target candidate scheme from the at least one second candidate scheme;
the calculating module is further configured to calculate a fourth cost of resolving the second conflicting association relationship by using the second target candidate scheme;
the adjusting module is further configured to adjust the first local association relationship by using the second target candidate scheme if the fourth cost is smaller than a cost threshold, so as to obtain the adjusted first local association relationship;
and the database partitioning module is further configured to perform database partitioning again on the first data table, the third data table, and the third target data table according to the adjusted first local association relationship and the database partitioning rule.
36. The apparatus of claim 35,
the determining module is configured to determine that a second conflicting association exists in the first local association relationship if the first data table, the third data table, and the third target data table are associated based on different fields.
37. The apparatus of claim 35,
the updating module is configured to update the at least one association relationship based on the association relationship between the third field and the first data table, to obtain a global association relationship, if the fourth cost is greater than or equal to the cost threshold;
the generating module is further configured to generate at least one third candidate scheme for solving a third conflicting association relationship if the third conflicting association relationship exists in the global association relationship, where the at least one third candidate scheme is a scheme based on a global secondary index;
the determining module is further configured to determine a third target candidate scheme from the at least one third candidate scheme;
the adjusting module is further configured to adjust the global association relationship by using the third target candidate scheme to obtain the adjusted global association relationship;
and the database partitioning module is further configured to re-partition the first data table, the third data table, and the at least two second data tables according to the adjusted global association relationship and the database partitioning rule.
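The claims do not define the database partitioning rule itself. The sketch below is therefore only one plausible reading of the re-partitioning step in claim 37, and every rule in it is an assumption. The first (hub) table is partitioned by a chosen field. Each table associated with it is partitioned by its join field. Joins on any other field are served through a GSI on the hub table. Rows are then routed by a stable hash of their partition key, and `zlib.crc32` is used only as an example hash.

```python
import zlib


def derive_shard_keys(global_relationship, hub_table, hub_shard_field):
    """Hypothetical rule: return ({table: partition field}, {(table, gsi_field)}).

    `global_relationship` holds (left_table, right_table, field) tuples.
    """
    keys = {hub_table: hub_shard_field}
    gsi = set()
    for left, right, field in global_relationship:
        if hub_table not in (left, right):
            continue
        other = right if left == hub_table else left
        # The neighbour is partitioned by the field it joins the hub on.
        keys.setdefault(other, field)
        if field != hub_shard_field:
            # This join is not co-located, so it is served through a GSI on the hub.
            gsi.add((hub_table, field))
    return keys, gsi


def shard_of(value, num_shards):
    # Stable hash (CRC32) so the same key value always lands on the same shard.
    return zlib.crc32(str(value).encode()) % num_shards
```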
38. The apparatus of claim 37,
the determining module is further configured to determine that a third conflicting association relationship exists in the global association relationship if, according to the global association relationship, the first data table is associated with the at least two second data tables and the third data table based on different fields.
39. The apparatus of any one of claims 33-38, further comprising:
a third receiving module, configured to receive an add instruction for a fourth data table;
a fourth establishing module, configured to establish an association relationship between the fourth data table and the first data table based on a fourth field if the fourth data table is associated with the first data table based on the fourth field, where the fourth data table is a parent table of the first data table;
the determining module is further configured to determine a fourth target data table associated with the first data table according to the adjusted at least one association relationship;
the generating module is further configured to generate at least one fourth candidate scheme for solving a fourth conflicting association relationship if the second local association relationship has the fourth conflicting association relationship, where the at least one fourth candidate scheme is a scheme based on a global secondary index, and the second local association relationship is an association relationship among the first data table, the fourth data table, and the fourth target data table;
the determining module is further configured to determine a fourth target candidate scheme from the at least one fourth candidate scheme;
the adjusting module is further configured to adjust the second local association relationship by using the fourth target candidate scheme to obtain the adjusted second local association relationship;
the database partitioning module is further configured to perform database partitioning on the first data table, the fourth data table, and the fourth target data table according to the adjusted second local association relationship and the database partitioning rule.
40. The apparatus of claim 39,
the determining module is configured to determine that the second local association relationship has the fourth conflicting association relationship if, according to the adjusted at least one association relationship, the field on which the first data table is associated with the fourth target data table differs from the fourth field.
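Claims 39 and 40 together describe adding a parent table. The sketch below illustrates that flow: it builds the second local association relationship, then flags the fourth target tables whose join field with the first table differs from the fourth field. It is a minimal illustration under stated assumptions, not the patented implementation. Associations are assumed to be hypothetical `(left_table, right_table, field)` tuples, and any non-empty `conflicting` list is the point at which GSI-based fourth candidate schemes would be generated.

```python
def other_end(assoc, table):
    """Return the table on the opposite side of the (left, right, field) tuple."""
    left, right, _field = assoc
    return right if left == table else left


def add_parent_table(adjusted, first_table, parent_table, fourth_field):
    """Return (second local relationship, fourth target tables, conflicting targets)."""
    # Association between the new parent (fourth) table and the first table.
    new_assoc = (parent_table, first_table, fourth_field)
    # Fourth target tables: everything the adjusted relationships link to the first table.
    neighbours = [a for a in adjusted if first_table in (a[0], a[1])]
    targets = {other_end(a, first_table) for a in neighbours}
    # Claim 40: a target joined to the first table on a field other than fourth_field conflicts.
    conflicting = sorted(other_end(a, first_table) for a in neighbours if a[2] != fourth_field)
    return [new_assoc] + neighbours, targets, conflicting
```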
41. A database management server comprising at least one processor, memory and instructions stored on the memory and executable by the at least one processor, characterized in that the at least one processor executes the instructions to implement the steps of the data processing method of any of claims 1 to 20.
42. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to carry out the steps of the data processing method according to any one of claims 1 to 20.
CN201710917504.3A 2017-09-29 2017-09-29 Data processing method and device and database management server Active CN110147407B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710917504.3A CN110147407B (en) 2017-09-29 2017-09-29 Data processing method and device and database management server

Publications (2)

Publication Number Publication Date
CN110147407A CN110147407A (en) 2019-08-20
CN110147407B true CN110147407B (en) 2023-02-14

Family

ID=67587989

Country Status (1)

Country Link
CN (1) CN110147407B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110955662A (en) * 2019-11-29 2020-04-03 车智互联(北京)科技有限公司 Method, computing device and storage medium for maintaining data table association relation
CN111190583B (en) * 2019-12-31 2021-10-22 华为技术有限公司 Associated conflict block presenting method and equipment
CN111297403B (en) * 2020-02-25 2021-08-27 中国矿业大学 Rapid and accurate screening and early warning system for pulmonary fibrosis lesion of pneumoconiosis group
CN111930741A (en) * 2020-07-15 2020-11-13 中国银行股份有限公司 Database partitioning method and device and transaction request data reading and writing system
CN112181965A (en) * 2020-09-29 2021-01-05 成都商通数治科技有限公司 MYSQL-based big data cleaning system and method for writing bottleneck into MYSQL-based big data cleaning system
CN112732711A (en) * 2020-12-28 2021-04-30 北京金山云网络技术有限公司 Data storage method and device and electronic equipment
CN113111084A (en) * 2021-03-31 2021-07-13 北京沃东天骏信息技术有限公司 Method and device for processing data
CN113468186B (en) * 2021-09-02 2021-12-21 四川大学华西医院 Data table primary key association method and device, computer equipment and readable storage medium
CN113868251B (en) * 2021-09-24 2022-10-18 北京百度网讯科技有限公司 Global secondary indexing method and device for distributed database
CN114637736B (en) * 2022-03-09 2023-03-31 北京金堤科技有限公司 Database splitting method and device

Citations (3)

Publication number Priority date Publication date Assignee Title
CN103646111A (en) * 2013-12-25 2014-03-19 普元信息技术股份有限公司 System and method for realizing real-time data association in big data environment
CN105512200A (en) * 2015-11-26 2016-04-20 华为技术有限公司 Distributed database processing method and device
CN105989195A (en) * 2015-03-23 2016-10-05 国际商业机器公司 Approach and system for processing data in database

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US7599934B2 (en) * 2005-09-27 2009-10-06 Microsoft Corporation Server side filtering and sorting with field level security

Also Published As

Publication number Publication date
CN110147407A (en) 2019-08-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant