CN114741413A - Data table association processing method and device, computer equipment and storage medium - Google Patents

Info

Publication number
CN114741413A
Authority
CN
China
Prior art keywords
index
data table
data
row key
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210302257.7A
Other languages
Chinese (zh)
Inventor
李雯
陶涛
刘侃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp
Priority to CN202210302257.7A
Publication of CN114741413A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Abstract

The application discloses a data table association processing method and device, computer equipment and a storage medium. The method relates to the technical field of big data intelligent analysis, and comprises the following steps: acquiring data of a data table to be associated; constructing a data table index according to the data of the data table to be associated and a row key design rule, wherein a row key of the data table index comprises a common index field, a combined index row key comprising more than two index fields also comprises a unique index field corresponding to the data table to be associated, and the combined index row key is used for uniquely identifying the corresponding data table; storing the data of the data table to be associated in the data table index; constructing a cluster index and a cluster table according to the data table index; and dividing the storage area of the clustered index by using a preset method, wherein the preset method comprises the steps of calculating the total data volume of each index and the dividing number of the storage area corresponding to the total data volume. By adopting the method, the high-efficiency associated query of the data table can be realized.

Description

Data table association processing method and device, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of big data intelligent analysis technologies, and in particular, to a data table association processing method and apparatus, a computer device, and a storage medium.
Background
With the development of data query technology, techniques for querying data across a plurality of associated data tables have appeared. In an enterprise's existing database applications, a large number of user data tables are usually associated with one another; the user data tables differ in size, and there may be inherent business associations between the tables and between the data in them. Designing HBase tables in the same way that tables are designed in a traditional database not only results in an excessive number of tables, but also makes cross-table association queries difficult to implement.
At present, there are two traditional multi-table association query methods: the pre-generated wide table mode and the multi-query mode. In the pre-generated wide table mode, the tables (for example, three tables) are associated into one wide table in advance by SQL according to the query requirement, and the data is then loaded into an HBase table for querying. In the multi-query mode, each table is independently mapped to an HBase table; data is obtained by querying the first table, the next table is then queried through an association field, the association among the data is implemented in the application, and the associated data is finally obtained. Both traditional multi-table association query methods have low data query efficiency.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a data table association processing method, apparatus, computer device, computer readable storage medium, and computer program product, which can facilitate efficient association query of data tables.
In a first aspect, the present disclosure provides a data table association processing method. The method comprises the following steps:
acquiring data of a data table to be associated;
constructing a data table index according to the data of the data table to be associated and a row key design rule, wherein a row key of the data table index comprises a common index field, a combined index row key comprising more than two index fields also comprises a unique index field corresponding to the data table to be associated, and the combined index row key is used for uniquely identifying the corresponding data table;
storing the data of the data table to be associated in the data table index;
constructing a cluster index and a cluster table according to the data table index;
and dividing the storage area of the clustered index by using a preset method, wherein the preset method comprises the steps of calculating the total data volume of each index and the dividing number of the storage area corresponding to the total data volume.
In one embodiment, the first data table index whose row key is a common index field includes a primary index and a secondary index, and the row key of the secondary index of the first data table index includes a unique index field of the first data table.
In one embodiment, the row keys of the secondary index include at least two combination index row keys.
In one embodiment, the constructing a cluster index and a cluster table according to the data table index includes: and constructing a cluster index according to the data table index with the least row keys in the cluster table.
In one embodiment, the row key design rule comprises a query scenario rule, a field sorting rule and a unique identification rule.
In one embodiment, the calculating the total data amount of each index includes:
calculating the data size of the secondary index using a custom formula, the custom formula comprising:
SE=Y/Z×G
wherein SE represents the data volume of the secondary index, Y represents the data volume of the data table, Z represents the field number of the data table, and G represents the field number of the secondary index.
In a second aspect, the present disclosure further provides a data table association processing apparatus. The device comprises:
the data acquisition module is used for acquiring data of a data table to be associated;
the first building module is used for building a data table index according to the data of the data table to be associated and a row key design rule, wherein a row key of the data table index comprises a common index field, a combined index row key comprising more than two index fields also comprises a unique index field corresponding to the data table to be associated, and the combined index row key is used for uniquely identifying the corresponding data table;
the storage module is used for storing the data of the data table to be associated into the data table index;
the second construction module is used for constructing a cluster index and a cluster table according to the data table index;
and the partitioning module is used for partitioning the storage areas of the clustered indexes by using a preset method, wherein the preset method comprises the steps of calculating the total data volume of each index and the partitioning number of the storage areas corresponding to the total data volume.
In one embodiment, the first building module is configured to build a first data table index, wherein the first data table index, whose row key is a common index field, includes a primary index and a secondary index, and the row key of the secondary index of the first data table index includes a unique index field of the first data table.
In one embodiment, the first building module is configured to build the secondary index with row keys comprising at least two combined index row keys.
In one embodiment, the second constructing module is configured to construct a cluster index according to a data table index with the least row keys in the cluster table.
In one embodiment, the row key design rule used by the first building module comprises a query scenario rule, a field sorting rule and a unique identification rule.
In one embodiment, the partitioning module is configured to calculate the data size of the secondary index using a custom formula, the custom formula including:
SE=Y/Z×G
wherein SE represents the data volume of the secondary index, Y represents the data volume of the data table, Z represents the field number of the data table, and G represents the field number of the secondary index.
In a third aspect, the present disclosure also provides a computer device. The computer device comprises a memory storing a computer program and a processor implementing the following steps when executing the computer program:
acquiring data of a data table to be associated;
constructing a data table index according to the data of the data table to be associated and a row key design rule, wherein a row key of the data table index comprises a common index field, a combined index row key comprising more than two index fields also comprises a unique index field corresponding to the data table to be associated, and the combined index row key is used for uniquely identifying the corresponding data table;
storing the data of the data table to be associated in the data table index;
constructing a cluster index and a cluster table according to the data table index;
and dividing the storage area of the clustered index by using a preset method, wherein the preset method comprises the steps of calculating the total data volume of each index and the dividing number of the storage area corresponding to the total data volume.
In a fourth aspect, the present disclosure also provides a computer-readable storage medium. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the following steps:
acquiring data of a data table to be associated;
constructing a data table index according to the data of the data table to be associated and the row key design rule, wherein the row key of the data table index comprises a common index field, a unique index field corresponding to the data table to be associated is also included in a combined index row key comprising more than two index fields, and the combined index row key is used for uniquely identifying the corresponding data table;
storing the data of the data table to be associated in the data table index;
constructing a cluster index and a cluster table according to the data table index;
and dividing the storage area of the clustered index by using a preset method, wherein the preset method comprises the steps of calculating the total data volume of each index and the dividing number of the storage area corresponding to the total data volume.
In a fifth aspect, the present disclosure also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, performs the following steps:
acquiring data of a data table to be associated;
constructing a data table index according to the data of the data table to be associated and a row key design rule, wherein a row key of the data table index comprises a common index field, a combined index row key comprising more than two index fields also comprises a unique index field corresponding to the data table to be associated, and the combined index row key is used for uniquely identifying the corresponding data table;
storing the data of the data table to be associated in the data table index;
constructing a cluster index and a cluster table according to the data table index;
and dividing the storage area of the clustered index by using a preset method, wherein the preset method comprises the steps of calculating the total data volume of each index and the dividing number of the storage area corresponding to the total data volume.
According to the data table association processing method, apparatus, computer device, storage medium and computer program product, a data table index comprising a combined index is constructed, and a cluster table and a cluster index are then constructed, so that an association relation that facilitates efficient association query is established among the data tables to be associated. In terms of data storage, the data of the data tables to be associated is stored in the data table index, and the storage area of the cluster index is divided by using a preset method, so that the data is stored in a form that facilitates efficient association query, which improves the association query efficiency of the data tables. In this scheme, through the design of the index row keys and the creation of a cluster table, user table data with a certain service association is loaded into one HBase table (HBase is a highly reliable, high-performance, column-oriented and scalable real-time distributed key-value storage system). Each user table has a similar row key, so a multi-table association query under certain conditions can be realized by querying a single table. The scheme suits massive service data and greatly optimizes multi-dimensional query scenarios.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a diagram of an application environment of the data table association processing method in one embodiment;
FIG. 2 is a flowchart illustrating a method for processing table associations according to an embodiment;
FIG. 3 is a block diagram showing the structure of a data table association processing apparatus according to an embodiment;
FIG. 4 is a diagram of the internal structure of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present disclosure more clearly understood, the present disclosure is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the disclosure and are not intended to limit the disclosure.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The data table association processing method provided by the embodiment of the disclosure can be applied to the application environment shown in fig. 1. Wherein the data storage system may store data that the server 104 needs to process. The data storage system may be integrated on the server 104 or may be placed on the cloud or other network server. The server 104 has a data acquisition end, and the data acquisition end can acquire data of the data table to be associated. The server 104 constructs a data table index according to the data of the data table to be associated and the row key design rule, wherein the row key of the data table index comprises a common index field, a combined index row key comprising more than two index fields also comprises a unique index field corresponding to the data table to be associated, and the combined index row key is used for uniquely identifying the corresponding data table. The server 104 stores the data of the data table to be associated in the data table index. The server 104 constructs a cluster index and a cluster table according to the data table index. The server 104 divides the storage area of the clustered index using a preset method including calculating a total data amount of each index and a divided number of storage areas corresponding to the total data amount. The server 104 may be implemented as a stand-alone server or as a server cluster comprised of multiple servers.
In one embodiment, as shown in fig. 2, a data table association processing method is provided, which is described by taking the application environment in fig. 1 as an example, and includes the following steps:
S202, obtaining data of the data table to be associated.
The data of the data table to be associated may refer to data contained in the data table that needs to be subjected to the data table association query.
Specifically, data of the data tables to be associated is obtained. This data includes the data tables to be associated themselves and the data they contain. The data tables to be associated may have different table forms.
S204, constructing a data table index according to the data of the data table to be associated and the row key design rule, wherein the row key of the data table index comprises a common index field, the combined index row key comprising more than two index fields also comprises a unique index field corresponding to the data table to be associated, and the combined index row key is used for uniquely identifying the corresponding data table.
The row key design rule may refer to a rule to be followed when designing the row key of an index during construction of a data table index, that is, a standard for designing row keys. The row key of the data table index may contain only one field or may contain a plurality of fields. The common index field may refer to a field that exists in different indexes at the same time. The unique index field may refer to a field that exists in only one index. The phrase "more than two" here means two or more.
Specifically, a row key (English name: Rowkey) is made up of the field or fields contained in the index. A data table index is constructed for each data table to be associated according to the data of that table, and the row key of the data table index is designed by using the row key design rule. The row key design rule can be determined according to the actual data table query requirement, so that association queries over the data table data can be carried out efficiently. It should be noted that carrying out association queries efficiently may mean, on the one hand, quickly achieving the query purpose in the associated data tables and, on the other hand, that data tables associated in the same manner can also satisfy the query requirements of different query scenarios, that is, the association has higher universality. The row key of a data table index may include only the common index field, or may include a unique index field along with the common index field. Each data table index contains row keys, and the row keys contained in a data table index may be referred to as index row keys. The index row keys may comprise a combined index row key comprising more than two index fields. The combined index row key comprises a common index field and a unique index field corresponding to the data table to be associated. The combined index row key is used to uniquely identify the corresponding data table.
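The row key composition described above can be sketched as follows. This is only an illustration under stated assumptions: the separator, the field names (`customer_id`, `account_no`, `order_no`) and the choice to place the common index field first are hypothetical and are not specified by the source.

```python
SEP = "|"  # hypothetical field separator, not taken from the source


def make_row_key(common_value, unique_field=None, unique_value=None):
    """Build a row key from the common index field and, optionally,
    a table-specific (unique) index field.

    Assumption: the common field comes first so that rows from different
    tables sharing the same common value sort next to each other.
    """
    if unique_field is None:
        # Row key that contains only the common index field.
        return common_value
    # Combined index row key: common field + unique field name + its value.
    return SEP.join([common_value, unique_field, unique_value])


# Two hypothetical user tables that share the common field "customer_id".
plain_key = make_row_key("C0001")
account_key = make_row_key("C0001", "account_no", "A17")
order_key = make_row_key("C0001", "order_no", "O42")
```

In this sketch the unique field name embedded in the key is what tells the rows of one table apart from those of another, while the shared prefix keeps related rows adjacent.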
S206, storing the data of the data table to be associated in the data table index.
Specifically, after the data table indexes are built, the data of the data tables to be associated are stored in the corresponding data table indexes, so that an association relation is established between the data tables to be associated, and the data tables to be associated become associated data tables.
S208, constructing a cluster index and a cluster table according to the data table index.
Wherein, constructing the clustered index may refer to making the position order of the index in the index structure completely consistent with the physical position order of the corresponding data in the clustered table.
Specifically, the cluster index may be constructed according to a certain index in the data table indexes. Further, a cluster index may be constructed according to a field included in a certain index in the data table index. All the data tables having an association relation with one cluster index constitute one cluster table.
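To illustrate why clustering rows from several tables under one sorted index helps association queries, the sketch below uses a toy in-memory sorted table. It is not the HBase API; the class name `ClusterTable`, the row key layout and the sample rows are assumptions made for the example.

```python
import bisect


class ClusterTable:
    """Toy stand-in for a single table sorted by row key (not HBase itself)."""

    def __init__(self):
        self._keys = []  # row keys, kept in sorted order
        self._rows = {}  # row key -> stored row

    def put(self, row_key, row):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
        self._rows[row_key] = row

    def scan_prefix(self, prefix):
        """Return (row_key, row) pairs whose key starts with prefix, in key order."""
        start = bisect.bisect_left(self._keys, prefix)
        result = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break  # keys are sorted, so no later key can match
            result.append((key, self._rows[key]))
        return result


table = ClusterTable()
# Rows from two hypothetical user tables, loaded into one cluster table.
table.put("C0002|order_no|O77", {"amount": 5})
table.put("C0001|account_no|A17", {"balance": 100})
table.put("C0001|order_no|O42", {"amount": 30})

# One sequential scan returns the rows of both tables for customer C0001.
hits = table.scan_prefix("C0001|")
```

Because the rows are physically ordered by row key, the cross-table lookup becomes a single range scan instead of one query per table.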
S210, dividing the storage area of the cluster index by using a preset method, wherein the preset method comprises calculating the total data volume of each index and the dividing number of the storage area corresponding to the total data volume.
Wherein, the total data amount of each index may refer to the sum of the data amounts of each index.
Specifically, the total data volume of each index may be calculated according to an existing or customized calculation rule, and the dividing number of the storage area corresponding to the total data volume may then be calculated from the total data volume and the data storage capacity of each sub-storage area. The storage area corresponding to the total data volume may be divided into a plurality of smaller storage areas, which may be referred to as sub-storage areas. The storage area of the cluster index is divided according to this dividing number. For example, when the dividing number of the storage area corresponding to the total data volume is N, the storage area of the cluster index is divided into N sub-storage areas. It should be noted that the total data volume does not need to be calculated with high precision and may be only an estimate, because a sub-storage area may still have a large amount of free storage space. A split point that separates the storage area is called a SplitKey. A sub-storage area is called a Region, and sub-storage areas enable distributed storage. The total data volume of each sub-storage area may have an upper limit, for example 10G (1G = 1024 megabytes).
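A minimal sketch of the division step, assuming the dividing number is simply the total estimated volume divided by the per-sub-area capacity and rounded up, and that split points are picked evenly from a sorted sample of row keys. Both choices, the function names and the sample sizes are assumptions for illustration; the source only states that the number is derived from the total data volume and the capacity of each sub-storage area.

```python
import math

GB = 1024 ** 3  # 1G = 1024 megabytes, as in the text


def region_count(index_sizes_bytes, region_capacity_bytes):
    """Estimate how many sub-storage areas are needed for all indexes together."""
    total = sum(index_sizes_bytes)
    return max(1, math.ceil(total / region_capacity_bytes))


def even_split_keys(sorted_row_keys, n_regions):
    """Pick n_regions - 1 split points spread evenly over a sorted key sample."""
    step = len(sorted_row_keys) / n_regions
    return [sorted_row_keys[int(i * step)] for i in range(1, n_regions)]


# Hypothetical estimated sizes: two larger indexes and one 512 MB secondary index.
sizes = [25 * GB, 4 * GB, 512 * 1024 ** 2]
n = region_count(sizes, 10 * GB)  # upper limit of 10G per sub-storage area

sample_keys = ["C%04d" % i for i in range(1, 10)]  # C0001 .. C0009
split_keys = even_split_keys(sample_keys, n)
```

Since the estimate only needs to be approximate, rounding up leaves headroom in each sub-storage area rather than risking an overfull one.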
In the data table association processing method, a data table index comprising a combined index is constructed, and a cluster table and a cluster index are then constructed, so that an association relation that facilitates efficient association query is established between the data tables to be associated. In terms of data storage, the data of the data tables to be associated is stored in the data table index, and the storage area of the cluster index is divided by using a preset method, so that the data is stored in a form that facilitates efficient association query, which improves the association query efficiency of the data tables. User table data with a certain service association is loaded into one HBase table, so that an association query over multiple data tables under certain conditions can be realized by querying a single table. The method not only improves the efficiency of data query, but also stores cross-table service data adjacently, so that a cross-table association query becomes a sequential read operation, which reduces data redundancy and the complexity of data preprocessing. By designing the indexes and the storage mode, the method greatly optimizes data table association query scenarios, suits service scenarios with multi-dimensional massive data, and makes data analysis more convenient for service personnel.
In one embodiment, the first data table index whose row key is a common index field includes a primary index and a secondary index, and the row key of the secondary index of the first data table index includes a unique index field of the first data table.
Specifically, a data table index having a row key as a common index field is referred to as a first data table index, and a data table corresponding to the first data table index is referred to as a first data table. When the row key of one data table index only contains the common index field, a secondary index is established for the corresponding data table. The data table index of which the row key only contains the common index field is called a main index of the corresponding data table, and the corresponding row key is called a row key of the main index. The row key of the secondary index comprises a unique index field of the first data table, and the row key of the secondary index further comprises the common index field, namely the row key of the secondary index is a combined index row key.
In the embodiment, for the data table indexes with row keys as common fields, by designing the secondary indexes, all the data table indexes can have the combined index row keys which can uniquely identify the corresponding data tables, so that the efficient query of the data in the data tables is facilitated.
In one embodiment, the row keys of the secondary index include at least two combined index row keys.
Specifically, the row keys of the secondary index may include two or more combined index row keys. Because the data table to be associated generally comprises at least two data tables, the number of the secondary indexes contained in the data table to be associated reaches two or more. In the plurality of secondary indexes, the row key of the secondary index may include two or more combined index row keys, or the row key of the secondary index may include only one combined index row key.
In this embodiment, the row keys of the secondary index include at least two combined index row keys, which can meet more query requirements, and is beneficial to querying the target data in the data table more quickly, thereby being beneficial to efficient association query of the data table.
In one embodiment, the constructing a cluster index and a cluster table according to the data table index includes: and constructing a cluster index according to the data table index with the least row keys in the cluster table.
Specifically, in the process of constructing the cluster index, the data table index with the least row keys in the cluster table is selected, and other indexes in the cluster table are clustered with the index to construct the cluster index.
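The selection step above can be sketched as picking the index whose row key has the fewest fields. The dictionary layout and the index and field names below are hypothetical; ties are resolved by taking the first candidate, which is also an assumption.

```python
def pick_cluster_index(index_row_keys):
    """Return the name of the index whose row key has the fewest fields.

    index_row_keys maps an index name to the list of fields in its row key.
    """
    return min(index_row_keys, key=lambda name: len(index_row_keys[name]))


# Hypothetical data table indexes inside one cluster table.
indexes = {
    "customer_primary": ["customer_id"],
    "account_index": ["customer_id", "account_no"],
    "order_index": ["customer_id", "order_no", "order_date"],
}

cluster_index = pick_cluster_index(indexes)
```

Choosing the shortest row key keeps the cluster index simple, and the other indexes in the cluster table are then clustered around it.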
In the embodiment, the cluster index is constructed according to the data table index with the least row keys in the cluster table, so that the beneficial effects of simplifying the cluster index and simplifying the retrieval field can be achieved, and the efficient association query of the data table is facilitated.
In one embodiment, the method further comprises: and storing main index data and data table data in the main index, and storing index row key data in the secondary index.
Specifically, main index data and user data are stored in the main index, so that in the query process, after the target main index of the query is determined, the position of the target data table data is quickly determined without jumping to other storage areas. The index row key data is stored in the secondary index, so that the centralized storage of the index row key data can be realized, the rapid retrieval is convenient, and the efficient associated query of the data table is favorably realized. It should be noted that, when there is no secondary index, the index row key data is stored in the primary index.
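The storage split described above can be sketched with two in-memory maps: the main (primary) index holds the full rows, and the secondary index holds only row key data that points back to the main index. The field names, the separator and the ordering of fields inside the secondary row key are assumptions for this example; the source does not fix the field order.

```python
primary_index = {}    # main index: primary row key -> full data table row
secondary_index = {}  # secondary index: secondary row key -> primary row key only


def put_row(row, common_field, unique_field):
    """Store the row in the main index and its key data in the secondary index."""
    pk = row[common_field]
    primary_index[pk] = row
    # Assumed layout: unique field name, its value, then the common field value.
    sk = "|".join([unique_field, row[unique_field], pk])
    secondary_index[sk] = pk


def find_by_unique(unique_field, value):
    """Resolve matching secondary row keys, then read rows from the main index."""
    prefix = unique_field + "|" + value + "|"
    matches = sorted(k for k in secondary_index if k.startswith(prefix))
    return [primary_index[secondary_index[k]] for k in matches]


put_row({"customer_id": "C0001", "id_card_no": "X123", "name": "user-a"},
        "customer_id", "id_card_no")
put_row({"customer_id": "C0002", "id_card_no": "X456", "name": "user-b"},
        "customer_id", "id_card_no")

found = find_by_unique("id_card_no", "X456")
```

The secondary index stays small because it carries no table data, and once the main index entry is located the row is read without jumping to another storage area.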
In one embodiment, the row key design rule comprises a query scenario rule, a field sorting rule and a unique identification rule.
Specifically, the query scenario rule may refer to determining which field the user's most valuable or most frequent query scenario is based on. The field sorting rule may refer to determining whether query results need to be sorted by a certain field. The unique identification rule may refer to determining whether a field selected as a row key is capable of uniquely identifying a row of records.
In this embodiment, the row key design rule is specified from the perspectives of the query scenario, field sorting and unique identification, so that a row key that facilitates efficient association query of the data tables can be designed.
In one embodiment, the calculating the total data amount for each index includes:
calculating the data size of the secondary index using a custom formula, the custom formula comprising:
SE=Y/Z×G
wherein SE represents the data volume of the secondary index, Y represents the data volume of the data table, Z represents the field number of the data table, and G represents the field number of the secondary index.
Specifically, the secondary index may include a plurality of fields, and the field number of the secondary index refers to the number of fields set as the secondary index. For example, suppose a data table has 10 million historical users (1000W, where W represents ten thousand), the secondary index of the data table has two fields, the new users over half a year are estimated at 10W (100 thousand), the data table has 20 fields, and each piece of data is 512 bytes. The secondary index data volume is then estimated as ((1000W + 10W) × 512) / 20 × 2 ≈ 517M bytes, or roughly 512M if the new users are neglected (M indicates mega, i.e., the sixth power of 10).
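The estimate can be reproduced with a few lines of code. The function name is hypothetical; the formula and the figures (10 million historical users, 100 thousand new users, 512 bytes per record, 20 table fields, 2 index fields) come from the example above, with Y taken as record count times record size.

```python
def secondary_index_size(table_bytes, table_fields, index_fields):
    """SE = Y / Z x G: the table's data volume scaled by the share of indexed fields."""
    return table_bytes / table_fields * index_fields


records = 10_000_000 + 100_000  # 1000W historical users + 10W estimated new users
table_bytes = records * 512     # Y: each record is 512 bytes
se = secondary_index_size(table_bytes, table_fields=20, index_fields=2)

# Same estimate when only the historical users are counted.
se_historical_only = secondary_index_size(10_000_000 * 512, 20, 2)
```

With these inputs `se` comes to about 517 million bytes, and the historical users alone give 512 million bytes, which matches the roughly 512M order of magnitude quoted in the text. Since the result only feeds the storage area division, this level of precision is sufficient.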
In this embodiment, specifying how the secondary index data amount is calculated makes it easy to compute the data amount of each index and to divide the storage areas accordingly.
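The estimate in the example above can be reproduced with a few lines of Python. This is a minimal sketch of the custom formula only; the function and variable names are illustrative and not part of the disclosure. With the example's inputs, the exact product is 517,120,000 bytes, which the text rounds to roughly 512M.

```python
def secondary_index_size(table_bytes: float, table_fields: int, index_fields: int) -> float:
    """Custom formula S_E = Y / Z * G.

    table_bytes  (Y): data size of the data table
    table_fields (Z): number of fields in the data table
    index_fields (G): number of fields in the secondary index
    """
    if table_fields <= 0:
        raise ValueError("table_fields must be positive")
    return table_bytes / table_fields * index_fields


# Inputs from the worked example: 10 million historical users plus
# 100,000 estimated new users, 512 bytes per record, 20 table fields,
# 2 secondary index fields.
ROWS = 10_000_000 + 100_000
ROW_BYTES = 512
Y = ROWS * ROW_BYTES
S_E = secondary_index_size(Y, table_fields=20, index_fields=2)

if __name__ == "__main__":
    print(f"table size Y = {Y:,} bytes")
    print(f"secondary index size S_E = {S_E:,.0f} bytes")
```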
In one embodiment, the method of the above embodiments is used to load business-related user table data into an HBase table (HBase is a highly reliable, high-performance, column-oriented, scalable, real-time distributed key-value storage system). Because the user tables have similar row keys, a multi-table associated query under certain conditions can be answered by querying a single table. This accommodates massive volumes of service data, considerably improves multi-dimensional query scenarios, and raises query efficiency.
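To make the single-table query idea concrete, the following sketch simulates a table whose rows are kept sorted by row key, as in a key-ordered store, and answers an associated query with one prefix scan. It uses no HBase API. The `|` separator, the fixed-width user IDs, the sample rows, and the `prefix_scan` helper are all assumptions made for illustration.

```python
import bisect
from typing import Dict, List, Tuple

Row = Tuple[str, str, Dict[str, str]]  # (row key, source table, column values)

# Rows from several user tables share the common index field (the user ID)
# as the leading part of the row key, so they sort next to each other.
CLUSTER: List[Row] = sorted(
    [
        ("u002|acct-4", "T1", {"balance": "80"}),
        ("u001", "T5", {"name": "Li"}),
        ("u001|loan-3", "T2", {"amount": "500"}),
        ("u002", "T5", {"name": "Tao"}),
        ("u001|acct-9", "T1", {"balance": "120"}),
    ],
    key=lambda row: row[0],
)


def prefix_scan(rows: List[Row], prefix: str) -> List[Row]:
    """Return all rows whose row key starts with `prefix`, using binary search.

    Assumes fixed-width IDs, so that "u001" is never a prefix of another
    user's ID such as "u0010".
    """
    keys = [row[0] for row in rows]
    lo = bisect.bisect_left(keys, prefix)
    hi = bisect.bisect_left(keys, prefix + "\uffff")
    return rows[lo:hi]


if __name__ == "__main__":
    for key, table, values in prefix_scan(CLUSTER, "u001"):
        print(key, table, values)
```

A single scan over the `u001` prefix returns that user's rows from T5, T1, and T2 together, which is the effect the embodiment attributes to similar row keys.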
In one embodiment, after the data of the data tables to be associated has been processed using the method, a general-purpose capability package for associating those data tables into one large table is provided, together with a matching data access API (Application Programming Interface).
In one embodiment, a data table association processing method is provided, the method comprising:
First, data of a data table to be associated is acquired, and a data table index is constructed according to that data and a row key design rule. The row key design rule comprises a query scenario rule, a field ordering rule, and a unique identification rule. A row key of the data table index comprises a common index field; a combined index row key comprising two or more index fields further comprises a unique index field corresponding to the data table to be associated, and the combined index row key is used to uniquely identify the corresponding data table. Among the data table indexes, a first data table index whose row key is the common index field comprises a primary index and a secondary index; the row key of the secondary index comprises at least two combined index row keys, and the row key of the secondary index of the first data table index comprises a unique index field of the first data table. Next, the data of the data table to be associated is stored in the data table index, and a cluster index and a cluster table are constructed according to the data table index, where the cluster index is the same as the data table index having the fewest row keys in the cluster table. Then the storage area of the cluster index is divided using a preset method, which comprises calculating the total data volume of each index and the number of storage-area divisions corresponding to that total data volume. Calculating the total data volume of each index comprises:
calculating the data size of the secondary index using a custom formula, the custom formula comprising:
$S_E = \frac{Y}{Z} \times G$
wherein $S_E$ represents the data size of the secondary index, Y represents the data size of the data table, Z represents the field number of the data table, and G represents the field number of the secondary index.
In one embodiment, the data amount of the primary (main) index is equal to the data amount of the corresponding data table, and the data amount of the cluster index is equal to the sum of the data amounts of all indexes under the cluster.
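The two sizing rules above, combined with the custom formula for the secondary index, can be sketched as follows. This is an illustration under stated assumptions rather than the disclosed procedure: the `TableSpec` type, the sample tables, and in particular the rule that the number of storage-area divisions is the total size divided by a target size per storage area (here 1 GiB) are hypothetical, and secondary indexes are assumed to count toward "all indexes under the cluster".

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class TableSpec:
    name: str
    table_bytes: int                 # Y: data size of the data table
    table_fields: int                # Z: number of fields in the data table
    secondary_index_fields: List[int] = field(default_factory=list)  # G per secondary index


def secondary_index_size(table_bytes: float, table_fields: int, index_fields: int) -> float:
    """S_E = Y / Z * G."""
    return table_bytes / table_fields * index_fields


def table_index_total(spec: TableSpec) -> float:
    """Primary index size equals the table size; each secondary index adds S_E."""
    secondary = sum(
        secondary_index_size(spec.table_bytes, spec.table_fields, g)
        for g in spec.secondary_index_fields
    )
    return spec.table_bytes + secondary


def cluster_index_total(specs: List[TableSpec]) -> float:
    """Cluster index size is the sum of the sizes of all indexes under the cluster."""
    return sum(table_index_total(s) for s in specs)


def division_count(total_bytes: float, target_bytes_per_area: int) -> int:
    """Assumed rule: one storage area per `target_bytes_per_area`, at least one."""
    return max(1, math.ceil(total_bytes / target_bytes_per_area))


# Hypothetical tables: A has one two-field secondary index, B has none.
SPECS = [
    TableSpec("A", 5_171_200_000, 20, [2]),
    TableSpec("B", 1_000_000_000, 10),
]
TOTAL = cluster_index_total(SPECS)
DIVISIONS = division_count(TOTAL, 1024 ** 3)

if __name__ == "__main__":
    print(f"cluster index total = {TOTAL:,.0f} bytes, divisions = {DIVISIONS}")
```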
In one embodiment, there are five tables: T1, T2, T3, T4, and T5. For the T1 table, the combined index row key is designed as (a1 + a2). For the T2 table, the combined index row key is designed as (a1 + b2). For the T3 table, the combined index row key is designed as (a1 + c2 + c3). For the T4 table, the combined index row key is designed as (a1 + d4). For the T5 table, the primary index row key is designed as a1, and two secondary index row keys are designed, namely (a1 + e1) and (a1 + e2 + e3 + e4). T1, T2, T3, T4, and T5 are clustered to obtain a cluster table, denoted T. The primary index a1 of the T5 table is selected as the cluster index, and the primary indexes of the other user tables are clustered with it to construct the cluster index. The division points are constructed according to a certain rule; for example, when the data table has 2 secondary index fields and the number of storage-area divisions of the T1 table is m1, the division points of the T1 table may be constructed as "001, 003, 005, ..., m1 × 2".
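The five-table example can be written down as data to show how the cluster index is picked. The sketch below is illustrative only: the dictionary layout and both helper functions are assumptions, and "fewest row keys" is interpreted here as the index whose row key has the fewest fields, which matches the selection of a1 from T5 in the example.

```python
from typing import Dict, List, Tuple

RowKey = Tuple[str, ...]

# Row key designs from the example: every key starts with the common field a1.
INDEXES: Dict[str, List[RowKey]] = {
    "T1": [("a1", "a2")],
    "T2": [("a1", "b2")],
    "T3": [("a1", "c2", "c3")],
    "T4": [("a1", "d4")],
    "T5": [("a1",), ("a1", "e1"), ("a1", "e2", "e3", "e4")],
}


def common_index_field(indexes: Dict[str, List[RowKey]]) -> str:
    """Return the leading field shared by every row key, or raise if there is none."""
    leading = {key[0] for keys in indexes.values() for key in keys}
    if len(leading) != 1:
        raise ValueError(f"row keys do not share one leading field: {sorted(leading)}")
    return leading.pop()


def pick_cluster_index(indexes: Dict[str, List[RowKey]]) -> Tuple[str, RowKey]:
    """Pick the (table, row key) pair whose row key has the fewest fields."""
    candidates = [(table, key) for table, keys in indexes.items() for key in keys]
    return min(candidates, key=lambda pair: len(pair[1]))


if __name__ == "__main__":
    print("common index field:", common_index_field(INDEXES))
    print("cluster index:", pick_cluster_index(INDEXES))
```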
It should be understood that, although the steps in the flowcharts related to the embodiments as described above are sequentially displayed as indicated by arrows, the steps are not necessarily performed sequentially as indicated by the arrows. The steps are not performed in the exact order shown and described, and may be performed in other orders, unless explicitly stated otherwise. Moreover, at least a part of the steps in the flowcharts related to the embodiments described above may include multiple steps or multiple stages, which are not necessarily performed at the same time, but may be performed at different times, and the execution order of the steps or stages is not necessarily sequential, but may be rotated or alternated with other steps or at least a part of the steps or stages in other steps.
Based on the same inventive concept, the embodiment of the present disclosure further provides a data table association processing apparatus for implementing the above-mentioned data table association processing method. The implementation scheme for solving the problem provided by the device is similar to the implementation scheme described in the method, so specific limitations in one or more embodiments of the data table association processing device provided below can refer to the limitations on the data table association processing method in the foregoing, and details are not described herein again.
Based on the description of the above embodiments of the data table association processing method, the present disclosure also provides a data table association processing apparatus. The apparatus may include systems (including distributed systems), software (applications), modules, components, servers, clients, and the like that use the methods described in the embodiments of the present specification, in conjunction with any necessary hardware for implementation. Based on the same inventive concept, the apparatus provided in one or more embodiments of the present disclosure is described in the following embodiments. Since the apparatus solves the problem in a manner similar to the method, the specific implementation of the apparatus in the embodiments of the present specification may refer to the implementation of the foregoing method, and repeated details are omitted. As used hereinafter, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the embodiments below is preferably implemented in software, an implementation in hardware or in a combination of software and hardware is also possible and contemplated.
In one embodiment, as shown in fig. 3, there is provided a data table association processing apparatus 300, including: a data acquisition module 302, a first construction module 304, a storage module 306, a second construction module 308, and a segmentation module 310, wherein:
The data acquisition module 302 is configured to acquire data of a data table to be associated.
The first construction module 304 is configured to construct a data table index according to the data of the data table to be associated and a row key design rule, where a row key of the data table index includes a common index field, a combined index row key including two or more index fields further includes a unique index field corresponding to the data table to be associated, and the combined index row key is used to uniquely identify the corresponding data table.
The storage module 306 is configured to store the data of the data table to be associated in the data table index.
The second construction module 308 is configured to construct a cluster index and a cluster table according to the data table index.
The segmentation module 310 is configured to divide the storage area of the cluster index using a preset method, where the preset method includes calculating the total data amount of each index and the number of storage-area divisions corresponding to the total data amount.
In one embodiment, the first construction module is configured to construct a first data table index whose row key is the common index field, where the first data table index includes a primary index and a secondary index, and the row key of the secondary index of the first data table index includes a unique index field of the first data table.
In one embodiment, the first construction module is configured to construct the secondary index with row keys comprising at least two combined index row keys.
In one embodiment, the second construction module is configured to construct the cluster index according to the data table index with the fewest row keys in the cluster table.
In one embodiment, the row key design rule used by the first construction module comprises a query scenario rule, a field ordering rule, and a unique identification rule.
In one embodiment, the segmentation module is configured to calculate the data amount of the secondary index using a custom formula, the custom formula including:
$S_E = \frac{Y}{Z} \times G$
wherein $S_E$ represents the data volume of the secondary index, Y represents the data volume of the data table, Z represents the field number of the data table, and G represents the field number of the secondary index.
The modules in the data table association processing device can be wholly or partially realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 4. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing data of the data table and data related to the data table. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a data table association processing method.
Those skilled in the art will appreciate that the configuration shown in fig. 4 is a block diagram of only the portion of the configuration associated with the disclosed aspects and does not limit the computing device to which the disclosed aspects apply; a particular computing device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
In one embodiment, a computer device is further provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
acquiring data of a data table to be associated;
constructing a data table index according to the data of the data table to be associated and a row key design rule, wherein a row key of the data table index comprises a common index field, a combined index row key comprising two or more index fields further comprises a unique index field corresponding to the data table to be associated, and the combined index row key is used for uniquely identifying the corresponding data table;
storing the data of the data table to be associated in the data table index;
constructing a cluster index and a cluster table according to the data table index;
and dividing the storage area of the cluster index by using a preset method, wherein the preset method comprises calculating the total data volume of each index and the number of storage-area divisions corresponding to the total data volume.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
the first data table index of which the row key is a common index field comprises a primary index and a secondary index, and the row key of the secondary index of the first data table index comprises a unique index field of the first data table.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
the row keys of the secondary index comprise at least two combined index row keys.
In one embodiment, the processor when executing the computer program further performs the steps of:
the constructing of the cluster index and the cluster table according to the data table index comprises: constructing the cluster index according to the data table index with the fewest row keys in the cluster table.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
the calculating the total data volume of each index comprises:
calculating the data volume of the secondary index using a custom formula, the custom formula comprising:
$S_E = \frac{Y}{Z} \times G$
wherein $S_E$ represents the data size of the secondary index, Y represents the data size of the data table, Z represents the field number of the data table, and G represents the field number of the secondary index.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
acquiring data of a data table to be associated;
constructing a data table index according to the data of the data table to be associated and a row key design rule, wherein a row key of the data table index comprises a common index field, a combined index row key comprising two or more index fields further comprises a unique index field corresponding to the data table to be associated, and the combined index row key is used for uniquely identifying the corresponding data table;
storing the data of the data table to be associated in the data table index;
constructing a cluster index and a cluster table according to the data table index;
and dividing the storage area of the cluster index by using a preset method, wherein the preset method comprises calculating the total data volume of each index and the number of storage-area divisions corresponding to the total data volume.
In one embodiment, the computer program when executed by the processor further performs the steps of:
the first data table index of which the row key is a common index field comprises a primary index and a secondary index, and the row key of the secondary index of the first data table index comprises a unique index field of the first data table.
In one embodiment, the computer program when executed by the processor further performs the steps of:
the row keys of the secondary index comprise at least two combined index row keys.
In one embodiment, the computer program when executed by the processor further performs the steps of:
the constructing of the cluster index and the cluster table according to the data table index comprises: constructing the cluster index according to the data table index with the fewest row keys in the cluster table.
In one embodiment, the computer program when executed by the processor further performs the steps of:
the calculating the total data volume of each index comprises:
calculating the data size of the secondary index using a custom formula, the custom formula comprising:
$S_E = \frac{Y}{Z} \times G$
wherein $S_E$ represents the data size of the secondary index, Y represents the data size of the data table, Z represents the field number of the data table, and G represents the field number of the secondary index.
In an embodiment, a computer program product is provided, comprising a computer program which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
In one embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, performs the steps of:
acquiring data of a data table to be associated;
constructing a data table index according to the data of the data table to be associated and a row key design rule, wherein a row key of the data table index comprises a common index field, a combined index row key comprising two or more index fields further comprises a unique index field corresponding to the data table to be associated, and the combined index row key is used for uniquely identifying the corresponding data table;
storing the data of the data table to be associated in the data table index;
constructing a cluster index and a cluster table according to the data table index;
and dividing the storage area of the cluster index by using a preset method, wherein the preset method comprises calculating the total data volume of each index and the number of storage-area divisions corresponding to the total data volume.
It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in the present disclosure are information and data that are authorized by the user or sufficiently authorized by each party.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, databases, or other media used in the embodiments provided by the present disclosure may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetic Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. Volatile memory can include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases involved in the embodiments provided by the present disclosure may include at least one of relational and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database and the like. The processors referred to in the embodiments provided in this disclosure may be general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic, quantum-computing-based data processing logic, and the like, without limitation.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above embodiments express only several implementations of the present disclosure, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the present disclosure. It should be noted that those skilled in the art can make various changes and modifications without departing from the concept of the present disclosure, and these changes and modifications all fall within the scope of the present disclosure. Therefore, the protection scope of the present disclosure should be subject to the appended claims.

Claims (11)

1. A data table association processing method is characterized by comprising the following steps:
acquiring data of a data table to be associated;
constructing a data table index according to the data of the data table to be associated and a row key design rule, wherein a row key of the data table index comprises a common index field, a combined index row key comprising two or more index fields further comprises a unique index field corresponding to the data table to be associated, and the combined index row key is used for uniquely identifying the corresponding data table;
storing the data of the data table to be associated in the data table index;
constructing a cluster index and a cluster table according to the data table index;
and dividing the storage area of the cluster index by using a preset method, wherein the preset method comprises calculating the total data volume of each index and the number of storage-area divisions corresponding to the total data volume.
2. The method of claim 1, wherein the first data table index whose row key is a common index field comprises a primary index and a secondary index, and wherein the row key of the secondary index of the first data table index comprises a unique index field of the first data table.
3. The method of claim 2, wherein the row keys of the secondary index comprise at least two combined index row keys.
4. The method of claim 1, wherein constructing a cluster index and a cluster table according to the data table index comprises: constructing the cluster index according to the data table index with the fewest row keys in the cluster table.
5. The method of claim 1, wherein the row key design rule comprises a query scenario rule, a field ordering rule, and a unique identification rule.
6. The method of claim 2, wherein calculating the total amount of data for each index comprises:
calculating the data size of the secondary index using a custom formula, the custom formula comprising:
$S_E = \frac{Y}{Z} \times G$
wherein $S_E$ represents the data size of the secondary index, Y represents the data size of the data table, Z represents the field number of the data table, and G represents the field number of the secondary index.
7. A data table association processing apparatus, characterized in that the apparatus comprises:
the data acquisition module is used for acquiring data of a data table to be associated;
the first construction module is used for constructing a data table index according to the data of the data table to be associated and a row key design rule, wherein a row key of the data table index comprises a common index field, a combined index row key comprising two or more index fields further comprises a unique index field corresponding to the data table to be associated, and the combined index row key is used for uniquely identifying the corresponding data table;
the storage module is used for storing the data of the data table to be associated into the data table index;
the second construction module is used for constructing a cluster index and a cluster table according to the data table index;
and the segmentation module is used for dividing the storage area of the cluster index by using a preset method, wherein the preset method comprises calculating the total data volume of each index and the number of storage-area divisions corresponding to the total data volume.
8. The apparatus of claim 7, wherein the first construction module is configured to construct a first data table index having row keys that are common index fields, wherein the first data table index comprises a primary index and a secondary index, and wherein the row key of the secondary index of the first data table index comprises a unique index field of the first data table.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 6.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.
11. A computer program product comprising a computer program, characterized in that the computer program realizes the steps of the method of any one of claims 1 to 6 when executed by a processor.
CN202210302257.7A 2022-03-25 2022-03-25 Data table association processing method and device, computer equipment and storage medium Pending CN114741413A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210302257.7A CN114741413A (en) 2022-03-25 2022-03-25 Data table association processing method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210302257.7A CN114741413A (en) 2022-03-25 2022-03-25 Data table association processing method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114741413A true CN114741413A (en) 2022-07-12

Family

ID=82276845

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210302257.7A Pending CN114741413A (en) 2022-03-25 2022-03-25 Data table association processing method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114741413A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination