CN111694891B - Data table processing method and device - Google Patents

Info

Publication number
CN111694891B
CN111694891B CN201910184764.3A CN201910184764A CN111694891B CN 111694891 B CN111694891 B CN 111694891B CN 201910184764 A CN201910184764 A CN 201910184764A CN 111694891 B CN111694891 B CN 111694891B
Authority
CN
China
Prior art keywords
data table
combination
combinations
association
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910184764.3A
Other languages
Chinese (zh)
Other versions
CN111694891A (en)
Inventor
杨帆
王能
冯仕炳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mashang Xiaofei Finance Co Ltd
Original Assignee
Mashang Xiaofei Finance Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mashang Xiaofei Finance Co Ltd
Priority to CN201910184764.3A
Publication of CN111694891A
Application granted
Publication of CN111694891B
Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data table processing method and a device, wherein the method comprises the following steps: according to the collected N associated query statements, counting the first association times of each data table combination in the M data table combinations; the data table combination comprises at least two data tables, and the first association times are the times of association query of the at least two data tables in the data table combination; and determining a first candidate data table combination according to the first association times of each data table combination in the M data table combinations, wherein the first candidate data table combination is used for generating a first wide table. The data table processing method provided by the invention can improve the accuracy of the selected data table for generating the wide table, further avoid generating more redundant wide tables and improve the coverage of the generated wide table to the associated query.

Description

Data table processing method and device
Technical Field
The present invention relates to the field of information processing technologies, and in particular, to a method and an apparatus for processing a data table.
Background
In data processing (for example, data mining), in order to improve computational efficiency and query convenience, data tables designed according to the three normal forms (3NF) are usually joined with redundancy and converted into wide tables, and calculations, queries, and the like are then performed on the wide tables.
Currently, it is common to manually select which tables and which fields are associated to make a wide table based on experience. However, because the number of data tables is usually large and the business relationship is complex, manual selection of tables based on experience easily results in generation of a large number of redundant wide tables, waste of storage space, or poor coverage of the generated wide tables for associated queries.
Therefore, the problem that the accuracy of selecting the data table for generating the wide table is poor exists in the prior art.
Disclosure of Invention
The embodiment of the invention provides a data table processing method and device, and aims to solve the problem of poor accuracy of selecting a data table for generating a wide table.
In order to solve the technical problem, the invention is realized as follows:
in a first aspect, an embodiment of the present invention provides a data table processing method. The method comprises the following steps:
according to the collected N associated query statements, counting the first association times of each data table combination in the M data table combinations; the data table combination comprises at least two data tables, and the first association times are the times of association query of the at least two data tables in the data table combination;
and determining a first candidate data table combination according to the first association times of each data table combination in the M data table combinations, wherein the first candidate data table combination is used for generating a first wide table.
In a second aspect, an embodiment of the present invention further provides a data table processing apparatus. The data table processing apparatus includes:
the statistical module is used for counting the first association times of each data table combination in the M data table combinations according to the collected N associated query statements; the data table combination comprises at least two data tables, and the first association times are the times of association query of the at least two data tables in the data table combination;
a first determining module, configured to determine a first candidate data table combination according to the first association times of each data table combination of the M data table combinations, where the first candidate data table combination is used to generate a first wide table.
In a third aspect, an embodiment of the present invention further provides a data table processing apparatus, including a processor, a memory, and a computer program stored on the memory and executable on the processor, where the computer program, when executed by the processor, implements the steps of the data table processing method described above.
In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements the steps of the data table processing method described above.
In the embodiment of the invention, according to the collected N associated query statements, the first association times of each data table combination in M data table combinations are counted; the data table combination comprises at least two data tables, and the first association times are the times of association query of the at least two data tables in the data table combination; and a first candidate data table combination is determined according to the first association times of each data table combination in the M data table combinations, wherein the first candidate data table combination is used for generating a first wide table. Because the data tables used for generating the wide table are selected based on the associated query times of each data table combination, the accuracy of the selected data tables can be improved, the generation of redundant wide tables can be reduced, and the coverage of the generated wide table on the associated queries can be improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive exercise.
FIG. 1 is a flow chart of a method for processing a data table according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method for processing a data table according to another embodiment of the invention;
FIG. 3 is a block diagram of a data table processing apparatus according to an embodiment of the present invention;
FIG. 4 is a block diagram of a data table processing apparatus according to still another embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides a data table processing method. Referring to fig. 1, fig. 1 is a flowchart of a data table processing method according to an embodiment of the present invention, and as shown in fig. 1, the method includes the following steps:
step 101, according to the collected N associated query statements, counting the first association times of each data table combination in the M data table combinations; and the first association times are the times of association query of the at least two data tables in the data table combination.
In this embodiment, an associated query statement may refer to a statement that performs an associated query on at least two data tables, for example, "select * from t1 join t2 on t1.id = t2.id", "select * from t1 join t2 on t1.id = t2.id join t3 on t2.id = t3.id", and so on.
Specifically, the N associated query statements may be associated query statements that are historically associated with data tables in a certain database or a business system, for example, all of the associated query statements that are historically associated with data tables in a certain database or a business system, or associated query statements that are historically associated with data tables in a certain database or a business system within a preset time period, where the preset time period may be reasonably set according to actual needs, for example, 1 month, 10 days, and the like.
In practical applications, the associated query statements that users, application programs, and the like execute against the data tables in a certain database or business system can be collected, for example, associated query statements based on Structured Query Language (SQL), so that the association between data tables can be analyzed based on the collected associated query statements.
For example, based on the associated query statement "select * from t1 join t2 on t1.id = t2.id", the association between the two data tables t1 and t2 can be obtained by analysis; based on the associated query statement "select * from t1 join t2 on t1.id = t2.id join t3 on t2.id = t3.id", the association among the three data tables t1, t2 and t3 can be obtained by analysis.
In this step, M data table combinations may be obtained through parsing based on the collected N associated query statements, and the number of times of associated query of the at least two data tables in each data table combination of the M data table combinations may be counted. It should be noted that any two data table combinations in the M data table combinations are different from each other.
A data table combination can be composed of at least two data tables that are queried together in an associated query statement. For example, for the associated query statement "select * from t1 join t2 on t1.id = t2.id", a data table combination consisting of data table t1 and data table t2 may be obtained; for the associated query statement "select * from t1 join t2 on t1.id = t2.id join t3 on t2.id = t3.id", either a single data table combination consisting of data table t1, data table t2, and data table t3, or a first data table combination consisting of data table t1 and data table t2 together with a second data table combination consisting of data table t2 and data table t3, can be obtained.
Optionally, the N associated query statements may be traversed, a data table combination corresponding to each associated query statement is obtained through analysis, and the number of the same data table combination in the obtained data table combination is counted as the first association frequency of the data table combination. For example, if there are 10 data table combinations a composed of the data table t1 and the data table t2 and 18 data table combinations B composed of the data table t2 and the data table t3 in the obtained data table combinations, the first association number of the data table combination a is 10, and the first association number of the data table combination B is 18.
It should be noted that, in this embodiment, each data table combination and the first association number thereof may be stored in a storage structure of a key value pair, where the data table combination is used as a key and the first association number thereof is used as a value.
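The counting in step 101 can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the regular expression `TABLE_RE` and the helper names `tables_in` and `count_first_association_times` are hypothetical, and the parsing assumes simple statements in which every table name directly follows `from` or `join`. Each data table combination is stored as a key (a `frozenset` of table names) and its first association times as the value.

```python
import re
from collections import Counter

# Hypothetical pattern: captures the table name that follows FROM or JOIN.
# It only handles simple statements without subqueries, aliases, or schemas.
TABLE_RE = re.compile(r"\b(?:from|join)\s+([A-Za-z_]\w*)", re.IGNORECASE)


def tables_in(statement):
    """Return the set of tables referenced by one associated query statement."""
    return frozenset(TABLE_RE.findall(statement))


def count_first_association_times(statements):
    """Map each data table combination (key) to its first association times (value)."""
    counts = Counter()
    for stmt in statements:
        combo = tables_in(stmt)
        if len(combo) >= 2:  # a combination needs at least two data tables
            counts[combo] += 1
    return counts


statements = [
    "select * from t1 join t2 on t1.id = t2.id",
    "select * from t1 join t2 on t1.id = t2.id",
    "select * from t2 join t3 on t2.id = t3.id",
]
counts = count_first_association_times(statements)
```

Here each statement contributes one combination made of all the tables it touches; splitting a statement into two-table combinations is an equally valid reading of the text.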
Step 102, determining a first candidate data table combination according to the first association times of each data table combination in the M data table combinations, wherein the first candidate data table combination is used for generating a first wide table.
In this embodiment, the data table combination with the largest first association times among the M data table combinations may be determined as the first candidate data table combination; or the data table combination with the largest first association times and the smallest number of included data tables may be determined as the first candidate data table combination; or the data table combination with the largest weighted value of the first association times and the occupied storage space may be determined as the first candidate data table combination, and so on.
It should be noted that the first candidate data table combination may include one candidate data table combination or at least two candidate data table combinations. When the first candidate data table combination includes only one candidate data table combination, the wide table is generated based on that candidate data table combination; when the first candidate data table combination includes at least two candidate data table combinations, a wide table may be generated based on each candidate data table combination, or one candidate data table combination may be selected from the at least two candidate data table combinations to generate the wide table.
According to the data table processing method, the first association times of each data table combination in M data table combinations are counted according to the collected N associated query statements; the data table combination comprises at least two data tables, and the first association times are the times of association query of the at least two data tables in the data table combination; and a first candidate data table combination is determined according to the first association times of each data table combination in the M data table combinations, wherein the first candidate data table combination is used for generating a first wide table. Because the data tables used for generating the wide table are selected based on the associated query times of each data table combination, the accuracy of the selected data tables can be improved, the generation of redundant wide tables can be reduced, and the coverage of the generated wide table on the associated queries can be improved.
Optionally, the first candidate data table combination is: the data table combination with the largest first association times among the M data table combinations.
In this embodiment, the data table combination with the largest first association frequency among the M data table combinations may be determined as the first candidate data table combination, which is not only simpler to implement, but also can improve the accuracy of the selected data table for generating the wide table.
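As a minimal sketch of this selection, assuming the counts are already available as a mapping from combination to first association times, the candidates are simply the keys that share the largest value. The function name `first_candidates` is hypothetical; the sample counts reuse the figures 10 and 18 from the example above.

```python
def first_candidates(counts):
    """Return every combination whose first association times are the largest."""
    if not counts:
        return []
    best = max(counts.values())
    # More than one combination may share the largest value.
    return [combo for combo, n in counts.items() if n == best]


counts = {
    frozenset({"t1", "t2"}): 10,  # data table combination A
    frozenset({"t2", "t3"}): 18,  # data table combination B
}
candidates = first_candidates(counts)
```

Returning a list rather than a single key keeps the case of at least two candidate data table combinations visible to the caller.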
Optionally, each of the M data table combinations includes two data tables.
In this embodiment, the number of times of association query between every two data tables may be counted, and the two data tables with the largest number of times of association query may be obtained by arranging the number of times of association query in order from large to small, so as to generate the wide table.
In practical applications, each associated query statement may be analyzed to obtain the corresponding data table combinations. For example, for the associated query statement "select * from t1 join t2 on t1.id = t2.id join t3 on t2.id = t3.id", a data table combination composed of data table t1 and data table t2 and a data table combination composed of data table t2 and data table t3 may be obtained; for the associated query statement "select * from t2 join t4 on t2.id = t4.id join t3 on t2.id = t3.id", a data table combination composed of data table t2 and data table t4 and a data table combination composed of data table t2 and data table t3 can be obtained.
It should be noted that, in this embodiment, the data table combinations corresponding to each associated query statement may be determined based on the associated fields in the associated query statement. For example, for "select * from t1 join t2 on t1.id = t2.id join t3 on t2.id = t3.id", a data table combination composed of data table t1 and data table t2 and a data table combination composed of data table t2 and data table t3 may be obtained; for "select * from t1 join t2 on t1.id = t2.id join t3 on t1.id = t3.id", a data table combination composed of data table t1 and data table t2 and a data table combination composed of data table t1 and data table t3 can be obtained.
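Deriving two-table combinations from the associated fields can be sketched as below. The pattern `ON_RE` and the function `pair_combinations` are hypothetical names, and the sketch assumes that every join condition is a plain equality of the form `a.col = b.col`; it would also pick up equalities in a WHERE clause, which a real parser would need to distinguish.

```python
import re

# Hypothetical pattern for equality conditions such as "t1.id = t2.id".
ON_RE = re.compile(r"\b(\w+)\.\w+\s*=\s*(\w+)\.\w+")


def pair_combinations(statement):
    """Return the two-table combinations implied by the associated fields."""
    return {frozenset((a, b)) for a, b in ON_RE.findall(statement) if a != b}


s1 = "select * from t1 join t2 on t1.id = t2.id join t3 on t2.id = t3.id"
s2 = "select * from t1 join t2 on t1.id = t2.id join t3 on t1.id = t3.id"
pairs1 = pair_combinations(s1)
pairs2 = pair_combinations(s2)
```

The two statements join the same three tables, yet the second yields the pair (t1, t3) instead of (t2, t3) because t3 is joined on a field of t1.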
In the embodiment of the invention, each data table combination in the M data table combinations comprises two data tables, so that the method is simple and convenient to implement, the accuracy of the selected data table for generating the wide table can be improved, the generation of more redundant wide tables is avoided, and the coverage of the generated wide table on the associated query is improved.
Optionally, the determining a first candidate data table combination according to the first association frequency of each data table combination in the M data table combinations may include:
calculating the second association times of each data table combination in the P data table combinations according to the first association times of each data table combination in the M data table combinations; the P data table combinations at least comprise the M data table combinations, and the second association times are reduced association times under the condition that the wide table is generated based on the data table combinations;
and determining the data table combination with the largest second association times in the P data table combinations as the first candidate data table combination.
In this embodiment, the P data table combinations may only include the M data table combinations, or may include any combination of data tables included in the M data table combinations. The second number of associations is a reduced number of associations under the condition that the wide table is generated based on the data table combination, for example, if the number of associations that can be reduced is 7 when the wide table is generated based on the data table combination a1, the second number of associations of the data table combination a1 is 7.
The present embodiment is described below by way of examples:
For example, the M data table combinations include data table combination A1, data table combination A2, and data table combination A3, and the corresponding first association times are 4, 3, and 2 in sequence, where data table combination A1 includes data table t1 and data table t2, data table combination A2 includes data table t1, data table t2, and data table t3, and data table combination A3 includes data table t1 and data table t3.
In the case that the P data table combinations only include the M data table combinations, the P data table combinations include only data table combination A1, data table combination A2 and data table combination A3. At this time, if the wide table is generated based on data table combination A1, the number of associations that can be reduced is 7 (i.e., 4 + 3); if a wide table is generated based on data table combination A2, the number of associations that can be reduced is 6 (i.e., 3 × 2); if a wide table is generated based on data table combination A3, the number of associations that can be reduced is 5 (i.e., 3 + 2). Thus, data table combination A1 can be selected to generate a wide table, so that the number of associations is reduced to the greatest extent.
In the case that the P data table combinations include any combination of the data tables included in the M data table combinations, the P data table combinations may include data table combination A1, data table combination A2, data table combination A3 and data table combination A4, where data table combination A4 includes data table t2 and data table t3. At this time, if the wide table is generated based on data table combination A1, the number of associations that can be reduced is 7 (i.e., 4 + 3); if a wide table is generated based on data table combination A2, the number of associations that can be reduced is 6 (i.e., 3 × 2); if a wide table is generated based on data table combination A3, the number of associations that can be reduced is 5 (i.e., 3 + 2); if a wide table is generated based on data table combination A4, the number of associations that can be reduced is 3. Thus, data table combination A1 can be selected to generate a wide table, so that the number of associations is reduced to the greatest extent.
The embodiment of the invention can further improve the accuracy of the selected data table for generating the wide table, avoid generating more redundant wide tables and improve the coverage of the generated wide table to the association query by respectively calculating the association times which can be reduced when the wide table is generated based on each data table combination and selecting the data table combination with the maximum reduced association times to generate the wide table.
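The second association times can be sketched under one assumed counting rule, which the text does not state explicitly: a wide table built from k data tables saves k − 1 associations in every collected query whose tables include all k of them. The function name `second_association_times` is hypothetical; the first association times are those of the example above.

```python
def second_association_times(combo, first_times):
    """Associations saved if `combo` is materialized as one wide table.

    Assumed rule: each query that touches every table of `combo`
    saves len(combo) - 1 associations.
    """
    saved_per_query = len(combo) - 1
    return sum(
        n * saved_per_query
        for tables, n in first_times.items()
        if combo <= tables  # the query uses all tables of the combination
    )


first_times = {
    frozenset({"t1", "t2"}): 4,        # A1
    frozenset({"t1", "t2", "t3"}): 3,  # A2
    frozenset({"t1", "t3"}): 2,        # A3
}
A1 = frozenset({"t1", "t2"})
A4 = frozenset({"t2", "t3"})  # extra combination built from the same tables

candidates = list(first_times) + [A4]
scores = {c: second_association_times(c, first_times) for c in candidates}
best = max(scores, key=scores.get)
```

Under this rule A1 scores 4 + 3 = 7 and A4 scores 3, so A1 is selected, in line with the example.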
Optionally, after determining a first candidate data table combination according to the first association times of each data table combination in the M data table combinations, the method may further include:
updating M-1 data table combinations according to the first candidate data table combination; wherein the M-1 data table combinations are data table combinations of the M data table combinations except the first candidate data table combination;
calculating the third association times of each data table combination in the Q data table combinations according to the first association times of each data table combination in the updated M-1 data table combinations; wherein the Q data table combinations at least include the updated M-1 data table combinations, and the third association times are association times that are reduced under the condition that a wide table is generated based on the data table combinations;
and determining the data table combination with the maximum third association times in the Q data table combinations as a second candidate data table combination, wherein the second candidate data table combination is used for generating a second wide table.
In this embodiment, among the M-1 data table combinations, each target data table combination that includes all the data tables of the first candidate data table combination may be updated. Specifically, the sub-combination of the target data table combination that consists of all the data tables of the candidate data table combination may be replaced by the wide table generated from those data tables.
For example, the M data table combinations include data table combination A1 (i.e., data table t1 and data table t2), data table combination A2 (i.e., data table t1, data table t2 and data table t3), and data table combination A3 (i.e., data table t1 and data table t3). If the first candidate data table combination is data table combination A1, data table combination A2 is updated to include data table t1-t2 (i.e., the wide table generated from data table t1 and data table t2) and data table t3, and data table combination A3 is unchanged.
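The update can be sketched with set operations, assuming each combination is held as a `frozenset` of table names. The function name `update_combination` and the wide table name "t1-t2" are illustrative choices only.

```python
def update_combination(combo, candidate, wide_name):
    """Replace the candidate's tables inside `combo` with the wide table name."""
    if candidate <= combo:  # combo contains every table of the candidate
        return (combo - candidate) | {wide_name}
    return combo  # combinations that lack some candidate table stay unchanged


A1 = frozenset({"t1", "t2"})
A2 = frozenset({"t1", "t2", "t3"})
A3 = frozenset({"t1", "t3"})

updated_A2 = update_combination(A2, A1, "t1-t2")
updated_A3 = update_combination(A3, A1, "t1-t2")
```

A3 is left alone because it contains t1 but not t2, so the wide table t1-t2 cannot stand in for any part of it.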
The Q data table combinations may include only the M-1 data table combinations, or may include any combination of data tables included in the M-1 data table combinations.
The present embodiment is described below by way of examples:
for example, the updated M-1 data table combinations include data table combination A2 (i.e., data tables t1-t2 and t3) and data table combination A3 (i.e., data table t1 and t3), where the first association number of data table combination A2 is 3 and the first association number of data table combination A3 is 2.
In the case that the Q data table combinations only include the updated M-1 data table combinations, the Q data table combinations include only data table combination A2 and data table combination A3. At this time, the number of associations that can be reduced when the wide table is generated based on data table combination A2 is 3, and the number of associations that can be reduced when the wide table is generated based on data table combination A3 is 2. Thus, data table combination A2 can be selected to generate a wide table, so that the number of associations is reduced to the greatest extent.
It should be noted that, in the embodiment of the present invention, after a candidate data table combination is obtained each time, the remaining data table combinations are updated based on the obtained candidate data table combination, and the association times that can be reduced under the condition of generating the wide table based on the data table combination are calculated, so as to obtain the candidate data table combination for generating the wide table until a preset condition is reached, for example, the size of the generated wide table reaches the size of the preset storage space, or the number of the generated wide tables reaches the preset number, and the like.
It should be noted that different combinations of candidate data tables are used to generate different wide tables, for example, a first combination of candidate data tables is used to generate a first wide table, and a second combination of candidate data tables is used to generate a second wide table. According to the embodiment of the invention, the M-1 data table combinations are updated through the first candidate data table combination, and the association times which can be reduced when each data table combination in the updated M-1 data table combinations is used as the wide table are recalculated, so that the accuracy of the selected second candidate data table combination can be improved, and the generation of more redundant wide tables is avoided.
Optionally, after determining a first candidate data table combination according to the first association times of each data table combination in the M data table combinations, the method may further include:
under the condition that the first candidate data table combination comprises at least two candidate data table combinations, acquiring a target candidate data table combination which occupies the smallest storage space in the at least two candidate data table combinations; wherein the target candidate data table combination is used to generate the first wide table.
In this embodiment, when the first candidate data table combination includes at least two candidate data table combinations, the candidate data table combination (i.e., the target candidate data table combination) that occupies the smallest storage space may be selected to generate the wide table, so that the storage space may be saved.
It should be noted that the storage space occupied by the candidate data table combination may be the sum of storage spaces occupied by all data tables in the candidate data table combination, or may also be a storage space occupied by a wide table generated by the candidate data table combination, which is not limited in this embodiment.
Optionally, the target candidate data table combination may be: and the generated wide table in the at least two candidate data table combinations occupies the candidate data table combination with the minimum storage space.
In this embodiment, a wide table may be generated based on each of the at least two candidate data table combinations. For example, the wide table a1 is generated based on the candidate data table combination a1, and the wide table a2 is generated based on the candidate data table combination a2; if the storage space occupied by the wide table a1 is smaller than that occupied by the wide table a2, the target candidate data table combination is the candidate data table combination a1.
The embodiment of the invention generates the wide table by the candidate data table combination with the minimum storage space occupied by the wide table generated in at least two candidate data table combinations, and can save the storage space.
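A minimal sketch of this tie-break, assuming the storage space of the wide table generated from each candidate has already been measured. The function name `pick_target` and the sizes below are made-up illustrative values, not figures from the document.

```python
def pick_target(candidates, wide_table_size):
    """Return the candidate whose generated wide table occupies the least space."""
    return min(candidates, key=lambda combo: wide_table_size[combo])


a1 = frozenset({"t1", "t2"})
a2 = frozenset({"t2", "t3"})

# Hypothetical sizes (in MB) of the wide tables generated from each candidate.
wide_table_size = {a1: 120, a2: 200}
target = pick_target([a1, a2], wide_table_size)
```

If the sum of the sizes of the source data tables is used as the measure instead, only the contents of `wide_table_size` change.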
The following describes embodiments of the present invention with reference to examples:
referring to fig. 2, the data table processing method provided by the embodiment of the present invention includes the following steps:
step 201, generating key-value pairs by analyzing each associated query statement.
In this embodiment, each association query statement may be analyzed to obtain a data table combination and association query times of the association query. The data table combination can be used as a key, and the associated query times are values.
It should be noted that, for each associated query statement, the value of the corresponding key-value pair is 1.
Step 202, summing the values of identical keys.
In this step, the values of the key-value pairs that share the same key are summed. For example, if the keys of the first key-value pair and the second key-value pair are both the data table combination A, the value of the first key-value pair and the value of the second key-value pair may be added to obtain the value corresponding to the key (i.e., the data table combination A).
Step 203, selecting the key corresponding to the maximum value as the candidate data table combination.
In this embodiment, the key corresponding to the largest value may be selected as the candidate data table combination. For example, if the value corresponding to the data table combination a is the largest, the data table combination a is taken as the candidate data table combination, so as to generate the wide table by the data tables in the candidate data table combination.
In the case where there are a plurality of maximum values, a data table combination that occupies the smallest storage space may be selected.
And step 204, updating the associated query statement through the selected candidate data table combination.
In this embodiment, the associated query statement may be updated through the selected candidate data table combination, or the key in the key value pair corresponding to the associated query statement may be updated directly through the selected candidate data table combination. For example, in a certain association query statement, there are an association query data table t1, a data table t2 and a data table t3, and the candidate data table combination includes a data table t1 and a data table t2, the association query statement may be updated to an association query data table t1-t2 and a data table t3, where the data tables t1-t2 may be wide tables generated for the data tables t1 and t2.
Step 205, judging whether an iteration condition is met.
In this embodiment, the iteration condition may be whether the size of the generated wide table reaches a preset size of the storage space. Specifically, the process may be ended when the size of the generated wide table reaches the preset size of the storage space, and the process may return to step 201 when the size of the generated wide table does not reach the preset size of the storage space.
Step 206, recording the candidate data table combination.
In this embodiment, after selecting the candidate data table combination each time, the selected candidate data table combination may be recorded to generate the wide table.
According to the data table processing method provided by the embodiment of the invention, each iteration selects the data table combination that reduces the number of association queries the most, substitutes the selected data table combination into the association query statements, and then selects the next best data table combination in the following iteration. As a result, a small number of wide tables can cover more association queries: fewer wide tables are needed, storage space is saved, and more queries can use a wide table instead of an association query, which improves query efficiency.
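The iteration over steps 201 to 206 can be sketched as a greedy loop. The sketch rests on several assumptions that are not stated in the embodiment: each statement is a list of joined data tables, every unordered pair of tables in a statement is counted as one association, a wide table is as large as the sum of its parts, and the names `greedy_wide_tables`, `sizes` and `budget` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def greedy_wide_tables(statements, sizes, budget):
    """Select wide tables greedily until the preset storage space is reached."""
    statements = [list(s) for s in statements]
    sizes = dict(sizes)
    selected, used = [], 0

    def wide_size(pair):
        # Assumption: a wide table occupies the sum of its component sizes.
        return sizes[pair[0]] + sizes[pair[1]]

    while used < budget:  # step 205: iteration condition
        # Steps 201-202: count how often each pair of tables is queried together.
        counts = Counter()
        for tables in statements:
            for pair in combinations(sorted(set(tables)), 2):
                counts[pair] += 1
        if not counts:
            break  # no association query is left to cover
        # Step 203: largest count first, smallest storage space on ties.
        a, b = min(counts, key=lambda p: (-counts[p], wide_size(p)))
        wide = f"{a}-{b}"
        sizes[wide] = wide_size((a, b))
        selected.append(wide)  # step 206: record the candidate
        used += sizes[wide]
        # Step 204: substitute the wide table into the statements.
        updated = []
        for tables in statements:
            if a in tables and b in tables:
                tables = [wide] + [t for t in tables if t not in (a, b)]
            updated.append(tables)
        statements = updated
    return selected


selected = greedy_wide_tables(
    [["t1", "t2"], ["t1", "t2", "t3"], ["t1", "t2", "t3"], ["t3", "t4"]],
    {"t1": 10, "t2": 10, "t3": 10, "t4": 10},
    budget=40,
)
```

In this run the pair t1 and t2 is selected first, and after substitution the pair formed by wide table t1-t2 and t3 becomes the most frequently associated one, which shows how a later iteration can build on an earlier wide table.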
Referring to fig. 3, fig. 3 is a structural diagram of a data table processing apparatus according to an embodiment of the present invention. As shown in fig. 3, the data table processing apparatus 300 includes:
a statistics module 301, configured to count the first association times of each data table combination in the M data table combinations according to the collected N association query statements; the data table combination comprises at least two data tables, and the first association times are the times of association query of the at least two data tables in the data table combination;
a first determining module 302, configured to determine a first candidate data table combination according to the first association times of each data table combination of the M data table combinations, where the first candidate data table combination is used to generate a first wide table.
Optionally, the first candidate data table combination is the data table combination with the largest first association times among the M data table combinations.
Optionally, each of the M data table combinations includes two data tables.
Optionally, the first determining module includes:
a first calculation unit, configured to calculate the second association times of each data table combination in the P data table combinations according to the first association times of each data table combination in the M data table combinations; the P data table combinations at least comprise the M data table combinations, and the second association times are the association times that are saved when a wide table is generated based on the data table combination;
a determining unit, configured to determine, as the first candidate data table combination, the data table combination with the largest second association times among the P data table combinations.
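One possible reading of the second association times can be sketched as follows. The sketch assumes that the M data table combinations are pairs and that generating a wide table for a larger combination saves every pairwise association it contains, so its second association times are the sum of the first association times of the pairs inside it. This formula is an assumption made for illustration and is not stated in this section; `second_association_times` is a hypothetical name.

```python
from itertools import combinations


def second_association_times(first_counts, candidates):
    """Estimate the association times saved by a wide table for each candidate.

    first_counts: frozenset pair -> first association times (the M combinations)
    candidates: the P combinations, which include the M pairs
    Assumption: the saving is the sum of the counts of all pairs inside a candidate.
    """
    result = {}
    for combo in candidates:
        result[combo] = sum(
            first_counts.get(frozenset(pair), 0)
            for pair in combinations(sorted(combo), 2)
        )
    return result


first_counts = {
    frozenset({"t1", "t2"}): 5,
    frozenset({"t2", "t3"}): 4,
    frozenset({"t1", "t3"}): 1,
}
triple = frozenset({"t1", "t2", "t3"})
second = second_association_times(first_counts, list(first_counts) + [triple])
first_candidate = max(second, key=second.get)
```

Under this assumption a pair keeps its own first association times, so when the P combinations are exactly the M pairs, the selection reduces to picking the pair with the largest first association times.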
Optionally, the apparatus further comprises:
an updating module, configured to update M-1 data table combinations according to the first candidate data table combination; wherein the M-1 data table combinations are the data table combinations of the M data table combinations other than the first candidate data table combination;
a calculating module, configured to calculate the third association times of each data table combination in the Q data table combinations according to the first association times of each data table combination in the updated M-1 data table combinations; wherein the Q data table combinations at least include the updated M-1 data table combinations, and the third association times are the association times that are saved when a wide table is generated based on the data table combination;
and a second determining module, configured to determine the data table combination with the largest third association times among the Q data table combinations as a second candidate data table combination, where the second candidate data table combination is used to generate a second wide table.
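The update of the remaining M-1 data table combinations can be sketched as below. The sketch assumes pair combinations, replaces the tables of the first candidate with the name of its wide table, and merges the counts of combinations that become identical after the replacement; the merging rule and the name `update_combinations` are assumptions made for this illustration.

```python
def update_combinations(first_counts, candidate):
    """Rewrite the other M-1 combinations in terms of the candidate's wide table."""
    wide = "-".join(sorted(candidate))
    updated = {}
    for combo, count in first_counts.items():
        if combo == candidate:
            continue  # only the other M-1 combinations are kept
        renamed = frozenset(wide if t in candidate else t for t in combo)
        # Assumption: counts of combinations that coincide after renaming are summed.
        updated[renamed] = updated.get(renamed, 0) + count
    return updated


first_counts = {
    frozenset({"t1", "t2"}): 5,
    frozenset({"t2", "t3"}): 4,
    frozenset({"t1", "t3"}): 1,
    frozenset({"t3", "t4"}): 2,
}
updated = update_combinations(first_counts, frozenset({"t1", "t2"}))
second_candidate = max(updated, key=updated.get)
```

In this example both the t2, t3 pair and the t1, t3 pair become the pair formed by wide table t1-t2 and t3, which is then selected as the second candidate.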
Optionally, the apparatus further comprises:
an obtaining module, configured to, after determining a first candidate data table combination according to a first association number of each data table combination of the M data table combinations, obtain, when the first candidate data table combination includes at least two candidate data table combinations, a target candidate data table combination that occupies a minimum storage space among the at least two candidate data table combinations; wherein the target candidate data table combination is used to generate the first wide table.
Optionally, the target candidate data table combination is the candidate data table combination, among the at least two candidate data table combinations, whose generated wide table occupies the smallest storage space.
The data table processing apparatus 300 provided in the embodiment of the present invention can implement each process in the foregoing method embodiments, and is not described here again to avoid repetition.
The data table processing apparatus 300 of the embodiment of the present invention includes a statistics module 301, configured to count the first association times of each data table combination of M data table combinations according to the collected N association query statements; the data table combination comprises at least two data tables, and the first association times are the times of association query of the at least two data tables in the data table combination; and a first determining module 302, configured to determine a first candidate data table combination according to the first association times of each data table combination of the M data table combinations, where the first candidate data table combination is used to generate a first wide table. Because the data tables used for generating the wide table are selected according to the association query times of each data table combination, the accuracy of the selected data tables can be improved, the generation of redundant wide tables can be avoided, and the coverage of the generated wide table over the association queries is improved.
Referring to fig. 4, fig. 4 is a block diagram of a data table processing apparatus according to still another embodiment of the present invention. As shown in fig. 4, the data table processing apparatus 400 includes: a processor 401, a memory 402, and a computer program stored on the memory 402 and executable on the processor 401, the various components in the data table processing apparatus 400 being coupled together by a bus interface 403. The computer program, when executed by the processor 401, performs the following steps:
according to the collected N associated query statements, counting the first association times of each data table combination in the M data table combinations; the data table combination comprises at least two data tables, and the first association times are the times of association query of the at least two data tables in the data table combination;
and determining a first candidate data table combination according to the first association times of each data table combination in the M data table combinations, wherein the first candidate data table combination is used for generating a first wide table.
Optionally, the first candidate data table combination is the data table combination with the largest first association times among the M data table combinations.
Optionally, each of the M data table combinations includes two data tables.
Optionally, the computer program when executed by the processor 401 is further configured to:
calculating the second association times of each data table combination in the P data table combinations according to the first association times of each data table combination in the M data table combinations; the P data table combinations at least comprise the M data table combinations, and the second association times are reduced association times under the condition that the wide table is generated based on the data table combinations;
and determining the data table combination with the largest second association times in the P data table combinations as the first candidate data table combination.
Optionally, the computer program when executed by the processor 401 is further configured to:
after the data table combination with the largest second association times in the P data table combinations is determined as the first candidate data table combination, updating M-1 data table combinations according to the first candidate data table combination; wherein the M-1 data table combinations are data table combinations of the M data table combinations except the first candidate data table combination;
calculating the third association times of each data table combination in the Q data table combinations according to the first association times of each data table combination in the updated M-1 data table combinations; wherein the Q data table combinations at least include the updated M-1 data table combinations, and the third association times are association times that are reduced under the condition that a wide table is generated based on the data table combinations;
and determining the data table combination with the maximum third association times in the Q data table combinations as a second candidate data table combination, wherein the second candidate data table combination is used for generating a second wide table.
Optionally, the computer program when executed by the processor 401 is further configured to:
after determining a first candidate data table combination according to the first association times of each data table combination in the M data table combinations, under the condition that the first candidate data table combination comprises at least two candidate data table combinations, acquiring a target candidate data table combination with the minimum occupied storage space in the at least two candidate data table combinations; wherein the target candidate data table combination is used to generate the first wide table.
Optionally, the target candidate data table combination is the candidate data table combination, among the at least two candidate data table combinations, whose generated wide table occupies the smallest storage space.
An embodiment of the present invention further provides a data table processing apparatus, which includes a processor, a memory, and a computer program stored in the memory and capable of running on the processor, where the computer program, when executed by the processor, implements each process of the data table processing method embodiment, and can achieve the same technical effect, and is not described herein again to avoid repetition.
The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements each process of the above-mentioned data table processing method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (9)

1. A method for processing a data table, comprising:
according to the collected N associated query statements, counting the first association times of each data table combination in the M data table combinations; the data table combination comprises at least two data tables, and the first association times are the times of association query of the at least two data tables in the data table combination;
determining a first candidate data table combination according to the first association times of each data table combination in the M data table combinations, wherein the first candidate data table combination is used for generating a first wide table;
determining a first candidate data table combination according to the first association times of each data table combination in the M data table combinations, including:
calculating the second association times of each data table combination in the P data table combinations according to the first association times of each data table combination in the M data table combinations; the P data table combinations at least comprise the M data table combinations, and the second association times are reduced association times under the condition that the wide table is generated based on the data table combinations;
and determining the data table combination with the largest second association times in the P data table combinations as the first candidate data table combination.
2. The method of claim 1, wherein the first candidate data table combination is: the data table combination with the largest first association times among the M data table combinations.
3. The method of claim 2, wherein each of the M combinations of data tables includes two data tables.
4. The method of claim 1, wherein after determining the data table combination with the largest second association number of the P data table combinations as the first candidate data table combination, the method further comprises:
updating M-1 data table combinations according to the first candidate data table combination; wherein the M-1 data table combinations are data table combinations of the M data table combinations except the first candidate data table combination;
calculating the third association times of each data table combination in the Q data table combinations according to the first association times of each data table combination in the updated M-1 data table combinations; wherein the Q data table combinations at least include the updated M-1 data table combinations, and the third association times are association times that are reduced under the condition that a wide table is generated based on the data table combinations;
and determining the data table combination with the maximum third association times in the Q data table combinations as a second candidate data table combination, wherein the second candidate data table combination is used for generating a second wide table.
5. The method of claim 1, wherein after determining a first candidate data table combination based on the first number of associations for each of the M data table combinations, the method further comprises:
under the condition that the first candidate data table combination comprises at least two candidate data table combinations, acquiring a target candidate data table combination which occupies the smallest storage space in the at least two candidate data table combinations; wherein the target candidate data table combination is used to generate the first wide table.
6. The method of claim 5, wherein the target candidate data table combination is: the candidate data table combination, among the at least two candidate data table combinations, whose generated wide table occupies the smallest storage space.
7. A data table processing apparatus, comprising:
the statistical module is used for counting the first association times of each data table combination in the M data table combinations according to the collected N associated query statements; the data table combination comprises at least two data tables, and the first association times are the times of association query of the at least two data tables in the data table combination;
a first determining module, configured to determine a first candidate data table combination according to a first association number of each data table combination of the M data table combinations, where the first candidate data table combination is used to generate a first wide table;
the first determining module includes:
the first calculation unit is used for calculating the second association times of each data table combination in the P data table combinations according to the first association times of each data table combination in the M data table combinations; the P data table combinations at least comprise the M data table combinations, and the second association times are reduced association times under the condition that the wide table is generated based on the data table combinations;
a determining unit, configured to determine, as the first candidate data table combination, a data table combination with a largest second association frequency among the P data table combinations.
8. A data table processing apparatus comprising a processor, a memory and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the data table processing method as claimed in any one of claims 1 to 6.
9. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the data table processing method according to any one of claims 1 to 6.
CN201910184764.3A 2019-03-12 2019-03-12 Data table processing method and device Active CN111694891B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910184764.3A CN111694891B (en) 2019-03-12 2019-03-12 Data table processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910184764.3A CN111694891B (en) 2019-03-12 2019-03-12 Data table processing method and device

Publications (2)

Publication Number Publication Date
CN111694891A CN111694891A (en) 2020-09-22
CN111694891B true CN111694891B (en) 2021-01-12

Family

ID=72474780

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910184764.3A Active CN111694891B (en) 2019-03-12 2019-03-12 Data table processing method and device

Country Status (1)

Country Link
CN (1) CN111694891B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106951552A (en) * 2017-03-27 2017-07-14 重庆邮电大学 A kind of user behavior data processing method based on Hadoop
CN109388637A (en) * 2018-09-21 2019-02-26 北京京东金融科技控股有限公司 Data warehouse information processing method, device, system, medium
CN109446197A (en) * 2018-09-26 2019-03-08 深圳壹账通智能科技有限公司 User information processing method, device, computer equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10528596B2 (en) * 2014-09-26 2020-01-07 Oracle International Corporation System and method for consistent reads between tasks in a massively parallel or distributed database environment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
iVA-File: Efficiently Indexing Sparse Wide Tables in Community Systems;Boduo Li 等;《2009 IEEE 25th International Conference on Data Engineering》;20090410;210-221 *
基于SaaS架构的可定制模型的研究;宋仁才;《中国优秀硕士学位论文全文数据库 信息科技辑》;20120915(第 09 期);I139-88 *
车辆监控系统中数据仓库的研究与优化;康介鹏;《中国优秀硕士学位论文全文数据库 工程科技Ⅱ辑》;20150815(第 08 期);C034-209 *

Also Published As

Publication number Publication date
CN111694891A (en) 2020-09-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant