CN111125158B - Data table processing method, device, medium and electronic equipment

Info

Publication number: CN111125158B
Application number: CN201911087888.6A
Authority: CN
Inventors: 韩佩利; 施小江; 王方博; 何旺
Original assignee: Taikang Insurance Group Co Ltd
Current assignee: Taikang Insurance Group Co Ltd
Priority date: 2019-11-08
Filing date: 2019-11-08
Publication date: 2023-03-31
Anticipated expiration: 2039-11-08
Also published as: CN111125158A

Abstract

Embodiments of the invention provide a data table processing method, a data table processing apparatus, a computer-readable medium, and an electronic device. The method comprises: acquiring a plurality of historical query statements related to a source data table, and determining the query fields in the historical query statements and the query frequency information of each query field; determining a plurality of field relationship matrices according to the query fields and the query frequency information of each query field; determining a field association coefficient of each field relationship matrix according to the query frequency information of adjacent query fields in the field relationship matrix, and selecting a target field relationship matrix from the plurality of field relationship matrices according to the field association coefficients; and determining a plurality of field splitting sequences according to the query frequency information of adjacent query fields in the target field relationship matrix, and determining a plurality of sub data tables corresponding to the source data table according to the field splitting sequences. The method can reduce the number of fields in each table and improve data query efficiency.

Description

Data table processing method, device, medium and electronic equipment

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a data table processing method, a data table processing apparatus, a computer readable medium, and an electronic device.

Background

In a traditional relational database, when a data table is first designed, as many fields as possible are often placed in the same table, based on experience or on the current understanding of the business, in order to improve the efficiency of database storage and queries. As services develop, more fields representing new service scenarios need to be added to the data table. However, once the same data table has too many fields it becomes a wide table, and as the data volume increases, the efficiency of querying the wide table decreases. How to improve the query efficiency of a data table is therefore a problem that urgently needs to be solved.

It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present invention and therefore may include information that does not constitute prior art known to a person of ordinary skill in the art.

Disclosure of Invention

Embodiments of the present invention provide a data table processing method, a data table processing apparatus, a computer readable medium, and an electronic device, so as to overcome technical problems of data table field redundancy, low query efficiency, and the like, at least to a certain extent.

Additional features and advantages of the invention will be set forth in the detailed description which follows, or may be learned by practice of the invention.

According to a first aspect of the embodiments of the present invention, there is provided a data table processing method, including:

acquiring a plurality of historical query statements related to a source data table, and determining query fields in the historical query statements and query frequency information of each query field;

determining a plurality of field relationship matrices according to the query fields and the query frequency information of each query field;

determining field association coefficients of the field relationship matrix according to the query frequency information of adjacent query fields in the field relationship matrix, and selecting a target field relationship matrix from the multiple field relationship matrices according to the field association coefficients;

and determining a plurality of field splitting sequences according to the query frequency information of the adjacent query fields in the target field relation matrix, and determining a plurality of sub data tables corresponding to the source data table according to the field splitting sequences.

In some embodiments of the present invention, based on the above technical solution, the obtaining a plurality of historical query statements related to a source data table includes:

determining a database where a source data table is located, and acquiring a data interaction log of the database;

extracting a plurality of historical query statements related to the source data table from the data interaction log.

In some embodiments of the present invention, based on the above technical solutions, the query frequency information of a query field includes the cumulative total number of queries of one query field and the total number of common queries of two different query fields.

In some embodiments of the present invention, based on the above technical solutions, the determining a plurality of field relationship matrices according to the query fields and the query frequency information of each query field includes:

sorting the query fields to obtain a plurality of field sequences corresponding to different field arrangement orders;

acquiring the cumulative total number of queries of each query field and the total number of common queries of each query field with another query field;

and determining a plurality of field relationship matrices respectively corresponding to the field sequences according to the cumulative total number of queries of each query field and the total numbers of common queries of the query fields.

In some embodiments of the present invention, based on the above technical solution, the determining a field association coefficient of the field relationship matrix according to the query frequency information of adjacent query fields in the field relationship matrix includes:

determining in-row field coefficients of each query field in a matrix row of the field relationship matrix according to the query frequency information of each query field and its adjacent query fields;

accumulating the in-row field coefficients of each query field in the matrix row to obtain the inter-row field coefficient of the matrix row;

and accumulating the inter-row field coefficients of each matrix row in the field relationship matrix to obtain the field association coefficient of the field relationship matrix.

In some embodiments of the present invention, based on the above technical solutions, the determining a plurality of field splitting sequences according to the query frequency information of adjacent query fields in the target field relationship matrix includes:

determining a target field sequence related to the target field relation matrix;

acquiring the cumulative total number of queries from the query frequency information of each query field;

determining the query count difference of two adjacent query fields according to the cumulative total numbers of queries;

determining one or more field splitting positions in the target field sequence according to the query count differences;

and splitting the target field sequence according to the field splitting positions to obtain a plurality of field splitting sequences.

In some embodiments of the present invention, based on the above technical solution, the determining, according to the field splitting sequence, a plurality of sub data tables corresponding to the source data table includes:

extracting field data from the source data table according to the query field in the field splitting sequence;

and combining the field data according to the arrangement sequence of each query field in the field splitting sequence to obtain a plurality of sub data tables corresponding to the source data table.

According to a second aspect of the present invention, there is provided a data table processing apparatus comprising:

the field determination module is configured to acquire a plurality of historical query statements related to a source data table, and determine query fields in the historical query statements and query frequency information of each query field;

the matrix determination module is configured to determine a plurality of field relation matrixes according to the query fields and the query frequency information of each query field;

the matrix screening module is configured to determine field association coefficients of the field relationship matrix according to the query frequency information of adjacent query fields in the field relationship matrix, and select a target field relationship matrix from the plurality of field relationship matrices according to the field association coefficients;

and the data table splitting module is configured to determine a plurality of field splitting sequences according to the query frequency information of the adjacent query fields in the target field relation matrix, and determine a plurality of sub data tables corresponding to the source data table according to the field splitting sequences.

According to a third aspect of embodiments of the present invention, there is provided a computer-readable medium on which a computer program is stored, the program, when executed by a processor, implementing the data table processing method according to the first aspect of the embodiments described above.

According to a fourth aspect of an embodiment of the present invention, there is provided an electronic apparatus including: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of data table processing as described in the first aspect of the embodiments above.

The technical scheme provided by the embodiment of the invention has the following beneficial effects:

In the technical solutions provided in some embodiments of the present invention, field relationship matrices corresponding to different field arrangement orders can be established by counting the query frequency information of each query field in the historical query statements, and the source data table is then split according to the degree of association of the query fields in the field relationship matrix. In this way, a plurality of sub data tables with simplified fields and high usability can be obtained while the query logic relationship between the query fields is maintained, so that data query efficiency can be greatly improved.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:

FIG. 1 schematically illustrates a flow chart of steps of a data table processing method in some embodiments of the invention.

FIG. 2 schematically illustrates a flow chart of steps for obtaining historical query statements in some embodiments of the invention.

FIG. 3 is a flow chart that schematically illustrates the steps of determining a field relationship matrix in some embodiments of the present invention.

FIG. 4 schematically illustrates a flow chart of the steps of determining field association coefficients in some embodiments of the present invention.

FIG. 5 is a flow chart that schematically illustrates the steps of determining a field split sequence in some embodiments of the present invention.

FIG. 6 is a flow chart that schematically illustrates the steps for determining a sub-data table in some embodiments of the present invention.

FIG. 7 schematically illustrates a data table splitting method in an application scenario.

FIG. 8 schematically illustrates a block diagram of the components of a data table processing apparatus in some embodiments of the invention.

FIG. 9 illustrates a schematic structural diagram of a computer system suitable for use with the electronic device to implement an embodiment of the invention.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.

Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the invention.

The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.

The flowcharts shown in the figures are illustrative only and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.

In the related art of the present invention, for a wide table with many fields stored therein, the fields in the wide table may be generally split based on experience or business needs, and some fields are divided into one data table and other fields are divided into another data table, so as to achieve the purpose of vertically partitioning the wide table. However, splitting the data table depending on experience or human subjective factors easily destroys the layout logic of the data table itself, which causes the usability of the split data table to be poor, and further affects the data query efficiency of the data table.

In view of the above problems in the related art, the present invention provides a data table processing method, a data table processing apparatus, a computer-readable medium, and an electronic device. The technical solution of the present invention is described in detail below with reference to the embodiments.

FIG. 1 schematically illustrates a flow chart of steps of a data table processing method in some embodiments of the invention. As shown in fig. 1, the method may mainly include the steps of:

Step S110, acquiring a plurality of historical query statements related to the source data table, and determining the query fields in the historical query statements and the query frequency information of each query field.

The historical query statement may be a query statement based on Structured Query Language (SQL). SQL is a special-purpose language for database querying and programming, used to access data and to query, update, and manage relational database systems. One or more query fields may be determined in each historical query statement; for example, if a historical query statement is "select n, d from table", its query fields are n and d. The query frequency information of each query field can be obtained by counting the query fields in the historical query statements. The query frequency information of a query field may include the cumulative total number of queries of that query field and the total number of common queries of two different query fields. The cumulative total number of queries is the total number of times a query field appears across the historical query statements, and the total number of common queries is the total number of times two query fields appear together in the same historical query statement. For example, based on the historical query statement "select n, d from table", the cumulative total number of queries of field n is 1, the cumulative total number of queries of field d is 1, and the total number of common queries of field n and field d is 1.

As further historical query statements are processed, the query frequency information of each corresponding query field continues to accumulate. If the historical query statement "select n, q from table" is added, the cumulative total number of queries of field n increases to 2, that of field d remains 1, and that of field q is 1; the total number of common queries of field n and field d is 1, that of field n and field q is 1, and that of field d and field q is 0.
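A minimal sketch of this counting, assuming every historical query statement has the simple form "select a, b from t". The helper names `query_fields` and `count_queries` and the regular expression are illustrative assumptions, not part of the described method:

```python
import re
from collections import Counter
from itertools import combinations

# Assumption: only simple "select a, b from t" statements are handled.
SELECT_RE = re.compile(r"select\s+(.*?)\s+from\s+\w+", re.IGNORECASE)


def query_fields(statement):
    """Return the distinct field names listed in a simple SELECT statement."""
    match = SELECT_RE.search(statement)
    if match is None:
        return []
    fields = [name.strip() for name in match.group(1).split(",")]
    # Drop duplicates while preserving first-seen order.
    return list(dict.fromkeys(f for f in fields if f))


def count_queries(statements):
    """Count per-field totals and per-pair co-occurrence totals."""
    total = Counter()   # cumulative total number of queries per field
    common = Counter()  # total number of common queries per unordered pair
    for statement in statements:
        fields = query_fields(statement)
        total.update(fields)
        for pair in combinations(sorted(fields), 2):
            common[pair] += 1
    return total, common


total, common = count_queries([
    "select n, d from table",
    "select n, q from table",
])
```

Pairs are stored as alphabetically sorted tuples, so `common[("d", "n")]` holds the count for fields n and d; a pair that never occurs together reads as 0.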

Step S120, determining a plurality of field relationship matrices according to the query fields and the query frequency information of each query field.

Based on the acquired historical query statements, each query field serves as a row label and a column label of the field relationship matrix, and the query frequency information of each query field is filled into the matrix as the matrix elements. For example, based on the historical query statement "select n, d from table", the field relationship matrix M1 can be obtained:

        n   d
    n   1   1
    d   1   1

In the field relationship matrix M1, the element at row n, column n represents the cumulative total number of queries of field n, the element at row d, column d represents the cumulative total number of queries of field d, and the elements at row n, column d and at row d, column n both represent the total number of common queries of field n and field d. When multiple query fields from multiple historical query statements are involved, a plurality of field relationship matrices can be determined, one for each arrangement order of the query fields in the matrix.
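The construction can be sketched as follows for a given field arrangement order. The function name `build_field_matrix` and the dictionary layout of the counts are assumptions made for illustration; the counts are those of the two example statements "select n, d from table" and "select n, q from table":

```python
def build_field_matrix(sequence, total, common):
    """Build the field relationship matrix for one field arrangement order.

    Diagonal elements hold the cumulative total number of queries of a field;
    off-diagonal elements hold the total number of common queries of a pair.
    """
    matrix = []
    for row_field in sequence:
        row = []
        for col_field in sequence:
            if row_field == col_field:
                row.append(total.get(row_field, 0))
            else:
                # Pairs are keyed by alphabetically sorted tuples (assumption).
                pair = tuple(sorted((row_field, col_field)))
                row.append(common.get(pair, 0))
        matrix.append(row)
    return matrix


total = {"n": 2, "d": 1, "q": 1}
common = {("d", "n"): 1, ("n", "q"): 1}

m_ndq = build_field_matrix(["n", "d", "q"], total, common)
m_qnd = build_field_matrix(["q", "n", "d"], total, common)
```

The two calls use the same counts but different arrangement orders, so the resulting matrices differ only in how rows and columns are permuted.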

Step S130, determining field association coefficients of the field relationship matrix according to the query frequency information of adjacent query fields in the field relationship matrix, and selecting a target field relationship matrix from the multiple field relationship matrices according to the field association coefficients.

Each matrix element of the field relationship matrix corresponds to the query frequency information of different query fields. The field association coefficient determined from the query frequency information reflects the degree of association of adjacent query fields in the field relationship matrix: the more often two adjacent query fields are queried together, the higher their degree of association and the larger the corresponding field association coefficient. On this basis, this step can select the field relationship matrix with the largest field association coefficient as the target field relationship matrix.

Step S140, determining a plurality of field splitting sequences according to the query frequency information of the adjacent query fields in the target field relationship matrix, and determining a plurality of sub data tables corresponding to the source data table according to the field splitting sequences.

A field relation matrix corresponds to an arrangement mode of query fields, and when query times information of two adjacent query fields is closer, the probability that the two query fields are queried together is relatively higher, so that the two query fields tend to be kept in the same data table. On the contrary, if the query times information of two adjacent query fields is greatly different, the probability that the two query fields are queried together is relatively low, so that the two query fields can be classified into different data tables. Based on the splitting principle, the step can determine a plurality of field splitting sequences, and further determine a plurality of sub data tables corresponding to the source data table.

In the data table processing method provided by the invention, field relationship matrices corresponding to different field arrangement orders can be established by counting the query frequency information of each query field in the historical query statements, and the source data table is then split according to the degree of association of the query fields in the field relationship matrix. In this way, a plurality of sub data tables with simplified fields and high usability are obtained while the query logic relationship among the query fields is maintained, thereby greatly improving data query efficiency.

The historical query statement can reflect the query habit of the user, the conventional service requirement, the internal logic of the query field and other information to a certain extent. FIG. 2 is a flow chart that schematically illustrates the steps of obtaining historical query statements, in some embodiments of the present invention. As shown in fig. 2, on the basis of the above embodiment, the obtaining of the plurality of historical query statements related to the source data table in step S110 may include the following steps:

Step S210, determining the database where the source data table is located, and acquiring a data interaction log of the database.

The source data table can be stored in various relational databases such as DB2 and MySQL, and the data interaction log of the database records the interaction information of the database. For example, in a MySQL database, the binary log (binlog) may be used to record the SQL statements with which a user operates on the database.

Step S220, extracting a plurality of historical query statements related to the source data table from the data interaction log.

A plurality of historical query statements related to the source data table can be extracted from the acquired data interaction log, and the historical query statements can correspond to the same or different query fields.
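As a rough sketch of this extraction, assume the data interaction log has already been reduced to one SQL statement per line; real database logs carry additional columns and would need their own parsing. The function name `extract_history_queries` and the table name `user_info` are hypothetical:

```python
import re


def extract_history_queries(log_lines, table_name):
    """Keep SELECT statements whose FROM clause names the source table."""
    pattern = re.compile(
        r"\bselect\b.+?\bfrom\s+" + re.escape(table_name) + r"\b",
        re.IGNORECASE,
    )
    return [line.strip() for line in log_lines if pattern.search(line)]


# Assumed log content: one SQL statement per line.
log_lines = [
    "select n, d from user_info",
    "insert into user_info (n) values ('a')",
    "select x from orders",
    "SELECT n, q FROM user_info",
]
history = extract_history_queries(log_lines, "user_info")
```

Statements that modify data or that query other tables are dropped, leaving only the historical query statements related to the source data table.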

The historical query statements acquired by using the data interaction log have high accuracy and comprehensiveness, and the reliability of field relationship analysis can be improved.

FIG. 3 is a flow chart that schematically illustrates the steps of determining a field relationship matrix in some embodiments of the present invention. As shown in FIG. 3, on the basis of the above embodiments, the determining, in step S120, of a plurality of field relationship matrices according to the query fields and the query frequency information of each query field may include the following steps:

Step S310, sorting the query fields to obtain a plurality of field sequences corresponding to different field arrangement orders.

Different field sequences can be correspondingly obtained by different query field arrangement modes, and corresponding multiple field split sequences can be obtained in subsequent steps by finding a proper split point in the field sequences. For example, two query fields n and d may form a field sequence [ n, d ], and after adding a query field q on the basis of the field sequence, three new field sequences [ q, n, d ], [ n, q, d ] and [ n, d, q ] may be obtained. Different field sequences have different query field adjacency relations, so different field sequence splitting modes can be obtained.
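The insertion described in this example can be sketched as follows; the function name `insert_field` is an assumption for illustration:

```python
def insert_field(sequence, new_field):
    """Return every sequence obtained by inserting new_field at each position."""
    return [
        sequence[:i] + [new_field] + sequence[i:]
        for i in range(len(sequence) + 1)
    ]


candidates = insert_field(["n", "d"], "q")
```

For the field sequence [n, d] and the new query field q, `candidates` holds the three sequences [q, n, d], [n, q, d] and [n, d, q].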

Step S320, acquiring the cumulative total number of queries of each query field and the total number of common queries of each query field with another query field.

The query times information of each query field can be obtained by counting and calculating the query fields in the historical query sentences, and the query times information comprises the accumulated total query times of one query field and the common total query times of one query field and the other query field.

Step S330, determining a plurality of field relationship matrices respectively corresponding to the field sequences according to the cumulative total number of queries of each query field and the total numbers of common queries of the query fields.

With the respective query fields as the attribute of elements constituting matrix rows and matrix columns and the cumulative total number of queries and the common total number of queries for each query field as matrix elements, a plurality of field relationship matrices corresponding to the respective field sequences can be determined.

FIG. 4 schematically illustrates a flow chart of the steps of determining field association coefficients in some embodiments of the present invention. As shown in FIG. 4, on the basis of the foregoing embodiments, the determining, in step S130, of the field association coefficient of the field relationship matrix according to the query frequency information of adjacent query fields in the field relationship matrix may include the following steps:

Step S410, determining in-row field coefficients of each query field in a matrix row of the field relationship matrix according to the query frequency information of each query field and its adjacent query fields.

The field relationship matrix is composed of a plurality of matrix rows, and in each matrix row, the in-row field coefficients of the respective query fields are first determined. The in-row field coefficients of a query field may be determined by the query times information corresponding to the query field in the matrix row and the query times information corresponding to one or two adjacent query fields of the query field. The in-row field coefficients reflect the degree of correlation between each query field and adjacent query fields within a matrix row.

Step S420, accumulating the in-row field coefficients of each query field in the matrix row to obtain the inter-row field coefficient of the matrix row.

After the in-row field coefficients of each query field in a matrix row are determined, the in-row field coefficients of all query fields in the matrix row are accumulated to obtain the inter-row field coefficient of the matrix row; in the same way, the inter-row field coefficient corresponding to each matrix row can be obtained. The inter-row field coefficient reflects the overall degree of association of all query fields within a matrix row.

Step S430, accumulating the inter-row field coefficients of each matrix row in the field relationship matrix to obtain the field association coefficient of the field relationship matrix.

After determining the inter-row field coefficients of each matrix row, the inter-row field coefficients of each matrix row may be accumulated to obtain the field association coefficients of the field relationship matrix. The obtained field association coefficient reflects the overall association degree of all query fields in the field relation matrix. The larger the field association coefficient, the higher the overall association degree of the query field in the field relationship matrix. On this basis, the embodiment may select a field relationship matrix with the highest field association coefficient as a target field relationship matrix, and then determine the field splitting sequence according to the target field relationship matrix.
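The text does not give a formula for the in-row field coefficient. The sketch below assumes one plausible choice: the matrix element multiplied by the sum of its left and right neighbours in the same row, with a missing neighbour counted as 0. This formula and the function names are assumptions for illustration only; the accumulation over rows and the selection of the largest coefficient follow steps S420 and S430:

```python
def row_coefficient(row):
    """Inter-row field coefficient: sum of the in-row coefficients of one row.

    Assumed in-row coefficient: element * (left neighbour + right neighbour).
    """
    result = 0
    for j, value in enumerate(row):
        left = row[j - 1] if j > 0 else 0
        right = row[j + 1] if j + 1 < len(row) else 0
        result += value * (left + right)
    return result


def association_coefficient(matrix):
    """Field association coefficient: sum of the coefficients of all rows."""
    return sum(row_coefficient(row) for row in matrix)


def select_target(candidates):
    """Pick the field sequence whose matrix has the largest coefficient."""
    return max(candidates, key=lambda seq: association_coefficient(candidates[seq]))


# Matrices for three arrangement orders, built from the counts of
# "select n, d from table" and "select n, q from table".
candidates = {
    ("n", "d", "q"): [[2, 1, 1], [1, 1, 0], [1, 0, 1]],
    ("n", "q", "d"): [[2, 1, 1], [1, 1, 0], [1, 0, 1]],
    ("q", "n", "d"): [[1, 1, 0], [1, 2, 1], [0, 1, 1]],
}
target = select_target(candidates)
```

Under the assumed formula, the arrangement that places n between q and d scores highest, because n is queried together with each of the other two fields while q and d are never queried together.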

FIG. 5 is a flow chart that schematically illustrates the steps of determining a field split sequence in some embodiments of the present invention. As shown in fig. 5, on the basis of the above embodiments, the determining a plurality of field splitting sequences according to the query frequency information of the adjacent query fields in the target field relationship matrix in step S140 may include the following steps:

Step S510, determining a target field sequence related to the target field relationship matrix.

A field relation matrix is essentially one arrangement of a plurality of query fields, and the step can determine a target field sequence related to the target field relation matrix. The target field sequence represents the result of ordering the query fields by a positional adjacency with a high overall degree of relevance.

Step S520, acquiring the cumulative total number of queries from the query frequency information of each query field.

The corresponding cumulative total number of queries may be obtained based on the query number information for each query field. For example, if a target field sequence determined in step S510 is [ q, n, d ], then this step may obtain that the cumulative query total number of the field q is 1, the cumulative query total number of the field n is 3, and the cumulative query total number of the field d is 2.

Step S530, determining the query count difference of two adjacent query fields according to the cumulative total numbers of queries.

The query number difference of two adjacent query fields can be determined according to the accumulated total query number of the two query fields. For example, the difference between the number of queries for field q and field n is 3-1=2, and the difference between the number of queries for field n and field d is 3-2=1.

Step S540, determining one or more field splitting positions in the target field sequence according to the query count differences.

The larger the query count difference between two adjacent query fields, the lower their degree of association, so this step can place a field splitting position between two adjacent query fields whose query count difference is large. By ranking the query count differences by size, one or more field splitting positions can be determined in the target field sequence.

Step S550, splitting the target field sequence according to the field splitting positions to obtain a plurality of field splitting sequences.

According to the actual data table splitting requirement, a corresponding number of field splitting positions may be determined in step S540, and then the target field sequence may be split into a plurality of field splitting sequences according to each field splitting position in this step.
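Steps S520 to S550 can be sketched as follows, using the example target field sequence [q, n, d] with cumulative totals 1, 3 and 2. The function name `split_sequence`, the parameter `num_splits` (the number of field splitting positions required) and the tie-breaking rule are assumptions for illustration:

```python
def split_sequence(sequence, totals, num_splits=1):
    """Split a field sequence at the adjacent pairs with the largest
    difference in cumulative total number of queries."""
    # Each entry: (query count difference, position at which to cut).
    diffs = [
        (abs(totals[sequence[i]] - totals[sequence[i + 1]]), i + 1)
        for i in range(len(sequence) - 1)
    ]
    # Largest differences first; ties go to the earlier position (assumption).
    chosen = sorted(diffs, key=lambda d: (-d[0], d[1]))[:num_splits]
    positions = sorted(pos for _, pos in chosen)

    parts, start = [], 0
    for pos in positions:
        parts.append(sequence[start:pos])
        start = pos
    parts.append(sequence[start:])
    return parts


parts = split_sequence(["q", "n", "d"], {"q": 1, "n": 3, "d": 2})
```

With a single splitting position, the cut falls between q and n (difference 2) rather than between n and d (difference 1), giving the field splitting sequences [q] and [n, d].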

The splitting method provided by the embodiment can keep the query fields with high association degree in the same field splitting sequence. The source data table may be further split based on the field splitting sequence to obtain a plurality of sub data tables. FIG. 6 is a flow chart that schematically illustrates the steps for determining a sub-data table in some embodiments of the present invention. As shown in fig. 6, on the basis of the above embodiments, the determining, according to the field splitting sequence, a plurality of sub data tables corresponding to the source data table in step S140 may include the following steps:

Step S610, extracting field data from the source data table according to the query fields in the field splitting sequence.

The plurality of field splitting sequences obtained by splitting one target field sequence respectively correspond to different query fields, and corresponding field data can be extracted from the source data table according to the query fields in each field splitting sequence. For example, if a field splitting sequence is [ n, d ], then the field data corresponding to the query field n and the field data corresponding to the query field d can be extracted from the source data table accordingly.

Step S620, combining the field data according to the arrangement order of the query fields in the field splitting sequence to obtain a plurality of sub data tables corresponding to the source data table.

After the field data corresponding to the query field is obtained in step S610, the field data are combined according to the arrangement order of the query field in the field splitting sequence to form a sub data table, and each field splitting sequence may determine a corresponding sub data table.
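Steps S610 and S620 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the source data table is modelled as a list of Python dictionaries, and the function name build_sub_tables and the sample values are hypothetical.

```python
def build_sub_tables(rows, split_sequences):
    """Return one sub data table per field splitting sequence.

    rows: the source data table, assumed here to be a list of dicts.
    split_sequences: list of field lists, e.g. [["q"], ["n", "d"]].
    """
    sub_tables = []
    for sequence in split_sequences:
        # S610: extract the data of each field named in the sequence.
        # S620: combine it in the order the fields appear in the sequence.
        sub_tables.append([{field: row[field] for field in sequence} for row in rows])
    return sub_tables


# Hypothetical source table with the three fields used in the examples.
source_table = [
    {"q": "a1", "n": "b1", "d": "c1"},
    {"q": "a2", "n": "b2", "d": "c2"},
]
sub_tables = build_sub_tables(source_table, [["q"], ["n", "d"]])
```

In practice a shared key column would usually be kept in every sub data table so that rows can be rejoined; the text above does not discuss this, so the sketch leaves it out.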

The details of the data table processing method in the above embodiment of the present invention are described below with reference to a specific application scenario.

Data tables are used in many service scenarios. For example, a service system needs to establish a user information data table. When the table is first designed, as many fields as possible are often placed in the same table based on experience or on the current understanding of the service, such as the user's name, sex, identification number, residential community address, house number, building number, and the like, in order to improve the efficiency of database storage and query. As the service develops, more fields may need to be added to the table to represent new service scenarios, which eventually turns the table into a wide table.

As the data volume grows and queries against the wide table become slow, a look back at the original data table, which contains fields such as the user's name, sex, identity card number, community address, house number, and building number, shows that it stores too much information and needs to be vertically partitioned. From experience, one might guess that fields such as the community address, house number, and building number would be better placed in a separate address-related sub data table. However, experience cannot be applied to all service scenarios, and there is no guarantee that an experience-based split is a reasonable one that matches the historical query patterns and the service requirements. The data table processing method provided by the technical scheme of the invention uses the historical query statements to calculate the degree of relationship between fields, and sorts, clusters, and then splits the fields according to that degree of relationship.

FIG. 7 schematically illustrates a data table splitting method in an application scenario. As shown in fig. 7, the splitting method may mainly include the following steps:

Step S710, obtaining historical query statements.

The historical query statements may be sql query statements obtained from the data interaction log corresponding to the source data table, and may include, for example, a plurality of query statements sql1, sql2, ..., sqlN.
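As an illustration of how query fields might be pulled out of such statements, the following sketch uses a deliberately naive regular expression. It is an assumption for illustration only: the function name extract_fields is hypothetical, and real SQL (aliases, functions, sub-queries, `select *`) would need a proper SQL parser rather than this pattern.

```python
import re

# Matches the column list between "select" and the first "from".
SELECT_PATTERN = re.compile(r"^\s*select\s+(.*?)\s+from\s+", re.IGNORECASE | re.DOTALL)


def extract_fields(sql):
    """Return the plain column names of a simple 'select a, b from t' statement."""
    match = SELECT_PATTERN.match(sql)
    if match is None:
        return []  # not a simple select statement
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


fields_sql1 = extract_fields("select n, d from table")
fields_sql2 = extract_fields("SELECT n , q FROM table")
```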

Step S720, extracting query fields from the historical query statement to generate a plurality of field relation matrixes.

The fields required by each sql query statement may differ, and so may the matrix dimensions of the columns involved. Therefore, each time a new sql query statement is received, it is determined from the fields the new statement involves whether the matrix dimension needs to be increased and whether the arrangement order of the matrix rows and columns needs to be adjusted, and the previous matrix result is incrementally accumulated to obtain a new matrix.

For example, suppose a table named table needs to be split and the first query statement sql1 is: select n, d from table. Two relevant fields, n and d, can be extracted from sql1, and a matrix M1 over these two fields can then be initialized. This matrix records the known relationship between columns n and d based on the sql statements input so far.

When a subsequent sql statement is input, the previous matrix is adjusted in this step. The adjustment strategy is to check whether the fields involved in the newly added sql statement already appear in the previous matrix: if a field does not appear, it is added as a new field and the matrix is updated; if all fields are already present, no field is added and only accumulation is needed. For example, when the query statement sql2, select n, q from table, is received, matrix M1 needs to be adjusted: sql2 involves fields n and q, and q is not in M1, so a new field must be added to the matrix. The field q can be added at three positions: before field n, between field n and field d, or after field d. Adding it at different positions yields different matrices with different values. The three ways of adding are described below:

(1) Added before n and d:

Because sql2 introduces one more column than sql1, the matrix must be enlarged. First, one dimension is added to M1; since column q did not exist before sql2 entered the calculation, the cells of the newly added dimension are filled with 0, giving the enlarged matrix M11.

The fields of sql2 are then arranged according to the rows and columns of the current matrix, following the same construction principle as matrix M1, which gives the increment matrix M21.

After the matrix M11 and the matrix M21 are obtained, matrix addition is performed to obtain the latest matrix M2 after sql2 has been added incrementally.

This completes the matrix calculation for the case in which the new field q introduced by sql2 is added before n and d.
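The incremental update for case (1) can be sketched as follows. This is a minimal Python illustration rather than the patented implementation. The helper names statement_matrix, pad_matrix and add_matrices are hypothetical, and the cell convention (a 1 in every cell whose row field and column field both occur in the statement, so that diagonals accumulate per-field counts and off-diagonals accumulate common query counts) is an assumption inferred from the description of the query count information.

```python
def statement_matrix(order, fields):
    """Matrix contributed by one statement under a given field order.

    Assumption: a cell is 1 when both its row field and column field
    occur in the statement, otherwise 0.
    """
    used = set(fields)
    return [[1 if r in used and c in used else 0 for c in order] for r in order]


def pad_matrix(matrix, old_order, new_order):
    """Re-express a matrix under new_order, filling cells of new fields with 0."""
    index = {field: i for i, field in enumerate(old_order)}
    return [
        [matrix[index[r]][index[c]] if r in index and c in index else 0
         for c in new_order]
        for r in new_order
    ]


def add_matrices(a, b):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


m1 = statement_matrix(["n", "d"], ["n", "d"])   # sql1: select n, d from table
new_order = ["q", "n", "d"]                      # q added before n and d
m11 = pad_matrix(m1, ["n", "d"], new_order)      # M1 enlarged, new cells are 0
m21 = statement_matrix(new_order, ["n", "q"])    # sql2: select n, q from table
m2 = add_matrices(m11, m21)                      # matrix after adding sql2
```

Under these assumptions m2 is [[1, 1, 0], [1, 2, 1], [0, 1, 1]] for the order [q, n, d]: n has been queried twice, q and d once each, and q and d have never been queried together.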

(2) Added between n and d:

The matrix is determined in the same way as above, which is not repeated here; the result of the calculation is the matrix M2'.

(3) Added after n and d:

The matrix is determined in the same way as above, which is not repeated here; the result of the calculation is the matrix M2''.

At this point three matrices, M2, M2', and M2'', have been obtained; that is, the matrix dimensions and values have been adjusted for the input of sql2.
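Enumerating the positions at which a new field can be added is straightforward. The sketch below is illustrative only and the function name candidate_orders is hypothetical; each returned order corresponds to one candidate field relationship matrix.

```python
def candidate_orders(order, new_field):
    """Return every field order obtained by inserting new_field into order."""
    return [order[:i] + [new_field] + order[i:] for i in range(len(order) + 1)]


# Adding q to the existing order [n, d] gives the three cases (1), (2), (3).
orders = candidate_orders(["n", "d"], "q")
```

A field order of length k yields k + 1 candidates per new field, so the number of candidate matrices grows quickly as more new fields arrive.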

The above discusses the case in which a field involved in the newly added sql statement does not appear in the previous matrix. In the other case, the fields involved in the newly added sql statement already exist in the previous matrix. For example, suppose the statement sql3 is input next: select n, d from table. Both field n and field d are already in the current matrix, so the increment sql3 does not affect the dimension of the matrix, only its values. The increment is calculated in the same way as for sql2 and is not described again here. After sql3 is added, the values of the three matrices M2, M2', and M2'' are all updated, giving the matrices M3, M3', and M3'' respectively.

If further query statements follow, the adding operation is repeated and the matrix rows, columns, and values are adjusted accordingly. Once all query statements have been processed, or once the task of calculating the degree of relationship between matrix columns has been triggered, the calculation of the degree of relationship between matrix columns begins.
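Repeating the incremental step for every statement gives the same result as counting over all statements at once for a fixed field order, which is easier to show compactly. The sketch below is illustrative; the function name build_matrix is hypothetical, and the cell convention (diagonal cells hold the cumulative query count of a field, off-diagonal cells hold the common query count of two fields) is an assumption based on the description of the query count information.

```python
def build_matrix(order, statements):
    """Accumulate a field relationship matrix for a fixed field order.

    statements: list of field lists, one per historical query statement.
    """
    index = {field: i for i, field in enumerate(order)}
    size = len(order)
    matrix = [[0] * size for _ in range(size)]
    for fields in statements:
        unique = set(fields)  # a field repeated in one statement counts once
        for r in unique:
            for c in unique:
                matrix[index[r]][index[c]] += 1
    return matrix


# sql1, sql2, sql3 from the example above.
statements = [["n", "d"], ["n", "q"], ["n", "d"]]
m3 = build_matrix(["q", "n", "d"], statements)          # q before n and d
m3_prime = build_matrix(["n", "q", "d"], statements)    # q between n and d
m3_double = build_matrix(["n", "d", "q"], statements)   # q after n and d
```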

Step S730, calculating the field association coefficient of each field relationship matrix to obtain a target field relationship matrix.

The field association coefficient is calculated for every field relationship matrix obtained in step S720, and the matrix with the largest field association coefficient is selected as the target field relationship matrix. The calculation formula must satisfy the rule that the greater the degree of relationship between adjacent fields, the greater the field association coefficient of the matrix. Different calculation formulas can be defined according to this rule. For example, an alternative field association coefficient sim is calculated as follows:

where M(x, y) denotes the element value at the corresponding position of the m × m matrix, x denotes the matrix row, and y denotes the matrix column. If column y − 1 or column y + 1 falls outside the matrix, the corresponding term M(x, y − 1) or M(x, y + 1) is taken as 0.

Taking the three matrices obtained in step S720 as an example again, the field association coefficients calculated with the above formula are: sim(M3) = 30; sim(M3') = 26; sim(M3'') = 28. Since sim(M3) > sim(M3'') > sim(M3'), matrix M3 has the optimal row-column arrangement and can be used as the target field relationship matrix. Because the rows of every matrix are arranged in the same order as its columns, only the one-dimensional vector of fields corresponding to the matrix rows needs to be output. Based on the above results, the target field sequence is determined to be [q, n, d]. The target field sequence places the more closely related fields next to each other, which makes it easier to find the cut points in the subsequent aggregation.
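The formula itself did not survive in this text, so the following sketch uses an assumed additive form: for every cell, the cell value plus its left and right neighbours in the same row are summed, with neighbours outside the matrix taken as 0. This form is an assumption, chosen because it follows the row-wise accumulation described for the apparatus and reproduces the three coefficients reported above; the matrix values are likewise assumed (diagonal cells are cumulative query counts, off-diagonal cells are common query counts), and the function name association_coefficient is hypothetical.

```python
def association_coefficient(matrix):
    """Sum, over every cell, of the cell and its left and right row neighbours."""
    total = 0
    for row in matrix:
        width = len(row)
        for y, value in enumerate(row):
            left = row[y - 1] if y - 1 >= 0 else 0       # outside the matrix: 0
            right = row[y + 1] if y + 1 < width else 0   # outside the matrix: 0
            total += left + value + right
    return total


# Assumed matrices after sql1, sql2, sql3 for the three field orders.
candidates = {
    ("q", "n", "d"): [[1, 1, 0], [1, 3, 2], [0, 2, 2]],   # M3
    ("n", "q", "d"): [[3, 1, 2], [1, 1, 0], [2, 0, 2]],   # M3'
    ("n", "d", "q"): [[3, 2, 1], [2, 2, 0], [1, 0, 1]],   # M3''
}
scores = {order: association_coefficient(m) for order, m in candidates.items()}
target_order = max(scores, key=scores.get)
```

With these inputs the scores are 30, 26, and 28, and the selected order is (q, n, d), in agreement with the values stated in the text.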

Step S740, aggregating and grouping the query fields according to the expected number of data tables after splitting.

Based on the statistical counts and the determined target field sequence, the fields in the target field sequence are traversed in order and the difference between the counts of every two adjacent fields is calculated. If the numbers of times two adjacent fields occur in the sql statements differ greatly, the two fields are queried with very different frequency, and a cut point can be marked between them. How large the difference must be is determined by a threshold defined according to the number of data tables to be split into.

Taking the target field sequence [q, n, d] as an example, the number of times each field occurs in the sql statements is count = [1, 3, 2]. Traversing the fields of the target field sequence in order and calculating the differences gives a difference of 2 between q and n and a difference of 1 between n and d. The difference between q and n therefore ranks ahead of that between n and d. Splitting into two tables requires only one cut point, which is placed between q and n. The fields of the two new data tables obtained by the cut are table(q) and table(n, d), respectively.
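The choice of cut points can be sketched as follows. This is an illustrative reading of step S740, not the patented implementation: instead of an explicit threshold it simply keeps the largest adjacent differences, as many as the desired number of tables requires, and breaks ties in favour of the earlier position. The function name split_sequence and the tie-breaking rule are assumptions.

```python
def split_sequence(fields, counts, table_count):
    """Split an ordered field sequence into table_count groups.

    fields: target field sequence, e.g. ["q", "n", "d"].
    counts: cumulative query count of each field, in the same order.
    """
    # Difference in query count between every pair of adjacent fields.
    gaps = [(abs(counts[i] - counts[i + 1]), i) for i in range(len(fields) - 1)]
    # Keep the (table_count - 1) largest gaps; ties go to the earlier position.
    chosen = sorted(gaps, key=lambda gap: (-gap[0], gap[1]))[: table_count - 1]
    cut_points = sorted(i + 1 for _, i in chosen)

    groups, start = [], 0
    for cut in cut_points:
        groups.append(fields[start:cut])
        start = cut
    groups.append(fields[start:])
    return groups


two_tables = split_sequence(["q", "n", "d"], [1, 3, 2], 2)
```

For the example above this yields the groups [q] and [n, d], matching table(q) and table(n, d).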

In this way, the wide table is split vertically according to actual historical data. The degree of relationship between fields is calculated, the fields are sorted by that degree of relationship, and the table is then split. The split is supported by data logic, so the splitting scheme is more general and accurate. In addition, since historical query data also reflects future query trends to a certain extent, splitting on the basis of historical query data allows each resulting table to better support subsequent data services.

Embodiments of the apparatus of the present invention are described below, which can be used to perform the above-described data table processing method of the present invention.

FIG. 8 schematically illustrates a block diagram of the components of a data table processing apparatus in some embodiments of the invention. As shown in fig. 8, the data table processing apparatus 800 may mainly include:

a field determination module 810 configured to obtain a plurality of historical query statements related to the source data table, and determine query fields in the respective historical query statements and query frequency information of each query field;

a matrix determination module 820 configured to determine a plurality of field relationship matrices according to the query fields and the query frequency information of each query field;

the matrix screening module 830 is configured to determine a field association coefficient of the field relationship matrix according to the query frequency information of the adjacent query fields in the field relationship matrix, and select a target field relationship matrix from the multiple field relationship matrices according to the field association coefficient;

the data table splitting module 840 is configured to determine a plurality of field splitting sequences according to the query frequency information of the adjacent query fields in the target field relationship matrix, and determine a plurality of sub data tables corresponding to the source data table according to the field splitting sequences.

In some embodiments of the invention, field determination module 810 may include:

the log obtaining module is configured to determine a database where the source data table is located and obtain a data interaction log of the database;

a statement extraction module configured to extract a plurality of historical query statements related to the source data table from the data interaction log.

In some embodiments of the invention, the query number information of a query field comprises the cumulative total number of queries for one query field and the total number of queries in common for two different query fields.

In some embodiments of the invention, matrix determination module 820 comprises:

a sequence determination module configured to rank the query fields to obtain a plurality of field sequences corresponding to different field arrangement orders;

the number counting module is configured to acquire the accumulated total query number of each query field and the common query total number of each query field and another query field;

and the number filling module is configured to determine a plurality of field relation matrixes respectively corresponding to the field sequences according to the accumulated total query number of each query field and the common total query number of each query field.

In some embodiments of the present invention, matrix screening module 830 comprises:

an intra-row field coefficient determining module configured to determine an intra-row field coefficient of each query field in a matrix row of the field relationship matrix according to the query count information of each query field and its adjacent query fields;

an inter-row field coefficient determination module configured to accumulate the intra-row field coefficients of each query field in a matrix row to obtain the inter-row field coefficient of the matrix row;

and the field association coefficient determining module is configured to accumulate the inter-row field coefficients of each matrix row in the field relationship matrix to obtain the field association coefficient of the field relationship matrix.

In some embodiments of the present invention, the data table splitting module 840 comprises:

a sequence determination module configured to determine a target field sequence associated with the target field relationship matrix;

a number acquisition module configured to acquire the cumulative total number of queries in the query number information of each query field;

a difference determining module configured to determine a query number difference between two adjacent query fields according to the accumulated query total number;

a position determination module configured to determine one or more field segmentation positions in the target field sequence according to the query number difference;

and the sequence splitting module is configured to split the target field sequence according to the field splitting positions to obtain a plurality of field splitting sequences.

In some embodiments of the invention, the data table splitting module 840 further comprises:

the field data extraction module is configured to extract field data from the source data table according to the query field in the field splitting sequence;

and the data table splitting module is configured to combine the field data according to the arrangement sequence of the query fields in the field splitting sequence to obtain a plurality of sub data tables corresponding to the source data table.

Since the functional modules of the data table processing apparatus of the exemplary embodiment of the present invention correspond to the steps of the exemplary embodiment of the data table processing method described above, for details not disclosed in the apparatus embodiments, refer to the method embodiments described above.

Referring now to FIG. 9, shown is a block diagram of a computer system 900 suitable for use in implementing an electronic device of an embodiment of the present invention. The computer system 900 of the electronic device shown in fig. 9 is only an example, and should not bring any limitation to the function and the scope of the use of the embodiments of the present invention.

As shown in fig. 9, the computer system 900 includes a Central Processing Unit (CPU) 901 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage section 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data necessary for system operation are also stored. The CPU 901, ROM 902, and RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to bus 904.

The following components are connected to the I/O interface 905: an input section 906 including a keyboard, a mouse, and the like; an output section 907 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. A drive 910 is also connected to the I/O interface 905 as necessary. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 910 as necessary so that a computer program read therefrom is installed into the storage section 908 as necessary.

In particular, according to an embodiment of the present invention, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the invention include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 909 and/or installed from the removable medium 911. The above-described functions defined in the system of the present application are executed when the computer program is executed by a Central Processing Unit (CPU) 901.

It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present invention, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present invention may be implemented by software or by hardware, and the described units may also be disposed in a processor. The names of the units do not, in some cases, constitute a limitation on the units themselves.

As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the data table processing method described in the above embodiments.

It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the invention, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.

Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiment of the present invention can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which can be a personal computer, a server, a touch terminal, or a network device, etc.) to execute the method according to the embodiment of the present invention.

Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims

1. A method for processing a data table, comprising:

2. The method of claim 1, wherein obtaining a plurality of historical query statements associated with a source data table comprises:

3. The data sheet processing method of claim 1, wherein the query number information of the query field includes a cumulative total number of queries of one query field and a total number of queries in common of two different query fields.

4. The method of claim 3, wherein determining a plurality of field relationship matrices according to the query fields and the query times information of each of the query fields comprises:

5. The method of claim 1, wherein the determining the field association coefficient of the field relationship matrix according to the query frequency information of the neighboring query fields in the field relationship matrix comprises:

and accumulating the field coefficients between rows of each matrix row in the field relationship matrix to obtain the field correlation coefficient of the field relationship matrix.

6. The method of claim 1, wherein determining a plurality of field splitting sequences according to the query times information of adjacent query fields in the target field relationship matrix comprises:

7. The method of claim 1, wherein determining the plurality of sub-data tables corresponding to the source data table according to the field splitting sequence comprises:

8. A data table processing apparatus, comprising:

the matrix screening module is configured to determine a field correlation coefficient of the field relation matrix according to the query frequency information of adjacent query fields in the field relation matrix, and select a target field relation matrix from the multiple field relation matrices according to the field correlation coefficient;

9. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out a method of processing a data table according to any one of claims 1 to 7.

10. An electronic device, comprising:

one or more processors;

a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the data table processing method of any of claims 1 to 7.