CN110569313A - Method and device for judging grade of model table of data warehouse - Google Patents
Method and device for judging grade of model table of data warehouse
- Publication number
- CN110569313A (application number CN201810475388.9A)
- Authority
- CN
- China
- Prior art keywords
- model table
- model
- tables
- data
- data warehouse
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiment of the invention provides a method and a device for judging the level of a model table of a data warehouse, electronic equipment and a storage medium, and relates to the technical field of databases. The method comprises the following steps: acquiring index data of a plurality of preset indexes of each model table in the data warehouse; dividing a plurality of model table levels, establishing model table sets corresponding to the model table levels and distributing a model table to each model table set to serve as an initial element; performing clustering operation on each model table according to each initial element, and distributing each model table to the corresponding model table set according to the result of the clustering operation; and distributing corresponding level identifications for each model table according to the distribution result of the model tables. The technical scheme of the embodiment of the invention can realize the automatic grading of the model table.
Description
Technical Field
The present invention relates to the field of database technologies, and in particular, to a method and an apparatus for determining a model table level of a data warehouse, an electronic device, and a computer-readable storage medium.
Background
At present, most enterprises build a data warehouse to serve the varied data needs of the business, such as daily data analysis, reporting, and data mining. The core of building a data warehouse is constructing a set of data models based on the company's business: using a given modeling method and theory, the data of the different business links is built into final model tables, which then support services such as data query, analysis, retrieval, and data mining.
The models of current data warehouses are generally divided into several layers according to processing order and granularity, but the model tables are not graded by importance, even though the model table level is a very valuable attribute in actual work. For example, in the system maintenance of a data warehouse, the model tables need to be backed up, and measures such as priority guarantee, focused monitoring, and full backup should be taken for model tables with a higher level. As another example, when allocating resources during data warehouse operation, more resources, such as compute node resources, storage resources, and concurrent network access resources, should be allocated to model tables with a higher level. In general, by grading the model tables reasonably, more effective management methods can be applied to the different levels, so that the stability and the use value of the system are improved overall.
Therefore, how to determine the model table level in a data warehouse has become a technical problem in urgent need of a solution.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the invention and therefore may include information that does not constitute prior art that is already known to a person of ordinary skill in the art.
Disclosure of Invention
An object of embodiments of the present invention is to provide a method for determining a model table level of a data warehouse, an apparatus for determining a model table level of a data warehouse, an electronic device, and a computer-readable storage medium, which overcome one or more problems due to limitations and disadvantages of the related art, at least to some extent.
According to one aspect of the present disclosure, there is provided a method for determining a model table level of a data warehouse, including:
Acquiring index data of a plurality of preset indexes of each model table in the data warehouse;
Dividing a plurality of model table levels, establishing model table sets corresponding to the model table levels and distributing a model table to each model table set to serve as an initial element;
Performing clustering operation on each model table according to each initial element, and distributing each model table to the corresponding model table set according to the result of the clustering operation;
And distributing corresponding level identifications for each model table according to the distribution result of the model tables.
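In an exemplary embodiment of the present disclosure, the preset indexes include several of the following: number of fields, number of records, number of tasks, number of source tables, number of queries, number of downloads, script running time, number of updates, and number of comments.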
In an exemplary embodiment of the present disclosure, assigning a model table as an initial element to each of the model table sets includes:
For each model table, calculating the sum of all the index data of the model table as first judgment data of the model table;
Sorting each model table according to each first judgment data;
And selecting, according to the sorting result of the model tables, the model tables at preset ranks as the initial elements assigned to each model table set.
In an exemplary embodiment of the present disclosure, clustering each of the model tables according to each of the initial elements includes:
For each model table, calculating the distance between the model table and the vector centroid of each model table set;
Assigning the model table to the model table set whose vector centroid is at the minimum distance from the model table;
And calculating the vector centroid of the model table set according to the index data of all the model tables in the model table set.
According to an aspect of the present disclosure, there is provided a model table level determination apparatus of a data warehouse, including:
The data acquisition module is used for acquiring index data of a plurality of preset indexes of each model table in the data warehouse;
The model table set initialization module is used for establishing model table sets corresponding to the model table levels and distributing a model table to each model table set to serve as an initial element;
The clustering operation module is used for carrying out clustering operation on each model table according to each initial element and distributing each model table to the corresponding model table set according to the result of the clustering operation;
And the grade output module is used for distributing corresponding grade identifications to the model tables according to the distribution results of the model tables.
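In an exemplary embodiment of the present disclosure, the preset indexes include several of the following: number of fields, number of records, number of tasks, number of source tables, number of queries, number of downloads, script running time, number of updates, and number of comments.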
In an exemplary embodiment of the present disclosure, assigning a model table as an initial element to each of the model table sets includes:
For each model table, calculating the sum of all the index data of the model table as first judgment data of the model table;
Sorting each model table according to each first judgment data;
And selecting, according to the sorting result of the model tables, the model tables at preset ranks as the initial elements assigned to each model table set.
In an exemplary embodiment of the present disclosure, clustering each of the model tables according to each of the initial elements includes:
For each model table, calculating the distance between the model table and the vector centroid of each model table set;
Assigning the model table to the model table set whose vector centroid is at the minimum distance from the model table;
And calculating the vector centroid of the model table set according to the index data of all the model tables in the model table set.
According to an aspect of the present disclosure, there is provided an electronic device including:
A processor; and
A memory having computer readable instructions stored thereon which, when executed by the processor, implement the model table level determination method for a data warehouse as described in any of the above.
According to a fourth aspect of embodiments of the present invention, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the model table level determination method for a data warehouse as described in the first aspect above.
In the technical solutions provided by some embodiments of the present invention, a clustering operation is performed on the model tables based on collected index data for a plurality of indexes of each model table in the data warehouse, and the level of each model table is determined automatically from the result of the clustering operation. Compared with the prior art, the level judgment is based on the indexes of the model tables, so the result is more objective and accurate, and by selecting different indexes the method can cover all model tables of a data warehouse, which gives it better universality. More specifically, first, the technical solution can avoid the arbitrariness and low recognition rate of judgment by manual experience, gives the model table level a judgment standard with good business interpretability, and can reduce the disputes that arise from manual judgment. Second, the technical solution has better universality: it can be used to judge the model table levels of all data warehouses, and it reduces the system management blind spots and operation and maintenance risks caused by models that cannot be identified manually. Third, the technical solution can grade the model tables automatically, so grading becomes part of daily management instead of being done only after a problem has occurred, as with manual judgment; this helps avoid risks in the daily management and operation and maintenance of the data warehouse and improves the stability and application value of the data warehouse system.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
FIG. 1 illustrates a model hierarchy architecture diagram of a data warehouse in accordance with one aspect;
FIG. 2 illustrates a flow diagram of a method for model table level determination of a data warehouse, in accordance with some embodiments of the invention;
FIG. 3 illustrates a flow diagram of a method for model table level determination of a data warehouse, in accordance with some embodiments of the invention;
FIG. 4 shows a schematic block diagram of a model table level decision apparatus of a data warehouse in accordance with an exemplary embodiment of the present invention;
FIG. 5 illustrates a schematic structural diagram of a computer system suitable for use with the electronic device to implement an embodiment of the invention.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The same reference numerals denote the same or similar parts in the drawings, and thus, a repetitive description thereof will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations or operations have not been shown or described in detail to avoid obscuring aspects of the invention.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
In prior-art data warehouses, the grading of model tables is generally not considered, and the relative importance of a model table is mostly assessed by manual experience. For example, the order summary wide table and the order detail wide table are business model tables of an e-commerce system centered on orders, so they are manually judged to be important and are given a high level; they then receive more attention in routine system maintenance and in operation and maintenance management. For other model tables, however, there are no uniform rules or methods for determining the model table level. For example:
FIG. 1 is a schematic diagram of a model hierarchy architecture of a data warehouse. As FIG. 1 shows, the data warehouse only layers the data flow of model table processing: from left to right are layer 1, layer 2, layer 3, and layer 4, and data basically flows from layer 1 to layer 4 (the specific number of layers can be set according to enterprise requirements). However, the model tables within each layer are not distinguished by level; that is, the importance of an arbitrary model table in the data warehouse generally cannot be judged, and no method for evaluating model table levels is provided.
In the existing technical scheme, the level of a model table is judged by manual experience, a level label is set for the model table according to that judgment, and the daily management and operation and maintenance of the data warehouse are then carried out according to the level labels. In actual work, some manual judgments are based on whether the table belongs to a core business, and others on whether the number of tasks using the model table satisfies some rule; in short, there is no scientific and reasonable judgment method that fits actual operation. Therefore, the existing method of determining the model table level by manual experience has the following disadvantages:
1) Because manual judgment of the model table level follows no reasonable rule, the recognition rate is low, only a small part of the model tables can be judged, and the results are often disputed because of human factors.
2) Some special model tables cannot be identified, for example model tables that are not core business model tables but are heavily used, so that important model tables go unrecognized and system management blind spots and operation and maintenance risks increase.
3) Manual judgment is mostly made after the fact: the level of a model table is judged and analyzed only after the table has had a problem in use and some impact has been caused, so the approach is passive and has poor practical effect.
Therefore, the prior art offers no effective method for determining the model table level, which creates potential risks for database management and operation and maintenance.
Based on the above, in the exemplary embodiment of the present invention, a method for determining a model table level of a data warehouse is first proposed. The method may be executed by a server or other electronic devices, which is not particularly limited in this exemplary embodiment. Referring to fig. 2, the method for determining the model table level of the data warehouse may include the following steps:
Step S210, acquiring index data of a plurality of preset indexes of each model table in the data warehouse;
Step S220, dividing a plurality of model table levels, establishing model table sets corresponding to the model table levels and distributing a model table to each model table set to serve as an initial element;
Step S230, performing clustering operation on each model table according to each initial element, and distributing each model table to the corresponding model table set according to the result of the clustering operation;
Step S240, distributing corresponding level identifications to each model table according to the distribution result of the model tables.
Compared with the prior art, the method for judging the model table level of the data warehouse in the example embodiment of FIG. 2 bases the level judgment on the indexes of the model tables, so the result is more objective and accurate, and by selecting different indexes the method can cover all model tables of a data warehouse, which gives it better universality. More specifically, first, the technical solution can avoid the arbitrariness and low recognition rate of judgment by manual experience, gives the model table level a judgment standard with good business interpretability, and can reduce the disputes that arise from manual judgment. Second, the technical solution has better universality: it can be used to judge the model table levels of all data warehouses, and it reduces the system management blind spots and operation and maintenance risks caused by models that cannot be identified manually. Third, the technical solution can grade the model tables automatically, so grading becomes part of daily management instead of being done only after a problem has occurred, as with manual judgment; this helps avoid risks in the daily management and operation and maintenance of the data warehouse and improves the stability and application value of the data warehouse system.
The model table level decision method of the data warehouse in the exemplary embodiment of fig. 2 is described in detail below with reference to fig. 3.
In step S210, index data of a plurality of preset indexes of each model table in the data warehouse is collected.
In an example embodiment, the data warehouse contains all of the model tables, and the model tables may be designed and managed in layers. The data warehouse may design, generate, use, and manage the model tables through methods such as Extract-Transform-Load (ETL) and metadata management. Each model table may be described by multiple indexes. For example, in this exemplary embodiment, the preset indexes may include indexes related to the attributes of the model table, indexes related to associated tasks, indexes related to source-dependent data, indexes related to usage, and the like. Specifically, the preset indexes may include several of the following: the number of fields, the number of records, the number of tasks, the number of source tables, the number of queries in the statistical period, the number of downloads in the statistical period, the running time of the script, the number of updates in the statistical period, the number of users in the statistical period, the number of user comments, and the like; this is not particularly limited in the exemplary embodiment. The statistical period may be a week, a month, a year, etc., and this is likewise not particularly limited in the present exemplary embodiment.
After the index data of a plurality of preset indexes of each model table in the data warehouse are collected, the index data can be integrated into a model judgment condition table so as to facilitate storage and query operation. For example, the fields of the model decision condition table may include: model table name, field number (M1), record number (ten) (M2), task number (M3), source table number (M4), month query times (M5), month download times (M6), script running time (minutes) (M7), month update times (M8), number of persons used per month (M9) and user comment number (M10); the structure of the model determination condition table is as shown in table 1:
TABLE 1
Model table name | M1 | M2 | M3 | M4 | M5 | M6 | M7 | M8 | M9 | M10 |
table1 | 25 | 305 | 4 | 8 | 245 | 80 | 34 | 0 | 42 | 10 |
table2 | 71 | 205 | 3 | 3 | 525 | 46 | 14 | 1 | 33 | 6 |
table3 | 20 | 85 | 4 | 4 | 352 | 35 | 41 | 0 | 21 | 9 |
table4 | 16 | 33 | 5 | 5 | 109 | 11 | 32 | 0 | 45 | 11 |
table5 | 34 | 190 | 6 | 5 | 221 | 13 | 23 | 0 | 63 | 4 |
table6 | 31 | 234 | 7 | 3 | 176 | 14 | 11 | 0 | 51 | 6 |
table7 | 29 | 53 | 2 | 6 | 240 | 17 | 21 | 0 | 35 | 14 |
table8 | 19 | 43 | 4 | 4 | 460 | 37 | 51 | 0 | 25 | 34 |
table9 | 39 | 73 | 5 | 4 | 405 | 22 | 31 | 0 | 37 | 12 |
table10 | 22 | 66 | 5 | 6 | 266 | 44 | 22 | 0 | 53 | 21 |
…… | …… | …… | …… | …… | …… | …… | …… | …… | …… | …… |
In step S220, a plurality of model table levels are divided, and a model table set corresponding to the model table levels is established and a model table is allocated to each model table set as an initial element.
In the present exemplary embodiment, a plurality of model table levels may be divided according to requirements; for example, 2, 3, 4, or 5 model table levels, and so on. Taking 4 model table levels as an example, all the model tables can be divided into four levels A, B, C, and D, which correspond respectively to the model importance degrees "ultra high", "high", "medium", and "low". Correspondingly, different model table levels can correspond to different daily management measures. Specifically, as shown in Table 2:
TABLE 2
Model table level | Model importance | Daily management measures |
A | Ultra high | Resource priority, 7 × 24 monitoring, advanced disaster recovery, dedicated person responsible |
B | High | Resource priority, 7 × 24 monitoring, normal disaster recovery, dedicated person responsible |
C | Medium | 7 × 24 monitoring, normal disaster recovery, duty rotation |
D | Low | 7 × 24 monitoring, normal disaster recovery |
After the model table levels are divided, a model table set corresponding to each model table level can be established. For example, the model table set corresponding to model table level A is set A; the model table set corresponding to model table level B is set B; the model table set corresponding to model table level C is set C; and the model table set corresponding to model table level D is set D.
Referring to FIG. 3, in this exemplary embodiment, allocating a model table as an initial element to each model table set may include the following steps:
Step S310, for each model table, calculating the sum of all the index data of the model table as the first judgment data of the model table. Taking the model judgment condition table shown in Table 1 as an example, the first judgment data $x_i$ of each model table may be calculated as follows:
$$x_i = \sum_{j=1}^{10} M_j^{(i)} = M_1^{(i)} + M_2^{(i)} + \cdots + M_{10}^{(i)}$$
where $M_j^{(i)}$ denotes the value of index $M_j$ for the i-th model table; i = 1, 2, 3, ..., n; and n represents the total number of model tables. After the calculation is finished, the set $X = \{x_i\}$ is obtained.
Of course, in other exemplary embodiments of the present disclosure, the first determination data may be calculated in other manners; for example, weighted summation is performed on all the index data, or other processing such as multiplication and exponentiation is performed to obtain first judgment data; these too are within the scope of the present disclosure.
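As a minimal sketch of step S310, the following Python computes the first judgment data for the ten rows listed in Table 1 as a plain sum, and also allows the weighted-sum variant mentioned above. The names `TABLE_1` and `first_judgment` and the `weights` parameter are illustrative assumptions, not part of the disclosure.

```python
# Index data M1..M10 for the ten model tables listed in Table 1.
TABLE_1 = {
    "table1": [25, 305, 4, 8, 245, 80, 34, 0, 42, 10],
    "table2": [71, 205, 3, 3, 525, 46, 14, 1, 33, 6],
    "table3": [20, 85, 4, 4, 352, 35, 41, 0, 21, 9],
    "table4": [16, 33, 5, 5, 109, 11, 32, 0, 45, 11],
    "table5": [34, 190, 6, 5, 221, 13, 23, 0, 63, 4],
    "table6": [31, 234, 7, 3, 176, 14, 11, 0, 51, 6],
    "table7": [29, 53, 2, 6, 240, 17, 21, 0, 35, 14],
    "table8": [19, 43, 4, 4, 460, 37, 51, 0, 25, 34],
    "table9": [39, 73, 5, 4, 405, 22, 31, 0, 37, 12],
    "table10": [22, 66, 5, 6, 266, 44, 22, 0, 53, 21],
}


def first_judgment(indexes, weights=None):
    """Return x_i: the plain sum of the index data, or a weighted sum
    when `weights` (a hypothetical per-index weight list) is given."""
    if weights is None:
        return sum(indexes)
    if len(weights) != len(indexes):
        raise ValueError("weights must match the number of indexes")
    return sum(w * m for w, m in zip(weights, indexes))


# The set X = {x_i}, keyed by model table name.
X = {name: first_judgment(row) for name, row in TABLE_1.items()}
print(X["table1"], X["table2"], X["table4"])  # 753 907 267
```

Because the indexes have very different scales (for example, monthly queries versus monthly updates), a plain sum is dominated by the largest-valued indexes; the weighted variant is one way to rebalance them if that is not desired.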
Step S320, sorting each model table according to each first judgment data.
For example, the first judgment data corresponding to the model tables table1, table2, table3, table4, table5, table6, table7, table8, table9 and table10 in Table 1 above are 753, 907, 571, 267, 559, 533, 417, 677, 628 and 505, respectively; the corresponding ranks are 2, 1, 5, 10, 6, 7, 9, 3, 4, 8. Specifically, as shown in Table 3:
TABLE 3
Model table name | M1 | M2 | …… | M10 | X | Value of x_i | Ranking |
table1 | 25 | 305 | …… | 10 | x1 | 753 | 2 |
table2 | 71 | 205 | …… | 6 | x2 | 907 | 1 |
table3 | 20 | 85 | …… | 9 | x3 | 571 | 5 |
table4 | 16 | 33 | …… | 11 | x4 | 267 | 10 |
table5 | 34 | 190 | …… | 4 | x5 | 559 | 6 |
table6 | 31 | 234 | …… | 6 | x6 | 533 | 7 |
table7 | 29 | 53 | …… | 14 | x7 | 417 | 9 |
table8 | 19 | 43 | …… | 34 | x8 | 677 | 3 |
table9 | 39 | 73 | …… | 12 | x9 | 628 | 4 |
table10 | 22 | 66 | …… | 21 | x10 | 505 | 8 |
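The ranking in Table 3 can be reproduced with a short sketch such as the following. The function name `rank_descending` is an assumption for illustration, and the tie-breaking behavior is a choice made here, since the text does not say how equal values should be ordered.

```python
# First judgment data x_i for table1..table10, as listed in Table 3.
X = {
    "table1": 753, "table2": 907, "table3": 571, "table4": 267,
    "table5": 559, "table6": 533, "table7": 417, "table8": 677,
    "table9": 628, "table10": 505,
}


def rank_descending(scores):
    """Map each name to its 1-based rank, with rank 1 for the largest score.

    Ties keep insertion order because sorted() is stable; this tie rule
    is an assumption, not something the source specifies.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


ranks = rank_descending(X)
print([ranks[f"table{i}"] for i in range(1, 11)])
# [2, 1, 5, 10, 6, 7, 9, 3, 4, 8]
```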
s330, selecting the distributed model tables in a preset order as initial elements according to the sorting result of each model. For example:
For set A, corresponding to model table level A, the initial element in set A is denoted as a1. In the present exemplary embodiment, a may be taken1=argmax(xiX); that is, the model table corresponding to the xi with the maximum value in the set X is taken as the initial element.
For set B, corresponding to model table level B, the initial element in set B is denoted as B1. In the present exemplary embodiment, b may be taken1=argtop30%(xix); namely, X ranked at 30% rank in the set X is takenithe corresponding model table is the initial element.
for set C, corresponding to model table level C, the initial element in set C is denoted as C1. In the present exemplary embodiment, c may be taken1=argtop70%(xiX); namely, X ranked at 70% rank in the set X is takenithe corresponding model table is the initial element.
For set D, corresponding to model table level D, the initial element in set D is denoted as D1. In this example embodiment, d may be taken1=argmin(xiX); i.e. taking the smallest value X in the set Xithe corresponding model table is the initial element.
Taking the data in Table 3 as an example, after the first judgment data $x_i$ of each model table is calculated and ranked, the following can be found:
For set A, the initial element takes rank 1, i.e., the model table corresponding to the maximum value in set X is taken as the initial element $a_1$; that is, $a_1$ = table2. Accordingly, the initial set is A = {($a_1$ = table2)}.
For set B, the initial element takes the 30% rank, i.e., the model table corresponding to the $x_i$ ranked 3rd in set X is taken as the initial element $b_1$; that is, $b_1$ = table8. Accordingly, the initial set is B = {($b_1$ = table8)}.
For set C, the initial element takes the 70% rank, i.e., the model table corresponding to the $x_i$ ranked 7th in set X is taken as the initial element $c_1$; that is, $c_1$ = table6. Accordingly, the initial set is C = {($c_1$ = table6)}.
For set D, the initial element takes rank 10, i.e., the model table corresponding to the minimum value in set X is taken as the initial element $d_1$; that is, $d_1$ = table4. Accordingly, the initial set is D = {($d_1$ = table4)}.
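The seed selection of steps S310 to S330 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function name `pick_initial_elements`, the toy data, and the rule that the "30%" and "70%" ranks are rounded up to a whole 1-based rank are all assumptions made here.

```python
def pick_initial_elements(index_data):
    """Pick one seed model table per level from summed index data.

    index_data maps a model table name to its list of index values.
    """
    if not index_data:
        raise ValueError("at least one model table is required")

    # First judgment data x_i: the sum of all index values of a table.
    sums = {name: sum(values) for name, values in index_data.items()}
    # Rank 1 is the largest sum.
    ranked = sorted(sums, key=sums.get, reverse=True)
    n = len(ranked)

    def at_percent(percent):
        # Assumed rounding rule: ceil(percent% of n) as a 1-based rank,
        # computed with integers to avoid floating-point surprises.
        rank = max(1, (percent * n + 99) // 100)
        return ranked[rank - 1]

    return {
        "A": ranked[0],        # maximum x_i
        "B": at_percent(30),   # x_i at the 30% rank
        "C": at_percent(70),   # x_i at the 70% rank
        "D": ranked[-1],       # minimum x_i
    }


# Hypothetical toy data: ten tables whose sums are 100, 90, ..., 10.
toy = {f"t{i}": [110 - 10 * i, 0] for i in range(1, 11)}
print(pick_initial_elements(toy))
```

With ten tables the 30% and 70% ranks resolve to ranks 3 and 7, matching the worked example above.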
In step S230, a clustering operation is performed on each model table according to each initial element, and each model table is allocated to the corresponding model table set according to a result of the clustering operation.
In this example embodiment, the clustering operation may be performed in conjunction with computing the vector centroids of the set of model tables. For example, referring to fig. 3, performing a clustering operation on each model table according to each initial element may include the following steps:
Step S340, for each model table, calculating the distance between the model table and the vector centroid of each model table set.
In this exemplary embodiment, for each model table set, its vector centroid may be calculated according to the index data of all model tables in the model table set.
For example, assume that the number of model tables in set A (level A) is o, in set B (level B) is p, in set C (level C) is k, and in set D (level D) is m. In each model table set, each model table element is a 10-dimensional vector whose coordinate values are, in turn, the fields M1, M2, …, M10 of the model judgment condition table in Table 1 above. The general representations of set A, set B, set C and set D are therefore as follows:
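$A = \{a_1, a_2, \ldots, a_o\},\quad a_i \in \mathbb{R}^n \quad (i = 1, 2, \ldots, o)$
$B = \{b_1, b_2, \ldots, b_p\},\quad b_i \in \mathbb{R}^n \quad (i = 1, 2, \ldots, p)$
$C = \{c_1, c_2, \ldots, c_k\},\quad c_i \in \mathbb{R}^n \quad (i = 1, 2, \ldots, k)$
$D = \{d_1, d_2, \ldots, d_m\},\quad d_i \in \mathbb{R}^n \quad (i = 1, 2, \ldots, m)$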
where n = 10 and $\mathbb{R}^n$ denotes the 10-dimensional vector space. Of course, if the judgment condition table includes a different number of fields, the dimension of each model table element changes accordingly, which is not particularly limited in this exemplary embodiment.
After the general representations of set A, set B, set C and set D are obtained, their vector centroids $\mu_a$, $\mu_b$, $\mu_c$, $\mu_d$ can be calculated by the following formulas:
$\mu_a = \frac{1}{o}\sum_{i=1}^{o} a_i$
$\mu_b = \frac{1}{p}\sum_{i=1}^{p} b_i$
$\mu_c = \frac{1}{k}\sum_{i=1}^{k} c_i$
$\mu_d = \frac{1}{m}\sum_{i=1}^{m} d_i$
That is, in the present exemplary embodiment, the vector centroid of a model table set is calculated as the average of the vectors corresponding to all the elements in the set, and the resulting $\mu_a$, $\mu_b$, $\mu_c$, $\mu_d$ are all 10-dimensional vectors. However, it is easily understood by those skilled in the art that in other exemplary embodiments of the present disclosure, the centroid of the model table set may be calculated in other manners, and the present exemplary embodiment is not limited thereto.
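The mean-based centroid described above can be sketched as a component-wise average. The function name `centroid` and the list-of-lists representation are assumptions for illustration only.

```python
def centroid(vectors):
    """Return the component-wise mean of a non-empty list of vectors."""
    if not vectors:
        raise ValueError("a model table set needs at least one element")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    count = len(vectors)
    # Average each coordinate over all elements of the set.
    return [sum(v[j] for v in vectors) / count for j in range(dim)]
```

For a set holding only its initial element, the centroid is simply that element's vector.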
After the vector centroids of the model table sets are calculated, for each model table, the vector N of the model table may be taken, and the distances Dis_a, Dis_b, Dis_c, Dis_d between the vector N and the vector centroids $\mu_a$, $\mu_b$, $\mu_c$, $\mu_d$ of set A, set B, set C and set D are calculated. For example:
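$\mathrm{Dis\_a} = \lVert N - \mu_a \rVert_2$
$\mathrm{Dis\_b} = \lVert N - \mu_b \rVert_2$
$\mathrm{Dis\_c} = \lVert N - \mu_c \rVert_2$
$\mathrm{Dis\_d} = \lVert N - \mu_d \rVert_2$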
where $\lVert X - Y \rVert_2$ is the square root of the sum of the squared components of the difference vector X − Y.
Note that, in the present exemplary embodiment, the Euclidean distance is calculated, but in other exemplary embodiments of the present disclosure, a Mahalanobis distance, a cosine distance, a Manhattan distance, or the like may also be calculated; these too are within the scope of the present disclosure.
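As a small illustration of the distance used here, the sketch below computes the Euclidean distance and, as one of the alternatives mentioned, the Manhattan distance. The function names are assumptions; vectors are plain Python lists of equal length.

```python
import math


def euclidean(x, y):
    """Square root of the sum of squared component differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Sum of absolute component differences."""
    return sum(abs(a - b) for a, b in zip(x, y))
```

Swapping one function for the other changes only how "closest" is measured; the rest of the procedure stays the same.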
Step S350: assign the model table to the model table set whose vector centroid is at the minimum distance from the model table. The minimum distance may be determined, for example, by:
Min(Dis_a,Dis_b,Dis_c,Dis_d)
For example, for the model table1 described above, which has the smallest distance to the vector centroid of the model table set a, the model table1 is assigned to the model table set a; for the model table2 above, which has the smallest distance to the vector centroid of the model table set D, then model table2 is assigned to the model table set D, and so on.
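The assignment rule of step S350 amounts to picking the level whose centroid is nearest. A minimal sketch follows, assuming centroids are kept in a dictionary keyed by level; the name `nearest_level` is hypothetical, and ties are broken by dictionary order, which the text above does not specify.

```python
import math


def nearest_level(vector, centroids):
    """Return the level whose centroid has the smallest Euclidean distance.

    centroids maps a level label (e.g. "A") to its centroid vector.
    On a tie, the level that appears first in the dictionary wins
    (an assumption; the source does not define tie-breaking).
    """
    return min(centroids, key=lambda level: math.dist(vector, centroids[level]))
```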
Step S360: recalculate the vector centroid of the model table set according to the index data of all the model tables in the model table set.
That is, after adding a new element to a model table set, its vector centroid needs to be recalculated. In this exemplary embodiment, the vector centroid can be recalculated by the method in step S340, and can also be calculated by the following formula:
If a new model table is added in the set A, the vector centroid of the set A is updated as follows:
o=o+1
If a new model table is added in the set B, the vector centroid of the set B is updated as follows:
p=p+1
If a new model table is added to the set C, the vector centroid of the set C is updated as follows:
k=k+1
If a new model table is added in the set D, the vector centroid of the set D is updated as follows:
m=m+1
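One update that is consistent with the mean-based centroid of step S340 and with the count increments shown above is the incremental mean, $\mu \leftarrow (c \cdot \mu + N)/(c + 1)$ followed by $c \leftarrow c + 1$, where c is the current element count of the set and N is the vector of the newly added model table. The sketch below illustrates that update; it is an assumption in this sense and is not quoted from the patent, and the name `add_to_centroid` is hypothetical.

```python
def add_to_centroid(mu, count, new_vector):
    """Fold one new vector into an existing mean without re-summing the set.

    mu is the current centroid, count the number of elements it averages.
    Returns the updated centroid and the updated count.
    """
    new_count = count + 1
    new_mu = [(count * m + x) / new_count for m, x in zip(mu, new_vector)]
    return new_mu, new_count
```

The result equals recomputing the average over all elements, but only needs the previous centroid and count.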
Steps S340 to S360 are then iterated until every model table in the model judgment condition table has been judged, finally yielding the four sets A, B, C and D, i.e., the grading result of all model tables in the data warehouse.
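Put together, the iteration over steps S340 to S360 can be sketched as a single pass over the model tables. This is a simplified reading under stated assumptions: each table is visited once in dictionary order, the Euclidean distance is used, the centroid is updated with an incremental mean, and the seeds are four distinct tables. The name `grade_model_tables` and the toy data are hypothetical.

```python
import math


def grade_model_tables(index_data, seeds):
    """Assign every model table to a level.

    index_data maps table name -> index vector.
    seeds maps level label -> name of that level's initial element.
    """
    centroids = {lv: [float(v) for v in index_data[name]] for lv, name in seeds.items()}
    counts = {lv: 1 for lv in seeds}
    levels = {name: lv for lv, name in seeds.items()}

    for name, vector in index_data.items():
        if name in levels:  # seeds are already placed
            continue
        # S340/S350: distance to each centroid, pick the nearest set.
        level = min(centroids, key=lambda lv: math.dist(vector, centroids[lv]))
        # S360: update that set's centroid with the new element.
        n = counts[level]
        centroids[level] = [(n * m + x) / (n + 1) for m, x in zip(centroids[level], vector)]
        counts[level] = n + 1
        levels[name] = level
    return levels


# Hypothetical two-dimensional toy data.
toy_tables = {"t1": [10, 10], "t2": [9, 9], "t3": [6, 6],
              "t4": [5, 5], "t5": [1, 1], "t6": [0, 0]}
toy_seeds = {"A": "t1", "B": "t3", "C": "t5", "D": "t6"}
print(grade_model_tables(toy_tables, toy_seeds))
```

Because each table is placed once and never revisited, the result can depend on the visiting order; a standard k-means would instead repeat the pass until assignments stop changing.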
In step S240, a corresponding level identifier is assigned to each model table according to the model table assignment result.
In this exemplary embodiment, the level identifier may be a level label, a level score, or the like; taking the level label as an example, the output result may be as shown in Table 4 below:
Table 4
Model table | Model table level |
table1 | A |
table2 | D |
table3 | C |
table4 | B |
table5 | B |
table6 | C |
…… | …… |
Based on the obtained level judgment result, different routine management measures can be applied in a targeted manner to model tables of different levels, so that risks in the routine management and the operation and maintenance of the data warehouse can be avoided, and the stability and application value of the data warehouse system are improved.
In addition, in the embodiment of the invention, a model table level judgment device of the data warehouse is also provided. Referring to fig. 4, the model table level determination apparatus 400 of the data warehouse may include: a data acquisition module 410, a set initialization module 420, a clustering operation module 430, and a level output module 440. Wherein:
The data acquisition module 410 may be configured to acquire index data of a plurality of preset indexes of each model table in the data warehouse.
The set initialization module 420 may be configured to divide a plurality of model table levels, establish a model table set corresponding to the model table levels, and allocate a model table to each model table set as an initial element.
The clustering operation module 430 may be configured to perform a clustering operation on each model table according to each initial element, and allocate each model table to the corresponding model table set according to a result of the clustering operation.
The level output module 440 may be configured to assign a corresponding level identifier to each model table according to the model table assignment result.
In some embodiments of the present invention, based on the foregoing scheme, the preset indexes include a plurality of the following: the number of fields, the number of records, the number of tasks, the number of source tables, the number of queries in the statistics period, the number of downloads in the statistics period, the script running time, the number of updates in the statistics period, the number of users in the statistics period, and the number of user comments.
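One way to hold these ten indexes for a model table is a small record type that can be flattened into the 10-dimensional vector used by the clustering steps. The class and field names below are hypothetical, and the unit of the script running time is not stated in the text, so it is left as a plain number.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class ModelTableIndexes:
    """Index data of one model table (field names are illustrative)."""
    field_count: int
    record_count: int
    task_count: int
    source_table_count: int
    period_query_count: int
    period_download_count: int
    script_run_time: float  # unit not specified in the source
    period_update_count: int
    period_user_count: int
    user_comment_count: int

    def as_vector(self):
        """Return the indexes as a list of floats, in declaration order."""
        return [float(v) for v in astuple(self)]
```

Summing `as_vector()` gives the first judgment data used when choosing initial elements.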
In some embodiments of the present invention, based on the foregoing scheme, assigning a model table as an initial element to each of the model table sets may include:
For each model table, calculating the sum of all the index data of the model table as first judgment data of the model table;
Sorting the model tables according to their first judgment data;
And, according to the sorting result of the model tables, selecting the model tables at preset rank positions as the initial elements assigned to the model table sets.
In some embodiments of the present invention, based on the foregoing scheme, performing the clustering operation on each model table according to each initial element comprises:
For each model table, calculating the distance between the model table and the vector centroid of each model table set;
Assigning the model table to the model table set whose vector centroid is at the minimum distance from the model table;
And recalculating the vector centroid of the model table set according to the index data of all the model tables in the model table set.
Since the functional modules of the model table level determination apparatus 400 of the data warehouse according to the exemplary embodiment of the present invention correspond to the steps of the above-described exemplary embodiment of the model table level determination method of the data warehouse, they are not described herein again.
In an exemplary embodiment of the present invention, there is also provided an electronic device capable of implementing the above method.
Referring now to FIG. 5, a block diagram of a computer system 500 suitable for implementing an electronic device of an embodiment of the present invention is shown. The computer system 500 of the electronic device shown in FIG. 5 is only an example and should not impose any limitation on the function or scope of use of the embodiments of the present invention.
As shown in FIG. 5, the computer system 500 includes a Central Processing Unit (CPU) 501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for system operation are also stored. The CPU 501, ROM 502, and RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as necessary. A removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is mounted on the drive 510 as necessary, so that a computer program read out therefrom is installed into the storage section 508 as necessary.
In particular, according to an embodiment of the present invention, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the invention include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511. The above-described functions defined in the system of the present application are executed when the computer program is executed by the Central Processing Unit (CPU) 501.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present invention may be implemented by software or by hardware, and the described units may also be disposed in a processor. The names of the units do not in any way constitute a limitation on the units themselves.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the model table level determination method for a data warehouse as described in the above embodiments.
For example, the electronic device may implement the following steps, as shown in FIG. 2: step S210, acquiring index data of a plurality of preset indexes of each model table in the data warehouse; step S220, dividing a plurality of model table levels, establishing model table sets corresponding to the model table levels, and assigning a model table to each model table set as an initial element; step S230, performing a clustering operation on each model table according to each initial element, and assigning each model table to the corresponding model table set according to the result of the clustering operation; and step S240, assigning a corresponding level identifier to each model table according to the model table assignment result.
It should be noted that although several modules or units of a device for performing actions are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the invention, the features and functionality of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiment of the present invention can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which can be a personal computer, a server, a touch terminal, or a network device, etc.) to execute the method according to the embodiment of the present invention.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.
Claims (10)
1. A method for judging the level of a model table of a data warehouse is characterized by comprising the following steps:
Acquiring index data of a plurality of preset indexes of each model table in the data warehouse;
dividing a plurality of model table levels, establishing model table sets corresponding to the model table levels and distributing a model table to each model table set to serve as an initial element;
performing clustering operation on each model table according to each initial element, and distributing each model table to the corresponding model table set according to the result of the clustering operation;
And distributing corresponding level identifications for each model table according to the distribution result of the model tables.
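2. The method for determining the model table level of the data warehouse according to claim 1, wherein the preset indexes include a plurality of the following: the number of fields, the number of records, the number of tasks, the number of source tables, the number of queries in a statistics period, the number of downloads in the statistics period, the script running time, the number of updates in the statistics period, the number of users in the statistics period, and the number of user comments.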
3. The method of claim 1, wherein assigning a model table as an initial element to each of the sets of model tables comprises:
For each model table, calculating the sum of all the index data of the model table as first judgment data of the model table;
sorting each model table according to each first judgment data;
and, according to the sorting result of the model tables, selecting the model tables at preset rank positions as the initial elements assigned to the model table sets.
4. The method for determining the model table level of a data warehouse according to any one of claims 1 to 3, wherein clustering each model table according to each initial element comprises:
for each model table, calculating the distance between the model table and the vector centroid of each model table set;
assigning the model table to a set of the model tables if the distance between the model table and a vector centroid of the set of the model tables is minimal;
And calculating the vector centroid of the model table set according to the index data of all the model tables in the model table set.
5. An apparatus for determining a model table level of a data warehouse, comprising:
The data acquisition module is used for acquiring index data of a plurality of preset indexes of each model table in the data warehouse;
The model table set initialization module is used for establishing model table sets corresponding to the model table levels and distributing a model table to each model table set to serve as an initial element;
The clustering operation module is used for carrying out clustering operation on each model table according to each initial element and distributing each model table to the corresponding model table set according to the result of the clustering operation;
and the grade output module is used for distributing corresponding grade identifications to the model tables according to the distribution results of the model tables.
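6. The model table level determination device of the data warehouse according to claim 5, wherein the preset indexes include a plurality of the following: the number of fields, the number of records, the number of tasks, the number of source tables, the number of queries in a statistics period, the number of downloads in the statistics period, the script running time, the number of updates in the statistics period, the number of users in the statistics period, and the number of user comments.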
7. The apparatus for determining the model table level of a data warehouse of claim 5, wherein assigning a model table as an initial element to each of the model table sets comprises:
for each model table, calculating the sum of all the index data of the model table as first judgment data of the model table;
Sorting each model table according to each first judgment data;
And, according to the sorting result of the model tables, selecting the model tables at preset rank positions as the initial elements assigned to the model table sets.
8. The apparatus for determining the model table level of a data warehouse according to any one of claims 5 to 7, wherein clustering each of the model tables according to each of the initial elements includes:
for each model table, calculating the distance between the model table and the vector centroid of each model table set;
assigning the model table to a set of the model tables if the distance between the model table and a vector centroid of the set of the model tables is minimal;
And calculating the vector centroid of the model table set according to the index data of all the model tables in the model table set.
9. An electronic device, comprising:
A processor; and
A memory having stored thereon computer readable instructions which, when executed by the processor, implement a method of model table level determination for a data warehouse as claimed in any one of claims 1 to 4.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out a method of model table level determination of a data warehouse according to any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810475388.9A CN110569313B (en) | 2018-05-17 | 2018-05-17 | Model table level judging method and device of data warehouse |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110569313A (en) | 2019-12-13 |
CN110569313B (en) | 2023-12-05 |
Family
ID=68771832
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810475388.9A Active CN110569313B (en) | 2018-05-17 | 2018-05-17 | Model table level judging method and device of data warehouse |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110569313B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112256802A (en) * | 2020-10-20 | 2021-01-22 | 威海上和软件科技有限公司 | Automatic acquisition method and equipment for marine microorganism information |
CN113342903A (en) * | 2020-02-18 | 2021-09-03 | 北京沃东天骏信息技术有限公司 | Method and device for managing models in data warehouse |
CN113568990A (en) * | 2021-09-01 | 2021-10-29 | 上海中通吉网络技术有限公司 | Management system of data warehouse model |
CN115081787A (en) * | 2022-03-10 | 2022-09-20 | 上海数中科技有限公司 | Model management method and system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105099782A (en) * | 2015-08-18 | 2015-11-25 | 北京京东世纪贸易有限公司 | Method and system for controlling big data resource of cloud environment cluster |
US20160364468A1 (en) * | 2015-06-10 | 2016-12-15 | International Business Machines Corporation | Database index for constructing large scale data level of details |
CN107315627A (en) * | 2017-05-31 | 2017-11-03 | 北京京东尚科信息技术有限公司 | A kind of method and apparatus of automatic configuration data warehouse parallel task queue |
CN107766940A (en) * | 2017-11-20 | 2018-03-06 | 北京百度网讯科技有限公司 | Method and apparatus for generation model |
Also Published As
Publication number | Publication date |
---|---|
CN110569313B (en) | 2023-12-05 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||