CN110569313B - Model table level judging method and device of data warehouse - Google Patents


Info

Publication number: CN110569313B
Authority: CN (China)
Legal status: Active (as listed by Google Patents; not a legal conclusion)
Application number: CN201810475388.9A
Other languages: Chinese (zh)
Other versions: CN110569313A
Inventor: 李建星
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201810475388.9A
Publication of CN110569313A
Application granted
Publication of CN110569313B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The embodiments of the invention provide a model table level judging method and device for a data warehouse, an electronic device, and a storage medium, relating to the technical field of databases. The method comprises: acquiring index data of a plurality of preset indexes for each model table in the data warehouse; dividing a plurality of model table levels, establishing a model table set for each level, and assigning one model table to each set as its initial element; performing a clustering operation on the model tables based on the initial elements, and assigning each model table to a model table set according to the clustering result; and assigning a corresponding level identifier to each model table according to the assignment result. The technical solution of the embodiments enables automatic classification of model tables.

Description

Model table level judging method and device of data warehouse
Technical Field
The present invention relates to the field of database technologies, and in particular, to a method for determining a model table level of a data warehouse, a device for determining a model table level of a data warehouse, an electronic device, and a computer-readable storage medium.
Background
At present, most enterprises build a data warehouse to serve the various data needs of their business, such as daily data analysis, reporting, and data mining. The core of building a data warehouse is constructing a data model based on the company's business: using a given modeling method and theory, the data model organizes the data of different business links into final model tables, which can then provide external services such as data query, analysis, retrieval, and data mining.
Models in current data warehouses are generally divided into several layers according to processing sequence and granularity, but model tables are not divided into levels of importance, even though the model table level is a very valuable attribute in practice. For example, in data warehouse system maintenance, model tables must be backed up, and higher-level model tables should receive measures such as priority guarantee, key monitoring, and full backup. As another example, when allocating resources during data warehouse operation, higher-level model tables should be allocated more resources, such as compute node resources, storage resources, and network access concurrency. In general, a reasonable grading of model tables allows more effective, level-specific management, comprehensively improving the stability and usefulness of the system.
Therefore, how to determine the model table level in a data warehouse is a technical problem to be solved.
It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the invention and thus may include information that does not form the prior art that is already known to those of ordinary skill in the art.
Disclosure of Invention
It is an object of embodiments of the present invention to provide a model table level determination method of a data warehouse, a model table level determination apparatus of a data warehouse, an electronic device, and a computer-readable storage medium, which overcome, at least in part, one or more problems due to the limitations and disadvantages of the related art.
According to a first aspect of the present disclosure, there is provided a model table level determination method for a data warehouse, including:
acquiring index data of a plurality of preset indexes for each model table in the data warehouse;
dividing a plurality of model table levels, establishing a model table set corresponding to each model table level, and assigning one model table to each model table set as its initial element;
performing a clustering operation on the model tables based on the initial elements, and assigning each model table to the corresponding model table set according to the clustering result; and
assigning a corresponding level identifier to each model table according to the assignment result.
In an exemplary embodiment of the present disclosure, the preset indexes include the number of fields, the number of records, the number of tasks, the number of source tables, the number of queries in the statistical period, the number of downloads in the statistical period, the script running duration, the number of updates in the statistical period, the number of users in the statistical period, and the number of user comments.
In an exemplary embodiment of the present disclosure, assigning one model table to each model table set as its initial element includes:
for each model table, calculating the sum of all its index data as the first judgment data of that model table;
sorting the model tables according to the first judgment data; and
according to the sorting result, selecting the model tables at preset rank positions and assigning them to the model table sets as initial elements.
In an exemplary embodiment of the present disclosure, performing the clustering operation on the model tables based on the initial elements includes:
for each model table, calculating the distance between the model table and the vector centroid of each model table set; and
assigning the model table to the model table set whose vector centroid is at the minimum distance from it;
wherein the vector centroid of a model table set is calculated from the index data of all model tables in that set.
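The nearest-centroid assignment rule above can be illustrated with a minimal Python sketch. It assumes Euclidean distance (the metric is not fixed at this point in the text) and assumes each set's centroid has already been computed; `nearest_level` and `assign_all` are hypothetical names.

```python
import math
from typing import Dict, Sequence


def nearest_level(vector: Sequence[float],
                  centroids: Dict[str, Sequence[float]]) -> str:
    """Return the level whose set centroid is closest to a table's index vector.

    Euclidean distance (math.dist) is an assumption of this sketch; on a tie,
    min() keeps the first level in the dict's insertion order.
    """
    return min(centroids, key=lambda level: math.dist(vector, centroids[level]))


def assign_all(tables: Dict[str, Sequence[float]],
               centroids: Dict[str, Sequence[float]]) -> Dict[str, str]:
    """One assignment pass: model table name -> level of the nearest centroid."""
    return {name: nearest_level(vec, centroids) for name, vec in tables.items()}
```

With two-dimensional toy vectors, a table at (9, 8) lands in the set whose centroid is (10, 10) rather than (0, 0). In the patent's setting each vector has the ten coordinates M1 to M10.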
According to a second aspect of the present disclosure, there is provided a model table level judging device for a data warehouse, including:
a data acquisition module, configured to acquire index data of a plurality of preset indexes for each model table in the data warehouse;
a set initialization module, configured to divide a plurality of model table levels, establish a model table set corresponding to each model table level, and assign one model table to each model table set as its initial element;
a clustering operation module, configured to perform a clustering operation on the model tables based on the initial elements and assign each model table to the corresponding model table set according to the clustering result; and
a level output module, configured to assign a corresponding level identifier to each model table according to the assignment result.
In exemplary embodiments of the device, the preset indexes, the selection of initial elements (summing each model table's index data as its first judgment data, sorting the model tables by that value, and assigning the model tables at preset rank positions to the sets), and the clustering operation (assigning each model table to the set whose vector centroid, calculated from the index data of all model tables in that set, is nearest) are the same as described above for the method.
According to a third aspect of the present disclosure, there is provided an electronic device, including:
a processor; and
a memory storing computer-readable instructions which, when executed by the processor, implement the model table level determination method for a data warehouse according to any one of the above.
According to a fourth aspect of embodiments of the present invention, there is provided a computer-readable storage medium storing a computer program which, when executed by a processor, implements the model table level determination method for a data warehouse according to the first aspect above.
In some embodiments of the present invention, a clustering operation is performed on the model tables based on the acquired index data of a plurality of indexes of each model table in the data warehouse, and the level of each model table is determined automatically from the clustering result. Compared with the prior art, judging levels from model table indexes makes the result more objective and accurate, and by selecting different indexes the method can cover the model tables of any data warehouse, giving it good universality. More specifically: first, the technical solution avoids the arbitrariness and low recognition rate of judgment by manual experience, provides a judgment standard for model table levels with a clear business interpretation, and reduces disputes over manual judgments; second, it is universal, can judge model table levels in any data warehouse, and reduces the system management blind spots and operation and maintenance risks caused by models that cannot be identified manually; third, it grades model tables automatically as part of daily management, which removes the need to re-evaluate a model table's level only after a problem has occurred, avoids risks in daily data warehouse management and operation and maintenance, and improves the stability and application value of the data warehouse system.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention. It is evident that the drawings in the following description are only some embodiments of the present invention and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art. In the drawings:
FIG. 1 illustrates a model hierarchy architecture diagram of a data warehouse in accordance with one aspect;
FIG. 2 illustrates a flow diagram of a model table level determination method of a data warehouse, according to some embodiments of the invention;
FIG. 3 illustrates a flow diagram of a model table level determination method of a data warehouse, according to some embodiments of the invention;
FIG. 4 shows a schematic block diagram of a model table level determination apparatus of a data warehouse according to an exemplary embodiment of the present invention;
FIG. 5 shows a schematic diagram of a computer system suitable for implementing an embodiment of the invention.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments can be embodied in many forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the invention.
The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
Prior-art data warehouses generally do not grade model tables, and the relative importance of a model table is mostly evaluated by manual experience. For example, the order summary wide table and the order detail wide table are core business model tables of an order-centric e-commerce system, so they are judged manually to be important model tables and receive more attention in daily system maintenance and operation and maintenance management. For other model tables, there are no unified rules or methods for determining their levels. For example:
FIG. 1 is a schematic diagram of the model hierarchy of a data warehouse. As FIG. 1 shows, the data warehouse only layers the model tables by data flow: data flows from left to right, from layer 1 through layers 2 and 3 to layer 4 (the specific number of layers can be set according to each enterprise's requirements). However, the model tables within each layer are not distinguished by level; that is, the importance of an arbitrary model table in the data warehouse generally cannot be judged, and no method for evaluating model table levels is available.
In the prior art, model table levels are judged by manual experience: level labels are assigned to model tables according to the manually judged levels, and daily data warehouse management and operation and maintenance work is then carried out according to these labels. In practice, manual judgment is based on whether a table belongs to core business, or on whether the number of tasks using a model table meets some rule of thumb; there is no scientific, reasonable judgment method that fits actual operations. Existing methods of determining model table levels by manual experience therefore have the following drawbacks:
1) Because manual judgment of model table levels follows no reasonable rules, its recognition rate is low: only a small fraction of model tables can be judged, and the results are highly disputed because of human factors.
2) Some special model tables cannot be identified, for example a model table that is not a core business table but is heavily used. Such important model tables go unrecognized, which increases system management blind spots and operation and maintenance risks.
3) Manual judgment is mostly after the fact: the level of a model table is judged and analyzed only after a problem has occurred in its use and has already had some impact.
Thus, prior-art solutions provide no effective way to determine model table levels, which poses potential risks to data warehouse management and operation.
Based on the foregoing, in an exemplary embodiment of the present invention, a model table level determination method of a data warehouse is first proposed. The method may be performed by a server or may be performed by other electronic devices, which is not particularly limited in the present exemplary embodiment. Referring to fig. 2, the model table level judging method of the data warehouse may include the steps of:
S210: acquire index data of a plurality of preset indexes for each model table in the data warehouse;
S220: divide a plurality of model table levels, establish a model table set corresponding to each model table level, and assign one model table to each model table set as its initial element;
S230: perform a clustering operation on the model tables based on the initial elements, and assign each model table to the corresponding model table set according to the clustering result;
S240: assign a corresponding level identifier to each model table according to the assignment result.
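Steps S210 to S240 can be tied together in one compact Python sketch. It is not the patented implementation. It assumes plain-sum first judgment data, the rank-position seeding of the worked example in step S330 (rank 1, the 30% position, the 70% position, and the last rank), Euclidean distance, and k-means-style repetition of assignment and centroid update until no table changes set; the repetition and the stopping rule are assumptions. `SEED_PERCENT`, `seed_position`, and `classify` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical seeding rule: rank position as a percentage of n
# (0 -> rank 1, the maximum; 100 -> rank n, the minimum).
SEED_PERCENT = {"A": 0, "B": 30, "C": 70, "D": 100}


def seed_position(percent: int, n: int) -> int:
    """1-based rank ceil(percent * n / 100) in integer math, at least 1."""
    return max(1, -(-percent * n // 100))


def classify(tables: Dict[str, Sequence[float]], max_iter: int = 100) -> Dict[str, str]:
    """Return a level identifier ("A".."D") for every model table (S240)."""
    if not tables:
        return {}
    names: List[str] = list(tables)
    # S310/S320: first judgment data = sum of index data, sorted descending.
    ranked = sorted(names, key=lambda t: sum(tables[t]), reverse=True)
    # S330: one initial element per level, taken at its rank position.
    seeds = {lvl: ranked[seed_position(p, len(ranked)) - 1]
             for lvl, p in SEED_PERCENT.items()}
    centroids = {lvl: [float(v) for v in tables[name]] for lvl, name in seeds.items()}
    labels: Dict[str, str] = {}
    for _ in range(max_iter):
        # S340: assign every table to the set with the nearest centroid.
        new_labels = {t: min(centroids, key=lambda lvl: math.dist(tables[t], centroids[lvl]))
                      for t in names}
        if new_labels == labels:
            break  # no table changed set: treat the clustering as converged
        labels = new_labels
        # Recompute each centroid as the mean of its members' index vectors.
        for lvl in centroids:
            members = [tables[t] for t in names if labels[t] == lvl]
            if members:  # keep the previous centroid if a set became empty
                centroids[lvl] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Applied to the rows of Table 1, `classify` returns one of A to D per table. The grouping depends on the data and on the distance assumption, so no particular output is claimed here.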
Compared with the prior art, the model table level judging method in the example embodiment of FIG. 2 judges levels from the indexes of the model tables, so the results are more objective and accurate, and by selecting different indexes it can cover the model tables of any data warehouse, giving it good universality. It therefore avoids the arbitrariness, low recognition rate, and disputes of manual judgment; reduces the system management blind spots and operation and maintenance risks caused by model tables that cannot be identified manually; and, by grading model tables automatically in daily management, avoids re-evaluating a model table's level only after a problem has occurred, improving the stability and application value of the data warehouse system.
A model table level determination method of the data warehouse in the example embodiment of fig. 2 is described in detail below in conjunction with fig. 3.
In step S210, index data of a plurality of preset indexes of each model table in the data warehouse is collected.
In an example embodiment, the data warehouse contains all the model tables, which may be designed and managed hierarchically. The data warehouse may design, generate, use, and manage the model tables through methods such as ETL (Extract-Transform-Load) and metadata management. Each model table may be described by a plurality of indexes. For example, in this exemplary embodiment, the preset indexes may include indexes related to the attributes of the model table, to related tasks, to dependent source data, to usage, and so on. Specifically, the preset indexes may include, but are not limited to, the number of fields, the number of records, the number of tasks, the number of source tables, the number of queries in the statistical period, the number of downloads in the statistical period, the script running duration, the number of updates in the statistical period, the number of users in the statistical period, and the number of user comments. The statistical period may be a week, a month, a year, etc., and is not particularly limited in the present exemplary embodiment.
After the index data of the preset indexes of each model table in the data warehouse have been collected, they can be integrated into a model judgment condition table for convenient storage and querying. For example, the fields of the model judgment condition table may include: model table name, number of fields (M1), number of records in units of 10,000 (M2), number of tasks (M3), number of source tables (M4), monthly number of queries (M5), monthly number of downloads (M6), script running time in minutes (M7), monthly number of updates (M8), monthly number of users (M9), and number of user comments (M10). The structure of the model judgment condition table is shown in Table 1:
TABLE 1
Model table name M1 M2 M3 M4 M5 M6 M7 M8 M9 M10
table1 25 305 4 8 245 80 34 0 42 10
table2 71 205 3 3 525 46 14 1 33 6
table3 20 85 4 4 352 35 41 0 21 9
table4 16 33 5 5 109 11 32 0 45 11
table5 34 190 6 5 221 13 23 0 63 4
table6 31 234 7 3 176 14 11 0 51 6
table7 29 53 2 6 240 17 21 0 35 14
table8 19 43 4 4 460 37 51 0 25 34
table9 39 73 5 4 405 22 31 0 37 12
table10 22 66 5 6 266 44 22 0 53 21
…… …… …… …… …… …… …… …… …… …… ……
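To make the later steps concrete, the model judgment condition table can be held as a list of typed records. This is a sketch only: the English field names are hypothetical labels for M1 to M10, and the ten rows are copied from Table 1.

```python
from dataclasses import astuple, dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ModelTableIndexes:
    """One row of the model judgment condition table (hypothetical field names)."""
    name: str
    field_count: int         # M1
    record_count_10k: int    # M2, records in units of 10,000
    task_count: int          # M3
    source_table_count: int  # M4
    monthly_queries: int     # M5
    monthly_downloads: int   # M6
    script_minutes: int      # M7, script running time in minutes
    monthly_updates: int     # M8
    monthly_users: int       # M9
    user_comments: int       # M10

    def vector(self) -> Tuple[int, ...]:
        """The 10-dimensional index vector (M1..M10), without the table name."""
        return astuple(self)[1:]


_ROWS = [
    ("table1", 25, 305, 4, 8, 245, 80, 34, 0, 42, 10),
    ("table2", 71, 205, 3, 3, 525, 46, 14, 1, 33, 6),
    ("table3", 20, 85, 4, 4, 352, 35, 41, 0, 21, 9),
    ("table4", 16, 33, 5, 5, 109, 11, 32, 0, 45, 11),
    ("table5", 34, 190, 6, 5, 221, 13, 23, 0, 63, 4),
    ("table6", 31, 234, 7, 3, 176, 14, 11, 0, 51, 6),
    ("table7", 29, 53, 2, 6, 240, 17, 21, 0, 35, 14),
    ("table8", 19, 43, 4, 4, 460, 37, 51, 0, 25, 34),
    ("table9", 39, 73, 5, 4, 405, 22, 31, 0, 37, 12),
    ("table10", 22, 66, 5, 6, 266, 44, 22, 0, 53, 21),
]
TABLE_1: List[ModelTableIndexes] = [ModelTableIndexes(*row) for row in _ROWS]
```

A frozen dataclass keeps each record immutable once collected, and `vector()` yields exactly the coordinates used later for centroids and distances.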
In step S220, a plurality of model table levels are divided, a model table set corresponding to the model table levels is established, and a model table is allocated to each model table set as an initial element.
In this example embodiment, the number of model table levels can be chosen as required, for example 2, 3, 4, or 5 levels. Taking 4 levels as an example, all model tables can be divided into four levels A, B, C, and D, corresponding to model importance degrees of "ultra-high", "high", "medium", and "low" respectively. Different levels of model tables can then correspond to different daily management measures, as shown in Table 2:
TABLE 2
Model table level | Degree of model importance | Daily management measures
A | Ultra-high | Resource priority, 7×24 monitoring, advanced disaster recovery, dedicated person responsible
B | High | Resource priority, 7×24 monitoring, normal disaster recovery, dedicated person responsible
C | Medium | 7×24 monitoring, normal disaster recovery, rotating on-duty staff responsible
D | Low | 7×24 monitoring, normal disaster recovery
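Table 2 is effectively a lookup from level identifier to management policy, which is what step S240's level identifiers feed into. A minimal Python sketch of that lookup follows; the enum name and the short English labels for the measures are hypothetical.

```python
from enum import Enum
from typing import Tuple


class ModelTableLevel(Enum):
    """Levels from Table 2; each value is (importance degree, daily measures)."""
    A = ("ultra-high", ("resource priority", "7x24 monitoring",
                        "advanced disaster recovery", "dedicated owner"))
    B = ("high", ("resource priority", "7x24 monitoring",
                  "normal disaster recovery", "dedicated owner"))
    C = ("medium", ("7x24 monitoring", "normal disaster recovery",
                    "on-duty rotation"))
    D = ("low", ("7x24 monitoring", "normal disaster recovery"))

    def __init__(self, importance: str, measures: Tuple[str, ...]) -> None:
        # Enum unpacks each tuple value into these arguments.
        self.importance = importance
        self.measures = measures
```

For example, `ModelTableLevel["C"].measures` returns the measures for a table whose level identifier is C.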
After the model table levels are divided, a model table set corresponding to each model table level can be established. For example, the model table set corresponding to model table level A is set A; the set corresponding to level B is set B; the set corresponding to level C is set C; and the set corresponding to level D is set D.
Referring to FIG. 3, in this exemplary embodiment, assigning one model table to each model table set as its initial element may include the following steps:
S310: for each model table, calculate the sum of all its index data as its first judgment data. Taking the model judgment condition table in Table 1 as an example, the first judgment data $x_i$ of each model table may be calculated as follows:
$$x_i = \sum_{j=1}^{10} M_j^{(i)}$$
where $i$ indexes the model tables, $i = 1, 2, 3, \ldots, n$; $n$ is the total number of model tables; and $M_j^{(i)}$ is the value of index M$j$ for model table $i$. After the calculation is completed, the set $X = \{x_i\}$ is obtained.
Of course, in other exemplary embodiments of the present disclosure, the first judgment data may be calculated in other ways, for example by a weighted sum of all the index data, or by other operations such as products or powers; these also fall within the scope of the present disclosure.
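Step S310 and the weighted alternative mentioned above can be sketched as a single Python function. The `weights` parameter is a hypothetical illustration of the weighted-sum variant; the text does not specify any weight values.

```python
from typing import Optional, Sequence


def first_judgment_data(indexes: Sequence[float],
                        weights: Optional[Sequence[float]] = None) -> float:
    """x_i: plain sum of the index data M1..M10, or a weighted sum if weights are given."""
    if weights is None:
        return sum(indexes)
    if len(weights) != len(indexes):
        raise ValueError("one weight per index is required")
    return sum(w * m for w, m in zip(weights, indexes))
```

For table1 in Table 1, the plain sum gives 753. Because the raw indexes have very different scales (M5 in the hundreds, M8 near zero), a plain sum is dominated by the large-valued indexes; a weighted sum is one way to rebalance them.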
S320: sort the model tables according to the first judgment data.
For example, the first judgment data of model tables table1 through table10 in Table 1 above are 753, 907, 571, 267, 559, 533, 417, 677, 628, and 505 respectively, giving ranks 2, 1, 5, 10, 6, 7, 9, 3, 4, and 8, as shown in Table 3:
TABLE 3
Model table name M1 M2 …… M10 x_i Value Rank
table1 25 305 …… 10 x_1 753 2
table2 71 205 …… 6 x_2 907 1
table3 20 85 …… 9 x_3 571 5
table4 16 33 …… 11 x_4 267 10
table5 34 190 …… 4 x_5 559 6
table6 31 234 …… 6 x_6 533 7
table7 29 53 …… 14 x_7 417 9
table8 19 43 …… 34 x_8 677 3
table9 39 73 …… 12 x_9 628 4
table10 22 66 …… 21 x_10 505 8
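The ranking in Table 3 (step S320) reduces to a descending sort. A minimal Python sketch follows; because `sorted` is stable even with `reverse=True`, tied values keep their input order. That tie handling is an assumption, since the text does not address ties.

```python
from typing import Dict


def rank_by_first_judgment(x: Dict[str, float]) -> Dict[str, int]:
    """Map each model table to its 1-based rank, largest first judgment data first."""
    ordered = sorted(x, key=lambda name: x[name], reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


# First judgment data x_i from Table 3.
X_TABLE_3 = {"table1": 753, "table2": 907, "table3": 571, "table4": 267,
             "table5": 559, "table6": 533, "table7": 417, "table8": 677,
             "table9": 628, "table10": 505}
```

Calling `rank_by_first_judgment(X_TABLE_3)` reproduces the Rank column of Table 3.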
S330: according to the sorting result, select the model tables at preset rank positions and assign them to the model table sets as initial elements. For example:
For set A, corresponding to model table level A, the initial element is denoted $a_1$. In the present exemplary embodiment, $a_1 = \arg\max(x_i \in X)$ may be taken; that is, the model table corresponding to the largest $x_i$ in set X is taken as the initial element.
For set B, corresponding to model table level B, the initial element is denoted $b_1$. In the present exemplary embodiment, $b_1 = \arg\mathrm{top}_{30\%}(x_i \in X)$ may be taken; that is, the model table whose $x_i$ ranks at the 30% position of set X is taken as the initial element.
For set C, corresponding to model table level C, the initial element is denoted $c_1$. In the present exemplary embodiment, $c_1 = \arg\mathrm{top}_{70\%}(x_i \in X)$ may be taken; that is, the model table whose $x_i$ ranks at the 70% position of set X is taken as the initial element.
For set D, corresponding to model table level D, the initial element is denoted $d_1$. In the present exemplary embodiment, $d_1 = \arg\min(x_i \in X)$ may be taken; that is, the model table corresponding to the smallest $x_i$ in set X is taken as the initial element.
Taking the data in Table 3 as an example, after the first judgment data $x_i$ of each model table have been calculated and ranked, the following is obtained:
For set A, the model table ranked 1st, i.e., the one with the maximum value in set X, is taken as the initial element, so $a_1$ = table2. Accordingly, the initial set A = {($a_1$ = table2)}.
For set B, the model table at the 30% rank position, i.e., rank 3 of the 10 values in set X, is taken as the initial element, so $b_1$ = table8. Accordingly, the initial set B = {($b_1$ = table8)}.
For set C, the model table at the 70% rank position, i.e., rank 7 in set X, is taken as the initial element, so $c_1$ = table6. Accordingly, the initial set C = {($c_1$ = table6)}.
For set D, the model table ranked 10th, i.e., the one with the minimum value in set X, is taken as the initial element, so $d_1$ = table4. Accordingly, the initial set D = {($d_1$ = table4)}.
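The seeding rule can be sketched in Python as follows. With n = 10 the 30% and 70% positions are exact (ranks 3 and 7); for other n the text does not say how to round, so rounding up with a ceiling is an assumption here. `SEED_PERCENT`, `seed_position`, and `initial_sets` are hypothetical names.

```python
from typing import Dict, List

# Rank positions as a percentage of n, following the worked example:
# A = rank 1 (maximum), B = 30%, C = 70%, D = rank n (minimum).
SEED_PERCENT = {"A": 0, "B": 30, "C": 70, "D": 100}


def seed_position(percent: int, n: int) -> int:
    """1-based rank ceil(percent * n / 100), computed in integers, at least 1."""
    return max(1, -(-percent * n // 100))


def initial_sets(ranked: List[str]) -> Dict[str, List[str]]:
    """Initial model table set per level, from table names sorted by x_i descending."""
    n = len(ranked)
    return {level: [ranked[seed_position(p, n) - 1]]
            for level, p in SEED_PERCENT.items()}
```

Integer arithmetic avoids floating-point surprises in expressions such as `0.3 * n` before taking the ceiling. Feeding in the Table 3 order (table2, table1, table8, table9, table3, table5, table6, table10, table7, table4) yields the initial sets A = {table2}, B = {table8}, C = {table6}, D = {table4} listed above.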
In step S230, clustering operation is performed on each model table according to each initial element, and each model table is allocated to the corresponding model table set according to the result of the clustering operation.
In this example embodiment, the clustering operation may be performed by calculating the vector centroids of the model table sets. For example, referring to FIG. 3, performing the clustering operation on the model tables based on the initial elements may include the following steps:
S340: for each model table, calculate the distance between the model table and the vector centroid of each model table set.
In this exemplary embodiment, for each model table set, a vector centroid of the model table set may be calculated according to the index data of all model tables in the model table set.
For example, let the number of mode tables in set a corresponding to level a be o, let the number of mode tables in set B corresponding to level B be p, let the number of mode tables in set C corresponding to level C be k, and let the number of mode tables in set D corresponding to level D be m. In each model table set, each model table element is a 10-dimensional vector, and the coordinate values of the vector are the fields M1, M2 …, M10 of the model determination condition determination table in the above table 1 in order. Thus the generalized representation of set A, set B, set C, set D is as follows:
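$$
\begin{aligned}
A &= \{a_1, a_2, \dots, a_o\}, & a_i &\in \mathbb{R}^n \ (i = 1, 2, \dots, o)\\
B &= \{b_1, b_2, \dots, b_p\}, & b_i &\in \mathbb{R}^n \ (i = 1, 2, \dots, p)\\
C &= \{c_1, c_2, \dots, c_k\}, & c_i &\in \mathbb{R}^n \ (i = 1, 2, \dots, k)\\
D &= \{d_1, d_2, \dots, d_m\}, & d_i &\in \mathbb{R}^n \ (i = 1, 2, \dots, m)
\end{aligned}
$$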
where n = 10 and $\mathbb{R}^n$ denotes the 10-dimensional real vector space. Of course, if the judgment condition table contains a different number of fields, the dimension of each model table element changes accordingly; the present exemplary embodiment places no particular limitation on this.
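Once set A, set B, set C, and set D are expressed in this general form, their vector centroids $\mu_a$, $\mu_b$, $\mu_c$, $\mu_d$ can be calculated by the following formulas:
$$
\mu_a = \frac{1}{o}\sum_{i=1}^{o} a_i,\qquad
\mu_b = \frac{1}{p}\sum_{i=1}^{p} b_i,\qquad
\mu_c = \frac{1}{k}\sum_{i=1}^{k} c_i,\qquad
\mu_d = \frac{1}{m}\sum_{i=1}^{m} d_i
$$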
That is, in the present exemplary embodiment, the vector centroid of a model table set is the mean of the vectors of all elements in the set, so the resulting $\mu_a$, $\mu_b$, $\mu_c$, $\mu_d$ are all 10-dimensional vectors. Those skilled in the art will readily appreciate that in other exemplary embodiments of the present disclosure, the centroid of a model table set may be calculated in other ways; the present exemplary embodiment is not limited in this respect.
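A minimal sketch of the mean-based centroid, written in plain Python so that it runs without third-party packages. `vector_centroid` is a hypothetical name, and each model table is assumed to be a list of n numbers in the order M1 to Mn.

```python
from typing import List, Sequence


def vector_centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of the vectors in one model table set (e.g. mu_a)."""
    if not vectors:
        raise ValueError("a model table set must contain at least one model table")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all model table vectors must have the same dimension")
    count = len(vectors)
    # mu_j = (1 / count) * sum of the j-th coordinate over all vectors in the set
    return [sum(v[j] for v in vectors) / count for j in range(dim)]
```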
After the vector centroids of the model table sets are calculated, for each model table, denote its vector by N and calculate the distances $\mathrm{Dis}_a$, $\mathrm{Dis}_b$, $\mathrm{Dis}_c$, $\mathrm{Dis}_d$ between N and the vector centroids $\mu_a$, $\mu_b$, $\mu_c$, $\mu_d$ of set A, set B, set C, and set D, respectively. For example:
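$$
\begin{aligned}
\mathrm{Dis}_a &= \lVert N - \mu_a \rVert_2\\
\mathrm{Dis}_b &= \lVert N - \mu_b \rVert_2\\
\mathrm{Dis}_c &= \lVert N - \mu_c \rVert_2\\
\mathrm{Dis}_d &= \lVert N - \mu_d \rVert_2
\end{aligned}
$$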
where $\lVert X - Y \rVert_2$ is the square root of the sum of the squares of the components of the difference vector $X - Y$.
In this exemplary embodiment, the Euclidean distance is calculated; in other exemplary embodiments of the present disclosure, the Mahalanobis distance, cosine distance, Manhattan distance, or the like may be calculated instead, and these also fall within the scope of the present disclosure.
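The distance options named above can be sketched as follows; the function names are hypothetical. The Euclidean version spells out the square root of the sum of squared component differences (Python 3.8+ also provides `math.dist` for the same quantity). The Mahalanobis distance is left out because it additionally needs an inverse covariance matrix estimated from the data, which the description does not specify.

```python
import math
from typing import Sequence

# All functions assume x and y have the same length (zip() would silently truncate).


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """||x - y||_2: square root of the sum of squared component differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x: Sequence[float], y: Sequence[float]) -> float:
    """Sum of absolute component differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def cosine_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """1 - cosine similarity; undefined when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    if norms == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norms
```

Cosine distance ignores vector magnitude, so two tables whose indexes are proportional count as identical under it, while Euclidean and Manhattan distances do not.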
Step S350: if the distance between a model table and the vector centroid of a certain model table set is the smallest, the model table is assigned to that model table set. The minimum distance can be determined, for example, by:
$$\min(\mathrm{Dis}_a, \mathrm{Dis}_b, \mathrm{Dis}_c, \mathrm{Dis}_d)$$
For example, for model table table1, if its distance to the vector centroid of model table set A is the smallest, table1 is assigned to set A; for model table table2, if its distance to the vector centroid of model table set D is the smallest, table2 is assigned to set D; and so on.
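Step S350 reduces to an argmin over the four distances. The sketch below is a hypothetical helper using Euclidean distance via `math.dist` (Python 3.8+); the description does not say how ties are broken, so this version assumes the set listed first wins.

```python
import math
from typing import Mapping, Sequence


def nearest_set(vector: Sequence[float], centroids: Mapping[str, Sequence[float]]) -> str:
    """Return the label of the set whose centroid is closest to `vector` (Euclidean)."""
    distances = {label: math.dist(vector, mu) for label, mu in centroids.items()}
    # min() returns the first key with the smallest distance, so on a tie
    # the set that appears earliest in `centroids` is chosen.
    return min(distances, key=distances.__getitem__)
```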
Step S360: calculate the vector centroid of the model table set from the index data of all model tables in that set.
That is, whenever a new element is added to a model table set, the vector centroid of that set must be recalculated. In this exemplary embodiment, the vector centroid may be recalculated by the method described under step S340 above, or updated by the following formulas:
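If a new model table N is added to set A, the vector centroid of set A is updated as follows:
$$\mu_a \leftarrow \frac{o\,\mu_a + N}{o + 1}, \qquad o \leftarrow o + 1$$
If a new model table N is added to set B, the vector centroid of set B is updated as follows:
$$\mu_b \leftarrow \frac{p\,\mu_b + N}{p + 1}, \qquad p \leftarrow p + 1$$
If a new model table N is added to set C, the vector centroid of set C is updated as follows:
$$\mu_c \leftarrow \frac{k\,\mu_c + N}{k + 1}, \qquad k \leftarrow k + 1$$
If a new model table N is added to set D, the vector centroid of set D is updated as follows:
$$\mu_d \leftarrow \frac{m\,\mu_d + N}{m + 1}, \qquad m \leftarrow m + 1$$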
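Because the centroid is defined as the arithmetic mean of a set's vectors, adding one vector N to a set of o vectors can be handled incrementally as μ ← (o·μ + N)/(o + 1), o ← o + 1, instead of rescanning the whole set. The sketch below (the name `add_to_centroid` is hypothetical) implements that running-mean update; its result equals recomputing the mean from scratch, up to floating-point rounding.

```python
from typing import List, Sequence, Tuple


def add_to_centroid(
    mu: Sequence[float], count: int, new_vector: Sequence[float]
) -> Tuple[List[float], int]:
    """Running-mean update when one vector joins a set that already holds `count` vectors."""
    # Each coordinate: (count * old_mean + new_value) / (count + 1)
    updated = [(count * m + x) / (count + 1) for m, x in zip(mu, new_vector)]
    return updated, count + 1
```

For example, the centroid of [1, 2] and [3, 4] is [2.0, 3.0]; adding [5, 6] with count 2 gives [3.0, 4.0] and a new count of 3, the same as averaging all three vectors directly.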
Steps S340 to S360 are then repeated until every model table in the "model judgment condition table" has been judged. The four resulting sets A, B, C, and D constitute the level classification of all model tables in the data warehouse.
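Putting steps S340 to S360 together gives the sketch below. It assumes one reading of the loop: every non-seed model table is visited exactly once in input order, compared with the current centroids by Euclidean distance, assigned to the nearest set, and that set's centroid is then updated as a running mean; seed tables simply stay in their own sets. `classify_tables` is a hypothetical name, not an API from the source.

```python
import math
from typing import Dict, List, Mapping, Sequence


def classify_tables(
    vectors: Mapping[str, Sequence[float]],
    seeds: Mapping[str, str],
) -> Dict[str, str]:
    """Single pass of steps S340-S360; returns {model table name: level}."""
    # Each set starts with its seed table as its only element.
    centroids: Dict[str, List[float]] = {lvl: list(vectors[t]) for lvl, t in seeds.items()}
    counts: Dict[str, int] = {lvl: 1 for lvl in seeds}
    levels: Dict[str, str] = {t: lvl for lvl, t in seeds.items()}

    for name, vec in vectors.items():
        if name in levels:  # seed tables are already in their sets
            continue
        # S340: Euclidean distance from this table to each set's vector centroid
        dist = {lvl: math.dist(vec, mu) for lvl, mu in centroids.items()}
        # S350: assign the table to the set with the nearest centroid
        best = min(dist, key=dist.__getitem__)
        levels[name] = best
        # S360: update that set's centroid as a running mean
        c = counts[best]
        centroids[best] = [(c * m + x) / (c + 1) for m, x in zip(centroids[best], vec)]
        counts[best] = c + 1
    return levels
```

Because centroids move as tables are added, the result of this single pass can depend on the order in which tables are visited. Batch k-means (Lloyd's algorithm), by contrast, reassigns every point and recomputes all centroids repeatedly until the assignments stop changing; the description does not mention such repeated passes.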
In step S240, a corresponding level identifier is assigned to each model table according to the model table assignment result.
In this example embodiment, the level identifier may be a level label, a level score, or the like. Taking the level label as an example, the output may be as shown in Table 4 below:
Table 4
Model table | Model table level
table1 | A
table2 | D
table3 | C
table4 | B
table5 | B
table6 | C
…… | ……
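Step S240 only has to turn set membership into a level identifier per table. A hedged sketch: `level_rows`, `render_table`, and the `LEVEL_SCORES` mapping are hypothetical, and the scores are placeholders, since the description only says a level score may be used instead of a label.

```python
from typing import List, Mapping, Tuple, Union

# Hypothetical placeholder scores; the description does not define any score values.
LEVEL_SCORES: Mapping[str, int] = {"A": 4, "B": 3, "C": 2, "D": 1}

Identifier = Union[str, int]


def level_rows(levels: Mapping[str, str], as_score: bool = False) -> List[Tuple[str, Identifier]]:
    """One (model table, level identifier) row per table, in input order."""
    return [(table, LEVEL_SCORES[lvl] if as_score else lvl) for table, lvl in levels.items()]


def render_table(rows: List[Tuple[str, Identifier]]) -> str:
    """Plain-text layout in the style of Table 4."""
    lines = ["Model table | Model table level"]
    lines.extend(f"{table} | {ident}" for table, ident in rows)
    return "\n".join(lines)
```

For example, `render_table(level_rows({"table1": "A", "table2": "D"}))` produces the header line followed by `table1 | A` and `table2 | D`.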
Based on the level judgment results, targeted daily management measures can be applied to model tables of different levels, which helps avoid risks in the daily management, operation, and maintenance of the data warehouse and improves the stability and application value of the data warehouse system.
In addition, an embodiment of the invention further provides a model table level judging device for a data warehouse. Referring to FIG. 4, the model table level judging device 400 of the data warehouse may include a data acquisition module 410, a set initialization module 420, a clustering operation module 430, and a level output module 440.
The data acquisition module 410 may be configured to acquire index data of a plurality of preset indexes of each model table in the data warehouse.
The set initialization module 420 may be configured to divide a plurality of model table levels, establish model table sets corresponding to the model table levels, and assign one model table to each model table set as an initial element.
The clustering operation module 430 may be configured to perform a clustering operation on the model tables according to the initial elements and assign each model table to the corresponding model table set according to the clustering result.
The level output module 440 may be configured to assign a corresponding level identifier to each model table according to the model table assignment result.
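One way to map device 400 onto code is a class whose methods correspond to modules 410 to 440. This is a hypothetical structural sketch: the class and method names are illustrative, data acquisition is injected as a callable because the description does not say where index data comes from, and the seeding and clustering rules are assumptions (seeds at 0, 30%, 70% and 100% of the descending ranking by index sum; a single nearest-centroid pass with Euclidean distance and running-mean centroid updates).

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Mapping, Sequence

IndexData = Mapping[str, Sequence[float]]


@dataclass
class ModelTableLevelJudgingDevice:
    """Hypothetical sketch of device 400; each method stands in for one module."""

    fetch_index_data: Callable[[], IndexData]  # backs data acquisition module 410
    levels: Sequence[str] = ("A", "B", "C", "D")
    seed_fractions: Sequence[float] = (0.0, 0.3, 0.7, 1.0)  # assumed seed positions

    def initialize_sets(self, data: IndexData) -> Dict[str, str]:
        """Set initialization module 420: pick one distinct seed table per level."""
        ranked = sorted(data, key=lambda t: (-sum(data[t]), t))  # by first judgment data
        n = len(ranked)
        seeds = {
            lv: ranked[min(n, max(1, round(f * n))) - 1]
            for lv, f in zip(self.levels, self.seed_fractions)
        }
        if len(set(seeds.values())) < len(seeds):
            raise ValueError("too few model tables to give every level its own seed")
        return seeds

    def cluster(self, data: IndexData, seeds: Mapping[str, str]) -> Dict[str, str]:
        """Clustering operation module 430: single nearest-centroid pass."""
        centroids: Dict[str, List[float]] = {lv: list(data[t]) for lv, t in seeds.items()}
        counts = {lv: 1 for lv in seeds}
        result = {t: lv for lv, t in seeds.items()}
        for name, vec in data.items():
            if name in result:  # seeds stay in their own sets
                continue
            best = min(centroids, key=lambda lv: math.dist(vec, centroids[lv]))
            result[name] = best
            c = counts[best]
            centroids[best] = [(c * m + x) / (c + 1) for m, x in zip(centroids[best], vec)]
            counts[best] = c + 1
        return result

    def output_levels(self, assignment: Mapping[str, str]) -> Dict[str, str]:
        """Level output module 440: level label per model table, sorted by name."""
        return dict(sorted(assignment.items()))

    def run(self) -> Dict[str, str]:
        data = self.fetch_index_data()
        return self.output_levels(self.cluster(data, self.initialize_sets(data)))
```

Injecting the data source keeps the clustering logic testable without a live data warehouse connection.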
In some embodiments of the invention, based on the foregoing scheme, the preset indexes include the number of fields, the number of records, the number of tasks, the number of source tables, the number of queries in the statistical period, the number of downloads in the statistical period, the script running duration, the number of updates in the statistical period, the number of users in the statistical period, and the number of user comments.
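The ten preset indexes can be held in a small record that also produces the n = 10 vector used for clustering and the sum used as first judgment data. A sketch with hypothetical field names; it assumes the indexes map to M1 to M10 in the order listed, and that script running duration is measured in seconds. The description does not say whether the indexes are normalized before summing or clustering, so raw values are used here; with raw values, a large count such as the number of records can dominate both the sum and the distances.

```python
from dataclasses import astuple, dataclass
from typing import Tuple


@dataclass(frozen=True)
class ModelTableIndexes:
    """Preset indexes of one model table; field order is assumed to be M1..M10."""

    field_count: int           # M1: number of fields
    record_count: int          # M2: number of records
    task_count: int            # M3: number of tasks
    source_table_count: int    # M4: number of source tables
    queries_in_period: int     # M5: queries in the statistical period
    downloads_in_period: int   # M6: downloads in the statistical period
    script_run_seconds: float  # M7: script running duration (unit assumed)
    updates_in_period: int     # M8: updates in the statistical period
    users_in_period: int       # M9: users in the statistical period
    user_comment_count: int    # M10: number of user comments

    def as_vector(self) -> Tuple[float, ...]:
        """The 10-dimensional vector used for distances and centroids."""
        return tuple(float(v) for v in astuple(self))

    def first_judgment_data(self) -> float:
        """x_i: the sum of all index values (raw, not normalized)."""
        return sum(self.as_vector())
```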
In some embodiments of the present invention, based on the foregoing scheme, assigning a model table to each of the model table sets as an initial element may include:
for each model table, calculating the sum of all index data of the model table as first judgment data of the model table;
sorting the model tables according to the first judgment data;
and, according to the sorting result, assigning the model tables at preset rank positions to the model table sets as initial elements.
In some embodiments of the invention, based on the foregoing scheme, clustering the model tables according to the initial elements comprises the following steps:
for each model table, calculating the distance between the model table and the vector centroid of each model table set;
assigning the model table to the model table set whose vector centroid is at the minimum distance from the model table;
and calculating the vector centroid of the model table set according to the index data of all the model tables in the model table set.
Since the respective functional modules of the model table level determination apparatus 400 of the data warehouse of the exemplary embodiment of the present invention correspond to the steps of the exemplary embodiment of the model table level determination method of the data warehouse described above, a detailed description thereof will be omitted.
In an exemplary embodiment of the present invention, an electronic device capable of implementing the above method is also provided.
Referring now to FIG. 5, there is illustrated a schematic diagram of a computer system 500 suitable for use in implementing an electronic device of an embodiment of the present invention. The computer system 500 of the electronic device shown in fig. 5 is only an example and should not be construed as limiting the functionality and scope of use of embodiments of the invention.
As shown in fig. 5, the computer system 500 includes a Central Processing Unit (CPU) 501, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the system operation are also stored. The CPU 501, ROM 502, and RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a cathode ray tube (CRT) display, a liquid crystal display (LCD), a speaker, and the like; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card or a modem. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 510 as needed, so that a computer program read from it can be installed into the storage section 508 as needed.
In particular, according to embodiments of the present application, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 509, and/or installed from the removable medium 511. When the computer program is executed by the central processing unit (CPU) 501, the above-described functions defined in the system of the present application are performed.
The computer readable medium shown in the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present invention may be implemented in software or in hardware, and the described units may also be provided in a processor. The names of these units do not, in some cases, constitute a limitation on the units themselves.
As another aspect, the present application also provides a computer-readable medium, which may be included in the electronic device described in the above embodiments or may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the model table level determination method of the data warehouse described in the above embodiments.
For example, the electronic device may implement the method shown in FIG. 2: S210, acquiring index data of a plurality of preset indexes of each model table in the data warehouse; S220, dividing a plurality of model table levels, establishing model table sets corresponding to the model table levels, and assigning one model table to each model table set as an initial element; S230, performing a clustering operation on the model tables according to the initial elements, and assigning each model table to the corresponding model table set according to the clustering result; and S240, assigning corresponding level identifiers to the model tables according to the model table assignment result.
It should be noted that although in the above detailed description several modules or units of a device or means for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the invention. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or in software combined with the necessary hardware. Thus, the technical solution according to the embodiments of the present invention may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes several instructions to cause a computing device (such as a personal computer, a server, a touch terminal, or a network device) to perform the method according to the embodiments of the present invention.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (9)

1. A model table level determination method for a data warehouse, comprising:
acquiring index data of a plurality of preset indexes of each model table in the data warehouse, wherein the preset indexes comprise one or more of indexes related to model table attributes, indexes related to associated tasks, indexes related to dependent source data, and indexes related to usage;
dividing a plurality of model table levels, establishing model table sets corresponding to the model table levels, and assigning one model table to each model table set as an initial element, wherein the model table levels are determined according to the importance of the model tables in daily management;
performing a clustering operation on the model tables according to the initial elements, wherein the clustering operation comprises: for each model table, calculating the distance between the model table and the vector centroid of the model table set of each level; assigning the model table to the model table set whose vector centroid is at the minimum distance from the model table; and calculating the vector centroid of the model table set according to the index data of all model tables in the model table set;
and assigning corresponding level identifiers to the model tables according to the model table assignment result.
2. The model table level determination method of a data warehouse according to claim 1, wherein the preset indexes comprise the number of fields, the number of records, the number of tasks, the number of source tables, the number of queries in the statistical period, the number of downloads in the statistical period, the script running duration, the number of updates in the statistical period, the number of users in the statistical period, and the number of user comments.
3. The method of model table level determination for a data warehouse of claim 1, wherein assigning a model table to each of the model table sets as an initial element comprises:
for each model table, calculating the sum of all index data of the model table as first judgment data of the model table;
sorting the model tables according to the first judgment data;
and, according to the sorting result, assigning the model tables at preset rank positions to the model table sets as initial elements.
4. A model table level judging device of a data warehouse, comprising:
the data acquisition module is configured to acquire index data of a plurality of preset indexes of each model table in the data warehouse, wherein the preset indexes comprise one or more of indexes related to model table attributes, indexes related to associated tasks, indexes related to dependent source data, and indexes related to usage;
the set initialization module is configured to divide a plurality of model table levels, establish model table sets corresponding to the model table levels, and assign one model table to each model table set as an initial element, wherein the model table levels are determined according to the importance of the model tables in daily management;
the clustering operation module is configured to perform a clustering operation on the model tables according to the initial elements, wherein the clustering operation comprises: for each model table, calculating the distance between the model table and the vector centroid of the model table set of each level; assigning the model table to the model table set whose vector centroid is at the minimum distance from the model table; and calculating the vector centroid of the model table set according to the index data of all model tables in the model table set;
and the level output module is configured to assign corresponding level identifiers to the model tables according to the model table assignment result.
5. The model table level judging device of a data warehouse according to claim 4, wherein the preset indexes comprise the number of fields, the number of records, the number of tasks, the number of source tables, the number of queries in the statistical period, the number of downloads in the statistical period, the script running duration, the number of updates in the statistical period, the number of users in the statistical period, and the number of user comments.
6. The model table level judging apparatus of a data warehouse of claim 4, wherein assigning a model table as an initial element to each of the model table sets comprises:
For each model table, calculating the sum of all index data of the model table as first judgment data of the model table;
sorting the model tables according to the first judgment data;
and, according to the sorting result, assigning the model tables at preset rank positions to the model table sets as initial elements.
7. The model table level judging device of a data warehouse according to any one of claims 4 to 6, wherein performing a clustering operation on the model tables according to the initial elements comprises:
for each model table, calculating the distance between the model table and the vector centroid of each model table set;
assigning the model table to the model table set whose vector centroid is at the minimum distance from the model table;
and calculating the vector centroid of the model table set according to the index data of all the model tables in the model table set.
8. An electronic device, comprising:
a processor; and
a memory having stored thereon computer readable instructions which, when executed by the processor, implement the model table level determination method of a data warehouse of any one of claims 1 to 3.
9. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the model table level determination method of a data warehouse as claimed in any one of claims 1 to 3.
CN201810475388.9A 2018-05-17 2018-05-17 Model table level judging method and device of data warehouse Active CN110569313B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810475388.9A CN110569313B (en) 2018-05-17 2018-05-17 Model table level judging method and device of data warehouse

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810475388.9A CN110569313B (en) 2018-05-17 2018-05-17 Model table level judging method and device of data warehouse

Publications (2)

Publication Number Publication Date
CN110569313A CN110569313A (en) 2019-12-13
CN110569313B true CN110569313B (en) 2023-12-05

Family

ID=68771832

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810475388.9A Active CN110569313B (en) 2018-05-17 2018-05-17 Model table level judging method and device of data warehouse

Country Status (1)

Country Link
CN (1) CN110569313B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113342903A (en) * 2020-02-18 2021-09-03 北京沃东天骏信息技术有限公司 Method and device for managing models in data warehouse
CN112256802A (en) * 2020-10-20 2021-01-22 威海上和软件科技有限公司 Automatic acquisition method and equipment for marine microorganism information
CN113568990A (en) * 2021-09-01 2021-10-29 上海中通吉网络技术有限公司 Management system of data warehouse model
CN115081787A (en) * 2022-03-10 2022-09-20 上海数中科技有限公司 Model management method and system

Citations (3)

Publication number Priority date Publication date Assignee Title
CN105099782A (en) * 2015-08-18 2015-11-25 北京京东世纪贸易有限公司 Method and system for controlling big data resource of cloud environment cluster
CN107315627A (en) * 2017-05-31 2017-11-03 北京京东尚科信息技术有限公司 A kind of method and apparatus of automatic configuration data warehouse parallel task queue
CN107766940A (en) * 2017-11-20 2018-03-06 北京百度网讯科技有限公司 Method and apparatus for generation model

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US10042914B2 (en) * 2015-06-10 2018-08-07 International Business Machines Corporation Database index for constructing large scale data level of details


Also Published As

Publication number Publication date
CN110569313A (en) 2019-12-13

Similar Documents

Publication Publication Date Title
US11188791B2 (en) Anonymizing data for preserving privacy during use for federated machine learning
CN110569313B (en) Model table level judging method and device of data warehouse
US8849828B2 (en) Refinement and calibration mechanism for improving classification of information assets
CN109344154B (en) Data processing method, device, electronic equipment and storage medium
WO2018103718A1 (en) Application recommendation method and apparatus, and server
US20150278813A1 (en) Determining a temporary transaction limit
TW202029079A (en) Method and device for identifying irregular group
US10303705B2 (en) Organization categorization system and method
CN115408381A (en) Data processing method and related equipment
CN115203435A (en) Entity relation generation method and data query method based on knowledge graph
US11061934B1 (en) Method and system for characterizing time series
CN112182138A (en) Catalog making method and device
US20160004730A1 (en) Mining of policy data source description based on file, storage and application meta-data
CN104636422B (en) The method and system for the pattern concentrated for mining data
US10003492B2 (en) Systems and methods for managing data related to network elements from multiple sources
US11868167B2 (en) Automatically provisioned tag schema for hybrid multicloud cost and chargeback analysis
US20150363711A1 (en) Device for rapid operational visibility and analytics automation
CN111915115A (en) Execution policy setting method and device
CN115543428A (en) Simulated data generation method and device based on strategy template
US20220284023A1 (en) Estimating computational cost for database queries
US20220188308A1 (en) Selecting access flow path in complex queries
CN113554307A (en) RFM model-based user grouping method and device and readable medium
US12001456B2 (en) Mutual exclusion data class analysis in data governance
CN112685574B (en) Method and device for determining hierarchical relationship of domain terms
AU2022208873B2 (en) Information matching using subgraphs

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant