WO2022217987A1 - Data table heat differentiation method and apparatus, and related device - Google Patents

Data table heat differentiation method and apparatus, and related device

Info

Publication number
WO2022217987A1
Authority
WO
WIPO (PCT)
Prior art keywords
data table
data
heat
service node
tables
Prior art date
Application number
PCT/CN2022/071364
Other languages
French (fr)
Chinese (zh)
Inventor
季振峰
Original Assignee
华为云计算技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为云计算技术有限公司 filed Critical 华为云计算技术有限公司
Publication of WO2022217987A1 publication Critical patent/WO2022217987A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Definitions

  • the present application relates to the field of big data, and in particular, to a method, device and related equipment for distinguishing the heat of a data table.
  • the present application provides a method, device and related equipment for distinguishing the heat of a data table, which can improve the accuracy of distinguishing the heat of a data table.
  • a method for distinguishing data table heat includes:
  • the service node obtains a second data table associated with the first data table from a storage node, where the storage node stores a plurality of data tables;
  • the service node acquires the associated heat of the first data table and the second data table, wherein the associated heat is obtained based on the inherent heat of the second data table and the association relationship between the first data table and the second data table, and the inherent heat of the second data table is the heat generated by the second data table being called;
  • the service node determines the heat of the first data table according to the associated heat of the first data table and the second data table.
  • the heat that the second data table, which has an association relationship with the first data table, brings to the first data table (that is, the associated heat of the first data table and the second data table) is taken into account. Therefore, the calculated heat of the first data table is more accurate, and when the heat of multiple data tables is acquired, the heat of those data tables can be better distinguished.
  • the service node acquires the second data table associated with the first data table from the storage node, including:
  • the service node obtains, from the storage node, the second data table having a data blood relationship with the first data table, wherein the data blood relationship indicates that the second data table is calculated according to the first data table, or the first data table is calculated according to the second data table;
  • the service node obtains the associated heat of the first data table and the second data table, including:
  • the service node calculates the associated heat of the first data table and the second data table according to the data blood relationship between the first data table and the second data table.
  • the service node obtains the second data table associated with the first data table from the storage node, including:
  • the service node acquires, from the storage node, the second data table having a primary and foreign key association relationship with the first data table, wherein the primary and foreign key association relationship indicates that one or more fields in the first data table are referenced as the primary key of the second data table, or one or more fields in the second data table are referenced as the primary key of the first data table;
  • the service node obtains the correlation heat between the first data table and the second data table, including:
  • the service node calculates the association heat between the first data table and the second data table according to the primary and foreign key association relationship between the first data table and the second data table.
  • the service node determines the heat of the first data table according to the associated heat of the first data table and the second data table, including:
  • the service node determines the heat of the first data table according to the inherent heat of the first data table and the associated heat of the first data table and the second data table, wherein the inherent heat of the first data table is the heat generated by the first data table being called.
  • the method further includes:
  • the service node calculates the heatness of the plurality of data tables
  • the service node deletes, from the storage node according to the calculation result, data tables whose heat is less than a first preset threshold.
  • the service node deletes the data table with low heat from the storage node according to the calculation result, which can save storage space.
  • the method further includes:
  • the service node calculates the heatness of the plurality of data tables
  • the service node adjusts, according to the calculation result, a position on the display interface of a data table whose heat is greater than the second preset threshold in the plurality of data tables to be in front of a data table whose heat is less than the second preset threshold.
  • the service node adjusts the position of the data table with high popularity on the display interface to the front of the data table with low popularity, so that the user can view the data table with high popularity conveniently and quickly.
  • the method further includes:
  • the service node calculates the heatness of the plurality of data tables
  • the service node migrates, according to the calculation result, data tables whose heat is less than a third preset threshold to a first storage device, where the storage performance of the first storage device is lower than that of the storage node.
  • the service node migrates the data tables with low heat to the first storage device, whose storage performance is lower than that of the storage node. This not only prevents the data tables with low heat from continuing to occupy the resources of the storage node, but also allows users to find these data tables on the first storage device when they need to view them in the future.
  • the method further includes:
  • the service node calculates the heatness of the plurality of data tables
  • the service node migrates, according to the calculation result, a data table whose heat is greater than a fourth preset threshold to a second storage device, where the storage performance of the second storage device is higher than that of the storage node.
  • the service node migrates the hot data table to the second storage device, whose storage performance is higher than that of the storage node, which can improve the efficiency of operating on data in the hot data table and improve the storage security of the hot data table.
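The threshold-based handling described above can be sketched in Python as follows. This is only an illustrative sketch, not the applicant's implementation: the function name `plan_actions`, the table names, the heat values, and the threshold values are all hypothetical, and the display order is simply a descending sort by heat, which places every table above any given threshold in front of every table below it.

```python
def plan_actions(heats, third_threshold, fourth_threshold):
    """Decide migrations and display order from already-computed heat values.

    heats: dict mapping table name -> heat (hypothetical values).
    Returns (cold, hot, display_order):
      cold          - tables below third_threshold (candidates for slower storage)
      hot           - tables above fourth_threshold (candidates for faster storage)
      display_order - all tables, hottest first (ties broken by name)
    """
    cold = sorted(name for name, h in heats.items() if h < third_threshold)
    hot = sorted(name for name, h in heats.items() if h > fourth_threshold)
    display_order = sorted(heats, key=lambda name: (-heats[name], name))
    return cold, hot, display_order


# Hypothetical heat values for three tables.
heats = {"orders": 950, "users": 120, "audit_log": 3}
cold, hot, display_order = plan_actions(heats, third_threshold=10, fourth_threshold=500)
print(cold, hot, display_order)
```

The sketch only produces a plan; actually deleting or migrating tables would depend on the storage system in use.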
  • a data table heat discrimination device is provided, the device is applied to a service node, and the device includes:
  • an obtaining module configured to obtain a second data table associated with the first data table from a storage node, where the storage node stores a plurality of data tables;
  • a processing module configured to obtain the associated heat of the first data table and the second data table, wherein the associated heat is obtained based on the inherent heat of the second data table and the association relationship between the first data table and the second data table, and the inherent heat of the second data table is the heat generated by the second data table being called;
  • the processing module is further configured to determine the heat of the first data table according to the associated heat of the first data table and the second data table.
  • the obtaining module is specifically used for:
  • obtaining, from the storage node, the second data table having a data blood relationship with the first data table, wherein the data blood relationship indicates that the second data table is calculated according to the first data table, or the first data table is calculated according to the second data table;
  • the processing module is specifically used for:
  • calculating the associated heat of the first data table and the second data table according to the data blood relationship between the first data table and the second data table.
  • the obtaining module is specifically used for:
  • obtaining, from the storage node, the second data table having a primary and foreign key association relationship with the first data table, wherein the primary and foreign key association relationship indicates that one or more fields in the first data table are referenced as the primary key of the second data table, or one or more fields in the second data table are referenced as the primary key of the first data table;
  • the processing module is specifically used for:
  • calculating the associated heat of the first data table and the second data table according to the primary and foreign key association relationship between the first data table and the second data table.
  • the processing module is specifically used for:
  • determining the heat of the first data table according to the inherent heat of the first data table and the associated heat of the first data table and the second data table, wherein the inherent heat of the first data table is the heat generated by the first data table being called.
  • the processing module is further configured to:
  • the processing module is further configured to:
  • the position on the display interface of the data table whose heat is greater than the second preset threshold among the plurality of data tables is adjusted to be in front of the data table whose heat is less than the second preset threshold.
  • the processing module is further configured to:
  • the data tables whose heat is less than the third preset threshold are migrated to the first storage device, and the data tables whose heat is greater than the fourth preset threshold are migrated to the second storage device, wherein the storage performance of the first storage device is lower than that of the storage node, and the storage performance of the second storage device is higher than that of the storage node.
  • a non-transitory computer-readable storage medium stores computer-readable instructions.
  • When the computer-readable instructions are executed, the method described in the first aspect or in any specific implementation of the first aspect is performed.
  • a computer program product including a computer program, when the computer program is read and executed by a cluster of computer devices, the cluster of computer devices is made to execute the first aspect or any specific implementation of the first aspect. The method described in the implementation.
  • a computing device cluster including at least one computing device, each computing device including a processor and a memory; the processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device, so that the computing device performs the method described in the first aspect or any specific implementation of the first aspect.
  • the computing device cluster includes a computing device, and the computing device includes a processor and a memory; the processor is configured to execute instructions stored in the memory, so that the computing device performs the first aspect or A method provided by any possible implementation manner of the first aspect.
  • the computing device cluster includes at least two computing devices, and each computing device includes a processor and a memory; the processors of the at least two computing devices are used to execute instructions stored in the memories of the at least two computing devices.
  • FIG. 1 is a schematic diagram of an application scenario involved in an embodiment of the present application
  • FIG. 2 is a schematic diagram of a data blood relationship involved in an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a primary-foreign key association relationship involved in an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a method for distinguishing the heat of a data table provided by an embodiment of the present application
  • FIG. 5 is a schematic diagram of a data blood relationship of a first data table provided by an embodiment of the present application.
  • FIG. 6 is a schematic flowchart of another data table heat discrimination method provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a primary and foreign key association relationship of a first data table provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a data processing system provided by an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a computing device cluster provided by an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a computing device provided by an embodiment of the present application.
  • first and second in the embodiments of the present application are only used for the purpose of description, and cannot be understood as indicating or implying relative importance or implying the number of indicated technical features. Thus, a feature defined as “first” or “second” may expressly or implicitly include one or more of that feature.
  • “at least one” refers to one or more, and “multiple” refers to two or more.
  • “And/or” describes the association relationship between associated objects and indicates that three kinds of relationships are possible. For example, A and/or B can indicate: A exists alone, A and B exist at the same time, or B exists alone, where A and B can each be singular or plural.
  • the character “/” generally indicates an “or” relationship between the associated objects.
  • “At least one of the following” or similar expressions refers to any combination of these items, including any combination of a single item or a plurality of items.
  • For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may each be single or multiple.
  • Transaction data: also known as transactional data, business data, etc.
  • Transaction data describes internal or external events or transaction records in the business operation of an organization, such as sales orders and call records.
  • Data heat (also called data popularity): a value that reflects how much attention the data receives. It also indicates how likely the data is to be accessed within a period of time starting from the current time. High data heat means that the data receives a lot of attention and is very likely to be accessed in that period; low data heat means that the data receives little attention and is very unlikely to be accessed in that period.
  • Data table heat (also called data table popularity): a value that reflects how much attention the data table receives. It indicates how likely the data table is to be accessed within a period of time starting from the current time. High data table heat means that the data table receives a lot of attention and is very likely to be accessed in that period; low data table heat means that the data table receives little attention and is very unlikely to be accessed in that period.
  • Inherent heat of a data table: the heat generated by the data table itself being called. It can be determined from the number of times the data table is called (also called the number of uses or the number of accesses). Usually, the inherent heat of a data table is equal to the number of times the data table is called, which includes the number of times data is queried (select), added (insert), deleted (delete), and modified (update). The number of times the data table is called also includes the number of other data operations performed on the data table.
  • the method based on data creation time is mainly used to distinguish the heat of transaction data tables (that is, tables that mainly include transaction data). Specifically, assume that the storage node stores transaction data table A and transaction data table B, that the data in transaction data table A was created within the last year, and that the data in transaction data table B was created more than one year ago. After obtaining transaction data table A and transaction data table B from the storage node, the service node obtains and compares the creation time of the data in transaction data table A and the creation time of the data in transaction data table B. When most or all of the data in transaction data table A was created later than the data in transaction data table B, the service node determines that the heat of transaction data table A is greater than the heat of transaction data table B; otherwise, it determines that the heat of transaction data table A is less than the heat of transaction data table B.
  • the service node then distinguishes the heat of the two transaction data tables according to the above method based on the data creation time, and the obtained heat distinction result is obviously inaccurate and inconsistent with the actual application scenario.
  • the method based on the inherent heat of the data table is mainly used to distinguish the heat of webpage data tables (that is, tables that mainly include webpage data, such as articles, pictures, and videos published on a webpage). Specifically, assume that the storage node stores webpage data table A and webpage data table B. After obtaining webpage data table A and webpage data table B from the storage node, the service node obtains and compares the inherent heat of webpage data table A and the inherent heat of webpage data table B. When the inherent heat of webpage data table A is greater than that of webpage data table B, the service node determines that the heat of webpage data table A is greater than that of webpage data table B; otherwise, it determines that the heat of webpage data table A is lower than that of webpage data table B.
  • the service node distinguishes the popularity of the two web page data tables according to the above method based on the inherent popularity of the data table, and the obtained popularity distinction result is obviously inaccurate and inconsistent with the actual application scenario.
  • the embodiments of the present application provide a method, device, and related equipment for distinguishing the heat of a data table, which can improve the accuracy of distinguishing the heat of a data table and are more in line with practical application scenarios.
  • Data blood relationship: also known as a data lineage relationship or data origin relationship, it refers to the relationship formed between data tables as they are generated, fused, transformed, circulated, and retired.
  • an intermediate table including intermediate data (i.e., some or all of the calculated original data)
  • data table 3 including final data is formed.
  • the data link from data table 1 to data table 2 to data table 3 indicates the data blood relationship of these three tables.
  • data table 1 and data table 2 have a direct blood relationship
  • data table 2 and data table 3 have a direct blood relationship
  • data table 1 and data table 3 have an indirect blood relationship.
  • data table 2 directly depends on data table 1
  • data table 3 directly depends on data table 2 and indirectly depends on data table 1. It can be understood that if the data in data table 1 that is used to calculate data table 2 and data table 3 is accessed, data table 2 and data table 3 are indirectly accessed; that is, data table 1 can, to a certain extent, raise the heat of data table 2 and data table 3. If the data in data table 2 that comes from data table 1 is accessed, data table 1 and data table 3 are indirectly accessed; that is, data table 2 can, to a certain extent, raise the heat of data table 1 and data table 3. If the data in data table 3 that comes from data table 2 is accessed, data table 1 and data table 2 are indirectly accessed; that is, data table 3 can, to a certain extent, raise the heat of data table 1 and data table 2.
  • when the heat that each data table gains from other data tables having a data blood relationship with it (including direct and indirect blood relationships) is taken into account, the determined heat of each data table is more accurate and better reflects the importance of each data table.
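The distinction between direct and indirect blood relationships can be illustrated with a small graph traversal. The sketch below is an assumption-laden illustration rather than the method of the application: the edge list, the table names, and the helper names `lineage_distances` and `classify` are hypothetical, and lineage is treated as symmetric (a table is related to both the tables it is derived from and the tables derived from it), so tables that merely share a common ancestor would also count as indirectly related.

```python
from collections import deque

# Hypothetical lineage edges: (source table, table calculated from it).
EDGES = [("table_1", "table_2"), ("table_2", "table_3")]


def lineage_distances(edges, start):
    """Breadth-first search giving the number of lineage hops from `start`."""
    neighbours = {}
    for src, dst in edges:
        # Lineage is followed in both directions (upstream and downstream).
        neighbours.setdefault(src, set()).add(dst)
        neighbours.setdefault(dst, set()).add(src)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    del dist[start]
    return dist


def classify(edges, start):
    """Split related tables into direct (1 hop) and indirect (more than 1 hop)."""
    dist = lineage_distances(edges, start)
    direct = sorted(t for t, d in dist.items() if d == 1)
    indirect = sorted(t for t, d in dist.items() if d > 1)
    return direct, indirect


print(classify(EDGES, "table_1"))
print(classify(EDGES, "table_2"))
```

For the three-table chain above, data table 1 has data table 2 as a direct relative and data table 3 as an indirect relative, matching the description of FIG. 2.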
  • the primary key-foreign key relationship defines a relationship between two tables in a relational database. As shown in FIG. 3, one or more fields A1 in data table 1 are referenced as the primary key of data table 2'. In this case, field A1 in data table 1 is said to be a foreign key pointing to data table 2', and data table 1 and data table 2' have a primary and foreign key association relationship.
  • the primary key of data table 2' is also referenced as the primary key of data table 3'.
  • data table 1 and data table 3' also have a primary and foreign key association relationship.
  • the primary and foreign key association between data table 1 and data table 2' and the primary and foreign key association between data table 2' and data table 3' are called direct primary and foreign key associations, and the primary and foreign key association between data table 1 and data table 3' is called an indirect primary and foreign key association.
  • table 3' has an effect of improving; if the primary key of data table 3' is accessed, data table 1 and data table 2' are indirectly accessed, that is to say, data table 3' raises the heat of data table 1 and data table 2' to a certain extent.
  • each data table has its primary and foreign key associations (including direct primary and foreign key associations and indirect primary and foreign key associations)
  • Associated heat: the heat that an associated data table brings to the data table it is associated with, such as the heat that data table 1 above gains from data table 2 and/or data table 3, which have a data blood relationship with it, or the heat that data table 1 gains from data table 2' and/or data table 3', which have a primary and foreign key association relationship with it.
  • the process includes but is not limited to the following steps:
  • A1. The service node obtains the log information of the data operations on data table 1 from the storage node, and obtains information about those data operations from the log information.
  • the log information of the data operations on data table 1 is the information that the storage node automatically records when a user performs data operations on data table 1. It includes information about the data operations that the user performed on data table 1, such as the type of each operation (deleting data, adding data, etc.) and the time of each operation. Therefore, information about the data operations on data table 1 can be obtained from this log information.
  • the service node can obtain the log information of data table 1 within a preset time period from the storage node, and then obtain the data operation information of data table 1 within that period from the log information. For example, the service node can obtain the log information of data table 1 for 2020, and then obtain the data operation information of data table 1 for 2020 from it.
  • A2. The service node determines the number of times data table 1 is called according to the data operation information of data table 1.
  • the service node counts the number of times data is queried in data table 1, the number of times data is added to data table 1, the number of times data is deleted from data table 1, the number of times data is modified in data table 1, and so on; summing these counts gives the number of times data table 1 is called.
  • A3. Determine the inherent heat of data table 1 according to the number of times data table 1 is called.
  • the inherent heat of data table 1 = the number of times data table 1 is called.
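Steps A1 to A3 can be sketched as a simple count over operation log records within a preset time window. The record layout (table name, operation type, timestamp), the sample entries, and the function name `inherent_heat` are assumptions made for illustration; real storage nodes will expose their operation logs in their own formats.

```python
from collections import Counter
from datetime import datetime

# Hypothetical log records: (table name, operation type, timestamp).
LOG = [
    ("table_1", "select", datetime(2020, 3, 1)),
    ("table_1", "insert", datetime(2020, 5, 9)),
    ("table_1", "update", datetime(2020, 7, 21)),
    ("table_1", "select", datetime(2021, 1, 4)),  # outside the 2020 window
    ("table_2", "delete", datetime(2020, 2, 2)),  # a different table
]


def inherent_heat(log, table, start, end):
    """Return (total calls, per-operation counts) for `table` with start <= time < end."""
    counts = Counter(
        op for name, op, ts in log if name == table and start <= ts < end
    )
    # Inherent heat = number of times the table is called = sum over all operation types.
    return sum(counts.values()), counts


heat_2020, per_op = inherent_heat(
    LOG, "table_1", datetime(2020, 1, 1), datetime(2021, 1, 1)
)
print(heat_2020, dict(per_op))
```

Here the 2021 query and the operation on the other table are excluded, so data table 1 has an inherent heat of 3 for 2020.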
  • the service node can obtain, from the storage node, the second data table associated with the first data table, and then obtain the associated heat of the first data table and the second data table according to the association relationship between the first data table and the second data table and the inherent heat of the second data table. After the associated heat is obtained, the heat of the first data table is determined according to the associated heat, wherein the association relationship between the first data table and the second data table includes one or more of a data blood relationship and a primary and foreign key association relationship.
  • a method for distinguishing the heat of a data table provided by the embodiment of the present application is described in more detail below with reference to FIG. 4 .
  • the method for distinguishing the heat of a data table provided by the embodiment of the present application includes but is not limited to the following steps:
  • the service node acquires a first data table and a second data table having a data blood relationship with the first data table from a storage node.
  • the storage node stores multiple data tables, and the first data table may be any one or more data tables among the multiple data tables stored by the storage node.
  • the multiple data tables stored by the storage node can be tables of various types, such as transaction data tables and webpage data tables, and can belong to any database, which is not specifically limited here.
  • the data blood relationship between the first data table and the second data table means that the second data table is calculated according to the first data table, and/or the first data table is calculated according to the second data table.
  • the data blood relationship between the first data table and the second data table may be a direct blood relationship or an indirect blood relationship, which is not specifically limited here.
  • the service node can obtain the second data table that has a data blood relationship with the first data table from the storage node through a data warehouse tool (such as Hive) or an SQL statement. Hive is a Hadoop-based data warehouse tool for data extraction, transformation, and loading; it provides a mechanism for storing, querying, and analyzing large-scale data stored in Hadoop.
  • the service node obtains the second data table having a data blood relationship with the first data table from the storage node through the data warehouse tool or the SQL statement, which is only an example and should not be regarded as a specific limitation.
  • the service node can also obtain the second data table that has a data blood relationship with the first data table in other ways. For example, a person reads the code to find the second data table that has a data blood relationship with the first data table, the service node receives the manually input name of that second data table, and the service node then acquires the second data table according to the name.
  • the service node acquires the inherent heat $H_0$ of the first data table.
  • the inherent heat $H_0$ of the first data table is the heat generated by the first data table itself being called.
  • the service node calculates the associated heat $H_1$ of the first data table and the second data table according to the data blood relationship between the first data table and the second data table and the inherent heat of the second data table.
  • the inherent heat of the second data table is the heat generated by the second data table itself being called.
  • the service node can determine the blood relationship weight corresponding to the second data table according to the data blood relationship between the first data table and the second data table, calculate the inherent heat of the second data table, and then calculate the associated heat $H_1$ of the first data table and the second data table according to the blood relationship weight corresponding to the second data table and the inherent heat of the second data table.
  • $H_1 = W_A \cdot H_{0,A} + W_B \cdot H_{0,B}$
  • $W_A$ and $W_B$ are both numbers greater than 0 and less than 1.
  • the second data table A has a direct blood relationship with the first data table
  • the second data table B has an indirect blood relationship with the first data table
  • the first data table A has an indirect blood relationship with the first data table.
  • the relationship between the second data table A and the first data table is closer, preferably, W A is greater than W B .
  • The service node determines the heat H of the first data table according to the inherent heat H 0 of the first data table and the associated heat H 1 of the first data table and the second data table.
  • $H = H_0 + H_1$
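The lineage-based calculation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the application does not say how the weights W are chosen, so the rule `decay ** hops` (0.5 for a direct relationship, 0.25 for a one-step indirect one) and the names `lineage_distances` and `table_heat` are hypothetical.

```python
from collections import deque


def lineage_distances(edges, start):
    """Breadth-first search over an undirected lineage graph.

    edges: iterable of (upstream, downstream) table-name pairs.
    Returns {table: hop count from start} for every reachable table except start.
    """
    neighbours = {}
    for up, down in edges:
        neighbours.setdefault(up, set()).add(down)
        neighbours.setdefault(down, set()).add(up)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    del dist[start]
    return dist


def table_heat(inherent, edges, table, decay=0.5):
    """H = H_0 + sum(W_i * H_{0,i}); the weight W_i = decay ** hops is an assumption."""
    h1 = sum((decay ** hops) * inherent.get(other, 0.0)
             for other, hops in lineage_distances(edges, table).items())
    return inherent.get(table, 0.0) + h1
```

With inherent heats of 10, 8 and 4 for the first data table, table A (direct) and table B (indirect), `table_heat` gives 10 + 0.5·8 + 0.25·4 = 15. Any weighting that keeps each W between 0 and 1 and gives closer tables larger weights would fit the description equally well.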
  • FIG. 6 is a schematic flowchart of another method for distinguishing the heat of a data table provided by an embodiment of the present application. As shown in FIG. 6, the method for distinguishing the heat of a data table provided by an embodiment of the present application includes but is not limited to the following steps:
  • the service node obtains a first data table and a second data table having a primary and foreign key association relationship with the first data table from a storage node.
  • the service node may obtain the second data table having a primary and foreign key association relationship with the first data table from the storage node through a data warehouse tool or a SQL statement.
  • Obtaining the second data table that has a primary and foreign key association relationship with the first data table from the storage node through a data warehouse tool or a SQL statement is only an example.
  • The service node can also obtain the second data table in other ways. For example, a person may read the code to find the second data table that has a primary and foreign key association relationship with the first data table; the service node then receives the manually entered name of that table and obtains the second data table according to that name.
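As one possible illustration of finding such tables with SQL statements, the sketch below uses Python's standard `sqlite3` module and SQLite's `PRAGMA foreign_key_list`. The choice of SQLite and the helper name `fk_related_tables` are assumptions for the example; the application does not name a specific database or data warehouse tool.

```python
import sqlite3


def fk_related_tables(conn, table):
    """Return names of tables directly linked to `table` by a foreign key, in either direction."""
    related = set()
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for name in tables:
        # PRAGMA foreign_key_list yields one row per foreign-key column;
        # index 2 of each row is the referenced (parent) table.
        for fk in conn.execute(f'PRAGMA foreign_key_list("{name}")'):
            referenced = fk[2]
            if name == table and referenced != table:
                related.add(referenced)
            elif referenced == table and name != table:
                related.add(name)
    return related
```

Indirect relationships (such as table D in the formula that follows) could be found by applying the same lookup repeatedly to the tables it returns.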
  • the service node acquires the inherent heat H 0 of the first data table.
  • the service node calculates the association heat H 1 of the first data table and the second data table according to the primary and foreign key association relationship between the first data table and the second data table and the inherent heat of the second data table.
  • The service node can determine the association weight corresponding to the second data table according to the primary and foreign key association relationship between the first data table and the second data table, calculate the inherent heat of the second data table, and then calculate the association heat H 1 of the first data table and the second data table according to the association weight corresponding to the second data table and the inherent heat of the second data table.
  • $H_1 = W_C \cdot H_{0,C} + W_D \cdot H_{0,D}$
  • both W C and W D are numbers greater than 0 and less than 1.
  • the second data table C and the first data table have a direct primary and foreign key association relationship
  • the second data table D and the first data table have an indirect primary and foreign key relationship.
  • Because the relationship between the second data table C and the first data table is closer, W C is preferably greater than W D .
  • The service node determines the heat H of the first data table according to the inherent heat H 0 of the first data table and the associated heat H 1 of the first data table and the second data table.
  • $H = H_0 + H_1$
  • When the service node obtains from the storage node the second data table that has an association relationship with the first data table, it may obtain both a second data table that has a data blood relationship with the first data table and a second data table that has a primary and foreign key association relationship with the first data table. In that case, the associated heat H 1 of the first data table and the second data table calculated by the service node includes not only the heat brought by the second data table that has a data blood relationship with the first data table, but also the heat brought by the second data table that has a primary and foreign key association relationship with the first data table.
  • the first data table has both the data blood relationship shown in FIG. 5 and the primary and foreign key association shown in FIG.
  • the associated heat H 1 of the first data table and the second data table is:
  • $H_1 = W_A \cdot H_{0,A} + W_B \cdot H_{0,B} + W_C \cdot H_{0,C} + W_D \cdot H_{0,D}$
  • the association heat H 1 includes not only the heat brought by the second data table that has a data blood relationship with the first data table, but also the heat brought by the second data table that has a primary and foreign key association relationship with the first data table.
  • The heat H of the first data table calculated by the service node therefore includes not only the heat brought by the second data table that has a data blood relationship with the first data table, but also the heat brought by the second data table that has a primary and foreign key association relationship with the first data table.
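A worked example may make the combined formula concrete. All numbers are assumed for illustration and do not come from the application: take an inherent heat of 10 for the first data table, weights of 0.5 for the directly related tables A and C and 0.25 for the indirectly related tables B and D, and inherent heats of 8, 4, 6 and 4 for tables A, B, C and D.

```latex
\begin{align*}
H_1 &= W_A H_{0,A} + W_B H_{0,B} + W_C H_{0,C} + W_D H_{0,D} \\
    &= 0.5 \cdot 8 + 0.25 \cdot 4 + 0.5 \cdot 6 + 0.25 \cdot 4 \\
    &= 4 + 1 + 3 + 1 = 9, \\
H   &= H_0 + H_1 = 10 + 9 = 19.
\end{align*}
```

Under these assumed values, nearly half of the table's heat comes from its related tables, which is the contribution that a count of direct calls alone would miss.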
  • the service node can obtain the heat of multiple data tables according to the data table heat discrimination method provided above.
  • Based on the heat of the multiple data tables, the service node can distinguish which data tables are hotter and which are less hot, so as to manage the multiple data tables.
  • the service node may delete data tables whose heatness is less than the first preset threshold from the storage node according to the heatness of the multiple data tables, so as to save storage space.
  • According to the heat of the multiple data tables, the service node may adjust the position on the display interface of the data tables whose heat is greater than the second preset threshold to be in front of the data tables whose heat is less than the second preset threshold. That is, the data tables whose heat is greater than the second preset threshold are moved to a position on the display interface that is more convenient to view, so that users can quickly find the hot data tables.
  • The service node may further migrate data tables whose heat is less than the third preset threshold to the first storage device, and migrate data tables whose heat is greater than the fourth preset threshold to the second storage device, wherein the storage performance of the first storage device is lower than that of the storage node, and the storage performance of the second storage device is higher than that of the storage node.
  • the sizes of the first preset threshold, the second preset threshold, the third preset threshold, and the fourth preset threshold can be set according to actual conditions, and are not specifically limited here.
  • By migrating the data tables with low heat to the first storage device, whose storage performance is lower than that of the storage node, the service node not only prevents those data tables from continuing to occupy the resources of the storage node, but also allows users who later need to view them to find them in the first storage device. By migrating the hot data tables to the second storage device, whose storage performance is higher than that of the storage node, the service node can improve the efficiency of operating on data in the hot data tables and improve the storage security of the hot data tables.
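The threshold-based management described above can be sketched as a single planning function. This is a hypothetical illustration: the application presents deletion, reordering and migration as separate optional actions, so combining them in one pass, applying deletion first, and the name `plan_table_actions` are all assumptions.

```python
def plan_table_actions(heat, t1, t2, t3, t4):
    """Turn per-table heat values into a management plan.

    heat: {table name: heat H}; t1..t4 are the first to fourth preset thresholds.
    """
    # Tables colder than the first threshold are deleted to save storage space.
    to_delete = sorted(name for name, h in heat.items() if h < t1)
    kept = {name: h for name, h in heat.items() if h >= t1}
    # Descending heat puts every table above t2 in front of every table below it.
    display_order = sorted(kept, key=lambda name: (-kept[name], name))
    # Cold tables go to the slower first storage device, hot ones to the faster second one.
    to_first_storage = sorted(name for name, h in kept.items() if h < t3)
    to_second_storage = sorted(name for name, h in kept.items() if h > t4)
    return {
        "delete": to_delete,
        "display_order": display_order,
        "migrate_to_first_storage": to_first_storage,
        "migrate_to_second_storage": to_second_storage,
    }
```

The function only produces a plan; actually deleting or moving tables would depend on the storage node and storage devices in use.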
  • When determining the heat H of the first data table, the method for distinguishing the heat of data tables introduces the heat that a second data table associated with the first data table brings to the first data table, that is, the associated heat H 1 between the first data table and the second data table. This makes the calculated heat H of the first data table more accurate and more in line with actual application scenarios.
  • When the heat of multiple data tables has been obtained, the heat of the multiple data tables can be better distinguished.
  • The method for distinguishing the heat of a data table according to an embodiment of the present application is described in detail above. Based on the same inventive concept, the apparatus for distinguishing the heat of a data table in an embodiment of the present application is described below.
  • FIG. 8 is a schematic structural diagram of a data processing system 10 provided by an embodiment of the present application.
  • the data processing system 10 includes a data table heat distinguishing device 1100 provided by an embodiment of the present application.
  • The data table heat distinguishing device 1100 includes an acquisition module 1101 and a processing module 1102. The data table heat distinguishing device 1100 can be integrated into the service node 110 in the data processing system 10. In addition to the service node 110, the data processing system 10 can include a storage node 120, a first storage device 130 and a second storage device 140, wherein,
  • the storage node 120 stores a plurality of data tables
  • an obtaining module 1101, configured to obtain a second data table associated with the first data table from the storage node 120;
  • the processing module 1102 is configured to obtain the associated heat H 1 of the first data table and the second data table, wherein the associated heat H 1 of the first data table and the second data table is obtained according to the inherent heat of the second data table and the association relationship between the first data table and the second data table, and the inherent heat of the second data table is the heat generated by the second data table being called;
  • the processing module 1102 is configured to determine the heat H of the first data table according to the associated heat H 1 of the first data table and the second data table.
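The division of work between the two modules can be illustrated with a minimal Python sketch. The class names mirror the modules described above, but the fields, method names and the idea of passing weights as a mapping are assumptions made for the example, not the structure of the actual apparatus.

```python
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass
class AcquisitionModule:
    # Hypothetical lookup: table name -> {associated table name: association weight}.
    related_tables: Callable[[str], Mapping[str, float]]


@dataclass
class ProcessingModule:
    # Inherent heat H_0 of each table, i.e. the heat generated by the table being called.
    inherent_heat: Mapping[str, float]

    def associated_heat(self, related: Mapping[str, float]) -> float:
        """H_1: weighted sum of the inherent heat of the associated tables."""
        return sum(w * self.inherent_heat.get(name, 0.0) for name, w in related.items())

    def heat(self, table: str, related: Mapping[str, float]) -> float:
        """H = H_0 + H_1 for the given table."""
        return self.inherent_heat.get(table, 0.0) + self.associated_heat(related)
```

A caller would first ask the acquisition module for the associated tables of a given table and then pass the result to the processing module, for example `processing.heat("T", acquisition.related_tables("T"))`.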
  • the obtaining module 1101 is specifically used for:
  • the processing module 1102 is specifically used for:
  • the correlation heat H 1 of the first data table and the second data table is calculated.
  • the obtaining module 1101 is specifically used for:
  • the processing module 1102 is specifically used for:
  • the association heat H 1 of the first data table and the second data table is calculated.
  • processing module 1102 is specifically configured to:
  • the heat H of the first data table is determined according to the inherent heat H 0 of the first data table and the associated heat H 1 of the first data table and the second data table, wherein the inherent heat H 0 of the first data table is the heat generated by the first data table being called.
  • processing module 1102 is further configured to:
  • the data table whose heat is less than the first preset threshold is deleted from the storage node 120 .
  • processing module 1102 is further configured to:
  • the position on the display interface of the data table whose heat is greater than the second preset threshold among the plurality of data tables is adjusted to be in front of the data table whose heat is less than the second preset threshold.
  • processing module 1102 is further configured to:
  • the data tables whose heat is less than the third preset threshold are migrated to the first storage device 130, and the data tables whose heat is greater than the fourth preset threshold are migrated to the second storage device 140, wherein the storage performance of the first storage device 130 is lower than that of the storage node 120, and the storage performance of the second storage device 140 is higher than that of the storage node 120.
  • the sizes of the first preset threshold, the second preset threshold, the third preset threshold, and the fourth preset threshold can be set according to actual conditions, and are not specifically limited here.
  • The data processing system 10 and the apparatus 1100 for distinguishing the heat of a data table are only an example provided by the embodiments of the present application. They may have more or fewer components than those shown in FIG. 8, two or more components may be combined, or the components may be implemented with different configurations.
  • The embodiment of the present application further provides a computing device cluster 20. The computing device cluster 20 can be used to deploy the data processing system 10 shown in FIG. 8, and specifically can be used to deploy the data table heat distinguishing apparatus 1100 in the data processing system 10 shown in FIG. 8, so as to execute the data table heat distinguishing method provided by the embodiment of the present application.
  • the computing device cluster 20 includes at least one computing device 200 .
  • When the computing device cluster 20 includes only one computing device 200, all the modules in the data processing system 10 shown in FIG. 8 may be deployed in that computing device 200: the service node 110, the storage node 120, the first storage device 130 and the second storage device 140.
  • When the computing device cluster 20 includes multiple computing devices 200, each of the multiple computing devices 200 may be used to deploy some modules in the data processing system 10 shown in FIG. 8, or two or more of the multiple computing devices 200 may be jointly used to deploy one or more modules in the data processing system 10 shown in FIG. 8.
  • For example, assuming that the multiple computing devices 200 include computing devices 200A and 200B, the computing device 200A can be used to deploy the service node 110 and the storage node 120, and the computing device 200B can be used to deploy the first storage device 130 and the second storage device 140. Alternatively, the computing device 200A and the computing device 200B are jointly used to deploy the service node 110: for example, the obtaining module 1101 of the data table heat distinguishing device 1100 is deployed on the computing device 200A and the processing module 1102 of the data table heat distinguishing device 1100 is deployed on the computing device 200B, while the computing device 200A is also used to deploy the storage node 120 and the computing device 200B is also used to deploy the first storage device 130 and the second storage device 140.
  • Assuming that the multiple computing devices 200 include computing devices 200A, 200B, 200C and 200D, the computing device 200A can be used to deploy the service node 110, the computing device 200B can be used to deploy the storage node 120, the computing device 200C can be used to deploy the first storage device 130, and the computing device 200D can be used to deploy the second storage device 140.
  • At least one computing device 200 included in the computing device cluster 20 may be all terminal devices, or all cloud servers, or some cloud servers and some terminal devices, which are not specifically limited here.
  • Each computing device 200 in the computing device cluster 20 may include a processor, a memory, a communication interface, etc. The memory in one or more computing devices 200 in the computing device cluster 20 may store the same code (which may also be referred to as an instruction or a program instruction, etc.) for executing the data table heat discrimination method provided by the embodiment of the present application. The processor can read the code from the memory and execute it to implement the data table heat discrimination method provided by the embodiment of the present application.
  • the communication interface can be used to realize the communication between each computing device 200 and other devices.
  • each computing device 200 in the computing device cluster 20 may also communicate with other devices through a network connection.
  • the network may be a wide area network or a local area network, or the like.
  • the computing device 200 in which the apparatus 1100 for distinguishing the data table heat is deployed includes: a processor 210 , a memory 220 and a communication interface 230 , wherein the processor 210 , the memory 220 and the communication interface 230 can be connected to each other through a bus 240 .
  • the processor 210 may read the code stored in the memory 220, and cooperate with the communication interface 230 to execute some or all of the steps of the data table heat discrimination method performed by the data table heat discrimination apparatus 1100 in the above embodiments of the present application.
  • the processor 210 may have various specific implementation forms, for example, the processor 210 may be a central processing unit (central processing unit, CPU) or a graphics processing unit (graphics processing unit, GPU), and the processor 210 may also be a single-core processor or multi-core processor.
  • the processor 210 may be a combination of a CPU and a hardware chip.
  • the above-mentioned hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD) or a combination thereof.
  • The above-mentioned PLD can be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL) device, or any combination thereof.
  • The processor 210 may also be independently implemented by a logic device with built-in processing logic, such as an FPGA or a digital signal processor (DSP).
  • the memory 220 may store codes as well as data.
  • the code includes: the code of the acquisition module 1101 and the code of the processing module 1102, etc.
  • The data includes: the inherent heat H 0 of the first data table, the inherent heat of the second data table, the associated heat H 1 of the first data table and the second data table, and so on.
  • The memory 220 may be a non-volatile memory, such as a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory. The memory 220 may also be a volatile memory, such as a random access memory (RAM), which acts as an external cache.
  • The communication interface 230 may be a wired interface (e.g., an Ethernet interface) or a wireless interface (e.g., a cellular network interface or a wireless local area network interface) for communicating with other computing nodes or devices.
  • The communication interface 230 may use a protocol family on top of the transmission control protocol/internet protocol (TCP/IP), for example, the remote function call (RFC) protocol, the simple object access protocol (SOAP), the simple network management protocol (SNMP), the common object request broker architecture (CORBA) protocol, distributed protocols, and so on.
  • The bus 240 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like.
  • the bus 240 can be divided into an address bus, a data bus, a control bus, and the like. For ease of presentation, only one thick line is used in FIG. 10, but it does not mean that there is only one bus or one type of bus.
  • The above computing device 200 is configured to execute the method in the above embodiment of the method for distinguishing the heat of a data table, and belongs to the same concept as that method embodiment. For the specific implementation process, refer to the above method embodiment; details are not repeated here.
  • The computing device 200 is only an example provided by the embodiments of the present application. The computing device 200 may have more or fewer components than those shown in FIG. 10, two or more components may be combined, or the components may have different configurations.
  • Embodiments of the present application also provide a non-transitory computer-readable storage medium in which code is stored. When the code runs on a processor, some or all of the steps of the data table heat distinguishing method described in the foregoing embodiments can be implemented.
  • The above embodiments may be implemented in whole or in part by software, hardware or any combination thereof.
  • When implemented in software, they can be implemented in whole or in part in the form of a computer program product.
  • The computer program product may contain code.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line) or wirelessly (e.g., infrared, radio, microwave).
  • The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media.
  • the usable media may be magnetic media (eg, floppy disks, hard disks, magnetic tapes), optical media, or semiconductor media, and the like.
  • the steps in the method of the embodiment of the present application may be sequentially adjusted, combined or deleted according to actual needs; the units in the device of the embodiment of the present application may be divided, combined or deleted according to actual needs.

Abstract

A data table heat differentiation method and apparatus, and a related device. The method comprises: a service node obtains, from a storage node, a second data table associated with a first data table, then obtains an associated heat between the first data table and the second data table according to the second data table, and after obtaining the associated heat between the first data table and the second data table, determines the heat of the first data table according to the associated heat between the first data table and the second data table, the associated heat between the first data table and the second data table being obtained according to the inherent heat of the second data table and the association between the first data table and the second data table. Said method can improve the accuracy of data table heat differentiation.

Description

Data table heat differentiation method and apparatus, and related device
This application claims priority to Chinese patent application No. 202110389324.9, filed with the China Patent Office on April 12, 2021 and entitled "Data table heat differentiation method and apparatus, and related device", which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the field of big data, and in particular to a data table heat differentiation method and apparatus, and a related device.
Background
In the era of big data, data is growing explosively, and the number of data tables grows ever larger along with it. To use data tables more efficiently, it is necessary to distinguish the heat of a large number of data tables and to manage them according to their heat, for example, by cleaning up data tables with low heat or pinning data tables with high heat to the top.
However, existing data table heat differentiation methods distinguish the heat of data tables with low accuracy.
Summary
The present application provides a data table heat differentiation method and apparatus, and a related device, which can improve the accuracy with which the heat of data tables is distinguished.
In a first aspect, a data table heat differentiation method is provided, the method comprising:
A service node obtains, from a storage node, a second data table associated with a first data table, where the storage node stores a plurality of data tables;
The service node obtains the associated heat of the first data table and the second data table, where the associated heat of the first data table and the second data table is obtained according to the inherent heat of the second data table and the association relationship between the first data table and the second data table, and the inherent heat of the second data table is the heat generated by the second data table being called;
The service node determines the heat of the first data table according to the associated heat of the first data table and the second data table.
In the above solution, when the heat of the first data table is calculated, the heat that the second data table associated with the first data table brings to the first data table, that is, the associated heat of the first data table and the second data table, is introduced. Therefore, the calculated heat of the first data table is more accurate, and when the heat of multiple data tables has been obtained, the heat of the multiple data tables can be better distinguished.
In a possible implementation, the service node obtaining, from the storage node, the second data table associated with the first data table includes:
The service node obtains, from the storage node, the second data table that has a data blood relationship with the first data table, where the data blood relationship indicates that the second data table is calculated from the first data table, or that the first data table is calculated from the second data table;
The service node obtaining the associated heat of the first data table and the second data table includes:
The service node calculates the associated heat of the first data table and the second data table according to the data blood relationship between the first data table and the second data table.
In a possible implementation, the service node obtaining, from the storage node, the second data table associated with the first data table includes:
The service node obtains, from the storage node, the second data table that has a primary and foreign key association relationship with the first data table, where the primary and foreign key association relationship indicates that one or more fields in the first data table are referenced as the primary key of the second data table, or that one or more fields in the second data table are referenced as the primary key of the first data table;
The service node obtaining the associated heat of the first data table and the second data table includes:
The service node calculates the associated heat of the first data table and the second data table according to the primary and foreign key association relationship between the first data table and the second data table.
In a possible implementation, the service node determining the heat of the first data table according to the associated heat of the first data table and the second data table includes:
The service node determines the heat of the first data table according to the inherent heat of the first data table and the associated heat of the first data table and the second data table, where the inherent heat of the first data table is the heat generated by the first data table being called.
In a possible implementation, the method further includes:
The service node calculates the heat of the plurality of data tables;
The service node deletes, from the storage node according to the calculation result, the data tables whose heat is less than a first preset threshold.
In the above solution, the service node deletes the data tables with low heat from the storage node according to the calculation result, which can save storage space.
In a possible implementation, the method further includes:
The service node calculates the heat of the plurality of data tables;
The service node adjusts, according to the calculation result, the position on the display interface of the data tables whose heat is greater than a second preset threshold to be in front of the data tables whose heat is less than the second preset threshold.
In the above solution, the service node moves the data tables with high heat in front of the data tables with low heat on the display interface, so that users can view the hot data tables conveniently and quickly.
In a possible implementation, the method further includes:
The service node calculates the heat of the plurality of data tables;
The service node migrates, according to the calculation result, the data tables whose heat is less than a third preset threshold to a first storage device, where the storage performance of the first storage device is lower than that of the storage node.
In the above solution, the service node migrates the data tables with low heat to the first storage device, whose storage performance is lower than that of the storage node. This not only prevents those data tables from continuing to occupy storage node resources, but also allows users who later need to view them to find them in the first storage device.
In a possible implementation, the method further includes:
the service node calculates the heat of the plurality of data tables; and
the service node migrates, according to the calculation result, a data table whose heat is greater than a fourth preset threshold to a second storage device, where the storage performance of the second storage device is higher than that of the storage node.
In the above solution, the service node migrates data tables with high heat to the second storage device, whose storage performance is higher than that of the storage node. This improves the efficiency of operating on data in the data tables with high heat and improves the storage security of those data tables.
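The two migration rules can be combined into one placement decision, sketched below. The function name and the destination labels are hypothetical, and the check that the third threshold does not exceed the fourth is an added assumption; the text does not state how the two thresholds relate.

```python
def plan_migrations(heat_by_table, third_threshold, fourth_threshold):
    """Map each table to the place it should be stored, based on its heat."""
    # Assumption: the cold threshold is not above the hot threshold.
    if third_threshold > fourth_threshold:
        raise ValueError("third_threshold must not exceed fourth_threshold")
    plan = {}
    for name, heat in heat_by_table.items():
        if heat < third_threshold:
            plan[name] = "first_storage_device"   # lower storage performance
        elif heat > fourth_threshold:
            plan[name] = "second_storage_device"  # higher storage performance
        else:
            plan[name] = "storage_node"           # table stays where it is
    return plan


# Hypothetical heat values.
heats = {"orders": 120, "tmp_2019": 3, "customers": 45}
print(plan_migrations(heats, third_threshold=10, fourth_threshold=100))
# {'orders': 'second_storage_device', 'tmp_2019': 'first_storage_device', 'customers': 'storage_node'}
```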
In a second aspect, a data table heat differentiation apparatus is provided. The apparatus is applied to a service node and includes:
an obtaining module, configured to obtain, from a storage node, a second data table associated with a first data table, where the storage node stores a plurality of data tables; and
a processing module, configured to obtain an associated heat of the first data table and the second data table, where the associated heat is obtained according to the inherent heat of the second data table and the association relationship between the first data table and the second data table, and the inherent heat of the second data table is the heat generated when the second data table is called;
where the processing module is further configured to determine the heat of the first data table according to the associated heat of the first data table and the second data table.
In a possible implementation, the obtaining module is specifically configured to:
obtain, from the storage node, the second data table that has a data lineage relationship with the first data table, where the data lineage relationship indicates that the second data table is calculated from the first data table, or that the first data table is calculated from the second data table;
and the processing module is specifically configured to:
calculate the associated heat of the first data table and the second data table according to the data lineage relationship between the first data table and the second data table.
In a possible implementation, the obtaining module is specifically configured to:
obtain, from the storage node, the second data table that has a primary-foreign key relationship with the first data table, where the primary-foreign key relationship indicates that one or more fields in the first data table are referenced as the primary key of the second data table, or that one or more fields in the second data table are referenced as the primary key of the first data table;
and the processing module is specifically configured to:
calculate the associated heat of the first data table and the second data table according to the primary-foreign key relationship between the first data table and the second data table.
In a possible implementation, the processing module is specifically configured to:
determine the heat of the first data table according to the inherent heat of the first data table and the associated heat of the first data table and the second data table, where the inherent heat of the first data table is the heat generated when the first data table is called.
In a possible implementation, the processing module is further configured to:
calculate the heat of the plurality of data tables; and
delete from the storage node, according to the calculation result, a data table whose heat is less than a first preset threshold.
In a possible implementation, the processing module is further configured to:
calculate the heat of the plurality of data tables; and
adjust, according to the calculation result, the position on a display interface of each data table, among the plurality of data tables, whose heat is greater than a second preset threshold, so that it is placed ahead of the data tables whose heat is less than the second preset threshold.
In a possible implementation, the processing module is further configured to:
calculate the heat of the plurality of data tables; and
migrate, according to the calculation result, a data table whose heat is less than a third preset threshold to a first storage device, and migrate a data table whose heat is greater than a fourth preset threshold to a second storage device, where the storage performance of the first storage device is lower than that of the storage node, and the storage performance of the second storage device is higher than that of the storage node.
In a third aspect, a non-transitory computer-readable storage medium is provided. The storage medium stores computer-readable instructions, and when the computer-readable instructions are run, the method described in the first aspect or in any specific implementation of the first aspect is performed.
In a fourth aspect, a computer program product is provided, including a computer program. When the computer program is read and executed by a computing device cluster, the computing device cluster performs the method described in the first aspect or in any specific implementation of the first aspect.
In a fifth aspect, a computing device cluster is provided, including at least one computing device, where each computing device includes a processor and a memory. The processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device, so that the computing device performs the method described in the first aspect or in any specific implementation of the first aspect.
In a possible implementation, the computing device cluster includes one computing device, and the computing device includes a processor and a memory. The processor is configured to execute instructions stored in the memory, so that the computing device performs the method provided in the first aspect or in any possible implementation of the first aspect.
In a possible implementation, the computing device cluster includes at least two computing devices, and each computing device includes a processor and a memory. The processors of the at least two computing devices are configured to execute instructions stored in the memories of the at least two computing devices, so that the computing device cluster performs the method provided in the first aspect or in any possible implementation of the first aspect.
Description of the Drawings
FIG. 1 is a schematic diagram of an application scenario involved in an embodiment of this application;
FIG. 2 is a schematic diagram of a data lineage relationship involved in an embodiment of this application;
FIG. 3 is a schematic diagram of a primary-foreign key relationship involved in an embodiment of this application;
FIG. 4 is a schematic flowchart of a data table heat differentiation method provided by an embodiment of this application;
FIG. 5 is a schematic diagram of the data lineage relationships of a first data table provided by an embodiment of this application;
FIG. 6 is a schematic flowchart of another data table heat differentiation method provided by an embodiment of this application;
FIG. 7 is a schematic diagram of the primary-foreign key relationships of a first data table provided by an embodiment of this application;
FIG. 8 is a schematic structural diagram of a data processing system provided by an embodiment of this application;
FIG. 9 is a schematic structural diagram of a computing device cluster provided by an embodiment of this application;
FIG. 10 is a schematic structural diagram of a computing device provided by an embodiment of this application.
Detailed Description
The technical solutions in this application are described below with reference to the accompanying drawings.
The terms "first" and "second" in the embodiments of this application are used for description only, and shall not be understood as indicating or implying relative importance or as implicitly indicating the number of the technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include one or more of that feature.
In the embodiments of this application, "at least one" means one or more, and "a plurality of" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" may indicate: A alone, both A and B, or B alone, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of the following" or a similar expression refers to any combination of the listed items, including a single item or any combination of multiple items. For example, at least one of a, b, or c may indicate: a, b, c, a-b, a-c, b-c, or a-b-c, where each of a, b, and c may be single or multiple.
Any embodiment or design described in this application as "exemplary" or introduced with "for example" shall not be construed as preferred or advantageous over other embodiments or designs. Rather, words such as "exemplary" or "for example" are used to present the related concepts in a concrete manner.
To facilitate understanding of the embodiments of this application, the concepts and terms involved in the embodiments are introduced first.
(1) Transactional data, which may also be called transaction data or business data, describes internal or external events or transaction records that arise in the course of an organization's business operations, such as sales orders and call records.
(2) Data heat is a value that reflects how much attention a piece of data receives. The value also indicates how likely the data is to be accessed during a period of time starting from the present. High data heat means that the data receives a high degree of attention and is very likely to be accessed during that period; low data heat means that the data receives little attention and is unlikely to be accessed during that period.
(3) Data table heat is a value that reflects how much attention a data table receives. The value indicates how likely the data table is to be accessed during a period of time starting from the present. High data table heat means that the data table receives a high degree of attention and is very likely to be accessed during that period; low data table heat means that the data table receives little attention and is unlikely to be accessed during that period.
(4) The inherent heat of a data table is the heat generated when the data table itself is called. It can be determined from the number of times the data table is called (also referred to as the number of uses or the number of accesses). Typically, the inherent heat of a data table equals the number of times it is called, which includes the number of times data is queried (select), added (insert), deleted (delete), and modified (update). That is: number of calls of the data table = number of queries on the data table + number of insertions into the data table + number of deletions from the data table + number of modifications in the data table. If the call operations on the data table include other operations, the number of calls also includes the number of times those other data operations are performed on the data table.
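Written as a formula, the typical rule above reads as follows. The symbols are introduced here purely for illustration and do not appear in the original text: $H_{\text{inh}}(T)$ denotes the inherent heat of data table $T$, and $N_{\text{op}}(T)$ denotes the number of operations of type op performed on $T$.

\[
H_{\text{inh}}(T) = N_{\text{select}}(T) + N_{\text{insert}}(T) + N_{\text{delete}}(T) + N_{\text{update}}(T) + N_{\text{other}}(T)
\]

For example, a data table with hypothetical counts of 40 queries, 5 insertions, 2 deletions, 3 modifications, and no other operations would have an inherent heat of 40 + 5 + 2 + 3 = 50.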
The application scenario involved in the embodiments of this application is briefly described below.
With the rapid development of Internet technology, the number of users of website platforms has grown quickly and the volume of data to be processed has grown exponentially. The data comes in many types and is highly complex, and after it is merged, transformed, and circulated, new data is generated in turn, converging into an ocean of data. Understandably, as data grows this fast, tens of thousands or even millions or tens of millions of data tables accumulate. As shown in FIG. 1, among the large number of data tables stored by a storage node, some are temporary or stale tables that are rarely called and should be cleaned up, while others are called frequently and should be given priority, so as to improve the efficiency with which data tables are used and to save storage resources. Managing large numbers of data tables has therefore become an important concern for enterprises, and differentiating the heat of these data tables is a key part of that management.
Currently, when a large number of data tables are managed, the following two methods are commonly used to differentiate the heat of data tables: (1) a method based on data creation time, and (2) a method based on the inherent heat of the data table.
The method based on data creation time is mainly used to differentiate the heat of transactional data tables (that is, tables that mainly contain transactional data). Specifically, assume that the storage node stores transactional data table A and transactional data table B, where the data in table A was created within the last year and the data in table B was created more than a year ago. After obtaining both tables from the storage node, the service node obtains and compares the creation times of the data in the two tables. If most or all of the data in transactional data table A was created later than the data in transactional data table B, the service node determines that the heat of table A is greater than that of table B; otherwise, it determines that the heat of table A is less than that of table B.
Understandably, in practice the following situation is very likely: although most or all of the data in transactional data table A was created later than the data in transactional data table B, the data in table B is more important and table B is called more frequently than table A. In other words, the heat of transactional data table B is actually greater than that of transactional data table A.
In that situation, if the service node differentiates the heat of the two transactional data tables using the creation-time-based method, the result is clearly inaccurate and does not match the actual application scenario.
The method based on the inherent heat of the data table is mainly used to differentiate the heat of web page data tables (that is, tables that mainly contain web page data, such as articles, pictures, and videos published on web pages). Specifically, assume that the storage node stores web page data table A and web page data table B. After obtaining both tables from the storage node, the service node obtains and compares the inherent heat of the two tables. If the inherent heat of web page data table A is greater than that of web page data table B, the service node determines that the heat of table A is greater than that of table B; otherwise, it determines that the heat of table A is less than that of table B.
Understandably, in practice the following situation is very likely: the inherent heat of web page data table A is lower than that of web page data table B, but the data in web page data table A was created within the last year, while the data in web page data table B was created more than a year ago. Here, the creation time of web page data can be understood as the time at which the data was published on a web page. Data newly published on a web page has generally been accessed fewer times than data published long ago, but this does not mean that the newly published data has lower heat. In other words, the heat of web page data table A is actually greater than that of web page data table B.
In that situation, if the service node differentiates the heat of the two web page data tables using the inherent-heat-based method, the result is clearly inaccurate and does not match the actual application scenario.
It can be seen that both methods differentiate data table heat with low accuracy and produce results that do not match actual application scenarios.
To address this problem, the embodiments of this application provide a data table heat differentiation method, an apparatus, and a related device, which can improve the accuracy of data table heat differentiation and better match actual application scenarios.
Before the data table heat differentiation method, apparatus, and related device provided by the embodiments of this application are introduced, the concepts of data table association relationship and associated heat of a data table, as well as the process of obtaining the inherent heat of a data table, are introduced first.
(1) Data table association relationships specifically include data lineage relationships and primary-foreign key relationships.
A data lineage relationship, which may also be called a data provenance relationship or a data pedigree relationship, is a relationship formed between data tables as they are generated, merged, transformed, circulated, and eventually retired. As shown in FIG. 2, assume that raw data is stored in data table 1. Calculating on some or all of the raw data in data table 1 yields data table 2, an intermediate table containing intermediate data (that is, the calculated part or all of the raw data). Calculating on the intermediate data in data table 2 that originates from data table 1 yields data table 3, which contains the final data. The data link from data table 1 to data table 2 to data table 3 then represents the data lineage relationship among these three tables. Specifically, data table 1 and data table 2 have a direct lineage relationship, data table 2 and data table 3 have a direct lineage relationship, and data table 1 and data table 3 have an indirect lineage relationship. Analyzing the data lineage relationships between data tables gives a clear picture of how data tables migrate and flow, which provides a basis for evaluating the value of data tables and for managing them.
When the data lineage relationships among data table 1, data table 2, and data table 3 are as shown in FIG. 2, data table 2 depends directly on data table 1, and data table 3 depends directly on data table 2 and indirectly on data table 1. Understandably, if the data in data table 1 that is used to calculate data table 2 and data table 3 is accessed, data table 2 and data table 3 are indirectly accessed; that is, data table 1 raises the heat of data table 2 and data table 3 to a certain extent. If the data in data table 2 that originates from data table 1 is accessed, data table 1 and data table 3 are indirectly accessed; that is, data table 2 raises the heat of data table 1 and data table 3 to a certain extent. If the data in data table 3 that originates from data table 2 is accessed, data table 1 and data table 2 are indirectly accessed; that is, data table 3 raises the heat of data table 1 and data table 2 to a certain extent.
Therefore, when the heat of data table 1, data table 2, and data table 3 is determined, taking into account not only the inherent heat of each data table but also the heat it gains from other data tables that have a data lineage relationship with it (including direct and indirect lineage relationships) makes the determined heat of each data table more accurate and better reflects the importance of each data table.
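The distinction between direct and indirect lineage can be sketched as a walk over a small graph. This is a minimal sketch under stated assumptions: the edge-list representation, the function names, and the table labels are hypothetical, and the walk follows only the upstream chain and the downstream chain of a table, which matches the linear example of FIG. 2 but is not claimed to be how the application finds lineage.

```python
from collections import deque


def _distances(start, step):
    """Breadth-first hop counts from `start`, following the `step` adjacency map."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in step.get(current, ()):
            if nxt not in distance:
                distance[nxt] = distance[current] + 1
                queue.append(nxt)
    del distance[start]
    return distance


def lineage_relations(edges, table):
    """Label each table upstream or downstream of `table` as direct or indirect."""
    downstream, upstream = {}, {}
    for source, derived in edges:
        downstream.setdefault(source, []).append(derived)
        upstream.setdefault(derived, []).append(source)
    relations = {}
    for step in (downstream, upstream):
        for name, hops in _distances(table, step).items():
            label = "direct" if hops == 1 else "indirect"
            # If a table is reachable in both directions, keep the closer label.
            if relations.get(name) != "direct":
                relations[name] = label
    return relations


# The chain from FIG. 2: data table 1 -> data table 2 -> data table 3.
edges = [("table1", "table2"), ("table2", "table3")]
print(lineage_relations(edges, "table1"))  # {'table2': 'direct', 'table3': 'indirect'}
print(lineage_relations(edges, "table2"))  # {'table3': 'direct', 'table1': 'direct'}
```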
A primary-foreign key relationship (primary key-foreign key relationship) defines an association between two tables in a relational database. As shown in FIG. 3, one or more fields A1 in data table 1 are referenced as the primary key of data table 2'. In this case, field A1 in data table 1 is a foreign key pointing to data table 2', and data table 1 and data table 2' have a primary-foreign key relationship.
As shown in FIG. 3, the primary key of data table 2' is also referenced as the primary key of data table 3', so data table 1 and data table 3' also have a primary-foreign key relationship. For ease of distinction and description, the primary-foreign key relationship between data table 1 and data table 2' and the one between data table 2' and data table 3' are called direct primary-foreign key relationships, and the primary-foreign key relationship between data table 1 and data table 3' is called an indirect primary-foreign key relationship.
When the primary-foreign key relationships among data table 1, data table 2', and data table 3' are as shown in FIG. 3, data table 2' depends directly on data table 1, and data table 3' depends directly on data table 2' and indirectly on data table 1. Understandably, if field A1 in data table 1 is accessed, data table 2' and data table 3' are indirectly accessed; that is, data table 1 raises the heat of data table 2' and data table 3' to a certain extent. If the primary key of data table 2' is accessed, data table 1 and data table 3' are indirectly accessed; that is, data table 2' raises the heat of data table 1 and data table 3' to a certain extent. If the primary key of data table 3' is accessed, data table 1 and data table 2' are indirectly accessed; that is, data table 3' raises the heat of data table 1 and data table 2' to a certain extent.
Therefore, when the heat of data table 1, data table 2', and data table 3' is determined, taking into account not only the inherent heat of each data table but also the heat it gains from other data tables that have a primary-foreign key relationship with it (including direct and indirect primary-foreign key relationships) makes the determined heat of each data table more accurate and better reflects the importance of each data table.
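As an illustration of how direct primary-foreign key links could be read from a relational schema, the sketch below builds three small tables in an in-memory SQLite database and lists which table references which. The table and column names are hypothetical, and the direction chosen for the link between data table 2' and data table 3' is an assumption, since the text only says that the key of one is referenced as the key of the other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table3p (id INTEGER PRIMARY KEY);
    -- Assumption: the primary key of table 2' references the primary key of table 3'.
    CREATE TABLE table2p (id INTEGER PRIMARY KEY REFERENCES table3p(id));
    CREATE TABLE table1 (
        row_id INTEGER PRIMARY KEY,
        a1 INTEGER REFERENCES table2p(id)  -- field A1 is a foreign key pointing to table 2'
    );
    """
)


def direct_key_links(connection):
    """Return (referencing table, referenced table) pairs read from the schema."""
    names = [
        row[0]
        for row in connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        )
    ]
    links = set()
    for name in names:
        # Each row is (id, seq, referenced table, from column, to column, ...).
        for row in connection.execute(f'PRAGMA foreign_key_list("{name}")'):
            links.add((name, row[2]))
    return links


print(sorted(direct_key_links(conn)))
# [('table1', 'table2p'), ('table2p', 'table3p')]
```

With these two direct links, data table 1 and data table 3' are connected only through data table 2', which corresponds to the indirect relationship described above.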
(2)数据表的关联热度,指关联的数据表为被关联的数据表带来的热度,如上述数据表1因与其具有数据血缘关系的数据表2和/或数据表3提高的热度、数据表1因与其具有主外键关联关系的数据表2'和/或数据表3'提高的热度等。(2) The relevance of the data table, which refers to the heat brought by the associated data table to the associated data table, such as the above-mentioned data table 1 due to the data table 2 and/or data table 3 that has a data blood relationship with it. Data table 1 has increased popularity due to data table 2' and/or data table 3' having a primary and foreign key relationship with it.
(3) Process of obtaining the inherent heat of a data table:
Taking the service node obtaining the inherent heat of data table 1 as an example, the process includes, but is not limited to, the following steps:
A1. The service node obtains the log information of the data operations on data table 1 from the storage node, and obtains the information about those data operations from that log information.
Here, the log information of the data operations on data table 1 is the log information that the storage node records automatically whenever a user performs a data operation on data table 1. It includes information about each operation, such as its type (for example, deleting data or adding data) and the time at which it was performed, so the information about the data operations on data table 1 can be obtained from it.
In a specific implementation, the service node may obtain the log information of data table 1 for a preset time period from the storage node and then obtain from it the information about the data operations on data table 1 within that period. For example, the service node may obtain the log information of data table 1 for the year 2020 and derive from it the data operations performed on data table 1 during 2020.
A2. The service node determines the number of times data table 1 has been called according to the information about the data operations on data table 1.
Specifically, based on that information, the number of times data is queried in, added to, deleted from, and modified in data table 1 may be counted, and these counts are then summed to obtain the number of times data table 1 has been called.
A3. The inherent heat of data table 1 is determined according to the number of times data table 1 has been called.
In a specific embodiment, the inherent heat of data table 1 = the number of times data table 1 has been called.
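Steps A1 to A3 can be illustrated with a minimal Python sketch. The `LogEntry` layout, the operation names, and the table names below are assumptions made for illustration only; the application does not specify a log format.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LogEntry:
    """One data operation recorded by the storage node (hypothetical layout)."""
    table: str
    operation: str       # assumed names: "query", "insert", "delete", "update"
    timestamp: datetime


COUNTED_OPERATIONS = {"query", "insert", "delete", "update"}


def inherent_heat(entries, table, start, end):
    """Return the number of times `table` was called within [start, end)."""
    # A1/A2: keep the operations on this table inside the preset period
    # and count them per operation type.
    per_type = Counter(
        e.operation
        for e in entries
        if e.table == table
        and start <= e.timestamp < end
        and e.operation in COUNTED_OPERATIONS
    )
    # A2: sum the per-type counts. A3: inherent heat = number of calls.
    return sum(per_type.values())


log = [
    LogEntry("table_1", "query", datetime(2020, 3, 1, 9, 0)),
    LogEntry("table_1", "insert", datetime(2020, 6, 15, 14, 30)),
    LogEntry("table_1", "delete", datetime(2021, 1, 2, 8, 0)),   # outside 2020
    LogEntry("table_2", "query", datetime(2020, 7, 7, 7, 7)),    # other table
]
heat_2020 = inherent_heat(log, "table_1", datetime(2020, 1, 1), datetime(2021, 1, 1))
print(heat_2020)  # 2
```

Only the two operations on `table_1` that fall within 2020 are counted, which mirrors the 2020 example in the text.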
The data table heat differentiation method, apparatus, and related device provided by the embodiments of this application are introduced next. In them, the service node may obtain from the storage node a second data table associated with a first data table, obtain the associated heat of the first data table and the second data table according to the association between the two tables and the inherent heat of the second data table, and then determine the heat of the first data table according to that associated heat. The association between the first data table and the second data table includes one or more of a data lineage relationship and a primary-foreign key association.
A data table heat differentiation method provided by an embodiment of this application is described in more detail below with reference to FIG. 4. As shown in FIG. 4, the method includes, but is not limited to, the following steps:
S101. The service node obtains, from the storage node, a first data table and a second data table that has a data lineage relationship with the first data table.
The storage node stores a plurality of data tables, and the first data table may be any one or more of them. The stored data tables may be tables of various types, such as transaction data tables and web page data tables. They may be tables in a database such as SQL Server or Oracle, or tables temporarily created by a user that do not belong to any database. This is not specifically limited here.
As described above for data lineage, a data lineage relationship between the first data table and the second data table means that the second data table is computed from the first data table, and/or the first data table is computed from the second data table. The data lineage relationship may be direct or indirect, which is not specifically limited here.
In a specific implementation, after obtaining the first data table, the service node may obtain the second data table that has a data lineage relationship with it from the storage node by using a data warehouse tool (such as Hive) or SQL statements. Hive is a Hadoop-based data warehouse tool used for data extraction, transformation, and loading; it provides a mechanism for storing, querying, and analyzing large-scale data stored in Hadoop.
It should be noted that obtaining the second data table through a data warehouse tool or SQL statements is only an example and should not be regarded as a specific limitation. In a specific implementation, the service node may also obtain the second data table in other ways. For example, a person may read the code to find the second data table that has a data lineage relationship with the first data table; the service node then receives the manually entered name of that second data table and obtains the second data table according to that name.
S102. The service node obtains the inherent heat H_0 of the first data table.
Here, the inherent heat H_0 of the first data table is the heat generated by the first data table itself being called.
S103. The service node calculates the associated heat H_1 of the first data table and the second data table according to the data lineage relationship between them and the inherent heat of the second data table.
Here, the inherent heat of the second data table is the heat generated by the second data table itself being called.
Specifically, after obtaining the second data table that has a data lineage relationship with the first data table, the service node may determine the lineage weight corresponding to the second data table according to the data lineage relationship between the two tables, calculate the inherent heat of the second data table, and then calculate the associated heat H_1 from that lineage weight and that inherent heat.
For example, as shown in FIG. 5, assume that two second data tables have a data lineage relationship with the first data table, namely data table A and data table B, where second data table A has a direct lineage relationship with the first data table and second data table B has an indirect one. Assume that the inherent heat of second data table A is H_{0,A}, the inherent heat of second data table B is H_{0,B}, the lineage weight corresponding to second data table A is W_A, and the lineage weight corresponding to second data table B is W_B. The associated heat H_1 of the first data table and second data tables A and B obtained by the service node is then:
H_1 = W_A * H_{0,A} + W_B * H_{0,B}
Here, W_A and W_B are both greater than 0 and less than 1. Because second data table A has a direct lineage relationship with the first data table whereas second data table B has an indirect one, A is more closely related to the first data table, so W_A is preferably greater than W_B.
S104. The service node determines the heat H of the first data table according to the inherent heat H_0 of the first data table and the associated heat H_1 of the first data table and the second data table.
In a specific embodiment of this application, H = H_0 + H_1.
It should be noted that, for brevity, the embodiments of this application do not describe again how the inherent heat H_0 of the first data table and the inherent heat of the second data table are obtained. For details, refer to the process of obtaining the inherent heat of data table 1 described above.
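The calculation in steps S103 and S104 can be sketched in Python as follows. The function names and the numeric weights and heat values are illustrative assumptions; the application only requires each weight to lie between 0 and 1, with the direct weight preferably larger than the indirect one.

```python
def associated_heat(related):
    """H_1: weighted sum of the inherent heat of each related table.

    `related` maps a table name to a (weight, inherent_heat) pair.
    """
    for name, (weight, _) in related.items():
        if not 0 < weight < 1:
            raise ValueError(f"weight for {name} must lie strictly between 0 and 1")
    return sum(weight * heat for weight, heat in related.values())


def total_heat(inherent, related):
    """H = H_0 + H_1."""
    return inherent + associated_heat(related)


# Assumed example values, not taken from the application.
lineage = {
    "A": (0.5, 100),   # direct lineage:   W_A = 0.5,  H_{0,A} = 100
    "B": (0.25, 40),   # indirect lineage: W_B = 0.25, H_{0,B} = 40
}
h1 = associated_heat(lineage)   # 0.5 * 100 + 0.25 * 40 = 60.0
h = total_heat(200, lineage)    # H_0 = 200, so H = 200 + 60.0 = 260.0
print(h1, h)
```

Keeping the weights below 1 means a related table never contributes more heat than its own inherent heat.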
Refer to FIG. 6, which is a schematic flowchart of another data table heat differentiation method provided by an embodiment of this application. As shown in FIG. 6, the method includes, but is not limited to, the following steps:
S201. The service node obtains, from the storage node, a first data table and a second data table that has a primary-foreign key association with the first data table.
As described above for primary-foreign key associations, such an association between the first data table and the second data table means that one or more fields in the first data table are referenced as the primary key of the second data table, and/or one or more fields in the second data table are referenced as the primary key of the first data table.
In a specific implementation, after obtaining the first data table, the service node may obtain the second data table that has a primary-foreign key association with it from the storage node by using a data warehouse tool or SQL statements.
It should be noted that obtaining the second data table that has a primary-foreign key association with the first data table through a data warehouse tool or SQL statements is only an example. In a specific implementation, the service node may also obtain the second data table in other ways. For example, a person may read the code to find the second data table that has a primary-foreign key association with the first data table; the service node then receives the manually entered name of that second data table and obtains the second data table according to that name.
S202. The service node obtains the inherent heat H_0 of the first data table.
S203. The service node calculates the associated heat H_1 of the first data table and the second data table according to the primary-foreign key association between them and the inherent heat of the second data table.
Specifically, after obtaining the second data table that has a primary-foreign key association with the first data table, the service node may determine the association weight corresponding to the second data table according to the primary-foreign key association between the two tables, calculate the inherent heat of the second data table, and then calculate the associated heat H_1 from that association weight and that inherent heat.
For example, as shown in FIG. 7, assume that two second data tables have a primary-foreign key association with the first data table, namely data table C and data table D, where second data table C has a direct primary-foreign key association with the first data table and second data table D has an indirect one. Assume that the inherent heat of second data table C is H_{0,C}, the inherent heat of second data table D is H_{0,D}, the association weight corresponding to second data table C is W_C, and the association weight corresponding to second data table D is W_D. The associated heat H_1 of the first data table and second data tables C and D obtained by the service node is then:
H_1 = W_C * H_{0,C} + W_D * H_{0,D}
Here, W_C and W_D are both greater than 0 and less than 1. Because second data table C has a direct primary-foreign key association with the first data table whereas second data table D has an indirect one, C is more closely related to the first data table, so W_C is preferably greater than W_D.
S204. The service node determines the heat H of the first data table according to the inherent heat H_0 of the first data table and the associated heat H_1 of the first data table and the second data table.
In a specific embodiment of this application, H = H_0 + H_1.
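The application does not say how the association weights are derived, only that a direct association preferably receives a larger weight than an indirect one. One possible rule, shown here purely as an assumption, is to let the weight decay with the number of hops between tables in the association graph. The edge list, the `decay` parameter, and the table names are hypothetical.

```python
from collections import deque


def hop_distances(edges, start):
    """Breadth-first search over an undirected association graph."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbours.get(current, ()):
            if nxt not in distance:
                distance[nxt] = distance[current] + 1
                queue.append(nxt)
    del distance[start]   # the first table is not related to itself
    return distance


def association_weights(edges, first_table, decay=0.5):
    """Assumed rule: weight = decay ** hops, so direct > indirect for 0 < decay < 1."""
    return {t: decay ** hops for t, hops in hop_distances(edges, first_table).items()}


# Assumed layout for FIG. 7: C is keyed directly to the first table, D only via C.
fk_edges = [("first", "C"), ("C", "D")]
weights = association_weights(fk_edges, "first")       # {"C": 0.5, "D": 0.25}
inherent = {"C": 80, "D": 20}                          # assumed H_{0,C}, H_{0,D}
h1 = sum(weights[t] * inherent[t] for t in weights)    # 0.5 * 80 + 0.25 * 20 = 45.0
print(weights, h1)
```

With any `decay` strictly between 0 and 1, every weight stays inside the interval the text requires, and W_C > W_D follows automatically.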
It can be understood that, when the service node obtains from the storage node the second data tables associated with the first data table, if it obtains both second data tables that have a data lineage relationship with the first data table and second data tables that have a primary-foreign key association with it, the associated heat H_1 calculated by the service node includes both the heat contributed by the former and the heat contributed by the latter.
Continuing with the examples of FIG. 5 and FIG. 7 above, assume that the first data table has both the data lineage relationships shown in FIG. 5 and the primary-foreign key associations shown in FIG. 7. The associated heat H_1 of the first data table and the second data tables obtained by the service node is then:
H_1 = W_A * H_{0,A} + W_B * H_{0,B} + W_C * H_{0,C} + W_D * H_{0,D}
Because the heat H of the first data table = the inherent heat H_0 of the first data table + the associated heat H_1 of the first data table and the second data tables, it follows that when H_1 includes both the heat contributed by the second data tables that have a data lineage relationship with the first data table and the heat contributed by the second data tables that have a primary-foreign key association with it, the heat H calculated by the service node likewise includes both.
It can be understood that the service node can obtain the heat of a plurality of data tables by using the data table heat differentiation method described above. Having done so, it can tell which of those data tables have higher heat and which have lower heat, and manage the data tables accordingly.
In a possible embodiment, after obtaining the heat of the plurality of data tables, the service node may delete from the storage node the data tables whose heat is less than a first preset threshold, so as to save storage space.
In a possible embodiment, after obtaining the heat of the plurality of data tables, the service node may move the data tables whose heat is greater than a second preset threshold ahead of the data tables whose heat is less than the second preset threshold on the display interface. In other words, data tables with heat above the second preset threshold are placed where they are easier for the user to view, so that the user can find them quickly.
In a possible embodiment, after obtaining the heat of the plurality of data tables, the service node may also migrate the data tables whose heat is less than a third preset threshold to a first storage device and migrate the data tables whose heat is greater than a fourth preset threshold to a second storage device, where the storage performance of the first storage device is lower than that of the storage node and the storage performance of the second storage device is higher than that of the storage node.
The values of the first, second, third, and fourth preset thresholds may be set according to actual conditions and are not specifically limited here.
It can be understood that migrating data tables with low heat to the first storage device, whose storage performance is lower than that of the storage node, prevents those tables from continuing to occupy storage node resources while still allowing users to find them in the first storage device when they need to view them later. Migrating data tables with high heat to the second storage device, whose storage performance is higher than that of the storage node, improves the efficiency of operating on data in those tables and improves their storage security.
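The three threshold-based embodiments above can be sketched as simple Python functions over a mapping from table name to heat. The function names, table names, heat values, and threshold values are all illustrative assumptions, and the migration sketch further assumes that the third threshold does not exceed the fourth.

```python
def tables_to_delete(heat, first_threshold):
    """Tables whose heat is below the first preset threshold."""
    return sorted(t for t, h in heat.items() if h < first_threshold)


def display_order(heat, second_threshold):
    """Tables with heat above the second threshold come before the rest.

    sorted() is stable, so the original order is kept within each group.
    """
    return sorted(heat, key=lambda t: heat[t] <= second_threshold)


def migration_plan(heat, third_threshold, fourth_threshold):
    """Choose a target location for each table (assumes third <= fourth)."""
    plan = {}
    for table, h in heat.items():
        if h < third_threshold:
            plan[table] = "first_storage_device"    # lower storage performance
        elif h > fourth_threshold:
            plan[table] = "second_storage_device"   # higher storage performance
        else:
            plan[table] = "storage_node"            # stays where it is
    return plan


# Assumed heat values for four hypothetical tables.
heat = {"orders": 260, "tmp_export": 3, "customers": 90, "audit_2015": 12}

print(tables_to_delete(heat, 5))     # ['tmp_export']
print(display_order(heat, 100))      # ['orders', 'tmp_export', 'customers', 'audit_2015']
print(migration_plan(heat, 20, 200))
```

Each function corresponds to one embodiment and can be applied on its own, since the text presents the three as separate options.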
As can be seen from the above embodiments, when determining the heat H of the first data table, the data table heat differentiation method provided by the embodiments of this application also takes into account the heat that the second data table associated with the first data table contributes to it, namely the associated heat H_1 of the first data table and the second data table. This makes the calculated heat H of the first data table more accurate and closer to actual application scenarios and, when the heat of a plurality of data tables has been obtained, allows their heat to be differentiated better.
A data table heat differentiation method according to an embodiment of this application has been described in detail above. Based on the same inventive concept, a data table heat differentiation apparatus according to an embodiment of this application is provided below.
Refer to FIG. 8, which is a schematic structural diagram of a data processing system 10 provided by an embodiment of this application. The data processing system 10 includes a data table heat differentiation apparatus 1100 provided by an embodiment of this application, and the apparatus 1100 includes an obtaining module 1101 and a processing module 1102. The apparatus 1100 may be integrated in the service node 110 of the data processing system 10. In addition to the service node 110, the data processing system 10 may include a storage node 120, a first storage device 130, and a second storage device 140, where:
the storage node 120 stores a plurality of data tables;
the obtaining module 1101 is configured to obtain, from the storage node 120, a second data table associated with a first data table;
the processing module 1102 is configured to obtain the associated heat H_1 of the first data table and the second data table, where H_1 is obtained according to the inherent heat of the second data table and the association between the first data table and the second data table, and the inherent heat of the second data table is the heat generated by the second data table being called;
the processing module 1102 is further configured to determine the heat H of the first data table according to the associated heat H_1 of the first data table and the second data table.
In a possible embodiment, the obtaining module 1101 is specifically configured to:
obtain, from the storage node 120, a second data table that has a data lineage relationship with the first data table, where the data lineage relationship means that the second data table is computed from the first data table, and/or the first data table is computed from the second data table;
and the processing module 1102 is specifically configured to:
calculate the associated heat H_1 of the first data table and the second data table according to the data lineage relationship between the first data table and the second data table.
In a possible embodiment, the obtaining module 1101 is specifically configured to:
obtain, from the storage node 120, a second data table that has a primary-foreign key association with the first data table, where the primary-foreign key association means that one or more fields in the first data table are referenced as the primary key of the second data table, and/or one or more fields in the second data table are referenced as the primary key of the first data table;
and the processing module 1102 is specifically configured to:
calculate the associated heat H_1 of the first data table and the second data table according to the primary-foreign key association between the first data table and the second data table.
In a possible embodiment, the processing module 1102 is specifically configured to:
determine the heat H of the first data table according to the inherent heat H_0 of the first data table and the associated heat H_1 of the first data table and the second data table, where the inherent heat H_0 of the first data table is the heat generated by the first data table being called.
In a possible embodiment, the processing module 1102 is further configured to:
calculate the heat of a plurality of data tables;
delete from the storage node 120, according to the calculation result, the data tables whose heat is less than a first preset threshold.
In a possible embodiment, the processing module 1102 is further configured to:
move, according to the calculation result, the data tables whose heat is greater than a second preset threshold ahead of the data tables whose heat is less than the second preset threshold on the display interface.
In a possible embodiment, the processing module 1102 is further configured to:
migrate, according to the calculation result, the data tables whose heat is less than a third preset threshold to the first storage device 130 and the data tables whose heat is greater than a fourth preset threshold to the second storage device 140, where the performance of the first storage device 130 is lower than that of the storage node 120 and the performance of the second storage device 140 is higher than that of the storage node 120.
The values of the first, second, third, and fourth preset thresholds may be set according to actual conditions and are not specifically limited here.
Specifically, for how the data table heat differentiation apparatus 1100 in the data processing system 10 performs the various operations, refer to the related description in the method embodiments above. For brevity, details are not repeated here.
应当理解,数据处理系统10以及数据表热度区分装置1100仅为本申请实施例提供的一个例子,并且,数据处理系统10以及数据表热度区分装置1100可具有比图8示出的部件更多或更少的部件,可以组合两个或更多个部件,或者可具有部件的不同配置实现。It should be understood that the data processing system 10 and the apparatus 1100 for distinguishing the heat of a data table are only an example provided by the embodiments of the present application, and the data processing system 10 and the apparatus 1100 for distinguishing the heat of a data table may have more or more components than those shown in FIG. 8 . Fewer components, two or more components may be combined, or may be implemented with different configurations of components.
本申请实施例还提供一种计算设备集群20,所述计算设备集群20可以用于部署图8所示的数据处理系统10,具体可以用于部署图8所示数据处理系统10中的数据表热度区分装置1100,以执行本申请实施例提供的数据表热度区分方法。如图9所示,该计算设备集群20包括至少一个计算设备200。The embodiment of the present application further provides a computing device cluster 20, and the computing device cluster 20 can be used to deploy the data processing system 10 shown in FIG. 8, and specifically can be used to deploy the data table in the data processing system 10 shown in FIG. 8 The heat distinguishing apparatus 1100 is configured to execute the data table heat distinguishing method provided by the embodiment of the present application. As shown in FIG. 9 , the computing device cluster 20 includes at least one computing device 200 .
具体地,在所述计算设备集群20仅包括一个计算设备200的情况下,可以在该一个计算设备200中部署图8所示的数据处理系统10中的全部模块:服务节点110、存储节点120、第一存储装置130和第二存储装置140。Specifically, in the case that the computing device cluster 20 includes only one computing device 200 , all the modules in the data processing system 10 shown in FIG. 8 may be deployed in the one computing device 200 : the service node 110 and the storage node 120 , the first storage device 130 and the second storage device 140 .
在所述计算设备集群20包括多个计算设备200的情况下,多个计算设备200中的每个计算设备200可以用于部署图8所示的数据处理系统10中的部分模块,或者,多个计算设备200中的两个或者两个以上的计算设备200共同用于部署图8所示的数据处理系统10中的一个或者多个模块。In the case where the computing device cluster 20 includes multiple computing devices 200, each computing device 200 in the multiple computing devices 200 may be used to deploy some modules in the data processing system 10 shown in FIG. Two or more of the computing devices 200 of the computing devices 200 are jointly used to deploy one or more modules in the data processing system 10 shown in FIG. 8 .
举例来讲,假设多个计算设备200包括计算设备200A和计算设备200B,则计算设备200A可以用于部署服务节点110和存储节点120,计算设备200B可以用于部署第一存储装置130和第二存储装置140,或者,计算设备200A和计算设备200B共同用于部署服务节点110,例如,计算设备200A上部署数据表热度区分装置1100中的获取模块1101,计算设备200B上部署数据表热度区分装置1100中的处理模块1102,计算设备200A还用于部署存储节点,计算设备200B还用于部署第一存储装置130和第二存储装置140;假设多个计算设备200包括计算设备200A、200B、200C和200D,则计算设备200A可以用于部署服务节点110,计算设备200B可以用于部署存储节点120,计算设备200C可以用于部署第一存储装置130,计算设备200D可以用于部署第二存储装置140。For example, assuming that the plurality of computing devices 200 includes a computing device 200A and a computing device 200B, the computing device 200A can be used to deploy the service node 110 and the storage node 120, and the computing device 200B can be used to deploy the first storage device 130 and the second storage device 130. The storage device 140, or the computing device 200A and the computing device 200B are jointly used to deploy the service node 110, for example, the obtaining module 1101 in the data table heat distinguishing device 1100 is deployed on the computing device 200A, and the data table heat distinguishing device is deployed on the computing device 200B In the processing module 1102 in 1100, the computing device 200A is also used to deploy storage nodes, and the computing device 200B is also used to deploy the first storage device 130 and the second storage device 140; it is assumed that the multiple computing devices 200 include computing devices 200A, 200B, 200C and 200D, the computing device 200A can be used to deploy the service node 110, the computing device 200B can be used to deploy the storage node 120, the computing device 200C can be used to deploy the first storage device 130, and the computing device 200D can be used to deploy the second storage device 140.
In a specific implementation, the at least one computing device 200 included in the computing device cluster 20 may consist entirely of terminal devices, entirely of cloud servers, or partly of cloud servers and partly of terminal devices; this is not specifically limited here.
More specifically, each computing device 200 in the computing device cluster 20 may include a processor, a memory, a communication interface, and so on. The memories of one or more computing devices 200 in the computing device cluster 20 may store the same code (also referred to as instructions or program instructions) for performing the data table heat differentiation method provided in the embodiments of this application. The processor may read the code from the memory and execute it to implement the data table heat differentiation method provided in the embodiments of this application, and the communication interface may be used for communication between each computing device 200 and other devices.
In some possible implementations, each computing device 200 in the computing device cluster 20 may also connect to and communicate with other devices over a network. The network may be a wide area network, a local area network, or the like.
The computing device 200 on which the data table heat differentiation apparatus 1100 is deployed, as provided in the embodiments of this application, is described in detail below with reference to FIG. 10.
Referring to FIG. 10, the computing device 200 on which the data table heat differentiation apparatus 1100 is deployed includes a processor 210, a memory 220, and a communication interface 230, which may be connected to one another through a bus 240, where:
The processor 210 may read the code stored in the memory 220 and, in cooperation with the communication interface 230, perform some or all of the steps of the data table heat differentiation method performed by the data table heat differentiation apparatus 1100 in the foregoing embodiments of this application.
The processor 210 may be implemented in a variety of specific forms. For example, the processor 210 may be a central processing unit (CPU) or a graphics processing unit (GPU), and may be a single-core or multi-core processor. The processor 210 may be a combination of a CPU and a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. The processor 210 may also be implemented solely by a logic device with built-in processing logic, such as an FPGA or a digital signal processor (DSP).
The memory 220 may store code and data. The code includes the code of the acquisition module 1101, the code of the processing module 1102, and so on. The data includes the inherent heat H0 of the first data table, the inherent heat of the second data table, the associated heat H1 of the first data table and the second data table, and so on.
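To make the stored quantities concrete, the sketch below shows one way the inherent heat and the associated heat could be held in memory and combined. It is a minimal illustration under stated assumptions: this section says only that the associated heat is obtained from the second data table's inherent heat and the association relationship, so the per-relationship `weight` and the weighted-sum combination used here are hypothetical, as are the names `TableHeat`, `associated_heat`, and `total_heat`.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TableHeat:
    name: str
    inherent_heat: float  # heat generated by this table itself being called


def associated_heat(second: TableHeat, weight: float) -> float:
    """Associated heat contributed by one related table.

    Assumption: the association relationship is reduced to a single
    numeric weight that scales the related table's inherent heat.
    """
    return weight * second.inherent_heat


def total_heat(first: TableHeat, related: List[Tuple[TableHeat, float]]) -> float:
    """Heat of `first`: its inherent heat plus all associated heat.

    Assumption: the two kinds of heat are simply added together.
    """
    h0 = first.inherent_heat
    h1 = sum(associated_heat(table, weight) for table, weight in related)
    return h0 + h1
```

Under these assumptions, a rarely called source table can still end up with a high heat if a frequently called table is derived from it, which is the effect the method is meant to capture.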
In practical applications, the memory 220 may be a non-volatile memory, for example, a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory. The memory 220 may also be a volatile memory; the volatile memory may be a random access memory (RAM), which is used as an external cache.
The communication interface 230 may be a wired interface (for example, an Ethernet interface) or a wireless interface (for example, a cellular network interface or a wireless local area network interface), and is used to communicate with other computing nodes or apparatuses. When the communication interface 230 is a wired interface, it may use a protocol family on top of the transmission control protocol/internet protocol (TCP/IP), for example, the remote function call (RFC) protocol, the simple object access protocol (SOAP), the simple network management protocol (SNMP), the common object request broker architecture (CORBA) protocol, distributed protocols, and so on.
The bus 240 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus 240 may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the bus is represented by only one thick line in FIG. 10, but this does not mean that there is only one bus or only one type of bus.
The computing device 200 is used to perform the method of the foregoing data table heat differentiation method embodiments and is based on the same concept as those method embodiments. For the specific implementation process, refer to the foregoing method embodiments; details are not repeated here.
It should be understood that the computing device 200 is merely one example provided in the embodiments of this application. The computing device 200 may have more or fewer components than those shown in FIG. 10, may combine two or more components, or may be implemented with a different configuration of components.
The embodiments of this application further provide a non-transitory computer-readable storage medium. The non-transitory computer-readable storage medium stores code which, when run on a processor, can implement some or all of the steps of the data table heat differentiation method described in the foregoing embodiments.
It can be understood that, as the number of data tables grows, large numbers of databases and data systems will also appear. Besides managing a large number of data tables, enterprises also need to manage large numbers of databases and data systems. Differentiating the heat of a large number of databases will therefore become a key part of managing those databases, and differentiating the heat of a large number of data systems will become a key part of managing those data systems. The ideas behind the data table heat differentiation method, apparatus, and related device provided in this application can be applied not only to managing a large number of data tables, but also to differentiating the heat of large numbers of databases and data systems.
In the foregoing embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.
The foregoing embodiments may be implemented in whole or in part by software, hardware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product may contain code. When the computer program product is read and executed by a computer, some or all of the steps of the data table heat differentiation method described in the foregoing method embodiments can be implemented. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium, a semiconductor medium, or the like.
The steps of the methods in the embodiments of this application may be reordered, combined, or deleted according to actual needs; the units of the apparatuses in the embodiments of this application may be divided, combined, or deleted according to actual needs.
The embodiments of this application have been described in detail above. Specific examples are used herein to explain the principles and implementations of this application, and the description of the foregoing embodiments is intended only to help in understanding the method of this application and its core idea. In addition, a person of ordinary skill in the art may, based on the idea of this application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting this application.

Claims (17)

  1. A data table heat differentiation method, characterized in that the method comprises:
    obtaining, by a service node, a second data table associated with a first data table from a storage node, wherein the storage node stores a plurality of data tables;
    obtaining, by the service node, an associated heat of the first data table and the second data table, wherein the associated heat of the first data table and the second data table is obtained according to an inherent heat of the second data table and an association relationship between the first data table and the second data table, and the inherent heat of the second data table is the heat generated by the second data table being called; and
    determining, by the service node, a heat of the first data table according to the associated heat of the first data table and the second data table.
  2. The method according to claim 1, characterized in that the obtaining, by the service node, of the second data table associated with the first data table from the storage node comprises:
    obtaining, by the service node from the storage node, the second data table that has a data lineage relationship with the first data table, wherein the data lineage relationship indicates that the second data table is computed from the first data table, or that the first data table is computed from the second data table; and
    the obtaining, by the service node, of the associated heat of the first data table and the second data table comprises:
    calculating, by the service node, the associated heat of the first data table and the second data table according to the data lineage relationship between the first data table and the second data table.
  3. The method according to claim 1, characterized in that the obtaining, by the service node, of the second data table associated with the first data table from the storage node comprises:
    obtaining, by the service node from the storage node, the second data table that has a primary/foreign key association relationship with the first data table, wherein the primary/foreign key association relationship indicates that one or more fields in the first data table are referenced as the primary key of the second data table, or that one or more fields in the second data table are referenced as the primary key of the first data table; and
    the obtaining, by the service node, of the associated heat of the first data table and the second data table comprises:
    calculating, by the service node, the associated heat of the first data table and the second data table according to the primary/foreign key association relationship between the first data table and the second data table.
  4. The method according to any one of claims 1 to 3, characterized in that the determining, by the service node, of the heat of the first data table according to the associated heat of the first data table and the second data table comprises:
    determining, by the service node, the heat of the first data table according to an inherent heat of the first data table and the associated heat of the first data table and the second data table, wherein the inherent heat of the first data table is the heat generated by the first data table being called.
  5. The method according to any one of claims 1 to 4, characterized in that the method further comprises:
    calculating, by the service node, the heat of the plurality of data tables; and
    deleting, by the service node according to the calculation result, data tables whose heat is less than a first preset threshold from the storage node.
  6. The method according to any one of claims 1 to 5, characterized in that the method further comprises:
    calculating, by the service node, the heat of the plurality of data tables; and
    adjusting, by the service node according to the calculation result, the position on a display interface of data tables among the plurality of data tables whose heat is greater than a second preset threshold so that they appear ahead of data tables whose heat is less than the second preset threshold.
  7. The method according to any one of claims 1 to 6, characterized in that the method further comprises:
    calculating, by the service node, the heat of the plurality of data tables; and
    migrating, by the service node according to the calculation result, data tables whose heat is less than a third preset threshold to a first storage device, wherein the storage performance of the first storage device is lower than that of the storage node.
  8. The method according to any one of claims 1 to 7, characterized in that the method further comprises:
    calculating, by the service node, the heat of the plurality of data tables; and
    migrating, by the service node according to the calculation result, data tables whose heat is greater than a fourth preset threshold to a second storage device, wherein the storage performance of the second storage device is higher than that of the storage node.
  9. A data table heat differentiation apparatus, characterized in that the apparatus is applied to a service node and comprises:
    an acquisition module, configured to obtain a second data table associated with a first data table from a storage node, wherein the storage node stores a plurality of data tables; and
    a processing module, configured to obtain an associated heat of the first data table and the second data table, wherein the associated heat of the first data table and the second data table is obtained according to an inherent heat of the second data table and an association relationship between the first data table and the second data table, and the inherent heat of the second data table is the heat generated by the second data table being called;
    wherein the processing module is further configured to determine a heat of the first data table according to the associated heat of the first data table and the second data table.
  10. The apparatus according to claim 9, characterized in that:
    the acquisition module is specifically configured to:
    obtain, from the storage node, the second data table that has a data lineage relationship with the first data table, wherein the data lineage relationship indicates that the second data table is computed from the first data table, or that the first data table is computed from the second data table; and
    the processing module is specifically configured to:
    calculate the associated heat of the first data table and the second data table according to the data lineage relationship between the first data table and the second data table.
  11. The apparatus according to claim 9, characterized in that:
    the acquisition module is specifically configured to:
    obtain, from the storage node, the second data table that has a primary/foreign key association relationship with the first data table, wherein the primary/foreign key association relationship indicates that one or more fields in the first data table are referenced as the primary key of the second data table, or that one or more fields in the second data table are referenced as the primary key of the first data table; and
    the processing module is specifically configured to:
    calculate the associated heat of the first data table and the second data table according to the primary/foreign key association relationship between the first data table and the second data table.
  12. The apparatus according to any one of claims 9 to 11, characterized in that the processing module is specifically configured to:
    determine the heat of the first data table according to an inherent heat of the first data table and the associated heat of the first data table and the second data table, wherein the inherent heat of the first data table is the heat generated by the first data table being called.
  13. The apparatus according to any one of claims 9 to 12, characterized in that the processing module is further configured to:
    calculate the heat of the plurality of data tables; and
    delete, according to the calculation result, data tables whose heat is less than a first preset threshold from the storage node.
  14. The apparatus according to any one of claims 9 to 13, characterized in that the processing module is further configured to:
    calculate the heat of the plurality of data tables; and
    adjust, according to the calculation result, the position on a display interface of data tables among the plurality of data tables whose heat is greater than a second preset threshold so that they appear ahead of data tables whose heat is less than the second preset threshold.
  15. The apparatus according to any one of claims 9 to 14, characterized in that the processing module is further configured to:
    calculate the heat of the plurality of data tables; and
    migrate, according to the calculation result, data tables whose heat is less than a third preset threshold to a first storage device, and migrate data tables whose heat is greater than a fourth preset threshold to a second storage device, wherein the storage performance of the first storage device is lower than that of the storage node, and the storage performance of the second storage device is higher than that of the storage node.
  16. A non-transitory computer-readable storage medium, characterized in that the non-transitory computer-readable storage medium stores computer-readable instructions which, when run, perform the method according to any one of claims 1 to 8.
  17. A computing device cluster, characterized by comprising at least one computing device, each computing device comprising a processor and a memory;
    wherein the processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device, so that the computing device cluster performs the method according to any one of claims 1 to 8.
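The threshold-based handling recited in claims 5 to 8 can be illustrated with a short sketch. It is offered only as an aid to reading, not as the claimed implementation: the claims present deletion, display reordering, and the two migrations as separate optional features, whereas this example folds them into one hypothetical policy with the assumed ordering `delete_below < cold_below <= hot_above`. The function names, parameter names, and action labels are all assumptions.

```python
def plan_actions(heats, delete_below, cold_below, hot_above):
    """Map each table name to a hypothetical storage action.

    heats:        dict of table name -> computed heat
    delete_below: stands in for the first preset threshold (claim 5)
    cold_below:   stands in for the third preset threshold (claim 7)
    hot_above:    stands in for the fourth preset threshold (claim 8)
    """
    actions = {}
    for name, heat in heats.items():
        if heat < delete_below:
            actions[name] = "delete"
        elif heat < cold_below:
            # Lower-performance storage for cold tables.
            actions[name] = "migrate_to_first_storage_device"
        elif heat > hot_above:
            # Higher-performance storage for hot tables.
            actions[name] = "migrate_to_second_storage_device"
        else:
            actions[name] = "keep"
    return actions


def display_order(heats):
    """Order table names by descending heat.

    Sorting by heat places every table above any given threshold ahead
    of every table below it, which is one way to satisfy claim 6.
    """
    return sorted(heats, key=heats.get, reverse=True)
```

A full descending sort is stronger than what claim 6 requires; partitioning the tables around the second preset threshold without ordering within each group would also fit the wording.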
PCT/CN2022/071364 2021-04-12 2022-01-11 Data table heat differentiation method and apparatus, and related device WO2022217987A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110389324.9 2021-04-12
CN202110389324.9A CN115203195A (en) 2021-04-12 2021-04-12 Data table heat distinguishing method and device and related equipment

Publications (1)

Publication Number Publication Date
WO2022217987A1 true WO2022217987A1 (en) 2022-10-20

Family

ID=83571486

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/071364 WO2022217987A1 (en) 2021-04-12 2022-01-11 Data table heat differentiation method and apparatus, and related device

Country Status (2)

Country Link
CN (1) CN115203195A (en)
WO (1) WO2022217987A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103186566A (en) * 2011-12-28 2013-07-03 中国移动通信集团河北有限公司 Data classification storage method, device and system
US20150095184A1 (en) * 2013-09-30 2015-04-02 Alliance Data Systems Corporation Recommending a personalized ensemble
CN105447062A (en) * 2014-09-30 2016-03-30 中国电信股份有限公司 Hot spot data identification method and device
CN111339404A (en) * 2020-02-14 2020-06-26 腾讯科技(深圳)有限公司 Content popularity prediction method and device based on artificial intelligence and computer equipment

Also Published As

Publication number Publication date
CN115203195A (en) 2022-10-18

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22787219

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22787219

Country of ref document: EP

Kind code of ref document: A1