WO2022217987A1 - Method and apparatus for distinguishing the heat of a data table, and related device - Google Patents

Method and apparatus for distinguishing the heat of a data table, and related device

Info

Publication number
WO2022217987A1
Authority
WO
WIPO (PCT)
Prior art keywords
data table
data
heat
service node
tables
Prior art date
Application number
PCT/CN2022/071364
Other languages
English (en)
Chinese (zh)
Inventor
季振峰
Original Assignee
华为云计算技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为云计算技术有限公司
Publication of WO2022217987A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Definitions

  • the present application relates to the field of big data, and in particular, to a method, device and related equipment for distinguishing the heat of a data table.
  • the present application provides a method, device and related equipment for distinguishing the heat of a data table, which can improve the accuracy of distinguishing the heat of a data table.
  • a method for distinguishing data table heat includes:
  • the service node obtains a second data table associated with the first data table from a storage node, where the storage node stores a plurality of data tables;
  • the service node acquires the associated heat of the first data table and the second data table, wherein the associated heat is obtained based on the inherent heat of the second data table and the association relationship between the first data table and the second data table, and the inherent heat of the second data table is the heat generated by the second data table being called;
  • the service node determines the heat of the first data table according to the associated heat of the first data table and the second data table.
  • the heat that the second data table, which is associated with the first data table, brings to the first data table (that is, the associated heat of the first data table and the second data table) is taken into account. The calculated heat of the first data table is therefore more accurate, and when the heat of multiple data tables is obtained, the tables can be better distinguished by heat.
  • the service node acquires the second data table associated with the first data table from the storage node, including:
  • the service node obtains, from the storage node, the second data table having a data blood relationship with the first data table, wherein the data blood relationship indicates that the second data table is calculated according to the first data table, or the first data table is calculated according to the second data table;
  • the service node obtains the associated heat of the first data table and the second data table, including:
  • the service node calculates the associated heat of the first data table and the second data table according to the data blood relationship between the first data table and the second data table.
  • the service node obtains the second data table associated with the first data table from the storage node, including:
  • the service node acquires, from the storage node, the second data table having a primary and foreign key association relationship with the first data table, wherein the primary and foreign key association relationship indicates that one or more fields in the first data table are referenced as the primary key of the second data table, or one or more fields in the second data table are referenced as the primary key of the first data table;
  • the service node obtains the associated heat of the first data table and the second data table, including:
  • the service node calculates the associated heat of the first data table and the second data table according to the primary and foreign key association relationship between the first data table and the second data table.
  • the service node determines the heat of the first data table according to the associated heat of the first data table and the second data table, including:
  • the service node determines the heat of the first data table according to the inherent heat of the first data table and the associated heat of the first data table and the second data table, wherein the inherent heat of the first data table is the heat generated by the first data table being called.
  • the method further includes:
  • the service node calculates the heat of the plurality of data tables;
  • the service node deletes, from the storage node according to the calculation result, data tables whose heat is less than a first preset threshold.
  • deleting low-heat data tables from the storage node according to the calculation result saves storage space.
  • the method further includes:
  • the service node calculates the heat of the plurality of data tables;
  • the service node adjusts, according to the calculation result, the position on the display interface of the data tables whose heat is greater than a second preset threshold so that they appear in front of the data tables whose heat is less than the second preset threshold.
  • placing high-heat data tables in front of low-heat data tables on the display interface lets the user view the high-heat data tables conveniently and quickly.
  • the method further includes:
  • the service node calculates the heat of the plurality of data tables;
  • the service node migrates, according to the calculation result, data tables whose heat is less than a third preset threshold to a first storage device, where the storage performance of the first storage device is lower than that of the storage node.
  • migrating low-heat data tables to the first storage device, whose storage performance is lower than that of the storage node, prevents those tables from continuing to occupy the resources of the storage node while still allowing users to find them on the first storage device when they need to view them later.
  • the method further includes:
  • the service node calculates the heat of the plurality of data tables;
  • the service node migrates, according to the calculation result, data tables whose heat is greater than a fourth preset threshold to a second storage device, where the storage performance of the second storage device is higher than that of the storage node.
  • migrating high-heat data tables to the second storage device, whose storage performance is higher than that of the storage node, improves the efficiency of operating on the data in those tables and improves their storage security.
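The four threshold-based follow-up actions described above can be sketched together as follows. This is a minimal illustration, not the applicant's implementation: the source presents the four actions as separate optional implementations, so combining them in one function, the assumed ordering of the first, third and fourth thresholds, and the names `Thresholds` and `plan_actions` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    # Hypothetical container for the four preset thresholds in the text.
    delete_below: float        # first preset threshold
    display_front_above: float  # second preset threshold
    migrate_cold_below: float  # third preset threshold
    migrate_hot_above: float   # fourth preset threshold


def plan_actions(heat_by_table, thresholds):
    """Return (display_order, actions) for a mapping of table name -> heat.

    Assumes delete_below <= migrate_cold_below <= migrate_hot_above so that
    each table falls into exactly one storage action.
    """
    actions = {}
    for name, heat in heat_by_table.items():
        if heat < thresholds.delete_below:
            actions[name] = "delete"
        elif heat < thresholds.migrate_cold_below:
            actions[name] = "migrate_to_lower_performance_storage"
        elif heat > thresholds.migrate_hot_above:
            actions[name] = "migrate_to_higher_performance_storage"
        else:
            actions[name] = "keep"

    # Tables above the second threshold are displayed in front of the others;
    # sorted() is stable, so the original order is kept within each group.
    display_order = sorted(
        heat_by_table,
        key=lambda name: heat_by_table[name] <= thresholds.display_front_above,
    )
    return display_order, actions
```

In practice the thresholds would be tuned per deployment; the sketch only shows how a computed heat value could drive each decision.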
  • a data table heat discrimination device is provided, the device is applied to a service node, and the device includes:
  • an obtaining module configured to obtain a second data table associated with the first data table from a storage node, where the storage node stores a plurality of data tables;
  • a processing module configured to obtain the associated heat of the first data table and the second data table, wherein the associated heat is obtained based on the inherent heat of the second data table and the association relationship between the first data table and the second data table, and the inherent heat of the second data table is the heat generated by the second data table being called;
  • the processing module is configured to determine the heat of the first data table according to the associated heat of the first data table and the second data table.
  • the obtaining module is specifically used for:
  • obtain, from the storage node, the second data table having a data blood relationship with the first data table, wherein the data blood relationship indicates that the second data table is calculated according to the first data table, or the first data table is calculated according to the second data table;
  • the processing module is specifically used for:
  • calculate the associated heat of the first data table and the second data table according to the data blood relationship between the first data table and the second data table.
  • the obtaining module is specifically used for:
  • obtain, from the storage node, the second data table having a primary and foreign key association relationship with the first data table, wherein the primary and foreign key association relationship indicates that one or more fields in the first data table are referenced as the primary key of the second data table, or one or more fields in the second data table are referenced as the primary key of the first data table;
  • the processing module is specifically used for:
  • calculate the associated heat of the first data table and the second data table according to the primary and foreign key association relationship between the first data table and the second data table.
  • the processing module is specifically used for:
  • determine the heat of the first data table according to the inherent heat of the first data table and the associated heat of the first data table and the second data table, wherein the inherent heat of the first data table is the heat generated by the first data table being called.
  • the processing module is further configured to:
  • the processing module is further configured to:
  • the position on the display interface of the data table whose heat is greater than the second preset threshold among the plurality of data tables is adjusted to be in front of the data table whose heat is less than the second preset threshold.
  • the processing module is further configured to:
  • the data tables whose heat is less than the third preset threshold are migrated to the first storage device, and the data tables whose heat is greater than the fourth preset threshold are migrated to the second storage device, wherein the storage performance of the first storage device is lower than that of the storage node, and the storage performance of the second storage device is higher than that of the storage node.
  • a non-transitory computer-readable storage medium stores computer-readable instructions.
  • when the computer-readable instructions are executed, the method described in the first aspect or any specific implementation of the first aspect is performed.
  • a computer program product including a computer program; when the computer program is read and executed by a cluster of computing devices, the cluster is caused to perform the method described in the first aspect or any specific implementation of the first aspect.
  • a computing device cluster including at least one computing device, each computing device including a processor and a memory; the processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device, so that the computing device cluster performs the method described in the first aspect or any specific implementation of the first aspect.
  • the computing device cluster includes one computing device, and the computing device includes a processor and a memory; the processor is configured to execute instructions stored in the memory, so that the computing device performs the method provided by the first aspect or any possible implementation of the first aspect.
  • the computing device cluster includes at least two computing devices, and each computing device includes a processor and a memory; the processors of the at least two computing devices are configured to execute instructions stored in the memories of the at least two computing devices.
  • FIG. 1 is a schematic diagram of an application scenario involved in an embodiment of the present application
  • FIG. 2 is a schematic diagram of a data blood relationship involved in an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a primary-foreign key association relationship involved in an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a method for distinguishing the heat of a data table provided by an embodiment of the present application
  • FIG. 5 is a schematic diagram of a data blood relationship of a first data table provided by an embodiment of the present application.
  • FIG. 6 is a schematic flowchart of another data table heat discrimination method provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a primary and foreign key association relationship of a first data table provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a data processing system provided by an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a computing device cluster provided by an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a computing device provided by an embodiment of the present application.
  • The terms “first” and “second” in the embodiments of the present application are used only for description and cannot be understood as indicating or implying relative importance or the number of the indicated technical features. Thus, a feature defined as “first” or “second” may expressly or implicitly include one or more of that feature.
  • “at least one” refers to one or more, and “multiple” refers to two or more.
  • “And/or” describes the association relationship of associated objects and indicates that three relationships are possible; for example, “A and/or B” can indicate that A exists alone, that A and B exist at the same time, or that B exists alone, where A and B can be singular or plural.
  • the character “/” generally indicates an “or” relationship between the associated objects.
  • “At least one of the following” or similar expressions refer to any combination of these items, including any combination of single items or plural items.
  • For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may each be single or multiple.
  • Transaction data: also known as transactional data, business data, etc.
  • transaction data describes the internal or external events or transaction records in the business operation process of an organization, such as sales orders and call records.
  • Data heat (data popularity): a value that reflects how much attention the data receives and indicates how likely the data is to be accessed within a period of time starting from the current time. A high value means that the data receives a lot of attention and is very likely to be accessed in that period; a low value means that the data receives little attention and is very unlikely to be accessed in that period.
  • Data table heat (data table popularity): a value that reflects how much attention the data table receives and indicates how likely the data table is to be accessed within a period of time starting from the current time. A high value means that the data table receives a lot of attention and is very likely to be accessed in that period; a low value means that the data table receives little attention and is very unlikely to be accessed in that period.
  • Inherent heat of a data table: the heat generated by the data table itself being called. It can be determined according to the number of times the data table is called (also called the number of uses or the number of visits); usually, the inherent heat of the data table is equal to the number of times the data table is called. The number of times the data table is called includes the number of times data is queried (select), added (insert), deleted (delete), and modified (update) in the data table, as well as the number of other data operations performed in the data table.
  • the method based on data creation time is mainly used to distinguish the heat of transaction data tables (that is, tables that mainly include transaction data). Specifically, assume that the storage node stores transaction data table A and transaction data table B, that the data in transaction data table A was created within the last year, and that the data in transaction data table B was created more than one year ago. After obtaining transaction data table A and transaction data table B from the storage node, the service node obtains the creation time of the data in each table and compares them. When it determines that most or all of the data in transaction data table A was created later than the data in transaction data table B, it determines that the heat of transaction data table A is greater than the heat of transaction data table B; otherwise, it determines that the heat of transaction data table A is less than the heat of transaction data table B.
  • the service node then distinguishes the heat of the two transaction data tables according to the above method based on the data creation time, and the obtained heat distinction result is obviously inaccurate and inconsistent with the actual application scenario.
  • the method based on the inherent heat of the data table is mainly used to distinguish the heat of web page data tables (that is, tables that mainly include web page data, such as articles, pictures, and videos published on web pages). Specifically, assume that the storage node stores web page data table A and web page data table B. After obtaining web page data table A and web page data table B from the storage node, the service node obtains the inherent heat of each table and compares them. When the inherent heat of web page data table A is greater than that of web page data table B, it determines that the heat of web page data table A is greater than that of web page data table B; otherwise, it determines that the heat of web page data table A is lower than that of web page data table B.
  • the service node distinguishes the popularity of the two web page data tables according to the above method based on the inherent popularity of the data table, and the obtained popularity distinction result is obviously inaccurate and inconsistent with the actual application scenario.
  • the embodiments of the present application provide a method, device, and related equipment for distinguishing the heat of a data table, which can improve the accuracy of distinguishing the heat of a data table and are more in line with practical application scenarios.
  • Data blood relationship: also known as data lineage or data origin relationship, refers to the relationship formed between data tables during the generation, fusion, transformation, circulation, and demise of data tables.
  • an intermediate table including intermediate data (i.e., some or all of the calculated original data)
  • data table 3 including final data is formed.
  • the data link from data table 1 to data table 2 to data table 3 indicates the data blood relationship of these three tables.
  • data table 1 and data table 2 have a direct blood relationship
  • data table 2 and data table 3 have a direct blood relationship
  • data table 1 and data table 3 have an indirect blood relationship.
  • data table 2 directly depends on data table 1
  • data table 3 directly depends on data table 2 and indirectly depends on data table 1. It can be understood that if the data in data table 1 that is used to calculate data table 2 and data table 3 is accessed, data table 2 and data table 3 are indirectly accessed; that is, data table 1 raises the heat of data table 2 and data table 3 to a certain extent. If the data in data table 2 that comes from data table 1 is accessed, data table 1 and data table 3 are indirectly accessed; that is, data table 2 raises the heat of data table 1 and data table 3 to a certain extent. If the data in data table 3 that comes from data table 2 is accessed, data table 1 and data table 2 are indirectly accessed; that is, data table 3 raises the heat of data table 1 and data table 2 to a certain extent.
  • when the heat increase that each data table receives from the other data tables having a data blood relationship with it (including direct and indirect blood relationships) is taken into account, the determined heat of each data table is more accurate and better reflects the importance of each data table.
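The distinction between direct and indirect blood relationships can be illustrated with a small graph traversal. This is a sketch under stated assumptions: lineage is assumed to be available as (source table, derived table) pairs, and the helper names `_hops` and `related_tables` are hypothetical, not part of the application.

```python
from collections import deque


def _hops(graph, start):
    """Breadth-first search returning {table: number of lineage steps from start}."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[start]
    return distance


def related_tables(edges, table):
    """Classify every table that `table` is derived from, or that is derived
    from `table`, as a "direct" (one step) or "indirect" (several steps) relation.

    `edges` is an iterable of (source_table, derived_table) pairs.
    """
    downstream, upstream = {}, {}
    for source, derived in edges:
        downstream.setdefault(source, []).append(derived)
        upstream.setdefault(derived, []).append(source)

    hops = _hops(downstream, table)
    for name, steps in _hops(upstream, table).items():
        hops[name] = min(steps, hops.get(name, steps))
    return {name: "direct" if steps == 1 else "indirect" for name, steps in hops.items()}


# The chain from the text: data table 1 -> data table 2 -> data table 3.
lineage = [("data_table_1", "data_table_2"), ("data_table_2", "data_table_3")]
relations = related_tables(lineage, "data_table_1")
```

Upstream and downstream tables are traversed separately so that two tables that merely share a common source are not treated as related; whether such sibling tables should count is a design choice the text does not settle.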
  • the primary and foreign key association relationship (primary key-foreign key relationship) defines a relationship between two tables in a relational database. As shown in FIG. 3, one or more fields A1 in data table 1 are referenced as the primary key of data table 2'; field A1 in data table 1 is then said to be a foreign key pointing to data table 2', and data table 1 and data table 2' have a primary and foreign key association relationship.
  • the primary key of data table 2' is also referenced as the primary key of data table 3'.
  • data table 1 and data table 3' also have a primary and foreign key association relationship.
  • the primary and foreign key association between data table 1 and data table 2' and the primary and foreign key association between data table 2' and data table 3' are called direct primary and foreign key associations, and the primary and foreign key association between data table 1 and data table 3' is an indirect primary and foreign key association.
  • table 3' has an improving effect; if the primary key of data table 3' is accessed, data table 1 and data table 2' are indirectly accessed, that is, data table 3' raises the heat of data table 1 and data table 2' to a certain extent.
  • each data table has its primary and foreign key associations (including direct primary and foreign key associations and indirect primary and foreign key associations)
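As an illustration of how primary and foreign key associations could be discovered, the sketch below reads foreign key declarations from a database catalog. SQLite is used here only because it ships with Python; the application itself does not name a database, and the table names `t1`, `t2`, `t3` and the function `foreign_key_edges` are hypothetical.

```python
import sqlite3


def foreign_key_edges(conn):
    """Return a set of (referencing_table, referenced_table) pairs.

    Uses SQLite's `PRAGMA foreign_key_list`, whose result rows carry the
    referenced table name in the third column.
    """
    tables = [
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    ]
    edges = set()
    for table in tables:
        for row in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            edges.add((table, row[2]))
    return edges


# Toy schema mirroring the chain in the text: t1 -> t2 -> t3.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t3 (id INTEGER PRIMARY KEY);
    CREATE TABLE t2 (id INTEGER PRIMARY KEY REFERENCES t3(id));
    CREATE TABLE t1 (a1 INTEGER REFERENCES t2(id));
    """
)
edges = foreign_key_edges(conn)
```

Each returned pair is a direct association; indirect associations (such as `t1` to `t3`) follow by chaining pairs that share a table.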
  • Associated heat: the heat that an associated data table brings to the data table it is associated with, for example the heat that data table 1 gains from data table 2 and/or data table 3, which have a data blood relationship with it, or the heat that data table 1 gains from data table 2' and/or data table 3', which have a primary and foreign key association relationship with it.
  • the process includes but is not limited to the following steps:
  • A1. The service node obtains the log information of the data operations on data table 1 from the storage node, and obtains the data operation information of data table 1 from that log information.
  • the log information of the data operations on data table 1 is the log that the storage node automatically records when a user performs data operations on data table 1. It includes information about those operations, such as the type of each data operation (deleting data, adding data, etc.) and the time at which it was performed. The data operation information of data table 1 can therefore be obtained from this log information.
  • the service node can obtain the log information of data table 1 within a preset time period from the storage node, and then obtain the data operation information of data table 1 within that period from the log information. For example, the service node can obtain the log information of data table 1 for 2020, and then obtain the data operation information of data table 1 for 2020 from it.
  • A2. The service node determines the number of times data table 1 is called according to the data operation information of data table 1.
  • the number of times data is queried, added, deleted, and modified in data table 1, and so on, is obtained from the data operation information, and summing these counts gives the number of times data table 1 is called.
  • A3. Determine the inherent heat of data table 1 according to the number of times data table 1 is called.
  • the inherent heat of data table 1 = the number of times data table 1 is called.
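Steps A1 to A3 can be sketched as a simple count over operation log records. The record layout (dicts with the keys `table`, `op`, and `time`) and the function name `inherent_heat` are assumptions for illustration; the application does not specify a log format.

```python
from collections import Counter
from datetime import datetime


def inherent_heat(log_records, table, start, end):
    """Return (inherent heat, per-operation counts) for `table` in [start, end).

    Each record is assumed to be a dict with the hypothetical keys
    "table", "op" (e.g. "select", "insert", "delete", "update") and
    "time" (a datetime).
    """
    per_operation = Counter(
        record["op"]
        for record in log_records
        if record["table"] == table and start <= record["time"] < end
    )
    # Inherent heat = number of calls = sum of the counts of every operation type.
    return sum(per_operation.values()), per_operation


records = [
    {"table": "data_table_1", "op": "select", "time": datetime(2020, 3, 1)},
    {"table": "data_table_1", "op": "select", "time": datetime(2020, 6, 9)},
    {"table": "data_table_1", "op": "update", "time": datetime(2020, 7, 2)},
    {"table": "data_table_1", "op": "insert", "time": datetime(2021, 1, 5)},  # outside 2020
    {"table": "data_table_2", "op": "delete", "time": datetime(2020, 4, 4)},  # other table
]
heat, breakdown = inherent_heat(
    records, "data_table_1", datetime(2020, 1, 1), datetime(2021, 1, 1)
)
```

The time window corresponds to the preset time period mentioned in the text (here, the year 2020).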
  • the service node can obtain, from the storage node, the second data table associated with the first data table, and then obtain the associated heat of the first data table and the second data table according to the association relationship between the first data table and the second data table and the inherent heat of the second data table. After the associated heat is obtained, the heat of the first data table is determined according to the associated heat, wherein the association relationship between the first data table and the second data table includes one or more of a data blood relationship and a primary and foreign key association relationship.
  • a method for distinguishing the heat of a data table provided by the embodiment of the present application is described in more detail below with reference to FIG. 4 .
  • the method for distinguishing the heat of a data table provided by the embodiment of the present application includes but is not limited to the following steps:
  • the service node acquires a first data table and a second data table having a data blood relationship with the first data table from a storage node.
  • the storage node stores multiple data tables, and the first data table may be any one or more data tables among the multiple data tables stored by the storage node.
  • the multiple data tables stored by the storage node can be of various types, such as transaction data tables and web page data tables, and can belong to any database, which is not specifically limited here.
  • the data blood relationship between the first data table and the second data table means that the second data table is calculated according to the first data table, and/or the first data table is calculated according to the second data table.
  • the data blood relationship between the first data table and the second data table may be a direct blood relationship or an indirect blood relationship, which is not specifically limited here.
  • the service node can obtain, from the storage node, the second data table that has a data blood relationship with the first data table through a data warehouse tool (such as Hive) or an SQL statement. Hive is a Hadoop-based data warehouse tool for data extraction, transformation, and loading; it provides a mechanism for storing, querying, and analyzing large-scale data stored in Hadoop.
  • the service node obtains the second data table having a data blood relationship with the first data table from the storage node through the data warehouse tool or the SQL statement, which is only an example and should not be regarded as a specific limitation.
  • the service node can also obtain the second data table that has a data blood relationship with the first data table in other ways. For example, a person reads the code to find the second data table that has a data blood relationship with the first data table, the service node receives the manually entered name of that table, and the service node then acquires the second data table according to the entered name.
  • the service node acquires the inherent heat $H_0$ of the first data table.
  • the inherent heat $H_0$ of the first data table is the heat generated by the first data table itself being called.
  • the service node calculates the associated heat $H_1$ of the first data table and the second data table according to the data blood relationship between the first data table and the second data table and the inherent heat of the second data table.
  • the inherent heat of the second data table is the heat generated by the second data table itself being called.
  • the service node can determine the blood relationship weight corresponding to the second data table according to the data blood relationship between the first data table and the second data table, and Calculate the inherent heat of the second data table, and then calculate the associated heat H 1 of the first data table and the second data table according to the blood relationship weight corresponding to the second data table and the inherent heat of the second data table.
  • H 1 W A *H 0,A +W B *H 0,B
  • W A and W B are both numbers greater than 0 and less than 1.
  • the second data table A has a direct blood relationship with the first data table
  • the second data table B has an indirect blood relationship with the first data table
  • the first data table A has an indirect blood relationship with the first data table.
  • the relationship between the second data table A and the first data table is closer, preferably, W A is greater than W B .
  • the service node determines the heat H of the first data table according to the inherent heat H 0 of the first data table and the associated heat H 1 of the first data table and the second data table.
  • $H = H_0 + H_1$
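The calculation in the steps above can be sketched as follows. This is an illustrative sketch only: the function names and the numeric values are assumptions, while the formulas (a weighted sum of inherent heats for H 1, then H 0 plus H 1 for H) and the constraint that each weight lies strictly between 0 and 1 come from the text.

```python
def associated_heat(related):
    """Weighted sum of the inherent heats of the second data tables.

    `related` is a list of (weight, inherent_heat) pairs; the text
    requires every weight to be greater than 0 and less than 1.
    """
    for weight, _ in related:
        if not 0 < weight < 1:
            raise ValueError("each weight must be strictly between 0 and 1")
    return sum(weight * heat for weight, heat in related)


def total_heat(h0, related):
    """Heat H of the first data table: inherent heat plus associated heat."""
    return h0 + associated_heat(related)


# Assumed example values: table A is a direct relation, table B an indirect one.
W_A, W_B = 0.5, 0.25
H0_A, H0_B = 40, 80
H0 = 10

H1 = associated_heat([(W_A, H0_A), (W_B, H0_B)])
H = total_heat(H0, [(W_A, H0_A), (W_B, H0_B)])
print(H1, H)  # 40.0 50.0
```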
  • FIG. 6 is a schematic flowchart of another method for distinguishing the heat of a data table provided by an embodiment of the present application. As shown in FIG. 6, the method for distinguishing the heat of a data table provided by an embodiment of the present application includes but is not limited to the following steps:
  • the service node obtains a first data table and a second data table having a primary and foreign key association relationship with the first data table from a storage node.
  • the service node may obtain the second data table having a primary and foreign key association relationship with the first data table from the storage node through a data warehouse tool or a SQL statement.
  • Obtaining the second data table that has a primary and foreign key association relationship with the first data table from the storage node through a data warehouse tool or an SQL statement is only an example.
  • The service node can also obtain this second data table in other ways. For example, a person reads the code to find the second data table that has a primary and foreign key association relationship with the first data table, the service node receives the manually input name of that second data table, and then obtains the second data table according to that name.
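The source does not name a database or a data warehouse tool, so the following is only a sketch of the SQL-based route under the assumption that the tables live in SQLite. It uses the standard `sqlite3` module and SQLite's `PRAGMA foreign_key_list` to find tables linked to a given table by a foreign key in either direction; the function name and the schema are hypothetical.

```python
import sqlite3


def fk_related_tables(conn, table):
    """Return the tables directly linked to `table` by a foreign key."""
    related = set()
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for name in names:
        # Each row describes one foreign-key column of `name`;
        # column index 2 holds the referenced (parent) table.
        for fk in conn.execute(f'PRAGMA foreign_key_list("{name}")'):
            parent = fk[2]
            if name == table and parent != table:
                related.add(parent)   # `table` references `parent`
            elif parent == table and name != table:
                related.add(name)     # `name` references `table`
    return related


# Hypothetical schema used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY);
CREATE TABLE orders (id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customers(id));
CREATE TABLE items (id INTEGER PRIMARY KEY,
                    order_id INTEGER REFERENCES orders(id));
""")
print(sorted(fk_related_tables(conn, "orders")))  # ['customers', 'items']
```

Other database systems expose the same information through their own catalogs, so the query itself would differ.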
  • the service node acquires the inherent heat H 0 of the first data table.
  • the service node calculates the associated heat H 1 of the first data table and the second data table according to the primary and foreign key association relationship between the first data table and the second data table and the inherent heat of the second data table.
  • The service node can determine the association weight corresponding to the second data table according to the primary and foreign key association relationship between the first data table and the second data table, calculate the inherent heat of the second data table, and then calculate the associated heat H 1 of the first data table and the second data table according to that association weight and the inherent heat of the second data table.
  • $H_1 = W_C \cdot H_{0,C} + W_D \cdot H_{0,D}$
  • both W C and W D are numbers greater than 0 and less than 1.
  • Here the second data table C has a direct primary and foreign key association relationship with the first data table, and the second data table D has an indirect primary and foreign key association relationship with the first data table. Because the relationship between the second data table C and the first data table is closer, W C is preferably greater than W D .
  • the service node determines the heat H of the first data table according to the inherent heat H 0 of the first data table and the associated heat H 1 of the first data table and the second data table.
  • $H = H_0 + H_1$
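The text only requires that each weight lie between 0 and 1 and that a direct relation preferably weigh more than an indirect one; it does not say how weights are chosen. One possible scheme, shown purely as an assumption, is to decay the weight with the hop distance found by a breadth-first search over the association graph. The function name, the `decay` parameter and the graph below are hypothetical.

```python
from collections import deque


def association_weights(graph, first_table, decay=0.5):
    """Assign each reachable table the weight decay ** hop_distance.

    `graph` maps a table name to the tables it is directly associated
    with. With 0 < decay < 1, direct relations (1 hop) always receive a
    larger weight than indirect ones (2 or more hops).
    """
    distance = {first_table: 0}
    queue = deque([first_table])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return {table: decay ** hops
            for table, hops in distance.items() if table != first_table}


# C is directly associated with the first table, D only through C.
graph = {"first": ["C"], "C": ["first", "D"], "D": ["C"]}
weights = association_weights(graph, "first")
print(weights)  # {'C': 0.5, 'D': 0.25}
```

The same traversal would work for data blood relationships if the graph edges are taken from lineage instead of foreign keys.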
  • When the service node obtains, from the storage node, the second data table that has an association relationship with the first data table, it may obtain both a second data table that has a data blood relationship with the first data table and a second data table that has a primary and foreign key association relationship with the first data table. In that case, the associated heat H 1 of the first data table and the second data table calculated by the service node includes both the heat brought by the second data table that has a data blood relationship with the first data table and the heat brought by the second data table that has a primary and foreign key association relationship with the first data table.
  • the first data table has both the data blood relationship shown in FIG. 5 and the primary and foreign key association shown in FIG.
  • the associated heat H 1 of the first data table and the second data table is:
  • $H_1 = W_A \cdot H_{0,A} + W_B \cdot H_{0,B} + W_C \cdot H_{0,C} + W_D \cdot H_{0,D}$
  • the association heat H 1 includes not only the heat brought by the second data table that has a data blood relationship with the first data table, but also the heat brought by the second data table that has a primary and foreign key association relationship with the first data table.
  • The heat H of the first data table calculated by the service node therefore includes not only the heat brought by the second data table that has a data blood relationship with the first data table, but also the heat brought by the second data table that has a primary and foreign key association relationship with the first data table.
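As a worked illustration of the combined case, assume (these numbers are not from the source) an inherent heat of 10 for the first data table, weights of 0.5 for the direct relations A and C and 0.25 for the indirect relations B and D, and inherent heats of 40, 80, 20 and 40 for tables A, B, C and D respectively:

```latex
\begin{align*}
H_1 &= W_A H_{0,A} + W_B H_{0,B} + W_C H_{0,C} + W_D H_{0,D} \\
    &= 0.5 \cdot 40 + 0.25 \cdot 80 + 0.5 \cdot 20 + 0.25 \cdot 40 \\
    &= 20 + 20 + 10 + 10 = 60, \\
H   &= H_0 + H_1 = 10 + 60 = 70.
\end{align*}
```

Under these assumed values the first data table's own calls account for only 10 of its 70 units of heat, which is the effect the method is meant to capture.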
  • The service node can obtain the heat of multiple data tables according to the data table heat distinguishing method provided above. With these heat values, the service node can distinguish which data tables are more popular and which are less popular, and manage the multiple data tables accordingly.
  • The service node may delete data tables whose heat is less than the first preset threshold from the storage node according to the heat of the multiple data tables, so as to save storage space.
  • The service node may also, according to the heat of the multiple data tables, adjust the position on the display interface of the data tables whose heat is greater than the second preset threshold so that they appear in front of the data tables whose heat is less than the second preset threshold. In other words, data tables whose heat is greater than the second preset threshold are moved to a position that is more convenient for users to view, so that users can quickly view popular data tables.
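The deletion and display-ordering steps above can be sketched as two small helpers. The function names, table names and threshold values are hypothetical, and placing tables whose heat exactly equals the second threshold with the less popular group is an assumption, since the text does not cover that case.

```python
def tables_to_delete(heat, first_threshold):
    """Names of tables whose heat is below the first preset threshold."""
    return sorted(name for name, h in heat.items() if h < first_threshold)


def display_order(heat, second_threshold):
    """Put tables hotter than the second preset threshold first.

    Relative order inside each group follows the input mapping.
    """
    hot = [name for name, h in heat.items() if h > second_threshold]
    rest = [name for name, h in heat.items() if h <= second_threshold]
    return hot + rest


# Made-up heat values for illustration.
heat = {"logs": 2, "orders": 90, "tmp": 0, "users": 60}
print(tables_to_delete(heat, first_threshold=1))  # ['tmp']
print(display_order(heat, second_threshold=50))   # ['orders', 'users', 'logs', 'tmp']
```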
  • The service node may further migrate data tables whose heat is less than the third preset threshold to the first storage device, and migrate data tables whose heat is greater than the fourth preset threshold to the second storage device, where the storage performance of the first storage device is lower than that of the storage node and the storage performance of the second storage device is higher than that of the storage node.
  • the sizes of the first preset threshold, the second preset threshold, the third preset threshold, and the fourth preset threshold can be set according to actual conditions, and are not specifically limited here.
  • The service node migrates data tables with low heat to the first storage device, whose storage performance is lower than that of the storage node. This prevents those data tables from continuing to occupy the resources of the storage node, while users who later need to view them can still find them on the first storage device. The service node migrates hot data tables to the second storage device, whose storage performance is higher than that of the storage node. This improves the efficiency of operating on data in the hot data tables and improves the storage security of the hot data tables.
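The migration rule described above can be sketched as a planning function that decides where each table should live. The function name, the location labels and the requirement that the third threshold not exceed the fourth are assumptions of this sketch; the source leaves the threshold values open.

```python
def plan_migration(heat, third_threshold, fourth_threshold):
    """Map each table name to the location it should be stored in."""
    if third_threshold > fourth_threshold:
        # Assumed sanity check: otherwise a table could qualify as both cold and hot.
        raise ValueError("third threshold must not exceed fourth threshold")
    plan = {}
    for name, h in heat.items():
        if h < third_threshold:
            plan[name] = "first_storage_device"   # lower-performance storage
        elif h > fourth_threshold:
            plan[name] = "second_storage_device"  # higher-performance storage
        else:
            plan[name] = "storage_node"           # stays where it is
    return plan


# Made-up heat values for illustration.
plan = plan_migration({"archive": 1, "orders": 50, "sessions": 99},
                      third_threshold=10, fourth_threshold=80)
print(plan["archive"])   # first_storage_device
print(plan["orders"])    # storage_node
print(plan["sessions"])  # second_storage_device
```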
  • When determining the heat H of the first data table, the method for distinguishing the heat of data tables introduces the heat that the second data table associated with the first data table brings to the first data table, that is, the associated heat H 1 between the first data table and the second data table. This makes the calculated heat H of the first data table more accurate and more in line with the actual application scenario, so that the heat of multiple data tables can be better distinguished.
  • The method for distinguishing the heat of a data table according to an embodiment of the present application is described in detail above. Based on the same inventive concept, the apparatus for distinguishing the heat of a data table according to an embodiment of the present application is described below.
  • FIG. 8 is a schematic structural diagram of a data processing system 10 provided by an embodiment of the present application.
  • the data processing system 10 includes a data table heat distinguishing device 1100 provided by an embodiment of the present application.
  • The data table heat distinguishing device 1100 includes an acquisition module 1101 and a processing module 1102. The data table heat distinguishing device 1100 can be integrated into the service node 110 in the data processing system 10. In addition to the service node 110, the data processing system 10 can include a storage node 120, a first storage device 130 and a second storage device 140, wherein:
  • the storage node 120 stores a plurality of data tables
  • an obtaining module 1101, configured to obtain a second data table associated with the first data table from the storage node 120;
  • the processing module 1102 is configured to obtain the associated heat H 1 of the first data table and the second data table, wherein the associated heat H 1 is obtained based on the inherent heat of the second data table and the association relationship between the first data table and the second data table, and the inherent heat of the second data table is the heat generated by the second data table being called;
  • the processing module 1102 is further configured to determine the heat H of the first data table according to the associated heat H 1 of the first data table and the second data table.
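The division of work between the two modules can be sketched with two small classes. This is not the patented implementation: the class names, method names and in-memory dictionaries are hypothetical stand-ins for the acquisition module 1101, the processing module 1102 and the storage node 120.

```python
class AcquisitionModule:
    """Stand-in for module 1101: looks up the associated second data tables."""

    def __init__(self, associations):
        # Hypothetical layout: {first_table: {second_table: weight}}
        self.associations = associations

    def get_second_tables(self, first_table):
        return self.associations.get(first_table, {})


class ProcessingModule:
    """Stand-in for module 1102: computes associated heat and total heat."""

    def __init__(self, inherent_heat):
        self.inherent_heat = inherent_heat  # {table: inherent heat}

    def associated_heat(self, second_tables):
        return sum(weight * self.inherent_heat[table]
                   for table, weight in second_tables.items())

    def heat(self, first_table, second_tables):
        return self.inherent_heat[first_table] + self.associated_heat(second_tables)


# Assumed example data.
acquisition = AcquisitionModule({"T": {"A": 0.5, "B": 0.25}})
processing = ProcessingModule({"T": 10, "A": 40, "B": 80})
second = acquisition.get_second_tables("T")
print(processing.heat("T", second))  # 50.0
```

Keeping acquisition separate from processing mirrors the text's later point that the two modules may be deployed on different computing devices.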
  • the obtaining module 1101 is specifically used for:
  • the processing module 1102 is specifically used for:
  • the correlation heat H 1 of the first data table and the second data table is calculated.
  • the obtaining module 1101 is specifically used for:
  • the processing module 1102 is specifically used for:
  • the association heat H 1 of the first data table and the second data table is calculated.
  • processing module 1102 is specifically configured to:
  • the heat H of the first data table is determined according to the inherent heat H 0 of the first data table and the associated heat H 1 of the first data table and the second data table, wherein the inherent heat H 0 of the first data table is the heat generated by the first data table being called.
  • processing module 1102 is further configured to:
  • the data table whose heat is less than the first preset threshold is deleted from the storage node 120 .
  • processing module 1102 is further configured to:
  • the position on the display interface of the data table whose heat is greater than the second preset threshold among the plurality of data tables is adjusted to be in front of the data table whose heat is less than the second preset threshold.
  • processing module 1102 is further configured to:
  • the data tables whose heat is less than the third preset threshold are migrated to the first storage device 130, and the data tables whose heat is greater than the fourth preset threshold are migrated to the second storage device 140, wherein the performance of the first storage device 130 is lower than that of the storage node 120, and the performance of the second storage device 140 is higher than that of the storage node 120.
  • the sizes of the first preset threshold, the second preset threshold, the third preset threshold, and the fourth preset threshold can be set according to actual conditions, and are not specifically limited here.
  • The data processing system 10 and the apparatus 1100 for distinguishing the heat of a data table are only an example provided by the embodiments of the present application. They may have more or fewer components than those shown in FIG. 8, may combine two or more components, or may be implemented with different configurations of components.
  • The embodiment of the present application further provides a computing device cluster 20. The computing device cluster 20 can be used to deploy the data processing system 10 shown in FIG. 8, and specifically can be used to deploy the data table heat distinguishing apparatus 1100 in the data processing system 10 shown in FIG. 8, so as to execute the data table heat distinguishing method provided by the embodiment of the present application.
  • the computing device cluster 20 includes at least one computing device 200 .
  • When the computing device cluster 20 includes only one computing device 200, all the modules in the data processing system 10 shown in FIG. 8 may be deployed in that one computing device 200: the service node 110, the storage node 120, the first storage device 130 and the second storage device 140.
  • When the computing device cluster 20 includes multiple computing devices 200, each of the multiple computing devices 200 may be used to deploy some of the modules in the data processing system 10 shown in FIG. 8, or two or more of the computing devices 200 may be jointly used to deploy one or more modules in the data processing system 10 shown in FIG. 8.
  • For example, the computing device 200A can be used to deploy the service node 110 and the storage node 120, and the computing device 200B can be used to deploy the first storage device 130 and the second storage device 140. Alternatively, the computing device 200A and the computing device 200B are jointly used to deploy the service node 110: for example, the obtaining module 1101 in the data table heat distinguishing device 1100 is deployed on the computing device 200A and the processing module 1102 in the data table heat distinguishing device 1100 is deployed on the computing device 200B, while the computing device 200A is also used to deploy the storage node 120 and the computing device 200B is also used to deploy the first storage device 130 and the second storage device 140.
  • As another example, assuming that the multiple computing devices 200 include computing devices 200A, 200B, 200C and 200D, the computing device 200A can be used to deploy the service node 110, the computing device 200B can be used to deploy the storage node 120, the computing device 200C can be used to deploy the first storage device 130, and the computing device 200D can be used to deploy the second storage device 140.
  • At least one computing device 200 included in the computing device cluster 20 may be all terminal devices, or all cloud servers, or some cloud servers and some terminal devices, which are not specifically limited here.
  • Each computing device 200 in the computing device cluster 20 may include a processor, a memory, a communication interface, and the like. The memory in one or more computing devices 200 in the computing device cluster 20 may store the same code (which may also be referred to as instructions or program instructions) for executing the data table heat distinguishing method provided by the embodiments of the present application. The processor can read the code from the memory and execute it to implement that method.
  • the communication interface can be used to realize the communication between each computing device 200 and other devices.
  • each computing device 200 in the computing device cluster 20 may also communicate with other devices through a network connection.
  • the network may be a wide area network or a local area network, or the like.
  • the computing device 200 in which the apparatus 1100 for distinguishing the data table heat is deployed includes: a processor 210 , a memory 220 and a communication interface 230 , wherein the processor 210 , the memory 220 and the communication interface 230 can be connected to each other through a bus 240 .
  • the processor 210 may read the code stored in the memory 220, and cooperate with the communication interface 230 to execute some or all of the steps of the data table heat discrimination method performed by the data table heat discrimination apparatus 1100 in the above embodiments of the present application.
  • The processor 210 may have various specific implementation forms. For example, the processor 210 may be a central processing unit (CPU) or a graphics processing unit (GPU), and the processor 210 may be a single-core processor or a multi-core processor.
  • the processor 210 may be a combination of a CPU and a hardware chip.
  • the above-mentioned hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD) or a combination thereof.
  • The above-mentioned PLD can be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL) or any combination thereof.
  • The processor 210 may also be independently implemented by a logic device with built-in processing logic, such as an FPGA or a digital signal processor (DSP).
  • the memory 220 may store codes as well as data.
  • the code includes: the code of the acquisition module 1101 and the code of the processing module 1102, etc.
  • the data includes: the inherent heat H 0 of the first data table, the inherent heat of the second data table, and the association between the first data table and the second data table Heat H 1 and so on.
  • The memory 220 may be a non-volatile memory, such as a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory. The memory 220 may also be a volatile memory, such as a random access memory (RAM), which is used as an external cache.
  • The communication interface 230 may be a wired interface (e.g., an Ethernet interface) or a wireless interface (e.g., a cellular network interface or a wireless local area network interface) for communicating with other computing nodes or devices.
  • The communication interface 230 may use a protocol family on top of the transmission control protocol/internet protocol (TCP/IP), for example, the remote function call (RFC) protocol, the simple object access protocol (SOAP), the simple network management protocol (SNMP), the common object request broker architecture (CORBA) protocol, distributed protocols, and so on.
  • The bus 240 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like.
  • the bus 240 can be divided into an address bus, a data bus, a control bus, and the like. For ease of presentation, only one thick line is used in FIG. 10, but it does not mean that there is only one bus or one type of bus.
  • The above computing device 200 is configured to execute the method in the above embodiment of the method for distinguishing the heat of a data table and belongs to the same concept as that method embodiment. For the specific implementation process, refer to the above method embodiment, which is not repeated here.
  • The computing device 200 is only an example provided by the embodiments of the present application. The computing device 200 may have more or fewer components than those shown in FIG. 10, may combine two or more components, or may have a different configuration of components.
  • Embodiments of the present application also provide a non-transitory computer-readable storage medium in which code is stored. When the code runs on a processor, some or all of the steps of the data table heat distinguishing method described in the foregoing embodiments can be implemented.
  • The above embodiments may be implemented in whole or in part by software, hardware, or any combination thereof. When implemented in software, they can be implemented in whole or in part in the form of a computer program product. The computer program product may contain code.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line) or wirelessly (e.g., infrared, radio, microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that includes an integration of one or more available media.
  • The usable media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media, semiconductor media, and the like.
  • the steps in the method of the embodiment of the present application may be sequentially adjusted, combined or deleted according to actual needs; the units in the device of the embodiment of the present application may be divided, combined or deleted according to actual needs.

Abstract

A method and apparatus for distinguishing the heat of data tables, and a related device. The method comprises the following steps: a service node obtains, from a storage node, a second data table associated with a first data table, then obtains the associated heat between the first data table and the second data table according to the second data table, and, after obtaining the associated heat between the first data table and the second data table, determines the heat of the first data table according to that associated heat, the associated heat between the first data table and the second data table being obtained according to the inherent heat of the second data table and the association between the first data table and the second data table. The method can improve the accuracy of distinguishing the heat of data tables.
PCT/CN2022/071364 2021-04-12 2022-01-11 Method and apparatus for distinguishing data table heat, and related device WO2022217987A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110389324.9 2021-04-12
CN202110389324.9A CN115203195A (zh) 2021-04-12 2021-04-12 Method and apparatus for distinguishing data table heat, and related device

Publications (1)

Publication Number Publication Date
WO2022217987A1 true WO2022217987A1 (fr) 2022-10-20

Family

ID=83571486

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/071364 WO2022217987A1 (fr) 2021-04-12 2022-01-11 Method and apparatus for distinguishing data table heat, and related device

Country Status (2)

Country Link
CN (1) CN115203195A (fr)
WO (1) WO2022217987A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103186566A (zh) * 2011-12-28 2013-07-03 中国移动通信集团河北有限公司 Hierarchical data storage method, apparatus and system
US20150095184A1 (en) * 2013-09-30 2015-04-02 Alliance Data Systems Corporation Recommending a personalized ensemble
CN105447062A (zh) * 2014-09-30 2016-03-30 中国电信股份有限公司 Hotspot data identification method and apparatus
CN111339404A (zh) * 2020-02-14 2020-06-26 腾讯科技(深圳)有限公司 Artificial-intelligence-based content popularity prediction method, apparatus and computer device

Also Published As

Publication number Publication date
CN115203195A (zh) 2022-10-18

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22787219

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22787219

Country of ref document: EP

Kind code of ref document: A1