TWI743092B - Recognition method, device and system of data table

Info

Publication number: TWI743092B
Application number: TW106107243A
Authority: TW
Inventors: 潘旻; 徐寧; 王偉
Original assignee: 香港商阿里巴巴集團服務有限公司
Priority date: 2016-06-17
Filing date: 2017-03-06
Publication date: 2021-10-21
Also published as: CN107515886A; US20170364582A1; TW201810083A; US10445345B2; CN107515886B; WO2017218744A1

Abstract

Embodiments of the present invention provide a method, device, and system for identifying data tables. The method includes: obtaining a first dependency relationship between data tables; counting, according to the first dependency relationship, the path lengths and the number of paths between the data tables; obtaining a second dependency relationship between one or more fields in the data tables; determining, according to the second dependency relationship, an importance coefficient of the one or more fields; determining the relevance between the data tables using the path lengths, the number of paths, and the importance coefficient; and identifying the data tables according to the relevance. Relevance is thus determined at field granularity and, by taking into account the usage of each field, the attributes of the field itself, the distance between data tables, and the connectivity of the data tables, the relevance between data tables can be measured in a sound and reasonable way.

Description

Recognition method, device and system of data table

The present invention relates to the field of information technology, and in particular to a method for identifying data tables, a method for determining the relevance of data tables, a device for identifying data tables, a device for determining the relevance of data tables, and a system for identifying data tables.

For big data, the industry has proposed the 3V characteristics: Volume, Velocity, and Variety. In recent years, big data storage and computing capabilities have advanced considerably, and the problem that most urgently needs solving today is the variety of big data. One solution for meeting the variety requirement is data exchange. Data exchange can take place between different companies or between different business departments within the same company. Concretely, it takes the form of mutual access between different data tables in a data warehouse or a cloud computing environment. In day-to-day business, to meet each business's need for diverse data, a single result data table may depend on data tables from multiple business departments, or even on data tables opened up by different companies. In data exchange and mutual access, however, different data tables may differ in their importance to the result data table that meets the business need. Identifying the more important data tables, so that operation, maintenance, and protection efforts can focus on them, has therefore become an important task in the big data era. Because the importance of a data table is identified mainly through its relevance, determining how relevant the data provided by each department and company is to the result data table that meets the business need becomes the key to measuring and metering the value of data exchange in mutual data access.

Data tables are usually stored in a data warehouse. A data warehouse often holds thousands of data tables, and each data table may contain dozens or hundreds of fields. In a specific business scenario, the dependency relationships among the data tables needed for a given analysis can be represented by a complex directed graph. Figure 1 is a schematic diagram of a directed acyclic graph whose nodes are data tables. In Figure 1, each circle represents a data table, and the letter inside the circle is the name of the data table, for example data table A or data table B. The letters in the annotation block next to a circle are the names of the fields in that data table; for example, data table A has fields a1, a2, a3, and a4. A directed line segment between two circles indicates a scan/dependency relationship between the two data tables. For example, the arrow from data table A to data table C indicates that data table A contributes two fields, a1 and a2, to data table C; equivalently, generating data table C depends on fields a1 and a2 of data table A.

When calculating the relevance between two data tables, the prior art distinguishes two cases: in one, the two data tables have a direct dependency relationship, such as data table A and data table C in Figure 1; in the other, the two data tables have an indirect dependency relationship, such as data table A and data table E in Figure 1.

For data tables with a direct dependency relationship, the prior art calculates relevance from the proportion of contributed fields. For example, to calculate the relevance between data table A and data table C in Figure 1, it first establishes that data table C depends on data table A and data table B. Data table A contributes 2 fields to data table C, while data table B contributes only 1 field. The relevance of data table A and data table B to data table C is therefore in the ratio 2:1; that is, the relevance of data table A to data table C is 2/3, and the relevance of data table B to data table C is 1/3.

For data tables without a direct dependency relationship, the prior art uses intermediate data tables to convert the indirect dependency into a chain of direct dependencies and then calculates the relevance along that chain. For example, to obtain the relevance of data table A to data table E in Figure 1, it first calculates the relevance of data table A to data table C and the relevance of data table C to data table E. Since the relevance of data table A to data table C is 2/3 and the relevance of data table C to data table E is 1/4, the relevance of data table A to data table E is 2/3 * 1/4 = 1/6.
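The prior-art calculation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the text: the dictionary layout, the function names, and the table `D` (introduced so that data table C supplies 1/4 of the fields of data table E) are assumptions, not details taken from Figure 1.

```python
from fractions import Fraction

# Hypothetical encoding: for each target table, the number of fields that each
# directly depended-on source table contributes. A->C (2 fields) and B->C
# (1 field) come from the text; table D and its 3 fields are assumed so that
# C supplies 1/4 of E's fields, as stated in the text.
contributed_fields = {
    "C": {"A": 2, "B": 1},
    "E": {"C": 1, "D": 3},
}

def direct_relevance(source, target):
    """Prior-art direct relevance: the source's share of the target's fields."""
    inputs = contributed_fields[target]
    return Fraction(inputs[source], sum(inputs.values()))

def chained_relevance(path):
    """Prior-art indirect relevance: product of direct shares along the chain."""
    result = Fraction(1)
    for source, target in zip(path, path[1:]):
        result *= direct_relevance(source, target)
    return result

print(direct_relevance("A", "C"))          # 2/3
print(chained_relevance(["A", "C", "E"]))  # 1/6
```

Because every factor is at most 1, each additional intermediate table can only shrink the product, which is the rapid decay that the following paragraph criticizes.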

However, the relevance calculated by the prior art described above is accurate only at data-table granularity, not at the granularity of the fields within a data table. In practice, the fields within a single data table differ considerably in importance, and the prior-art calculation cannot reflect this difference. Second, for parent and child tables with a direct dependency relationship, the prior art simply takes the proportion of fields that a child table contributes to the parent table as the relevance. The factors considered are too simple to reflect differences between actual business situations accurately. Third, for parent and child tables with only an indirect dependency relationship, the prior art converts the relevance into a product of the relevance values between directly dependent data tables. As a result, the relevance between data tables separated by only one or two levels decreases exponentially, and this decay is too fast to reflect the true contribution between data tables. The importance of data tables identified according to the prior art is therefore inaccurate.

In view of the above problems, embodiments of the present invention are proposed to provide a method for identifying data tables, a method for determining the relevance of data tables, a device for identifying data tables, a device for determining the relevance of data tables, and a corresponding system for identifying data tables that overcome, or at least partially solve, the above problems.

To solve the above problems, the present invention discloses a system for identifying data tables. The system includes a terminal and a server, wherein: the terminal performs: receiving an identification instruction for a data service; submitting the identification instruction to the server; receiving, from the server, the data tables associated with the data service, wherein the data tables associated with the data service are obtained by the server by identifying, in response to the identification instruction, the data tables associated with the data service; and displaying the data tables associated with the data service; and the server performs: receiving the identification instruction for the data service; identifying, in response to the identification instruction, the data tables associated with the data service; and outputting the data tables associated with the data service.

Optionally, the step of identifying, in response to the identification instruction, the data tables associated with the data service includes: obtaining a first dependency relationship between data tables; counting, according to the first dependency relationship, the path lengths and the number of paths between the data tables; obtaining a second dependency relationship between one or more fields in the data tables; determining, according to the second dependency relationship, an importance coefficient of the one or more fields; determining the relevance between the data tables using the path lengths, the number of paths, and the importance coefficient; and identifying the data tables according to the relevance.

To solve the above problems, the present invention discloses a method for identifying data tables, including: receiving an identification instruction for a data service; submitting the identification instruction to a server; receiving, from the server, the data tables associated with the data service, wherein the data tables associated with the data service are obtained by the server by identifying, in response to the identification instruction, the data tables associated with the data service; and displaying the data tables associated with the data service.

To solve the above problems, the present invention discloses a method for identifying data tables, including: receiving an identification instruction for a data service submitted by a terminal; identifying, in response to the identification instruction, the data tables associated with the data service; and sending the data tables associated with the data service to the terminal.

Optionally, the step of identifying, in response to the identification instruction, the data tables associated with the data service includes: obtaining a first dependency relationship between data tables; counting, according to the first dependency relationship, the path lengths and the number of paths between the data tables; obtaining a second dependency relationship between one or more fields in the data tables; determining, according to the second dependency relationship, an importance coefficient of the one or more fields; determining the relevance between the data tables using the path lengths, the number of paths, and the importance coefficient; and identifying the data tables according to the relevance.

Optionally, the step of counting, according to the first dependency relationship, the path lengths and the number of paths between the data tables includes: constructing a directed acyclic graph between the data tables for the first dependency relationship; and counting the path lengths and the number of paths in the directed acyclic graph.

Optionally, the step of constructing the directed graph between the data tables for the first dependency relationship includes: constructing, in the order corresponding to the first dependency relationship, a directed graph with the data tables as nodes; and deleting the cycles in the directed graph to obtain the directed acyclic graph between the data tables.
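A minimal Python sketch of these two sub-steps follows. The text does not say which edges are deleted to break a cycle, so the sketch assumes one common choice: drop every back edge found during a depth-first traversal. The function name `build_dag` and the `(source, target)` pair format are hypothetical.

```python
def build_dag(dependencies):
    """Build a directed graph from (source, target) pairs, then drop back edges.

    Assumption: cycles are broken by discarding the edge that closes them
    during a depth-first traversal; the source text does not prescribe this.
    """
    graph = {}
    for src, dst in dependencies:
        graph.setdefault(src, [])
        graph.setdefault(dst, [])
        if dst not in graph[src]:
            graph[src].append(dst)

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {node: WHITE for node in graph}
    dag = {node: [] for node in graph}

    def visit(node):
        color[node] = GRAY
        for nxt in graph[node]:
            if color[nxt] == GRAY:
                continue  # back edge: it would close a cycle, so it is dropped
            dag[node].append(nxt)
            if color[nxt] == WHITE:
                visit(nxt)
        color[node] = BLACK

    for node in graph:
        if color[node] == WHITE:
            visit(node)
    return dag

# Hypothetical dependencies; the edge E -> A closes a cycle and is removed.
print(build_dag([("A", "C"), ("B", "C"), ("C", "E"), ("E", "A")]))
```

The recursive traversal keeps the sketch short; a warehouse with very deep dependency chains would need an iterative version to stay within Python's recursion limit.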

Optionally, the step of counting the path lengths and the number of paths in the directed acyclic graph includes: counting the lengths of the one or more paths between a first data table and a second data table in the directed acyclic graph, the number of paths from the first data table to any data table, and the number of paths from the first data table to any data table that pass through the second data table.
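The three statistics named here can be gathered by enumerating the paths that start at the first data table, as in the Python sketch below. It assumes the directed acyclic graph is an adjacency dictionary, that path length is the number of edges, and that a path ending at the second data table counts as passing through it; none of these conventions is fixed by the text, and the function names are hypothetical.

```python
def paths_from(dag, start):
    """Yield every path (a list of nodes, at least one edge) starting at `start`."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        for nxt in dag.get(path[-1], []):
            new_path = path + [nxt]
            yield new_path
            stack.append(new_path)

def path_statistics(dag, first, second):
    """Return (lengths of first->second paths, paths from first, paths via second)."""
    lengths, total, through = [], 0, 0
    for path in paths_from(dag, first):
        total += 1
        if second in path:
            through += 1          # assumed: ending at `second` also counts
        if path[-1] == second:
            lengths.append(len(path) - 1)  # length measured in edges
    return sorted(lengths), total, through

# Hypothetical graph: A feeds C and D, and both feed E.
dag = {"A": ["C", "D"], "C": ["E"], "D": ["E"], "E": []}
print(path_statistics(dag, "A", "E"))  # ([2, 2], 4, 2)
```

Explicit enumeration is exponential in the worst case, so it suits only small graphs; it is used here because it makes each counted quantity easy to see.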

Optionally, the step of determining, according to the second dependency relationship, the importance coefficient of the one or more fields includes: obtaining the number of times the one or more fields are used within a preset time period, the one or more fields having corresponding field levels; and determining the importance coefficient of the one or more fields according to the number of uses and/or the field level, wherein the importance coefficient of the one or more fields is positively correlated with the number of uses and/or the field level.
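This passage requires only that the importance coefficient grow with the number of uses and with the field level; it gives no formula. The Python sketch below uses one arbitrary function with that property, purely for illustration. The logarithmic damping, the multiplicative form, and the convention that a larger `field_level` means a more important field are all assumptions.

```python
import math

def importance_coefficient(usage_count, field_level):
    """Illustrative importance coefficient, increasing in both arguments.

    Assumptions (not from the source): usage is damped with log1p so that
    heavily used fields do not dominate, and a larger field_level denotes a
    more important field.
    """
    if usage_count < 0 or field_level <= 0:
        raise ValueError("usage_count must be >= 0 and field_level must be > 0")
    return field_level * (1.0 + math.log1p(usage_count))

# More uses or a higher level both raise the coefficient.
print(importance_coefficient(10, 2) < importance_coefficient(100, 2))  # True
print(importance_coefficient(10, 2) < importance_coefficient(10, 3))   # True
```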

Optionally, the step of determining the relevance between the data tables using the path lengths, the number of paths, and the importance coefficient includes: determining a distance coefficient between the first data table and the second data table using the lengths of the one or more paths between the first data table and the second data table; determining a connectivity coefficient between the first data table and the second data table using the number of paths from the first data table to any data table and the number of paths from the first data table to any data table that pass through the second data table; determining the relevance of one or more fields in the first data table to one or more fields in the second data table using the distance coefficient between the first data table and the second data table, the connectivity coefficient between the first data table and the second data table, the importance coefficient of the one or more fields in the first data table, and the importance coefficient of the one or more fields in the second data table, wherein the one or more fields in the first data table and the one or more fields in the second data table have a dependency relationship; and determining the relevance of the first data table to the second data table using the relevance of the one or more fields in the first data table to the one or more fields in the second data table.
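The Python sketch below shows how the four sub-steps fit together. This passage names the inputs of each coefficient but not the formulas, so every formula here is a placeholder assumption: the distance coefficient is the mean reciprocal path length, the connectivity coefficient is the share of paths that pass through the second data table, field-level relevance is a plain product, and table-level relevance is the sum over dependent field pairs.

```python
def distance_coefficient(path_lengths):
    """Assumed form: mean of 1/length, so shorter paths give a larger value."""
    if not path_lengths:
        return 0.0
    return sum(1.0 / n for n in path_lengths) / len(path_lengths)

def connectivity_coefficient(total_paths, paths_through_second):
    """Assumed form: share of the first table's paths that pass through the second."""
    return paths_through_second / total_paths if total_paths else 0.0

def table_relevance(path_lengths, total_paths, paths_through_second, field_pairs):
    """Aggregate field-level relevance into table-level relevance.

    field_pairs holds one (w1, w2) tuple per dependent field pair, where w1 is
    the importance coefficient of the field in the first table and w2 that of
    the field in the second table. The product and the sum are assumptions.
    """
    d = distance_coefficient(path_lengths)
    c = connectivity_coefficient(total_paths, paths_through_second)
    field_scores = [d * c * w1 * w2 for w1, w2 in field_pairs]
    return sum(field_scores)

# Hypothetical inputs: one path of length 1, both outgoing paths pass through
# the second table, and two dependent field pairs.
print(table_relevance([1], 2, 2, [(1.0, 2.0), (0.5, 2.0)]))  # 3.0
```

Under these assumed forms, relevance falls off roughly as 1/length with distance instead of being multiplied by a fraction at every level, which is in the spirit of the slower decay the document aims for.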

Optionally, the step of identifying the data tables according to the relevance includes: identifying, according to the magnitude of the relevance, the multiple data tables required by the data service.

Optionally, the step of identifying, according to the magnitude of the relevance, the multiple data tables required by the data service includes: obtaining the relevance of each data table required by the data service; and selecting, according to the relevance, a preset number of data tables from the data tables required by the data service.
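The selection step amounts to a top-N selection over the relevance values. A minimal Python sketch, assuming the relevance values are already available in a dictionary keyed by table name and that "selecting" means keeping the tables with the largest relevance (the function name and the sample values are hypothetical):

```python
import heapq

def select_top_tables(relevance_by_table, preset_count):
    """Return the `preset_count` table names with the largest relevance values."""
    return heapq.nlargest(preset_count, relevance_by_table,
                          key=relevance_by_table.get)

# Hypothetical relevance values for three tables required by a data service.
print(select_top_tables({"A": 0.5, "B": 0.9, "C": 0.1}, 2))  # ['B', 'A']
```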

To solve the above problems, the present invention discloses a method for identifying data tables, including: obtaining a first dependency relationship between data tables; counting, according to the first dependency relationship, the path lengths and the number of paths between the data tables; obtaining a second dependency relationship between one or more fields in the data tables; determining, according to the second dependency relationship, an importance coefficient of the one or more fields; determining the relevance between the data tables using the path lengths, the number of paths, and the importance coefficient; and identifying the data tables according to the relevance.

To solve the above problems, the present invention discloses a method for determining the relevance of data tables, including: obtaining a first dependency relationship between data tables; counting, according to the first dependency relationship, the path lengths and the number of paths between the data tables; obtaining a second dependency relationship between one or more fields in the data tables; determining, according to the second dependency relationship, an importance coefficient of the one or more fields; and determining the relevance between the data tables using the path lengths, the number of paths, and the importance coefficient.

To solve the above problems, the present invention discloses a device for identifying data tables, including: a first receiving module, configured to receive an identification instruction for a data service; a submission module, configured to submit the identification instruction to a server; a second receiving module, configured to receive, from the server, the data tables associated with the data service, wherein the data tables associated with the data service are obtained by the server by identifying, in response to the identification instruction, the data tables associated with the data service; and a display module, configured to display the data tables associated with the data service.

To solve the above problems, the present invention discloses a device for identifying data tables, including: a third receiving module, configured to receive an identification instruction for a data service submitted by a terminal; an identification module, configured to identify, in response to the identification instruction, the data tables associated with the data service; and a sending module, configured to send the data tables associated with the data service to the terminal.

Optionally, the identification module includes: a first dependency relationship acquisition sub-module, configured to obtain a first dependency relationship between data tables; a path length and path number statistics sub-module, configured to count, according to the first dependency relationship, the path lengths and the number of paths between the data tables; a second dependency relationship acquisition sub-module, configured to obtain a second dependency relationship between one or more fields in the data tables; an importance coefficient determination sub-module, configured to determine, according to the second dependency relationship, an importance coefficient of the one or more fields; a relevance determination sub-module, configured to determine the relevance between the data tables using the path lengths, the number of paths, and the importance coefficient; and a data table identification sub-module, configured to identify the data tables according to the relevance.

Optionally, the path length and path number statistics sub-module includes: a directed acyclic graph construction unit, configured to construct a directed acyclic graph between the data tables for the first dependency relationship; and a path length and path number statistics unit, configured to count the path lengths and the number of paths in the directed acyclic graph.

Optionally, the directed acyclic graph construction unit includes: an acyclic graph construction subunit, configured to construct, in the order corresponding to the first dependency relationship, a directed graph with the data tables as nodes; and a directed acyclic graph obtaining subunit, configured to delete the cycles in the directed graph to obtain the directed acyclic graph between the data tables.

Optionally, the path length and path number statistics unit includes: a path length statistics subunit, configured to count the lengths of the one or more paths between a first data table and a second data table in the directed acyclic graph; and a path number statistics subunit, configured to count the number of paths from the first data table to any data table and the number of paths from the first data table to any data table that pass through the second data table.

Optionally, the importance coefficient determination sub-module includes: a usage count acquisition unit, configured to obtain the number of times the one or more fields are used within a preset time period, the one or more fields having corresponding field levels; and an importance coefficient determination unit, configured to determine the importance coefficient of the one or more fields according to the number of uses and/or the field level, wherein the importance coefficient of the one or more fields is positively correlated with the number of uses and/or the field level.

Optionally, the relevance determination sub-module includes: a distance coefficient determination unit, configured to determine a distance coefficient between the first data table and the second data table using the lengths of the one or more paths between the first data table and the second data table; a connectivity coefficient determination unit, configured to determine a connectivity coefficient between the first data table and the second data table using the number of paths from the first data table to any data table and the number of paths from the first data table to any data table that pass through the second data table; a field relevance determination unit, configured to determine the relevance of one or more fields in the first data table to one or more fields in the second data table using the distance coefficient between the first data table and the second data table, the connectivity coefficient between the first data table and the second data table, the importance coefficient of the one or more fields in the first data table, and the importance coefficient of the one or more fields in the second data table, wherein the one or more fields in the first data table and the one or more fields in the second data table have a dependency relationship; and a data table relevance determination unit, configured to determine the relevance of the first data table to the second data table using the relevance of the one or more fields in the first data table to the one or more fields in the second data table.

Optionally, the data table identification sub-module includes: a data table identification unit, configured to identify, according to the magnitude of the relevance, the multiple data tables required by the data service.

Optionally, the data table identification unit includes: a data table relevance acquisition subunit, configured to obtain the relevance of each data table required by the data service; and a data table selection subunit, configured to select, according to the relevance, a preset number of data tables from the data tables required by the data service.

To solve the above problems, the present invention discloses a device for identifying data tables, including: a first dependency relationship acquisition module, configured to obtain a first dependency relationship between data tables; a path length and path number statistics module, configured to count, according to the first dependency relationship, the path lengths and the number of paths between the data tables; a second dependency relationship acquisition module, configured to obtain a second dependency relationship between one or more fields in the data tables; an importance coefficient determination module, configured to determine, according to the second dependency relationship, an importance coefficient of the one or more fields; a relevance determination module, configured to determine the relevance between the data tables using the path lengths, the number of paths, and the importance coefficient; and a data table identification module, configured to identify the data tables according to the relevance.

To solve the above problems, the present invention discloses a device for determining the relevance of data tables, including: a first dependency relationship acquisition module, configured to obtain a first dependency relationship between data tables; a path length and path number statistics module, configured to count, according to the first dependency relationship, the path lengths and the number of paths between the data tables; a second dependency relationship acquisition module, configured to obtain a second dependency relationship between one or more fields in the data tables; an importance coefficient determination module, configured to determine, according to the second dependency relationship, an importance coefficient of the one or more fields; and a relevance determination module, configured to determine the relevance between the data tables using the path lengths, the number of paths, and the importance coefficient.

Compared with the background art, the embodiments of the present invention have the following advantages:

In the embodiments of the present invention, after the path lengths and the number of paths between the data tables are counted according to the first dependency relationship, and the importance coefficient of one or more fields is determined according to the second dependency relationship, the relevance between the data tables is determined using the path lengths, the number of paths, and the importance coefficient, and the data tables are identified according to the relevance. Relevance is thus determined at field granularity and, by taking into account the usage of each field, the attributes of the field itself, the distance between data tables, and the connectivity of the data tables, the relevance between data tables can be measured in a sound and reasonable way.

其次，本發明實施例採用圖論的思想提出了資料表之間的連通係數和距離係數，作為資料表之間關聯度計量的兩個重要權重因數，並引入了資料表之間的層級關係，透過將兩張表之間的層級關係融入到距離係數中，來合理解決非直接依賴資料表之間的關聯度問題，避免了非直接依賴資料表之間關聯度隨著層級的變化衰減太快的問題。 Secondly, the embodiment of the present invention adopts the idea of graph theory to propose the connectivity coefficient and the distance coefficient between the data tables as two important weighting factors for measuring the degree of association between the data tables, and introduces the hierarchical relationship between the data tables. By integrating the hierarchical relationship between the two tables into the distance coefficient, the problem of the degree of association between the indirect dependent data tables can be reasonably solved, and the indirect dependence can be avoided. It depends on the problem that the degree of association between the data tables decays too quickly with the change of the level.

101, 102, 103, 104, 105, 106‧‧‧Method steps

201, 202, 203, 204, 205, 206, 207, 208‧‧‧Method steps

301, 302, 303, 304, 305‧‧‧Method steps

401, 402, 403, 404‧‧‧Method steps

501, 502, 503‧‧‧Method steps

601‧‧‧First receiving module

602‧‧‧Submission module

603‧‧‧Second receiving module

604‧‧‧Display module

701‧‧‧Third receiving module

702‧‧‧Identification module

703‧‧‧Sending module

801‧‧‧First dependency acquisition module

802‧‧‧Path length and path number statistics module

803‧‧‧Second dependency acquisition module

804‧‧‧Importance coefficient determination module

805‧‧‧Association degree determination module

806‧‧‧Data table identification module

901‧‧‧First dependency acquisition module

902‧‧‧Path length and path number statistics module

903‧‧‧Second dependency acquisition module

904‧‧‧Importance coefficient determination module

905‧‧‧Association degree determination module

Fig. 1 is a schematic diagram of a directed acyclic graph with data tables as nodes; Fig. 2 is a flow chart of the steps of Embodiment 1 of a data table identification method of the present invention; Fig. 3 is a schematic diagram of a directed acyclic graph annotated with field dependencies; Fig. 4 is a flow chart of the steps of Embodiment 2 of a data table identification method of the present invention; Fig. 5 is a schematic diagram of a directed graph containing cycles; Fig. 6 is a flow chart of the steps of Embodiment 3, a method for determining the degree of association of data tables of the present invention; Fig. 7 is a flow chart of the steps of Embodiment 4 of a data table identification method of the present invention; Fig. 8 is a flow chart of the steps of Embodiment 5 of a data table identification method of the present invention; Fig. 9 is a structural block diagram of Embodiment 1 of a data table identification device of the present invention; Fig. 10 is a structural block diagram of Embodiment 2 of a data table identification device of the present invention; Fig. 11 is a structural block diagram of Embodiment 3 of a data table identification device of the present invention; Fig. 12 is a structural block diagram of Embodiment 4, a device for determining the degree of association of data tables of the present invention; Fig. 13 is an architecture diagram of a data table identification system of the present invention.

To make the above objects, features, and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

Referring to Fig. 2, a flow chart of the steps of Embodiment 1 of a data table identification method of the present invention is shown, which may specifically include the following steps:

Step 101: obtain a first dependency relationship between data tables. In a big data environment, data is stored in a data warehouse or database as individual data tables. A data table is a logical concept: the data within one table can be regarded as all satisfying certain logical rules or conditions.

In the embodiments of the present invention, the first dependency relationship may be a dependency relationship or a scanning relationship between data tables, meaning that the generation of a given data table depends on one or more other data tables. In a specific implementation, the first dependency relationship can be determined by obtaining the mutual-access data between all data tables in the data warehouse. Each record is a two-tuple of the form <C: c, A: a>, meaning that data table C has a scanning relationship with data table A and that field c of data table C is generated from field a of data table A.

As shown in Fig. 1, the generation of data table C depends on data table A and data table B; that is, data table C can be considered to have a dependency or scanning relationship with data table A and with data table B.

Step 102: count the path lengths and the number of paths between the data tables according to the first dependency relationship. The path length is the distance between two data tables that have a dependency relationship. For data tables with a direct dependency, the path length can generally be taken as 1; for indirectly dependent data tables, the path length can be determined from the number of intermediate data tables linking the two. For data tables with an indirect dependency there may be several paths, so the path length is not necessarily unique and the number of paths may be greater than one.

In a preferred embodiment of the present invention, the step of counting the path lengths and the number of paths between the data tables according to the first dependency relationship may specifically include the following sub-steps: sub-step 1021, construct a directed acyclic graph between the data tables from the first dependency relationship; sub-step 1022, count the path lengths and the number of paths in the directed acyclic graph.

A graph in which every edge has a direction is called a directed graph. An edge in a directed graph is an ordered pair of two vertices, usually written in angle brackets: <vi,vj> denotes a directed edge in which vi is the start point and vj is the end point. <vi,vj> and <vj,vi> are two different directed edges. In graph theory, a directed graph in which it is impossible to start from any vertex and return to it by following a sequence of edges is a directed acyclic graph.

In the embodiments of the present invention, after the first dependency relationship between the data tables is obtained, a directed acyclic graph with the data tables as nodes can be constructed from it. By counting the path lengths and the number of paths in the directed acyclic graph, the path lengths and the number of paths between two data tables can be obtained directly. For example, in Fig. 1 there is only one path between data table A and data table E, so the number of paths is 1, and the length of that path is 2. The path is recorded in the form <A-C-E>, meaning that the path A->C->E exists between data table A and data table E.
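Sub-steps 1021 and 1022 can be illustrated with a minimal sketch. The edge list below is an assumption chosen for illustration (C derived from A and B, E derived from C and D), and the helper names `build_graph` and `all_paths` are hypothetical rather than taken from the specification.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an adjacency list from (upstream, downstream) table pairs."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)
    return graph


def all_paths(graph, start, end):
    """Enumerate every path from start to end in a DAG by depth-first search."""
    if start == end:
        return [[end]]
    paths = []
    for nxt in graph.get(start, []):
        for tail in all_paths(graph, nxt, end):
            paths.append([start] + tail)
    return paths


# Assumed edges: C depends on A and B; E depends on C and D.
edges = [("A", "C"), ("B", "C"), ("C", "E"), ("D", "E")]
graph = build_graph(edges)

paths = all_paths(graph, "A", "E")
path_count = len(paths)                       # number of paths from A to E
path_lengths = [len(p) - 1 for p in paths]    # edges traversed on each path
```

With these edges the only path from A to E is A->C->E, so the count is 1 and the length is 2, matching the example in the text. Recursive enumeration is adequate for small graphs; a large warehouse would likely need memoized path counting instead.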

Step 103: obtain a second dependency relationship of one or more fields in the data tables. A data table usually contains one or more fields. Because the generation of a data table may depend on one or more other data tables, the generation of a field in a data table may likewise depend on one or more fields in one or more other data tables.

Fig. 3 is a schematic diagram of a directed acyclic graph annotated with field dependencies. Specifically, a field dependency can be expressed in the form <C: c1, A: a1^A: a2>, meaning that field c1 in data table C is generated from fields a1 and a2 in data table A.
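The record form above can be read mechanically. The following is a small sketch of one way to parse it; the function name `parse_field_dependency` and the returned structure are illustrative assumptions, and the specification does not prescribe a parser.

```python
def parse_field_dependency(record):
    """Parse '<C: c1, A: a1^A: a2>' into (target, sources).

    target is a (table, field) pair; sources is a list of such pairs.
    Full-width punctuation is normalized first, since records may use it.
    """
    body = record.strip().strip("<>").replace("：", ":").replace("，", ",")
    target, sources = body.split(",", 1)

    def split_ref(ref):
        table, field = ref.split(":")
        return table.strip(), field.strip()

    return split_ref(target), [split_ref(s) for s in sources.split("^")]


parsed = parse_field_dependency("<C: c1, A: a1^A: a2>")
```

Here `parsed` pairs the generated field (C, c1) with its two source fields in table A.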

Secondly, the second dependency relationship may also include the number of times a field is used within a preset time period, that is, the number of times the field is accessed by downstream data tables, together with the number of direct downstream tables of the data table. The corresponding record has the form <A: a1,3,2>, meaning that field a1 of data table A was accessed 3 times by downstream data tables within the preset time period and that data table A has 2 direct downstream data tables. The preset time period is typically one day. In practice it can also be set to two days or half a day as needed; the present invention does not specifically limit this.

Step 104: determine the importance coefficient of the one or more fields according to the second dependency relationship. Every field in a data table has a corresponding field level, and different fields may have different field levels. For example, field levels can be divided into four levels, 1, 2, 3, and 4, corresponding respectively to data that is disclosable, shareable, private, and top secret. The record can take the form <A: a1,1>, meaning that field a1 in data table A has field level 1, that is, field a1 is disclosable.

In a preferred embodiment of the present invention, the step of determining the importance coefficient of the one or more fields according to the second dependency relationship may specifically include the following sub-steps: sub-step 1041, obtain the number of times the one or more fields are used within the preset time period; sub-step 1042, determine the importance coefficient of the one or more fields according to the number of uses and/or the field level.

In the embodiments of the present invention, the importance coefficient of the one or more fields may be positively correlated with the number of uses and/or with the field level.

In a specific implementation, the following formula may be used to determine the importance coefficient of the one or more fields:

where level_weight(a_i) is the field level of field a_i, use_cnt(a_i) is the number of times field a_i is used within the preset time period, n is the number of fields in the data table, and α and β are proportional coefficients, 0 < α, β < 1.
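The symbols defined above can be combined in several ways. The sketch below shows one possible form only, and it is an assumption rather than the formula of the specification: a weighted sum of each field's share of the table's total field level and its share of the table's total use count. It is positively correlated with both inputs, as the text requires. The function name `importance` and the sample values are hypothetical.

```python
def importance(fields, alpha=0.5, beta=0.5):
    """Return {field: coefficient} for one table.

    fields maps a field name to (level_weight, use_cnt).
    ASSUMED form: alpha * share of level weight + beta * share of use count,
    with both shares taken over the n fields of the table.
    """
    total_level = sum(level for level, _ in fields.values())
    total_use = sum(use for _, use in fields.values())
    result = {}
    for name, (level, use) in fields.items():
        level_share = level / total_level if total_level else 0.0
        use_share = use / total_use if total_use else 0.0
        result[name] = alpha * level_share + beta * use_share
    return result


# Hypothetical table with three fields: (field level, uses per day).
table_a = {"a1": (1, 2), "a2": (1, 3), "a3": (3, 1)}
coeffs = importance(table_a)
```

Under this assumed form the coefficients of a table sum to alpha + beta, and a field with a high level but few uses (a3) can still outrank a frequently used low-level field.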

Step 105: determine the degree of association between the data tables using the path lengths, the number of paths, and the importance coefficients. In the embodiments of the present invention, the path length and the number of paths express the strength of the association between data tables. For example, the shorter the path, the closer the connection; the fewer the paths, the more indispensable one data table is to the other.

Therefore, after the importance coefficients of one or more fields in the data tables have been determined, the path lengths, the number of paths, and the importance coefficients can be used together to determine the degree of association between the data tables.

Step 106: identify the data tables according to the degree of association.

In the embodiments of the present invention, after the degree of association between the data tables has been determined, the data tables can be identified according to it. Specifically, the data tables required by a data service can be identified according to the magnitude of the degree of association. For example, a given data service may use L data tables. After the degree of association of each of the L data tables has been determined, the top K data tables with the highest degree of association can be selected from the L tables and given priority in operation, maintenance, and protection, so as to safeguard their data quality and output time.
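The top-K selection described above can be sketched as follows. The table names and scores are made-up illustrative values, and `top_k_tables` is a hypothetical helper name.

```python
import heapq


def top_k_tables(relevance, k):
    """Return the k table names with the highest degree of association."""
    return heapq.nlargest(k, relevance, key=relevance.get)


# Hypothetical degrees of association for the L = 4 tables of one data service.
relevance = {"T1": 0.43, "T2": 0.10, "T3": 0.75, "T4": 0.25}
key_tables = top_k_tables(relevance, 2)
```

`heapq.nlargest` avoids sorting all L tables when K is much smaller than L, which may matter for services that touch many tables.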

In the embodiments of the present invention, the path lengths and the number of paths between the data tables are counted according to the first dependency relationship, and the importance coefficients of one or more fields are determined according to the second dependency relationship. The path lengths, the number of paths, and the importance coefficients are then used to determine the degree of association between the data tables, and the data tables are identified according to that degree of association. The degree of association is thus determined at field granularity, so that dimensions such as field usage, the attributes of the fields themselves, the distance between data tables, and the connectivity of data tables allow the degree of association between data tables to be measured in a principled way.

Referring to Fig. 4, a flow chart of the steps of Embodiment 2 of a data table identification method of the present invention is shown, which may specifically include the following steps:

Step 201: construct a directed graph between the data tables from the first dependency relationship. In the embodiments of the present invention, a directed acyclic graph between the data tables can be constructed by obtaining the mutual-access data between all data tables in the data warehouse.

In a preferred embodiment of the present invention, the step of constructing a directed graph between the data tables from the first dependency relationship may specifically include the following sub-steps: sub-step 2011, construct a directed graph with the data tables as nodes, following the order given by the first dependency relationship; sub-step 2012, delete the cycles in the directed graph to obtain a directed acyclic graph between the data tables.

In a specific implementation, after the mutual-access data between the data tables (that is, the first dependency relationship) is obtained, a directed graph with the data tables as nodes is first constructed following the order given by the first dependency relationship; the cycles in the directed graph are then deleted to obtain a directed acyclic graph between the data tables.

Fig. 5 is a schematic diagram of a directed graph containing cycles, in which the cycles A-B-C-C and A-B-D-A exist.

In a specific implementation, the cycles in the directed graph can be removed with a stack-based method. Starting from a given data table, whenever a cycle is detected at a step of the traversal, the directed edge that closes the cycle is deleted. For example, starting from data table A, a cycle appears when the traversal reaches A-B-C-C, and it can be removed by deleting the directed edge from data table C to itself; when the traversal reaches A-B-D-A, the cycle can be removed by deleting the directed edge from data table D to data table A.
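The stack-based cycle removal can be sketched as an iterative depth-first traversal that deletes any edge pointing back to a table still on the current path. The edge set is inferred from the two cycles named for Fig. 5 and is an assumption, as is the helper name `remove_cycles`.

```python
def remove_cycles(graph, start):
    """Return (dag, removed): a copy of graph without back edges, and the edges cut.

    An explicit stack holds the current traversal path; an edge whose target is
    still on that path closes a cycle and is deleted.
    """
    dag = {node: list(targets) for node, targets in graph.items()}
    removed = []
    on_path = {start}
    visited = {start}
    # Each stack entry pairs a node with an iterator over a snapshot of its edges.
    stack = [(start, iter(list(dag.get(start, []))))]
    while stack:
        node, children = stack[-1]
        child = next(children, None)
        if child is None:
            on_path.discard(node)   # finished this node, leave the current path
            stack.pop()
        elif child in on_path:
            dag[node].remove(child)  # back edge: it closes a cycle
            removed.append((node, child))
        elif child not in visited:
            visited.add(child)
            on_path.add(child)
            stack.append((child, iter(list(dag.get(child, [])))))
    return dag, removed


# Assumed edges for Fig. 5: A->B, B->C, C->C (self-loop), B->D, D->A.
graph = {"A": ["B"], "B": ["C", "D"], "C": ["C"], "D": ["A"]}
dag, removed = remove_cycles(graph, "A")
```

Starting from A, the traversal cuts the self-loop on C and the edge from D back to A, as in the example. Which edge of a cycle is cut depends on the starting table and traversal order, so a different start may yield a different acyclic graph.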

Step 202: count the lengths of the one or more paths between a first data table and a second data table in the directed acyclic graph, the number of paths from the first data table to any data table, and the number of paths from the first data table to any data table that pass through the second data table. In the embodiments of the present invention, when the degree of association between the first data table and the second data table needs to be determined, these three quantities can first be counted in the directed acyclic graph.

Step 203: determine the importance coefficient of one or more fields in the data tables. In a specific implementation, the number of times a field is used within the preset time period (usually one day) and the field level of that field can first be obtained, and the following formula is then used to determine the importance coefficient of the field:

Step 204: determine the distance coefficient between the first data table and the second data table using the lengths of the one or more paths between them. Step 205: determine the connectivity coefficient between the first data table and the second data table using the number of paths from the first data table to any data table and the number of paths from the first data table to any data table that pass through the second data table. In the embodiments of the present invention, after the path lengths and the number of paths between the data tables are obtained, the distance coefficient and the connectivity coefficient between the data tables can be determined from the path lengths and the number of paths, respectively.

In a specific implementation, the following formula may be used to determine the distance coefficient between the first data table and the second data table:

where step(A,B) is the length of one path from the first data table A to the second data table B, and n is the number of paths from the first data table A to the second data table B. The following formula may be used to determine the connectivity coefficient between the first data table and the second data table; a larger connectivity coefficient indicates stronger connectivity between the data tables:

where path_cnt(A,B,leaf) is the number of paths from the first data table A to any data table that pass through the second data table B, and path_cnt(A,null,leaf) is the number of paths from the first data table A to any data table.
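The two path counts defined above suggest a ratio, and the worked example later in the description gives conn_ratio(A,E) = 1/1. The sketch below assumes that the connectivity coefficient is path_cnt(A,B,leaf) divided by path_cnt(A,null,leaf), and that the counted paths run from A to leaf tables (tables with no downstream table), as the identifier "leaf" suggests. Both points are assumptions, and `leaf_paths` is a hypothetical helper.

```python
def leaf_paths(graph, start):
    """All paths from start to a leaf (a table with no downstream tables)."""
    targets = graph.get(start, [])
    if not targets:
        return [[start]]
    return [[start] + tail for nxt in targets for tail in leaf_paths(graph, nxt)]


def conn_ratio(graph, a, b):
    """ASSUMED form: share of leaf-terminated paths from a that pass through b."""
    paths = leaf_paths(graph, a)                    # path_cnt(a, null, leaf)
    through_b = [p for p in paths if b in p[1:]]    # path_cnt(a, b, leaf)
    return len(through_b) / len(paths)


# Assumed edges: C depends on A and B; E depends on C and D.
graph = {"A": ["C"], "B": ["C"], "C": ["E"], "D": ["E"]}
ratio_a_e = conn_ratio(graph, "A", "E")
ratio_a_b = conn_ratio(graph, "A", "B")
```

Under these assumptions every leaf-terminated path from A passes through E, so the ratio is 1, while no path from A reaches B, so the ratio is 0.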

Step 206: determine the degree of association of one or more fields in the first data table to one or more fields in the second data table, using the distance coefficient between the first data table and the second data table, the connectivity coefficient between the first data table and the second data table, the importance coefficients of one or more fields in the first data table, and the importance coefficients of one or more fields in the second data table. In the embodiments of the present invention, once the distance coefficient and connectivity coefficient between the data tables and the importance coefficients of one or more fields in the data tables have been obtained, they can be used to determine the degree of association between one or more fields in the first data table and the one or more fields in the second data table that depend on them.

In a specific implementation, the following formula may be used to determine the degree of association of one or more fields in the first data table to one or more fields in the second data table:

where i = 1...N indexes the data tables that have a dependency relationship with field a_i in the first data table A; m = 1...n indexes the fields b_m in the second data table B that have a dependency relationship with field a_i in the first data table A; and ρ and λ are proportional coefficients, 0 < ρ, λ < 1.

weight(A,a_i,B,b_i) indicates that field b_i in data table B has a dependency relationship with field a_i in data table A, and gives the degree of association between field a_i and field b_i. The first part of the right-hand side of the equation represents the comprehensive degree of association between data table A and data table B, which is composed of two factors, the connectivity coefficient and the distance coefficient. The second part of the right-hand side represents the weight of field b_i among all fields in data table B that have a lineage relationship with field a_i.

Step 207: determine the degree of association of the first data table to the second data table, using the degree of association of one or more fields in the first data table to one or more fields in the second data table. In the embodiments of the present invention, after the degrees of association of the one or more fields have been obtained, the degrees of association between fields can be used to determine the degree of association between the data tables.

In a specific implementation, the following formula may be used to determine the degree of association of the first data table to the second data table:

where M is the number of fields in the first data table A that have a dependency relationship with fields in the second data table B, and N is the number of fields in the second data table B that have a dependency relationship with fields in the first data table A.

Step 208: identify the data tables required by the data service according to the magnitude of the degree of association.

In a preferred embodiment of the present invention, the step of identifying the data tables required by the data service according to the magnitude of the degree of association may specifically include the following sub-steps: sub-step 2081, obtain the degree of association of each data table required by the data service; sub-step 2082, select a preset number of data tables from the data tables required by the data service according to the magnitude of the degree of association.

In a specific implementation, after the degree of association between the data tables has been determined, the data tables can be identified according to it. For example, a given data service may use L data tables. After the degree of association of each of the L data tables has been determined, the top K data tables with the highest degree of association can be selected from the L tables and given priority in operation, maintenance, and protection, so as to safeguard their data quality and output time.

The embodiments of the present invention draw on graph theory to propose a connectivity coefficient and a distance coefficient between data tables as two important weighting factors for measuring the degree of association, and introduce the hierarchical relationship between data tables. By folding the hierarchical relationship between two tables into the distance coefficient, the degree of association between tables that do not depend on each other directly is handled reasonably, and the problem of that degree of association decaying too quickly as the number of levels grows is avoided.

Referring to Fig. 6, a flow chart of the steps of Embodiment 3, a method for determining the degree of association of data tables of the present invention, is shown, which may specifically include the following steps: step 301, obtain a first dependency relationship between data tables; step 302, count the path lengths and the number of paths between the data tables according to the first dependency relationship; step 303, obtain a second dependency relationship between one or more fields in the data tables; step 304, determine the importance coefficient of the one or more fields according to the second dependency relationship; step 305, determine the degree of association between the data tables using the path lengths, the number of paths, and the importance coefficients.

Since steps 301-305 are similar to steps 101-105 in Embodiment 1 of the data table identification method of the present invention, refer to the description of Embodiment 1 for the relevant details, which are not repeated here. For ease of understanding, the method for determining the degree of association between data tables is illustrated below with a concrete example.

Take the directed acyclic graph shown in Fig. 3 as an example.

The first dependency relationship between the data tables can be expressed as follows:

a) <A,C>

b) <B,C>

c) <C,E>

d) <D,E>

e) <A,C,E>

f) <B,C,E>

The second dependency relationship between the fields can be expressed as follows:

a) <C: c1, A: a1>

b) <C: c1, A: a2>

c) <C: c1, B: b1>

d) <C: c2, A: a3>

e) <C: c2, B: b2>

f) <C: c2, B: b3>

g) <E: e1, C: c1>

h) <E: e1, D: d2>

i) <E: e2, C: c2>

The field level of each field in the data tables can be expressed as follows:

a) <A: a1,1>

b) <A: a2,1>

c) <A: a3,3>

d) <B: b1,2>

e) <B: b2,2>

f) <B: b3,3>

g) <C: c1,1>

h) <C: c2,3>

i) <D: d1,2>

j) <D: d2,3>

k) <E: e1,1>

l) <E: e2,2>

The number of times each field is used and the number of downstream data tables can be expressed as follows:

a) <A: a1,2,1>

b) <A: a2,3,1>

c) <A: a3,1,1>

d) <B: b1,2,1>

e) <B: b2,1,1>

f) <B: b3,2,1>

g) <C: c1,1,1>

h) <C: c2,1,1>

i) <D: d1,2,1>

j) <D: d2,1,1>

k) <E: e1,0,0>

l) <E: e2,0,0>

1. Determine the connectivity coefficient between data table A and data table E: conn_ratio(A,E) = 1/1 = 1

2. Determine the distance coefficient between data table A and data table E: length_ratio(A,E) = 1/2

3. Determine the combined degree of association between data table A and data table E (taking the scale coefficient as 0.5): sum_score(A,E) = 0.5 * 1 + 0.5 * 1/2 = 0.75

4. As can be seen from FIG. 3, the field in data table A that has a dependency relationship with field e2 in data table E is a3. In addition, besides data table A, data tables C, B, and D also have a dependency relationship with data table E. Therefore: weight(A,a3,E,e2) = sum_score(A,E) / (sum_score(A,C) + sum_score(A,B) + sum_score(A,D) + sum_score(A,E)) * (weight(a3) / weight(a3)) = 0.75 / (1 + 0 + 0 + 0.75) * 1 = 3/7 ≈ 0.43

5. Because the only dependency relationship between data table A and data table E is the one between field a3 and field e2, attr(A,E) = 0.43; that is, the degree of association of data table A with data table E is 0.43.
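The arithmetic of steps 3 to 5 can be checked with a short sketch. The function name `sum_score` follows the text; writing the blend as `alpha * conn + (1 - alpha) * length` is an assumption that agrees with the worked case, where both terms carry the scale coefficient 0.5. The per-table scores are the values quoted in step 4, not values recomputed from FIG. 3.

```python
def sum_score(conn_ratio, length_ratio, alpha=0.5):
    """Blend the connectivity and distance coefficients.

    The alpha / (1 - alpha) split is an assumed generalisation of the
    0.5 / 0.5 weighting used in the worked example.
    """
    return alpha * conn_ratio + (1 - alpha) * length_ratio


# Step 3: combined degree of association between A and E.
score_ae = sum_score(conn_ratio=1.0, length_ratio=0.5)

# Step 4: sum_score of A against each table, as quoted in the text.
scores_from_a = {"C": 1.0, "B": 0.0, "D": 0.0, "E": score_ae}

# Field a3 is the only field of A involved, so weight(a3) / weight(a3) = 1.
field_factor = 1.0
weight_a3_e2 = score_ae / sum(scores_from_a.values()) * field_factor

# Step 5: a single field pair links A and E, so attr(A,E) equals that weight.
attr_ae = weight_a3_e2
print(round(attr_ae, 2))
```

The unrounded value is 3/7; the text reports it rounded to two decimal places.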

Referring to FIG. 7, a flow chart of the steps of Embodiment 4 of a data table identification method of the present invention is shown, which may specifically include the following steps: step 401, receiving an identification instruction for a data service; step 402, submitting the identification instruction to a server; step 403, receiving the data table associated with the data service sent by the server, wherein the data table associated with the data service is obtained by the server by identifying, in response to the identification instruction, the data table associated with the data service; step 404, displaying the data table associated with the data service.

In the embodiment of the present invention, when the data table associated with a data service needs to be identified, an identification instruction for the data service can be sent to the terminal. After receiving the identification instruction, the terminal can submit it to the server; the server identifies the data table associated with the data service and feeds it back to the terminal. After receiving the data table returned by the server, the terminal can display it on the terminal's user interface.

Referring to FIG. 8, a flow chart of the steps of Embodiment 5 of a data table identification method of the present invention is shown, which may specifically include the following steps: step 501, receiving an identification instruction for a data service submitted by a terminal; step 502, identifying, in response to the identification instruction, the data table associated with the data service; step 503, sending the data table associated with the data service to the terminal.

In the embodiment of the present invention, after receiving an identification instruction for a certain data service submitted by the terminal, the server can identify the data table associated with the data service in response to the identification instruction and then return the data table to the terminal.
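The exchange between terminal and server (steps 401-404 and 501-503) can be sketched as two small classes. Everything named here (`Terminal`, `Server`, the dictionary-shaped instruction, the lookup table standing in for the identification logic) is hypothetical and only illustrates the order of the steps.

```python
class Server:
    """Hypothetical server covering steps 501-503."""

    def __init__(self, tables_by_service):
        # A plain lookup stands in for the association-based identification.
        self._tables_by_service = tables_by_service

    def handle(self, instruction):
        service = instruction["service"]                   # step 501: receive
        tables = self._tables_by_service.get(service, [])  # step 502: identify
        return tables                                      # step 503: send back


class Terminal:
    """Hypothetical terminal covering steps 401-404."""

    def __init__(self, server):
        self._server = server

    def identify(self, service):
        instruction = {"service": service}         # step 401: receive instruction
        tables = self._server.handle(instruction)  # steps 402-403: submit, receive
        self.display(tables)                       # step 404: display
        return tables

    def display(self, tables):
        print(", ".join(tables))


server = Server({"orders_report": ["A", "C", "E"]})
terminal = Terminal(server)
result = terminal.identify("orders_report")
```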

In a preferred embodiment of the present invention, the step of identifying, in response to the identification instruction, the data table associated with the data service may specifically include the following sub-steps: sub-step 5031, obtaining a first dependency relationship between data tables; sub-step 5032, counting the path lengths and the number of paths between the data tables according to the first dependency relationship; sub-step 5033, obtaining a second dependency relationship between one or more fields in the data tables; sub-step 5034, determining the importance coefficient of the one or more fields according to the second dependency relationship; sub-step 5035, determining the degree of association between the data tables using the path lengths, the number of paths, and the importance coefficient; sub-step 5036, identifying the data tables according to the degree of association.

Since sub-steps 5031-5036 are similar to steps 101-106 in Embodiment 1 of the data table identification method of the present invention, reference may be made to the corresponding description of Embodiment 1, and the details are not repeated here.

It should be noted that, for simplicity of description, the method embodiments are each expressed as a series of action combinations, but those skilled in the art should understand that the embodiments of the present invention are not limited by the described order of actions, because according to the embodiments of the present invention, certain steps may be performed in another order or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.

Referring to FIG. 9, a structural block diagram of Embodiment 1 of a data table identification device of the present invention is shown, which may specifically include the following modules: a first receiving module 601, configured to receive an identification instruction for a data service; a submission module 602, configured to submit the identification instruction to a server; a second receiving module 603, configured to receive the data table associated with the data service sent by the server, wherein the data table associated with the data service may be obtained by the server by identifying, in response to the identification instruction, the data table associated with the data service; and a display module 604, configured to display the data table associated with the data service.

Referring to FIG. 10, a structural block diagram of Embodiment 2 of a data table identification device of the present invention is shown, which may specifically include the following modules: a third receiving module 701, configured to receive an identification instruction for a data service submitted by a terminal; an identification module 702, configured to identify, in response to the identification instruction, the data table associated with the data service; and a sending module 703, configured to send the data table associated with the data service to the terminal.

In the embodiment of the present invention, the identification module 702 may specifically include the following sub-modules: a first dependency relationship acquisition sub-module 7021, configured to obtain a first dependency relationship between data tables; a path length and path number statistics sub-module 7022, configured to count the path lengths and the number of paths between the data tables according to the first dependency relationship; a second dependency relationship acquisition sub-module 7023, configured to obtain a second dependency relationship between one or more fields in the data tables; an importance coefficient determining sub-module 7024, configured to determine the importance coefficient of the one or more fields according to the second dependency relationship; an association degree determining sub-module 7025, configured to determine the degree of association between the data tables using the path lengths, the number of paths, and the importance coefficient; and a data table identification sub-module 7026, configured to identify the data tables according to the degree of association.

In the embodiment of the present invention, the path length and path number statistics sub-module 7022 may specifically include the following units: a directed acyclic graph construction unit, configured to construct a directed acyclic graph between the data tables based on the first dependency relationship; and a path length and path number statistics unit, configured to count the path lengths and the number of paths in the directed acyclic graph.

In the embodiment of the present invention, the directed acyclic graph construction unit may specifically include the following subunits: an acyclic graph construction subunit, configured to construct, in the order corresponding to the first dependency relationship, a directed graph with the data tables as nodes; and a directed acyclic graph obtaining subunit, configured to delete the cycles in the directed graph to obtain the directed acyclic graph between the data tables.
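The text does not say how cycles are deleted. One common approach, used here purely as an assumption, is to run a depth-first search and drop every back edge (an edge pointing to a node that is still on the current DFS stack). The function name `remove_cycles` and the adjacency-list representation are illustrative choices.

```python
def remove_cycles(graph):
    """Return a copy of `graph` (dict: node -> list of successors) without back edges."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    colour = {node: WHITE for node in graph}
    for targets in graph.values():
        for target in targets:
            colour.setdefault(target, WHITE)
    dag = {node: [] for node in colour}

    def visit(node):
        colour[node] = GREY
        for nxt in graph.get(node, []):
            if colour[nxt] == GREY:
                continue  # back edge: it would close a cycle, so drop it
            dag[node].append(nxt)
            if colour[nxt] == WHITE:
                visit(nxt)
        colour[node] = BLACK

    for node in list(colour):
        if colour[node] == WHITE:
            visit(node)
    return dag


# Hypothetical dependency graph in which C depends back on A.
cyclic = {"A": ["B"], "B": ["C"], "C": ["A", "D"]}
dag = remove_cycles(cyclic)
```

Which edge of a cycle is removed depends on the traversal order, so a real system would likely need a rule for choosing the edge to drop.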

In the embodiment of the present invention, the path length and path number statistics unit may specifically include the following subunits: a path length statistics subunit, configured to count the lengths of the one or more paths between the first data table and the second data table in the directed acyclic graph; and a path number statistics subunit, configured to count the number of paths from the first data table to any data table, and the number of paths from the first data table to any data table that pass through the second data table.
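The two statistics can be sketched by enumerating paths in a small directed acyclic graph. The sketch assumes that "any data table" means a leaf table with no downstream table, as the `leaf` argument of path_cnt suggests. The graph below is hypothetical: it is chosen to agree with the worked values for tables A, C, and E and is not a copy of FIG. 3.

```python
def paths_to_leaves(dag, start):
    """List every path from `start` to a leaf (a node without successors)."""
    paths = []

    def walk(node, path):
        children = dag.get(node, [])
        if not children:
            paths.append(path)
            return
        for child in children:
            walk(child, path + [child])

    walk(start, [start])
    return paths


def path_cnt(dag, first, second=None):
    """Count paths from `first` to a leaf, optionally only those through `second`."""
    paths = paths_to_leaves(dag, first)
    if second is None:
        return len(paths)
    return sum(1 for path in paths if second in path[1:])


def path_lengths(dag, first, second):
    """Return the edge count of every path from `first` to `second`."""
    lengths = []

    def walk(node, depth):
        if node == second and depth > 0:
            lengths.append(depth)
            return
        for child in dag.get(node, []):
            walk(child, depth + 1)

    walk(first, 0)
    return lengths


# Hypothetical graph: A -> C -> E and B -> D -> E.
dag = {"A": ["C"], "C": ["E"], "B": ["D"], "D": ["E"], "E": []}
total_from_a = path_cnt(dag, "A")
through_e = path_cnt(dag, "A", "E")
steps_a_to_e = path_lengths(dag, "A", "E")
```

Enumerating all paths is exponential in the worst case; for large lineage graphs a dynamic-programming count over a topological order would be the more practical choice.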

In the embodiment of the present invention, the importance coefficient determining sub-module 7024 may specifically include the following units: a usage count obtaining unit, configured to obtain the number of times the one or more fields are used within a preset time period, where the one or more fields may have corresponding field levels; and an importance coefficient determining unit, configured to determine the importance coefficient of the one or more fields according to the usage count and/or the field level, wherein the importance coefficient of the one or more fields is positively correlated with the usage count and/or the field level.

Here, level_weight(a_i) is the field level of field a_i, use_cnt(a_i) is the number of times field a_i is used within the preset time period, and n is the number of fields in the data table.
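The formula these symbols belong to is not reproduced in this text. As a rough illustration only, the sketch below assumes one form that satisfies the stated properties (positive correlation with field level and usage count, normalisation over the n fields of a table): the product of level and usage count divided by the sum of that product over all fields. The function name `importance`, the equal-weight fallback, and this exact form are assumptions, not the patent's formula. The inputs are the level and usage values listed earlier for data table A.

```python
def importance(level_weight, use_cnt):
    """Assumed form: level * usage count, normalised over the fields of one table."""
    products = {field: level_weight[field] * use_cnt[field] for field in level_weight}
    total = sum(products.values())
    if total == 0:
        # No field was used in the period: fall back to equal weights (assumption).
        return {field: 1 / len(products) for field in products}
    return {field: product / total for field, product in products.items()}


level_a = {"a1": 1, "a2": 1, "a3": 3}  # field levels of table A
usage_a = {"a1": 2, "a2": 3, "a3": 1}  # usage counts of table A
weights_a = importance(level_a, usage_a)
```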

In the embodiment of the present invention, the association degree determining sub-module 7025 may specifically include the following units: a distance coefficient determining unit, configured to determine the distance coefficient between the first data table and the second data table using the lengths of the one or more paths between the first data table and the second data table; a connectivity coefficient determining unit, configured to determine the connectivity coefficient between the first data table and the second data table using the number of paths from the first data table to any data table and the number of paths from the first data table to any data table that pass through the second data table; a field association degree determining unit, configured to determine the degree of association of one or more fields in the first data table with one or more fields in the second data table using the distance coefficient between the first data table and the second data table, the connectivity coefficient between the first data table and the second data table, the importance coefficient of the one or more fields in the first data table, and the importance coefficient of the one or more fields in the second data table, where the one or more fields in the first data table have a dependency relationship with the one or more fields in the second data table; and a data table association degree determining unit, configured to determine the degree of association of the first data table with the second data table using the degree of association of the one or more fields in the first data table with the one or more fields in the second data table.

In the embodiment of the present invention, the following formula may be used to determine the distance coefficient between the first data table and the second data table:

[formula image not reproduced]

where step(A,B) denotes the length of one path from the first data table A to the second data table B, and n is the number of paths from the first data table A to the second data table B. The following formula may be used to determine the connectivity coefficient between the first data table and the second data table:

[formula image not reproduced]

where path_cnt(A,B,leaf) is the number of paths from the first data table A to any data table that pass through the second data table B, and path_cnt(A,null,leaf) is the number of paths from the first data table A to any data table. The following formula may be used to determine the degree of association of one or more fields in the first data table with one or more fields in the second data table:

[formula image not reproduced]

where i = 1...N denotes the data tables that have a dependency relationship with field a_i in the first data table A, and m = 1...n denotes the fields b_m in the second data table B that have a dependency relationship with field a_i in the first data table A. The following formula may be used to determine the degree of association of the first data table with the second data table:

[formula image not reproduced]
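For the connectivity coefficient alone, the definitions of path_cnt(A,B,leaf) and path_cnt(A,null,leaf) together with the worked value conn_ratio(A,E) = 1/1 = 1 point to a simple ratio. The form below is an inference from those two pieces of text, not a reproduction of the original formula:

```latex
\mathrm{conn\_ratio}(A,B) \;=\; \frac{\mathrm{path\_cnt}(A,B,\mathrm{leaf})}{\mathrm{path\_cnt}(A,\mathrm{null},\mathrm{leaf})}
```

The remaining formulas are not reconstructed here, because the symbol definitions do not settle how multiple paths or multiple field pairs are aggregated.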

In the embodiment of the present invention, the data table identification sub-module 7026 may specifically include the following unit: a data table identification unit, configured to identify, according to the magnitude of the degree of association, the multiple data tables required by a data service.

In the embodiment of the present invention, the data table identification unit may specifically include the following subunits: a data table association degree obtaining subunit, configured to obtain the degree of association of each data table required by the data service; and a data table screening subunit, configured to select, according to the degree of association, a preset number of data tables from the data tables required by the data service.
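The screening subunit amounts to a top-k selection by degree of association. A minimal sketch, assuming the degrees are already available as a mapping from table name to score; the function name `screen_tables` and the sample scores are hypothetical.

```python
import heapq


def screen_tables(association, limit):
    """Return the `limit` table names with the highest degree of association."""
    return heapq.nlargest(limit, association, key=association.get)


# Hypothetical degrees of association for one data service.
association = {"E": 0.43, "C": 0.57, "B": 0.0, "D": 0.0}
selected = screen_tables(association, 2)
```

`heapq.nlargest` returns the names in descending order of score, so the result can be shown to the user as a ranked list.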

Referring to FIG. 11, a structural block diagram of Embodiment 3 of a data table identification device of the present invention is shown, which may specifically include the following modules: a first dependency relationship acquisition module 801, configured to obtain a first dependency relationship between data tables; a path length and path number statistics module 802, configured to count the path lengths and the number of paths between the data tables according to the first dependency relationship; a second dependency relationship acquisition module 803, configured to obtain a second dependency relationship between one or more fields in the data tables; an importance coefficient determining module 804, configured to determine the importance coefficient of the one or more fields according to the second dependency relationship; an association degree determining module 805, configured to determine the degree of association between the data tables using the path lengths, the number of paths, and the importance coefficient; and a data table identification module 806, configured to identify the data tables according to the degree of association.

Referring to FIG. 12, a structural block diagram of Embodiment 4, a device for determining the degree of association of data tables according to the present invention, is shown, which may specifically include the following modules: a first dependency relationship acquisition module 901, configured to obtain a first dependency relationship between data tables; a path length and path number statistics module 902, configured to count the path lengths and the number of paths between the data tables according to the first dependency relationship; a second dependency relationship acquisition module 903, configured to obtain a second dependency relationship between one or more fields in the data tables; an importance coefficient determining module 904, configured to determine the importance coefficient of the one or more fields according to the second dependency relationship; and an association degree determining module 905, configured to determine the degree of association between the data tables using the path lengths, the number of paths, and the importance coefficient.

Since the device embodiments are basically similar to the method embodiments, they are described relatively briefly; for related details, refer to the corresponding description of the method embodiments.

Referring to FIG. 13, an architecture diagram of a data table identification system of the present invention is shown. The system may include a terminal and a server. The terminal may perform the following actions: receiving an identification instruction for a data service; submitting the identification instruction to the server; receiving the data table associated with the data service sent by the server, wherein the data table associated with the data service is obtained by the server by identifying, in response to the identification instruction, the data table associated with the data service; and displaying the data table associated with the data service. The server may perform the following actions: receiving the identification instruction for the data service; identifying, in response to the identification instruction, the data table associated with the data service; and outputting the data table associated with the data service.

In the embodiment of the present invention, the step of identifying, in response to the identification instruction, the data table associated with the data service may specifically include the following sub-steps: obtaining a first dependency relationship between data tables; counting the path lengths and the number of paths between the data tables according to the first dependency relationship; obtaining a second dependency relationship between one or more fields in the data tables; determining the importance coefficient of the one or more fields according to the second dependency relationship; determining the degree of association between the data tables using the path lengths, the number of paths, and the importance coefficient; and identifying the data tables according to the degree of association.

The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar, the embodiments may be referred to one another.

Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a device, or a computer program product. Therefore, the embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.

In a typical configuration, the computer device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. Memory may include volatile memory in computer-readable media, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media. Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

The embodiments of the present invention are described with reference to flowcharts and/or block diagrams of the method, terminal device (system), and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing terminal equipment to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing terminal equipment produce a device for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and the instruction device implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions can also be loaded onto a computer or other programmable data processing terminal equipment, so that a series of operation steps are executed on the computer or other programmable terminal equipment to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable terminal equipment provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications falling within the scope of the embodiments of the present invention.

Finally, it should also be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "comprising", "including", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or terminal device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or terminal device. Without further limitation, an element defined by the phrase "including a..." does not exclude the existence of additional identical elements in the process, method, article, or terminal device that includes the element.

The data table identification method, the method for determining the degree of association of data tables, the data table identification device, the device for determining the degree of association of data tables, and the data table identification system provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. At the same time, for those of ordinary skill in the art, changes may be made to the specific implementations and the scope of application according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims

A data table identification system, the system comprising a terminal and a server, characterized in that the terminal executes: receiving an identification instruction for a data service; submitting the identification instruction to the server; receiving the data service associated with the data service sent by the server The data table associated with the data service is obtained by the server by identifying the data table associated with the data service for the identification command; the data table associated with the data service is displayed; the server executes: Receive an identification instruction for the data business; for the identification instruction, identify the data table associated with the data business; and output the data table associated with the data business, where the identification instruction is associated with the data business The steps of identifying the data tables include: obtaining the first dependency relationship between the data tables; according to the first dependency relationship, counting the path length and the number of paths between the data tables; obtaining one or more of the data tables The second dependence relationship between the fields; according to the second dependence relationship, determine the importance coefficient of the one or more fields; The path length, the number of paths, and the importance coefficient are used to determine the degree of relevance between the data tables; and the data table is identified based on the degree of relevance.

A method for identifying a data table, characterized in that the method includes: receiving an identification instruction for a data service; submitting the identification instruction to a server; receiving a data table associated with the data service sent by the server, wherein the data The data table associated with the business is obtained by the server by identifying the data table associated with the data service for the identification command; and displays the data table associated with the data service, where the data table associated with the data service is the basis The degree of relevance between the data tables is obtained by identifying the data table; the degree of relevance between the data tables is determined by the path length, path data, and the importance coefficient of one or more fields; the one or more fields The importance coefficient of the bit is determined according to the second dependency relationship between one or more fields in the obtained data table; the path length and the number of paths between the data tables are determined according to the first one between the obtained data tables Dependency statistics are obtained.

A method for identifying data tables, characterized in that the method includes: receiving an identification instruction for a data service submitted by a terminal; identifying, in response to the identification instruction, the data table associated with the data service; and sending the data table associated with the data service to the terminal, wherein the step of identifying, in response to the identification instruction, the data table associated with the data service includes: obtaining a first dependency relationship between data tables; counting, according to the first dependency relationship, the path length and the number of paths between the data tables; obtaining a second dependency relationship between one or more fields in the data tables; determining, according to the second dependency relationship, an importance coefficient of the one or more fields; determining the degree of association between the data tables using the path length, the number of paths, and the importance coefficient; and identifying the data table according to the degree of association.

The method according to item 3 of the scope of patent application, wherein the step of counting the path length and the number of paths between the data tables according to the first dependency relationship includes: constructing, for the first dependency relationship, a directed acyclic graph between the data tables; and counting the path length and the number of paths in the directed acyclic graph.

The method according to item 4 of the scope of patent application, wherein the step of constructing a directed acyclic graph between the data tables for the first dependency relationship includes: constructing, according to the order corresponding to the first dependency relationship, a directed graph with the data tables as nodes; and deleting the cycles in the directed graph to obtain the directed acyclic graph between the data tables.
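As an illustration only, the graph construction and cycle deletion described above can be sketched as follows. The claims do not state how cycles are to be deleted; this sketch assumes the first dependency relationship is given as hypothetical (upstream, downstream) table-name pairs and drops every back edge found during a depth-first search, which is one common way to obtain a directed acyclic graph. The function name `build_dag` is hypothetical.

```python
from collections import defaultdict


def build_dag(dependencies):
    """Build a directed graph with data tables as nodes, then drop back
    edges found by depth-first search so that the result is acyclic.

    `dependencies` is an iterable of (upstream, downstream) pairs; this
    input shape is an assumption made for the sketch.
    """
    graph = defaultdict(list)
    nodes = []  # keeps first-seen order so the result is deterministic
    for src, dst in dependencies:
        if dst not in graph[src]:
            graph[src].append(dst)
        for name in (src, dst):
            if name not in nodes:
                nodes.append(name)

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = {name: WHITE for name in nodes}
    dag = {name: [] for name in nodes}

    def visit(node):
        color[node] = GRAY
        for nxt in graph[node]:
            if color[nxt] == GRAY:
                continue  # back edge: it would close a cycle, so drop it
            dag[node].append(nxt)
            if color[nxt] == WHITE:
                visit(nxt)
        color[node] = BLACK

    for name in nodes:
        if color[name] == WHITE:
            visit(name)
    return dag


# Hypothetical tables a, b, c where c -> a closes a cycle.
example = build_dag([("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")])
```

Which edge of a cycle is removed depends on the traversal order; a real system might instead prefer the edge with the weakest dependency, which the claims leave open.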

The method according to item 4 or 5 of the scope of patent application, wherein the step of counting the path length and the number of paths in the directed acyclic graph includes: counting the length of one or more paths between a first data table and a second data table in the directed acyclic graph, the number of paths from the first data table to any data table, and the number of paths from the first data table to any data table that pass through the second data table.
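The three statistics named above can be gathered in one traversal. The following is a minimal sketch, assuming the directed acyclic graph is a hypothetical dictionary mapping each table name to a list of its downstream tables, and assuming that a path ending at the second data table counts as passing through it; neither assumption comes from the claims.

```python
def path_statistics(dag, first, second):
    """Return (lengths, total, through) for a DAG given as {node: [children]}.

    lengths: edge count of every path from `first` to `second`
    total:   number of paths from `first` to any other table
    through: number of those paths that reach `second` (assumed to include
             paths that end at `second`)
    """
    lengths = []
    total = 0
    through = 0

    def walk(node, depth, seen_second):
        nonlocal total, through
        for nxt in dag.get(node, []):
            hit = seen_second or nxt == second
            total += 1  # each extension is one distinct path from `first`
            if hit:
                through += 1
            if nxt == second:
                lengths.append(depth + 1)
            walk(nxt, depth + 1, hit)

    walk(first, 0, False)
    return lengths, total, through


# Hypothetical graph: a -> b -> c -> d and a -> c.
sample = {"a": ["b", "c"], "b": ["c"], "c": ["d"], "d": []}
stats = path_statistics(sample, "a", "c")
```

Enumerating every path is exponential in the worst case; for large graphs the counts could be obtained by dynamic programming over a topological order instead.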

The method according to item 3 of the scope of patent application, wherein the step of determining the importance coefficient of the one or more fields according to the second dependency relationship includes: obtaining the number of uses of the one or more fields within a preset time period, the one or more fields having a corresponding field level; and determining the importance coefficient of the one or more fields according to the number of uses and/or the field level, wherein the importance coefficient of the one or more fields is positively correlated with the number of uses and/or the field level.
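The claim only requires that the importance coefficient be positively correlated with the number of uses and/or the field level; it does not give a formula. As a hedged sketch, one function with that property is shown below. The logarithmic damping, the numeric field level, and the name `importance_coefficient` are all assumptions made for illustration.

```python
import math


def importance_coefficient(use_count, field_level):
    """Assumed formula: grows with both the number of uses in the preset
    time period and a positive numeric field level."""
    if use_count < 0 or field_level <= 0:
        raise ValueError("use_count must be >= 0 and field_level > 0")
    # log1p damps very large usage counts so that heavily used fields do
    # not dominate; this choice is illustrative, not taken from the claims.
    return field_level * (1.0 + math.log1p(use_count))


low = importance_coefficient(5, 2)
more_uses = importance_coefficient(10, 2)
higher_level = importance_coefficient(5, 3)
```

Any other monotonically increasing combination (for example a weighted sum) would satisfy the same stated condition.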

The method according to item 3 of the scope of patent application, wherein the step of using the path length, the number of paths, and the importance coefficient to determine the degree of association between the data tables includes: determining a distance coefficient between a first data table and a second data table using the length of one or more paths between the first data table and the second data table; determining a connectivity coefficient between the first data table and the second data table using the number of paths from the first data table to any data table and the number of paths from the first data table to any data table that pass through the second data table; determining the degree of association of one or more fields in the first data table with one or more fields in the second data table using the distance coefficient between the first data table and the second data table, the connectivity coefficient between the first data table and the second data table, the importance coefficient of the one or more fields in the first data table, and the importance coefficient of the one or more fields in the second data table, wherein the one or more fields in the first data table have a dependency relationship with the one or more fields in the second data table; and determining the degree of association of the first data table with the second data table using the degree of association of the one or more fields in the first data table with the one or more fields in the second data table.
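The claim names the inputs of each coefficient but not the formulas. The sketch below shows one way the four steps could be chained, under loudly stated assumptions: the distance coefficient is taken as the inverse of the shortest path length, the connectivity coefficient as the share of paths that pass through the second table, the field-level association as the product of both coefficients and the two importance coefficients, and the table-level association as the sum over dependent field pairs. All function names and formulas are hypothetical.

```python
def distance_coefficient(path_lengths):
    # Assumption: closer tables are more strongly associated, so use the
    # inverse of the shortest path length (0.0 when no path exists).
    return 1.0 / min(path_lengths) if path_lengths else 0.0


def connectivity_coefficient(paths_through_second, paths_total):
    # Assumption: fraction of all paths leaving the first table that pass
    # through the second table.
    return paths_through_second / paths_total if paths_total else 0.0


def table_association(path_lengths, paths_through_second, paths_total, field_pairs):
    """`field_pairs` lists (importance_in_first, importance_in_second) for
    each pair of fields that have a dependency relationship."""
    d = distance_coefficient(path_lengths)
    c = connectivity_coefficient(paths_through_second, paths_total)
    # Field-level association for each dependent pair (assumed product form).
    field_scores = [d * c * imp_a * imp_b for imp_a, imp_b in field_pairs]
    # Table-level association (assumed to be the sum of field-level scores).
    return sum(field_scores)


score = table_association([2, 1], 4, 5, [(1.0, 2.0), (0.5, 1.0)])
```

With these inputs the distance coefficient is 1.0 and the connectivity coefficient is 0.8, so the two field pairs contribute 1.6 and 0.4.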

The method according to item 3 or 4 or 5 or 7 or 8 of the scope of patent application, wherein the step of identifying the data table according to the degree of association includes: identifying, according to the degree of association, multiple data tables required by the data service.

The method according to item 9 of the scope of patent application, wherein the step of identifying the multiple data tables required by the data service according to the degree of association includes: obtaining the respective degrees of association of the data tables required by the data service; and filtering out, according to the degrees of association, a preset number of data tables from the data tables required by the data service.
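The filtering step can be read as a top-N selection. A minimal sketch, assuming the degrees of association are held in a hypothetical dictionary keyed by table name and that the tables with the highest degrees are the ones kept (the claim only says the selection is made according to the degree of association):

```python
import heapq


def top_tables(association_by_table, preset_number):
    """Return the names of the `preset_number` tables with the highest
    degree of association, highest first."""
    best = heapq.nlargest(
        preset_number, association_by_table.items(), key=lambda item: item[1]
    )
    return [name for name, _ in best]


# Hypothetical scores for three tables.
selected = top_tables({"t1": 0.9, "t2": 0.2, "t3": 0.5}, 2)
```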

A method for identifying data tables, characterized in that the method includes: obtaining a first dependency relationship between data tables; counting, according to the first dependency relationship, the path length and the number of paths between the data tables; obtaining a second dependency relationship between one or more fields in the data tables; determining, according to the second dependency relationship, an importance coefficient of the one or more fields; determining the degree of association between the data tables using the path length, the number of paths, and the importance coefficient; and identifying the data table according to the degree of association.

A method for determining the degree of association of data tables, characterized in that the method includes: obtaining a first dependency relationship between data tables; counting, according to the first dependency relationship, the path length and the number of paths between the data tables; obtaining a second dependency relationship between one or more fields in the data tables; determining, according to the second dependency relationship, an importance coefficient of the one or more fields; and determining the degree of association between the data tables using the path length, the number of paths, and the importance coefficient.

A device for identifying data tables, characterized in that the device includes: a first receiving module for receiving an identification instruction for a data service; a submission module for submitting the identification instruction to a server; a second receiving module for receiving a data table associated with the data service sent by the server, wherein the data table associated with the data service is obtained by the server by identifying, in response to the identification instruction, the data table associated with the data service; and a display module for displaying the data table associated with the data service, wherein the data table associated with the data service is obtained by identifying the data table according to the degree of association between data tables; the degree of association between the data tables is determined from the path length, the number of paths, and the importance coefficient of one or more fields; the importance coefficient of the one or more fields is determined according to an obtained second dependency relationship between one or more fields in the data tables; and the path length and the number of paths between the data tables are counted according to an obtained first dependency relationship between the data tables.

An identification device for a data table, characterized in that the device includes: a third receiving module for receiving an identification instruction for a data service submitted by a terminal; an identification module for identifying, in response to the identification instruction, the data table associated with the data service; and a sending module for sending the data table associated with the data service to the terminal, wherein the identification module includes: a first dependency relationship acquisition sub-module for acquiring a first dependency relationship between data tables; a path length and path number statistics sub-module for counting, according to the first dependency relationship, the path length and the number of paths between the data tables; a second dependency relationship acquisition sub-module for acquiring a second dependency relationship between one or more fields in the data tables; an importance coefficient determination sub-module for determining, according to the second dependency relationship, an importance coefficient of the one or more fields; an association degree determination sub-module for determining the degree of association between the data tables using the path length, the number of paths, and the importance coefficient; and a data table identification sub-module for identifying the data table according to the degree of association.

The device according to item 14 of the scope of patent application, wherein the path length and path number statistics sub-module includes: a directed acyclic graph construction unit for constructing, for the first dependency relationship, a directed acyclic graph between the data tables; and a path length and path number statistics unit for counting the path length and the number of paths in the directed acyclic graph.

The device according to item 15 of the scope of patent application, wherein the directed acyclic graph construction unit includes: a directed graph construction subunit for constructing, according to the order corresponding to the first dependency relationship, a directed graph with the data tables as nodes; and a directed acyclic graph obtaining subunit for deleting the cycles in the directed graph to obtain the directed acyclic graph between the data tables.

The device according to item 15 or 16 of the scope of patent application, wherein the path length and path number statistics unit includes: a path length statistics subunit for counting the length of one or more paths between a first data table and a second data table in the directed acyclic graph; and a path number statistics subunit for counting the number of paths from the first data table to any data table, and the number of paths from the first data table to any data table that pass through the second data table.

The device according to item 14 of the scope of patent application, wherein the importance coefficient determination sub-module includes: a number-of-uses obtaining unit for obtaining the number of uses of the one or more fields within a preset time period, the one or more fields having a corresponding field level; and an importance coefficient determining unit for determining the importance coefficient of the one or more fields according to the number of uses and/or the field level, wherein the importance coefficient of the one or more fields is positively correlated with the number of uses and/or the field level.

The device according to item 14 of the scope of patent application, wherein the association degree determination sub-module includes: a distance coefficient determining unit for determining a distance coefficient between a first data table and a second data table using the length of one or more paths between the first data table and the second data table; a connectivity coefficient determining unit for determining a connectivity coefficient between the first data table and the second data table using the number of paths from the first data table to any data table and the number of paths from the first data table to any data table that pass through the second data table; a field association degree determining unit for determining the degree of association of one or more fields in the first data table with one or more fields in the second data table using the distance coefficient between the first data table and the second data table, the connectivity coefficient between the first data table and the second data table, the importance coefficient of the one or more fields in the first data table, and the importance coefficient of the one or more fields in the second data table, wherein the one or more fields in the first data table have a dependency relationship with the one or more fields in the second data table; and a data table association degree determining unit for determining the degree of association of the first data table with the second data table using the degree of association of the one or more fields in the first data table with the one or more fields in the second data table.

The device according to item 14 or 15 or 16 or 18 or 19 of the scope of patent application, wherein the data table identification sub-module includes: a data table identification unit for identifying, according to the degree of association, multiple data tables required by the data service.

The device according to item 20 of the scope of patent application, wherein the data table identification unit includes: a data table association degree acquisition subunit for obtaining the respective degrees of association of the data tables required by the data service; and a data table filtering subunit for filtering out, according to the degrees of association, a preset number of data tables from the data tables required by the data service.

A device for identifying data tables, characterized in that the device comprises: a first dependency relationship acquisition module for acquiring a first dependency relationship between data tables; a path length and path number statistics module for counting, according to the first dependency relationship, the path length and the number of paths between the data tables; a second dependency relationship acquisition module for acquiring a second dependency relationship between one or more fields in the data tables; an importance coefficient determination module for determining, according to the second dependency relationship, an importance coefficient of the one or more fields; an association degree determination module for determining the degree of association between the data tables using the path length, the number of paths, and the importance coefficient; and a data table identification module for identifying the data table according to the degree of association.

A device for determining the degree of association of data tables, characterized in that the device comprises: a first dependency relationship acquisition module for acquiring a first dependency relationship between data tables; a path length and path number statistics module for counting, according to the first dependency relationship, the path length and the number of paths between the data tables; a second dependency relationship acquisition module for acquiring a second dependency relationship between one or more fields in the data tables; an importance coefficient determination module for determining, according to the second dependency relationship, an importance coefficient of the one or more fields; and an association degree determination module for determining the degree of association between the data tables using the path length, the number of paths, and the importance coefficient.