TWI669668B

TWI669668B - Data management device and data management method

Info

Publication number: TWI669668B
Application number: TW107109645A
Authority: TW
Inventors: 葉集閔
Original assignee: Mega International Commercial Bank Co., Ltd. (兆豐國際商業銀行股份有限公司)
Priority date: 2018-03-21
Filing date: 2018-03-21
Publication date: 2019-08-21
Also published as: TW201941126A

Abstract

The invention provides a data management device and a data management method. The data management device includes a processor and a memory coupled to the processor. The processor obtains a plurality of data entities and imports a plurality of metadata corresponding to the data entities; parses a data manipulation language to establish data relationships of the data entities; establishes a node relationship graph of the data relationships according to a graph clustering algorithm, wherein a plurality of nodes of the node relationship graph correspond to the data entities; and exports the node relationship graph.

Description

Data management device and data management method

The present invention relates to a data management device and a data management method, and more particularly to a data management device and a data management method capable of clearly showing data relationships and the importance of data.

When a large amount of data is processed, a data lineage is usually established to achieve information transparency, so that the origin of the data and its subsequent processing steps can be understood. A data lineage can also be used for tracing errors and for analyzing the impact caused by errors in part of the data.

A database often contains thousands of data entities, and each data entity may have multiple data fields. A data lineage is usually built from the relationships between data entities described in a Data Manipulation Language (DML). However, it is often difficult to quickly understand the relationships between data entities from a text-based data lineage. Therefore, how to build a data lineage that lets administrators clearly understand data relationships and the importance of data is a goal that those skilled in the art strive for.

The invention provides a data management device and a data management method capable of clearly displaying data relationships and the importance of data.

The invention provides a data management device including a processor and a memory. The memory is coupled to the processor and stores a plurality of data entities. The processor obtains the data entities and imports a plurality of metadata corresponding to the data entities; parses a Data Manipulation Language (DML) to establish data relationships of the data entities; establishes a node relationship graph of the data relationships according to a graph clustering algorithm, wherein a plurality of nodes of the node relationship graph correspond to the data entities; and exports the node relationship graph.

In an embodiment of the invention, the processor parses the data manipulation language to establish data relationships of the data entities at a plurality of levels. The levels include a system level, a job level, a task level, an instruction level, a data table level, and a field level.

In an embodiment of the invention, the processor divides the node relationship graph into at least one cluster according to the graph clustering algorithm. Each cluster includes a center point, which is the node having the highest degree in that cluster.

In an embodiment of the invention, the node relationship graph is a field relationship graph of a plurality of data fields of each data entity.

In an embodiment of the invention, the processor calculates an importance index of the data entity corresponding to each node according to the degree of that node in the node relationship graph.

The invention provides a data management method including: obtaining data entities and importing a plurality of metadata corresponding to the data entities; parsing a data manipulation language to establish data relationships of the data entities; establishing a node relationship graph of the data relationships according to a graph clustering algorithm, wherein a plurality of nodes of the node relationship graph correspond to the data entities; and exporting the node relationship graph.

In an embodiment of the invention, the data management method further includes parsing the data manipulation language to establish data relationships of the data entities at a plurality of levels. The levels include a system level, a job level, a task level, an instruction level, a data table level, and a field level.

In an embodiment of the invention, the data management method further includes dividing the node relationship graph into at least one cluster according to the graph clustering algorithm. Each cluster includes a center point, which is the node having the highest degree in that cluster.

In an embodiment of the invention, the data management method further includes calculating an importance index of the data entity corresponding to each node according to the degree of that node in the node relationship graph.

Based on the above, the data management device and the data management method of the invention parse the data manipulation language to establish data relationships of the data entities. They then use a graph clustering algorithm to visualize those relationships. As a result, information such as clusters of data nodes and their center points can be displayed, and the importance index of each data entity can be obtained from the node relationship graph.

To make the above features and advantages of the invention more comprehensible, embodiments are described in detail below with reference to the accompanying drawings.

FIG. 1 is a block diagram of a data management device according to an embodiment of the invention.

Referring to FIG. 1, the data management device 100 of the invention includes a processor 110, a memory 120, and a communication interface 130. The data management device 100 may be one or more servers that receive user data through the communication interface 130 and store or temporarily store the user data in the memory 120. Through the processor 110, the data management device 100 can also process and manage the user data and clearly display the relationships between the data.

The processor 110 may be a central processing unit (CPU), another programmable general-purpose or special-purpose microprocessor, a digital signal processor (DSP), a programmable controller, an application-specific integrated circuit (ASIC), another similar component, or a combination of the above.

The memory 120 may be any type of fixed or removable random access memory (RAM), read-only memory (ROM), flash memory, hard disk drive (HDD), solid state drive (SSD), a similar component, or a combination of the above.

In an embodiment, the communication interface 130 may be a wireless communication interface supporting signal transmission for systems such as Global System for Mobile Communications (GSM), Personal Handy-phone System (PHS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), Worldwide Interoperability for Microwave Access (WiMAX), Wireless Fidelity (Wi-Fi), or Bluetooth. In another embodiment, the communication interface 130 may be any wired communication interface capable of transmitting or receiving data, such as a Universal Serial Bus (USB) interface or an Inter-Integrated Circuit (I2C) bus. The invention does not limit the type of the communication interface 130.

[First Embodiment]

FIG. 2 is a schematic diagram of the relationship between a data manipulation language and data entities according to an embodiment of the invention. FIG. 3 is a schematic diagram of relationships between data tables according to an embodiment of the invention. FIG. 4 is a schematic diagram of relationships between data tables and processing programs according to an embodiment of the invention.

Referring to FIG. 2 to FIG. 4, in this embodiment, the data manipulation language corresponds to the connection relationships between a data entity (for example, data entity 200) and other data entities. The data entity 200 may include a data table 210 and data fields 220. When there are many data entities, the data management device 100 builds a data table relationship graph for the data tables (for example, data table 310 and other data tables) to describe the target and source relationships of each data table. In this embodiment, the data management device 100 also generates a relationship graph between data tables (for example, data table 410, shown as a rectangle) and processing programs (for example, processing program 430, shown as an ellipse). From this graph, a system administrator can see which data tables a processing program reads data from and which data tables it writes data to.
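The table and program relationships above can be sketched as a small bipartite structure. This is an illustrative example that assumes each processing program is described by the tables it reads and the tables it writes. It derives the table-to-table edges of a FIG. 3 style graph from a FIG. 4 style table and program graph. The program and table names are hypothetical, and the patent does not specify this representation.

```python
from collections import defaultdict


def table_relationships(programs):
    """Derive table-to-table edges (FIG. 3 style) from a table/program graph (FIG. 4 style).

    programs: dict mapping a processing-program name to a pair
              (tables it reads from, tables it writes to).
    Returns a dict mapping (source_table, target_table) to the set of programs
    that move data along that edge.
    """
    edges = defaultdict(set)
    for program, (reads, writes) in programs.items():
        for source in reads:
            for target in writes:
                edges[(source, target)].add(program)
    return dict(edges)


def programs_touching(programs, table):
    """Return (programs that read the table, programs that write the table)."""
    readers = {p for p, (reads, _) in programs.items() if table in reads}
    writers = {p for p, (_, writes) in programs.items() if table in writes}
    return readers, writers


# Hypothetical programs and tables, for illustration only.
programs = {
    "load_summary": (["stg.customer", "stg.txn"], ["dw.customer_summary"]),
    "build_report": (["dw.customer_summary"], ["mart.report"]),
}
print(table_relationships(programs))
print(programs_touching(programs, "dw.customer_summary"))
```

Each table-level edge keeps the set of programs behind it. This lets an administrator move from the simpler table graph back to the responsible programs.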

FIG. 5 is a flowchart of a data management method according to an embodiment of the invention.

Referring to FIG. 5, in step S501, data entities are obtained.

In step S502, metadata is obtained. The metadata can be regarded as a data dictionary that explains the content of each data entity.

In step S503, the metadata is imported.

In step S504, the data manipulation language is obtained. The data manipulation language may be an SQL script.

In step S505, the data manipulation language is parsed.

In step S506, data relationships are established.
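Steps S504 to S506 can be illustrated with a deliberately naive parser. This sketch assumes the DML consists of `INSERT INTO ... SELECT` statements separated by semicolons. It treats the INSERT target as the target entity and every table named after `FROM` or `JOIN` as a source. This regex approach is not the patent's parser. It misses UPDATE and MERGE statements and CTE names, and it breaks on semicolons inside string literals. A production system would use a full SQL parser.

```python
import re

# Naive patterns: the target of an INSERT and every table named after FROM or JOIN.
INSERT_TARGET = re.compile(r"\bINSERT\s+INTO\s+([\w.]+)", re.IGNORECASE)
SOURCE_TABLE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def parse_dml(script):
    """Return a set of (source_table, target_table) edges found in an SQL script."""
    edges = set()
    for statement in script.split(";"):
        match = INSERT_TARGET.search(statement)
        if match is None:
            continue  # this sketch only handles INSERT ... SELECT statements
        target = match.group(1)
        for source in SOURCE_TABLE.findall(statement):
            if source.lower() != target.lower():
                edges.add((source, target))
    return edges


# Hypothetical SQL script, for illustration only.
script = """
INSERT INTO dw.customer_summary
SELECT c.id, SUM(t.amount)
FROM stg.customer c JOIN stg.txn t ON c.id = t.cust_id
GROUP BY c.id;
INSERT INTO mart.report SELECT * FROM dw.customer_summary;
"""
for edge in sorted(parse_dml(script)):
    print(edge)
```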

In step S507, the data relationships are exported.

In step S508, the data lineage is obtained. Specifically, once the complete data lineage has been established, the data management device 100 can trace data back to its source, from the lower layers up to the upper layers.
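Tracing data back to its source in step S508 amounts to walking the relationship edges in reverse. This minimal sketch assumes the data relationships are available as (source, target) pairs, like the ones produced in step S506. It uses a breadth-first search, which the patent does not prescribe. The entity names are hypothetical.

```python
from collections import defaultdict, deque


def trace_upstream(edges, start):
    """Return every entity that feeds, directly or indirectly, into `start`."""
    parents = defaultdict(set)
    for source, target in edges:
        parents[target].add(source)
    found, queue = set(), deque([start])
    while queue:  # breadth-first walk against the edge direction
        node = queue.popleft()
        for parent in parents[node]:
            if parent not in found:
                found.add(parent)
                queue.append(parent)
    return found


def root_sources(edges, start):
    """Upstream entities that have no parents of their own, i.e. the original sources."""
    targets = {target for _, target in edges}
    return {n for n in trace_upstream(edges, start) if n not in targets}


# Hypothetical lineage edges, for illustration only.
edges = {
    ("stg.customer", "dw.customer_summary"),
    ("stg.txn", "dw.customer_summary"),
    ("dw.customer_summary", "mart.report"),
}
print(sorted(trace_upstream(edges, "mart.report")))
print(sorted(root_sources(edges, "mart.report")))
```

The `found` set also stops the walk from looping if the lineage happens to contain a cycle.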

In step S509, an impact analysis is obtained. Specifically, when upper-layer data is changed, the data management device 100 can determine from the data relationships which lower-layer data is affected by the change.
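Impact analysis in step S509 is the mirror image of lineage tracing: it follows the edges forward from the changed entity. This sketch makes the same assumption of (source, target) edge pairs with hypothetical names. The breadth-first traversal is an illustrative choice.

```python
from collections import defaultdict, deque


def impact_of_change(edges, changed):
    """Return every entity that reads, directly or indirectly, from `changed`."""
    children = defaultdict(set)
    for source, target in edges:
        children[source].add(target)
    affected, queue = set(), deque([changed])
    while queue:  # breadth-first walk along the edge direction
        node = queue.popleft()
        for child in children[node]:
            if child not in affected:
                affected.add(child)
                queue.append(child)
    return affected


# Hypothetical lineage edges, for illustration only.
edges = {
    ("stg.customer", "dw.customer_summary"),
    ("stg.txn", "dw.customer_summary"),
    ("dw.customer_summary", "mart.report"),
}
print(sorted(impact_of_change(edges, "stg.txn")))
```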

In the first embodiment, there are a large number of data sources and data processing programs. The connections in the data table relationship graph are therefore very complex, and it is difficult to clarify the relationships between data fields. A second embodiment is proposed below to address this shortcoming of the first embodiment.

[Second Embodiment]

FIG. 6 is a flowchart of a data management method according to an embodiment of the invention. FIG. 7 is a node distribution graph of data entities according to an embodiment of the invention. FIG. 8 is a schematic diagram of outlier nodes according to an embodiment of the invention.

Referring to FIG. 6, steps S601 to S606 are similar to steps S501 to S506 of FIG. 5 and are not described again.

In step S607, the data relationships are exported using network analysis. Specifically, the data management device 100 may establish a node relationship graph of the data relationships according to a graph clustering algorithm and export the node relationship graph. The graph clustering algorithm is, for example, the Highly Connected Subgraphs (HCS) clustering algorithm or another clustering algorithm. The nodes of the node relationship graph correspond to the data entities.
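The HCS algorithm named above can be sketched in pure Python. In the usual formulation, HCS repeatedly splits a graph along a minimum edge cut. It stops when a part is "highly connected", meaning its edge connectivity exceeds half its number of nodes. This sketch uses the Stoer-Wagner algorithm to find each minimum cut and treats lineage edges as undirected. It returns single nodes as their own groups. These are illustrative choices, not details given in the patent, and the node names are hypothetical.

```python
def build_adjacency(nodes, edges):
    """Undirected adjacency sets; lineage edge direction is ignored for clustering."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def min_cut(nodes, adj):
    """Stoer-Wagner global minimum cut of the subgraph induced by `nodes`.

    Returns (cut_value, one_side), where one_side is a set of original nodes.
    """
    weight = {u: {v: 1 for v in adj[u] if v in nodes} for u in nodes}
    members = {u: {u} for u in nodes}  # super-node -> original nodes merged into it
    active = set(nodes)
    best_value, best_side = float("inf"), set()
    while len(active) > 1:
        # One phase: grow an ordering by always adding the most tightly connected node.
        start = next(iter(active))
        order = [start]
        connection = {v: weight[start].get(v, 0) for v in active if v != start}
        while connection:
            nxt = max(connection, key=connection.get)
            phase_cut = connection.pop(nxt)
            order.append(nxt)
            for v, w in weight[nxt].items():
                if v in connection:
                    connection[v] += w
        s, t = order[-2], order[-1]
        if phase_cut < best_value:  # cut separating t from everything else
            best_value, best_side = phase_cut, set(members[t])
        # Merge the last node t into s, summing parallel edge weights.
        members[s] |= members.pop(t)
        for v, w in weight.pop(t).items():
            if v != s:
                weight[s][v] = weight[s].get(v, 0) + w
                weight[v][s] = weight[v].get(s, 0) + w
            del weight[v][t]
        active.remove(t)
    return best_value, best_side


def hcs(nodes, adj):
    """Split by minimum cuts until every part is highly connected (cut > n / 2)."""
    nodes = set(nodes)
    if len(nodes) < 2:
        return [nodes]  # a single node: loosely connected or an outlier
    cut_value, side = min_cut(nodes, adj)
    if cut_value > len(nodes) / 2:
        return [nodes]
    return hcs(side, adj) + hcs(nodes - side, adj)


# Hypothetical graph: two triangles joined by one edge, plus an isolated node g.
nodes = list("abcdefg")
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
for cluster in hcs(nodes, build_adjacency(nodes, edges)):
    print(sorted(cluster))
```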

In step S608, the data centrality of the nodes is established. Specifically, the data management device 100 may divide the nodes of the data entities into a plurality of clusters. Taking FIG. 7 as an example, a cluster has a center point 700 and sub-clusters 710 and 720, and sub-cluster 710 has its own sub-center point 711. The center point 700 may be the node with the highest degree in the cluster. In other words, it is the node in the cluster that is directly connected to the most nodes. Similarly, the sub-center point 711 is the node with the highest degree in sub-cluster 710. Once the data centrality of the nodes is established, an administrator can clearly understand the cluster relationships between data fields and use them to analyze data characteristics. The administrator can also assess the importance of the data from the data clusters. For example, the higher a node's degree, the higher its importance index within the cluster.
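Finding a cluster's center point and scoring importance by degree can be sketched as follows. The patent says the importance index is calculated from node degree but gives no formula. This example assumes normalized degree centrality, which is a node's degree divided by N - 1. It also assumes ties for the center are broken alphabetically. Both are illustrative assumptions, and the node names are hypothetical.

```python
def build_adjacency(edges):
    """Undirected adjacency sets built from (u, v) pairs; self-loops are ignored."""
    adj = {}
    for u, v in edges:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj


def cluster_center(cluster, adj):
    """Node with the highest degree counted inside the cluster; ties broken alphabetically."""
    cluster = set(cluster)
    return max(sorted(cluster), key=lambda n: len(adj.get(n, set()) & cluster))


def importance_index(adj):
    """Assumed importance index: degree divided by the largest possible degree (N - 1)."""
    n = len(adj)
    if n < 2:
        return {node: 0.0 for node in adj}
    return {node: len(neighbors) / (n - 1) for node, neighbors in adj.items()}


# Hypothetical cluster with one hub node, for illustration only.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("a", "b")]
adj = build_adjacency(edges)
print(cluster_center({"hub", "a", "b", "c"}, adj))
print(importance_index(adj))
```

Counting degree only inside the cluster lets the same function pick a sub-center point, such as 711 in FIG. 7, when it is given a sub-cluster.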

In step S609, cliques and sub-clusters are obtained. Specifically, a clique is defined as a complete subgraph, meaning that any two nodes in a clique are directly adjacent to each other. When the data management device 100 obtains information about a sub-cluster or clique, it can determine how tightly the data in that sub-cluster or clique are related.
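Finding the cliques described in step S609 is a standard graph problem. This sketch enumerates maximal cliques with the Bron-Kerbosch algorithm with pivoting. The patent does not name a clique-finding method, so Bron-Kerbosch is an assumed choice. The graph is assumed to be given as undirected adjacency sets with hypothetical node names.

```python
def maximal_cliques(adj):
    """Bron-Kerbosch with pivoting over an undirected graph given as {node: set(neighbors)}."""
    cliques = []

    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(clique)  # nothing can extend this clique, so it is maximal
            return
        # Any maximal clique must contain the pivot or one of its non-neighbors.
        pivot = max(candidates | excluded, key=lambda u: len(adj[u] & candidates))
        for v in list(candidates - adj[pivot]):
            expand(clique | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return cliques


# Hypothetical graph: triangle a-b-c plus a pendant edge c-d.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
for clique in maximal_cliques(adj):
    print(sorted(clique))
```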

In step S610, the positions and roles of the nodes are obtained. Specifically, when the data management device 100 generates the node relationship graph, it sets the distance between nodes according to how closely they are related: the more closely two nodes are related, the shorter the distance between them in the graph. As a result, outlier nodes such as outlier nodes 810 and 820 may appear, as shown in FIG. 8. Outlier nodes 810 and 820 have no relationship with any other node in the node relationship graph, so no edge connects them to other nodes.
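Outlier nodes like 810 and 820 are simply nodes with no edge to any other node. This short sketch finds them from a node list and (source, target) edges. It assumes that an edge from an entity to itself does not count as a relationship with another node. The entity names are hypothetical.

```python
def outlier_nodes(nodes, edges):
    """Nodes that share no edge with any other node (degree 0, ignoring self-loops)."""
    connected = {n for u, v in edges if u != v for n in (u, v)}
    return sorted(set(nodes) - connected)


# Hypothetical entities and relationships, for illustration only.
nodes = ["stg.customer", "stg.txn", "dw.customer_summary", "tmp.unused_a", "tmp.unused_b"]
edges = [
    ("stg.customer", "dw.customer_summary"),
    ("stg.txn", "dw.customer_summary"),
    ("tmp.unused_b", "tmp.unused_b"),  # a self-reference does not connect it to others
]
print(outlier_nodes(nodes, edges))
```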

FIG. 9 is a schematic diagram of hierarchically building data entity relationship graphs according to an embodiment of the invention.

In step S901, a data entity relationship graph of the Extract, Transform, Load (ETL) system is built.

In step S902, data entity relationship tables are built for the different ETL execution frequencies. Specifically, depending on business or execution characteristics, data entities may be processed at different frequencies, such as monthly, weekly, or daily. The data management device 100 therefore builds a separate data entity relationship graph for the data entities of each execution frequency.

In step S903, a data entity relationship graph of each ETL execution group is built. An ETL execution group belongs to the job level.

In step S904, a data entity relationship graph of each ETL execution file is built. An ETL execution file (for example, an SQL file) belongs to the task level.

In step S905, a data entity relationship graph of each instruction (SQL statement) in the SQL file is built.

In step S906, a data entity relationship graph of each virtual or physical table in the SQL instructions is built. Each virtual or physical table belongs to an SQL block.

In step S907, a data entity relationship graph of each target field of the SQL block is built.

In step S908, a data entity relationship graph of each source field of the SQL block is built.
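The layered construction in steps S901 to S908 can be sketched by recording each field-level relationship together with the execution context it came from. The graph can then be collapsed to the level of interest. This is an illustrative data model, not one defined in the patent. The `FieldEdge` class, its attribute names, and the `LEVEL_KEYS` mapping are hypothetical. The job name `daily` merely hints at the execution frequency of step S902.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FieldEdge:
    """Hypothetical record of one field-level relationship and where it was found."""
    system: str               # ETL system (system level)
    job: str                  # ETL execution group (job level)
    task: str                 # ETL execution file, such as an SQL file (task level)
    statement: int            # position of the SQL instruction in the file (instruction level)
    source: Tuple[str, str]   # (table, field) the value is read from
    target: Tuple[str, str]   # (table, field) the value is written to


# Attributes that identify a graph at each execution level, from coarse to fine.
LEVEL_KEYS = {
    "system": ("system",),
    "job": ("system", "job"),
    "task": ("system", "job", "task"),
    "statement": ("system", "job", "task", "statement"),
}


def table_graphs_by_level(edges, level):
    """Group field-level edges by the chosen level and collapse them to table-level edges."""
    names = LEVEL_KEYS[level]
    graphs = defaultdict(set)
    for edge in edges:
        key = tuple(getattr(edge, name) for name in names)
        graphs[key].add((edge.source[0], edge.target[0]))
    return dict(graphs)


# Hypothetical field-level lineage, for illustration only.
edges = [
    FieldEdge("ETL", "daily", "load_summary.sql", 1,
              ("stg.txn", "amount"), ("dw.customer_summary", "total_amount")),
    FieldEdge("ETL", "daily", "load_summary.sql", 1,
              ("stg.customer", "id"), ("dw.customer_summary", "customer_id")),
    FieldEdge("ETL", "daily", "build_report.sql", 1,
              ("dw.customer_summary", "total_amount"), ("mart.report", "total")),
]
print(table_graphs_by_level(edges, "job"))
print(table_graphs_by_level(edges, "task"))
```

Keeping the field-level records as the single source of truth means each coarser graph can be derived from them rather than stored separately.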

In summary, the data management device and the data management method of the invention parse the data manipulation language to establish data relationships of the data entities. They then use a graph clustering algorithm to visualize those relationships. As a result, information such as clusters of data nodes and their center points can be displayed, and the importance index of each data entity can be obtained from the node relationship graph.

Although the invention has been disclosed by the above embodiments, they are not intended to limit the invention. Anyone with ordinary skill in the art may make some changes and modifications without departing from the spirit and scope of the invention. The scope of protection of the invention is therefore defined by the appended claims.

100‧‧‧Data management device

110‧‧‧Processor

120‧‧‧Memory

130‧‧‧Communication interface

200‧‧‧Data entity

210, 310, 410‧‧‧Data tables

220‧‧‧Data field

430‧‧‧Processing program

S501~S509‧‧‧Steps of the data management method

S601~S610‧‧‧Steps of the data management method

700‧‧‧Center point

710, 720‧‧‧Sub-clusters

711‧‧‧Sub-center point

810, 820‧‧‧Outlier nodes

S901~S908‧‧‧Steps of hierarchically building data entity relationship graphs

FIG. 1 is a block diagram of a data management device according to an embodiment of the invention.

FIG. 2 is a schematic diagram of the relationship between a data manipulation language and data entities according to an embodiment of the invention.

FIG. 3 is a schematic diagram of relationships between data tables according to an embodiment of the invention.

FIG. 4 is a schematic diagram of relationships between data tables and processing programs according to an embodiment of the invention.

FIG. 5 is a flowchart of a data management method according to an embodiment of the invention.

FIG. 6 is a flowchart of a data management method according to an embodiment of the invention.

FIG. 7 is a node distribution graph of data entities according to an embodiment of the invention.

FIG. 8 is a schematic diagram of outlier nodes according to an embodiment of the invention.

FIG. 9 is a schematic diagram of hierarchically building data entity relationship graphs according to an embodiment of the invention.

Claims

1. A data management device, comprising: a processor; and a memory coupled to the processor and storing a plurality of data entities, wherein the processor obtains the data entities and imports a plurality of metadata corresponding to the data entities; parses a Data Manipulation Language (DML) to establish a data relationship of the data entities; establishes a node relationship graph of the data relationship according to a graph clustering algorithm, wherein a plurality of nodes of the node relationship graph correspond to the data entities; and exports the node relationship graph.

2. The data management device of claim 1, wherein the processor parses the data manipulation language to establish the data relationship of the data entities at a plurality of levels, wherein the levels comprise a system level, a job level, a task level, an instruction level, a data table level, and a field level.

3. The data management device of claim 1, wherein the processor divides the node relationship graph into at least one cluster according to the graph clustering algorithm, each of the at least one cluster includes a center point, and the center point is the node having the highest degree in each of the at least one cluster.

4. The data management device of claim 1, wherein the node relationship graph is a field relationship graph of a plurality of data fields of each of the data entities.

5. The data management device of claim 1, wherein the processor calculates an importance index of the data entities corresponding to the nodes according to the degrees of the nodes of the node relationship graph.

6. A data management method, applicable to a data management device, the data management device comprising a processor and a memory coupled to the processor and storing a plurality of data entities, the data management method comprising: obtaining the data entities and importing a plurality of metadata corresponding to the data entities; parsing, by the processor, a data manipulation language to establish a data relationship of the data entities; establishing, by the processor, a node relationship graph of the data relationship according to a graph clustering algorithm, wherein a plurality of nodes of the node relationship graph correspond to the data entities; and exporting, by the processor, the node relationship graph.

7. The data management method of claim 6, further comprising: parsing, by the processor, the data manipulation language to establish the data relationship of the data entities at a plurality of levels, wherein the levels comprise a system level, a job level, a task level, an instruction level, a data table level, and a field level.

8. The data management method of claim 6, further comprising: dividing, by the processor, the node relationship graph into at least one cluster according to the graph clustering algorithm, each of the at least one cluster including a center point, wherein the center point is the node having the highest degree in each of the at least one cluster.

9. The data management method of claim 6, wherein the node relationship graph is a field relationship graph of a plurality of data fields of each of the data entities.

10. The data management method of claim 6, further comprising: calculating, by the processor, an importance index of the data entities corresponding to the nodes according to the degrees of the nodes of the node relationship graph.