WO2022178979A1 - Data processing method, system, computer device, and readable storage medium - Google Patents

Data processing method, system, computer device, and readable storage medium

Info

Publication number
WO2022178979A1
WO2022178979A1 PCT/CN2021/091309 CN2021091309W WO2022178979A1 WO 2022178979 A1 WO2022178979 A1 WO 2022178979A1 CN 2021091309 W CN2021091309 W CN 2021091309W WO 2022178979 A1 WO2022178979 A1 WO 2022178979A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
node
usage
user
nodes
Prior art date
Application number
PCT/CN2021/091309
Inventor
向明
胡明荣
傅群慧
朱尧
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2022178979A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22: Indexing; Data structures therefor; Storage structures
    • G06F16/2291: User-Defined Types; Storage management thereof
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22: Indexing; Data structures therefor; Storage structures
    • G06F16/2282: Tablespace storage structures; Management thereof
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2458: Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462: Approximate or statistical queries
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2458: Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465: Query processing support for facilitating data mining operations in structured databases
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/248: Presentation of query results
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28: Databases characterised by their database models, e.g. relational or object models
    • G06F16/284: Relational databases
    • G06F16/285: Clustering or classification

Definitions

  • The present application relates to the field of big data technologies, and in particular to a data processing method, system, computer device, and readable storage medium.
  • The traditional data asset management method registers and classifies data assets to form an asset directory tree, which supports only data search and location.
  • The purpose of the present application is to provide a data processing method, system, computer device, and readable storage medium that solve the following problems in the prior art: registration is prone to errors and omissions, efficiency is low, data is scattered, and there is no complete management system. As a result, unreasonable cross-department access occurs easily, the value of data is unclear, and data display is not intuitive, which wastes storage resources, computing resources, and management costs.
  • A data processing method, comprising the following steps:
  • parsing the logic code to generate a data full-link structure corresponding to the data asset table, where the data full-link structure includes service nodes and data nodes;
  • performing statistical analysis on the asset usage status dashboard and the asset value analysis dashboard respectively, and archiving or taking offline the data in the service nodes whose resource usage is lower than a first preset value and the data in the service nodes whose resource access is lower than a second preset value.
  • The present application also provides a data processing system, which specifically includes the following components:
  • a first parsing module, configured to parse the acquired data asset table to obtain the logic code of the data asset table;
  • a second parsing module, configured to parse the logic code according to a preset blood relationship analysis tool and generate a data full-link structure corresponding to the data asset table, where the data full-link structure includes service nodes and data nodes;
  • a determination module, configured to obtain the user of the service node and, according to the user of the service node, determine the user of the data node in the data full-link structure;
  • an apportionment module, configured to determine the usage department of the service node and the usage department of the data node according to the user of the service node and the user of the data node, and to allocate the storage resources and computing resources of the service node and the data node to each usage department;
  • a statistics module, configured to separately count the resource usage of each usage department and the resource access of each usage department, and to generate, according to the statistical results, an asset usage status dashboard corresponding to the resource usage and an asset value analysis dashboard corresponding to the resource access;
  • a processing module, configured to perform statistical analysis on the asset usage status dashboard and the asset value analysis dashboard respectively, and to archive or take offline the data in the service nodes whose resource usage is lower than a first preset value and the data in the service nodes whose resource access is lower than a second preset value.
  • The present application also provides a computer device, which specifically includes a memory, a processor, and a computer program stored in the memory and runnable on the processor. The processor implements the following steps when executing the computer program:
  • parsing the logic code to generate a data full-link structure corresponding to the data asset table, where the data full-link structure includes service nodes and data nodes;
  • performing statistical analysis on the asset usage status dashboard and the asset value analysis dashboard respectively, and archiving or taking offline the data in the service nodes whose resource usage is lower than a first preset value and the data in the service nodes whose resource access is lower than a second preset value.
  • The present application also provides a computer-readable storage medium on which a computer program is stored. The computer program implements the following steps when executed by a processor:
  • parsing the logic code to generate a data full-link structure corresponding to the data asset table, where the data full-link structure includes service nodes and data nodes;
  • performing statistical analysis on the asset usage status dashboard and the asset value analysis dashboard respectively, and archiving or taking offline the data in the service nodes whose resource usage is lower than a first preset value and the data in the service nodes whose resource access is lower than a second preset value.
  • The data processing method, system, computer device and readable storage medium provided by this application incorporate the entire data flow, from data storage to final consumption and use, into the data full-link structure, forming a complete data life cycle. The usage department of each service node and each data node is then determined according to the nodes of the data full-link structure, which not only eliminates unreasonable cross-department access but also greatly improves data security.
  • Through the asset usage status dashboard and asset value analysis dashboard corresponding to the data assets, the data value is clear and the data display is intuitive and comprehensive. Data assets that occupy large storage space but have low value can be released in time, which reduces wasted storage space and computing resource consumption and greatly lowers the enterprise's data management cost.
  • FIG. 1 is a schematic flowchart of optional steps of a data processing method provided by an embodiment of the present application;
  • FIG. 2 is a schematic diagram of an optional refinement of step S200 in FIG. 1 according to an embodiment of the present application;
  • FIG. 3 is a schematic diagram of an optional refinement of step S201 in FIG. 2 according to an embodiment of the present application;
  • FIG. 4 is a schematic effect diagram of a data full-link structure provided by an embodiment of the present application;
  • FIG. 5 is a schematic diagram of an optional refinement of step S400 in FIG. 1 according to an embodiment of the present application;
  • FIG. 6 is a schematic diagram of an optional refinement of step S500 in FIG. 1 according to an embodiment of the present application;
  • FIG. 7 is a schematic diagram of another optional refinement of step S500 in FIG. 1 according to an embodiment of the present application;
  • FIG. 8 is a schematic diagram of optional program modules of a data processing system provided by an embodiment of the present application;
  • FIG. 9 is a schematic diagram of an optional hardware architecture of a computer device according to an embodiment of the present application.
  • Although the terms first, second, third, etc. may be used in this application to describe various information, such information should not be limited by these terms. These terms are only used to distinguish information of the same type from each other.
  • For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application.
  • Depending on the context, the word "if" as used herein can be interpreted as "at the time of", "when", or "in response to determining".
  • Referring to FIG. 1, a schematic flowchart of the steps of a data processing method provided by an embodiment of the present application is shown. It can be understood that the flowcharts in the embodiments of the present application do not limit the order in which the steps are executed.
  • The following is an exemplary description with a computer device as the execution subject. The computer device may include mobile terminals such as smartphones, tablet personal computers and laptop computers, as well as fixed terminals such as desktop computers. The details are as follows:
  • Step S100: parse the acquired data asset table to obtain the logic code of the data asset table.
  • The data assets of the enterprise are acquired to obtain a data asset table, and the data asset table is parsed through a preset HQL (Hibernate Query Language, a completely object-oriented query language) to obtain the logic code corresponding to the data asset table. A data asset (Data Asset) refers to a data resource, such as documents or electronic data, that is owned or controlled by the enterprise, can bring future economic benefits to the enterprise, and is recorded physically or electronically. In an enterprise, not all data constitutes a data asset; a data asset is a data resource that can generate value for the enterprise.
  • The data assets include order information data, user information data, capital flow data, traffic data, customer service data, and the like.
  • Step S200: parse the logic code according to a preset blood relationship analysis tool to generate a data full-link structure corresponding to the data asset table, where the data full-link structure includes service nodes and data nodes.
  • According to the preset blood relationship analysis tool, the logic code of the data asset table is parsed to obtain the tree structure code of the data asset table, and the association points of the logic code are determined according to the tree structure code. The tree structure code is then deconstructed according to a preset recursive algorithm to identify each node of the logic code, and finally the data full-link structure corresponding to the data asset table is generated. The data full-link structure includes service nodes and data nodes.
  • Step S200 may include:
  • Step S201: according to the preset blood relationship analysis tool, parse the logic code of the data asset table to generate all the nodes associated with the data asset table and the association relationships between the nodes, where the association relationships include parent nodes and child nodes;
  • Step S202: connect the nodes according to the association relationships between them, so that all the connected nodes form the data full-link structure of the data asset table.
  • According to the preset blood relationship analysis tool, the preset abstract syntax tree (Abstract Syntax Tree, AST) in HIVE is called to parse the logic code of the data asset table and generate the data full-link structure of the data asset table.
  • The method may also draw a binary tree corresponding to the data asset table according to the data full-link structure and store it in a graph database, to facilitate daily queries of data links or use in other projects.
  • HIVE is a data warehouse tool based on Hadoop. It is used for data extraction, transformation and loading, and is a mechanism that can store, query and analyze large-scale data stored in Hadoop.
  • The preset blood relationship analysis tool also has a configurable function for identifying special source code in the data asset table. When special source code is identified, such as a variable used in place of a library name or a table name, the tool automatically raises an alarm and submits the code to the developer for processing.
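  • The special-source check described above can be sketched as follows. This is a minimal illustration, not the tool's actual rule set: the `${...}` placeholder syntax, the regular expression, and the function names `find_special_sources` and `check_logic_code` are all assumptions made for this example.

```python
import re

# Assumed convention: a variable in a library or table name is written as ${name}.
# The pattern looks at the token that follows FROM / JOIN / INTO / TABLE.
VARIABLE_NAME = re.compile(
    r"\b(?:from|join|into|table)\s+(\S*\$\{\w+\}\S*)", re.IGNORECASE
)


def find_special_sources(logic_code: str) -> list:
    """Return table references that use a variable instead of a literal name."""
    return VARIABLE_NAME.findall(logic_code)


def check_logic_code(logic_code: str) -> dict:
    """Flag logic code for developer review when a special source is present."""
    hits = find_special_sources(logic_code)
    return {"alarm": bool(hits), "special_sources": hits}
```

  • In a fuller tool the alarm would be routed to the developer; here the result is only a dictionary, so that the detection rule stays separate from any notification mechanism.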
  • Step S201 may include:
  • Step S211: parse the logic code of the data asset table to obtain the tree structure code of the data asset table;
  • Step S212: deconstruct the tree structure code according to a preset recursive algorithm, and mine the association relationships between the nodes to determine the parent nodes of each node.
  • The AST in HIVE is called to parse the logic code of the data asset table, and each node of the logic code and the association relationships of each node are mined to determine the parent nodes of each node. It should be noted that when complex nested logic exists in the logic code, the tree structure code is expanded with the corresponding nesting.
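  • A recursive deconstruction of a tree structure can be sketched as below. The tree is hand-written for illustration and is not output produced by HIVE; its labels only imitate Hive-style token names, and the nested-tuple representation and the function `collect_tables` are assumptions of this sketch.

```python
# Hand-written stand-in for a parsed tree: (label, child, child, ...).
TREE = (
    "TOK_QUERY",
    ("TOK_FROM",
        ("TOK_LEFTOUTERJOIN",
            ("TOK_TABREF", ("TOK_TABNAME", "source_table_s"), "s"),
            ("TOK_TABREF", ("TOK_TABNAME", "source_table_t"), "t"))),
    ("TOK_INSERT",
        ("TOK_DESTINATION",
            ("TOK_TAB", ("TOK_TABNAME", "target_table_x")))),
)


def collect_tables(node, under=None, found=None):
    """Walk the tree recursively and separate source tables from target tables."""
    if found is None:
        found = {"sources": [], "targets": []}
    if isinstance(node, str):  # leaf such as an alias: nothing to collect
        return found
    label, *children = node
    if label == "TOK_TABNAME":
        # A table name directly under TOK_TAB is treated as an insert destination.
        key = "targets" if under == "TOK_TAB" else "sources"
        found[key].append(".".join(children))
        return found
    for child in children:  # nested logic is handled by the same recursion
        collect_tables(child, label, found)
    return found
```

  • Calling `collect_tables(TREE)` yields `source_table_s` and `source_table_t` as sources and `target_table_x` as the target, which are the parent and child nodes of one link.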
  • Step S202 may include:
  • connecting the nodes according to the connection mode of parent node and child node, so that all the connected nodes form the data full-link structure of the data asset table.
  • The logic code is interpreted as follows: the source table source_table_s is left-joined with the source table source_table_t, and the contents of the two fields name and pp in the join result are all inserted into the target table target_table_x.
  • The logic code of the table target_table_x is parsed by the AST to obtain the tree structure code of the table target_table_x, from which the source tables associated with the table target_table_x are initially identified as table source_table_s and table source_table_t.
  • the tree structure code is as follows:
  • The tree structure code is defined as follows: the source table source_table_s is left-joined with the source table source_table_t, and the contents of the two fields name and pp in the join result are all inserted into the target table target_table_x.
  • The tree structure code is generated by parsing the logic code through the AST. Its structure is more regular than that of the logic code and is more convenient for recursion, splitting and deconstruction.
  • FIG. 4 is a schematic effect diagram of the data full-link structure. Assume that the table target_table_x is node X of the binary tree and that its parent nodes are S and T.
  • The preset blood relationship analysis tool determines that the parent nodes of node S are A and B. By repeatedly calling the preset blood relationship analysis tool, all nodes associated with X are obtained, and finally the data full-link structure of the table target_table_x is obtained.
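  • The repeated calls described above can be sketched as a recursion over a parent lookup. The dictionary `DIRECT_PARENTS` is a hypothetical stand-in for one call of the analysis tool; it contains only the relationships stated in the text (X has parents S and T, S has parents A and B), and the function name `build_full_link` is an assumption of this sketch.

```python
# Hypothetical result of single-table analyses: node -> direct parent nodes.
# Parents of T, A and B are not stated in the text, so they are left out.
DIRECT_PARENTS = {
    "X": ["S", "T"],
    "S": ["A", "B"],
}


def build_full_link(node, direct_parents, links=None):
    """Recursively collect every upstream node of `node` with its parents."""
    if links is None:
        links = {}
    if node in links:  # already visited: avoids repeated work on shared parents
        return links
    parents = direct_parents.get(node, [])
    links[node] = list(parents)
    for parent in parents:
        build_full_link(parent, direct_parents, links)
    return links
```

  • `build_full_link("X", DIRECT_PARENTS)` returns an entry for X, S, T, A and B; nodes with an empty parent list are the starting points of the link.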
  • The logic code of the data asset table is parsed by a preset blood relationship analysis tool to determine the data full-link structure of the data asset table, which not only avoids the errors and omissions caused by manual configuration and registration but also greatly improves the accuracy of data link association.
  • The binary tree is drawn according to the data full-link structure and a knowledge graph is constructed, so that the data development team and downstream data users can quickly query and understand the data link without reading a large amount of specialized code. This lowers the threshold for obtaining information and widens the range of users.
  • Step S300: acquire the user of the service node, and determine the user of the data node in the data full-link structure according to the user of the service node.
  • A data node has no direct user; its user is confirmed through the service node.
  • For example, if the obtained user of the service node "indicator 1" is user A, the user of the source tables associated with "indicator 1" can be determined; that is, user A is also the user of the associated data nodes.
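  • The propagation of users from a service node to its data nodes can be sketched as follows. The `upstream` mapping, including the assumption that "indicator 1" reads from node X, is hypothetical, as is the function name `propagate_users`.

```python
def propagate_users(service_users, upstream):
    """Assign each service node's users to every data node upstream of it.

    service_users: service node -> set of users
    upstream:      node -> list of nodes it reads from (assumed to be known)
    """
    data_users = {}
    for service, users in service_users.items():
        stack = list(upstream.get(service, []))
        seen = set()
        while stack:  # iterative walk over all upstream data nodes
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            data_users.setdefault(node, set()).update(users)
            stack.extend(upstream.get(node, []))
    return data_users
```

  • With `service_users = {"indicator 1": {"user A"}}` and an upstream chain from "indicator 1" through X to S, T, A and B, every one of those data nodes ends up with user A.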
  • Step S400: determine the usage department of the service node and the usage department of the data node according to the user of the service node and the user of the data node, and allocate the storage resources and computing resources of the service node and the data node to each usage department.
  • The users of the service node and the users of the data node are classified according to their preset attribution departments, the usage department of the service node and the usage department of the data node are determined, and the storage resources and computing resources of the service node and the data node are allocated to each usage department.
  • The storage resource is the disk space occupied by the data stored in the system, which can be obtained through HIVE.
  • The computing resource is the computing capacity used when computing data, including the cluster central processing unit (CPU), memory, etc., which can be obtained from the cluster monitoring system log.
  • Step S400 may include:
  • Step S401: determine the user of the data node in the data full-link structure according to the user of the service node;
  • Step S402: classify the users according to their preset attribution departments, and determine the usage department of the service node and the usage department of the data node.
  • The finance department can be determined to be the usage department of data node X, data node S, data node T, data node A, data node B and data node C.
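  • The classification and apportionment can be sketched as below. The text does not state how a node shared by several departments is split, so the equal split used here is an assumption, as are the field names `storage_gb` and `cpu_hours` and both function names.

```python
def using_departments(node_users, user_department):
    """Map each node to the sorted departments of its users (preset attribution)."""
    return {
        node: sorted({user_department[user] for user in users})
        for node, users in node_users.items()
    }


def apportion(node_resources, node_departments):
    """Split each node's storage and compute equally among its departments.

    The equal split is an assumed rule, not one taken from the source text.
    """
    totals = {}
    for node, departments in node_departments.items():
        if not departments:
            continue
        resources = node_resources[node]
        for department in departments:
            share = totals.setdefault(
                department, {"storage_gb": 0.0, "cpu_hours": 0.0}
            )
            share["storage_gb"] += resources["storage_gb"] / len(departments)
            share["cpu_hours"] += resources["cpu_hours"] / len(departments)
    return totals
```

  • A weighted split, for example by access volume, would fit the same structure by replacing the division by `len(departments)` with a per-department weight.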
  • Step S500: collect statistics on the resource usage of each usage department and the resource access of each usage department, and generate, according to the statistical results, an asset usage status dashboard corresponding to the resource usage and an asset value analysis dashboard corresponding to the resource access.
  • Step S500 may include:
  • Step S501: acquire the total storage space of each usage department, the storage resources of the service nodes and the computing resources of the service nodes;
  • Step S502: count the total apportioned storage space of each usage department and the apportioned storage space of each service node;
  • Step S503: calculate the ratio of the total storage space to the total apportioned storage space of each usage department to obtain a first ratio result;
  • Step S504: calculate the ratio of the apportioned storage space to the total storage space of each usage department to obtain a second ratio result;
  • Step S505: sort the storage resources to obtain a first sorting result;
  • Step S506: sort the computing resources to obtain a second sorting result;
  • Step S507: generate the asset usage status dashboard according to the total storage space, the total apportioned storage space, the apportioned storage space, the first ratio result, the second ratio result, the first sorting result and the second sorting result.
  • The asset usage status dashboard is then generated and displayed.
  • By performing statistical analysis on the data assets, this embodiment of the present application generates and displays an asset usage status dashboard, so that the user can simply and intuitively understand the usage status of the data assets.
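  • Steps S501 to S507 can be sketched for a single usage department as follows. Several choices are assumptions of this sketch: the first ratio is taken as total apportioned space divided by total space, both sorts are descending, and the function name and dictionary keys are illustrative.

```python
def usage_dashboard(total_storage, node_apportioned, node_storage, node_compute):
    """Assemble the figures of an asset usage status dashboard for one department."""
    total_apportioned = sum(node_apportioned.values())               # S502
    first_ratio = total_apportioned / total_storage                  # S503 (direction assumed)
    second_ratio = {                                                 # S504
        node: space / total_storage for node, space in node_apportioned.items()
    }
    first_sorting = sorted(node_storage, key=node_storage.get, reverse=True)   # S505
    second_sorting = sorted(node_compute, key=node_compute.get, reverse=True)  # S506
    return {                                                         # S507
        "total_storage": total_storage,
        "total_apportioned": total_apportioned,
        "apportioned": dict(node_apportioned),
        "first_ratio": first_ratio,
        "second_ratio": second_ratio,
        "first_sorting": first_sorting,
        "second_sorting": second_sorting,
    }
```

  • The returned dictionary holds only the numbers and orderings; rendering them as charts is left to whatever display layer is used.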
  • Step S500 may also include:
  • Step S511: acquire the access volume of each service node in each usage department;
  • Step S512: normalize the access volumes and sort them from largest to smallest to obtain a third sorting result;
  • Step S513: reverse the third sorting result to obtain a fourth sorting result;
  • Step S514: calculate the ratio of the access volume to the total storage space to obtain a third ratio result;
  • Step S515: normalize the third ratio results and sort them from largest to smallest to obtain a fifth sorting result;
  • Step S516: reverse the fifth sorting result to obtain a sixth sorting result;
  • Step S517: generate the asset value analysis dashboard according to the third sorting result, the fourth sorting result, the fifth sorting result and the sixth sorting result.
  • First, the access volume of the service nodes in each usage department is obtained, normalized and sorted. Next, the ratio of the access volume to the total storage space is calculated, and the ratios are normalized and sorted. Finally, the asset value analysis dashboard is generated according to the sorting results and displayed.
  • By performing statistical analysis on the data assets, this embodiment of the present application generates and displays an asset value analysis dashboard, so that the user can simply and intuitively understand the asset value of the data assets.
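  • Steps S511 to S517 can be sketched as below. The text does not name a normalization method, so min-max scaling is assumed; the storage figure used as the denominator in S514 is passed per service node, which is also an assumption, and all function names are illustrative.

```python
def min_max(values):
    """Scale values to [0, 1]; min-max scaling is an assumed normalization."""
    low, high = min(values.values()), max(values.values())
    span = high - low
    return {key: (value - low) / span if span else 0.0 for key, value in values.items()}


def ranked(values):
    """Return keys sorted from largest to smallest value."""
    return sorted(values, key=values.get, reverse=True)


def value_dashboard(visits, storage):
    """Build the four orderings used by an asset value analysis dashboard."""
    third = ranked(min_max(visits))                                  # S512
    fourth = third[::-1]                                             # S513
    ratios = {node: visits[node] / storage[node] for node in visits}  # S514
    fifth = ranked(min_max(ratios))                                  # S515
    sixth = fifth[::-1]                                              # S516
    return {"third": third, "fourth": fourth, "fifth": fifth, "sixth": sixth}  # S517
```

  • The reversed orderings put the least-visited nodes, and the nodes with the fewest visits per unit of storage, at the top, which is where candidates for release would be looked for.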
  • Step S600: perform statistical analysis on the asset usage status dashboard and the asset value analysis dashboard respectively, and archive or take offline the data in the service nodes whose resource usage is lower than a first preset value and the data in the service nodes whose resource access is lower than a second preset value.
  • Low-value data assets that fall below the preset values, as indicated by the first comparison result and the second comparison result, are archived or taken offline, thereby releasing storage and computing resources. Low-value data assets include data content with high storage usage, high computing consumption, low access heat and low importance.
  • The embodiment of the present application encourages data users to actively cooperate with data developers to gradually release data assets with high input and low output, thereby reducing wasted storage space and computing resource consumption while saving the enterprise's data management cost.
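  • The selection in step S600 can be sketched as a pair of threshold comparisons. The metric values, the node names in the usage example and the function name `select_for_release` are hypothetical; the choice between archiving and taking offline is not specified in the text and is left open here.

```python
def select_for_release(usage, access, first_preset, second_preset):
    """Return service nodes whose usage or access falls below its preset value."""
    low_usage = {node for node, value in usage.items() if value < first_preset}
    low_access = {node for node, value in access.items() if value < second_preset}
    # Either condition is enough to mark a node for archiving or offlining.
    return sorted(low_usage | low_access)
```

  • For example, with a first preset value of 0.1 and a second preset value of 10, a node with usage 0.05 or with 3 visits would be selected, while a node with usage 0.4 and 900 visits would be kept.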
  • The embodiment of the present application provides a data processing method.
  • By incorporating the entire data flow, from data storage to final consumption and use, into the data full-link structure, a complete data life cycle is formed. The usage department of each service node and each data node is then determined according to the data full-link structure, which not only prevents unreasonable cross-department access but also greatly improves the security of data use.
  • Through the asset usage status dashboard and asset value analysis dashboard corresponding to the data assets, the data value is clear and the data display is intuitive and comprehensive. Data assets that occupy large storage space but have low value can be released in time, which reduces wasted storage space and computing resource consumption and greatly lowers the enterprise's data management cost.
  • Referring to FIG. 8, a schematic diagram of the program modules of a data processing system 700 according to an embodiment of the present application is shown.
  • The data processing system 700 can be applied to a computer device, which can be a mobile phone, a tablet personal computer, a laptop computer, or another device with a data transmission function.
  • The data processing system 700 may include or be divided into one or more program modules, which are stored in a readable storage medium and executed by one or more processors to complete the embodiments of the present application and implement the above-mentioned data processing system 700.
  • A program module referred to in the embodiments of the present application is a series of computer program instruction segments capable of performing specific functions, and is better suited than the program itself to describing the execution process of the data processing system 700 in the readable storage medium.
  • The data processing system 700 includes a first parsing module 701, a second parsing module 702, a determination module 703, an apportionment module 704, a statistics module 705 and a processing module 706.
  • The following description specifically introduces the functions of each program module in the embodiments of the present application:
  • The first parsing module 701 is configured to parse the acquired data asset table to obtain the logic code of the data asset table.
  • The first parsing module 701 acquires the data assets of the enterprise to obtain a data asset table, and parses the data asset table through a preset HQL (Hibernate Query Language, a completely object-oriented query language) to obtain the logic code corresponding to the data asset table. A data asset (Data Asset) refers to a data resource, such as documents or electronic data, that is owned or controlled by the enterprise, can bring future economic benefits to the enterprise, and is recorded physically or electronically. In an enterprise, not all data constitutes a data asset; a data asset is a data resource that can generate value for the enterprise.
  • The data assets include order information data, user information data, capital flow data, traffic data, customer service data, and the like.
  • The second parsing module 702 is configured to parse the logic code according to a preset blood relationship analysis tool and generate a data full-link structure corresponding to the data asset table, where the data full-link structure includes service nodes and data nodes.
  • According to the preset blood relationship analysis tool, the second parsing module 702 parses the logic code of the data asset table to obtain the tree structure code of the data asset table, and determines the association points of the logic code according to the tree structure code. It then deconstructs the tree structure code according to a preset recursive algorithm to identify each node of the logic code, and finally generates the data full-link structure corresponding to the data asset table. The data full-link structure includes service nodes and data nodes.
  • The second parsing module 702 is specifically configured to:
  • parse the logic code of the data asset table to generate all the nodes associated with the data asset table and the association relationships between the nodes, where the association relationships include parent nodes and child nodes;
  • connect the nodes according to the association relationships between them, so that all the connected nodes form the data full-link structure of the data asset table.
  • According to the preset blood relationship analysis tool, the second parsing module 702 calls the preset abstract syntax tree (Abstract Syntax Tree, AST) in HIVE to parse the logic code of the data asset table and generate the data full-link structure of the data asset table.
  • The method may also draw a binary tree corresponding to the data asset table according to the data full-link structure and store it in a graph database, to facilitate daily queries of data links or use in other projects.
  • HIVE is a data warehouse tool based on Hadoop. It is used for data extraction, transformation and loading, and is a mechanism that can store, query and analyze large-scale data stored in Hadoop.
  • The preset blood relationship analysis tool also has a configurable function for identifying special source code in the data asset table. When special source code is identified, such as a variable used in place of a library name or a table name, the tool automatically raises an alarm and submits the code to the developer for processing.
  • The second parsing module 702 is further configured to:
  • parse the logic code of the data asset table according to the preset blood relationship analysis tool to obtain the tree structure code of the data asset table;
  • deconstruct the tree structure code according to a preset recursive algorithm, and mine the association relationships between the nodes to determine the parent nodes of each node.
  • According to the preset blood relationship analysis tool and the preset recursive algorithm, the second parsing module 702 calls the AST in HIVE to parse the logic code of the data asset table, and mines each node of the logic code and the association relationships of each node to determine the parent nodes of each node. It should be noted that when complex nested logic exists in the logic code, the tree structure code is expanded with the corresponding nesting.
  • the second parsing module 702 is further configured to:
  • the nodes are connected according to the connection mode of the parent node and the child node, and all the connected nodes form a data full link structure of the data asset table.
  • the interpretation of the logic code is as follows: the source table source_table_s is left associated with the source table source_table_t, and the two field contents of name and pp in the associated result are all inserted into the target table target_table_x.
  • the logical code of the table target_table_x is parsed by the AST, and the tree structure code of the table target_table_x is obtained, so that the source tables associated with the table target_table_x are initially obtained as table source_table_s and table source_table_t, wherein,
  • the tree structure code is as follows:
  • the definition of the tree structure code is as follows: the source table source_table_s is left associated with the source table source_table_t, and the contents of the two fields name and pp in the association result are all inserted into the target table target_table_x.
  • the tree-structured code is generated by parsing the logic code through the AST, and the tree-structured code structure is more regular and more convenient for recursion, splitting and deconstruction.
  • FIG. 4 is a schematic effect diagram of the full link structure of data. It is assumed that the table target_table_x is used as a node X of the binary tree, and its parent nodes are S and T respectively.
  • it is further assumed that the preset blood relationship analysis tool finds that the parent nodes of node S are A and B. By repeatedly calling the preset blood relationship analysis tool, all nodes associated with X are obtained, and finally the data full link structure of the table target_table_x is obtained.
  • the logic code of the data asset table is parsed by a preset blood relationship analysis tool to determine the data full link structure of the data asset table, which not only avoids the errors and omissions caused by manual configuration and registration, but also greatly improves the accuracy of data link association.
  • in addition, a binary tree is drawn according to the data full link structure and a knowledge graph is constructed, so that the data development team and downstream data users can quickly query and understand the data link without reading a large amount of specialized code, which lowers the threshold for obtaining information and widens the range of people who can use it.
  • the determining module 703 is configured to acquire the user of the service node, and determine the user of the data node in the full data link structure according to the user of the service node.
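  • the data node has no direct user; its user is confirmed through the service node.
  • for example, with continued reference to FIG. 4, assuming that the user of the obtained service node "indicator 1" is user A, the user of the source tables associated with "indicator 1" can be determined; that is, user A is also the user of data node X, data node S, data node T, data node A, data node B and data node C.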
  • the allocation module 704 is configured to determine the usage department of the service node and the usage department of the data node according to the user of the service node and the user of the data node, and to allocate the storage resources and computing resources of the service node and the data node to each usage department.
  • the apportioning module 704 classifies the users of the service node and the users of the data node according to the preset department to which each user belongs, determines the usage department of the service node and the usage department of the data node, and allocates the storage resources and computing resources of the service node and the data node to each usage department.
  • the storage resource is the disk space occupied by the system storage data, which can be obtained through the HIVE;
  • the computing resource is the computing unit used when computing data, including a cluster central processing unit (CPU), memory etc., which can be obtained through the cluster monitoring system log.
  • the apportioning module 704 is specifically configured to:
  • the users are classified according to the attribution department preset by the user, and the use department of the service node and the use department of the data node are determined.
  • for example, assuming that the user of the service node "indicator 1" is user A and that user A belongs to the finance department, the finance department can be determined to be the usage department of data node X, data node S, data node T, data node A, data node B and data node C.
  • the statistics module 705 is used to separately count the resource usage of each usage department and the resource access of each usage department, and to generate, according to the statistical results, an asset usage status dashboard corresponding to the resource usage and an asset value analysis dashboard corresponding to the resource access.
  • the statistics module 705 obtains the total allocated space of each usage department, the storage resources of the service node and the computing resources of the service node, then performs statistical calculation on the total allocated storage space of each usage department and the allocated storage space of the various service nodes, and generates the asset usage status dashboard according to the resource usage of each usage department.
  • the access volume of each service node in each usage department is acquired, then the resource access situation of each usage department is counted, and the asset value analysis dashboard is generated according to the resource access situation.
  • the statistics module 705 is specifically used for:
  • the resource access situation of each user department is counted, and the asset value analysis dashboard is generated according to the resource access situation.
  • the statistics module 705 is further used for:
  • the asset usage status dashboard is generated according to the total storage space, the total allocated storage space, the allocated storage space, the first ratio result, the second ratio result, the first sorting result and the second sorting result.
  • the statistics module 705 first separately obtains the total storage space of each usage department, the storage resources of the service nodes and the computing resources of the service nodes, and counts the total allocated storage space of each usage department and the allocated storage space of the various service nodes, where the service nodes include reports, indicators, labels, interfaces, etc. It then calculates the ratio of the total storage space of each usage department to the total allocated storage space to obtain the first ratio result, and calculates the ratio of the allocated storage space to the total storage space of each usage department to obtain the second ratio result. The storage resources are then sorted to obtain the first sorting result, and the computing resources are sorted to obtain the second sorting result. Finally, the asset usage status dashboard is generated according to the total storage space, the total allocated storage space, the allocated storage space, the first ratio result, the second ratio result, the first sorting result and the second sorting result, and the asset usage status dashboard is displayed.
  • This embodiment of the present application generates an asset usage status dashboard by performing statistical analysis on the data asset and displays the asset usage status dashboard, so that the user can understand the usage status of the data asset simply and intuitively.
  • the statistics module 705 is further used for:
  • the third ratio result is normalized and sorted in descending order to obtain the fifth sorting result
  • the asset value analysis dashboard is generated according to the third sorting result, the fourth sorting result, the fifth sorting result and the sixth sorting result.
  • the statistics module 705 first obtains the access volume of the service nodes in each usage department, normalizes the access volume and sorts it, then calculates the ratio of the access volume to the total storage space, normalizes the ratio and sorts it, and finally generates the asset value analysis dashboard according to the sorting results and displays the asset value analysis dashboard.
  • This embodiment of the present application generates an asset value analysis dashboard by performing statistical analysis on the data asset and displays the asset value analysis dashboard, so that the user can simply and intuitively understand the asset value of the data asset.
  • the processing module 706 is configured to perform statistical analysis on the asset usage status dashboard and the asset value analysis dashboard respectively, and to archive or take offline the data in the service nodes corresponding to resource usage lower than the first preset value and the data in the service nodes corresponding to resource access lower than the second preset value.
  • the processing module 706 analyzes the statistical results of the asset usage status dashboard and the asset value analysis dashboard. It compares the third sorting result with the first preset value and takes the entries of the third sorting result that are lower than the first preset value as the first comparison result; it compares the fifth sorting result with the second preset value and takes the entries of the fifth sorting result that are lower than the second preset value as the second comparison result. The low-value data assets that are lower than the preset values, that is, the first comparison result and the second comparison result, are archived or taken offline, thereby releasing storage and computing resources. Low-value data assets include data content with high storage, high computing consumption, low access heat and low importance.
  • the embodiment of the present application promotes the data user to actively cooperate with the data developer to gradually release data assets with high input and low output, thereby reducing the waste of storage space and the consumption of computing resources. At the same time, it saves the cost of data management for enterprises.
  • the data processing system 700 provided by the embodiment of the present application incorporates the entire data flow, from data storage to final consumption, into the data full link structure to form a complete data life cycle, and then determines the usage department of each service node and each data node according to the nodes of the data full link structure, which not only prevents unreasonable cross-department access but also greatly improves the security of data use.
  • by generating the asset usage status dashboard and the asset value analysis dashboard corresponding to the data assets, the data value is made clear and the data display is intuitive and comprehensive, so data assets with large storage space and low value can be released in time, the waste of storage space and the consumption of computing resources are reduced, and the data management cost of the enterprise is greatly reduced.
  • an embodiment of the present application further provides a schematic diagram of a hardware architecture of a computer device 800 .
  • the computer device 800 may be, for example, a smart phone, tablet computer, notebook computer, desktop computer, rack server, blade server, tower server or cabinet server (including an independent server, or a server cluster composed of multiple servers) that can execute programs.
  • the computer device 800 is a device that can automatically perform numerical calculation and/or information processing according to pre-set or stored instructions.
  • the computer device 800 at least includes, but is not limited to, a memory 801, a processor 802, and a network interface 803 that can communicate with each other through a device bus, wherein:
  • the memory 801 includes at least one type of computer-readable storage medium, and the readable storage medium includes a flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), programmable read only memory (PROM), magnetic memory, magnetic disk, optical disk, and the like.
  • the memory 801 may be an internal storage unit of the computer device 800 , such as a hard disk or a memory of the computer device 800 .
  • the memory 801 may also be an external storage device of the computer device 800, for example, a pluggable hard disk, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, a flash card (Flash Card), etc. equipped on the computer device 800.
  • the memory 801 may also include both the internal storage unit of the computer device 800 and its external storage device.
  • the memory 801 is generally used to store the operating system installed in the computer device 800 and various types of application software, such as the program code of the data processing system 700.
  • the memory 801 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 802 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips in some inventive embodiments.
  • the processor 802 is generally used to control the overall operation of the computer device 800 .
  • the processor 802 is configured to run the program code or process data stored in the memory 801, for example, run the program code of the data processing system 700, so as to implement the data processing method in each of the foregoing invention embodiments.
  • the network interface 803 may include a wireless network interface or a wired network interface, and the network interface 803 is generally used to establish a communication connection between the computer device 800 and other electronic devices.
  • the network interface 803 is used to connect the computer device 800 with an external terminal through a network, and establish a data transmission channel and a communication connection between the computer device 800 and the external terminal.
  • the network may be a wireless or wired network such as an intranet (Intranet), the Internet (Internet), a Global System for Mobile Communications (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth (Bluetooth), Wi-Fi, etc.
  • FIG. 9 only shows the computer device 800 having components 801-803, but it should be understood that implementation of all shown components is not required, and that more or fewer components may be implemented instead.
  • the data processing system 700 stored in the memory 801 may also be divided into one or more program modules, and the one or more program modules are stored in the memory 801 and executed by one or more processors (the processor 802 in the embodiment of the present application) to complete the data processing method of the present application.
  • Embodiments of the present application also provide a computer-readable storage medium, such as a flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), programmable read only memory (PROM), magnetic memory, magnetic disk, optical disk, server, App Store, etc., on which a computer program is stored; the program implements the corresponding function when executed by a processor.
  • the computer-readable storage medium may be non-volatile or volatile.
  • the computer-readable storage medium of the embodiment of the present application is used to store the data processing system 700, so as to implement the data processing method of the present application when executed by the processor.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

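The present application discloses a data processing method, including: parsing the logic code of a data asset table according to a preset blood relationship (data lineage) analysis tool to generate a data full link structure corresponding to the data asset table; apportioning the storage resources and computing resources of the service nodes and data nodes of the data full link structure to each usage department; separately counting the resource usage and the resource access of each usage department, analyzing the statistical results, and archiving or taking offline the data assets that fall below a preset value. The embodiments of the present application prevent unreasonable cross-department access and greatly improve the security of data asset use. In addition, the data assets are displayed intuitively and comprehensively and their value is clear, so users can release data assets that occupy large storage space but have low value in time, which reduces the waste of storage space, lowers the consumption of computing resources, and greatly reduces the enterprise's data asset management cost.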

Description

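Data processing method, system, computer device and readable storage medium
This application claims priority to the Chinese patent application No. 202110214728.4, entitled "Data processing method, system, computer device and readable storage medium", filed with the China Patent Office on February 25, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of big data, and in particular to a data processing method, system, computer device and readable storage medium.
Background
With the continuous development of the economy, enterprises keep expanding their business and data assets accumulate day by day, and the cost of managing data rises accordingly.
Traditional data asset management methods register and classify data assets to form an asset directory tree, which only supports searching for and locating data. Data registration relies on manual operation, and useless or rarely accessed data is also identified mainly by hand.
However, the inventors found that with such traditional methods registration is error-prone and inefficient, the data is scattered, and no complete management system is formed. This not only easily leads to unreasonable cross-department access, but also leaves the data value unclear and the data display unintuitive, wasting storage resources, computing resources and management cost.
Summary of the Invention
The purpose of the present application is to provide a data processing method, system, computer device and readable storage medium to overcome the defects of the prior art, namely that registration is error-prone and inefficient, the data is scattered and no complete management system is formed, unreasonable cross-department access easily occurs, the data value is unclear and the data display is unintuitive, and storage resources, computing resources and management cost are wasted.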
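According to one aspect of the present application, a data processing method is provided, the method including the following steps:
parsing an acquired data asset table to obtain the logic code of the data asset table;
parsing the logic code according to a preset blood relationship analysis tool to generate a data full link structure corresponding to the data asset table, the data full link structure including service nodes and data nodes;
acquiring the user of the service node, and determining the user of the data node in the data full link structure according to the user of the service node;
determining the usage department of the service node and the usage department of the data node according to the user of the service node and the user of the data node, and apportioning the storage resources and computing resources of the service node and the data node to each usage department;
separately counting the resource usage of each usage department and the resource access of each usage department, and generating, according to the statistical results, an asset usage status dashboard corresponding to the resource usage and an asset value analysis dashboard corresponding to the resource access;
performing statistical analysis on the asset usage status dashboard and the asset value analysis dashboard respectively, and archiving or taking offline the data in the service nodes corresponding to resource usage lower than a first preset value and the data in the service nodes corresponding to resource access lower than a second preset value.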
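To achieve the above purpose, the present application further provides a data processing system, which specifically includes the following components:
a first parsing module, configured to parse an acquired data asset table to obtain the logic code of the data asset table;
a second parsing module, configured to parse the logic code according to a preset blood relationship analysis tool to generate a data full link structure corresponding to the data asset table, the data full link structure including service nodes and data nodes;
a determining module, configured to acquire the user of the service node and determine the user of the data node in the data full link structure according to the user of the service node;
an apportioning module, configured to determine the usage department of the service node and the usage department of the data node according to the user of the service node and the user of the data node, and to apportion the storage resources and computing resources of the service node and the data node to each usage department;
a statistics module, configured to separately count the resource usage of each usage department and the resource access of each usage department, and to generate, according to the statistical results, an asset usage status dashboard corresponding to the resource usage and an asset value analysis dashboard corresponding to the resource access;
a processing module, configured to perform statistical analysis on the asset usage status dashboard and the asset value analysis dashboard respectively, and to archive or take offline the data in the service nodes corresponding to resource usage lower than a first preset value and the data in the service nodes corresponding to resource access lower than a second preset value.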
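To achieve the above purpose, the present application further provides a computer device, which specifically includes a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
parsing an acquired data asset table to obtain the logic code of the data asset table;
parsing the logic code according to a preset blood relationship analysis tool to generate a data full link structure corresponding to the data asset table, the data full link structure including service nodes and data nodes;
acquiring the user of the service node, and determining the user of the data node in the data full link structure according to the user of the service node;
determining the usage department of the service node and the usage department of the data node according to the user of the service node and the user of the data node, and apportioning the storage resources and computing resources of the service node and the data node to each usage department;
separately counting the resource usage of each usage department and the resource access of each usage department, and generating, according to the statistical results, an asset usage status dashboard corresponding to the resource usage and an asset value analysis dashboard corresponding to the resource access;
performing statistical analysis on the asset usage status dashboard and the asset value analysis dashboard respectively, and archiving or taking offline the data in the service nodes corresponding to resource usage lower than a first preset value and the data in the service nodes corresponding to resource access lower than a second preset value.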
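To achieve the above purpose, the present application further provides a computer-readable storage medium on which a computer program is stored, the computer program implementing the following steps when executed by a processor:
parsing an acquired data asset table to obtain the logic code of the data asset table;
parsing the logic code according to a preset blood relationship analysis tool to generate a data full link structure corresponding to the data asset table, the data full link structure including service nodes and data nodes;
acquiring the user of the service node, and determining the user of the data node in the data full link structure according to the user of the service node;
determining the usage department of the service node and the usage department of the data node according to the user of the service node and the user of the data node, and apportioning the storage resources and computing resources of the service node and the data node to each usage department;
separately counting the resource usage of each usage department and the resource access of each usage department, and generating, according to the statistical results, an asset usage status dashboard corresponding to the resource usage and an asset value analysis dashboard corresponding to the resource access;
performing statistical analysis on the asset usage status dashboard and the asset value analysis dashboard respectively, and archiving or taking offline the data in the service nodes corresponding to resource usage lower than a first preset value and the data in the service nodes corresponding to resource access lower than a second preset value.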
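The data processing method, system, computer device and readable storage medium provided by the present application incorporate the entire data flow, from data storage to final consumption, into the data full link structure to form a complete data life cycle, and then determine the usage department of each service node and each data node according to the nodes of the data full link structure, which not only prevents unreasonable cross-department access but also greatly improves the security of data use. By generating the asset usage status dashboard and the asset value analysis dashboard corresponding to the data assets, the data value is made clear and the data display is intuitive and comprehensive, so data assets with large storage space and low value can be released in time, which reduces the waste of storage space, lowers the consumption of computing resources, and greatly reduces the enterprise's data management cost.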
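Brief Description of the Drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be regarded as limiting the present application. Throughout the drawings, the same reference symbols denote the same components. In the drawings:
FIG. 1 is a schematic flowchart of optional steps of the data processing method provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of an optional refinement of step S200 in FIG. 1 provided by an embodiment of the present application;
FIG. 3 is a schematic flowchart of an optional refinement of step S201 in FIG. 2 provided by an embodiment of the present application;
FIG. 4 is a schematic effect diagram of a data full link structure provided by an embodiment of the present application;
FIG. 5 is a schematic flowchart of an optional refinement of step S400 in FIG. 1 provided by an embodiment of the present application;
FIG. 6 is a schematic flowchart of an optional refinement of step S500 in FIG. 1 provided by an embodiment of the present application;
FIG. 7 is a schematic flowchart of another optional refinement of step S500 in FIG. 1 provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of optional program modules of the data processing system provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of an optional hardware architecture of the computer device provided by an embodiment of the present application.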
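Detailed Description
Exemplary embodiments of the invention are described in detail here, and examples of them are shown in the drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application. Rather, they are merely examples of systems and methods consistent with some aspects of the present application as detailed in the appended claims.
The terms used in the present application are only for the purpose of describing particular embodiments and are not intended to limit the present application. The singular forms "a", "said" and "the" used in the present application and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in the present application to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present application, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon" or "in response to determining".
In the description of the present application, it should be understood that the numerals before the steps do not indicate the order in which the steps are performed; they are only used to facilitate the description of the present application and to distinguish each step, and therefore should not be understood as limiting the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the scope of protection of the present application.
The embodiments of the present application are described below with reference to the drawings.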
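Embodiment 1
Referring to FIG. 1, a schematic flowchart of the steps of a data processing method provided by an embodiment of the present application is shown. It can be understood that the flowcharts in the embodiments of the present application are not intended to limit the order in which the steps are performed. The following is an exemplary description with a computer device as the executing entity; the computer device may include mobile terminals such as smart phones, tablet personal computers and laptop computers, as well as fixed terminals such as desktop computers. The details are as follows:
Step S100: parse the acquired data asset table to obtain the logic code of the data asset table.
Specifically, the enterprise's data assets are acquired to obtain a data asset table, and the data asset table is parsed by means of a preset HQL (Hibernate Query Language, a fully object-oriented query language) to obtain the logic code corresponding to the data asset table. Here, a data asset (Data Asset) refers to a data resource that is owned or controlled by the enterprise, can bring future economic benefits to the enterprise, and is recorded physically or electronically, such as documents and electronic data. Within an enterprise, not all data constitutes data assets; data assets are the data resources that can generate value for the enterprise. The data assets include order information data, user information data, fund flow data, traffic data, customer service data, and the like.
Step S200: parse the logic code according to a preset blood relationship analysis tool to generate a data full link structure corresponding to the data asset table, the data full link structure including service nodes and data nodes.
Specifically, the logic code of the data asset table is parsed according to the preset blood relationship analysis tool to obtain the tree structure code of the data asset table; the associated nodes of the logic code are determined according to the tree structure code; the tree structure code is then deconstructed according to a preset recursive algorithm to identify each node of the logic code; and finally the data full link structure corresponding to the data asset table is generated, the data full link structure including service nodes and data nodes.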
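In an exemplary embodiment, as shown in FIG. 2, step S200 may include:
Step S201: parse the logic code of the data asset table according to the preset blood relationship analysis tool to generate all nodes associated with the data asset table and the association relationship between the nodes, the association relationship including parent nodes and child nodes;
Step S202: connect the nodes according to the association relationship between them, all the connected nodes forming the data full link structure of the data asset table.
Specifically, according to the preset blood relationship analysis tool, the abstract syntax tree (Abstract Syntax Tree, AST) in the preset HIVE is called to parse the logic code of the data asset table and generate the data full link structure of the data asset table.
In an exemplary embodiment, the method may also draw a binary tree corresponding to the data asset table according to the data full link structure and store it in a graph database, to facilitate daily querying of data links or use in other projects. HIVE is a Hadoop-based data warehouse tool used for data extraction, transformation and loading; it is a mechanism that can store, query and analyze large-scale data stored in Hadoop. It should be noted that the preset blood relationship analysis tool adds a configurable function for identifying special source code in the data asset table: when special source code is identified, for example a variable used in place of a library name or a table name, it automatically raises an alarm and submits the code to the developer for processing.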
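The special-source-code check described above can be illustrated with a small sketch. It assumes, purely for illustration, that variables appear in the `${...}` form used by Hive variable substitution and that table references follow the keywords `from`, `join` or `table`; the function name `find_variable_tables` and the regular expression are hypothetical and are not taken from the application.

```python
import re

# Hypothetical pattern: the token that follows FROM / JOIN / TABLE is treated as a table reference.
TABLE_REF = re.compile(r"\b(?:from|join|table)\s+([^\s;()]+)", re.IGNORECASE)


def find_variable_tables(hql):
    """Return table references whose library or table name contains a ${...} variable."""
    return [ref for ref in TABLE_REF.findall(hql) if "${" in ref]


hql = (
    "insert overwrite table target_table_x "
    "select a.name from ${hivevar:src_db}.source_table_s a"
)
flagged = find_variable_tables(hql)
if flagged:
    # A real tool would raise an alarm and hand the code to a developer here.
    print("special source code found:", flagged)
```

A pattern match of this kind is only a rough filter; the application itself does not say how the tool recognizes such code.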
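In an exemplary embodiment, as shown in FIG. 3, step S201 may include:
Step S211: parse the logic code of the data asset table according to the preset blood relationship analysis tool to obtain the tree structure code of the data asset table;
Step S212: deconstruct the tree structure code according to a preset recursive algorithm and mine the association relationship between the nodes, so as to determine the parent node of each node.
Specifically, according to the preset blood relationship analysis tool and the preset recursive algorithm, the AST in the HIVE is called to parse the logic code of the data asset table, and each node of the logic code and the association relationship of each node are mined, so as to determine the parent node of each node. It should be noted that when complex nested logic exists in the logic code, the tree structure code is also expanded with the corresponding nesting.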
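The recursive deconstruction in step S212 can be sketched as follows. The nested-tuple tree is a simplified, hypothetical stand-in for the tree structure code (the token names imitate the style of Hive AST tokens, but the real tool's output format is not given in the application), and `collect_tables` is an illustrative name.

```python
# Simplified stand-in for a tree structure code: (token, child, child, ...).
AST = (
    "TOK_QUERY",
    ("TOK_FROM",
     ("TOK_LEFTOUTERJOIN",
      ("TOK_TABREF", ("TOK_TABNAME", "schema_a", "source_table_s"), "b"),
      ("TOK_TABREF", ("TOK_TABNAME", "schema_b", "source_table_t"), "a"))),
    ("TOK_INSERT",
     ("TOK_DESTINATION", ("TOK_TAB", ("TOK_TABNAME", "target_table_x")))),
)


def collect_tables(node, in_destination=False, sources=None, targets=None):
    """Walk the tree recursively and split table names into sources and targets."""
    sources = [] if sources is None else sources
    targets = [] if targets is None else targets
    if isinstance(node, str):  # leaf such as an alias, nothing to collect
        return sources, targets
    token, *children = node
    if token == "TOK_TABNAME":
        (targets if in_destination else sources).append(".".join(children))
        return sources, targets
    for child in children:
        # Everything below TOK_DESTINATION is the table being written to.
        collect_tables(child, in_destination or token == "TOK_DESTINATION",
                       sources, targets)
    return sources, targets


sources, targets = collect_tables(AST)
print(targets, "<-", sources)
```

Because the walk recurses into every child, a nested subquery would simply appear as a deeper subtree and be handled by the same function.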
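In an exemplary embodiment, step S202 may include:
connecting the nodes according to the connection mode of the parent node and the child node, all the connected nodes forming the data full link structure of the data asset table.
For example, assume that there is a table target_table_x whose logic code is as follows: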
insert overwrite table target_table_x
select a.name,b.pp
from schema_a.source_table_s b
left join schema_b.source_table_t a on a.id=b.id
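The logic code is interpreted as follows: the source table source_table_s is left-joined with the source table source_table_t, and the contents of the two fields name and pp in the join result are all inserted into the target table target_table_x.
The logic code of the table target_table_x is parsed through the AST to obtain the tree structure code of the table target_table_x, from which it is initially determined that the source tables associated with the table target_table_x are the table source_table_s and the table source_table_t. The tree structure code is as follows: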
Figure PCTCN2021091309-appb-000001
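The tree structure code is interpreted as follows: the source table source_table_s is left-joined with the source table source_table_t, and the contents of the two fields name and pp in the join result are all inserted into the target table target_table_x. The tree structure code is generated by parsing the logic code through the AST; its structure is more regular and therefore more convenient for recursion, splitting and deconstruction.
As shown in FIG. 4, which is a schematic effect diagram of a data full link structure, assume that the table target_table_x serves as a node X of the binary tree and that its parent nodes are S and T. Assume further that the preset blood relationship analysis tool finds that the parent nodes of node S are A and B. By repeatedly calling the preset blood relationship analysis tool, all nodes associated with X are obtained, and finally the data full link structure of the table target_table_x is obtained.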
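The repeated invocation described for FIG. 4 amounts to a graph traversal from X towards its upstream tables. The sketch below stands in for one call of the lineage tool with a lookup in a hand-written `PARENTS` table; the placement of node C as a parent of T is an assumption made only so the example is complete, and `build_full_link` is an illustrative name.

```python
from collections import deque

# Direct parents per node; X -> S, T and S -> A, B follow the text, T -> C is assumed.
PARENTS = {"X": ["S", "T"], "S": ["A", "B"], "T": ["C"]}


def build_full_link(target, direct_parents):
    """Call direct_parents repeatedly until every upstream node has been visited."""
    edges = {}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        if node in edges:  # already expanded, avoids repeated work and cycles
            continue
        edges[node] = list(direct_parents(node))
        queue.extend(edges[node])
    return edges


full_link = build_full_link("X", lambda node: PARENTS.get(node, []))
for child, parents in full_link.items():
    print(child, "<-", parents)
```

The resulting mapping of each node to its parents is one possible representation of the connected structure that could then be stored in a graph database.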
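In the embodiment of the present application, the logic code of the data asset table is parsed by the preset blood relationship analysis tool to determine the data full link structure of the data asset table, which not only avoids the errors and omissions caused by manual configuration and registration, but also greatly improves the accuracy of data link association. In addition, a binary tree is drawn according to the data full link structure and a knowledge graph is constructed, so that the data development team and downstream data users can quickly query and understand the data link without reading a large amount of specialized code, which lowers the threshold for obtaining information and widens the range of people who can use it.
Step S300: acquire the user of the service node, and determine the user of the data node in the data full link structure according to the user of the service node.
Specifically, the data node has no direct user; its user is confirmed through the service node.
For example, with continued reference to FIG. 4, assuming that the user of the acquired service node "indicator 1" is user A, the user of the source tables associated with "indicator 1" can be determined; that is, user A is also the user of data node X, data node S, data node T, data node A, data node B and data node C.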
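The propagation of users from a service node to its upstream data nodes can be sketched as below. The `UPSTREAM` mapping mirrors the FIG. 4 example (with node C attached under T as an assumption), and `propagate_users` is an illustrative name rather than anything defined in the application.

```python
# Upstream links: service node "indicator 1" reads X, which is built from S and T, and so on.
UPSTREAM = {"indicator 1": ["X"], "X": ["S", "T"], "S": ["A", "B"], "T": ["C"]}


def propagate_users(service_users, upstream):
    """Give every data node upstream of a service node that service node's users."""
    node_users = {}
    for service, users in service_users.items():
        stack = list(upstream.get(service, []))
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            node_users.setdefault(node, set()).update(users)
            stack.extend(upstream.get(node, []))
    return node_users


users = propagate_users({"indicator 1": {"user A"}}, UPSTREAM)
print(users)
```

If several service nodes share an upstream table, the set union means that table ends up with the users of all of them.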
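Step S400: determine the usage department of the service node and the usage department of the data node according to the user of the service node and the user of the data node, and apportion the storage resources and computing resources of the service node and the data node to each usage department.
Specifically, the users of the service node and the users of the data node are classified according to the preset department to which each user belongs, the usage department of the service node and the usage department of the data node are determined, and the storage resources and computing resources of the service node and the data node are apportioned to each usage department. The storage resource is the disk space occupied by the data stored in the system and can be obtained through the HIVE; the computing resource is the computing unit used when computing data, including the cluster central processing unit (Central Processing Unit, CPU), memory, etc., and can be obtained through the cluster monitoring system log.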
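In an exemplary embodiment, as shown in FIG. 5, step S400 may include:
Step S401: determine the user of the data node in the data full link structure according to the user of the service node;
Step S402: classify the users according to the preset department to which each user belongs, and determine the usage department of the service node and the usage department of the data node.
For example, with continued reference to FIG. 4, assuming that the user of the acquired service node "indicator 1" is user A and that user A belongs to the finance department, the finance department can be determined to be the usage department of data node X, data node S, data node T, data node A, data node B and data node C.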
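A minimal sketch of the classification and apportionment in step S400 follows. The application does not state how a node's cost is divided when several departments use it, so the equal split below is an assumption; "user B", "marketing" and the storage figures are hypothetical sample data, and `apportion` is an illustrative name.

```python
def apportion(node_users, user_dept, node_cost):
    """Split each node's cost equally among the departments of its users (assumed rule)."""
    totals = {}
    for node, users in node_users.items():
        depts = sorted({user_dept[u] for u in users})  # assumes every node has a user
        share = node_cost[node] / len(depts)
        for dept in depts:
            totals[dept] = totals.get(dept, 0.0) + share
    return totals


node_users = {"X": {"user A"}, "S": {"user A", "user B"}}
user_dept = {"user A": "finance", "user B": "marketing"}
storage_gb = {"X": 10.0, "S": 4.0}  # sample storage per node, in GB

by_dept = apportion(node_users, user_dept, storage_gb)
print(by_dept)
```

The same function could be run a second time with computing cost per node to apportion computing resources.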
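In the embodiment of the present application, the usage department of each data node in the data full link structure is determined according to the service nodes of the data full link structure, which not only prevents unreasonable cross-department access but also greatly improves the security of data use.
Step S500: separately count the resource usage of each usage department and the resource access of each usage department, and generate, according to the statistical results, an asset usage status dashboard corresponding to the resource usage and an asset value analysis dashboard corresponding to the resource access.
Specifically, the total apportioned space of each usage department, the storage resources of the service node and the computing resources of the service node are acquired; statistical calculation is then performed on the total apportioned storage space of each usage department and the apportioned storage space of the various service nodes, and the asset usage status dashboard is generated according to the resource usage of each usage department. The access volume of each service node in each usage department is acquired, the resource access of each usage department is then counted, and the asset value analysis dashboard is generated according to the resource access.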
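In an exemplary embodiment, as shown in FIG. 6, step S500 may include:
Step S501: separately acquire the total storage space of each usage department, the storage resources of the service node and the computing resources of the service node;
Step S502: separately count the total apportioned storage space of each usage department and the apportioned storage space of the various service nodes;
Step S503: calculate the ratio of the total storage space of each usage department to the total apportioned storage space to obtain a first ratio result;
Step S504: calculate the ratio of the apportioned storage space to the total storage space of each usage department to obtain a second ratio result;
Step S505: sort the storage resources to obtain a first sorting result;
Step S506: sort the computing resources to obtain a second sorting result;
Step S507: generate the asset usage status dashboard according to the total storage space, the total apportioned storage space, the apportioned storage space, the first ratio result, the second ratio result, the first sorting result and the second sorting result.
Specifically, the total storage space of each usage department, the storage resources of the service node and the computing resources of the service node are first acquired separately, and the total apportioned storage space of each usage department and the apportioned storage space of the various service nodes are counted separately, where the service nodes include reports, indicators, labels, interfaces, etc. Then the ratio of the total storage space of each usage department to the total apportioned storage space is calculated to obtain the first ratio result, and the ratio of the apportioned storage space to the total storage space of each usage department is calculated to obtain the second ratio result. The storage resources are then sorted to obtain the first sorting result, and the computing resources are sorted to obtain the second sorting result. Finally, the asset usage status dashboard is generated according to the total storage space, the total apportioned storage space, the apportioned storage space, the first ratio result, the second ratio result, the first sorting result and the second sorting result, and the asset usage status dashboard is displayed.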
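The ratio and sorting calculations of steps S502 to S505 can be sketched for a single usage department as follows. The figures are hypothetical sample data, the descending sort order is an assumption (the application does not state a direction), and `usage_summary` is an illustrative name.

```python
def usage_summary(total_storage, apportioned_by_type):
    """Compute the ratios and ranking that feed the asset usage status dashboard."""
    total_apportioned = sum(apportioned_by_type.values())
    # First ratio: total storage space / total apportioned storage space.
    first_ratio = total_storage / total_apportioned
    # Second ratio: apportioned storage of each service node type / total storage space.
    second_ratio = {k: v / total_storage for k, v in apportioned_by_type.items()}
    # Sorting result: service node types ordered by storage, largest first (assumed order).
    ranking = sorted(apportioned_by_type, key=apportioned_by_type.get, reverse=True)
    return total_apportioned, first_ratio, second_ratio, ranking


# Sample department: 200 GB in total, apportioned storage per service node type in GB.
total_apportioned, first_ratio, second_ratio, ranking = usage_summary(
    200.0, {"report": 30.0, "indicator": 50.0, "label": 20.0}
)
print(total_apportioned, first_ratio, second_ratio, ranking)
```

Sorting the computing resources for the second sorting result would follow the same pattern with computing cost as the key.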
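In the embodiment of the present application, the asset usage status dashboard is generated by performing statistical analysis on the data assets and is then displayed, so that users can understand the usage status of the data assets simply and intuitively.
In an exemplary embodiment, as shown in FIG. 7, step S500 may include:
Step S511: acquire the access volume of each service node in each usage department;
Step S512: normalize the access volume and sort it in descending order to obtain a third sorting result;
Step S513: reverse the order of the third sorting result to obtain a fourth sorting result;
Step S514: calculate the ratio of the access volume to the total storage space to obtain a third ratio result;
Step S515: normalize the third ratio result and sort it in descending order to obtain a fifth sorting result;
Step S516: reverse the order of the fifth sorting result to obtain a sixth sorting result;
Step S517: generate the asset value analysis dashboard according to the third sorting result, the fourth sorting result, the fifth sorting result and the sixth sorting result.
Specifically, the access volume of the service nodes in each usage department is first acquired, normalized and sorted; the ratio of the access volume to the total storage space is then calculated, normalized and sorted; and finally the asset value analysis dashboard is generated according to the sorting results and displayed.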
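Steps S511 to S516 can be sketched as below. The application does not name a normalization method, so min-max scaling is an assumption; the storage figure used as the denominator for each service node is likewise an assumption about granularity, and all names and numbers are hypothetical sample data.

```python
def normalize(values):
    """Min-max scale values to [0, 1]; the normalization method is an assumption."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in values.items()}


def rank_desc(scores):
    """Return keys ordered from the highest score to the lowest."""
    return sorted(scores, key=scores.get, reverse=True)


visits = {"report_1": 900, "indicator_1": 100, "label_1": 500}          # access volume
storage = {"report_1": 300.0, "indicator_1": 10.0, "label_1": 500.0}    # GB, assumed denominator

third = rank_desc(normalize(visits))                  # most accessed first
fourth = third[::-1]                                  # least accessed first
ratio = {k: visits[k] / storage[k] for k in visits}   # access per unit of storage
fifth = rank_desc(normalize(ratio))                   # best access-to-storage ratio first
sixth = fifth[::-1]
print(third, fifth)
```

In this sample, label_1 is accessed fairly often but occupies the most storage, so it drops to the bottom of the ratio ranking.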
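In the embodiment of the present application, the asset value analysis dashboard is generated by performing statistical analysis on the data assets and is then displayed, so that users can understand the asset value of the data assets simply and intuitively.
Step S600: perform statistical analysis on the asset usage status dashboard and the asset value analysis dashboard respectively, and archive or take offline the data in the service nodes corresponding to resource usage lower than the first preset value and the data in the service nodes corresponding to resource access lower than the second preset value.
Specifically, the statistical results of the asset usage status dashboard and the asset value analysis dashboard are analyzed. The third sorting result is compared with the first preset value, and the entries of the third sorting result that are lower than the first preset value are taken as a first comparison result; the fifth sorting result is compared with the second preset value, and the entries of the fifth sorting result that are lower than the second preset value are taken as a second comparison result. The low-value data assets that are lower than the preset values, that is, the first comparison result and the second comparison result, are archived or taken offline, thereby releasing storage and computing resources. Low-value data assets include data content with high storage, high computing consumption, low access heat and low importance.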
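The threshold comparison in step S600 can be sketched as follows. The normalized scores and the two preset values of 0.1 are hypothetical sample data, and `low_value` is an illustrative name; the application does not specify concrete thresholds.

```python
def low_value(scores, threshold):
    """Return the service nodes whose normalized score is below the preset value."""
    return sorted(k for k, s in scores.items() if s < threshold)


# Normalized scores behind the third and fifth sorting results (sample values).
access_score = {"report_1": 1.0, "label_1": 0.5, "indicator_1": 0.0}
value_score = {"indicator_1": 1.0, "report_1": 0.22, "label_1": 0.0}

first_cmp = low_value(access_score, 0.1)    # below the first preset value
second_cmp = low_value(value_score, 0.1)    # below the second preset value
to_archive = sorted(set(first_cmp) | set(second_cmp))
print("archive or take offline:", to_archive)
```

Taking the union reflects the text, in which both comparison results are archived or taken offline.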
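By displaying the generated asset value analysis dashboard, the embodiment of the present application prompts data users to actively cooperate with data developers in gradually releasing data assets with high input and low output, which reduces the waste of storage space, lowers the consumption of computing resources, and saves the enterprise's data management cost.
The data processing method provided by the embodiment of the present application incorporates the entire data flow, from data storage to final consumption, into the data full link structure to form a complete data life cycle, and then determines the usage department of each service node and each data node according to the nodes of the data full link structure, which not only prevents unreasonable cross-department access but also greatly improves the security of data use. By generating the asset usage status dashboard and the asset value analysis dashboard corresponding to the data assets, the data value is made clear and the data display is intuitive and comprehensive, so data assets with large storage space and low value can be released in time, which reduces the waste of storage space, lowers the consumption of computing resources, and greatly reduces the enterprise's data management cost.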
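Embodiment 2
Referring to FIG. 8, a schematic diagram of the program modules of a data processing system 700 according to an embodiment of the present application is shown. The data processing system 700 can be applied to a computer device, which may be a device with a data transmission function such as a mobile phone, a tablet personal computer or a laptop computer. In the embodiment of the present application, the data processing system 700 may include or be divided into one or more program modules, which are stored in a readable storage medium and executed by one or more processors to complete the embodiment of the present application and implement the data processing system 700 described above. A program module in the embodiments of the present application refers to a series of computer program instruction segments capable of completing a specific function, and is better suited than the program itself to describing the execution of the data processing system 700 in the readable storage medium. In an exemplary embodiment, the data processing system 700 includes a first parsing module 701, a second parsing module 702, a determining module 703, an apportioning module 704, a statistics module 705 and a processing module 706. The following description details the functions of each program module of the embodiment of the present application:
The first parsing module 701 is configured to parse the acquired data asset table to obtain the logic code of the data asset table.
Specifically, the first parsing module 701 acquires the enterprise's data assets to obtain a data asset table, and parses the data asset table by means of a preset HQL (Hibernate Query Language, a fully object-oriented query language) to obtain the logic code corresponding to the data asset table. Here, a data asset (Data Asset) refers to a data resource that is owned or controlled by the enterprise, can bring future economic benefits to the enterprise, and is recorded physically or electronically, such as documents and electronic data. Within an enterprise, not all data constitutes data assets; data assets are the data resources that can generate value for the enterprise. The data assets include order information data, user information data, fund flow data, traffic data, customer service data, and the like.
The second parsing module 702 is configured to parse the logic code according to a preset blood relationship analysis tool to generate a data full link structure corresponding to the data asset table, the data full link structure including service nodes and data nodes.
Specifically, the second parsing module 702 parses the logic code of the data asset table according to the preset blood relationship analysis tool to obtain the tree structure code of the data asset table, determines the associated nodes of the logic code according to the tree structure code, then deconstructs the tree structure code according to a preset recursive algorithm to identify each node of the logic code, and finally generates the data full link structure corresponding to the data asset table, the data full link structure including service nodes and data nodes.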
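In an exemplary embodiment, the second parsing module 702 is specifically configured to:
parse the logic code of the data asset table according to the preset blood relationship analysis tool to generate all nodes associated with the data asset table and the association relationship between the nodes, the association relationship including parent nodes and child nodes;
connect the nodes according to the association relationship between them, all the connected nodes forming the data full link structure of the data asset table.
Specifically, the second parsing module 702, according to the preset blood relationship analysis tool, calls the abstract syntax tree (Abstract Syntax Tree, AST) in the preset HIVE to parse the logic code of the data asset table and generate the data full link structure of the data asset table.
In an exemplary embodiment, the method may also draw a binary tree corresponding to the data asset table according to the data full link structure and store it in a graph database, to facilitate daily querying of data links or use in other projects. HIVE is a Hadoop-based data warehouse tool used for data extraction, transformation and loading; it is a mechanism that can store, query and analyze large-scale data stored in Hadoop. It should be noted that the preset blood relationship analysis tool adds a configurable function for identifying special source code in the data asset table: when special source code is identified, for example a variable used in place of a library name or a table name, it automatically raises an alarm and submits the code to the developer for processing.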
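In an exemplary embodiment, the second parsing module 702 is further specifically configured to:
parse the logic code of the data asset table according to the preset blood relationship analysis tool to obtain the tree structure code of the data asset table;
deconstruct the tree structure code according to a preset recursive algorithm and mine the association relationship between the nodes, so as to determine the parent node of each node.
Specifically, the second parsing module 702, according to the preset blood relationship analysis tool and the preset recursive algorithm, calls the AST in the HIVE to parse the logic code of the data asset table, and mines each node of the logic code and the association relationship of each node, so as to determine the parent node of each node. It should be noted that when complex nested logic exists in the logic code, the tree structure code is also expanded with the corresponding nesting.
In an exemplary embodiment, the second parsing module 702 is further specifically configured to:
connect the nodes according to the connection mode of the parent node and the child node, all the connected nodes forming the data full link structure of the data asset table.
For example, assume that there is a table target_table_x whose logic code is as follows: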
insert overwrite table target_table_x
select a.name,b.pp
from schema_a.source_table_s b
left join schema_b.source_table_t a on a.id=b.id
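The logic code is interpreted as follows: the source table source_table_s is left-joined with the source table source_table_t, and the contents of the two fields name and pp in the join result are all inserted into the target table target_table_x.
The logic code of the table target_table_x is parsed through the AST to obtain the tree structure code of the table target_table_x, from which it is initially determined that the source tables associated with the table target_table_x are the table source_table_s and the table source_table_t. The tree structure code is as follows: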
Figure PCTCN2021091309-appb-000002
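The tree structure code is interpreted as follows: the source table source_table_s is left-joined with the source table source_table_t, and the contents of the two fields name and pp in the join result are all inserted into the target table target_table_x. The tree structure code is generated by parsing the logic code through the AST; its structure is more regular and therefore more convenient for recursion, splitting and deconstruction.
As shown in FIG. 4, which is a schematic effect diagram of a data full link structure, assume that the table target_table_x serves as a node X of the binary tree and that its parent nodes are S and T. Assume further that the preset blood relationship analysis tool finds that the parent nodes of node S are A and B. By repeatedly calling the preset blood relationship analysis tool, all nodes associated with X are obtained, and finally the data full link structure of the table target_table_x is obtained.
In the embodiment of the present application, the logic code of the data asset table is parsed by the preset blood relationship analysis tool to determine the data full link structure of the data asset table, which not only avoids the errors and omissions caused by manual configuration and registration, but also greatly improves the accuracy of data link association. In addition, a binary tree is drawn according to the data full link structure and a knowledge graph is constructed, so that the data development team and downstream data users can quickly query and understand the data link without reading a large amount of specialized code, which lowers the threshold for obtaining information and widens the range of people who can use it.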
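The determining module 703 is configured to acquire the user of the service node, and determine the user of the data node in the data full link structure according to the user of the service node.
Specifically, for the determining module 703, the data node has no direct user; its user is confirmed through the service node.
For example, with continued reference to FIG. 4, assuming that the user of the acquired service node "indicator 1" is user A, the user of the source tables associated with "indicator 1" can be determined; that is, user A is also the user of data node X, data node S, data node T, data node A, data node B and data node C.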
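The apportioning module 704 is configured to determine the usage department of the service node and the usage department of the data node according to the user of the service node and the user of the data node, and to apportion the storage resources and computing resources of the service node and the data node to each usage department.
Specifically, the apportioning module 704 classifies the users of the service node and the users of the data node according to the preset department to which each user belongs, determines the usage department of the service node and the usage department of the data node, and apportions the storage resources and computing resources of the service node and the data node to each usage department. The storage resource is the disk space occupied by the data stored in the system and can be obtained through the HIVE; the computing resource is the computing unit used when computing data, including the cluster central processing unit (Central Processing Unit, CPU), memory, etc., and can be obtained through the cluster monitoring system log.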
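In an exemplary embodiment, the apportioning module 704 is specifically configured to:
determine the user of the data node in the data full link structure according to the user of the service node;
classify the users according to the preset department to which each user belongs, and determine the usage department of the service node and the usage department of the data node.
For example, with continued reference to FIG. 4, assuming that the user of the acquired service node "indicator 1" is user A and that user A belongs to the finance department, the finance department can be determined to be the usage department of data node X, data node S, data node T, data node A, data node B and data node C.
In the embodiment of the present application, the usage department of each data node in the data full link structure is determined according to the service nodes of the data full link structure, which not only prevents unreasonable cross-department access but also greatly improves the security of data use.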
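The statistics module 705 is configured to separately count the resource usage of each usage department and the resource access of each usage department, and to generate, according to the statistical results, an asset usage status dashboard corresponding to the resource usage and an asset value analysis dashboard corresponding to the resource access.
Specifically, the statistics module 705 acquires the total apportioned space of each usage department, the storage resources of the service node and the computing resources of the service node, then performs statistical calculation on the total apportioned storage space of each usage department and the apportioned storage space of the various service nodes, and generates the asset usage status dashboard according to the resource usage of each usage department. It acquires the access volume of each service node in each usage department, then counts the resource access of each usage department, and generates the asset value analysis dashboard according to the resource access.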
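In an exemplary embodiment, the statistics module 705 is specifically configured to:
count the resource usage of each usage department, and generate the asset usage status dashboard according to the resource usage;
count the resource access of each usage department, and generate the asset value analysis dashboard according to the resource access.
In an exemplary embodiment, the statistics module 705 is further specifically configured to:
separately acquire the total storage space of each usage department, the storage resources of the service node and the computing resources of the service node;
separately count the total apportioned storage space of each usage department and the apportioned storage space of the various service nodes;
calculate the ratio of the total storage space of each usage department to the total apportioned storage space to obtain a first ratio result;
calculate the ratio of the apportioned storage space to the total storage space of each usage department to obtain a second ratio result;
sort the storage resources to obtain a first sorting result;
sort the computing resources to obtain a second sorting result;
generate the asset usage status dashboard according to the total storage space, the total apportioned storage space, the apportioned storage space, the first ratio result, the second ratio result, the first sorting result and the second sorting result.
Specifically, the statistics module 705 first separately acquires the total storage space of each usage department, the storage resources of the service node and the computing resources of the service node, and separately counts the total apportioned storage space of each usage department and the apportioned storage space of the various service nodes, where the service nodes include reports, indicators, labels, interfaces, etc. It then calculates the ratio of the total storage space of each usage department to the total apportioned storage space to obtain the first ratio result, and calculates the ratio of the apportioned storage space to the total storage space of each usage department to obtain the second ratio result. The storage resources are then sorted to obtain the first sorting result, and the computing resources are sorted to obtain the second sorting result. Finally, the asset usage status dashboard is generated according to the total storage space, the total apportioned storage space, the apportioned storage space, the first ratio result, the second ratio result, the first sorting result and the second sorting result, and the asset usage status dashboard is displayed.
In the embodiment of the present application, the asset usage status dashboard is generated by performing statistical analysis on the data assets and is then displayed, so that users can understand the usage status of the data assets simply and intuitively.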
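In an exemplary embodiment, the statistics module 705 is further specifically configured to:
acquire the access volume of each service node in each usage department;
normalize the access volume and sort it in descending order to obtain a third sorting result;
reverse the order of the third sorting result to obtain a fourth sorting result;
calculate the ratio of the access volume to the total storage space to obtain a third ratio result;
normalize the third ratio result and sort it in descending order to obtain a fifth sorting result;
reverse the order of the fifth sorting result to obtain a sixth sorting result;
generate the asset value analysis dashboard according to the third sorting result, the fourth sorting result, the fifth sorting result and the sixth sorting result.
Specifically, the statistics module 705 first acquires the access volume of the service nodes in each usage department, normalizes the access volume and sorts it, then calculates the ratio of the access volume to the total storage space, normalizes the ratio and sorts it, and finally generates the asset value analysis dashboard according to the sorting results and displays the asset value analysis dashboard.
In the embodiment of the present application, the asset value analysis dashboard is generated by performing statistical analysis on the data assets and is then displayed, so that users can understand the asset value of the data assets simply and intuitively.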
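The processing module 706 is configured to perform statistical analysis on the asset usage dashboard and the asset value analysis dashboard respectively, and to archive or take offline the data in the service nodes whose resource usage is found to be below a first preset value and the data in the service nodes whose resource access is found to be below a second preset value.
Specifically, the processing module 706 analyzes the statistical results of the asset usage dashboard and the asset value analysis dashboard. It compares the third ranking result with the first preset value and takes the entries of the third ranking result that fall below the first preset value as a first comparison result. It compares the fifth ranking result with the second preset value and takes the entries of the fifth ranking result that fall below the second preset value as a second comparison result. The low-value data assets below the preset values, namely the first comparison result and the second comparison result, are then archived or taken offline, which releases storage and computing resources. Low-value data assets include data content with high storage, high computing consumption, low access popularity and low importance.
By displaying the generated asset value analysis dashboard, this embodiment of the present application encourages data users to cooperate actively with data developers in gradually releasing high-investment, low-output data assets, which reduces wasted storage space, lowers computing resource consumption and saves enterprise data management costs.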
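The threshold comparison could be sketched as follows. This is an illustrative sketch only: the function name is hypothetical, and each ranking is assumed to be a list of `(service node, normalized score)` pairs. The preset values themselves are not given in the application.

```python
def low_value_nodes(third_ranking, fifth_ranking, first_preset, second_preset):
    """Return the two comparison results: nodes scoring below each preset value.

    third_ranking / fifth_ranking: lists of (service node, normalized score) pairs.
    first_preset / second_preset: thresholds chosen by the operator (values assumed).
    """
    first_comparison = [node for node, score in third_ranking if score < first_preset]
    second_comparison = [node for node, score in fifth_ranking if score < second_preset]
    return first_comparison, second_comparison


def release_candidates(first_comparison, second_comparison):
    """Union of both comparison results, in first-seen order, for archiving or offlining."""
    return list(dict.fromkeys(first_comparison + second_comparison))
```

Whether a candidate is archived or taken offline is a policy decision outside this sketch.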
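The data processing system 700 provided by this embodiment of the present application brings the entire data flow, from data ingestion to final consumption, into the full data link structure, forming a complete data life cycle. It then determines, from the nodes of the full data link structure, the using department of each service node and each data node in that structure, which not only prevents unreasonable cross-department access but also greatly improves the security of data use. By generating the asset usage dashboard and the asset value analysis dashboard for the data assets, it makes data value explicit and data presentation intuitive and comprehensive, so that low-value data assets occupying large amounts of storage can be released in time. This reduces wasted storage space, lowers computing resource consumption and greatly reduces enterprise data management costs.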
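Embodiment 3
Referring to FIG. 9, an embodiment of the present application further provides a schematic diagram of the hardware architecture of a computer device 800, such as a smartphone, tablet computer, notebook computer, desktop computer, rack server, blade server, tower server or cabinet server (including a standalone server or a server cluster composed of multiple servers) capable of executing programs. In this embodiment, the computer device 800 is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions. As shown in the figure, the computer device 800 includes at least, but is not limited to, a memory 801, a processor 802 and a network interface 803 that can be communicatively connected to one another through a system bus. Specifically: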
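In this embodiment, the memory 801 includes at least one type of computer-readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disc and the like. In some embodiments, the memory 801 may be an internal storage unit of the computer device 800, such as its hard disk or internal memory. In other embodiments, the memory 801 may be an external storage device of the computer device 800, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card fitted to the computer device 800. The memory 801 may also include both an internal storage unit of the computer device 800 and an external storage device. In this embodiment, the memory 801 is generally used to store the operating system and the various application software installed on the computer device 800, such as the program code of the data processing system 700. The memory 801 may also be used to temporarily store data that has been output or is about to be output.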
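In some embodiments, the processor 802 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor or another data processing chip. The processor 802 is generally used to control the overall operation of the computer device 800. In this embodiment, the processor 802 is used to run the program code stored in the memory 801 or to process data, for example to run the program code of the data processing system 700 so as to implement the data processing method of the above embodiments.
The network interface 803 may include a wireless network interface or a wired network interface, and is generally used to establish communication connections between the computer device 800 and other electronic devices. For example, the network interface 803 connects the computer device 800 to an external terminal through a network and establishes a data transmission channel and a communication connection between the computer device 800 and the external terminal. The network may be a wireless or wired network such as an intranet, the Internet, Global System for Mobile Communications (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth or Wi-Fi.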
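It should be noted that FIG. 9 shows only the computer device 800 with components 801 to 803. It should be understood that not all of the components shown are required, and more or fewer components may be implemented instead.
In this embodiment, the data processing system 700 stored in the memory 801 may also be divided into one or more program modules, which are stored in the memory 801 and executed by one or more processors (the processor 802 in this embodiment) to carry out the data processing method of the present application.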
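Embodiment 4
An embodiment of the present application further provides a computer-readable storage medium, such as flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disc, a server, an app store and so on, on which a computer program is stored. The program implements the corresponding functions when executed by a processor. The computer-readable storage medium may be non-volatile or volatile. The computer-readable storage medium of this embodiment stores the data processing system 700, which implements the data processing method of the present application when executed by a processor.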
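The serial numbers of the above embodiments are for description only and do not indicate the relative merits of the embodiments.
From the above description of the implementations, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, though in many cases the former is the better implementation.
The above are only preferred embodiments of the present application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present application.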

Claims (20)

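  1. A data processing method, wherein the method comprises:
    parsing an obtained data asset table to obtain logic code of the data asset table;
    parsing the logic code with a preset lineage analysis tool to generate a full data link structure corresponding to the data asset table, the full data link structure comprising service nodes and data nodes;
    obtaining users of the service nodes, and determining, according to the users of the service nodes, users of the data nodes in the full data link structure;
    determining, according to the users of the service nodes and the users of the data nodes, using departments of the service nodes and using departments of the data nodes, and apportioning storage resources and computing resources of the service nodes and the data nodes to the using departments;
    separately compiling statistics on resource usage of each using department and resource access of each using department, and generating, according to the statistical results, an asset usage dashboard corresponding to the resource usage and an asset value analysis dashboard corresponding to the resource access;
    performing statistical analysis on the asset usage dashboard and the asset value analysis dashboard respectively, and archiving or taking offline the data in the service nodes whose resource usage is found to be below a first preset value and the data in the service nodes whose resource access is found to be below a second preset value.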
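  2. The data processing method according to claim 1, wherein parsing the logic code with the preset lineage analysis tool to generate the full data link structure corresponding to the data asset table comprises:
    parsing the logic code of the data asset table with the preset lineage analysis tool to generate all nodes associated with the data asset table and the association relationships between the nodes, the association relationships comprising parent nodes and child nodes; and
    connecting the nodes according to the association relationships between them, all the connected nodes forming the full data link structure of the data asset table.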
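  3. The data processing method according to claim 2, wherein a method for generating the association relationships between the nodes comprises:
    parsing the logic code of the data asset table with the preset lineage analysis tool to obtain tree-structured code of the data asset table; and
    deconstructing the tree-structured code with a preset recursive algorithm to extract the association relationships between the nodes, so as to determine the parent node of each node.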
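  4. The data processing method according to claim 2, wherein connecting the nodes according to the association relationships between them, all the connected nodes forming the full data link structure of the data asset table, comprises:
    connecting the nodes in the manner in which the parent nodes and the child nodes are linked, all the connected nodes forming the full data link structure of the data asset table.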
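  5. The data processing method according to claim 1, wherein determining, according to the users of the service nodes and the users of the data nodes, the using departments of the service nodes and the using departments of the data nodes comprises:
    determining, according to the users of the service nodes, the users of the data nodes in the full data link structure; and
    classifying the users according to their preset home departments, and determining the using departments of the service nodes and the using departments of the data nodes.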
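  6. The data processing method according to claim 1, wherein separately compiling statistics on the resource usage of each using department and the resource access of each using department, and generating, according to the statistical results, the asset usage dashboard corresponding to the resource usage and the asset value analysis dashboard corresponding to the resource access comprises:
    obtaining the total storage space of each using department, the storage resources of the service nodes and the computing resources of the service nodes;
    compiling statistics on the total apportioned storage space of each using department and the apportioned storage space of each type of service node;
    calculating the ratio of the total storage space to the total apportioned storage space of each using department to obtain a first ratio result;
    calculating the ratio of the apportioned storage space to the total storage space of each using department to obtain a second ratio result;
    sorting the storage resources to obtain a first ranking result;
    sorting the computing resources to obtain a second ranking result; and
    generating the asset usage dashboard from the total storage space, the total apportioned storage space, the apportioned storage space, the first ratio result, the second ratio result, the first ranking result and the second ranking result.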
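  7. The data processing method according to claim 1, wherein separately compiling statistics on the resource usage of each using department and the resource access of each using department, and generating, according to the statistical results, the asset usage dashboard corresponding to the resource usage and the asset value analysis dashboard corresponding to the resource access further comprises:
    obtaining the access count of each service node in each using department;
    normalizing the access counts and sorting them in descending order to obtain a third ranking result;
    reversing the third ranking result to obtain a fourth ranking result;
    calculating the ratio of the access count to the total storage space to obtain a third ratio result;
    normalizing the third ratio results and sorting them in descending order to obtain a fifth ranking result;
    reversing the fifth ranking result to obtain a sixth ranking result; and
    generating the asset value analysis dashboard from the third ranking result, the fourth ranking result, the fifth ranking result and the sixth ranking result.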
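  8. A data processing system, wherein the system comprises:
    a first parsing module, configured to parse an obtained data asset table to obtain logic code of the data asset table;
    a second parsing module, configured to parse the logic code with a preset lineage analysis tool to generate a full data link structure corresponding to the data asset table, the full data link structure comprising service nodes and data nodes;
    a determination module, configured to obtain users of the service nodes, and to determine, according to the users of the service nodes, users of each data node in the full data link structure;
    an apportionment module, configured to determine, according to the users of the service nodes and the users of the data nodes, using departments of the service nodes and using departments of the data nodes, and to apportion storage resources and computing resources of the service nodes and the data nodes to the using departments;
    a statistics module, configured to separately compile statistics on resource usage of each using department and resource access of each using department, and to generate, according to the statistical results, an asset usage dashboard corresponding to the resource usage and an asset value analysis dashboard corresponding to the resource access;
    a processing module, configured to perform statistical analysis on the asset usage dashboard and the asset value analysis dashboard respectively, and to archive or take offline the data in the service nodes whose resource usage is found to be below a first preset value and the data in the service nodes whose resource access is found to be below a second preset value.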
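  9. A computer device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the following steps:
    parsing an obtained data asset table to obtain logic code of the data asset table;
    parsing the logic code with a preset lineage analysis tool to generate a full data link structure corresponding to the data asset table, the full data link structure comprising service nodes and data nodes;
    obtaining users of the service nodes, and determining, according to the users of the service nodes, users of the data nodes in the full data link structure;
    determining, according to the users of the service nodes and the users of the data nodes, using departments of the service nodes and using departments of the data nodes, and apportioning storage resources and computing resources of the service nodes and the data nodes to the using departments;
    separately compiling statistics on resource usage of each using department and resource access of each using department, and generating, according to the statistical results, an asset usage dashboard corresponding to the resource usage and an asset value analysis dashboard corresponding to the resource access;
    performing statistical analysis on the asset usage dashboard and the asset value analysis dashboard respectively, and archiving or taking offline the data in the service nodes whose resource usage is found to be below a first preset value and the data in the service nodes whose resource access is found to be below a second preset value.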
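  10. The computer device according to claim 9, wherein the processor, when executing the computer program, further implements the following steps:
    parsing the logic code of the data asset table with the preset lineage analysis tool to generate all nodes associated with the data asset table and the association relationships between the nodes, the association relationships comprising parent nodes and child nodes; and
    connecting the nodes according to the association relationships between them, all the connected nodes forming the full data link structure of the data asset table.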
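  11. The computer device according to claim 10, wherein the processor, when executing the computer program, further implements the following steps:
    parsing the logic code of the data asset table with the preset lineage analysis tool to obtain tree-structured code of the data asset table; and
    deconstructing the tree-structured code with a preset recursive algorithm to extract the association relationships between the nodes, so as to determine the parent node of each node.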
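  12. The computer device according to claim 10, wherein the processor, when executing the computer program, further implements the following step:
    connecting the nodes in the manner in which the parent nodes and the child nodes are linked, all the connected nodes forming the full data link structure of the data asset table.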
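  13. The computer device according to claim 9, wherein the processor, when executing the computer program, further implements the following steps:
    determining, according to the users of the service nodes, the users of the data nodes in the full data link structure; and
    classifying the users according to their preset home departments, and determining the using departments of the service nodes and the using departments of the data nodes.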
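  14. The computer device according to claim 9, wherein the processor, when executing the computer program, further implements the following steps:
    obtaining the total storage space of each using department, the storage resources of the service nodes and the computing resources of the service nodes;
    compiling statistics on the total apportioned storage space of each using department and the apportioned storage space of each type of service node;
    calculating the ratio of the total storage space to the total apportioned storage space of each using department to obtain a first ratio result;
    calculating the ratio of the apportioned storage space to the total storage space of each using department to obtain a second ratio result;
    sorting the storage resources to obtain a first ranking result;
    sorting the computing resources to obtain a second ranking result; and
    generating the asset usage dashboard from the total storage space, the total apportioned storage space, the apportioned storage space, the first ratio result, the second ratio result, the first ranking result and the second ranking result.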
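  15. The computer device according to claim 9, wherein the processor, when executing the computer program, further implements the following steps:
    obtaining the access count of each service node in each using department;
    normalizing the access counts and sorting them in descending order to obtain a third ranking result;
    reversing the third ranking result to obtain a fourth ranking result;
    calculating the ratio of the access count to the total storage space to obtain a third ratio result;
    normalizing the third ratio results and sorting them in descending order to obtain a fifth ranking result;
    reversing the fifth ranking result to obtain a sixth ranking result; and
    generating the asset value analysis dashboard from the third ranking result, the fourth ranking result, the fifth ranking result and the sixth ranking result.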
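  16. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the following steps:
    parsing an obtained data asset table to obtain logic code of the data asset table;
    parsing the logic code with a preset lineage analysis tool to generate a full data link structure corresponding to the data asset table, the full data link structure comprising service nodes and data nodes;
    obtaining users of the service nodes, and determining, according to the users of the service nodes, users of the data nodes in the full data link structure;
    determining, according to the users of the service nodes and the users of the data nodes, using departments of the service nodes and using departments of the data nodes, and apportioning storage resources and computing resources of the service nodes and the data nodes to the using departments;
    separately compiling statistics on resource usage of each using department and resource access of each using department, and generating, according to the statistical results, an asset usage dashboard corresponding to the resource usage and an asset value analysis dashboard corresponding to the resource access;
    performing statistical analysis on the asset usage dashboard and the asset value analysis dashboard respectively, and archiving or taking offline the data in the service nodes whose resource usage is found to be below a first preset value and the data in the service nodes whose resource access is found to be below a second preset value.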
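  17. The computer-readable storage medium according to claim 16, wherein the computer program, when executed by a processor, further implements the following steps:
    parsing the logic code of the data asset table with the preset lineage analysis tool to generate all nodes associated with the data asset table and the association relationships between the nodes, the association relationships comprising parent nodes and child nodes; and
    connecting the nodes according to the association relationships between them, all the connected nodes forming the full data link structure of the data asset table.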
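  18. The computer-readable storage medium according to claim 17, wherein the computer program, when executed by a processor, further implements the following steps:
    parsing the logic code of the data asset table with the preset lineage analysis tool to obtain tree-structured code of the data asset table; and
    deconstructing the tree-structured code with a preset recursive algorithm to extract the association relationships between the nodes, so as to determine the parent node of each node.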
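  19. The computer-readable storage medium according to claim 17, wherein the computer program, when executed by a processor, further implements the following step:
    connecting the nodes in the manner in which the parent nodes and the child nodes are linked, all the connected nodes forming the full data link structure of the data asset table.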
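  20. The computer-readable storage medium according to claim 16, wherein the computer program, when executed by a processor, further implements the following steps:
    determining, according to the users of the service nodes, the users of the data nodes in the full data link structure; and
    classifying the users according to their preset home departments, and determining the using departments of the service nodes and the using departments of the data nodes.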
PCT/CN2021/091309 2021-02-25 2021-04-30 Data processing method, system, computer device and readable storage medium WO2022178979A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110214728.4A CN112948381B (zh) 2021-02-25 2021-02-25 Data processing method, system, computer device and readable storage medium
CN202110214728.4 2021-02-25

Publications (1)

Publication Number Publication Date
WO2022178979A1 (zh) 2022-09-01

Family

ID=76246347

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/091309 WO2022178979A1 (zh) 2021-02-25 2021-04-30 Data processing method, system, computer device and readable storage medium

Country Status (2)

Country Link
CN (1) CN112948381B (zh)
WO (1) WO2022178979A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110717645A (zh) * 2019-09-02 2020-01-21 北京航空航天大学 基于分域分业务的智能网联汽车信息安全资产识别方法
CN111353723A (zh) * 2020-03-30 2020-06-30 上海至数企业发展有限公司 设备资产盘点的方法、系统、设备及存储介质
US20200267125A1 (en) * 2016-06-10 2020-08-20 OneTrust, LLC Data processing systems for migrating data between data centers
CN112396404A (zh) * 2020-11-27 2021-02-23 广州光点信息科技有限公司 一种数据中台系统

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8266670B1 (en) * 2004-05-06 2012-09-11 American Express Travel Related Services Company, Inc. System and method for dynamic security provisioning of data resources
US10089401B2 (en) * 2016-01-12 2018-10-02 Fox Broadcasting Company Method and pluggable system for trend-based allocation of media assets between global and local storage
CN109903147A (zh) * 2019-01-28 2019-06-18 平安科技(深圳)有限公司 资产数据处理方法、装置及计算机设备
CN110471949B (zh) * 2019-07-11 2023-02-28 创新先进技术有限公司 数据血缘分析方法、装置、系统、服务器及存储介质
CN111401700B (zh) * 2020-03-05 2023-09-19 平安科技(深圳)有限公司 一种数据分析方法、装置、计算机系统及可读存储介质
CN112241402A (zh) * 2020-10-16 2021-01-19 中国民用航空华东地区空中交通管理局 一种空管数据供应链系统及数据治理方法
CN112328575A (zh) * 2020-11-12 2021-02-05 杭州数梦工场科技有限公司 数据资产血缘生成方法、装置、电子设备
CN112256687A (zh) * 2020-11-17 2021-01-22 珠海大横琴科技发展有限公司 一种数据处理的方法和装置

Also Published As

Publication number Publication date
CN112948381B (zh) 2022-10-28
CN112948381A (zh) 2021-06-11

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21927412

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21927412

Country of ref document: EP

Kind code of ref document: A1