CN116841904A - Health diagnosis method, device and equipment for data product and storage medium - Google Patents

Health diagnosis method, device and equipment for data product and storage medium

Publication number
CN116841904A
Authority
CN
China
Prior art keywords
health
data
knowledge graph
standard
data product
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310897260.2A
Other languages
Chinese (zh)
Inventor
王党团
李文明
邢琪
何柳柳
万小妹
胡小明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank of China Ltd
Original Assignee
Bank of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bank of China Ltd
Priority to CN202310897260.2A
Publication of CN116841904A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/36Preventing errors by testing or debugging software
    • G06F11/362Software debugging
    • G06F11/366Software debugging using diagnostics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/02Knowledge representation; Symbolic representation

Abstract

The application provides a health diagnosis method, apparatus, device and storage medium for a data product, which can be applied to the field of big data or the field of finance. The method comprises: collecting program source code of a target data product; generating, according to the program source code, a first knowledge graph corresponding to the target data product, wherein the first knowledge graph represents the dependency relationships among the entities in the source code; obtaining health model data corresponding to the target data product according to the first knowledge graph; and comparing the health model data with a standard health model to obtain a health diagnosis result of the target data product. By comparing the health model data of a data product with a user-defined standard health model, the health condition of the data product can be diagnosed on the basis of specific data, which makes the diagnosis result accurate and objective and helps ensure the quality of the data product.

Description

Health diagnosis method, device and equipment for data product and storage medium
Technical Field
The present application relates to the field of big data or finance, and in particular, to a method, apparatus, device and storage medium for diagnosing health of a data product.
Background
A data product is a product that exchanges value with a user via structured data. A data product can lower the threshold for using data, improve the efficiency of data use, realize the value of data and assist the user in decisions and actions. Its forms include platform-type products, system function modules, mobile apps, applets and the like. In the rapidly evolving internet industry, however, any data-related product may be referred to as a "data product", such as advertising platforms, operating platforms, marketing platforms, consumer profiling platforms, and the like.
The operation of a data product mainly consists of extracting and processing a large amount of data according to business requirements and then outputting business reports and analysis reports. The whole process comprises three main stages: ETL (Extract-Transform-Load), data modeling and data application. During development, the control of the various algorithms, the understanding of the business and the domain experience of the developers all influence the health state of the final data product. Across a large number of data products, the health state therefore varies widely. A data product in a poor health state degrades the quality of its output data and leads to problems such as complex subsequent development, slow operational response and poor business extensibility.
However, the prior art provides no method or standard for judging the health of a data product. The health state can only be evaluated on the basis of personal experience, so the health diagnosis result is highly uncertain.
Disclosure of Invention
In view of the above, the present application provides a method, apparatus, device and storage medium for diagnosing health of a data product, which aim to diagnose health of the data product.
In a first aspect, the present application provides a method of health diagnosis of a data product, the method comprising:
collecting program source code of a target data product;
generating a first knowledge graph corresponding to the target data product according to the program source code, wherein the first knowledge graph is used for representing the dependency relationship among all entities in the program source code;
obtaining health model data corresponding to the target data product according to the first knowledge graph;
and comparing the health model data with a standard health model to obtain a health diagnosis result of the target data product.
Optionally, collecting the program source code of the target data product comprises:
capturing a source code file and/or a database program file of the target data product;
and sorting the captured files to obtain a standard program file, wherein the standard program file comprises the program source code of the target data product.
Optionally, generating the first knowledge graph corresponding to the target data product according to the program source code comprises:
parsing the program source code according to a general database operation language to obtain the entities in the program source code and the dependency relationships between the entities;
and connecting the entities with one-way (directed) edges according to the dependency relationships to obtain a mesh-shaped knowledge graph as the first knowledge graph.
Optionally, the entities serve as nodes of the first knowledge graph, and obtaining the health model data corresponding to the target data product according to the first knowledge graph comprises:
traversing the nodes of the first knowledge graph layer by layer according to the dependency relationships, and layering the nodes in the first knowledge graph;
counting, for each layer of the first knowledge graph, the total number of nodes and the number of nodes without child nodes;
obtaining the child-node count of each layer from the total number of nodes and the number of nodes without child nodes;
and combining the layer number of each layer of the first knowledge graph with the corresponding child-node count to obtain a plurality of group sequences, which serve as the health model data.
Optionally, a plurality of parameters of the standard health model are user-defined, the plurality of parameters comprising: the total number of source data, the processing step size and the compression ratio;
the total number of source data is the number of root nodes in a second knowledge graph corresponding to a standard health product;
the processing step size is the number of layers between the layer where the root nodes are located and the service layer in the second knowledge graph, the number of nodes in the service layer corresponding to the number of service models of the standard health product;
the compression ratio is the ratio of the number of nodes in the service layer to the total number of source data.
Optionally, comparing the health model data with a standard health model to obtain a health diagnosis result of the target data product includes:
obtaining standard health data of the standard health model according to the parameters;
correspondingly comparing the health model data with the standard health data;
obtaining the product health degree corresponding to the target data product according to the comparison result;
and obtaining the health diagnosis result of the target data product according to a preset standard and the product health degree.
Optionally, the method further comprises:
obtaining a standard health degree curve corresponding to the standard health model according to the user-defined multiple parameters;
obtaining a health degree curve corresponding to the target data product according to the health model data;
obtaining a health degree histogram corresponding to the target data product according to the health model data;
and displaying the health histogram, the health curve and the standard health curve in the same page so as to display the fitting condition of the health curve and the standard health curve.
In a second aspect, the present application provides a health diagnosis device for a data product, the device comprising: a source code acquisition module, a knowledge graph generation module, a health model data determination module and a diagnosis result determination module;
the source code acquisition module is used for acquiring program source codes of target data products;
the knowledge graph generation module is used for generating a first knowledge graph corresponding to the target data product according to the program source code, and the first knowledge graph is used for representing the dependency relationship among all entities in the source code;
the health model data determining module is used for obtaining health model data corresponding to the target data product according to the first knowledge graph;
and the diagnosis result determining module is used for comparing the health model data with a standard health model to obtain a health diagnosis result of the target data product.
In a third aspect, the present application provides an electronic device comprising a memory for storing program code and a processor for executing the program code to cause the electronic device to perform the method of health diagnosis of a data product according to any one of the preceding first aspects.
In a fourth aspect, the present application provides a computer storage medium having a computer program stored therein, which, when executed, enables an apparatus running the computer program to implement the method for health diagnosis of a data product according to any one of the preceding first aspects.
The application provides a health diagnosis method, apparatus, device and storage medium for a data product, which can be applied to the field of big data or the field of finance. The method comprises: collecting program source code of a target data product; generating, according to the program source code, a first knowledge graph corresponding to the target data product, wherein the first knowledge graph represents the dependency relationships among the entities in the source code; obtaining health model data corresponding to the target data product according to the first knowledge graph; and comparing the health model data with a standard health model to obtain a health diagnosis result of the target data product. The health model data is thus derived from the collected source code of the data product and compared with a user-defined standard health model, so that the health condition of the data product can be diagnosed on the basis of specific data. This makes the diagnosis result accurate and objective, allows poor-quality data products to be found in time, and helps ensure the quality of data products.
Drawings
In order to more clearly illustrate this embodiment or the technical solutions of the prior art, the drawings that are required for the description of the embodiment or the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for diagnosing health of a data product according to an embodiment of the present application;
FIG. 2 is a flowchart of another method of diagnosing health of a data product according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a source code file in a method for diagnosing health of a data product according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a knowledge graph in a health diagnosis method of a data product according to an embodiment of the present application;
FIG. 5 is a schematic diagram of another knowledge graph in a method for diagnosing health of a data product according to an embodiment of the present application;
FIG. 6 is a schematic diagram showing a combination of health histogram of a data product in a method for diagnosing health of a data product according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a health diagnosis device for a data product according to an embodiment of the present application.
Detailed Description
As previously described, a data product is a technology or application that combines capabilities such as data warehousing, report querying and data analysis to provide data support for enterprise decisions. With the development of computer hardware and the progress of data processing algorithms, data products have become more powerful and use more and more algorithms. As a result, even within the same business domain, different development organizations or developers, with different domain experience and technical capabilities, develop data products containing different data hierarchies and data models. The quality of these data products varies, and a poor-quality data product leads to problems such as complex development, slow operational response or poor business extensibility. However, the prior art offers no way to identify the health state of a data product; it can only be evaluated on the basis of a tester's personal experience. Such an evaluation has neither specific data support nor a fixed evaluation standard, so the result is highly uncertain and cannot accurately and objectively diagnose the health state of the data product.
In view of the above, the present application provides a method, apparatus, device and storage medium for diagnosing the health of a data product. The method comprises: collecting program source code of a target data product; generating, according to the program source code, a first knowledge graph corresponding to the target data product, wherein the first knowledge graph represents the dependency relationships among the entities in the source code; obtaining health model data corresponding to the target data product according to the first knowledge graph; and comparing the health model data with a standard health model to obtain a health diagnosis result of the target data product. The health model data is thus derived from the collected source code of the data product and compared with a user-defined standard health model, so that the health condition of the data product can be diagnosed on the basis of specific data. This makes the diagnosis result accurate and objective, allows poor-quality data products to be found in time, and helps ensure the quality of data products.
In order to make the present application better understood by those skilled in the art, the following description will clearly and completely describe the technical solutions in the embodiments of the present application with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Referring to FIG. 1, FIG. 1 is a flowchart of a method for diagnosing health of a data product according to an embodiment of the present application, including:
S101: Collect program source code of a target data product.
Here, a data product refers to a product form that realizes the value of data to help users make better decisions (and even take actions). It acts as an analytical presenter of information and an enabler of value during the user's decisions and actions. Conventional data products are of three types: user data products, business data products and enterprise data products. In some possible implementations, the target data product may also be another form of data product, such as a tool-type data product, a custom service-type data product or an intelligent data product, without affecting the normal implementation of the embodiments of the present application.
Optionally, the program source code of the target data product may be collected as follows: first, source code files and/or database program files of the target data product are captured; then, the captured files are sorted to obtain a standard program file, which contains the program source code of the target data product. In some possible implementations, other manners may be adopted to obtain the program source code of the target data product, without affecting the normal implementation of the embodiments of the present application.
S102: Generate a first knowledge graph corresponding to the target data product according to the program source code.
The first knowledge graph represents the dependency relationships among the entities in the program source code. An entity may be a table in a database file or a table involved in the program source code, and a dependency relationship may be a processing operation or processing step applied to a table.
Optionally, the first knowledge graph may take the entities in the program source code as nodes and the dependency relationships between the entities as directed edges, connecting the entities according to the upstream-downstream order between them. In some possible implementations, the first knowledge graph may be generated as follows: first, the program source code is parsed according to a general database operation language (such as SQL) to obtain the entities in the program source code and the dependency relationships between them; then, the entities are connected with one-way (directed) edges according to the dependency relationships to obtain a mesh-shaped knowledge graph as the first knowledge graph. Of course, other manners may be adopted to obtain the first knowledge graph, without affecting the normal implementation of the embodiments of the present application.
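The directed-graph structure described above can be sketched as follows. This is a minimal illustration only: the function name `build_graph` and the table names `src_a`, `src_b`, `stage` and `report` are hypothetical and do not come from the application, and a plain in-memory dictionary stands in for whatever graph storage an implementation would actually use.

```python
from collections import defaultdict


def build_graph(dependencies):
    """Build a directed graph from (upstream, downstream) entity pairs.

    Returns the node set, a mapping from each upstream node to its
    children, and the sorted list of root nodes (nodes with no parent).
    """
    children = defaultdict(set)
    parents = defaultdict(set)
    nodes = set()
    for upstream, downstream in dependencies:
        children[upstream].add(downstream)   # one-way edge upstream -> downstream
        parents[downstream].add(upstream)
        nodes.update((upstream, downstream))
    roots = sorted(n for n in nodes if not parents[n])
    return nodes, dict(children), roots


# Hypothetical dependencies: two source tables feed a staging table,
# which in turn feeds a report table.
nodes, children, roots = build_graph(
    [("src_a", "stage"), ("src_b", "stage"), ("stage", "report")]
)
```

In this sketch the root nodes are simply the entities that never appear as a downstream table.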
S103: Obtain health model data corresponding to the target data product according to the first knowledge graph.
The health model data is information derived from the first knowledge graph corresponding to the target data product. The health model data may include, for each layer of the first knowledge graph, the total number of nodes, the child-node count, the number of nodes without child nodes, the number of root nodes and so on, and may further include the number of service models contained in the first knowledge graph, that is, the number of nodes in the service layer. A service model is an entity model defined in the data product that operates directly on the data to implement different functions.
Optionally, when the entities serve as nodes of the first knowledge graph, the health model data may be obtained as follows: first, the nodes of the first knowledge graph are traversed layer by layer according to the dependency relationships, and the nodes are assigned to layers. Then, the total number of nodes and the number of nodes without child nodes are counted for each layer. The child-node count of each layer is obtained from the total number of nodes and the number of nodes without child nodes. Finally, the layer number of each layer is combined with the corresponding child-node count to obtain a plurality of group sequences, which serve as the health model data. Of course, other manners may be adopted to obtain the health model data, without affecting the normal implementation of the embodiments of the present application.
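The layering and counting steps above can be sketched as follows. The text does not say how a node reachable along several paths is assigned to a layer; this sketch assumes a node's layer is the length of its longest path from any root, and it assumes the graph is non-empty and acyclic. It also applies the rule stated later in the description that the last layer's count of nodes without child nodes is taken as 0. The function name and the sample edges are hypothetical.

```python
from collections import defaultdict, deque


def group_sequences(edges):
    """Return [(layer, nodes_in_layer - childless_nodes_in_layer), ...]."""
    children = defaultdict(set)
    indegree = defaultdict(int)
    nodes = set()
    for up, down in edges:
        if down not in children[up]:       # ignore duplicate edges
            children[up].add(down)
            indegree[down] += 1
        nodes.update((up, down))

    # Assumption: layer = longest distance from a root (Kahn-style traversal).
    remaining = {n: indegree[n] for n in nodes}
    layer = {n: 0 for n in nodes if remaining[n] == 0}
    queue = deque(sorted(layer))
    while queue:
        node = queue.popleft()
        for child in children[node]:
            layer[child] = max(layer.get(child, 0), layer[node] + 1)
            remaining[child] -= 1
            if remaining[child] == 0:
                queue.append(child)

    depth = max(layer.values())
    totals = [0] * (depth + 1)
    childless = [0] * (depth + 1)
    for node, lvl in layer.items():
        totals[lvl] += 1
        if not children[node]:
            childless[lvl] += 1
    childless[depth] = 0                   # last layer: childless count taken as 0
    return [(lvl, totals[lvl] - childless[lvl]) for lvl in range(depth + 1)]


# Hypothetical graph: a and b are roots, e is a dead end in layer 1.
sequences = group_sequences([("a", "c"), ("b", "c"), ("c", "d"), ("b", "e")])
```

A breadth-first (shortest-path) layering would be an equally plausible reading; only the layer-assignment step would change.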
S104: Compare the health model data with a standard health model to obtain a health diagnosis result of the target data product.
The standard health model is the health diagnosis standard against which the health model data of the target data product is judged. The standard health model may include a specific health calculation formula that outputs a standard health curve. It may also include a standard number of nodes for each layer of the first knowledge graph, without affecting the normal implementation of the embodiments of the present application. Optionally, the final standard health model may be determined from a plurality of parameters, or an initial standard health model may be used directly for health diagnosis.
Optionally, a plurality of parameters may be customized by the user to arrive at the final standard health model. The parameters may include the total number of source data, the processing step size and the compression ratio. The total number of source data is the number of root nodes in the second knowledge graph corresponding to the standard health product. The processing step size is the number of layers between the layer where the root nodes are located and the service layer, and the number of nodes in the service layer corresponds to the number of service models of the standard health product. The compression ratio is the ratio of the number of nodes in the service layer to the total number of source data. Of course, other manners may be adopted to obtain the final standard health model; for example, the initial parameter values of the standard health model may be adjusted automatically according to the compression ratio, the processing step size and the number of starting nodes of the target data product, without affecting the normal implementation of the embodiments of the present application. In the embodiments of the present application, therefore, the standard health model can be customized by the user, and its parameters can be adjusted according to information about the actual data product, so that the evaluation standard for the health condition of a data product can be adjusted flexibly and the method can be applied to a wide range of practical scenarios.
Alternatively, the final health diagnosis may be obtained by: first, standard health data of a standard health model is obtained according to a plurality of parameters. The health model data is then compared correspondingly to standard health data. And obtaining the product health degree corresponding to the target data product according to the comparison result. Finally, according to the preset standard and the product health degree, the health diagnosis result of the target data product is obtained. In some possible implementation manners, other manners may be adopted to obtain the health diagnosis result of the target data product, which do not affect the normal implementation of the embodiment of the present application.
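The text does not specify how the layer-wise comparison is turned into a product health degree or what the preset standard is. Purely as an illustration of the flow, the sketch below uses a hypothetical score (one minus the mean relative deviation per layer, each deviation capped at 1) and a hypothetical threshold of 0.8; neither the metric, the threshold nor the function names come from the application.

```python
def health_degree(actual, standard):
    """Hypothetical score in [0, 1]: 1 minus the mean relative deviation.

    `actual` and `standard` are lists of (layer, count) pairs that must
    cover the same layers in the same order.
    """
    deviations = []
    for (layer, count), (std_layer, std_count) in zip(actual, standard):
        if layer != std_layer:
            raise ValueError("layer numbers must line up")
        relative = abs(count - std_count) / max(std_count, 1e-9)
        deviations.append(min(relative, 1.0))   # cap each layer's deviation at 1
    return 1.0 - sum(deviations) / len(deviations)


def diagnose(degree, threshold=0.8):
    """Hypothetical preset standard: healthy when the score reaches the threshold."""
    return "healthy" if degree >= threshold else "unhealthy"


standard = [(0, 10), (1, 8)]
matching = health_degree([(0, 10), (1, 8)], standard)   # identical to the standard
drifting = health_degree([(0, 10), (1, 4)], standard)   # layer 1 is off by 50 %
```

Any other distance measure between the two sequences (for example a squared-error fit against the standard curve) could be substituted without changing the overall procedure.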
Optionally, the method may further comprise: and obtaining a standard health degree curve corresponding to the standard health model according to a plurality of parameters defined by a user. And meanwhile, obtaining a health degree curve corresponding to the target data product according to the health model data. And obtaining a health degree histogram corresponding to the target data product according to the health model data. And displaying the health histogram, the health curve and the standard health curve on the same page so as to display the fitting condition of the health curve and the standard health curve. Therefore, the user can more intuitively and conveniently observe the difference between the data product and the standard health model by simultaneously displaying the health degree curve and the standard health degree curve of the data product, thereby being beneficial to finding the reason of poor quality of the data product.
According to the embodiment of the application, the source codes of the data products are collected to obtain the health model data corresponding to the data products, and the health model data is compared with the standard health model set by a user, so that the health condition of the corresponding data products can be accurately diagnosed. Therefore, the health condition of the data product can be judged according to specific data, and the accuracy and the scientificity of the diagnosis result are ensured, so that the data product with poor quality is found in time, and the quality of the data product is ensured.
The health diagnosis method of the data product provided by the embodiment of the application is introduced above, and the health diagnosis method of the data product is exemplified below in combination with specific application scenes.
Referring to fig. 2, another method flow chart of a method for diagnosing health of a data product according to an embodiment of the present application is shown.
S201: a source code file of the data product is extracted.
The source code file may be a product source code file or a database program file of a data product, or may be a standard program file after data cleaning conversion.
S202: Construct a knowledge graph according to the source code file.
For the source code in the source code file, refer to FIG. 3, which shows a screenshot of Oracle program source code named p_001.Prc. The entities in the source code file include: T9, T12, T11, T4, T3, T5, T2 and T6; the dependencies include filtering and union. Filtering selects part of the contents of a table (the object being operated on) to form a new table, and union combines a plurality of tables into one table.
Optionally, the knowledge graph may be constructed as follows. First, according to SQL (Structured Query Language) grammar rules, the entity objects following the INTO or FROM keywords are identified. Then, the dependency relationships between entities are identified from statements such as SELECT or UNION. By way of example, the dependencies between the entities in FIG. 3 may be obtained as follows: T12 and T11 are joined to produce a temporary table (alias Z12), which can be recorded as V$$TE1. T4 is filtered to produce a temporary table, which can be recorded as V$$TE2. V$$TE1, V$$TE2 and T3 are each filtered and then combined by UNION; the three filtered temporary tables can be recorded as V$$U3, V$$U4 and V$$U5. The result of the union produces a temporary table (alias TT1), which can be recorded as V$$TE6. Finally, four tables, TT1, T5, T2 and T6, are joined and the result is inserted into T9. All the tables obtained are then inserted into a graph database for storage and recorded as entity information to form nodes. Directed edges are created between the nodes of the graph database according to the upstream-downstream dependencies between the entities, generating the entity relationships. In some possible implementations, as shown in FIG. 4, "entity-relationship-entity" triples may be formed, and the entities and their attribute-value pairs are connected to each other through relationships to form a mesh-shaped knowledge graph. Of course, the knowledge graph may be obtained in other ways, without affecting the normal implementation of the embodiments of the present application.
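The keyword-based identification of entities can be sketched with regular expressions. This is a deliberately simplified illustration, not the parser used by the application: it handles a single `INSERT INTO ... SELECT` statement, also treats `JOIN` as introducing a source table (an assumption beyond the INTO/FROM rule in the text), picks up only the first table after each keyword, and ignores subqueries and temporary-table naming. The SQL statement is hypothetical and is not the content of FIG. 3.

```python
import re

# Table name directly after INSERT INTO (the downstream entity).
TARGET_RE = re.compile(r"\bINSERT\s+INTO\s+([A-Za-z_][\w$#.]*)", re.IGNORECASE)
# Table name directly after FROM or JOIN (the upstream entities).
SOURCE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w$#.]*)", re.IGNORECASE)


def extract_dependencies(sql):
    """Return (source_table, target_table) pairs for one INSERT statement."""
    target = TARGET_RE.search(sql)
    if target is None:
        return []
    sources = []
    for name in SOURCE_RE.findall(sql):
        name = name.upper()
        if name not in sources:            # keep first occurrence, preserve order
            sources.append(name)
    return [(src, target.group(1).upper()) for src in sources]


# Hypothetical statement in the spirit of the final step described above.
statement = (
    "INSERT INTO T9 SELECT * FROM T5 "
    "JOIN T2 ON T5.ID = T2.ID JOIN T6 ON T2.ID = T6.ID"
)
pairs = extract_dependencies(statement)
```

Each returned pair can be stored as one directed edge from the source table to the target table.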
S203: Traverse the knowledge graph to obtain the input data of the health model of the data product.
Optionally, the input data may be obtained as follows. First, the constructed knowledge graph is traversed. For example, the mesh graph may be traversed layer by layer, from the root nodes to the last child nodes, according to the upstream-downstream relationships between the nodes in the knowledge graph. During the traversal, the total number of nodes in each layer and the number of nodes without child nodes in each layer are counted, and the child-node count of each layer (that is, the number of nodes in the layer that have child nodes) is calculated by subtracting the number of nodes without child nodes from the total number of nodes in that layer. Then, the layer number of each layer and the corresponding child-node count form a group sequence (layer number, child-node count). In particular, the number of nodes without child nodes in the last layer is taken as 0. The group sequences generated for all layers are used as the input data of the health model of the data product.
For example, referring to fig. 5, fig. 5 is a knowledge graph corresponding to a data product. The knowledge graph has 49 nodes in total, the left side is a root node, the right side is a child node according to the relation direction, and the whole graph is divided into 7 layers. The group sequence of each layer can be obtained as follows:
Layer 0: the total number of nodes is 10 and the number of nodes without child nodes is 0, so the layer 0 group sequence is (0, 10).
Layer 1: the total number of nodes is 9 and the number of nodes without child nodes is 1, so the child-node count is 9-1=8 and the layer 1 group sequence is (1, 8).
Layer 2: the total number of nodes is 9 and the number of nodes without child nodes is 2, so the child-node count is 9-2=7 and the layer 2 group sequence is (2, 7).
By analogy, the layer 3 group sequence is (3, 4), the layer 4 group sequence is (4, 3), the layer 5 group sequence is (5, 5), and the layer 6 group sequence is (6, 9).
The group sequence of each layer is the input data corresponding to the data product. Of course, other ways of obtaining the input data may be adopted, and normal implementation of the embodiment of the present application is not affected.
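The arithmetic of the worked example can be reproduced in a few lines. Only layers 0 to 2 are used here, because those are the layers for which the text gives both the total number of nodes and the number of nodes without child nodes; the function name is hypothetical.

```python
def to_group_sequences(total_nodes, childless_nodes):
    """Pair each layer number with (total nodes - nodes without child nodes)."""
    return [
        (layer, total - childless)
        for layer, (total, childless) in enumerate(zip(total_nodes, childless_nodes))
    ]


# Per-layer counts for layers 0, 1 and 2 of the FIG. 5 example.
first_layers = to_group_sequences([10, 9, 9], [0, 1, 2])
print(first_layers)  # [(0, 10), (1, 8), (2, 7)]
```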
S204: a standard health model defined by a plurality of parameters is obtained.
The standard health model may specifically include the following formula:
$$y = \frac{h(1-k)}{s^{2}}x^{2} - \frac{2h(1-k)}{s}x + h$$
where h is the total number of source data; k is the compression ratio of the total number of service models relative to the total number of source data h; s is the processing step length from the source data to the service models; x is a layer sequence number of the knowledge graph corresponding to the standard health model; and y is the input data of the standard health model for layer sequence number x under the set total number of source data h, compression ratio k, and processing step length s.
The total number h of the source data corresponds to the total number of the nodes with sub-nodes at the 0 th layer in the knowledge graph, the total number of the service models corresponds to the total number of the nodes at the service layer in the knowledge graph, and the processing step length corresponds to the number of the separation layers from the layer where the source data is located to the service layer.
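Completing the square makes the shape of the standard model easier to read. The rewriting below uses only the symbols already defined (h, k, s, x, y); it is a worked identity added for illustration, not an additional formula of the application.

```latex
\begin{aligned}
y &= \frac{h(1-k)}{s^{2}}x^{2} - \frac{2h(1-k)}{s}x + h \\
  &= \frac{h(1-k)}{s^{2}}\left(x^{2} - 2sx + s^{2}\right) - h(1-k) + h \\
  &= \frac{h(1-k)}{s^{2}}(x-s)^{2} + hk
\end{aligned}
```

In this form, y equals h at layer x = 0 (the total number of source data) and equals hk at layer x = s (the total number of service models), which for 0 <= k < 1 is the minimum of the curve.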
For example, if layer 0 in the knowledge graph of Fig. 5 is the source data and layer 4 is the service layer, and the total number of nodes corresponding to the service layer is 3, then h is 10, s is 4, and k is 3/10 = 0.3. In some possible implementations, the model parameters k and s may both be adjusted according to the actual compression ratio and processing step length of the data product. The user may also customize them, which does not affect the normal implementation of the embodiment of the application.
S205: input data corresponding to the standard health model is obtained.
The input data may include the total number of nodes of each layer in the knowledge graph corresponding to the standard health model. Optionally, when h is 10, k is 0.2, and s is 4, the input data corresponding to the standard health model can be calculated from the formula of the standard health model, as shown in Table 1:
TABLE 1
x 0 1 2 3 4 5 6 7
y 10 6.5 4 2.5 2 2.5 4 6.5
Wherein x represents the layer sequence number of the knowledge graph corresponding to the standard health model, and y represents the standard node number corresponding to the layer sequence number.
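The values in Table 1 can be reproduced with a few lines of Python. This is an illustrative sketch: the function names `standard_node_count` and `standard_health_data` are hypothetical, and the parameter names follow the symbols h, k and s of the standard health model.

```python
def standard_node_count(x, h, k, s):
    """Standard number of nodes for layer x under parameters h, k and s."""
    a = h * (1 - k)
    return a / s**2 * x**2 - 2 * a / s * x + h


def standard_health_data(h, k, s, layers):
    """Return [(layer_number, standard_node_count), ...] for the first `layers` layers."""
    return [(x, standard_node_count(x, h, k, s)) for x in range(layers)]


if __name__ == "__main__":
    # Parameters used for Table 1: h = 10, k = 0.2, s = 4, layers 0 to 7.
    for x, y in standard_health_data(10, 0.2, 4, 8):
        print(x, round(y, 2))
```

With h = 10, k = 0.2 and s = 4, the curve falls from 10 at layer 0 to 2 at layer 4 and rises again afterwards, matching the row of y values in Table 1.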
In some possible implementation manners, other ways may be used to obtain input data corresponding to the standard health model, which does not affect the normal implementation of the embodiment of the present application.
S206: the input data corresponding to the data product and the input data corresponding to the standard health model are substituted into a health degree formula to obtain the health degree of the data product.
The health degree formula uses the following symbols:
$x_i$ is the total number of child nodes at the i-th layer in the input data of the health model of the data product, $s_i$ is the number of standard nodes corresponding to the i-th layer in the input data of the standard health model, n is the total number of layers of the knowledge graph corresponding to the data product, and y is the health degree of the data product.
In particular, for layers before the service layer the difference is taken as $x_i - s_i$, and for layers after the service layer it is taken as $s_i - x_i$. If layer 4 of the knowledge graph is the service layer, the calculation result can be shown in Table 2:
TABLE 2
From Table 2, the final calculated health degree is 0.3. Here, if the value of $(x_i - s_i)/s_i$ is less than 0, it is set to 0; otherwise the actual value is used. Optionally, the health condition of the data product can be judged from the calculated health degree as follows: if the health degree is equal to 0, the product health is normal; if the health degree is greater than 0, the product health needs attention; if the health degree is greater than 0.3, the product health has a slight problem; if the health degree is greater than 0.5, the product health has a serious problem.
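The health degree formula is not shown in the text above, so the following Python sketch is only one reading that is consistent with the surrounding definitions. It assumes that each layer contributes its relative deviation from the standard, taken as (x_i - s_i)/s_i up to and including the service layer and as (s_i - x_i)/s_i after it, that negative contributions are set to 0, and that the contributions are averaged over the n layers. The averaging, the treatment of the service layer itself, and the function names are assumptions. Under these assumptions, the Fig. 5 group sequences and the Table 1 standard values give about 0.297, which rounds to the reported 0.3.

```python
def health_degree(product_counts, standard_counts, service_layer):
    """Assumed reading: mean of clipped relative deviations from the standard."""
    if len(product_counts) != len(standard_counts):
        raise ValueError("both inputs must cover the same layers")
    total = 0.0
    for i, (x, s) in enumerate(zip(product_counts, standard_counts)):
        # Assumption: the service layer itself is treated like the layers before it.
        deviation = x - s if i <= service_layer else s - x
        # Negative relative deviations are set to 0, as stated in the text.
        total += max(0.0, deviation / s)
    return total / len(product_counts)


def diagnose(health):
    """Map a health degree to the four conditions listed in the text."""
    if health > 0.5:
        return "serious problem"
    if health > 0.3:
        return "slight problem"
    if health > 0:
        return "needs attention"
    return "normal"


if __name__ == "__main__":
    product = [10, 8, 7, 4, 3, 5, 9]        # child-node counts of layers 0 to 6 (Fig. 5)
    standard = [10, 6.5, 4, 2.5, 2, 2.5, 4]  # Table 1 values for layers 0 to 6
    value = health_degree(product, standard, service_layer=4)
    print(round(value, 3), diagnose(value))
```

The thresholds are checked from the most severe downward so that a value above 0.5 is not also reported as a slight problem.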
Optionally, a combined bar-and-curve chart of the health degree of the data product can be drawn as a web chart from the calculated health degree result data, so that the user can see intuitively how well the health degree of the product fits the standard model. Referring to Fig. 6, the combined chart includes a standard health degree curve, a product health degree curve, and a product health degree histogram. The standard health degree curve is obtained from the input data corresponding to the standard health model. The product health degree curve and the product health degree histogram are obtained from the input data of the health model of the data product. Other graphic forms may also be used to display the health status of the data product, which does not affect the normal implementation of the embodiment of the present application.
The embodiment of the application provides a standard health model and compares the health model data of the data product with it, so that the health degree of the data product can be measured accurately and objectively with concrete data and conclusions can be drawn from data analysis, filling a gap that current manual evaluation cannot cover.
The health diagnosis method of the data product provided by the application can be used in the big data field or the financial field. The above is merely an example, and does not limit the application field of the health diagnosis method for a data product provided by the present application.
The above are some specific implementations of the health diagnosis method for a data product provided by the embodiment of the application. Based on these, the application also provides a corresponding apparatus, which is described below in terms of functional modules.
Referring to the schematic structural diagram of the health diagnosis apparatus 300 of the data product shown in Fig. 7, the apparatus 300 includes a source code acquisition module 301, a knowledge graph generation module 302, a health model data determination module 303, and a diagnosis result determination module 304.
The source code acquisition module 301 is configured to collect program source code of a target data product.
The knowledge graph generation module 302 is configured to generate a first knowledge graph corresponding to the target data product according to the program source code, where the first knowledge graph is used to represent a dependency relationship between each entity in the source code.
The health model data determining module 303 is configured to obtain health model data corresponding to the target data product according to the first knowledge graph.
The diagnosis result determining module 304 is configured to compare the health model data with a standard health model to obtain a health diagnosis result of the target data product.
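The way the four modules hand data to one another can be sketched as a small pipeline. This is only a structural illustration: the class name, the field names, and the use of plain callables for the modules are assumptions, and the behavior of each module is left to whatever function is plugged in.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class HealthDiagnosisApparatus:
    """Hypothetical wiring of modules 301 to 304 as interchangeable callables."""

    collect_source: Callable[[str], str]           # source code acquisition module 301
    build_graph: Callable[[str], Any]              # knowledge graph generation module 302
    derive_health_data: Callable[[Any], List]      # health model data determination module 303
    compare_with_standard: Callable[[List], str]   # diagnosis result determination module 304

    def diagnose(self, product_id: str) -> str:
        source = self.collect_source(product_id)
        graph = self.build_graph(source)
        health_data = self.derive_health_data(graph)
        return self.compare_with_standard(health_data)
```

Keeping each module behind a callable lets one stage be replaced, for example a different graph builder, without touching the others.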
Optionally, the source code acquisition module 301 is further configured to capture source code files and/or database program files of the target data product, and organize the captured files to obtain a standard program file, where the standard program file includes the program source code of the target data product.
Optionally, the knowledge graph generation module 302 is further configured to identify the program source code according to a general database operation language to obtain the entities in the program source code and the dependency relationships between the entities, and to connect the entities unidirectionally according to the dependency relationships to obtain a net-shaped knowledge graph as the first knowledge graph.
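The application does not name the database language or the parsing technique. As a rough illustration only, the sketch below assumes the program source code consists of SQL statements of the form `INSERT INTO target ... FROM source ...` and uses regular expressions to pull out one-way (source, target) dependencies. The function name, the statement shape, and the regex approach are all assumptions; a production tool would more likely use a real SQL parser.

```python
import re

# Assumed statement shape: INSERT INTO <target> SELECT ... FROM <source> [JOIN <source>]
_TARGET = re.compile(r"\binsert\s+into\s+([\w.]+)", re.IGNORECASE)
_SOURCE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def extract_dependencies(sql_text):
    """Return sorted (source_table, target_table) edges found in the SQL text."""
    edges = set()
    for statement in sql_text.split(";"):  # naive split, ignores string literals
        target = _TARGET.search(statement)
        if target is None:
            continue
        for source in _SOURCE.findall(statement):
            if source != target.group(1):
                edges.add((source, target.group(1)))
    return sorted(edges)
```

Each returned pair can serve as one directed edge of the net-shaped graph, with tables as the entities.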
Optionally, the entities serve as nodes of the first knowledge graph, and the health model data determination module 303 further includes a traversal sub-module, a statistics sub-module, a child node number determination sub-module, and a group sequence generation sub-module.
The traversal sub-module is configured to traverse the nodes of the first knowledge graph layer by layer according to the dependency relationships, and to layer the nodes in the first knowledge graph.
The statistics sub-module is configured to count the total number of nodes and the number of nodes without child nodes in each layer of the first knowledge graph.
The child node number determination sub-module is configured to obtain the number of child nodes of each layer from the total number of nodes and the number of nodes without child nodes.
The group sequence generation sub-module is configured to combine the layer sequence number of each layer of the first knowledge graph with the corresponding number of child nodes to obtain a plurality of group sequences, which are used as the health model data.
Optionally, a plurality of parameters in the standard health model are user-defined, the plurality of parameters including: total number of source data, processing step size, and compression ratio.
The total number of source data is the number of root nodes in the second knowledge graph corresponding to the standard health product.
The processing step length is the number of layers between the layer where the root nodes in the second knowledge graph are located and the service layer, and the number of nodes in the service layer corresponds to the number of service models of the standard health product.
The compression ratio is the ratio of the number of nodes of the service layer to the total number of source data.
Optionally, the diagnosis result determination module 304 is further configured to obtain standard health data of the standard health model according to the plurality of parameters, compare the health model data with the standard health data layer by layer, obtain the product health degree corresponding to the target data product according to the comparison result, and obtain the health diagnosis result of the target data product according to a preset standard and the product health degree.
Optionally, the apparatus further includes a display module. The display module is configured to obtain a standard health degree curve corresponding to the standard health model according to the plurality of user-defined parameters, obtain a health degree curve corresponding to the target data product according to the health model data, obtain a health degree histogram corresponding to the target data product according to the health model data, and display the health degree histogram, the health degree curve, and the standard health degree curve in the same page so as to show how well the health degree curve fits the standard health degree curve.
The embodiment of the application also provides corresponding equipment and a computer storage medium, which are used for realizing the scheme provided by the embodiment of the application.
The device comprises a memory for storing program instructions and a processor for executing the program instructions to cause the device to perform the method for diagnosing health of a data product according to any of the embodiments of the present application.
The computer storage medium has a computer program stored therein, and when the computer program is executed, a device running the computer program implements the method for diagnosing health of a data product according to any of the embodiments of the present application.
In the embodiments of the present application, terms such as "first" and "second" (where present) are used only to distinguish names and do not indicate order.
From the above description of embodiments, it will be apparent to those skilled in the art that all or part of the steps of the above described example methods may be implemented in software plus general hardware platforms. Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which may be stored in a storage medium, such as a read-only memory (ROM)/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network communication device such as a router) to perform the method according to the embodiments or some parts of the embodiments of the present application.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present application without undue burden.
The foregoing description of the exemplary embodiments of the application is merely illustrative of the application and is not intended to limit the scope of the application.

Claims (10)

1. A method of diagnosing health of a data product, the method comprising:
collecting program source code of a target data product;
generating a first knowledge graph corresponding to the target data product according to the program source code, wherein the first knowledge graph is used for representing the dependency relationship among all entities in the program source code;
obtaining health model data corresponding to the target data product according to the first knowledge graph;
and comparing the health model data with a standard health model to obtain a health diagnosis result of the target data product.
2. The method of claim 1, wherein collecting the program source code of the target data product comprises:
capturing a source code file and/or a database program file of the target data product;
and sorting the grabbed files to obtain a standard program file, wherein the standard program file comprises program source codes of the target data product.
3. The method of claim 1, wherein generating the first knowledge-graph corresponding to the target data product according to the program source code comprises:
identifying the program source codes according to a general database operation language, and obtaining the entity in the program source codes and the dependency relationship between the entities;
and connecting the entities in one way according to the dependency relationship to obtain a net-shaped knowledge graph serving as a first knowledge graph.
4. The method of claim 3, wherein the entities serve as nodes of the first knowledge graph, and obtaining the health model data corresponding to the target data product according to the first knowledge graph comprises:
traversing the nodes of the first knowledge graph layer by layer according to the dependency relationship, and layering the nodes in the first knowledge graph;
counting the total number of nodes and the number of nodes without child nodes in each layer of the first knowledge graph;
obtaining the number of child nodes of each layer according to the total number of nodes and the number of nodes without child nodes;
and combining the layer sequence numbers of all the layers of the first knowledge graph with the corresponding number of the sub-nodes to obtain a plurality of group sequences which are used as the health model data.
5. The method of claim 1, wherein a plurality of parameters in the standard health model are user-defined, the plurality of parameters comprising: total number of source data, processing step size and compression ratio;
the total number of the source data is the number of root nodes in the second knowledge graph corresponding to the standard health product;
the processing step length is the number of layers between the layer where the root node is located and the service layer in the second knowledge graph, and the number of nodes of the service layer corresponds to the number of service models of the standard health product;
the compression ratio is the ratio of the node number of the service layer to the total number of the source data.
6. The method of claim 5, wherein comparing the health model data with a standard health model results in a health diagnosis of the target data product, comprising:
obtaining standard health data of the standard health model according to the parameters;
correspondingly comparing the health model data with the standard health data;
obtaining the product health degree corresponding to the target data product according to the comparison result;
and obtaining the health diagnosis result of the target data product according to a preset standard and the product health degree.
7. The method of claim 5, wherein the method further comprises:
obtaining a standard health degree curve corresponding to the standard health model according to the user-defined multiple parameters;
obtaining a health degree curve corresponding to the target data product according to the health model data;
obtaining a health degree histogram corresponding to the target data product according to the health model data;
and displaying the health histogram, the health curve and the standard health curve in the same page so as to display the fitting condition of the health curve and the standard health curve.
8. A health diagnostic device for a data product, the device comprising: the system comprises a source code acquisition module, a knowledge graph generation module, a health model data determination module and a diagnosis result determination module;
the source code acquisition module is used for acquiring program source codes of target data products;
the knowledge graph generation module is used for generating a first knowledge graph corresponding to the target data product according to the program source code, and the first knowledge graph is used for representing the dependency relationship among all entities in the source code;
the health model data determining module is used for obtaining health model data corresponding to the target data product according to the first knowledge graph;
and the diagnosis result determining module is used for comparing the health model data with a standard health model to obtain a health diagnosis result of the target data product.
9. An electronic device, the electronic device comprising: a memory and a processor;
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is adapted to perform the steps of the method of health diagnosis of a data product according to any of claims 1-7 according to instructions in the program code.
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program which, when run on an electronic device, performs the steps of the method for health diagnosis of a data product according to any of claims 1-7.
CN202310897260.2A 2023-07-20 2023-07-20 Health diagnosis method, device and equipment for data product and storage medium Pending CN116841904A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310897260.2A CN116841904A (en) 2023-07-20 2023-07-20 Health diagnosis method, device and equipment for data product and storage medium

Publications (1)

Publication Number Publication Date
CN116841904A true CN116841904A (en) 2023-10-03

Family

ID=88168885

Country Status (1)

Country Link
CN (1) CN116841904A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination