CN116932846A - Method and device for determining data asset, storage medium and electronic device - Google Patents

Method and device for determining data asset, storage medium and electronic device

Info

Publication number
CN116932846A
Authority
CN
China
Prior art keywords
graph
data
determining
nodes
screened
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310794220.5A
Other languages
Chinese (zh)
Inventor
王龙龙
孙能林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Haier Technology Co Ltd
Haier Smart Home Co Ltd
Haier Uplus Intelligent Technology Beijing Co Ltd
Original Assignee
Qingdao Haier Technology Co Ltd
Haier Smart Home Co Ltd
Haier Uplus Intelligent Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Haier Technology Co Ltd, Haier Smart Home Co Ltd, Haier Uplus Intelligent Technology Beijing Co Ltd filed Critical Qingdao Haier Technology Co Ltd
Priority to CN202310794220.5A priority Critical patent/CN116932846A/en
Publication of CN116932846A publication Critical patent/CN116932846A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method and a device for determining data assets, a storage medium and an electronic device, and relates to the technical field of smart homes. The method for determining data assets comprises: determining a first graph corresponding to different graph data stored in a graph database, wherein the first graph indicates the association relationships between the nodes corresponding to the different graph data; preprocessing the first graph to generate a second graph when it is determined that a cyclic subgraph exists in the first graph; when data to be screened that is input by a first object is obtained, marking first graph data in the second graph that is the same as the data to be screened; and extracting the data assets from the graph database according to the marking result. This technical solution addresses the technical problem that invalid data assets in a database cannot be comprehensively identified.

Description

Method and device for determining data asset, storage medium and electronic device
Technical Field
The application relates to the technical field of smart homes, and in particular to a method and a device for determining data assets, a storage medium and an electronic device.
Background
At present, big data projects contain a large amount of invalid data accumulated for historical reasons. Such data occupies storage resources, and even computing resources, without bringing any value. The wasted resources increase the maintenance cost of the corresponding resources and reduce the processing efficiency of related data services. The related art therefore has the technical problem that invalid data assets in the database backing a big data system cannot be comprehensively identified.
No effective solution has yet been proposed in the related art for the technical problem that invalid data assets in a database cannot be comprehensively identified.
Disclosure of Invention
The embodiment of the application provides a method and a device for determining data assets, a storage medium and an electronic device, which at least solve the technical problem that invalid data assets in a database cannot be comprehensively identified in the related art.
According to an embodiment of the present application, there is provided a method of determining a data asset, including: determining a first graph corresponding to different graph data stored in a graph database, wherein the first graph indicates the association relationships between the nodes corresponding to the different graph data; preprocessing the first graph to generate a second graph when it is determined that a cyclic subgraph exists in the first graph; when data to be screened that is input by a first object is obtained, marking first graph data in the second graph that is the same as the data to be screened; and extracting the data asset from the graph database according to the marking result.
In an exemplary embodiment, preprocessing the first graph to generate a second graph includes: determining a first adjacency list and/or a first inverse adjacency list corresponding to the different nodes in the first graph; performing a first traversal on the first adjacency list and/or the first inverse adjacency list, and marking the vertices of the cyclic subgraphs passed through in the first traversal, wherein at least one cyclic subgraph exists in the first graph; determining, according to the vertices, a second adjacency list and/or a second inverse adjacency list corresponding to the cyclic subgraph, and replacing all member points in the cyclic subgraph with a target point corresponding to the vertices based on the second adjacency list and/or the second inverse adjacency list, so that the whole cyclic subgraph is treated as a new vertex; and determining to generate the second graph corresponding to the first graph when it is determined that no cyclic subgraph remains in the first graph.
In an exemplary embodiment, marking the first graph data in the second graph that is the same as the data to be screened, so as to extract data assets from the graph database according to a marking result, includes: adding a first mark to the first graph data, and identifying a transfer link in the second graph that contains the first graph data, wherein the transfer link includes at least two nodes; determining the parent-child relationships between the other nodes in the transfer link and the target node corresponding to the first graph data; adding a second mark to the other nodes based on the parent-child relationships and the propagation mode of the data to be screened; and determining the marking result based on the second mark and the first mark, so as to extract the data asset from the graph database based on the marking result.
In an exemplary embodiment, adding a second mark to the other nodes based on the parent-child relationships and the propagation mode of the data to be screened includes: determining to add the second mark to the other nodes when the target node is determined to be a parent node of the other nodes and the propagation mode of the data to be screened is upstream propagation; ending the mark adding operation when the target node is determined to be a parent node of the other nodes and the propagation mode of the data to be screened is downstream propagation; ending the mark adding operation when the target node is determined to be a child node of the other nodes and the propagation mode of the data to be screened is upstream propagation; and adding the second mark to the other nodes when the target node is determined to be a child node of the other nodes and the propagation mode of the data to be screened is downstream propagation.
In an exemplary embodiment, after adding a second mark to the other nodes based on the parent-child relationships and the propagation mode of the data to be screened, the method further includes: adjusting the marking result according to the sufficient condition corresponding to the propagation mode of the data to be screened, wherein the sufficient condition includes at least one of: when the propagation mode of the data to be screened is upstream propagation and all child nodes are determined to have marks, adding the second mark to a parent node that has no mark; when the propagation mode of the data to be screened is downstream propagation and all parent nodes are determined to have marks, adding the second mark to a child node that has no mark; and determining the adjusted target marking result as the marking result used for extracting the data asset.
In one exemplary embodiment, after extracting the data asset from the graph database according to the marking result, the method further comprises: identifying a data service that applies the data asset; and sending a management hint to the data service, wherein the management hint is used for indicating a second object using the data service to dereference the data asset.
In an exemplary embodiment, before determining the first graph corresponding to the different graph data stored in the graph database, the method further includes: acquiring a target storage engine preconfigured for the graph database; using the target storage engine to transfer the data to be processed in a target database to the graph database; and determining the lineage graph recorded in the target storage engine after the transfer is completed, so as to assist in determining the first graph according to the lineage graph, wherein the lineage graph indicates the transfer relationships, among various services, of the different data stored in the graph database.
According to another embodiment of the present application, there is also provided a data asset determining apparatus including: a determining module, configured to determine a first graph corresponding to different graph data stored in a graph database, wherein the first graph indicates the association relationships between the nodes corresponding to the different graph data; a processing module, configured to preprocess the first graph to generate a second graph when it is determined that a cyclic subgraph exists in the first graph; and an extraction module, configured to mark, when data to be screened that is input by a first object is acquired, the first graph data in the second graph that is the same as the data to be screened, and to extract the data asset from the graph database according to the marking result.
According to a further embodiment of the present application, there is also provided a computer readable storage medium having a computer program stored therein, wherein the computer program is arranged to perform the above-described method of determining data assets when run.
According to yet another embodiment of the present application, there is provided an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor performs the method for determining a data asset described above through the computer program.
In the embodiment of the application, a first graph corresponding to different graph data stored in a graph database is determined, wherein the first graph indicates the association relationships between the nodes corresponding to the different graph data; when it is determined that a cyclic subgraph exists in the first graph, the first graph is preprocessed to generate a second graph; when data to be screened that is input by a first object is obtained, the first graph data in the second graph that is the same as the data to be screened is marked; and the data assets are extracted from the graph database according to the marking result. This technical solution addresses the technical problem that invalid data assets in a database cannot be comprehensively identified, and can improve the efficiency of identifying invalid assets in the database.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the embodiments of the application or the technical solutions of the prior art, the drawings which are used in the description of the embodiments or the prior art will be briefly described, and it will be obvious to a person skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a schematic diagram of a hardware environment for a method of determining data assets in accordance with an embodiment of the application;
FIG. 2 is a flow chart of a method of determining a data asset according to an embodiment of the application;
FIG. 3 is a schematic diagram of a cyclic subgraph in accordance with an embodiment of the present application;
FIG. 4 is a flow diagram of a method of determining a data asset according to an embodiment of the application;
FIG. 5 is a schematic diagram of a determination of a cyclic subgraph in accordance with an embodiment of the present application;
FIG. 6 is a block diagram of a data asset determination device according to an embodiment of the present application.
Detailed Description
In order that those skilled in the art will better understand the present application, a technical solution in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to one aspect of an embodiment of the present application, a method of determining a data asset is provided. The method is widely applicable to whole-house intelligent digital control scenarios such as the smart home (Smart Home), smart home device ecosystems, and intelligent house (Intelligence House) ecosystems. Optionally, in the present embodiment, the method may be applied to a hardware environment constituted by the terminal device 102 and the server 104 shown in FIG. 1. FIG. 1 is a schematic diagram of a hardware environment for a method of determining data assets according to an embodiment of the present application. As shown in FIG. 1, the server 104 is connected to the terminal device 102 through a network and may be used to provide services (such as application services) for the terminal or for a client installed on the terminal. A database may be set on the server or independent of the server to provide a data storage service for the server 104, and a cloud computing and/or edge computing service may be configured on the server or independent of the server to provide a data computing service for the server 104.
The network may include, but is not limited to, at least one of: a wired network and a wireless network. The wired network may include, but is not limited to, at least one of: a wide area network, a metropolitan area network, and a local area network. The wireless network may include, but is not limited to, at least one of: Wi-Fi (Wireless Fidelity) and Bluetooth. The terminal device 102 may be, but is not limited to, a PC, a mobile phone, a tablet computer, an intelligent air conditioner, an intelligent range hood, an intelligent refrigerator, an intelligent oven, an intelligent cooking range, an intelligent washing machine, an intelligent water heater, an intelligent washing device, an intelligent dish washer, an intelligent projection device, an intelligent television, an intelligent clothes hanger, an intelligent curtain, an intelligent video, an intelligent socket, an intelligent sound box, an intelligent fresh air device, an intelligent kitchen and toilet device, an intelligent bathroom device, an intelligent sweeping robot, an intelligent window cleaning robot, an intelligent mopping robot, an intelligent air purifying device, an intelligent steam box, an intelligent microwave oven, an intelligent kitchen appliance, an intelligent purifier, an intelligent water dispenser, an intelligent door lock, and the like.
In this embodiment, a method for determining a data asset is provided, which is applied to the terminal device or the server, and fig. 2 is a flowchart of a method for determining a data asset according to an embodiment of the present application, where the flowchart includes the following steps:
Step S202, determining a first graph corresponding to different graph data stored in a graph database, wherein the first graph is used for indicating association relations between nodes corresponding to the different graph data;
Step S204, preprocessing the first graph to generate a second graph when it is determined that a cyclic subgraph exists in the first graph;
Step S206, when data to be screened that is input by the first object is obtained, marking the first graph data in the second graph that is the same as the data to be screened, and extracting the data asset from the graph database according to the marking result.
Through the above steps, a first graph corresponding to different graph data stored in the graph database is determined, wherein the first graph indicates the association relationships between the nodes corresponding to the different graph data; when it is determined that a cyclic subgraph exists in the first graph, the first graph is preprocessed to generate a second graph; when data to be screened that is input by a first object is obtained, the first graph data in the second graph that is the same as the data to be screened is marked; and the data assets are extracted from the graph database according to the marking result. This technical solution addresses the technical problem that invalid data assets in a database cannot be comprehensively identified, and can improve the efficiency of identifying invalid assets in the database.
In an exemplary embodiment, for the step S204, the preprocessing may be performed on the first graph to generate a second graph, where the specific steps include:
Step S204-01, determining a first adjacency list and/or a first inverse adjacency list corresponding to the different nodes in the first graph;
Step S204-02, performing a first traversal on the first adjacency list and/or the first inverse adjacency list, and marking the vertices of the cyclic subgraphs passed through in the first traversal, wherein at least one cyclic subgraph exists in the first graph;
Step S204-03, determining, according to the vertices, a second adjacency list and/or a second inverse adjacency list corresponding to the cyclic subgraph, and replacing all member points in the cyclic subgraph with a target point corresponding to the vertices based on the second adjacency list and/or the second inverse adjacency list, so that the cyclic subgraph is treated as a new vertex;
It will be appreciated that a cyclic subgraph may cause the traversal to enter an endless loop. The cyclic subgraph as a whole may therefore be regarded as a single node, and that node may be represented by sub-cycle information that carries the information of all cycle members. For example, FIG. 3 is a schematic diagram of a cyclic subgraph according to an embodiment of the application. If the traversal starts from G, H or F and a depth-first traversal is performed using the adjacency list, then F belongs to the [G, H, F] subgraph. If the traversal instead starts from B, D or E, then F belongs to the [B, D, E, F, H, G] subgraph. The [G, H, F] or [B, D, E, F, H, G] subgraph may then be regarded as one node. Optionally, the sub-cycle may be represented by a combinedNode (corresponding to the target point of the above embodiment). The combinedNode can be pictured as a large sphere containing several small spheres: each small sphere is a cycle member, and the large sphere is the cycle. During marking, the combinedNode is treated as an ordinary vertex; the only special case is that when the combinedNode is marked as a zombie asset, all member vertices inside the combinedNode also need to be marked.
Step S204-04, determining to generate the second graph corresponding to the first graph when it is determined that no cyclic subgraph remains in the first graph.
Through the above processing, the cyclic subgraphs in the first graph that affect traversal can be merged, which simplifies the first graph. Each cyclic subgraph is regarded as a single vertex, combinedNode; marking this vertex means marking the cyclic subgraph, that is, marking all member points in the subgraph. This prevents the cyclic subgraphs in the graph from affecting subsequent marking.
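As an illustration only, the merging of cyclic subgraphs into combined vertices can be sketched as follows. The embodiment does not name a specific cycle-detection algorithm; this sketch substitutes Tarjan's strongly connected components algorithm as one standard way to find maximal cyclic subgraphs. The function names, the naming scheme for combined vertices, and the example graph are hypothetical and are not taken from the application.

```python
def strongly_connected_components(adjacency):
    """Tarjan's algorithm (recursive form); returns a list of vertex sets."""
    index_of, lowlink, on_stack = {}, {}, set()
    stack, components = [], []
    counter = [0]

    def visit(v):
        index_of[v] = lowlink[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in adjacency.get(v, ()):
            if w not in index_of:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index_of[w])
        if lowlink[v] == index_of[v]:
            # v is the root of a component: pop its members off the stack
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(component)

    for v in adjacency:
        if v not in index_of:
            visit(v)
    return components


def condense(adjacency):
    """Replace each cyclic subgraph with one combined vertex.

    Returns (adjacency list, inverse adjacency list, members), where
    members maps each combined vertex to its original member vertices.
    """
    owner, members = {}, {}
    for component in strongly_connected_components(adjacency):
        if len(component) > 1:
            # hypothetical naming scheme for the combined vertex
            name = "combinedNode(" + ",".join(sorted(component)) + ")"
            members[name] = component
        else:
            name = next(iter(component))
        for v in component:
            owner[v] = name
    new_adj = {name: set() for name in owner.values()}
    new_inv = {name: set() for name in owner.values()}
    for v, targets in adjacency.items():
        for w in targets:
            a, b = owner[v], owner[w]
            if a != b:  # edges inside a cycle (and self-loops) are dropped
                new_adj[a].add(b)
                new_inv[b].add(a)
    return new_adj, new_inv, members


# Hypothetical first graph: B -> C -> D -> B forms a cycle.
adjacency = {"A": {"B"}, "B": {"C"}, "C": {"D"}, "D": {"B", "E"}, "E": set()}
second_adj, second_inv, members = condense(adjacency)
```

In this sketch the resulting second graph is acyclic, so a later depth-first traversal cannot loop. The recursive form is kept for brevity; a very deep graph would need an iterative variant to stay within Python's recursion limit.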
In an exemplary embodiment, marking the first graph data in the second graph that is the same as the data to be screened, so as to extract data assets from the graph database according to a marking result, includes: adding a first mark to the first graph data, and identifying a transfer link in the second graph that contains the first graph data, wherein the transfer link includes at least two nodes; determining the parent-child relationships between the other nodes in the transfer link and the target node corresponding to the first graph data; adding a second mark to the other nodes based on the parent-child relationships and the propagation mode of the data to be screened; and determining the marking result based on the second mark and the first mark, so as to extract the data asset from the graph database based on the marking result.
Optionally, at the time of marking, the target object imports an initial set of zombie assets (corresponding to the data to be screened described above), a zombie-mark propagation algorithm is executed based on the initial set, and the full set of zombie assets (including the newly identified zombie assets) is returned. For upstream propagation, a depth-first traversal (DFS) is performed using the inverse adjacency list corresponding to the second graph, and the adjacency list is used to check the sufficient condition. For downstream propagation, DFS is performed using the adjacency list corresponding to the second graph, and the inverse adjacency list is used to check the sufficient condition.
In an exemplary embodiment, adding a second mark to the other nodes based on the parent-child relationships and the propagation mode of the data to be screened includes: determining to add the second mark to the other nodes when the target node is determined to be a parent node of the other nodes and the propagation mode of the data to be screened is upstream propagation; ending the mark adding operation when the target node is determined to be a parent node of the other nodes and the propagation mode of the data to be screened is downstream propagation; ending the mark adding operation when the target node is determined to be a child node of the other nodes and the propagation mode of the data to be screened is upstream propagation; and adding the second mark to the other nodes when the target node is determined to be a child node of the other nodes and the propagation mode of the data to be screened is downstream propagation.
In an exemplary embodiment, after adding a second mark to the other nodes based on the parent-child relationships and the propagation mode of the data to be screened, the method further includes: adjusting the marking result according to the sufficient condition corresponding to the propagation mode of the data to be screened, wherein the sufficient condition includes at least one of: when the propagation mode of the data to be screened is upstream propagation and all child nodes are determined to have marks, adding the second mark to a parent node that has no mark; when the propagation mode of the data to be screened is downstream propagation and all parent nodes are determined to have marks, adding the second mark to a child node that has no mark; and determining the adjusted target marking result as the marking result used for extracting the data asset.
To ensure the accuracy of marking, the marked target data can be checked to determine whether the current mark is valid, and the mark can be adjusted accordingly.
Optionally, during upstream propagation, DFS (depth-first traversal) is performed using the inverse adjacency list, and the sufficient condition is checked using the adjacency list. The upstream propagation condition is: if currentNode is a zombie asset, then a sufficient condition for one of its parent nodes, parentNode, to be a zombie asset is that all child nodes of parentNode have zombie marks.
During downstream propagation, DFS is performed using the adjacency list, and the sufficient condition is checked using the inverse adjacency list. The downstream propagation condition is: if currentNode is a zombie asset, then a sufficient condition for one of its child nodes, childNode, to be a zombie asset is that all parent nodes of childNode have zombie marks.
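The upstream and downstream propagation conditions above can be sketched as a small routine. This is a minimal sketch under stated assumptions, not the implementation of the application: it assumes the graph has already been made acyclic, that every vertex appears as a key in both tables, and that a combined vertex is expanded to its members after propagation. The function names, the direction strings and the example lineage are hypothetical.

```python
def propagate_zombie_marks(adjacency, inverse_adjacency, initial, direction):
    """Return the full zombie set reachable from the initial zombie set.

    upstream:   walk parents (inverse adjacency list); a parent is marked
                only if all of its children (adjacency list) are marked.
    downstream: walk children (adjacency list); a child is marked only if
                all of its parents (inverse adjacency list) are marked.
    """
    if direction == "upstream":
        walk, check = inverse_adjacency, adjacency
    elif direction == "downstream":
        walk, check = adjacency, inverse_adjacency
    else:
        raise ValueError("direction must be 'upstream' or 'downstream'")
    marked = set(initial)
    stack = list(marked)  # explicit stack gives a depth-first visiting order
    while stack:
        current = stack.pop()
        for neighbour in walk.get(current, ()):
            if neighbour in marked:
                continue
            # sufficient condition: every node on the other side is marked
            if all(n in marked for n in check.get(neighbour, ())):
                marked.add(neighbour)
                stack.append(neighbour)
    return marked


def expand_combined(marked, members):
    """Replace each marked combined vertex with all of its member vertices."""
    expanded = set()
    for node in marked:
        expanded |= members.get(node, {node})
    return expanded


# Hypothetical lineage: raw -> clean -> report_a / report_b,
# other_raw -> report_b / dashboard.
adjacency = {
    "raw": {"clean"},
    "clean": {"report_a", "report_b"},
    "other_raw": {"report_b", "dashboard"},
    "report_a": set(), "report_b": set(), "dashboard": set(),
}
inverse_adjacency = {
    "raw": set(), "other_raw": set(),
    "clean": {"raw"},
    "report_a": {"clean"},
    "report_b": {"clean", "other_raw"},
    "dashboard": {"other_raw"},
}
upstream = propagate_zombie_marks(
    adjacency, inverse_adjacency, {"report_a", "report_b"}, "upstream")
downstream = propagate_zombie_marks(
    adjacency, inverse_adjacency, {"raw"}, "downstream")
```

In the example, other_raw is not marked during upstream propagation because dashboard still consumes it, and report_b is not marked during downstream propagation because it still has the unmarked parent other_raw. A node that fails the check once is re-examined whenever another of its neighbours is later marked, so the result does not depend on the visiting order.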
In one exemplary embodiment, after extracting the data asset from the graph database based on the marking result, the method further comprises: identifying a data service that applies the data asset; and sending a management hint to the data service, wherein the management hint is used for indicating a second object using the data service to dereference the data asset.
That is, in big data processing, the sources and processing links of the data closest to the user can be quite complex and usually form a complex directed graph (which may contain cycles). All the data, scripts and so on along those processing links may serve only the current zombie asset, so they should also be marked as zombie assets. After the data assets are determined, in order to handle their recovery more accurately, a corresponding management hint can be fed back to the data services that use the data assets. In this way, a recovery record exists in the corresponding data service when the data assets are recovered, which avoids recovery anomalies caused by unrecorded recovery.
In an exemplary embodiment, before determining the first graph corresponding to the different graph data stored in the graph database, the method further includes: acquiring a target storage engine preconfigured for the graph database; using the target storage engine to transfer the data to be processed in a target database to the graph database; and determining the lineage graph recorded in the target storage engine after the transfer is completed, so as to assist in determining the first graph according to the lineage graph, wherein the lineage graph indicates the transfer relationships, among various services, of the different data stored in the graph database.
It should be noted that the database that stores the data in actual use does not have graph properties. Therefore, when it is determined that a data set needs to be processed, the data set is saved into a graph database capable of generating a graph, and the transfer relationships between the different data and each service are recorded during saving. The source of each piece of data and the direction in which it is used are then clearly known, so that subsequent marking can be more comprehensive.
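As a rough illustration of how recorded transfer relationships can be turned into the two tables used by later steps, the sketch below builds an adjacency list and an inverse adjacency list from a list of directed lineage edges. The edge format (source, target), the function name and the asset names are assumptions made for the example; the application does not specify how the lineage records are laid out.

```python
def build_adjacency(edges):
    """Build (adjacency list, inverse adjacency list) from directed edges.

    Each edge is a (source, target) pair meaning data flows from source
    to target. Every vertex gets an entry in both tables, even if empty.
    """
    adjacency, inverse_adjacency = {}, {}
    for source, target in edges:
        for table in (adjacency, inverse_adjacency):
            table.setdefault(source, set())
            table.setdefault(target, set())
        adjacency[source].add(target)          # source's children
        inverse_adjacency[target].add(source)  # target's parents
    return adjacency, inverse_adjacency


# Hypothetical lineage records.
lineage_edges = [
    ("ods_orders", "dwd_orders"),
    ("dwd_orders", "report_sales"),
    ("dwd_orders", "report_stock"),
]
adjacency, inverse_adjacency = build_adjacency(lineage_edges)
```

Keeping both tables makes it cheap to look up the children of a node (adjacency list) and its parents (inverse adjacency list) without scanning all edges.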
To better understand the above method for determining data assets, the implementation flow is described below with reference to an alternative embodiment, which is not intended to limit the technical solution of the embodiments of the present application.
In this embodiment, a method for determining a data asset is provided, and FIG. 4 is a schematic flow chart of the method according to an embodiment of the present application. All graph data is first read from Neo4j; the data includes cyclic graphs and is denoted GD_1. An adjacency list and an inverse adjacency list are then constructed for GD_1, and the cyclic subgraphs in GD_1 are merged and replaced. The user then supplies an initial zombie asset set (ZombineCaltions), the zombie-mark propagation algorithm is executed, and the full set of zombie assets (including the newly identified zombie assets) is returned. In sum, based on the big data lineage system and combined with a graph algorithm, the zombie assets in all relevant links are found from the zombie assets confirmed by the user, which realizes the identification of the various zombie assets in the big data system.
As shown in fig. 4, the implementation steps are as follows:
Step S1: storing the data in a Neo4j graph database;
Step S2: determining the full blood-edge map data, that is, the complete data lineage graph (corresponding to the first graph in the above embodiment), corresponding to the graph database;
Step S3: constructing an adjacency list and an inverse adjacency list;
Step S4: analyzing the adjacency list and the inverse adjacency list to determine whether a ring exists in the graph. It should be noted that the propagation of zombie marks is essentially a traversal of the graph, and a ring subgraph can cause the traversal to enter an endless loop. The entry point of a ring can be identified by recording the number of times each vertex is traversed, and the traversal of the current path can be ended when the entry point is encountered again. However, when traversals start from different points and encounter the same ring subgraph, different ring entry points are determined, which makes the zombie mark propagation algorithm inaccurate. To solve this problem, in the embodiment of the present application the ring subgraph is simplified and regarded as a single combinedNode, and when the ring subgraph is marked, all member points in the subgraph are marked.
Step S5: if a ring exists, identifying the maximal ring subgraph, merging the ring subgraph into a combined vertex, and processing the adjacency list and the inverse adjacency list so that the vertex relations belonging to the ring subgraph are transferred to the combined vertex.
As an alternative implementation, FIG. 5 is a schematic diagram of determining a ring subgraph according to an embodiment of the present application.
As an alternative implementation, the ring subgraph in the graph is identified by determining the out-degree and in-degree of every node in the graph and deleting each node whose out-degree or in-degree is 0 together with its related edges, so that every remaining node has a non-zero out-degree and a non-zero in-degree; in this way the maximal ring subgraph in the graph can be located quickly. For example, the original graph in FIG. 5 includes nodes A, B, C, D, E and F; after nodes A and F, whose out-degree or in-degree is 0, are deleted, the remaining nodes are B, C, D and E.
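The degree-based deletion described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the function name `prune_to_cyclic_core` is hypothetical, the deletion is assumed to be repeated until no node with an out-degree or in-degree of 0 remains, and the edges A->B and E->F are assumed for illustration because the text gives only the edges among B, C, D and E.

```python
from collections import defaultdict


def prune_to_cyclic_core(edges):
    """Repeatedly delete nodes whose in-degree or out-degree is 0.

    edges: iterable of (source, target) pairs.
    Returns the set of nodes that survive the deletion.
    """
    nodes = {n for edge in edges for n in edge}
    remaining = set(edges)
    while True:
        out_deg = defaultdict(int)
        in_deg = defaultdict(int)
        for src, dst in remaining:
            out_deg[src] += 1
            in_deg[dst] += 1
        # A node is deleted when nothing points to it or it points to nothing.
        removable = {n for n in nodes if out_deg[n] == 0 or in_deg[n] == 0}
        if not removable:
            return nodes
        nodes -= removable
        # Deleting a node also deletes its related edges.
        remaining = {(s, d) for s, d in remaining if s in nodes and d in nodes}


# Edges among B, C, D, E follow the adjacency list given for FIG. 5;
# A->B and E->F are assumptions made only for this example.
FIG5_EDGES = [
    ("A", "B"), ("B", "C"), ("C", "D"), ("C", "E"),
    ("D", "E"), ("E", "C"), ("E", "B"), ("E", "F"),
]

print(sorted(prune_to_cyclic_core(FIG5_EDGES)))  # ['B', 'C', 'D', 'E']
```

Note that the surviving nodes may also include nodes lying on a path between two separate rings, so a further traversal is still needed to group the survivors into individual rings.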
Further, the adjacency list of the ring subgraph in FIG. 5 can be constructed as follows:
B->[C];
C->[D,E];
D->[E];
E->[C,B];
Accordingly, the inverse adjacency list of the ring subgraph in FIG. 5 is as follows:
B->[E];
C->[B,E];
D->[C];
E->[C,D];
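As a minimal sketch, the two lists above can be built from a list of directed edges in one pass. The function name `build_tables` and the constant `RING_EDGES` are hypothetical names introduced for this example; the edges are those of the ring subgraph listed above.

```python
def build_tables(edges):
    """Build the adjacency list and the inverse adjacency list.

    adjacency[v] holds the vertices that v points to;
    inverse[v] holds the vertices that point to v.
    """
    adjacency, inverse = {}, {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
        adjacency.setdefault(dst, [])   # make sure every vertex has a row
        inverse.setdefault(dst, []).append(src)
        inverse.setdefault(src, [])
    return adjacency, inverse


# Edges of the ring subgraph of FIG. 5, as listed above.
RING_EDGES = [("B", "C"), ("C", "D"), ("C", "E"),
              ("D", "E"), ("E", "C"), ("E", "B")]

adjacency, inverse = build_tables(RING_EDGES)
for vertex in sorted(adjacency):
    print(f"{vertex}->{adjacency[vertex]}   inverse: {vertex}->{inverse[vertex]}")
```

Keeping both lists lets a traversal follow edges in one direction while checking neighbours in the opposite direction without scanning the whole graph.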
Optionally, a depth-first traversal is performed on each row of the adjacency list and a mark is set on each traversed vertex; no operation is performed on a vertex that has already been marked. The traversal result of each row of the adjacency list is as follows:
The traversal result for node B is as follows:
B:start->B(unmark)->C(unmark)->D(unmark)->E(unmark)->C(mark)->B(mark)->E(mark)->end;
The traversal result for node C is as follows:
C:start->C(mark)->end;
The traversal result for node D is as follows:
D:start->D(mark)->end;
The traversal result for node E is as follows:
E:start->E(mark)->end;
It can be determined from the above traversal that B, C, D and E belong to the same ring.
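The marking traversal above can be reproduced with a short recursive depth-first search. This is an illustrative sketch only: the function name `trace_dfs` is hypothetical, and the input is assumed to be the adjacency list of the ring subgraph that remains after the nodes with an out-degree or in-degree of 0 have been deleted.

```python
def trace_dfs(adjacency, start, marked):
    """Depth-first traversal that marks vertices and records the visit order.

    A vertex that is already marked is recorded as "(mark)" and not expanded;
    an unmarked vertex is recorded as "(unmark)", marked, and then expanded.
    """
    trace = []

    def visit(node):
        if node in marked:
            trace.append(f"{node}(mark)")
            return
        trace.append(f"{node}(unmark)")
        marked.add(node)
        for nxt in adjacency.get(node, []):
            visit(nxt)

    visit(start)
    return "start->" + "->".join(trace) + "->end"


# Adjacency list of the ring subgraph of FIG. 5.
ADJ = {"B": ["C"], "C": ["D", "E"], "D": ["E"], "E": ["C", "B"]}

marked = set()
for row in ADJ:
    print(f"{row}:{trace_dfs(ADJ, row, marked)}")
```

Every vertex first marked during the traversal of row B is assigned to the same ring, which is why the rows for C, D and E end immediately. For very deep graphs an explicit stack would avoid Python's recursion limit.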
Step S6: if no ring exists, the adjacency list and the inverse adjacency list are taken directly as the final adjacency list and inverse adjacency list; otherwise, the final lists are those obtained after the ring subgraphs are merged.
Optionally, the traversal applied in step S5 determines which ring each vertex belongs to. All ring members in the adjacency list and the inverse adjacency list of the full blood-edge map then need to be replaced by their ring, which is denoted by combinedNode. This can be pictured as a large sphere wrapping a plurality of small spheres: each small sphere is a ring member, and the large sphere is the ring. In the zombie mark propagation algorithm, combinedNode is then treated as an ordinary vertex; the only special handling is that when combinedNode is marked as a zombie asset, all member vertices inside it need to be marked as well.
It can be understood that merging means identifying the ring subgraph, finding its member points, and replacing all the member points with combinedNode; that is, the ring subgraph is represented abstractly and can be regarded as an ordinary point in the graph. The ring members belonging to the same ring in the adjacency list and the inverse adjacency list are then replaced by the corresponding combinedNode.
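The replacement of ring members by combinedNode can be sketched as follows. The function name `merge_ring` is hypothetical, and the vertices A and F with the edges A->B and E->F are assumptions added so that the example has vertices outside the ring; the same function is applied to the adjacency list and to the inverse adjacency list.

```python
def merge_ring(table, members, combined="combinedNode"):
    """Replace every ring member in an adjacency-style table by one vertex.

    table: dict mapping a vertex to a list of neighbouring vertices.
    members: set of vertices that belong to the same ring.
    Edges that would start and end at the same vertex are dropped,
    and duplicate edges created by the merge are kept only once.
    """
    merged = {}
    for src, targets in table.items():
        new_src = combined if src in members else src
        bucket = merged.setdefault(new_src, [])
        for dst in targets:
            new_dst = combined if dst in members else dst
            if new_dst != new_src and new_dst not in bucket:
                bucket.append(new_dst)
    return merged


RING = {"B", "C", "D", "E"}
# Edges A->B and E->F are assumed for illustration only.
ADJACENCY = {"A": ["B"], "B": ["C"], "C": ["D", "E"],
             "D": ["E"], "E": ["C", "B", "F"], "F": []}
INVERSE = {"A": [], "B": ["A", "E"], "C": ["B", "E"],
           "D": ["C"], "E": ["C", "D"], "F": ["E"]}

print(merge_ring(ADJACENCY, RING))
print(merge_ring(INVERSE, RING))
```

A separate mapping from combinedNode to its member set (here `RING`) would be kept alongside the merged lists, so that marking combinedNode can later be expanded to all of its members.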
Step S7: receiving the set of zombie assets input by the user (corresponding to the data to be screened in the above embodiment).
Step S8: marking the zombie assets in the final adjacency list and inverse adjacency list;
Step S9: propagating the zombie mark upstream and propagating the zombie mark downstream;
Optionally, during upstream propagation, a depth-first search (DFS) is performed using the inverse adjacency list, and the sufficient condition is checked using the adjacency list. The upstream propagation condition is: if currentNode is a zombie asset, then a sufficient condition for a given parentNode to be a zombie asset is that all childrenNodes of that parentNode have zombie marks.
During downstream propagation, DFS is performed using the adjacency list, and the sufficient condition is checked using the inverse adjacency list. The downstream propagation condition is: if currentNode is a zombie asset, then a sufficient condition for a given child node to be a zombie asset is that all parentNodes of that child node have zombie marks.
Step S10: traversing the final adjacency list and inverse adjacency list, and returning all assets with zombie marks.
Specifically, the tag propagation algorithm is shown below in pseudo code.
Zombie asset identification algorithm pseudocode:
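The pseudocode listing itself does not appear in this text. The following Python is therefore only an illustrative sketch of the propagation rules of steps S8 to S10, not the applicant's pseudocode. The assumptions are: the function name `propagate_zombie_marks` and the asset names are hypothetical; the seeds are vertices of the final (merged) lists; and upstream and downstream rules are applied together with an explicit stack until no further vertex can be marked, which is one possible reading of step S9.

```python
def propagate_zombie_marks(adjacency, inverse, seeds, members=None):
    """Return every asset that carries a zombie mark after propagation.

    adjacency / inverse: final adjacency list and inverse adjacency list.
    seeds: zombie assets confirmed by the user (step S7).
    members: assumed map from a combined vertex to its ring members.
    """
    members = members or {}
    zombies = set(seeds)            # step S8: mark the user-supplied assets
    stack = list(zombies)
    while stack:                    # step S9: depth-first propagation
        current = stack.pop()
        # Upstream: walk the inverse list, check children in the adjacency list.
        for parent in inverse.get(current, []):
            if parent not in zombies and all(
                    child in zombies for child in adjacency.get(parent, [])):
                zombies.add(parent)
                stack.append(parent)
        # Downstream: walk the adjacency list, check parents in the inverse list.
        for child in adjacency.get(current, []):
            if child not in zombies and all(
                    parent in zombies for parent in inverse.get(child, [])):
                zombies.add(child)
                stack.append(child)
    # Step S10: a marked combined vertex stands for all of its ring members.
    result = set()
    for node in zombies:
        result.update(members.get(node, {node}))
    return result


# Hypothetical merged lineage: ods_orders -> combinedNode -> two reports.
ADJ = {"ods_orders": ["combinedNode"],
       "combinedNode": ["report_a", "report_b"],
       "report_a": [], "report_b": []}
INV = {"ods_orders": [], "combinedNode": ["ods_orders"],
       "report_a": ["combinedNode"], "report_b": ["combinedNode"]}
MEMBERS = {"combinedNode": {"B", "C", "D", "E"}}

print(sorted(propagate_zombie_marks(ADJ, INV, {"report_a"}, MEMBERS)))
print(sorted(propagate_zombie_marks(ADJ, INV, {"report_a", "report_b"}, MEMBERS)))
```

With only `report_a` as a seed nothing propagates, because `report_b` still depends on the ring; once both reports are seeds, the ring and its source satisfy the upstream condition and are marked as well.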
Through the above steps, the data in Neo4j are loaded into memory and a conditional propagation algorithm over the data in the graph is implemented by means of graph-related algorithms. Therefore, in big data applications, invalid assets can be identified quickly and effectively. In practical application, starting from a BI report that is to be taken offline (or another asset close to the user side), a user can identify all asset information on the link from the source to that BI report; in the field of data governance, the identified result can then be used for resource reclamation and other work, finally achieving the effects of cost reduction and efficiency improvement.
From the description of the above embodiments, it will be clear to a person skilled in the art that the method according to the above embodiments may be implemented by means of software plus the necessary general hardware platform, but of course also by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method of the various embodiments of the present application.
FIG. 6 is a block diagram of a data asset determination device according to an embodiment of the application. As shown in FIG. 6, the device includes:
a determining module 62, configured to determine a first graph corresponding to different graph data stored in a graph database, where the first graph is used to indicate an association relationship between nodes corresponding to the different graph data;
a processing module 64, configured to, in a case where it is determined that there is a ring subgraph in the first graph, preprocess the first graph to generate a second graph;
an extracting module 66, configured to, when data to be screened input by the first object is obtained, mark first graph data in the second graph that is the same as the data to be screened, and extract the data asset from the graph database according to the marking result.
Through the device, the first graphs corresponding to different graph data stored in the graph database are determined, wherein the first graphs are used for indicating the association relationship between nodes corresponding to the different graph data; under the condition that the fact that the ring sub-graph exists in the first graph is determined, preprocessing the first graph to generate a second graph; under the condition that data to be screened input by a first object is obtained, marking first graph data which are the same as the data to be screened in a second graph; extracting data assets from the graph database according to the marking result; by adopting the technical scheme, the technical problem that the invalid data assets existing in the database cannot be comprehensively identified is solved, and the identification efficiency of the invalid assets existing in the database can be improved.
In an exemplary embodiment, preprocessing the first graph to generate a second graph includes: determining a first adjacency list and/or a first inverse adjacency list corresponding to different nodes in the first graph; performing a first traversal on the first adjacency list and/or the first inverse adjacency list, and marking the vertices of the ring subgraphs passed through in the first traversal, wherein at least one ring subgraph exists in the first graph; determining, according to the vertices, a second adjacency list and/or a second inverse adjacency list corresponding to the ring subgraph, and replacing all member points in the ring subgraph with a target point corresponding to the vertices based on the second adjacency list and/or the second inverse adjacency list, so as to determine the whole ring subgraph as a new vertex; and in the case that no ring subgraph remains in the first graph, determining that the second graph corresponding to the first graph has been generated.
In an exemplary embodiment, marking the first graph data in the second graph that is identical to the data to be screened to extract data assets from the graph database according to a marking result includes: adding a first mark to the first graph data, and identifying a transfer link containing the first graph data in the second graph, wherein the transfer link comprises at least two nodes; determining parent-child relationships between other nodes in the transfer link and the target node corresponding to the first graph data; adding a second mark to the other nodes based on the parent-child relationship and the propagation mode of the data to be screened; and determining a marking result based on the second mark and the first mark, so as to extract a data asset from the graph database based on the marking result.
In an exemplary embodiment, adding a second mark to the other nodes based on the parent-child relationship and the propagation mode of the data to be screened includes: determining to add a second mark to the other nodes in the case that the target node is determined to be a parent node of the other nodes and the propagation mode of the data to be screened is upstream propagation; ending the mark adding operation in the case that the target node is determined to be a parent node of the other nodes and the propagation mode of the data to be screened is downstream propagation; ending the mark adding operation in the case that the target node is determined to be a child node of the other nodes and the propagation mode of the data to be screened is upstream propagation; and adding a second mark to the other nodes in the case that the target node is determined to be a child node of the other nodes and the propagation mode of the data to be screened is downstream propagation.
In an exemplary embodiment, after adding a second mark to the other nodes based on the parent-child relationship and the propagation mode of the data to be screened, the method further includes: adjusting the marking result according to the sufficient condition corresponding to the propagation mode of the data to be screened, wherein the sufficient conditions include at least one of: in the case that the propagation mode of the data to be screened is determined to be upstream propagation, if all child nodes are determined to have marks, adding a second mark to a parent node that has no mark; in the case that the propagation mode of the data to be screened is determined to be downstream propagation, if all parent nodes are determined to have marks, adding a second mark to a child node that has no mark; and determining the adjusted target marking result as the marking result for extracting the data asset.
In one exemplary embodiment, after extracting the data asset from the graph database according to the marking result, the method further comprises: identifying a data service that applies the data asset; and sending a management hint to the data service, wherein the management hint is used for indicating a second object using the data service to dereference the data asset.
In an exemplary embodiment, before determining the first graph corresponding to the different graph data stored in the graph database, the method further includes: acquiring a target storage engine preconfigured for the graph database; transferring, by using the target storage engine, the data to be processed in a target database to the graph database; and determining the blood-edge map (that is, the data lineage graph) recorded in the target storage engine after the transfer is completed, so as to assist in determining the first graph according to the blood-edge map, wherein the blood-edge map is used for indicating the transfer relationship, among various services, of the different data stored in the graph database.
An embodiment of the present application also provides a storage medium including a stored program, wherein the program executes the method of any one of the above.
Alternatively, in the present embodiment, the above-described storage medium may be configured to store program code for performing the steps of:
S1, determining a first graph corresponding to different graph data stored in a graph database, wherein the first graph is used for indicating association relations between nodes corresponding to the different graph data;
S2, preprocessing the first graph to generate a second graph under the condition that the first graph is determined to have a ring subgraph;
S3, under the condition that data to be screened input by a first object is obtained, marking first graph data in the second graph that is the same as the data to be screened, and extracting the data asset from the graph database according to the marking result.
An embodiment of the application also provides an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, where the transmission device is connected to the processor, and the input/output device is connected to the processor.
Alternatively, in the present embodiment, the above-described processor may be configured to execute the following steps by a computer program:
S1, determining a first graph corresponding to different graph data stored in a graph database, wherein the first graph is used for indicating association relations between nodes corresponding to the different graph data;
S2, preprocessing the first graph to generate a second graph under the condition that the first graph is determined to have a ring subgraph;
S3, under the condition that data to be screened input by a first object is obtained, marking first graph data in the second graph that is the same as the data to be screened, and extracting the data asset from the graph database according to the marking result.
Alternatively, in the present embodiment, the storage medium may include, but is not limited to: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other various media capable of storing program code.
Alternatively, specific examples in this embodiment may refer to examples described in the foregoing embodiments and optional implementations, and this embodiment is not described herein.
It will be appreciated by those skilled in the art that the modules or steps of the application described above may be implemented in a general purpose computing device, they may be concentrated on a single computing device, or distributed across a network of computing devices, they may alternatively be implemented in program code executable by computing devices, so that they may be stored in a memory device for execution by computing devices, and in some cases, the steps shown or described may be performed in a different order than that shown or described, or they may be separately fabricated into individual integrated circuit modules, or multiple modules or steps within them may be fabricated into a single integrated circuit module for implementation. Thus, the present application is not limited to any specific combination of hardware and software.
The foregoing is merely a preferred embodiment of the present application and it should be noted that modifications and adaptations to those skilled in the art may be made without departing from the principles of the present application, which are intended to be comprehended within the scope of the present application.

Claims (10)

1. A method of determining a data asset, comprising:
determining a first graph corresponding to different graph data stored in a graph database, wherein the first graph is used for indicating the association relationship between nodes corresponding to the different graph data;
preprocessing the first graph to generate a second graph under the condition that the first graph is determined to have a ring sub-graph;
and under the condition that the data to be screened input by the first object is obtained, marking the first graph data which are the same as the data to be screened in the second graph, so as to extract the data asset from the graph database according to the marking result.
2. The method of claim 1, wherein preprocessing the first graph to generate a second graph comprises:
determining a first adjacency list and/or a first inverse adjacency list corresponding to different nodes in the first graph;
performing a first traversal on the first adjacency list and/or the first inverse adjacency list, and marking the vertices of the ring subgraphs passed through in the first traversal, wherein at least one ring subgraph exists in the first graph;
determining, according to the vertices, a second adjacency list and/or a second inverse adjacency list corresponding to the ring subgraph, and replacing all member points in the ring subgraph with a target point corresponding to the vertices based on the second adjacency list and/or the second inverse adjacency list, so as to determine the whole ring subgraph as a new vertex;
in the case that no ring subgraph remains in the first graph, determining that the second graph corresponding to the first graph has been generated.
3. The method of claim 1, wherein marking the first graph data in the second graph that is identical to the data to be screened to extract data assets from the graph database based on the marking result comprises:
adding a first mark to the first graph data, and identifying a transfer link containing the first graph data in the second graph, wherein the transfer link comprises at least two nodes;
determining parent-child relationships between other nodes in the transfer link and the target node corresponding to the first graph data;
adding a second mark to the other nodes based on the parent-child relationship and the propagation mode of the data to be screened;
determining a marking result based on the second mark and the first mark to extract a data asset from the graph database based on the marking result.
4. A method of determining a data asset according to claim 3, wherein adding a second tag to the other nodes based on the parent-child relationship and the propagation of the data to be screened comprises:
determining to add a second mark to the other nodes under the condition that the target node is determined to be a parent node of the other nodes and the propagation mode of the data to be screened is upstream propagation;
ending the mark adding operation under the condition that the target node is determined to be a parent node of the other nodes and the propagation mode of the data to be screened is downstream propagation;
ending the mark adding operation under the condition that the target node is determined to be a child node of the other nodes and the propagation mode of the data to be screened is upstream propagation;
and under the condition that the target node is determined to be a child node of the other nodes and the propagation mode of the data to be screened is downstream propagation, adding a second mark to the other nodes.
5. A method of determining a data asset according to claim 3, wherein after adding a second marker to the other nodes based on the parent-child relationship and the way of propagation of the data to be screened, the method further comprises:
adjusting the marking result according to the sufficient condition corresponding to the propagation mode of the data to be screened; wherein the sufficient conditions include at least one of:
in the case that the propagation mode of the data to be screened is determined to be upstream propagation, if all child nodes are determined to have marks, adding a second mark to a parent node that has no mark;
in the case that the propagation mode of the data to be screened is determined to be downstream propagation, if all parent nodes are determined to have marks, adding a second mark to a child node that has no mark;
the adjusted target marking result is determined as a marking result for extracting the data asset.
6. The method of claim 1, wherein after extracting data assets from the graph database based on the marking results, the method further comprises:
Identifying a data service that applies the data asset;
and sending a management hint to the data service, wherein the management hint is used for indicating a second object using the data service to dereference the data asset.
7. The method of determining a data asset according to claim 1, wherein prior to determining a first graph corresponding to different graph data stored in the graph database, the method further comprises:
acquiring a target storage engine preconfigured for the graph database;
transferring, by using the target storage engine, the data to be processed in a target database to the graph database;
and determining a blood-edge map recorded in the target storage engine after the transfer is completed, so as to assist in determining the first graph according to the blood-edge map, wherein the blood-edge map is used for indicating the transfer relationship, among various services, of the different data stored in the graph database.
8. A data asset determination apparatus, comprising:
the determining module is used for determining a first graph corresponding to different graph data stored in the graph database, wherein the first graph is used for indicating the association relationship between nodes corresponding to the different graph data;
the processing module is used for preprocessing the first graph to generate a second graph under the condition that it is determined that a ring subgraph exists in the first graph;
the extraction module is used for marking the first graph data in the second graph that is the same as the data to be screened under the condition that the data to be screened input by the first object is acquired, and extracting the data asset from the graph database according to the marking result.
9. A computer readable storage medium, characterized in that the computer readable storage medium comprises a stored program, wherein the program when run performs the method of any of the preceding claims 1 to 7.
10. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method according to any of the claims 1 to 7 by means of the computer program.
CN202310794220.5A 2023-06-29 2023-06-29 Method and device for determining data asset, storage medium and electronic device Pending CN116932846A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310794220.5A CN116932846A (en) 2023-06-29 2023-06-29 Method and device for determining data asset, storage medium and electronic device

Publications (1)

Publication Number Publication Date
CN116932846A true CN116932846A (en) 2023-10-24

Family

ID=88385528

Country Status (1)

Country Link
CN (1) CN116932846A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination