CN114265956A - Data verification method and device, electronic equipment and storage medium - Google Patents

Data verification method and device, electronic equipment and storage medium

Info

Publication number
CN114265956A
Authority
CN
China
Prior art keywords
data
graph
node
nodes
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111612356.7A
Other languages
Chinese (zh)
Inventor
喻琦
李凌
石彦彬
张英彬
杨婕
宋琦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd filed Critical China Telecom Corp Ltd
Priority to CN202111612356.7A priority Critical patent/CN114265956A/en
Publication of CN114265956A publication Critical patent/CN114265956A/en
Pending legal-status Critical Current

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The application provides a data verification method, a data verification device, electronic equipment and a storage medium. The method comprises: acquiring the data records contained in a data set, and acquiring the foreign keys of the data records; taking the data records as graph nodes, and generating edges between the graph nodes based on the foreign keys, to obtain a directed graph describing the data relationships among the data records; and verifying the data set based on the directed graph. The embodiments of the application can verify the data set more effectively from a global perspective.

Description

Data verification method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of data analysis, and in particular, to a data verification method and apparatus, an electronic device, and a storage medium.
Background
In many fields where applications are built on data, the data needs to be verified in advance to ensure that the application process is valid, so that the quality of the data can be improved according to the verification result. In the prior art, each data item is usually checked individually against a preset template or a preset rule. However, when a large volume of data is gathered into an interrelated whole, such item-by-item checking cannot sufficiently and effectively improve the quality of the data as a whole.
Disclosure of Invention
An object of the present application is to provide a data verification method, apparatus, electronic device, and storage medium, which can more effectively verify a data set from a global perspective.
According to an aspect of an embodiment of the present application, a data verification method is disclosed, the method including:
acquiring a data record contained in a data set, and acquiring a foreign key of the data record;
taking the data records as graph nodes, and generating edges of the graph nodes based on the foreign keys to obtain a directed graph for describing data relationships among the data records;
and verifying the data set based on the directed graph.
According to an aspect of the embodiments of the present application, a data verification apparatus is disclosed, the apparatus including:
an acquisition module configured to acquire the data records contained in a data set and acquire the foreign keys of the data records;
the graph generation module is configured to take the data records as graph nodes, generate edges of the graph nodes based on the foreign keys and obtain directed graphs for describing data relationships among the data records;
a verification module configured to verify the data set based on the directed graph.
In an exemplary embodiment of the present application, the apparatus is configured to:
traversing the data records, generating corresponding graph nodes for the data records traversed for the first time, and for the generated target graph nodes, if the data records corresponding to the target graph nodes are traversed again, marking the target graph nodes as repeated graph nodes;
and checking the data repeatability of the data set based on the repeated graph nodes.
In an exemplary embodiment of the present application, the apparatus is configured to:
for a generated target graph node, taking the data record whose primary key is the foreign key of the target graph node as the lower end data record at the lower end of the target graph node;
when a graph node matching the lower end data record is detected in the generated graph nodes, an edge is generated from the target graph node to the graph node matching the lower end data record.
In an exemplary embodiment of the present application, the apparatus is configured to:
when detecting that no graph node matched with the lower end data record exists in the generated graph nodes, detecting whether the source database of the data record stores the lower end data record;
when detecting that the source database does not store the lower end data record, marking the lower end data record as a missing graph node;
and checking the data integrity of the data set based on the characteristic information of the missing graph node.
In an exemplary embodiment of the present application, the apparatus is configured to:
calculating, for each graph node in the directed graph, the sum of its out-degree and in-degree;
and checking the importance of each data record in the data set based on the sum of the in-degree and the out-degree of each graph node.
In an exemplary embodiment of the present application, the apparatus is configured to:
performing loop detection on the directed graph to obtain a node loop in the directed graph;
and checking the data security of the data set based on the characteristic information of the node loop.
In an exemplary embodiment of the present application, the apparatus is configured to:
acquiring path quantity characteristic information of the node loop, wherein the path quantity characteristic information of the node loop is used for describing the quantity of paths intersected with the node loop;
and checking the data security of the data set based on the path quantity characteristic information of the node loop.
According to an aspect of an embodiment of the present application, an electronic device is disclosed, including: a memory storing computer readable instructions; a processor reading computer readable instructions stored by the memory to perform the method of any of the above embodiments.
According to an aspect of embodiments of the present application, there is disclosed a computer-readable storage medium having stored thereon computer-readable instructions which, when executed by a processor of a computer, cause the computer to perform the method of any one of the above embodiments.
According to an aspect of embodiments herein, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided in the various alternative implementations described above.
In the embodiment of the application, the data records are used as graph nodes, and the edges of the graph nodes are generated based on the foreign keys of the data records, so that a directed graph that fully describes the data relationships among the data records in the data set can be obtained. On the basis of this directed graph, the data set can be verified more effectively from a global perspective.
Other features and advantages of the present application will be apparent from the following detailed description, or may be learned by practice of the application.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The above and other objects, features and advantages of the present application will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings.
FIG. 1 shows a flow diagram of a data verification method according to one embodiment of the present application.
FIG. 2 shows a flow diagram of a data verification method according to one embodiment of the present application.
FIG. 3 illustrates a flow diagram for generating edges for graph nodes according to one embodiment of the present application.
FIG. 4 shows a block diagram of a data verification device according to one embodiment of the present application.
FIG. 5 illustrates a hardware diagram of an electronic device according to one embodiment of the present application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these example embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The drawings are merely schematic illustrations of the present application and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more example embodiments. In the following description, numerous specific details are provided to give a thorough understanding of example embodiments of the present application. One skilled in the relevant art will recognize, however, that the subject matter of the present application can be practiced without one or more of the specific details, or with other methods, components, steps, and so forth. In other instances, well-known structures, methods, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the application.
Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The application provides a data verification method which is mainly used for carrying out integrity verification, repeatability verification, safety verification and the like on a data set containing a plurality of data records.
Fig. 1 shows a flowchart of a data verification method according to an embodiment of the present application. An exemplary execution subject of the method is any computer with sufficient computing power, such as a server. The method comprises the following steps:
step S110, acquiring data records contained in the data set, and acquiring foreign keys of the data records;
step S120, taking the data records as graph nodes, and generating edges of the graph nodes based on foreign keys to obtain a directed graph for describing data relationships among the data records;
and step S130, checking the data set based on the directed graph.
In the embodiment of the application, for a data set to be verified, each data record in the data set is obtained, together with the foreign key of each data record. A foreign key is defined relative to a primary key. The primary key of a data record uniquely identifies that data record, whereas the foreign key of a data record is the primary key of one or more other data records; data records are linked to one another through foreign keys. For example, the data structure of data record a1 is <key1, key2>, and the data structure of data record b2 is <key2, key3>. The key1 of data record a1 differs from the key1 of every other data record of the same structure, so key1 uniquely identifies data record a1, while the key2 of data record a1 may repeat the key2 of other data records of the same structure. Hence key1 is the primary key of data record a1, and key2 is not. Similarly, key2 is the primary key of data record b2, and key3 is not. Since key2 is not the primary key of data record a1 but is the primary key of data record b2, key2 is a foreign key of data record a1 that refers to data record b2.
The data records are then used as graph nodes, and edges of the graph nodes are generated based on the foreign keys of the data records. Since a foreign key points from a non-primary-key field of one data record to the primary key of another data record, an edge generated based on the foreign key describes the data relationship between the two graph nodes it connects, and the generated directed graph therefore describes the data relationships among the data records in the data set.
The directed graph visually describes not only the explicit data relationships among the data records but also the implicit ones, so that on the basis of the directed graph the data set can be verified more effectively from a global perspective. In addition, the directed graph can be extended dynamically, so that verification based on the directed graph can adapt to a dynamically changing data set and verify it dynamically.
Therefore, in the embodiment of the application, the data records are used as graph nodes, and the edges of the graph nodes are generated based on the foreign keys of the data records, so that a directed graph that fully describes the data relationships among the data records in the data set can be obtained. On the basis of this directed graph, the data set can be verified more effectively from a global perspective.
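The graph construction described above can be sketched as follows. This is a minimal illustration and not the applicant's implementation: the record layout (a `pk` field holding the primary key and an `fks` list holding foreign-key values) and the function name `build_directed_graph` are assumptions introduced here.

```python
from collections import defaultdict

# Hypothetical records: "pk" is the primary key, "fks" lists the primary
# keys of other records that this record's foreign keys refer to.
records = [
    {"pk": "a1", "fks": ["b2"]},
    {"pk": "b2", "fks": ["c3"]},
    {"pk": "c3", "fks": []},
]

def build_directed_graph(records):
    """Return (nodes, edges): nodes is the set of primary keys, and edges
    maps each node to the set of nodes its foreign keys point to."""
    nodes = {r["pk"] for r in records}
    edges = defaultdict(set)
    for r in records:
        for fk in r["fks"]:
            if fk in nodes:  # only link records that are present in the data set
                edges[r["pk"]].add(fk)
    return nodes, dict(edges)

nodes, edges = build_directed_graph(records)
```

Here the edge a1 -> b2 exists because the foreign key of a1 equals the primary key of b2, mirroring the a1/b2 example in the text.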
In an embodiment, a configuration file is loaded, and a data file corresponding to a data set is read through description information in the configuration file, so that data records included in the data set are obtained through analysis.
Specifically, the description information in the configuration file includes: the header of the data file, the relation between the header fields, the primary key field of the data, the value constraint of the header fields, and the like.
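As an illustration only, the description information might be represented and used as in the sketch below. The application does not specify a configuration format, so every key name (`header`, `primary_key`, `foreign_keys`, `constraints`), the CSV layout and the function `parse_records` are hypothetical.

```python
import csv
import io

# Hypothetical description information; all keys and field names are illustrative.
config = {
    "header": ["order_id", "customer_id", "amount"],
    "primary_key": "order_id",
    "foreign_keys": ["customer_id"],                     # header fields acting as foreign keys
    "constraints": {"amount": lambda v: float(v) >= 0},  # value constraint per field
}

def parse_records(text, config):
    """Parse CSV text into records and list the primary keys of the
    records that violate a value constraint."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=config["header"])
    records, violations = [], []
    for row in reader:
        records.append(row)
        ok = all(check(row[field]) for field, check in config["constraints"].items())
        if not ok:
            violations.append(row[config["primary_key"]])
    return records, violations

sample = "o1,c1,10.5\no2,c2,-3\n"
records, violations = parse_records(sample, config)
```

In this sketch the second row is kept as a data record but reported as violating the non-negative constraint on `amount`.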
In an embodiment, the data records are traversed, and a corresponding graph node is generated for each data record traversed for the first time. For a generated target graph node, if the data record corresponding to the target graph node is traversed again, the target graph node is marked as a repeated graph node. The data repeatability of the data set is then verified based on the repeated graph nodes.
Specifically, the target graph node refers broadly to any graph node already generated in the directed graph. For any target graph node, if its corresponding data record is traversed again, the data set contains a plurality of identical repeated copies of that data record, and the target graph node is therefore marked as a repeated graph node.
In other words, in the process of generating the directed graph, a corresponding graph node is generated for each unique data record, and a plurality of identical repeated data records collectively correspond to one graph node. And if a certain graph node corresponds to a plurality of identical repeated data records, marking the graph node as a repeated graph node. And after detecting the repeated data record, recording the repeated data record so as to facilitate the summarization of the verification result.
And when the data set is checked based on the directed graph, the data repeatability of the data set is checked based on the repeated graph nodes.
In an embodiment, for a plurality of identical repeated data records, the graph node corresponding to them is not only marked as a repeated graph node, but the number of repeated data records corresponding to that graph node is also counted. For example, while traversing the data records in the data set, when data record a1 is traversed for the first time, a corresponding graph node a1 is generated for data record a1; when data record a1 is traversed for the second time, graph node a1 is marked as a repeated graph node, and the record quantity characteristic information of graph node a1 is set to 2, indicating that graph node a1 corresponds to 2 identical repeated data records; similarly, each further time data record a1 is traversed, the record quantity characteristic information is incremented by 1.
Further, based on the counts kept for the repeated graph nodes, it can be determined how many data records in the data set are repeated. The more repeated data records in the data set, the higher the data repeatability of the data set; conversely, the fewer repeated data records in the data set, the lower the data repeatability of the data set.
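The marking and counting of repeated graph nodes can be sketched as below. The sketch assumes that identical data records share the same hashable identifier; the function name `mark_duplicates` is hypothetical.

```python
def mark_duplicates(record_keys):
    """Traverse record identifiers and return (counts, repeated):
    counts maps each graph node to the number of records it stands for,
    repeated is the set of nodes marked as repeated graph nodes."""
    counts = {}
    repeated = set()
    for key in record_keys:
        if key not in counts:
            counts[key] = 1          # first traversal: generate the graph node
        else:
            counts[key] += 1         # traversed again: increment the count
            repeated.add(key)        # and mark the node as repeated
    return counts, repeated

counts, repeated = mark_duplicates(["a1", "b2", "a1", "a1"])
```

With this input, node a1 stands for three identical records and is the only repeated graph node.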
In one embodiment, for a generated target graph node, the data record whose primary key is the foreign key of the target graph node is taken as the lower end data record at the lower end of the target graph node. When a graph node matching the lower end data record is detected among the generated graph nodes, an edge from the target graph node to the graph node matching the lower end data record is generated.
Specifically, the target graph node refers broadly to any graph node already generated in the directed graph. The generated graph nodes in the directed graph can be traversed, and for each traversed target graph node, the foreign key of the target graph node is read, so that the lower end data record at the lower end of the target graph node is determined. The lower end data record is the data record whose primary key is the foreign key of the target graph node. For example, the primary key key2 of data record b2 is the foreign key of data record a1, so data record b2 is the lower end data record of data record a1.
It is then detected whether a graph node matching the lower end data record of the target graph node exists among the generated graph nodes. If such a graph node exists, the already generated part of the directed graph can describe the data relationship between the data record of the target graph node and the lower end data record, and an edge is generated from the target graph node to the graph node matching the lower end data record.
In an embodiment, when it is detected that no graph node matching the lower end data record exists among the generated graph nodes, it is detected whether the source database of the data records stores the lower end data record. When it is detected that the source database does not store the lower end data record, the lower end data record is marked as a missing graph node. The data integrity of the data set is then verified based on the characteristic information of the missing graph nodes.
Specifically, if no graph node matching the lower end data record of the target graph node is detected among the generated graph nodes, the already generated part of the directed graph does not describe the data relationship between the data record of the target graph node and the lower end data record, and naturally no corresponding edge exists.
There are two possible reasons for this: first, the source database stores the lower end data record, but the record has not yet been turned into a graph node, and its graph node will eventually be added to the directed graph as reading of the source database proceeds; second, the lower end data record is missing from the source database.
In this case, therefore, the method goes on to check whether the source database stores the lower end data record. If the lower end data record is not found in the source database, the lower end data record is missing, and it is marked as a missing graph node. The data integrity of the data set is then verified based on the characteristic information of the missing graph nodes (such as the total number of missing graph nodes in the directed graph, the degree of the missing graph nodes in the directed graph, and the like).
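The three cases above (matching graph node found, record stored in the source database but not yet a graph node, record missing) can be sketched as follows. The input shapes and the name `generate_edges` are assumptions made for illustration; the source database is modelled simply as a set of primary keys.

```python
def generate_edges(foreign_keys, source_db_keys):
    """foreign_keys maps each generated graph node to the primary keys its
    foreign keys refer to; source_db_keys is the set of primary keys stored
    in the source database. Both inputs are hypothetical simplifications."""
    edges, missing = [], set()
    for node, targets in foreign_keys.items():
        for target in targets:
            if target in foreign_keys:       # lower end node already generated
                edges.append((node, target))
            elif target in source_db_keys:   # stored, will become a node later
                continue
            else:                            # in neither place: record is missing
                missing.add(target)
    return edges, missing

edges, missing = generate_edges(
    {"a1": ["b2", "x9"], "b2": ["c3"]},
    source_db_keys={"a1", "b2", "c3"},
)
```

In this example c3 is skipped because the source database still holds it, while x9 is recorded as a missing graph node; the number of missing nodes is one possible piece of characteristic information for the integrity check.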
In one embodiment, missing graph nodes may be added to the directed graph in a different representation than non-missing graph nodes. For example: taking solid boxes in the directed graph to represent non-missing graph nodes, then a dashed box may be taken to add missing graph nodes to the directed graph.
In one embodiment, for each graph node in the directed graph, the sum of its out-degree and in-degree is calculated. The importance of each data record in the data set is checked based on the sum of the in-degree and the out-degree of each graph node.
Specifically, the out-degree of a graph node is the number of edges that start at the graph node; the in-degree of a graph node is the number of edges that end at the graph node. The larger the sum of the in-degree and the out-degree of a graph node, the closer its relationship with other graph nodes, and the more important the graph node is in the directed graph. Accordingly, the greater the sum of the in-degree and the out-degree of a graph node, the more important its corresponding data record is in the data set.
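A minimal sketch of the degree computation, assuming the directed graph is available as a list of (source, destination) edge pairs; the name `degree_sums` is hypothetical.

```python
from collections import Counter

def degree_sums(edges):
    """Return the sum of in-degree and out-degree for every graph node
    that appears in the edge list."""
    total = Counter()
    for src, dst in edges:
        total[src] += 1  # the edge starts at src: contributes to its out-degree
        total[dst] += 1  # the edge ends at dst: contributes to its in-degree
    return total

sums = degree_sums([("a1", "b2"), ("b2", "c3"), ("d4", "b2")])
```

Sorting the nodes by this sum gives one simple ranking of the importance of the corresponding data records.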
In an embodiment, loop detection is performed on the directed graph to obtain node loops in the directed graph. And verifying the data security of the data set based on the characteristic information of the node loop.
Specifically, after the directed graph is generated, loop detection is performed on the directed graph based on its topological structure. The existence of a node loop indicates that the primary key of each graph node in the node loop is the foreign key of another graph node in the loop, and such a data structure can undermine the data security of the data set.
If the node loop exists in the digraph, after the node loop is detected, the characteristic information of the node loop is obtained, and then the data security of the data set is verified based on the characteristic information of the node loop.
In an embodiment, node quantity characteristic information of a node loop is obtained, wherein the node quantity characteristic information of the node loop is used for describing the quantity of graph nodes in the node loop. And verifying the data security of the data set based on the node quantity characteristic information of the node loop.
Specifically, based on the node quantity characteristic information of the node loop, the number of graph nodes in the node loop can be determined. The more graph nodes in the node loop, the larger the range of the directed graph influenced by the node loop is, and the lower the data security of the data set is; conversely, the fewer graph nodes in the node loop, the smaller the range of the directed graph affected by the node loop, and the higher the data security of the data set.
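Loop detection and the node quantity of each loop can be sketched with a depth-first search. The application does not name a particular detection algorithm, so the choice of DFS here is an assumption; the sketch reports one loop per back edge rather than enumerating every simple cycle, and it is recursive, so very deep graphs would need an iterative variant.

```python
def find_loops(graph):
    """Depth-first search returning one node loop per back edge found.
    graph maps each node to a list of its successor nodes."""
    WHITE, GRAY, BLACK = 0, 1, 2           # unvisited, on the stack, finished
    color = {n: WHITE for n in graph}
    stack, loops = [], []

    def visit(node):
        color[node] = GRAY
        stack.append(node)
        for nxt in graph.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GRAY:              # back edge closes a loop
                loops.append(stack[stack.index(nxt):])
            elif state == WHITE:
                visit(nxt)
        stack.pop()
        color[node] = BLACK

    for n in list(graph):
        if color[n] == WHITE:
            visit(n)
    return loops

graph = {"a1": ["b2"], "b2": ["c3"], "c3": ["a1"], "d4": ["a1"]}
loops = find_loops(graph)
loop_sizes = [len(loop) for loop in loops]   # node quantity of each loop
```

In this example a1, b2 and c3 form the only loop, and d4 merely points into it.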
In one embodiment, path quantity characteristic information of a node loop is obtained, wherein the path quantity characteristic information of the node loop describes the number of paths intersecting the node loop. The data security of the data set is verified based on the path quantity characteristic information of the node loop.
Specifically, a path intersecting a node loop refers to a path that shares at least one graph node with the node loop. It will be appreciated that graph nodes may form other legitimate paths in addition to node loops. In this case, a legitimate path in which each graph node in the node loop is located is detected, and the number of paths intersecting the node loop is determined.
The more paths intersected with the node loops, the larger the range of the directed graph influenced by the node loops is, and the lower the data security of the data set is; conversely, the fewer paths intersecting the node loop, the smaller the range of the directed graph affected by the node loop, and the higher the data security of the data set.
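The application does not fix how intersecting paths are enumerated. As a rough proxy only, the sketch below counts the edges that connect a graph node inside the loop with a graph node outside it, since every path that enters or leaves the loop must use such an edge. The function name `crossing_edge_count` and this choice of measure are assumptions, not the method of the application.

```python
def crossing_edge_count(edges, loop_nodes):
    """Count edges with exactly one endpoint inside the node loop.
    This is a simplified proxy for the number of intersecting paths."""
    loop_nodes = set(loop_nodes)
    return sum(
        1 for src, dst in edges
        if (src in loop_nodes) != (dst in loop_nodes)
    )

edges = [
    ("a1", "b2"), ("b2", "c3"), ("c3", "a1"),  # the node loop itself
    ("d4", "a1"),                              # enters the loop
    ("c3", "e5"),                              # leaves the loop
    ("d4", "f6"),                              # unrelated to the loop
]
count = crossing_edge_count(edges, ["a1", "b2", "c3"])
```

A larger count suggests that the loop touches a wider part of the directed graph.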
An advantage of this embodiment is that verifying the data security of the data set based on the path quantity characteristic information of the node loop reflects more completely, from a global perspective, how much the node loop affects the overall security of the data set, which makes the security verification more reasonable.
Fig. 2 shows a flowchart of a data verification method according to an embodiment of the present application, and fig. 3 shows a flowchart of generating an edge of a graph node according to an embodiment of the present application.
Referring to fig. 2 and fig. 3, in this embodiment, the server loads the configuration file, and reads the data file corresponding to the data set through the description information in the configuration file. The data records are read from the data file, and the unique data records are respectively used as corresponding graph nodes. In the process of reading the data records, if a plurality of identical repeated data records are detected, the graph nodes corresponding to the identical repeated data records are marked as repeated graph nodes.
The foreign keys of the data records are scanned according to the configuration file, and the edges of the graph nodes are generated according to the foreign keys. Specifically, when the lower end data record of a foreign key exists in the directed graph, a corresponding edge is generated, and processing continues with the next scanned foreign key; when the lower end data record of the foreign key does not exist in the directed graph, the source database is queried to determine whether it stores the lower end data record; if the source database stores the lower end data record, processing continues with the next scanned foreign key; if the lower end data record exists in neither the directed graph nor the source database, the lower end data record is confirmed to be missing, it is marked as a missing graph node, and the data is recorded as incomplete. After all the foreign keys have been traversed, the edge generation processing of the graph nodes ends.
After the digraph is generated, loop detection is performed. And if the node loop exists, recording the node loop and recording each data record on the node loop.
After the directed graph is generated, isolated point detection is also performed. And if the isolated node exists, recording the isolated node.
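Isolated point detection can be sketched as below, assuming an isolated node is a graph node that is an endpoint of no edge; the function name `isolated_nodes` is hypothetical.

```python
def isolated_nodes(nodes, edges):
    """Return the graph nodes that appear in no edge, neither as the
    starting point nor as the end point."""
    connected = {n for edge in edges for n in edge}
    return set(nodes) - connected

isolated = isolated_nodes({"a1", "b2", "z0"}, [("a1", "b2")])
```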
Furthermore, the server verifies the data repeatability of the data set according to the repeated graph nodes, the data integrity of the data set according to the missing graph nodes, and the data security of the data set according to the node loops, and generates a corresponding verification report from the verification results.
FIG. 4 shows a data verification apparatus according to an embodiment of the present application, the apparatus comprising:
an obtaining module 210 configured to obtain a data record included in a data set, and obtain a foreign key of the data record;
the graph generation module 220 is configured to use the data records as graph nodes, and generate edges of the graph nodes based on the foreign keys to obtain a directed graph for describing data relationships among the data records;
a verification module 230 configured to verify the data set based on the directed graph.
In an exemplary embodiment of the present application, the apparatus is configured to:
traversing the data records, generating corresponding graph nodes for the data records traversed for the first time, and for the generated target graph nodes, if the data records corresponding to the target graph nodes are traversed again, marking the target graph nodes as repeated graph nodes;
and checking the data repeatability of the data set based on the repeated graph nodes.
In an exemplary embodiment of the present application, the apparatus is configured to:
for a generated target graph node, taking the data record whose primary key is the foreign key of the target graph node as the lower end data record at the lower end of the target graph node;
when a graph node matching the lower end data record is detected in the generated graph nodes, an edge is generated from the target graph node to the graph node matching the lower end data record.
In an exemplary embodiment of the present application, the apparatus is configured to:
when detecting that no graph node matched with the lower end data record exists in the generated graph nodes, detecting whether the source database of the data record stores the lower end data record;
when detecting that the source database does not store the lower end data record, marking the lower end data record as a missing graph node;
and checking the data integrity of the data set based on the characteristic information of the missing graph node.
In an exemplary embodiment of the present application, the apparatus is configured to:
calculating, for each graph node in the directed graph, the sum of its out-degree and in-degree;
and checking the importance of each data record in the data set based on the sum of the in-degree and the out-degree of each graph node.
In an exemplary embodiment of the present application, the apparatus is configured to:
performing loop detection on the directed graph to obtain a node loop in the directed graph;
and checking the data security of the data set based on the characteristic information of the node loop.
In an exemplary embodiment of the present application, the apparatus is configured to:
acquiring path quantity characteristic information of the node loop, wherein the path quantity characteristic information of the node loop is used for describing the quantity of paths intersected with the node loop;
and checking the data security of the data set based on the path quantity characteristic information of the node loop.
An electronic device 30 according to an embodiment of the present application is described below with reference to fig. 5. The electronic device 30 shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 5, the electronic device 30 is in the form of a general purpose computing device. The components of the electronic device 30 may include, but are not limited to: at least one processing unit 310, at least one storage unit 320, and a bus 330 that couples various system components including the storage unit 320 and the processing unit 310.
The storage unit 320 stores program code that can be executed by the processing unit 310, causing the processing unit 310 to perform the steps of the various exemplary embodiments described in the method portion of this specification. For example, the processing unit 310 may perform the various steps shown in fig. 1.
The storage unit 320 may include readable media in the form of volatile storage units, such as a random access memory unit (RAM) 3201 and/or a cache memory unit 3202, and may further include a read only memory unit (ROM) 3203.
The storage unit 320 may also include a program/utility 3204 having a set (at least one) of program modules 3205, such program modules 3205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 330 may be one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 30 may also communicate with one or more external devices 400 (e.g., a keyboard, a pointing device, a Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 30, and/or with any device (e.g., a router, a modem, etc.) that enables the electronic device 30 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 350. The I/O interface 350 is connected to the display unit 340. The electronic device 30 may also communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 360. As shown, the network adapter 360 communicates with the other modules of the electronic device 30 via the bus 330. It should be appreciated that, although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 30, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
From the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with the necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, a USB disk, or a removable hard disk) or on a network, and which includes several instructions that cause a computing device (such as a personal computer, a server, a terminal device, or a network device) to execute the method according to the embodiments of the present application.
In an exemplary embodiment of the present application, there is also provided a computer-readable storage medium having stored thereon computer-readable instructions which, when executed by a processor of a computer, cause the computer to perform the method described in the above method embodiment section.
According to an embodiment of the present application, there is also provided a program product for implementing the method in the above method embodiment, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard, and in this document a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++, or the like, as well as conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the application, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.
Moreover, although the steps of the methods herein are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
From the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with the necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, a USB disk, or a removable hard disk) or on a network, and which includes several instructions that enable a computing device (such as a personal computer, a server, a mobile terminal, or a network device) to execute the method according to the embodiments of the present application.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.

Claims (10)

1. A method for data verification, the method comprising:
acquiring a data record contained in a data set, and acquiring a foreign key of the data record;
taking the data records as graph nodes, and generating edges of the graph nodes based on the foreign keys to obtain a directed graph for describing data relationships among the data records;
and checking the data set based on the directed graph.
2. The method of claim 1, wherein taking the data records as graph nodes comprises: traversing the data records, generating a corresponding graph node for each data record traversed for the first time, and, for a generated target graph node, marking the target graph node as a repeated graph node if the data record corresponding to the target graph node is traversed again;
and wherein verifying the data set based on the directed graph comprises: checking the data repeatability of the data set based on the repeated graph nodes.
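The traversal described in claim 2 can be sketched, under stated assumptions, as a single pass over the records. The sketch assumes each data record is a dictionary identified by a hypothetical `pk` field; the function name `build_nodes` and the sample records are illustrative only.

```python
def build_nodes(records, key=lambda record: record["pk"]):
    """Create one graph node per distinct record and mark repeats.

    Returns (nodes, repeated): nodes maps each record key to the first
    record seen with that key, and repeated holds the keys of graph nodes
    whose record was traversed more than once.
    """
    nodes = {}
    repeated = set()
    for record in records:
        k = key(record)
        if k in nodes:
            repeated.add(k)    # traversed again: mark the existing node
        else:
            nodes[k] = record  # traversed for the first time: create the node
    return nodes, repeated


# Hypothetical records: the record with pk 1 appears twice.
records = [{"pk": 1}, {"pk": 2}, {"pk": 1}]
nodes, repeated = build_nodes(records)
```

A repeatability figure could then be derived from `repeated`, for example as the share of graph nodes that are marked; the choice of metric is an assumption and is not specified in the claim.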
3. The method of claim 1, wherein generating edges for the graph nodes based on the foreign key comprises:
for each generated target graph node, taking the data records whose primary keys are the foreign keys of the target graph node as the lower end data records at the lower end of the target graph node;
when a graph node matching the lower end data record is detected in the generated graph nodes, an edge is generated from the target graph node to the graph node matching the lower end data record.
4. The method of claim 3, further comprising:
when detecting that no graph node matched with the lower end data record exists in the generated graph nodes, detecting whether the source database of the data record stores the lower end data record;
when detecting that the source database does not store the lower end data record, marking the lower end data record as a missing graph node;
and wherein verifying the data set based on the directed graph comprises: checking the data integrity of the data set based on the characteristic information of the missing graph node.
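A minimal sketch of the edge generation of claim 3 and the missing-node marking of claim 4 is given below. It assumes each record is a dictionary with a hypothetical `pk` field and a hypothetical `fks` list of referenced primary keys, and it stands in for the source database with a plain set of primary keys. For simplicity the sketch creates all graph nodes before generating edges, whereas the claims describe nodes being generated incrementally.

```python
def build_edges(records, source_db_keys):
    """Generate edges from foreign keys and mark missing graph nodes.

    Returns (edges, pending, missing):
      edges   - (src, dst) pairs where the lower end record is in the data set
      pending - references found in the source database but not in the data set
                (how these are handled is not covered here; they are set aside)
      missing - primary keys found neither in the data set nor in the source database
    """
    nodes = {record["pk"] for record in records}
    edges, pending, missing = [], [], set()
    for record in records:
        for fk in record.get("fks", ()):
            if fk in nodes:
                edges.append((record["pk"], fk))
            elif fk in source_db_keys:
                pending.append((record["pk"], fk))
            else:
                missing.add(fk)  # marked as a missing graph node
    return edges, pending, missing


# Hypothetical data: order o1 references users u1, u2 and u3.
records = [{"pk": "o1", "fks": ["u1", "u2", "u3"]}, {"pk": "u1", "fks": []}]
edges, pending, missing = build_edges(records, source_db_keys={"o1", "u1", "u2"})
```

The size of `missing` is one possible piece of characteristic information for an integrity check; which characteristics are actually used is not stated in the claim.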
5. The method of claim 1, wherein checking the dataset based on the directed graph comprises:
calculating the sum of the out-degree and the in-degree of the same graph node in the directed graph to obtain the sum of the in-degree and the out-degree of each graph node;
and checking the importance of each data record in the data set based on the sum of the in-degree and out-degree of each graph node.
6. The method of claim 1, wherein checking the dataset based on the directed graph comprises:
performing loop detection on the directed graph to obtain a node loop in the directed graph;
and checking the data security of the data set based on the characteristic information of the node loop.
7. The method of claim 6, wherein verifying the data security of the data set based on the characterization information of the node loop comprises:
acquiring path quantity characteristic information of the node loop, wherein the path quantity characteristic information of the node loop is used for describing the quantity of paths intersected with the node loop;
and checking the data security of the data set based on the path quantity characteristic information of the node loop.
8. A data verification apparatus, the apparatus comprising:
the acquisition module is configured to acquire data records contained in a data set and acquire foreign keys of the data records;
the graph generation module is configured to take the data records as graph nodes, generate edges of the graph nodes based on the foreign keys and obtain directed graphs for describing data relationships among the data records;
a verification module configured to verify the data set based on the directed graph.
9. An electronic device, comprising:
a memory storing computer readable instructions;
a processor reading computer readable instructions stored by the memory to perform the method of any of claims 1-7.
10. A computer-readable storage medium having stored thereon computer-readable instructions which, when executed by a processor of a computer, cause the computer to perform the method of any one of claims 1-7.
CN202111612356.7A 2021-12-27 2021-12-27 Data verification method and device, electronic equipment and storage medium Pending CN114265956A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111612356.7A CN114265956A (en) 2021-12-27 2021-12-27 Data verification method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111612356.7A CN114265956A (en) 2021-12-27 2021-12-27 Data verification method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114265956A true CN114265956A (en) 2022-04-01

Family

ID=80830361

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111612356.7A Pending CN114265956A (en) 2021-12-27 2021-12-27 Data verification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114265956A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination