CN114996297A - Data processing method, device, equipment, medium and product - Google Patents

Publication number
CN114996297A
Authority
CN
China
Prior art keywords
data
node
edge
relational data
attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210390235.0A
Other languages
Chinese (zh)
Other versions
CN114996297B (en
Inventor
吴丽清
陈少静
陈舒杭
刘一辰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CCB Finetech Co Ltd
Original Assignee
CCB Finetech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CCB Finetech Co Ltd
Priority to CN202210390235.0A
Publication of CN114996297A
Application granted
Publication of CN114996297B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases

Abstract

The application discloses a data processing method, device, equipment, medium and product. The method comprises the following steps: in response to a first input by a first user to first original relational data, determining a first object and a second object to which the first original relational data relates; determining a first node corresponding to the first object in the graph database and a second node corresponding to the second object in the graph database; displaying a first attribute hierarchy corresponding to a first edge between a first node and a second node in a graph database; in response to a second input of the first user to a target attribute hierarchy in the first attribute hierarchy, determining first index information stored in the target attribute hierarchy; acquiring second original relational data of a target attribute hierarchy between the first object and the second object from a relational database according to the first index information; displaying the second original relational data. Therefore, the efficiency of acquiring the required data from the mass data can be improved, and the computing resources are saved.

Description

Data processing method, device, equipment, medium and product
Technical Field
The present application belongs to the field of data processing technologies, and in particular, to a data processing method, apparatus, device, medium, and product.
Background
With the arrival of the big data era, discovering relationships between things in large-scale, loosely organized data has become one of the mainstream tasks of data analysis in every field. How to identify and acquire, from mass data, the part of the data that takes part in the analysis is therefore particularly important.
In the prior art, structured data may be stored through a graph database, each node in the graph database corresponds to one object, multiple edges may exist between two nodes, one piece of data is stored in each edge, and when a part of data between two objects is to be searched, the data stored in each edge between two nodes corresponding to the two objects needs to be traversed, so as to determine the required data.
However, when the data between the two objects is large, the number of edges between the two nodes is also large, the efficiency of searching the required data by traversing the data stored in each edge is very low, and a large amount of computing resources are occupied.
Disclosure of Invention
Embodiments of the present application provide a data processing method, apparatus, device, medium, and product, which can at least solve the problems in the prior art that the efficiency of searching for required data by traversing data stored in each edge is very low and a large amount of computing resources are occupied.
In a first aspect, an embodiment of the present application provides a data processing method, where the method includes:
in response to a first input by a first user to first original relational data, determining a first object and a second object to which the first original relational data relates;
determining a first node corresponding to a first object in a graph database and a second node corresponding to a second object in the graph database, wherein the nodes in the graph database correspond to the objects, no more than one edge is connected between every two nodes, the edges are used for indicating that the objects corresponding to the two nodes connected with the edges have a relationship, different attribute layers are arranged on the edges, and the different attribute layers are used for storing index information of different types of original relational data;
displaying a first attribute hierarchy corresponding to a first edge between a first node and a second node in a graph database;
in response to a second input of the first user to a target attribute hierarchy in the first attribute hierarchy, determining first index information stored in the target attribute hierarchy;
acquiring second original relational data of a target attribute hierarchy between the first object and the second object from a relational database according to the first index information;
displaying the second original relational data.
In a second aspect, an embodiment of the present application provides a data processing apparatus, including:
a first determining module, configured to determine, in response to a first input by a first user to first original relational data, a first object and a second object to which the first original relational data relates;
a second determining module, configured to determine a first node corresponding to the first object in the graph database and a second node corresponding to the second object in the graph database, wherein the nodes in the graph database correspond to the objects, no more than one edge is connected between every two nodes, the edges are used for indicating that the objects corresponding to the two nodes connected with the edges have a relationship, different attribute layers are arranged on the edges, and the different attribute layers are used for storing index information of different types of original relational data;
a first display module, configured to display a first attribute hierarchy corresponding to a first edge between the first node and the second node in the graph database;
a third determining module, configured to determine, in response to a second input of the first user to a target attribute hierarchy in the first attribute hierarchy, first index information stored in the target attribute hierarchy;
a first acquisition module, configured to acquire second original relational data of the target attribute hierarchy between the first object and the second object from the relational database according to the first index information;
and a second display module, configured to display the second original relational data.
In a third aspect, an embodiment of the present application provides an electronic device, where the device includes: a processor and a memory storing computer program instructions;
the processor, when executing the computer program instructions, implements a data processing method as shown in any of the embodiments of the first aspect.
In a fourth aspect, the present application provides a computer storage medium having computer program instructions stored thereon, where the computer program instructions, when executed by a processor, implement the data processing method shown in any one of the embodiments of the first aspect.
In a fifth aspect, the present application provides a computer program product, where instructions in the computer program product, when executed by a processor of an electronic device, cause the electronic device to perform the data processing method shown in any one of the embodiments of the first aspect.
The data processing method, apparatus, device, medium, and product of embodiments of the present application are capable of determining a first object and a second object related to a first original relational data in response to a first input of the first original relational data by a first user, and determining a first node corresponding to the first object in the graph database and a second node corresponding to the second object in the graph database, then displaying a first attribute hierarchy corresponding to a first edge between a first node and a second node in the graph database, determining first index information stored in a target attribute hierarchy in response to a second input of a first user to the target attribute hierarchy in the first attribute hierarchy, and acquiring second original relational data of the target attribute hierarchy between the first object and the second object from the relational database according to the first index information, and displaying the second original relational data. 
Thus, the original relational data need not be stored in the graph database; it is stored in a relational database, and only the index information of the original relational data in the relational database is stored in the graph database. Consequently, the graph database can be arranged so that no more than one edge connects every two nodes. The edge indicates that a relationship exists between the objects corresponding to the two nodes it connects, and different attribute layers are arranged on the edge to store the index information of different types of original relational data in the relational database. When part of the data between two nodes is to be acquired, it can be obtained by selecting the corresponding attribute hierarchy, without traversing data stored in each edge between the two nodes, which improves the efficiency of acquiring required data from mass data and saves computing resources.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the embodiments of the present application will be briefly described below, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a basic architecture diagram of a native graph database provided by one embodiment of the present application;
FIG. 2 is a flow chart of a data processing method provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Features of various aspects and exemplary embodiments of the present application will be described in detail below, and in order to make objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are merely illustrative of, and not restrictive on, the present application. It will be apparent to one skilled in the art that the present application may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present application by illustrating examples thereof.
It should be noted that, in this document, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
In addition, it should be noted that, in the technical solution of the present application, the acquisition, storage, use, processing, etc. of data all conform to relevant regulations of national laws and regulations.
Here, a method for acquiring a required part of data from a large amount of data through a conventional relational database and a knowledge graph in the prior art will be briefly described.
First, a relational database uses the relational model to organize data. Data is usually stored in two-dimensional tables, and tables are associated with one another through defined primary keys. An association query in a relational database generally proceeds as follows: first determine the tables to be associated, then determine the fields to be queried, and finally determine the join condition and the join type. Table joins are classified as inner joins, outer joins, cross joins, self joins, and the like.
Inner joins are divided into equi-joins and non-equi-joins. An equi-join compares the values of the join columns of the two tables using the equals sign; it is equivalent to taking the Cartesian product of the two tables and keeping the records whose join column values are equal. A non-equi-join compares the values of the join columns using an operator such as ">" or "<"; it is equivalent to taking the Cartesian product of the two tables and keeping the records in which the join column value of one table is greater than or less than that of the other.
Outer joins are divided into left outer joins, right outer joins, and full outer joins. The result of a left outer join contains all rows of the left table; for each such row the matching data of the right table is shown, and null is shown where no match exists. The result of a right outer join contains all rows of the right table; for each such row the matching data of the left table is shown, and null is shown where no match exists. The result of a full outer join contains all rows of both the left and right tables, with null shown in the fields for which no matching value exists.
A cross join combines each row of the left table with every row of the right table, that is, it returns the Cartesian product of the two tables.
A self join joins the current table with itself; the key point is to create a virtual copy of the table, that is, to define an alias for it.
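The join types described above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The `customer` and `account` tables and their contents are hypothetical and are not part of the application; right and full outer joins are omitted because older SQLite versions do not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, referrer_id INTEGER);
    CREATE TABLE account  (id INTEGER PRIMARY KEY, customer_id INTEGER, balance INTEGER);
    INSERT INTO customer VALUES (1, 'Ann', NULL), (2, 'Bo', 1), (3, 'Cy', 1);
    INSERT INTO account  VALUES (10, 1, 500), (11, 2, 50);
""")

def rows(sql):
    """Run a query and return all result rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# Inner (equi-)join: only customers that have a matching account.
inner = rows("""SELECT c.name, a.balance FROM customer c
                JOIN account a ON a.customer_id = c.id ORDER BY c.id""")

# Left outer join: every customer; NULL (None) where no account matches.
left = rows("""SELECT c.name, a.balance FROM customer c
               LEFT JOIN account a ON a.customer_id = c.id ORDER BY c.id""")

# Cross join: Cartesian product, 3 customers x 2 accounts.
cross = rows("SELECT c.id, a.id FROM customer c CROSS JOIN account a")

# Self join: the same table under two aliases (a customer and its referrer).
self_join = rows("""SELECT c.name, r.name FROM customer c
                    JOIN customer r ON c.referrer_id = r.id ORDER BY c.id""")

for label, result in [("inner", inner), ("left", left),
                      ("cross", cross), ("self", self_join)]:
    print(label, result)
```

The left outer join keeps the customer without an account and fills the missing balance with a null value, which is the behaviour the description attributes to outer joins.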
With massive data, table association queries performed directly in a relational database are inefficient, so relational databases also provide an indexing mechanism to speed up retrieval. An index is a data structure in a relational database that is used to find records quickly. Index types include primary key indexes, foreign key indexes, single-field indexes, multi-field indexes, and the like.
A primary key index is an index built on the primary key and must be a unique index; a record can be located quickly through the primary key.
A foreign key index is an index built on a foreign key, which is associated with a field of another table; it greatly speeds up table association.
A single-field index is an index on one field of a table. It may be created as an ordinary index, a unique index, or a full-text index, and its underlying data structure may be a multi-way search tree (B-tree) or a hash.
A multi-field index indexes several fields of a table and follows the leftmost-prefix matching principle. For example, with an index on (A, B, C, D), a query that uses A alone as its condition needs no additional index, and neither does a query on (A, B) or (A, B, C); queries on B, (B, C), or (B, C, D), however, require an additional index.
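The leftmost-prefix rule in the example above can be expressed as a small check. This is a simplified sketch that assumes equality conditions only; the function name `usable_prefix` is hypothetical, and a real query optimizer applies additional rules.

```python
def usable_prefix(index_columns, predicate_columns):
    """Return the leading index columns that the query conditions can use.

    The index is usable only through an unbroken run of columns starting
    at its leftmost column; the run stops at the first column that the
    query does not constrain.
    """
    wanted = set(predicate_columns)
    prefix = []
    for column in index_columns:
        if column not in wanted:
            break
        prefix.append(column)
    return prefix

INDEX = ["A", "B", "C", "D"]
QUERIES = [["A"], ["A", "B"], ["A", "B", "C"], ["B"], ["B", "C"], ["B", "C", "D"]]

for query in QUERIES:
    prefix = usable_prefix(INDEX, query)
    verdict = "index usable" if prefix else "additional index required"
    print(query, "->", verdict)
```

Queries that start with A are served by the existing index, while queries that start with B are not, which matches the cases listed in the text.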
However, a relational database hides association relationships in its foreign key structure and does not represent them explicitly, which makes association queries and computation complex. Multi-hop queries in particular require a large number of table joins (join), and the computational complexity grows exponentially with the number of hops. For example, given a user information table and a user transaction table, finding the shortest path between account transaction records with a conventional relational database fails to return a result once the query exceeds 5 hops.
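To make the join cost of multi-hop queries concrete, the following sketch builds a k-hop reachability query over a hypothetical `txn (src, dst)` table with sqlite3. The schema and the helper names `k_hop_sql` and `reachable` are assumptions for illustration; the point is only that every additional hop adds another self join to the statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txn (src TEXT, dst TEXT)")
conn.executemany("INSERT INTO txn VALUES (?, ?)",
                 [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")])

def k_hop_sql(k):
    """Build a query for accounts reachable in exactly k hops (k >= 1)."""
    # One extra self join of txn per hop beyond the first.
    joins = "".join(
        f" JOIN txn t{i} ON t{i}.src = t{i - 1}.dst" for i in range(2, k + 1)
    )
    return f"SELECT DISTINCT t{k}.dst FROM txn t1{joins} WHERE t1.src = ? ORDER BY 1"

def reachable(start, k):
    """Return the accounts reachable from `start` in exactly k hops."""
    return [row[0] for row in conn.execute(k_hop_sql(k), (start,))]

for hops in (1, 2, 3):
    print(hops, "hop(s):", reachable("a", hops), "| joins:", k_hop_sql(hops).count("JOIN"))
```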
Second, data can be retrieved and acquired based on a knowledge graph. A knowledge graph is a graph formed by nodes and edges that describes the association relationships among object entities: nodes represent object entities, and the edges between nodes represent associations between objects. There are two ways to retrieve relationships based on a knowledge graph: knowledge graph retrieval based on a relational database, and knowledge graph retrieval based on a native graph database.
Knowledge graph retrieval based on a relational database stores the relationship information between things in two-dimensional tables, with the fields that represent the relationship information stored as data columns or data groups. Typical storage and retrieval modes include the following:
1) Triple table based retrieval
This method uses a relational database to build a table containing three columns (subject, predicate, object), where subject represents the subject, predicate represents the relation, and object represents the object. All object entities and relations are stored in this triple table, and association queries are performed through SQL statements. Under multiple association constraints the queries contain many self joins (self-join) and are very inefficient.
2) Attribute table based retrieval
This method is centered on the type of the object entity and stores the relations belonging to entities of the same type as attributes in one table. Retrieval is essentially close to that of a traditional relational database, but many null values are generated.
3) Vertical partition table based retrieval
The basic idea of this method is to group triples by relation attribute, build a two-column table for each relation attribute, and perform query computation on the objects.
4) Full index structure based retrieval
This method also builds a triple table containing (subject, predicate, object), but adds several optimizations. First, a mapping table is established: every field value is mapped to a unique identifier, and the triple table no longer stores real values but only the corresponding identifiers. Then a six-fold index is built: SPO, SOP, PSO, POS, OPS, OSP, covering multidimensional graph query requirements (where S stands for subject, P stands for predicate, that is, the relation, and O stands for object).
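The full index structure in item 4) can be sketched in a few lines of Python. The sample triples and the names `encode` and `lookup` are hypothetical, and the linear scan in `lookup` stands in for the binary search or B-tree lookup that a real store would use over each sorted copy.

```python
from itertools import permutations

triples = [("alice", "pays", "bob"),
           ("bob", "pays", "carol"),
           ("alice", "knows", "carol")]

# Mapping table: every distinct field value gets one integer identifier.
ids = {}

def encode(value):
    """Return the identifier of `value`, assigning a new one if needed."""
    return ids.setdefault(value, len(ids))

# The triple table stores identifiers only, not the real values.
encoded = [tuple(encode(v) for v in triple) for triple in triples]

# Six sorted copies of the triple table, one per ordering of (S, P, O).
POSITION = {"S": 0, "P": 1, "O": 2}
indexes = {}
for order in permutations("SPO"):
    name = "".join(order)
    indexes[name] = sorted(tuple(t[POSITION[c]] for c in order) for t in encoded)

def lookup(order, *prefix):
    """Return the triples, laid out in `order`, whose leading fields equal `prefix`."""
    return [t for t in indexes[order] if t[:len(prefix)] == prefix]

print(sorted(indexes))                 # the six index names
print(lookup("POS", ids["pays"]))      # all triples with predicate "pays"
print(lookup("OSP", ids["carol"]))     # all triples with object "carol"
```

Whichever of the three fields a query fixes first, one of the six orderings has that field in the leading position, so the matching triples are contiguous in that index.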
Knowledge graph retrieval based on a native graph database uses the structural features of the graph for storage and querying. During relation extraction for the knowledge graph, each relation is explicitly described and defined according to a definite business meaning: one edge represents one business relation, and the relation is stored as an edge in the graph model. The basic idea is to represent the graph as adjacency lists and then build an index on the adjacency lists to optimize queries on the graph.
As shown in FIG. 1, the basic architecture of the physical storage of a native graph database may include: node storage file 101, relational edge storage file 102, tag storage file 103, attribute storage file 104, relational edge type storage file 105, attribute index file 106, and dynamic storage file 107.
In the node storage file and the relationship edge storage file, the storage position of each node and each relationship edge is fixed, so that an access address can be obtained from its identifier. The record of a node contains the identifier of its first relationship edge, the identifier of its first attribute edge, and the identifier of its first label; these identifiers act like pointers (or indexes), so the relationship edges, attribute edges, labels and the like related to the node can be retrieved quickly. Relationship edge records are organized in a similar way, so the head and tail nodes, the relationship edge type and the like related to a relationship edge can be retrieved quickly. With this design, the first relationship edge of a node can be found quickly from the node, the adjacent node can be found from that relationship edge, and the second, third, and Nth relationship edges can then be found in turn, realizing a full traversal.
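The pointer-style traversal described above can be sketched with two in-memory record lists standing in for the node storage file and the relationship edge storage file. The record layout below (field names such as `first_rel`, `next_for_start`, `next_for_end`) is an assumption made for illustration, not the layout of any particular graph database.

```python
from dataclasses import dataclass

NONE = -1  # sentinel meaning "no further record"

@dataclass
class NodeRecord:
    first_rel: int = NONE        # identifier of the node's first relationship edge

@dataclass
class RelRecord:
    start: int                   # head node identifier
    end: int                     # tail node identifier
    next_for_start: int = NONE   # next edge in the head node's chain
    next_for_end: int = NONE     # next edge in the tail node's chain

# A record's identifier is its position in the list, mirroring the
# fixed storage position of records in a store file.
nodes = [NodeRecord() for _ in range(4)]
rels = []

def add_edge(start, end):
    """Append an edge record and link it at the front of both nodes' chains."""
    rel_id = len(rels)
    rels.append(RelRecord(start, end, nodes[start].first_rel, nodes[end].first_rel))
    nodes[start].first_rel = rel_id
    nodes[end].first_rel = rel_id

def neighbours(node_id):
    """Walk the node's edge chain and yield the node at the other end of each edge."""
    rel_id = nodes[node_id].first_rel
    while rel_id != NONE:
        rel = rels[rel_id]
        if rel.start == node_id:
            yield rel.end
            rel_id = rel.next_for_start
        else:
            yield rel.start
            rel_id = rel.next_for_end

add_edge(0, 1)
add_edge(0, 2)
add_edge(1, 2)
print(sorted(neighbours(0)), sorted(neighbours(1)), sorted(neighbours(3)))
```

Starting from a node, each step follows one stored identifier to the next record, so the cost of visiting a node's edges depends on how many edges the node has rather than on the size of the whole graph.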
Graph databases are efficient at modeling and retrieving relationships and can support multi-hop relationship queries. At present, however, knowledge graphs are mostly applied to extracting entities and relationships from unstructured data and constructing graphs from them. For structured data, an existing graph database generally constructs relationship edges based on specific attribute fields; when massive data records exist under those attribute fields, a large number of relationship edges are generated, and when a multi-hop association query is performed, the retrieval volume grows geometrically and a large amount of computing power is consumed.
That is, when a data center performs data mining analysis, the main business perspectives include comprehensive analysis of the same event across different business dimensions and business connectivity analysis, both of which require multi-table association analysis. The main constraint is how to find, in mass storage, the small data set related to the event that is to take part in the analysis: searching mass data for the data that participates in computation takes up a large share of the overhead, so computation is inefficient and the performance loss is severe.
To solve the above problem, the embodiments of the present application propose a graph-data separation principle for highly structured original relational data. The essence of graph-data separation is that the graph serves as an index of the business data that has relational connectivity features. On this basis, a way of constructing and retrieving a business connectivity topology can be established, so that the task becomes discovering business relationships from structured data and quickly finding the index information of the original relational data involved in those connectivity relationships. An abnormal small data set can then be located quickly from the relationship topology for further data analysis, which improves the efficiency of complex association queries over massive structured data. Meanwhile, once the business relationships are abstracted through the graph, a multi-dimensional, aggregated expression of the business is obtained, which helps business personnel discover potential opportunities for multi-table linkage analysis.
Based on this, a graph-based data retrieval mechanism can be established for the original relational data. The concept of an edge is also redefined: in the graph database, the edge between two nodes is defined to be unique. According to the graph-data separation principle, one edge is established in the graph database for the relationship between two nodes, and this edge only represents connectivity at the graph level. The edge is divided into different attribute layers according to the source tables, that is, according to the different relationships represented by the original relational data, and the index information of the original relational data that gave rise to each relationship is recorded in the corresponding attribute layer. In this way, all relationships between two nodes across the global original relational data can be sorted out, together with the index information of the original relational data that produced them. This data model performs the translation and linkage between graph data and the relational data of the data center, and enables linked indexing and display of relationships and original business data.
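The edge model described above (one edge per node pair, divided into attribute layers that hold only index information) can be sketched as follows. The class `LayeredGraph`, the layer names, and the `(table, row id)` form of the index entries are hypothetical, and treating the edge as undirected is an assumption of this sketch.

```python
from collections import defaultdict

class LayeredGraph:
    """One edge per node pair; each edge carries named attribute layers."""

    def __init__(self):
        # edge key -> {layer name -> list of index entries}
        self.edges = defaultdict(lambda: defaultdict(list))

    @staticmethod
    def edge_key(node_a, node_b):
        # Unordered pair, so (a, b) and (b, a) address the same edge.
        return frozenset((node_a, node_b))

    def add_index(self, node_a, node_b, layer, index_entry):
        """Record where one original record lives, under the given layer."""
        self.edges[self.edge_key(node_a, node_b)][layer].append(index_entry)

    def layers(self, node_a, node_b):
        """Return the attribute layers on the edge, or [] if there is no edge."""
        edge = self.edges.get(self.edge_key(node_a, node_b))
        return sorted(edge) if edge else []

    def index_info(self, node_a, node_b, layer):
        """Return the index entries stored in one attribute layer."""
        edge = self.edges.get(self.edge_key(node_a, node_b), {})
        return list(edge.get(layer, []))

graph = LayeredGraph()
graph.add_index("n1", "n2", "transfer", ("transfer_table", 17))
graph.add_index("n2", "n1", "transfer", ("transfer_table", 42))
graph.add_index("n1", "n2", "guarantee", ("guarantee_table", 3))

print(len(graph.edges))                         # number of edges between n1 and n2
print(graph.layers("n1", "n2"))                 # layers on that edge
print(graph.index_info("n1", "n2", "transfer"))
```

Three source records produce a single edge with two layers; the records themselves stay in the relational tables, and the graph holds only their locations.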
FIG. 2 is a schematic flowchart of a data processing method according to an embodiment of the present application. It should be noted that the data processing method may be applied to a data processing system. As shown in FIG. 2, the data processing method may include the following steps:
S210, in response to a first input by a first user to first original relational data, determining a first object and a second object to which the first original relational data relates;
S220, determining a first node corresponding to the first object in the graph database and a second node corresponding to the second object in the graph database;
S230, displaying a first attribute hierarchy corresponding to a first edge between the first node and the second node in the graph database;
S240, in response to a second input of the first user to a target attribute hierarchy in the first attribute hierarchy, determining first index information stored in the target attribute hierarchy;
S250, acquiring second original relational data of the target attribute hierarchy between the first object and the second object from the relational database according to the first index information;
S260, displaying the second original relational data.
In this way, the first object and the second object related to the first original relational data are determined in response to the first input by the first user, and the first node and the second node corresponding to them in the graph database are determined. The first attribute hierarchy corresponding to the first edge between the first node and the second node is then displayed, the first index information stored in the target attribute hierarchy is determined in response to the second input of the first user, and the second original relational data of the target attribute hierarchy between the first object and the second object is acquired from the relational database according to the first index information and displayed. Thus, the original relational data need not be stored in the graph database; it is stored in a relational database, and only the index information of the original relational data in the relational database is stored in the graph database. Consequently, the graph database can be arranged so that no more than one edge connects every two nodes. The edge indicates that a relationship exists between the objects corresponding to the two nodes it connects, and different attribute layers are arranged on the edge to store the index information of different types of original relational data in the relational database. When part of the data between two nodes is to be acquired, it can be obtained by selecting the corresponding attribute hierarchy, without traversing data stored in each edge between the two nodes, which improves the efficiency of acquiring required data from mass data and saves computing resources.
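Steps S210 to S260 can be walked through in a minimal end-to-end sketch. Everything concrete here is an assumption for illustration: the `transfer` and `guarantee` tables, the object names, the in-memory dictionaries standing in for the graph database, and the use of row ids as index information.

```python
import sqlite3

# Relational side: holds the original records (hypothetical schema).
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE transfer  (id INTEGER PRIMARY KEY, payer TEXT, payee TEXT, amount INTEGER);
    CREATE TABLE guarantee (id INTEGER PRIMARY KEY, guarantor TEXT, debtor TEXT);
    INSERT INTO transfer  VALUES (1, 'acme', 'bolt', 100), (2, 'bolt', 'acme', 40),
                                 (3, 'acme', 'core', 7);
    INSERT INTO guarantee VALUES (1, 'acme', 'bolt');
""")

# Graph side: object -> node mapping, and one edge per node pair whose
# attribute layers hold only index information (here: row ids per table).
node_of = {"acme": "n1", "bolt": "n2", "core": "n3"}
edges = {
    frozenset(("n1", "n2")): {"transfer": [1, 2], "guarantee": [1]},
    frozenset(("n1", "n3")): {"transfer": [3]},
}

def objects_of(transfer_id):
    """S210: the two objects that a selected transfer record relates to."""
    return db.execute("SELECT payer, payee FROM transfer WHERE id = ?",
                      (transfer_id,)).fetchone()

def attribute_layers(obj_a, obj_b):
    """S220 + S230: map objects to nodes and list the layers on their edge."""
    edge = edges.get(frozenset((node_of[obj_a], node_of[obj_b])), {})
    return sorted(edge)

def fetch_layer(obj_a, obj_b, layer):
    """S240 + S250: read the layer's index information, then fetch the rows."""
    row_ids = edges[frozenset((node_of[obj_a], node_of[obj_b]))][layer]
    marks = ",".join("?" * len(row_ids))
    # `layer` doubles as the table name; it comes from the dict above, not from user text.
    sql = f"SELECT * FROM {layer} WHERE id IN ({marks}) ORDER BY id"
    return db.execute(sql, row_ids).fetchall()

first, second = objects_of(1)
print(attribute_layers(first, second))         # S230: layers shown to the user
print(fetch_layer(first, second, "transfer"))  # S260: records shown to the user
```

Only the rows named by the selected layer's index information are read from the relational database; the transfer to `core` and the guarantee record are never touched.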
Referring to S210, the first original relational data is data stored in a relational database, and may be business data, transaction data, or archive data. The first input may be an input entering an identifier of the first original relational data (for example, its number), an input clicking to select the first original relational data, or another input directed at the first original relational data, which is not limited herein. The first object may be a business or an individual, and the second object may also be a business or an individual.
Specifically, after the first user makes the first input to the first original relational data, the data processing system may determine, in response to the first input, the first object and the second object to which the first original relational data relates. For example, if the first original relational data is transaction data, the first object and the second object may be the two parties to the transaction.
Referring to S220, the graph database may be applied to a knowledge graph, and may also be applied to other graphs, which are not limited herein. The nodes in the graph database can correspond to the objects, no more than one edge connecting every two nodes can be provided, the edges can be used for indicating that the objects corresponding to the two nodes connected with the edges have a relationship, different attribute layers can be arranged on the edges, and the different attribute layers can be used for storing index information of different types of original relational data. Different types of raw relational data may represent different relationships between two objects to which the raw relational data relates. The index information may include storage locations, such as numbers 0001 to 0100 in table a, but may also include other index information, which is not limited herein.
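As an illustration of index information that records a storage location, such as numbers 0001 to 0100 in table A, the sketch below resolves such an entry into a range query. The `IndexInfo` type, the `table_a` schema, and the allow-list of table names are hypothetical choices made for this example.

```python
import sqlite3
from dataclasses import dataclass

@dataclass(frozen=True)
class IndexInfo:
    """Storage location of a block of original records."""
    table: str
    first_no: int
    last_no: int

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE table_a (no INTEGER PRIMARY KEY, payload TEXT)")
db.executemany("INSERT INTO table_a VALUES (?, ?)",
               [(n, f"record-{n:04d}") for n in range(1, 201)])

# Table names cannot be bound as SQL parameters, so they are checked
# against an allow-list before being placed in the statement.
KNOWN_TABLES = {"table_a"}

def fetch(info):
    """Fetch exactly the rows that the index information points to."""
    if info.table not in KNOWN_TABLES:
        raise ValueError(f"unknown table: {info.table}")
    sql = f"SELECT no, payload FROM {info.table} WHERE no BETWEEN ? AND ? ORDER BY no"
    return db.execute(sql, (info.first_no, info.last_no)).fetchall()

rows = fetch(IndexInfo("table_a", 1, 100))
print(len(rows), rows[0], rows[-1])
```

Because the range condition is on the primary key, the relational database can locate the rows through its primary key index instead of scanning the table.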
Specifically, the correspondence between the object and the node in the graph database may be stored in advance, for example, the correspondence between the object identifier and the node identifier may be stored. According to the corresponding relation between the pre-stored object and the node, a first node corresponding to the first object in the graph database and a second node corresponding to the second object in the graph database can be determined.
Referring to S230, at most one edge connects any two nodes in the graph database, and that edge indicates that a relationship exists between the objects corresponding to the two nodes. Therefore, if a relationship exists between the first object and the second object, exactly one edge exists between the first node and the second node, and once the first node and the second node are determined, the first edge between them can be determined. Because different attribute hierarchies are arranged on the edge to store the index information of different types of original relational data, the first attribute hierarchy corresponding to the first edge can be displayed so that the first user can conveniently select and obtain the required type of original relational data. The first attribute hierarchy may include one or more attribute hierarchies.
Referring to S240, the target attribute hierarchy may include one or more attribute hierarchies. The second input may be an input of the first user clicking to select the target attribute hierarchy, an input of the first user entering the target attribute hierarchy, or another input of the first user for the target attribute hierarchy, which is not limited herein.
Specifically, the first user selects a target attribute hierarchy from the displayed first attribute hierarchies by clicking, and the data processing system may determine first index information stored in the target attribute hierarchy in response to the click input, i.e., the second input.
Referring to S250, the second original relational data may be original relational data of a target type between the first object and the second object, where the target type corresponds to the target attribute hierarchy, and the original relational data of the target type may reflect a target relationship between the first object and the second object. The first index information may be a storage location of the second original relational data in the relational database, or may be other index information of the second original relational data in the relational database, which is not limited herein. The second original relational data can therefore be obtained from the relational database according to the first index information.
The graph database can be compatible with importing from and exporting to the relational database through the ODBC protocol. A user does not need to convert the original relational data in the relational database into a specific format such as comma-separated values (CSV) and then import it into the graph database. The graph topology and the original relational data can be operated on in linkage, which lowers the difficulty of building the graph for the user, and analysis can still be performed with traditional analysis algorithms. Through the ODBC protocol, a user can also display, in a graphical visualization environment, the association relationships among different objects and the specific original relational data under each relationship.
Referring to S260, after the second original relational data is obtained, it may be displayed for viewing and use by the first user.
In some examples, if the first user wants to analyze whether a transaction is abnormal, the analysis may be based on the relationship between the two parties to the transaction, that is, on whether the transaction is reasonable given that relationship. Such an analysis requires data that reflects the relationship between the two parties. To obtain that data, the first user may input, in the data processing system, the transaction number "0001" of the original transaction data of the transaction, namely the first original relational data. In response to this input, namely the first input, the data processing system may determine the two parties to the transaction: enterprise A and enterprise B, namely the first object and the second object. The system may then determine node a, namely the first node, corresponding to enterprise A in the graph database, and node b, namely the second node, corresponding to enterprise B in the graph database. After node a and node b are determined, edge z between them, namely the first edge, can be determined, and the attribute hierarchy m, the attribute hierarchy n and the attribute hierarchy v corresponding to edge z, namely the first attribute hierarchy, can be displayed.
The attribute hierarchy m can store the storage location r, in the relational database, of the original relational data of the fund transfer class between enterprise A and enterprise B; the attribute hierarchy n can store the storage location s of the original relational data of the interpersonal relationship class between enterprise A and enterprise B; and the attribute hierarchy v can store the storage location t of the original relational data of the tax receipt record class between enterprise A and enterprise B. These three classes of data reflect, respectively, the fund transfer relationship, the interpersonal relationship and the tax relationship between enterprise A and enterprise B. If the first user wants to analyze whether the transaction between enterprise A and enterprise B is reasonable according to the fund transfer relationship and the interpersonal relationship, the attribute hierarchy m and the attribute hierarchy n, namely the target attribute hierarchy, may be selected by clicking, and the data processing system may determine, in response to the click input, namely the second input, the storage location r stored in the attribute hierarchy m and the storage location s stored in the attribute hierarchy n, namely the first index information.
Then, according to the storage location r and the storage location s, the data processing system can obtain from the relational database the original relational data of the fund transfer class and of the interpersonal relationship class between enterprise A and enterprise B, namely the second original relational data, and display it for the first user to view, so that the first user can analyze whether the transaction between enterprise A and enterprise B is reasonable according to the displayed data.
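The flow of S210 to S250 in the example above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the implementation described in this application: the table `records`, the dictionaries `node_of` and `edges`, the function names, and the use of record ids as index information are all hypothetical, and an in-memory SQLite database stands in for the relational database.

```python
import sqlite3

# Hypothetical relational store holding the original relational data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY, kind TEXT, "
    "party_a TEXT, party_b TEXT, detail TEXT)"
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?, ?)",
    [
        (1, "fund_transfer", "A", "B", "transfer 0001"),
        (2, "interpersonal", "A", "B", "shared director"),
        (3, "tax_receipt", "A", "B", "invoice 77"),
    ],
)

# Hypothetical graph store: object -> node, and one edge per node pair.
node_of = {"A": "a", "B": "b"}
# Each edge maps an attribute hierarchy name to index information
# (here, assumed to be a list of record ids in the relational store).
edges = {frozenset(("a", "b")): {"m": [1], "n": [2], "v": [3]}}


def parties_of(record_id):
    """S210: determine the two objects an original record relates to."""
    return conn.execute(
        "SELECT party_a, party_b FROM records WHERE id = ?", (record_id,)
    ).fetchone()


def layers_between(obj1, obj2):
    """S220-S230: map objects to nodes and return the single edge's hierarchies."""
    return edges[frozenset((node_of[obj1], node_of[obj2]))]


def fetch(layers, selected):
    """S240-S250: read index information from the chosen hierarchies, then query."""
    ids = [i for name in selected for i in layers[name]]
    marks = ",".join("?" * len(ids))
    return conn.execute(
        f"SELECT id, kind, detail FROM records WHERE id IN ({marks}) ORDER BY id",
        ids,
    ).fetchall()


first, second = parties_of(1)             # first input: transaction number 1
layers = layers_between(first, second)    # hierarchies m, n, v on edge z
rows = fetch(layers, ["m", "n"])          # second input: select m and n
```

Only the rows referenced by the selected hierarchies are read from the relational store; the data indexed by hierarchy v is never touched.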
Based on the above, when a comprehensive analysis of the same event needs to be performed from different business dimensions, the user can select, through an edge and the attribute hierarchies corresponding to that edge, the relationships to be included in the analysis and the original relational data that reflects those relationships. The mechanism is as follows: the attribute hierarchies on the selected edge recorded in the graph database, together with the index information recorded in those attribute hierarchies for all original relational data that gave rise to the relationship, are used with the Open Database Connectivity (ODBC) interface protocol to quickly collect the original relational data from the relational database, so that the data set needed for the analysis is located within mass data. Likewise, the edge representing the relationship of a transaction can be found from the transaction, the other relationships between the two objects can then be found, and the related original relational data can be found in the same manner.
In some embodiments, the data size of the original relational data may be very large, and therefore, when a user needs to obtain a part of data therein, it is necessary to retrieve the part of data from the mass data, which generally takes a relatively long time, and in order to quickly obtain the required part of data from the mass data, before S210, the method may further include:
responding to a third input of the first user to the at least one piece of original relational data, and acquiring the at least one piece of original relational data;
for each piece of original relational data in at least one piece of original relational data, respectively executing the following steps to obtain a graph database:
determining a second attribute hierarchy corresponding to the original relational data, second index information of the original relational data, and a third object and a fourth object related to the original relational data according to the original relational data;
establishing a second edge between a third node corresponding to the third object and a fourth node corresponding to the fourth object;
creating a second attribute hierarchy as an attribute hierarchy corresponding to the second edge;
the second index information is stored in a second attribute hierarchy.
Here, the range of original relational data to be acquired may be determined first, and a graph database may be generated for the original relational data within that range. For example, a graph database may be generated for the original relational data of January 2022. Of course, the first user may also select any original relational data as required to generate the graph database, which is not limited herein. The third input may be an input that determines the at least one piece of original relational data used to generate the graph database. The second attribute hierarchy may be used to store the second index information of the original relational data in the relational database.
Specifically, the first user selects at least one piece of original relational data, the data processing system may obtain the at least one piece of original relational data in response to the selection input, that is, a third input, and determine, for each piece of original relational data, a second attribute hierarchy corresponding to the original relational data, second index information of the original relational data, and a third object and a fourth object related to the original relational data according to a preset correspondence between a type of the original relational data and an attribute hierarchy. Then, a third node corresponding to the third object and a fourth node corresponding to the fourth object may be determined according to a pre-stored correspondence between the object and the node, and since the original relational data relates to the third object and the fourth object, a relationship exists between the third object and the fourth object, and an edge between the third node and the fourth node, that is, a second edge, may be established. An edge may be a storage space, which may be divided into a plurality of sub-storage spaces, each of which may be an attribute hierarchy for storing index information. Since the attribute hierarchy corresponding to the original relational data is the second attribute hierarchy, the second attribute hierarchy may be created as the attribute hierarchy corresponding to the second edge, and then the second index information of the original relational data may be stored in the second attribute hierarchy.
It should be noted that after an edge is created between the third node and the fourth node, even if other original relational data also relates to the third object and the fourth object, the edge between the third node and the fourth node is not created any more, that is, only one edge is created between two nodes at most.
After the above method is performed on each piece of original relational data in at least one piece of original relational data, a graph database corresponding to the at least one piece of original relational data can be obtained.
In some examples, if the first user wants to check whether abnormal data exists in the original relational data of January 2022, "January 2022" may be input in the data processing system, and the data processing system may obtain all the original relational data of January 2022 in response to the input, namely the third input, and process each piece of original relational data separately. The processing of each piece is described by taking the original relational data L as an example. According to the type of the original relational data L and the correspondence between types and attribute hierarchies, the attribute hierarchy w corresponding to the original relational data L, namely the second attribute hierarchy, is determined; the storage location u of the original relational data L in the relational database, namely the second index information, is determined; and enterprise C and enterprise D related to the original relational data L, namely the third object and the fourth object, are determined. Then, according to the pre-stored correspondence between objects and nodes, node c corresponding to enterprise C, namely the third node, and node d corresponding to enterprise D, namely the fourth node, may be determined. Since the original relational data L relates to enterprise C and enterprise D, a relationship exists between enterprise C and enterprise D, and edge y between node c and node d, namely the second edge, can be established. Then, since the attribute hierarchy corresponding to the original relational data L is the attribute hierarchy w, the attribute hierarchy w can be created as an attribute hierarchy corresponding to edge y, and the storage location u of the original relational data L in the relational database is stored in the attribute hierarchy w.
The processing of the original relational data L is thus completed, and the graph database is obtained after the same processing is performed on each piece of the acquired original relational data of January 2022.
It should also be noted that at most one edge is established between any two nodes. For example, if the original relational data of January 2022 also contains original relational data H, and the enterprises related to the original relational data H are also enterprise C and enterprise D, no further edge needs to be established between node c and node d. Instead, the storage location q of the original relational data H in the relational database is stored in an attribute hierarchy corresponding to edge y. Specifically, if the attribute hierarchy corresponding to the original relational data H is also the attribute hierarchy w, no new attribute hierarchy needs to be created, and the storage location q is stored directly in the existing attribute hierarchy w; if the attribute hierarchy corresponding to the original relational data H is not an existing attribute hierarchy but an attribute hierarchy p, a new attribute hierarchy p corresponding to edge y is created, and the storage location q is then stored in the newly created attribute hierarchy p.
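The per-record construction steps above, including the rule that at most one edge exists between two nodes, can be sketched as follows. This is a minimal sketch under assumptions: the record fields (`party_a`, `party_b`, `type`, `location`), the dictionary-based graph, and the type names are hypothetical and chosen only for illustration.

```python
def add_record(edges, node_of, record, layer_for_type):
    """Register one original record without ever creating a second edge
    for the same pair of nodes."""
    key = frozenset((node_of[record["party_a"]], node_of[record["party_b"]]))
    layers = edges.setdefault(key, {})        # create the edge only if absent
    layer = layer_for_type[record["type"]]    # preset type -> attribute hierarchy
    # Create the hierarchy only if absent, then store the index information.
    layers.setdefault(layer, []).append(record["location"])
    return edges


# Assumed correspondences, mirroring the example with records L and H.
node_of = {"C": "c", "D": "d"}
layer_for_type = {"fund_transfer": "w", "tax_receipt": "p"}

record_l = {"party_a": "C", "party_b": "D", "type": "fund_transfer", "location": "u"}
record_h = {"party_a": "D", "party_b": "C", "type": "tax_receipt", "location": "q"}

edges = {}
for record in (record_l, record_h):
    add_record(edges, node_of, record, layer_for_type)
```

Keying the edge by an unordered pair of nodes means that record H, which names the same two enterprises in the opposite order, lands on the existing edge and only adds the new hierarchy p.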
Thus, with a graph database generated through this process, the original relational data is stored in the relational database, and only the index information of that data is stored in the graph database. As a result, at most one edge needs to connect any two nodes in the graph database; the edge indicates that a relationship exists between the objects corresponding to the two nodes it connects, and different attribute hierarchies are arranged on the edge to store the index information of different types of original relational data in the relational database.
In addition, the graph database is generated through the process, and the storage space of the graph database can be saved.
Based on this concept of separating the graph from the data, when a knowledge ontology is built over the original relational data, the ontology needs to be defined only for the objects and business relationships involved in the original relational data, which reduces the difficulty of restructuring traditional original relational data into knowledge.
In some embodiments, to facilitate rapid generation of the graph database, prior to the obtaining of the at least one piece of raw relational data in response to the third input by the first user to the at least one piece of raw relational data, the method may further comprise:
acquiring an object list;
and establishing a node corresponding to each object in the graph database according to the object list.
Here, the object list may be a list storing object data. In order to facilitate the rapid generation of the graph database, the nodes corresponding to each object in the object list can be established in the graph database. An object identifier can be set for each object, and a node identifier can be set for each node. In addition, the corresponding relation between the object and the node can be stored, and the corresponding relation between the object identification and the node identification can also be stored. Specifically, the nodes may be established or the newly added nodes may be updated in real time or periodically, or the nodes may be established before the graph database needs to be generated each time.
In some examples, enterprise A, enterprise B, enterprise C, enterprise D, enterprise E, and enterprise F may be included in the object list. According to the object list, node a corresponding to enterprise A, node b corresponding to enterprise B, node c corresponding to enterprise C, node d corresponding to enterprise D, node e corresponding to enterprise E and node f corresponding to enterprise F can be established in the graph database.
Therefore, through the process, the corresponding nodes of all the objects can be established in advance before the graph database is generated, so that the graph database can be generated quickly.
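Establishing nodes from an object list, and later adding nodes only for newly appearing objects, can be sketched as below. The node identifier format `node-<n>` and the function name `build_nodes` are hypothetical; the application only states that object identifiers, node identifiers and their correspondence may be stored.

```python
def build_nodes(object_list, node_of=None):
    """Return an object -> node identifier mapping, creating a node only
    for objects that do not have one yet."""
    node_of = {} if node_of is None else node_of
    for obj in object_list:
        if obj not in node_of:
            # Hypothetical identifier scheme: sequential node numbers.
            node_of[obj] = f"node-{len(node_of) + 1}"
    return node_of


# Initial build from an object list.
node_of = build_nodes(["A", "B", "C"])
# Periodic update: only the new object "D" receives a new node.
node_of = build_nodes(["C", "D"], node_of)
```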
Based on the above method, global nodes are established at the graph database layer and a global identifier is assigned to every node. It is agreed that only one edge, representing the connectivity relationship in the graph topology, exists between any two nodes; every business attribute is set as an attribute hierarchy, and the index information of all original relational data giving rise to the relationship is stored through the attribute hierarchies. The number of edges between two nodes is thereby minimized, which minimizes both the storage space of the graph and the computing power required for connectivity analysis on the graph.
In some embodiments, the user wants to analyze whether the relationship between two objects is abnormal, and an indirect relationship may exist between the two objects. For example, if a fund transfer relationship exists between object A and object B and a fund transfer relationship exists between object B and object C, an indirect fund transfer relationship may exist between object A and object C. If, according to the relevant regulations, such an indirect fund transfer relationship is not permitted between object A and object C, the relationship between object A and object C may be considered abnormal. In order to determine whether an abnormal relationship exists between two objects, after the second index information is stored in the second attribute hierarchy, the method may further include:
in response to a fourth input by the first user of the fifth object, the sixth object, and the first condition, determining all first paths in the graph database that connect a fifth node corresponding to the fifth object and a sixth node corresponding to the sixth object;
for each first path in all the first paths, respectively executing the following steps:
judging whether a third edge included in the first path meets a first condition;
and displaying the first path under the condition that a third edge included in the first path meets a first condition.
Here, the first condition may be determined by setting a limiting condition on the attribute hierarchies of the edges included in the first path. For example, the first condition may be that every edge included in the first path is provided with both the attribute hierarchy m and the attribute hierarchy n, or that every edge included in the first path is provided with at least one of the attribute hierarchy m and the attribute hierarchy n. Of course, the first condition may also be another limitation on the attribute hierarchies of the edges included in the first path according to actual requirements, which is not limited herein. The first condition may be the condition satisfied by the edges of the abnormal paths that the first user wants to find. The fourth input may be an input that enters the fifth object, the sixth object, and the first condition, for example an entry or click input. All the first paths may be all paths in the graph database that connect the fifth node and the sixth node. Each first path may include the fifth node, the sixth node, and all nodes and edges between them on the path. The third edge may be an edge included in the first path, and may comprise one or more edges.
Specifically, the first user may select the fifth object and the sixth object in the data processing system and set the first condition. In response to the selection input, namely the fourth input, the data processing system may determine, according to the pre-stored correspondence between objects and nodes, the fifth node corresponding to the fifth object and the sixth node corresponding to the sixth object in the graph database, and further determine all first paths connecting the fifth node and the sixth node. The system then determines, for each first path, whether the edges it includes, namely the third edges, satisfy the first condition. If so, the first path is displayed, indicating that the first path is abnormal; if not, the first path is not displayed. Here, the abnormal first paths may be highlighted in a specific color after the entire graph database is displayed, or only the abnormal first paths may be displayed without the other paths. When the number of abnormal first paths exceeds a preset number, in order to avoid cluttered display content, a preset number of abnormal first paths may be displayed first, and the remaining abnormal first paths may then be displayed, a preset number at a time, according to the input of the first user.
In some examples, a path between enterprise E and enterprise F is considered abnormal if every edge it includes is provided with the attribute hierarchy m. Based on this, if the first user wants to analyze whether an abnormal relationship, that is, an abnormal path, exists between enterprise E and enterprise F, enterprise E and enterprise F, namely the fifth object and the sixth object, may be selected in the data processing system, and the first condition may be set as: every edge is provided with the attribute hierarchy m. The data processing system may determine, according to the correspondence between objects and nodes, node e corresponding to enterprise E, namely the fifth node, and node f corresponding to enterprise F, namely the sixth node, and may then determine, according to the link relationships of the graph database, all first paths connecting node e and node f: path g and path j. The system then determines whether every edge of path g and of path j is provided with the attribute hierarchy m. For example, path g is "node e, edge x, node c, edge h, node a, edge o, node f", and the edges it includes, namely the third edges, are edge x, edge h, and edge o. Edge x, edge h, and edge o are all provided with the attribute hierarchy m, so the third edges included in path g satisfy the first condition, path g is abnormal, and path g is displayed. Path j is judged in the same way, but the third edges included in path j do not satisfy the first condition, so path j is not displayed.
Therefore, the path between the two nodes can be determined according to the link relation of the graph database through the process, and the abnormal path is screened out by setting the first condition, so that the abnormal relation between the two objects can be rapidly determined.
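The screening of first paths can be sketched as follows. The sketch makes assumptions that the text does not state: "all first paths" is taken to mean all simple paths (no node visited twice), the graph is held in a hypothetical dictionary `edge_layers` that maps each unordered node pair to its attribute hierarchies, and the `mode` parameter is an illustrative way to express "all of" versus "at least one of" the required hierarchies. The intermediate node d on path j is also invented for the example.

```python
def simple_paths(adj, start, goal):
    """Yield every simple path from start to goal as a list of nodes."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            yield path
            continue
        for nxt in sorted(adj.get(node, ())):
            if nxt not in path:
                stack.append((nxt, path + [nxt]))


def abnormal_paths(edge_layers, start, goal, required, mode=all):
    """Keep paths whose every edge carries the required attribute hierarchies.

    mode=all: each edge has every required hierarchy.
    mode=any: each edge has at least one required hierarchy.
    """
    adj = {}
    for key in edge_layers:
        u, v = tuple(key)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    kept = []
    for path in simple_paths(adj, start, goal):
        path_edges = [frozenset(pair) for pair in zip(path, path[1:])]
        if all(mode(name in edge_layers[e] for name in required) for e in path_edges):
            kept.append(path)
    return kept


# Path g: e - c - a - f (every edge has m). Path j: e - d - f (edge e-d lacks m).
edge_layers = {
    frozenset(("e", "c")): {"m", "v"},
    frozenset(("c", "a")): {"m", "n"},
    frozenset(("a", "f")): {"m", "p"},
    frozenset(("e", "d")): {"n"},
    frozenset(("d", "f")): {"m"},
}
result = abnormal_paths(edge_layers, "e", "f", {"m"})
```

Because the condition is evaluated only on the attribute hierarchy names stored on each edge, no original relational data has to be read from the relational database during screening.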
In some embodiments, if it is determined that the relationship between two objects is abnormal, the cause of the abnormality needs to be analyzed by looking up the corresponding original relational data. To make it easier for the user to look up the corresponding original relational data, after determining whether the third edge included in the first path satisfies the first condition, the method may further include:
displaying a third attribute hierarchy corresponding to a third edge included in the first path under the condition that the third edge included in the first path meets a first condition;
determining third index information stored in a fourth attribute hierarchy in response to a fifth input by the first user to the fourth attribute hierarchy in the third attribute hierarchy;
and acquiring third original relational data from the relational database according to the third index information.
Here, the first path may include one or more third edges, each third edge may correspond to one or more third attribute hierarchies, the fourth attribute hierarchy may be an attribute hierarchy of a fourth edge among the third edges, and the fifth input may be an input of clicking or entering the fourth attribute hierarchy. The third index information may be a storage location of the third original relational data in the relational database.
Specifically, if the third edges included in the first path satisfy the first condition, the first path is an abnormal path. In this case, in addition to the nodes and third edges included in the first path, the third attribute hierarchies corresponding to each third edge may also be displayed, so that the first user can select, based on experience, a fourth attribute hierarchy from the third attribute hierarchies. The data processing system may then obtain from the relational database, according to the third index information stored in the fourth attribute hierarchy selected by the first user, the third original relational data that the first user wants to view. The third original relational data may then be displayed for the first user to view in order to analyze the cause of the abnormality.
In some examples, it is determined that the path g includes an edge x, an edge h, and an edge o that all satisfy the first condition, and thus, an attribute level m and an attribute level v corresponding to the edge x, an attribute level m and an attribute level n corresponding to the edge h, and an attribute level m and an attribute level p corresponding to the edge o may be displayed, respectively. The first user suspects that the original relational data T corresponding to the attribute hierarchy n corresponding to the edge h, namely the third original relational data, may have an abnormality according to experience, and therefore, the attribute hierarchy n, namely the fourth attribute hierarchy, may be clicked, and the data processing system determines, in response to the click input, namely the fifth input, the storage location i stored in the fourth attribute hierarchy, namely the third index information, and then acquires, according to the storage location i, the original relational data T corresponding to the attribute hierarchy n from the relational database, and displays the original relational data T, so that the first user can view and analyze the cause of the abnormality.
Therefore, when an abnormal path is found, a user can conveniently and quickly index the original relational data to be checked, and the abnormal reason can be quickly analyzed.
In some embodiments, it is necessary to analyze whether there is an exception in all relationships related to a certain object, and in order to determine whether there is an exception in all relationships related to a certain object, after the storing the second index information into the second attribute hierarchy, the method may further include:
in response to a sixth input by the first user of the seventh object and the second condition, determining a corresponding seventh node of the seventh object in the graph database;
determining all second paths passing through the seventh node;
for each second path in all the second paths, respectively executing the following steps:
judging whether a fifth edge included in the second path meets a second condition or not;
and displaying the second path under the condition that a fifth edge included in the second path meets the second condition.
Here, the second condition may be determined by setting a definition condition for an attribute hierarchy of an edge included in the second path. For example, the second condition may be that each edge included in the second path sets an attribute level m and an attribute level n, or the second condition may also be that each edge included in the second path sets at least one of the attribute level m and the attribute level n. Of course, the first condition may also be other definitions of the attribute hierarchy of the edge included in the second path according to actual requirements, and is not limited herein. The second condition may be a condition satisfied by an edge included in the abnormal path to be searched by the first user. The sixth input may be an input to enter a seventh object and a second condition, and may be an entry or click input, for example. All of the second paths may be all of the paths in the graph database that pass through the seventh node. Each second path may include all nodes and edges in the path. The fifth edge may be an edge comprised by the second path, and the fifth edge may comprise one or more edges.
Specifically, the first user may select a seventh object in the data processing system, and set a second condition, and the data processing system may determine, in response to the selection input, that is, a fifth input, a seventh node corresponding to the seventh object in the graph database according to a pre-stored correspondence between the object and the node, and further determine all second paths passing through the seventh node, and then respectively determine whether edges included in each of the second paths, that is, fifth edges, satisfy the second condition, and if so, display the second path, indicating that the second path is abnormal; if not, the second path is not displayed. Here, the second route having an abnormality may be displayed in a specific color after the entire map database is displayed, or only the second route having an abnormality may be displayed without displaying the other routes. When the number of the abnormal second paths exceeds the preset number, that is, the number is too large, in order to avoid confusion of display contents, the abnormal second paths of the preset number may be displayed first, and then the remaining abnormal second paths are displayed according to the input of the first user in a manner of displaying the preset number of abnormal second paths each time.
In some examples, if each edge included in a path through enterprise E is provided with attribute level m, the path is abnormal. Based on this, if the first user wants to analyze whether an abnormal relationship exists between enterprise E and another enterprise, that is, whether an abnormal path exists, the first user may select enterprise E, that is, the seventh object, in the data processing system and set the second condition as: each edge is provided with attribute level m. The data processing system may determine node e, that is, the seventh node, corresponding to enterprise E according to the correspondence between objects and nodes, and may then determine all second paths passing through node e according to the link relationships of the graph database: path g, path j, and path k. The data processing system then judges, for each of path g, path j, and path k, whether every edge is provided with attribute level m. For example, path g is "node e, edge x, node c, edge h, node a, edge o, node f", and its edges, that is, its fifth edges, are edge x, edge h, and edge o. Edge x, edge h, and edge o are all provided with attribute level m, so the fifth edges included in path g satisfy the second condition, path g is abnormal, and path g is displayed. The judgment process for path j and path k is the same as that for path g, but path j and path k each include a fifth edge that does not satisfy the second condition, and thus path j and path k are not displayed.
In some examples, path g is "node e, edge x, node c, edge h, node a, edge o, node f" and path k is "node e, edge x, node c, edge h, node a, edge o, node f, edge l, node b". Path g may be displayed because edge x, edge h, and edge o are all provided with attribute level m and therefore satisfy the second condition, whereas path k may not be displayed because edge l is not provided with attribute level m.
Therefore, the paths passing through the seventh node can be determined according to the link relationships of the graph database, and the abnormal paths are screened out by setting the second condition, so that abnormal relationships between the seventh object and other objects can be determined quickly.
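The screening of second paths can be illustrated with a minimal Python sketch. It is not the implementation of the application: the in-memory edge list, the helper names (`build_adjacency`, `paths_from`, `satisfies`), and the attribute levels assigned to each edge are hypothetical, and it assumes that a path "passing through" a node is a simple path starting at that node, as in the path g and path k example.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy graph mirroring the example: (edge name, node u, node v).
EDGES: List[Tuple[str, str, str]] = [
    ("x", "e", "c"),
    ("h", "c", "a"),
    ("o", "a", "f"),
    ("l", "f", "b"),
]

# Hypothetical attribute levels provided on each edge.
LEVELS: Dict[str, Set[str]] = {
    "x": {"m"},
    "h": {"m"},
    "o": {"m", "n"},
    "l": {"n"},
}


def build_adjacency(edges: List[Tuple[str, str, str]]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each node to its (edge name, neighbour) pairs; edges are undirected."""
    adj: Dict[str, List[Tuple[str, str]]] = {}
    for name, u, v in edges:
        adj.setdefault(u, []).append((name, v))
        adj.setdefault(v, []).append((name, u))
    return adj


def paths_from(start: str, adj: Dict[str, List[Tuple[str, str]]]) -> List[List[str]]:
    """Return every simple path starting at `start`, as a list of edge names."""
    results: List[List[str]] = []

    def dfs(node: str, visited: Set[str], trail: List[str]) -> None:
        for name, nxt in adj.get(node, []):
            if nxt in visited:
                continue  # keep paths simple: never revisit a node
            path = trail + [name]
            results.append(path)
            dfs(nxt, visited | {nxt}, path)

    dfs(start, {start}, [])
    return results


def satisfies(path: List[str], levels: Dict[str, Set[str]], required: str) -> bool:
    """Second condition in this sketch: every edge is provided with `required`."""
    return all(required in levels[name] for name in path)


adj = build_adjacency(EDGES)
abnormal = [p for p in paths_from("e", adj) if satisfies(p, LEVELS, "m")]
print(abnormal)  # [['x'], ['x', 'h'], ['x', 'h', 'o']]
```

In this sketch the path ending at node f (edges x, h, o) is kept, while the longer path ending at node b is dropped because edge l lacks attribute level m.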
In some embodiments, if it is determined that the relationship between a certain object and another object is abnormal, the cause of the abnormality needs to be analyzed by viewing the corresponding original relational data. To make it easier for the user to view the corresponding original relational data, after judging whether the fifth edge included in the second path satisfies the second condition, the method may further include:
displaying a fifth attribute hierarchy corresponding to a fifth edge included in the second path under the condition that the fifth edge included in the second path meets a second condition;
in response to a sixth input by the first user to a sixth one of the fifth attribute levels, determining fourth index information stored in the sixth attribute level;
and acquiring fourth original relational data from the relational database according to the fourth index information.
Here, the specific process of acquiring the fourth original relational data is the same as the specific process of acquiring the third original relational data, and is not described herein again.
Based on this, when the service connectivity needs to be analyzed, the connectivity relationship of the service can be found through the connectivity analysis of the edges, then all the edges related to the service connectivity relationship are selected, the attribute levels related to the service connectivity relationship and the storage positions of all the original service data recorded on the attribute levels for generating the relationship are found through the edges, and the original service data is rapidly collected in a relational database through an ODBC interface protocol, so that a data set which needs to participate in analysis is found from massive data.
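The retrieval step, in which index information stored on an attribute level is used to pull only the relevant original rows from the relational database, can be sketched as follows. This is an illustration under stated assumptions: Python's built-in `sqlite3` module stands in for a relational database reached over ODBC, and the `transfers` table, the shape of `index_info` (a table name plus primary-key values), and the function `fetch_original_rows` are hypothetical, since the application does not specify the format of the index information.

```python
import sqlite3
from typing import Any, Dict, List, Tuple

# Stand-in relational database holding the original relational data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transfers ("
    "id INTEGER PRIMARY KEY, payer TEXT, payee TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO transfers VALUES (?, ?, ?, ?)",
    [(1, "A", "B", 100.0), (2, "A", "B", 250.0), (3, "B", "C", 75.0)],
)

# Table names cannot be bound as SQL parameters, so they are checked explicitly.
ALLOWED_TABLES = {"transfers"}


def fetch_original_rows(
    db: sqlite3.Connection, index_info: Dict[str, Any]
) -> List[Tuple[Any, ...]]:
    """Fetch only the rows named by the index information, without a table scan."""
    table = index_info["table"]
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unknown table: {table}")
    row_ids = index_info["row_ids"]
    if not row_ids:
        return []
    placeholders = ",".join("?" for _ in row_ids)
    sql = f"SELECT * FROM {table} WHERE id IN ({placeholders}) ORDER BY id"
    return db.execute(sql, row_ids).fetchall()


# Hypothetical index information read from one attribute level of one edge.
index_info = {"table": "transfers", "row_ids": [1, 2]}
rows = fetch_original_rows(conn, index_info)
print(rows)  # [(1, 'A', 'B', 100.0), (2, 'A', 'B', 250.0)]
```

Because the lookup goes through primary keys, the query touches only the rows recorded on the selected attribute level rather than joining or traversing large tables.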
Based on the embodiments of the application, the analysis target can be located through the relationships in the graph, and the indexes established in the graph help an analyst pin down the exact range of data that needs attention. This avoids the large computational overhead of data traversal or large-table joins in a traditional relational database and effectively improves query efficiency. Moreover, query operations become more targeted when comprehensive analysis and service connectivity analysis are carried out from different service dimensions based on the same event, which further improves the efficiency of analysis and querying.
Based on the same inventive concept, the embodiment of the application also provides a data processing device. The following describes the data processing apparatus provided in the embodiment of the present application in detail with reference to fig. 3.
Fig. 3 shows a schematic structural diagram of a data processing apparatus according to an embodiment of the present application.
As shown in fig. 3, the data processing apparatus may include:
a first determining module 301, configured to determine, in response to a first input of first original relational data by a first user, a first object and a second object to which the first original relational data relates;
a second determining module 302, configured to determine a first node corresponding to the first object in a graph database and a second node corresponding to the second object in the graph database, where nodes in the graph database correspond to objects, no more than one edge connects any two nodes, the edge indicates that a relationship exists between the objects corresponding to the two nodes it connects, and the edge is provided with different attribute hierarchies used to store index information of different types of original relational data;
a first display module 303, configured to display a first attribute hierarchy corresponding to a first edge between a first node and a second node in a graph database;
a third determining module 304, configured to determine, in response to a second input of the first user to a target attribute hierarchy in the first attribute hierarchy, first index information stored in the target attribute hierarchy;
a first obtaining module 305, configured to obtain, from the relational database, second original relational data of a target attribute hierarchy between the first object and the second object according to the first index information;
and a second display module 306, configured to display the second original relational data.
Therefore, in response to the first input of the first user to the first original relational data, the first object and the second object related to the first original relational data can be determined, and the first node corresponding to the first object and the second node corresponding to the second object in the graph database are determined. The first attribute hierarchy corresponding to the first edge between the first node and the second node in the graph database is displayed. In response to the second input of the first user to the target attribute hierarchy in the first attribute hierarchy, the first index information stored in the target attribute hierarchy is determined, the second original relational data of the target attribute hierarchy between the first object and the second object is obtained from the relational database according to the first index information, and the second original relational data is displayed. The original relational data therefore does not need to be stored in the graph database. It is stored in the relational database, and only the index information of the original relational data in the relational database is stored in the graph database. As a result, no more than one edge needs to connect any two nodes in the graph database: the edge indicates that a relationship exists between the objects corresponding to the two nodes it connects, and the different attribute levels provided on the edge store the index information of different types of original relational data in the relational database. When part of the data between two nodes is required, it can be obtained by selecting the corresponding attribute hierarchy, without traversing the data stored on every edge between the two nodes, which improves the efficiency of obtaining required data from mass data and saves computing resources.
In some embodiments, the volume of the original relational data may be very large, so that when a user needs part of the data, that part has to be retrieved from mass data, which usually takes a relatively long time. In order to quickly obtain the required part of the data from the mass data, the apparatus may further include:
a second obtaining module, configured to obtain at least one piece of original relational data in response to a third input of the first user to the at least one piece of original relational data, before the first object and the second object related to the first original relational data are determined in response to the first input of the first user to the first original relational data;
a fourth determining module, configured to perform, for each piece of original relational data in the at least one piece of original relational data: determining a second attribute hierarchy corresponding to the original relational data, second index information of the original relational data, and a third object and a fourth object related to the original relational data according to the original relational data;
a first establishing module, configured to perform, for each piece of original relational data in the at least one piece of original relational data: establishing a second edge between a third node corresponding to the third object and a fourth node corresponding to the fourth object;
a creating module, configured to perform, for each piece of original relational data in the at least one piece of original relational data: creating a second attribute hierarchy as an attribute hierarchy corresponding to the second edge;
a storage module, configured to perform, for each piece of original relational data in the at least one piece of original relational data: and storing the second index information into a second attribute hierarchy to obtain a graph database.
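The graph construction performed by the modules above can be sketched in a few lines of Python. This is an illustration under assumptions rather than the structure used by the application: the record fields (`objects`, `type`, `row_id`), the function `build_graph`, and the use of the record type as the attribute level are all hypothetical.

```python
from typing import Any, Dict, FrozenSet, List, Set, Tuple

# Hypothetical original relational records; `row_id` plays the role of index information.
RECORDS: List[Dict[str, Any]] = [
    {"objects": ("A", "B"), "type": "fund_transfer", "row_id": 1},
    {"objects": ("A", "B"), "type": "fund_transfer", "row_id": 2},
    {"objects": ("B", "A"), "type": "guarantee", "row_id": 7},
    {"objects": ("B", "C"), "type": "fund_transfer", "row_id": 3},
]

# One edge per unordered node pair; each edge maps attribute level -> index information.
EdgeMap = Dict[FrozenSet[str], Dict[str, List[int]]]


def build_graph(records: List[Dict[str, Any]]) -> Tuple[Set[str], EdgeMap]:
    """Create nodes, at most one edge per node pair, and per-level index lists."""
    nodes: Set[str] = set()
    edges: EdgeMap = {}
    for rec in records:
        a, b = rec["objects"]
        nodes.update((a, b))
        # frozenset ignores direction, so (A, B) and (B, A) share a single edge.
        levels = edges.setdefault(frozenset((a, b)), {})
        # Create the attribute level on first use, then store the index information.
        levels.setdefault(rec["type"], []).append(rec["row_id"])
    return nodes, edges


nodes, edges = build_graph(RECORDS)
print(sorted(nodes))                 # ['A', 'B', 'C']
print(edges[frozenset(("A", "B"))])  # {'fund_transfer': [1, 2], 'guarantee': [7]}
```

Keying the edge map by an unordered pair is one simple way to guarantee that no more than one edge connects any two nodes, while different record types accumulate as separate attribute levels on that edge.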
In some embodiments, to facilitate fast generation of a graph database, the apparatus may further comprise:
a third obtaining module, configured to obtain the object list before obtaining the at least one piece of original relational data in response to a third input of the first user to the at least one piece of original relational data;
and the second establishing module is used for establishing a node corresponding to each object in the graph database according to the object list.
In some embodiments, the user wants to analyze whether the relationship between two objects is abnormal, and an indirect relationship may exist between the two objects. For example, if a fund transfer relationship exists between object A and object B, and a fund transfer relationship exists between object B and object C, an indirect fund transfer relationship may exist between object A and object C. If, according to the relevant provisions, such an indirect fund transfer relationship must not exist between object A and object C, the relationship between object A and object C may be considered abnormal. In order to determine whether an abnormal relationship exists between the two objects, the apparatus may further include:
a fifth determining module, configured to determine, in response to a fourth input of a fifth object, a sixth object, and a first condition by the first user after the second index information is stored in the second attribute hierarchy, all first paths that connect a fifth node corresponding to the fifth object and a sixth node corresponding to the sixth object in the graph database, where the first condition is determined by setting a limiting condition on the attribute hierarchy of the edges included in the first path;
a first judging module, configured to execute, for each first path in all the first paths: judging whether a third edge included in the first path meets a first condition;
a third display module, configured to perform, for each of all the first paths: and displaying the first path under the condition that a third edge included in the first path meets a first condition.
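The path search between two objects can be sketched as a depth-first enumeration of simple paths followed by a filter on the attribute levels of each edge. The adjacency structure `ADJ`, the level names, and the functions `all_simple_paths` and `meets_first_condition` are hypothetical, and the first condition is assumed here to be "every edge is provided with the fund_transfer level".

```python
from typing import Dict, List, Set, Tuple

# Hypothetical graph: node -> neighbour -> attribute levels provided on that edge.
ADJ: Dict[str, Dict[str, Set[str]]] = {
    "A": {"B": {"fund_transfer"}, "D": {"guarantee"}},
    "B": {"A": {"fund_transfer"}, "C": {"fund_transfer"}},
    "C": {"B": {"fund_transfer"}, "D": {"fund_transfer"}},
    "D": {"A": {"guarantee"}, "C": {"fund_transfer"}},
}

Path = List[Tuple[str, str]]  # a path is a list of (from node, to node) hops


def all_simple_paths(adj: Dict[str, Dict[str, Set[str]]], src: str, dst: str) -> List[Path]:
    """Enumerate every simple path from `src` to `dst` by depth-first search."""
    paths: List[Path] = []

    def dfs(node: str, visited: Set[str], trail: Path) -> None:
        if node == dst:
            paths.append(trail)
            return
        for nxt in adj.get(node, {}):
            if nxt not in visited:
                dfs(nxt, visited | {nxt}, trail + [(node, nxt)])

    dfs(src, {src}, [])
    return paths


def meets_first_condition(path: Path, adj: Dict[str, Dict[str, Set[str]]], level: str) -> bool:
    """Assumed first condition: every edge on the path is provided with `level`."""
    return all(level in adj[u][v] for u, v in path)


first_paths = all_simple_paths(ADJ, "A", "C")
shown = [p for p in first_paths if meets_first_condition(p, ADJ, "fund_transfer")]
print(shown)  # [[('A', 'B'), ('B', 'C')]]
```

Here two paths connect A and C, but only the path through B is displayed, because the edge between A and D carries only the guarantee level.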
In some embodiments, if it is determined that the relationship between the two objects is abnormal, the cause of the abnormality needs to be analyzed by viewing the corresponding original relational data. To make it easier for the user to view the corresponding original relational data, the apparatus may further include:
a fourth display module, configured to display, after it is judged whether the third edge included in the first path satisfies the first condition, a third attribute hierarchy corresponding to the third edge included in the first path in the case that the third edge satisfies the first condition;
a sixth determining module, configured to determine, in response to a fifth input by the first user to a fourth attribute hierarchy of the third attribute hierarchies, third index information stored in the fourth attribute hierarchy, where the fourth attribute hierarchy is an attribute hierarchy of a fourth edge of the third edge;
and a fourth obtaining module, configured to obtain third original relational data from the relational database according to the third index information.
In some embodiments, it is necessary to analyze whether any of the relationships associated with a certain object is abnormal. To this end, the apparatus may further include:
a seventh determining module, configured to determine a seventh node corresponding to a seventh object in the graph database, in response to a sixth input of the seventh object and a second condition by the first user after the second index information is stored in the second attribute hierarchy;
an eighth determining module, configured to determine all second paths passing through the seventh node;
a second judging module, configured to perform, for each second path in all the second paths: judging whether a fifth edge included in the second path meets a second condition, wherein the second condition is determined by setting a limiting condition on the attribute hierarchy of the edge included in the second path;
a fourth display module, configured to perform, for each second path in all the second paths: and displaying the second path under the condition that a fifth edge included in the second path meets the second condition.
Fig. 4 shows a schematic structural diagram of an electronic device according to an embodiment of the present application.
Fig. 4 is a structural diagram of an exemplary hardware architecture of an electronic device 4 capable of implementing the data processing method and the data processing apparatus according to the embodiments of the present application. The electronic device 4 may be the electronic device referred to in the embodiments of the present application.
The electronic device 4 may comprise a processor 401 and a memory 402 in which computer program instructions are stored.
Specifically, the processor 401 may include a Central Processing Unit (CPU) or an Application Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits that implement the embodiments of the present application.
Memory 402 may include mass storage for data or instructions. By way of example, and not limitation, memory 402 may include a Hard Disk Drive (HDD), a floppy disk drive, flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these. Memory 402 may include removable or non-removable (or fixed) media, where appropriate. The memory 402 may be internal or external to the electronic device, where appropriate. In a particular embodiment, the memory 402 is a non-volatile solid-state memory. In particular embodiments, memory 402 may include Read Only Memory (ROM), Random Access Memory (RAM), magnetic disk storage media devices, optical storage media devices, flash memory devices, or electrical, optical, or other physical/tangible memory storage devices. Thus, in general, the memory 402 comprises one or more tangible (non-transitory) computer-readable storage media (e.g., a memory device) encoded with software comprising computer-executable instructions, and when the software is executed (e.g., by one or more processors), it is operable to perform the operations described with reference to a method according to an aspect of the present application.
The processor 401 reads and executes the computer program instructions stored in the memory 402 to implement any one of the data processing methods in the above-described embodiments.
In one example, the electronic device can also include a communication interface 403 and a bus 404. As shown in fig. 4, the processor 401, the memory 402, and the communication interface 403 are connected via a bus 404 to complete communication therebetween.
The communication interface 403 is mainly used for implementing communication between modules, apparatuses, units and/or devices in this embodiment.
Bus 404 comprises hardware, software, or both, coupling the components of the electronic device to one another. By way of example, and not limitation, a bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local bus (VLB), or another suitable bus, or a combination of two or more of these. Bus 404 may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the application, any suitable buses or interconnects are contemplated by the application.
The electronic device may execute the data processing method in the embodiment of the present application, so as to implement the data processing method and apparatus described in conjunction with fig. 1 to 3.
In addition, in combination with the data processing method in the foregoing embodiments, the embodiments of the present application may provide a computer storage medium for implementing the method. The computer storage medium has computer program instructions stored thereon, and the computer program instructions, when executed by a processor, implement any of the data processing methods in the above embodiments.
It is to be understood that the present application is not limited to the particular arrangements and instrumentality described above and shown in the attached drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present application are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications, and additions or change the order between the steps after comprehending the spirit of the present application.
The functional blocks shown in the above-described structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the present application are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.
It should also be noted that the exemplary embodiments mentioned in this application describe some methods or systems based on a series of steps or devices. However, the present application is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
Aspects of the present application are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such a processor may be, but is not limited to, a general purpose processor, a special purpose processor, an application specific processor, or a field programmable logic circuit. It will also be understood that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based computer instructions which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
As described above, only the specific embodiments of the present application are provided, and it can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, the module and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. It should be understood that the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present application, and these modifications or substitutions should be covered within the scope of the present application.

Claims (10)

1. A data processing method, comprising:
in response to a first input by a first user to first original relational data, determining a first object and a second object to which the first original relational data relates;
determining a first node corresponding to the first object in a graph database and a second node corresponding to the second object in the graph database, wherein nodes in the graph database correspond to objects, no more than one edge connects any two nodes, the edge is used for indicating that a relationship exists between the objects corresponding to the two nodes connected by the edge, the edge is provided with different attribute hierarchies, and the different attribute hierarchies are used for storing index information of different types of original relational data;
displaying a first attribute hierarchy corresponding to a first edge between a first node and a second node in the graph database;
in response to a second input of a first user to a target attribute hierarchy of the first attribute hierarchies, determining first index information stored in the target attribute hierarchy;
acquiring second original relational data of a target attribute hierarchy between the first object and the second object from the relational database according to the first index information;
and displaying the second original relational data.
2. The method of claim 1, wherein prior to said determining the first object and the second object to which the first original relational data relates in response to the first input by the first user to the first original relational data, the method further comprises:
responding to a third input of the first user to the at least one piece of original relational data, and acquiring the at least one piece of original relational data;
for each piece of original relational data in the at least one piece of original relational data, respectively executing the following steps to obtain the graph database:
determining a second attribute hierarchy corresponding to the original relational data, second index information of the original relational data, and a third object and a fourth object related to the original relational data according to the original relational data;
establishing a second edge between a third node corresponding to the third object and a fourth node corresponding to the fourth object;
creating the second attribute hierarchy as the attribute hierarchy corresponding to the second edge;
and storing the second index information into the second attribute hierarchy.
3. The method of claim 2, wherein prior to said obtaining at least one piece of raw relational data in response to a third input by the first user to the at least one piece of raw relational data, the method further comprises:
acquiring an object list;
and establishing a node corresponding to each object in the graph database according to the object list.
4. The method of claim 2, wherein after said storing said second index information into said second attribute hierarchy, said method further comprises:
in response to a fourth input of a fifth object, a sixth object and a first condition by the first user, determining all first paths in the graph database, which connect a fifth node corresponding to the fifth object and a sixth node corresponding to the sixth object, wherein the first condition is determined by setting a defining condition for an attribute hierarchy of edges included in the first paths;
for each first path in all the first paths, respectively executing the following steps:
judging whether a third edge included in the first path meets the first condition or not;
displaying the first path if a third edge included in the first path satisfies the first condition.
5. The method of claim 4, wherein after the determining whether the first condition is satisfied by a third edge included in the first path, the method further comprises:
displaying a third attribute hierarchy corresponding to a third edge included in the first path under the condition that the third edge included in the first path meets the first condition;
in response to a fifth input by the first user to a fourth one of the third attribute hierarchies, determining third index information stored in the fourth attribute hierarchy, the fourth attribute hierarchy being an attribute hierarchy of a fourth one of the third edges;
and acquiring third original relational data from the relational database according to the third index information.
6. The method of claim 2, wherein after said storing said second index information into said second attribute hierarchy, said method further comprises:
in response to a sixth input by the first user of a seventh object and a second condition, determining a corresponding seventh node of the seventh object in the graph database;
determining all second paths passing through the seventh node;
for each second path in all the second paths, respectively executing the following steps:
judging whether a fifth edge included in the second path meets the second condition, wherein the second condition is determined by setting a limiting condition on the attribute hierarchy of the edge included in the second path;
and displaying the second path under the condition that a fifth edge included in the second path meets the second condition.
7. A data processing apparatus, characterized in that the apparatus comprises:
a first determining module for determining a first object and a second object to which first original relational data relates in response to a first input of the first original relational data by a first user;
a second determining module, configured to determine a first node corresponding to the first object in a graph database and a second node corresponding to the second object in the graph database, where nodes in the graph database correspond to objects, no more than one edge connects any two nodes, the edge is used to indicate that a relationship exists between the objects corresponding to the two nodes connected by the edge, and the edge is provided with different attribute hierarchies used to store index information of different types of original relational data;
the first display module is used for displaying a first attribute hierarchy corresponding to a first edge between a first node and a second node in the graph database;
a third determining module, configured to determine, in response to a second input by the first user to a target attribute hierarchy of the first attribute hierarchies, first index information stored in the target attribute hierarchy;
a first obtaining module, configured to obtain, from the relational database, second original relational data of a target attribute hierarchy between the first object and the second object according to the first index information;
and the second display module is used for displaying the second original relational data.
8. An electronic device, characterized in that the device comprises: a processor and a memory storing computer program instructions;
the processor, when executing the computer program instructions, implements a data processing method as claimed in any one of claims 1-6.
9. A computer storage medium, characterized in that it has stored thereon computer program instructions which, when executed by a processor, implement a data processing method according to any one of claims 1 to 6.
10. A computer program product, characterized in that instructions in the computer program product, when executed by a processor of an electronic device, cause the electronic device to perform the data processing method according to any of claims 1-6.
CN202210390235.0A 2022-04-14 2022-04-14 Data processing method, device, equipment and medium Active CN114996297B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210390235.0A CN114996297B (en) 2022-04-14 2022-04-14 Data processing method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210390235.0A CN114996297B (en) 2022-04-14 2022-04-14 Data processing method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN114996297A (en) 2022-09-02
CN114996297B (en) 2023-09-26

Family

ID=83023471

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210390235.0A Active CN114996297B (en) 2022-04-14 2022-04-14 Data processing method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN114996297B (en)

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1349081A1 (en) * 2002-03-28 2003-10-01 LION Bioscience AG Method and apparatus for querying relational databases
US20170329871A1 (en) * 2016-05-13 2017-11-16 Tibco Software Inc. Using a b-tree to store graph information in a database
CN106874422A (en) * 2017-01-25 2017-06-20 东南大学 A kind of figure querying method of facing relation type database
US20180239796A1 (en) * 2017-02-21 2018-08-23 Linkedin Corporation Multi-tenant distribution of graph database caches
EP3511842A1 (en) * 2018-01-16 2019-07-17 Palantir Technologies Inc. Concurrent automatic adaptive storage of datasets in graph databases
CN109726305A (en) * 2018-12-30 2019-05-07 中国电子科技集团公司信息科学研究院 A kind of complex_relation data storage and search method based on graph structure
CN111930958A (en) * 2020-07-13 2020-11-13 车智互联(北京)科技有限公司 Graph database construction method, computing device and readable storage medium
CN112507354A (en) * 2020-12-04 2021-03-16 北京神州泰岳软件股份有限公司 Graph database-based authority management method and system
CN112988915A (en) * 2021-01-27 2021-06-18 厦门市健康医疗大数据中心(厦门市医药研究所) Data display method and device
CN112988752A (en) * 2021-03-29 2021-06-18 北京大米科技有限公司 Resource management method, device, storage medium and electronic equipment
CN112988758A (en) * 2021-04-26 2021-06-18 北京芯愿景软件技术股份有限公司 Target object positioning method and device, electronic equipment and storage medium
CN114116716A (en) * 2021-11-19 2022-03-01 天翼数字生活科技有限公司 Hierarchical data retrieval method, device and equipment
CN114118816A (en) * 2021-11-30 2022-03-01 建信金融科技有限责任公司 Risk assessment method, device and equipment and computer storage medium
CN113901279A (en) * 2021-12-03 2022-01-07 支付宝(杭州)信息技术有限公司 Graph database retrieval method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
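Ye Shuai (叶帅): "Research on construction and query methods of a coal mine domain knowledge graph based on Neo4j" (基于Neo4j的煤矿领域知识图谱构建及查询方法研究), Information Science and Technology series (信息科技辑) *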

Also Published As

Publication number Publication date
CN114996297B (en) 2023-09-26

Similar Documents

Publication Publication Date Title
US8332389B2 (en) Join order for a database query
EP2909752B1 (en) Profiling data with location information
US8326869B2 (en) Analysis of object structures such as benefits and provider contracts
US10565201B2 (en) Query processing management in a database management system
US9218394B2 (en) Reading rows from memory prior to reading rows from secondary storage
CN104731814A (en) System and method for flexibly comparing and analyzing data
CN108681603B (en) Method for rapidly searching tree structure data in database and storage medium
US20190034500A1 (en) Creating dashboards for viewing data in a data storage system based on natural language requests
CN104794130B (en) Relation query method and device between a kind of table
KR20120108886A (en) Two phase method for processing multi-way join query over data streams
Jiang et al. Incremental evaluation of top-k combinatorial metric skyline query
US11573987B2 (en) System for detecting data relationships based on sample data
CN115328883A (en) Data warehouse modeling method and system
Khan et al. Set-based unified approach for attributed graph summarization
US9984107B2 (en) Database joins using uncertain criteria
CN113722600A (en) Data query method, device, equipment and product applied to big data
CN111797095A (en) Index construction method and JSON data query method
CN114996297B (en) Data processing method, device, equipment and medium
EP2530609A1 (en) Apparatus and method of searching for instance path based on ontology schema
CN107609110B (en) Mining method and device for maximum multiple frequent patterns based on classification tree
CN110008239A (en) Logic based on precomputation optimization executes optimization method and system
CN113076322A (en) Commodity search processing method and device
Aladakatti et al. Raif-semantics: a robust automated interlinking framework for semantic web using mapreduce and multi-node data processing
US20190034555A1 (en) Translating a natural language request to a domain specific language request based on multiple interpretation algorithms
US11341147B1 (en) Finding dimensional correlation using hyperloglog

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant