CN113778990A

CN113778990A - Method and system for constructing distributed graph database

Info

Publication number: CN113778990A
Application number: CN202111022933.7A
Authority: CN
Inventors: 张超; 陈贺巍
Original assignee: Bairong Zhixin Beijing Credit Investigation Co Ltd
Current assignee: Bairong Zhixin Beijing Credit Investigation Co Ltd
Priority date: 2021-09-01
Filing date: 2021-09-01
Publication date: 2021-12-10

Abstract

The invention discloses a method and a system for constructing a distributed graph database. The method comprises: constructing a configuration management layer; constructing a computing layer, through which computing power is provided to the nodes of a computing engine; constructing a metadata layer, which maintains the metadata of edges, vertices and indexes; constructing a storage layer that is independent of the computing layer; and configuring a big data component used for node filtering, logic updating and data cleaning. This solves the technical problems of the prior art that, as data volume grows, single-machine vertical scaling cannot meet business demand, data maintenance is costly, data consistency and time backtracking cannot be guaranteed, and graph database performance degrades severely as scale increases. The technical effects are distributed horizontal scaling, the ability to absorb sudden surges in business load, guaranteed data consistency and time backtracking, and improved graph database performance as scale increases.

Description

Method and system for constructing distributed graph database

Technical Field

The invention relates to the field of graph database construction, in particular to a method and a system for constructing a distributed graph database.

Background

Since 2000, the explosive growth of internet data has brought two major changes to business: a dramatic increase in data volume and increasingly complex associations between data. At the same time, users expect more and more value from data and want the information on a customer across many dimensions to be presented clearly and comprehensively in a single graph. This is especially evident in industries with intricate relationships, such as social networking, logistics and financial risk control. In the past two years, graph databases have become increasingly popular, and graph databases suited to different kinds of scenarios have been adopted across business domains. According to the database trend rankings of DB-Engines, graph DBMSs are the fastest-growing database category, with popularity increasing year after year.

However, in the course of implementing the technical solutions of the embodiments of the present application, the inventors found that the above technology has at least the following technical problems:

when the data volume of a prior-art graph database grows, single-machine vertical scaling cannot meet business requirements, data maintenance is costly, data consistency cannot be guaranteed, time backtracking (querying data as it stood at a past point in time) cannot be guaranteed, and performance degrades severely as the graph database grows.

Disclosure of Invention

The embodiments of the present application provide a method and a system for constructing a distributed graph database, which solve the technical problems of the prior art that, as the data volume of a graph database grows, single-machine vertical scaling cannot meet business requirements, data maintenance is costly, data consistency and time backtracking cannot be guaranteed, and performance degrades severely as scale increases. They achieve distributed horizontal scaling, absorb sudden surges in business load, guarantee data consistency and time backtracking, and improve the processing performance of the graph database as its scale increases.

In view of the above problems, the embodiments of the present application provide a method and a system for constructing a distributed graph database.

In a first aspect, the present application provides a method for constructing a distributed graph database, the method comprising: constructing a configuration management layer; constructing a computing layer, through which computing power is provided to the nodes of a computing engine; constructing a metadata layer, which maintains the metadata of edges, vertices and indexes; constructing a storage layer that is independent of the computing layer; and configuring a big data component used for node filtering, logic updating and data cleaning.

In another aspect, the present application further provides a system for constructing a distributed graph database, the system comprising: a first building unit for building a configuration management layer; a second building unit for building a computing layer, through which computing power is provided to the nodes of a computing engine; a third building unit for building a metadata layer that maintains the metadata of edges, vertices and indexes; a fourth building unit for building a storage layer that is independent of the computing layer; and a first configuration unit for configuring a big data component used for node filtering, logic updating and data cleaning.

In a third aspect, the present application provides a system for constructing a distributed graph database, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the method according to the first aspect.

One or more technical solutions provided in the embodiments of the present application have at least the following technical effects or advantages:

by constructing a configuration management layer; constructing a computing layer that provides computing power to the nodes of a computing engine; constructing a metadata layer that maintains the metadata of edges, vertices and indexes; constructing a storage layer independent of the computing layer; and configuring a big data component for node filtering, logic updating and data cleaning, the graph database is formed from the configuration management layer, computing layer, metadata layer, storage layer and big data component. Because the computing layer is separated from the storage layer, only the resource that reaches a bottleneck needs to be scaled out. The big data component allocates big data engine resources to process and clean data. Together these achieve distributed horizontal scaling, absorb sudden surges in business load, guarantee data consistency and time backtracking, and improve the processing performance of the graph database as its scale increases.

The foregoing is only an overview of the technical solutions of the present application. So that the technical means of the present application can be understood more clearly and implemented according to this description, and so that the above and other objects, features and advantages become more readily apparent, specific embodiments of the present application are described in detail below.

Drawings

FIG. 1 is a schematic flow chart of a method for constructing a distributed graph database according to an embodiment of the present application;

FIG. 2 is a schematic diagram illustrating a process of cleaning a file according to a method for constructing a distributed graph database according to an embodiment of the present application;

FIG. 3 is a schematic flow chart illustrating a time backtracking process performed on a file according to a method for constructing a distributed graph database according to an embodiment of the present application;

FIG. 4 is a schematic flow chart illustrating further processing of time backtracking in a method of constructing a distributed graph database according to an embodiment of the present application;

FIG. 5 is a schematic structural diagram of a system for constructing a distributed graph database according to an embodiment of the present application;

FIG. 6 is a schematic structural diagram of an exemplary electronic device according to an embodiment of the present application.

Description of reference numerals: first building unit 11, second building unit 12, third building unit 13, fourth building unit 14, first configuration unit 15, electronic device 50, processor 51, memory 52, input device 53, output device 54.

Detailed Description

The embodiments of the present application provide a method and a system for constructing a distributed graph database, which solve the technical problems of the prior art that, as the data volume of a graph database grows, single-machine vertical scaling cannot meet business requirements, data maintenance is costly, data consistency and time backtracking cannot be guaranteed, and performance degrades severely as scale increases. They achieve distributed horizontal scaling, absorb sudden surges in business load, guarantee data consistency and time backtracking, and improve the processing performance of the graph database as its scale increases. Embodiments of the present application are described below with reference to the accompanying drawings. As those skilled in the art will appreciate, with the development of technology and the emergence of new scenarios, the technical solutions provided in the embodiments of the present application are equally applicable to similar technical problems.

The terms "first," "second," and the like in the description and in the claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and are merely descriptive of the various embodiments of the application and how objects of the same nature can be distinguished. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

Summary of the application

Since 2000, the explosive growth of internet data has brought a dramatic increase in data volume and increasingly complex associations between data, while users expect the information on a customer across many dimensions to be presented clearly in a single graph, especially in social networking, logistics and financial risk control. According to DB-Engines, graph DBMSs are the fastest-growing database category. However, when the data volume of a prior-art graph database grows, single-machine vertical scaling cannot meet business requirements, data maintenance is costly, data consistency and time backtracking cannot be guaranteed, and performance degrades severely as the graph database grows.

In view of the above technical problems, the technical solution provided by the present application has the following general idea:

the embodiments of the present application provide a method for constructing a distributed graph database, the method comprising: constructing a configuration management layer; constructing a computing layer, through which computing power is provided to the nodes of a computing engine; constructing a metadata layer, which maintains the metadata of edges, vertices and indexes; constructing a storage layer that is independent of the computing layer; and configuring a big data component used for node filtering, logic updating and data cleaning.

Having thus described the general principles of the present application, various non-limiting embodiments thereof will now be described in detail with reference to the accompanying drawings.

Example one

As shown in fig. 1, an embodiment of the present application provides a method for constructing a distributed graph database, where the method includes:

Step S100: constructing a configuration management layer;

Further, the configuration management layer includes:

Config: maintains the metadata configuration;

GraphUI: a visual graph database interface;

Schedule: schedules offline timed data tasks.

Specifically, the configuration management layer is the layer of the graph database that integrates resource configuration, graph queries and resource scheduling. It comprises Config, GraphUI and Schedule. Config maintains the general background-management configuration, namely the Schema (Edge, Vertex and Index), i.e. the metadata configuration; GraphUI is a visual interface to the graph database; and Schedule schedules offline timed data tasks. Building the configuration management layer lays the foundation for subsequent resource configuration, graph queries, the addition and modification of new data features, and the display of metadata dimensions.
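The Schema that Config maintains can be pictured as three registries, one each for vertex types, edge types and indexes. The following is a minimal sketch under stated assumptions, not the patent's implementation: the class names, the validation rules (an edge may only connect registered vertex types, an index may only cover existing properties) and the example type names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class VertexType:
    name: str
    properties: tuple[str, ...]


@dataclass(frozen=True)
class EdgeType:
    name: str
    src: str  # source vertex type
    dst: str  # destination vertex type
    properties: tuple[str, ...]


@dataclass(frozen=True)
class IndexDef:
    name: str
    target: str  # name of the vertex or edge type being indexed
    keys: tuple[str, ...]


@dataclass
class Schema:
    """Edge/Vertex/Index metadata as a Config-style component might keep it."""
    vertices: dict[str, VertexType] = field(default_factory=dict)
    edges: dict[str, EdgeType] = field(default_factory=dict)
    indexes: dict[str, IndexDef] = field(default_factory=dict)

    def add_vertex(self, v: VertexType) -> None:
        self.vertices[v.name] = v

    def add_edge(self, e: EdgeType) -> None:
        # An edge type may only connect vertex types that are already registered.
        for end in (e.src, e.dst):
            if end not in self.vertices:
                raise ValueError(f"unknown vertex type: {end}")
        self.edges[e.name] = e

    def add_index(self, idx: IndexDef) -> None:
        target = self.vertices.get(idx.target) or self.edges.get(idx.target)
        if target is None:
            raise ValueError(f"unknown index target: {idx.target}")
        missing = set(idx.keys) - set(target.properties)
        if missing:
            raise ValueError(f"index keys not in properties: {sorted(missing)}")
        self.indexes[idx.name] = idx

    def summary(self) -> dict[str, list[str]]:
        # The metadata "dimensions" a GraphUI-style page could display.
        return {
            "vertex_types": sorted(self.vertices),
            "edge_types": sorted(self.edges),
            "indexes": sorted(self.indexes),
        }


# Usage with hypothetical type names.
schema = Schema()
schema.add_vertex(VertexType("phone", ("number",)))
schema.add_vertex(VertexType("customer", ("customer_id",)))
schema.add_edge(EdgeType("applied_with", "phone", "customer", ("scene", "month")))
schema.add_index(IndexDef("idx_phone_number", "phone", ("number",)))
print(schema.summary())
```

Validating at registration time keeps malformed metadata out of the Schema, so adding a new edge, vertex or attribute type is a configuration change rather than a code change.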

Step S200: constructing a computing layer, and providing computing power for nodes of a computing engine through the computing layer;

Specifically, the computing layer is the layer of the graph database that provides computing power to the nodes of the computing engine (Node1 ... NodeN). It can also perform logical deduplication, filtering, backtracking computation and SQL parsing for the graph.

Step S300: constructing a metadata layer, which maintains the metadata of edges, vertices and indexes;

Step S400: constructing a storage layer, wherein the storage layer and the computing layer are independent of each other;

Further, the storage layer uses SSDs.

Specifically, the metadata layer is the layer of the graph database that holds the metadata operated on by the graph database; it maintains the metadata of edges, vertices and indexes, expressed as a Schema. The storage layer is the layer of the graph database responsible for data storage. It comprises a Storage Engine, for which the open-source component HBase is preferred. Because the storage layer is separated from the computing layer, when computing power later becomes the bottleneck, only the computing-layer nodes need to be scaled out and the underlying storage does not, which saves resources. The storage layer preferably uses SSDs (solid state drives, i.e. disks built from arrays of solid-state electronic storage chips). Separating the computing layer from the storage layer therefore improves performance and makes capacity expansion easier.
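The benefit of separating compute from storage can be illustrated with a small in-process model. This is a sketch under stated assumptions, not the patent's implementation: `StorageEngine` is a dict-backed stand-in for HBase, `ComputeNode` and `ComputePool` are hypothetical names, and round-robin dispatch is an assumed routing policy. Because compute nodes hold no data, scaling out the compute tier adds nodes without moving or copying any stored rows.

```python
import itertools


class StorageEngine:
    """Stand-in for the storage layer (HBase on SSD in the text): owns all data."""

    def __init__(self) -> None:
        self._rows: dict[str, dict[str, str]] = {}

    def put(self, row_key: str, column: str, value: str) -> None:
        self._rows.setdefault(row_key, {})[column] = value

    def get(self, row_key: str) -> dict[str, str]:
        return dict(self._rows.get(row_key, {}))

    def row_count(self) -> int:
        return len(self._rows)


class ComputeNode:
    """Stateless compute node: keeps only a handle to shared storage, no data."""

    def __init__(self, name: str, storage: StorageEngine) -> None:
        self.name = name
        self.storage = storage

    def neighbours(self, vertex: str) -> list[str]:
        # Dedup/filter logic runs here; the data itself always comes from storage.
        return sorted(set(self.storage.get(vertex).values()))


class ComputePool:
    """Compute tier that can be scaled out without touching the storage tier."""

    def __init__(self, storage: StorageEngine, size: int) -> None:
        self.storage = storage
        self.nodes = [ComputeNode(f"node{i + 1}", storage) for i in range(size)]
        self._rr = itertools.cycle(self.nodes)

    def scale_out(self, extra: int) -> None:
        start = len(self.nodes)
        self.nodes += [ComputeNode(f"node{start + i + 1}", self.storage) for i in range(extra)]
        self._rr = itertools.cycle(self.nodes)  # restart round-robin over the new set

    def query(self, vertex: str) -> tuple[str, list[str]]:
        node = next(self._rr)
        return node.name, node.neighbours(vertex)


# Usage: grow the compute tier from 2 to 5 nodes; storage is untouched.
storage = StorageEngine()
storage.put("13800000000", "loan:2021-08", "cust_A")
storage.put("13800000000", "card:2021-09", "cust_B")
pool = ComputePool(storage, size=2)
pool.scale_out(3)
print(len(pool.nodes), storage.row_count(), pool.query("13800000000"))
```

In an integrated design the same scale-out would also have to add storage capacity, which is the resource waste the text describes.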

Step S500: configuring a big data component, wherein the big data component is used for node filtering, logic updating and data cleaning.

Further, the big data component comprises Spark, Flink and MapReduce.

Specifically, the big data component is a big data Engine, i.e. a big data platform that performs node filtering, logic updating and data cleaning, and comprises Spark, Flink and MapReduce. The graph database is formed by the configuration management layer, the computing layer, the metadata layer, the storage layer and the big data component. Because the computing layer and the storage layer are separated, when computing power reaches a bottleneck only the computing-layer nodes need to be scaled out, and the underlying storage does not. By contrast, when the computing layer and the storage layer of a graph database are integrated, a bottleneck in either one forces both to be scaled out together, which wastes a great deal of resources. With the two layers separated, only the resource that reaches a bottleneck needs to be expanded, which saves resources. Preferably, all storage layers use SSDs, so that the separation of computing and storage effectively improves performance and makes expansion easier. The unified configuration management layer integrates configuration resources, graph queries and resource scheduling. The configuration resources maintain the Schema metadata, which makes it easy to add and modify new data features (edge types, vertex types and attribute types) and to display the metadata dimensions of the current graph database. Data in the graph database can be conveniently queried through GraphUI.
In addition, the Schedule scheduling resources classify data updates to the graph database and allocate the resources of the big data engine platform, so that data resources are processed, cleaned and filtered into the required data resource files. This achieves distributed horizontal scaling, absorbs sudden surges in business load, guarantees data consistency and time backtracking, and improves the processing performance of the graph database as its scale increases.
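The cleaning performed by the big data component (duplicate-value, missing-value and outlier handling, as step S510 describes for IDC1) would normally run as a Spark, Flink or MapReduce job. The pure-Python sketch below only shows the three operations on in-memory records. The field names, the median-absolute-deviation (MAD) outlier rule and the threshold `k` are assumptions, since the source does not specify them.

```python
from statistics import median

REQUIRED = ("phone", "customer_id", "scene", "ts")  # hypothetical field names


def clean_records(records, required=REQUIRED, amount_key="amount", k=5.0):
    # 1) Missing values: drop rows where any required field is absent or empty.
    complete = [r for r in records if all(r.get(f) not in (None, "") for f in required)]

    # 2) Duplicates: keep the first occurrence of each identical row.
    seen, unique = set(), []
    for r in complete:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # 3) Outliers: drop rows whose numeric field lies more than k MADs from
    #    the median (a robust rule; an assumption, not the patent's rule).
    values = [r[amount_key] for r in unique if amount_key in r]
    if len(values) < 3:
        return unique
    med = median(values)
    mad = median(abs(v - med) for v in values) or 1.0
    return [r for r in unique
            if amount_key not in r or abs(r[amount_key] - med) / mad <= k]


records = [
    {"phone": "138", "customer_id": "A", "scene": "loan", "ts": "2021-08-03", "amount": 100},
    {"phone": "138", "customer_id": "A", "scene": "loan", "ts": "2021-08-03", "amount": 100},
    {"phone": "139", "customer_id": "B", "scene": "card", "ts": "2021-08-05", "amount": 120},
    {"phone": "137", "customer_id": "", "scene": "loan", "ts": "2021-08-06", "amount": 90},
    {"phone": "136", "customer_id": "C", "scene": "loan", "ts": "2021-08-07", "amount": 110},
    {"phone": "135", "customer_id": "D", "scene": "loan", "ts": "2021-08-08", "amount": 100000},
]
print([r["phone"] for r in clean_records(records)])
```

A MAD rule is used here rather than mean and standard deviation because a single extreme value cannot inflate the threshold that is supposed to catch it.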

Further, as shown in FIG. 2, step S500 in the embodiment of the present application further includes:

Step S510: uniformly cleaning and filtering the data files in the IDC1 machine room;

Step S520: synchronizing the data files produced by the IDC1 machine room to a second Spark cluster in the IDC2 machine room through a first Spark cluster;

Step S530: loading the same data files into the HBase clusters of both the IDC1 and IDC2 machine rooms, the loading being performed through HBase bulkload.

Specifically, when tasks are scheduled across two machine rooms, RPC interface timeouts, network bandwidth, transactions, resource scheduling and similar factors can make the data in the two rooms inconsistent, so the idempotency of the data cannot be guaranteed. The data files are therefore cleaned and filtered uniformly in the IDC1 machine room; the cleaning and filtering generally comprises duplicate-value handling, missing-value handling and outlier handling. This ensures that the data files produced by the IDC1 machine room are unique. The data files produced by IDC1 are then synchronized through Spark to the Spark cluster of the IDC2 machine room; the copied files pass Hadoop's CRC32 checksum consistency check, so the data files in IDC2 are consistent with those in IDC1. Finally, both machine rooms load the same data files into their HBase clusters through HBase bulkload, which guarantees data consistency.
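The "produce once, verify the copy, load the same file twice" flow can be sketched with a checksum comparison and an idempotent loader. This is a minimal sketch under stated assumptions: Hadoop's own file checksum is computed differently (per-block CRCs combined into a composite value), so `zlib.crc32` here is only a stand-in, the cross-room copy is simulated by a dictionary write, and HBase bulkload is modelled as a plain overwrite by row key. All function names are hypothetical.

```python
import zlib


def crc32_of(data: bytes, chunk_size: int = 64 * 1024) -> int:
    """CRC32 computed incrementally over chunks, as one would for a large file."""
    crc = 0
    for start in range(0, len(data), chunk_size):
        crc = zlib.crc32(data[start:start + chunk_size], crc)
    return crc


def sync_file(name: str, payload: bytes, target_cluster: dict[str, bytes]) -> int:
    """Copy a cleaned file from IDC1 to IDC2 and reject it if the checksums differ."""
    expected = crc32_of(payload)
    target_cluster[name] = bytes(payload)  # stand-in for the cross-room copy
    actual = crc32_of(target_cluster[name])
    if actual != expected:
        del target_cluster[name]  # never load a file that failed verification
        raise OSError(f"checksum mismatch for {name}: {actual:#010x} != {expected:#010x}")
    return expected


def bulk_load(table: dict[str, str], lines: bytes) -> None:
    """Idempotent load: each row key is overwritten, so replaying a file changes nothing."""
    for line in lines.decode().splitlines():
        row_key, value = line.split(",", 1)
        table[row_key] = value


# Usage: one verified file is loaded into both rooms' tables.
payload = b"13800000000,loan:2021-08\n13900000000,card:2021-08\n"
idc2_files: dict[str, bytes] = {}
print(hex(sync_file("part-00000", payload, idc2_files)))
```

Because both rooms load byte-identical files and the load is idempotent, a retried or duplicated load after an RPC timeout leaves both rooms in the same state.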

Further, as shown in FIG. 3, step S600 in the embodiment of the present application further includes:

Step S610: obtaining an original file, the original file having an attribute flag and time information;

Step S620: partitioning the data of the original file by month according to its time information for storage;

Step S630: storing the customer information in the original file in the value field according to the attribute flag.

Specifically, the original file is any one of the data files and carries an attribute flag and time information. When an original file is received, its data is partitioned by month according to its time information and stored accordingly. This avoids updating the data every day, which would leave a large number of columns in the underlying storage and degrade its query performance. For example, the original file may contain a user's scene, mobile phone number, customer identifier and backtracking time; the sorted file to be loaded contains vertices, edge attributes and attribute values, where the vertex is the mobile phone number, the edge attribute is scene + month, and the attribute value is backtracking time + customer identifier. Based on the attribute flag carried by the original file, the specific customer information is stored in the value, so each incoming record adds only one entry to the value. This avoids deep computation at business query time, and this design records the time at which each relationship existed.
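The layout in the example (vertex = mobile phone number, edge attribute = scene + month, value = backtracking time + customer identifier) can be sketched as a wide-row map with a point-in-time read. This is a sketch under stated assumptions: the `scene:YYYY-MM` column format, the in-memory store and the `as_of` method are hypothetical illustrations of the idea, not the patent's storage code.

```python
from collections import defaultdict
from datetime import date


class MonthlyRelationStore:
    """Row = vertex (phone); column = scene + month; value = appended (time, customer) entries."""

    def __init__(self) -> None:
        self._rows = defaultdict(lambda: defaultdict(list))

    def ingest(self, phone: str, scene: str, customer_id: str, when: date) -> None:
        column = f"{scene}:{when:%Y-%m}"  # one column per scene per month, not per day
        self._rows[phone][column].append((when, customer_id))  # each record adds one entry

    def columns(self, phone: str) -> list[str]:
        return sorted(self._rows.get(phone, {}))

    def as_of(self, phone: str, cutoff: date) -> set[tuple[str, str]]:
        """Return the (scene, customer_id) relations that already existed on `cutoff`."""
        cutoff_month = f"{cutoff:%Y-%m}"
        result = set()
        for column, entries in self._rows.get(phone, {}).items():
            scene, month = column.rsplit(":", 1)
            if month > cutoff_month:
                continue  # the whole month lies after the cutoff; skip without scanning
            result.update((scene, cid) for when, cid in entries if when <= cutoff)
        return result


# Usage with hypothetical data.
store = MonthlyRelationStore()
store.ingest("13800000000", "loan", "cust_A", date(2021, 7, 3))
store.ingest("13800000000", "loan", "cust_B", date(2021, 7, 20))
store.ingest("13800000000", "loan", "cust_C", date(2021, 8, 2))
store.ingest("13800000000", "card", "cust_A", date(2021, 8, 15))
print(store.columns("13800000000"))
print(store.as_of("13800000000", date(2021, 7, 31)))
```

Bucketing by month bounds the number of columns per vertex to one per scene per month, while the timestamps kept inside each value still allow day-level backtracking.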

Further, as shown in FIG. 4, step S700 in the embodiment of the present application further includes:

Step S710: when querying the 3-degree relationship between two vertices, filtering the data obtained by each query by time according to a preset time-backtracking date, while also adding various filters to filter the data;

Step S720: deduplicating the edge relationships obtained by the query through multi-edge deduplication, and deduplicating the nodes on the call chain.

Specifically, although the storage engine supports backtracking storage of data, some logical filtering decisions are still needed in the computing engine. When querying the 3-degree relationship between two vertices, the data queried by each Thread is filtered by time according to the specified backtracking date, and various filters are added. This reduces the amount of data the underlying storage sends to the computing layer, pushes part of the computation down into the storage engine, and reduces network bandwidth usage. In addition, the edge relationships obtained by the query are deduplicated through multi-edge deduplication to reduce repeated calls among nodes, and the nodes on the call chain are deduplicated to avoid closed-loop relationship data. Finally, to reduce service unavailability caused by jitter, GC pauses or other faults in a single machine room, an MTTR (mean time to recovery) function is added: when a service fails or a query is slow, it queries the graph database of the other machine room instead, so that results are returned as quickly as possible and upper-layer business applications are not affected.
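The filtering and deduplication of steps S710 and S720 can be sketched as a bounded path search. This is a minimal, single-threaded sketch under stated assumptions: edges are `(u, v, when, edge_type)` tuples treated as undirected, the time and predicate filters are applied once before traversal to mimic pushing them down to storage, parallel edges between the same pair are collapsed into one adjacency entry, and a vertex already on the current call chain is never revisited. The function name and edge format are hypothetical.

```python
from datetime import date


def three_degree_paths(edges, src, dst, cutoff, filters=(), max_depth=3):
    """Enumerate simple paths of at most `max_depth` hops from src to dst.

    cutoff: time-backtracking date; edges created after it are ignored.
    filters: extra predicates f(u, v, when, edge_type) an edge must satisfy.
    """
    # Filter first, so only qualifying edges reach the traversal
    # (the analogue of shipping less data from storage to compute).
    adjacency: dict[str, set[str]] = {}
    for u, v, when, etype in edges:
        if when > cutoff or not all(f(u, v, when, etype) for f in filters):
            continue
        # Multi-edge dedup: parallel edges between the same pair collapse to one.
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    paths = []

    def walk(node, chain):
        if node == dst:
            paths.append(tuple(chain))
            return
        if len(chain) > max_depth:  # chain of n vertices = n - 1 hops
            return
        for nxt in sorted(adjacency.get(node, ())):
            if nxt in chain:
                continue  # already on the call chain: skip to avoid closed loops
            chain.append(nxt)
            walk(nxt, chain)
            chain.pop()

    walk(src, [src])
    return paths


edges = [
    ("A", "B", date(2021, 7, 1), "phone"),
    ("A", "B", date(2021, 7, 5), "device"),  # parallel edge, collapsed by dedup
    ("B", "C", date(2021, 7, 2), "phone"),
    ("C", "D", date(2021, 7, 3), "phone"),
    ("B", "D", date(2021, 7, 4), "blacklist"),
    ("A", "D", date(2021, 9, 1), "phone"),  # created after the cutoff below
]
print(three_degree_paths(edges, "A", "D", cutoff=date(2021, 8, 1)))
```

Without the multi-edge dedup, the two A-B edges would double every path through B; without the call-chain check, B-C-D-B style loops would be reported as relationships.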

In summary, the method and system for constructing a distributed graph database provided by the embodiments of the present application have the following technical effects:

1. By constructing a configuration management layer; constructing a computing layer that provides computing power to the nodes of a computing engine; constructing a metadata layer that maintains the metadata of edges, vertices and indexes; constructing a storage layer independent of the computing layer; and configuring a big data component for node filtering, logic updating and data cleaning, the graph database is formed from these layers and the big data component. Because the computing layer is separated from the storage layer, only the resource that reaches a bottleneck needs to be scaled out, and the big data component allocates big data engine resources to process and clean data. Together these achieve distributed horizontal scaling, absorb sudden surges in business load, guarantee data consistency and time backtracking, and improve the processing performance of the graph database as its scale increases.

2. Higher TPS: with more than 5 billion vertices and billion+ edge relationships, 2-degree queries reached more than 10,000 TPS in testing, with an average response time of 35 ms, more than 3 times the previous throughput; there is still room for further improvement as nodes are added.

3. Time backtracking and data volume: data backtracking is guaranteed, which largely meets the modeling needs of upper-layer business products. Because of the limitations of single-source components, the previous graph database could only support edge relationships on the order of about 10 billion, and its performance degraded severely. With the new graph database, the existing (stock) data has been loaded into it to form a very large relationship graph, currently at a scale of at least 5 billion+ vertices and billion+ edge relationships.

4. Consistency: the new graph database largely guarantees consistency between the two machine rooms. When the network fails, service is quickly switched to the other machine room while data consistency is preserved; such switchovers happen at least 12+ times a year, which greatly improves service stability.
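The MTTR fallback described for step S700, which also underlies the cross-room switching in item 4, can be sketched as a timed call with a fallback. This is a minimal sketch under stated assumptions: `primary` and `secondary` are hypothetical callables standing in for the two machine rooms' query endpoints, and the 0.2 s timeout is an arbitrary illustrative value, not a figure from the source.

```python
import concurrent.futures as cf
import time


def query_with_failover(primary, secondary, request, timeout_s=0.2):
    """Ask the local machine room first; fall back to the peer room on error or timeout."""
    pool = cf.ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(primary, request)
        try:
            return "primary", future.result(timeout=timeout_s)
        except Exception:
            # Either the primary raised, or it did not answer within timeout_s
            # (jitter, a GC pause, an overloaded node): serve from the peer room.
            return "secondary", secondary(request)
    finally:
        # Do not block on a slow primary; its late answer is simply discarded.
        pool.shutdown(wait=False)


# Usage with stand-in endpoints.
def fast_room(req):
    return f"ok:{req}"


def slow_room(req):
    time.sleep(0.6)
    return f"late:{req}"


print(query_with_failover(fast_room, fast_room, "q1"))
print(query_with_failover(slow_room, fast_room, "q2"))
```

Falling back is only safe because both rooms hold the same data (steps S510 to S530); otherwise a slow query could silently return a different answer from the peer room.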

Example two

Based on the same inventive concept as the method for constructing a distributed graph database in the foregoing embodiment, the present invention further provides a system for constructing a distributed graph database. As shown in FIG. 5, the system includes:

a first building unit 11, wherein the first building unit 11 is used for building a configuration management layer;

a second building unit 12, where the second building unit 12 is configured to build a computation layer, and provide computation power for nodes of a computation engine through the computation layer;

a third building unit 13, where the third building unit 13 is configured to build a metadata layer and maintain the metadata of edges, vertices and indexes;

a fourth construction unit 14, said fourth construction unit 14 being configured to construct a storage layer, wherein said storage layer is independent from said computation layer;

a first configuration unit 15, where the first configuration unit 15 is configured to configure a big data component, the big data component being used for node filtering, logic updating and data cleaning.

Further, the system further comprises:

a first cleaning unit for uniformly cleaning and filtering the data files in the IDC1 machine room;

a first synchronization unit for synchronizing the data files produced by the IDC1 machine room to a second Spark cluster in the IDC2 machine room through a first Spark cluster;

a first warehousing unit for loading the same data files into the HBase clusters of both the IDC1 and IDC2 machine rooms, the loading being performed through HBase bulkload.

Further, the system further comprises:

a first obtaining unit, configured to obtain an original file, where the original file has an attribute flag and time information;

a first storage unit for partitioning the data of the original file by month according to its time information for storage;

a second storage unit for storing the customer information in the original file in the value field according to the attribute flag.

Further, the system further comprises:

a first query unit for, when querying the 3-degree relationship between two vertices, filtering the data obtained by each query by time according to a preset time-backtracking date and adding various filters to filter the data;

a first deduplication unit for deduplicating the edge relationships obtained by the query through multi-edge deduplication and deduplicating the nodes on the call chain.

The variations and specific examples of the method for constructing a distributed graph database in Embodiment 1 of FIG. 1 also apply to the system for constructing a distributed graph database in this embodiment. From the foregoing detailed description of the method, those skilled in the art can clearly understand the system of this embodiment, so for brevity it is not described again here.

Exemplary electronic device

The electronic apparatus of the embodiment of the present application is described below with reference to fig. 6.

Fig. 6 illustrates a schematic structural diagram of an electronic device according to an embodiment of the present application.

Based on the same inventive concept as the method for constructing a distributed graph database in the foregoing embodiment, the present invention further provides a system for constructing a distributed graph database; an electronic device according to an embodiment of the present application is described below with reference to FIG. 6. The electronic device may be a mobile device itself or a stand-alone device independent of it, and stores a computer program that, when executed by a processor, carries out the steps of any of the methods described above.

As shown in fig. 6, the electronic device 50 includes one or more processors 51 and a memory 52.

The processor 51 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 50 to perform desired functions.

The memory 52 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 51 to implement the methods of the various embodiments of the application described above and/or other desired functions.

In one example, the electronic device 50 may further include: an input device 53 and an output device 54, which are interconnected by a bus system and/or other form of connection mechanism (not shown).

The embodiments of the invention provide a method for constructing a distributed graph database, comprising: constructing a configuration management layer; constructing a computing layer that provides computing power to the nodes of the computing engine; constructing a metadata layer that maintains the metadata of edges, vertices, and indexes; constructing a storage layer that is independent of the computing layer; and configuring a big data component used for node filtering, update logic, and data cleaning. This solves the technical problems of the prior art that, as data volume grows, single-machine vertical scaling cannot meet business demand, data maintenance costs are high, data consistency and time traceability cannot be guaranteed, and graph database performance degrades severely as scale increases. It achieves the technical effects of distributed horizontal scaling, coping with sudden business surges, guaranteeing data consistency and time traceability, and improving the processing performance of the graph database as its scale increases.
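The separation of layers summarized above can be sketched in a few classes. This is an illustrative model only: it assumes data is partitioned by a hash of the vertex ID (the text does not say how data is partitioned), and `MetadataLayer`, `StorageLayer`, and `ComputeNode` are hypothetical names. Because compute nodes hold no data of their own, a second node can read what the first one wrote, which is the property that lets the computing layer scale out independently of the storage layer.

```python
import zlib
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class MetadataLayer:
    """Schema metadata only: vertex tags, edge types, and indexes."""
    vertex_tags: dict[str, set[str]] = field(default_factory=dict)     # tag -> allowed properties
    edge_types: dict[str, set[str]] = field(default_factory=dict)      # edge type -> allowed properties
    indexes: dict[str, tuple[str, str]] = field(default_factory=dict)  # index name -> (tag, property)


class StorageLayer:
    """Holds the data, partitioned by a stable hash of the vertex ID.
    It never parses queries; it only serves reads and writes."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: list[dict[str, dict[str, Any]]] = [{} for _ in range(num_partitions)]

    def _partition(self, vid: str) -> dict[str, dict[str, Any]]:
        # crc32 is stable across processes, unlike Python's salted str hash
        return self.partitions[zlib.crc32(vid.encode()) % len(self.partitions)]

    def put_vertex(self, vid: str, record: dict[str, Any]) -> None:
        self._partition(vid)[vid] = record

    def get_vertex(self, vid: str) -> Optional[dict[str, Any]]:
        return self._partition(vid).get(vid)


class ComputeNode:
    """Stateless: keeps only references to the shared metadata and storage
    layers, so compute nodes can be added or removed without moving data."""

    def __init__(self, name: str, meta: MetadataLayer, storage: StorageLayer) -> None:
        self.name, self.meta, self.storage = name, meta, storage

    def insert_vertex(self, tag: str, vid: str, props: dict[str, Any]) -> None:
        allowed = self.meta.vertex_tags.get(tag)  # schema check via the metadata layer
        if allowed is None:
            raise KeyError(f"unknown tag {tag!r}")
        unknown = set(props) - allowed
        if unknown:
            raise ValueError(f"properties not in schema: {sorted(unknown)}")
        self.storage.put_vertex(vid, {"tag": tag, **props})

    def fetch_vertex(self, vid: str) -> Optional[dict[str, Any]]:
        return self.storage.get_vertex(vid)


meta = MetadataLayer(vertex_tags={"customer": {"name", "city"}})
storage = StorageLayer(num_partitions=4)
g1 = ComputeNode("g1", meta, storage)
g2 = ComputeNode("g2", meta, storage)
g1.insert_vertex("customer", "c42", {"name": "Li", "city": "Beijing"})
print(g2.fetch_vertex("c42"))  # {'tag': 'customer', 'name': 'Li', 'city': 'Beijing'}
```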

From the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software plus the necessary general-purpose hardware, or by special-purpose hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. In general, any function performed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structures used to implement the same function may vary, for example analog circuits, digital circuits, or dedicated circuits. For the present application, however, a software implementation is preferred. Based on this understanding, the technical solutions of the present application may essentially be embodied as a software product stored in a computer-readable storage medium, such as a floppy disk, USB flash drive, removable hard disk, ROM, RAM, magnetic disk, or optical disc, and including several instructions that cause a computer device to execute the methods described in the embodiments of the present application.

The above embodiments may be implemented wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partially in the form of a computer program product.

The computer program product includes one or more computer instructions. When these instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; the storage medium may be a magnetic medium (e.g., a floppy disk, hard disk, or tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid-state disk (SSD)), among others.

It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the phrases "in one embodiment" or "in an embodiment" appearing in various places throughout this specification do not necessarily all refer to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should also be understood that, in the various embodiments of the present application, the sequence numbers of the above processes do not imply an order of execution; the execution order of each process should be determined by its function and internal logic and should not limit the implementation of the embodiments of the present application in any way.

Additionally, the terms "system" and "network" are often used interchangeably herein. The term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.

It should be understood that, in the embodiments of the present application, "B corresponding to A" means that B is associated with A and that B can be determined from A. It should also be understood that determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information.

Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of each example have been described above generally in terms of their functions. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as departing from the scope of the present application.

In short, the above description is only a preferred embodiment of the present application and is not intended to limit its scope. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within its protection scope.

Claims

1. A method of constructing a distributed graph database, wherein the method comprises:

constructing a configuration management layer;

constructing a computing layer, and providing computing power for nodes of a computing engine through the computing layer;

constructing a metadata layer, the metadata layer maintaining the metadata of edges, vertices, and indexes;

constructing a storage layer, wherein the storage layer and the computing layer are independent of each other;

and configuring a big data component, wherein the big data component is used for node filtering, update logic, and data cleaning.

2. The method of claim 1, wherein the configuration management layer comprises:

a GraphUI interface;

a Schedule module, through which offline scheduled data tasks are dispatched.

3. The method of claim 1, wherein the big data component comprises Spark, Flink, and MapReduce.

4. The method of claim 1, wherein the storage tier comprises an SSD.

5. The method of claim 3, wherein the method comprises:

uniformly cleaning and filtering the data files in an IDC1 machine room;

synchronizing, through a first Spark cluster, the data files produced in the IDC1 machine room to a second Spark cluster in the IDC2 machine room;

and loading, by both the IDC1 machine room and the IDC2 machine room, the same data files into an HBase cluster, the data loading being performed through the bulkload of the HBase cluster.

6. The method of claim 1, wherein the method comprises:

obtaining an original file, wherein the original file carries an attribute flag and time information;

storing the original file partitioned by month according to its time information;

and storing the customer information in the original file as value results according to the attribute flag.

7. The method of claim 1, wherein the method comprises:

when the 3-degree relationship between two vertices is queried, applying time filtering to the data obtained by each query according to a preset backtracking date, while adding multiple filters for data filtering;

and deduplicating the edge relationships obtained by the query using multi-edge deduplication, and deduplicating the nodes on the call chain.

8. A system for constructing a distributed graph database, wherein said system comprises:

a first construction unit for constructing a configuration management layer;

a second construction unit for constructing a computing layer, the computing layer providing computing power to the nodes of the computing engine;

a third construction unit for constructing a metadata layer, the metadata layer maintaining the metadata of edges, vertices, and indexes;

a fourth construction unit for constructing a storage tier, wherein the storage tier is independent from the computing tier;

and a first configuration unit for configuring a big data component, wherein the big data component is used for node filtering, update logic, and data cleaning.

9. A system for constructing a distributed graph database, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method according to any one of claims 1 to 7 when executing the program.
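Claim 6 stores the original files partitioned by month and keeps the customer information in the value according to the attribute flag. The following is a minimal in-memory sketch under stated assumptions: each record is taken to be a `(customer_id, attribute_flag, record_date, customer_info)` tuple, the row key is assumed to be the customer ID, and `month_key`, `load_by_month`, and `lookup_as_of` are hypothetical names. The text does not specify the actual row-key or value layout. The lookup function shows one way such monthly partitions could support the time backtracking used in claim 7.

```python
from collections import defaultdict
from datetime import date
from typing import Iterable, Optional

# Hypothetical record layout: (customer_id, attribute_flag, record_date, customer_info)
Record = tuple[str, str, date, str]
Store = dict[str, dict[str, dict[str, str]]]  # month -> customer_id -> {flag: info}


def month_key(d: date) -> str:
    """Monthly partition key, e.g. date(2021, 9, 1) -> '202109'."""
    return f"{d.year:04d}{d.month:02d}"


def load_by_month(records: Iterable[Record]) -> Store:
    """Split records into monthly partitions. Within a partition the row key is
    the customer ID and the value maps attribute flag -> customer information.
    A later record for the same customer, flag, and month overwrites an earlier one."""
    store: Store = defaultdict(dict)
    for cust_id, flag, when, info in records:
        store[month_key(when)].setdefault(cust_id, {})[flag] = info
    return dict(store)


def lookup_as_of(store: Store, cust_id: str, flag: str, as_of: date) -> Optional[str]:
    """Time backtracking: newest value of `flag` in any month up to the month of `as_of`."""
    cutoff = month_key(as_of)
    for month in sorted(store, reverse=True):  # 'YYYYMM' strings sort chronologically
        if month <= cutoff and flag in store[month].get(cust_id, {}):
            return store[month][cust_id][flag]
    return None


records = [
    ("c1", "phone", date(2021, 1, 5), "phone-v1"),
    ("c1", "address", date(2021, 1, 20), "addr-v1"),
    ("c1", "phone", date(2021, 3, 2), "phone-v2"),
]
store = load_by_month(records)
print(sorted(store))                                          # ['202101', '202103']
print(lookup_as_of(store, "c1", "phone", date(2021, 2, 28)))  # phone-v1
```

Keeping each month in its own partition means a backtracking query only needs to read partitions up to the cutoff month, and a month can be reloaded without touching the others.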