CN115658978B - Graph database system multi-source data importing method and device - Google Patents


Info

Publication number: CN115658978B
Authority: CN (China)
Prior art keywords: data, graph database, source data, source, configuration file
Legal status: Active
Application number: CN202211419937.3A
Other languages: Chinese (zh)
Other versions: CN115658978A
Inventors: 王昌圆 (Wang Changyuan), 叶小萌 (Ye Xiaomeng)
Current assignee: Hangzhou Ouruozhi Technology Co ltd
Original assignee: Hangzhou Ouruozhi Technology Co ltd
Application filed by Hangzhou Ouruozhi Technology Co ltd
Priority to CN202211419937.3A
Publication of CN115658978A
Application granted
Publication of CN115658978B


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a multi-source data import method and device for a graph database system. The method comprises the following steps:

1. Reading multi-source data according to a preset configuration file. For each point or edge configuration item, the configuration file specifies the corresponding data source and source data system information.
2. Converting the multi-source data according to the configuration file. For each field of the source data, the configuration file specifies the mapping between that field and the graph database Schema attribute information.
3. Writing the converted data into the graph database and re-importing any data that failed to import.

The method and device solve the problem in the related art that large-scale multi-source data cannot be imported into a graph database simultaneously. They also improve the efficiency of importing multi-source data into a graph database.

Description

Graph database system multi-source data importing method and device
Technical Field
The application relates to the technical field of computers, and in particular to a multi-source data importing method and device for a graph database system.
Background
With the rapid development of big data and artificial intelligence, ultra-large-scale network graphs have broad application prospects in fields such as finance, risk control, security, and recommendation. This has spurred rapid growth of graph data. In every application field of graph databases, the first step of a graph application is to fill the graph with large-scale data. In real business, that data comes from many different sources. In the related art, large-scale multi-source data cannot be imported into a graph database simultaneously. Importing such data quickly and simultaneously is therefore a key problem that must be solved when applying graph databases.
At present, no effective solution has been proposed for the problem in the related art that large-scale multi-source data cannot be imported into a graph database simultaneously.
Disclosure of Invention
The embodiments of the application provide a multi-source data importing method and device for a graph database system. They aim at least to solve the problem in the related art that large-scale multi-source data cannot be imported into a graph database simultaneously.
In a first aspect, an embodiment of the present application provides a multi-source data importing method for a graph database system, where the method includes:
reading multi-source data according to a preset configuration file, wherein the configuration file configures corresponding data sources and source data system information for each point or edge configuration item;
converting the multi-source data according to the preset configuration file, wherein the configuration file configures, for each field of the source data, a mapping relation between the field and the graph database Schema attribute information;
writing the converted data into the graph database, and re-importing the data which fails to be imported.
In some embodiments, reading the multi-source data comprises:
verifying, according to the configuration file, whether the number and names of the columns of the multi-source data to be read are correct, and verifying, according to the mapping relation of point and edge attributes in the graph database, whether the number of columns and the data types are correct;
and, if verification passes, reading the multi-source data and clipping the rows of the data according to the configuration information.
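The verification-then-clipping step can be sketched as a single function. This is a minimal sketch under stated assumptions. The function name and parameters are hypothetical. "Clipping" is interpreted as keeping only the configured columns and, optionally, a configured maximum number of rows.

```python
from typing import Dict, List, Optional, Sequence


def validate_and_clip(
    header: Sequence[str],
    rows: Sequence[Sequence[object]],
    columns: Sequence[str],
    types: Dict[str, type],
    max_rows: Optional[int] = None,
) -> List[Dict[str, object]]:
    """Verify column names, column counts, and data types, then keep only the
    configured columns (and at most `max_rows` rows, an assumed option)."""
    missing = [c for c in columns if c not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    index = {name: i for i, name in enumerate(header)}
    selected = rows if max_rows is None else rows[:max_rows]
    out: List[Dict[str, object]] = []
    for n, row in enumerate(selected):
        # Column-count check against the header declared by the source.
        if len(row) != len(header):
            raise ValueError(f"row {n}: expected {len(header)} columns, got {len(row)}")
        record: Dict[str, object] = {}
        for col in columns:
            value = row[index[col]]
            # Data-type check against the type expected by the point/edge mapping.
            if not isinstance(value, types[col]):
                raise TypeError(f"row {n}, column {col!r}: expected {types[col].__name__}")
            record[col] = value
        out.append(record)
    return out
```

For example, `validate_and_clip(["uid", "uname", "age"], [["u1", "Ann", 30]], ["uid", "age"], {"uid": str, "age": int})` keeps only `uid` and `age`.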
In some embodiments, before the multi-source data to be read is verified, reading the multi-source data further comprises:
when the data source is a database or a data warehouse, determining driver information for the corresponding database, so as to verify the database connection and read data from the database;
and when the data source is a streaming data source system, determining the configuration information of that system to establish a connection with it, and periodically processing data records in micro-batches according to a data processing period.
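The periodic micro-batch processing of a streaming source can be sketched as a polling loop. This is a minimal sketch. `InMemoryStream` is a hypothetical stand-in for a stream topic whose records are addressed by offset. A real deployment would use the streaming system's own client and offset handling. The loop advances the offset only after a batch has been handled.

```python
import time
from typing import Callable, List, Sequence


class InMemoryStream:
    """Hypothetical stand-in for a stream topic; records are addressed by offset."""

    def __init__(self, records: Sequence[dict]):
        self._records = list(records)

    def poll(self, offset: int, max_records: int) -> List[dict]:
        return self._records[offset:offset + max_records]


def run_micro_batches(
    stream: InMemoryStream,
    start_offset: int,
    period_s: float,
    max_records: int,
    cycles: int,
    handle_batch: Callable[[List[dict]], None],
) -> int:
    """Poll the stream once per data processing period and hand on each micro-batch.

    Returns the offset to resume from (one past the last handled record).
    """
    offset = start_offset
    for _ in range(cycles):
        batch = stream.poll(offset, max_records)
        if batch:
            handle_batch(batch)
            offset += len(batch)  # advance only after the batch was handled
        time.sleep(period_s)  # wait for the next data processing period
    return offset
```

With five records, `max_records=2`, and four cycles, the handler receives batches of sizes 2, 2, and 1, and the function returns offset 5.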
In some embodiments, the configuration file further includes a service address of the graph database, and converting the multi-source data comprises:
determining the Schema attribute information of the graph database according to the service address;
determining, according to the mapping relation between the fields of the source data and the graph database Schema attribute information, the target data type in the graph database that corresponds to each field of the source data;
determining whether the data type of each field conforms to its target data type, and, if so, encoding the data to construct point and edge data structures supported by the graph database.
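The mapping from source fields to target data types, and the conformance check, can be sketched as follows. This assumes the Schema query has already returned a snapshot shaped like `GRAPH_SCHEMA`. The dictionary layout, the type names (`string`, `int64`, `double`), and the `__id__` marker for the id column are all hypothetical. The patent does not fix them.

```python
from typing import Dict, Tuple

# Hypothetical Schema snapshot standing in for the reply to a Schema query.
GRAPH_SCHEMA = {
    "user": {
        "vid_type": "string",
        "props": {"name": "string", "age": "int64", "score": "double"},
    },
}

# Assumed correspondence between target Schema types and Python types.
PY_TYPES = {"string": (str,), "int64": (int,), "double": (int, float)}


def build_mapping_table(
    schema: Dict[str, dict], type_name: str, id_field: str, field_mapping: Dict[str, str]
) -> Dict[str, Tuple[str, str]]:
    """Map each source field to (Schema attribute name, target data type)."""
    entry = schema[type_name]
    table = {id_field: ("__id__", entry["vid_type"])}
    for src_field, attr in field_mapping.items():
        if attr not in entry["props"]:
            raise KeyError(f"{type_name} has no attribute {attr!r}")
        table[src_field] = (attr, entry["props"][attr])
    return table


def conforms(value: object, target_type: str) -> bool:
    """True if a source value already has the target data type."""
    expected = PY_TYPES.get(target_type)
    return expected is not None and isinstance(value, expected) and not isinstance(value, bool)
```

Only fields that pass `conforms` would go on to the encoding step. The others follow the error policy configured by the user.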
In some embodiments, writing the converted data into the graph database comprises:
partitioning the converted data according to the number of partitions preset in the configuration file, writing each partition independently, and balancing the load across the service nodes of the distributed graph database system, wherein:
during load balancing, a polling (round-robin) strategy is used when establishing multiple connections, so that requests are routed to available graph database services according to a service state table; a random strategy is used when establishing a session, shuffling the services of the graph database system so that requests are spread across different service nodes of the graph database.
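The two balancing strategies can be sketched together in one small router class:

- Round-robin ("polling") over services marked available in a state table, used when opening connections.
- A random shuffle of the available services, used when opening a session.

This is a minimal sketch. The class and method names are hypothetical, and the state table is updated by hand here. In the described tool, a scheduled task would keep it current.

```python
import itertools
import random
from typing import Dict, List, Optional


class ServiceRouter:
    """Routes connections and sessions across graph database service nodes."""

    def __init__(self, addresses: List[str], seed: Optional[int] = None):
        self._addresses = list(addresses)
        # Service state table: address -> currently available?
        self.state: Dict[str, bool] = {a: True for a in self._addresses}
        self._cycle = itertools.cycle(self._addresses)
        self._rng = random.Random(seed)

    def mark(self, address: str, available: bool) -> None:
        self.state[address] = available

    def next_connection_target(self) -> str:
        """Polling (round-robin): return the next service marked available."""
        for _ in range(len(self._addresses)):
            address = next(self._cycle)
            if self.state[address]:
                return address
        raise RuntimeError("no available graph database service")

    def session_targets(self) -> List[str]:
        """Random strategy: shuffle the available services for a new session."""
        hosts = [a for a in self._addresses if self.state[a]]
        self._rng.shuffle(hosts)
        return hosts
```

Round-robin spreads long-lived connections evenly. Shuffling per session keeps many concurrent sessions from all sending their first requests to the same node.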
In some embodiments, when multi-source data is imported in batches, writing the converted data into the graph database further comprises, before the data in each partition is written:
storing point or edge data in a buffer, and, when the amount of data in the buffer reaches a batch size preset in the configuration file, generating insert statements of the graph database system for all the data in the buffer at once.
In some embodiments, after the converted data is written into the graph database, the method comprises:
performing import-parameter tuning tests according to the resource configuration of the database to improve import performance, wherein the tuned parameters comprise: the amount of data sent to the server in one request, the number of partitions of the source data, and the number of executors or executor cores allocated to the import task.
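One simple way to run such tuning tests is an exhaustive grid search: run one trial import per parameter combination and keep the fastest. This is a sketch under assumptions. The patent does not say how the tests are organized. The `run_trial` callback, which would run a trial import and return measured throughput, is hypothetical.

```python
import itertools
from typing import Callable, Dict, List, Tuple


def tune_import_params(
    grid: Dict[str, List[int]],
    run_trial: Callable[[Dict[str, int]], float],
) -> Tuple[Dict[str, int], float]:
    """Exhaustive grid search over import parameters.

    `run_trial` is assumed to perform a trial import with the given parameters
    and return a throughput figure (e.g. records written per second).
    """
    names = list(grid)
    best_params: Dict[str, int] = {}
    best_rate = float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        rate = run_trial(params)
        if rate > best_rate:
            best_params, best_rate = params, rate
    return best_params, best_rate
```

A grid such as `{"batch": [128, 256, 512], "partitions": [16, 32], "executors": [2, 4]}` covers the three kinds of parameter the text names. The grid grows multiplicatively, so coarse values are advisable.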
In a second aspect, an embodiment of the present application provides a multi-source data importing apparatus for a graph database system, where the apparatus includes:
the reading module is configured to read multi-source data according to a preset configuration file, wherein the configuration file configures a corresponding data source and source data system information for each point or edge configuration item;
the conversion module is configured to convert the multi-source data according to the preset configuration file, wherein the configuration file configures, for each field of the source data, a mapping relation between the field and the graph database Schema attribute information;
and the writing module is used for writing the converted data into the graph database and re-importing the data which fails to be imported.
In a third aspect, an embodiment of the present application provides an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the computer program to execute the graph database system multi-source data importing method.
In a fourth aspect, an embodiment of the present application provides a storage medium, in which a computer program is stored, where the computer program is configured to execute the graph database system multi-source data importing method when running.
Compared with the prior art, the method provided by the embodiments of the application works in three steps:

1. It reads multi-source data according to a preset configuration file. The file configures the data source and source data system information for each point or edge configuration item.
2. It converts the multi-source data according to the configuration file, which maps each field of the source data to the graph database Schema attribute information.
3. It writes the converted data into the graph database and re-imports any data that failed to import.

Data from multiple different data sources can therefore be imported into the graph database simultaneously. This solves the problem in the related art that large-scale multi-source data cannot be imported into a graph database simultaneously, and improves the efficiency of importing multi-source data into a graph database.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a diagram of an application environment of a graph database system multi-source data import method according to an embodiment of the present application;
FIG. 2 is a flow diagram of a graph database system multi-source data import method according to an embodiment of the present application;
FIG. 3 is a block diagram illustrating an overall flow of importing a multi-source data into a graph database according to an embodiment of the present application;
FIG. 4 is a block diagram of a graph database system multi-source data import apparatus according to an embodiment of the present application;
FIG. 5 is an internal structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless otherwise defined, technical or scientific terms used herein have the meaning commonly understood by a person of ordinary skill in the art to which this application belongs. The terms "a", "an", "the", and similar referents in this application do not limit quantity and may indicate either the singular or the plural. The terms "including", "comprising", "having", and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or modules (units) is not limited to the listed steps or units. It may include other steps or units that are not expressly listed or that are inherent to the process, method, article, or apparatus. The terms "connected", "coupled", and the like are not limited to physical or mechanical connections and may include electrical connections, whether direct or indirect. "A plurality" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" may mean that A exists alone, that A and B both exist, or that B exists alone. The terms "first", "second", "third", and the like merely distinguish similar objects and do not denote a particular order of the objects.
The multi-source data importing method provided by the application can be applied in the application environment shown in FIG. 1, a schematic diagram of that environment according to an embodiment of the application. As shown in FIG. 1, the terminal 102 communicates with the server 104 over a network. A user prepares a configuration file in advance through the terminal 102. In the configuration file, a data source and source data system information are configured for each point or edge configuration item. A mapping between each field of the source data and the graph database Schema attribute information is also configured. Data import is then started by submitting a command, so the user only needs to fill in the configuration file, without writing any code. The Schema of a graph database refers to the metadata stored in it: the point types and edge types in the graph and, for each type, information such as attribute names, attribute data types, default values, and attribute settings. When the server 104 receives a data import command from the user, it reads the multi-source data according to the preset configuration file and converts it according to the same file. It then writes the converted data into the graph database and re-imports any data that failed to import. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, or portable wearable device. The server 104 may be implemented as an independent server or as a server cluster composed of multiple servers.
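The configuration file described above can be sketched as one entry per point or edge configuration item. Each entry carries its own data source settings and its own field-to-attribute mapping. This is a minimal sketch in JSON with a small loader. All key names, addresses, ports, and type names here are hypothetical placeholders. The patent does not specify the file format.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ItemConfig:
    kind: str               # "point" or "edge"
    name: str               # point or edge type in the graph Schema
    source: Dict[str, str]  # data source type plus source-system settings
    fields: Dict[str, str]  # source field -> graph Schema attribute name


@dataclass
class ImportConfig:
    graph_address: str      # service address of the graph database
    items: List[ItemConfig] = field(default_factory=list)


def load_config(text: str) -> ImportConfig:
    """Parse the configuration and reject unknown item kinds."""
    raw = json.loads(text)
    items = [ItemConfig(**item) for item in raw["items"]]
    for item in items:
        if item.kind not in ("point", "edge"):
            raise ValueError(f"unknown item kind: {item.kind}")
    return ImportConfig(graph_address=raw["graph_address"], items=items)


# Hypothetical example: a point type fed from a CSV file and an edge type fed from a stream.
EXAMPLE = """
{
  "graph_address": "127.0.0.1:9669",
  "items": [
    {"kind": "point", "name": "user",
     "source": {"type": "file", "path": "/data/users.csv"},
     "fields": {"uid": "id", "uname": "name"}},
    {"kind": "edge", "name": "follows",
     "source": {"type": "stream", "server": "10.0.0.5:9092", "topic": "follows"},
     "fields": {"src": "src_id", "dst": "dst_id", "since": "start_time"}}
  ]
}
"""
```

Because each item names its own source, one import task can mix file, database, and streaming sources.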
This embodiment provides a multi-source data importing method for a graph database system. FIG. 2 is a flowchart of the method according to an embodiment of the application. As shown in FIG. 2, the flow includes the following steps:
step S201, reading multi-source data according to a preset configuration file, wherein the configuration file configures corresponding data sources and source data system information for each point or edge configuration item;
for example, when a user imports data from different data sources into the graph database system, the data is read according to the configuration file preset by the user. The number and names of the columns are verified against the configuration file. The number of columns and the data types are verified against the configured mapping of point and edge attributes in the graph database. If verification passes, the data is read and clipped by row and column according to the configuration information. Data may be read from several different sources within the same run. To do so, the user configures multiple point or edge configuration items in the configuration file in advance. In each configuration item, the user specifies the file address, server address, port number, and any other settings specific to the corresponding data source and source data system;
step S202, converting the multi-source data according to the preset configuration file, wherein the configuration file configures, for each field of the source data, a mapping relation between the field and the graph database Schema attribute information;
for example, a request is sent to the graph database service at the service address configured in the configuration file. The request queries the graph Schema configured by the user, including the data type of point ids, the set of point and edge attributes, and the data type of each attribute. At the same time, the data fields of the source data system in the configuration file are mapped to the attribute fields in the graph database. This yields a mapping table from source data fields to target data types in the graph database system;
data is then filtered according to the format requirements that the graph database Schema imposes on point ids and edge ids. Sometimes id or rank data does not match the target Schema, for example because the data type is wrong or the value is empty. In that case, the record is either logged or the current import task is terminated, depending on whether the current data source is a streaming data source, so that invalid data never enters the graph database. Data that meets the requirements is encoded to construct point and edge data structures supported by the graph database;
step S203, writing the converted data into the graph database, and re-importing the data which fails to be imported.
Steps S201 to S203 address the problem in the related art that large-scale multi-source data cannot be imported into a graph database simultaneously. Multi-source data is read according to a preset configuration file that configures data sources and source data system information for each point or edge configuration item. The data is converted according to the configuration file, which maps each field of the source data to the graph database Schema attribute information. The converted data is written into the graph database, and data that failed to import is re-imported. Data from multiple different sources can thus be imported into the graph database at the same time, which improves the efficiency of importing multi-source data into a graph database.
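The id filtering in step S202 can be sketched as follows. Records with an empty or wrongly typed id are logged and skipped for a streaming source, but abort the task for any other source. This is a minimal sketch. `ImportAborted`, the logger name, and the `int64`/`string` id type names are hypothetical.

```python
import logging
from typing import Iterable, List

logger = logging.getLogger("importer")


class ImportAborted(Exception):
    """Raised to terminate the current (non-streaming) import task."""


def id_is_valid(value: object, vid_type: str) -> bool:
    """Empty ids are never valid; otherwise the Python type must match."""
    if value is None or value == "":
        return False
    if vid_type == "int64":
        return isinstance(value, int) and not isinstance(value, bool)
    if vid_type == "string":
        return isinstance(value, str)
    return False


def filter_ids(records: Iterable[dict], id_field: str, vid_type: str, streaming: bool) -> List[dict]:
    kept: List[dict] = []
    for rec in records:
        if id_is_valid(rec.get(id_field), vid_type):
            kept.append(rec)
        elif streaming:
            # Streaming source: log the bad record and keep consuming the stream.
            logger.warning("dropping record with invalid id: %r", rec)
        else:
            # Bounded source: stop the task so no invalid data reaches the graph.
            raise ImportAborted(f"invalid id in record {rec!r}")
    return kept
```

The asymmetry is deliberate. A bounded source can be fixed and re-run, whereas stopping a stream for one bad record would stall every later record.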
FIG. 3 is a schematic diagram of the overall flow of importing multi-source data into a graph database according to an embodiment of the application. As shown in FIG. 3, a single import task may be configured with several different data sources at once, and the imported data may be multi-source and heterogeneous. Supported data source types include file data sources, database data sources, data warehouse data sources, and streaming data sources. Together they open data transmission channels between the major storage systems and the graph database. A file data source may reside on a local disk or on a remote distributed file system. When reading from a database or data warehouse, the driver of the corresponding database must be specified so that the connection can be verified and the data read. An executable query statement may also be specified to customize which data is fetched. When reading a streaming data source, settings such as the topic, the offset of the stream data, whether the stream may be read repeatedly, and the data processing period must be specified. A streaming source is read differently from other sources. A connection channel is established with the streaming system according to this configuration, and the tool waits for stream data to arrive. The received records are then processed in batches periodically according to the data processing period.
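Reading from heterogeneous sources can be sketched as a dispatch table keyed by the configured source type. This is a minimal sketch. The config keys (`type`, `path`, `driver`, `url`, `query`) are hypothetical. Only a local CSV file reader and Python's built-in `sqlite3` driver are wired up, standing in for the database drivers a real tool would load. Streaming sources would follow the micro-batch pattern instead.

```python
import csv
import sqlite3
from typing import Callable, Dict, Iterator


def read_file(cfg: Dict[str, str]) -> Iterator[dict]:
    """File source: a local CSV file with a header row."""
    with open(cfg["path"], newline="", encoding="utf-8") as fh:
        yield from csv.DictReader(fh)


def read_database(cfg: Dict[str, str]) -> Iterator[dict]:
    """Database source: the configured driver selects the connector, and the
    configured query customizes which data is fetched."""
    if cfg["driver"] != "sqlite3":  # only one driver is wired up in this sketch
        raise ValueError(f"unsupported driver: {cfg['driver']}")
    conn = sqlite3.connect(cfg["url"])
    try:
        conn.execute("SELECT 1")  # connection check before reading
        cursor = conn.execute(cfg["query"])
        names = [col[0] for col in cursor.description]
        for row in cursor:
            yield dict(zip(names, row))
    finally:
        conn.close()


READERS: Dict[str, Callable[[Dict[str, str]], Iterator[dict]]] = {
    "file": read_file,
    "database": read_database,
}


def read_source(cfg: Dict[str, str]) -> Iterator[dict]:
    """Dispatch on the configured source type; readers are lazy generators."""
    return READERS[cfg["type"]](cfg)
```

For example, `read_source({"type": "database", "driver": "sqlite3", "url": "src.db", "query": "SELECT src, dst FROM follows"})` yields one dictionary per row. Here `src.db` and the `follows` table are hypothetical.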
Point data includes a point id and point attributes. Edge data includes a source point id, a destination point id, an edge rank, and edge attributes. The data may therefore be encoded as follows. Data is converted according to the id type of the graph and the mapping table from source data to target data types, so that id data and attribute data take data types and formats the graph database supports. For example, for geographic location types, Point-type data is converted with the Point() function of the graph database. For time types, datetime-type data is converted with the datetime() function of the graph database. Special characters in String-type data are escaped. After conversion, the source data is built into point or edge objects one by one. If data does not meet the format requirements during this process, the import task is either terminated or the record is discarded, according to the user's configuration. The user therefore does not need to know the Schema of the data in the data source. It is enough to configure the mapping from source data fields to graph database Schema attribute names, and the DataProcessor converts the data format automatically based on the target Schema.
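The per-type encoding can be sketched as rendering each source value as a query-language literal. The `Point(...)` and `datetime(...)` call shapes follow the function names given in the text. The exact names and argument order in a particular graph database may differ; some systems expose the geographic constructor as `ST_Point`. The type names and the `terminate`/`discard` policy values are hypothetical.

```python
from datetime import datetime
from typing import Dict, Optional, Tuple


def escape_string(text: str) -> str:
    """Quote a string, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def encode_value(value: object, target_type: str) -> str:
    """Render one source value as a literal of the target Schema type."""
    if target_type == "string":
        return escape_string(str(value))
    if target_type == "int64":
        return str(int(value))
    if target_type == "double":
        return repr(float(value))
    if target_type == "datetime":
        dt = value if isinstance(value, datetime) else datetime.fromisoformat(str(value))
        return f'datetime("{dt.isoformat()}")'
    if target_type == "point":
        longitude, latitude = value
        return f"Point({float(longitude)}, {float(latitude)})"
    raise ValueError(f"unsupported target type: {target_type}")


def encode_record(
    record: Dict[str, object], mapping: Dict[str, Tuple[str, str]], on_error: str = "discard"
) -> Optional[Dict[str, str]]:
    """Encode a record via the mapping table; discard it (None) or re-raise on bad data."""
    try:
        return {attr: encode_value(record[src], ttype) for src, (attr, ttype) in mapping.items()}
    except (KeyError, TypeError, ValueError):
        if on_error == "terminate":
            raise
        return None
```

For instance, `encode_value('say "hi"', "string")` escapes the inner quotes, and `encode_value("2022-11-14T08:30:00", "datetime")` wraps the ISO timestamp in `datetime(...)`.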
Optionally, in some embodiments, data may be written in a distributed manner to increase the concurrency of data conversion and writing and so improve import performance. The converted data is partitioned, with the number of partitions specified in the configuration file, and each partition is written independently. Before writing, each partition obtains a database connection and selects the target database to import into. Load across the service nodes of the distributed database system is balanced across partitions, and the balancing logic is built into the import tool. When a session is established, a random strategy shuffles the services of the graph database system so that insert requests are spread across different service nodes. Multiple layers of distributed data processing and request sending raise the overall concurrency of the import into the distributed graph database and improve write performance. Meanwhile, a scheduled task in the import tool maintains the availability state of the distributed graph database services. When multiple connections are established, a polling (round-robin) strategy routes them to available graph database services according to the service state table. This balances the connection load across the services and keeps the service highly available. In short, distributed data processing, round-robin connection establishment, and randomly distributed write requests together increase the concurrency of data import and improve overall import performance.
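The partition-then-write-concurrently structure can be sketched with a thread pool. This is a minimal sketch. Hashing on the record id with `zlib.crc32` is an assumed partitioning scheme; the text only says the partition count comes from the configuration file. The `write_partition` callback, which would open a connection and send the partition's batches, is hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def partition_records(records: List[dict], num_partitions: int, key: str) -> List[List[dict]]:
    """Split records into `num_partitions` lists by a stable hash of `key`."""
    parts: List[List[dict]] = [[] for _ in range(num_partitions)]
    for rec in records:
        # crc32 is stable across runs, unlike Python's salted built-in hash().
        idx = zlib.crc32(str(rec[key]).encode("utf-8")) % num_partitions
        parts[idx].append(rec)
    return parts


def write_partitions(
    parts: List[List[dict]],
    write_partition: Callable[[List[dict]], int],
    max_workers: int,
) -> int:
    """Write every partition independently and concurrently; return total records written."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sum(pool.map(write_partition, parts))
```

Each `write_partition` call would typically take its connection from a router such as the round-robin scheme described above, so partitions spread across service nodes.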
Further, in some embodiments, the user may import the source data in batches, with point and edge data accumulated up to the batch size set in the configuration file. Point or edge data is placed into a buffer bucket. When the number of point and edge objects in the bucket reaches the batch size, they are converted together into one insert statement supported by the graph database system, and the statement is sent to the system.
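The buffer bucket can be sketched as a class that emits one batched insert statement whenever the batch size is reached. The statement uses an nGQL-style `INSERT VERTEX tag(props) VALUES vid:(values), ...` shape, as in NebulaGraph. That shape is an assumption, since the patent does not name the query language. The class name and `send` callback are hypothetical, and inputs are taken to be already-encoded literals.

```python
from typing import Callable, List, Tuple


class InsertBuffer:
    """Buffers point rows and emits one batched INSERT statement per full batch."""

    def __init__(self, tag: str, props: List[str], batch_size: int, send: Callable[[str], None]):
        self.tag = tag
        self.props = props
        self.batch_size = batch_size
        self.send = send
        self.rows: List[Tuple[str, List[str]]] = []

    def add(self, vid_literal: str, value_literals: List[str]) -> None:
        self.rows.append((vid_literal, value_literals))
        if len(self.rows) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Turn everything in the bucket into a single statement and send it."""
        if not self.rows:
            return
        values = ", ".join(f"{vid}:({', '.join(vals)})" for vid, vals in self.rows)
        stmt = f"INSERT VERTEX {self.tag}({', '.join(self.props)}) VALUES {values};"
        self.rows = []
        self.send(stmt)
```

A final `flush()` after the last record sends any partial batch. Larger batches mean fewer round trips but bigger requests, which is one reason batch size is among the tuned parameters.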
Some data may fail to be written, and there are two ways to handle it. The failed data file can be re-imported on its own. Alternatively, failed data is cached during writing and recorded once the import task for a given point or edge type completes. After all import tasks have finished, the failed data is pulled for a unified retry. Data that still fails after the retry is cached again and written to the path configured in the configuration file. In addition, the tool's parameters can be tuned during import to the resource configuration of the actual graph database, so the user can tune the import without writing code. The best import performance is found through tuning tests on parameters such as:

- the amount of data the tool sends to the server in one request;
- the number of partitions the tool splits the source data into;
- the number of executors allocated to the import task;
- the number of executor cores allocated to the import task.
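The deferred-retry flow can be sketched as follows. Failures are collected during the main pass, retried together after everything else has been attempted, and whatever still fails is persisted to the configured path. This is a minimal sketch. The `write` callback, which returns `True` on success, is hypothetical. Failures are tracked per batch rather than per point/edge type, and JSON Lines is an assumed output format.

```python
import json
from typing import Callable, List, Sequence


def import_with_retry(
    batches: Sequence[object],
    write: Callable[[object], bool],
    failed_path: str,
    retries: int = 1,
) -> int:
    """Write all batches, retry failures afterwards, persist what still fails.

    Returns the number of batches that could not be written.
    """
    failed: List[object] = [b for b in batches if not write(b)]  # main pass
    for _ in range(retries):
        if not failed:
            break
        failed = [b for b in failed if not write(b)]  # unified retry pass
    if failed:
        with open(failed_path, "w", encoding="utf-8") as fh:
            for batch in failed:
                fh.write(json.dumps(batch) + "\n")
    return len(failed)
```

The file written to `failed_path` can later be fed back in as an ordinary file data source, matching the option of re-importing the failed data file on its own.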
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system, for example as a set of computer-executable instructions. Although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
This embodiment also provides a multi-source data importing device for a graph database system, which implements the above embodiments and preferred implementations. What has already been described is not repeated here. As used below, the terms "module", "unit", "subunit", and the like may refer to a combination of software and/or hardware that implements a predetermined function. The devices described in the following embodiments are preferably implemented in software. However, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
FIG. 4 is a block diagram of a multi-source data importing apparatus of a graph database system according to an embodiment of the present application, as shown in FIG. 4, the apparatus includes a reading module 41, a converting module 42, and a writing module 43:
the reading module 41 is configured to read multi-source data according to a preset configuration file, where the configuration file configures, for each point or edge configuration item, a corresponding data source and source data system information;
the conversion module 42 is configured to convert the multi-source data according to a preset configuration file, where the configuration file configures, for each field of the source data, a mapping relationship between the field and the graph database Schema attribute information;
the writing module 43 is used for writing the converted data into the graph database and re-importing the data which fails to be imported.
The above modules may be functional modules or program modules and may be implemented in software or hardware. For modules implemented in hardware, the modules may all be located in the same processor, or they may be distributed among different processors in any combination.
The present embodiment also provides an electronic device, comprising a memory having a computer program stored therein and a processor configured to run the computer program to perform the steps of any of the method embodiments described above.
In addition, in combination with the graph database system multi-source data importing method of the above embodiments, an embodiment of the application may provide a storage medium on which a computer program is stored. When executed by a processor, the computer program implements any of the above embodiments of the method for importing multi-source data into a graph database system.
In one embodiment, a computer device is provided, which may be a terminal. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a graph database system multi-source data import method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
In one embodiment, an electronic device is provided, which may be a server. Fig. 5 is a schematic diagram of the internal structure of the electronic device according to an embodiment of the present application. As shown in fig. 5, the electronic device includes a processor, a network interface, an internal memory, and a non-volatile memory connected by an internal bus, where the non-volatile memory stores an operating system, a computer program, and a database. The processor provides computing and control capabilities; the network interface is used to communicate with an external terminal over a network connection; the internal memory provides an environment for running the operating system and the computer program; the computer program, when executed by the processor, implements a graph database system multi-source data import method; and the database is used to store data.
Those skilled in the art will appreciate that the structure shown in fig. 5 is a block diagram of only the part of the structure related to the present application and does not limit the electronic device to which the present application is applied; a particular electronic device may include more or fewer components than shown in the drawing, may combine certain components, or may have a different arrangement of components.
Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database, or another medium used in the embodiments provided in the present application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), SyncLink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be understood by those skilled in the art that the technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of these features has been described; however, any combination of these features that involves no contradiction should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and although they are described specifically and in detail, they should not be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art can make several variations and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent application shall be subject to the appended claims.

Claims (7)

1. A multi-source data import method for a graph database system, characterized by comprising the following steps:
reading multi-source data according to a preset configuration file, wherein the configuration file configures corresponding data sources and source data system information for each point or edge configuration item;
converting the multi-source data according to a preset configuration file, wherein, for the fields of each source data set, the configuration file configures the mapping relation between each field and the graph database Schema attribute information;
writing the converted data into a graph database, and re-importing data that fails to be imported;
the process of reading the multi-source data comprises: verifying, according to the configuration file, whether the number of columns and the column names of the multi-source data to be read are correct, and verifying, according to the mapping relation of point and edge attributes in the graph database, whether the number of columns and the data types of the data are correct; and, if the verification passes, reading the multi-source data and performing row and column pruning on the data according to the configuration information;
the configuration file further comprises a service address of the graph database, and the process of converting the multi-source data comprises: determining the Schema attribute information of the graph database according to the service address; determining, according to the mapping relation between the fields of the source data and the graph database Schema attribute information, the target data type in the graph database that corresponds to each field of the source data; determining whether the data type of the field conforms to the target data type; and, if so, encoding the data to construct point and edge data structures supported by the graph database;
writing the converted data into the graph database comprises: partitioning the converted data according to the number of partitions preset in the configuration file, writing the data of the different partitions independently, and balancing the load across the service nodes of the distributed graph database system, wherein, during load balancing, a polling (round-robin) strategy is adopted when multiple connections are established and available graph database services are routed according to a service state table, and a random strategy is adopted when a session is established, shuffling the services of the graph database system so that requests are dispersed across different service nodes of the graph database.
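The reading and conversion steps of claim 1 can be illustrated with a minimal Python sketch. It assumes a hypothetical configuration entry (`TagConfig`), replaces the Schema attribute information fetched from the graph service address with a hard-coded dictionary (`SCHEMA_TYPES`), and treats column pruning as keeping only mapped fields and row pruning as dropping rows without a point ID. The claim does not fix these rules, so they are illustrative assumptions rather than the patented implementation.

```python
from dataclasses import dataclass


@dataclass
class TagConfig:
    """Hypothetical per-tag entry of the configuration file."""
    source_columns: list[str]       # expected source columns, by name
    field_to_prop: dict[str, str]   # source field -> graph Schema property
    vid_field: str                  # source field used as the point (vertex) ID


# Stand-in for Schema attribute information fetched from the graph service
# address in the configuration file: property name -> target data type.
SCHEMA_TYPES = {"name": "string", "age": "int64"}

# Python types accepted for each target type (an assumption of this sketch).
ACCEPTED = {"string": (str,), "int64": (int,), "double": (float, int)}


def validate_columns(header: list[str], cfg: TagConfig) -> None:
    """Verify the column count and column names against the configuration."""
    if len(header) != len(cfg.source_columns):
        raise ValueError(f"expected {len(cfg.source_columns)} columns, got {len(header)}")
    missing = set(cfg.source_columns) - set(header)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")


def prune(rows: list[dict], cfg: TagConfig) -> list[dict]:
    """Column pruning keeps only mapped fields; row pruning drops rows without an ID."""
    keep = {cfg.vid_field, *cfg.field_to_prop}
    return [{k: v for k, v in row.items() if k in keep}
            for row in rows if row.get(cfg.vid_field)]


def convert_row(row: dict, cfg: TagConfig) -> tuple[str, dict]:
    """Check each field against its target Schema type, then encode it as a point."""
    props = {}
    for field, prop in cfg.field_to_prop.items():
        target = SCHEMA_TYPES[prop]
        value = row[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, ACCEPTED[target]):
            raise TypeError(f"{field}={value!r} does not conform to {target}")
        props[prop] = value
    return str(row[cfg.vid_field]), props


cfg = TagConfig(source_columns=["id", "name", "age", "note"],
                field_to_prop={"name": "name", "age": "age"},
                vid_field="id")
validate_columns(["id", "name", "age", "note"], cfg)
rows = [{"id": "p1", "name": "Tom", "age": 30, "note": "x"},
        {"id": "", "name": "Ann", "age": 25, "note": "y"}]
points = [convert_row(r, cfg) for r in prune(rows, cfg)]
```

Here the unmapped `note` column is pruned, the row with an empty ID is dropped, and a field whose type does not match its target type raises an error instead of being written.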
2. The method according to claim 1, wherein, before the multi-source data to be read is verified, the process of reading the multi-source data further comprises:
in the case that the data source is a database or a data warehouse, determining driver information of the corresponding database so as to perform connection verification on the database and read data from it;
and, in the case that the data source is a streaming data source system, determining configuration information of the corresponding system to establish a connection with the streaming data source system, and periodically processing data records in micro-batches according to a data processing period.
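The two source branches of claim 2 can be sketched as follows. The driver table is a hypothetical mapping from a configured database type to a JDBC driver class name, and stream records are windowed by their own timestamps so that the example is deterministic; a real streaming consumer (for example, Spark Structured Streaming with a processing-time trigger) would cut micro-batches by wall-clock interval instead.

```python
from itertools import groupby
from typing import Iterable, Iterator

# Hypothetical mapping from a configured database type to its JDBC driver class.
JDBC_DRIVERS = {
    "mysql": "com.mysql.cj.jdbc.Driver",
    "postgresql": "org.postgresql.Driver",
}


def resolve_driver(source_type: str) -> str:
    """Look up driver information for a database or data warehouse source."""
    try:
        return JDBC_DRIVERS[source_type.lower()]
    except KeyError:
        raise ValueError(f"no driver configured for source type {source_type!r}") from None


def micro_batches(records: Iterable[tuple[float, dict]],
                  period_s: float) -> Iterator[list[dict]]:
    """Group (timestamp, record) pairs into one micro-batch per processing period.

    Assumes records arrive in timestamp order, as from a single stream partition.
    """
    for _, group in groupby(records, key=lambda tr: int(tr[0] // period_s)):
        yield [rec for _, rec in group]


stream = [(0.5, {"id": 1}), (1.0, {"id": 2}), (4.9, {"id": 3}),
          (5.0, {"id": 4}), (12.0, {"id": 5})]
batches = list(micro_batches(stream, period_s=5.0))
```

With a 5-second period the five records fall into three micro-batches; the sketch emits nothing for a period that contains no records.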
3. The method according to claim 1, wherein, when the multi-source data is imported in batches, before the data in each partition is written, writing the converted data into the graph database further comprises:
storing point or edge data in a buffer and, when the amount of data in the buffer reaches a batch size preset in the configuration file, generating insert statements of the graph database system for all data in the buffer at once.
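The buffering of claim 3, together with the re-import of failed data from claim 1, can be sketched like this. The statement text is modeled on the nGQL `INSERT VERTEX <tag>(<props>) VALUES <vid>:(<values>)` form with simplified string escaping; `VertexBuffer`, the `execute` callback, and the retry-once policy are hypothetical choices for illustration.

```python
from typing import Callable


def literal(value) -> str:
    """Render a Python value as a statement literal (simplified escaping)."""
    if isinstance(value, str):
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return str(value)


def build_insert(tag: str, props: list[str], rows: list[tuple]) -> str:
    """Generate one nGQL-style INSERT VERTEX statement for every buffered row."""
    values = ", ".join(
        f"{literal(vid)}:({', '.join(literal(v) for v in vals)})" for vid, vals in rows
    )
    return f"INSERT VERTEX {tag}({', '.join(props)}) VALUES {values};"


class VertexBuffer:
    """Buffers point data and emits one statement per configured batch size."""

    def __init__(self, tag: str, props: list[str], batch_size: int,
                 execute: Callable[[str], None]):
        self.tag, self.props, self.batch_size = tag, props, batch_size
        self.execute = execute          # hypothetical: sends a statement to the graph service
        self.rows: list[tuple] = []
        self.failed: list[str] = []     # statements kept for a later re-import pass

    def add(self, vid: str, values: list) -> None:
        self.rows.append((vid, values))
        if len(self.rows) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if not self.rows:
            return
        stmt = build_insert(self.tag, self.props, self.rows)
        self.rows = []
        try:
            self.execute(stmt)
        except Exception:
            self.failed.append(stmt)

    def reimport_failed(self) -> None:
        """Retry every failed statement once; those that fail again stay queued."""
        pending, self.failed = self.failed, []
        for stmt in pending:
            try:
                self.execute(stmt)
            except Exception:
                self.failed.append(stmt)


sent: list[str] = []
buf = VertexBuffer("player", ["name", "age"], batch_size=2, execute=sent.append)
buf.add("p1", ["Tom", 30])
buf.add("p2", ["Ann", 25])    # buffer reaches the batch size -> one statement
buf.add("p3", ["Bob", 41])
buf.flush()                   # flush the remainder at the end of the partition
```

Generating one statement per full buffer, rather than one per row, is what reduces round trips to the graph service; the last partial batch must still be flushed explicitly.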
4. The method according to claim 1, wherein, after the converted data is written into the graph database, the method further comprises:
performing import parameter tuning tests according to the resource configuration of the database to improve import performance, wherein the database resource configuration comprises: the amount of data sent to the server in one request, the number of partitions of the source data, and the number of executors or executor cores allocated to the import task.
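One way to run the import parameter tuning test of claim 4 is a plain grid search over the listed parameters. In this sketch `ImportParams` and `tune` are hypothetical names, and `fake_benchmark` is a toy throughput model that only stands in for timing a real import run against the graph database.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Iterable


@dataclass(frozen=True)
class ImportParams:
    batch: int            # rows sent to the server in one request
    partitions: int       # partitions of the source data
    executors: int        # executors allocated to the import task
    executor_cores: int   # cores per executor


def tune(batches: Iterable[int], partitions: Iterable[int],
         executors: Iterable[int], cores: Iterable[int],
         benchmark: Callable[[ImportParams], float]) -> tuple[ImportParams, float]:
    """Try every parameter combination and keep the fastest (first one wins ties)."""
    best = None
    for b, p, e, c in product(batches, partitions, executors, cores):
        params = ImportParams(b, p, e, c)
        rate = benchmark(params)
        if best is None or rate > best[1]:
            best = (params, rate)
    if best is None:
        raise ValueError("empty parameter grid")
    return best


def fake_benchmark(p: ImportParams) -> int:
    # Toy model: throughput grows with effective parallelism and batch size,
    # with no gain beyond 512 rows per request. Replace with a timed import run.
    return min(p.partitions, p.executors * p.executor_cores) * min(p.batch, 512)


best, rate = tune([128, 512, 1024], [4, 8], [2], [2, 4], fake_benchmark)
```

Under the toy model, parallelism is capped by whichever is smaller, source partitions or total executor cores, so the search settles on 8 partitions with 2 executors of 4 cores each.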
5. A multi-source data importing apparatus for a graph database system, the apparatus comprising:
a reading module, configured to read multi-source data according to a preset configuration file, wherein the configuration file configures corresponding data sources and source data system information for each point or edge configuration item;
a conversion module, configured to convert the multi-source data according to a preset configuration file, wherein, for the fields of each source data set, the configuration file configures the mapping relation between each field and the graph database Schema attribute information;
a writing module, configured to write the converted data into the graph database and re-import data that fails to be imported;
the process of reading the multi-source data comprises: verifying, according to the configuration file, whether the number of columns and the column names of the multi-source data to be read are correct, and verifying, according to the mapping relation of point and edge attributes in the graph database, whether the number of columns and the data types of the data are correct; and, if the verification passes, reading the multi-source data and performing row pruning on the data according to the configuration information;
the configuration file further comprises a service address of the graph database, and the process of converting the multi-source data comprises: determining the Schema attribute information of the graph database according to the service address; determining, according to the mapping relation between the fields of the source data and the graph database Schema attribute information, the target data type in the graph database that corresponds to each field of the source data; determining whether the data type of the field conforms to the target data type; and, if so, encoding the data to construct point and edge data structures supported by the graph database;
writing the converted data into the graph database comprises: partitioning the converted data according to the number of partitions preset in the configuration file, writing the data of the different partitions independently, and balancing the load across the service nodes of the distributed graph database system, wherein, during load balancing, a polling (round-robin) strategy is adopted when multiple connections are established and available graph database services are routed according to a service state table, and a random strategy is adopted when a session is established, shuffling the services of the graph database system so that requests are dispersed across different service nodes of the graph database.
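The partitioned write and load balancing recited in claims 1 and 5 can be sketched in Python as below. Assumptions: records are assigned to partitions by a CRC32 hash of the point ID (the claims only say the partition count comes from the configuration file), the service state table is a dictionary mapping hypothetical service addresses to an availability flag, and a seeded `random.Random` shuffles the live services before a session is opened.

```python
import itertools
import random
import zlib
from typing import Callable, Optional


def partition(records: list, num_partitions: int,
              key: Callable[[object], str]) -> list[list]:
    """Assign each record to one of num_partitions by a stable hash of its key."""
    parts: list[list] = [[] for _ in range(num_partitions)]
    for rec in records:
        parts[zlib.crc32(key(rec).encode("utf-8")) % num_partitions].append(rec)
    return parts


class ServiceRouter:
    """Routes requests across graph services listed in a service state table."""

    def __init__(self, status_table: dict[str, bool], seed: Optional[int] = None):
        self.status_table = status_table   # address -> currently available?
        self._counter = itertools.count()
        self._rng = random.Random(seed)

    def available(self) -> list[str]:
        return sorted(addr for addr, up in self.status_table.items() if up)

    def next_connection_target(self) -> str:
        """Polling (round-robin) over available services when opening connections."""
        live = self.available()
        if not live:
            raise RuntimeError("no graph service available")
        return live[next(self._counter) % len(live)]

    def session_order(self) -> list[str]:
        """Random strategy: shuffle available services before opening a session."""
        live = self.available()
        self._rng.shuffle(live)
        return live


router = ServiceRouter({"graph-1": True, "graph-2": False, "graph-3": True}, seed=42)
conns = [router.next_connection_target() for _ in range(4)]
parts = partition(["p1", "p2", "p3", "p1"], 3, key=str)
```

Sorting the live addresses keeps the round-robin order stable between calls, and the unavailable `graph-2` is skipped. CRC32 is used instead of Python's built-in `hash` because string hashing is randomized per interpreter run, which would send the same point ID to different partitions in different processes.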
6. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and the processor is configured to execute the computer program to perform the method of multi-source data import for a graph database system according to any of claims 1 to 4.
7. A storage medium having a computer program stored thereon, wherein the computer program, when executed, performs the graph database system multi-source data import method according to any one of claims 1 to 4.

Priority Applications (1)

CN202211419937.3A (CN115658978B), priority date 2022-11-14, filing date 2022-11-14: Graph database system multi-source data importing method and device

Publications (2)

CN115658978A, published 2023-01-31
CN115658978B, published 2023-04-07

Family

ID=85021806

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant