CN115543561A - Method, device and system for importing data - Google Patents

Info

Publication number
CN115543561A
Authority
CN
China
Prior art keywords
imported
data
import
file
importing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211126909.2A
Other languages
Chinese (zh)
Inventor
黄德荣
朱祖恩
吴楠
张彪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
CCB Finetech Co Ltd
Original Assignee
China Construction Bank Corp
CCB Finetech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp, CCB Finetech Co Ltd filed Critical China Construction Bank Corp
Priority to CN202211126909.2A priority Critical patent/CN115543561A/en
Publication of CN115543561A publication Critical patent/CN115543561A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method, a device and a system for importing data, and relates to the technical field of big data access. One embodiment of the method comprises: constructing a corresponding import task for each of a plurality of files to be imported; concurrently acquiring and executing the import tasks by means of a plurality of preset service processes, so that the data to be imported in the files to be imported is imported into a running target graph database; and re-importing the detected import abnormal data into the target graph database. By concurrently executing import tasks against a running target graph database, monitoring abnormal data and re-importing it, the embodiment improves the flexibility, reliability and efficiency of data import and reduces its complexity.

Description

Method, device and system for importing data
Technical Field
The invention relates to the technical field of big data access, in particular to a method, a device and a system for importing data.
Background
As the business complexity of internet applications increases, so do the requirements for data processing. For example, data from multiple data sources needs to be imported into a designated database (e.g., a graph database) so that corresponding services can be provided based on the data in that database.
Existing methods for importing data are as follows: data is imported by cold loading while the target graph database (e.g., a NEO4J graph database) has stopped providing services, or each piece of data to be imported is imported row by row. Existing methods therefore suffer from poor flexibility, low efficiency and high complexity.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method, an apparatus, and a system for importing data, which can construct corresponding import tasks for multiple files to be imported; concurrently acquiring and executing an import task by utilizing a plurality of preset service processes, and importing data to be imported in a file to be imported into a running target graph database; and re-importing the detected import abnormal data into the target graph database. By means of technical means of concurrently executing import tasks, abnormal data monitoring, re-import and the like on a running target graph database, flexibility and reliability of data import are improved, efficiency of data import is improved, and complexity of data import is reduced.
To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a method of importing data, including: determining a plurality of files to be imported corresponding to data to be imported; the file to be imported is a node file associated with the target graph database or a relation file between nodes; constructing a corresponding import task for each file to be imported; concurrently acquiring and executing an import task by utilizing a plurality of preset service processes so as to import data to be imported in a file to be imported corresponding to the import task into a running target graph database; and detecting import abnormal data of the executed import task, and importing the detected import abnormal data into the target graph database again.
Optionally, the determining a plurality of files to be imported corresponding to data to be imported includes: acquiring one or more data files containing the data to be imported from a data source; the data file is any one of a node file or a relationship file between nodes; for each of the data files, performing: judging whether the data file meets a preset segmentation condition, if so, segmenting the data file into a plurality of files to be imported with set sizes; otherwise, directly taking the data file as a file to be imported.
Optionally, the dividing the data file into a plurality of files to be imported with set sizes includes: reading a segmentation strategy of a data file from a preset configuration file; and based on the segmentation strategy, segmenting the data file into a plurality of files to be imported.
Optionally, the concurrently acquiring an import task by using a plurality of preset service processes includes: each of a plurality of idle service processes searches for unexecuted import tasks and, when one is found, the idle service process pulls the file to be imported corresponding to that unexecuted import task.
Optionally, after the idle service process pulls a file to be imported corresponding to the unexecuted import task, the method further includes: acquiring one or more attribute fields of the file to be imported from the preset configuration file; the executing of the import task comprises: writing one or more attribute fields and the file identification of the file to be imported into a preset import module comprising an import instruction, so that the import module imports the attribute fields and the attribute values corresponding to the attribute fields in the file to be imported into the target graph database based on the one or more attribute fields and the file identification of the file to be imported.
Optionally, the data importing method further includes: acquiring import cycle configuration information corresponding to the executed import task from the preset configuration file; the importing module is used for importing the attribute fields and the attribute values corresponding to the attribute fields in the file to be imported into the target graph database based on the importing period configuration information.
Optionally, after the detecting import exception data of the executed import task, the method further includes: in the case that the number of the imported abnormal data is detected to exceed a set fault tolerance threshold, executing the step of re-importing the detected imported abnormal data into the target graph database; and under the condition that the quantity of the imported abnormal data is detected not to exceed a set fault tolerance threshold, feeding back a message indicating that the import is completed by the service process.
Optionally, when the import abnormal data is detected, recording each import abnormal data in which import abnormality occurs; the re-importing the detected import abnormal data into the target graph database comprises: for each piece of the imported abnormal data, executing: writing one or more attribute fields corresponding to the imported abnormal data into a preset splicing module comprising a splicing instruction, so that the splicing module re-imports the attribute fields and attribute values corresponding to the attribute fields in the imported abnormal data into the target graph database based on the one or more attribute fields.
Optionally, in a case that the number of the imported abnormal data is detected to exceed a set fault tolerance threshold, the method further includes: and sending early warning information corresponding to the abnormal data so that a data processing party processes the abnormal data.
Optionally, the importing, to a running target graph database, data to be imported from a file to be imported corresponding to the import task includes: importing one or more node files and relationship files among a plurality of nodes into a running target graph database, so that the target graph database constructs graph relationships among attribute values in the files to be imported on the basis of the one or more node files and the relationship files among the plurality of nodes.
To achieve the above object, according to a second aspect of an embodiment of the present invention, there is provided an apparatus for importing data, including: a file acquisition module, a service module and an import module; wherein,
the file acquisition module is used for determining a plurality of files to be imported corresponding to the data to be imported; the file to be imported is a node file associated with the target graph database or a relation file between nodes;
the service module is used for constructing a corresponding import task for each file to be imported;
the importing module comprises a plurality of preset service processes, and utilizes the service processes to concurrently acquire and execute an importing task so as to import data to be imported in a file to be imported corresponding to the importing task into a running target graph database, detect import abnormal data of the executed importing task, and re-import the detected import abnormal data into the target graph database.
Optionally, the apparatus for importing data, configured to determine a plurality of files to be imported that correspond to the data to be imported, includes: acquiring one or more data files containing the data to be imported from a data source; the data file is any one of a node file or a relation file between nodes; for each of the data files, performing: judging whether the data file meets a preset segmentation condition, if so, segmenting the data file into a plurality of files to be imported with set sizes; otherwise, directly taking the data file as a file to be imported.
Optionally, the apparatus for importing data, configured to divide the data file into a plurality of files to be imported with set sizes, includes: reading a segmentation strategy of a data file from a preset configuration file; and based on the segmentation strategy, segmenting the data file into a plurality of files to be imported.
Optionally, the apparatus for importing data is configured to concurrently acquire an import task by using a plurality of preset service processes, which includes: each of a plurality of idle service processes searches for unexecuted import tasks and, when one is found, the idle service process pulls the file to be imported corresponding to that unexecuted import task.
Optionally, the apparatus for importing data, after the idle service process pulls a file to be imported corresponding to an unexecuted import task, further includes: acquiring one or more attribute fields of the file to be imported from the preset configuration file; the executing of the import task comprises the following steps: writing one or more attribute fields and the file identification of the file to be imported into a preset import module comprising an import instruction, so that the import module imports the attribute fields and the attribute values corresponding to the attribute fields in the file to be imported into the target graph database based on the one or more attribute fields and the file identification of the file to be imported.
Optionally, the apparatus for importing data is further configured to obtain, from the preset configuration file, import cycle configuration information corresponding to the execution of the unexecuted import task; the importing module is used for importing the attribute fields and the attribute values corresponding to the attribute fields in the file to be imported into the target graph database based on the importing period configuration information.
Optionally, the apparatus for importing data, after the detecting import abnormal data of the executed import task, further includes: in the case that the number of the imported abnormal data exceeds a set fault tolerance threshold value, executing the step of re-importing the detected imported abnormal data into the target graph database; and under the condition that the quantity of the imported abnormal data is detected not to exceed a set fault tolerance threshold, feeding back a message indicating that the import is completed by the service process.
Optionally, the data importing device is configured to record each piece of import abnormal data in which an import exception occurs when the import abnormal data is detected; the re-importing the detected import abnormal data into the target graph database comprises: and aiming at each piece of import abnormal data, executing: writing the one or more attribute fields corresponding to the imported abnormal data into a preset splicing module comprising a splicing instruction, so that the splicing module re-imports the attribute fields and the attribute values corresponding to the attribute fields in the imported abnormal data into the target graph database based on the one or more attribute fields.
Optionally, the apparatus for importing data, configured to, in a case that it is detected that the quantity of the imported abnormal data exceeds a set fault tolerance threshold, further include: and sending early warning information corresponding to the abnormal data so that a data processing party processes the abnormal data.
Optionally, the apparatus for importing data is configured to import data to be imported from a file to be imported corresponding to the import task into an operating target graph database, and includes: importing one or more node files and relationship files among a plurality of nodes into a running target graph database, so that the target graph database constructs graph relationships among attribute values in the files to be imported on the basis of the one or more node files and the relationship files among the plurality of nodes.
To achieve the above object, according to a third aspect of embodiments of the present invention, there is provided a system for importing data, including: the apparatus for importing data of the second aspect, and one or more target graph databases; and importing the data to be imported in the files to be imported into the running target graph database by using the data importing device.
In order to achieve the above object, according to a fourth aspect of embodiments of the present invention, there is provided an electronic device for importing data, comprising: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement a method as in any one of the above methods of importing data.
To achieve the above object, according to a fifth aspect of the embodiments of the present invention, there is provided a computer readable medium on which a computer program is stored, wherein the program is configured to implement the method as any one of the methods of importing data as described above when the program is executed by a processor.
To achieve the above object, according to a sixth aspect of the embodiments of the present invention, there is also provided a computer program product including a computer program, where the computer program is configured to, when executed by a processor, implement any one of the above methods for importing data.
One embodiment of the above invention has the following advantages or benefits: a corresponding import task can be constructed for each of a plurality of files to be imported; the import tasks are concurrently acquired and executed by a plurality of preset service processes, so that the data to be imported in the files to be imported is imported into a running target graph database; and the detected import abnormal data is re-imported into the target graph database. By concurrently executing import tasks against a running target graph database, monitoring abnormal data and re-importing it, the embodiment improves the flexibility, reliability and efficiency of data import and reduces its complexity.
Further effects of the above optional implementations will be described below in connection with the specific embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a flowchart illustrating a method for importing data according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating importing data according to another embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an apparatus for importing data according to an embodiment of the present invention;
FIG. 4 is a block diagram illustrating a system for importing data according to an embodiment of the present invention;
FIG. 5 is an exemplary system architecture diagram in which embodiments of the present invention may be applied;
FIG. 6 is a schematic block diagram of a computer system suitable for use in implementing a terminal device or server of an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness. In the technical solution of the present invention, the acquisition, storage, use and processing of data all comply with the relevant national laws and regulations.
As shown in fig. 1, an embodiment of the present invention provides a method for importing data, which may include the following steps:
step S101: determining a plurality of files to be imported corresponding to data to be imported; the file to be imported is a node file associated with the target graph database or a relation file between nodes.
Specifically, different kinds of databases have their own methods for importing data. All of these import methods process data to be imported, and that data resides in a plurality of files to be imported, for example file1.csv, file2.dat, … and filen.xx. The file format and the file identification of the file to be imported are not limited. In the case of importing data into a target graph database (e.g., a NEO4J graph database), the files to be imported may include node files associated with the target graph database and relationship files between nodes. It is understood that the node files and the relationship files between nodes correspond to a plurality of data values in the graph database and the association relationships among those data; both the data contained in the node files and the data contained in the relationship files need to be imported into the target graph database.
The files to be imported are generated from the data contained in the data sources; for example, the file to be imported corresponding to data source 1 is file1.csv (for example, a node file). That is, the determining a plurality of files to be imported corresponding to data to be imported includes: acquiring one or more data files containing the data to be imported from a data source, where each data file is either a node file or a relationship file between nodes.
A data file can be determined as a file to be imported in either of two ways:
The first way: directly take the one or more acquired data files (such as node files or relationship files between nodes) as files to be imported.
The second, preferred way: for each data file (such as a node file or a relationship file between nodes), judge whether the data file meets a preset segmentation condition; if so, segment the data file into a plurality of files to be imported of a set size; otherwise, directly take the data file as a file to be imported.
Specifically, when the volume of data to be imported is large, the file containing the data to be imported is divided into a plurality of files to be imported of a set size in order to improve import efficiency. The preset segmentation condition may be that the number of data records contained in the data file exceeds a set quantity threshold. For example, with a quantity threshold of 200,000, a file1.csv containing 20 million records meets the preset segmentation condition and is divided, according to the segmentation strategy, into 100 files of 200,000 records each. That is, the dividing the data file into a plurality of files to be imported with set sizes includes: reading a segmentation strategy of a data file from a preset configuration file; and based on the segmentation strategy, segmenting the data file into a plurality of files to be imported. The configuration information corresponding to the segmentation strategy may be set in the preset configuration file, for example by setting split_rows_size = 200000, meaning that the current file1.csv is divided into multiple files of 200,000 records each. It can be understood that the preset configuration file may also specify the directory where the files to be imported are located, for example:
csv_save_dir='home/ap/nas/abc/import'
The files to be imported can be obtained automatically from this directory, thereby obtaining the data volume corresponding to the application scenario. The period for processing the files to be imported may also be set in the configuration file, for example: read the files to be imported generated each month and import them, or read the files to be imported generated every 10 days and import them. Because the files are segmented according to the segmentation strategy, the segmented files can be imported concurrently by multiple processes, which further improves import efficiency.
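The segmentation step described above can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the function names `count_rows` and `files_to_import` are hypothetical, the data file is assumed to be a CSV with a single header row, and the chunk naming scheme (`_partN.csv`) is an assumption. Only `split_rows_size` comes from the configuration example in the text.

```python
import csv
import os


def count_rows(path):
    """Count data rows in a CSV file, excluding the (assumed) header row."""
    with open(path, newline="") as src:
        return max(sum(1 for _ in csv.reader(src)) - 1, 0)


def files_to_import(path, out_dir, split_rows_size=200000):
    """Return the files to import for one data file.

    If the file does not exceed split_rows_size data rows, it is used as is.
    Otherwise it is split into chunks of at most split_rows_size rows, and
    the header row is repeated at the top of every chunk.
    """
    if count_rows(path) <= split_rows_size:
        return [path]  # segmentation condition not met

    base = os.path.splitext(os.path.basename(path))[0]
    chunks, out, writer, written = [], None, None, 0
    with open(path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)
        for row in reader:
            # Start a new chunk at the beginning and whenever one is full.
            if out is None or written == split_rows_size:
                if out is not None:
                    out.close()
                chunk_path = os.path.join(
                    out_dir, f"{base}_part{len(chunks) + 1}.csv"
                )
                out = open(chunk_path, "w", newline="")
                writer = csv.writer(out)
                writer.writerow(header)
                chunks.append(chunk_path)
                written = 0
            writer.writerow(row)
            written += 1
    if out is not None:
        out.close()
    return chunks
```

With the figures from the example, a file of 20 million rows and `split_rows_size=200000` would yield 100 chunk files, each of which can then be given its own import task.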
Step S102: and constructing a corresponding import task for each file to be imported.
Specifically, an import task is constructed for each file to be imported, where the file to be imported may be a directly acquired data file or a file produced by segmenting a data file. The Master node performs the segmentation of the files to be imported and constructs the corresponding import tasks. For example, the Master node generates import tasks A, B, C and so on, which then wait to be processed by a plurality of Slave nodes (such as service processes).
Step S103: concurrently acquiring and executing an import task by utilizing a plurality of preset service processes so as to import data to be imported in a file to be imported corresponding to the import task into a running target graph database; and detecting import abnormal data of the executed import task, and importing the detected import abnormal data into the target graph database again.
Specifically, concurrently executing import tasks means that a plurality of preset service processes execute a plurality of import tasks at the same time. Before executing an import task, a service process may actively search for and pull it; that is, concurrently acquiring the import task by using the plurality of preset service processes includes: each of a plurality of idle service processes searches for unexecuted import tasks and, when one is found, pulls the file to be imported corresponding to that task. In this way, work is distributed across the concurrent service processes, which improves import efficiency and optimizes the resource allocation of the import.
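The pull model described here, in which idle workers look for unexecuted tasks themselves, can be sketched with a shared queue. This is a minimal sketch under stated assumptions: threads stand in for the patent's service processes so that the example stays self-contained, and `run_import_tasks` and `handler` are hypothetical names, with `handler` standing for whatever performs the actual import of one file.

```python
import queue
import threading


def run_import_tasks(files, handler, num_workers=4):
    """Build one task per file, then let idle workers pull tasks until none remain.

    Returns the handler outcomes in the same order as `files`. An exception
    raised by the handler is stored as that task's outcome.
    """
    tasks = queue.Queue()
    for task_id, path in enumerate(files):
        tasks.put((task_id, path))  # Master role: one import task per file

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task_id, path = tasks.get_nowait()  # search for an unexecuted task
            except queue.Empty:
                return  # no task found: this worker stops
            try:
                outcome = handler(path)
            except Exception as exc:  # keep going; record the failure
                outcome = exc
            with lock:
                results[task_id] = outcome

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return [results[i] for i in range(len(files))]
```

Because each worker takes the next task only when it is free, faster workers naturally process more files, which is one way to read the resource allocation benefit the text mentions.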
Further, after the idle service process pulls the file to be imported corresponding to the unexecuted import task, an import operation is executed.
In one embodiment of the present invention, the import task is executed as follows. After the idle service process pulls the file to be imported corresponding to the unexecuted import task, one or more attribute fields of the file to be imported are acquired from the preset configuration file. Executing the import task then comprises: writing the one or more attribute fields and the file identification of the file to be imported into a preset import module comprising an import instruction, so that the import module imports the attribute fields and the attribute values corresponding to the attribute fields in the file to be imported into the target graph database based on the one or more attribute fields and the file identification. Taking a NEO4J graph database as the target graph database, the import instruction may comprise LOAD CSV, CREATE and the like. The data associated with the import instruction comprises one or more of the attribute fields, the corresponding attribute values, the file identification of the file to be imported (such as file1.csv) and the like. The attribute fields may be obtained automatically from the preset configuration file; for example, an illustrative entry in the configuration file is table_columns = {'abcd': ['a', 'b', 'c' …]}, where 'abcd' is the identifier of the data table to which the attribute fields of file1.csv belong, and 'a', 'b', 'c' and so on are the attribute fields contained in that data table.
It can be understood that one file to be imported may contain data of multiple data tables, and this may be implemented by configuring attribute fields of multiple data tables in a preset configuration file, and correspondingly adding codes for processing multiple data tables in the import module.
Preferably, import period configuration information for executing the unexecuted import task is also acquired from the preset configuration file, and the import module imports the attribute fields and the corresponding attribute values in the file to be imported into the target graph database based on this import period configuration information. For example, the import period configuration information is USING PERIODIC COMMIT 1000, meaning that one commit is performed for every 1000 rows of data while the import module executes the import.
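To illustrate how a configuration entry and a file identification might be combined into an import instruction, the following sketch builds a Cypher statement as a string. It is an assumption about what the import module could generate, not the statement used in the patent: `build_load_csv` is a hypothetical helper, the CSV is assumed to have a header row whose column names match the attribute fields, and the data table identifier is assumed to be used as the node label. The periodic commit clause is version-dependent, so it should be checked against the Neo4j release in use.

```python
# Illustrative configuration entry, following the table_columns example.
table_columns = {"abcd": ["a", "b", "c"]}


def build_load_csv(label, file_name, columns, commit_every=None):
    """Build a LOAD CSV ... CREATE statement for one node file.

    label        -- data table identifier, used here as the node label (assumption)
    file_name    -- file identification of the file to be imported
    columns      -- attribute fields read from the configuration file
    commit_every -- optional row count for a periodic commit prefix
    """
    props = ", ".join(f"{col}: row.{col}" for col in columns)
    prefix = f"USING PERIODIC COMMIT {commit_every} " if commit_every else ""
    return (
        f"{prefix}LOAD CSV WITH HEADERS FROM 'file:///{file_name}' AS row "
        f"CREATE (:{label} {{{props}}})"
    )


statement = build_load_csv("abcd", "file1.csv", table_columns["abcd"], 1000)
```

Because the statement is derived entirely from configuration, importing a different data source only requires a different configuration entry and file name.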
It can be understood that, for a data table to be imported, there are cases where only the attribute values of some attribute fields change within a set time range, such as attribute fields related to numerical values. Therefore, the preset configuration file can specify not only all attribute fields of a data table but also which of its attribute fields change over time. When data is imported, only the values of the changed attribute fields need be imported into the target graph database for that data table, thereby updating the data in the target graph database. This reduces the volume of imported data, which further improves import efficiency and saves resources.
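A partial update of this kind could be expressed as a statement that matches existing nodes and sets only the changed fields. The sketch below is one possible reading, not the patent's implementation: `build_update_csv`, the `changed_columns` configuration entry, and the use of a key field to locate existing nodes are all assumptions introduced for illustration.

```python
# Hypothetical configuration: fields of table 'abcd' that change over time.
changed_columns = {"abcd": ["c"]}


def build_update_csv(label, file_name, key_field, changed_fields):
    """Build a LOAD CSV statement that updates only the changed attribute fields.

    key_field identifies the existing node to update (an assumption; the
    source text does not say how existing nodes are located).
    """
    sets = ", ".join(f"n.{field} = row.{field}" for field in changed_fields)
    return (
        f"LOAD CSV WITH HEADERS FROM 'file:///{file_name}' AS row "
        f"MATCH (n:{label} {{{key_field}: row.{key_field}}}) "
        f"SET {sets}"
    )


update_statement = build_update_csv("abcd", "file1.csv", "a", changed_columns["abcd"])
```

Only the key field and the changed fields need to be present in the file, which is where the reduction in imported data volume would come from.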
Therefore, by combining the preset configuration file with the import module, the same code can import the files to be imported corresponding to any one or more data sources: only the configuration information needs to be modified, not the code, which improves the generality and flexibility of import processing as well as import efficiency. Moreover, the import module imports the data to be imported into the running target graph database (that is, the database service does not need to be stopped). This overcomes the technical defect that data can be imported only after the database service is stopped, further improves the flexibility of importing data, and improves the reliability and efficiency with which the target graph database provides data services.
Further, in order to ensure the integrity and stability of the imported data, embodiments of the present invention provide a fault-tolerant method for import exceptions. An import exception may be caused by an import task timing out because of server performance fluctuation, an unstable network connection and the like; in this case, it is determined that import abnormal data exists. The condition for judging an exception may be set in the configuration file, for example: tt_load_csv_secs = 600 sets the timeout threshold to 600 seconds; when an import operation has not completed by the time the threshold is exceeded, it is determined that an import exception has occurred. Import abnormal data is data that should have been imported by the import task but was not successfully imported because of the exception.
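The timeout check can be sketched as follows. This is an illustrative sketch only: `run_with_timeout` and `import_fn` are hypothetical names, and a thread pool is used here as one simple way to wait on a call with a deadline. Note that the overdue call is not cancelled by this approach; it is merely reported as not completed.

```python
import concurrent.futures

tt_load_csv_secs = 600  # timeout threshold from the configuration example


def run_with_timeout(import_fn, timeout_secs=tt_load_csv_secs):
    """Run import_fn and report whether it finished within timeout_secs.

    Returns (completed, result). completed is False when the threshold was
    exceeded, which the text treats as an import exception.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(import_fn)
    try:
        return True, future.result(timeout=timeout_secs)
    except concurrent.futures.TimeoutError:
        return False, None
    finally:
        pool.shutdown(wait=False)  # do not block on the overdue call
```

A caller would then record the rows of the timed-out task as import abnormal data for the later re-import step.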
After the detecting import exception data of the executed import task, the method further comprises: in the case that the number of the imported abnormal data exceeds a set fault tolerance threshold value, executing the step of re-importing the detected imported abnormal data into the target graph database; and under the condition that the quantity of the imported abnormal data is detected not to exceed a set fault tolerance threshold, feeding back a message indicating that the import is completed by the service process. The method comprises the steps of acquiring the quantity of imported data corresponding to import abnormity under the condition that the abnormity is determined, executing a re-import step under the condition that the quantity of the imported abnormal data exceeds a set fault tolerance threshold, otherwise, judging that the import is finished, and feeding back a message indicating that the import is finished by a service process. Wherein, the set fault tolerance threshold value can be configured in a preset configuration file; for example: setting tt _ error _ rows =100, which represents setting the fault tolerance threshold value as 100; in the case where the number of pieces of import abnormal data exceeds 100, the step of re-importing the detected import abnormal data into the target map database is performed. The invention does not limit the value and form of the set fault tolerance threshold.
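The threshold decision can be sketched as follows, using the key tt_error_rows from the example above. The function name and the returned labels are assumptions for illustration.

```python
CONFIG = {"tt_error_rows": 100}  # fault tolerance threshold, as in the text


def decide_after_import(failed_rows, config=CONFIG):
    """Return 'reimport' when the failure count exceeds the threshold."""
    if len(failed_rows) > config["tt_error_rows"]:
        return "reimport"
    # At or below the threshold the import is treated as completed.
    return "completed"
```

With a threshold of 100, exactly 100 failed rows still counts as completed, because the text requires the number to exceed the threshold before re-import is triggered.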
Further, in order to execute the step of re-importing, when the imported abnormal data is detected, recording each piece of imported abnormal data in which an imported abnormality occurs; the re-importing the detected import abnormal data into the target graph database comprises: for each piece of the imported abnormal data, executing: writing one or more attribute fields corresponding to the imported abnormal data into a preset splicing module comprising a splicing instruction, so that the splicing module re-imports the attribute fields and attribute values corresponding to the attribute fields in the imported abnormal data into the target graph database based on the one or more attribute fields.
Specifically, each piece of import exception data in which an import exception occurs is recorded, and for example, the content (attribute value corresponding to an attribute field, and the like) and the number of the import exception data, the time when the exception occurs, a data table to which the exception data belongs, and the like are saved in a file or a database, so that a subsequent fault-tolerant operation is performed based on the record. Preferably, when a timeout occurs in importing a certain import file, the task (or the process of running the task) can be automatically ended, so that the task can be prevented from being blocked, information related to the import file can be recorded, and other import files can be executed continuously by skipping the file.
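A minimal sketch of such a record is given below, writing one JSON object per line. The field names, the function names, and the use of an in-memory buffer in place of a file or database are assumptions for illustration.

```python
import io
import json
from datetime import datetime, timezone


def record_import_exception(log, table, row, reason):
    """Append one failed row, with its table and time, to a line-based log."""
    entry = {
        "table": table,
        "row": row,
        "reason": reason,
        "time": datetime.now(timezone.utc).isoformat(),
    }
    log.write(json.dumps(entry, ensure_ascii=False) + "\n")


def load_import_exceptions(log_text):
    """Read the recorded rows back for the later fault-tolerant operation."""
    return [json.loads(line) for line in log_text.splitlines() if line.strip()]


log = io.StringIO()  # stands in for a file opened in append mode
record_import_exception(log, "person", {"id": "p1", "name": "Zhang"}, "timeout")
record_import_exception(log, "person", {"id": "p2", "name": "Wu"}, "timeout")
records = load_import_exceptions(log.getvalue())
```

The number of recorded entries is what would be compared against the fault tolerance threshold.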
The preset splicing module including splicing instructions is described below by taking the case where the target graph database is a NEO4J graph database as an example, wherein the splicing instructions may include MERGE, ON CREATE, ON MATCH, RETURN, and the like; the data related to the splicing instructions comprises an attribute field, an attribute value corresponding to the attribute field, and the like, wherein the attribute fields (such as Property1, Property2, and the like) are acquired from the preset configuration file.
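One way such a statement could be spliced together is sketched below. The function build_merge_statement, the label "Person", and the key field "id" are assumptions for illustration; the label and field names are interpolated directly and are therefore assumed to come from a trusted configuration file, while the attribute values are passed as query parameters.

```python
def build_merge_statement(label, key_field, property_fields):
    """Splice a parameterized Cypher MERGE statement for one record.

    property_fields must be non-empty, otherwise the SET clauses are invalid.
    """
    assignments = ", ".join(f"n.{field} = ${field}" for field in property_fields)
    return (
        f"MERGE (n:{label} {{{key_field}: ${key_field}}}) "
        f"ON CREATE SET {assignments} "
        f"ON MATCH SET {assignments} "
        f"RETURN n"
    )


statement = build_merge_statement("Person", "id", ["Property1", "Property2"])
# Values for one piece of import abnormal data, supplied as query parameters.
params = {"id": "p1", "Property1": "a", "Property2": "b"}
```

Because MERGE matches an existing node or creates a new one, re-importing a record that was in fact written before the exception does not create a duplicate node.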
Further, preferably, in the case that the number of the imported abnormal data is detected to exceed a set fault tolerance threshold, the method further includes: and sending early warning information corresponding to the abnormal data so that a data processing party processes the abnormal data. Specifically, the Master node may be utilized to provide early warning information associated with the abnormal data, such as: sending early warning information through short messages, providing early warning information by using pages and the like; to enable the data handler to handle the exception data, for example: the abnormal data is analyzed to determine whether or not the operation for the abnormal data re-import has succeeded, or the like, or to determine the cause of the abnormality, or the like.
In an embodiment of the present invention, the running target graph database is, for example, a NEO4J graph database. In the prior art, methods for importing data into the NEO4J graph database include using tools such as Batch_Inserter, Batch_Import, and Neo4j_Import; with all of these, the database needs to be in a stopped state, that is, a state in which data service cannot be provided, while data is being imported. Alternatively, data is imported item by item into a running database using Create statements, which is inefficient when the data volume is large. In view of this, the embodiment of the present invention may determine a plurality of files to be imported corresponding to data to be imported, where the files to be imported are node files or relationship files between nodes; construct a corresponding import task for each file to be imported; concurrently acquire and execute the import tasks by utilizing a plurality of preset service processes, so as to import the data to be imported in the file to be imported corresponding to each import task into the running target graph database; and detect import abnormal data of the executed import tasks and re-import the detected import abnormal data into the target graph database.
That is, importing the data to be imported in the file to be imported corresponding to the import task into the running target graph database comprises: importing one or more node files and a plurality of relationship files between nodes into the graph database, so that the graph database constructs a graph relationship between attribute values in the files to be imported based on the one or more node files and the relationship files between the nodes. The attribute fields of the nodes and the relationships between the nodes may be configured in a preset configuration file. For example, node = {'node123': {'table': NEO4J table name, 'primary keys': [primary key list]}} configures the data table corresponding to node123 and the primary key information contained in the attribute fields of that data table; and relationship fact = {relationship name: {'reducict': {'table': 'table1', 'relname': 'collaboration'}, 'node1 dit': {'node1', 'pkey': primary key list}, 'node2 dit': {'node2', 'pkey': primary key list}}} represents configuration information that sets a relationship (e.g., configured as collaboration) between two nodes. Further, according to the configuration information of the configuration file, the attribute values contained in the one or more node files and the relationship attribute values contained in the relationship files between the nodes are imported into the graph database (for example, a NEO4J graph database) to construct this graph relationship.
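How a node configuration might drive the import instruction can be sketched as follows. The dictionary layout, its key names (table, primary_keys, fields), the label "Person", and the function build_load_csv_statement are simplified assumptions for illustration and do not reproduce the configuration format of the embodiment. The generated text uses the Cypher LOAD CSV clause of NEO4J; the file name is assumed to refer to a file placed in the import directory of the database.

```python
# Hypothetical node configuration; key names are illustrative assumptions.
NODE_CONFIG = {
    "node123": {
        "table": "Person",
        "primary_keys": ["id"],
        "fields": ["id", "name"],
    },
}


def build_load_csv_statement(node_name, file_name, config=NODE_CONFIG):
    """Build a LOAD CSV statement that merges one node file into the graph."""
    cfg = config[node_name]
    # Primary keys identify the node; the remaining fields are set as properties.
    keys = ", ".join(f"{key}: row.{key}" for key in cfg["primary_keys"])
    others = [f for f in cfg["fields"] if f not in cfg["primary_keys"]]
    statement = (
        f"LOAD CSV WITH HEADERS FROM 'file:///{file_name}' AS row "
        f"MERGE (n:{cfg['table']} {{{keys}}})"
    )
    if others:
        statement += " SET " + ", ".join(f"n.{f} = row.{f}" for f in others)
    return statement


load_statement = build_load_csv_statement("node123", "person.csv")
```

Only the configuration changes from one data table to another; the statement-building code stays the same, which is the property the text attributes to combining a configuration file with the import module.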
As can be seen from steps S101 to S103, the embodiment of the present invention improves the efficiency and stability of importing data through concurrent task processing, file splitting, and a combination of multiple import instructions (e.g., load_csv, create, and other instructions); it largely overcomes the frequent timeouts that occur when a single task imports a file with a large data volume, and, when an import exception is detected, further performs a fault-tolerant operation to improve the integrity and high availability of the imported data. In addition, by combining the configuration information with the import module and the splicing module, a data party unfamiliar with the execution instructions of the target graph database can still import data using the method provided by the embodiment of the invention, which improves the efficiency of importing data, reduces its complexity, and reduces its labor and time costs.
As shown in fig. 2, an embodiment of the present invention provides a method for importing data, which may include the following steps:
Step S201: determining a plurality of files to be imported corresponding to data to be imported, including: acquiring one or more data files containing the data to be imported from a data source, wherein the data files are any one of node files or relationship files among nodes.
Specifically, the description of obtaining one or more data files containing the data to be imported from the data source is consistent with the description in step S101, and is not repeated here.
Step S202: using the Master node: judging whether the data file meets a preset segmentation condition, if so, segmenting the data file into a plurality of files to be imported with set sizes; otherwise, directly taking the data file as a file to be imported.
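The segmentation judgment can be sketched as below. Splitting by a maximum number of data rows and repeating the header line in every piece are assumptions for illustration; the embodiment reads its segmentation strategy from the configuration file and does not prescribe this particular rule.

```python
def split_lines(lines, max_rows):
    """Split CSV lines (header first) into pieces of at most max_rows data rows."""
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    if len(rows) <= max_rows:
        # Segmentation condition not met: use the data file as it is.
        return [lines]
    # Each piece repeats the header so it can be imported on its own.
    return [[header] + rows[i:i + max_rows] for i in range(0, len(rows), max_rows)]


lines = ["id,name"] + [f"{i},n{i}" for i in range(5)]
chunks = split_lines(lines, max_rows=2)
```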
Specifically, the names of the Master node and the Slave node in the embodiment of the present invention are only examples, and are used to distinguish executed services, and the Master node and the Slave node may operate in the same physical machine, or may operate in multiple physical devices, respectively.
Step S203: searching for and pulling, by the service processes in the plurality of Slave nodes, the files to be imported of unexecuted import tasks from the Master node, and executing the import tasks; judging whether the import is abnormal, and if so, executing the step of re-importing the abnormal data into the target graph database.
Specifically, when any service process detects that an import task is completed, it feeds the completion result back to the Master node, so that the Master node updates the task states and the import tasks that have not yet been executed remain marked as such; the service processes in the plurality of Slave nodes (for example, Slave node 1, Slave node 2, …, Slave node N shown in fig. 2) then pull the unexecuted import tasks from the Master node in a preemptive manner. The Slave nodes provide a plurality of idle service processes; that is, the idle service processes each search for unexecuted import tasks, and when one is found, the idle service process pulls the file to be imported corresponding to that unexecuted import task.
Therefore, the Master node automatically determines the import tasks, and the plurality of Slave nodes dynamically pull and execute the unfinished tasks, which improves the efficiency and degree of automation of data import to a greater extent.
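The preemptive pulling can be illustrated with a minimal single-machine sketch in which a shared queue plays the role of the task list of the Master node and worker threads stand in for the service processes of the Slave nodes. The function run_workers, the status labels, and the use of threads rather than separate processes or machines are assumptions for illustration.

```python
import queue
import threading


def run_workers(file_names, import_fn, worker_count=3):
    """Let idle workers pull unexecuted import tasks until none remain."""
    tasks = queue.Queue()
    for name in file_names:
        tasks.put(name)
    status = {name: "pending" for name in file_names}  # the master's task table
    lock = threading.Lock()

    def worker():
        while True:
            try:
                # Preemptive pull: whichever worker is idle first takes the task.
                name = tasks.get_nowait()
            except queue.Empty:
                return  # no unexecuted task was found
            try:
                import_fn(name)
                outcome = "done"
            except Exception:
                outcome = "failed"
            with lock:
                status[name] = outcome  # feed the result back to the master

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return status


imported = []
status = run_workers([f"part_{i}.csv" for i in range(5)], imported.append)
```

No task is assigned in advance; a faster worker simply takes more tasks, which is what keeps all workers busy when files differ in size.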
Further, when executing the import task, each Slave node determines whether an exception exists in the import, if so, a step of re-importing the exception data into the target graph database is executed, and the step of re-importing the exception data into the target graph database is consistent with the description of step S103, and is not described herein again.
Step S204: importing the data to be imported into a running target graph database so that the target graph database constructs a graph relation between attribute values in the files to be imported on the basis of one or more node files and the relation files among a plurality of nodes.
Specifically, an import task is concurrently executed by using service processes in a plurality of Slave nodes, so that data to be imported is imported into a running target graph database, and the target graph database constructs a graph relation between attribute values in files to be imported based on one or more node files and the relation files among a plurality of nodes.
As shown in fig. 3, an embodiment of the present invention provides an apparatus 300 for importing data, including: a file obtaining module 301, a service module 302 and an importing module 303; wherein,
the file obtaining module 301 is configured to determine a plurality of files to be imported, where the files to be imported correspond to data to be imported; the file to be imported is a node file or a relation file between nodes;
the service module 302 is configured to construct a corresponding import task for each file to be imported;
the importing module 303 includes a plurality of preset service processes, and concurrently acquires and executes import tasks by using the plurality of service processes, so as to import the data to be imported in the file to be imported corresponding to each import task into a running target graph database, detect import abnormal data of the executed import tasks, and re-import the detected import abnormal data into the target graph database.
As shown in fig. 4, an embodiment of the present invention provides a system 400 for importing data, including: the device 300 for importing data, and one or more target graph databases; wherein, the data to be imported in the file to be imported is imported into the running target graph database by the data importing device 300.
An embodiment of the present invention further provides an electronic device for importing data, including: one or more processors; the storage device is used for storing one or more programs, and when the one or more programs are executed by the one or more processors, the one or more processors are enabled to realize the method provided by any one of the above embodiments.
Embodiments of the present invention further provide a computer-readable medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method provided in any of the above embodiments.
An embodiment of the present invention further provides a computer program product, including a computer program, where the program is executed by a processor to implement any one of the methods described above.
Fig. 5 illustrates an exemplary system architecture 500 of a method of importing data or an apparatus for importing data to which an embodiment of the present invention may be applied.
As shown in fig. 5, the system architecture 500 may include terminal devices 501, 502, 503, a network 504, and a server 505. The network 504 serves to provide a medium for communication links between the terminal devices 501, 502, 503 and the server 505. Network 504 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 501, 502, 503 to interact with a server 505 over a network 504 to receive or send messages or the like. The terminal devices 501, 502, 503 may have various client applications installed thereon, such as an e-mall client application, a web browser application, a search-type application, an instant messaging tool, a mailbox client, and the like.
The terminal devices 501, 502, 503 may be various electronic devices having display screens and supporting various client applications, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 505 may be a server providing various services, such as a background management server providing support for client applications used by users with the terminal devices 501, 502, 503. The background management server can process the received data importing request and feed back the data importing result to the terminal equipment.
It should be noted that the method for importing data provided by the embodiment of the present invention is generally executed by the server 505; accordingly, the apparatus for importing data is typically located in the server 505.
It should be understood that the number of terminal devices, networks, and servers in fig. 5 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for an implementation.
Referring now to FIG. 6, a block diagram of a computer system 600 suitable for use with a terminal device implementing an embodiment of the invention is shown. The terminal device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the use range of the embodiment of the present invention.
As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU) 601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the system 600 are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 610 as necessary, so that a computer program read out therefrom is installed into the storage section 608 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The computer program performs the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 601.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules and/or units described in the embodiments of the present invention may be implemented by software, and may also be implemented by hardware. The described modules and/or units may also be provided in a processor, and may be described as: a processor includes a file obtaining module, a service module, and an import module. The names of these modules do not in some cases constitute a limitation on the modules themselves; for example, the file obtaining module may also be described as a "module that determines a plurality of files to be imported corresponding to data to be imported".
As another aspect, the present invention also provides a computer program product, which when executed by a processor implements the method of importing data according to an embodiment of the present invention.
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer readable medium carries one or more programs which, when executed by a device, cause the device to perform the following: determining a plurality of files to be imported corresponding to data to be imported, wherein the file to be imported is a node file associated with the target graph database or a relation file between nodes; constructing a corresponding import task for each file to be imported; concurrently acquiring and executing an import task by utilizing a plurality of preset service processes so as to import data to be imported in a file to be imported corresponding to the import task into the running target graph database; and detecting import abnormal data of the executed import task, and re-importing the detected import abnormal data into the target graph database.
According to the embodiment of the invention, corresponding import tasks can be constructed for a plurality of files to be imported; concurrently acquiring and executing an import task by utilizing a plurality of preset service processes, and importing data to be imported in a file to be imported into a running target graph database; and re-importing the detected import abnormal data into the target graph database. By means of technical means of concurrently executing import tasks, abnormal data monitoring, re-import and the like on a running target graph database, flexibility and reliability of data import are improved, efficiency of data import is improved, and complexity of data import is reduced.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (19)

1. A method of importing data, comprising:
determining a plurality of files to be imported corresponding to data to be imported; the file to be imported is a node file associated with the target graph database or a relation file between nodes;
constructing a corresponding import task for each file to be imported;
concurrently acquiring and executing an import task by utilizing a plurality of preset service processes so as to import data to be imported in a file to be imported corresponding to the import task into the running target graph database; and detecting import abnormal data of the executed import task, and importing the detected import abnormal data into the target graph database again.
2. The method of claim 1,
the determining a plurality of files to be imported corresponding to the data to be imported comprises:
acquiring one or more data files containing the data to be imported from a data source, wherein the data files are any one of node files or relationship files among nodes;
for each of the data files, performing:
judging whether the data file meets a preset segmentation condition, if so, segmenting the data file into a plurality of files to be imported with set sizes; otherwise, directly taking the data file as a file to be imported.
3. The method of claim 2,
the dividing the data file into a plurality of files to be imported with set sizes comprises:
reading a segmentation strategy of a data file from a preset configuration file;
and cutting the data file into a plurality of files to be imported based on the cutting strategy.
4. The method of claim 1,
the concurrently acquiring an import task by utilizing a plurality of preset service processes comprises:
searching, by a plurality of idle service processes respectively, for unexecuted import tasks, and, when an unexecuted import task is found, pulling, by the idle service process, the file to be imported corresponding to the unexecuted import task.
5. The method of claim 4,
after the idle service process pulls the file to be imported corresponding to the unexecuted import task, the method further comprises the following steps:
acquiring one or more attribute fields of the file to be imported from the preset configuration file;
the executing of the import task comprises:
writing one or more attribute fields and the file identification of the file to be imported into a preset import module comprising an import instruction, so that the import module imports the attribute fields and the attribute values corresponding to the attribute fields in the file to be imported into the target graph database based on the one or more attribute fields and the file identification of the file to be imported.
6. The method of claim 5, further comprising:
acquiring import cycle configuration information corresponding to the executed import task from the preset configuration file;
the importing module is used for importing the attribute fields and the attribute values corresponding to the attribute fields in the file to be imported into the target graph database based on the importing period configuration information.
7. The method of claim 1,
after the detecting import exception data of the executed import task, the method further includes:
in the case that the number of the imported abnormal data is detected to exceed a set fault tolerance threshold, executing the step of re-importing the detected imported abnormal data into the target graph database;
and under the condition that the quantity of the imported abnormal data is detected not to exceed a set fault tolerance threshold, feeding back a message indicating that the import is completed by the service process.
8. The method of claim 7, further comprising:
when the imported abnormal data is detected, recording each piece of imported abnormal data with imported abnormality;
the re-importing the detected import abnormal data into the target graph database comprises:
for each piece of the imported abnormal data, executing:
writing the one or more attribute fields corresponding to the imported abnormal data into a preset splicing module comprising a splicing instruction, so that the splicing module re-imports the attribute fields and the attribute values corresponding to the attribute fields in the imported abnormal data into the target graph database based on the one or more attribute fields.
9. The method of claim 7,
in the case that the number of the imported abnormal data is detected to exceed a set fault tolerance threshold, the method further includes:
and sending early warning information corresponding to the abnormal data so that a data processing party processes the abnormal data.
10. The method of claim 1,
the importing the data to be imported in the file to be imported corresponding to the import task into the running target graph database comprises the following steps:
importing one or more node files and relationship files among a plurality of nodes into a running target graph database, so that the target graph database constructs a graph relationship among attribute values in the files to be imported on the basis of the one or more node files and the relationship files among the plurality of nodes.
11. An apparatus for importing data, comprising: a file obtaining module, a service module and an import module; wherein,
the file acquisition module is used for determining a plurality of files to be imported corresponding to the data to be imported; the file to be imported is a node file associated with the target graph database or a relation file between nodes;
the service module is used for constructing a corresponding import task for each file to be imported;
the importing module comprises a plurality of preset service processes, and utilizes the service processes to concurrently acquire and execute an importing task so as to import data to be imported in a file to be imported corresponding to the importing task into a running target graph database, detect import abnormal data of the executed importing task, and re-import the detected import abnormal data into the target graph database.
12. The apparatus of claim 11,
the determining, by the file obtaining module, of a plurality of files to be imported corresponding to the data to be imported comprises:
acquiring one or more data files containing the data to be imported from a data source, wherein the data files are any one of node files or relationship files among nodes;
for each of the data files, performing:
judging whether the data file meets a preset segmentation condition, if so, segmenting the data file into a plurality of files to be imported with set sizes; otherwise, directly taking the data file as a file to be imported.
13. The apparatus of claim 11,
the concurrently acquiring, by the import module, of an import task by utilizing a plurality of preset service processes comprises:
searching, by a plurality of idle service processes respectively, for unexecuted import tasks, and, when an unexecuted import task is found, pulling, by the idle service process, the file to be imported corresponding to the unexecuted import task.
14. The apparatus of claim 13,
wherein, after the idle service process pulls the file to be imported corresponding to the unexecuted import task, the import module is further configured to perform:
acquiring one or more attribute fields of the file to be imported from a preset configuration file;
the executing of the import task comprises:
writing one or more attribute fields and the file identification of the file to be imported into a preset import module comprising an import instruction, so that the import module imports the attribute fields and the attribute values corresponding to the attribute fields in the file to be imported into the target graph database based on the one or more attribute fields and the file identification of the file to be imported.
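As a rough illustration of claim 14, the attribute fields and the file identification can be substituted into a preset text containing an import instruction. The claim does not name a query language or a graph database product; the Cypher-style `LOAD CSV` statement, the `label` parameter, and the names `IMPORT_TEMPLATE` and `build_import_statement` are all assumptions made for this sketch.

```python
from string import Template
from typing import List

# Hypothetical preset text holding the import instruction. The Cypher-style
# wording is an assumption; the claim does not specify the statement syntax.
IMPORT_TEMPLATE = Template(
    "LOAD CSV WITH HEADERS FROM 'file:///$file_id' AS row "
    "CREATE (n:$label {$props})"
)


def build_import_statement(file_id: str, label: str, fields: List[str]) -> str:
    """Write the attribute fields and file identification into the template."""
    # Each attribute field maps to the column of the same name in the file.
    props = ", ".join(f"{name}: row.{name}" for name in fields)
    return IMPORT_TEMPLATE.substitute(file_id=file_id, label=label, props=props)


statement = build_import_statement("person_0001.csv", "Person", ["id", "name"])
```

In this sketch the attribute fields would come from the configuration file and the file identification from the pulled file, so the same preset text can serve every file to be imported.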
15. The apparatus of claim 11,
wherein, after detecting the import abnormal data of the executed import task, the import module is further configured to perform:
in the case that the quantity of import abnormal data exceeds a set fault tolerance threshold, executing the step of re-importing the detected import abnormal data into the target graph database;
and in the case that the quantity of import abnormal data does not exceed the set fault tolerance threshold, feeding back, by the service process, a message indicating that the import is completed.
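The threshold decision of claim 15 reduces to a single comparison, sketched below. The function name `finish_task`, the `reimport` callback, and the returned status strings are hypothetical; only the branch logic follows the claim.

```python
from typing import Callable, Sequence


def finish_task(failed_rows: Sequence[str], threshold: int,
                reimport: Callable[[Sequence[str]], None]) -> str:
    """Decide what happens after abnormal rows of an import task are detected."""
    if len(failed_rows) > threshold:
        # Above the fault tolerance threshold: import the abnormal rows again.
        reimport(failed_rows)
        return "reimported"
    # At or below the threshold: report the import as completed.
    return "import completed"
```

For example, with a threshold of 2, three abnormal rows trigger a re-import, while one abnormal row is tolerated and the task is reported as completed.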
16. A system for importing data, comprising: the apparatus for importing data as recited in claim 11, and one or more target graph databases; wherein the apparatus for importing data imports the data to be imported in the file to be imported into a running target graph database.
17. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method recited in any of claims 1-10.
18. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-10.
19. A computer program product comprising a computer program, characterized in that the program realizes the method according to any one of claims 1-10 when executed by a processor.
CN202211126909.2A 2022-09-16 2022-09-16 Method, device and system for importing data Pending CN115543561A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211126909.2A CN115543561A (en) 2022-09-16 2022-09-16 Method, device and system for importing data

Publications (1)

Publication Number Publication Date
CN115543561A true CN115543561A (en) 2022-12-30

Family

ID=84728378

Country Status (1)

Country Link
CN (1) CN115543561A (en)

Similar Documents

Publication Publication Date Title
CN108874558B (en) Message subscription method of distributed transaction, electronic device and readable storage medium
CN111190888A (en) Method and device for managing graph database cluster
CN107807815B (en) Method and device for processing tasks in distributed mode
CN112148711A (en) Processing method and device for batch processing tasks
CN111427899A (en) Method, device, equipment and computer readable medium for storing file
CN111435329A (en) Automatic testing method and device
CN111767126A (en) System and method for distributed batch processing
CN110941658A (en) Data export method, device, server and storage medium
CN116049142A (en) Data processing method, device, electronic equipment and storage medium
CN111723063A (en) Method and device for processing offline log data
CN110806967A (en) Unit testing method and device
CN110688355A (en) Method and device for changing container state
CN115543561A (en) Method, device and system for importing data
CN110543520B (en) Data migration method and device
CN114064803A (en) Data synchronization method and device
CN113760600A (en) Database backup method, database restoration method and related device
CN111178014A (en) Method and device for processing business process
CN112671877A (en) Data processing method and device
CN113779048A (en) Data processing method and device
CN112799863A (en) Method and apparatus for outputting information
CN112732728A (en) Data synchronization method and system
CN112749204A (en) Method and device for reading data
CN113760860B (en) Data reading method and device
CN110858240A (en) Front-end module loading method and device
CN113268417B (en) Task execution method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination