CN111897808B - Data processing method and device, computer equipment and storage medium - Google Patents

Publication number
CN111897808B
Authority
CN
China
Prior art keywords
data
processed
database
gallery
target
Legal status
Active
Application number
CN202010683655.9A
Other languages
Chinese (zh)
Other versions
CN111897808A (en)
Inventor
曹牧年
徐志欣
李国海
Current Assignee
Suning Financial Technology Nanjing Co Ltd
Original Assignee
Suning Financial Technology Nanjing Co Ltd
Application filed by Suning Financial Technology Nanjing Co Ltd
Priority to CN202010683655.9A
Publication of CN111897808A
Application granted
Publication of CN111897808B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/275 Synchronous replication
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data processing method and apparatus, a computer device, and a storage medium. The method comprises: synchronizing, by using a data synchronization tool, to-be-processed data determined by a data warehouse to a database; acquiring the to-be-processed data from the database by using a preset tool and pushing it to message middleware; and acquiring the to-be-processed data from the message middleware, converting it into graph data in a target format according to the graph relationships, and writing the graph data into a target graph database. By storing large volumes of graph-relationship data in a graph database and updating that data promptly, the invention enables near-real-time online search of graph-relationship data and improves scalability and performance in changing business scenarios.

Description

Data processing method and device, computer equipment and storage medium
Technical Field
The present invention relates to the field of data processing technology, and in particular to a data processing method and apparatus, a computer device, and a storage medium.
Background
With the widespread use of the Internet, artificial intelligence has continued to develop. Artificial intelligence has greatly improved social productivity and freed people from heavy, repetitive work, and the knowledge graphs on which it relies are attracting increasing attention across industries. However, building a complete knowledge graph for an industry or business scenario requires storing a huge amount of data together with the relationships between those data. In the big-data field, the underlying data platform for a knowledge graph is currently usually built on the Neo4j graph database. Neo4j, however, does not support distributed computation and storage: on the one hand, when the data volume is large, storage can reach the limit of a single machine; on the other hand, multi-dimensional queries are limited to the computing resources of a single machine. Although the distributed native graph database DGraph solves the distribution problem, the single-machine disk storage limit, and the performance problems of query computation, the prior art provides no tool for importing incremental data into it, so incremental data import remains unsolved.
Disclosure of Invention
To solve the problems in the prior art, embodiments of the present invention provide a data processing method and apparatus, a computer device, and a storage medium, so as to overcome the problem that the prior art provides no tool for incremental data import and therefore cannot handle incremental data import.
To solve one or more of the above technical problems, the invention adopts the following technical solutions:
In a first aspect, a data processing method is provided, comprising the following steps:
synchronizing, by using a data synchronization tool, the to-be-processed data determined by a data warehouse to a database;
acquiring the to-be-processed data from the database by using a preset tool, and pushing it to message middleware; and
acquiring the to-be-processed data from the message middleware, converting it into graph data in a target format according to the graph relationships, and writing the graph data into a target graph database.
Further, the to-be-processed data comprises incremental data, and the method further comprises determining the to-be-processed data by the data warehouse, which comprises:
receiving business data, and comparing the business data with the corresponding original data to determine the incremental data.
Further, the business data comprises at least one of data entered by a business party, data acquired by web crawling, and data acquired from other data sources.
Further, synchronizing the to-be-processed data determined by the data warehouse to the database by using the data synchronization tool comprises:
periodically extracting the to-be-processed data from the data warehouse by using the data synchronization tool, cleaning it according to a preset data model, and writing the cleaned data into the corresponding data table in the database.
Further, acquiring the to-be-processed data from the database by using the preset tool and pushing it to the message middleware comprises:
sending, by the preset tool, a data request to the database, receiving the to-be-processed data returned by the database in response to the data request, and pushing the to-be-processed data to the message middleware.
Further, writing the graph data into the target graph database comprises:
acquiring information on all available machines in the target graph database, and determining, according to a preset rule, a target machine to perform the graph database write operation.
In a second aspect, a data processing apparatus is provided, the apparatus comprising:
a data synchronization module, configured to synchronize, by using a data synchronization tool, the to-be-processed data determined by a data warehouse to a database;
a data forwarding module, configured to acquire the to-be-processed data from the database by using a preset tool and push it to message middleware;
a data conversion module, configured to acquire the to-be-processed data from the message middleware and convert it into graph data in a target format according to the graph relationships; and
a data writing module, configured to write the graph data into a target graph database.
Further, the apparatus comprises:
a data comparison module, configured to receive business data and compare it with the corresponding original data to determine the incremental data.
In a third aspect, a computer device is provided, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the following steps:
synchronizing, by using a data synchronization tool, the to-be-processed data determined by a data warehouse to a database;
acquiring the to-be-processed data from the database by using a preset tool, and pushing it to message middleware; and
acquiring the to-be-processed data from the message middleware, converting it into graph data in a target format according to the graph relationships, and writing the graph data into a target graph database.
In a fourth aspect, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, implements the following steps:
synchronizing, by using a data synchronization tool, the to-be-processed data determined by a data warehouse to a database;
acquiring the to-be-processed data from the database by using a preset tool, and pushing it to message middleware; and
acquiring the to-be-processed data from the message middleware, converting it into graph data in a target format according to the graph relationships, and writing the graph data into a target graph database.
The technical solutions provided by the embodiments of the invention have the following beneficial effects:
in the data processing method and apparatus, computer device, and storage medium provided by the embodiments of the invention, the to-be-processed data determined by the data warehouse is synchronized to the database by the data synchronization tool; the preset tool acquires it from the database and pushes it to the message middleware; and it is then acquired from the message middleware, converted into graph data in the target format according to the graph relationships, and written into the target graph database. This keeps the data promptly updated, enables near-real-time online search of graph-relationship data, and improves scalability and performance in changing business scenarios.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of processing to-be-processed data into a graph database according to an exemplary embodiment;
FIG. 2 is a schematic diagram of acquiring to-be-processed data according to an exemplary embodiment;
FIG. 3 is a flowchart of a data processing method according to an exemplary embodiment;
FIG. 4 is a block diagram of a data processing apparatus according to an exemplary embodiment;
FIG. 5 is a schematic diagram of the internal structure of a computer device according to an exemplary embodiment.
Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the invention.
At present, knowledge-graph data platforms in various industries mainly have the following requirements:
(1) Large data volumes, requiring distributed computation and distributed storage
(2) Real-time dynamic updates of incremental data
(3) Algorithmic processing and data aggregation close to the business scenarios
Example one
Specifically, to meet the above requirements and referring to FIG. 1, the solution of the present invention may be implemented through the following steps:
Step 1: synchronize the to-be-processed data determined by the data warehouse to a database by using a data synchronization tool.
Specifically, in this embodiment, the business data generated in each business scenario is first stored in a data warehouse (e.g., Hive). The data warehouse determines the incremental data corresponding to each piece of business data as the to-be-processed data, which is then synchronized to the database by a data synchronization tool. The database includes, but is not limited to, relational databases such as MySQL. Data synchronization tools include, but are not limited to, ETL tools; ETL describes the process of extracting, transforming, and loading data from a source (in this embodiment, including but not limited to the data warehouse) to a target (in this embodiment, including but not limited to the database). ETL is most commonly used with data warehouses, but its use is not limited to them.
In a specific implementation, for example, a scheduled ETL task may be configured to synchronize the to-be-processed data determined by the data warehouse to the database, with the schedule set according to the time granularity of the incremental business data. When the scheduled ETL task runs, it extracts the to-be-processed data (e.g., business metadata) from the data warehouse, cleans it according to a preset business data model, and writes the cleaned data into a MySQL table that follows the general data model agreed with the business party integrating with the graph.
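The scheduled ETL run described above can be sketched as a single extract, clean, and load pass. This is a minimal illustration under stated assumptions, not the patent's implementation: the column names (`entity_id`, `name`, `company_id`, `updated_at`), the `DATA_MODEL` rules, and the `graph_entity` table are hypothetical, and an in-memory SQLite database stands in for MySQL so the example runs without a server. Scheduling itself is left out; each scheduled firing would call `run_etl_once` once.

```python
from __future__ import annotations

import sqlite3
from typing import Any, Callable, Iterable

# Hypothetical business data model: column -> converter that validates and
# normalises a raw value. A converter that raises marks the row as dirty.
DATA_MODEL: dict[str, Callable[[Any], Any]] = {
    "entity_id": lambda v: str(v).strip(),
    "name": lambda v: str(v).strip(),
    "company_id": lambda v: str(v).strip(),
    "updated_at": int,
}
REQUIRED = {"entity_id", "name"}


def clean(row: dict[str, Any]) -> dict[str, Any] | None:
    """Return a cleaned copy of `row`, or None if it violates the model."""
    out: dict[str, Any] = {}
    for col, convert in DATA_MODEL.items():
        raw = row.get(col)
        if raw is None or (isinstance(raw, str) and not raw.strip()):
            if col in REQUIRED:
                return None  # mandatory field missing: drop the row
            out[col] = None
            continue
        try:
            out[col] = convert(raw)
        except (TypeError, ValueError):
            return None  # value cannot be coerced to the modelled type
    return out


def run_etl_once(source_rows: Iterable[dict[str, Any]], conn: sqlite3.Connection) -> int:
    """One scheduled run: extract -> clean -> load. Returns rows written."""
    cleaned = [r for r in map(clean, source_rows) if r is not None]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS graph_entity ("
        "entity_id TEXT PRIMARY KEY, name TEXT, company_id TEXT, updated_at INTEGER)"
    )
    # INSERT OR REPLACE makes re-running the same window idempotent; the MySQL
    # equivalent would be INSERT ... ON DUPLICATE KEY UPDATE.
    conn.executemany(
        "INSERT OR REPLACE INTO graph_entity "
        "VALUES (:entity_id, :name, :company_id, :updated_at)",
        cleaned,
    )
    conn.commit()
    return len(cleaned)


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    rows = [
        {"entity_id": " p1 ", "name": "Alice", "company_id": "c1", "updated_at": "1700000000"},
        {"entity_id": "p2", "name": "   ", "company_id": "c1", "updated_at": 1},
        {"entity_id": "p3", "name": "Bob", "company_id": None, "updated_at": "oops"},
    ]
    print(run_etl_once(rows, demo))  # 1: blank name and bad timestamp are dropped
```

Writing with an upsert rather than a plain insert is a deliberate choice here: a scheduled task that is retried for the same time window then leaves the table unchanged.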
Referring to FIG. 2, in this embodiment the data warehouse may receive business data through multiple channels, including but not limited to data entered by business parties, data obtained by web crawling, and data obtained from other data sources. The data warehouse compares the received business data with the corresponding original data to determine the incremental data; this incremental data is the to-be-processed data, i.e., the data that needs to be stored in the graph database.
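The comparison that yields the incremental data can be illustrated with a content fingerprint per business key. The sketch assumes each record carries a unique key (here the hypothetical `entity_id`) and treats a record as incremental if its key is new or its content hash differs from the original; deletions are not detected. In a warehouse such as Hive the same comparison would more likely be expressed as a join between two snapshots, so this Python version only shows the logic.

```python
from __future__ import annotations

import hashlib
import json
from typing import Any


def fingerprint(record: dict[str, Any]) -> str:
    """Stable content hash of a record; key order does not matter."""
    payload = json.dumps(record, sort_keys=True, ensure_ascii=False, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_increment(
    incoming: list[dict[str, Any]],
    original: list[dict[str, Any]],
    key: str = "entity_id",
) -> list[dict[str, Any]]:
    """Return incoming records that are new or whose content has changed."""
    known = {rec[key]: fingerprint(rec) for rec in original}
    increment = []
    for rec in incoming:
        old_hash = known.get(rec[key])
        if old_hash is None or old_hash != fingerprint(rec):
            increment.append(rec)  # new key, or same key with changed content
    return increment


if __name__ == "__main__":
    original = [{"entity_id": "p1", "name": "Alice"}, {"entity_id": "p2", "name": "Bob"}]
    incoming = [
        {"entity_id": "p1", "name": "Alice"},   # unchanged
        {"entity_id": "p2", "name": "Robert"},  # changed
        {"entity_id": "p3", "name": "Carol"},   # new
    ]
    print([r["entity_id"] for r in diff_increment(incoming, original)])  # ['p2', 'p3']
```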
Step 2: acquire the to-be-processed data from the database by using a preset tool and push it to message middleware.
Specifically, in this embodiment, the preset tool includes, but is not limited to, Canal, and the message middleware includes, but is not limited to, Pulsar. In a specific implementation, Canal monitors the database (e.g., MySQL) of the business party integrating with the graph. When data is written to the database, a binary log (binlog) entry is generated, and Canal extracts the to-be-processed data (e.g., incremental data) in a preset format and pushes it to the Pulsar message middleware cluster.
When Canal extracts the binlog data of incremental business data from MySQL, it emulates the MySQL slave interaction protocol, presenting itself as a MySQL slave, and sends a dump request to the MySQL master. On receiving the dump request, the MySQL master starts pushing the binlog to the slave (i.e., Canal), which parses the binlog objects (originally byte streams) to obtain the to-be-processed data.
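Canal's own binlog decoding is internal to Canal and is not reproduced here. What the downstream steps work with is the change record Canal emits once a binlog event has been parsed. The sketch below assumes Canal is configured to emit flat JSON messages and reads the `table`, `type`, `isDdl`, `pkNames`, and `data` fields; these follow Canal's flat-message layout as we understand it and should be checked against the deployed Canal version. `RowChange` and `parse_flat_message` are hypothetical names.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class RowChange:
    table: str
    op: str              # "INSERT", "UPDATE" or "DELETE"
    row: dict[str, Any]  # column values carried in the message's "data" array
    pk: tuple[str, ...]  # primary-key column names


def parse_flat_message(raw: bytes | str) -> list[RowChange]:
    """Turn one Canal-style flat JSON message into row-level change events.

    Field names are assumed from Canal's flat-message format; verify them
    against the Canal version actually deployed.
    """
    msg = json.loads(raw)
    if msg.get("isDdl"):
        return []  # schema changes carry no graph data
    op = msg.get("type", "")
    if op not in ("INSERT", "UPDATE", "DELETE"):
        return []
    pk = tuple(msg.get("pkNames") or ())
    return [RowChange(msg["table"], op, row, pk) for row in msg.get("data") or []]


if __name__ == "__main__":
    sample = json.dumps({
        "database": "biz", "table": "person", "type": "UPDATE", "isDdl": False,
        "pkNames": ["entity_id"],
        "data": [{"entity_id": "p2", "name": "Robert"}],
        "old": [{"name": "Bob"}],
    })
    for change in parse_flat_message(sample):
        print(change.table, change.op, change.row)
```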
Step 3: a data processing application listens for messages from the Pulsar message middleware, processes the to-be-processed data into target-format data according to the graph relationships, and stores it in the target graph database in batches.
Specifically, in this embodiment, the target-format data includes, but is not limited to, RDF-format data, and the target graph database includes, but is not limited to, the DGraph graph database. In particular, the data processing application (i.e., the consumer of Pulsar messages on the graph side) may be developed in Java. After receiving a message pushed by the message middleware, the application parses it to obtain the to-be-processed data, processes that data according to the data model of graph concepts provided by the business party, arranges it into RDF-format graph data that DGraph can ingest, and finally writes the graph data into DGraph by calling DGraph's gRPC interface.
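The conversion into RDF for DGraph can be sketched as mapping each cleaned row onto N-Quad lines according to a graph model supplied by the business party. `PERSON_MODEL`, its predicates (`name`, `works_for`), and the blank-node naming scheme are hypothetical; only the N-Quad line shape (`subject <predicate> object .`) and the `dgraph.type` predicate come from DGraph itself. The `batches` helper groups lines so that each write carries a bounded number of N-Quads.

```python
from __future__ import annotations

from typing import Any, Iterator

# Hypothetical graph model from the business party: the key column, scalar
# columns mapped to predicates, and columns that become edges to other types.
PERSON_MODEL: dict[str, Any] = {
    "type": "Person",
    "key": "entity_id",
    "attrs": {"name": "name"},                          # column -> predicate
    "edges": {"company_id": ("works_for", "Company")},  # column -> (predicate, target type)
}


def literal(value: Any) -> str:
    """Render a value as a quoted RDF string literal with escaping."""
    s = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{s}"'


def blank(node_type: str, key: Any) -> str:
    # Assumes keys contain only characters valid in a blank-node label.
    return f"_:{node_type.lower()}_{key}"


def row_to_nquads(row: dict[str, Any], model: dict[str, Any]) -> list[str]:
    """Map one cleaned row to DGraph-style N-Quad lines."""
    subj = blank(model["type"], row[model["key"]])
    lines = [f'{subj} <dgraph.type> {literal(model["type"])} .']
    for col, pred in model["attrs"].items():
        if row.get(col) is not None:
            lines.append(f"{subj} <{pred}> {literal(row[col])} .")
    for col, (pred, target_type) in model["edges"].items():
        if row.get(col) is not None:
            lines.append(f"{subj} <{pred}> {blank(target_type, row[col])} .")
    return lines


def batches(lines: list[str], size: int) -> Iterator[str]:
    """Yield newline-joined chunks of at most `size` N-Quads per mutation."""
    for start in range(0, len(lines), size):
        yield "\n".join(lines[start : start + size])


if __name__ == "__main__":
    row = {"entity_id": "p1", "name": 'Alice "Al"', "company_id": "c1"}
    for chunk in batches(row_to_nquads(row, PERSON_MODEL), 2):
        print(chunk)
```

Two caveats on this design. Blank nodes such as `_:company_c1` are only resolved within a single mutation, so repeated incremental loads would create duplicate nodes unless the writer uses an upsert keyed on an indexed external-id predicate. Each chunk would then be sent over gRPC, for example with the pydgraph client via `txn.mutate(set_nquads=chunk)` followed by `txn.commit()`; a Java application would make the equivalent call through a Java client.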
A DGraph cluster is distributed, and every machine in it can serve both reads and writes. For example, for a write operation, the IP address of an available machine can be chosen at random so that the request is routed randomly to one of the available machines. This takes full advantage of the fact that every machine in a DGraph cluster can accept both writes and queries, improving the concurrency and throughput of the whole system.
In a specific implementation, a connection pool can be maintained, with a scheduled probe of the graph database that monitors the state of each machine and evicts machines that have gone down from the pool. Encapsulating the DGraph connection pool behind an interface yields a highly available, high-performance service interface.
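The random routing and the scheduled health check can be combined in a small pool object. This is a sketch under assumptions: `probe` is a hypothetical callable (in practice it might be a cheap query or a TCP connect to the node's gRPC port), the addresses are placeholders, and the pool hands out addresses rather than live gRPC channels. Random choice among healthy machines plays the role of the "preset rule", following the random routing described above.

```python
from __future__ import annotations

import random
import threading
from typing import Callable


class GraphNodePool:
    """Tracks which graph database machines are usable and routes randomly.

    `probe` is a hypothetical health check returning True when the machine
    at the given address is reachable.
    """

    def __init__(
        self,
        addresses: list[str],
        probe: Callable[[str], bool],
        rng: random.Random | None = None,
    ) -> None:
        self._all = list(addresses)
        self._healthy = list(addresses)  # optimistic until the first check
        self._probe = probe
        self._rng = rng if rng is not None else random.Random()
        self._lock = threading.Lock()

    def check(self) -> None:
        """One health-check round: keep only machines whose probe succeeds."""
        alive = [addr for addr in self._all if self._probe(addr)]
        with self._lock:
            self._healthy = alive

    def run_forever(self, interval_s: float, stop: threading.Event) -> None:
        """Re-check every `interval_s` seconds until `stop` is set."""
        while not stop.wait(interval_s):
            self.check()

    def pick(self) -> str:
        """Return a random healthy machine for the next read or write."""
        with self._lock:
            if not self._healthy:
                raise RuntimeError("no healthy graph database machine available")
            return self._rng.choice(self._healthy)

    def healthy(self) -> list[str]:
        with self._lock:
            return list(self._healthy)


if __name__ == "__main__":
    down = {"10.0.0.2:9080"}  # placeholder addresses
    pool = GraphNodePool(
        ["10.0.0.1:9080", "10.0.0.2:9080", "10.0.0.3:9080"],
        probe=lambda addr: addr not in down,
    )
    pool.check()
    print(pool.healthy(), pool.pick())
```

Because every round re-probes the full machine list, a machine that comes back up rejoins the pool automatically; the text above only requires eviction, so re-admission is an added design choice. `run_forever` would typically run on a daemon thread.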
Example two
Corresponding to the above embodiment, the present application provides a data processing method. As shown in FIG. 3, the method comprises the following steps:
S1: synchronize the to-be-processed data determined by the data warehouse to a database by using a data synchronization tool.
Specifically, an ETL data synchronization tool synchronizes the to-be-processed data determined in the data warehouse to a database, which includes, but is not limited to, a MySQL database of incremental business data.
S2: acquire the to-be-processed data from the database by using a preset tool and push it to message middleware.
Specifically, the binlog data of the incremental business data extracted by Canal is pushed to the Pulsar message middleware.
S3: acquire the to-be-processed data from the message middleware, convert it into graph data in a target format according to the graph relationships, and write the graph data into a target graph database.
Specifically, the data processing application listens for Pulsar messages, acquires the to-be-processed data from the message middleware, processes it into RDF-format data according to the graph relationships, and stores it in the DGraph graph database in batches.
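Storing data "in batches" implies a consumer that buffers messages and flushes them either when a batch is full or when it has waited long enough. In the sketch a standard-library `queue.Queue` stands in for the Pulsar subscription and `None` is used as a shutdown signal; both are assumptions for illustration, and `consume_in_batches` and its parameters are hypothetical names.

```python
from __future__ import annotations

import queue
import time
from typing import Callable


def consume_in_batches(
    source: queue.Queue[str | None],
    flush: Callable[[list[str]], None],
    batch_size: int = 500,
    max_wait_s: float = 1.0,
) -> None:
    """Drain `source`, calling `flush` with batches of at most `batch_size`.

    A batch is flushed when full, or once `max_wait_s` has passed since its
    first message arrived. A None message flushes the remainder and stops.
    """
    batch: list[str] = []
    deadline = 0.0
    while True:
        # Block indefinitely while idle; once a batch has started, wait only
        # until its deadline so a half-full batch is not held back forever.
        timeout = max(0.0, deadline - time.monotonic()) if batch else None
        try:
            msg = source.get(timeout=timeout)
        except queue.Empty:
            flush(batch)  # deadline reached with a partial batch
            batch = []
            continue
        if msg is None:
            if batch:
                flush(batch)
            return
        if not batch:
            deadline = time.monotonic() + max_wait_s
        batch.append(msg)
        if len(batch) >= batch_size:
            flush(batch)
            batch = []


if __name__ == "__main__":
    q: queue.Queue[str | None] = queue.Queue()
    for i in range(5):
        q.put(f"m{i}")
    q.put(None)
    consume_in_batches(q, print, batch_size=2)
```

With the real Python `pulsar-client` library, `source.get` would correspond to `consumer.receive(timeout_millis=...)`, and each message would be acknowledged with `consumer.acknowledge(msg)` only after its batch has been written. That ordering gives at-least-once delivery, so the graph writes should be idempotent.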
As a preferred implementation, in this embodiment, the to-be-processed data comprises incremental data, and the method further comprises determining the to-be-processed data by the data warehouse, which comprises:
receiving business data, and comparing the business data with the corresponding original data to determine the incremental data.
As a preferred implementation, in this embodiment, the business data comprises at least one of data entered by a business party, data acquired by web crawling, and data acquired from other data sources.
As a preferred implementation, in this embodiment, synchronizing the to-be-processed data determined by the data warehouse to the database by using the data synchronization tool comprises:
periodically extracting the to-be-processed data from the data warehouse by using the data synchronization tool, cleaning it according to a preset data model, and writing the cleaned data into the corresponding data table in the database.
As a preferred implementation, in this embodiment, acquiring the to-be-processed data from the database by using the preset tool and pushing it to the message middleware comprises:
sending, by the preset tool, a data request to the database, receiving the to-be-processed data returned by the database in response to the data request, and pushing the to-be-processed data to the message middleware.
As a preferred implementation, in this embodiment, writing the graph data into the target graph database comprises:
acquiring information on all available machines in the target graph database, and determining, according to a preset rule, a target machine to perform the graph database write operation.
Example three
FIG. 4 is a schematic structural diagram of a data processing apparatus according to an exemplary embodiment. The apparatus comprises:
a data synchronization module, configured to synchronize, by using a data synchronization tool, the to-be-processed data determined by the data warehouse to the database;
a data forwarding module, configured to acquire the to-be-processed data from the database by using a preset tool and push it to the message middleware;
a data conversion module, configured to acquire the to-be-processed data from the message middleware and convert it into graph data in the target format according to the graph relationships; and
a data writing module, configured to write the graph data into the target graph database.
As a preferred implementation, in this embodiment, the apparatus further comprises:
a data comparison module, configured to receive business data and compare it with the corresponding original data to determine the incremental data.
As a preferred implementation, in this embodiment, the business data comprises at least one of data entered by a business party, data acquired by web crawling, and data acquired from other data sources.
As a preferred implementation, in this embodiment, the data synchronization module is specifically configured to:
periodically extract the to-be-processed data from the data warehouse by using the data synchronization tool, clean it according to a preset data model, and write the cleaned data into the corresponding data table in the database.
As a preferred implementation, in this embodiment, the data forwarding module is specifically configured to:
send a data request to the database through the preset tool, receive the to-be-processed data returned by the database in response to the data request, and push the to-be-processed data to the message middleware.
As a preferred implementation, in this embodiment, the data writing module is specifically configured to:
acquire information on all available machines in the target graph database, and determine, according to a preset rule, a target machine to perform the graph database write operation.
Example four
FIG. 5 is a schematic diagram of the internal structure of a computer device according to an exemplary embodiment. As shown in FIG. 5, the computer device comprises a processor, a memory, and a network interface connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the computer device communicates with external terminals over a network connection. When executed by the processor, the computer program implements a data processing method.
Those skilled in the art will appreciate that the structure shown in FIG. 5 is merely a block diagram of the part of the structure related to the solution of the present invention and does not limit the computer devices to which the solution may be applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
As a preferred implementation, in this embodiment, the computer device comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor, when executing the computer program, implements the following steps:
synchronizing, by using a data synchronization tool, the to-be-processed data determined by a data warehouse to a database;
acquiring the to-be-processed data from the database by using a preset tool, and pushing it to message middleware; and
acquiring the to-be-processed data from the message middleware, converting it into graph data in a target format according to the graph relationships, and writing the graph data into a target graph database.
As a preferred implementation, in this embodiment, the processor, when executing the computer program, further implements the following step:
receiving business data, and comparing the business data with the corresponding original data to determine the incremental data.
As a preferred implementation, in this embodiment, the processor, when executing the computer program, further implements the following step:
periodically extracting the to-be-processed data from the data warehouse by using the data synchronization tool, cleaning it according to a preset data model, and writing the cleaned data into the corresponding data table in the database.
As a preferred implementation, in this embodiment, the processor, when executing the computer program, further implements the following step:
sending, by the preset tool, a data request to the database, receiving the to-be-processed data returned by the database in response to the data request, and pushing the to-be-processed data to the message middleware.
As a preferred implementation, in this embodiment, the processor, when executing the computer program, further implements the following step:
acquiring information on all available machines in the target graph database, and determining, according to a preset rule, a target machine to perform the graph database write operation.
Example five
In an embodiment of the present invention, a computer-readable storage medium is further provided, on which a computer program is stored, and when the computer program is executed by a processor, the computer program implements the following steps:
synchronizing the data to be processed determined by the data warehouse to a database by using a data synchronization tool;
acquiring the data to be processed from the database by using a preset tool and pushing the data to be processed to a message middleware;
and acquiring the data to be processed from the message middleware, converting the data to be processed into gallery data in a target format according to the map relation, and writing the gallery data into a target map database.
As a preferred implementation manner, in the embodiment of the present invention, when executed by a processor, the computer program further implements the following steps:
and receiving the service data, and comparing the service data with the corresponding original data to determine incremental data.
As a preferred implementation manner, in the embodiment of the present invention, when executed by the processor, the computer program further implements the following steps:
and regularly extracting the data to be processed in the data warehouse by using a data synchronization tool, cleaning the data to be processed according to a preset data model, and writing the cleaned data to be processed into a data table corresponding to the database.
In a preferred implementation of this embodiment of the present invention, the computer program, when executed by the processor, further implements the following step:
sending, by the preset tool, a data request to the database; receiving the data to be processed that the database returns in response to the data request; and pushing that data to the message middleware.
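The text does not name the preset tool or the message format. A minimal polling sketch is shown below, assuming the tool remembers a watermark of the last pushed `(updated_at, id)` pair and sends JSON strings. A `queue.Queue` stands in for the message middleware and `sqlite3` for the database; `poll_and_push` and the `graph_source` table are hypothetical.

```python
import json
import queue
import sqlite3
from typing import Tuple


def poll_and_push(conn: sqlite3.Connection, broker: "queue.Queue[str]",
                  watermark: Tuple[str, str], batch_size: int = 100) -> Tuple[str, str]:
    """One polling cycle: request rows newer than the watermark and push them.

    The watermark is the (updated_at, id) of the last row already pushed, so
    rows that share a timestamp are neither skipped nor sent twice.
    """
    last_ts, last_id = watermark
    # The "data request": ask the database for the next page of newer rows.
    rows = conn.execute(
        "SELECT id, name, updated_at FROM graph_source "
        "WHERE updated_at > ? OR (updated_at = ? AND id > ?) "
        "ORDER BY updated_at, id LIMIT ?",
        (last_ts, last_ts, last_id, batch_size),
    ).fetchall()
    for row_id, name, updated_at in rows:
        # Serialize each returned row as one message (JSON is an assumption).
        broker.put(json.dumps({"id": row_id, "name": name, "updated_at": updated_at}))
        watermark = (updated_at, row_id)
    return watermark
```

Calling `poll_and_push` repeatedly with the returned watermark pages through the table in order; when nothing new is found, the watermark comes back unchanged. A real middleware producer client would replace `broker.put`.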
In a preferred implementation of this embodiment of the present invention, the computer program, when executed by the processor, further implements the following step:
acquiring information on all available machines in the target graph database, and determining, according to a preset rule, the target machine that executes the graph database write operation.
In summary, the technical solution provided by the embodiment of the present invention has the following beneficial effects:
In the data processing method and apparatus, computer device, and storage medium provided by the embodiments of the present invention, a data synchronization tool synchronizes the data to be processed, as determined by a data warehouse, into a database; a preset tool acquires the data to be processed from the database and pushes it to message middleware; and the data is then acquired from the message middleware, converted into graph data in a target format according to the graph relationships, and written into the target graph database. This keeps the data up to date, enables near-real-time online search of graph-relationship data, and improves scalability and performance in changing business scenarios.
It should be noted that, when the data processing apparatus provided in the foregoing embodiment triggers a data processing service, the division into the functional modules described above is merely illustrative. In practical applications, these functions may be allocated to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the data processing apparatus and the data processing method provided in the above embodiments belong to the same concept: the apparatus is based on the method, whose specific implementation is described in detail in the method embodiments and is not repeated here.
Those skilled in the art will understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disk.
The above describes only preferred embodiments of the present invention and is not intended to limit the invention. Any modification, equivalent replacement, improvement, or the like made within the spirit and principle of the present invention shall fall within its scope of protection.

Claims (9)

1. A method of data processing, the method comprising the steps of:
using a data synchronization tool to synchronize the data to be processed, as determined by a data warehouse, into a database;
using a preset tool to acquire the data to be processed from the database and push it to message middleware;
acquiring the data to be processed from the message middleware, converting it into graph data in a target format according to graph relationships, and writing the graph data into a target graph database;
wherein the graph data in the target format comprises data in RDF format, and the target graph database comprises a DGraph graph database;
wherein acquiring the data to be processed from the message middleware, converting it into graph data in the target format according to the graph relationships, and writing the graph data into the target graph database comprises:
developing a data processing application in Java; after the data processing application receives a message pushed by the message middleware, parsing the message to obtain the data to be processed;
processing the data according to a graph-concept data model provided by the business party, organizing it into graph data in an RDF format that the DGraph graph database can recognize, and finally writing the graph data into the DGraph graph database by calling the gRPC interface of the DGraph graph database;
and wherein synchronizing the data to be processed, as determined by the data warehouse, into the database by using the data synchronization tool comprises:
setting up a scheduled ETL task according to the time dimension of the incremental business data, extracting the data to be processed from the data warehouse, cleaning it with a preset business data model, and writing the cleaned data into a MySQL table that follows the general data model used for connecting business parties to the knowledge graph.
2. The data processing method of claim 1, wherein the data to be processed comprises incremental data, and the method further comprises a process in which the data warehouse determines the data to be processed, the process comprising:
receiving business data, and comparing the business data with the corresponding original data to determine the incremental data.
3. The data processing method of claim 2, wherein the business data comprises at least one of: data entered by a business party, data obtained by web crawling, and data obtained from other data sources.
4. The data processing method according to claim 1 or 2, wherein acquiring the data to be processed from the database by using the preset tool and pushing it to the message middleware comprises:
sending, by the preset tool, a data request to the database; receiving the data to be processed that the database returns in response to the data request; and pushing the data to be processed to the message middleware.
5. The data processing method according to claim 1 or 2, wherein writing the graph data into the target graph database comprises:
acquiring information on all available machines in the target graph database, and determining, according to a preset rule, the target machine that executes the graph database write operation.
6. A data processing apparatus for implementing the method of claim 1, the apparatus comprising:
a data synchronization module, configured to synchronize the data to be processed, as determined by a data warehouse, into a database by using a data synchronization tool;
a data forwarding module, configured to acquire the data to be processed from the database by using a preset tool and push it to message middleware;
a data conversion module, configured to acquire the data to be processed from the message middleware and convert it into graph data in a target format according to graph relationships; and
a data writing module, configured to write the graph data into a target graph database.
7. The data processing apparatus of claim 6, wherein the apparatus further comprises:
a data comparison module, configured to receive business data and compare the business data with the corresponding original data to determine incremental data.
8. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 5 are implemented when the computer program is executed by the processor.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 5.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010683655.9A CN111897808B (en) 2020-07-15 2020-07-15 Data processing method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111897808A 2020-11-06
CN111897808B 2023-04-11

Family

ID=73192830

Country Status (1)

Country Link
CN (1) CN111897808B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112685405A (en) * 2020-12-21 2021-04-20 福建新大陆软件工程有限公司 Data management method, system, equipment and medium based on knowledge graph
CN112732763A (en) * 2021-01-20 2021-04-30 北京千方科技股份有限公司 Data aggregation method and device, electronic equipment and medium
CN113656445A (en) * 2021-08-26 2021-11-16 五八同城信息技术有限公司 Data processing method and device, electronic equipment and storage medium
CN116932779B (en) * 2023-08-14 2024-03-12 企查查科技股份有限公司 Knowledge graph data processing method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106557592A (en) * 2016-12-02 2017-04-05 中铁程科技有限责任公司 Method of data synchronization, device and server cluster
CN106682140A (en) * 2016-12-20 2017-05-17 华北计算技术研究所(中国电子科技集团公司第十五研究所) Multi-system user incremental synchronization method based on timestamps and mapping strategies
CN108681590A (en) * 2018-05-15 2018-10-19 普信恒业科技发展(北京)有限公司 Incremental data processing method and processing device, computer equipment, computer storage media
CN110704458A (en) * 2019-08-15 2020-01-17 平安科技(深圳)有限公司 Data synchronization method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant