CN114254016A - Data synchronization method, device and equipment based on elastic search and storage medium - Google Patents

Data synchronization method, device and equipment based on elastic search and storage medium

Info

Publication number
CN114254016A
Authority
CN
China
Prior art keywords
log file
message
type
determining
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111555692.2A
Other languages
Chinese (zh)
Inventor
高一淇
韩方方
鲁良
李天与
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jindi Technology Co Ltd
Original Assignee
Beijing Jindi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jindi Technology Co Ltd
Priority to CN202111555692.2A
Publication of CN114254016A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/275 Synchronous replication
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Debugging And Monitoring (AREA)

Abstract

An embodiment of the invention provides an Elasticsearch-based data synchronization method, apparatus, computer device and storage medium. The method comprises: monitoring the transaction log of each node in an Elasticsearch cluster; when a log file in the transaction log changes, collecting and parsing the changed log file and determining the type of operation recorded in it; and, according to the determined type of operation, converting the log file into a message and sending the message to message caching middleware, so that message consumption middleware subscribes to the message from the message caching middleware and synchronizes it to a target database. Log files that change in Elasticsearch are thus obtained in real time, converted into messages, and synchronized to the target database after being subscribed to by the message consumption middleware, which achieves real-time incremental data synchronization.

Description

Elasticsearch-based data synchronization method, apparatus, device and storage medium
Technical Field
The present invention relates to the field of information processing technologies, and in particular to an Elasticsearch-based data synchronization method, apparatus, device, and storage medium.
Background
An Elasticsearch (ES) database is a Lucene-based distributed search engine. It supports near-real-time retrieval and processing of massive amounts of data, both structured and unstructured, and provides powerful full-text search.
However, when data in an Elasticsearch database needs to be synchronized to other databases, the data can only be queried through the search function based on the Elasticsearch index. Real-time incremental data therefore cannot be obtained, and synchronization cannot be performed on real-time incremental data.
Disclosure of Invention
Embodiments of the present invention aim to provide an Elasticsearch-based data synchronization method, apparatus, device, and storage medium that overcome the above technical problems in the prior art.
The technical scheme provided by the embodiment of the application is as follows:
A data synchronization method based on Elasticsearch, comprising:
monitoring the transaction log of each node in the Elasticsearch cluster;
when a log file in the transaction log changes, collecting and parsing the changed log file, and determining the type of operation recorded in the log file;
and, according to the determined type of operation, converting the log file into a message and sending the message to message caching middleware, so that message consumption middleware subscribes to the message from the message caching middleware and synchronizes it to a target database.
Optionally, before monitoring the transaction log of each node in the Elasticsearch cluster, the method includes: pre-installing a collector on each node in the Elasticsearch cluster, setting the data input in the collector's configuration file to the Elasticsearch transaction log so that all log files in the transaction log are retrieved, and setting the data output in the collector's configuration file to the message caching middleware.
Optionally, each log file of the transaction log includes version information, and the version information includes an iteration version number;
the method further includes: determining, according to the iteration version number of the log file, whether the log file in the transaction log has changed.
Optionally, collecting and parsing the changed log file and determining the type of operation recorded in it specifically includes:
applying a preset regular expression to the changed log file to extract its content;
and determining the type of operation recorded in the changed log file based on the extraction result.
Optionally, applying the preset regular expression to the changed log file and determining the type of operation based on the extraction result specifically includes:
applying a first regular expression to the log file, and determining the type of operation recorded in the log file according to the extraction result;
and, when the first regular expression extracts no result, applying a second regular expression to the log file, and determining the type of operation recorded in the log file according to the extraction result.
Optionally, determining the type of operation recorded in the log file according to the extraction result specifically includes:
if only the unique identifier in the log file is extracted, determining that the recorded operation is a delete operation;
and if both the unique identifier and the operation data in the log file are extracted, determining that the recorded operation is an update or insert operation.
Optionally, collecting and parsing the changed log file and determining the type of operation recorded in it specifically includes:
extracting the unique identifier in the log file with a third regular expression;
extracting the operation data in the log file with a fourth regular expression;
and determining the type of operation recorded in the log file based on the extraction result.
Optionally, determining the type of operation recorded in the log file based on the extraction result specifically includes:
if operation data is extracted from the log file, determining that the recorded operation is an update or insert operation;
and if no operation data is extracted from the log file, determining that the recorded operation is a delete operation.
Optionally, collecting and parsing the changed log file and determining the type of operation recorded in it specifically includes:
extracting the naming suffix of the log file;
determining the type of operation recorded in the log file according to the naming suffix;
and selecting a regular expression according to the type of operation, and extracting the data in the log file with it.
Optionally, the type of operation is a delete operation;
selecting a regular expression according to the type of operation and extracting the data in the log file specifically includes: selecting a fifth regular expression for the delete operation, and extracting the unique identifier in the log file with the fifth regular expression.
Optionally, the type of operation is an update or insert operation;
selecting a regular expression according to the type of operation and extracting the data in the log file specifically includes: selecting a sixth regular expression for the update or insert operation, and extracting the unique identifier and the operation data in the log file with the sixth regular expression.
Optionally, when the type of operation is a delete operation, converting the log file into a message and sending the message to the message caching middleware specifically includes: obtaining the unique identifier in the log file, combining the delete operation and the unique identifier as key-value pairs to generate a message that conforms to a first preset format, and sending the message to a message queue of the message caching middleware;
when the type of operation is an update or insert operation, converting the log file into a message and sending the message to the message caching middleware specifically includes: obtaining the unique identifier and the operation data in the log file, combining the update or insert operation, the unique identifier and the operation data as key-value pairs to generate a message that conforms to the first preset format, and sending the message to a message queue of the message caching middleware.
A data synchronization apparatus based on Elasticsearch, comprising:
a monitoring module, configured to monitor the transaction log of each node in the Elasticsearch cluster;
a parsing module, configured to collect and parse the changed log file when a log file in the transaction log changes, and to determine the type of operation recorded in the log file;
and a sending module, configured to convert the log file into a message and send the message to message caching middleware according to the determined type of operation, so that message consumption middleware subscribes to the message from the message caching middleware and synchronizes it to a target database.
A computer device comprising a memory and a processor, the memory storing computer-readable instructions, wherein the processor, when executing the computer-readable instructions, implements the steps of the data synchronization method of any one of the embodiments of the present application.
A computer-readable storage medium storing computer-readable instructions which, when executed by a processor, implement the steps of the data synchronization method of any one of the embodiments of the present application.
Compared with the prior art, the embodiment of the invention mainly has the following beneficial effects:
The transaction log of each node in the Elasticsearch cluster is monitored; when a log file in the transaction log changes, the changed log file is collected and parsed, and the type of operation recorded in it is determined; according to the determined type of operation, the log file is converted into a message and sent to message caching middleware, so that message consumption middleware subscribes to the message from the message caching middleware and synchronizes it to a target database. Log files that change in Elasticsearch are thus obtained in real time, converted into messages, and synchronized to the target database after being subscribed to by the message consumption middleware, which achieves real-time incremental data synchronization.
Drawings
In order to illustrate the solution of the present invention more clearly, the drawings used in the description of the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic view of an application scenario in an embodiment of the present application;
Fig. 2 is a schematic flowchart of an Elasticsearch-based data synchronization method according to an embodiment of the present application;
Fig. 3 is a schematic diagram of one implementation of step S202 in an embodiment of the present application;
Fig. 4 is a schematic diagram of another implementation of step S202 in an embodiment of the present application;
Fig. 5 is a schematic diagram of still another implementation of step S202 in an embodiment of the present application;
Fig. 6 is a schematic structural diagram of an Elasticsearch-based data synchronization apparatus according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention; the terms "comprising" and "having," and any variations thereof, in the description and claims of this invention and the description of the above figures, are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and in the claims, or in the drawings, are used for distinguishing between different objects and not necessarily for describing a particular sequential order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions of the present invention better understood by those skilled in the art, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings.
In the embodiments, the transaction log of each node in the Elasticsearch cluster is monitored; when a log file in the transaction log changes, the changed log file is collected and parsed, and the type of operation recorded in it is determined; according to the determined type of operation, the log file is converted into a message and sent to message caching middleware, so that message consumption middleware subscribes to the message from the message caching middleware and synchronizes it to a target database. Log files that change in Elasticsearch are thus obtained in real time, converted into messages, and synchronized to the target database, which achieves real-time incremental data synchronization.
Fig. 1 is a schematic view of an application scenario in an embodiment of the present application. As shown in Fig. 1, the application scenario includes an ES database, message caching middleware, message consumption middleware (also called a message consumer), and a target database. The message caching middleware, also called a message caching platform, is for example Kafka. The message caching middleware and the message consumption middleware can form a message processing system. The ES databases may constitute an Elasticsearch cluster.
In this embodiment, a process of writing data into the ES database is schematically as follows:
When a piece of data is to be written into the ES database, it is not written directly to disk; to increase the write speed, it is first written into memory (the buffer). At the same time, to prevent the data from being lost, a corresponding record is appended to the transaction log (also called the translog). When a default time is reached or the data in the buffer reaches a certain amount, a refresh is triggered: the data in the buffer is refreshed into the operating system cache (the OS cache), and the data in the OS cache is then written into a segment file at a set interval, for example 1 second. Once data in the buffer has been refreshed to the OS cache, it can be searched. When the data enters the OS cache, the buffer is emptied; the data has also been written into the translog, and the translog is persisted to disk at a set interval, for example 5 seconds. These operations repeat: each piece of data written into the buffer is also written into a log file in the translog, so the log file keeps growing. When the log file reaches a certain size or a set time is reached, a flush is triggered: the data in the OS cache is written to the hard disk, the old log file is deleted, and an empty log file is created. In addition, because a segment file is generated each time the buffer is refreshed, a merge operation is performed periodically.
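The write path described above can be modelled with a small toy class. This is only an illustrative sketch, not Elasticsearch code: the class `ToyShard` and all of its attribute and method names are hypothetical, timers and size thresholds are left out, and the sketch assumes that a flush first refreshes whatever is still in the buffer.

```python
from dataclasses import dataclass, field


@dataclass
class ToyShard:
    """Toy model of the buffer / translog / OS cache / disk write path."""
    buffer: list = field(default_factory=list)    # in memory, not yet searchable
    os_cache: list = field(default_factory=list)  # refreshed data, searchable
    disk: list = field(default_factory=list)      # data persisted by a flush
    translog: list = field(default_factory=list)  # records in the current log file
    generation: int = 1                           # iteration version number of the log file

    def write(self, doc):
        # A write goes to the buffer and, so that it is not lost, to the translog.
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        # A refresh moves buffered data into the OS cache and empties the buffer.
        self.os_cache.extend(self.buffer)
        self.buffer.clear()

    def flush(self):
        # Assumption of this sketch: flush refreshes the buffer first.
        self.refresh()
        # Persist the OS cache, drop the old log file, start a new empty one.
        self.disk.extend(self.os_cache)
        self.os_cache.clear()
        self.translog.clear()
        self.generation += 1

    def searchable(self):
        # Data becomes searchable once it has left the buffer.
        return self.disk + self.os_cache
```

In this model a document is absent from `searchable()` until `refresh()` runs, and the translog keeps every record until `flush()` replaces the log file with an empty one of the next generation.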
The message processing system may comprise several message producers (Producers), several Kafka broker servers (Kafka brokers), several instances of message consumption middleware (Consumers), and a ZooKeeper cluster; the Kafka brokers form a Kafka cluster. A Producer pushes the messages obtained when performing the data synchronization method described below to a Kafka broker in the Kafka cluster. A Consumer pulls messages from the Kafka cluster for consumption, thereby synchronizing data from the ES database to the target database in real time.
Each Kafka broker is a Kafka server and can hold messages of multiple categories (Topics). A Topic is in effect a message queue; a Topic can be divided into multiple Partitions, and the Partitions can be stored on multiple Kafka brokers in the Kafka cluster.
In addition, one or more Consumers form a Consumer Group (also called a message consumption middleware cluster). The Consumers in a Consumer Group are responsible for consuming data from different Partitions, and a Partition can be consumed by only one Consumer within a given Consumer Group. Consumer Groups do not affect one another: each message is consumed by only one Consumer in a Consumer Group, but can be consumed by multiple Consumer Groups, which realizes both unicast and multicast.
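The rule that a Partition is consumed by exactly one Consumer within a Consumer Group, while different groups each receive every Partition, can be illustrated with a short sketch. It is a simplified round-robin model in plain Python, not Kafka's actual assignment logic; the function name, the group names and the consumer names are all hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer of a single group (round-robin)."""
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Two hypothetical groups subscribe to the same three-partition topic.
partitions = [0, 1, 2]
groups = {"sync-to-db": ["c1", "c2"], "audit": ["a1"]}

# Each group is assigned independently, so every group sees every partition.
plan = {name: assign_partitions(partitions, members) for name, members in groups.items()}
```

Within `sync-to-db` the three partitions are split between `c1` and `c2` with no overlap (unicast inside a group), while `audit` independently receives all three (multicast across groups).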
In this application scenario, all data resides in the buffer or the OS cache before a flush is executed. Because both are memory, the data in them is lost when the node hosting the ES database goes down. Since the data is also stored in the translog, when the node restarts the ES database reads the log files in the translog and restores the lost data to the buffer and the OS cache.
Fig. 2 is a schematic flowchart of an Elasticsearch-based data synchronization method according to an embodiment of the present application. As shown in Fig. 2, the method includes:
S201, monitoring the transaction log of each node in the Elasticsearch cluster;
In this embodiment, before the transaction log of each node in the Elasticsearch cluster is monitored, the method may include: pre-installing a collector on each node in the Elasticsearch cluster, setting the data input in the collector's configuration file to the Elasticsearch transaction log so that all log files in the transaction log are retrieved, and setting the data output in the collector's configuration file to the message caching middleware. The pre-installed collector thus relies directly on the query and search function supported by the Elasticsearch cluster, so that all log files in the transaction log can be retrieved in real time and changes to those log files can be monitored. Such changes result from data insertion, data update, or data deletion operations on the ES database.
Further, the collector may specifically be Filebeat, which performs lightweight monitoring of the transaction log of each node in the Elasticsearch cluster, that is, it monitors whether a log file in the transaction log changes.
For example, when log file names end with the suffix .tlog and the data input in the collector's configuration file is set to the Elasticsearch transaction log, the directory of the Elasticsearch transaction log may be added to the configuration file so that files ending in .tlog under the /data/ directory of each node are monitored, which achieves accurate monitoring of the Elasticsearch transaction log.
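The selection of files ending in .tlog under a data directory can be sketched in a few lines. This is not collector configuration; it is only a plain-Python illustration of the same filter, and the function name `find_tlog_files` and its `data_dir` parameter are hypothetical.

```python
from pathlib import Path


def find_tlog_files(data_dir):
    """Return every regular file under data_dir whose name ends in .tlog, sorted by path."""
    root = Path(data_dir)
    # rglob descends into subdirectories, so nested translog directories are covered.
    return sorted(path for path in root.rglob("*.tlog") if path.is_file())
```

A caller would pass the node's data directory, for example `find_tlog_files("/data")`, and hand the resulting paths to whatever component watches them for changes.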
As described above, each piece of data written into the buffer is also written into a log file in the translog, so the log file in the translog keeps growing. When it reaches a certain size or a set time is reached, a flush is triggered: the data in the OS cache is written to the hard disk, the old log file is deleted, and an empty log file is created. The growth and replacement of the log file is reflected directly in the iteration version number of the translog, which is incremented. For example, if the first version of the translog is translog-1.tlog and a flush is triggered when the set time of 30 minutes is reached, the log file translog-1.tlog is deleted and a new log file, named translog-2.tlog, is created.
Accordingly, if the log file of the transaction log includes version information and the version information includes an iteration version number, the method may further include: determining, according to the iteration version number of the log file, whether the log file in the transaction log has changed. This determination is part of step S201 and may be performed between steps S201 and S202.
For example, determining whether the log file in the transaction log has changed according to its iteration version number specifically includes: checking whether the iteration version number of the log file has changed; if it has, the log file in the transaction log is determined to have changed, and if it has not, the log file is determined not to have changed. Whether the log file has changed can thus be determined quickly and accurately from the iteration version number.
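A minimal sketch of this check follows. It assumes, purely for illustration, that the iteration version number is the integer embedded in a file name of the form `translog-<n>.tlog`; the pattern, the helper `generation_of` and the class `ChangeDetector` are hypothetical names, and the first file ever observed is treated as a change.

```python
import re

# Assumed naming scheme: translog-<iteration version number>.tlog
GENERATION_RE = re.compile(r"translog-(\d+)\.tlog$")


def generation_of(file_name):
    """Return the iteration version number in file_name, or None if there is none."""
    match = GENERATION_RE.search(file_name)
    return int(match.group(1)) if match else None


class ChangeDetector:
    """Remembers the last version number seen and reports when it differs."""

    def __init__(self):
        self.last_seen = None

    def has_changed(self, file_name):
        generation = generation_of(file_name)
        if generation is None:
            return False  # not a versioned log file, nothing to compare
        changed = generation != self.last_seen
        self.last_seen = generation
        return changed
```

Calling `has_changed` with the same file name twice reports a change only the first time; a later call with the next version number reports a change again.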
S202, when a log file in the transaction log changes, collecting and parsing the changed log file, and determining the type of operation recorded in the log file;
In this embodiment, determining the type of operation recorded in the log file in step S202 may be implemented in any one of the ways shown in Figs. 3 to 5.
Fig. 3 is a schematic diagram of one implementation of step S202 in an embodiment of the present application. As shown in Fig. 3, in this embodiment, collecting and parsing the changed log file and determining the type of operation recorded in it specifically includes: applying a preset regular expression to the changed log file to extract its content; and determining the type of operation recorded in the changed log file based on the extraction result.
A regular expression is relatively simple to construct, which reduces the difficulty of implementing the algorithm and allows fast extraction.
Optionally, in this embodiment, applying the preset regular expression to the changed log file and determining the type of operation based on the extraction result specifically includes:
S212A, applying a first regular expression to the log file, and determining the type of operation recorded in the log file according to the extraction result;
S222A, when the first regular expression extracts no result, applying a second regular expression to the log file, and determining the type of operation recorded in the log file according to the extraction result.
Specifically, a log file in the transaction log changes because data in the ES database changes, and data changes result from data insertion, data update, or data deletion operations on the ES database; insertion and update produce log files of the same form. Therefore two regular expressions are constructed and applied to the changed log file to decide whether the change was caused by an insertion or update operation or by a deletion operation. One regular expression corresponds to log files produced by insertion or update operations, and the other to log files produced by deletion operations.
Accordingly, when steps S212A and S222A are performed, one of the two regular expressions is chosen as the first regular expression and step S212A is performed first. If step S212A extracts a result, the type of operation recorded in the log file is determined directly from that result; otherwise, when the first regular expression extracts no result, the remaining regular expression is used as the second regular expression to extract from the log file.
It should be noted that which of the two regular expressions is used in step S212A and which in step S222A can be chosen flexibly according to the application scenario, for example according to how frequently each kind of operation occurs.
Further, determining the type of operation recorded in the log file according to the extraction result in step S202 specifically includes:
if only the unique identifier in the log file is extracted, determining that the recorded operation is a delete operation;
and if both the unique identifier and the operation data in the log file are extracted, determining that the recorded operation is an update or insert operation.
In this embodiment, the unique identifier may be, for example, a primary key; the primary key is only an example and is not limiting, and the unique identifier may be of any type that directly or indirectly characterizes the operation.
Specifically, for a data deletion operation, the corresponding log file stores only the primary key that serves as the unique identifier, whereas for a data update or insertion operation, the corresponding log file must store both the primary key that serves as the unique identifier and the operation data. Therefore, if only the unique identifier is extracted from the log file, the recorded operation is determined to be a delete operation; if both the unique identifier and the operation data are extracted, the recorded operation is determined to be an update or insert operation.
Further, when neither the first regular expression nor the second regular expression extracts a result, the changed log file is discarded, which improves data processing efficiency and avoids a data backlog.
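The first-then-second regular expression procedure of Fig. 3 can be sketched as follows. The textual record layout used here (`id=<key>` optionally followed by `source={...}`) is invented for the example and is not taken from the application; `FIRST_RE`, `SECOND_RE` and `classify` are likewise hypothetical names.

```python
import re

# Assumed layout of an insert/update record: id=<key> source={...}
FIRST_RE = re.compile(r"^id=(?P<id>\S+)\s+source=(?P<data>\{.*\})$")
# Assumed layout of a delete record: id=<key>
SECOND_RE = re.compile(r"^id=(?P<id>\S+)$")


def classify(entry):
    """Return (operation, unique_id, data), or None when neither expression matches."""
    for pattern in (FIRST_RE, SECOND_RE):  # the second is tried only if the first fails
        match = pattern.match(entry)
        if match is None:
            continue
        fields = match.groupdict()
        if fields.get("data") is not None:
            # Unique identifier and operation data extracted: update or insert.
            return ("upsert", fields["id"], fields["data"])
        # Only the unique identifier extracted: delete.
        return ("delete", fields["id"], None)
    return None  # no result from either expression: the record is discarded
```

Swapping the order of the two patterns in the tuple changes which expression plays the role of the first regular expression, for example to try the more frequent operation first, without changing the classification.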
Fig. 4 is a schematic diagram illustrating another implementation manner of step S202 in the embodiment of the present application; as shown in fig. 4, in this embodiment, the acquiring and analyzing the changed log file, and determining the type of the operation recorded by the changed log file specifically includes:
S212B, extracting the unique mark in the log file through a third regular expression;
S222B, extracting operation data in the log file through a fourth regular expression;
S232B, determining the type of the operation recorded by the log file based on the extraction result.
In this embodiment, the third regular expression and the fourth regular expression may be selected from the two regular expressions.
Optionally, the determining the type of the operation recorded by the log file based on the extraction result specifically includes:
if the operation data in the log file is extracted, determining the type of the operation recorded in the log file as an updating operation or an inserting operation;
and if the operation data in the log file is not extracted, determining the type of the operation recorded in the log file as a deleting operation.
Specifically, as described above, for a deletion operation on data, only the primary key that can be used as the unique identifier is saved in the corresponding log file, whereas for an update or insertion operation on data, both the primary key and the operation data must be saved in the corresponding log file. Therefore, by judging whether the changed log file contains operation data, the type of the operation recorded by the log file (update or insertion versus deletion) can be determined quickly and accurately.
Furthermore, for deletion operations as well as update or insertion operations, the corresponding log file stores the primary key that can be used as the unique mark. Therefore, when the type of the operation recorded by the log file is determined based on the extraction result, the changed log file is discarded if no unique mark can be extracted from it, thereby improving the efficiency of data processing and avoiding data backlog.
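The presence-based decision of this implementation can be sketched as below. The patterns `ID_RE` and `DATA_RE` stand in for the third and fourth regular expressions and are assumptions, as are the assumed line shape and the function name `classify_by_presence`.

```python
import re

# Hypothetical stand-ins for the third and fourth regular expressions.
ID_RE = re.compile(r"\bid=(?P<id>\S+)")
DATA_RE = re.compile(r"\bsource=(?P<data>\{.*\})")


def classify_by_presence(line):
    """Classify an entry by whether operation data is present alongside the unique mark."""
    id_match = ID_RE.search(line)
    if id_match is None:
        # No unique mark could be extracted: discard the entry.
        return None
    data_match = DATA_RE.search(line)
    if data_match is None:
        # Unique mark without operation data: delete.
        return ("DELETE", id_match.group("id"), None)
    # Unique mark with operation data: update or insert.
    return ("UPSERT", id_match.group("id"), data_match.group("data"))
```

Unlike the cascade of the previous implementation, both extractions are independent here, so the decision reduces to two presence checks.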
Fig. 5 is a schematic diagram illustrating still another implementation manner of step S202 in the embodiment of the present application; as shown in fig. 5, in this embodiment, the collecting and analyzing the changed log file, and determining the type of the operation recorded by the changed log file specifically include:
S212C, extracting a naming suffix of the log file;
S222C, determining the type of the operation recorded by the log file according to the naming suffix;
S232C, determining a regular expression according to the type of the operation, and extracting data in the log file.
Optionally, if the type of the operation is specifically a deletion operation, in step S232C, determining a regular expression according to the type of the operation, and extracting data in the log file includes: and determining the regular expression as a fifth regular expression according to the deleting operation, and extracting the unique mark in the log file through the fifth regular expression.
Optionally, the type of the operation is specifically an update or insertion operation; in step S232C, the determining a regular expression according to the type of the operation, and extracting data in the log file specifically includes: and determining the regular expression as a sixth regular expression according to the updating or inserting operation, and extracting the unique mark and the operation data in the log file through the sixth regular expression.
Illustratively, the fifth regular expression is, for example, the one of the above two regular expressions that corresponds to a log file produced by a deletion operation on data, and the sixth regular expression is the one that corresponds to a log file produced by an insertion or update operation on data.
Specifically, if the change to the log file is caused by a deletion operation on data, a file with a .del suffix is generated, and the state of the log file marked in the .del file is delete; if the change is caused by an update or insertion operation on data, a corresponding segment file may be generated, followed by a .si file. According to whether the suffix is .del or .si, it can be quickly determined whether the type of the operation recorded by the log file is a deletion of data or an insertion or update of data.
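Steps S212C to S232C can be sketched as a suffix lookup followed by a pattern lookup. The suffixes .del and .si come from the description above; the patterns in `PATTERN_FOR_OP`, the assumed content shape, and the function name `parse_by_suffix` are hypothetical.

```python
import re
from pathlib import PurePath

# Suffix-to-operation mapping taken from the description (.del: delete, .si: update/insert).
SUFFIX_TO_OP = {".del": "DELETE", ".si": "UPSERT"}

# Hypothetical stand-ins for the fifth and sixth regular expressions.
PATTERN_FOR_OP = {
    "DELETE": re.compile(r"\bid=(?P<id>\S+)"),
    "UPSERT": re.compile(r"\bid=(?P<id>\S+)\s+source=(?P<data>\{.*\})"),
}


def parse_by_suffix(filename, content):
    """Pick the operation type from the file suffix, then extract with the matching pattern."""
    op = SUFFIX_TO_OP.get(PurePath(filename).suffix.lower())
    if op is None:
        return None  # Unknown suffix: nothing to synchronize.
    match = PATTERN_FOR_OP[op].search(content)
    if match is None:
        return None  # The expected fields were not found.
    return {"type": op, **match.groupdict()}
```

Because the suffix fixes the operation type first, only one regular expression has to be evaluated per file.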
Specifically, in this embodiment, the two regular expressions may specifically be: "(? (? (. By "(? (? (.
The third regular expression in the embodiment shown in fig. 4 is, for example, "(? (? (.
Here, it should be noted that the specific structure of the regular expression herein is merely an example and is not limited.
In this embodiment, the operation data may be, for example, data in json format; this is merely an example and is not limiting.
S203, converting the log file into a message and sending the message to a message cache middleware according to the determined operation type, so that the message consumption middleware subscribes the message from the message cache middleware and synchronizes the message to a target database.
Illustratively, if the type of the operation is a delete operation, the converting the log file into a message and sending the message to a message cache middleware specifically includes: acquiring the unique mark in the log file, combining the deleting operation and the unique mark in the form of key-value pairs to generate a message conforming to a first preset format, and sending the message to a message queue of the message cache middleware.
If the type of the operation is an update or insertion operation, the converting the log file into a message and sending the message to a message cache middleware specifically includes: acquiring the unique mark and the operation data in the log file, combining the updating or inserting operation, the unique mark, and the operation data in the form of key-value pairs to generate a message conforming to the first preset format, and sending the message to a message queue of the message cache middleware.
Specifically, the first preset format may be a json format.
Illustratively, the deleting operation and unique flag are combined in the form of value pairs to generate a message conforming to a first preset format, such as { "type": DELETE "} (_ id), and the updating or inserting operation, unique flag, and operation data are combined in the form of value pairs to generate a message conforming to a first preset format, such as {" type ": UPDATE (_ id | $).
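As a minimal sketch of the message construction, assuming the first preset format is json: the key names `type`, `_id`, and `data` and the function name `build_message` are assumptions chosen for illustration, not the exact layout of the message in this embodiment.

```python
import json


def build_message(op_type, unique_id, data=None):
    """Combine the operation type, unique mark, and optional operation data as key-value pairs."""
    message = {"type": op_type, "_id": unique_id}
    if op_type != "DELETE":
        # Update or insert messages also carry the operation data.
        message["data"] = data
    return json.dumps(message, sort_keys=True)
```

A delete message therefore carries only the operation type and the unique mark, while an update or insert message additionally carries the operation data.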
Further, the message also includes a timestamp, which is the time at which the collector collected the log file. After receiving the messages, the message consumption middleware sorts them according to their timestamps, so that the messages are consumed in a first-in first-out manner, and data collision or data disorder is avoided.
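The timestamp ordering on the consumer side can be sketched with a priority queue. The class name `OrderedBuffer` and the `timestamp` key are hypothetical; a counter is added as an assumed tie-breaker so that messages with equal timestamps keep their arrival order.

```python
import heapq
import itertools


class OrderedBuffer:
    """Buffers received messages and releases them in ascending timestamp order."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # Tie-breaker for equal timestamps.

    def push(self, message):
        # The message dict itself is never compared, only timestamp and arrival index.
        heapq.heappush(self._heap, (message["timestamp"], next(self._arrival), message))

    def pop_all(self):
        """Yield all buffered messages, oldest timestamp first."""
        while self._heap:
            yield heapq.heappop(self._heap)[2]
```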
Illustratively, the converting the parsed log file into a message and sending the message to a message cache middleware, so that the message consumption middleware subscribes the message from the message cache middleware and synchronizes the message to a target database specifically includes:
converting the analyzed log file into a message and sending the message to a message cache middleware, so that the message consumption middleware subscribes the message from a message queue of the message cache middleware according to the sequencing and synchronizes the message to a target database;
and the target database adopts a data warehouse, the data warehouse extracts the data in the message and maps the data into a database table, and the extracted data in the message comprises the type of operation, the unique identifier and a timestamp.
In particular, the data warehouse may be a Hive warehouse. In addition, it should be noted that, in the embodiment of the present application, the target database is not limited uniquely, and may be determined by a person of ordinary skill in the art according to requirements of an application scenario, for example, the target database may be mysql, oracle, sqlserver, mongodb, and the like.
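Mapping the extracted message fields into a database table can be sketched as follows. An in-memory SQLite database is used here purely as a stand-in for the target database; the table name `sync_log`, its columns, and the message keys `type`, `_id`, and `timestamp` are assumptions.

```python
import json
import sqlite3


def open_target():
    """Create an in-memory table standing in for the target database table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sync_log (op_type TEXT, doc_id TEXT, ts REAL)")
    return conn


def apply_message(conn, raw_message):
    """Extract the operation type, unique identifier, and timestamp, and map them to a row."""
    msg = json.loads(raw_message)
    conn.execute(
        "INSERT INTO sync_log (op_type, doc_id, ts) VALUES (?, ?, ?)",
        (msg["type"], msg["_id"], msg["timestamp"]),
    )
    conn.commit()
```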
FIG. 6 is a schematic structural diagram of a data synchronization apparatus based on elastic search according to an embodiment of the present application; as shown in fig. 6, the data synchronization apparatus 600 based on elastic search includes:
a monitoring module 601, configured to monitor a transaction log of each node in the elastic search cluster;
the analysis module 602 is configured to, when a log file in the transaction log changes, acquire and analyze the changed log file, and determine a type of an operation recorded by the log file;
a sending module 603, configured to convert the log file into a message according to the determined type of the operation, and send the message to a message caching middleware, so that the message consuming middleware subscribes the message from the message caching middleware and synchronizes the message to a target database.
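The cooperation of the three modules can be sketched as below. All class and method names are hypothetical, the parsing pattern `ENTRY_RE` is an assumed stand-in, and a standard-library `queue.Queue` stands in for the message cache middleware.

```python
import json
import queue
import re

# Hypothetical pattern: unique mark, optionally followed by operation data.
ENTRY_RE = re.compile(r"\bid=(?P<id>\S+)(?:\s+source=(?P<data>\{.*\}))?")


class ParsingModule:
    def parse(self, line):
        """Determine the operation type of one changed log entry."""
        match = ENTRY_RE.search(line)
        if match is None:
            return None
        op = "UPSERT" if match.group("data") else "DELETE"
        return {"type": op, "_id": match.group("id"), "data": match.group("data")}


class SendingModule:
    def __init__(self, cache):
        self.cache = cache  # Stand-in for the message cache middleware.

    def send(self, record):
        self.cache.put(json.dumps(record))


class MonitoringModule:
    def __init__(self, parser, sender):
        self.parser = parser
        self.sender = sender

    def on_change(self, changed_lines):
        """Forward every changed entry that can be parsed; skip the rest."""
        for line in changed_lines:
            record = self.parser.parse(line)
            if record is not None:
                self.sender.send(record)
```

A consumer would then read from the queue and apply each message to the target database.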
Optionally, the monitoring module 601 is further configured to, before monitoring the transaction log of each node in the elastic search cluster, pre-install a collector on each node in the elastic search cluster, and set the data input of the configuration file of the collector as the transaction log of the elastic search to search all log files in the transaction log, where the data output of the configuration file of the collector is set as a message cache middleware.
Optionally, the log file of the transaction log includes version information, wherein the version information includes an iteration version number;
the monitoring module 601 is further configured to determine whether a log file in the transaction log changes according to the iteration version number of the log file.
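Change detection based on the iteration version number can be sketched as below. The class name `VersionTracker` is hypothetical, and two choices are assumptions: a file seen for the first time counts as changed, and any difference in version number counts as a change.

```python
class VersionTracker:
    """Remembers the last iteration version number seen for each log file."""

    def __init__(self):
        self._last_seen = {}

    def has_changed(self, path, version):
        """Return True if the file is new or its iteration version number differs."""
        previous = self._last_seen.get(path)
        self._last_seen[path] = version
        return previous is None or version != previous
```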
Optionally, the parsing module 602 is specifically configured to:
extracting the changed log file according to a preset regular expression;
and determining the type of the operation recorded by the changed log file based on the extraction result.
Optionally, the parsing module 602 is specifically configured to:
extracting the log file through a first regular expression, and determining the type of operation recorded by the log file according to the extraction result;
and under the condition that no result is extracted through the first regular expression, extracting the log file through the second regular expression, and determining the type of the operation recorded by the log file according to the extraction result.
Optionally, the parsing module 602 is specifically configured to:
if only the unique mark in the log file is extracted, determining the type of the operation recorded by the log file as a deleting operation;
and if the unique mark and the operation data in the log file are extracted, determining the type of the operation recorded in the log file as an updating or inserting operation.
Optionally, the parsing module 602 is specifically configured to:
extracting a unique mark in the log file through a third regular expression;
extracting operation data in the log file through a fourth regular expression;
determining the type of operation recorded by the log file based on the extraction result.
Optionally, the parsing module 602 is specifically configured to:
if the operation data in the log file is extracted, determining the type of the operation recorded in the log file as an updating operation or an inserting operation;
and if the operation data in the log file is not extracted, determining the type of the operation recorded in the log file as a deleting operation.
Optionally, the parsing module 602 is specifically configured to:
extracting a naming suffix of the log file;
determining the type of the operation recorded by the log file according to the named suffix;
and determining a regular expression according to the type of the operation, and extracting data in the log file.
Optionally, the type of the operation is specifically a delete operation;
the parsing module 602 is specifically configured to: and determining the regular expression as a fifth regular expression according to the deleting operation, and extracting the unique mark in the log file through the fifth regular expression.
Optionally, the type of the operation is specifically an update or insertion operation;
the parsing module 602 is specifically configured to: and determining the regular expression as a sixth regular expression according to the updating or inserting operation, and extracting the unique mark and the operation data in the log file through the sixth regular expression.
Optionally, the type of the operation is a deletion operation, and the sending module 603 is specifically configured to: acquire the unique mark in the log file, combine the deleting operation and the unique mark in the form of key-value pairs to generate a message conforming to a first preset format, and send the message to a message queue of the message cache middleware;
or the type of the operation is an update or insertion operation, and the sending module 603 is specifically configured to: acquire the unique mark and the operation data in the log file, combine the updating or inserting operation, the unique mark, and the operation data in the form of key-value pairs to generate a message conforming to the first preset format, and send the message to a message queue of the message cache middleware.
In fig. 6, for an exemplary explanation of each module, reference may be made to the examples shown in fig. 1 to fig. 5, which are not described herein again.
FIG. 7 is a schematic structural diagram of a computer device according to an embodiment of the present application; as shown in fig. 7, the computer device 700 comprises a memory 701 having computer-readable instructions stored therein and a processor 702, and the processor 702, when executing the computer-readable instructions, implements the steps of the data synchronization method according to any of the embodiments of the present application.
The present application further provides a computer-readable storage medium, wherein the computer-readable storage medium stores computer-readable instructions, and the computer-readable instructions, when executed by a processor, implement the steps of the data synchronization method according to any one of the embodiments of the present application.
It is to be understood that the above-described embodiments are merely some, not all, of the embodiments of the invention, and that the appended drawings illustrate preferred embodiments of the invention without limiting its scope. The invention may be embodied in many different forms; these embodiments are provided so that this disclosure will be thorough and complete. Although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that the embodiments may be modified and that equivalents may be substituted for elements thereof. All equivalent structures made by using the contents of the specification and the attached drawings of the invention, whether applied directly or indirectly in other related technical fields, are also within the protection scope of the patent of the invention.

Claims (15)

1. A data synchronization method based on elastic search is characterized by comprising the following steps:
monitoring the transaction log of each node in the elastic search cluster;
under the condition that a log file in the transaction log changes, collecting and analyzing the changed log file, and determining the type of operation recorded by the log file;
and converting the log file into a message and sending the message to a message cache middleware according to the determined type of the operation, so that the message consumption middleware subscribes the message from the message cache middleware and synchronizes the message to a target database.
2. The data synchronization method according to claim 1, wherein before monitoring the transaction log of each node in the elastic search cluster, the method comprises: pre-installing a collector on each node in the elastic search cluster, setting the data input of a configuration file of the collector to the transaction log of the elastic search so as to search all log files in the transaction log, and setting the data output of the configuration file of the collector to the message cache middleware.
3. The data synchronization method of claim 1, wherein the log file of the transaction log comprises version information, wherein the version information comprises an iteration version number;
the method further comprises the following steps: and determining whether the log file in the transaction log changes or not according to the iteration version number of the log file.
4. The data synchronization method according to claim 1, wherein the collecting and analyzing the changed log file and determining the type of the operation recorded by the changed log file specifically include:
extracting the changed log file according to a preset regular expression;
and determining the type of the operation recorded by the changed log file based on the extraction result.
5. The data synchronization method according to claim 4, wherein the extracting the changed log file according to a preset regular expression, and determining the type of the operation recorded by the changed log file based on the extraction result specifically includes:
extracting the log file through a first regular expression, and determining the type of operation recorded by the log file according to the extraction result;
and under the condition that no result is extracted through the first regular expression, extracting the log file through the second regular expression, and determining the type of the operation recorded by the log file according to the extraction result.
6. The data synchronization method according to claim 5, wherein the determining the type of the operation recorded by the log file according to the extraction result specifically comprises:
if only the unique mark in the log file is extracted, determining the type of the operation recorded by the log file as a deleting operation;
and if the unique mark and the operation data in the log file are extracted, determining the type of the operation recorded in the log file as an updating or inserting operation.
7. The data synchronization method according to claim 1, wherein the collecting and analyzing the changed log file and determining the type of the operation recorded by the changed log file specifically include:
extracting a unique mark in the log file through a third regular expression;
extracting operation data in the log file through a fourth regular expression;
determining the type of operation recorded by the log file based on the extraction result.
8. The data synchronization method according to claim 7, wherein the determining the type of the operation recorded by the log file based on the extraction result specifically includes:
if the operation data in the log file is extracted, determining the type of the operation recorded in the log file as an updating operation or an inserting operation;
and if the operation data in the log file is not extracted, determining the type of the operation recorded in the log file as a deleting operation.
9. The data synchronization method according to claim 1, wherein the collecting and analyzing the changed log file and determining the type of operation recorded by the log file specifically include:
extracting a naming suffix of the log file;
determining the type of the operation recorded by the log file according to the named suffix;
and determining a regular expression according to the type of the operation, and extracting data in the log file.
10. The data synchronization method according to claim 9, wherein the type of the operation is specifically a delete operation;
the determining a regular expression according to the type of the operation and extracting data in the log file specifically includes: and determining the regular expression as a fifth regular expression according to the deleting operation, and extracting the unique mark in the log file through the fifth regular expression.
11. The data synchronization method according to claim 9, characterized in that the type of the operation is in particular an update or an insert operation;
the determining a regular expression according to the type of the operation and extracting data in the log file specifically includes: and determining the regular expression as a sixth regular expression according to the updating or inserting operation, and extracting the unique mark and the operation data in the log file through the sixth regular expression.
12. The data synchronization method according to claim 1, wherein the type of the operation is a delete operation, and the converting the log file into a message and sending the message to a message cache middleware specifically includes: acquiring a unique mark in the log file, combining the deleting operation and the unique mark in a numerical value pair form to generate a message which conforms to a first preset format, and sending the message to a message queue of a message cache middleware;
the type of the operation is an update or insertion operation, the converting the log file into a message and sending the message to a message cache middleware specifically includes: and acquiring the unique mark and the operation data in the log file, combining the updating or inserting operation, the unique mark and the operation data in a numerical value pair form to generate a message which accords with a first preset format, and sending the message to a message queue of a message cache middleware.
13. An apparatus for synchronizing data based on elastic search, comprising:
the monitoring module is used for monitoring the transaction log of each node in the elastic search cluster;
the analysis module is used for collecting and analyzing the changed log file under the condition that the log file in the transaction log is changed, and determining the type of the operation recorded by the log file;
and the sending module is used for converting the log file into a message and sending the message to the message cache middleware according to the determined type of the operation, so that the message consumption middleware subscribes the message from the message cache middleware and synchronizes the message to a target database.
14. A computer device comprising a memory having computer-readable instructions stored therein and a processor which, when executing the computer-readable instructions, implements the steps of the data synchronization method as claimed in any one of claims 1 to 12.
15. A computer-readable storage medium, having computer-readable instructions stored thereon, which, when executed by a processor, implement the steps of the data synchronization method of any one of claims 1 to 12.
CN202111555692.2A 2021-12-17 2021-12-17 Data synchronization method, device and equipment based on elastic search and storage medium Pending CN114254016A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111555692.2A CN114254016A (en) 2021-12-17 2021-12-17 Data synchronization method, device and equipment based on elastic search and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111555692.2A CN114254016A (en) 2021-12-17 2021-12-17 Data synchronization method, device and equipment based on elastic search and storage medium

Publications (1)

Publication Number Publication Date
CN114254016A true CN114254016A (en) 2022-03-29

Family

ID=80792926

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111555692.2A Pending CN114254016A (en) 2021-12-17 2021-12-17 Data synchronization method, device and equipment based on elastic search and storage medium

Country Status (1)

Country Link
CN (1) CN114254016A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114978889A (en) * 2022-05-13 2022-08-30 厦门兆翔智能科技有限公司 Airport enterprise service bus system
CN116089545A (en) * 2023-04-07 2023-05-09 云筑信息科技(成都)有限公司 Method for collecting storage medium change data into data warehouse
CN116719821A (en) * 2023-08-09 2023-09-08 北京联云天下科技有限公司 Concurrent data insertion elastic search weight removing method, device and storage medium

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114978889A (en) * 2022-05-13 2022-08-30 厦门兆翔智能科技有限公司 Airport enterprise service bus system
CN114978889B (en) * 2022-05-13 2024-04-16 厦门兆翔智能科技有限公司 Airport enterprise service bus system
CN116089545A (en) * 2023-04-07 2023-05-09 云筑信息科技(成都)有限公司 Method for collecting storage medium change data into data warehouse
CN116089545B (en) * 2023-04-07 2023-08-22 云筑信息科技(成都)有限公司 Method for collecting storage medium change data into data warehouse
CN116719821A (en) * 2023-08-09 2023-09-08 北京联云天下科技有限公司 Concurrent data insertion elastic search weight removing method, device and storage medium
CN116719821B (en) * 2023-08-09 2023-10-10 北京联云天下科技有限公司 Concurrent data insertion elastic search weight removing method, device and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination