CN114254016A - Data synchronization method, device and equipment based on elastic search and storage medium - Google Patents
- Publication number
- CN114254016A (application CN202111555692.2A)
- Authority
- CN
- China
- Prior art keywords
- log file
- message
- type
- determining
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2471—Distributed queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
- G06F16/275—Synchronous replication
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Mathematical Physics (AREA)
- Fuzzy Systems (AREA)
- Computing Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Debugging And Monitoring (AREA)
Abstract
The embodiments of the invention provide an Elasticsearch-based data synchronization method and apparatus, a computer device, and a storage medium. The method comprises: monitoring the transaction log of each node in an Elasticsearch cluster; when a log file in the transaction log changes, collecting and analyzing the changed log file and determining the type of operation recorded by the log file; and, according to the determined operation type, converting the log file into a message and sending the message to a message cache middleware, so that a message consumption middleware subscribes to the message from the message cache middleware and synchronizes it to a target database. Log files that change in Elasticsearch are thus obtained in real time, converted into messages, and synchronized to the target database, which achieves real-time incremental data synchronization.
Description
Technical Field
The present invention relates to the field of information processing technologies, and in particular to an Elasticsearch-based data synchronization method, apparatus, device, and storage medium.
Background
Elasticsearch (ES) is a Lucene-based distributed search engine. It supports near-real-time retrieval and processing of massive amounts of data, both structured and unstructured, and provides powerful full-text search.
However, when data in an Elasticsearch database needs to be synchronized to other databases, the data can only be obtained by querying Elasticsearch indices through the search function. Real-time incremental data therefore cannot be acquired, and synchronization cannot be performed on it.
Disclosure of Invention
The embodiments of the present invention aim to provide an Elasticsearch-based data synchronization method, apparatus, device, and storage medium that overcome the above technical problems in the prior art.
The technical scheme provided by the embodiment of the application is as follows:
An Elasticsearch-based data synchronization method, comprising:
monitoring the transaction log of each node in an Elasticsearch cluster;
when a log file in the transaction log changes, collecting and analyzing the changed log file, and determining the type of operation recorded by the log file;
according to the determined type of the operation, converting the log file into a message and sending the message to a message cache middleware, so that a message consumption middleware subscribes to the message from the message cache middleware and synchronizes the message to a target database.
Optionally, before monitoring the transaction log of each node in the Elasticsearch cluster, the method includes: pre-installing a collector on each node in the Elasticsearch cluster, setting the data input in the collector's configuration file to the Elasticsearch transaction log so as to search all log files in the transaction log, and setting the data output in the collector's configuration file to the message cache middleware.
Optionally, the log file of the transaction log includes version information, and the version information includes an iteration version number;
the method further comprises: determining, according to the iteration version number of the log file, whether the log file in the transaction log has changed.
Optionally, collecting and analyzing the changed log file and determining the type of the operation recorded by the changed log file specifically includes:
extracting from the changed log file according to a preset regular expression;
determining the type of the operation recorded by the changed log file based on the extraction result.
Optionally, extracting from the changed log file according to a preset regular expression and determining the type of the operation recorded by the changed log file based on the extraction result specifically includes:
extracting from the log file with a first regular expression, and determining the type of operation recorded by the log file according to the extraction result;
when the first regular expression extracts no result, extracting from the log file with a second regular expression, and determining the type of the operation recorded by the log file according to the extraction result.
Optionally, determining the type of the operation recorded by the log file according to the extraction result specifically includes:
if only the unique identifier in the log file is extracted, determining that the type of the operation recorded by the log file is a delete operation;
if both the unique identifier and the operation data in the log file are extracted, determining that the type of the operation recorded by the log file is an update or insert operation.
Optionally, collecting and analyzing the changed log file and determining the type of the operation recorded by the changed log file specifically includes:
extracting the unique identifier in the log file with a third regular expression;
extracting the operation data in the log file with a fourth regular expression;
determining the type of operation recorded by the log file based on the extraction result.
Optionally, determining the type of the operation recorded by the log file based on the extraction result specifically includes:
if operation data is extracted from the log file, determining that the type of the operation recorded by the log file is an update or insert operation;
if no operation data is extracted from the log file, determining that the type of the operation recorded by the log file is a delete operation.
Optionally, collecting and analyzing the changed log file and determining the type of the operation recorded by the log file specifically includes:
extracting the naming suffix of the log file;
determining the type of the operation recorded by the log file according to the naming suffix;
determining a regular expression according to the type of the operation, and extracting data from the log file.
Optionally, the type of the operation is specifically a delete operation;
determining a regular expression according to the type of the operation and extracting data from the log file specifically includes: determining, according to the delete operation, that the regular expression is a fifth regular expression, and extracting the unique identifier in the log file with the fifth regular expression.
Optionally, the type of the operation is specifically an update or insert operation;
determining a regular expression according to the type of the operation and extracting data from the log file specifically includes: determining, according to the update or insert operation, that the regular expression is a sixth regular expression, and extracting the unique identifier and the operation data in the log file with the sixth regular expression.
Optionally, when the type of the operation is a delete operation, converting the log file into a message and sending the message to a message cache middleware specifically includes: acquiring the unique identifier in the log file, combining the delete operation and the unique identifier as key-value pairs to generate a message that conforms to a first preset format, and sending the message to a message queue of the message cache middleware;
when the type of the operation is an update or insert operation, converting the log file into a message and sending the message to a message cache middleware specifically includes: acquiring the unique identifier and the operation data in the log file, combining the update or insert operation, the unique identifier, and the operation data as key-value pairs to generate a message that conforms to the first preset format, and sending the message to a message queue of the message cache middleware.
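The message conversion described above can be sketched as follows. This is a minimal illustration only: the source does not define the "first preset format", so the JSON layout and the key names `op`, `id`, and `data` are assumptions, as is the function name `build_message`. Sending the result to a message queue is left to whichever Kafka client is in use and is not shown.

```python
import json


def build_message(op_type, unique_id, data=None):
    """Combine the operation type, unique identifier and operation data
    as key-value pairs (hypothetical format, not taken from the source)."""
    message = {"op": op_type, "id": unique_id}
    # A delete message carries only the identifier; update/insert also carries data.
    if op_type != "delete":
        message["data"] = data
    # sort_keys gives a stable serialization, which helps when comparing messages.
    return json.dumps(message, sort_keys=True)


delete_msg = build_message("delete", "42")
upsert_msg = build_message("upsert", "42", {"name": "a"})
```

A consumer can then branch on the `op` key to decide whether to delete the row with the given identifier in the target database or to write the accompanying data.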
An Elasticsearch-based data synchronization apparatus, comprising:
a monitoring module, configured to monitor the transaction log of each node in an Elasticsearch cluster;
an analysis module, configured to collect and analyze the changed log file when a log file in the transaction log changes, and to determine the type of the operation recorded by the log file;
a sending module, configured to convert the log file into a message according to the determined type of the operation and send the message to a message cache middleware, so that a message consumption middleware subscribes to the message from the message cache middleware and synchronizes the message to a target database.
A computer device comprising a memory and a processor, the memory storing computer-readable instructions, wherein the processor, when executing the computer-readable instructions, implements the steps of the data synchronization method according to any one of the embodiments of the present application.
A computer-readable storage medium storing computer-readable instructions which, when executed by a processor, implement the steps of the data synchronization method according to any one of the embodiments of the present application.
Compared with the prior art, the embodiment of the invention mainly has the following beneficial effects:
The transaction log of each node in the Elasticsearch cluster is monitored; when a log file in the transaction log changes, the changed log file is collected and analyzed, and the type of operation recorded by the log file is determined; according to the determined operation type, the log file is converted into a message and sent to the message cache middleware, so that the message consumption middleware subscribes to the message from the message cache middleware and synchronizes it to the target database. Log files that change in Elasticsearch are thus obtained in real time, converted into messages, and synchronized to the target database, which achieves real-time incremental data synchronization.
Drawings
To illustrate the solution of the present invention more clearly, the drawings used in the description of the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic view of an application scenario in an embodiment of the present application;
Fig. 2 is a schematic flowchart of an Elasticsearch-based data synchronization method according to an embodiment of the present application;
Fig. 3 is a schematic diagram of one implementation of step S202 in an embodiment of the present application;
Fig. 4 is a schematic diagram of another implementation of step S202 in an embodiment of the present application;
Fig. 5 is a schematic diagram of still another implementation of step S202 in an embodiment of the present application;
Fig. 6 is a schematic structural diagram of an Elasticsearch-based data synchronization apparatus according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention; the terms "comprising" and "having," and any variations thereof, in the description and claims of this invention and the description of the above figures, are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and in the claims, or in the drawings, are used for distinguishing between different objects and not necessarily for describing a particular sequential order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions of the present invention better understood by those skilled in the art, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings.
Fig. 1 is a schematic view of an application scenario in an embodiment of the present application. As shown in Fig. 1, the application scenario includes an ES database, a message cache middleware, a message consumption middleware (also called a message consumer), and a target database. The message cache middleware is also called a message cache platform, for example Kafka. The message cache middleware and the message consumption middleware can together form a message processing system. The ES database may be deployed as an Elasticsearch cluster.
In this embodiment, a process of writing data into the ES database is schematically as follows:
When a piece of data is to be written to the ES database, it is not written directly to disk; to increase write speed, it is first written to memory (the buffer). At the same time, to prevent the data from being lost, a corresponding record is appended to the transaction log (also called the translog). When the default interval elapses (for example, 1 second) or the buffer holds a certain amount of data, a refresh is triggered: the data in the buffer is written into a new segment file in the operating system cache (OS cache), and the buffer is emptied. Once data has been refreshed into the OS cache, it can be searched. The records in the translog are persisted to disk at a set interval, for example every 5 seconds. These operations repeat: each piece of data written to the buffer is also recorded in a log file in the translog, so the log file keeps growing. When it reaches a certain size or a set time elapses, a flush operation is triggered: the data in the OS cache is written to the hard disk, the old log file is deleted, and an empty log file is created. In addition, because a new segment file is generated on every refresh, a merge operation is performed periodically.
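The write path described above can be mimicked with a toy in-memory model. The sketch below is only an illustration of the buffer, refresh, and flush sequence; the class `ToyShard` and all of its attributes are hypothetical names, timers are replaced by explicit method calls, and nothing here reflects Elasticsearch's actual implementation.

```python
class ToyShard:
    """Toy model of the write path: buffer -> OS cache (refresh) -> disk (flush)."""

    def __init__(self):
        self.buffer = []      # in-memory buffer: written but not yet searchable
        self.os_cache = []    # refreshed segments: searchable, not yet on disk
        self.disk = []        # flushed segments
        self.translog = []    # operations recorded since the last flush
        self.generation = 1   # iteration version number of the translog file

    def index(self, doc):
        # Every write goes to the buffer and is also recorded in the translog.
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        # Move buffered documents into a new segment in the OS cache.
        if self.buffer:
            self.os_cache.append(list(self.buffer))
            self.buffer.clear()

    def flush(self):
        # Persist cached segments, then start a new, empty translog file.
        self.refresh()
        self.disk.extend(self.os_cache)
        self.os_cache.clear()
        self.translog = []
        self.generation += 1

    def searchable(self):
        return [doc for segment in self.disk + self.os_cache for doc in segment]


shard = ToyShard()
shard.index({"id": 1})
shard.refresh()   # the document becomes searchable here
shard.flush()     # the translog is rolled over to a new generation
```

The model makes one point of the description concrete: between a write and the next flush, the translog is the only place where every recent operation is recorded, which is why the method monitors it.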
The message processing system may comprise several message producers, several Kafka broker servers (also called Kafka brokers), several message consumption middleware instances (also called consumers), and a ZooKeeper cluster; the Kafka brokers form a Kafka cluster. A producer pushes the messages obtained when the data synchronization method below is performed to a Kafka broker in the Kafka cluster. A consumer pulls messages from the Kafka cluster and consumes them, thereby synchronizing data from the ES database to the target database in real time.
Each Kafka broker is a Kafka server and can hold multiple categories of messages, called topics. A topic is in effect a message queue; it can be divided into multiple partitions, and those partitions can be stored on different Kafka brokers in the Kafka cluster.
In addition, one or more consumers form a consumer group (also called a message consumption middleware cluster). The consumers in a consumer group are responsible for consuming data from different partitions: within one consumer group, a partition can be consumed by only one message consumption middleware. Consumer groups do not affect one another; that is, each message can be consumed by only one consumer within a consumer group but can be consumed by multiple consumer groups, which realizes both unicast and multicast.
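The rule that a partition is consumed by exactly one consumer within a group can be illustrated with a simple round-robin assignment. This is a sketch of the constraint only, under the assumption of a fixed consumer list; the function `assign_partitions` is hypothetical and is not Kafka's actual partition assignor.

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer of the group (round-robin)."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Two independent consumer groups each receive every partition once.
group_a = assign_partitions([0, 1, 2, 3, 4], ["a1", "a2"])
group_b = assign_partitions([0, 1, 2, 3, 4], ["b1"])
```

Within `group_a` no partition appears under two consumers, while `group_b` independently receives all partitions, which corresponds to the unicast and multicast behaviour described above.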
In this application scenario, until a flush operation is executed, all data resides in the buffer or the OS cache. Both are in memory, so if the node hosting the ES database goes down, the data in memory is lost. Because the data is also recorded in the transaction log (translog), when the node restarts, the ES database reads the log files in the translog and restores the lost data to the in-memory buffer and the OS cache.
Fig. 2 is a schematic flowchart of an Elasticsearch-based data synchronization method according to an embodiment of the present application. As shown in Fig. 2, the method includes:
S201, monitoring the transaction log of each node in the Elasticsearch cluster;
In this embodiment, before monitoring the transaction log of each node in the Elasticsearch cluster, the method may include: pre-installing a collector on each node in the Elasticsearch cluster, setting the data input in the collector's configuration file to the Elasticsearch transaction log so as to search all log files in the transaction log, and setting the data output in the collector's configuration file to the message cache middleware. Pre-installing the collector in this way builds directly on the query and search function supported by the Elasticsearch cluster, so that all log files in the transaction log can be searched in real time and changes to them can be monitored. Such changes result from data insertion, data update, or data deletion operations on the ES database.
Further, the collector may specifically be Filebeat, which performs lightweight monitoring of the transaction log of each node in the Elasticsearch cluster, that is, monitors whether a log file in the transaction log changes.
For example, when the log files have the suffix .tlog, setting the data input of the collector's configuration file to the Elasticsearch transaction log may be done by adding the directory of the Elasticsearch transaction log to the configuration file, so that files whose names end in .tlog under the /data/ directory of each node are monitored. This enables accurate monitoring of the Elasticsearch transaction log.
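The file selection performed by the collector can be approximated in a few lines. The sketch below is not a Filebeat configuration; it only shows, using the Python standard library, how files ending in .tlog under a data directory might be located. The function name `find_translog_files` and the directory passed to it are assumptions.

```python
from pathlib import Path


def find_translog_files(data_dir):
    """Return all files ending in .tlog under data_dir, searched recursively."""
    return sorted(str(path) for path in Path(data_dir).rglob("*.tlog"))


# Example call; the directory is a placeholder for a node's data directory.
translog_files = find_translog_files("/data")
```

If the directory does not exist or contains no matching files, the function simply returns an empty list, so it can be polled repeatedly.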
As described above, each time a piece of data is written to the buffer, a record is also written to a log file in the translog, so the log file keeps growing. When it reaches a certain size or a set time elapses, a flush operation is triggered: the data in the OS cache is written to the hard disk, the old log file is deleted, and an empty log file is created. This turnover is reflected directly in the iteration version number of the translog, which is incremented. For example, if the first version of the translog is translog1.tlog and a flush is triggered when the set time of 30 minutes is reached, the translog1.tlog log file is deleted and a new translog file, denoted translog2.tlog, is created.
Accordingly, if the log file of the transaction log includes version information and the version information includes an iteration version number, the method may further include: determining, according to the iteration version number of the log file, whether the log file in the transaction log has changed. This step may be included in step S201 or performed between steps S201 and S202.
For example, determining whether the log file in the transaction log has changed according to the iteration version number of the log file specifically includes: judging whether the iteration version number of the log file has changed; if so, determining that the log file in the transaction log has changed; if not, determining that it has not changed. Whether the log file has changed can thus be judged quickly and accurately from the iteration version number.
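The version-number check can be sketched as follows, assuming the iteration version number is the integer immediately before the .tlog suffix of the file name. The names `generation_of` and `ChangeDetector` are hypothetical, and treating the first observed file as a baseline rather than as a change is a design assumption not stated in the source.

```python
import re

# Assumption: the iteration version number is the digits right before ".tlog".
GENERATION_RE = re.compile(r"(\d+)\.tlog$")


def generation_of(filename):
    """Return the iteration version number in the file name, or None."""
    match = GENERATION_RE.search(filename)
    return int(match.group(1)) if match else None


class ChangeDetector:
    """Report a change whenever the observed version number differs from the last one."""

    def __init__(self):
        self.last_seen = None

    def has_changed(self, filename):
        generation = generation_of(filename)
        if generation is None:
            return False  # not a translog file name
        changed = self.last_seen is not None and generation != self.last_seen
        self.last_seen = generation
        return changed


detector = ChangeDetector()
detector.has_changed("translog1.tlog")            # first observation: baseline only
rolled = detector.has_changed("translog2.tlog")   # version number changed
```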
S202, when a log file in the transaction log changes, collecting and analyzing the changed log file, and determining the type of operation recorded by the log file;
In this embodiment, determining the type of the operation recorded by the log file in step S202 may be implemented in any one of the ways shown in Figs. 3 to 5.
Fig. 3 is a schematic diagram of one implementation of step S202 in an embodiment of the present application. As shown in Fig. 3, in this embodiment, collecting and analyzing the changed log file and determining the type of the operation recorded by the changed log file specifically includes: extracting from the changed log file according to a preset regular expression; and determining the type of the operation recorded by the changed log file based on the extraction result.
Regular expressions are relatively simple to construct, which reduces the difficulty of implementing the algorithm and allows fast extraction.
Optionally, in this embodiment, extracting from the changed log file according to a preset regular expression and determining the type of the operation recorded by the changed log file based on the extraction result specifically includes:
S212A, extracting from the log file with a first regular expression, and determining the type of operation recorded by the log file according to the extraction result;
S222A, when the first regular expression extracts no result, extracting from the log file with a second regular expression, and determining the type of operation recorded by the log file according to the extraction result.
Specifically, a log file in the transaction log changes because data in the ES database changes, and such changes are caused by data insertion, data update, or data deletion operations on the ES database. The log files produced by data insertion and by data update have the same form. Therefore, two regular expressions are constructed to extract from the changed log file and judge whether it was produced by a data insertion or update operation or by a data deletion operation. One of the two regular expressions corresponds to log files produced by a data insertion or update operation, and the other corresponds to log files produced by a data deletion operation.
Accordingly, when steps S212A and S222A are performed, one of the two regular expressions is selected as the first regular expression and step S212A is performed first. If step S212A extracts a result, the type of the operation recorded by the log file is determined directly from that result; otherwise, when the first regular expression extracts no result, the remaining regular expression is used as the second regular expression to extract from the log file.
Note that which of the two regular expressions is used in step S212A and which in step S222A can be chosen flexibly according to the application scenario, for example according to how frequently each kind of operation occurs.
Further, determining the type of the operation recorded by the log file according to the extraction result in step S202 specifically includes:
if only the unique identifier in the log file is extracted, determining that the type of the operation recorded by the log file is a delete operation;
if both the unique identifier and the operation data in the log file are extracted, determining that the type of the operation recorded by the log file is an update or insert operation.
In this embodiment, the unique identifier may be, for example, a primary key. The primary key is only an example and is not limiting; the unique identifier may be of any type that can directly or indirectly characterize an operation.
Specifically, for a data deletion operation, only the primary key serving as the unique identifier is saved in the corresponding log file, whereas for a data update or insertion operation, both the primary key serving as the unique identifier and the operation data are saved in the corresponding log file. Therefore, if only the unique identifier in the log file is extracted, the type of the operation recorded by the log file is determined to be a delete operation; if both the unique identifier and the operation data are extracted, the type is determined to be an update or insert operation.
Further, when neither the first regular expression nor the second regular expression extracts a result, the changed log file is discarded, which improves data processing efficiency and avoids a data backlog.
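The two-expression fallback of Fig. 3 can be sketched as below. The source does not disclose the record layout or the regular expressions themselves, so the `id=... source={...}` text format and both patterns are purely hypothetical stand-ins; only the control flow (try the first expression, fall back to the second, discard if neither matches) follows the description.

```python
import re

# Hypothetical record formats, invented for illustration only:
#   update/insert: id=<identifier> source={...}
#   delete:        id=<identifier>
UPSERT_RE = re.compile(r"id=(?P<id>\S+)\s+source=(?P<data>\{.*\})")
DELETE_RE = re.compile(r"id=(?P<id>\S+)\s*$")


def classify(record, first=UPSERT_RE, second=DELETE_RE):
    """Try the first expression, then the second; return None to discard."""
    for pattern in (first, second):
        match = pattern.search(record)
        if match:
            fields = match.groupdict()
            if fields.get("data") is None:
                # Only the unique identifier was extracted: delete operation.
                return ("delete", fields["id"], None)
            # Identifier and operation data were extracted: update or insert.
            return ("upsert", fields["id"], fields["data"])
    return None  # neither expression matched: the record is discarded


result = classify('id=42 source={"name": "a"}')
```

Passing `first=DELETE_RE, second=UPSERT_RE` swaps the order, which corresponds to choosing the first expression according to which kind of operation is expected to occur more often.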
Fig. 4 is a schematic diagram of another implementation of step S202 in an embodiment of the present application. As shown in Fig. 4, in this embodiment, collecting and analyzing the changed log file and determining the type of the operation recorded by the changed log file specifically includes:
S212B, extracting the unique identifier in the log file with a third regular expression;
S222B, extracting the operation data in the log file with a fourth regular expression;
S232B, determining the type of the operation recorded by the log file based on the extraction result.
In this embodiment, the third regular expression and the fourth regular expression may be selected from the two regular expressions described above.
Optionally, the determining the type of the operation recorded by the log file based on the extraction result specifically includes:
if the operation data in the log file is extracted, determining the type of the operation recorded in the log file as an updating operation or an inserting operation;
and if the operation data in the log file is not extracted, determining the type of the operation recorded in the log file as a deleting operation.
Specifically, as described above, for a data deletion operation, the corresponding log file stores only the primary key serving as the unique identifier, whereas for a data update or insertion operation, the corresponding log file must store both the primary key serving as the unique identifier and the operation data. Therefore, by judging whether the changed log file contains operation data, the type of the operation recorded by the log file can be quickly and accurately determined to be either an update or insertion operation or a deletion operation.
Furthermore, because the corresponding log file stores the primary key serving as the unique mark regardless of whether the operation is a deletion, an update or an insertion, when the type of the operation recorded by the log file is determined based on the extraction result, the changed log file is discarded if the unique mark is not extracted from it, which improves data processing efficiency and avoids a data backlog.
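By way of illustration only, the variant of Fig. 4 may be sketched as follows: one expression extracts the unique mark, a separate expression extracts the operation data, and the presence or absence of operation data decides the type. The patterns `ID_RE` and `DATA_RE` and the record layout are assumptions made for the sketch, not the third and fourth regular expressions of the application.

```python
import re
from typing import Optional, Tuple

# Hypothetical stand-ins for the "third" and "fourth" regular expressions.
ID_RE = re.compile(r'_id="(?P<id>[^"]+)"')
DATA_RE = re.compile(r'_source=(?P<data>\{.*\})')


def classify_by_data(record: str) -> Optional[Tuple[str, str, Optional[str]]]:
    """Classify a record by whether operation data can be extracted from it."""
    id_match = ID_RE.search(record)
    if id_match is None:
        # Every operation type stores the unique mark, so a record
        # without one is discarded.
        return None
    data_match = DATA_RE.search(record)
    if data_match is None:
        return ("DELETE", id_match.group("id"), None)
    return ("UPSERT", id_match.group("id"), data_match.group("data"))
```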
Fig. 5 is a schematic diagram illustrating still another implementation manner of step S202 in the embodiment of the present application; as shown in fig. 5, in this embodiment, the collecting and analyzing the changed log file, and determining the type of the operation recorded by the changed log file specifically include:
S212C, extracting a naming suffix of the log file;
S222C, determining the type of the operation recorded by the log file according to the naming suffix;
S232C, determining a regular expression according to the type of the operation, and extracting data in the log file.
Optionally, if the type of the operation is specifically a deletion operation, in step S232C, determining a regular expression according to the type of the operation, and extracting data in the log file includes: and determining the regular expression as a fifth regular expression according to the deleting operation, and extracting the unique mark in the log file through the fifth regular expression.
Optionally, the type of the operation is specifically an update or insertion operation; in step S232C, the determining a regular expression according to the type of the operation, and extracting data in the log file specifically includes: and determining the regular expression as a sixth regular expression according to the updating or inserting operation, and extracting the unique mark and the operation data in the log file through the sixth regular expression.
Illustratively, the fifth regular expression is, for example, the one of the two regular expressions above that corresponds to log files produced by a data deletion operation, and the sixth regular expression is the one that corresponds to log files produced by a data insertion or update operation.
Specifically, if the change to the log file is caused by a data deletion operation, a file with a .del suffix is generated, and the state of the log file marked in the .del file is delete; if the change is caused by a data update or insertion operation, a corresponding segment file may be generated, followed by a .si file. According to whether the suffix is .del or .si, it can be quickly determined whether the type of the operation recorded by the log file is a data deletion operation or a data insertion or update operation, respectively.
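By way of illustration only, steps S212C to S232C may be sketched as follows. The suffix-to-type mapping follows the description above; the file names, the textual record layout and the patterns `FIFTH_RE` and `SIXTH_RE` are hypothetical and are assumed only so that the sketch is self-contained.

```python
import re
from pathlib import PurePath
from typing import Dict, Optional

# Suffix-to-type mapping taken from the description above.
SUFFIX_TO_OPERATION = {".del": "DELETE", ".si": "UPSERT"}

# Hypothetical stand-ins for the "fifth" and "sixth" regular expressions.
FIFTH_RE = re.compile(r'_id="(?P<id>[^"]+)"')
SIXTH_RE = re.compile(r'_id="(?P<id>[^"]+)"\s+_source=(?P<data>\{.*\})')
PATTERN_FOR = {"DELETE": FIFTH_RE, "UPSERT": SIXTH_RE}


def extract(file_name: str, content: str) -> Optional[Dict[str, str]]:
    """Pick the operation type from the suffix, then the expression from the type."""
    operation = SUFFIX_TO_OPERATION.get(PurePath(file_name).suffix)
    if operation is None:
        return None  # unknown suffix: nothing to synchronize
    match = PATTERN_FOR[operation].search(content)
    if match is None:
        return None
    return {"type": operation, **match.groupdict()}
```

Because the type is known before any expression is applied, only one expression needs to be evaluated per file in this variant.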
Specifically, in this embodiment, the two regular expressions may specifically be: "(? (? (. By "(? (? (.
The third regular expression in the embodiment shown in fig. 4 is, for example, "(? (? (.
Here, it should be noted that the specific structure of the regular expression herein is merely an example and is not limited.
In this embodiment, the operation data may be, for example, data in JSON format; this is merely an example and is not limiting.
S203, according to the determined type of the operation, converting the log file into a message and sending the message to a message cache middleware, so that a message consumption middleware subscribes to the message from the message cache middleware and synchronizes the message to a target database.
Illustratively, if the type of the operation is a deletion operation, the converting the log file into a message and sending the message to a message cache middleware specifically includes: acquiring the unique mark in the log file, combining the deletion operation and the unique mark in the form of key-value pairs to generate a message that conforms to a first preset format, and sending the message to a message queue of the message cache middleware.
If the type of the operation is an update or insertion operation, the converting the log file into a message and sending the message to a message cache middleware specifically includes: acquiring the unique mark and the operation data in the log file, combining the update or insertion operation, the unique mark and the operation data in the form of key-value pairs to generate a message that conforms to the first preset format, and sending the message to a message queue of the message cache middleware.
Specifically, the first preset format may be the JSON format.
Illustratively, the deletion operation and the unique mark are combined in the form of key-value pairs to generate a message conforming to the first preset format, such as { "type": DELETE "} (_ id), and the update or insertion operation, the unique mark, and the operation data are combined in the form of key-value pairs to generate a message conforming to the first preset format, such as {" type ": UPDATE (_ id | $).
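By way of illustration only, the conversion of an extraction result into a JSON message may be sketched as follows. The keys `type` and `_id` follow the examples above; the key `data` and the function name `build_message` are assumptions made for the sketch.

```python
import json
from typing import Any, Optional


def build_message(op_type: str, unique_id: str, data: Optional[Any] = None) -> str:
    """Combine the operation type, unique mark and optional data as key-value pairs."""
    message = {"type": op_type, "_id": unique_id}
    if data is not None:
        # Only update or insertion messages carry operation data.
        # The key name "data" is an assumption of this sketch.
        message["data"] = data
    return json.dumps(message, sort_keys=True)
```

For example, `build_message("DELETE", "7")` yields a message carrying only the type and the unique mark, while `build_message("UPDATE", "7", {"a": 1})` additionally carries the operation data.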
Further, the message also includes a timestamp: the time at which the collector collected the log file is used as the timestamp of the message. After receiving the messages, the message consumption middleware sorts them by timestamp, so that the messages are consumed in first-in, first-out order and data conflicts or out-of-order data are avoided.
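By way of illustration only, consumption in timestamp order may be sketched with a small priority buffer. The class name `TimestampOrderedBuffer` is hypothetical, and the rule that messages with equal timestamps keep their arrival order is an assumption of the sketch rather than something stated above.

```python
import heapq
import itertools
from typing import Any, Iterator, List, Tuple


class TimestampOrderedBuffer:
    """Buffers messages and releases them in ascending timestamp order."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, Any]] = []
        # Monotonic counter: breaks ties so equal timestamps keep arrival order
        # and message bodies are never compared with each other.
        self._counter = itertools.count()

    def push(self, timestamp: float, message: Any) -> None:
        heapq.heappush(self._heap, (timestamp, next(self._counter), message))

    def drain(self) -> Iterator[Any]:
        """Yield all buffered messages, earliest collection time first."""
        while self._heap:
            _, _, message = heapq.heappop(self._heap)
            yield message
```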
Illustratively, the converting the parsed log file into a message and sending the message to a message cache middleware, so that the message consumption middleware subscribes the message from the message cache middleware and synchronizes the message to a target database specifically includes:
converting the parsed log file into a message and sending the message to a message cache middleware, so that the message consumption middleware subscribes to the message from a message queue of the message cache middleware in the sorted order and synchronizes the message to a target database;
wherein the target database is a data warehouse, the data warehouse extracts the data in the message and maps the data into a database table, and the data extracted from the message includes the type of the operation, the unique identifier and the timestamp.
In particular, the data warehouse may be a Hive warehouse. In addition, it should be noted that the embodiments of the present application do not limit the target database to any particular type; it may be chosen by a person of ordinary skill in the art according to the requirements of the application scenario, and may be, for example, MySQL, Oracle, SQL Server, or MongoDB.
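By way of illustration only, the mapping of message fields into a database table may be sketched as follows. An in-memory SQLite database is used here purely as a stand-in for the data warehouse so that the sketch is runnable; the table name `es_changes`, the column names and the message key `timestamp` are assumptions made for the sketch.

```python
import json
import sqlite3
from typing import Iterable


def load_messages(conn: sqlite3.Connection, raw_messages: Iterable[str]) -> None:
    """Extract type, unique identifier and timestamp from each message into a table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS es_changes "
        "(op_type TEXT, unique_id TEXT, ts REAL)"
    )
    rows = []
    for raw in raw_messages:
        message = json.loads(raw)
        rows.append((message["type"], message["_id"], message["timestamp"]))
    # Parameterized insert; one row per consumed message.
    conn.executemany("INSERT INTO es_changes VALUES (?, ?, ?)", rows)
    conn.commit()
```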
FIG. 6 is a schematic structural diagram of a data synchronization apparatus based on elastic search according to an embodiment of the present application; as shown in fig. 6, the data synchronization apparatus 600 based on elastic search includes:
a monitoring module 601, configured to monitor a transaction log of each node in the elastic search cluster;
the analysis module 602 is configured to, when a log file in the transaction log changes, acquire and analyze the changed log file, and determine a type of an operation recorded by the log file;
a sending module 603, configured to convert the log file into a message according to the determined type of the operation, and send the message to a message caching middleware, so that the message consuming middleware subscribes the message from the message caching middleware and synchronizes the message to a target database.
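By way of illustration only, the cooperation of the parsing module and the sending module may be sketched as follows. A standard-library `queue.Queue` stands in for the message cache middleware; the function names, the record layout and the pattern `RECORD_RE` are hypothetical and are assumed only for the sketch.

```python
import json
import queue
import re
from typing import Dict, Optional

# Hypothetical record layout: a unique mark, optionally followed by JSON data.
RECORD_RE = re.compile(r'_id="(?P<id>[^"]+)"(?:\s+_source=(?P<data>\{.*\}))?')


def parse(record: str) -> Optional[Dict[str, Optional[str]]]:
    """Parsing module: determine the operation type recorded by a changed record."""
    match = RECORD_RE.search(record)
    if match is None:
        return None
    operation = "DELETE" if match.group("data") is None else "UPSERT"
    return {"type": operation, "_id": match.group("id"), "data": match.group("data")}


def send(parsed: Dict[str, Optional[str]], broker: "queue.Queue[str]") -> None:
    """Sending module: convert the parsed record into a message and enqueue it."""
    broker.put(json.dumps(parsed))


def on_log_change(record: str, broker: "queue.Queue[str]") -> bool:
    """Called when the monitoring module reports a change; returns True if sent."""
    parsed = parse(record)
    if parsed is None:
        return False  # nothing extracted: the record is discarded
    send(parsed, broker)
    return True
```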
Optionally, the monitoring module 601 is further configured to, before monitoring the transaction log of each node in the elastic search cluster, pre-install a collector on each node in the elastic search cluster, and set the data input of the configuration file of the collector as the transaction log of the elastic search to search all log files in the transaction log, where the data output of the configuration file of the collector is set as a message cache middleware.
Optionally, the log file of the transaction log includes version information, wherein the version information includes an iteration version number;
the monitoring module 601 is further configured to determine whether a log file in the transaction log changes according to the iteration version number of the log file.
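By way of illustration only, change detection based on the iteration version number may be sketched as follows. The class name `VersionTracker` is hypothetical, and treating a log file that has not been seen before as changed is an assumption of the sketch.

```python
from typing import Dict


class VersionTracker:
    """Remembers the last iteration version number seen for each log file."""

    def __init__(self) -> None:
        self._last_seen: Dict[str, int] = {}

    def has_changed(self, file_name: str, version: int) -> bool:
        """Return True if the version differs from the last one recorded."""
        previous = self._last_seen.get(file_name)
        self._last_seen[file_name] = version
        # Assumption: a file observed for the first time counts as changed.
        return previous is None or version != previous
```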
Optionally, the parsing module 602 is specifically configured to:
extracting the changed log file according to a preset regular expression;
and determining the type of the operation recorded by the changed log file based on the extraction result.
Optionally, the parsing module 602 is specifically configured to:
extracting the log file through a first regular expression, and determining the type of operation recorded by the log file according to the extraction result;
and under the condition that no result is extracted through the first regular expression, extracting the log file through the second regular expression, and determining the type of the operation recorded by the log file according to the extraction result.
Optionally, the parsing module 602 is specifically configured to:
if only the unique mark in the log file is extracted, determining the type of the operation recorded by the log file as a deleting operation;
and if the unique mark and the operation data in the log file are extracted, determining the type of the operation recorded in the log file as an updating or inserting operation.
Optionally, the parsing module 602 is specifically configured to:
extracting a unique mark in the log file through a third regular expression;
extracting operation data in the log file through a fourth regular expression;
determining the type of operation recorded by the log file based on the extraction result.
Optionally, the parsing module 602 is specifically configured to:
if the operation data in the log file is extracted, determining the type of the operation recorded in the log file as an updating operation or an inserting operation;
and if the operation data in the log file is not extracted, determining the type of the operation recorded in the log file as a deleting operation.
Optionally, the parsing module 602 is specifically configured to:
extracting a naming suffix of the log file;
determining the type of the operation recorded by the log file according to the named suffix;
and determining a regular expression according to the type of the operation, and extracting data in the log file.
Optionally, the type of the operation is specifically a delete operation;
the parsing module 602 is specifically configured to: and determining the regular expression as a fifth regular expression according to the deleting operation, and extracting the unique mark in the log file through the fifth regular expression.
Optionally, the type of the operation is specifically an update or insertion operation;
the parsing module 602 is specifically configured to: and determining the regular expression as a sixth regular expression according to the updating or inserting operation, and extracting the unique mark and the operation data in the log file through the sixth regular expression.
Optionally, the type of the operation is a deletion operation, and the sending module 603 is specifically configured to: acquire the unique mark in the log file, combine the deletion operation and the unique mark in the form of key-value pairs to generate a message that conforms to a first preset format, and send the message to a message queue of a message cache middleware;
alternatively, the type of the operation is an update or insertion operation, and the sending module 603 is specifically configured to: acquire the unique mark and the operation data in the log file, combine the update or insertion operation, the unique mark and the operation data in the form of key-value pairs to generate a message that conforms to the first preset format, and send the message to a message queue of the message cache middleware.
For an exemplary explanation of each module in fig. 6, reference may be made to the embodiments shown in fig. 1 to fig. 5; details are not repeated here.
FIG. 7 is a schematic structural diagram of a computer device according to an embodiment of the present application; as shown in fig. 7, the computer device 700 comprises a memory 701 having computer readable instructions stored therein, and a processor 702 which, when executing the computer readable instructions, implements the steps of the data synchronization method according to any of the embodiments of the present application.
The present application further provides a computer-readable storage medium, wherein the computer-readable storage medium stores computer-readable instructions, and the computer-readable instructions, when executed by a processor, implement the steps of the data synchronization method according to any one of the embodiments of the present application.
It is to be understood that the above-described embodiments are merely some, rather than all, of the embodiments of the present invention, and that the appended drawings illustrate preferred embodiments of the invention without limiting its scope. The invention may be embodied in many different forms; these embodiments are provided so that this disclosure will be thorough and complete. Although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that the technical solutions described in the embodiments may be modified, and that equivalents may be substituted for some of their elements. All equivalent structures made using the contents of the specification and the attached drawings, whether applied directly or indirectly in other related technical fields, likewise fall within the protection scope of the present invention.
Claims (15)
1. A data synchronization method based on elastic search is characterized by comprising the following steps:
monitoring the transaction log of each node in the elastic search cluster;
under the condition that a log file in the transaction log changes, collecting and analyzing the changed log file, and determining the type of operation recorded by the log file;
and converting the log file into a message and sending the message to a message cache middleware according to the determined type of the operation, so that the message consumption middleware subscribes the message from the message cache middleware and synchronizes the message to a target database.
2. The data synchronization method according to claim 1, wherein before monitoring the transaction log of each node in the elastic search cluster, the method comprises: pre-installing a collector on each node in the elastic search cluster, setting the data input of a configuration file of the collector to the transaction log of elastic search so as to search all log files in the transaction log, and setting the data output of the configuration file of the collector to the message cache middleware.
3. The data synchronization method of claim 1, wherein the log file of the transaction log comprises version information, wherein the version information comprises an iteration version number;
the method further comprises the following steps: and determining whether the log file in the transaction log changes or not according to the iteration version number of the log file.
4. The data synchronization method according to claim 1, wherein the collecting and analyzing the changed log file and determining the type of the operation recorded by the changed log file specifically include:
extracting the changed log file according to a preset regular expression;
and determining the type of the operation recorded by the changed log file based on the extraction result.
5. The data synchronization method according to claim 4, wherein the extracting the changed log file according to a preset regular expression, and determining the type of the operation recorded by the changed log file based on the extraction result specifically includes:
extracting the log file through a first regular expression, and determining the type of operation recorded by the log file according to the extraction result;
and under the condition that no result is extracted through the first regular expression, extracting the log file through the second regular expression, and determining the type of the operation recorded by the log file according to the extraction result.
6. The data synchronization method according to claim 5, wherein the determining the type of the operation recorded by the log file according to the extraction result specifically comprises:
if only the unique mark in the log file is extracted, determining the type of the operation recorded by the log file as a deleting operation;
and if the unique mark and the operation data in the log file are extracted, determining the type of the operation recorded in the log file as an updating or inserting operation.
7. The data synchronization method according to claim 1, wherein the collecting and analyzing the changed log file and determining the type of the operation recorded by the changed log file specifically include:
extracting a unique mark in the log file through a third regular expression;
extracting operation data in the log file through a fourth regular expression;
determining the type of operation recorded by the log file based on the extraction result.
8. The data synchronization method according to claim 7, wherein the determining the type of the operation recorded by the log file based on the extraction result specifically includes:
if the operation data in the log file is extracted, determining the type of the operation recorded in the log file as an updating operation or an inserting operation;
and if the operation data in the log file is not extracted, determining the type of the operation recorded in the log file as a deleting operation.
9. The data synchronization method according to claim 1, wherein the collecting and analyzing the changed log file and determining the type of operation recorded by the log file specifically include:
extracting a naming suffix of the log file;
determining the type of the operation recorded by the log file according to the named suffix;
and determining a regular expression according to the type of the operation, and extracting data in the log file.
10. The data synchronization method according to claim 9, wherein the type of the operation is specifically a delete operation;
the determining a regular expression according to the type of the operation and extracting data in the log file specifically includes: and determining the regular expression as a fifth regular expression according to the deleting operation, and extracting the unique mark in the log file through the fifth regular expression.
11. The data synchronization method according to claim 9, characterized in that the type of the operation is in particular an update or an insert operation;
the determining a regular expression according to the type of the operation and extracting data in the log file specifically includes: and determining the regular expression as a sixth regular expression according to the updating or inserting operation, and extracting the unique mark and the operation data in the log file through the sixth regular expression.
12. The data synchronization method according to claim 1, wherein the type of the operation is a deletion operation, and the converting the log file into a message and sending the message to a message cache middleware specifically includes: acquiring the unique mark in the log file, combining the deletion operation and the unique mark in the form of key-value pairs to generate a message that conforms to a first preset format, and sending the message to a message queue of the message cache middleware;
or the type of the operation is an update or insertion operation, and the converting the log file into a message and sending the message to a message cache middleware specifically includes: acquiring the unique mark and the operation data in the log file, combining the update or insertion operation, the unique mark and the operation data in the form of key-value pairs to generate a message that conforms to the first preset format, and sending the message to a message queue of the message cache middleware.
13. An apparatus for synchronizing data based on elastic search, comprising:
the monitoring module is used for monitoring the transaction log of each node in the elastic search cluster;
the analysis module is used for collecting and analyzing the changed log file under the condition that the log file in the transaction log is changed, and determining the type of the operation recorded by the log file;
and the sending module is used for converting the log file into a message and sending the message to the message cache middleware according to the determined type of the operation, so that the message consumption middleware subscribes the message from the message cache middleware and synchronizes the message to a target database.
14. A computer device comprising a memory having computer readable instructions stored therein and a processor which, when executing the computer readable instructions, implements the steps of the data synchronization method as claimed in any one of claims 1 to 12.
15. A computer-readable storage medium, having computer-readable instructions stored thereon, which, when executed by a processor, implement the steps of the data synchronization method of any one of claims 1 to 12.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111555692.2A CN114254016A (en) | 2021-12-17 | 2021-12-17 | Data synchronization method, device and equipment based on elastic search and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114254016A true CN114254016A (en) | 2022-03-29 |
Family
ID=80792926
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||