CN112100160B - Active-active real-time data warehouse construction method based on Elasticsearch - Google Patents


Info

Publication number
CN112100160B
CN112100160B (application CN202011224108.0A)
Authority
CN
China
Prior art keywords
data
elastic search
file
time
cluster
Prior art date
Legal status
Active
Application number
CN202011224108.0A
Other languages
Chinese (zh)
Other versions
CN112100160A
Inventor
谭巍
陈卫
田浩兵
张奎
李烨
Current Assignee
Sichuan XW Bank Co Ltd
Original Assignee
Sichuan XW Bank Co Ltd
Priority date
Filing date
Publication date
Application filed by Sichuan XW Bank Co Ltd filed Critical Sichuan XW Bank Co Ltd
Priority claimed from application CN202011224108.0A
Publication of application CN112100160A
Application granted
Publication of granted patent CN112100160B
Status: Active
Anticipated expiration

Classifications

    • G06F16/21: Design, administration or maintenance of databases
    • G06F16/2365: Ensuring data consistency and integrity
    • G06F16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; distributed database system architectures therefor
    • G06F9/546: Interprogram communication using message passing systems or structures, e.g. queues
    • G06F2209/548: Indexing scheme relating to interprogram communication; queue

Abstract

The invention discloses an active-active real-time data warehouse construction method based on Elasticsearch, which relates to the technical field of real-time big data computation and solves the problem that prior-art real-time data warehouse construction cannot guarantee data consistency. The scheme comprises the following steps: acquiring the index primary shards on each node in Elasticsearch cluster A, and reading the write-ahead log records of each primary shard under that node; judging the read write-ahead log records, and writing the read data into Elasticsearch cluster B in a synchronous blocking mode; rewriting data whose write failed, periodically detecting data that was persisted on disk because of write failure, sending abnormal runtime error messages to the Kafka cluster, and connecting to monitoring for real-time alarms. The method ensures that the data in the two clusters are completely consistent, and is mainly applied to the field of big data analysis.

Description

Active-active real-time data warehouse construction method based on Elasticsearch
Technical Field
The invention relates to the technical field of real-time big data computation, and in particular to an active-active real-time data warehouse construction method based on Elasticsearch.
Background
In the current era of big data there are many data warehouses for storing massive data, and the distributed search engine Elasticsearch (ES) is one of them. Elasticsearch is an open-source, distributed, RESTful search server built on Lucene and commonly used in cloud computing. It conveniently gives large amounts of data the capability of being searched, analyzed and explored. Fully exploiting the horizontal elasticity of Elasticsearch makes data more valuable in a production environment.
As big data technology is applied ever more widely in the financial field, the timeliness requirements on data grow correspondingly higher, for example in real-time precision marketing and real-time risk-control anti-fraud. To meet these business scenarios a real-time data warehouse is normally established. However, financial industries such as banking show pronounced peak and off-peak fluctuations, which places higher demands on the real-time data warehouse: its high availability must be guaranteed, and traffic sharing must be considered at business peak times to keep the user experience smooth. An Elasticsearch cluster comprises a plurality of nodes, each node holds one or more indices, each index is divided into one or more shards, and a shard set comprises either only a primary shard or a primary shard together with one or more replicas.
In the prior art, the following two methods are mainly used to construct a real-time data warehouse:
Application-layer double write: data is written into the 2 clusters by the application-layer code, by deploying 2 sets of service code, and the application layer is responsible for keeping the data in the 2 clusters consistent. This method is the simplest, but later management and maintenance are troublesome: for example, every online rollback must be coded twice and deployed twice, and the data consistency problem remains.
Message queue pull:
the data to be written is placed into a message queue such as Kafka, and the 2 clusters then pull the data independently from the same message queue.
Both methods share the same problem: data consistency cannot be guaranteed, since a write may succeed in one cluster while failing in the other. The root cause is that the two writes are independent operations; in addition, the later management and maintenance cost of these methods is high.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides an active-active real-time data warehouse construction method based on Elasticsearch, which aims to guarantee complete data consistency between the two clusters.
the technical scheme adopted by the invention is as follows:
a double-activity real-time data warehouse construction method based on Elastic Search comprises an Elastic Search cluster A and an Elastic Search cluster B, and comprises the following steps:
a, acquiring an index main fragment on each node in an Elastic Search cluster A, wherein each main fragment stores the IP address of the node where the main fragment is located;
b, reading the pre-written log record of each main sub-slice under each node under the data disc directory on each node;
and C: judging the read pre-written log records, and writing the pre-written log records meeting the requirements into a circular buffer queue;
step D, reading data in the annular buffer queue in a multithread mode, and writing the read data into an Elastic Search cluster B in a synchronous blocking mode;
step E: judging whether all data are successfully written, if the data are unsuccessfully written, rewriting, and if the data are rewritten for more than a specified number of times, preferably 3 times, writing the data into a disk for persistence;
step F: detecting data which exist on the disk and are persisted due to write-in failure at regular time, preferably detecting the data once in five minutes, if the data exist, obtaining the persisted data in the disk and writing the data into an Elastic Search cluster B, and clearing the successfully written data content on the disk after the data are successfully written;
and G, sending the running abnormal error message to the kafka cluster, and accessing the monitoring real-time alarm.
The Elastic Search cluster B writes data into the Elastic Search cluster A through the steps, and the two clusters achieve real-time synchronization through mutual writing.
Further, step A specifically comprises:
Step A1: accessing Elasticsearch cluster A over HTTP to obtain the hash string corresponding to each index under all nodes, and storing the obtained hash strings into a Map object belonging to the current node, where the English name of the index is the key of the Map object and the hash string corresponding to the index is the value;
Step A2: passing the key values of the Map object to all nodes of Elasticsearch cluster A to obtain the primary shards of each index via HTTP batch requests, comparing whether the IP address of the current node of Elasticsearch cluster A is consistent with the IP address of the node where each obtained primary shard is located, and storing the consistent entries into a primary-shard Map object maintained by the current node.
The invention stores the IP address to make it convenient to determine the primary shards on a node: the result returned by a query against the Elasticsearch cluster contains the primary shards of all nodes, so the IP address is used to separate out, and thereby identify, the primary shards that belong to the current node.
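Steps A1 and A2 can be sketched as follows. The patent implies a Java implementation; this is a minimal Python illustration that assumes the shard metadata has already been fetched, for example from Elasticsearch's real `GET /_cat/shards?format=json` endpoint (whose rows carry `index`, `shard`, `prirep` and `ip` fields); the sample values are invented.

```python
def primary_shards_on_node(cat_shards_rows, node_ip):
    # Keep only primary shards ("p") whose hosting IP matches this node,
    # keyed as "index/shard" -- analogous to the primary-shard Map object.
    result = {}
    for row in cat_shards_rows:
        if row.get("prirep") == "p" and row.get("ip") == node_ip:
            result["%s/%s" % (row["index"], row["shard"])] = row["ip"]
    return result

# Rows shaped like a GET /_cat/shards?format=json response (values illustrative).
rows = [
    {"index": "flinkxw123", "shard": "4", "prirep": "p", "ip": "10.0.0.1"},
    {"index": "flinkxw123", "shard": "4", "prirep": "r", "ip": "10.0.0.2"},
    {"index": "orders", "shard": "0", "prirep": "p", "ip": "10.0.0.2"},
]
print(primary_shards_on_node(rows, "10.0.0.1"))  # {'flinkxw123/4': '10.0.0.1'}
```

The IP comparison is what drops the replica on 10.0.0.2 and the primaries hosted elsewhere, leaving only the shards this node is responsible for replicating.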
Further, step B specifically comprises:
Step B1: traversing the data disk directory of each node to obtain the key values of the Map object, passing them to the current node, obtaining the path of each index's primary shard on the disk, obtaining the set of write-ahead log record files under the primary shard directory at that path, and sorting the obtained write-ahead log records by update time, with the time closest to the current time placed first;
Step B2: traversing the obtained write-ahead log file set, reading the content of each file via Java NIO to obtain each file's offset, number of written records and file generation, and judging the value of each file's offset; if the offset is smaller than or equal to a specified value, skipping the file, and if the offset is larger than the specified value, finding the corresponding log record file according to the file generation, reading the number of bytes given by the specified offset, and at the same time adding or updating the offset into a Map object maintained by the current node. The specified value is preferably 55: this threshold is chosen because beyond an offset of 55 the data content can be read completely and successfully.
In the method, the set of file names ending in .ckp under the path of a primary shard is obtained; these are the write-ahead log records. Elasticsearch splits data files using 65 MB as the standard, and the .ckp files record metadata about the split files. The files are sorted by update time with the latest, closest to the current time, ranked first, so real-time updating can be achieved and the latest update is obtained each time. When a data file is smaller than 65 MB it is read in one pass; after the data file is updated, the already-read data is skipped via the offset recorded in the Map object, and reading resumes from the updated position.
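The checkpoint handling in step B2 can be sketched as follows. This is a Python illustration under an assumed, simplified binary layout (big-endian offset as int64, numOps as int32, generation as int64); a real Elasticsearch `.ckp` file additionally carries a codec header and further fields, and the patent implies the actual reader uses Java NIO.

```python
import struct

# Assumed simplified checkpoint layout (NOT the exact on-disk format):
# big-endian offset (int64), numOps (int32), generation (int64).
CKP_FORMAT = ">qiq"
OFFSET_THRESHOLD = 55  # per the method: offsets <= 55 are skipped

def read_checkpoint(raw):
    offset, num_ops, generation = struct.unpack_from(CKP_FORMAT, raw)
    return {"offset": offset, "numOps": num_ops, "generation": generation}

def should_read(ckp, last_offset=0):
    # Read only when the checkpoint offset moved past both the fixed
    # threshold and what was already consumed (tracked in the Map object).
    return ckp["offset"] > max(OFFSET_THRESHOLD, last_offset)

raw = struct.pack(CKP_FORMAT, 120, 3, 7)
print(read_checkpoint(raw), should_read(read_checkpoint(raw)))
```

The `generation` value is what the method uses to locate the matching translog file, and `last_offset` is how already-read bytes are skipped after each update.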
Further, step C specifically comprises:
filtering and screening the data content of the read file: if the read data content contains the "transclog" keyword, the content is filtered out; if it does not, the content is selected, the "transclog" keyword is added to it, and it is written into the ring buffer queue.
Adding the "transclog" keyword prevents a message from being synchronized back and forth between the two Elasticsearch clusters.
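A minimal sketch of this step-C filter, in Python (the patent implies a Java implementation; the record shape as a flat dict is an assumption for illustration):

```python
MARKER = "transclog"  # marker keyword from the method

def filter_and_mark(record):
    # Drop records that already carry the marker (they arrived from the
    # peer cluster); otherwise tag them before queueing for replication.
    if MARKER in record:
        return None            # already synchronized once; skip to break the loop
    marked = dict(record)
    marked[MARKER] = True      # the peer cluster will see this and not echo it back
    return marked

print(filter_and_mark({"user": "a"}))                      # gets the marker added
print(filter_and_mark({"user": "a", "transclog": True}))   # None: filtered out
```

Whatever survives the filter is what gets written into the ring buffer queue for step D.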
Further, step E specifically comprises:
Step E1: checking the written data. After submission to cluster B a JSON result string is returned, and the value of its error field is obtained: if the value is true, the submission had errors; if the value is false, the submission had no errors and succeeded. On success, the number of data items in the returned JSON string is obtained and compared with the data volume before the write; if they are equal, the submission succeeded, and the next read from the ring buffer queue proceeds. If the submission had errors, or the returned write count is inconsistent, the data is rewritten;
Step E2: outputting the data that was not successfully submitted to disk. After rewriting a specified number of times, preferably 3, if the data still has not been submitted successfully, it is stored persistently to the disk, and the next read from the ring buffer queue then proceeds.
Rewriting the data guards against a submission failure caused by an error while reading the data; but if the data still fails after 3 rewrites, it is persisted to the disk.
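Steps E1 and E2 can be sketched as follows, in Python (the patent implies Java). The response shape — an `errors` flag plus an `items` list whose length is compared with the submitted count — is an assumption modeled loosely on Elasticsearch bulk responses; `persist` stands in for the disk spill.

```python
def submit_with_retry(submit, batch, persist, max_attempts=3):
    # Success requires no error flag AND a returned item count equal to
    # the submitted count; otherwise retry, then spill to disk via 'persist'.
    for _ in range(max_attempts):
        resp = submit(batch)
        if not resp.get("errors") and len(resp.get("items", [])) == len(batch):
            return True
    persist(batch)
    return False

attempts = {"n": 0}
def flaky(batch):  # fails twice, then succeeds on the third attempt
    attempts["n"] += 1
    ok = attempts["n"] >= 3
    return {"errors": not ok, "items": [{}] * len(batch) if ok else []}

print(submit_with_retry(flaky, [1, 2], persist=lambda b: None))  # True
```

The count comparison is what catches partial writes that return without an explicit error flag.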
Further, step F specifically comprises: periodically detecting whether data persisted because of write failure exists on the disk; if so, reading all persisted data, submitting it to Elasticsearch cluster B, and clearing the data content from the file after successful submission.
Further, step G specifically comprises: during operation, an abnormal error sends its error content to the topic specified in the Kafka cluster; monitoring is then connected, the monitor sends an alarm message to the receiver, and the receiver handles the error content in time.
Errors during operation include, for example, the peer cluster being unavailable, network connection timeouts, and data loss in the written cluster. Through the connected monitoring, the alarm message is sent to the receiver for timely handling.
In summary, thanks to the adopted technical scheme, the invention has the following beneficial effects. By parsing the write-ahead log records of the underlying primary shards, the details of mutual synchronization between internal data are shielded from the outside. First, the difficulty at the application layer is reduced: the application layer no longer needs to double-write or to guarantee data consistency, so the management and maintenance problems of double writing in later stages disappear. Second, the double-write inconsistency problem is solved: the write operation is guaranteed to be atomic, the data in the two clusters is guaranteed to be completely consistent, and data successfully written in one cluster is bound to appear in the other. Finally, the scheme has good wide applicability: it supports mutual real-time synchronization of Elasticsearch cluster data across different versions.
Drawings
The invention will now be described, by way of example, with reference to the accompanying drawings, in which:
FIG. 1 is a schematic flow diagram of one embodiment of the present invention;
FIG. 2 is a schematic flow chart of writing the data persisted on the disk in FIG. 1 into the cluster.
Detailed Description
All of the features disclosed in this specification, or all of the steps in any method or process so disclosed, may be combined in any combination, except combinations of features and/or steps that are mutually exclusive.
The present invention will be described in detail with reference to FIGS. 1 and 2, in which "Y" represents "YES" and "N" represents "NO" in the flowcharts.
The invention relates to an active-active real-time data warehouse construction method based on Elasticsearch, comprising an Elasticsearch cluster B and an Elasticsearch cluster A, in which data is mutually synchronized between cluster B and cluster A by parsing the underlying write-ahead log records.
The method comprises the following specific steps:
Step A: acquiring the index primary shards on each node in Elasticsearch cluster A, wherein each primary shard stores the IP address of the node where it is located;
Step A1: accessing Elasticsearch cluster A over HTTP to obtain the hash string corresponding to each index under all nodes, and storing the obtained hash strings into a Map object belonging to the current node, where the English name of the index is the key of the Map object and the hash string corresponding to the index is the value;
Step A2: passing the key values of the Map object to all nodes of Elasticsearch cluster A to obtain the primary shards of each index via HTTP batch requests, comparing whether the IP address of the current node of Elasticsearch cluster A is consistent with the IP address of the node where each obtained primary shard is located, and storing the consistent entries into a primary-shard Map object maintained by the current node.
This Map object maintains all primary shards on the current node. The IP address is stored to make it convenient to determine the primary shards on the node: because the result returned by the Elasticsearch cluster covers all nodes, the IP address is used to separate the current node's primary shards from all the others.
Design of the Map object: the hash string of the index is spliced with the shard number into a new string, which is paired with the English name of the index. For example:
(TnRucew-ThawtzMZcfMXpw/4,flinkxw123).
Here TnRucew-ThawtzMZcfMXpw is the hash string of the index whose English name is flinkxw123, and /4 denotes primary shard number 4.
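The pair construction above can be sketched in one line of Python (the hash and index name are the document's own example values; the patent implies a Java implementation):

```python
def shard_map_entry(index_hash, shard_no, index_name):
    # Pair "<hash>/<shard number>" with the index's English name,
    # mirroring the example entry above.
    return ("%s/%d" % (index_hash, shard_no), index_name)

print(shard_map_entry("TnRucew-ThawtzMZcfMXpw", 4, "flinkxw123"))
# ('TnRucew-ThawtzMZcfMXpw/4', 'flinkxw123')
```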
Step B: reading the write-ahead log records of each primary shard under the data disk directory on each node;
Step B1: traversing the data disk directory of each node to obtain the key values of the Map object, passing them to the current node to obtain the path of each index's primary shard on the disk, obtaining the set of file names ending in .ckp under the primary shard directory at that path, and sorting the obtained file set by update time, with the update time closest to the current time placed first;
Step B2: traversing the obtained write-ahead log file set, reading the content of each file via Java NIO to obtain each file's offset (offset), number of written records (numOps) and file generation (generation), and judging the value of each file's offset; if the offset is less than or equal to 55, skipping the file, and if the offset is greater than 55, finding the corresponding log record file according to the file generation, reading the number of bytes given by the specified offset, and adding or updating the offset into a Map object maintained by the current node. The offset threshold of 55 is chosen because beyond 55 the data content can be read completely and successfully.
In the method, the set of file names ending in .ckp under the path of a primary shard is obtained; these are the write-ahead log records. Elasticsearch splits data files using 65 MB as the standard, and the .ckp files record metadata about the split files. The files are sorted by update time with the latest, closest to the current time, ranked first, so real-time updating can be achieved and the latest update is obtained each time. When a data file is smaller than 65 MB it is read in one pass; after the data file is updated, the already-read data is skipped via the offset recorded in the Map object, and reading resumes from the updated position.
Step C: filtering and screening the data content of the read file: if the read data content contains the "transclog" keyword, the content is filtered out; if it does not, the content is selected, the "transclog" keyword is added to it, and it is written into the ring buffer queue.
Adding the "transclog" keyword prevents a piece of content from being synchronized back and forth between the two Elasticsearch clusters: if the parsed content does not contain the "transclog" keyword, it is a piece of newly written external data; if it does contain the keyword, the record has already been parsed and synchronized, and no further synchronization is required.
Step D, reading data in the annular buffer queue in a multithread mode, and writing the read data into an Elastic Search cluster B in a synchronous blocking mode;
step E: judging whether all data are successfully written, if the data are unsuccessfully written, rewriting, and if the data are rewritten more than a specified number of times, writing the data into a disk for persistence;
step E1, checking the written data, obtaining a returned result JSON character string after being submitted to the cluster B, obtaining the value of an alarm field in the JSON character string, if the value is true, the fact that the submission has errors is shown, if the value is false, the fact that the submission has no errors is shown, the submission is successful, on the premise that the value is true, obtaining the number of data volumes in the returned JSON character string, comparing the number with the data volumes before the writing, if the data volumes are equal, the fact that the submission is successful, then reading the data from the annular buffer queue for the next time, and if the submission has errors or the returned writing number is inconsistent, rewriting;
step E2: and outputting the data which is not successfully submitted to the disk, and after 3 times of rewriting, if the data is still not successfully submitted, persistently storing the data to the disk. And then the next reading of data from the ring buffer queue is performed.
The data can be rewritten to prevent the failure of submission caused by the error of reading the data, but the data is persisted to the disk if the data is still failed after being rewritten three times.
Step F: detecting data which exist on the disk and are persisted due to write-in failure in a timed manner, if the data exist, obtaining the persisted data in the disk and writing the data into an Elastic Search cluster B, and clearing the successfully written data content on the disk after the data are successfully written;
detecting whether data which is persisted due to write failure exists on the disk at regular intervals, if so, locking the file, reading all the persisted data, then submitting the data to an Elastic Search cluster B, clearing the content of the data in the file after successful submission, and then releasing the lock to avoid data exception caused by simultaneous modification of multiple threads.
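The locked retry of step F can be sketched as follows. This Python sketch uses an in-process lock and a JSON-lines spill file as stand-ins; the patent's implementation (implied to be Java) locks the actual file, and the file layout here is an assumption.

```python
import json
import os
import tempfile
import threading

_lock = threading.Lock()  # stand-in for the per-file lock in the method

def retry_persisted(path, submit):
    # Under the lock: read every spilled record, resubmit, and truncate
    # the file only after every submission succeeded.
    with _lock:
        if not os.path.exists(path) or os.path.getsize(path) == 0:
            return 0
        with open(path) as f:
            records = [json.loads(line) for line in f if line.strip()]
        for rec in records:
            if not submit(rec):
                return 0  # leave the file intact; retry on the next tick
        open(path, "w").close()  # all succeeded: clear the persisted data
        return len(records)

path = os.path.join(tempfile.mkdtemp(), "failed.jsonl")
with open(path, "w") as f:
    f.write('{"id": 1}\n{"id": 2}\n')
print(retry_persisted(path, lambda rec: True), os.path.getsize(path))  # 2 0
```

Clearing the file only after all submissions succeed, while holding the lock, is what prevents the periodic task and a concurrent writer from corrupting each other's view of the spill file.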
Step G: during operation, errors send their error content to the topic specified in the Kafka cluster; the error content includes, for example, the peer cluster being unavailable, network connection timeouts, and data loss in the written cluster. Through the connected monitoring, the alarm message is then sent to the receiver for timely handling.
Kafka is a high-throughput distributed publish-subscribe messaging system that can handle all the action-stream data of consumers in a web site. The message sent to the Kafka cluster is a JSON string, whose format is defined as follows:
Index name, JSON key: indexName
Node IP, JSON key: nodeIP
Cause of error, JSON key: error
Time of occurrence, JSON key: time
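Composing such an alert message can be sketched as follows (Python illustration; the timestamp format and the example topic name are assumptions, since the patent defines only the four JSON keys):

```python
import json
import time

def build_alert(index_name, node_ip, error):
    # Four keys as defined in the format table above; the timestamp
    # format is an assumption, the patent does not specify one.
    return json.dumps({
        "indexName": index_name,
        "nodeIP": node_ip,
        "error": error,
        "time": time.strftime("%Y-%m-%d %H:%M:%S"),
    })

msg = build_alert("flinkxw123", "10.0.0.1", "peer cluster unavailable")
print(msg)
# A real producer would then send this string to the configured topic,
# e.g. with kafka-python: producer.send("es-sync-alerts", msg.encode())
# (topic name hypothetical).
```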
In general, the method builds an active-active cluster by achieving real-time synchronization of data at the Elasticsearch bottom layer, which not only reduces the difficulty of later maintenance and management compared with double writing, but also fully guarantees the timely consistency of the data in the two clusters.
The active-active behavior of Elasticsearch is obtained by parsing the underlying write-ahead log (WAL). The method must distinguish which records were written externally and which were produced by parsing the WAL; this avoids a "dead loop" of data synchronization. A marking scheme is used for the distinction: if a record in the WAL does not contain the marker, such as the keyword mentioned in step C of the method, it is regarded as externally written data and added to the queue; otherwise it is skipped.
By parsing the underlying write-ahead log (WAL), the details of mutual synchronization between internal data are shielded from the outside. First, the difficulty at the application layer is reduced: the application layer no longer needs to double-write or to guarantee data consistency, so the management and maintenance problems of double writing in later stages disappear. Second, the double-write inconsistency problem is solved: the write operation is guaranteed to be atomic, the data in the two clusters is guaranteed to be completely consistent, and data successfully written in one cluster must appear in the other. Finally, the scheme has good wide applicability: it supports building active-active Elasticsearch clusters across different versions, and is also suitable for mutual real-time synchronization among 3 or more Elasticsearch clusters.
The above-mentioned embodiments only express specific embodiments of the present application, and their description is relatively specific and detailed, but this should not be construed as limiting the scope of the present application. It should be noted that those skilled in the art can make several changes and modifications without departing from the technical idea of the present application, and all of these fall within the protection scope of the present application.

Claims (5)

1. A double-activity real-time data warehouse construction method based on Elastic Search comprises an Elastic Search cluster A and an Elastic Search cluster B, and is characterized in that: the method comprises the following steps:
a, acquiring an index main fragment on each node in an Elastic Search cluster A, wherein each main fragment stores the IP address of the node where the main fragment is located;
step A1: accessing the Elastic Search cluster A in an http mode to obtain a hash character corresponding to each index under all nodes, storing the obtained hash character into a Map object belonging to the current node, wherein the English name of the index is used as a key of the Map object, and the hash character corresponding to the index is used as a value of the Map object;
step A2: transmitting the key values of the Map objects into all nodes of an Elastic Search cluster A to obtain main fragments of the index in an http batch request mode, comparing whether the IP address of the current node of the Elastic Search cluster A is consistent with the IP address of the node where the main fragment of the obtained index is located, and storing the consistent IP addresses into one main fragment Map object maintained by the current node;
b, reading the pre-written log record of each main sub-slice under each node under the data disc directory on each node;
step B1: traversing the data disk directory of each node to acquire the key values of the Map object and passing them to the current node, acquiring the on-disk path of the primary shard of each index, acquiring the set of write-ahead log files under the primary shard directory at that path, and sorting the acquired write-ahead log files by update time so that the file updated closest to the current time is ranked first;
step B2: traversing the acquired write-ahead log file set and reading the content of each file via Java NIO to obtain each file's offset, written-record count and file generation, then checking each file's offset: if the offset is less than or equal to a specified value, the file is skipped; if the offset is greater than the specified value, the corresponding log record file is located by its file generation, the byte content at the specified offset is read, and the offset is added to or updated in the Map object maintained by the current node;
step C: checking the read write-ahead log records and writing the records that meet the requirements into a ring buffer queue;
step D: reading data from the ring buffer queue with multiple threads and writing the read data into Elastic Search cluster B in a synchronous blocking manner;
step E: checking whether all data were written successfully; if a write fails, the data are rewritten, and if the data have been rewritten more than a specified number of times, they are persisted to disk;
step F: periodically detecting whether data persisted to disk after write failures exist; if so, reading the persisted data from the disk, writing them into Elastic Search cluster B, and clearing the successfully written content from the disk;
step G: sending error messages raised by runtime exceptions to the kafka cluster and connecting them to real-time alarm monitoring.
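The core of steps B2–D can be sketched in Java. This is an illustrative sketch, not the patented implementation: the log-file layout is simplified to raw bytes after a known offset, the `TranslogPipeline` name, the 1024-slot queue size and the `"<EOF>"` sentinel are hypothetical choices, and a real consumer would bulk-write to Elastic Search cluster B rather than append to an in-memory sink.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class TranslogPipeline {
    // steps C/D: bounded queue standing in for the ring buffer queue
    final BlockingQueue<String> ring = new ArrayBlockingQueue<>(1024);

    // step B2: positional read of the log file content after `offset` via Java NIO
    static String readFromOffset(Path file, long offset) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate((int) (ch.size() - offset));
            ch.read(buf, offset);   // positional read; channel position is unchanged
            buf.flip();
            return StandardCharsets.UTF_8.decode(buf).toString();
        }
    }

    // step D: a consumer thread drains the queue until the "<EOF>" sentinel,
    // handing each record to a synchronous writer (here: an in-memory sink)
    void consume(List<String> sink) throws InterruptedException {
        Thread consumer = new Thread(() -> {
            try {
                String rec;
                while (!(rec = ring.take()).equals("<EOF>")) {
                    sink.add(rec);  // stands in for a blocking bulk write to cluster B
                }
            } catch (InterruptedException ignored) { }
        });
        consumer.start();
        consumer.join();
    }
}
```

`ArrayBlockingQueue.take()` already blocks when the queue is empty, which is why the patent's "synchronous blocking mode" needs no extra locking here.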
2. The Elastic Search based double-activity real-time data warehouse construction method according to claim 1, characterized in that the step C specifically comprises:
filtering and screening the data content of each read file: if the read data content contains the 'transclog' keyword, the record is filtered out; if it does not, the record is selected, the 'transclog' keyword is added to it, and the record is written into the ring buffer queue.
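A minimal sketch of this claim-2 filter, assuming JSON-object records and treating 'transclog' as a literal marker string (the class and method names are hypothetical). Dropping records that already carry the marker plausibly prevents the two clusters' synchronizers from replaying each other's writes in a loop, since each side tags what it forwards.

```java
// Hypothetical filter for step C: drop already-tagged records, tag fresh ones.
class TransclogFilter {
    static final String MARK = "transclog";

    // Returns null when the record must be skipped, otherwise the tagged record.
    static String tagOrDrop(String jsonRecord) {
        if (jsonRecord.contains(MARK)) {
            return null;                 // already passed through a synchronizer once
        }
        // append the marker so the other cluster's synchronizer skips it in turn
        int end = jsonRecord.lastIndexOf('}');
        return jsonRecord.substring(0, end) + ",\"" + MARK + "\":true}";
    }
}
```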
3. The Elastic Search based double-activity real-time data warehouse construction method according to claim 1, characterized in that the step E specifically comprises:
step E1: verifying the written data: after a batch is submitted to cluster B, the returned JSON result string is obtained and the value of its error field is read, where true indicates the submission contained errors and false indicates it succeeded without errors; when the value is false, the number of items in the returned JSON string is compared with the number of records submitted, and if the two are equal the submission succeeded and the next batch is read from the ring buffer queue; if the submission contained errors or the returned count does not match, the batch is rewritten;
step E2: outputting unsuccessfully submitted data to the disk: after being rewritten a specified number of times, data that still cannot be submitted are persisted to the disk, and the next batch is then read from the ring buffer queue.
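The claim-3 retry-then-persist policy can be sketched as follows. The `Predicate` stands in for "submit the batch and verify the returned JSON" (error flag false, item count matching), the in-memory `persisted` list stands in for the on-disk file, and the class name and retry bound are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Sketch of steps E1/E2: bounded rewrites, then persistence for later redrive.
class RetryingWriter {
    final int maxRetries;
    final List<String> persisted = new ArrayList<>(); // stands in for the persistence file

    RetryingWriter(int maxRetries) { this.maxRetries = maxRetries; }

    // Returns true when the batch was verified as written; false when it was
    // persisted after the initial attempt plus maxRetries rewrites all failed.
    boolean write(String batch, Predicate<String> submitAndVerify) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (submitAndVerify.test(batch)) {
                return true;          // error flag false and counts matched
            }
        }
        persisted.add(batch);         // hand over to the periodic redrive of step F
        return false;
    }
}
```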
4. The Elastic Search based double-activity real-time data warehouse construction method according to claim 1, characterized in that the step F specifically comprises:
periodically detecting whether data persisted after write failures exist on the disk; if so, reading all the persisted data and submitting them to Elastic Search cluster B, and clearing the corresponding data content from the file after a successful submission.
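Claim 4's periodic redrive can be sketched as a directory scan. In a real deployment the scan would run on a timer (e.g. a `ScheduledExecutorService`) and the submitter would bulk-write to Elastic Search cluster B; here the submitter is abstracted as a `Predicate` so the delete-only-after-success logic is visible on its own. Class and method names are illustrative.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Predicate;
import java.util.stream.Stream;

// Sketch of step F: resubmit persisted batches, removing each file only after
// the submitter confirms success, so failed batches survive for the next scan.
class FailedWriteRedriver {
    // Returns the number of files successfully redriven and removed.
    static int redriveOnce(Path dir, Predicate<String> submitter) throws IOException {
        int redriven = 0;
        try (Stream<Path> files = Files.list(dir)) {
            for (Path f : (Iterable<Path>) files::iterator) {
                String batch = new String(Files.readAllBytes(f), StandardCharsets.UTF_8);
                if (submitter.test(batch)) { // resubmit persisted data to cluster B
                    Files.delete(f);         // clear only after a confirmed success
                    redriven++;
                }
            }
        }
        return redriven;
    }
}
```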
5. The Elastic Search based double-activity real-time data warehouse construction method according to claim 1, characterized in that the step G specifically comprises: during operation, runtime exceptions send their error content to the designated topic of the kafka cluster; the monitoring system then consumes that topic and sends an alarm message to the receiver, who handles the error content promptly.
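A sketch of the claim-5 alarm path. The topic name, message fields and `ErrorAlarm` class are assumptions, and the `BiConsumer` stands in for a real Kafka producer (e.g. `KafkaProducer#send` from the `kafka-clients` library) so the example stays self-contained.

```java
import java.time.Instant;
import java.util.function.BiConsumer;

// Sketch of step G: build a structured alarm message and hand it to a
// producer of (topic, message) pairs; monitoring consumes the topic downstream.
class ErrorAlarm {
    static final String TOPIC = "es-sync-alarms";  // assumed topic name

    static String toMessage(String component, Exception e) {
        return "{\"component\":\"" + component + "\","
             + "\"error\":\"" + e.getMessage() + "\","
             + "\"ts\":\"" + Instant.now() + "\"}";
    }

    static void report(String component, Exception e, BiConsumer<String, String> producer) {
        producer.accept(TOPIC, toMessage(component, e));
    }
}
```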
CN202011224108.0A 2020-11-05 2020-11-05 Elastic Search based double-activity real-time data warehouse construction method Active CN112100160B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011224108.0A CN112100160B (en) 2020-11-05 2020-11-05 Elastic Search based double-activity real-time data warehouse construction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011224108.0A CN112100160B (en) 2020-11-05 2020-11-05 Elastic Search based double-activity real-time data warehouse construction method

Publications (2)

Publication Number Publication Date
CN112100160A CN112100160A (en) 2020-12-18
CN112100160B true CN112100160B (en) 2021-09-07

Family

ID=73784581

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011224108.0A Active CN112100160B (en) 2020-11-05 2020-11-05 Elastic Search based double-activity real-time data warehouse construction method

Country Status (1)

Country Link
CN (1) CN112100160B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112988690B (en) * 2021-03-16 2023-02-17 挂号网(杭州)科技有限公司 Dictionary file synchronization method, device, server and storage medium
CN114579668A (en) * 2022-05-06 2022-06-03 中建电子商务有限责任公司 Database data synchronization method
CN114579596B (en) * 2022-05-06 2022-09-06 达而观数据(成都)有限公司 Method and system for updating index data of search engine in real time

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005055519A1 (en) * 2003-12-01 2005-06-16 International Business Machines Corporation Node clustering based on user interests, application requirements and network characteristics
CN102779185A (en) * 2012-06-29 2012-11-14 浙江大学 High-availability distribution type full-text index method
CN103294731A (en) * 2012-03-05 2013-09-11 阿里巴巴集团控股有限公司 Real-time index creating and real-time searching method and device
CN103793290A (en) * 2012-10-31 2014-05-14 腾讯科技(深圳)有限公司 Disaster tolerant system and data reading method thereof
CN104239417A (en) * 2014-08-19 2014-12-24 天津南大通用数据技术股份有限公司 Dynamic adjustment method and dynamic adjustment device after data fragmentation in distributed database
CN105095762A (en) * 2015-07-31 2015-11-25 中国人民解放军信息工程大学 Global offset table protection method based on address randomness and segment isolation
CN109408289A (en) * 2018-10-16 2019-03-01 国网山东省电力公司信息通信公司 A kind of cloud disaster tolerance data processing method
CN110825816A (en) * 2020-01-09 2020-02-21 四川新网银行股份有限公司 System and method for data acquisition of partitioned database
CN111752962A (en) * 2020-07-01 2020-10-09 浪潮云信息技术股份公司 System and method for ensuring high availability and consistency of MHA cluster

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10656866B2 (en) * 2014-12-31 2020-05-19 Pure Storage, Inc. Unidirectional vault synchronization to support tiering
US10560544B2 (en) * 2015-08-25 2020-02-11 Box, Inc. Data caching in a collaborative file sharing system
CN107959695B (en) * 2016-10-14 2021-01-29 北京国双科技有限公司 Data transmission method and device
US10394670B2 (en) * 2017-06-02 2019-08-27 Verizon Patent And Licensing Inc. High availability and disaster recovery system architecture
CN108418859B (en) * 2018-01-24 2020-11-06 华为技术有限公司 Method and device for writing data
CN108737184B (en) * 2018-05-22 2021-08-20 华为技术有限公司 Management method and device of disaster recovery system
CN111352766A (en) * 2018-12-21 2020-06-30 中国移动通信集团山东有限公司 Database double-activity implementation method and device
CN110990366B (en) * 2019-12-04 2024-02-23 中国农业银行股份有限公司 Index allocation method and device for improving performance of ES-based log system
CN111026621B (en) * 2019-12-23 2023-04-07 杭州安恒信息技术股份有限公司 Monitoring alarm method, device, equipment and medium for Elasticissearch cluster


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
How ElasticSearch guarantees data consistency and real-time behavior; SHAN (pseudonymous author); https://www.jianshu.com/61dd9fb7d785; 2018-08-21; pp. 1-6 *
The Network Architecture Design of Distributed Dual Live Data Center; Nan Shuping et al.; 2019 IEEE International Conference on Power, Intelligent; 2019-12-26; pp. 638-642 *
Research on synchronization strategies and efficiency analysis of disaster-recovery backup systems; Yang Peng; China Masters' Theses Full-text Database, Information Science and Technology; 2009-10-15 (No. 10); I138-41 *

Also Published As

Publication number Publication date
CN112100160A (en) 2020-12-18

Similar Documents

Publication Publication Date Title
CN112100160B (en) Elastic Search based double-activity real-time data warehouse construction method
US11657053B2 (en) Temporal optimization of data operations using distributed search and server management
US11496545B2 (en) Temporal optimization of data operations using distributed search and server management
CN111723160B (en) Multi-source heterogeneous incremental data synchronization method and system
US20220237166A1 (en) Table partitioning within distributed database systems
US10891297B2 (en) Method and system for implementing collection-wise processing in a log analytics system
US7702640B1 (en) Stratified unbalanced trees for indexing of data items within a computer system
EP3791284A1 (en) Conflict resolution for multi-master distributed databases
AU2022200375A1 (en) Temporal optimization of data operations using distributed search and server management
US8396840B1 (en) System and method for targeted consistency improvement in a distributed storage system
US20170242761A1 (en) Fault tolerant listener registration in the presence of node crashes in a data grid
US20120197958A1 (en) Parallel Serialization of Request Processing
JP2023546249A (en) Transaction processing methods, devices, computer equipment and computer programs
US10936559B1 (en) Strongly-consistent secondary index for a distributed data set
US8468134B1 (en) System and method for measuring consistency within a distributed storage system
US11676066B2 (en) Parallel model deployment for artificial intelligence using a primary storage system
CN112527783B (en) Hadoop-based data quality exploration system
CN113360456B (en) Data archiving method, device, equipment and storage medium
CN111639114A (en) Distributed data fusion management system based on Internet of things platform
CN113111129A (en) Data synchronization method, device, equipment and storage medium
US11663192B2 (en) Identifying and resolving differences between datastores
US20200250188A1 (en) Systems, methods and data structures for efficient indexing and retrieval of temporal data, including temporal data representing a computing infrastructure
US11397750B1 (en) Automated conflict resolution and synchronization of objects
CN109947730B (en) Metadata recovery method, device, distributed file system and readable storage medium
US11188228B1 (en) Graphing transaction operations for transaction compliance analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant