CN112100160B - Elastic Search based dual-active real-time data warehouse construction method - Google Patents
Elastic Search based dual-active real-time data warehouse construction method
- Publication number: CN112100160B
- Application number: CN202011224108.0A
- Authority: CN (China)
- Prior art keywords: data, elastic search, file, time, cluster
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/23—Updating
- G06F16/2365—Ensuring data consistency and integrity
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/546—Message passing systems or structures, e.g. queues
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/54—Indexing scheme relating to G06F9/54
- G06F2209/548—Queue
Abstract
The invention discloses a dual-active real-time data warehouse construction method based on Elastic Search, relating to the technical field of big-data real-time computation and solving the prior-art problem that data consistency cannot be guaranteed when constructing a real-time data warehouse. The scheme comprises the following steps: acquire the primary shard of each index on every node in Elastic Search cluster A, and read the write-ahead log records of each primary shard on that node; filter the write-ahead log records that were read, and write the qualifying data into Elastic Search cluster B in a synchronous blocking mode; rewrite data whose write failed, periodically detect data persisted on disk because of write failure, send runtime error messages to a kafka cluster, and connect monitoring for real-time alarms. The method ensures that the data in the two clusters is completely consistent, and is mainly applied in the field of big-data analysis.
Description
Technical Field
The invention relates to the technical field of big-data real-time computation, and in particular to a dual-active real-time data warehouse construction method based on Elastic Search.
Background
In the current era of big data, many data warehouses store massive amounts of data, and the distributed search engine ElasticSearch (ES) is one of them. ElasticSearch is an open-source, distributed, RESTful search server built on Lucene, commonly used in cloud computing. It conveniently gives large volumes of data the capability to be searched, analyzed and explored. Making full use of ElasticSearch's horizontal scalability allows data to become more valuable in a production environment.
As big-data technology is applied ever more widely in the financial field, the timeliness requirements on data grow higher and higher, for example in real-time precision marketing and real-time risk-control anti-fraud. To meet these business scenarios, a real-time data warehouse is usually established; but financial industries such as banking show obvious peak and off-peak fluctuations, which places higher requirements on the real-time data warehouse: its high availability must be guaranteed, and traffic sharing must be considered during business peaks to keep the user experience smooth. An Elastic Search cluster comprises several nodes; each node holds one or more indexes; each index is divided into one or more shards; and a shard group contains either only a primary shard, or a primary shard together with one or more replicas.
In the prior art, a real-time data warehouse is mainly built in one of the following two ways:

Application-layer double write: data is written into the 2 clusters through application-layer code, by deploying 2 sets of service code, while the application layer itself guarantees the consistency of the data in the 2 clusters. This method is the simplest, but later management and maintenance are troublesome: an online rollback, for example, must be written twice and deployed twice, and the data-consistency problem remains.

Message-queue pull: the data to be written is placed into a message queue such as kafka, and the 2 clusters then each pull the data from the same message queue.

Both methods share the same problem: data consistency cannot be guaranteed, because a write can succeed in one cluster while failing in the other. The root cause is that writing twice makes the 2 operations independent; in addition, the later management and maintenance cost of these methods is high.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a dual-active real-time data warehouse construction method based on Elastic Search, which aims to guarantee the consistency of data between two clusters.

The technical scheme adopted by the invention is as follows:
a double-activity real-time data warehouse construction method based on Elastic Search comprises an Elastic Search cluster A and an Elastic Search cluster B, and comprises the following steps:
a, acquiring an index main fragment on each node in an Elastic Search cluster A, wherein each main fragment stores the IP address of the node where the main fragment is located;
b, reading the pre-written log record of each main sub-slice under each node under the data disc directory on each node;
and C: judging the read pre-written log records, and writing the pre-written log records meeting the requirements into a circular buffer queue;
step D, reading data in the annular buffer queue in a multithread mode, and writing the read data into an Elastic Search cluster B in a synchronous blocking mode;
step E: judging whether all data are successfully written, if the data are unsuccessfully written, rewriting, and if the data are rewritten for more than a specified number of times, preferably 3 times, writing the data into a disk for persistence;
step F: detecting data which exist on the disk and are persisted due to write-in failure at regular time, preferably detecting the data once in five minutes, if the data exist, obtaining the persisted data in the disk and writing the data into an Elastic Search cluster B, and clearing the successfully written data content on the disk after the data are successfully written;
and G, sending the running abnormal error message to the kafka cluster, and accessing the monitoring real-time alarm.
The Elastic Search cluster B writes data into the Elastic Search cluster A through the steps, and the two clusters achieve real-time synchronization through mutual writing.
Further, step A specifically comprises:

Step A1: access Elastic Search cluster A over http to obtain the hash string corresponding to each index on all nodes, and store it in a Map object belonging to the current node, with the English name of the index as the key of the Map object and the hash string of the index as the value;

Step A2: pass the keys of the Map object to all nodes of Elastic Search cluster A in an http batch request to obtain the primary shards of each index; compare whether the IP address of the current node of Elastic Search cluster A equals the IP address of the node where each obtained primary shard is located, and store the matching entries in a primary-shard Map object maintained by the current node.

The IP address is stored to make it easy to determine which primary shards are on this node: the result returned by querying the Elastic Search cluster contains the primary shards of all nodes, so the IP address is used to separate out the primary shards that belong to the current node.
Further, step B specifically comprises:

Step B1: under the data-disk directory of each node, traverse the keys of the Map object, pass each key to the current node to obtain the on-disk path of the index's primary shard, obtain the set of write-ahead log checkpoint files under the primary-shard directory, and sort the obtained files by update time, with the time closest to the current time first;

Step B2: traverse the obtained write-ahead log file set, read the content of each file with Java nio to obtain its offset, number of written operations and file generation, and judge the value of the offset; if the offset of a file is less than or equal to a specified value, skip the file; if it is greater than the specified value, find the corresponding translog file according to the file generation, read the number of bytes indicated by the offset, and add or update the offset in a Map object maintained by the current node. The specified value is preferably 55, because at offset 55 the data content can be read completely and successfully.

In this way the set of file names ending in .ckp is obtained through the path of the primary shard; these are the write-ahead log checkpoint records. Elastic Search splits data files at a threshold of 65M, and the .ckp files record metadata about the split files. The files are sorted by update time with the latest first, so updates are captured in real time and the newest update is obtained each pass. While a data file is below 65M it is read incrementally: after the file is updated, already-read data is skipped via the offset recorded in the Map object, and reading resumes from the updated position.
Further, step C specifically comprises:

Filter and screen the data content of the read file: if the read content contains the 'transclog' keyword, filter it out; if it does not, select it, add the 'transclog' keyword to it, and write it into the ring buffer queue.

Adding the 'transclog' keyword prevents a message from being synchronized back and forth between the two Elastic Search clusters.
Further, step E specifically comprises:

Step E1: check the written data. After the data is submitted to cluster B, obtain the returned JSON result string and the value of its error field. If the value is true, the submission had errors; if false, the submission had no errors and succeeded. When the value is false, also obtain the number of items in the returned JSON string and compare it with the number of items before the write; if they are equal the submission succeeded, and the next read from the ring buffer queue proceeds. If the submission had errors or the returned write count is inconsistent, rewrite the data.

Step E2: write the data that cannot be submitted out to disk; after rewriting a specified number of times, preferably 3, if the data still cannot be submitted, persist it to disk, and then proceed to the next read from the ring buffer queue.

Rewriting guards against a submission failure caused by a transient data-read error, but if the data still fails after 3 rewrites it is persisted to disk.
Further, step F specifically comprises: periodically detect whether data persisted because of write failure exists on the disk; if it does, read all the persisted data, submit it to Elastic Search cluster B, and clear the data content from the file after successful submission.
Further, step G specifically comprises: during operation, an abnormal error sends its error content to the topic specified on the kafka cluster; monitoring then consumes the topic and sends an alarm message to a receiver, who handles the error content in time.

Errors during operation include, for example, the peer cluster being unavailable, network connection timeouts, and data loss in the written cluster. Through the connected monitoring, the alarm message is sent to the receiver for timely handling.
In summary, the adopted technical scheme has the following beneficial effects. By parsing the write-ahead log records of the underlying primary shards, the details of mutual synchronization between internal data are hidden from the outside. First, the difficulty at the application layer is reduced: the application layer no longer needs double writes and no longer needs to guarantee data consistency, so the later management and maintenance problems of double writing disappear. Second, the double-write inconsistency problem is solved: the write operation is guaranteed to be atomic, the data in the two clusters is guaranteed to be completely consistent, and data successfully written in one cluster is bound to appear in the other. Finally, the scheme has good general applicability: it supports mutual real-time synchronization of Elastic Search cluster data across different versions.
Drawings
The invention will now be described, by way of example, with reference to the accompanying drawings, in which:
FIG. 1 is a schematic flow diagram of one embodiment of the present invention;
FIG. 2 is a schematic flow chart of writing persistent data to the cluster of the disk in FIG. 1.
Detailed Description
All of the features disclosed in this specification, or all of the steps in any method or process so disclosed, may be combined in any combination, except combinations of features and/or steps that are mutually exclusive.
The present invention will be described in detail with reference to fig. 1 and 2, wherein "Y" represents "YES" and "N" represents "NO" in the flowchart.
The invention is a dual-active real-time data warehouse construction method based on Elastic Search, involving an Elastic Search cluster A and an Elastic Search cluster B; by parsing the underlying write-ahead log records, data is mutually synchronized between cluster A and cluster B.
The method comprises the following specific steps:
Step A: acquire the primary shard of each index on every node in Elastic Search cluster A, each primary shard being stored together with the IP address of the node where it is located.

Step A1: access Elastic Search cluster A over http to obtain the hash string corresponding to each index on all nodes, and store it in a Map object belonging to the current node, with the English name of the index as the key of the Map object and the hash string of the index as the value.

Step A2: pass the keys of the Map object to all nodes of Elastic Search cluster A in an http batch request to obtain the primary shards of each index; compare whether the IP address of the current node of Elastic Search cluster A equals the IP address of the node where each obtained primary shard is located, and store the matching entries in a primary-shard Map object maintained by the current node.

This Map object maintains all the primary shards on the current node. The IP address is stored to make it easy to determine which primary shards are on this node: the result returned by the Elastic Search cluster contains the shards of all nodes, so the IP address separates this node's primary shards from all the others.
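The shard-to-node matching of step A2 can be sketched as a simple filter. This is a minimal illustration, not the patent's actual implementation: the class name and the map layout (a key such as "indexName/shardNumber" mapped to the hosting node's IP) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of step A2: keep only the primary shards whose
// hosting node IP matches the IP of the node this process runs on.
public class LocalShardFilter {

    /**
     * @param shardToNodeIp map from a shard identifier such as
     *                      "indexName/shardNumber" to the IP of the node
     *                      hosting that primary shard (assumed layout)
     * @param currentNodeIp IP address of the local node
     * @return the subset of primary shards that live on the local node
     */
    public static Map<String, String> localPrimaries(Map<String, String> shardToNodeIp,
                                                     String currentNodeIp) {
        Map<String, String> local = new HashMap<>();
        for (Map.Entry<String, String> e : shardToNodeIp.entrySet()) {
            // Only shards whose node IP equals the current node's IP are kept,
            // mirroring the comparison described in step A2.
            if (e.getValue().equals(currentNodeIp)) {
                local.put(e.getKey(), e.getValue());
            }
        }
        return local;
    }
}
```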
Design of the Map object: the value is a new string formed by splicing the index's hash string with the primary shard number, and the key is the English name of the index. For example:

(TnRucew-ThawtzMZcfMXpw/4, flinkxw123)

where TnRucew-ThawtzMZcfMXpw is the hash string of the index whose English name is flinkxw123, and 4 denotes primary shard number 4.
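The Map-object design above can be sketched in a few lines, assuming, as the text states, that the key is the index name and the value joins the hash string with the shard number using "/". The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the Map-object design described above.
public class ShardKeyBuilder {

    /** Splices the index hash string with the primary shard number. */
    public static String shardValue(String indexHash, int primaryShardNumber) {
        return indexHash + "/" + primaryShardNumber;
    }

    /** Builds one Map entry: index English name -> "hash/shardNumber". */
    public static Map<String, String> entryFor(String indexName, String indexHash,
                                               int primaryShardNumber) {
        Map<String, String> m = new HashMap<>();
        m.put(indexName, shardValue(indexHash, primaryShardNumber));
        return m;
    }
}
```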
Step B: under the data-disk directory on each node, read the write-ahead log records of each primary shard on that node.

Step B1: traverse the data-disk directory of each node to obtain the keys of the Map object, pass each key to the current node to obtain the on-disk path of the index's primary shard, obtain the set of file names ending in .ckp under the primary-shard directory, and sort the obtained file set by update time, with the time closest to the current time first.

Step B2: traverse the obtained write-ahead log file set, read the content of each file with Java nio to obtain its offset (offset), number of written operations (numOps) and file generation (generation), and judge the value of the offset; if the offset of a file is less than or equal to 55, skip the file; if it is greater than 55, find the corresponding translog file according to the file generation, read the number of bytes indicated by the offset, and add or update the offset in a Map object maintained by the current node. The value 55 is chosen because at offset 55 the data content can be read completely and successfully.

In this way the set of file names ending in .ckp is obtained through the path of the primary shard; these are the write-ahead log checkpoint records. Elastic Search splits data files at a threshold of 65M, and the .ckp files record metadata about the split files. The files are sorted by update time with the latest first, so updates are captured in real time and the newest update is obtained each pass. While a data file is below 65M it is read incrementally: after the file is updated, already-read data is skipped via the offset recorded in the Map object, and reading resumes from the updated position.
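The read decision of step B2 can be sketched as follows. Only the threshold value 55 comes from the method itself; the `WalReadTracker` class, its per-shard offset map, and the return convention are hypothetical simplifications.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the step B2 read decision: a checkpoint file is
// skipped while its offset is at or below the threshold; above it, the
// previously recorded offset tells the reader where new bytes begin, and the
// per-shard offset map is updated so already-read data is not read again.
public class WalReadTracker {
    static final int OFFSET_THRESHOLD = 55; // value preferred by the method

    private final Map<String, Long> lastReadOffset = new HashMap<>();

    /** Returns the offset to start reading from, or -1 if the file is skipped. */
    public long nextReadStart(String shardKey, long checkpointOffset) {
        if (checkpointOffset <= OFFSET_THRESHOLD) {
            return -1; // nothing worth reading yet; skip this file
        }
        // Resume from the last recorded position (or the threshold the first
        // time), then remember the new checkpoint offset as our progress.
        long previous = lastReadOffset.getOrDefault(shardKey, (long) OFFSET_THRESHOLD);
        lastReadOffset.put(shardKey, checkpointOffset);
        return previous;
    }
}
```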
Step C: filter and screen the data content of the read file: if the read content contains the 'transclog' keyword, filter it out; if it does not, select it, add the 'transclog' keyword to it, and write it into the ring buffer queue.

Adding the 'transclog' keyword prevents a piece of content from being synchronized back and forth between the two Elastic Search clusters: if the parsed content does not contain the 'transclog' keyword, it is newly written data and must be synchronized; if it does contain the keyword, it has already been parsed and synchronized once and needs no further synchronization.
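The loop-prevention filter of step C can be sketched as below. The patent only states that the 'transclog' keyword is added to the content; how it is encoded into the record is not specified, so the JSON splicing here is an assumption, as are the class and method names.

```java
// Hypothetical sketch of the step C filter: records already carrying the
// "transclog" marker were produced by the synchronizer and are dropped;
// externally written records get the marker added before being queued, so a
// record cannot ping-pong between the two clusters.
public class TranslogFilter {
    static final String MARKER = "transclog";

    /** Returns null if the record should be dropped, otherwise the marked record. */
    public static String markOrDrop(String record) {
        if (record.contains(MARKER)) {
            return null; // already synchronized once; do not echo it back
        }
        // Assumed encoding: splice a marker field before the closing brace of
        // the JSON document.
        int brace = record.lastIndexOf('}');
        if (brace < 0) {
            return record + " " + MARKER; // non-JSON content: append the marker
        }
        return record.substring(0, brace) + ",\"" + MARKER + "\":true}";
    }
}
```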
Step D, reading data in the annular buffer queue in a multithread mode, and writing the read data into an Elastic Search cluster B in a synchronous blocking mode;
step E: judging whether all data are successfully written, if the data are unsuccessfully written, rewriting, and if the data are rewritten more than a specified number of times, writing the data into a disk for persistence;
step E1, checking the written data, obtaining a returned result JSON character string after being submitted to the cluster B, obtaining the value of an alarm field in the JSON character string, if the value is true, the fact that the submission has errors is shown, if the value is false, the fact that the submission has no errors is shown, the submission is successful, on the premise that the value is true, obtaining the number of data volumes in the returned JSON character string, comparing the number with the data volumes before the writing, if the data volumes are equal, the fact that the submission is successful, then reading the data from the annular buffer queue for the next time, and if the submission has errors or the returned writing number is inconsistent, rewriting;
step E2: and outputting the data which is not successfully submitted to the disk, and after 3 times of rewriting, if the data is still not successfully submitted, persistently storing the data to the disk. And then the next reading of data from the ring buffer queue is performed.
The data can be rewritten to prevent the failure of submission caused by the error of reading the data, but the data is persisted to the disk if the data is still failed after being rewritten three times.
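The retry-then-persist flow of steps E1 and E2 can be sketched as follows. The write callback and the in-memory list standing in for the on-disk spill are hypothetical simplifications of the method's actual cluster submission and persistence.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of step E: a batch is retried up to three times after
// the initial attempt; if it still fails, it is handed to a persistence
// stand-in (the real method spills it to disk) instead of being lost.
public class RetryingWriter {
    static final int MAX_RETRIES = 3; // number of rewrites preferred by the method

    /** Returns true if the write eventually succeeded, false if persisted. */
    public static boolean writeWithRetry(String batch,
                                         Predicate<String> writeToClusterB,
                                         List<String> persistedBatches) {
        // Initial attempt plus MAX_RETRIES rewrites.
        for (int attempt = 0; attempt <= MAX_RETRIES; attempt++) {
            if (writeToClusterB.test(batch)) {
                return true;
            }
        }
        persistedBatches.add(batch); // spilled to disk in the real method
        return false;
    }
}
```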
Step F: periodically detect data persisted on disk because of write failure; if such data exists, read the persisted data from disk, write it into Elastic Search cluster B, and clear the successfully written content from disk.

Specifically, periodically detect whether data persisted because of write failure exists on the disk; if it does, lock the file, read all the persisted data, submit it to Elastic Search cluster B, clear the data content from the file after successful submission, and then release the lock, avoiding data corruption caused by multiple threads modifying the file simultaneously.
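The lock-read-clear behaviour described for step F can be sketched with `java.nio` file locks, in keeping with the document's use of Java nio. The class name and the single-file spill layout are assumptions; the real method also resubmits the drained data to cluster B, which is left to the caller here.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch of step F: the spill file is locked exclusively, its
// contents are read for resubmission, and the file is truncated while the
// lock is still held, so concurrent threads cannot corrupt it.
public class SpillDrainer {

    /** Reads and clears the spill file under an exclusive lock. */
    public static String drain(Path spillFile) throws IOException {
        try (FileChannel ch = FileChannel.open(spillFile,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            try (FileLock lock = ch.lock()) { // exclusive lock over the whole file
                ByteBuffer buf = ByteBuffer.allocate((int) ch.size());
                ch.read(buf);
                buf.flip();
                String data = StandardCharsets.UTF_8.decode(buf).toString();
                ch.truncate(0); // clear after reading; caller resubmits `data`
                return data;
            }
        }
    }
}
```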
Step G: during operation, errors send their error content to the topic specified on the kafka cluster; such errors include, for example, the peer cluster being unavailable, network connection timeouts, and data loss in the written cluster. Through the connected monitoring, the alarm message is sent to the receiver for timely handling.

Kafka is a high-throughput distributed publish-subscribe messaging system that can handle all the action-stream data of consumers in a web site. The message sent to the kafka cluster is a JSON string whose format is defined as follows:

| JSON key | Meaning |
|---|---|
| indexName | Index name |
| nodeIP | Node IP |
| error | Cause of error |
| time | Time of occurrence |
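An alarm message following the table above might be assembled as in the sketch below. Only the four key names come from the document; the class name and the use of `String.format` instead of a JSON library are hypothetical choices to keep the sketch dependency-free.

```java
// Hypothetical sketch of the alarm-message format: a JSON string with the
// four documented keys (indexName, nodeIP, error, time).
public class AlarmMessage {
    public static String toJson(String indexName, String nodeIp,
                                String error, String time) {
        // Note: assumes the inputs contain no characters needing JSON escaping;
        // a real implementation would use a JSON library.
        return String.format(
            "{\"indexName\":\"%s\",\"nodeIP\":\"%s\",\"error\":\"%s\",\"time\":\"%s\"}",
            indexName, nodeIp, error, time);
    }
}
```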
The method provides a way to build a dual-active cluster by achieving real-time synchronization through the Elastic Search bottom layer; it not only removes the later maintenance and management burden of double writes, but also fully guarantees the timely consistency of the data of the two clusters.

The dual-active operation of Elastic Search works by parsing the underlying write-ahead log (WAL). The method must therefore distinguish which records were written externally and which were produced by parsing the write-ahead log, so as to avoid a "dead loop" of data synchronization. A marking scheme is used for this distinction: if a record in the write-ahead log does not contain the marker, such as the keyword mentioned in step C, it is considered externally written data and is added to the queue; otherwise it is skipped.

By parsing the underlying write-ahead log (WAL), the details of mutual synchronization between internal data are hidden from the outside. First, the difficulty at the application layer is reduced: the application layer no longer needs double writes and no longer needs to guarantee data consistency, so there are no later management and maintenance problems from double writing. Second, the double-write inconsistency problem is solved: the write operation is guaranteed to be atomic, the data in the two clusters is completely consistent, and data successfully written in one cluster must appear in the other. Finally, the scheme has good general applicability: it supports building dual-active Elastic Search clusters across different versions, and is also suitable for mutual real-time synchronization among 3 or more Elastic Search clusters.
The above embodiments only express specific implementations of the present application, and their description is relatively specific and detailed, but this should not be construed as limiting the scope of the present application. It should be noted that those skilled in the art can make several changes and modifications without departing from the technical idea of the present application, all of which fall within the protection scope of the present application.
Claims (5)
1. An Elastic Search based dual-active real-time data warehouse construction method, involving an Elastic Search cluster A and an Elastic Search cluster B, characterized by comprising the following steps:

Step A: acquiring the primary shard of each index on every node in Elastic Search cluster A, each primary shard being stored together with the IP address of the node where it is located;

Step A1: accessing Elastic Search cluster A over http to obtain the hash string corresponding to each index on all nodes, and storing it in a Map object belonging to the current node, with the English name of the index as the key of the Map object and the hash string of the index as the value;

Step A2: passing the keys of the Map object to all nodes of Elastic Search cluster A in an http batch request to obtain the primary shards of each index, comparing whether the IP address of the current node of Elastic Search cluster A equals the IP address of the node where each obtained primary shard is located, and storing the matching entries in a primary-shard Map object maintained by the current node;

Step B: under the data-disk directory on each node, reading the write-ahead log records of each primary shard on that node;

Step B1: traversing the keys of the Map object under the data-disk directory of each node, passing each key to the current node to obtain the on-disk path of the index's primary shard, obtaining the set of write-ahead log checkpoint files under the primary-shard directory, and sorting the obtained files by update time, with the time closest to the current time first;

Step B2: traversing the obtained write-ahead log file set, reading the content of each file with Java nio to obtain its offset, number of written operations and file generation, and judging the value of the offset; if the offset of a file is less than or equal to a specified value, skipping the file; if it is greater than the specified value, finding the corresponding translog file according to the file generation, reading the number of bytes indicated by the offset, and adding or updating the offset in a Map object maintained by the current node;

Step C: filtering the write-ahead log records that were read, and writing the records that meet the requirements into a ring buffer queue;

Step D: reading the data in the ring buffer queue with multiple threads, and writing the read data into Elastic Search cluster B in a synchronous blocking mode;

Step E: judging whether all data were written successfully; if a write fails, rewriting it, and if it has been rewritten more than a specified number of times, persisting the data to disk;

Step F: periodically detecting data persisted on disk because of write failure; if such data exists, reading the persisted data from disk, writing it into Elastic Search cluster B, and clearing the successfully written content from disk;

Step G: sending runtime error messages to the kafka cluster and connecting monitoring for real-time alarms.
2. The Elastic Search based dual-active real-time data warehouse construction method according to claim 1, characterized in that step C specifically comprises:

filtering and screening the data content of the read file: if the read content contains the 'transclog' keyword, filtering it out; if it does not, selecting it, adding the 'transclog' keyword to it, and writing it into the ring buffer queue.
3. The Elastic Search based double-live real-time data warehouse construction method according to claim 1, characterized in that: the step E specifically comprises the following steps:
step E1, checking the written data, obtaining a returned result JSON character string after being submitted to the cluster B, obtaining the value of an alarm field in the JSON character string, if the value is true, the fact that the submission has errors is shown, if the value is false, the fact that the submission has no errors is shown, the submission is successful, on the premise that the value is true, obtaining the number of data volumes in the returned JSON character string, comparing the number with the data volumes before the writing, if the data volumes are equal, the fact that the submission is successful, then reading the data from the annular buffer queue for the next time, and if the submission has errors or the returned writing number is inconsistent, rewriting;
step E2: outputting the unsuccessfully submitted data to disk: after rewriting the specified number of times, if the data still cannot be submitted, it is persisted to disk, and the next batch is then read from the ring buffer queue.
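The verification in step E1 can be sketched against the shape of the Elastic Search bulk API response, which returns an `errors` boolean and one entry per document under `items`. Reading the claim's "alarm field" as that `errors` flag is an interpretation, not something the claim states.

```python
import json


def verify_bulk_response(response_json, submitted_count):
    """Step E1 sketch: decide whether a bulk submission to cluster B
    succeeded. Success requires both that the 'errors' flag is false
    and that the returned item count matches the submitted count;
    otherwise the caller must rewrite the batch (step E)."""
    result = json.loads(response_json)
    if result.get("errors", True):
        return False  # at least one document failed to index
    # Compare returned item count with the count written, per the claim.
    return len(result.get("items", [])) == submitted_count
```

A `False` return feeds the retry path in step E; a `True` return lets the writer advance to the next batch from the ring buffer queue.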
4. The Elastic Search based double-live real-time data warehouse construction method according to claim 1, characterized in that: the step F is specifically as follows:
periodically checking the disk for persisted write-failed data; if it exists, reading all the persisted data, submitting it to Elastic Search cluster B, and clearing the data content from the file after the submission succeeds.
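The timed recovery in step F might look like the sketch below, reusing the assumed one-JSON-document-per-line persistence format. The `submit` callable again stands in for the bulk write to cluster B, and a scheduler (cron, a timer thread) is assumed to call this function on an interval.

```python
import json
import os


def recover_persisted(persist_path, submit):
    """Step F sketch: called on a timer. Re-submits any persisted
    write-failed data to Elastic Search cluster B via `submit`, and
    truncates the file only after the submission succeeds, so a crash
    mid-recovery never loses data (it may re-submit it instead)."""
    if not os.path.exists(persist_path) or os.path.getsize(persist_path) == 0:
        return 0  # nothing to recover
    with open(persist_path, "r", encoding="utf-8") as f:
        docs = [json.loads(line) for line in f if line.strip()]
    if docs and submit(docs):
        open(persist_path, "w").close()  # clear the recovered content
        return len(docs)
    return 0
```

Note the clear-after-success ordering: the trade-off is possible duplicate submission rather than data loss, which suits an idempotent index write keyed by document id.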
5. The Elastic Search based double-live real-time data warehouse construction method according to claim 1, characterized in that: the step G specifically comprises the following steps: during operation, when an abnormal error occurs, its error content is sent to the designated destination on the kafka cluster; the monitoring system consumes these messages and sends an alarm message to the receiver, who processes the error content in time.
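Step G's error reporting can be sketched as below. The alarm message schema and the topic name are assumptions; `producer` is any object exposing a `send(topic, value)` method, such as a kafka-python `KafkaProducer`, so the sketch stays independent of a running kafka cluster.

```python
import json
import time


def send_error_alarm(producer, topic, error):
    """Step G sketch: push a run-time error onto the designated kafka
    topic so the monitoring consumer can alert the receiver.

    `producer` only needs a `send(topic, value)` method (duck-typed to
    match kafka-python's KafkaProducer); the message fields below are
    an assumed schema, not one given in the claim."""
    message = {
        "level": "ERROR",
        "timestamp": int(time.time() * 1000),  # epoch millis
        "detail": str(error),
    }
    producer.send(topic, json.dumps(message).encode("utf-8"))
    return message
```

Decoupling producers from the alarm consumer through kafka is what lets the claim's monitoring system be attached without touching the write path.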
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011224108.0A CN112100160B (en) | 2020-11-05 | 2020-11-05 | Elastic Search based double-activity real-time data warehouse construction method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011224108.0A CN112100160B (en) | 2020-11-05 | 2020-11-05 | Elastic Search based double-activity real-time data warehouse construction method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112100160A CN112100160A (en) | 2020-12-18 |
CN112100160B true CN112100160B (en) | 2021-09-07 |
Family
ID=73784581
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011224108.0A Active CN112100160B (en) | 2020-11-05 | 2020-11-05 | Elastic Search based double-activity real-time data warehouse construction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112100160B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112988690B (en) * | 2021-03-16 | 2023-02-17 | 挂号网(杭州)科技有限公司 | Dictionary file synchronization method, device, server and storage medium |
CN114579668A (en) * | 2022-05-06 | 2022-06-03 | 中建电子商务有限责任公司 | Database data synchronization method |
CN114579596B (en) * | 2022-05-06 | 2022-09-06 | 达而观数据(成都)有限公司 | Method and system for updating index data of search engine in real time |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2005055519A1 (en) * | 2003-12-01 | 2005-06-16 | International Business Machines Corporation | Node clustering based on user interests, application requirements and network characteristics |
CN102779185A (en) * | 2012-06-29 | 2012-11-14 | 浙江大学 | High-availability distribution type full-text index method |
CN103294731A (en) * | 2012-03-05 | 2013-09-11 | 阿里巴巴集团控股有限公司 | Real-time index creating and real-time searching method and device |
CN103793290A (en) * | 2012-10-31 | 2014-05-14 | 腾讯科技(深圳)有限公司 | Disaster tolerant system and data reading method thereof |
CN104239417A (en) * | 2014-08-19 | 2014-12-24 | 天津南大通用数据技术股份有限公司 | Dynamic adjustment method and dynamic adjustment device after data fragmentation in distributed database |
CN105095762A (en) * | 2015-07-31 | 2015-11-25 | 中国人民解放军信息工程大学 | Global offset table protection method based on address randomness and segment isolation |
CN109408289A (en) * | 2018-10-16 | 2019-03-01 | 国网山东省电力公司信息通信公司 | A kind of cloud disaster tolerance data processing method |
CN110825816A (en) * | 2020-01-09 | 2020-02-21 | 四川新网银行股份有限公司 | System and method for data acquisition of partitioned database |
CN111752962A (en) * | 2020-07-01 | 2020-10-09 | 浪潮云信息技术股份公司 | System and method for ensuring high availability and consistency of MHA cluster |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10656866B2 (en) * | 2014-12-31 | 2020-05-19 | Pure Storage, Inc. | Unidirectional vault synchronization to support tiering |
US10560544B2 (en) * | 2015-08-25 | 2020-02-11 | Box, Inc. | Data caching in a collaborative file sharing system |
CN107959695B (en) * | 2016-10-14 | 2021-01-29 | 北京国双科技有限公司 | Data transmission method and device |
US10394670B2 (en) * | 2017-06-02 | 2019-08-27 | Verizon Patent And Licensing Inc. | High availability and disaster recovery system architecture |
CN108418859B (en) * | 2018-01-24 | 2020-11-06 | 华为技术有限公司 | Method and device for writing data |
CN113849328B (en) * | 2018-05-22 | 2024-04-12 | 华为技术有限公司 | Management method and device of disaster recovery system |
CN111352766A (en) * | 2018-12-21 | 2020-06-30 | 中国移动通信集团山东有限公司 | Database double-activity implementation method and device |
CN110990366B (en) * | 2019-12-04 | 2024-02-23 | 中国农业银行股份有限公司 | Index allocation method and device for improving performance of ES-based log system |
CN111026621B (en) * | 2019-12-23 | 2023-04-07 | 杭州安恒信息技术股份有限公司 | Monitoring alarm method, device, equipment and medium for Elasticissearch cluster |
- 2020-11-05 CN CN202011224108.0A patent/CN112100160B/en active Active
Non-Patent Citations (3)
Title |
---|
How ElasticSearch guarantees data consistency and real-time performance; SHAN某人; 《https://www.jianshu.com/61dd9fb7d785》; 20180821; 1-6 * |
The Network Architecture Design of Distributed Dual Live Data Center; Nan Shuping et al.; 《2019 IEEE International Conference on Power, Intelligent》; 20191226; 638-642 * |
Research on synchronization strategies and efficiency analysis in disaster recovery backup systems; Yang Peng; 《China Master's Theses Full-text Database, Information Science and Technology》; 20091015 (No. 10); I138-41 * |
Also Published As
Publication number | Publication date |
---|---|
CN112100160A (en) | 2020-12-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112100160B (en) | Elastic Search based double-activity real-time data warehouse construction method | |
US11657053B2 (en) | Temporal optimization of data operations using distributed search and server management | |
US11496545B2 (en) | Temporal optimization of data operations using distributed search and server management | |
CN111723160B (en) | Multi-source heterogeneous incremental data synchronization method and system | |
US11314714B2 (en) | Table partitioning within distributed database systems | |
US10891297B2 (en) | Method and system for implementing collection-wise processing in a log analytics system | |
US7702640B1 (en) | Stratified unbalanced trees for indexing of data items within a computer system | |
US11314717B1 (en) | Scalable architecture for propagating updates to replicated data | |
AU2022200375A1 (en) | Temporal optimization of data operations using distributed search and server management | |
EP3791284A1 (en) | Conflict resolution for multi-master distributed databases | |
US8396840B1 (en) | System and method for targeted consistency improvement in a distributed storage system | |
US20120197958A1 (en) | Parallel Serialization of Request Processing | |
CN112527783B (en) | Hadoop-based data quality exploration system | |
US10936559B1 (en) | Strongly-consistent secondary index for a distributed data set | |
US20210224684A1 (en) | Parallel Model Deployment for Artificial Intelligence Using a Primary Storage System | |
US8468134B1 (en) | System and method for measuring consistency within a distributed storage system | |
US11397750B1 (en) | Automated conflict resolution and synchronization of objects | |
CN102737127A (en) | Massive data storage method | |
US10133767B1 (en) | Materialization strategies in journal-based databases | |
CN113111129A (en) | Data synchronization method, device, equipment and storage medium | |
CN113360456B (en) | Data archiving method, device, equipment and storage medium | |
CN111639114A (en) | Distributed data fusion management system based on Internet of things platform | |
US11386078B2 (en) | Distributed trust data storage system | |
US20200250188A1 (en) | Systems, methods and data structures for efficient indexing and retrieval of temporal data, including temporal data representing a computing infrastructure | |
US11663192B2 (en) | Identifying and resolving differences between datastores |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||