CN109840251B - Big data aggregation query method - Google Patents
- Publication number: CN109840251B
- Application number: CN201811635184.3A
- Authority: CN (China)
- Prior art keywords: data, cluster, Logstash, service, Elasticsearch
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The application provides a big data aggregation query method comprising the following steps: adding an identification field to each source data table to be synchronized; deploying a Logstash cluster as the data synchronization middleware; deploying an Elasticsearch cluster as the data storage system; setting a data synchronization configuration file in the Logstash cluster; setting Elasticsearch indexes in the Elasticsearch cluster to store service information and service statistics; and starting the Logstash cluster to synchronize data. The method solves the problem of cross-service and cross-database associated queries in a micro-service architecture and improves the query efficiency of software on an education cloud platform.
Description
Technical Field
The application relates to the technical field of data query, and in particular to a big data aggregation query method.
Background
With the advance of computer and network technology, and of virtualization technology in particular, new concepts and schemes continue to emerge. The rapid development of Docker technology especially has laid a foundation for online education cloud platforms.
In the prior art, education cloud platforms generally adopt a micro-service architecture. The micro-service architecture solves some problems of the traditional layered architecture. Its core characteristics are high scalability and ease of development, testing and deployment, because its service components are decoupled, distributed and independent of one another.
However, when an education cloud platform adopts the micro-service architecture, the source data are distributed across the databases of the individual services. Queries over that data therefore run into cross-service and cross-database association problems that are difficult to solve, and query efficiency is low.
Disclosure of Invention
In view of the above, the application provides a big data aggregation query method that solves the problem of cross-service and cross-database associated queries in a micro-service architecture and improves the query efficiency of software on an education cloud platform.
The technical scheme of the application is implemented as follows:
A big data aggregation query method comprises the following steps:
adding an identification field to each source data table to be synchronized;
deploying a Logstash cluster as the data synchronization middleware;
deploying an Elasticsearch cluster as the data storage system;
setting a data synchronization configuration file in the Logstash cluster;
setting Elasticsearch indexes in the Elasticsearch cluster to store service information and service statistics;
and starting the Logstash cluster to synchronize data.
Preferably, the source data table is a user table in a base database or a service table in a service database.
Preferably, the identification field is a timestamp.
Preferably, starting the Logstash cluster to synchronize data includes:
when data in the source data table is changed, changing the value of the identification field in the source data table;
the Logstash cluster polls the corresponding source data table at a preset time interval;
when the Logstash cluster detects that the value of the identification field in the source data table has changed, it reads the changed data from the source data table and processes the changed data;
the Logstash cluster outputs the processed data to the Elasticsearch cluster.
Preferably, the Logstash cluster comprises a plurality of Logstash instances.
Preferably, when the Logstash cluster includes a first Logstash and a second Logstash, deploying the Logstash cluster as the data synchronization middleware includes the following steps:
registering the first Logstash and the second Logstash in ZooKeeper;
the first Logstash and the second Logstash preempt the synchronization lock, perform data synchronization, and save their respective synchronization states in ZooKeeper;
when ZooKeeper detects that one of the Logstash instances is abnormal, the execution right is transferred to the other Logstash;
the Logstash that obtains the execution right obtains the data synchronization state from ZooKeeper and continues to execute the task.
Preferably, the data in the user table in the base database includes: user number, name and gender;
the data in the service table in the service database comprises: user number and service data.
Preferably, changing the value of the identification field in the source data table comprises:
changing the value of the timestamp to the time at which the data is changed.
As can be seen from the above, the big data aggregation query method of the application adds an identification field to the source data table, uses a Logstash cluster as the data synchronization middleware, uses an Elasticsearch cluster as the data storage system, and sets Elasticsearch indexes in the Elasticsearch cluster. This introduces a trigger mechanism for data updates and keeps a redundant, synchronized copy of the service data in the Elasticsearch cluster. A service query therefore only needs to run an Elasticsearch aggregation query against the Elasticsearch cluster and need not consider which database the source data resides in, which solves the problem of cross-service and cross-database associated queries in the micro-service architecture.
In addition, because Elasticsearch supports fast query, paging and sorting, the problem that fuzzy query, paging and sorting cannot be performed in a micro-service architecture is solved, which greatly improves the query efficiency and accuracy of software on the education cloud platform.
In addition, because the Logstash cluster is deployed as the data synchronization middleware and the data synchronization configuration file is set in the Logstash cluster, calculation, statistics and analysis can be performed by the synchronization mechanism during data synchronization, and the results are stored in an Elasticsearch index, which solves the problem of analysis and statistics in the micro-service architecture.
In addition, the Elasticsearch indexes can be adjusted according to actual service requirements so that different data are stored redundantly to meet those requirements, which solves the problem of service extensibility in the micro-service architecture.
In addition, a plurality of Logstash instances can be arranged in the Logstash cluster, which effectively avoids the situation in which data synchronization cannot proceed because of a single point of failure in the Logstash cluster.
Drawings
Fig. 1 is a flowchart of a big data aggregation query method in an embodiment of the present application.
Fig. 2 is a schematic deployment diagram of a big data aggregation query method in an embodiment of the present application.
Detailed Description
In order to make the technical scheme and advantages of the present application more apparent, the present application will be described in further detail with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a flowchart of a big data aggregation query method according to an embodiment of the present application, and fig. 2 is a deployment schematic diagram of the big data aggregation query method according to an embodiment of the present application. As shown in fig. 1 and fig. 2, the big data aggregation query method in the embodiment of the present application includes the following steps:
Step 11, adding an identification field to each source data table that needs to be synchronized.
In this step, an identification field may be added to each source data table to be synchronized as an identification for checking whether the data is changed.
For example, in a preferred embodiment of the present application, the source data table may be a user table in the base database, or may be a service table in the service database.
In addition, in a specific embodiment of the present application, the identification field may be a timestamp, or may be other identification fields that may be used as an identifier for checking whether the data is changed.
Step 12, deploying a Logstash cluster as the data synchronization middleware.
In this step, a Logstash cluster may be set up in advance and used as the data synchronization middleware. Logstash is an open-source data collection engine.
Step 13, deploying an Elasticsearch cluster as the data storage system.
In this step, an Elasticsearch cluster may be set up in advance and used as the data storage system. Elasticsearch is a highly scalable open-source search engine.
In addition, in the technical scheme of the present application, the step 12 and the step 13 may be performed simultaneously or sequentially. For example, step 12 may be performed first, step 13 may be performed first, or step 12 and step 13 may be performed simultaneously.
Step 14, setting a data synchronization configuration file in the Logstash cluster.
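The patent does not reproduce the configuration file. As a rough illustration only, the Python sketch below renders one plausible pipeline, assuming the Logstash JDBC input plugin and Elasticsearch output plugin are used. The MySQL driver class, host names, user name, and the column name `updated_at` are all hypothetical placeholders and do not come from the source. A real deployment would also need settings omitted here, such as the driver library path and credentials.

```python
def render_pipeline(table: str, index: str, id_column: str,
                    tracking_column: str = "updated_at") -> str:
    """Render a Logstash pipeline that polls `table` by its timestamp column.

    All hosts, users, and column names are illustrative assumptions.
    """
    return f"""\
input {{
  jdbc {{
    jdbc_connection_string => "jdbc:mysql://db.example.internal:3306/base"
    jdbc_user => "sync_reader"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    # Poll once per minute (cron-like syntax): the preset time interval.
    schedule => "* * * * *"
    # Only rows whose identification field is newer than the last run.
    statement => "SELECT * FROM {table} WHERE {tracking_column} > :sql_last_value ORDER BY {tracking_column}"
    use_column_value => true
    tracking_column => "{tracking_column}"
    tracking_column_type => "timestamp"
  }}
}}
output {{
  elasticsearch {{
    hosts => ["http://es.example.internal:9200"]
    index => "{index}"
    # Reuse the primary key so a changed row overwrites its old document.
    document_id => "%{{{id_column}}}"
  }}
}}
"""


if __name__ == "__main__":
    print(render_pipeline("users", "service_info", "user_no"))
```

Using the source table's key as the document ID is a design choice in this sketch: it makes repeated synchronization of the same row idempotent rather than creating duplicate documents.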
Step 15, setting Elasticsearch indexes in the Elasticsearch cluster to store service information and service statistics.
In this step, Elasticsearch indexes may be set in the Elasticsearch cluster to store service information and service statistics, forming a service information index and a service statistics index (the latter may be referred to simply as the statistics index), respectively.
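The source does not give the index definitions or any query. The sketch below shows, as plain Python dictionaries, what an index mapping and an aggregation request body could look like, using field names taken from the user and service tables described in this document. The mapping layout assumes the typeless form used by recent Elasticsearch versions, and the field types and the names `SERVICE_INFO_INDEX` and `GENDER_COUNT_QUERY` are illustrative assumptions. The helper function is not an Elasticsearch call: it only reproduces locally what a terms aggregation counts.

```python
from collections import Counter

# Hypothetical mapping for a "service information" index.
SERVICE_INFO_INDEX = {
    "mappings": {
        "properties": {
            "user_no": {"type": "keyword"},
            "name": {"type": "text"},
            "gender": {"type": "keyword"},
            "service_data": {"type": "text"},
            "updated_at": {"type": "date"},
        }
    }
}

# Request body for a terms aggregation: number of documents per gender.
# "size": 0 asks for the aggregation result only, without matching documents.
GENDER_COUNT_QUERY = {
    "size": 0,
    "aggs": {"by_gender": {"terms": {"field": "gender"}}},
}


def local_terms_aggregation(docs, field):
    """Count documents per distinct value of `field`, like a terms aggregation."""
    return dict(Counter(doc[field] for doc in docs if field in doc))


if __name__ == "__main__":
    docs = [{"gender": "F"}, {"gender": "M"}, {"gender": "F"}]
    print(local_terms_aggregation(docs, "gender"))
```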
Step 16, starting the Logstash cluster to synchronize data.
After all the settings have been completed through steps 11 to 15, the Logstash cluster can be started in this step to synchronize data.
In the technical solution of the present application, the above-mentioned step 16 may be implemented in various ways. The following describes in detail the technical solution of the present application by taking one implementation manner as an example.
For example, in a preferred embodiment of the present application, the step 16 may include the following steps:
Step 161, when data in the source data table is changed, changing the value of the identification field in the source data table.
For example, when data (e.g., user number, name, gender) in the user table in the base database is changed and the identification field is a timestamp, the value of the timestamp in the user table is changed to the time at which the data was changed.
For another example, when data (e.g., user number, service data) in a service table in the service database is changed and the identification field is a timestamp, the value of the timestamp in the service table is changed to the time at which the data was changed.
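Step 161 can be sketched in Python with the standard-library `sqlite3` module. This is an illustration under assumptions, not the patent's implementation: the source does not name a database engine, and the table name `users`, the column name `updated_at`, and both function names are hypothetical. The point shown is only that every write also sets the identification field to the change time.

```python
import sqlite3


def open_user_table() -> sqlite3.Connection:
    """Create an in-memory user table that carries an identification field."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users ("
        " user_no INTEGER PRIMARY KEY,"
        " name TEXT,"
        " gender TEXT,"
        " updated_at REAL NOT NULL)"  # identification field (timestamp)
    )
    return conn


def save_user(conn, user_no, name, gender, now):
    """Insert or overwrite a user row and stamp it with the change time `now`."""
    conn.execute(
        "INSERT OR REPLACE INTO users (user_no, name, gender, updated_at) "
        "VALUES (?, ?, ?, ?)",
        (user_no, name, gender, now),
    )
    conn.commit()


if __name__ == "__main__":
    import time

    conn = open_user_table()
    save_user(conn, 1, "Li", "F", time.time())
    print(conn.execute("SELECT * FROM users").fetchall())
```

Passing the change time in as a parameter, rather than reading the clock inside `save_user`, keeps the sketch deterministic and easy to check.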
Step 162, the Logstash cluster polls the corresponding source data table at a preset time interval.
Step 163, when the Logstash cluster detects that the value of the identification field in the source data table has changed, it reads the changed data from the source data table and processes it.
Step 164, the Logstash cluster outputs the processed data to the Elasticsearch cluster.
Through steps 161 to 164, when data in a source data table is changed, the changed data is output to the Elasticsearch cluster in a timely manner. A service can therefore perform the operations it needs by querying only the Elasticsearch cluster, without regard to which database the source data resides in.
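The poll, detect, read and output cycle of steps 162 to 164 can be simulated in a few lines of Python. The sketch below is a stand-in for illustration only: a plain dictionary plays the role of the Elasticsearch index, an in-memory SQLite table plays the role of the source data table, and the names `poll_once`, `demo`, `users` and `updated_at` are hypothetical. It does not use Logstash or Elasticsearch.

```python
import sqlite3


def poll_once(conn, index, last_seen):
    """Copy rows changed after `last_seen` into `index`; return the new watermark."""
    rows = conn.execute(
        "SELECT user_no, name, gender, updated_at FROM users "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    for user_no, name, gender, updated_at in rows:
        # Processing step: shape the row into a document keyed by user number.
        index[user_no] = {"user_no": user_no, "name": name, "gender": gender}
        last_seen = max(last_seen, updated_at)
    return last_seen


def demo():
    """Run two polling rounds around one update and return (index, watermark)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (user_no INTEGER PRIMARY KEY, name TEXT,"
        " gender TEXT, updated_at REAL)"
    )
    index = {}  # stands in for the Elasticsearch index
    conn.execute("INSERT INTO users VALUES (1, 'Li', 'F', 100.0)")
    watermark = poll_once(conn, index, 0.0)
    # The row changes, and its identification field changes with it.
    conn.execute("UPDATE users SET name = 'Li Na', updated_at = 200.0 WHERE user_no = 1")
    watermark = poll_once(conn, index, watermark)
    return index, watermark


if __name__ == "__main__":
    print(demo())
```

In a real deployment the preset time interval would be a scheduler calling the equivalent of `poll_once` repeatedly; the sketch calls it by hand so the result does not depend on timing.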
In addition, preferably, in a specific embodiment of the present application, the Logstash cluster may include a plurality of Logstash instances.
For example, as shown in fig. 2, in a preferred embodiment of the present application, the Logstash cluster includes two Logstash instances: a first Logstash and a second Logstash.
Arranging a plurality of Logstash instances in the Logstash cluster effectively avoids the situation in which data synchronization cannot proceed because of a single point of failure in the Logstash cluster.
For example, in a preferred embodiment of the present application, when the Logstash cluster includes two Logstash instances, step 12 may include the following steps:
Step 121, registering the first Logstash and the second Logstash in ZooKeeper (an open-source coordination service for distributed applications).
Step 122, the first Logstash and the second Logstash preempt the synchronization lock, perform data synchronization, and save their respective synchronization states in ZooKeeper.
Step 123, when ZooKeeper detects that one of the Logstash instances is abnormal, the execution right is transferred to the other Logstash.
For example, if ZooKeeper detects that the first Logstash is abnormal, the execution right is transferred to the second Logstash; if ZooKeeper detects that the second Logstash is abnormal, the execution right is transferred to the first Logstash.
Step 124, the Logstash that obtains the execution right obtains the data synchronization state from ZooKeeper and continues to execute the task, for example, synchronizing data and saving the master data synchronization state.
Similarly, if the Logstash cluster includes more than two (e.g., 3 or 4) Logstash instances, operations similar to steps 121 to 124 may be performed to avoid the situation in which data synchronization cannot proceed because of a single point of failure in the Logstash cluster; the specific operations are not described again here.
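The failover behaviour of steps 121 to 124 can be illustrated with a small in-memory simulation. The `Coordinator` class below is a hypothetical stand-in for ZooKeeper and uses no ZooKeeper client API; it models only the three things the steps rely on: registration, a single synchronization lock, and a shared synchronization state. The class and method names are assumptions made for this sketch.

```python
class Coordinator:
    """In-memory stand-in for ZooKeeper: one lock plus a shared sync state."""

    def __init__(self):
        self.members = set()
        self.lock_holder = None
        self.sync_state = 0.0  # timestamp of the last synchronized change

    def register(self, name):
        self.members.add(name)

    def try_acquire(self, name):
        """Grant the lock if it is free; report whether `name` holds it."""
        if self.lock_holder is None and name in self.members:
            self.lock_holder = name
        return self.lock_holder == name

    def report_failure(self, name):
        """Drop an abnormal member and release its lock so another can take over."""
        self.members.discard(name)
        if self.lock_holder == name:
            self.lock_holder = None


class SyncWorker:
    """One synchronization instance that only works while it holds the lock."""

    def __init__(self, name, coordinator):
        self.name = name
        self.coordinator = coordinator
        coordinator.register(name)

    def run_once(self, newest_change):
        """Synchronize up to `newest_change` if this worker has the execution right."""
        if not self.coordinator.try_acquire(self.name):
            return False
        # Resume from the state saved by whichever worker ran before.
        if newest_change > self.coordinator.sync_state:
            self.coordinator.sync_state = newest_change
        return True


if __name__ == "__main__":
    coord = Coordinator()
    first = SyncWorker("logstash-1", coord)
    second = SyncWorker("logstash-2", coord)
    print(first.run_once(100.0), second.run_once(150.0))
    coord.report_failure("logstash-1")
    print(second.run_once(150.0), coord.sync_state)
```

Because the synchronization state lives in the coordinator rather than in either worker, the surviving worker continues from the last saved position instead of starting over, which is the property step 124 describes.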
Through the steps 11-16, the big data aggregation query method can be realized.
In summary, the technical scheme of the application adds an identification field to the source data table, uses a Logstash cluster as the data synchronization middleware, uses an Elasticsearch cluster as the data storage system, and sets Elasticsearch indexes in the Elasticsearch cluster. This introduces a trigger mechanism for data updates and keeps a redundant, synchronized copy of the service data in the Elasticsearch cluster. A service query therefore only needs to run an Elasticsearch aggregation query against the Elasticsearch cluster and need not consider which database the source data resides in, which solves the problem of cross-service and cross-database associated queries in the micro-service architecture.
In addition, because Elasticsearch supports fast query, paging and sorting, the problem that fuzzy query, paging and sorting cannot be performed in a micro-service architecture is solved, which greatly improves the query efficiency of software on the education cloud platform.
In addition, because the Logstash cluster is deployed as the data synchronization middleware and the data synchronization configuration file is set in the Logstash cluster, calculation, statistics and analysis can be performed by the synchronization mechanism during data synchronization, and the results are stored in an Elasticsearch index, which solves the problem of analysis and statistics in the micro-service architecture.
In addition, the Elasticsearch indexes can be adjusted according to actual service requirements so that different data are stored redundantly to meet those requirements, which solves the problem of service extensibility in the micro-service architecture.
In addition, a plurality of Logstash instances can be arranged in the Logstash cluster, which effectively avoids the situation in which data synchronization cannot proceed because of a single point of failure in the Logstash cluster.
Therefore, the application provides a query scheme based on Elasticsearch aggregation for the education cloud platform. Applied to the platform, the scheme allows new application scenarios to be created more quickly and meets the need to launch new services on the platform more quickly.
The foregoing describes only preferred embodiments of the application and is not intended to limit it; any modification, equivalent replacement, improvement or the like made within the spirit and principles of the application shall fall within its scope of protection.
Claims (5)
1. A big data aggregation query method, characterized by comprising the following steps:
adding an identification field to each source data table to be synchronized; the source data table is a user table in a base database or a service table in a service database; the identification field is a timestamp;
deploying a Logstash cluster as the data synchronization middleware;
deploying an Elasticsearch cluster as the data storage system;
setting a data synchronization configuration file in the Logstash cluster;
setting Elasticsearch indexes in the Elasticsearch cluster to store service information and service statistics;
when data in the source data table is changed, changing the value of the identification field in the source data table;
the Logstash cluster polls the corresponding source data table at a preset time interval;
when the Logstash cluster detects that the value of the identification field in the source data table has changed, it reads the changed data from the source data table and processes the changed data;
the Logstash cluster outputs the processed data to the Elasticsearch cluster.
2. The method according to claim 1, characterized in that:
the Logstash cluster comprises a plurality of Logstash instances.
3. The method according to claim 2, wherein when the Logstash cluster comprises a first Logstash and a second Logstash, deploying the Logstash cluster as the data synchronization middleware comprises the steps of:
registering the first Logstash and the second Logstash in ZooKeeper;
the first Logstash and the second Logstash preempt the synchronization lock, perform data synchronization, and save their respective synchronization states in ZooKeeper;
when ZooKeeper detects that one of the Logstash instances is abnormal, the execution right is transferred to the other Logstash;
the Logstash that obtains the execution right obtains the data synchronization state from ZooKeeper and continues to execute the task.
4. The method according to claim 1, characterized in that:
the data in the user table in the base database comprises: user number, name and gender;
the data in the service table in the service database comprises: user number and service data.
5. The method of claim 1, wherein changing the value of the identification field in the source data table comprises:
changing the value of the timestamp to the time at which the data is changed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811635184.3A CN109840251B (en) | 2018-12-29 | 2018-12-29 | Big data aggregation query method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811635184.3A CN109840251B (en) | 2018-12-29 | 2018-12-29 | Big data aggregation query method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109840251A (en) | 2019-06-04 |
CN109840251B (en) | 2023-11-03 |
Family
ID=66883502
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811635184.3A Active CN109840251B (en) | 2018-12-29 | 2018-12-29 | Big data aggregation query method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109840251B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110457540B (en) * | 2019-06-28 | 2020-07-14 | 卓尔智联(武汉)研究院有限公司 | Data query method, service platform, terminal device and storage medium |
CN110532272A (en) * | 2019-08-30 | 2019-12-03 | 北京东软望海科技有限公司 | Data query method, apparatus, electronic equipment and computer readable storage medium |
CN111309793A (en) * | 2020-01-15 | 2020-06-19 | 北大方正集团有限公司 | Data processing method, device and equipment |
CN111711639B (en) * | 2020-06-29 | 2023-04-18 | 深圳前海微众银行股份有限公司 | Terminal, data transmission method, system, and computer-readable storage medium |
CN113364864B (en) * | 2021-06-03 | 2022-09-30 | 上海微盟企业发展有限公司 | Server data synchronization method, system and storage medium |
CN114780820B (en) * | 2022-04-28 | 2022-11-01 | 广州高专资讯科技有限公司 | Open source platform-based target matching system and method |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104065741A (en) * | 2014-07-04 | 2014-09-24 | 用友软件股份有限公司 | Data collection system and method |
CN106341454A (en) * | 2016-08-23 | 2017-01-18 | 世纪龙信息网络有限责任公司 | Across-room multiple-active distributed database management system and across-room multiple-active distributed database management method |
CN106682073A (en) * | 2016-11-14 | 2017-05-17 | 上海轻维软件有限公司 | HBase fuzzy retrieval system based on Elastic Search |
CN106682140A (en) * | 2016-12-20 | 2017-05-17 | 华北计算技术研究所(中国电子科技集团公司第十五研究所) | Multi-system user incremental synchronization method based on timestamps and mapping strategies |
CN107688520A (en) * | 2017-07-18 | 2018-02-13 | 北京奥鹏远程教育中心有限公司 | distributed service tracking system and method |
CN107861859A (en) * | 2017-11-22 | 2018-03-30 | 北京汇通金财信息科技有限公司 | A kind of blog management method and system based on micro services framework |
CN108255592A (en) * | 2017-12-19 | 2018-07-06 | 武汉市烽视威科技有限公司 | A kind of Quartz clusters timing task processing system and method |
CN108376181A (en) * | 2018-04-24 | 2018-08-07 | 丹阳飓风物流股份有限公司 | Log services platform based on ELK |
CN108540352A (en) * | 2018-05-02 | 2018-09-14 | 上海妙克信息科技有限公司 | A kind of optimization extended method for on-line education system subscriber channel Auto-matching |
CN109086409A (en) * | 2018-08-02 | 2018-12-25 | 泰康保险集团股份有限公司 | Micro services data processing method, device, electronic equipment and computer-readable medium |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8200846B2 (en) * | 2009-02-10 | 2012-06-12 | International Business Machines Corporation | Timestamp synchronization for queries to database portions in nodes that have independent clocks in a parallel computer system |
US10970280B2 (en) * | 2015-10-07 | 2021-04-06 | International Business Machines Corporation | Query plan based on a data storage relationship |
Also Published As
Publication number | Publication date |
---|---|
CN109840251A (en) | 2019-06-04 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||