CN109840251B - Big data aggregation query method - Google Patents

Big data aggregation query method

Info

Publication number
CN109840251B
Authority
CN
China
Prior art keywords
data
cluster
logstash
service
elasticsearch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811635184.3A
Other languages
Chinese (zh)
Other versions
CN109840251A (en)
Inventor
王雪松
刘铁生
王勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Open Distance Education Center Co ltd
Original Assignee
Beijing Open Distance Education Center Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)

Classifications

    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application provides a big data aggregation query method comprising the following steps: adding an identification field to each source data table to be synchronized; deploying a Logstash cluster as data synchronization middleware; deploying an Elasticsearch cluster as the data storage system; setting a data synchronization configuration file in the Logstash cluster; setting Elasticsearch indexes in the Elasticsearch cluster to store service information and service statistics data; and starting the Logstash cluster to synchronize data. The method solves the problem of cross-service and cross-database associated queries in a micro-service architecture and improves the query efficiency of software on the education cloud platform.

Description

Big data aggregation query method
Technical Field
The application relates to the technical field of data query, in particular to a big data aggregation query method.
Background
At present, with continuing progress in computer and network technology, and in virtualization technology in particular, new concepts and schemes keep emerging. The rapid development of Docker technology especially has laid a foundation for promoting online education cloud platforms.
In the prior art, education cloud platforms generally employ a micro-service architecture. The micro-service architecture solves some of the problems of the traditional layered architecture. Its core characteristics are high scalability and service components that are decoupled, distributed, and independent of one another, and that are therefore easy to develop, test, and deploy independently.
However, when an education cloud platform adopts the micro-service architecture, the source data are distributed across multiple databases. When the source data are queried, cross-service and cross-database associated queries are therefore difficult to implement, and query efficiency is low.
Disclosure of Invention
In view of the above, the application provides a big data aggregation query method that solves the problem of cross-service and cross-database associated queries in a micro-service architecture and improves the query efficiency of software on an education cloud platform.
The technical scheme of the application is specifically realized as follows:
A big data aggregation query method comprises the following steps:
adding an identification field to a source data table to be synchronized;
deploying a Logstash cluster as data synchronization middleware;
deploying an Elasticsearch cluster as a data storage system;
setting a data synchronization configuration file in the Logstash cluster;
setting an Elasticsearch index in the Elasticsearch cluster, and storing service information and service statistics data;
and starting the Logstash cluster to synchronize data.
Preferably, the source data table is a user table in a base database or a service table in a service database.
Preferably, the identification field is a timestamp.
Preferably, starting the Logstash cluster to synchronize data includes:
when the data in the source data table is changed, changing the value of the identification field in the source data table;
the Logstash cluster polls the corresponding source data table at a preset time interval;
when the Logstash cluster detects that the value of the identification field in the source data table has changed, reading the changed data from the source data table and processing the changed data;
the Logstash cluster outputs the processed data to the Elasticsearch cluster.
Preferably, the Logstash cluster comprises a plurality of Logstash instances.
Preferably, when the Logstash cluster includes a first Logstash and a second Logstash, deploying the Logstash cluster as data synchronization middleware includes the following steps:
registering the first Logstash and the second Logstash in ZooKeeper;
the first Logstash and the second Logstash preempt the synchronization lock, perform data synchronization, and save their respective synchronization states in ZooKeeper;
when ZooKeeper detects that one Logstash is abnormal, the execution right is transferred to the other Logstash;
the Logstash that obtains the execution right obtains the data synchronization state from ZooKeeper and continues to execute the task.
Preferably, the data in the user table in the base database includes: user number, name and gender;
the data in the service table in the service database comprises: user number and service data.
Preferably, changing the value of the identification field in the source data table is:
changing the value of the timestamp to the time at which the data is changed.
As can be seen from the above, in the big data aggregation query method of the present application, an identification field is added to the source data table, a Logstash cluster is used as the data synchronization middleware, an Elasticsearch cluster is used as the data storage system, and Elasticsearch indexes are set in the Elasticsearch cluster. A trigger mechanism for data updates is thereby introduced, and service data is synchronized redundantly into the Elasticsearch cluster. A service query then only needs to perform an Elasticsearch aggregation query in the Elasticsearch cluster, without considering which database the source data is distributed in, which solves the problem of cross-service and cross-database associated queries in the micro-service architecture.
In addition, because Elasticsearch supports fast querying, paging, and sorting, the problem that fuzzy querying, paging, and sorting cannot be performed in a micro-service architecture is solved, and the query efficiency and accuracy of software on the education cloud platform are greatly improved.
In addition, in the technical scheme of the application, the Logstash cluster is deployed as the data synchronization middleware and the data synchronization configuration file is set in the Logstash cluster, so that calculation, statistics, and analysis can be performed during data synchronization and the corresponding results are stored in an Elasticsearch index, which solves the problem of analysis and statistics in the micro-service architecture.
In addition, in the technical scheme of the application, the Elasticsearch indexes can be adjusted according to actual service requirements, and different data can be stored redundantly to meet those requirements, which solves the problem of service extensibility in the micro-service architecture.
In addition, a plurality of Logstash instances can be arranged in the Logstash cluster, which effectively avoids the problem that data synchronization cannot be performed because of a single point of failure in the Logstash cluster during data synchronization.
Drawings
Fig. 1 is a flowchart of a big data aggregation query method in an embodiment of the present application.
Fig. 2 is a schematic deployment diagram of a big data aggregation query method in an embodiment of the present application.
Detailed Description
In order to make the technical scheme and advantages of the present application more apparent, the present application will be described in further detail with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a flowchart of a big data aggregation query method according to an embodiment of the present application, and Fig. 2 is a deployment schematic diagram of the big data aggregation query method according to an embodiment of the present application. As shown in Fig. 1 and Fig. 2, the big data aggregation query method in the embodiment of the present application includes the following steps:
Step 11, adding an identification field to each source data table that needs to be synchronized.
In this step, an identification field may be added to each source data table to be synchronized, as an identifier for checking whether the data has changed.
For example, in a preferred embodiment of the present application, the source data table may be a user table in the base database, or may be a service table in the service database.
In addition, in a specific embodiment of the present application, the identification field may be a timestamp, or may be other identification fields that may be used as an identifier for checking whether the data is changed.
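As a minimal sketch of step 11, the following Python snippet adds a timestamp identification field to two source tables using the standard-library sqlite3 module. The table names (user_table, service_table), the column name (updated_at), and the use of SQLite are assumptions made purely for illustration; the application does not specify a database product or any naming.

```python
import sqlite3

# Hypothetical source tables; the columns follow the examples given in the text
# (user number, name, gender; user number, service data).
SCHEMA = """
CREATE TABLE user_table (user_no INTEGER PRIMARY KEY, name TEXT, gender TEXT);
CREATE TABLE service_table (id INTEGER PRIMARY KEY, user_no INTEGER, service_data TEXT);
"""

def add_identification_field(conn, table, column="updated_at"):
    """Add a timestamp column to `table` unless it already exists."""
    existing = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    if column not in existing:
        # Stored as seconds since the epoch; 0 means "never changed".
        conn.execute(
            f"ALTER TABLE {table} ADD COLUMN {column} REAL NOT NULL DEFAULT 0"
        )
    # Return the column names after the change so the caller can verify them.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
user_columns = add_identification_field(conn, "user_table")
service_columns = add_identification_field(conn, "service_table")
print(user_columns)
print(service_columns)
```

Calling the helper a second time on the same table leaves the schema unchanged, so it can be run safely against tables that already carry the field.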
Step 12, deploying the Logstash cluster as the data synchronization middleware.
In this step, a Logstash cluster may be set up in advance and used as the data synchronization middleware. Logstash is an open source data collection engine.
Step 13, deploying an Elasticsearch cluster as the data storage system.
In this step, an Elasticsearch cluster may be set up in advance and used as the data storage system. Elasticsearch is a highly scalable open source search engine.
In addition, in the technical scheme of the present application, step 12 and step 13 may be performed simultaneously or sequentially. For example, step 12 may be performed first, step 13 may be performed first, or the two steps may be performed simultaneously.
Step 14, setting a data synchronization configuration file in the Logstash cluster.
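The application does not reproduce the configuration file itself. One plausible form, assuming that the Logstash JDBC input plugin and the Elasticsearch output plugin are used (the text does not name them), is a pipeline that polls on a schedule and tracks the timestamp column. The Python helper below only renders such a file as text; the connection string, user, driver class, host, table, column, and index names are all hypothetical.

```python
def render_pipeline(table, tracking_column, index, schedule="* * * * *"):
    """Return Logstash pipeline text for one source table (illustrative only)."""
    return f"""input {{
  jdbc {{
    jdbc_connection_string => "jdbc:mysql://db.example.internal:3306/base"
    jdbc_user => "sync"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    schedule => "{schedule}"
    use_column_value => true
    tracking_column => "{tracking_column}"
    tracking_column_type => "timestamp"
    statement => "SELECT * FROM {table} WHERE {tracking_column} > :sql_last_value ORDER BY {tracking_column}"
  }}
}}
output {{
  elasticsearch {{
    hosts => ["http://es.example.internal:9200"]
    index => "{index}"
    document_id => "%{{user_no}}"
  }}
}}
"""

# Render the pipeline for a hypothetical user table polled once per minute.
config_text = render_pipeline("user_table", "updated_at", "service_info")
print(config_text)
```

In this sketch the cron-style schedule plays the role of the preset polling interval, and `:sql_last_value` holds the last timestamp that was synchronized, so each run selects only rows changed since the previous run.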
Step 15, setting Elasticsearch indexes in the Elasticsearch cluster, and storing service information and service statistics data.
In this step, Elasticsearch indexes may be set in the Elasticsearch cluster to store service information and service statistics data, forming a service information index and a service statistics index (which may be simply referred to as a statistics index), respectively.
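The two indexes could be defined along the following lines. This is a sketch under stated assumptions: the index bodies use the typeless mapping shape accepted by the Elasticsearch create-index API from version 7 onward (earlier versions required a mapping type), and every field name, including record_count and updated_at, is hypothetical. The helper shows how a user row and a service row might be merged into one redundant document.

```python
# Hypothetical body for the service information index.
SERVICE_INFO_INDEX = {
    "mappings": {
        "properties": {
            "user_no": {"type": "keyword"},
            "name": {"type": "text"},
            "gender": {"type": "keyword"},
            "service_data": {"type": "text"},
            "updated_at": {"type": "date"},
        }
    }
}

# Hypothetical body for the service statistics index.
SERVICE_STATS_INDEX = {
    "mappings": {
        "properties": {
            "user_no": {"type": "keyword"},
            "record_count": {"type": "integer"},
            "updated_at": {"type": "date"},
        }
    }
}

def to_info_document(user_row, service_row):
    """Merge one user row and one service row into a denormalized document.

    Fields that are not declared in the service information mapping are dropped.
    """
    merged = {**user_row, **service_row}
    allowed = SERVICE_INFO_INDEX["mappings"]["properties"]
    return {key: value for key, value in merged.items() if key in allowed}

user_row = {"user_no": "1001", "name": "Li", "gender": "F"}
service_row = {"user_no": "1001", "service_data": "course A", "internal_id": 7}
info_doc = to_info_document(user_row, service_row)
print(info_doc)
```

Because the merged document already carries both user and service fields, a later query does not need to join across the base database and the service database.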
Step 16, starting the Logstash cluster to synchronize data.
After all settings have been completed through steps 11 to 15, the Logstash cluster can be started in this step to synchronize data.
In the technical solution of the present application, step 16 may be implemented in various ways. The technical solution is described in detail below, taking one implementation as an example.
For example, in a preferred embodiment of the present application, step 16 may include the following steps:
Step 161, when the data in the source data table is changed, changing the value of the identification field in the source data table.
For example, when data (e.g., user number, name, or gender) in a user table in the base database is changed and the identification field is a timestamp, the value of the timestamp in the user table is changed to the time at which the data was changed.
For another example, when data (e.g., user number or service data) in a service table in the service database is changed and the identification field is a timestamp, the value of the timestamp in the service table is changed to the time at which the data was changed.
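Step 161 can be sketched as a write path that updates the business columns and the timestamp in the same statement, so that a changed row can never carry a stale identification value. The table layout, the helper name change_user_name, and the use of SQLite are assumptions for illustration only.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_table ("
    "user_no INTEGER PRIMARY KEY, name TEXT, gender TEXT, "
    "updated_at REAL NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO user_table (user_no, name, gender) VALUES (1001, 'Li', 'F')")

def change_user_name(conn, user_no, new_name, now=None):
    """Change a row and stamp it with the time of the change in one statement."""
    changed_at = time.time() if now is None else now
    conn.execute(
        "UPDATE user_table SET name = ?, updated_at = ? WHERE user_no = ?",
        (new_name, changed_at, user_no),
    )
    conn.commit()
    return changed_at

# A fixed time is passed here only to keep the example deterministic.
stamp = change_user_name(conn, 1001, "Li Na", now=1546041600.0)
row = conn.execute(
    "SELECT name, updated_at FROM user_table WHERE user_no = 1001"
).fetchone()
print(row)
```

A database trigger or an automatically updated timestamp column would be an alternative way to obtain the same effect without touching application code; the application does not say which approach is used.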
Step 162, the Logstash cluster polls the corresponding source data table at a preset time interval.
Step 163, when the Logstash cluster detects that the value of the identification field in the source data table has changed, it reads the changed data from the source data table and processes it.
Step 164, the Logstash cluster outputs the processed data to the Elasticsearch cluster.
Through steps 161 to 164, when data in the source data table is changed, the changed data is output to the Elasticsearch cluster in a timely manner. The service can thus perform the required operations by querying only the Elasticsearch cluster, without regard to which database the source data is distributed in.
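Steps 162 to 164 amount to incremental polling against a watermark. The sketch below is a simplified Python model of that loop, not Logstash itself: poll_once is a hypothetical function, a dictionary stands in for the Elasticsearch index, and the "processing" is a trivial whitespace cleanup.

```python
import sqlite3

def poll_once(conn, last_seen, sink):
    """Read rows changed since `last_seen`, process them, and write them to `sink`.

    Returns the new watermark (the largest timestamp that was synchronized).
    """
    rows = conn.execute(
        "SELECT user_no, name, gender, updated_at FROM user_table "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    for user_no, name, gender, updated_at in rows:
        # Processing step: normalize the name before output.
        doc = {"user_no": user_no, "name": name.strip(),
               "gender": gender, "updated_at": updated_at}
        sink[user_no] = doc  # upsert keyed by user number
        last_seen = max(last_seen, updated_at)
    return last_seen

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_table "
    "(user_no INTEGER PRIMARY KEY, name TEXT, gender TEXT, updated_at REAL)"
)
conn.executemany(
    "INSERT INTO user_table VALUES (?, ?, ?, ?)",
    [(1001, "Li ", "F", 10.0), (1002, "Wang", "M", 20.0)],
)

index = {}  # stand-in for an Elasticsearch index
watermark = poll_once(conn, 0.0, index)        # first poll reads both rows
conn.execute(
    "UPDATE user_table SET name = 'Wang Wei', updated_at = 30.0 WHERE user_no = 1002"
)
watermark = poll_once(conn, watermark, index)  # second poll reads only the changed row
print(watermark, index[1002]["name"])
```

In a deployment the function would be invoked once per preset interval. Note that a strict "greater than" comparison can skip a row whose timestamp equals the current watermark, so the timestamp resolution and the comparison operator need to be chosen together.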
In addition, in a specific embodiment of the present application, the Logstash cluster preferably includes a plurality of Logstash instances.
For example, as shown in Fig. 2, in a preferred embodiment of the present application, the Logstash cluster includes two Logstash instances: a first Logstash and a second Logstash.
Setting a plurality of Logstash instances in the Logstash cluster effectively avoids the problem that data synchronization cannot be performed because of a single point of failure in the Logstash cluster during data synchronization.
For example, in a preferred embodiment of the present application, when the Logstash cluster includes two Logstash instances, step 12 may include the following steps:
Step 121, the first Logstash and the second Logstash are registered in ZooKeeper (an open source coordination service for distributed applications).
Step 122, the first Logstash and the second Logstash preempt the synchronization lock, perform data synchronization, and save their respective synchronization states in ZooKeeper.
Step 123, when ZooKeeper detects that one Logstash is abnormal, the execution right is transferred to the other Logstash.
For example, if ZooKeeper detects that the first Logstash is abnormal, the execution right is transferred to the second Logstash; if ZooKeeper detects that the second Logstash is abnormal, the execution right is transferred to the first Logstash.
Step 124, the Logstash that obtains the execution right obtains the data synchronization state from ZooKeeper and continues to execute the task, for example synchronizing data and saving the master data synchronization state.
Similarly, if the Logstash cluster includes more Logstash instances (e.g., 3 or 4), operations similar to steps 121 to 124 may be performed to avoid the problem that data synchronization cannot be performed because of a single point of failure in the Logstash cluster; the specific operations are not described again here.
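The failover logic of steps 121 to 124 can be modeled with a small in-memory coordinator. This is only an illustration of the control flow: the Coordinator class and its method names are hypothetical and do not correspond to the ZooKeeper client API, and the sketch keeps a single shared synchronization state rather than one state per instance.

```python
class Coordinator:
    """In-memory stand-in for the ZooKeeper role described in steps 121 to 124."""

    def __init__(self):
        self.members = []        # registered Logstash instances, in registration order
        self.lock_holder = None  # instance that currently holds the execution right
        self.sync_state = {}     # last saved synchronization state (e.g. a watermark)

    def register(self, name):
        self.members.append(name)

    def try_acquire(self, name):
        """Grant the synchronization lock to the first registered instance that asks."""
        if self.lock_holder is None and name in self.members:
            self.lock_holder = name
        return self.lock_holder == name

    def save_state(self, name, state):
        if self.lock_holder != name:
            raise PermissionError(f"{name} does not hold the synchronization lock")
        self.sync_state = dict(state)

    def report_abnormal(self, name):
        """Remove a failed instance and hand the execution right to a survivor."""
        self.members.remove(name)
        if self.lock_holder == name:
            self.lock_holder = self.members[0] if self.members else None
        return self.lock_holder

zk = Coordinator()
zk.register("logstash-1")                         # step 121
zk.register("logstash-2")
first_won = zk.try_acquire("logstash-1")          # step 122: only one instance wins
second_won = zk.try_acquire("logstash-2")
zk.save_state("logstash-1", {"watermark": 30.0})
new_holder = zk.report_abnormal("logstash-1")     # step 123: execution right moves
resumed_from = zk.sync_state["watermark"]         # step 124: survivor resumes from saved state
print(new_holder, resumed_from)
```

With a real ZooKeeper ensemble, failure detection of this kind is commonly built on ephemeral nodes that disappear when an instance's session ends; whether the application uses that mechanism is not stated.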
Through the steps 11-16, the big data aggregation query method can be realized.
In summary, in the technical scheme of the application, an identification field is added to the source data table, a Logstash cluster is used as the data synchronization middleware, an Elasticsearch cluster is used as the data storage system, and Elasticsearch indexes are set in the Elasticsearch cluster. A trigger mechanism for data updates is thereby introduced, and service data is synchronized redundantly into the Elasticsearch cluster. A service query then only needs to perform an Elasticsearch aggregation query in the Elasticsearch cluster, without considering which database the source data is distributed in, which solves the problem of cross-service and cross-database associated queries in the micro-service architecture.
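To illustrate what such an aggregation query might look like, the sketch below shows a terms aggregation request body in Elasticsearch query DSL, expressed as a Python dictionary, together with a plain-Python function that computes the same per-value counts over a few sample documents. The field names and the sample documents are invented for the example, and the local function is only an analogy for what the cluster would compute.

```python
from collections import defaultdict

# Request body in Elasticsearch query DSL: count documents per gender.
# The field "gender" is assumed to be mapped as a keyword.
AGG_REQUEST = {
    "size": 0,
    "aggs": {"by_gender": {"terms": {"field": "gender"}}},
}

def local_terms_aggregation(docs, field):
    """Compute per-value document counts in plain Python, for illustration."""
    counts = defaultdict(int)
    for doc in docs:
        counts[doc[field]] += 1
    return dict(counts)

# Denormalized documents as they might exist after synchronization (sample data).
docs = [
    {"user_no": 1001, "gender": "F", "service_data": "course A"},
    {"user_no": 1002, "gender": "M", "service_data": "course B"},
    {"user_no": 1003, "gender": "F", "service_data": "course A"},
]
field = AGG_REQUEST["aggs"]["by_gender"]["terms"]["field"]
buckets = local_terms_aggregation(docs, field)
print(buckets)
```

Because user fields and service fields sit in the same document, the grouping runs against one index instead of joining a base database with a service database.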
In addition, because Elasticsearch supports fast querying, paging, and sorting, the problem that fuzzy querying, paging, and sorting cannot be performed in a micro-service architecture is solved, and the query efficiency of software on the education cloud platform is greatly improved.
In addition, in the technical scheme of the application, the Logstash cluster is deployed as the data synchronization middleware and the data synchronization configuration file is set in the Logstash cluster, so that calculation, statistics, and analysis can be performed during data synchronization and the corresponding results are stored in an Elasticsearch index, which solves the problem of analysis and statistics in the micro-service architecture.
In addition, in the technical scheme of the application, the Elasticsearch indexes can be adjusted according to actual service requirements, and different data can be stored redundantly to meet those requirements, which solves the problem of service extensibility in the micro-service architecture.
In addition, a plurality of Logstash instances can be arranged in the Logstash cluster, which effectively avoids the problem that data synchronization cannot be performed because of a single point of failure in the Logstash cluster during data synchronization.
Therefore, the application provides a query scheme that uses Elasticsearch aggregation on the education cloud platform. Applying this scheme to the education cloud platform allows new application scenarios to be created more quickly and the demand for new services on the platform to be met more quickly.
The foregoing describes only preferred embodiments of the application and is not intended to limit the application; any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the application shall fall within the scope of protection of the application.

Claims (5)

1. A big data aggregation query method, characterized by comprising the following steps:
adding an identification field to a source data table to be synchronized; the source data table is a user table in a base database or a service table in a service database; the identification field is a timestamp;
deploying a Logstash cluster as data synchronization middleware;
deploying an Elasticsearch cluster as a data storage system;
setting a data synchronization configuration file in the Logstash cluster;
setting an Elasticsearch index in the Elasticsearch cluster, and storing service information and service statistics data;
when the data in the source data table is changed, changing the value of the identification field in the source data table;
the Logstash cluster polls the corresponding source data table at a preset time interval;
when the Logstash cluster detects that the value of the identification field in the source data table has changed, reading the changed data from the source data table and processing the changed data;
the Logstash cluster outputs the processed data to the Elasticsearch cluster.
2. The method according to claim 1, characterized in that:
the Logstash cluster comprises a plurality of Logstash instances.
3. The method according to claim 2, wherein when the Logstash cluster comprises a first Logstash and a second Logstash, deploying the Logstash cluster as data synchronization middleware comprises the steps of:
registering the first Logstash and the second Logstash in ZooKeeper;
the first Logstash and the second Logstash preempt the synchronization lock, perform data synchronization, and save their respective synchronization states in ZooKeeper;
when ZooKeeper detects that one Logstash is abnormal, the execution right is transferred to the other Logstash;
the Logstash that obtains the execution right obtains the data synchronization state from ZooKeeper and continues to execute the task.
4. The method according to claim 1, characterized in that:
the data in the user table in the base database comprises: user number, name and gender;
the data in the service table in the service database comprises: user number and service data.
5. The method of claim 1, wherein changing the value of the identification field in the source data table is:
changing the value of the timestamp to the time at which the data is changed.
CN201811635184.3A 2018-12-29 2018-12-29 Big data aggregation query method Active CN109840251B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811635184.3A CN109840251B (en) 2018-12-29 2018-12-29 Big data aggregation query method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811635184.3A CN109840251B (en) 2018-12-29 2018-12-29 Big data aggregation query method

Publications (2)

Publication Number Publication Date
CN109840251A (en) 2019-06-04
CN109840251B (en) 2023-11-03

Family

ID=66883502

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811635184.3A Active CN109840251B (en) 2018-12-29 2018-12-29 Big data aggregation query method

Country Status (1)

Country Link
CN (1) CN109840251B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110457540B (en) * 2019-06-28 2020-07-14 卓尔智联(武汉)研究院有限公司 Data query method, service platform, terminal device and storage medium
CN110532272A (en) * 2019-08-30 2019-12-03 北京东软望海科技有限公司 Data query method, apparatus, electronic equipment and computer readable storage medium
CN111309793A (en) * 2020-01-15 2020-06-19 北大方正集团有限公司 Data processing method, device and equipment
CN111711639B (en) * 2020-06-29 2023-04-18 深圳前海微众银行股份有限公司 Terminal, data transmission method, system, and computer-readable storage medium
CN113364864B (en) * 2021-06-03 2022-09-30 上海微盟企业发展有限公司 Server data synchronization method, system and storage medium
CN114780820B (en) * 2022-04-28 2022-11-01 广州高专资讯科技有限公司 Open source platform-based target matching system and method

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104065741A (en) * 2014-07-04 2014-09-24 用友软件股份有限公司 Data collection system and method
CN106341454A (en) * 2016-08-23 2017-01-18 世纪龙信息网络有限责任公司 Across-room multiple-active distributed database management system and across-room multiple-active distributed database management method
CN106682073A (en) * 2016-11-14 2017-05-17 上海轻维软件有限公司 HBase fuzzy retrieval system based on Elastic Search
CN106682140A (en) * 2016-12-20 2017-05-17 华北计算技术研究所(中国电子科技集团公司第十五研究所) Multi-system user incremental synchronization method based on timestamps and mapping strategies
CN107688520A (en) * 2017-07-18 2018-02-13 北京奥鹏远程教育中心有限公司 distributed service tracking system and method
CN107861859A (en) * 2017-11-22 2018-03-30 北京汇通金财信息科技有限公司 A kind of blog management method and system based on micro services framework
CN108255592A (en) * 2017-12-19 2018-07-06 武汉市烽视威科技有限公司 A kind of Quartz clusters timing task processing system and method
CN108376181A (en) * 2018-04-24 2018-08-07 丹阳飓风物流股份有限公司 Log services platform based on ELK
CN108540352A (en) * 2018-05-02 2018-09-14 上海妙克信息科技有限公司 A kind of optimization extended method for on-line education system subscriber channel Auto-matching
CN109086409A (en) * 2018-08-02 2018-12-25 泰康保险集团股份有限公司 Micro services data processing method, device, electronic equipment and computer-readable medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8200846B2 (en) * 2009-02-10 2012-06-12 International Business Machines Corporation Timestamp synchronization for queries to database portions in nodes that have independent clocks in a parallel computer system
US10970280B2 (en) * 2015-10-07 2021-04-06 International Business Machines Corporation Query plan based on a data storage relationship

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant