CN109840251B

CN109840251B - Big data aggregation query method

Info

Publication number: CN109840251B
Application number: CN201811635184.3A
Authority: CN
Inventors: 王雪松; 刘铁生; 王勇
Original assignee: Beijing Open Distance Education Center Co ltd
Current assignee: Beijing Open Distance Education Center Co ltd
Priority date: 2018-12-29
Filing date: 2018-12-29
Publication date: 2023-11-03
Anticipated expiration: 2038-12-29
Also published as: CN109840251A

Abstract

The application provides a big data aggregation query method, which comprises the following steps: adding an identification field to each source data table to be synchronized; deploying a Logstash cluster as data synchronization middleware; deploying an Elasticsearch cluster as a data storage system; setting a data synchronization configuration file in the Logstash cluster; setting Elasticsearch indexes in the Elasticsearch cluster to store service information and service statistics data; and starting the Logstash cluster to synchronize data. The method solves the problem of cross-service and cross-database associated queries in a micro-service architecture and improves the query efficiency of software on an education cloud platform.

Description

Big data aggregation query method

Technical Field

The application relates to the technical field of data query, in particular to a big data aggregation query method.

Background

At present, with continued advances in computer and network technology, and in virtualization technology in particular, new concepts and schemes keep emerging. The rapid development of Docker technology, especially, has laid a foundation for the spread of online education cloud platforms.

In the prior art, education cloud platforms generally adopt a micro-service architecture. The micro-service architecture solves some of the problems of the traditional layered architecture. Its core characteristics are high scalability and ease of development, testing and deployment, with service components that are decoupled, distributed and independent of one another.

However, when a micro-service architecture is adopted on an education cloud platform, the source data are distributed across multiple databases. Queries that must associate data across services and across databases are therefore difficult to implement, and query efficiency is low.

Disclosure of Invention

In view of the above, the application provides a big data aggregation query method, so that the problem of cross-service and cross-database association query in a micro-service architecture can be solved, and the query efficiency of software on an education cloud platform is improved.

The technical scheme of the application is realized specifically as follows:

A big data aggregation query method comprises the following steps:

adding an identification field to each source data table to be synchronized;

deploying a Logstash cluster as data synchronization middleware;

deploying an Elasticsearch cluster as a data storage system;

setting a data synchronization configuration file in the Logstash cluster;

setting Elasticsearch indexes in the Elasticsearch cluster to store service information and service statistics data;

and starting the Logstash cluster to synchronize data.

Preferably, the source data table is a user table in a base database or a service table in a service database.

Preferably, the identification field is a timestamp.

Preferably, starting the Logstash cluster to synchronize data includes:

when data in the source data table is changed, changing the value of the identification field in the source data table;

the Logstash cluster polls the corresponding source data table at a preset time interval;

when the Logstash cluster detects that the value of the identification field in the source data table has changed, it reads the changed data from the source data table and processes the changed data;

the Logstash cluster outputs the processed data to the Elasticsearch cluster.

Preferably, the Logstash cluster comprises a plurality of Logstash instances.

Preferably, when the Logstash cluster includes a first Logstash and a second Logstash, deploying the Logstash cluster as data synchronization middleware includes the following steps:

registering the first Logstash and the second Logstash in ZooKeeper;

the first Logstash and the second Logstash compete for the synchronization lock, perform data synchronization, and store their respective synchronization states in ZooKeeper;

when ZooKeeper detects that one of the Logstash instances is abnormal, the execution right is transferred to the other Logstash;

the Logstash that obtains the execution right obtains the data synchronization state from ZooKeeper and continues to execute the task.

Preferably, the data in the user table in the base database includes: user number, name and gender;

the data in the service table in the service database comprises: user number and service data.

Preferably, changing the value of the identification field in the source data table comprises:

changing the value of the timestamp to the time at which the data is changed.

As can be seen from the above, in the big data aggregation query method of the present application, an identification field is added to the source data table, a Logstash cluster serves as data synchronization middleware, an Elasticsearch cluster serves as the data storage system, and Elasticsearch indexes are set in the Elasticsearch cluster. This introduces a trigger mechanism for data updates and keeps a redundant, synchronized copy of the service data in the Elasticsearch cluster. A service query therefore only needs to run an Elasticsearch aggregation query against the Elasticsearch cluster, without considering which database the source data resides in, which solves the problem of cross-service and cross-database associated queries in the micro-service architecture.

In addition, because Elasticsearch supports fast querying, paging and sorting, the problem that fuzzy queries, paging and sorting cannot be performed in a micro-service architecture is solved, which greatly improves the query efficiency and accuracy of software on the education cloud platform.

In addition, in the technical scheme of the application, because the Logstash cluster is deployed as data synchronization middleware and the data synchronization configuration file is set in the Logstash cluster, calculation, statistics and analysis can be performed during data synchronization, and the results are stored in Elasticsearch indexes, which solves the problem of analysis and statistics in a micro-service architecture.

In addition, in the technical scheme of the application, the Elasticsearch indexes can be adjusted according to actual service requirements, and different data can be stored redundantly to meet those requirements, which solves the problem of service extensibility in a micro-service architecture.

In addition, a plurality of Logstash instances can be arranged in the Logstash cluster, which effectively avoids the situation in which data synchronization cannot be performed because of a single point of failure in the Logstash cluster.

Drawings

Fig. 1 is a flowchart of a big data aggregation query method in an embodiment of the present application.

Fig. 2 is a schematic deployment diagram of a big data aggregation query method in an embodiment of the present application.

Detailed Description

In order to make the technical scheme and advantages of the present application more apparent, the present application will be described in further detail with reference to the accompanying drawings and specific embodiments.

Fig. 1 is a flowchart of a big data aggregation query method according to an embodiment of the present application, and fig. 2 is a deployment schematic diagram of the big data aggregation query method according to an embodiment of the present application. As shown in fig. 1 and fig. 2, the big data aggregation query method in the embodiment of the present application includes the following steps:

Step 11, adding an identification field to each source data table that needs to be synchronized.

In this step, an identification field may be added to each source data table to be synchronized, as a marker for checking whether the data has changed.

For example, in a preferred embodiment of the present application, the source data table may be a user table in the base database, or may be a service table in the service database.

In addition, in a specific embodiment of the present application, the identification field may be a timestamp, or any other field that can serve as a marker for checking whether the data has changed.
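As an illustration of step 11, the following minimal sketch adds a timestamp column to an existing table. It uses an in-memory SQLite database purely as a stand-in for the source database; the table name `users`, the column name `updated_at` and the helper `add_identification_field` are hypothetical and do not come from the application.

```python
import sqlite3


def add_identification_field(conn, table, now):
    """Add an 'updated_at' timestamp column and backfill existing rows.

    The table name is interpolated into the SQL, so only trusted names
    should be passed in. 'updated_at' is an assumed column name.
    """
    conn.execute(f"ALTER TABLE {table} ADD COLUMN updated_at TEXT")
    conn.execute(f"UPDATE {table} SET updated_at = ?", (now,))
    conn.commit()


# Stand-in for a user table in the base database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT, gender TEXT)"
)
conn.execute("INSERT INTO users VALUES (1, 'Zhang San', 'M')")

add_identification_field(conn, "users", "2018-12-29 00:00:00")

# Column names after the change; index 1 of each PRAGMA row is the name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
stamp = conn.execute("SELECT updated_at FROM users").fetchone()[0]
```

In a production database the column type and default would depend on the database engine in use.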

Step 12, deploying a Logstash cluster as data synchronization middleware.

In this step, a Logstash cluster may be set up in advance and used as the data synchronization middleware. Logstash is an open source data collection engine.

Step 13, deploying an Elasticsearch cluster as a data storage system.

In this step, an Elasticsearch cluster may be set up in advance and used as the data storage system. Elasticsearch is a highly scalable open source search engine.

In addition, in the technical scheme of the present application, step 12 and step 13 may be performed in either order or simultaneously. For example, step 12 may be performed before step 13, step 13 may be performed before step 12, or the two steps may be performed at the same time.

Step 14, setting a data synchronization configuration file in the Logstash cluster.
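The application does not disclose the contents of the configuration file. As a hedged sketch only, the helper below renders one plausible pipeline, assuming the Logstash JDBC input plugin and Elasticsearch output plugin are used. The function `render_pipeline`, the connection values, the table and the column names are all hypothetical placeholders; the password and driver library settings are omitted for brevity.

```python
def render_pipeline(jdbc_url, jdbc_user, driver_class, table,
                    tracking_column, id_field, es_hosts, index,
                    schedule="* * * * *"):
    """Return the text of a Logstash pipeline that polls one table.

    The statement selects only rows whose tracking column is newer than
    the value Logstash remembered from the previous run.
    """
    hosts = ", ".join(f'"{h}"' for h in es_hosts)
    return f"""input {{
  jdbc {{
    jdbc_connection_string => "{jdbc_url}"
    jdbc_user => "{jdbc_user}"
    jdbc_driver_class => "{driver_class}"
    schedule => "{schedule}"
    statement => "SELECT * FROM {table} WHERE {tracking_column} > :sql_last_value ORDER BY {tracking_column}"
    use_column_value => true
    tracking_column => "{tracking_column}"
    tracking_column_type => "timestamp"
  }}
}}
output {{
  elasticsearch {{
    hosts => [{hosts}]
    index => "{index}"
    document_id => "%{{{id_field}}}"
  }}
}}
"""


# All values below are illustrative assumptions.
config = render_pipeline(
    jdbc_url="jdbc:mysql://db.example.internal:3306/base",
    jdbc_user="sync",
    driver_class="com.mysql.cj.jdbc.Driver",
    table="users",
    tracking_column="updated_at",
    id_field="user_id",
    es_hosts=["http://es.example.internal:9200"],
    index="service_info",
)
```

Using the source row's key as `document_id` means a changed row overwrites its earlier document rather than creating a duplicate.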

Step 15, setting Elasticsearch indexes in the Elasticsearch cluster and storing service information and service statistics data.

In this step, Elasticsearch indexes may be set in the Elasticsearch cluster to store the service information and the service statistics data, forming a service information index and a service statistics index (referred to simply as the statistics index), respectively.
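The two indexes described in step 15 might be defined along the following lines. This is a sketch under stated assumptions: the index names, the field names and the `record_count` statistic are hypothetical, and the typeless mapping layout assumes Elasticsearch 7 or later. The function only builds the request bodies and does not contact a cluster.

```python
def build_index_bodies():
    """Build index-creation bodies for the two indexes (assumed fields)."""
    date_field = {"type": "date", "format": "yyyy-MM-dd HH:mm:ss"}
    service_info = {
        "mappings": {
            "properties": {
                "user_id": {"type": "keyword"},
                "name": {"type": "text"},
                "gender": {"type": "keyword"},
                "service_data": {"type": "text"},
                "updated_at": date_field,
            }
        }
    }
    service_statistics = {
        "mappings": {
            "properties": {
                "user_id": {"type": "keyword"},
                # Hypothetical statistic computed during synchronization.
                "record_count": {"type": "long"},
                "updated_at": date_field,
            }
        }
    }
    return {
        "service_info": service_info,
        "service_statistics": service_statistics,
    }


bodies = build_index_bodies()
```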

Step 16, starting the Logstash cluster to synchronize data.

After all the settings in steps 11 to 15 are completed, the Logstash cluster can be started in this step to synchronize data.

In the technical solution of the present application, the above-mentioned step 16 may be implemented in various ways. The following describes in detail the technical solution of the present application by taking one implementation manner as an example.

For example, in a preferred embodiment of the present application, the step 16 may include the following steps:

Step 161, when the data in the source data table is changed, changing the value of the identification field in the source data table.

For example, when data (e.g., user number, name, gender, etc.) in a user table in the base database is changed, and the identification field is a timestamp, the timestamp value in the user table is set to the time at which the data was changed.

For another example, when data (e.g., user number, service data, etc.) in a service table in the service database is changed, and the identification field is a timestamp, the timestamp value in the service table is set to the time at which the data was changed.
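Step 161 can be sketched as an update that writes the new data and the new timestamp in the same statement, so the two cannot drift apart. SQLite again stands in for the source database, and the table, column and function names are hypothetical.

```python
import sqlite3


def change_user_name(conn, user_id, new_name, changed_at):
    """Update a row and stamp it with the time of the change."""
    conn.execute(
        "UPDATE users SET name = ?, updated_at = ? WHERE user_id = ?",
        (new_name, changed_at, user_id),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "user_id INTEGER PRIMARY KEY, name TEXT, gender TEXT, updated_at TEXT)"
)
conn.execute(
    "INSERT INTO users VALUES (1, 'Zhang San', 'M', '2018-12-29 00:00:00')"
)

change_user_name(conn, 1, "Li Si", "2018-12-30 08:30:00")
row = conn.execute(
    "SELECT name, updated_at FROM users WHERE user_id = 1"
).fetchone()
```

A database trigger or an automatically updating column could serve the same purpose; the application does not say which mechanism is used.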

Step 162, the Logstash cluster polls the corresponding source data table at a preset time interval.

Step 163, when the Logstash cluster detects that the value of the identification field in the source data table has changed, it reads the changed data from the source data table and processes the changed data.

Step 164, the Logstash cluster outputs the processed data to the Elasticsearch cluster.

Through steps 161 to 164, when data in a source data table is changed, the changed data is output to the Elasticsearch cluster in a timely manner. A service can thus perform the operations it needs simply by querying the Elasticsearch cluster, without regard to which database the source data resides in.
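The polling cycle of steps 162 to 164 can be simulated in a few lines. This is not Logstash itself: a plain function plays the role of one polling pass, SQLite stands in for the source database, and a dictionary stands in for the Elasticsearch index. All names are hypothetical.

```python
import sqlite3


def poll_once(conn, last_seen, index):
    """One polling pass: read rows changed since last_seen and output them.

    Returns the new checkpoint, i.e. the latest timestamp that was read.
    """
    rows = conn.execute(
        "SELECT user_id, name, gender, updated_at FROM users "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    for user_id, name, gender, updated_at in rows:
        # "Processing" here is only reshaping the row into a document.
        index[user_id] = {
            "user_id": user_id,
            "name": name,
            "gender": gender,
            "updated_at": updated_at,
        }
        last_seen = updated_at
    return last_seen


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "user_id INTEGER PRIMARY KEY, name TEXT, gender TEXT, updated_at TEXT)"
)
conn.execute(
    "INSERT INTO users VALUES (1, 'Zhang San', 'M', '2018-12-29 00:00:00')"
)

index = {}
checkpoint = poll_once(conn, "1970-01-01 00:00:00", index)

# The source row changes, and its timestamp changes with it (step 161).
conn.execute(
    "UPDATE users SET name = 'Li Si', updated_at = '2018-12-30 08:30:00' "
    "WHERE user_id = 1"
)
checkpoint = poll_once(conn, checkpoint, index)

# A pass with no changes reads nothing and leaves the checkpoint alone.
unchanged = poll_once(conn, checkpoint, index)
```

The timestamps compare correctly as strings only because they share one fixed, zero-padded format.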

In addition, preferably, in a specific embodiment of the present application, the Logstash cluster may include a plurality of Logstash instances.

For example, as shown in fig. 2, in a preferred embodiment of the present application, the Logstash cluster includes two Logstash instances: a first Logstash and a second Logstash.

Setting a plurality of Logstash instances in the Logstash cluster effectively avoids the situation in which data synchronization cannot be performed because of a single point of failure in the Logstash cluster.

For example, in a preferred embodiment of the present application, when the Logstash cluster includes two Logstash instances, step 12 may include the following steps:

Step 121, the first Logstash and the second Logstash are registered in ZooKeeper (an open source coordination service for distributed applications).

Step 122, the first Logstash and the second Logstash compete for the synchronization lock, perform data synchronization, and store their respective synchronization states in ZooKeeper.

Step 123, when ZooKeeper detects that one of the Logstash instances is abnormal, the execution right is transferred to the other Logstash.

For example, if ZooKeeper detects that the first Logstash is abnormal, the execution right is transferred to the second Logstash; or, if ZooKeeper detects that the second Logstash is abnormal, the execution right is transferred to the first Logstash.

Step 124, the Logstash that obtains the execution right obtains the data synchronization state from ZooKeeper and continues to execute the task, for example, synchronizing data and saving the master data synchronization state.

Similarly, if the Logstash cluster includes more than two (e.g., 3 or 4) Logstash instances, operations similar to steps 121 to 124 may be performed to avoid the situation in which data synchronization cannot be performed because of a single point of failure in the Logstash cluster; the specific operations are not described again here.
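The failover behaviour of steps 121 to 124 can be illustrated with a small in-memory simulation. The `Coordinator` class below is only a stand-in for ZooKeeper and does not use any ZooKeeper API; the class names, method names and instance names are hypothetical.

```python
class Coordinator:
    """In-memory stand-in for ZooKeeper: registrations, a lock, sync state."""

    def __init__(self):
        self.members = []
        self.lock_holder = None
        self.sync_state = None

    def register(self, name):
        self.members.append(name)

    def acquire(self, name):
        # The first caller takes the lock; others fail while it is held.
        if self.lock_holder is None:
            self.lock_holder = name
        return self.lock_holder == name

    def report_failure(self, name):
        # Drop the failed member and hand the execution right to the
        # next registered member, if there is one.
        self.members.remove(name)
        if self.lock_holder == name:
            self.lock_holder = self.members[0] if self.members else None


class Worker:
    """Stand-in for one Logstash instance."""

    def __init__(self, name, coordinator):
        self.name = name
        self.coordinator = coordinator
        coordinator.register(name)

    def sync(self, new_checkpoint):
        """Synchronize only while holding the lock, then save the state."""
        if not self.coordinator.acquire(self.name):
            return False
        self.coordinator.sync_state = new_checkpoint
        return True

    def resume_point(self):
        return self.coordinator.sync_state


zk = Coordinator()
first = Worker("logstash-1", zk)
second = Worker("logstash-2", zk)

first_ok = first.sync("2018-12-30 08:30:00")
second_ok = second.sync("2018-12-30 09:00:00")  # standby, lock is taken

zk.report_failure("logstash-1")
resumed_from = second.resume_point()
second_after = second.sync("2018-12-30 09:00:00")
```

Because the checkpoint lives in the coordinator rather than in either worker, the surviving worker resumes from the point the failed one last saved.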

Through the steps 11-16, the big data aggregation query method can be realized.

In summary, in the technical scheme of the application, an identification field is added to the source data table, a Logstash cluster serves as data synchronization middleware, an Elasticsearch cluster serves as the data storage system, and Elasticsearch indexes are set in the Elasticsearch cluster. This introduces a trigger mechanism for data updates and keeps a redundant, synchronized copy of the service data in the Elasticsearch cluster. A service query therefore only needs to run an Elasticsearch aggregation query against the Elasticsearch cluster, without considering which database the source data resides in, which solves the problem of cross-service and cross-database associated queries in the micro-service architecture.

In addition, because Elasticsearch supports fast querying, paging and sorting, the problem that fuzzy queries, paging and sorting cannot be performed in a micro-service architecture is solved, which greatly improves the query efficiency of software on the education cloud platform.
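A query that combines fuzzy matching, paging, sorting and an aggregation might look like the request body built below. The body follows the Elasticsearch query DSL, but the field names (`name`, `gender`, `updated_at`) and the helper `build_search_body` are assumptions, and the function only constructs the body without sending it to a cluster.

```python
def build_search_body(name_query, page, page_size):
    """Build a search body with fuzzy match, paging, sort and aggregation."""
    return {
        # Fuzzy full-text match on the (assumed) name field.
        "query": {
            "match": {"name": {"query": name_query, "fuzziness": "AUTO"}}
        },
        # Paging: pages are 1-based, 'from' is a 0-based offset.
        "from": (page - 1) * page_size,
        "size": page_size,
        # Most recently changed documents first.
        "sort": [{"updated_at": {"order": "desc"}}],
        # Count matching documents per gender value.
        "aggs": {"by_gender": {"terms": {"field": "gender"}}},
    }


body = build_search_body("Zhang", page=3, page_size=20)
```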

Therefore, the application provides a query scheme based on Elasticsearch aggregation for the education cloud platform. Applying this scheme to the education cloud platform allows new application scenarios to be created more quickly and meets the need to launch new services on the platform more quickly.

The foregoing describes only preferred embodiments of the application and is not intended to limit it; any modification, equivalent replacement, improvement or the like made within the spirit and principles of the application shall fall within its scope of protection.

Claims

1. The big data aggregation query method is characterized by comprising the following steps:

adding an identification field in a source data table to be synchronized; the source data table is a user table in a basic database or a service table in a service database; the identification field is a timestamp;

deploying a Logstash cluster as data synchronization middleware;

deploying an Elasticsearch cluster as a data storage system;

setting a data synchronization configuration file in the Logstash cluster;

the Logstash cluster outputs the processed data to the Elasticsearch cluster.

2. The method according to claim 1, characterized in that:

the Logstash cluster comprises a plurality of Logstash instances.

3. The method according to claim 2, wherein when the Logstash cluster comprises a first Logstash and a second Logstash, deploying the Logstash cluster as data synchronization middleware comprises the steps of:

registering the first Logstash and the second Logstash in ZooKeeper;

4. The method according to claim 1, characterized in that:

the data in the user table in the base database comprises: user number, name and gender;

5. The method of claim 1, wherein changing the value of the identification field in the source data table comprises: