CN115905313A

CN115905313A - MySQL big table association query system and method

Info

Publication number: CN115905313A
Application number: CN202211168253.0A
Authority: CN
Inventors: 陈武; 梅欣; 陈亨通
Original assignee: Shanghai Kejia Information Technology Co ltd
Current assignee: Shanghai Kejia Information Technology Co ltd
Priority date: 2022-09-23
Filing date: 2022-09-23
Publication date: 2023-04-04

Abstract

The invention discloses a MySQL big table association query system and a MySQL big table association query method, which comprise Canal, MQ, a data processing module and an elastic search, wherein Mysql records data update in real time by using binlog, the real-time data of the binlog analyzed by Canal is synchronized into MQ, the data processing module monitors MQ messages and processes the data to be synchronized into ES in real time, the ES is used for screening and querying the data, and the acquired data id set is further transmitted to MySQL to query a corresponding table to acquire detailed information to be assembled and returned to a client; the system synchronizes data change in real time through the Binlog of MySQL, avoids intrusion operation on service codes and a database, improves the expansibility of the system by introducing MQ in a loose coupling mode, uses ES for searching, solves the performance problem of MySQL related query, and improves user experience.

Description

MySQL large table association query system and method

Technical Field

The invention relates to the field of computer data processing, in particular to a MySQL big table association query system and a MySQL big table association query method.

Background

General system development requires data to be stored in a database. Commonly used databases are classified as Relational databases (Relational database) such as MySQL, oracle, SQLServer, etc., or non-Relational databases (Not Only SQL) such as MongoDB, HBase, dynamo, etc. Relational databases are widely used because of their row and column storage, which is easy to understand and good support for transactions ensures data consistency. More and more businesses under the present stream of internet use MySQL as the preferred database for open source and free reasons.

The elastic search (hereinafter referred to as ES) is a highly extended, open-source full-text retrieval and analysis engine, which can quickly store and search massive data in near-real time.

Query exposure of data in a database is an operation that is not normal, and this often involves the associated query exposure among a plurality of tables. The relational database consumes a long time when a plurality of tables are subjected to correlation query, and particularly, the correlation of the tables with large data volume can cause the exhaustion of the physical machine resources of the database, thereby causing the unavailability of an upper application system.

The existing solutions are:

1) The read-write separation is carried out, the fields to be inquired form a new wide table and are put into a read database, but the scheme relates to code reconstruction and data migration, the reconstruction cost is high, and the new inquiry conditions need to be changed. Relational databases also have performance bottlenecks for storage and querying of large data volumes. Under the condition that the database and the table of the service library are divided, the difficulty of realizing the scheme is further increased.

2) And performing data increment synchronization into the ES by using a third-party data synchronization tool. The scheme can cause data to have certain delay, and cannot meet the query with higher data real-time requirement.

Therefore, how to achieve multi-table association query that is non-intrusive to the original service code and can meet the performance requirement is a technical problem to be solved by technicians at present.

Disclosure of Invention

The invention aims to solve the technical problem of a MySQL big table association query system and a MySQL big table association query method.

The invention is realized by the following technical scheme: a MySQL big table association query system comprises a Canal, an MQ, a data processing module and an ES, wherein real-time data of Canal analysis binlog are synchronized into the MQ, the data processing module monitors MQ messages and processes the data to be synchronized into the ES in real time, the ES is used for screening and querying the data, and according to an acquired data id set, the table corresponding to the query in MySQL is queried to acquire detailed information, and the detailed information is assembled and returned to a client.

As a preferred technical scheme, the method comprises the steps that an interactive protocol of MySQL slave is simulated by the Canal, the Canal is disguised as a MySQL slave and sends a dump protocol to the MySQL, the MySQL pushes binlog to the Canal after receiving a dump request, the Canal receives a binlog change event and then is packaged into an internal event object which is divided into three types of adding INSERT, modifying UPDATE and deleting DELETE, and the internal event object is sent to a message queue MQ.

As a preferred technical solution, after monitoring the message in the message queue, the data processing module finds the corresponding ES index configuration according to the table name to determine the index to be changed, and operates the corresponding index.

As an optimized technical scheme, the data processing module comprises the functions of mapping relation maintenance of database fields and ES indexes, message queue MQ monitoring, ES operation command construction and ES changing operation.

As the preferred technical scheme, the event objects are divided into three types of new addition, modification and deletion.

The invention relates to a MySQL big table association query method, which comprises the following specific steps:

step one, a binlog switch of the MySQL server is started, and a Row format is set to reflect data change of a Row level in real time, all DDL and DML operations are recorded in the binlog, and master-slave copy and data recovery are realized by using the binlog;

simulating a slave node of MySQL by using a Canal tool, monitoring and acquiring binlog incremental change in real time, analyzing and packaging the monitored MySQL operation event into an event object, and sending the event object to a message queue MQ;

step three, after a consumer using the data processing module as a message queue monitors binlog event information in the queue in real time, corresponding logic processing is carried out, the data information is back-checked in MySQL, and finally the correlation query screening conditions and the id of each table are synchronized into ES;

and step four, screening and inquiring the data by using the ES, and inquiring a corresponding table in MySQL according to the acquired data id set to acquire detailed information, assembling the detailed information and returning the assembled information to the client.

The invention has the beneficial effects that: the invention synchronizes data change in real time through the Binlog of MySQL, avoids intrusion operation to service codes and a database, improves the expansibility of the system by introducing MQ in a loose coupling mode, uses ES for searching, solves the performance problem of MySQL related query, and improves user experience.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.

FIG. 1 is a schematic flow chart of the present invention;

fig. 2 is a system architecture diagram of the present invention.

Detailed Description

All of the features disclosed in this specification, or all of the steps in any method or process so disclosed, may be combined in any combination, except combinations of features and/or steps that are mutually exclusive.

As shown in fig. 2, the MySQL large table association query system of the present invention includes Canal, MQ, a data processing module, and ES, where real-time data of Canal analysis binlog is synchronized to MQ, the data processing module monitors MQ messages and processes data and synchronizes to ES in real time, the ES is used to perform screening query on data, and according to the acquired data id set, the MySQL queries a corresponding table to acquire detailed information, assembles the detailed information, and returns the detailed information to a client.

The method comprises the steps that a client simulates an interactive protocol of MySQL slave, pretends to be MySQL slave and sends a dump protocol to MySQL, the MySQL receives the dump request and then pushes binlog to the client, the client receives a binlog change event and then encapsulates the binlog change event into an internal event object which is divided into three types of adding INSERT, modifying UPDATE and deleting DELETE, the internal event object is sent to a message queue MQ, and a data processing module is responsible for monitoring the message in the message queue, then finds out the corresponding ES index configuration according to the table name, judges the index to be changed, and operates the corresponding index.

The data processing module comprises the functions of mapping relation maintenance of database fields and ES indexes, message queue MQ monitoring, ES operation command construction and ES change operation.

The event objects in the embodiment are divided into three types, namely adding, modifying and deleting.

As shown in fig. 1, in the MySQL large table association query method of the present invention, during query, a binlog switch of a MySQL server is turned on, and is set to a Row format to reflect data changes at a Row level in real time, all DDL and DML operations are recorded in the binlog, and master-slave replication and data recovery are implemented by using the binlog; then, simulating a slave node of MySQL by using a Canal tool, monitoring and acquiring binlog increment change in real time, analyzing and encapsulating the monitored MySQL operation event into an event object, and sending the event object to a message queue MQ; after a consumer using a data processing module as a message queue monitors binlog event information in the queue in real time, corresponding logic processing is carried out, the data information is back-checked in MySQL, and finally the correlation query screening conditions and the id of each table are synchronized into ES; and finally, screening and inquiring the data by using the ES, and acquiring detailed information from a corresponding table inquired in MySQL according to the acquired data id set, assembling the detailed information and returning the assembled detailed information to the client.

In this embodiment, the method and system for MySQL large table associative query of the present invention are mainly used to solve the performance problem of MySQL associative query and implement no code intrusion in a loose coupling manner.

Specifically, (1) MySQL binding records all DDL and DML operations, and stores the DDL and DML operations in a database server disk in a file form, which can be used to view information such as SQL change operations and time consumption of a database, and implement master-slave copy, data recovery, and the like. The Row mode needs to be started for better supporting the Canal operation;

(2) The method comprises the steps that an interactive protocol of a MySQL slave is simulated by the Canal, the Canal is disguised as the MySQL slave and sends a dump protocol to the MySQL, the MySQL pushes a binlog to the Canal after receiving a dump request, the Canal encapsulates an internal event object after receiving a binlog change event into three types, namely adding INSERT, modifying UPDATE and deleting DELETE, and sends the three types of the internal event object into a message queue MQ (the queue needs to be distinguished according to schemas of a database, namely different schemas are distributed into different queues).

The message content comprises operation type, field value after change, field value before change and table name:

(3) And the data processing module is responsible for finding out the corresponding ES index configuration according to the table name to judge the index to be changed after monitoring the message in the message queue, and operating the corresponding index. The module should include the following functions: the method comprises the steps of maintaining mapping relation of database fields and ES indexes, monitoring a message queue MQ, constructing an ES operation command and performing change operation on an ES.

Taking the table user and the company joint query as an example, the table user and the company are in a many-to-one relationship, that is, one company record corresponds to multiple user records, the associated field is the company _ id field in the user table, the query conditions are company.

Corresponding ES index structure:

"mappings":{

"_doc":{

"properties":{

"user_id":{

"type":"long"

},

"company_id":{

"type":"long"

},

"user_name":{

"type":"keyword"

},

"compay_name":{

"type":"keyword"

}

}。

if the mapping relationship between the database field and the ES index comprises: database table fields, ES index fields, SQL mapping, index name, reference table, and master table:

/>

the fields of the database table and the index fields need to be in one-to-one correspondence, SQL mapping needs to contain all associated query conditions and ids when generating ES instructions and reversely checking data, and a left join statement is needed to designate a main table.

After monitoring MQ messages changed by a user or a company table, matching indexes, namely user _ company _ index, needing to be changed according to the maintained mapping data, and generating corresponding operation instructions;

according to the maintained SQL mapping primary and secondary table relation, if the INSERT operation of the company table is required to be ignored and no operation is required to be carried out; the INSERT and UPDATE operations in the ES instruction can perform reverse check on a database according to the SQL mapping statement to obtain all field values, and then the field values are converted into newly-added or updateByQuery operations at the bottom layer; the DELETE instruction of ES generates a deleteByQuery operation according to the id field value changed by the USER table.

(4) When in query, the index of the user _ company _ index in the ES is queried, the queried user _ id set is obtained, then the corresponding user data is queried in the user table, and then the corresponding information is queried in the company table according to the user and the company associated field company _ id.

The above description is only an embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that are not thought of through the inventive work should be included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope defined by the claims.

Claims

1. A MySQL large table correlation query system is characterized in that: the system comprises a Canal, an MQ, a data processing module and an ES, wherein real-time data of Canal analysis binlog are synchronized into the MQ, the data processing module monitors MQ messages and processes the data to be synchronized into the ES in real time, the ES is used for screening and inquiring the data, and the acquired data id set is transmitted to a MySQL to inquire a corresponding table to acquire detailed information to be assembled and returned to a client.

2. The MySQL big table correlation query system according to claim 1, characterized in that: the method comprises the steps that an interactive protocol of MySQL slave is simulated by the Canal, the Canal is disguised as MySQL slave and sends a dump protocol to the MySQL, the MySQL pushes binlog to the Canal after receiving a dump request, the Canal encapsulates an internal event object after receiving a binlog change event, the internal event object is divided into three types of adding INSERT, modifying UPDATE and deleting DELETE, and the three types of adding INSERT, modifying UPDATE and deleting DELETE are sent to a message queue MQ.

3. The MySQL large-table associative query system according to claim 1, characterized in that: and the data processing module is responsible for finding out the corresponding ES index configuration according to the table name to judge the index to be changed after monitoring the message in the message queue, and operating the corresponding index.

4. The MySQL large-table associative query system according to claim 1 or 3, characterized in that: the data processing module comprises the functions of mapping relation maintenance of database fields and ES indexes, message queue MQ monitoring, ES operation command construction and ES change operation.

5. The MySQL large-table associative query system according to claim 1, characterized in that: the event objects are divided into three types of new addition, modification and deletion.

6. A MySQL big table association query method is characterized in that the query system of any one of claims 1 to 5 is used, and the specific steps are as follows:

step one, a binlog switch of a MySQL server is started and set to be in a Row format so as to reflect data change of a Row level in real time, all DDL and DML operations are recorded in the binlog, and master-slave copy and data recovery are realized by using the binlog;

step three, after a data processing module is used as a consumer of the message queue to monitor binlog event information in the queue in real time, corresponding logic processing is carried out, data information is back-checked in MySQL, and finally the correlation query screening conditions and the id of each table are synchronized into an ES;

and step four, screening and inquiring the data by using the ES, and inquiring a corresponding table in MySQL according to the acquired data id set to acquire detailed information, assembling the detailed information and returning the assembled detailed information to the client.