CN106649828B - Data query method and system - Google Patents

Data query method and system

Info

Publication number
CN106649828B
CN106649828B (application CN201611248518.2A)
Authority
CN
China
Prior art keywords
data
query
engine
data record
record
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611248518.2A
Other languages
Chinese (zh)
Other versions
CN106649828A (en)
Inventor
禹熹
周继恩
冯兴
王颖卓
方亚超
叶炜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Unionpay Co Ltd
Original Assignee
China Unionpay Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Unionpay Co Ltd
Priority to CN201611248518.2A
Publication of CN106649828A
Application granted
Publication of CN106649828B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses a data query method and system, comprising: receiving a query request message sent by a service system; determining, according to the query information of the data record to be queried and the correspondence between query information and data engines, the target data engine corresponding to the data record to be queried; and then querying the data record in the target data engine to obtain the data record to be queried. Because the query information of a data record comprises the query concurrency and the number of query conditions of the data record, and because query information corresponds to data engines, data records with different query concurrency and different numbers of query conditions can correspond to different data engines. Therefore, the target data engine of the data record to be queried is determined according to the correspondence between query information and data engines, and the data record is queried in the determined target data engine, so that the advantages of different data engines can be fully utilized and the efficiency of data query is effectively improved.

Description

Data query method and system
Technical Field
The invention relates to the technical field of data processing, in particular to a data query method and a data query system.
Background
With the advent of the big data era, the data volume in each business system becomes huge, and the query application of big data becomes more and more common. Because the query efficiency directly affects the response time of the query system, how to implement efficient, accurate and real-time data query in the face of increasing mass data has become an important problem to be solved urgently in the industry.
At present, each business system usually adopts a relational database to store business data, but relational databases have poor horizontal scalability and high scaling cost, and are difficult to expand in a distributed manner. When massive data is stored in such a database, the excessive resource occupation makes the database respond slowly and gives poor read-write access performance when data queries are performed.
To solve this problem, in the prior art the database may be partitioned into different databases and different tables, so that too much data is not stored in a single table. However, even if the database is partitioned in this way, complex partitioning logic must be handled whenever data is written to or read from the database; when massive data is stored, the number of tables becomes too large, which still degrades the access performance of the database, keeps data queries inefficient, and complicates the management, operation and maintenance of the database.
In summary, a data query method is needed to improve the efficiency of data query.
Disclosure of Invention
The invention provides a data query method and a data query system, which are used for solving the technical problems of poor database access performance and low data query efficiency in the prior art.
The data query method provided by the embodiment of the invention comprises the following steps:
receiving a query request message sent by a service system, wherein the query request message comprises query information of a data record to be queried; the query information of the data records to be queried comprises query concurrency and query condition quantity corresponding to the data records to be queried;
determining a target data engine corresponding to the query information of the data record to be queried according to the query concurrency and the query condition number of the data record to be queried and the corresponding relation between the query information and the data engine;
and querying the data record in the target data engine to obtain the data record to be queried.
Optionally, the data records in the target data engine are imported by:
receiving an Nth data record sent by an online system, and storing the Nth data record in a data buffer area;
acquiring the Nth data record and query information of the Nth data record from the data buffer area;
and storing the Nth data record into the target data engine according to the query information of the Nth data record and the corresponding relation between the query information and the data engine.
Optionally, storing the nth data record in the target data engine according to query information of the nth data record and a corresponding relationship between the query information and the data engine, including:
if the query information of the Nth data record is that the query concurrency is greater than or equal to the preset query concurrency and the query condition number is less than or equal to the preset query condition threshold, determining the Hbase data engine as the target data engine according to the corresponding relation between the query information and the data engine, and storing the Nth data record into the Hbase data engine;
and if the query information of the Nth data record is that the query concurrency is less than the preset query concurrency or the query condition number is greater than or equal to the preset query condition threshold, determining the Impala data engine as the target data engine according to the corresponding relation between the query information and the data engine, and storing the Nth data record into the Impala data engine.
Optionally, after the storing the nth data record in the target data engine according to the query information of the nth data record and the corresponding relationship between the query information and the data engine, the method further includes:
receiving batch data records sent by the online system;
determining a target data engine corresponding to the batch data records according to the query information of the batch data records;
replacing the data records stored in the target data engine corresponding to the batch data records with the batch data records.
Optionally, the target data engine includes M cluster nodes, where M is an integer greater than or equal to 1.
Based on the same inventive concept, the embodiment of the present invention further provides a data query system, including:
the receiving module is used for receiving query request information sent by a service system, wherein the query request information comprises query information of data records to be queried; the query information of the data records to be queried comprises query concurrency and query condition quantity corresponding to the data records to be queried;
the determining module is used for determining a target data engine corresponding to the query information of the data record to be queried according to the query concurrency and the query condition number of the data record to be queried and the corresponding relation between the query information and the data engine;
and the processing module is used for inquiring the data record in the target data engine to obtain the data record to be inquired.
Optionally, the receiving module is further configured to:
receiving an Nth data record sent by an online system, and storing the Nth data record in a data buffer area;
the processing module is further configured to:
acquiring the Nth data record and query information of the Nth data record from the data buffer area; and
and storing the Nth data record into the target data engine according to the query information of the Nth data record and the corresponding relation between the query information and the data engine.
Optionally, the processing module is specifically configured to:
if the query information of the Nth data record is that the query concurrency is greater than or equal to the preset query concurrency and the query condition number is less than or equal to the preset query condition threshold, determining the Hbase data engine as the target data engine according to the corresponding relation between the query information and the data engine, and storing the Nth data record into the Hbase data engine;
and if the query information of the Nth data record is that the query concurrency is less than the preset query concurrency or the query condition number is greater than or equal to the preset query condition threshold, determining the Impala data engine as the target data engine according to the corresponding relation between the query information and the data engine, and storing the Nth data record into the Impala data engine.
Optionally, the receiving module is further configured to:
receiving batch data records sent by the online system;
the processing module is further configured to:
determining a target data engine corresponding to the batch data records according to the query information of the batch data records;
replacing the data records stored in the target data engine corresponding to the batch data records with the batch data records.
Optionally, the target data engine includes M cluster nodes, where M is an integer greater than or equal to 1.
According to the embodiment of the invention, a query request message sent by the service system is received; the target data engine corresponding to the query information of the data record to be queried is determined according to the query information contained in the query request message and the correspondence between query information and data engines; and the data record to be queried is obtained by querying the data record in the target data engine. Because the query information of a data record comprises the query concurrency and the number of query conditions of the data record, and because query information corresponds to data engines, data records with different query concurrency and different numbers of query conditions can correspond to different data engines. Therefore, the target data engine of the data record to be queried is determined according to the query concurrency, the number of query conditions and the correspondence between query information and data engines, and the data record is queried in the determined target data engine, so that the advantages of different data engines can be fully utilized and data query efficiency is effectively improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise.
Fig. 1 is a schematic flow chart corresponding to a data query method in an embodiment of the present invention;
fig. 2 is a schematic flow chart corresponding to a real-time importing flow of data records in an embodiment of the present invention;
FIG. 3 is a schematic flow chart illustrating a batch data record importing flow according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of a data query system according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The data query method in the embodiment of the invention can be applied to a big data system, wherein the big data system comprises one or more data engines, and the data engines are specifically used for storing data records, and performing processing operations such as query on the stored data records.
Specifically, the data engine may be various types of data engines, such as a non-relational NoSQL database Hbase data engine or an MPP database Impala data engine, or may be other types of data engines, which is not limited herein.
In the embodiment of the invention, the big data system can be connected with one or more online systems. The online system may be various types of online systems, and the content of the service processing performed by the online system may be set by those skilled in the art according to actual needs, which is not limited herein. Moreover, the online system can generate or update the data records in the source database in real time while performing business processing.
Therefore, to prevent processing operations such as data queries performed by the big data system from affecting the business processing of the online system, the data records in the big data system can be stored separately from the data records in the source database of the online system. In the embodiment of the invention, the big data system can obtain copies of the data records from the source database of the online system in order to support querying of the data records.
In the embodiment of the invention, the big data system can be connected with one or more service systems. Any business system can inquire the data records stored in the big data system in a mode of sending inquiry request messages through a corresponding interface in the big data system.
The embodiments of the present invention will be described in further detail with reference to the drawings attached hereto.
Fig. 1 is a schematic flow chart corresponding to a data query method provided by an embodiment of the present invention, as shown in fig. 1, including the following steps 101 to 103:
step 101: receiving a query request message sent by a service system, wherein the query request message comprises query information of a data record to be queried; the query information of the data records to be queried comprises query concurrency and query condition quantity corresponding to the data records to be queried;
step 102: determining a target data engine corresponding to the query information of the data record to be queried according to the query concurrency and the query condition number of the data record to be queried and the corresponding relation between the query information and the data engine;
step 103: and querying the data record in the target data engine to obtain the data record to be queried.
Because the query information of a data record comprises the query concurrency and the number of query conditions of the data record, and because query information corresponds to data engines, data records with different query concurrency and different numbers of query conditions can correspond to different data engines. Therefore, the target data engine of the data record to be queried is determined according to the query concurrency, the number of query conditions and the correspondence between query information and data engines; the data record is then queried in the determined target data engine, so that the advantages of different data engines can be fully utilized and data query efficiency is effectively improved.
Specifically, in step 101, the big data system includes a data query client, and the data query client has one or more interfaces. Thus, the business system may send the query request message to the data query service through a corresponding interface of the data query client of the big data system.
The query request message includes the query information of the data record to be queried, that is, the query concurrency and the number of query conditions corresponding to the data record to be queried, which characterize the business application scenario of the data record. The number of service systems may be one or more, and the interfaces of the data query client corresponding to different service systems may be the same or different, which is not limited here.
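As a minimal sketch, the query request message could be carried by a structure such as the one below; the class and field names are illustrative assumptions and are not prescribed by the embodiment:

```java
// Illustrative sketch only: class and field names are assumptions, not part of the embodiment.
public class QueryRequestMessage {
    // Identifies the data record(s) to be queried, e.g. a table or business key (assumed field).
    private String recordKey;
    // Expected query concurrency for this kind of record (query information).
    private int queryConcurrency;
    // Number of query conditions attached to the request (query information).
    private int queryConditionCount;
    // The actual filter conditions, e.g. "cardNo = ..." and "txnTime BETWEEN ... AND ...".
    private java.util.Map<String, String> conditions = new java.util.HashMap<>();

    public int getQueryConcurrency()    { return queryConcurrency; }
    public int getQueryConditionCount() { return queryConditionCount; }
    public java.util.Map<String, String> getConditions() { return conditions; }
}
```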
The big data system can also comprise a data query service, and the data query service has a data engine adaptation function. Further, in steps 102 and 103, after receiving the query request message, the data query service may determine the target data engine where the data record to be queried is located, according to the query information of the data record to be queried and the correspondence between query information and data engines.
Further, in step 103, the big data system may query and obtain the data record to be queried through the determined corresponding access interface of the target data engine.
Before the service system sends the query request message, the data records to be queried can already have been stored in the corresponding target data engine according to their business application scenario, namely their query concurrency and number of query conditions, and the correspondence between business application scenario and data engine, so that the advantages of different data engines can be fully exploited at query time and the efficiency of data query is improved.
Taking the query of one year of transaction detail data in the UnionPay system as an example: when transaction details are queried by card number and time, about 90% of the data query requests can be completed within 10 ms, and about 97% can be completed within 1 s.
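As a sketch of how such a card-number-plus-time query could be served by the Hbase data engine, the snippet below scans a bounded row-key range; the table name and the row-key layout (card number concatenated with a timestamp) are assumptions for illustration only and are not details given by the embodiment:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class TxnDetailQuery {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("txn_detail"))) { // assumed table name
            // Assumed row-key design: <cardNo>_<yyyyMMddHHmmss>, so a card-number + time-range
            // query becomes a bounded row-key scan served with low latency.
            String cardNo = "6222020000000001"; // illustrative card number
            Scan scan = new Scan()
                    .withStartRow(Bytes.toBytes(cardNo + "_20160101000000"))
                    .withStopRow(Bytes.toBytes(cardNo + "_20161231235959"));
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result r : scanner) {
                    System.out.println(Bytes.toString(r.getRow()));
                }
            }
        }
    }
}
```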
In the embodiment of the invention, the big data system can acquire the data record from the source database of the online system in a real-time import mode while providing data query service for each business system, and store the data record into the target data engine.
After the big data system is connected with the online system, the big data system can import data records from the source database of the online system in real time. Since there are generally many data records, the real-time import process is described in detail below by taking the Nth data record sent by the online system as an example.
Fig. 2 is a schematic flow chart corresponding to a real-time importing flow of data recording in an embodiment of the present invention, as shown in fig. 2, including the following steps 201 to 203:
step 201: receiving an Nth data record sent by an online system, and storing the Nth data record in a data buffer area;
step 202: acquiring the Nth data record and query information of the Nth data record from the data buffer area;
step 203: and storing the Nth data record into the target data engine according to the query information of the Nth data record and the corresponding relation between the query information and the data engine.
Specifically, the big data system may receive an nth data record sent by the online system through the data access layer, and convert the nth data record into a preset data format. Because different online systems have different modes for generating and sending real-time data records, the big data system can convert the data records received from different online systems into a preset data format through corresponding adapters in order to facilitate the big data system to carry out subsequent processing on the received data records. In the embodiment of the present invention, the preset data format may be set by a person skilled in the art according to actual needs, and is not limited herein.
Because the rate at which the big data system receives data records may differ from the rate at which it processes them, after converting the received Nth data record into the preset data format the big data system may store the Nth data record in the data buffer; this improves the reliability of record reception and prevents the reception of data records and the subsequent processing of data records from interfering with each other.
In the embodiment of the present invention, the data buffer may use a persistent storage medium. Therefore, even if the big data system encounters an exception while processing a data record, the data record is not lost and can still be read from the data buffer for reprocessing.
In addition, since the online systems connected to the big data system may generate data records at a high rate and in large volumes, the data buffer layer should provide high throughput; for example, it can be scaled out horizontally by adding nodes to increase its data processing capacity.
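A persistent, horizontally scalable buffer of this kind could be realized with a distributed message queue. The sketch below uses Apache Kafka purely as an illustrative assumption (the embodiment does not name a specific product); the broker address and topic name are likewise assumptions:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class RecordBufferWriter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // The topic acts as the data buffer: messages are persisted to disk, and the cluster
        // can be scaled out by adding brokers and partitions when the record rate grows.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String nthRecordInPresetFormat = "{\"cardNo\":\"6222...\",\"txnTime\":\"20161229\"}";
            producer.send(new ProducerRecord<>("realtime-records", nthRecordInPresetFormat));
        }
    }
}
```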
In step 202, the streaming data processing layer in the big data system may obtain the nth data record from the data buffer, and perform corresponding data processing according to the service requirement related to the nth data record.
The streaming data processing layer can adopt various types of processing structures according to the complexity and time consumption of data processing. For example, the streaming data processing layer may adopt a single-process or multi-thread concurrent processing structure inside the process, or may adopt a distributed architecture of multi-process cooperative real-time processing using a streaming processing framework, which is not limited herein.
Further, in step 203, the processed nth data record may be stored in the target data engine in the data storage layer according to the corresponding relationship between the query information of the nth data record and the data engine.
The target data engine is built with distributed technology and can be scaled horizontally, that is, the target data engine can comprise M cluster nodes, where M is an integer greater than or equal to 1. When the target data engine needs to store massive data records, the data processing capacity of the whole big data system can be improved by increasing the number of cluster nodes. For example, a small Hbase cluster of 60 nodes can manage one year of transaction detail data containing 36 fields while providing good external query service; as the data size grows, more nodes can be added to the Hbase cluster to sustain query performance.
In the embodiment of the invention, the data storage layer comprises multiple data engines of different types. Because different types of data engines store and query data records in different ways, their data processing capabilities differ; storing the data records of different business application scenarios in different types of data engines therefore makes full use of the characteristics of each data engine and improves the efficiency of importing and querying data records.
Specifically, the big data system may store the nth data record in the corresponding target data engine according to the query information of the nth data record and the corresponding relationship between the query information and the data engine. The query information of the data record comprises query concurrency and query condition number, and is used for representing a service application scene corresponding to the data record.
If the query information of the Nth data record is that the query concurrency is greater than or equal to the preset query concurrency and the query condition number is less than or equal to the preset query condition threshold, determining the Hbase data engine as a target data engine according to the corresponding relation between the query information and the data engine, and storing the Nth data record into the Hbase data engine;
and if the query information of the Nth data record is that the query concurrency is less than the preset query concurrency or the query condition number is greater than or equal to the preset query condition threshold, determining the Impala data engine as a target data engine according to the corresponding relation between the query information and the data engine, and storing the Nth data record into the Impala data engine.
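The routing rule above can be expressed as a small selection function. The sketch below follows the rule as stated; the class name, enum, and the concrete threshold values are illustrative assumptions and would be tuned per deployment:

```java
public class EngineRouter {
    enum DataEngine { HBASE, IMPALA }

    // Preset thresholds; the concrete values here are assumptions, not values from the embodiment.
    private static final int PRESET_QUERY_CONCURRENCY  = 1000; // queries per second
    private static final int PRESET_CONDITION_THRESHOLD = 3;   // number of query conditions

    static DataEngine selectTargetEngine(int queryConcurrency, int queryConditionCount) {
        // High-concurrency records with few query conditions go to the Hbase data engine.
        if (queryConcurrency >= PRESET_QUERY_CONCURRENCY
                && queryConditionCount <= PRESET_CONDITION_THRESHOLD) {
            return DataEngine.HBASE;
        }
        // Lower-concurrency or condition-rich records go to the Impala data engine.
        return DataEngine.IMPALA;
    }
}
```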
The query information of the data record may be received by the big data system from a source database of the online system, or may be determined by the big data system according to the received data record, which is not limited herein; moreover, the preset query concurrency amount and the preset query condition threshold value can be set by those skilled in the art according to actual needs, and are not limited here.
It should be noted that the query information may further include a query performance requirement. If the query performance corresponding to the nth data record is greater than the preset performance requirement threshold, determining the Hbase data engine as a target data engine according to the corresponding relationship between the query performance requirement and the data engine, and storing the nth data record in the Hbase data engine.
For example, high-concurrency, high-performance query data, such as CUPS clearing details that the UnionPay system exposes to external cardholders and merchants, can be stored in the Hbase data engine for processing, while data oriented to internal management systems and rich in query conditions can be stored in the Impala data engine for processing, so that the advantages of each data engine are fully exploited.
In order to further improve the reliability of real-time importing of data records, the process of real-time importing data records from a source database of an online system by the big data system may further include processing of distributed transactions, and when the real-time data record writing is abnormal, transaction rollback may be performed, and data may be rewritten.
Because the source database of the online system also contains offline data records produced by batch background uploads, such records cannot be imported into the data engines through the real-time import flow. Therefore, to ensure that no data records are missing from the data engines of the data storage layer, the big data system can additionally import batch data records from the online system at the end of each day, supplementing data records that were lost, or overwriting records that were written repeatedly, due to network or system failures during the real-time import flow.
Fig. 3 is a schematic flow diagram corresponding to a batch data record importing flow in the embodiment of the present invention, and as shown in fig. 3, the batch data importing flow includes the following steps 301 to 303:
step 301: receiving batch data records sent by the online system;
step 302: determining a target data engine corresponding to the batch data records according to the query information of the batch data records;
step 303: replacing the data records stored in the target data engine corresponding to the batch data records with the batch data records.
Specifically, in step 301, the big data system may use an ETL (Extract-Transform-Load) tool to Extract the batch data records from the source database of the online system and store the batch data records in the intermediate data store. The ETL tool may be various types of ETL tools, such as Informatica, Datastage, etc., and those skilled in the art may select an appropriate ETL tool according to actual needs, which is not limited herein.
Accordingly, the intermediate data storage area may also be a plurality of types of data storage areas. In the embodiment of the invention, as the Hbase and the Impala data engines are adopted, the intermediate data storage area can adopt a Hadoop distributed file system matched with the data engines. Of course, in the case of a big data system using other data engines, the intermediate data storage area may also use other types of data storage forms, and is not limited herein.
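As a sketch of staging an extracted batch file into the intermediate data storage area when that area is a Hadoop distributed file system, one might use the Hadoop FileSystem API; the local and HDFS paths below are illustrative assumptions:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BatchFileStager {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // picks up core-site.xml / hdfs-site.xml
        try (FileSystem fs = FileSystem.get(conf)) {
            // Copy the batch extract produced by the ETL tool into the intermediate HDFS area.
            fs.copyFromLocalFile(new Path("/data/etl/out/txn_batch_20161229.dat"), // assumed local path
                                 new Path("/staging/batch/"));                     // assumed HDFS path
        }
    }
}
```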
Subsequently, in step 302, after the big data system processes and converts the obtained batch data records, according to the query information, i.e. the query concurrency and the query condition number, and the corresponding relationship between the query information and the data engines, the big data system determines a target data engine corresponding to the batch data records from the plurality of data engines in the data storage, and stores the batch data records into the corresponding target data engine to replace the original data records in the target data engine.
Specifically, for each data record in the batch data records, the big data system can process each data record, and store each data record into a corresponding target data engine according to query information, namely the number of query concurrency and query conditions. The processing and converting processing of the batch data records comprises format processing of batch data record files and business processing corresponding to each data record in the batch data records.
It should be noted that, to improve the processing performance of the batch import flow, the big data system may perform the extraction and processing of the batch data concurrently; in the embodiment of the present invention, a MapReduce parallel data processing framework is used for data extraction and processing, which effectively shortens the time needed to import batch data.
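A minimal sketch of a MapReduce mapper that classifies each batch record to its target engine in parallel is shown below; the record format, field positions, and the reuse of the routing thresholds from the sketch above are assumptions for illustration:

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class BatchRecordRouteMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // Assumed record layout: queryConcurrency|queryConditionCount|payload
        String[] fields = line.toString().split("\\|", 3);
        int concurrency = Integer.parseInt(fields[0]);
        int conditions  = Integer.parseInt(fields[1]);
        // Reuse the same routing rule as the real-time flow (threshold values are assumptions).
        String targetEngine = (concurrency >= 1000 && conditions <= 3) ? "HBASE" : "IMPALA";
        // Emit (target engine, record); a downstream step bulk-loads each group into its engine.
        context.write(new Text(targetEngine), new Text(fields[2]));
    }
}
```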
Based on the same inventive concept, the embodiment of the present invention further provides a data query system, as shown in fig. 4, where the system 400 includes:
a receiving module 401, configured to receive a query request message sent by a service system, where the query request message includes query information of a data record to be queried; the query information of the data records to be queried comprises query concurrency and query condition quantity corresponding to the data records to be queried;
a determining module 402, configured to determine, according to the query concurrency and the query condition number of the data records to be queried and the corresponding relationship between the query information and a data engine, a target data engine corresponding to the query information of the data records to be queried;
the processing module 403 is configured to query the data record in the target data engine to obtain the data record to be queried.
Optionally, the receiving module 401 is further configured to:
receiving an Nth data record sent by an online system, and storing the Nth data record in a data buffer area;
the processing module 403 is further configured to:
acquiring the Nth data record and query information of the Nth data record from the data buffer area; and
and storing the Nth data record into the target data engine according to the query information of the Nth data record and the corresponding relation between the query information and the data engine.
Optionally, the processing module 403 is specifically configured to:
if the query information of the Nth data record is that the query concurrency is greater than or equal to the preset query concurrency and the query condition number is less than or equal to the preset query condition threshold, determining the Hbase data engine as the target data engine according to the corresponding relation between the query information and the data engine, and storing the Nth data record into the Hbase data engine;
and if the query information of the Nth data record is that the query concurrency is less than the preset query concurrency or the query condition number is greater than or equal to the preset query condition threshold, determining the Impala data engine as the target data engine according to the corresponding relation between the query information and the data engine, and storing the Nth data record into the Impala data engine.
Optionally, the receiving module 401 is further configured to:
receiving batch data records sent by the online system;
the processing module 403 is further configured to:
determining a target data engine corresponding to the batch data records according to the query information of the batch data records;
replacing the data records stored in the target data engine corresponding to the batch data records with the batch data records.
Optionally, the target data engine includes M cluster nodes, where M is an integer greater than or equal to 1.
From the above, it can be seen that:
According to the embodiment of the invention, a query request message sent by the service system is received; the target data engine corresponding to the query information of the data record to be queried is determined according to the query information contained in the query request message and the correspondence between query information and data engines; and the data record to be queried is obtained by querying the data record in the target data engine. Because the query information of a data record comprises the query concurrency and the number of query conditions of the data record, and because query information corresponds to data engines, data records with different query concurrency and different numbers of query conditions can correspond to different data engines. Therefore, the target data engine of the data record to be queried is determined according to the query concurrency, the number of query conditions and the correspondence between query information and data engines, and the data record is queried in the determined target data engine, so that the advantages of different data engines can be fully utilized and data query efficiency is effectively improved.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart block or blocks and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (8)

1. A method for data query, the method comprising:
receiving a query request message sent by a service system, wherein the query request message comprises query information of a data record to be queried; the query information of the data records to be queried comprises query concurrency and query condition quantity corresponding to the data records to be queried;
determining a target data engine corresponding to the query information of the data record to be queried according to the query concurrency and the query condition number of the data record to be queried and the corresponding relation between the query information and the data engine;
querying a data record in the target data engine to obtain the data record to be queried;
wherein the data records in the target data engine are imported by:
receiving an Nth data record sent by an online system, storing the Nth data record in a data buffer area, and acquiring the Nth data record and query information of the Nth data record from the data buffer area; if the query information of the Nth data record is that the query concurrency is greater than or equal to the preset query concurrency and the query condition number is less than or equal to the preset query condition threshold, determining the Hbase data engine as the target data engine according to the corresponding relation between the query information and the data engine, and storing the Nth data record into the Hbase data engine; and if the query information of the Nth data record is that the query concurrency is smaller than the preset query concurrency or the query condition number is larger than the preset query condition threshold, determining the Impala data engine as the target data engine according to the corresponding relation between the query information and the data engine, and storing the Nth data record into the Impala data engine.
2. The method according to claim 1, wherein after storing the nth data record in the target data engine according to the query information of the nth data record and the corresponding relationship between the query information and the data engine, further comprising:
receiving batch data records sent by the online system;
determining a target data engine corresponding to the batch data records according to the query information of the batch data records;
replacing the data records stored in the target data engine corresponding to the batch data records with the batch data records.
3. The method of claim 1 or 2, wherein the target data engine comprises M cluster nodes, M being an integer greater than or equal to 1.
4. A data query system, the system comprising:
the receiving module is used for receiving query request information sent by a service system, wherein the query request information comprises query information of data records to be queried; the query information of the data records to be queried comprises query concurrency and query condition quantity corresponding to the data records to be queried;
the determining module is used for determining a target data engine corresponding to the query information of the data record to be queried according to the query concurrency and the query condition number of the data record to be queried and the corresponding relation between the query information and the data engine;
a processing module for querying the data record in the target data engine to obtain the data record to be queried
The receiving module is further configured to: receiving an Nth data record sent by an online system, and storing the Nth data record in a data buffer area;
the processing module is further configured to: acquiring the Nth data record and query information of the Nth data record from the data buffer area; if the query information of the Nth data record is that the query concurrency is greater than or equal to the preset query concurrency and the query condition number is less than or equal to the preset query condition threshold, determining the Hbase data engine as the target data engine according to the corresponding relation between the query information and the data engine, and storing the Nth data record into the Hbase data engine; and if the query information of the Nth data record is that the query concurrency is smaller than the preset query concurrency or the query condition number is larger than the preset query condition threshold, determining the Impala data engine as the target data engine according to the corresponding relation between the query information and the data engine, and storing the Nth data record into the Impala data engine.
5. The system of claim 4, wherein the receiving module is further configured to:
receiving batch data records sent by the online system;
the processing module is further configured to:
determining a target data engine corresponding to the batch data records according to the query information of the batch data records;
replacing the data records stored in the target data engine corresponding to the batch data records with the batch data records.
6. The system according to any one of claims 4 or 5, wherein the target data engine comprises M cluster nodes, M being an integer greater than or equal to 1.
7. A computing device comprising at least one processing unit and at least one memory unit, wherein the memory unit stores a computer program that, when executed by the processing unit, causes the processing unit to perform the method of any of claims 1 to 3.
8. A computer-readable storage medium storing a computer program executable by a computing device, the program, when run on the computing device, causing the computing device to perform the method of any of claims 1 to 3.
CN201611248518.2A 2016-12-29 2016-12-29 Data query method and system Active CN106649828B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611248518.2A CN106649828B (en) 2016-12-29 2016-12-29 Data query method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611248518.2A CN106649828B (en) 2016-12-29 2016-12-29 Data query method and system

Publications (2)

Publication Number Publication Date
CN106649828A CN106649828A (en) 2017-05-10
CN106649828B true CN106649828B (en) 2019-12-24

Family

ID=58836655

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611248518.2A Active CN106649828B (en) 2016-12-29 2016-12-29 Data query method and system

Country Status (1)

Country Link
CN (1) CN106649828B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107480268A (en) * 2017-08-17 2017-12-15 北京奇虎科技有限公司 Data query method and device
CN107993151B (en) * 2018-01-17 2020-12-29 平安科技(深圳)有限公司 Fund transaction clearing method, device, equipment and computer readable storage medium
CN108763300B (en) * 2018-04-19 2020-07-31 北京奇艺世纪科技有限公司 Data query method and device
CN109033123B (en) * 2018-05-31 2023-09-22 康键信息技术(深圳)有限公司 Big data-based query method and device, computer equipment and storage medium
CN110866033B (en) * 2018-08-28 2022-06-21 北京国双科技有限公司 Feature determination method and device for predicting query resource occupancy
CN109977140B (en) * 2019-03-25 2022-04-05 中国农业银行股份有限公司 Transaction data query method, device and system
CN111190901B (en) * 2019-12-12 2023-02-07 深圳平安医疗健康科技服务有限公司 Business data storage method and device, computer equipment and storage medium
CN111159219B (en) * 2019-12-31 2023-05-23 湖南亚信软件有限公司 Data management method, device, server and storage medium
CN111639078A (en) * 2020-05-25 2020-09-08 北京百度网讯科技有限公司 Data query method and device, electronic equipment and readable storage medium
CN112835717B (en) * 2021-02-05 2024-06-28 远光软件股份有限公司 Integrated application processing method and device for clusters

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102831245A (en) * 2012-09-17 2012-12-19 洛阳翔霏机电科技有限责任公司 Real-time data storage and reading method of relational database
CN104102702A (en) * 2014-07-07 2014-10-15 浪潮(北京)电子信息产业有限公司 Software and hardware combined application-oriented big data system and method
CN104102710A (en) * 2014-07-15 2014-10-15 浪潮(北京)电子信息产业有限公司 Massive data query method
CN104361091A (en) * 2014-11-18 2015-02-18 浪潮(北京)电子信息产业有限公司 Big data system
CN105224651A (en) * 2015-09-30 2016-01-06 国网天津市电力公司 A kind of infosystem intranet and extranet database optimizing method based on read and write abruption
CN105574052A (en) * 2014-11-06 2016-05-11 中兴通讯股份有限公司 Database query method and apparatus

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007067703A2 (en) * 2005-12-08 2007-06-14 Intelligent Search Technologies Search engine with increased performance and specificity
CN102262662A (en) * 2011-07-22 2011-11-30 浪潮(北京)电子信息产业有限公司 System, device and method for realizing database data migration in heterogeneous platform
EP2693349A1 (en) * 2012-08-03 2014-02-05 Tata Consultancy Services Limited A system and method for massive call data storage and retrieval

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102831245A (en) * 2012-09-17 2012-12-19 洛阳翔霏机电科技有限责任公司 Real-time data storage and reading method of relational database
CN104102702A (en) * 2014-07-07 2014-10-15 浪潮(北京)电子信息产业有限公司 Software and hardware combined application-oriented big data system and method
CN104102710A (en) * 2014-07-15 2014-10-15 浪潮(北京)电子信息产业有限公司 Massive data query method
CN105574052A (en) * 2014-11-06 2016-05-11 中兴通讯股份有限公司 Database query method and apparatus
CN104361091A (en) * 2014-11-18 2015-02-18 浪潮(北京)电子信息产业有限公司 Big data system
CN105224651A (en) * 2015-09-30 2016-01-06 国网天津市电力公司 A kind of infosystem intranet and extranet database optimizing method based on read and write abruption

Also Published As

Publication number Publication date
CN106649828A (en) 2017-05-10

Similar Documents

Publication Publication Date Title
CN106649828B (en) Data query method and system
US10255108B2 (en) Parallel execution of blockchain transactions
US9063992B2 (en) Column based data transfer in extract, transform and load (ETL) systems
CN108647357B (en) Data query method and device
CN111339073A (en) Real-time data processing method and device, electronic equipment and readable storage medium
WO2016022492A1 (en) Account processing method and apparatus
CN105205154B (en) Data migration method and device
CN107992492B (en) Data block storage method, data block reading method, data block storage device, data block reading device and block chain
CN109241165B (en) Method, device and equipment for determining database synchronization delay
CN110389989B (en) Data processing method, device and equipment
Gupta et al. Faster as well as early measurements from big data predictive analytics model
CN103414762B (en) cloud backup method and device
WO2019033741A1 (en) Investment commodity resource processing method, device, storage medium and computer apparatus
CN104317957A (en) Open platform and system for processing reports and report processing method
CN111680017A (en) Data synchronization method and device
CN105095310B (en) Data processing method, first server, second server and data processing system
CN110297842B (en) Data comparison method, device, terminal and storage medium
CN111125047B (en) Cold and hot data catalog identification method and device
CN113590703A (en) ES data importing method and device, electronic equipment and readable storage medium
CN115544096B (en) Data query method and device, computer equipment and storage medium
CN111161047A (en) Bank business data processing and inquiring method and device
CN110188069A (en) A kind of csv file storage method, device and computer equipment
CN110427390A (en) Data query method and device, storage medium, electronic device
CN111563091B (en) Method and system for batch updating MongoDB in non-round-trip mode
US12008017B2 (en) Replicating data across databases by utilizing validation functions for data completeness and sequencing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant