CN113064925A

CN113064925A - Big data query method, system and computer readable storage medium

Info

Publication number: CN113064925A
Application number: CN202110277458.1A
Authority: CN
Inventors: 李鹏
Original assignee: Shenzhen Yishi Huolala Technology Co Ltd
Current assignee: Shenzhen Yishi Huolala Technology Co Ltd
Priority date: 2021-03-15
Filing date: 2021-03-15
Publication date: 2021-07-02

Abstract

The invention provides a big data query method, a big data query system and a computer readable storage medium. The method comprises the following steps: receiving a user permission application and assigning permissions to the user; receiving the query service selected by the user and the SQL query statement input by the user; analyzing the SQL query statement and distributing a task to a computing engine; and outputting the query result, the log, the resource utilization rate and the task information returned by the task executed by the computing engine. The big data query system provided by the invention lets a user select query services and query through unified SQL, thereby integrating multiple query scenarios. It displays and controls the resource utilization rate at the finest granularity, detects sensitive data and system risks to ensure the security and stability of data, lets the user customize functions, and can flexibly interface with other systems so as to adapt to various user requirements.

Description

Big data query method, system and computer readable storage medium

Technical Field

The present invention relates to the field of big data technologies, and in particular, to a big data query method, a big data query system, and a computer-readable storage medium.

Background

With the development of technology, big data is applied in more and more fields, and big data query platforms are becoming increasingly important. At present, most big data query systems rely on the open source tools Hue or Zeppelin, or on a Linux terminal configured with different clients for direct querying. Existing big data query methods and systems have to be adapted to a specific database or engine, do not show resource usage in sufficient detail, usually process error information in a standardized way that cannot be tailored to different user requirements, and manage user permissions inflexibly. As a result, they cannot adapt flexibly and efficiently to varied working requirements. There is therefore a need for a big data query method and system that is more flexible and can adapt to various user requirements.

Disclosure of Invention

The invention aims to provide a big data query method, a big data query system and a computer readable storage medium, so as to solve the problem that query systems in the prior art must be adapted to a specific database and engine and cannot adapt to various user requirements.

In a first aspect, the present invention provides a big data query method, where the method includes:

receiving a user permission application and distributing permissions to the user;

receiving the query service selected by the user and the input SQL query statement;

analyzing the SQL query statement and distributing a task to a computing engine;

and outputting the query result, the log, the resource utilization rate and the task information returned by the task executed by the computing engine.

In a second aspect, the present invention provides a big data query system, including:

the permission management module is used for receiving a user permission application and assigning permissions to the user;

the input module is used for receiving the query service selected by the user and the SQL query statement input by the user;

the SQL parsing module is used for parsing the SQL query statement and distributing tasks to the computing engine;

the output module is used for outputting a log, a query result, a resource utilization rate and task information returned by the task executed by the computing engine; and

the function setting module is used for receiving user input to set functions.

In a third aspect, the present invention provides a computer-readable storage medium, which stores a computer program, wherein the computer program, when executed by a processor, implements the steps of the big data query method as described above.

In the invention, different permissions are assigned to users, and users are allowed to select query services and input unified SQL query statements, so that multiple query scenarios are integrated and queries against various distributed data storage engines and conventional databases are supported under massive data volumes. Resource usage is displayed and controlled at the finest granularity, which gives users a reference for optimization. SQL query statements that involve system risk are intercepted, and sensitive data is detected when query results are downloaded, which ensures data security and service stability. Users can also customize functions such as unified processing of log errors, custom system notifications, multi-tab query, random red packet distribution, and per-user upload and download limits. The system can therefore flexibly interface with the permission systems, alarms, notifications, red packet services and the like of other systems, so as to adapt to various user requirements.

Drawings

Fig. 1 is a flowchart of a big data query method according to an embodiment of the present invention.

Fig. 2 is a block diagram of a big data query system according to a second embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more clearly apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

In order to explain the technical means of the present invention, the following description will be given by way of specific examples.

Embodiment one:

As shown in Fig. 1, the big data query method provided in the first embodiment of the present invention includes the following steps. It should be noted that, provided substantially the same result is obtained, the big data query method of the present invention is not limited to the sequence of steps shown in Fig. 1.

S101, receiving a user permission application and assigning permissions to the user.

In the first embodiment of the present invention, the user's permission application includes, but is not limited to, applications for usable query engines, accessible databases and accessible data tables. Permissions may be assigned automatically on the basis of the application, granting the user the engine query permission and the database and data table access permissions, or they may be assigned after an administrator approves the application. When a permission application is received, the user together with the requested databases, data tables and function permissions is recorded, and once the application is approved the meta-information is updated accordingly. When the user later inputs an SQL statement for query, the meta-information of the SQL statement is parsed and checked. At the bottom layer, the system stores the link addresses, account numbers, passwords and other connection information of all query engines and accesses it through API calls. This information is configured centrally by an administrator.
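The record, approve and check cycle described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `Grant`, `PermissionStore` and `referenced_tables` are hypothetical, and table names are extracted with a naive regular expression where a real system would use a SQL parser.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Grant:
    """Permissions recorded for one user (hypothetical structure)."""
    engines: set = field(default_factory=set)
    tables: set = field(default_factory=set)  # entries look like "db.table"


# Naive extraction of "db.table" names after FROM/JOIN (assumption: no SQL parser available).
_TABLE_RE = re.compile(r"\b(?:from|join)\s+([A-Za-z_]\w*\.[A-Za-z_]\w*)", re.IGNORECASE)


def referenced_tables(sql):
    """Return the sorted, de-duplicated table names a statement reads from."""
    return sorted(set(_TABLE_RE.findall(sql)))


class PermissionStore:
    def __init__(self):
        self._pending = {}   # user -> Grant awaiting approval
        self._granted = {}   # user -> Grant currently in effect

    def apply(self, user, engines, tables):
        """Record a permission application; nothing is granted yet."""
        self._pending[user] = Grant(set(engines), set(tables))

    def approve(self, user):
        """Merge a pending application into the granted meta-information."""
        pending = self._pending.pop(user)
        grant = self._granted.setdefault(user, Grant())
        grant.engines |= pending.engines
        grant.tables |= pending.tables

    def check(self, user, engine, sql):
        """Return the violations for this query (an empty list means allowed)."""
        grant = self._granted.get(user, Grant())
        violations = []
        if engine not in grant.engines:
            violations.append(f"engine:{engine}")
        for table in referenced_tables(sql):
            if table not in grant.tables:
                violations.append(f"table:{table}")
        return violations
```

Keeping pending and granted records separate mirrors the text: an application is recorded on receipt, but the meta-information only changes after approval.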

S102, receiving the query service selected by the user and the input SQL query statement.

In the first embodiment of the present invention, the user may select the required query service, which may be one or more of Hive, ES, Kafka, MongoDB, or Phoenix. After selecting the query service, the user inputs an SQL query statement to run the query. Integrating the various query scenarios unifies the query entry: the user queries through one unified, visual SQL interface, which supports queries against various distributed data storage engines and conventional databases under massive data volumes.

S103, analyzing the SQL query statement and distributing the task to a computing engine.

When the SQL query statement input by the user is parsed and a non-standard SQL statement or an SQL statement that can be optimized is detected, optimization guidance information is output to guide the user in optimizing the statement and in further learning. The optimization guidance information can be updated periodically or added manually. When an SQL statement reports a syntax error or a runtime error, the error is either formatted or handled by user-preset processing, which allows the user to preset error handling information and supports linking to a knowledge document library or a question-answering robot. The SQL statement is parsed to generate SQL portrait information, and a routing algorithm set according to the SQL portrait information automatically distributes the computing task to the fastest computing engine, so that the user does not need to attend to the underlying computing logic. The SQL portrait information is one or more of the amount of data read, the number of tables/partitions read, the number of Joins of each type, the number of associated fields, the aggregation complexity and the filtering conditions, and the computing engine is one or more of Presto, Spark SQL, Druid, MR and TEZ. Furthermore, an SQL query statement may be intercepted when it is detected to be a risky operation and/or an operation affecting system stability.
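The portrait-then-route idea above can be illustrated with a small sketch. The text does not disclose the routing algorithm, so everything here is an assumption: `SqlPortrait`, `build_portrait` and `route` are hypothetical names, the portrait is derived with regular expressions, the read size is supplied by the caller, and the thresholds are purely illustrative.

```python
import re
from dataclasses import dataclass


@dataclass
class SqlPortrait:
    join_count: int
    table_count: int
    has_aggregation: bool
    has_filter: bool
    estimated_read_gb: float  # supplied by the caller, e.g. from table statistics


def build_portrait(sql, estimated_read_gb):
    """Derive a coarse portrait with regular expressions (a real system would parse the SQL)."""
    text = sql.lower()
    join_count = len(re.findall(r"\bjoin\b", text))
    return SqlPortrait(
        join_count=join_count,
        table_count=join_count + len(re.findall(r"\bfrom\b", text)),
        has_aggregation=bool(
            re.search(r"\bgroup\s+by\b|\b(?:sum|count|avg|min|max)\s*\(", text)
        ),
        has_filter=bool(re.search(r"\bwhere\b", text)),
        estimated_read_gb=estimated_read_gb,
    )


def route(portrait):
    """Pick an engine name from the portrait; the thresholds are illustrative only."""
    if portrait.estimated_read_gb <= 10 and portrait.join_count <= 2:
        return "presto"      # small, interactive queries
    if portrait.estimated_read_gb <= 500:
        return "spark_sql"   # medium-sized or join-heavy queries
    return "tez"             # very large batch scans
```

In practice the rules would presumably be tuned from observed runtimes per engine; the fixed thresholds only show where the portrait fields enter the decision.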

S104, outputting the query result, the log, the resource utilization rate and the task information returned by the task executed by the computing engine.

After executing the query task, the computing engine returns the query result, the log, the resource utilization rate and the task information. The query result is output and its data can be downloaded. The download magnitude allowed for each individual is set in advance by an administrator, and when the downloaded data is detected to contain sensitive data, it is automatically encrypted or a data security approval flow is triggered. The log of the current query can be output in real time, and the user can also view historical logs. Selecting a different query tab displays the corresponding query result, and error logs support user-defined explanations, documents or knowledge guidance. While the query task is executing, the resource utilization rate and the task information can be output in real time. The resource utilization rate can be displayed and controlled at fine granularity along one or more dimensions such as department, queue, person and task, and the task information includes the SQL details of the task, the task state information and the like. Based on the resource utilization rate, the user can take corresponding control actions such as terminating or removing a task.
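The download check described above (a per-user magnitude limit plus sensitive data detection) might look like the following sketch. The detection patterns, the row-count limit and the function names are assumptions made for illustration. Only the approval branch is modelled; the encryption alternative mentioned in the text is not shown.

```python
import re

# Hypothetical patterns: an 11-digit mobile number and an email address.
SENSITIVE_PATTERNS = {
    "phone": re.compile(r"\b1[3-9]\d{9}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def find_sensitive_columns(rows):
    """Return the sorted names of columns whose values match any sensitive pattern."""
    hits = set()
    for row in rows:
        for column, value in row.items():
            if any(p.search(str(value)) for p in SENSITIVE_PATTERNS.values()):
                hits.add(column)
    return sorted(hits)


def prepare_download(rows, user_row_limit):
    """Decide how a download request is handled.

    Returns (action, payload) where action is "rejected",
    "approval_required" or "ok".
    """
    if len(rows) > user_row_limit:
        return "rejected", f"{len(rows)} rows exceeds the limit of {user_row_limit}"
    sensitive = find_sensitive_columns(rows)
    if sensitive:
        return "approval_required", sensitive
    return "ok", rows
```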

In the first embodiment of the present invention, the big data query method may further include the steps of:

S105, receiving user input and setting functions, wherein the functions comprise one or more of surprise red packet, system notification, large-magnitude data upload and download, multi-tab query, or unified processing of log errors. The surprise red packet function can, with a certain probability, trigger a red packet reward according to how the user's SQL runs. The system notification function allows a user to customize the notification information to be displayed. The large-magnitude data upload and download function supports uploading Excel files directly into Hive tables, supports uploading files in different storage formats (such as jar, txt, Excel and compressed files), and supports downloading large files. Download permissions of different magnitudes can be set for different users, and a sensitive data audit mechanism can be triggered when the downloaded data contains sensitive data. The multi-tab query function supports parallel queries in multiple tabs, so that the user does not need to open multiple window pages. The unified log error processing function uniformly encapsulates and classifies system errors so that complicated stack traces become more user-friendly, and it can be configured to link to the knowledge base of each system service, so that the user can consult help documents at any time.
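As an illustration of the unified log error processing function, the sketch below maps raw engine output to a small, uniform record with a category, a readable message and a help link. The rule table, the `kb://` links and the function name `unify_error` are hypothetical; the matched strings are examples of exception names that Java-based engines commonly print.

```python
import re

# Hypothetical rules: (pattern in raw engine output, category, friendly message, help link).
ERROR_RULES = [
    (re.compile(r"ParseException|SemanticException", re.I), "syntax",
     "The SQL statement could not be parsed; check keywords and column names.",
     "kb://sql-syntax"),
    (re.compile(r"OutOfMemoryError|Container killed", re.I), "resource",
     "The task ran out of memory; reduce the data scanned or add filters.",
     "kb://resource-tuning"),
    (re.compile(r"AccessControlException|Permission denied", re.I), "permission",
     "You do not have access to this data; apply for permission first.",
     "kb://permissions"),
]


def unify_error(raw_log):
    """Map a raw stack trace to a small, uniform error record."""
    for pattern, category, message, link in ERROR_RULES:
        if pattern.search(raw_log):
            return {"category": category, "message": message, "help": link}
    # Unknown errors keep only their first line so the user is not shown a full stack.
    stripped = raw_log.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    return {"category": "unknown", "message": first_line, "help": "kb://general"}
```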

Embodiment two:

As shown in Fig. 2, a second embodiment of the present invention provides a big data query system 200, which includes a permission management module 201, an input module 202, an SQL parsing module 203, an output module 204, and a function setting module 205.

The permission management module 201 is configured to receive a user permission application and assign permissions to the user. The user's permission application includes, but is not limited to, usable query engines, accessible databases and accessible data tables, and the system or an administrator may assign the engine query permission and the database and data table access permissions to the user according to the application. When a permission application is received, the system records the user's database, data table and function permissions, and changes the meta-information after the application is approved. When the user writes SQL to make a query, the meta-information of the SQL is parsed and checked. At the bottom layer, the system stores the link addresses, account numbers, passwords and other connection information of all query engines and accesses it through API calls. This information is configured centrally by an administrator.

The input module 202 is used for receiving the query service selected by the user and the SQL query statement input by the user, wherein the query service is one or more of Hive, ES, Kafka, MongoDB, or Phoenix. Through the input module 202, the big data query system 200 unifies the query entry, integrates various query scenarios, and supports queries against various distributed data storage engines and conventional databases under massive data volumes in a unified, visual SQL query manner.

The SQL parsing module 203 is used for parsing the SQL statement and distributing the task to the computing engine. Specifically, the SQL parsing module 203 may detect an SQL statement that is non-standard or can be optimized, and output optimization guidance information, which may be updated periodically or added manually. The SQL parsing module 203 may apply formatting or user-preset processing to SQL syntax errors or runtime errors, where the preset processing allows error handling information to be preset and supports linking to a knowledge document library or a question-answering robot. The SQL parsing module 203 sets a routing algorithm according to the SQL portrait information and automatically distributes the computing task to the fastest computing engine, so that the user does not need to attend to the underlying computing logic. The SQL portrait information is one or more of the amount of data read, the number of tables/partitions read, the number of Joins of each type, the number of associated fields, the aggregation complexity and the filtering conditions, and the supported computing engines are one or more of Presto, Spark SQL, Druid, MR and TEZ. The SQL parsing module 203 intercepts an SQL statement when it detects a risky operation and/or an operation affecting system stability.
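The interception behaviour of the SQL parsing module can be sketched as a rule check run before a task is distributed. The text does not list which operations count as risky, so the rules below (dropping or truncating tables, unfiltered deletes, cross joins) and the name `intercept` are assumptions chosen for illustration.

```python
import re

# Hypothetical rules; each pairs a reason with a pattern applied to the normalised statement.
RISK_RULES = [
    ("drops a database or table", re.compile(r"^\s*drop\s+(database|table)\b")),
    ("truncates a table", re.compile(r"^\s*truncate\s+table\b")),
    ("deletes without a WHERE clause", re.compile(r"^\s*delete\s+from\s+\S+\s*;?\s*$")),
    ("cross join may explode the result size", re.compile(r"\bcross\s+join\b")),
]


def intercept(sql):
    """Return the reasons a statement should be blocked (an empty list means it may run)."""
    # Lower-case and collapse whitespace so the patterns can stay simple.
    normalised = " ".join(sql.lower().split())
    return [reason for reason, pattern in RISK_RULES if pattern.search(normalised)]
```

Returning reasons rather than a bare boolean lets the interface explain to the user why a statement was blocked.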

The output module 204 is used for outputting logs, query results, the resource utilization rate and task information. The log of the current query can be output in real time, and historical logs can also be viewed. Error logs support custom explanations, documents or knowledge guidance. The data of the query result can be downloaded, the download magnitude is set in advance by an administrator, and when the downloaded data contains sensitive data, the data can be automatically encrypted or a data security approval flow can be triggered. The output module 204 may also output the resource utilization rate and task information while the current task is executing. The resource utilization rate can be displayed and controlled at fine granularity along one or more dimensions such as department, queue, person and task, and the task information includes the SQL details of the task, the task state information and the like. Based on the resource utilization rate, the user can take corresponding control actions such as terminating or removing a task.
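Displaying resource usage along several dimensions amounts to grouping per-task records by a chosen key. The sketch below assumes each task report carries a hypothetical `vcore_seconds` figure along with its user, department and queue. The record layout and function names are illustrative, not taken from the text.

```python
from collections import defaultdict

# Hypothetical task records as the computing engines might report them.
TASKS = [
    {"task": "t1", "user": "alice", "department": "bi", "queue": "adhoc", "vcore_seconds": 120},
    {"task": "t2", "user": "bob", "department": "bi", "queue": "etl", "vcore_seconds": 300},
    {"task": "t3", "user": "alice", "department": "bi", "queue": "adhoc", "vcore_seconds": 80},
    {"task": "t4", "user": "carol", "department": "ops", "queue": "adhoc", "vcore_seconds": 500},
]


def usage_by(tasks, dimension):
    """Sum resource usage per value of one dimension (department, queue, user or task)."""
    totals = defaultdict(int)
    for record in tasks:
        totals[record[dimension]] += record["vcore_seconds"]
    return dict(totals)


def tasks_over_budget(tasks, budget):
    """List task ids whose usage exceeds the budget, as candidates for termination."""
    return [record["task"] for record in tasks if record["vcore_seconds"] > budget]
```

With the sample records, `usage_by(TASKS, "queue")` gives `{"adhoc": 700, "etl": 300}`, and `tasks_over_budget(TASKS, 250)` lists `t2` and `t4` as the tasks a user might choose to terminate.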

The function setting module 205 is configured to receive user input to set functions, where the functions include one or more of surprise red packet, system notification, large-magnitude data upload and download, multi-tab query, and unified processing of log errors. The surprise red packet function can, with a certain probability, trigger a red packet reward according to how the user's SQL runs. The system notification function allows a user to customize the notification information to be displayed. The large-magnitude data upload and download function supports uploading Excel files directly into Hive tables, supports uploading files in different storage formats (such as jar, txt, Excel and compressed files), and supports downloading large files. Download permissions of different magnitudes can be set for different users, and a sensitive data audit mechanism can be triggered when the downloaded data contains sensitive data. The multi-tab query function supports parallel queries in multiple tabs, so that the user does not need to open multiple window pages. The unified log error processing function uniformly encapsulates and classifies system errors so that complicated stack traces become more user-friendly, and it can be configured to link to the knowledge base of each system service, so that the user can consult help documents at any time.
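The surprise red packet trigger can be sketched as a probability that depends on how the query ran. The text only says the reward is triggered "with a certain probability", so the rule below (successful queries qualify, and quick ones get double odds) and the function names are assumptions.

```python
import random


def red_packet_probability(succeeded, runtime_seconds, base=0.05):
    """Hypothetical rule: only successful queries qualify, and quick ones get double odds."""
    if not succeeded:
        return 0.0
    return min(1.0, base * 2) if runtime_seconds < 10 else base


def maybe_award(succeeded, runtime_seconds, draw=None):
    """Report whether a reward is triggered.

    `draw` is a number in [0, 1); when omitted, one is drawn with random.random().
    Passing it explicitly keeps the decision reproducible in tests.
    """
    if draw is None:
        draw = random.random()
    return draw < red_packet_probability(succeeded, runtime_seconds)
```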

The big data query system provided by the second embodiment of the present invention and the big data query method provided by the first embodiment of the present invention belong to the same concept. The specific implementation process is described in detail throughout the specification and is not repeated here.

Embodiment three:

A third embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the big data query method according to the first embodiment of the present invention.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

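1. A big data query method, characterized by comprising the following steps:

receiving a user permission application and assigning permissions to the user;

receiving the query service selected by the user and the input SQL query statement;

parsing the SQL query statement and distributing a task to a computing engine;

and outputting the query result, the log, the resource utilization rate and the task information returned by the task executed by the computing engine.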

2. The method of claim 1, wherein the query service is one or more of Hive, ES, Kafka, MongoDB, or Phoenix.

3. The method of claim 1, wherein the resource utilization rate is displayed and controlled at fine granularity along one or more dimensions of department, queue, person and task.

4. The method of claim 1, wherein parsing the SQL query statement and distributing tasks to a computing engine specifically comprises:

parsing the SQL query statement to generate SQL portrait information, and setting a routing algorithm according to the SQL portrait information to automatically distribute a computing task to the fastest computing engine, wherein the SQL portrait information is one or more of the amount of data read, the number of tables/partitions read, the number of Joins of each type, the number of associated fields, the aggregation complexity and the filtering conditions, and the computing engine is one or more of Presto, Spark SQL, Druid, MR and TEZ.

5. The method of claim 1, wherein parsing the SQL query statement and distributing tasks to a compute engine further comprises:

intercepting the SQL query statement when it is detected to be a risky operation and/or an operation affecting system stability.

6. The method of claim 1, wherein the method further comprises:

when the query result is downloaded, detecting whether the downloaded data contains sensitive data, and if so, encrypting the sensitive data or triggering a data security approval process.

7. The method of claim 1, wherein the method further comprises:

receiving user input and setting functions, wherein the functions comprise one or more of surprise red packet, system notification, large-magnitude data upload and download, multi-tab query, or unified processing of log errors.

8. The method of claim 7, wherein the unified log error processing function uniformly encapsulates and classifies system errors and can be specified to link to a knowledge base of each system service.

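9. A big data query system, the system comprising:

a permission management module, used for receiving a user permission application and assigning permissions to the user;

an input module, used for receiving the query service selected by the user and the SQL query statement input by the user;

an SQL parsing module, used for parsing the SQL query statement and distributing tasks to the computing engine;

an output module, used for outputting a log, a query result, a resource utilization rate and task information returned by the task executed by the computing engine; and

a function setting module, used for receiving user input to set functions.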

10. A computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, carries out the steps of the big data query method according to any one of claims 1 to 8.