CN113064925A - Big data query method, system and computer readable storage medium - Google Patents

Publication number: CN113064925A
Authority: CN (China)
Prior art keywords: query, user, sql, task, data
Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN202110277458.1A
Other languages: Chinese (zh)
Inventor: 李鹏
Current Assignee: Shenzhen Yishi Huolala Technology Co Ltd (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: Shenzhen Yishi Huolala Technology Co Ltd
Priority date: 2021-03-15 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2021-03-15
Publication date: 2021-07-02
Application filed by: Shenzhen Yishi Huolala Technology Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/604 Tools and structures for managing or administering access control systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Mathematical Physics (AREA)
  • Bioethics (AREA)
  • Health & Medical Sciences (AREA)
  • Automation & Control Theory (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a big data query method, a big data query system and a computer-readable storage medium. The method comprises: receiving a user's permission application and assigning permissions to the user; receiving the query service selected by the user and the SQL query statement input by the user; parsing the SQL query statement and distributing a task to a computing engine; and outputting the query result, log, resource utilization and task information returned after the computing engine executes the task. The big data query system lets a user select a query service and query through unified SQL, thereby integrating multiple query scenarios. It displays and controls resource utilization at the finest granularity, detects sensitive data and system risks to keep data secure and the service stable, lets users customize functions, and connects flexibly to other systems to meet varied user requirements.

Description

Big data query method, system and computer readable storage medium
Technical Field
The present invention relates to the field of big data technologies, and in particular, to a big data query method, a big data query system, and a computer-readable storage medium.
Background
As technology develops, big data is applied in more and more fields, and big data query platforms become increasingly important. At present, most big data query systems are built on the open-source tools Hue or Zeppelin, or users query directly from a Linux terminal with a separate client configured for each engine. Existing big data query methods and systems must be adapted to a specific database or engine; they report resource usage in insufficient detail; they usually process error information in a single standardized way that cannot be tailored to different users; and their user permission management is inflexible. As a result, they cannot meet varied working requirements flexibly and efficiently. A big data query method and system that is more flexible and adapts to varied user requirements is therefore needed.
Disclosure of Invention
The invention aims to provide a big data query method, a big data query system and a computer-readable storage medium, in order to solve the problem that prior-art query systems must be adapted to a specific database and engine and cannot adapt to varied user requirements.
In a first aspect, the present invention provides a big data query method, the method comprising:
receiving a user's permission application and assigning permissions to the user;
receiving the query service selected by the user and the SQL query statement input by the user;
parsing the SQL query statement and distributing a task to a computing engine;
and outputting the query result, log, resource utilization and task information returned after the computing engine executes the task.
In a second aspect, the present invention provides a big data query system, comprising:
a permission management module, configured to receive a user's permission application and assign permissions to the user;
an input module, configured to receive the query service selected by the user and the SQL query statement input by the user;
an SQL parsing module, configured to parse the SQL statement and distribute a task to a computing engine;
an output module, configured to output the log, query result, resource utilization and task information returned after the computing engine executes the task; and
a function setting module, configured to receive user input to set functions.
In a third aspect, the present invention provides a computer-readable storage medium, which stores a computer program, wherein the computer program, when executed by a processor, implements the steps of the big data query method as described above.
In the invention, different permissions are assigned to users, and users select a query service and input unified SQL query statements. Multiple query scenarios are thereby integrated, and queries over massive data are supported across various distributed data storage engines and conventional databases. Resource utilization is displayed and controlled at the finest granularity, which gives users a reference for optimization. SQL query statements that involve system risk are intercepted, and sensitive data is detected when query results are downloaded, which ensures data security and service stability. Users can also customize functions such as unified log error handling, custom system notifications, multi-tab query, random red packet rewards, and per-user upload and download limits. The system can therefore interface flexibly with the permission systems, alarms, notifications, red packet services and the like of other systems, and so adapt to varied user requirements.
Drawings
Fig. 1 is a flowchart of a big data query method according to an embodiment of the present invention.
Fig. 2 is a block diagram of a big data query system according to a second embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more clearly apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
Embodiment one:
As shown in Fig. 1, the big data query method provided in the first embodiment of the present invention includes the following steps. Note that, provided substantially the same result is obtained, the method is not limited to the step order shown in Fig. 1.
S101, receiving a user's permission application and assigning permissions to the user.
In the first embodiment of the present invention, the user's permission application covers, but is not limited to, the query engines the user may use and the databases and data tables the user may access. Permissions may be assigned automatically on the basis of the application, granting the user query permission on the engine and access permission on the databases and data tables, or they may be assigned after an administrator approves the application through an approval process. When a permission application is received, the user and the requested database, data table and function permissions are recorded; once the application is approved, the corresponding meta-information is updated. When the user later inputs an SQL statement for a query, the meta-information of the SQL statement is parsed and checked. At the bottom layer, the system stores the connection addresses, accounts, passwords and similar information of all query engines for use through API calls, and this information is configured centrally by an administrator.
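The application, approval and check flow of step S101 could be sketched roughly as follows. This is only an illustration under assumptions: the patent does not specify data structures, so `PermissionStore`, its methods and the sample user and table names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PermissionStore:
    """Hypothetical in-memory record of applications and granted permissions."""
    auto_approve: bool = False
    pending: list = field(default_factory=list)
    granted: set = field(default_factory=set)

    def apply(self, user, engine, database, table):
        """Record an application; grant it at once only if auto-approval is on."""
        request = (user, engine, database, table)
        if self.auto_approve:
            self.granted.add(request)
        else:
            self.pending.append(request)
        return request

    def approve(self, request):
        """Administrator approval: move a pending request into the granted set."""
        self.pending.remove(request)
        self.granted.add(request)

    def is_allowed(self, user, engine, database, table):
        """Check the objects referenced by a parsed SQL statement against grants."""
        return (user, engine, database, table) in self.granted


store = PermissionStore()
req = store.apply("alice", "Hive", "orders_db", "orders")
before = store.is_allowed("alice", "Hive", "orders_db", "orders")
store.approve(req)
after = store.is_allowed("alice", "Hive", "orders_db", "orders")
```

Setting `auto_approve=True` would correspond to the automatic assignment path, while the default models the administrator approval path.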
S102, receiving the query service selected by the user and the SQL query statement input by the user.
In the first embodiment of the present invention, the user selects the query service required, which may be one or more of Hive, ES, Kafka, MongoDB, or Phoenix. After selecting the query service, the user inputs an SQL query statement to run the query. The various query scenarios are integrated behind a single query entry, so the user queries through one unified, visual SQL interface, which supports queries over massive data across various distributed data storage engines and conventional databases.
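A unified query entry of this kind might accept one request shape regardless of the backing service. The sketch below is a hypothetical illustration, not the patented implementation: `QueryRequest`, `build_request` and the validation rules are assumptions, and only the service names come from the text.

```python
from dataclasses import dataclass

# Service names listed in the description; the set itself is an assumption.
SUPPORTED_SERVICES = {"Hive", "ES", "Kafka", "MongoDB", "Phoenix"}


@dataclass(frozen=True)
class QueryRequest:
    user: str
    services: tuple
    sql: str


def build_request(user, services, sql):
    """Validate the selected services and the SQL text, then build one request."""
    unknown = [s for s in services if s not in SUPPORTED_SERVICES]
    if not services or unknown:
        raise ValueError(f"unsupported or missing query service: {unknown}")
    if not sql.strip():
        raise ValueError("SQL query statement is empty")
    return QueryRequest(user=user, services=tuple(services), sql=sql.strip())


request = build_request("alice", ["Hive"], "  SELECT 1  ")
```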
S103, parsing the SQL query statement and distributing the task to a computing engine.
While the SQL query statement input by the user is parsed, if a non-standard SQL statement or an SQL statement that can be optimized is detected, optimization guidance is output to help the user optimize the statement and learn more advanced usage; the guidance can be updated periodically or added manually. When an SQL statement reports a syntax error or a runtime error, the error is either formatted or handled by user-preset processing; preset processing lets the user define the error-handling information in advance and supports links to a knowledge document library or a question-answering bot. Parsing the SQL statement generates SQL profile ("portrait") information, and a routing algorithm set according to this profile automatically distributes the computing task to the fastest computing engine, so the user need not attend to the underlying computing logic. The SQL profile information is one or more of: the amount of data read, the number of tables/partitions read, the number of joins of each type, the number of associated fields, the aggregation complexity, and the filter conditions. The computing engine is one or more of Presto, Spark SQL, Druid, MR (MapReduce) and Tez. In addition, an SQL query statement may be intercepted when it is detected to be a risky operation and/or an operation that affects system stability.
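The patent does not disclose the routing algorithm itself, so the following is only a rule-based sketch of how SQL profile information could drive engine selection. The `SqlProfile` fields mirror some of the profile items named above; the thresholds and the rule order are invented for illustration.

```python
from dataclasses import dataclass

GB = 1024 ** 3


@dataclass
class SqlProfile:
    read_bytes: int    # estimated amount of data read
    partitions: int    # number of tables/partitions read (unused by these rules)
    joins: int         # total number of joins
    aggregations: int  # rough aggregation complexity (unused by these rules)


def choose_engine(profile):
    """Pick an engine from the profile; thresholds are illustrative assumptions."""
    # Small scans with few joins go to an interactive engine.
    if profile.read_bytes <= 10 * GB and profile.joins <= 2:
        return "Presto"
    # Mid-sized or join-heavy queries go to Spark SQL.
    if profile.read_bytes <= 1024 * GB:
        return "Spark SQL"
    # Very large scans fall back to batch execution.
    return "MR"


small = choose_engine(SqlProfile(2 * GB, 4, 1, 1))
join_heavy = choose_engine(SqlProfile(2 * GB, 4, 5, 1))
huge = choose_engine(SqlProfile(5000 * GB, 900, 1, 0))
```

A production router would more plausibly learn or measure which engine is fastest for a given profile; fixed thresholds are used here only to keep the example short.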
S104, outputting the query result, log, resource utilization and task information returned after the computing engine executes the task.
After executing the query task, the computing engine returns the query result, log, resource utilization and task information. The query result is output and its data can be downloaded; the permitted download volume is set in advance by an administrator for each individual, and when the downloaded data is found to contain sensitive data, it is automatically encrypted or a data security approval flow is triggered. The log of the current query can be output in real time, and the user can also view historical logs. Selecting a different query tab displays the corresponding query result, and error logs support user-defined explanations, documents or knowledge guidance. While the query task runs, resource utilization and task information can be output in real time. Resource utilization can be displayed and controlled at fine granularity along one or more dimensions such as department, queue, person and task, and the task information includes the SQL details of the task, task status information and the like. Based on resource utilization, the user can take corresponding control actions such as terminating or removing a task.
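Displaying resource utilization along several dimensions amounts to grouping per-task usage records by the chosen dimension. A minimal sketch follows, assuming each record carries department, queue, person and task labels plus a usage figure; the field name `cpu_core_seconds` and the sample data are hypothetical.

```python
from collections import defaultdict


def usage_by(records, dimension):
    """Sum resource usage over all records sharing the same value of `dimension`."""
    totals = defaultdict(float)
    for record in records:
        totals[record[dimension]] += record["cpu_core_seconds"]
    return dict(totals)


records = [
    {"department": "bi", "queue": "adhoc", "person": "alice", "task": "t1", "cpu_core_seconds": 120.0},
    {"department": "bi", "queue": "etl", "person": "bob", "task": "t2", "cpu_core_seconds": 300.0},
    {"department": "ops", "queue": "adhoc", "person": "carol", "task": "t3", "cpu_core_seconds": 80.0},
]

by_department = usage_by(records, "department")
by_queue = usage_by(records, "queue")
```

The same function serves every dimension named in the text, which is what makes per-task records the finest useful granularity to store.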
In the first embodiment of the present invention, the big data query method may further include the following step:
S105, receiving user input and setting functions, wherein the functions include one or more of surprise red packet, system notification, large-volume data upload and download, multi-tab query, or unified log error handling. The surprise red packet function awards a red packet with a certain probability, depending on how the user's SQL runs. The system notification function lets a user customize the notification information to be displayed. The large-volume upload and download function supports uploading Excel files directly into Hive tables, supports uploading files in various storage formats (such as jar, txt, Excel and compressed files), and supports downloading large files; different users can be given download permissions of different magnitudes, and a sensitive data audit mechanism can be triggered when downloaded data contains sensitive data. The multi-tab query function lets several tabs run queries in parallel, so the user does not need to open multiple window pages. The unified log error handling function wraps and classifies system errors in a uniform way, making complex stack traces easier to read, and can be linked to the knowledge base of each system service so that the user can consult help documents at any time.
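Unified log error handling can be pictured as mapping raw stack traces to a category, a readable message and a help link. The rules below are a hypothetical sketch: the regular expressions, category names, messages and `kb/...` paths are assumptions made for illustration, not content of the patent.

```python
import re

# (pattern, category, readable message, knowledge-base link); all illustrative.
ERROR_RULES = [
    (re.compile(r"ParseException|SemanticException"), "SQL_SYNTAX",
     "The statement could not be parsed; check the SQL syntax.", "kb/sql-syntax"),
    (re.compile(r"OutOfMemoryError|Container killed"), "RESOURCE",
     "The task ran out of resources; reduce the data read or add filters.", "kb/resource-limits"),
    (re.compile(r"AccessControlException|Permission denied"), "PERMISSION",
     "You lack permission on an object in the query; apply for access.", "kb/permissions"),
]


def unify_error(raw_log):
    """Return a uniform error record for the first rule that matches the raw log."""
    for pattern, category, message, link in ERROR_RULES:
        if pattern.search(raw_log):
            return {"category": category, "message": message, "help": link}
    return {"category": "UNKNOWN",
            "message": "Unclassified error; see the raw log.",
            "help": "kb/general"}


syntax = unify_error("org.apache.hadoop.hive.ql.parse.ParseException: line 1:7 cannot recognize input")
other = unify_error("java.net.SocketTimeoutException: Read timed out")
```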
Embodiment two:
As shown in Fig. 2, the second embodiment of the present invention provides a big data query system 200, which includes a permission management module 201, an input module 202, an SQL parsing module 203, an output module 204, and a function setting module 205.
The permission management module 201 is configured to receive a user's permission application and assign permissions to the user. The application covers, but is not limited to, the query engines the user may use and the databases and data tables the user may access; according to the application, the system or an administrator may grant the user query permission on the engine and access permission on the databases and data tables. When a permission application is received, the system records the user's requested database, data table and function permissions, and updates the meta-information once the application is approved. When the user writes SQL to run a query, the meta-information of the SQL is parsed and checked. At the bottom layer, the system stores the connection addresses, accounts, passwords and similar information of all query engines for use through API calls, and this information is configured centrally by an administrator.
The input module 202 is configured to receive the query service selected by the user and the SQL query statement input by the user, where the query service is one or more of Hive, ES, Kafka, MongoDB, or Phoenix. Through the input module 202, the big data query system 200 provides a single query entry, integrates the various query scenarios, and supports queries over massive data across various distributed data storage engines and conventional databases in a unified, SQL-based visual manner.
The SQL parsing module 203 is configured to parse the SQL statement and distribute the task to a computing engine. Specifically, the SQL parsing module 203 may check for SQL statements that are non-standard or can be optimized and output optimization guidance, which may be updated periodically or added manually. For SQL syntax errors or runtime errors, the SQL parsing module 203 may apply formatting or user-preset processing, where preset processing lets error-handling information be defined in advance and supports links to a knowledge document library or a question-answering bot. The SQL parsing module 203 sets a routing algorithm according to the SQL profile ("portrait") information and automatically distributes the computing task to the fastest computing engine, so the user need not attend to the underlying computing logic. The SQL profile information is one or more of: the amount of data read, the number of tables/partitions read, the number of joins of each type, the number of associated fields, the aggregation complexity, and the filter conditions. The computing engine is one or more of Presto, Spark SQL, Druid, MR (MapReduce) and Tez. The SQL parsing module 203 intercepts a statement when it detects a risky operation and/or an operation that affects system stability.
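The text does not say which operations count as risky, so the interception step can only be sketched under assumed rules. In the example below, `RISK_RULES` and `intercept` are hypothetical, and the three patterns (dropping objects, truncating tables, cross joins) are illustrative choices rather than the patent's criteria.

```python
import re

# (reason, pattern) pairs; the rule set is an assumption for illustration.
RISK_RULES = [
    ("drops a database or table", re.compile(r"^\s*DROP\s+(DATABASE|TABLE)\b", re.IGNORECASE)),
    ("truncates a table", re.compile(r"^\s*TRUNCATE\s+TABLE\b", re.IGNORECASE)),
    ("cross join may exhaust cluster resources", re.compile(r"\bCROSS\s+JOIN\b", re.IGNORECASE)),
]


def intercept(sql):
    """Return the reason a statement is blocked, or None if it may run."""
    for reason, pattern in RISK_RULES:
        if pattern.search(sql):
            return reason
    return None


blocked = intercept("drop table orders")
allowed = intercept("SELECT id FROM orders WHERE dt = '2021-03-15'")
```

Pattern matching on raw text is the simplest possible check; a parser-based check on the statement's syntax tree would be less prone to false matches inside string literals.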
The output module 204 is configured to output logs, query results, resource utilization and task information. The log of the current query can be output in real time, and historical logs can also be viewed; error logs support custom explanations, documents or knowledge guidance. The data of the query result can be downloaded; the permitted download volume is set in advance by an administrator, and when the downloaded data contains sensitive data, it can be automatically encrypted or a data security approval flow can be triggered. The output module 204 may also output resource utilization and task information while the current task runs. Resource utilization can be displayed and controlled at fine granularity along one or more dimensions such as department, queue, person and task, and the task information includes the SQL details of the task, task status information and the like. Based on resource utilization, the user can take corresponding control actions such as terminating or removing a task.
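The download checks described for the output module could look roughly like the sketch below, which combines a per-user row limit with a scan for sensitive values. Everything specific here is assumed: the function name `check_download`, the two regular expressions, and the three outcome strings.

```python
import re

# Illustrative detectors only; a real system would use its own sensitive-data rules.
SENSITIVE_PATTERNS = {
    "phone": re.compile(r"\b1\d{10}\b"),
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
}


def check_download(rows, row_limit):
    """Decide whether a result set may be downloaded as-is."""
    # Enforce the download volume preset by the administrator.
    if len(rows) > row_limit:
        return "rejected"
    # Any sensitive value routes the download to encryption or approval.
    for row in rows:
        for value in row:
            text = str(value)
            if any(p.search(text) for p in SENSITIVE_PATTERNS.values()):
                return "needs_approval"
    return "allowed"


plain = check_download([("order-1", 3)], row_limit=10)
sensitive = check_download([("alice", "13800138000")], row_limit=10)
too_many = check_download([(i,) for i in range(11)], row_limit=10)
```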
The function setting module 205 is configured to receive user input to set functions, where the functions include one or more of surprise red packet, system notification, large-volume data upload and download, multi-tab query, and unified log error handling. The surprise red packet function awards a red packet with a certain probability, depending on how the user's SQL runs. The system notification function lets a user customize the notification information to be displayed. The large-volume upload and download function supports uploading Excel files directly into Hive tables, supports uploading files in various storage formats (such as jar, txt, Excel and compressed files), and supports downloading large files; different users can be given download permissions of different magnitudes, and a sensitive data audit mechanism can be triggered when downloaded data contains sensitive data. The multi-tab query function lets several tabs run queries in parallel, so the user does not need to open multiple window pages. The unified log error handling function wraps and classifies system errors in a uniform way, making complex stack traces easier to read, and can be linked to the knowledge base of each system service so that the user can consult help documents at any time.
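The surprise red packet function is described only as a probabilistic reward tied to how the user's SQL runs. One possible reading, sketched below, is a random draw made only for queries that succeeded; the function name, the success condition and the probability parameter are all assumptions.

```python
import random


def maybe_red_packet(query_succeeded, probability, rng=random):
    """Award a red packet with the given probability, only for successful queries."""
    if not query_succeeded:
        return False
    # rng.random() returns a float in [0.0, 1.0), so probability 1.0 always wins
    # and probability 0.0 never does.
    return rng.random() < probability


always = maybe_red_packet(True, 1.0)
never = maybe_red_packet(True, 0.0)
failed = maybe_red_packet(False, 1.0)
```

Passing a seeded `random.Random` instance as `rng` makes the draw reproducible in tests.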
The big data query system provided by the second embodiment and the big data query method provided by the first embodiment of the present invention belong to the same concept; the specific implementation is detailed throughout the specification and is not repeated here.
Embodiment three:
The third embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the big data query method of the first embodiment of the present invention.
In the invention, different permissions are assigned to users, and users select a query service and input unified SQL query statements. Multiple query scenarios are thereby integrated, and queries over massive data are supported across various distributed data storage engines and conventional databases. Resource utilization is displayed and controlled at the finest granularity, which gives users a reference for optimization. SQL query statements that involve system risk are intercepted, and sensitive data is detected when query results are downloaded, which ensures data security and service stability. Users can also customize functions such as unified log error handling, custom system notifications, multi-tab query, random red packet rewards, and per-user upload and download limits. The system can therefore interface flexibly with the permission systems, alarms, notifications, red packet services and the like of other systems, and so adapt to varied user requirements.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (10)

1. A big data query method, characterized by comprising the following steps:
receiving a user's permission application and assigning permissions to the user;
receiving the query service selected by the user and the SQL query statement input by the user;
parsing the SQL query statement and distributing a task to a computing engine;
and outputting the query result, log, resource utilization and task information returned after the computing engine executes the task.
2. The method of claim 1, wherein the query service is one or more of Hive, ES, Kafka, MongoDB, or Phoenix.
3. The method of claim 1, wherein the resource utilization is displayed and controlled at fine granularity along one or more of the dimensions of department, queue, person and task.
4. The method of claim 1, wherein parsing the SQL query statement and distributing a task to a computing engine specifically comprises:
parsing the SQL query statement to generate SQL profile ("portrait") information, and setting a routing algorithm according to the SQL profile information to automatically distribute the computing task to the fastest computing engine, wherein the SQL profile information is one or more of the amount of data read, the number of tables/partitions read, the number of joins of each type, the number of associated fields, the aggregation complexity and the filter conditions, and the computing engine is one or more of Presto, Spark SQL, Druid, MR (MapReduce) and Tez.
5. The method of claim 1, wherein parsing the SQL query statement and distributing a task to a computing engine further comprises:
intercepting the SQL query statement when parsing determines that it is a risky operation and/or an operation that affects system stability.
6. The method of claim 1, wherein the method further comprises:
when the query result is downloaded, detecting whether the downloaded data contains sensitive data, and if so, encrypting the sensitive data or triggering a data security approval flow.
7. The method of claim 1, wherein the method further comprises:
receiving user input and setting functions, wherein the functions comprise one or more of surprise red packet, system notification, large-volume data upload and download, multi-tab query, or unified log error handling.
8. The method of claim 7, wherein the unified log error handling function wraps and classifies system errors in a uniform way and is linked to a designated knowledge base of each system service.
9. A big data query system, the system comprising:
a permission management module, configured to receive a user's permission application and assign permissions to the user;
an input module, configured to receive the query service selected by the user and the SQL query statement input by the user;
an SQL parsing module, configured to parse the SQL statement and distribute a task to a computing engine;
an output module, configured to output the log, query result, resource utilization and task information returned after the computing engine executes the task; and
a function setting module, configured to receive user input to set functions.
10. A computer-readable storage medium storing a computer program which, when executed by a processor, carries out the steps of the big data query method according to any one of claims 1 to 8.
CN202110277458.1A 2021-03-15 2021-03-15 Big data query method, system and computer readable storage medium Pending CN113064925A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110277458.1A CN113064925A (en) 2021-03-15 2021-03-15 Big data query method, system and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN113064925A (en) 2021-07-02

Family

ID=76560590

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110277458.1A Pending CN113064925A (en) 2021-03-15 2021-03-15 Big data query method, system and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113064925A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210702