CN113190573A - Data file analysis processing method and device based on SQL-like and electronic equipment - Google Patents

Data file analysis processing method and device based on SQL-like and electronic equipment

Info

Publication number
CN113190573A
Authority
CN
China
Prior art keywords
sql
syntax
operator
operators
data file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110476827.XA
Other languages
Chinese (zh)
Inventor
郑晓旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zuoyebang Education Technology Beijing Co Ltd
Original Assignee
Zuoyebang Education Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zuoyebang Education Technology Beijing Co Ltd filed Critical Zuoyebang Education Technology Beijing Co Ltd
Priority to CN202110476827.XA priority Critical patent/CN113190573A/en
Publication of CN113190573A publication Critical patent/CN113190573A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G06F16/2445 Data retrieval commands; View definitions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of big data analysis and discloses an SQL-like data file analysis processing method, a device and electronic equipment. The method comprises: receiving an SQL-like statement, and parsing and converting the SQL-like statement into a plurality of groups of syntax operators; and calling the data file, with each syntax operator performing operation analysis processing on the data file according to the mutual logical relationship among the operators. With this method, an analyst only needs to input an SQL-like statement: the statement is parsed and converted into a plurality of groups of syntax operators, the data file to be analyzed is loaded into memory, and the groups of syntax operators perform the analysis and calculation and output the final result. By combining SQL-like statements with in-memory computing, the method keeps the learning cost low, reduces the cost of analyzing and counting text data that follows uniform rules, and can be used immediately without prior setup.

Description

Data file analysis processing method and device based on SQL-like and electronic equipment
Technical Field
The invention relates to the technical field of big data analysis, and in particular to an SQL-like data file analysis processing method, a device and an electronic device.
Background
With the rapid development of the internet industry, big data analysis has become widespread, and scenarios that require analyzing data files with a fixed schema arise frequently. A schema is defined as a collection of database entities forming a single namespace, where a namespace is a collection in which the name of each element is unique. Here the schema can be viewed as a container that holds the objects in a database. Fixed-schema data files are typically log files and large Excel-format files; such files need to be analyzed and counted with analysis software (Excel, Hadoop, Presto, etc.) or scripts (combinations of awk and sort in bash).
The prior art has the following disadvantages:
First, the analysis software Excel and bash scripts both process data in host memory and can achieve the analysis and statistics goals, but the learning cost of Excel and the cost of using bash commands are very high.
Second, Hadoop (a distributed system infrastructure), Presto (a distributed SQL query engine) and the like are open source services in the big data field, but their installation and learning costs are high, and the data to be analyzed must first be parsed and stored separately before it can be analyzed and counted, so they cannot be used out of the box.
In view of the above, the present invention is particularly proposed.
Disclosure of Invention
In order to solve the above problems, the present invention proposes a data file analysis processing method based on SQL-like, which includes:
receiving an SQL-like statement, and parsing and converting the SQL-like statement into a plurality of groups of syntax operators;
and calling the data file, with each syntax operator performing operation analysis processing on the data file according to the mutual logical relationship.
As an optional implementation of the present invention, receiving the SQL-like statement and parsing and converting it into a plurality of groups of syntax operators includes:
extracting key tokens from the received SQL-like statement;
segmenting the whole SQL-like statement according to the extracted key tokens, so that the SQL clause corresponding to each key token is split into a Query node;
and performing syntax abstraction on each group of segmented Query nodes to convert them into syntax operators.
As an optional implementation of the present invention, the mutual logical relationship between the syntax operators is determined according to the mutual association of the SQL clauses corresponding to the Query nodes in the SQL-like statement;
and the syntax operators are connected according to the mutual logical relationship among them, converting the whole SQL-like statement into an abstract syntax tree composed of a plurality of groups of syntax operators.
As an optional implementation of the present invention, calling the locally stored data file and controlling each syntax operator to perform operation analysis processing on the data file according to the mutual logical relationship includes:
connecting the registered groups of syntax operators according to the context structure of the abstract syntax tree;
calling the data file and loading it into running memory;
and having each group of syntax operators perform operation, analysis and statistical calculation on the data file according to the connection relationship.
As an optional implementation of the present invention, extracting key tokens from the received SQL-like statement includes:
extracting, from the SQL-like statement, one or a combination of more of the key tokens "SELECT", "FROM", "WHERE", "ORDER BY", "LIMIT", and "COUNT".
As an optional implementation of the present invention, the syntax operators include one or a combination of more of the "Fields" operator, "GroupBy" operator, "OrderBy" operator, "Where" operator, "Count" operator, "Distinct" operator, "Avg" operator, "Sum" operator, "Max" operator, "Min" operator, "FROM_UNIXTIME" operator, and "UNIX_TIMESTAMP" operator.
The invention also provides an SQL-like data file analysis processing device, which comprises:
a syntax operator conversion module, used for receiving an SQL-like statement and parsing and converting the SQL-like statement into a plurality of groups of syntax operators;
and an operator manager, used for uniformly managing and maintaining the inputs, outputs and calculation methods of all the syntax operators, calling the data file stored on the local machine, and controlling the syntax operators to perform operation analysis processing on the data file according to the mutual logical relationship.
As an optional implementation of the present invention, the syntax operator conversion module includes:
a token extractor, used for extracting key tokens from the received SQL-like statement;
a Query segmenter, used for segmenting the whole SQL-like statement according to the extracted key tokens, so that the SQL clause corresponding to each key token is split into a Query node;
and an abstract syntax tree analyzer, used for performing syntax abstraction on each group of segmented Query nodes to convert them into syntax operators, determining the mutual logical relationship among the syntax operators according to the mutual association of the SQL clauses corresponding to the Query nodes in the SQL-like statement, connecting the syntax operators according to that logical relationship, and converting the whole SQL-like statement into an abstract syntax tree composed of a plurality of groups of syntax operators.
The present invention also provides an electronic device comprising a processor and a memory, the memory being used for storing a computer executable program;
when the computer executable program is executed by the processor, the processor carries out the SQL-like data file analysis processing method.
The invention also provides a computer readable storage medium storing a computer executable program which, when executed, implements the SQL-like data file analysis processing method.
Compared with the prior art, the invention has the beneficial effects that:
With the SQL-like data file analysis processing method, an analyst only needs to input an SQL-like statement: the statement is parsed and converted into a plurality of groups of syntax operators, the data file to be analyzed is loaded into memory, and the groups of syntax operators perform the analysis and calculation and output the final result. By combining SQL-like statements with in-memory computing, the method keeps the learning cost low, reduces the cost of analyzing and counting text data that follows uniform rules, and can be used immediately without prior setup.
Description of the drawings:
FIG. 1 is a block flow diagram of a data file analysis processing method based on SQL-like;
FIG. 2 is a block diagram example of the flow of step S120 in the SQL-like data file analyzing and processing method according to the present invention;
FIG. 3 is a block diagram example of the flow of step S220 in the SQL-like data file analyzing and processing method according to the present invention;
FIG. 4 is a block diagram of the structure and an example of the workflow of the SQL-like data file analyzing and processing device according to the present invention;
FIG. 5 is an example of the SQL-like syntax of the present invention;
FIG. 6 is an example of conditional operators in an SQL-like statement of the present invention;
FIG. 7 is an example of an aggregation function in an SQL-like statement of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments.
Thus, the following detailed description of the embodiments of the invention is not intended to limit the scope of the invention as claimed, but is merely representative of some embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the embodiments of the present invention and the features and technical solutions thereof may be combined with each other without conflict.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the description of the present invention, it should be noted that the terms "upper", "lower", and the like refer to orientations or positional relationships based on those shown in the drawings, or orientations or positional relationships that are conventionally arranged when the products of the present invention are used, or orientations or positional relationships that are conventionally understood by those skilled in the art, and such terms are used for convenience of description and simplification of the description, and do not refer to or imply that the devices or elements referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and the like are used merely to distinguish one description from another, and are not to be construed as indicating or implying relative importance.
Referring to fig. 1, the present embodiment provides a data file analysis processing method based on SQL-like, including:
S110, receiving an SQL-like statement;
S120, parsing and converting the SQL-like statement into a plurality of groups of syntax operators;
S210, calling a data file;
S220, performing, by the syntax operators, operation analysis processing on the data file according to the mutual logical relationship.
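Steps S110 to S220 can be pictured with the following minimal Python sketch. It is an illustration under stated assumptions, not the implementation described here: the tiny grammar (`select @i, @j from <file> [limit n]`), the helper names `fields_op`, `limit_op`, `parse_statement` and `run`, and the in-memory `demo_files` dictionary standing in for files on disk are all hypothetical.

```python
import re
from typing import Callable, Dict, List, Tuple

Row = List[str]
Operator = Callable[[List[Row]], List[Row]]


def fields_op(columns: List[int]) -> Operator:
    """Hypothetical 'Fields' operator: keep the given 1-based columns."""
    return lambda rows: [[row[c - 1] for c in columns] for row in rows]


def limit_op(n: int) -> Operator:
    """Hypothetical 'Limit' operator: keep only the first n rows."""
    return lambda rows: rows[:n]


def parse_statement(statement: str) -> Tuple[str, List[Operator]]:
    """S120: convert 'select @i, @j from <file> [limit n]' into operators."""
    match = re.fullmatch(
        r"\s*select\s+(.+?)\s+from\s+(\S+)(?:\s+limit\s+(\d+))?\s*",
        statement,
        flags=re.IGNORECASE,
    )
    if match is None:
        raise ValueError(f"unsupported statement: {statement!r}")
    columns = [int(col.strip().lstrip("@")) for col in match.group(1).split(",")]
    operators = [fields_op(columns)]
    if match.group(3) is not None:
        operators.append(limit_op(int(match.group(3))))
    return match.group(2), operators


def run(statement: str, files: Dict[str, str]) -> List[Row]:
    source, operators = parse_statement(statement)                # S110 + S120
    rows = [line.split() for line in files[source].splitlines()]  # S210: load into memory
    for op in operators:                                          # S220: apply operators in order
        rows = op(rows)
    return rows


# In-memory stand-in for a whitespace-delimited data file (assumption).
demo_files = {"demo.log": "a 1 x\nb 2 y\nc 3 z"}
result = run("select @1, @3 from demo.log limit 2", demo_files)
print(result)
```

The point of the sketch is the separation of concerns: parsing produces a list of operators, and execution is just feeding the loaded rows through that list.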
SQL (Structured Query Language) is the core language of databases. It is a standard computer language for accessing and processing relational databases, and it is a high-level non-procedural programming language. It is powerful, efficient, simple, and easy to learn and maintain. The SQL language is largely independent of the database itself and of the machine, network, and operating system in use.
SQL is divided into four major classes: the data query language DQL, the data manipulation language DML, the data definition language DDL, and the data control language DCL:
1. Data query language DQL
The basic structure of the data query language DQL is a query block consisting of a SELECT clause, a FROM clause and a WHERE clause:
SELECT <field list>
FROM <table or view name>
WHERE <query condition>
2. Data manipulation language DML
The data manipulation language DML has three main forms:
1) Insert: INSERT
2) Update: UPDATE
3) Delete: DELETE
3. Data definition language DDL
The data definition language DDL is used to create the various objects in a database, such as tables, views, indexes, synonyms and clusters:
CREATE TABLE / VIEW / INDEX / SYN / CLUSTER
(table / view / index / synonym / cluster, respectively)
4. Data control language DCL
The data control language DCL is used to grant or revoke privileges for accessing the database, to control when database manipulation transactions take place and what effect they have, to monitor the database, and so on. For example:
1) GRANT: grant privileges.
2) ROLLBACK [WORK] TO [SAVEPOINT]: roll back to a given point.
ROLLBACK is the command that rolls the database state back to the last committed state. The format is: SQL> ROLLBACK.
3) COMMIT [WORK]: commit.
In database insert, delete, and modify operations, a transaction is complete only once it has been committed to the database. Before the transaction is committed, only the person operating on the database can see what has been done; others can see it only after the commit completes.
There are three types of commit: explicit commit, implicit commit, and automatic commit. They are described separately below.
(1) Explicit commit
A commit performed directly with the COMMIT command is an explicit commit. The format is: SQL> COMMIT.
(2) Implicit commit
A commit performed indirectly by an SQL command is an implicit commit. These commands are: ALTER, AUDIT, COMMENT, CONNECT, CREATE, DISCONNECT, DROP, EXIT, GRANT, NOAUDIT, QUIT, REVOKE, RENAME.
(3) Automatic commit
If AUTOCOMMIT is set to ON, the system commits automatically after each insert, modify, or delete statement is executed; this is automatic commit. The format is: SQL> SET AUTOCOMMIT ON.
The SQL-like statement in this embodiment belongs to the data query language DQL.
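The difference between an uncommitted change that is rolled back and an explicit commit can be tried out with a small Python sketch. It uses SQLite through Python's standard `sqlite3` module rather than the `SQL>` command prompt shown above, so it illustrates the concept only; the table `scores` and its rows are made up for the example.

```python
import sqlite3

# In-memory SQLite database; table name and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.commit()

# An INSERT opens a transaction; rolling back discards the uncommitted row.
conn.execute("INSERT INTO scores VALUES ('alice', 90)")
conn.rollback()
after_rollback = conn.execute("SELECT COUNT(*) FROM scores").fetchone()[0]

# An explicit commit makes the inserted row durable.
conn.execute("INSERT INTO scores VALUES ('bob', 75)")
conn.commit()
after_commit = conn.execute("SELECT COUNT(*) FROM scores").fetchone()[0]

# A DQL query block: SELECT ... FROM ... WHERE ...
passed = conn.execute("SELECT name FROM scores WHERE score > 60").fetchall()
print(after_rollback, after_commit, passed)
```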
The data file of this embodiment stores text data that follows uniform rules, such as a log file with a fixed schema or a large Excel-format file. A log file is a record file or set of files that records system operation events, and can be divided into event logs and message logs; the SQL-like data file analysis processing method processes the historical data in log files to support important functions such as tracing and diagnosing problems and understanding system activity. A large Excel-format file is a spreadsheet file; a spreadsheet is a computer program that simulates a paper calculation table, displaying a grid of rows and columns whose cells can store numerical values, formulas, or text. With the SQL-like data file analysis processing method, data can be extracted from or computed over a large Excel-format file by entering SQL-like statements, so a data analyst can analyze the file using the SQL-like statements they already know, without learning Excel operations, which greatly reduces the analyst's learning and usage costs.
The operation analysis processing of the data file includes extracting designated data from the data file, and/or performing mathematical operations (+, -) on the data at designated positions (columns, rows) in the data file, and/or performing mathematical comparison operations (>, <) on the data at designated positions (columns, rows) in the data file.
Because the method is based on SQL-like statements, which are widely known, its learning cost is low.
The operator grammar of this embodiment is a grammar that operates by means of operators, and is similar to dependency grammar, categorial grammar, and predicate calculus in mathematical logic. An operator (called "Operation" in English) can be interpreted mathematically as a mapping from a function space to a function space, O: X -> X. In a broad sense it is a processing unit, usually a function: using an operator typically involves an input and an output, and the operator performs the corresponding data conversion. Commonly used operators include: SELECT: selects data from a DataSet/DataStream, screening out certain columns; WHERE: filters data from a data set/stream and is used together with SELECT to partition a relation horizontally according to a condition, i.e. to select the records that satisfy it; DISTINCT: deduplicates the data set/stream according to the result of SELECT; GROUP BY: groups data, for example to calculate the total score of each student in a score list; UNION and UNION ALL: combine two result sets, which requires that the fields of the two result sets be completely consistent, including field types and field order; unlike UNION ALL, UNION deduplicates the result data; JOIN: combines data from two tables to form a result table.
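The view of an operator as a mapping O: X -> X can be sketched in Python by treating each operator as a function from a list of rows to a list of rows, so that operators compose freely. The names `where`, `distinct`, `group_by_sum` and `compose` below are hypothetical helpers written for this illustration; they are not taken from the described system.

```python
from functools import reduce
from typing import Callable, Dict, List, Tuple

Row = Tuple[str, int]
Operator = Callable[[List[Row]], List[Row]]


def where(predicate: Callable[[Row], bool]) -> Operator:
    """WHERE: keep only the rows that satisfy the predicate."""
    return lambda rows: [row for row in rows if predicate(row)]


def distinct(rows: List[Row]) -> List[Row]:
    """DISTINCT: drop duplicate rows, keeping first-seen order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


def group_by_sum(rows: List[Row]) -> List[Row]:
    """GROUP BY name with SUM(score): total score per student."""
    totals: Dict[str, int] = {}
    for name, score in rows:
        totals[name] = totals.get(name, 0) + score
    return list(totals.items())


def compose(*operators: Operator) -> Operator:
    """Chain operators left to right; the result is itself an operator."""
    return lambda rows: reduce(lambda acc, op: op(acc), operators, rows)


scores = [("alice", 90), ("bob", 55), ("alice", 90), ("bob", 70)]
passing_unique = compose(where(lambda row: row[1] >= 60), distinct)(scores)
totals = group_by_sum(scores)
print(passing_unique, totals)
```

Because every operator has the same input and output type, the composed pipeline is again an operator, which is what makes it possible to connect operators according to a tree or chain structure.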
In the operator grammar of this embodiment, the operators are applied to the data file that has been called, and the result of the operators' operation analysis on the data file is obtained directly.
In the SQL-like data file analysis processing method of this embodiment, an analyst only needs to input an SQL-like statement: the statement is parsed and converted into a plurality of groups of syntax operators, the data file to be analyzed is loaded into memory, and the groups of syntax operators perform the analysis and calculation and output the final result. By combining SQL-like statements with in-memory computing, the method keeps the learning cost low, reduces the cost of analyzing and counting text data that follows uniform rules, and can be used immediately without prior setup.
Further, referring to fig. 2, in the SQL-like data file analysis processing method of this embodiment, receiving the SQL-like statement and parsing and converting it into a plurality of groups of syntax operators includes:
S121, extracting key tokens from the received SQL-like statement;
S122, segmenting the whole SQL-like statement according to the extracted key tokens, so that the SQL clause corresponding to each key token is split into a Query node.
In this embodiment, extracting key tokens from the SQL-like statement includes extracting the key token "SELECT" from the SELECT clause, and/or "FROM" from the FROM clause, and/or "WHERE" from the WHERE clause, and/or "ORDER BY" from the ORDER BY clause, and/or "LIMIT" from the LIMIT clause, and/or "COUNT" from the COUNT clause.
The SQL SELECT clause of this embodiment is used to select data from a table, with the results stored in a result table (called a result set); the SQL WHERE clause is used to define the selection criteria; the ORDER BY clause is used to sort the result set.
In this embodiment, the SQL-like data file analysis processing method first extracts the key tokens in the SQL-like statement; as shown in fig. 5, for example, the key tokens contained in an SQL-like statement are SELECT/FROM/WHERE/GROUP BY/ORDER BY/LIMIT. It then cuts the whole SQL-like statement into the corresponding SQL clauses according to the key tokens SELECT/FROM/WHERE/ORDER BY/LIMIT/COUNT.
Syntax abstraction is then performed on each group of segmented Query nodes to convert them into syntax operators. Syntax abstraction is the conversion of SQL-like source code into an abstract syntax structure. "Query" in this embodiment means a query: a message sent to a search engine or database in order to find a specific file, website, record, or series of records in a database.
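Key token extraction and clause segmentation (S121 and S122) can be sketched with a regular expression that locates the key tokens and cuts the statement at each one. This is a simplified illustration, not the tokenizer of the embodiment: the function name `split_clauses` is hypothetical, the `where @9 = 200` condition syntax in the sample statement is an assumption, and keywords appearing inside quoted strings are not handled.

```python
import re
from typing import Dict

# Key tokens from the text above; multi-word tokens may contain any whitespace.
KEY_TOKENS = ["SELECT", "FROM", "WHERE", "GROUP BY", "ORDER BY", "LIMIT"]
_TOKEN_PATTERN = re.compile(
    r"\b(" + "|".join(token.replace(" ", r"\s+") for token in KEY_TOKENS) + r")\b",
    re.IGNORECASE,
)


def split_clauses(statement: str) -> Dict[str, str]:
    """Cut a statement into clauses, one per key token (one Query node each)."""
    matches = list(_TOKEN_PATTERN.finditer(statement))
    clauses: Dict[str, str] = {}
    for index, match in enumerate(matches):
        # A clause runs from the end of its key token to the next key token.
        end = matches[index + 1].start() if index + 1 < len(matches) else len(statement)
        key = " ".join(match.group(1).upper().split())
        clauses[key] = statement[match.end():end].strip()
    return clauses


clauses = split_clauses(
    "select @1 as ip, @7 as uri from test/access.log where @9 = 200 order by @1 limit 4"
)
for key, body in clauses.items():
    print(f"{key}: {body}")
```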
Referring to fig. 2, the SQL-like data file analysis processing method of this embodiment further includes:
S123, determining the logical relationship between the syntax operators according to the mutual association of the SQL clauses corresponding to the Query nodes in the SQL-like statement;
S124, connecting the syntax operators according to the mutual logical relationship among them, and converting the whole SQL-like statement into an abstract syntax tree composed of a plurality of groups of syntax operators.
Because the SQL clauses split out of the SQL-like statement are related to one another (for example, a WHERE clause embedded in a SELECT clause, or a WHERE clause embedded in a COUNT clause), the syntax operators obtained by syntax abstraction have logical relationships of sequential execution or of operating on one another's results. The syntax operators are connected according to these logical relationships, which fixes the execution order and the result dependencies among them.
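One simple way to picture how operators are connected is a chain in which each operator node consumes the output of its child. The sketch below is an assumption-laden illustration: the `OpNode` class, the `build_tree` and `describe` helpers, and in particular the `EXECUTION_ORDER` list (the conventional logical evaluation order of SQL clauses) are not specified in the text and are chosen here only for demonstration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class OpNode:
    """One syntax operator; `child` produces the rows this operator consumes."""
    name: str
    argument: str
    child: Optional["OpNode"] = None


# Assumed execution order, innermost (first to run) to outermost (last to run).
EXECUTION_ORDER = ["FROM", "WHERE", "GROUP BY", "SELECT", "ORDER BY", "LIMIT"]


def build_tree(clauses: Dict[str, str]) -> Optional[OpNode]:
    """Connect the operators so that each one wraps the operator that runs before it."""
    root: Optional[OpNode] = None
    for key in EXECUTION_ORDER:
        if key in clauses:
            root = OpNode(key, clauses[key], child=root)
    return root


def describe(node: Optional[OpNode]) -> List[str]:
    """List operator names from the root (runs last) down to the leaf (runs first)."""
    names = []
    while node is not None:
        names.append(node.name)
        node = node.child
    return names


tree = build_tree({"SELECT": "@1", "FROM": "test/access.log", "WHERE": "@9 = 200", "LIMIT": "4"})
print(describe(tree))
```

A real abstract syntax tree would also branch, for instance when an aggregate such as COUNT wraps an inner expression; the linear chain is the simplest case of the same idea.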
In computer science, an abstract syntax tree (AST), or syntax tree, is a tree representation of the abstract syntactic structure of source code written in a programming language. Each node of the tree denotes a construct that appears in the source code. The syntax is "abstract" in the sense that it does not represent every detail that appears in the real syntax, but only the structural or content-related details. For example, grouping parentheses are implicit in the tree structure, and a syntactic construct such as an if-condition-then expression may be represented by a single node with three branches.
This distinguishes abstract syntax trees from concrete syntax trees, traditionally called parse trees, which are typically built by a parser during source code translation and compilation. Once built, additional information is added to the AST by subsequent processing (e.g., contextual analysis).
The parsing stage converts the token stream into an abstract syntax tree (AST), using the information in the tokens to build the tree structure.
Each level of the AST is made up of nodes. An AST may consist of a single node or of hundreds or thousands of nodes, which together describe the program syntax for static analysis. (Static analysis is the process of analyzing code without executing it; analyzing code while it executes is dynamic analysis.)
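Two properties mentioned above, that grouping parentheses are implicit in the tree and that a conditional can be a single node with three branches, can be observed directly with Python's standard `ast` module. The example parses Python expressions rather than SQL-like statements, so it only illustrates the general AST concept.

```python
import ast

# Parse "(1 + 2) * 3": the parentheses do not appear as a node; they only
# determine that the addition becomes the left child of the multiplication.
root = ast.parse("(1 + 2) * 3", mode="eval").body
print(type(root).__name__, type(root.op).__name__)            # outer node and its operator
print(type(root.left).__name__, type(root.left.op).__name__)  # inner node and its operator

# A conditional expression is a single node with three branches.
conditional = ast.parse("a if cond else b", mode="eval").body
print(type(conditional).__name__, conditional._fields)
```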
Referring to fig. 3, in the SQL-like data file analysis processing method of this embodiment, calling the locally stored data file and controlling each syntax operator to perform operation analysis processing on the data file according to the mutual logical relationship includes:
S221, connecting the registered groups of syntax operators according to the context structure of the abstract syntax tree;
S222, calling the data file and loading it into running memory;
S223, having each group of syntax operators perform operation, analysis and statistical calculation on the data file according to the connection relationship.
Regarding S221 of this embodiment: the operator (op) and the kernel are the two most important concepts in the TF (TensorFlow) framework. By analogy, an op is equivalent to a function declaration and a kernel to a function implementation. For example, for matrix multiplication, an op called MatMul may be declared, specifying its name, inputs, outputs, parameters, and the constraints on the parameters. An op only states what the operation is for and what can be customized in it; it does not provide a concrete implementation. The concrete implementation of an op on a given device is determined by its kernel. The computation graph of TF is composed of nodes, each corresponding to one op; when the graph is constructed, only the operation corresponding to each node is known, not how that operation will be carried out at runtime. That is, ops are a compile-time concept and kernels are a run-time concept. Therefore, an operator must be registered with the system before it can be called and executed.
In this embodiment, having each group of syntax operators perform operation, analysis and statistical calculation on the data file according to the connection relationship works as follows: because the connection relationship between the syntax operators corresponds to the mutual association between the SQL clauses split out of the SQL-like statement, each syntax operator operates on the data file, the results of the individual operators are combined according to the connection relationship between them, and finally the result of analyzing the data file with the SQL-like statement is obtained and displayed.
Because the SQL-like statement of this embodiment contains a plurality of SQL clauses with logical relationships between them (for example, if data needs to be selected from a table conditionally, a WHERE clause can be added to the SELECT clause), the whole SQL-like statement is converted into an abstract syntax tree that represents the relationships between the groups of syntax operators corresponding to the SQL clauses.
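The idea that an operator must be registered before it can be called can be sketched with a small registry in Python. The registry `OPERATORS`, the `register` decorator and the `call` function are hypothetical names for this illustration; neither TensorFlow's registration mechanism nor the operator manager of this embodiment is reproduced here.

```python
from typing import Callable, Dict, Sequence

# Hypothetical registry mapping operator names to their implementations.
OPERATORS: Dict[str, Callable[[Sequence[float]], float]] = {}


def register(name: str):
    """Decorator that records an implementation under an operator name."""
    def decorator(implementation):
        OPERATORS[name] = implementation
        return implementation
    return decorator


@register("Count")
def count_op(values: Sequence[float]) -> float:
    return len(values)


@register("Sum")
def sum_op(values: Sequence[float]) -> float:
    return sum(values)


@register("Max")
def max_op(values: Sequence[float]) -> float:
    return max(values)


def call(name: str, values: Sequence[float]) -> float:
    """Look the operator up by name; unregistered operators cannot be executed."""
    if name not in OPERATORS:
        raise KeyError(f"operator {name!r} is not registered")
    return OPERATORS[name](values)


print(call("Count", [3, 5, 8]), call("Sum", [3, 5, 8]), call("Max", [3, 5, 8]))
```

Separating the name (what the operation is) from the registered function (how it is carried out) mirrors the declaration/implementation analogy drawn above.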
The syntax operators described in this embodiment include one or a combination of more of the "Fields" operator, "GroupBy" operator, "OrderBy" operator, "Where" operator, "Count" operator, "Distinct" operator, "Avg" operator, "Sum" operator, "Max" operator, "Min" operator, "FROM_UNIXTIME" operator, and "UNIX_TIMESTAMP" operator.
An example of the SQL-like data file analysis processing method of this embodiment is as follows:
Nginx access.log demo:
1.1.1.1--[03/Jul/2020:17:41:29+0800]"GET/ttt/l/11lId=11HTTP/1.0"2002290"https://dddd.test.cc/ssss/view/ttt/task/contact""S=4aebaa3f5d964a8e0930059af1898""Mozilla/5.0(Windows NT 10.0;Win64;x64)AppleWebKit/537.36(KHTML,like Gecko)Chrome/79.0.3945.88Safari/537.36"0.059 2489926140 21.10.134.321.10.128.156unix:/hhhh/hhhhwork/var/php-cgi.sock dddd.test.cc"1.1.1.1"hhhhwork question 24899261402625677504070317 1593769289.982 0.786
View access timestamps:
fql -s "select @1 as ip, UNIX_TIMESTAMP(@4::'[%d/%em/%y:%h:%i:%s') as ts, @4, @7 as uri from test/access.log limit 4"
IP|TS|@4|URI
21.10.128.229|1593766881|[03/Jul/2020:17:01:21|/llllcall/b/getrecord
21.10.128.229|1593766881|[03/Jul/2020:17:01:21|/llllcall/b/getrecord
21.10.128.161|1593766881|[03/Jul/2020:17:01:21|/llllcall/b/getrecord
21.10.134.3|1593766881|[03/Jul/2020:17:01:21|/aaasc/b/editnamea=567809&a=106951695&name=%E5%BC%A0%
Count access PV/UV:
fql -s "select count(@1) as PV, count(distinct(@1)) as UV from test/access.log"
PV|UV
77394|150
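The two queries above can be approximated in plain Python to show what the operators compute. This is a sketch under assumptions, not the fql tool itself: the helper names `unix_timestamp` and `pv_uv` are hypothetical, the format string uses Python's `strptime` codes rather than the format syntax in the fql command, the +08:00 zone is taken from the `+0800` offset in the sample log line, month-name parsing assumes an English locale, and the three sample lines are invented.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Tuple

# Assumed time zone: the sample log line carries a +0800 offset.
LOG_TZ = timezone(timedelta(hours=8))


def unix_timestamp(field: str, fmt: str = "[%d/%b/%Y:%H:%M:%S") -> int:
    """Convert a field such as '[03/Jul/2020:17:01:21' to epoch seconds."""
    return int(datetime.strptime(field, fmt).replace(tzinfo=LOG_TZ).timestamp())


def pv_uv(lines: Iterable[str]) -> Tuple[int, int]:
    """PV = count(@1), UV = count(distinct(@1)), with @1 the first field."""
    first_fields = [line.split()[0] for line in lines if line.strip()]
    return len(first_fields), len(set(first_fields))


ts = unix_timestamp("[03/Jul/2020:17:01:21")
# Invented sample lines: only the first whitespace-delimited field matters here.
sample_lines = [
    "21.10.128.229 - - [03/Jul/2020:17:01:21 +0800]",
    "21.10.128.229 - - [03/Jul/2020:17:01:22 +0800]",
    "21.10.134.3 - - [03/Jul/2020:17:01:23 +0800]",
]
pv, uv = pv_uv(sample_lines)
print(ts, pv, uv)
```

With the +08:00 assumption, the timestamp computed for `[03/Jul/2020:17:01:21` agrees with the TS column in the output listed above.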
Fig. 5 shows an example of the SQL-like syntax of this embodiment.
Fig. 6 shows the conditional operators in the SQL-like statement of this embodiment. An operator is a reserved word or character used mainly in the WHERE clause of an SQL statement to perform operations such as comparisons and arithmetic. Operators specify the conditions in SQL statements and join multiple conditions within a statement. Common operators include: arithmetic operators (+, -, *, /, %, and so on); comparison operators (>, <, and the like); logical operators (in, not in, etc.); and operators that negate a condition.
Fig. 7 shows the aggregation functions in the SQL-like statement of this embodiment. When accessing a database, it is often necessary to perform statistical analysis on a column of data in a table, such as finding its maximum, minimum, or average. All such analyses over one or more columns of data in a table are called aggregate analyses. In SQL, aggregate analysis can be carried out quickly with aggregation functions, which perform a computation on a set of values and return a single value.
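Evaluating a condition of the kind found in a WHERE clause can be sketched by mapping operator symbols to functions from Python's standard `operator` module. The table `COMPARISONS` and the function `matches` are hypothetical names, and the set of supported symbols is an assumption; fig. 6 is not reproduced here.

```python
import operator
from typing import Any

# Assumed symbol set; each symbol maps to a two-argument comparison function.
COMPARISONS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "=": operator.eq,
    "!=": operator.ne,
}


def matches(value: Any, symbol: str, operand: Any) -> bool:
    """Evaluate `value <symbol> operand` for comparison and membership operators."""
    if symbol == "in":
        return value in operand
    if symbol == "not in":
        return value not in operand
    return COMPARISONS[symbol](value, operand)


rows = [("/a", 200), ("/b", 404), ("/c", 500)]
errors = [row for row in rows if matches(row[1], "in", {404, 500})]
ok = [row for row in rows if matches(row[1], "=", 200)]
print(errors, ok)
```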
Characteristics of aggregation functions:
1. Except for COUNT(*), aggregation functions ignore null values.
2. Aggregation functions are often used with the GROUP BY clause of the SELECT statement.
3. All aggregation functions are deterministic: whenever they are invoked with a given set of input values, they return the same value.
4. By contrast, a scalar function operates on a single value and returns a single value. Scalar functions fall mainly into four categories: character functions, date/time functions, numeric functions and conversion functions.
Common aggregation functions include:
1. Number of records, number of items, etc.: count().
2. Average of a column: avg().
3. Sum, total score, etc.: sum(). The column must be numeric.
4. Maximum, highest score, highest salary, etc.: max().
5. Minimum, lowest score, lowest salary, etc.: min().
6. count_big() returns the number of items in the specified group.
Unlike count(), which returns an int value, count_big() returns a bigint value.
7. grouping() generates an additional column.
When a row is added by the cube or rollup operator, the output value is 1;
when the added row is not generated by cube or rollup, the output value is 0.
8. binary_checksum() returns the binary checksum computed over a row of a table or over a list of expressions, and is used to detect changes to a row in the table.
9. checksum_agg() returns the checksum of the values in a group; null values are ignored.
10. checksum() returns the checksum computed over a row of a table or over a list of expressions, and is used to build hash indexes.
11. stdev() returns the sample standard deviation of all values in the given expression.
12. stdevp() returns the population standard deviation of all values in the given expression.
13. var() returns the sample variance of all values in the given expression.
14. varp() returns the population variance of all values in the given expression.
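The difference between the sample and population variants of the last four functions can be illustrated with Python's standard `statistics` module. This is only an analogy under the assumption that stdev()/var() divide by n - 1 and stdevp()/varp() divide by n; the data values are made up for illustration.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data, mean = 5

# Sample statistics divide the sum of squared deviations by n - 1.
sample_sd = statistics.stdev(values)        # analogous to stdev()
sample_var = statistics.variance(values)    # analogous to var()

# Population statistics divide by n.
population_sd = statistics.pstdev(values)       # analogous to stdevp()
population_var = statistics.pvariance(values)   # analogous to varp()

print(sample_sd, sample_var, population_sd, population_var)
```

For the same data the population figures are always the smaller of each pair, because the divisor is larger.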
Referring to fig. 4, the present embodiment also provides an apparatus for analyzing and processing a data file based on SQL-like, including:
the syntax operator conversion module, which receives the SQL-like statements and parses and converts them into a plurality of groups of syntax operators;
and the operator manager, which uniformly manages and maintains the input, output and calculation methods of all the syntax operators, calls the data files stored on the local machine, and controls the syntax operators to carry out operation analysis processing on the data files according to their mutual logical relationship.
The operator manager of the embodiment uniformly manages and maintains the input, output and calculation methods of all operators; and the main process manager connects the registered operators according to the context structure of the syntax tree, loads the data text into the memory, and performs multi-dimensional operation, analysis and statistical calculation.
The SQL-like statement in this embodiment is written in a data query language (DQL).
Because SQL-like statements are widely known and most data analysts are already proficient in them, the SQL-like data file analysis processing device has a low learning cost.
The operator syntax of this embodiment is a syntax that operates by means of operators, similar to dependency grammar, categorial grammar and the predicate calculus of mathematical logic. In mathematics, an operator can be interpreted as a mapping from one function space to another, O: X -> X. In this context, an operator is a processing unit, usually a function: it takes an input, converts the data accordingly, and produces an output.
The operator syntax commonly used by the SQL-like data file analysis processing apparatus in this embodiment includes:
SELECT: selects data from a DataSet/DataStream, picking out certain columns;
WHERE: filters data from a data set/stream and is used together with SELECT; it partitions a relation horizontally according to given conditions, i.e. it selects the records that satisfy them;
DISTINCT: removes duplicates from a data set/stream according to the result of SELECT;
GROUP BY: groups data, for example to calculate the total score of each student in a score list;
UNION and UNION ALL: UNION combines two result sets and requires that the fields of the two result sets match exactly, including field types and field order; unlike UNION ALL, UNION removes duplicates from the result;
JOIN: combines data from two tables to form a result table.
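A few of the operators listed above can be sketched as plain Python functions over in-memory rows. The function names, the row representation (a list of dictionaries) and the sample data are assumptions made for illustration, not the embodiment's actual operator implementations.

```python
from collections import defaultdict


def select(rows, columns):
    """SELECT: keep only the named columns of each row."""
    return [{c: row[c] for c in columns} for row in rows]


def where(rows, predicate):
    """WHERE: keep only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def distinct(rows):
    """DISTINCT: drop duplicate rows, preserving first-seen order."""
    seen, out = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


def group_by_sum(rows, key, value):
    """GROUP BY with SUM: total of `value` for each distinct `key`."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


# Hypothetical score list, echoing the "total score of each student" example.
scores = [
    {"student": "a", "subject": "math", "score": 90},
    {"student": "a", "subject": "art", "score": 80},
    {"student": "b", "subject": "math", "score": 70},
]
print(group_by_sum(scores, "student", "score"))  # {'a': 170, 'b': 70}
```

Because each function takes rows and returns rows (or a final value), the operators compose directly, for example `distinct(select(where(scores, ...), ["student"]))`.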
The operator grammar of the embodiment calls the data file to perform data operation according to the operator, and directly obtains the operation analysis result of the operator on the data file.
In the data file analyzing and processing device based on the SQL-like, an analyst only needs to input SQL-like statements. The statements are parsed and converted into a plurality of sets of syntax operators, the data file to be analyzed is loaded into memory, and the sets of syntax operators perform the analysis and calculation and finally output a result. The data file analyzing and processing method based on SQL-like combines SQL-like statements with in-memory computing, which keeps the learning cost low, reduces the cost of analyzing and counting text data by applying unified rules, and is ready for immediate use.
Further, referring to fig. 4, the syntax operator conversion module according to this embodiment includes:
a token extractor, which extracts key tokens from the received SQL-like statements;
the Query segmenter, which segments the whole SQL-like statement according to the extracted key tokens, splitting the SQL clause corresponding to each key token into a Query node;
the abstract syntax tree analyzer, which performs syntax abstraction on each segmented Query node to convert it into a syntax operator, determines the mutual logical relationship between the syntax operators according to the mutual association of the SQL clauses corresponding to the Query nodes in the SQL-like statement, connects the syntax operators according to that logical relationship, and thereby converts the whole SQL-like statement into an abstract syntax tree composed of a plurality of groups of syntax operators.
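The token extraction and segmentation steps described above can be sketched as follows. This is a minimal illustration, not the embodiment's parser: the keyword list, the regular expression and the `segment` function are assumptions, COUNT is left out because in the sample queries it appears inside the SELECT list, and keywords that occur inside quoted strings are not handled.

```python
import re

# Assumption: a subset of the key tokens named in the text.
KEY_TOKENS = ["SELECT", "FROM", "WHERE", "ORDER BY", "LIMIT"]
TOKEN_RE = re.compile(
    r"\b(" + "|".join(t.replace(" ", r"\s+") for t in KEY_TOKENS) + r")\b",
    re.IGNORECASE,
)


def segment(statement):
    """Split an SQL-like statement into (key token, clause body) query nodes."""
    matches = list(TOKEN_RE.finditer(statement))
    nodes = []
    for i, match in enumerate(matches):
        # A clause body runs from the end of its key token to the next key token.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(statement)
        token = " ".join(match.group(1).upper().split())
        nodes.append((token, statement[match.end():end].strip()))
    return nodes


for node in segment("select @1 as ip from test/access.log where @1 != '' limit 4"):
    print(node)
```

Each resulting pair corresponds to one Query node; a later stage would turn each pair into a syntax operator and link the operators into a tree.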
In this embodiment, extracting key tokens from the SQL-like statement comprises extracting the key token "SELECT" in the SELECT clause, and/or the key token "FROM" in the FROM clause, and/or the key token "WHERE" in the WHERE clause, and/or the key token "ORDER BY" in the ORDER BY clause, and/or the key token "LIMIT" in the LIMIT clause, and/or the key token "COUNT" in the COUNT clause.
The SQL SELECT clause of this embodiment selects data from a table, and the results are stored in a result table (called a result set); the WHERE clause specifies the selection criteria; the ORDER BY clause sorts the result set.
A query in this embodiment is a request sent to a search engine or database in order to look up a specific file, website, record or series of records.
In computer science, an Abstract Syntax Tree (AST), or syntax tree, is a tree representation of the abstract syntactic structure of source code written in a programming language. Each node of the tree represents a construct that appears in the source code. The syntax is "abstract" in that it does not represent every detail of the real syntax, only the structural, content-related details. For example, grouping parentheses are implicit in the tree structure, and a construct such as an if-condition-then expression may be represented by a single node with three branches.
This distinguishes abstract syntax trees from concrete syntax trees, traditionally called parse trees, which are typically built by parsers during source code translation and compilation. Once constructed, additional information is added to the AST through subsequent processing (e.g., context analysis).
Abstract syntax trees are also used in program analysis and program transformation systems.
A JavaScript parser, for example, converts JavaScript source code into an abstract syntax tree. Processing is generally divided into lexical analysis, syntax analysis, and code generation or execution.
The syntax analysis stage converts the token stream into an Abstract Syntax Tree (AST), using the information carried by the tokens to arrange them into a tree structure.
Each level of an AST is made up of nodes. An AST may consist of a single node or of hundreds or thousands of nodes, which together describe the syntax of a program for static analysis. (Static analysis analyzes code without executing it; analysis performed while the code is running is dynamic analysis.)
The operator manager of the embodiment connects the registered sets of syntax operators according to the context structure of the abstract syntax tree;
calling a data text and loading the data text into an operating memory;
and each group of syntactic operators performs operation, analysis and statistical calculation on the data file according to the connection relation.
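The three steps above (connecting registered operators, loading the data text into memory, and running the operators in order) can be sketched as a small registry class. Everything here is hypothetical: the `OperatorManager` class, its `register` and `run` methods, the plan format (a flattened list standing in for the syntax tree) and the sample data are assumptions for illustration only.

```python
import os
import tempfile


class OperatorManager:
    """Hypothetical registry that wires registered operators into a pipeline."""

    def __init__(self):
        self._operators = {}

    def register(self, name, func):
        """Register an operator as a function of (rows, argument)."""
        self._operators[name] = func

    def run(self, path, plan):
        # Load the whole data file into memory, one list of fields per line.
        with open(path, encoding="utf-8") as handle:
            rows = [line.split() for line in handle if line.strip()]
        # Apply the operators in plan order; each consumes the previous output.
        result = rows
        for name, arg in plan:
            result = self._operators[name](result, arg)
        return result


manager = OperatorManager()
manager.register("Where", lambda rows, pred: [r for r in rows if pred(r)])
manager.register("Fields", lambda rows, idx: [[r[i] for i in idx] for r in rows])
manager.register("Limit", lambda rows, n: rows[:n])
manager.register("Count", lambda rows, _: len(rows))

# Hypothetical three-line data file, written to a temporary path.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("1.1.1.1 GET /a\n2.2.2.2 GET /b\n1.1.1.1 POST /c\n")

plan = [("Where", lambda r: r[1] == "GET"), ("Fields", [0, 2]), ("Limit", 1)]
print(manager.run(tmp.name, plan))  # [['1.1.1.1', '/a']]
os.remove(tmp.name)
```

Keeping every operator behind the same (rows, argument) signature is what lets the manager connect them without knowing what each one does.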
Because a plurality of SQL clauses are included in the SQL-like statement of this embodiment, and a logical relationship exists between the SQL clauses, for example, if data needs to be conditionally selected from the table, the WHERE clause can be added to the SELECT clause, so that the entire SQL-like statement is converted into an abstract syntax tree, and the relationship between each group of syntax operators corresponding to each SQL clause is represented.
The syntax operators described in this embodiment include one or a combination of more of the "Fields" operator, "GroupBy" operator, "OrderBy" operator, "Where" operator, "Count" operator, "Distinct" operator, "Avg" operator, "Sum" operator, "Max" operator, "Min" operator, "FROM_UNIXTIME" operator, and "UNIX_TIMESTAMP" operator.
The embodiment also provides an electronic device, which includes a processor and a memory, where the memory is used to store a computer executable program, and when the computer program is executed by the processor, the processor executes the SQL-like-based data file analysis processing method.
The electronic device is in the form of a general purpose computing device. The processor can be one or more and can work together. The invention also does not exclude that distributed processing is performed, i.e. the processors may be distributed over different physical devices. The electronic device of the present invention is not limited to a single entity, and may be a sum of a plurality of entity devices.
The memory stores a computer executable program, typically machine readable code. The computer readable program may be executed by the processor to enable an electronic device to perform the method of the invention, or at least some of the steps of the method.
The memory may include volatile memory, such as Random Access Memory (RAM) and/or cache memory, and may also be non-volatile memory, such as read-only memory (ROM).
It should be understood that elements or components not shown in the above examples may also be included in the electronic device of the present invention. For example, some electronic devices further include a display unit such as a display screen, and some electronic devices further include a human-computer interaction element such as a button, a keyboard, and the like. Electronic devices are considered to be covered by the present invention as long as the electronic devices are capable of executing a computer-readable program in a memory to implement the method of the present invention or at least a part of the steps of the method.
The embodiment also provides a computer-readable storage medium, which stores a computer-executable program, and when the computer-executable program is executed, the method for analyzing and processing the data file based on the SQL-like is realized.
The data file analysis processing method based on the SQL-like is implemented as an executable file. The executable file and the data file to be analyzed are each stored in a folder at a specified path on the computer. When a run command is issued on the computer, the executable file is read from its folder and an SQL statement editor pops up; the user can then analyze and process the data file by entering SQL-like statements in that editor.
The executable file analyzes and converts the SQL-like statements input by the user into a plurality of groups of syntax operators;
the executable file calls the data file stored by the local machine according to the folder of the path of the data file written by the SQL-like statement, and controls each syntax operator to carry out operation analysis processing on the data file according to the mutual logical relationship.
The executable file of the embodiment extracts key tokens from the received SQL-like statements;
according to the extracted key tokens, it segments the whole SQL-like statement, splitting the SQL clause corresponding to each key token into a Query node;
and it performs syntax abstraction on each segmented Query node, converting the Query nodes into syntax operators.
The executable file of the embodiment determines the mutual logical relationship between syntax operators according to the mutual association relationship of the SQL clauses corresponding to the Query nodes Query in the SQL-like statements;
and connecting the syntax operators according to the mutual logical relationship among the syntax operators, and converting the whole SQL-like statement into an abstract syntax tree combined by a plurality of groups of syntax operators.
The executable file of the embodiment connects the registered sets of syntax operators according to the context structure of the abstract syntax tree;
calling a data text and loading the data text into an operating memory;
and each group of syntactic operators performs operation, analysis and statistical calculation on the data file according to the connection relation.
The executable file of this embodiment performs key token extraction based on the received SQL-like statements, including:
based on the SQL-like statement, extracting the key token comprises extracting the key token "SELECT" in the SELECT clause, and/or extracting the key token "FROM" in the FROM clause, and/or extracting the key token "WHERE" in the WHERE clause, and/or extracting the key token "ORDER BY" in the ORDER BY clause, and/or extracting the key token "LIMIT" in the LIMIT clause, and/or extracting the key token "COUNT" in the COUNT clause.
The syntax operator of the executable file of the embodiment includes one or more of the "Fields" operator, "GroupBy" operator, "OrderBy" operator, "Where" operator, "Count" operator, "Distinct" operator, "Avg" operator, "Sum" operator, "Max" operator, "Min" operator, "FROM_UNIXTIME" operator, and "UNIX_TIMESTAMP" operator.
In the data file analyzing and processing device based on the SQL-like, an analyst only needs to input the SQL-like statements, the SQL-like statements are analyzed and converted into a plurality of sets of syntax operators, the data file to be analyzed is loaded to the memory, and the plurality of sets of syntax operators perform analysis and calculation to finally output a result.
The data file analyzing and processing device based on the SQL-like combines SQL-like statements with in-memory computing, which keeps the learning cost low, reduces the cost of analyzing and counting text data by applying unified rules, and is ready for immediate use.
The computer readable medium of the present embodiments may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
From the above description of the embodiments, those skilled in the art will readily appreciate that the present invention can be implemented by hardware capable of executing a specific computer program, such as the system of the present invention, and electronic processing units, servers, clients, mobile phones, control units, processors, etc. included in the system. The invention may also be implemented by computer software for performing the method of the invention, e.g. control software executed by a microprocessor, an electronic control unit, a client, a server, etc. It should be noted that the computer software for executing the method of the present invention is not limited to be executed by one or a specific hardware entity, and can also be realized in a distributed manner by non-specific hardware. For computer software, the software product may be stored in a computer readable storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or may be distributed over a network, as long as it enables the electronic device to perform the method according to the present invention.
The above embodiments are only intended to illustrate the invention, not to limit the technical solutions described in it. Although the invention has been described in detail in this specification with reference to the above embodiments, it is not limited to them; any modification or equivalent replacement of the invention, and all such modifications and variations, are intended to fall within the scope of this disclosure and the appended claims.

Claims (10)

1. A data file analysis processing method based on SQL-like is characterized by comprising the following steps:
receiving SQL-like statements, and analyzing and converting the SQL-like statements into a plurality of groups of syntax operators;
and calling the data file, and carrying out operation analysis processing on the data file by each syntax operator according to the mutual logical relation.
2. The SQL-like-based data file analysis processing method according to claim 1, wherein the receiving SQL-like statements, and the analyzing and converting the SQL-like statements into a plurality of groups of syntax operators comprises:
extracting key tokens based on the received SQL-like statements;
according to the extracted key tokens, segmenting the whole SQL-like statement, and splitting the SQL clause corresponding to each key token into a Query node;
and carrying out syntax abstraction on each segmented Query node, and converting the Query nodes into syntax operators.
3. The SQL-like-based data file analysis processing method according to claim 2, wherein the mutual logical relationship between the syntax operators is determined according to the mutual association of the SQL clauses corresponding to the Query nodes in the SQL-like statement; and the syntax operators are connected according to the mutual logical relationship among them, converting the whole SQL-like statement into an abstract syntax tree composed of a plurality of groups of syntax operators.
4. The SQL-like-based data file analysis processing method according to claim 3, wherein the step of calling the locally stored data file and controlling each syntax operator to perform operation analysis processing on the data file according to the mutual logical relationship comprises:
connecting the registered sets of syntax operators according to the context structure of the abstract syntax tree;
calling a data text and loading the data text into an operating memory;
and each group of syntactic operators performs operation, analysis and statistical calculation on the data file according to the connection relation.
5. The method according to claim 2, wherein the extracting key tokens based on the received SQL-like statements comprises:
based on the SQL-like statement, extracting the key token comprises extracting the key token "SELECT" in the SELECT clause, and/or extracting the key token "FROM" in the FROM clause, and/or extracting the key token "WHERE" in the WHERE clause, and/or extracting the key token "ORDER BY" in the ORDER BY clause, and/or extracting the key token "LIMIT" in the LIMIT clause, and/or extracting the key token "COUNT" in the COUNT clause.
6. The SQL-like-based data file analysis processing method according to any one of claims 1-5, wherein the syntax operator includes one or more of the "Fields" operator, "GroupBy" operator, "OrderBy" operator, "Where" operator, "Count" operator, "Distinct" operator, "Avg" operator, "Sum" operator, "Max" operator, "Min" operator, "FROM_UNIXTIME" operator, and "UNIX_TIMESTAMP" operator.
7. A data file analysis processing device based on SQL-like, comprising:
the syntax operator conversion module is used for receiving the SQL-like sentences and analyzing and converting the SQL-like sentences into a plurality of groups of syntax operators;
and the operator manager is used for uniformly managing and maintaining the input, output and calculation methods of all the syntax operators, calling the data files stored by the local computer, and controlling the syntax operators to carry out operation analysis processing on the data files according to the mutual logical relationship.
8. The apparatus according to claim 7, wherein the syntax operator transformation module comprises:
a token extractor for extracting key tokens based on the received SQL-like statements;
the Query segmenter is used for segmenting the whole SQL-like sentences according to the extracted key signs and segmenting the SQL clauses corresponding to each key sign into Query nodes;
and the abstract syntax tree analyzer is used for performing syntax abstraction on each group of the Query nodes Query which are segmented, converting the syntax abstraction into syntax operators, determining the mutual logical relationship among the syntax operators according to the mutual incidence relationship of the SQL clauses corresponding to the Query nodes Query in the SQL-like sentences, connecting the syntax operators according to the mutual logical relationship among the syntax operators, and converting the whole SQL-like sentences into abstract syntax trees combined by a plurality of groups of syntax operators.
9. An electronic device comprising a processor and a memory, the memory for storing a computer-executable program, characterized in that:
when the computer program is executed by the processor, the processor executes the SQL-like-based data file analysis processing method according to any one of claims 1 to 6.
10. A computer-readable storage medium storing a computer-executable program which, when executed, implements the SQL-like-based data file analysis processing method according to any one of claims 1 to 6.
CN202110476827.XA 2021-04-30 2021-04-30 Data file analysis processing method and device based on SQL-like and electronic equipment Pending CN113190573A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110476827.XA CN113190573A (en) 2021-04-30 2021-04-30 Data file analysis processing method and device based on SQL-like and electronic equipment

Publications (1)

Publication Number Publication Date
CN113190573A true CN113190573A (en) 2021-07-30

Family

ID=76980865

Country Status (1)

Country Link
CN (1) CN113190573A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination