CN114416771A - Data processing method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN114416771A
Authority
CN
China
Prior art keywords
data
grammar
list information
structural body
keywords
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111613360.5A
Other languages
Chinese (zh)
Inventor
张雪岩
姜婧妍
黄杰
位凯志
古亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sangfor Technologies Co Ltd
Original Assignee
Sangfor Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sangfor Technologies Co Ltd
Priority to CN202111613360.5A
Publication of CN114416771A
Current legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2433Query languages
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution

Abstract

The embodiment of the invention is suitable for the technical field of data processing, and provides a data processing method, a data processing device, electronic equipment and a storage medium, wherein the data processing method comprises the following steps: analyzing the query statement to obtain an abstract syntax tree corresponding to the query statement; obtaining structural body data of each of at least two grammar keywords in the query statement from an abstract grammar tree; the structural body data represents descriptive information of a table and/or a column corresponding to the grammar keywords; obtaining list information corresponding to each grammar keyword of at least two grammar keywords based on the structural body data; the list information represents the frequency of appearance of the table and/or column corresponding to the grammar keyword in the structural body data.

Description

Data processing method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data processing method and apparatus, an electronic device, and a storage medium.
Background
SQL queries against a database are a common operation in business scenarios. To support downstream middleware functions such as read-write separation and database splitting, list information such as table names and column names needs to be extracted from the SQL. In the related art, the list information of SQL statements is extracted by parsers such as Calcite and Presto, which can only extract surface-layer list information and cannot extract deep-layer list information.
Disclosure of Invention
In order to solve the above problem, embodiments of the present invention provide a data processing method, an apparatus, an electronic device, and a storage medium, so as to at least solve the problem that the related art parser can only extract list information of a surface layer.
The technical scheme of the invention is realized as follows:
in a first aspect, an embodiment of the present invention provides a data processing method, where the method includes:
analyzing the query statement to obtain an abstract syntax tree corresponding to the query statement;
obtaining structural body data of each of at least two grammar keywords in the query statement from the abstract grammar tree; the structural body data represents descriptive information of a table and/or a column corresponding to the grammar keywords;
acquiring list information corresponding to each grammar keyword of the at least two grammar keywords based on the structural body data; and the list information represents the occurrence frequency of the table and/or the column corresponding to the grammar keywords in the structural body data.
In an embodiment, the obtaining structural data of each of at least two syntax keywords in the query statement from the abstract syntax tree includes:
obtaining the value of a key value pair with a key as a grammar keyword in the abstract grammar tree through character string matching;
and determining the value of the key value pair with the key in the abstract syntax tree as the corresponding structural data of the syntax key.
In an embodiment, the obtaining, based on the structural body data, list information corresponding to each of the at least two syntax keywords includes:
acquiring list information corresponding to each grammar keyword in the structural body data based on a set data structure; the list information comprises column information and/or table information corresponding to the grammar keywords; the column information and/or the table information represent the occurrence frequency of column names and/or table names corresponding to the grammar keywords.
In one embodiment, the list information represents the number of times that every two column names in the structure data occur simultaneously; the obtaining of the list information corresponding to each of the at least two grammar keywords based on the structural body data includes:
and under the condition that the structural body data comprises at least two columns, arranging and combining the column names of the at least two columns to obtain the number of times that every two column names simultaneously appear in the structural body data.
In one embodiment, the list information characterizes table join relationships in the structure data; the obtaining of the list information corresponding to each of the at least two grammar keywords based on the structural body data includes:
and under the condition that the structural body data comprises at least two connected tables, splicing the table names of the connected tables to obtain a connection statement representing the table connection relation.
In an embodiment, when the table names of the connected tables are spliced to obtain a connection statement representing a table connection relationship, the method includes:
under the condition that the at least two tables comprise a first table with a nested structure, the nested structure of the first table is released, and at least two second tables contained in the first table are obtained;
and splicing the table names of the at least two second tables with the table names of the tables connected with the first table to obtain a connection statement representing the table connection relation.
In an embodiment, when obtaining list information corresponding to each of the at least two syntax keywords based on the structural body data, the method includes:
determining an alias corresponding to a table name and a column name in the structure data;
and accumulating the list information corresponding to the alias to the list information of the table name or the column name corresponding to the alias.
In a second aspect, an embodiment of the present invention provides a data processing apparatus, including:
the analysis module is used for analyzing the query statement to obtain an abstract syntax tree corresponding to the query statement;
a first obtaining module, configured to obtain, from the abstract syntax tree, structural body data of each of at least two syntax keywords in the query statement; the structural body data represents descriptive information of a table and/or a column corresponding to the grammar keywords;
the second acquisition module is used for acquiring list information corresponding to each grammar keyword of the at least two grammar keywords based on the structural body data; and the list information represents the occurrence frequency of the table and/or the column corresponding to the grammar keywords in the structural body data.
In a third aspect, an embodiment of the present invention provides an electronic device, including a processor and a memory, where the processor and the memory are connected to each other, where the memory is used to store a computer program, and the computer program includes program instructions, and the processor is configured to call the program instructions to execute the steps of the data processing method provided in the first aspect of the embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, performs the steps of the data processing method provided by the first aspect of the embodiment of the present invention.
The embodiment of the invention obtains the abstract syntax tree corresponding to the query statement by analyzing the query statement, obtains the structural data of each of at least two syntax keywords in the query statement from the abstract syntax tree, and the structural data represents the descriptive information of the table and/or the column corresponding to the syntax keywords. And acquiring list information corresponding to each of the at least two grammar keywords based on the structural body data, wherein the list information represents the occurrence frequency of tables and/or columns corresponding to the grammar keywords in the structural body data. Compared with the prior art that the list information of the query statement is extracted through a parser, the method and the device can extract the deeper list information, the list information can contain and represent the relation between columns and tables related to the original SQL query to a greater extent, and the list information can be used as statistical information and provided for a plurality of service scenes.
Drawings
FIG. 1 is a schematic flow chart illustrating an implementation of a data processing method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of another implementation of a data processing method according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of another implementation of a data processing method according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart illustrating another implementation of a data processing method according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating a data processing flow according to an embodiment of the present invention;
FIG. 6 is a diagram of a data processing apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Data queries on databases are common operations in business scenarios. One example is a Structured Query Language (SQL) query; SQL is a standard computer language for accessing and processing databases. To support downstream middleware functions such as read-write separation and database and table splitting, relevant list information such as table names and database names needs to be extracted from the SQL. Mainstream databases currently include Hive, MySQL, Oracle and the like. Different open-source or paid parsers are used for different databases, and their parsing results differ: not every result is a syntax tree, and different results are returned according to downstream service needs. Such parsers include Calcite, SQLParser and the like.
In the related art, the list information of the SQL query statement is extracted by the parser, the parser can only extract the list information of the SQL query statement surface layer, and the included list information is simple and cannot meet the service requirements of part of users.
In view of the above disadvantages of the related art, embodiments of the present invention provide a data processing method, which can obtain deep list information of query statements. In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
Fig. 1 is a schematic flow chart illustrating an implementation process of a data processing method according to an embodiment of the present invention, where an execution subject of the data processing method is an electronic device, and the electronic device includes a desktop computer, a notebook computer, a server, and the like. Referring to fig. 1, the data processing method includes:
S101, analyzing the query statement to obtain an abstract syntax tree corresponding to the query statement.
Here, the query statement may be a statement in any language that carries query information and can be parsed into an abstract syntax tree. For example, it may be an SQL query statement containing keywords such as SELECT, FROM, WHERE, GROUP BY, and WITH.
For example, "select * from tb_user where userid > 10;" is an SQL query statement that selects all records with userid greater than 10 from the table tb_user.
An Abstract Syntax Tree (AST) is an Abstract representation of the source code Syntax structure. It represents the syntactic structure of the programming language in the form of a tree, each node on the tree representing a structure in the source code.
Taking SQL query sentences as an example, any SQL parser is selected to process the SQL query sentences to obtain parsing results, the parsing results are abstract syntax trees, and each SQL syntax keyword is a node of the abstract syntax trees. The SQL parser (SQLParser) functions to parse SQL query statements according to SQL syntax rules, converting SQL text into an abstract syntax tree.
Specifically, the query statement is first segmented into words by lexical analysis, and a syntactic parser then performs syntactic analysis to form the abstract syntax tree. Word segmentation divides the character string of the query statement into an array of minimal syntax units. Syntactic analysis then establishes the relationships between these syntax units, generating a tree that represents the syntactic structure of the statement, which is called the abstract syntax tree.
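The word segmentation step can be sketched as follows. This is a minimal illustration only, not the tokenizer of any particular SQL parser: the regular expression `TOKEN_PATTERN` and the function name `tokenize` are assumptions introduced here, and they cover only a small subset of SQL.

```python
import re

# Hypothetical token pattern covering a tiny SQL subset:
# comparison operators, punctuation, quoted strings, identifiers, integers.
TOKEN_PATTERN = re.compile(
    r"\s*(>=|<=|<>|!=|[(),;*=<>]|'[^']*'|[A-Za-z_][A-Za-z0-9_.]*|\d+)"
)


def tokenize(sql):
    """Split an SQL string into a list of minimal syntax units."""
    sql = sql.strip()
    tokens = []
    pos = 0
    while pos < len(sql):
        match = TOKEN_PATTERN.match(sql, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


tokens = tokenize("select * from tb_user where userid > 10;")
print(tokens)
```

A real parser would pass this token array to a grammar-driven syntactic analysis stage, which is omitted here.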
Abstract syntax trees are typically stored in JSON format, which is a data transport format that presents a concise and clear hierarchy for storing and representing data in text format. The JSON data may be parsed into a dictionary or other data structure in the python programming language.
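As a small illustration of the point above, the following sketch loads a JSON-formatted syntax tree into a Python dictionary with the standard `json` module. The JSON layout (keys such as "table", "operand0" and "operand1") is a hypothetical shape chosen for the example; actual parsers emit their own formats.

```python
import json

# Hypothetical parser output for a simple query; key names are assumptions.
ast_json = """
{
  "SELECT": ["userid", "name"],
  "FROM": {"table": "tb_user"},
  "WHERE": {"operator": ">", "operand0": "userid", "operand1": 10}
}
"""

ast = json.loads(ast_json)  # a JSON object becomes a Python dict
print(sorted(ast.keys()))
print(ast["FROM"]["table"])
```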
S102, obtaining structural body data of each grammar keyword of at least two grammar keywords in the query statement from the abstract grammar tree; the structural data represents descriptive information of tables and/or columns corresponding to the grammar keywords.
Each grammar keyword in the query statement is a node of the abstract syntax tree, and each node on the abstract syntax tree represents a structure in the source code. The structural body data of each of at least two grammar keywords in the query statement is obtained from the abstract syntax tree. The structural body data is itself a syntax tree that stores the relational description information of the columns or tables corresponding to the grammar keyword. Like the abstract syntax tree, it may be stored in JSON format.
A certain data structure can be selected to process the abstract syntax tree, and the structural body data of each syntax keyword can be obtained. Such as dictionary data structures and queue data structures, etc.
Referring to fig. 2, in an embodiment, the obtaining structural data of each of at least two syntax keywords in the query statement from the abstract syntax tree includes:
S201, obtaining the value of the key value pair with the key as the grammar keyword in the abstract grammar tree through character string matching.
S202, determining the value of the key value pair with the key as the grammar keyword in the abstract grammar tree as the structural body data of the corresponding grammar keyword.
In the embodiment of the invention, a dictionary data structure is taken as an example, and values of key value pairs taking keys as grammar keywords in an abstract grammar tree are extracted from the outermost layer of the abstract grammar tree in a character string matching mode to be used as structural body data corresponding to the grammar keywords.
A dictionary is a data structure of the python programming language, consisting of a structure of "keys and values", where values can take any data type, but the keys must be immutable, such as a string, number, or tuple.
Alternatively, a queue data structure can be used to strip the abstract syntax tree layer by layer and store every structural body found under each grammar keyword key. This finally yields all structural bodies corresponding to the SELECT and FROM grammar keywords, and all structural bodies corresponding to other grammar keywords such as WHERE and GROUP BY can be obtained in the same way.
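The two extraction strategies described above can be sketched as follows. This is a minimal sketch under stated assumptions: the syntax tree is a nested structure of Python dictionaries and lists, grammar keywords appear as dictionary keys, and the names `SYNTAX_KEYWORDS`, `outermost_structures` and `all_structures` are hypothetical.

```python
from collections import deque

# Assumed set of grammar keywords that appear as keys in the syntax tree.
SYNTAX_KEYWORDS = {"SELECT", "FROM", "WHERE", "GROUP BY", "WITH"}


def outermost_structures(ast):
    """Dictionary variant: match outermost keys against the keywords."""
    return {k: v for k, v in ast.items() if k.upper() in SYNTAX_KEYWORDS}


def all_structures(ast):
    """Queue variant: peel the tree layer by layer (breadth-first) and
    collect every structure stored under a grammar keyword key."""
    found = {}
    queue = deque([ast])
    while queue:
        node = queue.popleft()
        if isinstance(node, dict):
            for key, value in node.items():
                if isinstance(key, str) and key.upper() in SYNTAX_KEYWORDS:
                    found.setdefault(key.upper(), []).append(value)
                queue.append(value)
        elif isinstance(node, list):
            queue.extend(node)
    return found


# Hypothetical tree with a subquery nested inside FROM.
ast = {
    "SELECT": ["col1"],
    "FROM": {"subquery": {"SELECT": ["col1", "col2"], "FROM": "tab1"}},
}
print(outermost_structures(ast))
print(all_structures(ast))
```

The dictionary variant sees only the surface layer, whereas the queue variant also reaches the SELECT and FROM structures inside the subquery.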
S103, acquiring list information corresponding to each grammar keyword of the at least two grammar keywords based on the structural body data; and the list information represents the occurrence frequency of the table and/or the column corresponding to the grammar keywords in the structural body data.
Extracting list information in each structural body data from the structural body data obtained in the above step, wherein the list information includes: the number of times of appearance of the same column name, the number of times of appearance of the same table name, the number of times of simultaneous appearance of a plurality of column names, connection information between tables, and the like in the same structure data.
In an embodiment, the obtaining, based on the structural body data, list information corresponding to each of the at least two syntax keywords includes:
acquiring list information corresponding to each grammar keyword in the structural body data based on a set data structure; the list information comprises column information and/or table information corresponding to the grammar keywords; the column information and/or the table information represent the occurrence frequency of column names and/or table names corresponding to the grammar keywords.
Here, the setting data structure may be a dictionary data structure or a queue data structure.
In an SQL query statement, structural body data corresponding to the syntax keywords of SELECT, WHERE, GROUP BY and WITH only contain column information and only relate to extraction of column names; the structure body corresponding to the FROM grammar key only contains table information and only relates to the extraction of table names. The structure data can be stripped layer by adopting a data structure of the queue from the outermost layer of the structure data, all column names or table names in the structure data are stored, and the times of the appearance of the same column name and the same table name are counted at the same time.
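A queue-based count of column or table names might look like the sketch below. The node layout is an assumption made for illustration: column references are assumed to be stored under a "column" key, and the function name `count_names` is hypothetical.

```python
from collections import Counter, deque


def count_names(structure, name_key):
    """Walk a structure breadth-first and count every string stored
    under `name_key` (for example "column" or "table")."""
    counts = Counter()
    queue = deque([structure])
    while queue:
        node = queue.popleft()
        if isinstance(node, dict):
            for key, value in node.items():
                if key == name_key and isinstance(value, str):
                    counts[value] += 1
                else:
                    queue.append(value)
        elif isinstance(node, list):
            queue.extend(node)
    return counts


# Hypothetical structure for: WHERE userid > 10 AND userid < 100 AND status = 'active'
where_structure = {
    "operator": "AND",
    "operands": [
        {"operator": ">", "left": {"column": "userid"}, "right": 10},
        {"operator": "<", "left": {"column": "userid"}, "right": 100},
        {"operator": "=", "left": {"column": "status"}, "right": "active"},
    ],
}
column_counts = count_names(where_structure, "column")
print(column_counts)
```

The same function could be applied to the FROM structure with a table-name key to obtain table counts.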
Referring to fig. 3, in an embodiment, when obtaining list information corresponding to each of the at least two syntax keywords based on the structure data, the method includes:
S301, determining an alias corresponding to the table name and the column name in the structure data.
S302, accumulating the list information corresponding to the alias to the list information of the table name or the column name corresponding to the alias.
In some cases, the original SQL query statement assigns aliases to tables or columns, and the abstract syntax tree parsed from the SQL query statement then also contains alias operations, for example:
"operand1":"REVENUE",
"operand0":{"operand0":"V_REVENUE","operator":"SUM"},
"operator":"AS"
The above expression establishes an alias, REVENUE, for SUM(V_REVENUE). In the rest of the SQL query statement, an operation on the REVENUE column is actually an operation on SUM(V_REVENUE), so the list information of REVENUE is accumulated onto the original column name V_REVENUE. This finally gives the number of times the V_REVENUE column is actually referenced in each grammar keyword structure, that is, its list information.
For each abstract syntax tree, the AS keyword makes it possible to map an outer-layer alias to the original table name or column name in the inner layer, thereby resolving the alias. The list information of the alias is then accumulated onto the list information of the original name to obtain the real list information. At this point, the occurrence-count statistics of every single table and single column across all SQL queries have been obtained.
In practical application, the open-source alias identification tool can be used to obtain the corresponding relation between the alias and the original name in the SQL query statement, and then the above method is used to obtain the real list information.
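The alias handling can be sketched as follows, using the "operand0" / "operand1" / "operator" node layout shown in the fragment above. The function names `collect_aliases` and `merge_alias_counts` are hypothetical, and the sketch assumes an alias always wraps a single original name; chained aliases are not resolved.

```python
from collections import Counter


def collect_aliases(node, aliases=None):
    """Map each alias to the original name found under an AS node."""
    if aliases is None:
        aliases = {}
    if isinstance(node, dict):
        if node.get("operator") == "AS":
            original = node.get("operand0")
            # Unwrap function calls such as SUM(V_REVENUE) down to the name.
            while isinstance(original, dict):
                original = original.get("operand0")
            alias = node.get("operand1")
            if isinstance(original, str) and isinstance(alias, str):
                aliases[alias] = original
        for value in node.values():
            collect_aliases(value, aliases)
    elif isinstance(node, list):
        for item in node:
            collect_aliases(item, aliases)
    return aliases


def merge_alias_counts(counts, aliases):
    """Accumulate the count of each alias onto its original name."""
    merged = Counter()
    for name, n in counts.items():
        merged[aliases.get(name, name)] += n
    return merged


select_node = {
    "operand1": "REVENUE",
    "operand0": {"operand0": "V_REVENUE", "operator": "SUM"},
    "operator": "AS",
}
aliases = collect_aliases(select_node)
raw_counts = Counter({"REVENUE": 3, "V_REVENUE": 1})  # illustrative counts
merged = merge_alias_counts(raw_counts, aliases)
print(aliases)
print(merged)
```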
In one embodiment, the list information represents the number of times that every two column names in the structure data occur simultaneously; the obtaining of the list information corresponding to each of the at least two grammar keywords based on the structural body data includes:
and under the condition that the structural body data comprises at least two columns, arranging and combining the column names of the at least two columns to obtain the number of times that every two column names simultaneously appear in the structural body data.
When two or more columns exist in one piece of structural body data, the column names are combined pairwise, for example by permutation and combination, to obtain occurrence-count statistics for each pair of columns that appear together in the structural body data. This information can be used, for example, for prediction or for building common data query models in advance.
For the FROM grammar keyword, the relevant structural body data is the column information after the "on" node, that is, the column conditions of the table connection. When column aliases are involved, the outer-layer alias is mapped to the inner-layer original column name through the AS keyword, and the list information of the alias is accumulated onto the list information of the original name. This gives the count statistics of all column pairs that appear together across all SQL queries.
For example, consider the SQL statement: SELECT col1, col2, col3 FROM tab1.
Here col1, col2 and col3 appear together in one statement, so the three columns are counted as occurring simultaneously once. Permutation and combination yields the three pairs (col1, col2), (col1, col3) and (col2, col3); each pair is counted once for this statement, and the counts are accumulated whenever the same pair appears in other SQL statements.
These counts are useful because, for example, a large number of simultaneous occurrences suggests that col1, col2 and col3 are strongly correlated in the business. In addition, in the data warehouse field, frequently co-occurring query groups can be constructed in advance and their query results precomputed, which saves time when SQL statements are executed.
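The pairwise counting can be sketched with `itertools.combinations` and `collections.Counter` from the Python standard library. The sketch assumes unordered pairs, so (col1, col2) and (col2, col1) are treated as the same pair; the function name `count_column_pairs` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_column_pairs(column_lists):
    """Count how often each unordered pair of columns appears together.

    `column_lists` holds one list of column names per structure."""
    pair_counts = Counter()
    for columns in column_lists:
        unique = sorted(set(columns))  # ignore duplicates and ordering
        pair_counts.update(combinations(unique, 2))
    return pair_counts


# Columns taken from two hypothetical SELECT structures.
queries = [
    ["col1", "col2", "col3"],  # SELECT col1, col2, col3 FROM tab1
    ["col2", "col1"],          # SELECT col2, col1 FROM tab1
]
pair_counts = count_column_pairs(queries)
print(pair_counts)
```

Sorting the names before pairing gives every pair a canonical key, so counts from different statements accumulate on the same entry.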
In one embodiment, the list information characterizes table join relationships in the structure data; the obtaining of the list information corresponding to each of the at least two grammar keywords based on the structural body data includes:
and under the condition that the structural body data comprises at least two connected tables, splicing the table names of the connected tables to obtain a connection statement representing the table connection relation.
For example, the SQL query statement SELECT col FROM tab1 JOIN tab2 shows the connection of the two tables tab1 and tab2, and SELECT col FROM tab1 JOIN tab2 JOIN tab3 shows the connection of the three tables tab1, tab2 and tab3.
When one piece of structural body data contains a plurality of connected tables, the names of the connected tables are spliced to obtain a statement containing JOIN that represents the connection mode.
Referring to fig. 4, in an embodiment, when table names of the connected tables are spliced to obtain a connection statement representing a table connection relationship, the method includes:
S401, under the condition that the at least two tables comprise the first table with the nested structure, the nested structure of the first table is removed, and at least two second tables contained in the first table are obtained.
S402, splicing the table names of the at least two second tables with the table names of the tables connected with the first table to obtain a connection statement representing the table connection relation.
For example, table 1 and table 2 are linked to form a new table 3, and table 3 is further linked to table 4, and this structure is a nested structure.
The at least two tables comprise a first table with a nested structure, the first table is obtained by connecting the at least two second tables, and the nested structure of the first table is removed to obtain the connection relation of the at least two second tables. And then splicing the table names of the at least two second tables and the tables connected with the first table to obtain a statement containing JOIN for representing the connection mode.
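Steps S401 and S402 can be sketched as a recursive flattening followed by a string join. The node shape `{"join": [left, right]}` and the function names `flatten_tables` and `join_statement` are assumptions made for illustration; a real parser's FROM structure will look different.

```python
def flatten_tables(node):
    """Return table names from left to right, dissolving nested joins."""
    if isinstance(node, str):
        return [node]
    tables = []
    for child in node["join"]:  # hypothetical node shape
        tables.extend(flatten_tables(child))
    return tables


def join_statement(node):
    """Splice the flattened table names into a JOIN statement."""
    return " JOIN ".join(flatten_tables(node))


# tab1 and tab2 are joined into a nested table, which is then joined to tab4.
nested = {"join": [{"join": ["tab1", "tab2"]}, "tab4"]}
statement = join_statement(nested)
print(statement)
```

The nested table itself never appears in the output; only the second tables it contains and the table connected to it are spliced.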
According to the method, the original SQL query information can be obtained, and the original SQL query information comprises the times information of single tables and single columns after different grammar keywords, the times information of multiple columns after different grammar keywords and the connection information between the tables, and the list information is stored by using data structures such as dictionaries, JSONs and the like.
The list information can be provided for a plurality of common service scenes for use, for example, in the field of data warehouses, when a query mode is modeled, model establishment can be carried out based on a table connection mode in the list information; when the data cube is built, the data cube can be built based on the information that the plurality of columns in the list information appear simultaneously.
The embodiment of the invention obtains the abstract syntax tree corresponding to the query statement by analyzing the query statement, obtains the structural data of each of at least two syntax keywords in the query statement from the abstract syntax tree, and the structural data represents the descriptive information of the table and/or the column corresponding to the syntax keywords. And obtaining list information corresponding to each grammar keyword of the at least two grammar keywords based on the structural body data, wherein the list information represents the occurrence frequency of the table and/or the column corresponding to the grammar keyword in the structural body data. Compared with the prior art that the list information of the query statement is extracted through a parser, the method and the device can extract the deeper list information, the list information can contain and represent the relation between columns and tables related to the original SQL query to a greater extent, and the list information can be used as statistical information and provided for a plurality of service scenes.
Referring to fig. 5, fig. 5 is a schematic diagram of a data processing flow provided by an application embodiment of the present invention, where the data processing flow includes:
firstly, an abstract syntax tree is obtained by converting SQL sentences, any SQL parser can be selected to process SQL query sentences to obtain parsing results, the parsing results are the abstract syntax tree, and each SQL syntax keyword is a node of the abstract syntax tree. The SQL Parser (SQL Parser) functions to parse SQL query statements according to SQL syntax rules and convert SQL text into an abstract syntax tree.
Then, the structural data corresponding to each grammar keyword is extracted, and a certain data structure can be selected to process the abstract grammar tree, so as to obtain the structural data of each grammar keyword. Such as dictionary data structures and queue data structures, etc. The structure data is also a grammar tree, and the structure data stores the relational description information of the columns or tables corresponding to the grammar keywords. The abstract syntax tree may also be stored in JSON format.
And finally, extracting list information of the structure data to obtain single-list occurrence frequency statistical information, single-column occurrence frequency statistical information, multi-column simultaneous occurrence frequency statistical information and multi-list connection relation information. The above list information is stored using a data structure such as a dictionary, JSON, or the like.
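The resulting list information could be held in a dictionary and serialized to JSON along the following lines. All field names and counts below are illustrative assumptions. JSON object keys must be strings, so the tuple keys used for column pairs are converted before serialization.

```python
import json

# Hypothetical layout of the extracted list information.
list_info = {
    "table_counts": {"tab1": 2, "tab2": 1},
    "column_counts": {"SELECT": {"col1": 2}, "WHERE": {"col2": 1}},
    "column_pairs": {("col1", "col2"): 2},
    "joins": ["tab1 JOIN tab2"],
}


def to_jsonable(info):
    """Copy the dictionary, turning tuple pair keys into 'a|b' strings."""
    out = dict(info)
    out["column_pairs"] = {
        "|".join(pair): n for pair, n in info["column_pairs"].items()
    }
    return out


serialized = json.dumps(to_jsonable(list_info), sort_keys=True)
print(serialized)
```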
Compared with the prior art that the list information is acquired through the analyzer, the application embodiment of the invention can extract deep list information, can greatly contain and represent the relation between columns and tables related to the original SQL query, can be used as statistical information for a plurality of service scenes, for example, in the field of data warehouses, and can perform model establishment based on the table connection mode in the list information when modeling a query mode; when the data cube is built, the data cube can be built based on the information that the plurality of columns in the list information appear simultaneously.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The technical means described in the embodiments of the present invention may be arbitrarily combined without conflict.
In addition, in the embodiments of the present invention, "first", "second", and the like are used for distinguishing similar objects, and are not necessarily used for describing a specific order or a sequential order.
Referring to fig. 6, which is a schematic diagram of a data processing apparatus according to an embodiment of the present invention, the apparatus includes a parsing module, a first obtaining module and a second obtaining module.
The parsing module is configured to parse the query statement to obtain an abstract syntax tree corresponding to the query statement;
a first obtaining module, configured to obtain, from the abstract syntax tree, structural body data of each of at least two syntax keywords in the query statement; the structural body data represents descriptive information of a table and/or a column corresponding to the grammar keywords;
a second obtaining module, configured to obtain, based on the structural body data, list information corresponding to each of the at least two syntax keywords; and the list information represents the occurrence frequency of the table and/or the column corresponding to the grammar keywords in the structural body data.
In an embodiment, when obtaining the structural body data of each of the at least two grammar keywords in the query statement from the abstract syntax tree, the first obtaining module is configured to:
obtain, through character string matching, the value of each key-value pair in the abstract syntax tree whose key is a grammar keyword;
and determine that value as the structural body data corresponding to the grammar keyword.
In an embodiment, the second obtaining module, when obtaining the list information corresponding to each of the at least two syntax keywords based on the structure data, is configured to:
and acquiring column information and table information in the structural body data based on a set data structure to obtain the occurrence frequency of all column names and table names in the structural body data.
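As an illustration of the counting step above, the following sketch uses a dictionary-style counter as the "set data structure". The layout of `structures` (one entry per grammar keyword with optional `"tables"` and `"columns"` lists) and the function name `count_names` are assumptions made for the example only.

```python
from collections import Counter


def count_names(structures):
    """Count how often each table name and column name appears
    across the structural body data of all grammar keywords."""
    table_counts, column_counts = Counter(), Counter()
    for struct in structures.values():
        table_counts.update(struct.get("tables", []))
        column_counts.update(struct.get("columns", []))
    return table_counts, column_counts


# Assumed, simplified structural body data for one query.
structures = {
    "select": {"columns": ["orders.id", "orders.amount"]},
    "from": {"tables": ["orders", "users"]},
    "where": {"columns": ["orders.amount"], "tables": []},
}
table_counts, column_counts = count_names(structures)
```

Running the same function over many queries and adding the counters together would yield the single-table and single-column occurrence statistics.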
In an embodiment, the second obtaining module, when obtaining the list information corresponding to each of the at least two syntax keywords based on the structure data, is configured to:
and under the condition that the structural body data comprises at least two columns, enumerating the pairwise combinations of the column names of the at least two columns to obtain the number of times that every two column names appear together in the structural body data.
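The pairwise counting can be sketched as below. The input format (one list of column names per piece of structural body data) and the function name `count_column_pairs` are assumptions for illustration. Column names are de-duplicated and sorted first, so that the pair (a, b) and the pair (b, a) are counted under the same key.

```python
from collections import Counter
from itertools import combinations


def count_column_pairs(column_lists):
    """Count how many structures contain each unordered pair of column names."""
    pair_counts = Counter()
    for columns in column_lists:
        unique = sorted(set(columns))
        if len(unique) < 2:  # a pair needs at least two distinct columns
            continue
        pair_counts.update(combinations(unique, 2))
    return pair_counts


pair_counts = count_column_pairs([["a", "b", "c"], ["b", "a"], ["c"]])
```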
In an embodiment, the second obtaining module, when obtaining the list information corresponding to each of the at least two syntax keywords based on the structure data, is configured to:
and under the condition that the structural body data comprises at least two connected tables, splicing the table names of the connected tables to obtain a connection statement representing the table connection relation.
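A minimal sketch of the splicing step follows. The separator `" JOIN "`, the function name `join_statement`, and the decision to sort the table names are assumptions of this example; sorting is used here only so that the same pair of tables produces the same connection statement regardless of the order in which the tables were written.

```python
from collections import Counter


def join_statement(tables, sep=" JOIN "):
    """Splice the names of connected tables into one connection statement."""
    return sep.join(sorted(tables))


# Each inner list holds the tables connected in one piece of structural body data.
connected = [["orders", "users"], ["users", "orders"], ["orders", "items"]]
join_counts = Counter(join_statement(tables) for tables in connected)
```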
In an embodiment, the second obtaining module is further configured to:
under the condition that the at least two tables comprise a first table with a nested structure, un-nesting the first table to obtain at least two second tables contained in the first table;
and splicing the table names of the at least two second tables with the table names of the tables connected with the first table to obtain a connection statement representing the table connection relation.
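The un-nesting step can be sketched with a small recursive generator. The representation of a nested table as a dictionary whose `"from"` entry lists its inner tables is an assumption made for this example, as are the function names.

```python
def flatten_tables(entry):
    """Yield plain table names, un-nesting any nested table entry."""
    if isinstance(entry, str):
        yield entry
    elif isinstance(entry, dict):
        # Assumed shape: a nested table is a dict whose "from" value
        # holds the tables it contains.
        yield from flatten_tables(entry.get("from", []))
    else:
        for item in entry:
            yield from flatten_tables(item)


def join_statement_nested(tables, sep=" JOIN "):
    """Splice table names after un-nesting nested tables."""
    return sep.join(flatten_tables(tables))


# "users" is connected to a nested first table that contains two second tables.
statement = join_statement_nested(["users", {"from": ["orders", "items"]}])
```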
In an embodiment, the second obtaining module, when obtaining the list information corresponding to each of the at least two syntax keywords based on the structure data, is further configured to:
determining an alias corresponding to a table name and a column name in the structure data;
and accumulating the list information corresponding to the alias into the list information of the table name or column name to which the alias corresponds.
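Alias accumulation can be illustrated as follows, assuming the list information is a counter of occurrence counts and the alias mapping has already been determined; the names `merge_alias_counts`, `counts` and `aliases` are hypothetical.

```python
from collections import Counter


def merge_alias_counts(counts, aliases):
    """Add the counts recorded under an alias to the name the alias stands for."""
    merged = Counter()
    for name, n in counts.items():
        merged[aliases.get(name, name)] += n
    return merged


counts = Counter({"orders": 2, "o": 3, "users": 1, "u": 1})
aliases = {"o": "orders", "u": "users"}  # alias -> real table name
merged = merge_alias_counts(counts, aliases)
```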
In practical applications, the parsing module, the first obtaining module and the second obtaining module may be implemented by a processor in an electronic device, such as a Central Processing Unit (CPU), a Digital Signal Processor (DSP), a Micro Control Unit (MCU), or a Field-Programmable Gate Array (FPGA).
It should be noted that: in the data processing apparatus provided in the above embodiment, when performing data processing, only the division of the above modules is exemplified, and in practical applications, the processing may be distributed to different modules as needed, that is, the internal structure of the apparatus may be divided into different modules to complete all or part of the processing described above. In addition, the data processing apparatus and the data processing method provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments for details, which are not described herein again.
Based on the hardware implementation of the program module, in order to implement the method of the embodiment of the present application, an embodiment of the present application further provides an electronic device. Fig. 7 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application, and as shown in fig. 7, the electronic device includes:
a communication interface, capable of exchanging information with other devices such as network devices;
a processor, connected to the communication interface to exchange information with other devices, and configured to execute, when running a computer program, the method provided by one or more of the foregoing technical solutions on the electronic device side; and a memory, on which the computer program is stored.
Of course, in practice, the various components in an electronic device are coupled together by a bus system. It will be appreciated that a bus system is used to enable communications among the components. The bus system includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as a bus system in fig. 7.
The memory in the embodiments of the present application is used to store various types of data to support the operation of the electronic device. Examples of such data include: any computer program for operating on an electronic device.
It will be appreciated that the memory can be either volatile memory or nonvolatile memory, and can include both volatile and nonvolatile memory. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Ferromagnetic Random Access Memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disk, or a Compact Disc Read-Only Memory (CD-ROM); the magnetic surface memory may be disk storage or tape storage. The volatile memory may be a Random Access Memory (RAM), which acts as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDRSDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), SyncLink Dynamic Random Access Memory (SLDRAM), and Direct Rambus Random Access Memory (DRRAM). The memories described in the embodiments of the present application are intended to comprise, without being limited to, these and any other suitable types of memory.
The method disclosed in the embodiments of the present application may be applied to a processor, or may be implemented by a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or instructions in the form of software. The processor described above may be a general purpose processor, a DSP, or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like. The processor may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed in the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software modules may be located in a storage medium located in a memory where a processor reads the programs in the memory and in combination with its hardware performs the steps of the method as previously described.
Optionally, when the processor executes the program, the corresponding process implemented by the electronic device in each method of the embodiment of the present application is implemented, and for brevity, no further description is given here.
In an exemplary embodiment, the present application further provides a storage medium, specifically a computer storage medium, for example, a first memory storing a computer program, where the computer program is executable by a processor of an electronic device to perform the steps of the foregoing method. The computer readable storage medium may be Memory such as FRAM, ROM, PROM, EPROM, EEPROM, Flash Memory, magnetic surface Memory, optical disk, or CD-ROM.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus, electronic device and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: a removable storage device, a ROM, a RAM, a magnetic or optical disk, or various other media that can store program code.
Alternatively, if the integrated units described above are implemented in the form of software functional modules and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product that is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a removable storage device, a ROM, a RAM, a magnetic or optical disk, or various other media that can store program code.
The above description covers only specific embodiments of the present application, but the scope of protection of the present application is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed in the present application shall be covered by the scope of protection of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method of data processing, the method comprising:
analyzing the query statement to obtain an abstract syntax tree corresponding to the query statement;
obtaining structural body data of each of at least two grammar keywords in the query statement from the abstract grammar tree; the structural body data represents descriptive information of a table and/or a column corresponding to the grammar keywords;
acquiring list information corresponding to each grammar keyword of the at least two grammar keywords based on the structural body data; and the list information represents the occurrence frequency of the table and/or the column corresponding to the grammar keywords in the structural body data.
2. The method of claim 1, wherein obtaining structural data for each of at least two syntax keywords in the query statement from the abstract syntax tree comprises:
obtaining the value of a key value pair with a key as a grammar keyword in the abstract grammar tree through character string matching;
and determining the value of that key-value pair in the abstract syntax tree as the structural body data corresponding to the grammar keyword.
3. The method according to claim 1, wherein the obtaining list information corresponding to each of the at least two syntax keywords based on the structure data comprises:
acquiring list information corresponding to each grammar keyword in the structural body data based on a set data structure; the list information comprises column information and/or table information corresponding to the grammar keywords; the column information and/or the table information represent the occurrence frequency of column names and/or table names corresponding to the grammar keywords.
4. The method according to claim 1, wherein the list information characterizes the number of times each two column names in the structure data occur simultaneously; the obtaining of the list information corresponding to each of the at least two grammar keywords based on the structural body data includes:
and under the condition that the structural body data comprises at least two columns, enumerating the pairwise combinations of the column names of the at least two columns to obtain the number of times that every two column names appear together in the structural body data.
5. The method of claim 1, wherein the list information characterizes table join relationships in the structure data; the obtaining of the list information corresponding to each of the at least two grammar keywords based on the structural body data includes:
and under the condition that the structural body data comprises at least two connected tables, splicing the table names of the connected tables to obtain a connection statement representing the table connection relation.
6. The method according to claim 5, wherein when the table names of the connected tables are spliced to obtain a connection statement representing a table connection relationship, the method comprises:
under the condition that the at least two tables comprise a first table with a nested structure, un-nesting the first table to obtain at least two second tables contained in the first table;
and splicing the table names of the at least two second tables with the table names of the tables connected with the first table to obtain a connection statement representing the table connection relation.
7. The method according to any one of claims 1 to 6, wherein when obtaining the list information corresponding to each of the at least two syntax keywords based on the structure data, the method comprises:
determining an alias corresponding to a table name and a column name in the structure data;
and accumulating the list information corresponding to the alias into the list information of the table name or column name to which the alias corresponds.
8. A data processing apparatus, comprising:
a parsing module, configured to parse the query statement to obtain an abstract syntax tree corresponding to the query statement;
a first obtaining module, configured to obtain, from the abstract syntax tree, structural body data of each of at least two syntax keywords in the query statement; the structural body data represents descriptive information of a table and/or a column corresponding to the grammar keywords;
a second obtaining module, configured to obtain, based on the structural body data, list information corresponding to each of the at least two syntax keywords; and the list information represents the occurrence frequency of the table and/or the column corresponding to the grammar keywords in the structural body data.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the data processing method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to carry out the data processing method according to any one of claims 1 to 7.
CN202111613360.5A 2021-12-27 2021-12-27 Data processing method and device, electronic equipment and storage medium Pending CN114416771A (en)

Publications (1)

Publication Number Publication Date
CN114416771A true CN114416771A (en) 2022-04-29

Family

ID=81269100

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111613360.5A Pending CN114416771A (en) 2021-12-27 2021-12-27 Data processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114416771A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115630085A (en) * 2022-12-02 2023-01-20 天津南大通用数据技术股份有限公司 Database variable parameter scope control method, device and equipment
CN115630085B (en) * 2022-12-02 2023-03-28 天津南大通用数据技术股份有限公司 Database variable parameter scope control method, device and equipment
CN116991877A (en) * 2023-09-25 2023-11-03 城云科技(中国)有限公司 Method, device and application for generating structured query statement
CN116991877B (en) * 2023-09-25 2024-01-02 城云科技(中国)有限公司 Method, device and application for generating structured query statement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination