CN112270175A - ANTLR-based complex report formula analysis method and device - Google Patents

ANTLR-based complex report formula analysis method and device

Info

Publication number
CN112270175A
Authority
CN
China
Prior art keywords
antlr
formula
lexical
defining
grammar
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011178242.1A
Other languages
Chinese (zh)
Inventor
张艳清
张达
巩亚辉
赖文文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Sefon Software Co Ltd
Original Assignee
Chengdu Sefon Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Sefon Software Co Ltd
Priority to CN202011178242.1A
Publication of CN112270175A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses an ANTLR-based method and device for parsing complex report formulas, which mainly address the shortcomings of existing parsing approaches: expression syntax that is not flexible enough, logic that is not centralized enough, and somewhat insufficient performance. The method comprises the following steps: S1, defining the lexical rules, which declare the legal character sequences of the language; S2, defining the grammar, which declares the different syntactic structures of the language; S3, inputting the defined lexical rules and the defined grammar into ANTLR; and S4, inputting the formula to be parsed into the analyzers obtained from ANTLR in step S3 to obtain a syntax tree, and traversing the syntax tree to obtain the parsing result. With this scheme, formulas can be parsed with high flexibility, extensibility, and performance, which gives the invention considerable practical and promotional value.

Description

ANTLR-based complex report formula analysis method and device
Technical Field
The invention relates to the field of data query engines, in particular to an ANTLR-based complex report formula analysis method and device.
Background
Parsing a complex report necessarily involves parsing the formulas it contains. At present, formula parsing in complex reports is based on the syntax of Aviator. However, Aviator's syntax is limited: it is not a complete language but only a small subset of one, and its only extension point is the custom function.
The existing approach has two drawbacks. First, custom functions support only function-call syntax, such as function(arg0, arg1, …), so expressions cannot be written flexibly. Second, custom functions are implemented through inheritance, and each custom function needs its own implementation class, so the structural definitions of expressions are scattered throughout the source code and the logic is not centralized. In addition, Aviator works by compiling an expression into bytecode and handing it to the JVM for execution, and its performance is somewhat insufficient when the expression logic is complex. As a result, complex formulas in complex reports are difficult to parse and have to be checked one by one, which hurts work efficiency and progress.
Disclosure of Invention
The invention aims to provide an ANTLR-based complex report formula analysis method and device that solve the problems of existing parsing approaches: expression syntax that is not flexible enough, logic that is not centralized enough, somewhat insufficient performance, and difficulty in parsing the complex formulas in complex reports.
In order to solve the above problems, the present invention provides the following technical solutions:
An ANTLR-based complex report formula analysis method comprises the following steps:
S1, defining the lexical rules, which declare the legal character sequences of the language;
S2, defining the grammar, which declares the different syntactic structures of the language;
S3, inputting the lexical rules defined in step S1 and the grammar defined in step S2 into ANTLR;
S4, inputting the formula to be parsed into the analyzers obtained from ANTLR in step S3 to obtain a syntax tree, and traversing the syntax tree to obtain the parsing result.
Existing complex report formula parsing uses Aviator, which has many built-in functions and expression structures and supports most operators out of the box, including arithmetic, relational, logical, bitwise, and regular-expression matching operators as well as ternary expressions. It also supports operator precedence and parentheses to force precedence, so the barrier to using it is low, and users can define custom functions to suit their scenario, which gives it some flexibility. However, it also suffers from expression syntax that is not flexible enough, logic that is not centralized enough, and somewhat insufficient performance. The present scheme addresses these problems.
In this scheme, the lexical and grammar files are written freely according to ANTLR's rules, ANTLR generates a lexical analyzer and a syntax analyzer, and custom parsing logic is implemented on top of the generated analyzers, which makes formula writing flexible and extensible. If language rules are adjusted or added later, only the lexical or grammar file needs to be modified, the analyzers regenerated, and the custom logic supplemented. The scheme supports parsing and evaluating many kinds of formulas, covers a wide range of formula calculation scenarios, and offers flexible formula writing and high parsing performance.
Further, the lexical rules in step S1 are defined as follows: each lexical rule has the form key: value, where key is the name of a legal character sequence and value is a character string or a regular expression; value may also refer to other rules.
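The key: value form of the lexical rules can be illustrated with a minimal Python sketch. It is not ANTLR and not the patent's actual lexical file: the rule names (NUMBER, CELL, OP, and so on), the patterns, and the `tokenize` helper are all hypothetical, chosen only to show named rules, a rule that refers to another rule, and the splitting of a formula into named character sequences.

```python
import re

# A reusable sub-pattern, showing that one rule's value may refer to another rule.
DIGITS = r"\d+"

# Hypothetical lexical rules in "key: value" form: each key names a legal
# character sequence, each value is a regular expression describing it.
LEXICAL_RULES = {
    "NUMBER": rf"{DIGITS}(?:\.{DIGITS})?",  # refers to DIGITS
    "CELL": rf"[A-Z]+{DIGITS}",             # cell reference such as B1
    "OP": r"[-+*/]",
    "LPAREN": r"\(",
    "RPAREN": r"\)",
    "WS": r"\s+",
}

# Combine all rules into one pattern with a named group per rule.
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in LEXICAL_RULES.items())
)


def tokenize(formula):
    """Split a formula into (rule name, text) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(formula):
        match = MASTER.match(formula, pos)
        if match is None:
            raise ValueError(f"illegal character {formula[pos]!r} at position {pos}")
        if match.lastgroup != "WS":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

For instance, `tokenize("B1 + 2.5")` yields `[("CELL", "B1"), ("OP", "+"), ("NUMBER", "2.5")]`, and a character that no rule declares raises `ValueError`, which mirrors the idea that the lexical rules declare the only legal character sequences.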
Further, in step S2, the grammar is defined as follows: each grammar rule has the form key: expression1 | expression2 | …, where key is the name of the grammar rule and each expression is a combination of lexical and grammar rule names.
Furthermore, the expressions express priority through their order: the expression declared first has the higher priority.
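The effect of declaration order on priority can be sketched in Python as follows. This is a hand-written stand-in, not ANTLR-generated code: the `EXPR_ALTERNATIVES` table, the operator set, and the `parse` function are assumptions made for illustration. The sketch derives each operator's binding strength from the position of its alternative, so that the alternative declared first binds most tightly, which is also how ANTLR 4 treats the alternatives of a left-recursive rule.

```python
import re

# Hypothetical grammar rule "expr", listing its binary alternatives in
# declaration order. An alternative declared earlier has higher priority,
# i.e. it binds more tightly.
EXPR_ALTERNATIVES = [
    ("*", "/"),  # expr ('*' | '/') expr
    ("+", "-"),  # expr ('+' | '-') expr
    (">", "<"),  # expr ('>' | '<') expr
]

# Derive a numeric binding power from the declaration order.
PRECEDENCE = {
    op: len(EXPR_ALTERNATIVES) - index
    for index, ops in enumerate(EXPR_ALTERNATIVES)
    for op in ops
}


def parse(formula):
    """Parse a formula into a nested tuple tree: (operator, left, right)."""
    # Simplified tokenizer: characters outside these patterns are ignored.
    tokens = re.findall(r"\d+|[A-Z]+\d+|[-+*/<>()]", formula)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def atom():
        token = advance()
        if token == "(":
            node = expr(1)
            advance()  # consume the closing ")"
            return node
        return int(token) if token.isdigit() else token

    def expr(min_power):
        left = atom()
        while peek() in PRECEDENCE and PRECEDENCE[peek()] >= min_power:
            op = advance()
            # Left-associative: the right operand only absorbs tighter operators.
            right = expr(PRECEDENCE[op] + 1)
            left = (op, left, right)
        return left

    return expr(1)
```

With this table, `parse("1+2*3")` returns `("+", 1, ("*", 2, 3))`: multiplication is declared first, so it groups before addition. Moving the `("+", "-")` entry to the top of the table would reverse that grouping without touching the parsing code.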
Further, in step S3, the lexical rules defined in step S1 and the grammar defined in step S2 are input into ANTLR to generate a lexical analyzer and a syntax analyzer.
Further, in step S3, after the lexical rules defined in step S1 and the grammar defined in step S2 are input into ANTLR, specific computation logic is implemented for the different syntactic structures according to their semantics.
Further, the specific process of step S4 is:
S401, splitting the formula to be parsed into a stream of character sequences (tokens) with the lexical analyzer;
S402, generating a syntax tree from the token stream of step S401 with the syntax analyzer;
S403, performing a hierarchical traversal of the syntax tree of step S402 with the custom logic to obtain the parsing result of the formula.
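The traversal step, in which different custom logic runs for different syntactic structures, can be sketched as below. The `Node` class, the node kinds ("number", "cell", "binary"), and the handler functions are hypothetical names introduced for this illustration; with ANTLR the tree would come from the generated syntax analyzer, whereas here it is built by hand so that the sketch stays self-contained. The traversal shown is a recursive depth-first walk that dispatches on the kind of each node.

```python
import operator
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str             # name of the syntactic structure, e.g. "binary"
    value: object = None  # operator symbol, literal value, or cell name
    children: list = field(default_factory=list)


BINARY_OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def visit_number(node, cells):
    # A literal evaluates to itself.
    return node.value


def visit_cell(node, cells):
    # A cell reference is looked up in the report data.
    return cells[node.value]


def visit_binary(node, cells):
    # Evaluate both operands first, then apply the operator.
    left, right = (evaluate(child, cells) for child in node.children)
    return BINARY_OPS[node.value](left, right)


# One piece of custom logic per syntactic structure.
HANDLERS = {
    "number": visit_number,
    "cell": visit_cell,
    "binary": visit_binary,
}


def evaluate(node, cells):
    """Walk the tree recursively, dispatching on the kind of each node."""
    return HANDLERS[node.kind](node, cells)


# Hand-built tree for the formula B1 - B2 * 2.
TREE = Node("binary", "-", [
    Node("cell", "B1"),
    Node("binary", "*", [Node("cell", "B2"), Node("number", 2)]),
])
```

Calling `evaluate(TREE, {"B1": 100, "B2": 30})` returns `40`. Supporting a new syntactic structure in this arrangement means adding one handler and one entry in `HANDLERS`, which keeps the evaluation logic in a single place.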
An ANTLR-based complex report formula analysis device comprises:
a memory, for storing executable instructions;
a processor, for executing the executable instructions stored in the memory to implement the ANTLR-based complex report formula analysis method described above.
Compared with the prior art, the invention has the following beneficial effects:
(1) The lexical and grammar files are written freely according to ANTLR's rules, ANTLR generates the lexical analyzer and the syntax analyzer, and custom parsing logic is implemented on top of the generated analyzers, which makes formula writing flexible and extensible. If language rules are adjusted or added later, only the lexical or grammar file needs to be modified, the analyzers regenerated, and the custom logic supplemented. The invention supports parsing and evaluating many kinds of formulas, covers a wide range of formula calculation scenarios, and offers flexible formula writing and high parsing performance.
(2) The invention can parse the complex formulas found in all kinds of complex reports and therefore has a very wide range of application. It also improves the processor's capacity to process and parse complex reports and runs faster, resolves difficulties such as string parsing, syntax parsing, and formula calculation, avoids the situation in which some complex formulas are difficult to parse, ensures accurate and comprehensive parsing, speeds up report parsing, and keeps this part of the work on schedule.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be further described in detail with reference to the following embodiments, which should not be construed as limiting the present invention, and all other embodiments obtained by those skilled in the art without making creative efforts shall fall within the protection scope of the present invention.
Before the embodiments of the present invention are described in further detail, the terms and expressions used in them are explained; the following explanations apply to those terms and expressions.
Lexical rules: the set of rules that covers all possible character sequences of a language; they are usually described with regular expressions.
Grammar: the set of rules describing how to build valid strings of a language from its lexical elements; it does not describe the meaning of the strings or what they can do in any context, only their form.
Lexical analyzer: usually a component of a compiler or interpreter; it separates individual words from the input character stream according to the lexical rules, for use by the syntax analyzer.
Syntax analyzer: usually a component of a compiler or interpreter; it performs syntax checking and builds a data structure from the input words, typically a hierarchical structure such as an abstract syntax tree.
Abstract syntax tree: a tree representation of the syntactic structure of the language, in which each node represents a construct in the grammar.
ANTLR: open-source software that automatically generates lexical and syntax analyzers from user-defined grammar files and can render input text as a visual syntax tree.
Aviator: a high-performance, lightweight expression evaluation engine implemented in Java, mainly used for the dynamic evaluation of various expressions.
Example 1
An ANTLR-based complex report formula analysis method comprises the following steps:
S1, defining the lexical rules, which declare the legal character sequences of the language;
S2, defining the grammar, which declares the different syntactic structures of the language;
S3, inputting the lexical rules defined in step S1 and the grammar defined in step S2 into ANTLR;
S4, inputting the formula to be parsed into the analyzers obtained from ANTLR in step S3 to obtain a syntax tree, and traversing the syntax tree to obtain the parsing result.
In this method, the calculation formula is parsed into a syntax tree, the tree is traversed hierarchically, and different custom logic is executed for different syntactic structures. This provides flexible syntax extension, can be applied to function calculation in complex reports, and solves the problem of parsing cell calculation formulas in complex reports. The invention provides the ability to parse and evaluate complex report formulas, covering basic arithmetic and logical formulas as well as more complex formulas involving global cell lookup or cell offsets; it satisfies a wide range of formula calculation scenarios and offers flexible formula writing and high parsing performance.
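The overall flow of splitting a formula, building a tree, and traversing it can be put together in one short Python sketch. It is a hand-written stand-in for the analyzers that ANTLR would generate, under assumed rules: the token kinds, the two precedence levels, the tuple-based tree, and the function names `tokenize`, `parse`, and `evaluate` are all hypothetical and cover only arithmetic over cell references.

```python
import operator
import re

TOKEN = re.compile(r"\s*(?:(?P<NUMBER>\d+(?:\.\d+)?)|(?P<CELL>[A-Z]+\d+)|(?P<OP>[-+*/()]))")
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def tokenize(formula):
    """Split the formula into a stream of (kind, text) tokens."""
    formula = formula.rstrip()
    tokens, pos = [], 0
    while pos < len(formula):
        match = TOKEN.match(formula, pos)
        if match is None:
            raise ValueError(f"illegal input at position {pos}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens


def parse(tokens):
    """Build a syntax tree of nested tuples from the token stream."""
    pos = 0

    def peek():
        return tokens[pos][1] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def atom():
        kind, text = take()
        if text == "(":
            node = expr()
            take()  # closing parenthesis
            return node
        if kind == "NUMBER":
            return ("num", float(text))
        if kind == "CELL":
            return ("cell", text)
        raise ValueError(f"unexpected token {text!r}")

    def term():  # '*' and '/' bind more tightly than '+' and '-'
        node = atom()
        while peek() in ("*", "/"):
            op = take()[1]
            node = ("bin", op, node, atom())
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = take()[1]
            node = ("bin", op, node, term())
        return node

    tree = expr()
    if pos != len(tokens):
        raise ValueError("unexpected trailing input")
    return tree


def evaluate(tree, cells):
    """Traverse the tree, running different logic for each node kind."""
    if tree[0] == "num":
        return tree[1]
    if tree[0] == "cell":
        return cells[tree[1]]
    _, op, left, right = tree
    return OPS[op](evaluate(left, cells), evaluate(right, cells))
```

As a usage example, `evaluate(parse(tokenize("(B1 - A1) * 2 + 1")), {"A1": 3, "B1": 10})` returns `15.0`. Changing the language in this sketch means editing the parser by hand, which is the step the described scheme hands over to ANTLR by regenerating the analyzers from the modified files.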
Example 2
In this embodiment, on the basis of embodiment 1, the lexical rules in step S1 are defined as follows: each lexical rule has the form key: value, where key is the name of a legal character sequence and value is a character string or a regular expression.
Example 3
In this embodiment, on the basis of embodiment 1, the grammar in step S2 is defined as follows: each grammar rule has the form key: expression1 | expression2 | …, where key is the name of the grammar rule and each expression is a combination of lexical and grammar rule names. The expressions express priority through their order, and the expression declared first has the higher priority. The expressions are separated by '|' and have a fixed order; when a string expression is parsed, the expressions are applied to it in that order to form the syntax tree, and higher priority means being applied to the parsing of the expression first.
Example 4
In this embodiment, on the basis of embodiment 1, in step S3 the lexical rules defined in step S1 and the grammar defined in step S2 are input into ANTLR to generate a lexical analyzer and a syntax analyzer, and specific computation logic is implemented for the different syntactic structures according to their semantics.
Example 5
In this embodiment, on the basis of embodiment 4, the specific process of step S4 is as follows:
S401, splitting the formula to be parsed into a stream of character sequences (tokens) with the lexical analyzer;
S402, generating a syntax tree from the token stream of step S401 with the syntax analyzer;
S403, performing a hierarchical traversal of the syntax tree of step S402 with the custom logic to obtain the parsing result of the formula.
Example 6
In this embodiment, on the basis of embodiment 1, an ANTLR-based complex report formula analysis device comprises:
a memory, for storing executable instructions;
a processor, for executing the executable instructions stored in the memory to implement the ANTLR-based complex report formula analysis method described above.
Example 7
In this embodiment, on the basis of embodiment 1: in a complex report, most cells only specify a database field, and the data is loaded when the report is exported. After loading, the database field defined in one cell expands into a corresponding number of cells, which makes it troublesome to define calculation formulas for such cells. Take the annual sales ring ratio (period-over-period comparison) as an example: cell A1 represents the year, cell B1 the sales volume, and cell C1 the annual ring ratio. The calculation formula defined in C1 is if(&A1 > 1, B1 - B1[A1:-1], ''). It means: when the sequence number of the current year (A1) in the loaded data is greater than 1, that is, the current year is not the first year, return the sales volume of the current year minus the sales volume of the year at sequence offset -1 (the preceding year), which is the annual ring ratio; otherwise return an empty string.
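The behavior of this formula over the expanded cells can be worked through in a small Python sketch. It does not parse the formula text; it only reproduces the described semantics directly, under the assumption that &A1 is the 1-based position of the current row in the loaded data and that B1[A1:-1] is the sales value one row earlier. The function name `ring_ratio_column` and the sample figures are hypothetical.

```python
def ring_ratio_column(rows):
    """Evaluate if(&A1 > 1, B1 - B1[A1:-1], '') for every expanded row.

    rows is a list of (year, sales) pairs in database order, i.e. the cells
    that A1 and B1 expand into once the data is loaded.
    """
    results = []
    for index, (year, sales) in enumerate(rows, start=1):  # index plays the role of &A1
        if index > 1:
            previous_sales = rows[index - 2][1]  # B1[A1:-1]: one row before the current one
            results.append(sales - previous_sales)
        else:
            results.append("")  # the first year has nothing to compare against
    return results
```

For rows `[(2018, 100), (2019, 120), (2020, 90)]` the function returns `["", 20, -30]`: the first year yields an empty string, and each later year yields its sales minus the preceding year's sales.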
In the invention, the lexical and grammar files are written freely according to ANTLR's rules, ANTLR generates the code of the lexical analyzer and the syntax analyzer, and custom parsing logic is implemented on top of the generated analyzers, which makes formula writing flexible and extensible. If language rules are adjusted or added later, only the lexical or grammar file needs to be modified, the analyzers regenerated, and the custom logic supplemented.
The invention solves the problem of parsing cell calculation formulas in complex reports: by designing the ANTLR lexical rules and grammar together with custom syntax-tree parsing logic, it resolves difficulties such as string parsing, syntax parsing, and formula calculation.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The above-described apparatus embodiments are merely illustrative. It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
In addition, the functional modules in the embodiments of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (8)

1. An ANTLR-based complex report formula parsing method, characterized by comprising the following steps:
S1, defining the lexical rules, which declare the legal character sequences of the language;
S2, defining the grammar, which declares the different syntactic structures of the language;
S3, inputting the lexical rules defined in step S1 and the grammar defined in step S2 into ANTLR;
S4, inputting the formula to be parsed into the analyzers obtained from ANTLR in step S3 to obtain a syntax tree, and traversing the syntax tree to obtain the parsing result.
2. The ANTLR-based complex report formula parsing method as recited in claim 1, wherein the lexical rules in step S1 are defined as follows: each lexical rule has the form key: value, where key is the name of a legal character sequence and value is a character string or a regular expression.
3. The ANTLR-based complex report formula parsing method as recited in claim 1, wherein the grammar in step S2 is defined as follows: each grammar rule has the form key: expression1 | expression2 | …, where key is the name of the grammar rule and each expression is a combination of lexical and grammar rule names.
4. The ANTLR-based complex report formula parsing method as recited in claim 3, wherein the expressions express priority through their order, and the expression declared first has the higher priority.
5. The ANTLR-based complex report formula parsing method as recited in claim 1, wherein in step S3 the lexical rules defined in step S1 and the grammar defined in step S2 are input into ANTLR to generate a lexical analyzer and a syntax analyzer.
6. The ANTLR-based complex report formula parsing method as recited in claim 5, wherein in step S3, after the lexical rules defined in step S1 and the grammar defined in step S2 are input into ANTLR, specific computation logic is implemented for the different syntactic structures according to their semantics.
7. The ANTLR-based complex report formula parsing method as recited in claim 6, wherein the specific process of step S4 is as follows:
S401, splitting the formula to be parsed into a stream of character sequences (tokens) with the lexical analyzer;
S402, generating a syntax tree from the token stream of step S401 with the syntax analyzer;
S403, performing a hierarchical traversal of the syntax tree of step S402 with the custom logic to obtain the parsing result of the formula.
8. An ANTLR-based complex report formula analysis device, characterized by comprising:
a memory, for storing executable instructions;
a processor, for executing the executable instructions stored in the memory to implement the ANTLR-based complex report formula parsing method as claimed in any one of claims 1 to 7.
CN202011178242.1A 2020-10-29 2020-10-29 ANTLR-based complex report formula analysis method and device Pending CN112270175A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011178242.1A CN112270175A (en) 2020-10-29 2020-10-29 ANTLR-based complex report formula analysis method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011178242.1A CN112270175A (en) 2020-10-29 2020-10-29 ANTLR-based complex report formula analysis method and device

Publications (1)

Publication Number Publication Date
CN112270175A (en) 2021-01-26

Family

ID=74344660

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011178242.1A Pending CN112270175A (en) 2020-10-29 2020-10-29 ANTLR-based complex report formula analysis method and device

Country Status (1)

Country Link
CN (1) CN112270175A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111639501A (en) * 2020-05-04 2020-09-08 国网浙江省电力有限公司 Power grid service micro-service combination method based on AMSL
CN115576535A (en) * 2022-11-10 2023-01-06 商飞软件有限公司 Universal expression parser
CN116611412A (en) * 2023-07-20 2023-08-18 云筑信息科技(成都)有限公司 Report form filling and displaying method based on interaction between Excel template and front end

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104298657A (en) * 2014-09-23 2015-01-21 广东电网公司电网规划研究中心 Evaluation index analysis system based on expression
CN110018829A (en) * 2019-04-01 2019-07-16 北京东方国信科技股份有限公司 Improve the method and device of PL/SQL language interpreter execution efficiency
CN111639501A (en) * 2020-05-04 2020-09-08 国网浙江省电力有限公司 Power grid service micro-service combination method based on AMSL

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210126