CN118035287A - Index blood margin analysis method and device

Info

Publication number: CN118035287A
Application number: CN202410145722.XA
Authority: CN
Inventors: 肖钢; 马丽霞; 王嘉俊; 杨腾; 许哲
Original assignee: China Securities Co Ltd
Current assignee: China Securities Co Ltd
Priority date: 2024-02-01
Filing date: 2024-02-01
Publication date: 2024-05-14

Abstract

The embodiment of the invention provides an index blood margin analysis method and device, relating to the technical field of data processing. The method comprises: acquiring an initial Structured Query Language (SQL) script; traversing each SQL statement in the initial SQL script and, if an SQL statement comprising a fuzzy query statement is traversed, replacing the target fuzzy query statement according to the item data contained in the to-be-queried table of the target fuzzy query statement, to obtain a target SQL script; acquiring index data contained in the target SQL script, and acquiring source data processed by the initial SQL script; and querying, according to the target SQL script, the source data of the index data among the source data. By applying the scheme provided by the embodiment of the invention, blood margin analysis of the index data can be realized.

Description

Index blood margin analysis method and device

Technical Field

The invention relates to the technical field of data processing, in particular to an index blood margin analysis method and device.

Background

With the acceleration of informatization and the continuous development of big data technology, tracking and tracing analysis of data has become increasingly important. Tracing data back to its source may be referred to as performing blood margin analysis (data lineage analysis) on the data.

Index data is an important data aggregation result of an enterprise and is usually obtained by performing data processing on a plurality of source tables. Errors in index data can cause huge losses to the enterprise. An index blood margin analysis scheme for index data is therefore needed, so as to trace the source of each index in the index data among the data contained in the plurality of source tables.

Disclosure of Invention

The embodiment of the invention aims to provide an index blood margin analysis method and device for analyzing blood margins of index data. The specific technical scheme is as follows:

In a first aspect, an embodiment of the present invention provides an index blood margin analysis method, the method including:

acquiring an initial Structured Query Language (SQL) script, wherein the initial SQL script is used for performing data processing to obtain index data;

traversing each SQL statement in the initial SQL script, and if an SQL statement comprising a fuzzy query statement is traversed in the initial SQL script, replacing the target fuzzy query statement according to the item data contained in the to-be-queried table in the target fuzzy query statement, to obtain a target SQL script, wherein the target fuzzy query statement is the fuzzy query statement in the traversed SQL statement;

acquiring index data contained in the target SQL script, and acquiring source data processed by the initial SQL script;

and querying the source data of the index data in each source data according to the target SQL script.

In one embodiment of the present invention, the replacing the target fuzzy query statement according to the item data contained in the to-be-queried table in the target fuzzy query statement includes:

taking the SQL statement containing the target fuzzy query statement as an initial to-be-processed statement, and detecting whether the to-be-processed statement contains a sub-query statement;

if so, taking the sub-query statement in the to-be-processed statement as a new to-be-processed statement, and returning to the step of detecting whether the to-be-processed statement contains a sub-query statement, until it is detected that the to-be-processed statement does not contain a sub-query statement;

taking the to-be-processed statement that does not contain a sub-query statement as an initial to-be-replaced statement, and detecting whether the to-be-replaced statement belongs to a fuzzy query statement;

if not, taking the query statement containing the to-be-replaced statement as a new to-be-replaced statement, and returning to the step of detecting whether the to-be-replaced statement belongs to a fuzzy query statement;

if yes, determining each item data contained in the to-be-queried table in the to-be-replaced statement, and replacing the query item in the to-be-replaced statement with target information to obtain a replaced statement, wherein the target information contains the determined item data and information of the to-be-queried table;

and taking the query statement containing the replaced statement as a new to-be-replaced statement, and returning to the step of detecting whether the to-be-replaced statement belongs to a fuzzy query statement, until detection of the whole SQL statement is completed.

In one embodiment of the present invention, the taking the SQL statement containing the target fuzzy query statement as an initial to-be-processed statement and detecting whether the to-be-processed statement contains a sub-query statement includes:

judging whether a first SQL statement containing the target fuzzy query statement belongs to an association statement;

if yes, determining a target table recorded in the first SQL statement, determining, among the sub-query statements associated in the first SQL statement, a target query statement in which index data is recorded, reconstructing a query statement in the initial SQL script according to the information of the target table and the target query statement, and deleting the target query statement;

and taking the reconstructed query statement as the initial to-be-processed statement, and detecting whether the to-be-processed statement contains a sub-query statement.

In one embodiment of the present invention, before the traversing each SQL statement in the initial SQL script, the method further comprises:

and performing script cleaning on the initial SQL script.

In one embodiment of the present invention, the performing script cleaning on the initial SQL script includes:

and deleting the comment field in the SQL script.

In one embodiment of the present invention, the performing script cleaning on the initial SQL script includes:

performing field matching on the initial SQL script to obtain a target field in the initial SQL script, wherein the target field comprises at least one of a condition field and a character string field;

performing field replacement on the target field;

before the querying the source data of the index data in each source data according to the target SQL script, the method further includes:

performing field restoration for the target field on the target SQL script.
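The field replacement and later field restoration can be pictured with a minimal Python sketch. It covers only character string fields, and it assumes that replacement means swapping each single-quoted string literal for a unique placeholder; the placeholder format `__STR_n__` and the function names `mask_strings` and `restore_strings` are hypothetical and are not taken from the embodiment.

```python
import re

# Assumption: string fields are single-quoted literals, with '' as an
# escaped quote inside the literal.
STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")


def mask_strings(script: str):
    """Replace each string literal with a unique placeholder.

    Returns the masked script and a mapping used for restoration.
    """
    saved = {}

    def _mask(match):
        key = f"__STR_{len(saved)}__"  # hypothetical placeholder format
        saved[key] = match.group(0)
        return key

    return STRING_LITERAL.sub(_mask, script), saved


def restore_strings(script: str, saved: dict) -> str:
    """Put the original string literals back in place of the placeholders."""
    for key, literal in saved.items():
        script = script.replace(key, literal)
    return script
```

Under this assumption, a literal such as 'select * from x' inside a condition would not be mistaken for a query statement while the script is being traversed, and the original text is recovered before the source data is queried.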

In one embodiment of the present invention, the obtaining the index data contained in the target SQL script includes:

for each SQL statement in the target SQL script, traversing the index data in the SQL statement; if index data in the SQL statement is traversed for the first time, marking the SQL statement and generating an index set corresponding to the mark of the SQL statement, and if index data in the SQL statement is traversed again, storing the other traversed index data in the index set;

the querying the source data of the index data in each source data according to the target SQL script includes:

querying, in each source data, the source data of the index data contained in the index set corresponding to the mark of a second SQL statement, wherein the second SQL statement is a marked statement in the target SQL script.

In one embodiment of the present invention,

the marking the SQL statement includes:

marking the SQL statement according to the index data traversed for the first time and the information of the index table where the index data is located;

before the marked second SQL statement in the target SQL script is determined, the method further includes:

performing unique field replacement on each mark existing in the target SQL script.

In a second aspect, an embodiment of the present invention provides an index blood margin analyzing device, including:

an information acquisition module, configured to acquire an initial Structured Query Language (SQL) script, wherein the initial SQL script is used for performing data processing to obtain index data;

a script traversing module, configured to traverse each SQL statement in the initial SQL script, and if an SQL statement comprising a fuzzy query statement is traversed in the initial SQL script, replace the target fuzzy query statement according to the item data contained in the to-be-queried table in the target fuzzy query statement, to obtain a target SQL script, wherein the target fuzzy query statement is the fuzzy query statement in the traversed SQL statement;

a data acquisition module, configured to acquire index data contained in the target SQL script and acquire source data processed by the initial SQL script;

and a source query module, configured to query the source data of the index data in each source data according to the target SQL script.

In one embodiment of the present invention, the script traversal module is specifically configured to:

In one embodiment of the invention, the apparatus further comprises:

and the script cleaning module is used for cleaning the initial SQL script before traversing each SQL sentence in the initial SQL script.

In one embodiment of the present invention, the script cleaning module is specifically configured to:

And deleting the comment field in the initial SQL script.

The script cleaning module is specifically configured to:

Performing field matching on the initial SQL script to obtain a target field in the initial SQL script, wherein the target field comprises at least one of a condition field and a character string field; performing field replacement on the target field;

The apparatus further comprises:

and the field restoration module is used for carrying out field restoration aiming at the target field on the target SQL script before the source data of the index data are queried in each source data.

In one embodiment of the present invention, the data acquisition module includes:

a field marking sub-module, configured to traverse, for each SQL statement in the target SQL script, the index data in the SQL statement; if index data in the SQL statement is traversed for the first time, mark the SQL statement and generate an index set corresponding to the mark of the SQL statement, and if index data in the SQL statement is traversed again, store the other traversed index data in the index set;

the source query module is specifically configured to:

In one embodiment of the present invention, the field marking submodule is specifically configured to:

The apparatus further comprises:

And the field replacement module is used for carrying out unique field replacement on each mark existing in the target SQL script before the second marked SQL statement in the target SQL script is determined.

In a third aspect, an embodiment of the present invention provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;

A memory for storing a computer program;

A processor for implementing the method steps of any of the above first aspects when executing a program stored on a memory.

In a fourth aspect, embodiments of the present invention provide a computer-readable storage medium having a computer program stored therein, which when executed by a processor, implements the method steps of any of the first aspects described above.

The embodiment of the invention has the beneficial effects that:

As can be seen from the above, when the scheme provided by the embodiment of the present invention is applied to blood margin analysis of index data, after the initial SQL script is acquired, if an SQL statement containing a fuzzy query statement is traversed in the initial SQL script, the target fuzzy query statement in the traversed SQL statement is replaced, so that the fuzzy query statement contained in the SQL statement is converted into a conventional query statement. According to the SQL script after the replacement processing, that is, the target SQL script, the source data of the index data can be queried among the item data contained in the source table, and the problem that blood margin analysis does not support fuzzy query syntax is also solved, thereby improving the accuracy of the blood margin analysis.

Of course, it is not necessary for any one product or method of practicing the invention to achieve all of the advantages set forth above at the same time.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description are only some embodiments of the invention, and those skilled in the art may obtain other embodiments from these drawings.

FIG. 1 is a flow chart of a first index blood margin analysis method according to an embodiment of the present invention;

FIG. 2 is a flow chart of a second method for analyzing an index blood margin according to an embodiment of the present invention;

FIG. 3 is a flow chart of a third method for analyzing an index blood margin according to an embodiment of the present invention;

FIG. 4 is a flow chart of a first field processing method according to an embodiment of the present invention;

FIG. 5 is a flow chart of a second field processing method according to an embodiment of the present invention;

FIG. 6 is a flowchart of a fourth method for analyzing an index blood margin according to an embodiment of the present invention;

FIG. 7 is a schematic structural diagram of an index blood margin analyzing device according to an embodiment of the present invention;

FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by the person skilled in the art based on the present invention are included in the scope of protection of the present invention.

First, some of the concepts involved in the index blood margin analysis scheme provided by the embodiment of the present invention are described.

1. Structured query language SQL script

An SQL script is a script containing SQL statements. The statements contained in an SQL script are also called SQL fields, and executing the SQL script means executing the statements it contains. SQL statements are of various types and perform various kinds of processing on data, for example query statements, table-creation statements, insert statements, and association statements.

These statements are described below.

1.1 Query statement

The query statement is a SELECT statement and is used to query a data item from a to-be-queried table.

For example, select name from table1 means querying the data item name from table1.

1.2 Table-creation statement

The table-creation statement is a CREATE statement and is used to create a new table.

For example, create table tabname1 means creating a table named tabname1.

1.3 Insert statement

The insert statement is an INSERT statement and is used to insert data into a table.

For example, insert into target_table from (select name from table1) means that name is queried from table1 and inserted into the target table.

1.4 Association statement

The association statement is a UNION statement and is used to merge multiple pieces of data.

For example, (select index1 from table1) union (select index2 from table2) means that index1 is queried from table1, index2 is queried from table2, and index1 and index2 are merged into one piece of data.

2. Fuzzy query statement

The fuzzy query statement is a query statement of the form SELECT *, which is used to query all data items in the to-be-queried table.

For example, select * from table1 means querying all data items in table1.

3. Sub-query statements

Multiple levels of nesting may occur in a query statement; a sub-query statement is a query statement contained in another statement.

For example, select name from (select * from table1) means that all data items are first queried from table1, and the data item name is then queried from the queried data items, wherein select * from table1 is a nested sub-query statement.

Next, the method and apparatus for analyzing the blood-related index according to the embodiments of the present invention will be described in detail.

Referring to FIG. 1, FIG. 1 is a flow chart of a first index blood margin analysis method according to an embodiment of the present invention. In this embodiment, the method includes the following steps S101 to S104.

Step S101: acquiring an initial Structured Query Language (SQL) script.

The initial SQL script is used for carrying out data processing to obtain index data.

Specifically, the index data is data obtained by processing source data with the SQL script, and the source data may be stored in table form, so the index data may be understood as data obtained by processing, with the SQL script, the source tables that store the source data. The purpose of index blood margin analysis is to ensure that the index data is accurate: after the index data is obtained by processing the source tables with the SQL script, the source data of the index data is traced back in the source tables through index blood margin analysis, and whether the index data is accurate is checked against the traced source data.

In view of this, after the index data is obtained by processing the source table using the SQL script, the SQL script in the processing procedure may be directly obtained as the initial SQL script.

Step S102: traversing each SQL statement in the initial SQL script, and if an SQL statement comprising a fuzzy query statement is traversed in the initial SQL script, replacing the target fuzzy query statement according to the item data contained in the to-be-queried table in the target fuzzy query statement, to obtain the target SQL script.

The target fuzzy query statement is the fuzzy query statement in the traversed SQL statement.

Specifically, for each SQL statement in the initial SQL script, whether the SQL statement contains a fuzzy query statement can be detected through either of the following two implementations.

In the first implementation, regular-expression matching may be performed on the SQL statement using regular expressions that record the structure of each statement type, so as to determine the statement structures contained in the SQL statement and obtain the statement types corresponding to those structures, that is, the types of the statements contained in the SQL statement. Whether the obtained statement types include the fuzzy query statement type is then judged, which determines whether the SQL statement contains a fuzzy query statement.

In the second implementation, since a fuzzy query statement includes "select *", character detection may be performed on the SQL statement: if the SQL statement is detected to include "select *", it is determined that the SQL statement contains a fuzzy query statement; otherwise, it is determined that the SQL statement does not contain a fuzzy query statement.
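As an illustration of the second implementation, the character detection could look like the following minimal Python sketch. The function name `contains_fuzzy_query` and the exact regular expression are assumptions made for the example; the pattern tolerates any letter case and any amount of whitespace between "select" and "*".

```python
import re

# Hypothetical pattern: the keyword "select", optional whitespace, then "*".
FUZZY_PATTERN = re.compile(r"\bselect\s*\*", re.IGNORECASE)


def contains_fuzzy_query(sql_statement: str) -> bool:
    """Return True if the statement contains a `select *` fuzzy query."""
    return FUZZY_PATTERN.search(sql_statement) is not None
```

With this pattern, a statement such as select count(*) from table1 is not reported, because the "*" does not directly follow the "select" keyword.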

In the case that the SQL statement contains a fuzzy query statement, the table following "select * from" in the fuzzy query statement may be determined as the to-be-queried table, the to-be-queried table is parsed to obtain the item data it contains, and the "*" in the fuzzy query statement is replaced with the obtained item data.

For example, if the SQL statement contains the fuzzy query statement "select * from table1", table1, which follows "select * from", may be determined as the to-be-queried table of the fuzzy query statement. If table1 contains the two data items name and gender, parsing table1 yields name and gender, the "*" is replaced with name and gender, and the replaced statement is: select (name, gender) from table1.
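The replacement in this example can be sketched in Python as follows. The dictionary `TABLE_COLUMNS` stands in for whatever parsing of the to-be-queried table yields the item data, and both it and the function name `expand_fuzzy_query` are hypothetical. The sketch emits the item data as a plain comma-separated list and only handles a table name directly after "from".

```python
import re

# Hypothetical table metadata; in practice this would come from parsing
# the to-be-queried table.
TABLE_COLUMNS = {"table1": ["name", "gender"]}

# Matches "select * from <table>" and captures the table name.
SELECT_STAR = re.compile(r"\bselect\s*\*\s*from\s+(\w+)", re.IGNORECASE)


def expand_fuzzy_query(statement: str, table_columns: dict) -> str:
    """Replace `*` with the item data of the table that follows `from`."""

    def _replace(match):
        table = match.group(1)
        columns = table_columns.get(table)
        if columns is None:
            return match.group(0)  # unknown table: leave the text unchanged
        return f"select {', '.join(columns)} from {table}"

    return SELECT_STAR.sub(_replace, statement)


print(expand_fuzzy_query("select * from table1", TABLE_COLUMNS))
# prints: select name, gender from table1
```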

Step S103: acquiring index data contained in the target SQL script, and acquiring source data processed by the initial SQL script.

Specifically, the index data has a specific data structure, so after the target SQL script is obtained, the text structure identification can be performed on the target SQL script according to the data structure of the index data, and the index data contained in the target SQL script is obtained.

In addition, as described in the above step S101, after the index data is obtained by processing the source table using the SQL script, the source table in the process of processing may be directly obtained, and the data included in the source table may be determined as the source data.

Step S104: querying the source data of the index data in each source data according to the target SQL script.

After the target SQL script is obtained, a conventional blood margin analysis approach may be used to obtain the blood margin relationships among the indexes appearing in the target SQL script, and the data present in each obtained blood margin relationship is then queried in each source data and taken as the source data of the index data contained in the same blood margin relationship.

Multi-level nesting may occur in an SQL statement. For the case of multi-level nesting, referring to FIG. 2, a flow chart of a second index blood margin analysis method is provided in an embodiment of the present invention. In this embodiment, the above step S102 may be implemented by the following steps S102A-S102H.

Step S102A: detecting whether the to-be-processed statement contains a sub-query statement; if so, executing step S102B; if not, executing step S102C.

The initial value of the to-be-processed statement is the SQL statement containing the target fuzzy query statement.

Specifically, the characters in the to-be-processed statement may be detected one by one. If the to-be-processed statement is detected to include at least two "select" keywords, the to-be-processed statement contains a sub-query statement, and step S102B is executed. If the to-be-processed statement is detected to contain only one "select", the to-be-processed statement is regarded as the bottom-layer sub-query statement in the SQL statement and contains no sub-query statement, and step S102C is executed.

Step S102B: taking the sub-query statement in the statement to be processed as a new statement to be processed, and returning to the step S102A.
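Steps S102A and S102B together descend to the most deeply nested statement. A minimal Python sketch of that loop is given below. It assumes that every sub-query is enclosed in parentheses, that comments and string literals have already been cleaned out of the statement, and that only the first sub-query at each level is followed; the function names are hypothetical.

```python
import re

SELECT_KEYWORD = re.compile(r"\bselect\b", re.IGNORECASE)
SUBQUERY_START = re.compile(r"\(\s*select\b", re.IGNORECASE)


def contains_subquery(statement: str) -> bool:
    """Step S102A: two or more `select` keywords indicate a sub-query."""
    return len(SELECT_KEYWORD.findall(statement)) >= 2


def first_subquery(statement: str) -> str:
    """Return the text of the first parenthesized sub-query."""
    start = SUBQUERY_START.search(statement)
    if start is None:
        raise ValueError("no parenthesized sub-query found")
    depth = 0
    for pos in range(start.start(), len(statement)):
        if statement[pos] == "(":
            depth += 1
        elif statement[pos] == ")":
            depth -= 1
            if depth == 0:
                # Drop the enclosing parentheses of the sub-query.
                return statement[start.start() + 1:pos].strip()
    raise ValueError("unbalanced parentheses")


def innermost_statement(statement: str) -> str:
    """Steps S102A-S102B: descend until no sub-query statement remains."""
    pending = statement
    while contains_subquery(pending):
        pending = first_subquery(pending)
    return pending
```

For the statement select name from (select a from (select * from table1)), the loop returns select * from table1, which then serves as the initial to-be-replaced statement.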

Step S102C: detecting whether the to-be-replaced statement belongs to a fuzzy query statement; if not, executing step S102D; if yes, executing step S102F.

The initial value of the to-be-replaced statement is the to-be-processed statement that does not contain a sub-query statement.

If the to-be-processed statement does not contain a sub-query statement, the to-be-processed statement is the most deeply nested sub-statement in the SQL statement. Starting from this deepest sub-statement, whether each statement belongs to a fuzzy query statement can then be detected layer by layer toward the outer layer.

Specifically, detection is performed by either of the following two implementations.

In the first implementation, regular-expression matching may be performed on the to-be-replaced statement using regular expressions that record the structure of each statement type. If the to-be-replaced statement matches the statement structure of a fuzzy query statement, the to-be-replaced statement belongs to a fuzzy query statement, and step S102F is executed; otherwise, the to-be-replaced statement does not belong to a fuzzy query statement, step S102D is executed, and the detection then recurses outward until the whole SQL statement has been detected.

In the second implementation, since a fuzzy query statement includes "select *", character detection may be performed on the to-be-replaced statement. If the to-be-replaced statement is detected to include "select *", it is determined that the to-be-replaced statement belongs to a fuzzy query statement, and step S102F is executed; if the to-be-replaced statement is detected not to include "select *", it is determined that the to-be-replaced statement does not belong to a fuzzy query statement, and step S102D is executed.

Step S102D: judging whether a query statement containing the to-be-replaced statement exists; if so, executing step S102E.

Specifically, whether characters located before the to-be-replaced statement exist in the SQL statement may be detected. If no such characters exist, the current to-be-replaced statement is the whole SQL statement, the replacement processing is complete, and the current to-be-replaced statement is the SQL statement after replacement processing.

If characters located before the to-be-replaced statement exist in the SQL statement, whether those characters contain "select" is detected. If so, the current to-be-replaced statement is still a sub-query statement in the SQL statement, and step S102E is executed; if not, the current to-be-replaced statement is not a sub-query statement in the SQL statement, the replacement processing is complete, and the current to-be-replaced statement is the SQL statement after replacement processing.
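The check in step S102D can be illustrated with a short Python sketch. It assumes that the to-be-replaced statement is identified by its starting offset within the whole SQL statement; the function name `has_enclosing_query` and the parameter `start` are hypothetical.

```python
import re

SELECT_KEYWORD = re.compile(r"\bselect\b", re.IGNORECASE)


def has_enclosing_query(full_statement: str, start: int) -> bool:
    """Step S102D: inspect the characters before offset `start`.

    Returns True only if such characters exist and contain `select`,
    meaning the statement at `start` is still a sub-query statement.
    """
    prefix = full_statement[:start]
    if not prefix.strip():
        return False  # nothing precedes it: this is the whole statement
    return SELECT_KEYWORD.search(prefix) is not None
```

For select name from (select * from table1), calling the function with the offset of the inner select returns True, so step S102E would be executed; with offset 0 it returns False.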

Step S102E: taking the query statement containing the to-be-replaced statement as a new to-be-replaced statement, and returning to step S102C.

Step S102F: determining each item data contained in the to-be-queried table in the to-be-replaced statement, and replacing the query item in the to-be-replaced statement with target information to obtain a replaced statement.

The target information contains the determined item data and information of the to-be-queried table in which the item data is located.

In the case that the to-be-replaced statement is determined to belong to a fuzzy query statement, the to-be-queried table following "select * from" in the to-be-replaced statement may be determined, the to-be-queried table is parsed to obtain the item data it contains, and the "*" is replaced with target information containing the obtained item data and the information of the to-be-queried table. Because replacing the "*" increases the number of characters contained in the SQL statement, replacing the "*" is also called field amplification of the SQL statement.
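The embodiment does not fix how the item data and the table information are combined into the target information. One possible reading, sketched below in Python, is to qualify each item with its table name, for example table1.name; this format and the function names are assumptions made for illustration.

```python
def build_target_information(table: str, columns: list) -> str:
    """Combine each item with the table it comes from, e.g. table1.name."""
    return ", ".join(f"{table}.{column}" for column in columns)


def replace_query_item(statement: str, table: str, columns: list) -> str:
    """Step S102F: replace the first `*` with the target information."""
    return statement.replace("*", build_target_information(table, columns), 1)


replaced = replace_query_item("select * from table1", "table1", ["name", "gender"])
print(replaced)  # prints: select table1.name, table1.gender from table1
```

The replaced statement is longer than the original one, which corresponds to the field amplification mentioned above.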

Step S102G: judging whether a query statement containing the replaced statement exists; if so, executing step S102H; if not, executing step S103.

If a query statement containing the replaced statement exists, the replaced statement is still a sub-query statement in the SQL statement and the recursion toward the outer layer must continue; in this case, step S102H is executed to take the query statement containing the replaced statement as a new to-be-replaced statement.

If step S102G determines that no query statement containing the replaced statement exists, the replaced statement is not a sub-query statement in the SQL statement, the replacement processing is complete, and the current replaced statement is the SQL statement after replacement processing.

The manner of judging whether a query statement containing the replaced statement exists is the same as the manner of judging, in step S102D above, whether a query statement containing the to-be-replaced statement exists, and is not repeated here.

Step S102H: taking the query statement containing the replaced statement as a new to-be-replaced statement, and returning to step S102C.

As can be seen from the above, when the scheme provided by the embodiment of the present invention is applied to blood margin analysis of index data, whether the to-be-processed statement contains a sub-query statement is first detected repeatedly, with the sub-query statement taken as the new to-be-processed statement each time, so that the most deeply nested sub-query statement can be obtained when the SQL statement is nested at multiple levels. Then, starting from the deepest sub-query statement and recursing outward, whether each statement belongs to a fuzzy query statement is detected, and any statement detected to be a fuzzy query statement is replaced, so that the fuzzy query statement is converted into a conventional query statement. This solves the problem that blood margin analysis does not support fuzzy query syntax when the SQL statement has multi-level nesting, so the accuracy of the blood margin analysis can be improved.
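The overall inside-out behaviour of steps S102A-S102H can be condensed into a recursive Python sketch. This is a compact equivalent rather than a step-by-step transcription of the flow: the recursion descends to the deepest layer and rewrites each layer while returning outward. It assumes statements of the simple form select ... from <table> or select ... from (<sub-query>), with no alias, join, or where clause; `TABLE_COLUMNS` and `expand` are hypothetical names.

```python
import re

TABLE_COLUMNS = {"table1": ["name", "gender"]}  # hypothetical metadata

QUERY = re.compile(
    r"^\s*select\s+(?P<items>.+?)\s+from\s+(?P<source>.+)$",
    re.IGNORECASE | re.DOTALL,
)


def expand(statement: str, table_columns: dict):
    """Expand `select *` from the innermost query outward.

    Returns the rewritten statement and the list of items it selects.
    """
    match = QUERY.match(statement)
    if match is None:
        raise ValueError(f"unsupported statement: {statement!r}")
    items = match.group("items").strip()
    source = match.group("source").strip()

    if source.startswith("(") and source.endswith(")"):
        # The source is a sub-query: process the inner layer first.
        inner, available = expand(source[1:-1], table_columns)
        source = f"({inner})"
    else:
        available = table_columns[source]

    if items == "*":
        selected = list(available)  # fuzzy query: take every item
    else:
        selected = [item.strip() for item in items.split(",")]
    return f"select {', '.join(selected)} from {source}", selected


print(expand("select * from (select * from table1)", TABLE_COLUMNS)[0])
# prints: select name, gender from (select name, gender from table1)
```

When the outer statement is itself a fuzzy query over a sub-query, its item data is taken from the items selected by the already expanded inner statement.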

An association statement may exist in the above SQL statement. In this case, the SQL statement may be processed through steps S102A1-S102A3 in the embodiment shown in FIG. 3 below.

In an embodiment of the present invention, referring to FIG. 3, a flow chart of a third index blood margin analysis method is provided. In this embodiment, the above step S102A may be implemented by the following steps S102A1-S102A3.

Step S102A1: judging whether the first SQL statement containing the target fuzzy query statement belongs to an association statement; if so, executing step S102A2.

Specifically, whether the first SQL statement contains the keyword "union" may be detected. If not, the first SQL statement does not belong to an association statement; in this case, the first SQL statement is directly taken as the initial value of the to-be-processed statement, and whether the to-be-processed statement contains a sub-query statement is detected. If so, the first SQL statement belongs to an association statement, and step S102A2 is executed.

Step S102A2: determining a target table recorded in the first SQL sentence, determining a target query sentence recorded with index data in each sub-query sentence associated with the first SQL sentence, reconstructing the query sentence in the initial SQL script according to the information of the target table and the target query sentence, and deleting the target query sentence.

Specifically, each sentence has a respective sentence format, and the positions of different data in the sentences are fixed in the sentences, so when determining the target table recorded in the first SQL sentence, table data in the position of the target table in the first SQL sentence can be determined according to the sentence format of the related sentence.

In addition, the "union" included in the first SQL statement may be detected, and regular-expression matching for sub-query statements may be performed on the text before and after the position of "union", so as to obtain each sub-query statement associated in the first SQL statement; among these sub-query statements, the target query statement containing information of index data may then be detected. Thus, after the target table and the target query statement recorded in the first SQL statement are determined, a query statement containing the information of the target table and the target query statement is reconstructed, and the target query statement is deleted from the initial SQL script.

Step S102A3: detecting whether the statement to be processed contains a sub-query statement.

The difference between this step and the above step S102A is only that the initial value of the sentence to be processed in this step is a reconstructed query sentence, and will not be described here again.

From the above, when the solution provided by the embodiment of the present invention is applied to blood margin analysis of index data, the target table and the target query statement recorded in the first SQL statement are determined, a query statement is reconstructed in the initial SQL script, and the target query statement is deleted. This can be understood as splitting the associated statement apart, which solves the problem of inaccurate analysis when an associated statement exists in the initial SQL script, so that the accuracy of the blood margin analysis can be improved.

In one embodiment of the present invention, referring to fig. 4, a first processing flow of an SQL statement is shown. In fig. 4, it is first determined whether the SQL statement includes both an insert statement and an associated statement. If so, the target table in the insert statement is obtained and the sub-query statements associated in the SQL statement are traversed. For each traversed sub-query statement, it is determined whether it is the first sub-query statement in the SQL statement; if it is, the next sub-query statement is processed. If it is not, it is determined whether the sub-query statement contains index data: if it does, the sub-query statement is split out of the SQL statement, a new statement is reconstructed from the target table and the sub-query statement, and the sub-query statement is deleted from the SQL statement; if it does not, the next sub-query statement is processed. Once every sub-query statement has been processed, processing of the whole SQL statement is complete.
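The flow of fig. 4 can be illustrated with a minimal Python sketch. It is not the implementation of the embodiment: the function name `split_union_insert`, the regular expressions, and the way index data is recognized (by matching names from a supplied `index_names` set) are all assumptions made for illustration, and the naive split on `union` does not handle `union all` or a `union` that appears inside a string literal.

```python
import re

def split_union_insert(sql, index_names):
    """Split index-bearing sub-queries out of an `insert ... union ...` statement.

    Hypothetical helper: returns the trimmed original statement followed by
    one rebuilt statement per sub-query that was split out.
    """
    m = re.match(r"\s*(insert\s+into\s+\w+)\s+(.*)", sql, flags=re.I | re.S)
    if not m or not re.search(r"\bunion\b", m.group(2), flags=re.I):
        return [sql]  # not an insert statement plus associated statement
    insert_head, body = m.group(1), m.group(2)
    subqueries = [s.strip() for s in re.split(r"\bunion\b", body, flags=re.I)]
    kept = [subqueries[0]]  # the first sub-query always stays in place
    rebuilt = []
    for sub in subqueries[1:]:
        has_index = any(
            re.search(rf"\b{re.escape(name)}\b", sub) for name in index_names
        )
        if has_index:
            # New statement built from the target table and the sub-query.
            rebuilt.append(f"{insert_head} {sub}")
        else:
            kept.append(sub)
    return [f"{insert_head} " + " union ".join(kept)] + rebuilt
```

For example, with `index_names = {"index1"}`, a statement inserting the union of three sub-queries into a table `t_index` is returned as two statements: the original without the sub-query that selects `index1`, and a new insert into `t_index` that contains only that sub-query.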

In the process of acquiring the source data, if the source data is stored in the source table, the source table can be acquired, the source table is analyzed, the table structure of the source table and the contained item data are acquired, and in the metadata updating module, a new table is rebuilt according to the table structure of the source table and the contained item data, which is called a temporary table.

If the initial SQL script contains a table-building statement and the item data in the source table needs to be used, the item data can be obtained from the temporary table stored in the metadata updating module. The intermediate table obtained by executing the table-building statement can also be stored in the metadata updating module; when a statement in the initial SQL script processes the intermediate table, for example by modifying the data in it, the intermediate table stored in the metadata updating module can be updated synchronously.

In this case, when the SQL statement is processed, it may be first determined whether the SQL statement includes a table-building statement, and if so, the table-building statement is executed, the information in the metadata update module is updated, and then the subsequent replacement processing is performed on the SQL statement.

In one embodiment of the invention, after the initial SQL script is obtained, SQL grammar analysis is firstly carried out on the initial SQL script, whether the initial SQL script contains a table building sentence is judged, if yes, the table building sentence is executed, and the information recorded in the metadata updating module is updated; if not, acquiring the SQL sentence including the fuzzy query sentence in the initial SQL script in a traversing manner, and executing the steps S102A-S102H in the embodiment shown in FIG. 2 for each traversed SQL sentence.

In one embodiment of the invention, before traversing each SQL sentence in the initial SQL script, the initial SQL script can be subjected to script cleaning so as to clean irrelevant contents in the initial SQL script, and the influence of the irrelevant contents on subsequent steps is avoided, thereby improving the accuracy of blood-lineage analysis.

Specifically, the initial SQL script may be script cleaned by at least one of the following three implementations.

In a first implementation, the comment field in the initial SQL script is deleted.

Specifically, the start character and the end character of the comment identifier in the initial SQL script may be detected; after both are detected, the field from the start character to the end character is determined to be the comment field, and the determined comment field is deleted.

In the implementation manner, the comment field in the initial SQL script is deleted, so that interference of the content in the comment field on the blood margin analysis can be avoided, and the accuracy of the blood margin analysis can be improved.
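A minimal Python sketch of comment deletion follows. The text above does not fix the comment delimiters, so the block-comment form `/* ... */` and the line-comment form `-- ...` used here are assumptions based on common SQL syntax, and `strip_comments` is a hypothetical name. The sketch does not protect comment markers that occur inside string literals.

```python
import re

def strip_comments(sql):
    """Delete comment fields from an SQL script (assumed delimiters)."""
    # Block comments: everything from the start marker to the end marker.
    sql = re.sub(r"/\*.*?\*/", "", sql, flags=re.S)
    # Line comments: from the marker to the end of the line.
    sql = re.sub(r"--[^\n]*", "", sql)
    return sql
```

If string fields are replaced with temporary fields before comments are deleted, a comment marker inside a string can no longer be mistaken for a real comment.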

In a second implementation manner, condition field matching is performed on the initial SQL script to obtain condition fields in the initial SQL script; and performing field replacement on the condition field in the initial SQL script.

Specifically, the condition characters representing the condition statement in the initial SQL script may be detected, for example, the condition characters such as "where", "join" and the like in the SQL script are detected, the field in which the detected condition characters are located is determined, and the determined field is the condition field in the SQL script, so that the determined field is replaced by a preset first temporary field, and the determined field and the first temporary field are stored in a manner that the determined field corresponds to the first temporary field.

When the condition field is replaced by the first temporary field, the replacement can be performed by using a pre-obtained SQL field replacement reconstruction module.

And after the index fields in the initial SQL script are subjected to replacement processing and before source data is queried according to the replaced target SQL script, carrying out condition field restoration on the replaced target SQL script.

In the field restoration process, a first temporary field in the target SQL script after the replacement processing can be detected to obtain the first temporary field in the SQL script, and the first temporary field in the target SQL script is replaced with a stored condition field corresponding to the first temporary field.

In this implementation, the condition fields in the initial SQL script are replaced before the initial SQL script is traversed, and the condition fields are restored in the target SQL script after the index fields in the SQL script have been replaced. The content of the condition fields is thereby prevented from interfering with the traversal of the SQL script, so that the accuracy of the blood margin analysis can be improved.
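The replace-then-restore idea of the second implementation can be sketched in Python as below. This is only an illustration under stated assumptions: it handles `where` clauses only (not `join`), it treats a condition field as the text from `where` up to the next `group by`, `order by`, `union`, closing parenthesis, semicolon or end of text, and the placeholder format `__COND_n__` and the names `mask_conditions` and `restore_conditions` are hypothetical. A condition that itself contains parentheses is only partly masked, although restoration still reproduces the original text exactly.

```python
import re

# Assumed end of a condition field: the next clause keyword, ")" or ";".
_CLAUSE_END = r"(?=\bgroup\s+by\b|\border\s+by\b|\bunion\b|\)|;|$)"
_WHERE = re.compile(r"\bwhere\b.*?" + _CLAUSE_END, flags=re.I | re.S)

def mask_conditions(sql):
    """Replace each condition field with a first temporary field."""
    saved = {}  # temporary field -> original condition field

    def _swap(match):
        token = f"__COND_{len(saved)}__"
        saved[token] = match.group(0)
        return token

    return _WHERE.sub(_swap, sql), saved

def restore_conditions(sql, saved):
    """Put every stored condition field back in place of its temporary field."""
    for token, original in saved.items():
        sql = sql.replace(token, original)
    return sql
```

The stored mapping plays the role of saving the determined field and the first temporary field in correspondence, so that restoration is a plain lookup.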

In a third implementation manner, performing character string field matching on the initial SQL script to obtain character string fields in the initial SQL script; and performing field replacement on the character string field in the initial SQL script.

Specifically, according to the regular expression of the field structure recorded with the character string field, the initial SQL script is subjected to regular matching aiming at the character string field to obtain the character string field contained in the initial SQL script, the character string field in the initial SQL script is replaced by the second temporary field, and the character string field and the second temporary field are stored in a mode that the character string field corresponds to the second temporary field.

When the character string field is replaced by the second temporary field, the pre-obtained SQL field replacement reconstruction module can be used for replacing the character string field.

And after the index fields in the initial SQL script are subjected to replacement processing and before source data is queried according to the replaced target SQL script, performing character string field restoration on the replaced target SQL script.

In the field restoration process, a second temporary field in the target SQL script after the replacement processing can be detected, so that the second temporary field in the SQL script is obtained, and the second temporary field in the target SQL script is replaced by a stored character string field corresponding to the second temporary field.

In the implementation mode, the character string fields in the SQL script are replaced before the initial SQL script is traversed, and then the character string field is restored to the target SQL script after the index fields in the initial SQL script are replaced, so that the interference of the content of the character string fields on the traversed SQL script can be avoided, and the accuracy of blood-lineage analysis can be improved.

In addition, one or more of the three implementation modes can be selected for script cleaning, so that the accuracy of blood margin analysis is further improved.

In one embodiment of the present invention, referring to FIG. 5, a process flow of a field processing method is shown, in FIG. 5, first extracting a string field in an initial SQL script; generating a universally unique identification code (Universally Unique Identifier, UUID) according to the character string field; and replacing the character string field in the initial SQL script with the generated UUID, so that after the initial SQL script is processed, the UUID in the target SQL script is restored into the character string field.
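The flow of FIG. 5 can be sketched in Python with the standard `uuid` module. The sketch assumes that string fields are single-quoted SQL literals in which a doubled quote escapes a quote, and it uses a random UUID (`uuid.uuid4`) for each literal; whether the embodiment derives the UUID from the string content is not stated, so this choice is an assumption. The names `mask_strings` and `restore_strings` are hypothetical.

```python
import re
import uuid

# Assumed form of a string field: a single-quoted literal, '' escapes a quote.
_STRING = re.compile(r"'(?:[^']|'')*'")

def mask_strings(sql):
    """Replace every string field with a freshly generated UUID."""
    saved = {}  # UUID text -> original string field

    def _swap(match):
        token = uuid.uuid4().hex
        saved[token] = match.group(0)
        return token

    return _STRING.sub(_swap, sql), saved

def restore_strings(sql, saved):
    """Restore each UUID in the processed script to its string field."""
    for token, literal in saved.items():
        sql = sql.replace(token, literal)
    return sql
```

Because the hexadecimal form of a UUID contains only the characters 0-9 and a-f, a masked literal such as `'%union%'` can no longer be mistaken for the keyword `union` by later matching steps.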

Because the data volume of the initial SQL script is usually larger, after the SQL script after the replacement processing is obtained, the source data needs to be queried based on the target SQL script with larger data volume, and the query efficiency is lower.

To solve the above problem, in one embodiment of the present invention, when the index data contained in the target SQL script is obtained in the step S103, each SQL statement in the target SQL script is processed as follows:

The index data in the SQL statement is traversed. When index data in the SQL statement is traversed for the first time, the SQL statement is marked and an index set corresponding to the mark of the SQL statement is generated; when index data in the SQL statement is traversed again, the other traversed index data is stored in the index set.

Specifically, the text structure recognition can be performed on the SQL sentence, and when index data in the SQL sentence is recognized for the first time, the SQL sentence can be marked.

In addition, since one SQL statement may include a plurality of index data, an index set corresponding to the label of the SQL statement and storing other index data to be traversed later may be generated in addition to the label of the SQL statement. Thus, when traversing to other index data in the SQL sentence again, the traversed other index data is stored in the generated index set.

For example, for the statement (select index1 from table1) union (select index2 from table2), if index1 and index2 are both index data, then on traversing to the first of the two, for example index1, the statement may be marked and an index set corresponding to the mark may be generated, so that index2 is stored in the set when it is traversed.
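The marking step can be sketched in Python as follows. The sketch assumes that index data is recognized by comparing word tokens against a known set of index names, which the text does not specify, and it uses the first index traversed as the mark itself; the function name `mark_statements` and the returned data structure are hypothetical.

```python
import re

def mark_statements(statements, index_names):
    """Mark each statement that contains index data and collect its index set.

    Returns {statement position: (mark, index set)}. The mark is the first
    index traversed; the set holds the other index data traversed later.
    """
    marked = {}
    for pos, stmt in enumerate(statements):
        for token in re.findall(r"\w+", stmt):
            if token not in index_names:
                continue
            if pos not in marked:
                marked[pos] = (token, set())  # first hit: mark and create set
            elif token != marked[pos][0]:
                marked[pos][1].add(token)     # later hits go into the set
    return marked
```

Applied to the example statement above together with a statement that contains no index data, only the first statement is marked, with `index1` as its mark and `{"index2"}` as its index set.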

After the above operation is performed on each SQL sentence in the target SQL script, the SQL sentence containing index data in the target SQL script is determined, and index data contained in the SQL sentence can also be determined, so that when the source data of the index data is queried in each source data according to the target SQL script, the source data of the index data contained in the index set corresponding to the mark of the second SQL sentence can be queried in each source data.

The second SQL sentence is a marked sentence in the target SQL script.

Specifically, for each second SQL statement marked in the target SQL script, the mark of the second SQL statement can be determined, and an index set corresponding to the mark is determined, so that for the second SQL statement, a conventional blood margin analysis mode can be adopted to obtain blood margin relations among indexes appearing in the second SQL statement, determine the blood margin relation of index data contained in the index set, and query item data existing in the determined blood margin relation from each source data, and serve as source data of the index data contained in the index set.

From the above, when the scheme provided by the embodiment of the invention is applied to perform the blood-edge analysis on the index data, the marked second SQL statement can be quickly positioned in the target SQL script after the subsequent replacement processing, and the second SQL statement can be understood as the SQL statement used for acquiring the index data in the target SQL script after the replacement processing, so that the source data can be directly queried according to the second SQL statement without querying the source data according to the whole target SQL script, thereby improving the efficiency of querying the source data.

In one embodiment of the invention, when the SQL sentence is marked, the SQL sentence can be marked according to the index data traversed for the first time and the information of the index table where the index data is located.

The index table may be understood as a table for storing index data after data processing is performed by using an SQL script to obtain the index data.

Specifically, when the index data is traversed for the first time, information of an index table where the traversed index data is located can be obtained, and the SQL statement is marked according to the traversed index data and the information of the index table.

For example, if the index data index1 is traversed for the first time in an SQL statement, information of the index table where index1 is located, such as the index table name, may be obtained, so that the SQL statement may be marked with the content "index table name-index1".

In addition, in the marking process, index data traversed from different sentences for the first time may be the same, so that when different sentences are marked, marked contents are the same, and when source data are queried later, it is difficult to determine index sets corresponding to marks of the SQL sentences according to marked contents of the SQL sentences.

To avoid this, after determining the second SQL statement marked in the target SQL script, unique field substitutions can be made to each mark that is already present in the target SQL script.

After each mark is subjected to unique field replacement, the marks of each second SQL statement and the index sets form a one-to-one correspondence, so that in the process of inquiring source data later, the index sets corresponding to the marks can be accurately determined according to the replaced marks, and the source data inquiry is further realized. Therefore, by applying the index blood margin analysis scheme provided by the embodiment of the invention, the source data of each target index can be accurately inquired, and the accuracy of index blood margin analysis is improved.
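One way to make the marks unique is sketched below in Python. The text does not say how the unique field is produced, so appending a random UUID after a `#` separator is an assumption, as are the function name `make_marks_unique` and the input shape (a list of mark and index-set pairs, one per marked statement).

```python
import uuid

def make_marks_unique(marked):
    """Replace each mark with a unique one so marks map one-to-one to index sets."""
    unique = {}
    for mark, index_set in marked:
        # Two statements may share the same mark; the suffix separates them.
        unique[f"{mark}#{uuid.uuid4().hex}"] = index_set
    return unique
```

With this replacement, two statements that were both marked "index table name-index1" receive different keys, so each index set can be looked up unambiguously during the later source data query.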

In an embodiment of the present invention, referring to fig. 6, a flow chart of a fourth index blood-edge analysis method is provided, in this embodiment, firstly, an SQL script is obtained, and general cleaning is performed on the SQL script, that is, comment fields in the SQL script are deleted, then index configuration information of the SQL script is obtained, and according to the index configuration information of the SQL script, the SQL script is split, so as to obtain each SQL statement in the SQL script.

Traversing the SQL script, judging whether the currently traversed SQL sentence contains index data, if not, continuing traversing; if yes, judging whether the index data is the first index traversed from the current SQL statement.

If the index data is the first index traversed from the current SQL sentence, marking the current SQL sentence and generating an index set corresponding to the mark of the current SQL sentence;

if the index data is not the first index traversed from the current SQL statement, the index data is added to the index set.

After traversing each SQL sentence, storing each index set, and then carrying out index field replacement processing on the SQL script to obtain a replaced SQL script, so that source data of each index data are queried according to the replaced SQL script.

Corresponding to the index blood margin analysis method, the embodiment of the invention also provides an index blood margin analysis device.

In one embodiment of the present invention, referring to fig. 7, there is provided an index blood margin analyzing apparatus, the apparatus comprising:

The information acquisition module 701 is configured to acquire an initial SQL script of a structured query language, where the initial SQL script is used for performing data processing to obtain index data;

The script traversing module 702 is configured to traverse each SQL statement in the initial SQL script, and if the initial SQL script is traversed to include an SQL statement of a fuzzy query statement, replace the target fuzzy query statement according to item data included in a to-be-queried table in the target fuzzy query statement to obtain a target SQL script, where the target fuzzy query statement is a fuzzy query statement in the traversed SQL statement;

the data acquisition module 703 is configured to acquire index data included in the target SQL script, and acquire source data processed by the initial SQL script;

And the source query module 704 is configured to query source data of the index data in each source data according to the target SQL script.

In one embodiment of the present invention, the script traversal module 702 is specifically configured to:

In one embodiment of the invention, the apparatus further comprises:

In the scheme, the initial SQL script is cleaned, irrelevant contents in the initial SQL script can be deleted, and the interference of the irrelevant contents on subsequent steps is avoided, so that the accuracy of blood margin analysis is improved.

and deleting the comment field in the SQL script.

In the scheme, the comment field in the initial SQL script is deleted, so that the interference of the content in the comment field on the blood margin analysis can be avoided, and the accuracy of the blood margin analysis can be improved.

In the scheme, the condition fields in the SQL script are replaced before traversing the SQL script, and then the condition fields are restored after replacing the index fields in the SQL script, so that the condition fields are prevented from interfering the traversing SQL script, and the accuracy of blood-lineage analysis can be improved.

The apparatus further comprises:

In the scheme, the target field in the SQL script is replaced before the initial SQL script is traversed, and then the target field is restored to the target SQL script after the SQL sentence in the initial SQL script is replaced, so that the interference of the content of the target field on the traversed SQL script can be avoided, and the accuracy of blood margin analysis can be improved.

In one embodiment of the present invention,

The data acquisition module 703 includes:

the source query module is specifically configured to:

From the above, when the scheme provided by the embodiment of the invention is applied to perform the blood-edge analysis on the index data, the marked second SQL statement can be quickly positioned in the target SQL script after the subsequent replacement processing, and the second SQL statement can be understood as the SQL statement used for acquiring the index data in the target SQL script, so that the source data can be directly queried according to the second SQL statement, and the source data does not need to be queried according to the whole target SQL script, thereby improving the efficiency of querying the source data.

The apparatus further comprises:

The embodiment of the present invention further provides an electronic device, as shown in fig. 8, including a processor 801, a communication interface 802, a memory 803, and a communication bus 804, where the processor 801, the communication interface 802, and the memory 803 complete communication with each other through the communication bus 804,

A memory 803 for storing a computer program;

the processor 801, when executing the program stored in the memory 803, implements the following steps:

acquiring an initial SQL script of a structured query language, wherein the initial SQL script is used for performing data processing to obtain index data;

Other schemes for implementing index blood-margin analysis by the processor 801 executing the program stored in the memory 803 can be referred to the foregoing method embodiments, and will not be described herein.

The communication bus mentioned above for the electronic device may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. The communication bus may be classified as an address bus, a data bus, a control bus, or the like. For ease of illustration, only one bold line is shown in the figure, but this does not mean that there is only one bus or only one type of bus.

The communication interface is used for communication between the electronic device and other devices.

The Memory may include random access Memory (Random Access Memory, RAM) or may include Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.

The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), etc.; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

In yet another embodiment of the present invention, a computer readable storage medium is provided, in which a computer program is stored, the computer program implementing the steps of any of the index blood-edge analysis methods described above when executed by a processor.

In yet another embodiment of the present invention, a computer program product comprising instructions that, when run on a computer, cause the computer to perform the index blood margin analysis method of any of the above embodiments is also provided.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example by wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, radio, microwave, etc.). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), etc.

It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

In this specification, the embodiments are described in a related manner; identical and similar parts of the embodiments may be referred to each other, and each embodiment mainly describes its differences from the other embodiments. In particular, the apparatus, electronic device, computer-readable storage medium and computer program product embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, see the corresponding description of the method embodiments.

The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims

1. An index blood margin analysis method, which is characterized by comprising the following steps:

Acquiring index data contained in the target SQL script, and acquiring source data corresponding to the index data;

2. The method according to claim 1, wherein the replacing the target fuzzy query statement according to the item data contained in the table to be queried in the target fuzzy query statement includes:

3. The method according to claim 2, wherein the step of using the SQL statement containing the target fuzzy query statement as an initial pending statement, and detecting whether the pending statement contains a sub-query statement comprises:

4. A method according to any one of claims 1-3, further comprising, prior to said traversing each SQL statement in said initial SQL script:

and performing script cleaning on the initial SQL script.

5. The method of claim 4, wherein said performing script cleaning on said initial SQL script comprises:

And deleting the comment field in the initial SQL script.

6. The method of claim 4, wherein:

performing field matching on the initial SQL script to obtain a target field in the SQL script, wherein the target field comprises at least one of a condition field and a character string field;

Performing field replacement on the target field;

7. The method of any of claims 1-3, wherein the obtaining the index data contained in the target SQL script comprises:

8. The method of claim 7, wherein the marking the SQL statement comprises:

9. An index blood margin analyzing device, characterized in that the device comprises:

10. The apparatus of claim 9, wherein the script traversal module is configured to:

11. The apparatus according to claim 9 or 10, wherein the script traversal module is specifically configured to:

12. The apparatus according to any one of claims 9-11, wherein the apparatus further comprises:

13. The device according to claim 12, wherein the script cleaning module is specifically configured to:

And deleting the comment field in the initial SQL script.

14. The device according to claim 12, wherein the script cleaning module is specifically configured to:

The apparatus further comprises:

15. The apparatus according to any one of claims 9-11, wherein the data acquisition module comprises:

the source query module is specifically configured to:

16. The apparatus of claim 15, wherein the field marking submodule is specifically configured to:

The apparatus further comprises:

17. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;

A memory for storing a computer program;

a processor for carrying out the method steps of any one of claims 1-8 when executing a program stored on a memory.

18. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored therein a computer program which, when executed by a processor, implements the method steps of any of claims 1-8.