CN112132420A

CN112132420A - SQL query-oriented refinement scoring method

Info

Publication number: CN112132420A
Application number: CN202010922595.1A
Authority: CN
Inventors: 许嘉; 莫晓琨; 吕品
Original assignee: Guangxi University
Current assignee: Guangxi University
Priority date: 2020-09-04
Filing date: 2020-09-04
Publication date: 2020-12-25
Anticipated expiration: 2040-09-04
Also published as: CN112132420B

Abstract

The invention discloses a SQL query-oriented refinement scoring method, which utilizes an equivalent SQL statement set obtained based on the equivalent transformation of a correct answer SQL statement to realize reasonable score judgment of different forms of answers submitted by students on one hand, and provides correction of partial wrong student answers on the other hand, and quantifies the score of the student answers on the basis of correction cost and conversion cost between the corrected answers and the correct answers. Answer data of students is collected through online SQL programming practice activities participated by students of a plurality of teaching classes, and relevant SQL query refinement scoring is carried out for experimental analysis based on the collected answer data of the students. According to the method, equivalent change of the teacher answer, correction of the student answers and subsequent detailed grading processing are combined, so that the accuracy of SQL query judgment can be improved on the basis that only simple judgment of wrong answers is available, and the SQL query judgment is fair and reasonable.

Description

SQL query-oriented refinement scoring method

Technical Field

The invention relates to a database technology, in particular to a refinement scoring method for SQL query.

Background

Database technology has been the core backbone of computer science, and whether to be proficient is also an important standard for measuring the skills of IT practitioners. Learning the structured query language (i.e., SQL) plays a crucial role in the learning process of database technology. Like other programming languages, SQL is highly demanding on programming practice, i.e., it requires a great deal of programming practice to really understand and master SQL. In the actual teaching process, the SQL programming questions are generally corrected by professional teachers. However, the modification of the SQL programming questions by teachers presents the following two important problems. First, the answers to the SQL programming questions are diverse, which undoubtedly increases the judgment burden of the teacher, and may cause the teacher to give different scores to the equivalent answers with different forms, affecting the fairness of scoring. Secondly, the judgment standard of the SQL programming questions has ambiguity, and teachers may give different scores to wrong answer judgments with consistent forms, so that the grading fairness is also influenced. Therefore, how to realize effective automatic scoring of SQL is an important problem that needs to be researched and solved when the database technology course is constructed.

In order to reduce the teacher correction burden of the SQL programming questions and improve the fairness of scoring, in recent years, many excellent research works emerge at the top level or important academic conferences in the database research fields such as VLDB, ICDE and the like, so that the automatic scoring of SQL queries is realized, and the methods can be specifically divided into three types: (1) SQL scoring techniques based on result set comparisons; (2) crowd-sourced based SQL scoring techniques; (3) SQL scoring techniques based on syntactic structural analysis. The SQL scoring technology based on result set comparison realizes the automatic scoring of SQL by comparing the query result set of correct answers with the query result set of student answers. SQL scoring techniques based on result set comparisons can give fair scores (i.e., full scores) for equivalent correct answers, but have the limitation of "not correct or wrong" correction. Specifically, for a part of wrong SQL statements given by a student (for example, the student simply spells a wrong field name), the SQL scoring technology based on result set comparison cannot analyze the SQL contents, and therefore cannot judge a refined score smaller than a full score, but can judge a refined score of 0 score, as a teacher does, and thus may discourage the student from being interested in learning SQL and from being confident. The SQL scoring technology based on crowdsourcing assigns the task of correcting the SQL programming questions to students and estimates the real scores of the SQL programming questions based on the scores of a plurality of students on the questions. Although the crowd-sourced SQL scoring technology not only reduces the correcting burden of teachers, but also can give detailed scoring of SQL, the crowd-sourced SQL scoring technology has the limitations that the correcting period is long, and scoring feedback cannot be given in time. In recent years, some researchers have worked to develop SQL scoring techniques based on syntactic structural analysis. The technology realizes the automatic grading of SQL by comparing the difference between the syntactic structure and the clause content of the correct answer and the syntactic structure and the clause content of the student answer. Although the SQL scoring technology based on syntactic structure analysis can provide a detailed scoring result of SQL, it is still impossible for teachers to provide all correct answers for each question due to the diversity of correct answers for SQL programming questions.

In summary, the focus and difficulty of SQL learning lies in the data query function, and the existing SQL automatic scoring technology intensively solves the problem of automatic scoring of SQL query. Although the prior art has made many research advances in the automatic scoring of SQL queries, in order to further improve the fairness of the automatic scoring of SQL queries, the following two challenge problems still need to be solved:

challenge one: automatic scoring techniques also need to be able to deduce other equivalent correct answers based on the correct answers given by the teacher, thereby ensuring fairness in scoring different forms of answers submitted by the students. The existing SQL automatic scoring technology cannot provide the guarantee. As shown in table 1, for the query semantic of "querying student information of a history system and a physical system", the teacher gives an SQL answer based on the merge operation, and the student a gives an SQL answer based on the WHERE clause condition judgment. Since the problem that the SQL answer of the student A has wrong spelling of the field name of the department can not be executed, the existing SQL scoring technology based on result set comparison can only judge 0 score. Moreover, the SQL answer structure of the student A is obviously different from the SQL answer structure given by the teacher, so that the existing SQL scoring technology based on the syntactic structure analysis cannot give fair refined scores.

Challenge two: for SQL statements that are partially wrong with respect to the answer, the automatic scoring technique needs to be able to give a refined scoring result, thereby guaranteeing the fairness in scoring the partially wrong answer. The existing SQL automatic scoring technology cannot provide the guarantee. As shown in table 1, the SQL answer submitted by student B is consistent with the answer given by the teacher, except that the keyword SELECT spelling is incorrect. However, the existing SQL scoring technology based on result set comparison can only judge 0 score because the statement cannot be executed by mistake. The existing SQL scoring technology based on syntactic structure analysis can only judge 0 point because syntactic analysis can not be implemented due to syntactic errors.

Table 1: unfair SQL scoring example.

Disclosure of Invention

The technical problems to be solved by the invention are as follows: aiming at the problems in the prior art, the invention provides a SQL query-oriented refinement scoring method, which can improve the SQL query judgment accuracy on the basis that only simple answer judgment is wrong at present by modifying the equivalent change of teacher answers and the student answers and combining with subsequent refinement scoring treatment, so that the SQL query judgment is fairer and more reasonable.

In order to solve the technical problems, the invention adopts the technical scheme that:

a refinement scoring method for SQL query comprises the following steps:

1) according to SQL query answer Q provided by the teacher for the target question and the input conversion rule set R, SQL query equivalent transformation is carried out to obtain an equivalent SQL query answer set Q⁺；

2) According to the original SQL answer Q submitted by students for the target subject_sCarrying out SQL query correction on the data dictionary Dict related to the query to obtain a corrected SQL answer Q_sCalculating the correction cost of the student SQL answer;

3) carrying out normalization processing on the corrected SQL answer Qs to obtain a normalized SQL statement Q'_sWill getConverting the obtained standard SQL statement into a relational algebra expression and further converting to obtain a student SQL answer query tree;

4) answer set Q for equivalent SQL queries⁺Each equivalent SQL query answer Q in (a)_i: first, the answer Q to the equivalent SQL query_iCarrying out normalization processing to obtain a normalized SQL statement Q'_iConverting the obtained standard SQL statement into a relational algebra expression and further converting the relational algebra expression into an equivalent SQL query answer query tree; then, calculating a standard SQL statement Q 'according to the edit distance between the student SQL answer query tree and the equivalent SQL answer query tree'_sAnd a canonical SQL statement Q'_iA transition cost between; according to the correction cost of student SQL answer and the normalized SQL statement Q'_sAnd canonical SQL statement Q'_iAnd calculating the conversion cost to obtain a refined score of the student SQL answer.

Optionally, in step 1), performing SQL query equivalent transformation to obtain an equivalent SQL query answer set Q⁺Comprises the following steps:

1.1) acquiring an SQL query answer Q provided by a teacher aiming at a target question and an input conversion rule set R;

1.2) converting the SQL query answer Q into a relational algebra RA (Q), and adding the relational algebra RA (Q) into a relational algebra set RA (Q)_T；

1.3) if the conversion rule set R is empty, skipping to execute the step 1.5); otherwise, the current conversion rule R is traversed and taken out from the conversion rule set R_iSkipping to execute the next step;

1.4) algebraic relationship RA (Q) according to current conversion rule R_iPerforming equivalence transformation to obtain equivalence relation algebra

If equivalence relation algebra

Different from the relationship algebra RA (Q), the equivalent relationship algebra

Adding relational algebra sets RA_T(ii) a Collecting relationship algebra RA_TConverting into SQL sentence to obtain equivalent SQL query answer and adding equivalent SQL query answer set Q⁺(ii) a Skipping to execute step 1.3);

1.5) outputting equivalent SQL query answer set Q⁺。

Optionally, the step of performing SQL query modification in step 2) to obtain a modified SQL answer Qs includes:

2.1) original SQL answer Q submitted by student_sThe original SQL answer Q is obtained by segmentation_sA set of clauses C;

2.2) amending the original SQL answer Q based on the clause set C_sThe SQL key words and the database mode information in the SQL database are wrong, and the SQL answer Q is corrected based on the clause set C_sThe clauses in (1) are in wrong order.

Optionally, modifying the original SQL answer Q in step 2.2) based on the clause set C_sThe step of the SQL key word and the database mode information error in (1) comprises the following steps:

2.2.1A) generating a data dictionary Dict from a set of SQL query keywords Dict_keyDatabase schema information set Dict_schemaComposition, said database schema information set Dict_schemaThe database mode information comprises a basic table name, a view name, a field name and an index name;

2.2.2A) traversing clauses in the clause set C to obtain the current clause C_iIf the clause set C is traversed completely, judging that the SQL key words and the database mode information are corrected completely, and returning; otherwise, skipping to execute the next step;

2.2.3A) for the current clause C_iPerforming word segmentation processing to obtain a word and brand sequence T to be corrected;

2.2.4A) obtaining a current word tag T from the word tag sequence T in a traversal way_jIf the traversal is finished, skipping to execute the step 2.2.2A); otherwise, skipping to execute the next step;

2.2.5A) judging the current word brand T_jWhether the dictionary Dict is true or not is judged, if true, the next step is executed by skipping; otherwise, skipping the execution step2.2.4A)；

2.2.6A) SQL query key set Dict in data dictionary Dict_keyThe distance between the current word plate and the word plate T is calculated based on the editing distance of the character string_jThe key word ED having the smallest edit distance therebetween_keySet of database schema information Dict in data dictionary Dict_schemaThe distance between the current word plate and the word plate T is calculated based on the editing distance of the character string_jThe key word ED having the smallest edit distance therebetween_schemaIf the key word ED_keyKeyword ED_schemaIf the absolute value of the difference is less than the preset threshold, the word tag T is selected according to the current word tag_jIn clause C_iContext of (1) determining the current word brand T_jThe corrected result obtains the current word card T 'after modification'_jOtherwise, the current word plate T is used_jCorrection to key words ED_keyKeyword ED_schemaThe object with the minimum editing distance in the two elements is in the data dictionary Dict; the jump performs step 2.2.4A).

Optionally, amending the SQL answer Q in step 2.2) based on the clause set C_sThe step of clause misordering in (1) comprises:

2.2.1B) initially set the set of error clauses C_errorSet of null, unmatched clauses C_unmatchIs empty;

2.2.2B) traversing the clauses in the clause set C to obtain the current clause C_iIf the clause set C is traversed, skipping to execute the step 2.2.4B); otherwise, skipping to execute the next step;

2.2.3B) judging the next clause C of the current clause_i+1For the current clause C_iIf the next word is true, the next clause C of the current clause is determined to be true_i+1Adding a set of wrong clauses C_errorThe current clause C is added_iAdding a set of unmatched clauses C_unmatch(ii) a If yes, the next clause C of the current clause is added_i+1As the current clause C_iThe next word or sentence; jump execution 2.2.2B);

2.2.4B) traverse the set of unmatched clauses C_unmatchThe clause in (1) obtains the current clause C_l(ii) a If not matched, clause setC_unmatchAnd step 2.2.7B) is executed by jumping after traversing is finished; otherwise, skipping to execute the next step;

2.2.5B) in the wrong clause set C_errorFind current clause C in_lMatched clause C_mathch；

2.2.6B) if clause C_mathchIf the answer is null, the SQL answer Q submitted by the student is transmitted_sReturning as a revised SQL answer Qs; otherwise, clause C is set_mathchAs the current clause C_lThe next clause of (a); skipping to execute step 2.2.4B);

2.2.7B) if wrong clause set C_errorIf not, the SQL answer Q submitted by the student_sAs a revised SQL answer Q_sReturning; otherwise, adjusting the sequence of the clauses in the clause set C according to the determined sequence, and then converting the clause set C after the sequence is adjusted into an SQL statement to be used as a modified SQL answer Q_sAnd returning.

Optionally, the normalizing in step 3) includes: standardizing the form of the equivalent relationship assertion into the relationship assertion in a unified specified form; standardizing the equivalent connection query form into a connection query in a unified specified form; standardizing the form of the equivalent nested query into a nested query of a unified specified form; the form of the equivalent interval query is normalized to a representation based on a comparison operator.

Optionally, the SQL statement Q 'specified in step 4)'_sAnd canonical SQL statement Q'_iThe computational function expression of the conversion cost between is:

in the above formula, Penalty_t(Q′_s，Q’_i) Is a canonical SQL statement Q'_sAnd a canonical SQL statement Q'_iConversion cost between, Component (Q'_s) Express canonical SQL statement Q'_sSet of composition elements of the corresponding query tree, O_iExpress canonical SQL statement Q'_sIth group of corresponding query treeElement of general formula ED (O)_i，Q’_i) Is a specification-based SQL statement Q'_iModified canonical SQL statement Q 'of the corresponding query tree'_sCorresponding combined element O in query tree_iThe value of the edit distance introduced is,

is a combination element O_iThe weight values of the clause types belong to, W is the sum of the weight values of all the clause types, and S is the score of the target topic;

calculating the correction cost of the student SQL answer in the step 2) comprises calculating a correction SQL answer Q_sThe first correction cost of SQL keyword and database mode information error in the database, and calculating the corrected SQL answer Q_sA second correction cost for a wrong order of clauses in (1); and the calculation function expressions of the first correction cost and the second correction cost are shown as the following formulas:

in the above formula, Penalty_c(Q_s) Represents a first correction cost, Q_sAs a revised SQL answer, C (Q)_s) Answer Q for SQL_sThe set of clauses of (a) is,

is a clause C_iThe number of editing modifications of (a) the number of editing modifications,

is a clause C_iThe weight values of the clause types belong to, W is the sum of the weight values of all the clause types, and S is the score of the target topic;

in the above formula, Penalty_a(Q_s) Representing a second correction cost, Q_sAs a revised SQL answer, C (Q)_s) SQL answer Q_sSet of clauses of, I (C)_i) Is a clause C_iIf clause C_iIs adjusted to I (C)_i) 1, and vice versa I (C)_i)＝0，

Is a clause C_iAnd W is the sum of the weighted values of all clause types, and S is the score of the target topic.

Optionally, when the refined score of the student SQL answer is calculated in step 4), a functional expression is adopted as shown in the following formula:

in the above formula, G (Q)_s) For the refined score of the student 'S SQL answer, S is the student' S submitted original SQL answer Q_sScore of (1), Penalty_c(Q_s) Represents a first correction cost, Penalty_a(Q_s) Indicating a second correction cost, Penalty_t(Q′_s，Q’_i) Is a canonical SQL statement Q'_sAnd a canonical SQL statement Q'_iThe cost of the conversion between (a) and (b),

representing all canonical SQL statements Q'_sAnd a canonical SQL statement Q'_iThe minimum in the transition costs between.

In addition, the invention also provides a refinement scoring system for the SQL query, which comprises a computer device, wherein the computer device is programmed or configured to execute the steps of the refinement scoring method for the SQL query, or a computer program which is programmed or configured to execute the refinement scoring method for the SQL query is stored in a memory of the computer device.

Furthermore, the present invention also provides a computer readable storage medium having stored therein a computer program programmed or configured to execute the SQL query-oriented refinement scoring method.

Compared with the prior art, the invention has the following advantages: the learning of the structured query language SQL has been a major and difficult point of the database technology, requiring students to perform a large number of SQL programming practices. In recent years, the automatic scoring technology for SQL query has become a current research focus because the burden of teachers to modify student SQL jobs in batches can be greatly reduced. However, the existing SQL query automatic scoring technology does not fully consider the fairness of scoring for different forms of answers submitted by students and the fairness of scoring for answers submitted by students containing partial errors, thereby affecting the learning interest and confidence of students in SQL queries. Aiming at the problems, the invention provides a SQL query-oriented refinement scoring method, which on one hand utilizes an equivalent SQL statement set obtained based on the equivalent transformation of a correct answer SQL statement to realize reasonable score judgment of different forms of answers submitted by students, on the other hand, provides correction of partial wrong student answers and quantifies the scores of the student answers based on the correction cost and the conversion cost between the corrected answers and the correct answers. Answer data of students is collected through online SQL programming practice activities participated by students of a plurality of teaching classes, and relevant SQL query refinement scoring is carried out for experimental analysis based on the collected answer data of the students. Compared with the related technology, the technology of the invention has more advantages in the SQL scoring fairness, and the SQL scoring accuracy is obviously improved.

Drawings

FIG. 1 is a schematic diagram of the basic principle of the method according to the embodiment of the present invention.

FIG. 2 is a diagram illustrating a query tree according to an embodiment of the present invention.

Fig. 3 is a Graph of a directed acyclic Graph obtained by a first recursion in the embodiment of the present invention.

Fig. 4 is a directed acyclic Graph obtained by the second recursion in the embodiment of the present invention.

Fig. 5 is a Graph of a directed acyclic Graph obtained by a third recursion in the embodiment of the present invention.

Fig. 6 is a deformed directed acyclic Graph' finally obtained in the embodiment of the present invention.

FIG. 7 is a schematic diagram showing a comparison of the run-time analysis of the three methods in the embodiment of the present invention.

FIG. 8 is a diagram illustrating the comparison of the number of wrong answers by the three methods in the embodiment of the present invention.

Detailed Description

The detailed scoring method (abbreviated as SQL-graph) for SQL query according to the present invention will be further described in detail below by taking the target topic as "student querying all History systems (History) and physical systems (Physics) in student table (student)", wherein student a and student B submit original SQL answer Q for the target topic_sAs shown in table 1.

As shown in fig. 1, the method for refining and scoring for SQL query in this embodiment includes:

3) carrying out normalization processing on the corrected SQL answer Qs to obtain a normalized SQL statement Q'_sConverting the obtained standard SQL statement into a relational algebra expression and further converting the relational algebra expression into a student SQL answer query tree;

As can be seen from fig. 1, as an optional implementation manner, in this embodiment, specifically, an SQL query equivalence transformation module is used as an execution main body of step 1), and in step 1), one SQL query answer (as shown in table 1) provided by a teacher is used as an input, and an equivalence transformation rule of a relational algebra expression is used to perform equivalence transformation on the relational algebra expression of the SQL query answer, so as to obtain a plurality of relational algebra expressions equivalent to the relational algebra expression of the SQL query answer, and further obtain an SQL query answer set equivalent to the SQL query answer, thereby ensuring fairness in scoring of equivalent answers of different forms submitted by students.

In this embodiment, in step 1), an equivalent SQL query transformation is performed to obtain an equivalent SQL query answer set Q⁺Comprises the following steps:

If equivalence relation algebra

Adding relational algebra sets RA_T(ii) a Collecting relationship algebra RA_TConverting into SQL sentence to obtain equivalent SQL query answer and adding equivalent SQL queryAnswer set Q⁺(ii) a Skipping to execute step 1.3);

1.5) outputting equivalent SQL query answer set Q⁺。

Referring to steps 1.1) to 1.5), the SQL answer Q given by the teacher and the relational algebra expression equivalent transformation rule set R are used as input, Q is firstly converted into a relational algebra expression RA (Q), and the relational algebra expression is added into the equivalent relational algebra expression set RA_TIn (1). The relational algebra expression ra (Q) of Q is then equivalently transformed each time based on one of the transformation rules in the set of transformation rules R. If the transformed relational algebra expression

A difference is found between the relational algebraic expression RA (Q) and Q, then

Is an equivalent relation algebraic expression of Q, and is added into an equivalent relation algebraic expression set RA_TIn (1). Finally, RA is added_TAll equivalent relational algebra expressions in the system are converted into SQL sentences, and the obtained equivalent SQL query sentence set Q based on SQL answers Q given by teachers⁺And returning.

It should be noted that the conversion rule set R may be considered as specified as required, for example, as shown in table 1, the conversion rule set R in this embodiment includes three classes, i.e., an operator-based equivalent transformation rule, an operation-based equivalent transformation rule, and a sub-query equivalent transformation rule, and each class includes a plurality of specific rules.

Table 1: a set of conversion rules R for relational algebra expressions.

In this embodiment, for a target topic, "a student who queries all History systems (History) and physical systems (Physics) in a student table (student)", step 1) first converts a SQL answer provided by a teacher into a relational algebra expressionAnd (4) RA. Then, based on the relational algebra expression equivalent transformation rule R1.2 shown in Table 1, RA is subjected to equivalent transformation to obtain the relational algebra expression RA equivalent to RA₁，RA₁The corresponding SQL query is as follows:

SELECT*

FROM student

WHERE dept_name＝'Physics'

OR dept_name＝'History'

the equivalent SQL answer is consistent with the syntax structure of the SQL answer given by the student A and is the SQL query given based on the condition limit of the WHERE clause, so that the fairness of the refinement and grading of the SQL answer of the student A in the syntax structure aspect can be ensured. Consider that there are partial errors in the SQL answers for both student a and student B, resulting in their failure to be converted into relational algebra expressions.

Step 1.4) the relation algebra RA (Q) is converted according to the current conversion rule R_iPerforming equivalence transformation (abbreviated as SQLTransform in this embodiment) to obtain equivalence relation algebra

Relating to a relational algebra conversion function transformDAG and an equivalent transformation function ApplyRules for performing equivalent transformation according to relational algebra and a transformation rule, the detailed steps comprise:

1.4.1) generating a directed acyclic Graph (Graph) by using a relational algebra RA (Q) and adopting a relational algebra conversion function transformDAG;

the relational algebra conversion function TransformDAG is used for carrying out rule matching on the relational algebra, the relational algebra is required to be expressed in a mode of a directed acyclic Graph, the input of the relational algebra conversion function TransformDAG is relational algebra RA (Q), and the output of the relational algebra conversion function TransformDAG is directed acyclic Graph, and the executing step of the relational algebra conversion function TransformDAG comprises the following steps:

s1, creating a directed acyclic graph;

s2, acquiring a first node (RA (Q) root) of a relation algebra RA (Q) as a current node rel;

s3, judging whether the directed acyclic Graph of the current node rel is true, if true, jumping out of the recursion round, and jumping to execute the step S6; otherwise, executing the next step;

s4, acquiring node set inputs connected with the current node rel;

s5, aiming at each node input in the node set inputs: calling a relational algebra conversion function transformDAG to recursively convert the node input into a Graph node child Vertex, adding the Graph node child Vertex into the directed acyclic Graph and setting a directed edge pointing to a child node child Vertex for a current node rel;

s6, trying to obtain the next node of the relation algebra RA (Q) as the current node rel, and if the node is successful, jumping to execute the step S3; otherwise, judging that the recursion is ended, and outputting the directed acyclic Graph.

1.4.2) taking the conversion rule set R and the directed acyclic Graph as input, and calling an equivalent transformation function ApplyRules to obtain the deformed directed acyclic Graph'.

The equivalent transformation function ApplyRules is a rule matching algorithm based on depth-first traversal, and the input of the equivalent transformation function ApplyRules is a transformation rule set R and a directed acyclic Graph, and the output of the equivalent transformation function ApplyRules is a deformed directed acyclic Graph. The executing step of the relational algebra conversion function transformDAG comprises the following steps:

s1, initializing and setting the matching times matches to be 0;

s2, if the matching times matches are less than or equal to the limit times matchLimit, obtaining the current Graph node vertex from the directed acyclic Graph, and if the obtaining is successful, skipping to execute the next step; otherwise, jumping to execute step S4;

s3, traversing each rule in the conversion rule set R, and carrying out the following processing aiming at the conversion rule R: equivalently deforming the current Graph node vertex by using a rule to obtain a new node newVertex, if the current Graph node vertex is different from the new node newVertex, increasing the matching times matches by 1, and if the matching times matches still do not reach the limit times matchLimit, performing traversal conversion on the Graph by using depth-first traversal according to a conversion rule set R, the new node newVertex and the matching times matches to obtain a deformed directed acyclic Graph; jumping to execute step S2;

and S4, outputting the deformed directed acyclic Graph'.

Take the following example SQL as an example:

SELECT instructor.name,instructor.salary

FROM instructor JOIN teaches ON instructor.id＝teaches.id

WHERE instructor.salary>3000AND

first, the example SQL is converted into a relational algebra expression, and a directed acyclic Graph obtained by recursion of the relational algebra expression using a relational algebra conversion function TransformDAG is shown in fig. 3.

And secondly, traversing from the root node of the directed acyclic Graph shown in the figure 3 by using an equivalent transformation function ApplyRules, and when traversing to the Filter node and matching the Filter node to meet the equivalent deformation requirement of a rule R1.1 in the table 1, performing equivalent transformation on the Filter node according to the rule R1.1 to obtain the directed acyclic Graph shown in the figure 4.

And thirdly, performing rule matching on the Filter nodes pushed down in the second step according to depth-first traversal in an equivalent transformation function ApplyRules. And (3) traversing all the rules of the Filter nodes of the directed acyclic graph in the graph 3, if no rule which accords with the type of the Filter nodes exists, continuously performing rule matching on subsequent nodes of the Filter, and stopping until all the nodes do not accord with the equivalent deformation rule. Finally, the directed acyclic graph shown in fig. 3 is restored to SQL-2, and an SQL query that is equivalent to the input SQL and has a different writing method is obtained as follows:

SELECT name,salary FROM(SELECT*FROM public.instructor,public.teaches WHERE instructor.id＝teaches.id)AS t WHERE t.salary>3000AND t.year>2010

and fourthly, after the first result is obtained according to the third step, performing rule matching on the node Join in the Graph 3 again according to depth-first traversal in an equivalent transformation function ApplyRules, wherein the node Join meets a rule R3.1 in the table 12, and performing equivalent transformation on the node Join according to the rule R3.1 to obtain the directed acyclic Graph shown in the Graph 5. Finally, the directed acyclic graph shown in fig. 5 is restored to SQL-3, and an SQL query that is equivalent to the input SQL and has a different writing method is obtained as follows:

SELECT t.name,t.salary FROM(SELECT*FROM public.instructor WHERE salary>3000)AS t INNER JOIN(SELECT*FROM public.teaches WHERE course_id>500)AS t0 ON t.id＝t0.id

and fifthly, after a second equivalent transformation result is obtained according to the fourth step, traversing the residual matchable rules for the Join node according to an equivalent transformation function ApplyRules. Traversing the residual rules, if the Join node meets the rule R2.3 in the table 1, performing equivalent transformation on the Join node according to the rule R2.3, and obtaining the deformed directed acyclic Graph' as shown in FIG. 6. Finally, the directed acyclic graph shown in fig. 6 is restored to obtain SQL-4, and an SQL query with the same semantics as the SQL provided by the teacher and a different writing method is obtained, as follows:

SELECT instructor.name,instructor.salary

FROM(SELECT*FROM instructor,teaches WHERE instructor.id＝teaches.id)AS t

WHERE t.salary>3000AND t.year>2010

referring to fig. 1, as an optional implementation manner, step 2) in this embodiment is specifically implemented by using an SQL query modification module. In step 2) of this embodiment, the SQL query containing partial errors (which may be a keyword error, a pattern information error, or a clause sequence error) submitted by the student is automatically corrected by using the SQL syntax rules and the data dictionary related to the query, so as to obtain a corrected student answer SQL statement. The corrected student SQL answers can be converted into relational algebra expressions in a subsequent refinement scoring module, so that the student SQL answers with partial errors are refined and judged, and the fairness of scoring is guaranteed.

The SQL refinement scoring strategy based on the syntactic structure is implemented on the premise that SQL query answers submitted by students can be converted into relational algebra expressions, and further refinement scoring of the student SQL answers can be given by comparing differences between query trees determined by the relational algebra expressions of the student SQL answers and the query trees of correct answers. However, as can be seen from the actual teaching feedback, the SQL answers submitted by students often appear similar to the SQL answers given by teachers, but there are some cases of errors. The errors that students are prone to include two main cases: (1) errors exist in SQL keywords or database schema information; (2) SQL clauses are in wrong order. These errors all result in the system's inability to convert student submitted SQL answers into relational algebra expressions. In view of this, the present embodiment first implements automatic correction of SQL answers submitted by students and containing partial errors based on SQL syntax rules and a data dictionary related to queries; and then correcting the clause sequence error existing in the SQL answer of the student based on the SQL query grammar. In this embodiment, the step 2) of performing SQL query modification to obtain a modified SQL answer Qs includes:

This embodiment corrects the original SQL answer Q based on the clause set C in step 2.2)_sThe step of the SQL key word and the database mode information error in (1) comprises the following steps:

2.2.5A) judging the current word brand T_jWhether the dictionary Dict is true or not is judged, if true, the next step is executed by skipping; otherwise, skipping to execute the step 2.2.4A);

Step 2.2.1A), when the data dictionary Dict is generated, SQL keywords relevant to query are extracted based on an SQL standard (ISO/IEC9075)2016 to obtain an SQL query keyword set Dict_key. Then extracting the database mode information (including the name of the basic table, the name of the view, the name of the field and the name of the index) of an application database to obtain a database mode information set Dict_schema. Gathering Dict with SQL query keyword set_keyAnd the database schema information set Dict_schemaIntegration can be performed to obtain a data dictionary (denoted as Dict ═ Dict) related to the query_key∪Dict_schema). And then, correcting the SQL answer submitted by the student based on the data dictionary Dict related to the query: setting a word plate (token) to be corrected in the Chinese word plate sequence T as a current word plate T_jIn this embodiment, the word tag T is obtained by calculation based on the edit distance of the character string_jQuery key word with minimum edit distance between them (denoted as ED)_key) And the current word brand T_jMinimum edit distance betweenDatabase schema information (denoted as ED)_schema) And will present word brand T_jThe element of Dict that is the smallest edit distance from it is corrected. If ED_keyWith ED_schemaWith current word brand T_jIf the difference between the editing distances is less than the threshold value, the current word brand T is used_jContext of department determines current word brand T_jThe corrected result of (1). And the correction cost for correcting the SQL key words and the database mode information is calculated simultaneously in the process of correcting the student SQL answer.

See steps 2.2.1A) -2.2.6A), the present embodiment first performs the correction of the keyword and the database schema information for each clause C in C_iPerforming word segmentation to obtain C_iT. Secondly, whether each token in the T is an element in the data dictionary Dict related to the query is judged. If a token T_jNot in Dict, T is calculated separately_jWith the query key set Dict in Dict_keyMinimum string edit distance ED for each element_keyAnd T_jAnd the database schema information set Dict in Dict_schemaMinimum string edit distance ED for each element_schema. If edit distance ED_keyAnd Dict_schemaThe difference between them is not large, and T is needed to be based on_jIn clause C_iTo determine T from the context information in_jIf not, then T_jThe element of Dict that is the smallest edit distance from it is corrected.

In this embodiment, in step 2.2), SQL answer Q is corrected based on clause set C_sThe step of clause misordering in (1) comprises:

2.2.3B) judging the next clause C of the current clause_i+1For the current clause C_iIf the next word is true, the current sub-sentence is determinedSentence next clause C_i+1Adding a set of wrong clauses C_errorThe current clause C is added_iAdding a set of unmatched clauses C_unmatch(ii) a If yes, the next clause C of the current clause is added_i+1As the current clause C_iThe next word or sentence; jump execution 2.2.2B);

2.2.4B) traverse the set of unmatched clauses C_unmatchThe clause in (1) obtains the current clause C_l(ii) a If not, clause set C_unmatchAnd step 2.2.7B) is executed by jumping after traversing is finished; otherwise, skipping to execute the next step;

2.2.7B) if wrong clause set C_errorIf not, the SQL answer Q submitted by the student_sReturning as a revised SQL answer Qs; otherwise, adjusting the sequence of the clauses in the clause set C according to the determined sequence, and then converting the clause set C after the sequence is adjusted into an SQL statement to be used as a modified SQL answer Q_sAnd returning.

See steps 2.2.1B) -2.2.7B), this embodiment first traverses Q sequentially for the clause-sequential correction portion_sEach clause C in clause set C_iJudging the successor clause C_i+1Whether the SQL query clause order rule shown in table 2 is satisfied. If not, C is added_i+1Adding a set of mismatching clauses C_errorAnd C is_iJoining set of unsuccessfully matched clauses C_unmatch(ii) a If yes, clause C is added_iIs set to C_i+1. For C_unmatchEach unsuccessfully matched clause C in (a)_lFrom the set of mismatching clauses C_errorIn is C_lThe match satisfies the subsequent clause restriction rule (see table 2 for details) andthe clause with the strongest relevance is set as the successor clause. If not, and C can not be found_lThe matched subsequent clause indicates that the SQL answer Q which can not be submitted by the student_sCorrection is effected, at which point Q is returned_sItself; if C can be found_errorMatching successor clause C in (1)_matchIs mixing Q with_sIs set to C_errorWhile simultaneously adding C_errorFrom C_errorAnd (4) clearing. If the matching process is finished, finding the wrong matching clause set C_errorIf there is still an object that fails to match, it indicates that the SQL answer Q that cannot be submitted to the student is not the same_sCorrection is effected, at which point Q is returned_sItself; otherwise, the description is Q_sSuccess of correction based on Q_sAnd reordering the clause set C according to the successor clause information of each clause in the clause set C, and returning to generate a corresponding SQL statement based on the reordered clause sequence.

The present embodiment will correct the SQL query syntax in the SQL standard (ISO/IEC9075)2016 for clause ORDER errors present in student submitted SQL answers (e.g., ORDER BY clause errors placed before the HAVING clause). The SQL clause order adjustment rule is shown in table 2.

Table 2: the SQL clause order adjusts the rules.

Clause	Subsequent clause
		SELECT	FROM
FROM	JOIN、WHERE、<NULL>
		WHERE	GROUP BY, collective clause,<NULL>
JOIN	JOIN、WHERE、<NULL>
		Set keywords	SELECT
GROUP BY	HAVING、ORDER BY、<NULL>
		HAVING	ORDER BY、<NULL>
ORDER BY	ORDER BY、<NULL>

Table 2 gives the syntax order of each clause in the SQL query statement shown in the SQL standard, where < NULL > indicates that the current clause may be followed by no other clauses. For example, as can be seen from table 3, the HAVING clause may be followed BY the ORDER BY clause, which is followed BY either no clause or may be followed BY the ORDER BY clause and no other clauses such as HAVING. Based on the SQL clause order adjustment rule shown in table 2, in this embodiment, a student SQL answer corrected by SQL keywords and database mode information and a clause set of the student SQL answer obtained by division before are used as input, and then whether the position of each clause meets the SQL clause order rule shown in table 2 is checked, and if not, correction is performed according to the rule.

In this embodiment, for the target topic "students who query all the History systems (History) and physical systems (Physics) in the student table (student)", step 2) performs keyword and database schema information correction on SQL answers of the students a and B based on the data dictionary Dict related to the query. Specifically, the method comprises the following steps: for the SQL answer of student A, correcting the error token 'dept _ names' into the element 'dept _ name' with the minimum editing distance of the character string in the Dict; for student B's SQL answer, the error token "SELET" is modified to the element "SELECT" in Dict that is the smallest edit distance from its string. Because the SQL answers of two students have no clause sequence error, the clause sequence in the SQL answers of the two students does not need to be adjusted.

Referring to fig. 1, as an optional implementation manner, step 3) and step 4) in this embodiment are implemented by an SQL query refinement scoring module, which takes an SQL query answer set equivalent to an SQL query answer, a modified student SQL answer, and a modification cost as inputs, and obtains a normalized SQL statement by performing normalization processing on each equivalent SQL answer and each modified student SQL answer. Then, the SQL statements of each specification are converted into relational algebra expressions, and then a query tree structure corresponding to the SQL statements can be obtained. And then, determining the refined score of the student SQL answer by taking the correction cost of the student SQL answer and the editing distance between the student SQL answer query tree and each equivalent SQL answer query tree as input.

In order to reduce the syntax difference between the SQL answers irrelevant to the SQL equivalent transformation, and thus reduce the workload of analyzing the SQL answers subsequently, the present embodiment performs the normalization preprocessing on the SQL answers provided by the teacher and the SQL answers submitted by the students. The normalization process in step 3) of this embodiment includes:

normalized relationship assertion: standardizing the form of the equivalent relationship assertion into the relationship assertion in a unified specified form; for example, the assertion NOT (a < B) is replaced with the assertion a > -B, so that the NOT operator does NOT appear in the SQL answer.

Normalized connection query: standardizing the equivalent connection query form into a connection query in a unified specified form; for example, a connection query initiated by NATURAL INNER JOIN is uniformly rewritten to a connection query initiated by INNER JOIN.

Normalized nested queries: standardizing the form of the equivalent nested query into a nested query of a unified specified form; for example, uniformly rewriting those nested sub-queries connected by the operator IN/ANY into nested sub-queries connected by EXISTS.

Delete betweeen assertion: the form of the equivalent interval query (betweeen) is specified as a representation based on a comparison operator. For example, uniformly rewriting assertions represented by betweeen in SQL queries into their equivalent assertions represented by comparison operators (e.g., > and <).

In order to quantify the difference between the SQL answer provided by the teacher and the modified SQL answer of the student in the syntactic structure, the present embodiment converts the SQL answer provided by the teacher and the SQL answer submitted by the student into a query Tree in a fat Tree (Flattened Tree) structure, respectively. The query tree used in the query optimization process of the relational database system has mature heuristic optimization rules and can be optimized into a consistent organization form, so that the difference comparison of syntax structures between the SQL answer provided by a teacher and the modified student SQL answer is effectively supported. FIG. 2 illustrates the query tree structure of the following SQL query:

SELECT student.name

FROM student,takes

WHERE student.id＝takes.id

AND takes.course_id＝’2’；

it should be noted that, the query tree and the generation method thereof are prior art, and therefore, the specific implementation is not expanded here. After the SQL answer provided by the teacher and the modified student SQL answer are converted into the query tree structure, the conversion cost on the syntactic structure between the two answers can be quantized based on the editing distance of the tree. Based on the correction cost of the SQL answer submitted by the student and the conversion cost between the query tree of the SQL answer provided by the teacher and the corrected query tree of the SQL answer of the student, the detailed scoring of the SQL answer submitted by the student can be realized. It can be seen that the implementation of refinement scoring relies on a computational method that defines two costs. The calculation of the two costs depends on the weight setting given by the teacher aiming at different clause types, and the weight value expresses the importance degree of different clauses for SQL query. The related clause types include 9, which are respectively a SELECT clause, a FROM clause, a WHERE clause, a GROUP BY clause, a JOINS clause, an AGGREGATES clause, a DISTINCT clause, a HAVING clause and a SUBQUERY clause. Two cost calculation methods are introduced below based on the weight setting of clauses.

In this embodiment, the SQL statement Q 'specified in step 4) is'_sAnd canonical SQL statement Q'_iThe computational function expression of the conversion cost between is:

in the above formula, Penalty_t(Q′_s，Q’_i) Is a canonical SQL statement Q'_sAnd a canonical SQL statement Q'_iConversion cost between, Component (Q'_s) Express canonical SQL statement Q'_sSet of composition elements of the corresponding query tree, O_iExpress canonical SQL statement Q'_sThe ith combined element, ED (O), of the corresponding query tree_i，Q’_i) Is a specification-based SQL statement Q'_iModified canonical SQL statement Q 'of the corresponding query tree'_sCorresponding combined element O in query tree_iThe value of the edit distance introduced is,

is a combination element O_iAnd W is the sum of the weighted values of all clause types, and S is the score of the target topic.

Calculating the correction cost of the student SQL answer in the step 2) of the embodiment includes calculating a correction SQL answer Q_sThe first correction cost of SQL keyword and database mode information error in the database, and calculating the corrected SQL answer Q_sA second correction cost for a wrong order of clauses in (1); and the calculation function expressions of the first correction cost and the second correction cost are shown as the following formulas:

in the above formula, Penalty_c(Q_s) Watch (A)Shows a first correction cost, Q_sAs a revised SQL answer, C (Q)_s) Answer Q for SQL_sThe set of clauses of (a) is,

In this embodiment, when the refined score of the student SQL answer is calculated in step 4), a function expression is adopted as follows:

in the above formula, G (Q)_s) For the refined score of the student 'S SQL answer, S is the student' S submitted original SQL answer Q_sScore of (1), Penalty_c(Q_s) Represents a first correction cost, Penalty_a(Q_s) Indicating a second correction cost, Penalty_t(Q′_s，Q’_i) Is standardizedSQL statement Q'_sAnd a canonical SQL statement Q'_iThe cost of the conversion between (a) and (b),

representing all canonical SQL statements Q'_sAnd a canonical SQL statement Q'_iThe minimum in the transition costs between. Wherein, the last item of the formula represents the student SQL answer Q 'after being corrected'_sThe equivalent answer with the smallest conversion cost between the query trees. It can be seen that the refined score of the student SQL answer is obtained by deducting the correction cost of the SQL answer submitted by the student on the basis of the score of the SQL programming question and converting the corrected student SQL answer into the conversion cost of the SQL answer closest to the syntactic structure of the student SQL answer.

In this embodiment, for a target topic "a student who queries all historical systems (History) and physical systems (Physics) in a student table (student)", step 4) calculates an edit conversion cost between an SQL answer query tree of each student and each equivalent SQL answer query tree based on two equivalent SQL answers to the programming question, and further refines and obtains a refined score of an SQL answer of the student a and an SQL answer of the student B based on a correction cost of the SQL query and the edit conversion cost between the query trees.

The complexity analysis of the method of this example is as follows: the method of the embodiment integrates the SQL query equivalent transformation (SQLTranform) and the SQL query correction (SQLCorrection), and finally achieves the refinement scoring of the SQL query_cAnd SQL clause order modification cost Penalty_a. Then answer Q 'to student SQL'_sAnd an equivalent SQL answer set Q⁺Each equivalent SQL answer in (1) is normalized, and then a corrected student SQL answer Q 'is calculated'_sQuery tree and Q⁺The query tree of each equivalent SQL answer and the translation cost between them. And finally, calculating to obtain and return a refined score of the student SQL answer. The spatiotemporal complexity of the refined scoring method of the present embodiment is analyzed as follows. This example is detailedThe main calculation cost of the chemical scoring method comes from three parts: (1) the computation cost of the equivalent transformation method (SQLTranform method) of SQL queries; (2) the computation cost of the SQL query correction method (SQLCorrection method); (3) and calculating the cost of conversion between the corrected student SQL answer and the query tree of each equivalent SQL answer. The temporal complexity of the first partial SQLTranform method is O (| R |), where | R | is the cardinality of the equivalent transformation rule set R of the relational algebra expression. The temporal complexity of the second partial SQLCorrection method is O (T)_ELog L), wherein T_EIs student SQL answer Q_sThe total number of false tokens contained in each clause, L is the maximum length of the element name string in the data dictionary Dict, and O (Log L) represents the time complexity of searching a string in Dict based on the Tire tree index. And in the third part, the time complexity of conversion calculation between the modified student SQL answers and the query tree of each equivalent SQL answer is O (| Q)⁺| N log N), wherein | Q⁺I is SQL equivalent answer set Q⁺Is the average number of constituent elements included in the query tree, and O (N × log N) is the computational time complexity of the query tree edit distance. In summary, the computation time complexity of the method of the present embodiment is O (| R | + T)_E*logL+|Q⁺| N | _ log N). Since the space consumption of the method of this embodiment is mainly from storing the data dictionary Dict, the space complexity of the method of this embodiment is O (| Dict |), wherein | Dict |, represents the number of elements included in the data dictionary Dict.

In addition, the present embodiment also provides an SQL-query-oriented refinement and scoring system, which includes a computer device programmed or configured to execute the steps of the SQL-query-oriented refinement and scoring method, or a computer program programmed or configured to execute the aforementioned SQL-query-oriented refinement and scoring method stored in a memory of the computer device.

Furthermore, the present embodiment also provides a computer-readable storage medium, in which a computer program programmed or configured to execute the above-mentioned SQL query-oriented refinement scoring method is stored.

In order to test the effectiveness of the method (abbreviated as SQL-graph) of the present embodiment, the present embodiment compares the variation technology of SQL-graph (abbreviated as SQL-graph w/o correction technology) that does not include the SQL query modification function with the most representative related work XData technology as the comparison of the method of the present embodiment. In order to obtain student SQL answers, five textbooks (including 284 students) with database technology courses are provided with jobs (covering typical SQL query types such as set query, connection query and condition query) including 12 SQL programming questions on an SQL online learning platform. Based on SQL answer data which are submitted by students and aim at 12 SQL programming questions, the advantages and disadvantages of the three SQL scoring technologies are compared. The information on the number of students from the five classes is shown in table 4.

Table 4: the experiment relates to information about the class.

Course(s)	Number of students	Number of SQL questions
			DB-1	58	12
DB-2	52	12
			DB-3	55	12
DB-4	57	12
			DB-5	62	12

In the experiment of this embodiment, the teacher provides only one SQL answer to each SQL programming question. The weighted values of different query clause types which are commonly depended on by the three technologies for realizing the SQL refinement scoring function are all set to be 1, which is shown in table 5 in detail.

Table 5: and inquiring the weight setting of the clause type.

Sub-query type	Weight of	Sub-query type	Weight of
				SELECT	1	AGGREGATES	1
FROM	1	DISTINCT	1
				WHERE	1	HAVING	1
GROUP BY	1	SUBQUERY	1
				JOINS	1

Since scorers often have a certain subjectivity when scoring the SQL answers of students, it is not suitable for directly numerically comparing the SQL query scores given by scorers with the SQL query scores given by various technologies to determine whether the scores given by the SQL refinement scoring technology are fair. In view of this, following the experimental strategy described in Automated Grading of SQL Queries (ICDE,2019: 1630-1633) of Chandra B et al, several random creations of the form of multiple questions j are created for each question j

Each query pair contains two student SQL answers with partial errors, and the answer is recorded as

And

(wherein

The 1 st SQL query in the nth query pair representing the SQL programming question j,

then the 2 nd SQL query is represented). Then two teachers with the course teaching experience of database technology for more than 8 years and three assistant teachers are invited to serve as scorers, and the scorers are required to mark each SQL query pair as one of the following three categories, and the final mark of each SQL query pair is determined based on a few rules subject to majority:

(1)

a higher score should be obtained;

(2)

a higher score should be obtained;

(3) both contain different errors and they should get almost the same score.

Next, each SQL query in each SQL query pair is refined and scored using three SQL refinement scoring techniques, respectively, and each query pair is classified as one of the above three categories based on the results of the refinement scoring. It should be noted that if the difference between the refined scores of two SQL queries in a SQL query pair is less than 10% of the score of the SQL programming question to which they correspond, then the query pair is labeled as the third class.

For SQL query pair SQ_j,nIf the scorer and the SQL automatic scoring technology classify the scorer and the SQL automatic scoring technology into the same class, the SQL automatic scoring technology is determined to be SQ_j,nThe score for the medium SQL query is a fair score. Therefore, the fairness of scoring of a certain SQL refinement scoring technique with respect to the SQL programming question j can be quantified based on the following scoring accuracy formula.

In the above formula, Accurancy_jThe fairness of scoring for a certain SQL refinement scoring technique relative to the SQL programming question j, m is the number of query pairs of the SQL programming question j,

then represents SThe label matching result of the nth SQL query pair of the QL programming question j is as follows: if the scorer and SQL refinement scoring technique mark the query pair as the same class, then

Otherwise, the reverse is carried out

For the same SQL programming question, the SQL answer submitted by students is diversified in form, and different solution ideas of the students are reflected. To analyze the existence of student answer diversity in this SQL programming practice, 3167 student SQL answers collected from the experiment were counted. Assuming that the ratio of the kth SQL type answer type of the SQL programming question E is p_kAnd k is 1, …, n, then the formula for the student answer diversity index (E) of the SQL programming question E is given based on the idea of information entropy:

in the above formula, p_kThe proportion of the kth SQL answer type of the SQL programming question E is, and n is the classification number of the SQL programming question E.

Table 6 shows student SQL answer type division ratios of 12-pass SQL programming questions and diversity index of student SQL answers. As can be seen from table 6, the student SQL answers corresponding to each SQL programming question exhibit significant diversity. The 9 th student SQL answer diversity expression is most obvious, and the student answer diversity index reaches 1.415: 20% of students use the idea of aggregate queries to compose answers, 57% of students use the idea of conditional queries to compose answers, and 23% of students use the idea of nested queries to compose answers. The diversity of student answers has affected the fairness in scoring for XData technologies that rely only on a single SQL answer provided by the teacher to implement SQL refinement scoring. Meanwhile, the necessity of automatically generating a plurality of equivalent SQL answers based on the equivalent transformation rule of the relational algebra expression in the method is also proved.

Table 6: student answer diversity analysis table.

In this embodiment, the fairness of scoring the SQL answers submitted by the students is compared with the three SQL refinement scoring techniques. And randomly extracting 202 SQL query pairs without repetition from the incorrect student SQL answer set of each SQL programming question as evaluation objects. Table 7 shows the number of equivalent SQL answers obtained for each SQL programming question and the accuracy of the SQL refinement scoring for the three SQL refinement scoring techniques.

Table 7: and (5) carrying out score fairness analysis.

As can be seen from table 7, the XData technique has the lowest scoring accuracy (embodying the fairness of scoring), and the average value of the scoring accuracy is only 61.55%, because it cannot cope well with the diversity of SQL answers of students, and does not correct the SQL answers submitted by students with partial errors. Compared with the XData technology, the method provided by the embodiment can generate a plurality of SQL answers with different semantic equivalent forms as a batch modification basis, and corrects the student SQL answers with partial grammar errors, so that the highest scoring fairness is realized on 12 SQL programming questions, the average scoring precision reaches 83.15%, and the average scoring precision is improved by 33% compared with the average scoring precision of the XData technology on the 12 questions. Meanwhile, it can be observed that the score accuracy of the SQL-graph w/o correction method is averagely lower by 9 percent than that of the method in the embodiment because the SQL answer submitted by the student is not corrected. This shows that modifying the student submitted SQL answers is very effective in improving the fairness of the SQL refinement scoring. Table 7 also shows that the scoring accuracy of the method of this embodiment on the SQL programming question of the 9 th channel exceeds 90%, because the SQL answers submitted by students are the most diverse (see table 6 in detail), which makes the superiority of the method of this embodiment over the XData technology more remarkable.

FIG. 7 shows SQL refinement scoring runtime for the same number of equivalent SQL answers under different SQL refinement scoring techniques. As shown in FIG. 7, XData technology has minimal runtime and its runtime does not fluctuate significantly with the number of equivalent SQL answers. This is because XData performs scoring of SQL queries based on only one teacher-supplied SQL answer. The two scoring techniques, namely the method of the present embodiment and the SQL-score w/o correction method, can generate a plurality of equivalent SQL answers based on the SQL answers provided by the teacher and accordingly achieve scoring of the SQL query, so that the running times of the two scoring techniques fluctuate with the change of the number of the equivalent SQL answers owned by the problem. Meanwhile, it can be observed that, as the number of equivalent SQL answers owned by the problem increases, the running times of the method of the present embodiment and the SQL-graph w/o correction method both show a rising trend, but when the number of equivalent SQL answers is 4 or 6, the rising trend of the running times fluctuates. This is because the running time of the method of the present embodiment and the SQL-graph w/o correction method has a positive correlation with the number of wrong answers to the problem, because the more wrong answers, the more time the two methods consume for correcting the SQL answer submitted by the student and calculating the edit distance between the query tree of the SQL answer submitted by the student and the query tree of each equivalent SQL answer. FIG. 8 shows the statistics of the number of wrong answers corresponding to the problem with the same number of equivalent SQL answers in the SQL-GRADER w/o correction method. As can be seen from the data in fig. 7 and fig. 8, for the problem set with the equivalent SQL answer number of 4, the number of wrong answers is significantly smaller than that of the problem set with the equivalent SQL answer number of 3, so the running time of the method of the present embodiment and the SQL-graph w/o correction method on the problem set with the equivalent SQL answer number of 4 is slightly smaller than that on the problem set with the equivalent SQL answer number of 3. Similarly, it is easy to know that the running time of the two methods on the problem set with the number of the equivalent SQL answers of 6 is slightly less than that on the problem set with the number of the equivalent SQL answers of 5 due to the difference of the number of wrong answers. In addition, it can be observed from FIG. 7 that the running time of the SQL-GRADER w/o correction method is slightly less than that of the present embodiment. This is because the SQL-GRADER w/o correction method removes the revision module of the SQL query, compared to the present embodiment.

In conclusion, the automatic scoring technology for SQL query can greatly reduce the burden of teachers on batch modification of student SQL work, and has important research significance. However, the existing SQL query automatic scoring technology does not fully consider the fairness of scoring for different forms of answers submitted by students and the fairness of scoring for answers submitted by students containing partial errors, thereby affecting the learning interest and confidence of students in SQL queries. Aiming at the defects of the existing research work, the embodiment provides a SQL query-oriented refinement scoring technology which is named as SQL-GRADER. SQL-GRADER utilizes SQL correction strategy based on correct answer SQL statement equivalent transformation, SQL grammar rule and query related data dictionary to raise the fairness of SQL refinement scoring. The method comprises the following steps of collecting student SQL answer data by a plurality of students in teaching classes participating in SQL programming teaching activities, and carrying out experimental analysis on related SQL query refinement scores based on the collected student answer data to show that: the SQL-GRADER refinement scoring technology can effectively improve the accuracy and fairness of scoring.

The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may occur to those skilled in the art without departing from the principle of the invention, and are considered to be within the scope of the invention.

Claims

1. A refinement scoring method for SQL query is characterized by comprising the following steps:

2) According to the source submitted by students for target questionsFirst SQL answer Q_sCarrying out SQL query correction on the data dictionary Dict related to the query to obtain a corrected SQL answer Q_sCalculating the correction cost of the student SQL answer;

3) the corrected SQL answer Q is_sCarrying out normalization processing to obtain a normalized SQL statement Q'_sConverting the obtained standard SQL statement into a relational algebra expression and further converting the relational algebra expression into a student SQL answer query tree;

2. The SQL query-oriented refinement scoring method according to claim 1, wherein the SQL query equivalence transformation is performed in the step 1) to obtain an equivalent SQL query answer set Q⁺Comprises the following steps:

If equivalence relation algebra

1.5) outputting equivalent SQL query answer set Q⁺。

3. The SQL query-oriented refinement scoring method according to claim 1, wherein the step of modifying the SQL query in the step 2) to obtain a modified SQL answer Qs comprises the following steps:

4. The SQL query-oriented refinement scoring method according to claim 3, wherein in the step 2.2), the original SQL answer Q is corrected based on the clause set C_sThe step of the SQL key word and the database mode information error in (1) comprises the following steps:

2.2.2A) traversing clauses in clause set C to obtain current clause C_iIf the clause set C is traversed completely, judging that the SQL key words and the database mode information are corrected completely, and returning; otherwise, skipping to execute the next step;

5. The SQL query-oriented refinement scoring method according to claim 3, wherein in the step 2.2), the SQL answer Q is corrected based on the clause set C_sThe step of clause misordering in (1) comprises:

2.2.6B) if clause C_mathchIf the answer is null, the SQL answer Q submitted by the student is transmitted_sAs a revised SQL answer Q_sReturning; otherwise, clause C is set_mathchAs the current clause C_lThe next clause of (a); skipping to execute step 2.2.4B);

6. The SQL query-oriented refinement scoring method according to claim 1, wherein the normalization processing in the step 3) comprises: standardizing the form of the equivalent relationship assertion into the relationship assertion in a unified specified form; standardizing the equivalent connection query form into a connection query in a unified specified form; standardizing the form of the equivalent nested query into a nested query of a unified specified form; the form of the equivalent interval query is normalized to a representation based on a comparison operator.

7. The SQL query-oriented refinement scoring method according to claim 1, wherein the SQL statement Q 'specified in the step 4)'_sAnd canonical SQL statement Q'_iThe computational function expression of the conversion cost between is:

8. The SQL query-oriented refinement scoring method according to claim 1, wherein the function expression adopted when the refinement score of the student SQL answer is calculated in the step 4) is as follows:

9. A refined scoring system for SQL queries, comprising a computer device, wherein the computer device is programmed or configured to perform the steps of the refined scoring method for SQL queries according to any one of claims 1 to 8, or a computer program programmed or configured to perform the refined scoring method for SQL queries according to any one of claims 1 to 8 is stored in a memory of the computer device.

10. A computer-readable storage medium having stored thereon a computer program programmed or configured to perform the SQL query-oriented refinement scoring method according to any one of claims 1 to 8.