CN112132420B

CN112132420B - SQL query-oriented refinement scoring method

Info

Publication number: CN112132420B
Application number: CN202010922595.1A
Authority: CN
Inventors: 许嘉; 莫晓琨; 吕品
Original assignee: Guangxi University
Current assignee: Guangxi University
Priority date: 2020-09-04
Filing date: 2020-09-04
Publication date: 2023-11-28
Anticipated expiration: 2040-09-04
Also published as: CN112132420A

Abstract

The invention discloses a refinement scoring method for SQL queries. On the one hand, the method uses a set of equivalent SQL statements, obtained by applying equivalent transformations to the correct-answer SQL statement, to score answers of different forms submitted by students reasonably. On the other hand, it corrects partially incorrect student answers and quantifies the score of a student answer based on the correction cost and the conversion cost between the corrected answer and the correct answer. Student answer data were collected through online SQL programming practice activities attended by students from several teaching classes, and the refinement scoring of SQL queries was analyzed experimentally on the collected data. Through equivalent transformation of the teacher's answer, correction of the student's answer, and the subsequent refinement scoring, the invention improves the accuracy of SQL query grading over conventional approaches that simply judge an answer right or wrong, so that SQL query grading is fairer and more reasonable.

Description

SQL query-oriented refinement scoring method

Technical Field

The invention relates to database technology, in particular to a refinement scoring method oriented to SQL (structured query language) query.

Background

Database technology has long been a core course in computer science, and proficiency in database technology is an important criterion for measuring the skills of IT practitioners. Learning the Structured Query Language (SQL) plays a vital role in learning database technology. Like other programming languages, SQL is highly practical: it can only be truly understood and mastered through a large amount of programming practice. In actual teaching, SQL programming questions are generally graded by professional teachers. However, grading by teachers raises two important issues. First, answers to SQL programming questions are diverse, which increases the teacher's grading burden and can lead a teacher to give different scores to equivalent answers of different forms, harming scoring fairness. Second, the grading criteria for SQL programming questions are ambiguous, so a teacher may give different scores to wrong answers of the same form, which also harms scoring fairness. How to realize effective automatic scoring of SQL is therefore an important problem to be studied and solved in the construction of current database technology courses.

To reduce the grading burden on teachers of SQL programming questions and to improve scoring fairness, a number of research works on automatic scoring of SQL queries have appeared in recent years at top or important academic conferences in the database field, such as VLDB and ICDE. These works fall into three types: (1) SQL scoring techniques based on result set comparison; (2) SQL scoring techniques based on crowdsourcing; (3) SQL scoring techniques based on syntactic structure analysis. The technique based on result set comparison scores SQL automatically by comparing the query result set of the correct answer with the query result set of the student answer. It gives fair scores (i.e., full marks) to equivalent correct answers, but it is limited to "all-or-nothing" grading. Specifically, for a partially incorrect SQL statement given by a student (for example, one in which only a single field name is misspelled), the technique cannot analyze the SQL content, so it cannot, as a teacher would, give a refinement score below full marks and can only give 0 points, which may frustrate the student's interest and confidence in learning SQL. The technique based on crowdsourcing assigns the task of grading SQL programming questions to students and estimates the real score of an answer from the scores given by multiple students. Although it both reduces the teacher's grading burden and yields a refinement score for the SQL answer, its grading cycle is long and score feedback cannot be given in time.

In recent years, some researchers have worked on SQL scoring techniques based on syntactic structure analysis. These techniques score SQL automatically by comparing the differences in syntactic structure and clause content between the correct answer and the student answer. Although they can give refinement scores, the diversity of correct answers to SQL programming questions means that teachers can hardly provide all correct answers to each question, so these techniques still cannot solve SQL automatic scoring effectively.

In summary, because the data query function is the key and difficult point of SQL learning, existing SQL automatic scoring techniques concentrate on the automatic scoring of SQL queries. Although the prior art has made considerable research progress in this area, the following two challenges still need to be solved to further improve the fairness of automatic scoring of SQL queries:

Challenge one: the automatic scoring technique must also be able to derive other equivalent correct answers from the correct answer given by the teacher, thereby guaranteeing scoring fairness for answers of different forms submitted by students. Conventional SQL automatic scoring techniques cannot provide this guarantee. As shown in Table 1, for the query semantics "query the information of students in the History department and the Physics department", the teacher gives an SQL answer based on a set union operation, while student A gives an SQL answer based on a condition in the WHERE clause. Because student A's SQL answer misspells the field name of the department and therefore cannot be executed, the conventional SQL scoring technique based on result set comparison can only give 0 points. And because the structure of student A's SQL answer differs significantly from that of the teacher's answer, the conventional SQL scoring technique based on syntactic structure analysis cannot give a fair refinement score.

Challenge two: for SQL statements that are partially incorrect relative to the correct answer, the automatic scoring technique must be able to give a refinement score, so as to ensure scoring fairness for partially incorrect answers. Conventional SQL automatic scoring techniques cannot provide this guarantee. As shown in Table 1, the SQL answer submitted by student B is identical to the answer given by the teacher except that the keyword SELECT is misspelled. However, because the erroneous statement cannot be executed, the conventional SQL scoring technique based on result set comparison can only give 0 points. The existing SQL scoring technique based on syntactic structure analysis cannot parse the statement because of the syntax error and can likewise only give 0 points.

Table 1: unfair SQL scoring example.

Disclosure of Invention

The technical problem to be solved by the invention: in view of the above problems in the prior art, the invention provides a refinement scoring method for SQL queries. Through equivalent transformation of the teacher's answer and correction of the student's answer, combined with subsequent refinement scoring, the method improves the accuracy of SQL query grading over conventional approaches that simply judge an answer right or wrong, so that SQL query grading is fairer and more reasonable.

In order to solve the technical problems, the invention adopts the following technical scheme:

A refinement scoring method for SQL queries, comprising:

1) Performing SQL query equivalent transformation according to the SQL query answer Q provided by a teacher for a target question and an input conversion rule set R, to obtain an equivalent SQL query answer set Q⁺;

2) Performing SQL query correction on the original SQL answer Q_s submitted by a student for the target question, according to the query-related data dictionary Dict, to obtain a corrected SQL answer Q_s, and calculating the correction cost of the student's SQL answer;

3) Normalizing the corrected SQL answer Q_s to obtain a normalized SQL statement Q'_s, converting the normalized SQL statement into a relational algebra expression, and further converting the relational algebra expression into the query tree of the student's SQL answer;

4) For each equivalent SQL query answer Q_i in the equivalent SQL query answer set Q⁺: first, normalizing the equivalent SQL query answer Q_i to obtain a normalized SQL statement Q'_i, converting the normalized SQL statement into a relational algebra expression, and further converting the relational algebra expression into the query tree of the equivalent SQL query answer; then, calculating the conversion cost between the normalized SQL statement Q'_s and the normalized SQL statement Q'_i according to the edit distance between the query tree of the student's SQL answer and the query tree of the equivalent SQL query answer; finally, calculating the refinement score of the student's SQL answer according to the correction cost of the student's SQL answer and the conversion costs between Q'_s and each Q'_i.

Optionally, the step of performing SQL query equivalent transformation in step 1) to obtain the equivalent SQL query answer set Q⁺ comprises:

1.1) Acquiring the SQL query answer Q provided by the teacher for the target question and the input conversion rule set R;

1.2) Converting the SQL query answer Q into relational algebra RA(Q), and adding RA(Q) to the relational algebra set RA_T;

1.3) If the conversion rule set R is empty (or all of its rules have been traversed), jumping to step 1.5); otherwise, traversing the conversion rule set R to take out a current conversion rule R_i and proceeding to the next step;

1.4) Performing equivalent transformation on relational algebra RA(Q) according to the current conversion rule R_i to obtain an equivalent relational algebra; if the equivalent relational algebra differs from RA(Q), adding it to the relational algebra set RA_T; converting the relational algebra in RA_T into SQL statements to obtain equivalent SQL query answers and adding them to the equivalent SQL query answer set Q⁺; jumping to step 1.3);

1.5) Outputting the equivalent SQL query answer set Q⁺.
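The loop in steps 1.1) to 1.5) can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: relational algebra is modeled as nested tuples, and the rule `union_to_or`, the renderer `to_sql`, and the function `equivalent_answers` are hypothetical names introduced only for this sketch. The duplicate check before appending is also an assumption.

```python
from typing import Callable, List, Optional

# Toy relational algebra (an assumption of this sketch):
#   ("select", predicate, relation)  or  ("union", left, right)
Expr = tuple
Rule = Callable[[Expr], Optional[Expr]]


def union_to_or(expr: Expr) -> Optional[Expr]:
    """Hypothetical rule: a union of two selections over the same relation
    becomes a single selection with an OR predicate."""
    if expr[0] != "union":
        return None
    left, right = expr[1], expr[2]
    if left[0] == right[0] == "select" and left[2] == right[2]:
        return ("select", f"{left[1]} OR {right[1]}", left[2])
    return None


def to_sql(expr: Expr) -> str:
    """Render the toy expression back to SQL text."""
    if expr[0] == "select":
        return f"SELECT * FROM {expr[2]} WHERE {expr[1]}"
    if expr[0] == "union":
        return f"{to_sql(expr[1])} UNION {to_sql(expr[2])}"
    raise ValueError(f"unsupported operator: {expr[0]}")


def equivalent_answers(ra_q: Expr, rules: List[Rule]) -> List[str]:
    ra_t = [ra_q]                                  # step 1.2: seed RA_T with RA(Q)
    for rule in rules:                             # step 1.3: take the next rule R_i
        transformed = rule(ra_q)                   # step 1.4: try the transformation
        if transformed is not None and transformed not in ra_t:
            ra_t.append(transformed)
    return [to_sql(expr) for expr in ra_t]         # step 1.5: output Q+ as SQL text


teacher = ("union",
           ("select", "dept_name = 'History'", "student"),
           ("select", "dept_name = 'Physics'", "student"))
answers = equivalent_answers(teacher, [union_to_or])
```

Here `answers` holds the teacher's original UNION form followed by the OR form derived by the single rule.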

Optionally, the step of performing SQL query correction in step 2) to obtain the corrected SQL answer Q_s comprises:

2.1) Splitting the original SQL answer Q_s submitted by the student to obtain the clause set C of the original SQL answer Q_s;

2.2) Based on the clause set C, correcting the SQL keyword and database schema information errors in the original SQL answer Q_s, and, based on the clause set C, correcting the clause order errors in the SQL answer Q_s.

Optionally, the step in step 2.2) of correcting the SQL keyword and database schema information errors in the original SQL answer Q_s based on the clause set C comprises:

2.2.1A) Generating a data dictionary Dict composed of an SQL query keyword set Dict_key and a database schema information set Dict_schema, where the database schema information in Dict_schema comprises base table names, view names, field names and index names;

2.2.2A) Traversing the clauses in clause set C to obtain the current clause C_i; if clause set C has been fully traversed, determining that the correction of SQL keyword and database schema information errors is complete, and returning; otherwise, proceeding to the next step;

2.2.3A) Performing word segmentation on the current clause C_i to obtain the token sequence T to be corrected;

2.2.4A) Traversing the token sequence T to take out a current token T_j; if the traversal is complete, jumping to step 2.2.2A); otherwise, proceeding to the next step;

2.2.5A) Judging whether the current token T_j is absent from the data dictionary Dict; if it is absent, proceeding to the next step; otherwise, jumping to step 2.2.4A);

2.2.6A) In the SQL query keyword set Dict_key of the data dictionary Dict, finding the keyword with the minimum string edit distance ED_key to the current token T_j; in the database schema information set Dict_schema of the data dictionary Dict, finding the schema element with the minimum string edit distance ED_schema to the current token T_j; if the absolute value of the difference between ED_key and ED_schema is smaller than a preset threshold ε, determining the correction result of the current token T_j from its context in clause C_i to obtain the corrected token T'_j; otherwise, correcting the current token T_j to the element of the data dictionary Dict that attains the smaller of the edit distances ED_key and ED_schema; jumping to step 2.2.4A).
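Steps 2.2.5A) and 2.2.6A) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names, the case-insensitive comparison, and the `prefer_schema` flag are hypothetical. In particular, the flag stands in for the context-based decision of step 2.2.6A), whose details are not given here.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def nearest(token: str, candidates: set) -> tuple:
    """Return (distance, candidate) for the closest dictionary entry."""
    return min((edit_distance(token.lower(), c.lower()), c) for c in candidates)


def correct_token(token: str, dict_key: set, dict_schema: set,
                  epsilon: int = 1, prefer_schema: bool = False) -> str:
    known = {w.lower() for w in dict_key | dict_schema}
    if token.lower() in known:
        return token                       # step 2.2.5A: token is already in Dict
    ed_key, key = nearest(token, dict_key)
    ed_schema, schema = nearest(token, dict_schema)
    if abs(ed_key - ed_schema) < epsilon:
        # Ambiguous case: the method decides from the clause context;
        # this sketch lets the caller supply that decision (assumption).
        return schema if prefer_schema else key
    return key if ed_key < ed_schema else schema


DICT_KEY = {"SELECT", "FROM", "WHERE", "OR"}
DICT_SCHEMA = {"student", "dept_name"}
fixed_keyword = correct_token("SELCT", DICT_KEY, DICT_SCHEMA)
fixed_field = correct_token("dept_nam", DICT_KEY, DICT_SCHEMA)
```

With these sample dictionaries, the misspelled keyword of student B and a misspelled field name like that of student A are each mapped back to the closest dictionary entry.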

Optionally, the step in step 2.2) of correcting the clause order errors in the SQL answer Q_s based on the clause set C comprises:

2.2.1B) Initializing the error clause set C_error to empty and the unmatched clause set C_unmatch to empty;

2.2.2B) Traversing the clauses in clause set C to obtain the current clause C_i; if clause set C has been fully traversed, jumping to step 2.2.4B); otherwise, proceeding to the next step;

2.2.3B) Judging whether the clause C_{i+1} following the current clause is a valid next clause of the current clause C_i; if not, adding the clause C_{i+1} to the error clause set C_error and adding the current clause C_i to the unmatched clause set C_unmatch; if so, taking the clause C_{i+1} as the next clause of the current clause C_i; jumping to step 2.2.2B);

2.2.4B) Traversing the clauses in the unmatched clause set C_unmatch to obtain the current clause C_l; if the unmatched clause set C_unmatch has been fully traversed, jumping to step 2.2.7B); otherwise, proceeding to the next step;

2.2.5B) Searching the error clause set C_error for the clause C_match that matches the current clause C_l;

2.2.6B) If the clause C_match is empty, returning the SQL answer Q_s submitted by the student as the corrected SQL answer Q_s; otherwise, taking the clause C_match as the next clause of the current clause C_l; jumping to step 2.2.4B);

2.2.7B) If the error clause set C_error is not empty, returning the SQL answer Q_s submitted by the student as the corrected SQL answer Q_s; otherwise, adjusting the order of the clauses in clause set C according to the determined next-clause relations, then converting the reordered clause set C into an SQL statement and returning it as the corrected SQL answer Q_s.
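The effect of the clause order correction can be illustrated with a deliberately simplified Python sketch. It does not reproduce the C_error and C_unmatch bookkeeping of steps 2.2.1B) to 2.2.7B); instead it sorts clauses by an assumed canonical order and gives up when a clause type is unknown or duplicated. The names `CLAUSE_ORDER`, `clause_type` and `fix_clause_order` are hypothetical.

```python
# Assumed canonical order of top-level clauses in a simple SELECT statement.
CLAUSE_ORDER = ["SELECT", "FROM", "WHERE", "GROUP BY", "HAVING", "ORDER BY"]


def clause_type(clause: str):
    """Return the leading keyword of a clause, or None if it is not recognized."""
    upper = clause.strip().upper()
    for keyword in CLAUSE_ORDER:
        if upper.startswith(keyword):
            return keyword
    return None


def fix_clause_order(clauses: list) -> tuple:
    """Return (reordered clauses, moved flags).

    moved[i] is True when the clause at original position i differs from the
    clause at position i after reordering, which plays the role of an
    indicator that the order was adjusted."""
    types = [clause_type(c) for c in clauses]
    if None in types or len(set(types)) != len(types):
        # Cannot repair: return the student's clauses unchanged.
        return list(clauses), [False] * len(clauses)
    ordered = sorted(clauses, key=lambda c: CLAUSE_ORDER.index(clause_type(c)))
    moved = [before != after for before, after in zip(clauses, ordered)]
    return ordered, moved


student_clauses = ["FROM student", "SELECT *", "WHERE dept_name = 'History'"]
reordered, moved = fix_clause_order(student_clauses)
corrected_sql = " ".join(reordered)
```

Returning the unchanged input when repair is impossible mirrors the fallback in steps 2.2.6B) and 2.2.7B), where the student's original answer is returned.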

Optionally, the normalizing in step 3) includes: normalizing equivalent forms of a relational predicate into a predicate of a single specified form; normalizing equivalent forms of a join query into a join query of a single specified form; normalizing equivalent forms of a nested query into a nested query of a single specified form; and normalizing equivalent forms of an interval (range) query into a representation based on comparison operators.
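As an illustration of the last normalization (interval queries rewritten with comparison operators), the following Python sketch rewrites `BETWEEN ... AND ...` into a pair of comparisons. The regex-based approach and the name `normalize_between` are assumptions made for this sketch; a real implementation would work on the parsed statement, since this pattern does not handle bounds that contain spaces, such as quoted strings.

```python
import re

# column BETWEEN low AND high, where low and high contain no whitespace
_BETWEEN = re.compile(
    r"(\w+(?:\.\w+)?)\s+BETWEEN\s+(\S+)\s+AND\s+(\S+)",
    re.IGNORECASE,
)


def normalize_between(sql: str) -> str:
    """Rewrite every simple BETWEEN predicate as 'col >= low AND col <= high'."""
    return _BETWEEN.sub(r"\1 >= \2 AND \1 <= \3", sql)


normalized = normalize_between(
    "SELECT * FROM instructor WHERE salary BETWEEN 3000 AND 5000"
)
```

After this rewrite, a student answer using BETWEEN and a reference answer using two comparisons share one form before their query trees are compared.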

Optionally, the expression of the function for calculating the conversion cost between the normalized SQL statement Q'_s and the normalized SQL statement Q'_i in step 4) is as follows:

In the above formula, penalty_t(Q'_s, Q'_i) is the conversion cost between the normalized SQL statement Q'_s and the normalized SQL statement Q'_i; Components(Q'_s) denotes the set of combined elements of the query tree corresponding to Q'_s; O_i denotes the i-th combined element of the query tree corresponding to Q'_s; ED(O_i, Q'_i) is the edit distance value incurred by modifying combined element O_i in the query tree corresponding to Q'_s on the basis of the query tree corresponding to Q'_i; the weight term is the weight value of the clause type to which combined element O_i belongs; W is the sum of the weight values of all clause types; and S is the score of the target question;

Calculating the correction cost of the student's SQL answer in step 2) includes calculating a first correction cost for correcting the SQL keyword and database schema information errors in the SQL answer Q_s, and calculating a second correction cost for correcting the clause order errors in the SQL answer Q_s; the expressions of the functions for calculating the first correction cost and the second correction cost are as follows:

In the above formula, penalty_c(Q_s) denotes the first correction cost; Q_s is the corrected SQL answer; C(Q_s) is the clause set of the SQL answer Q_s; the formula uses the number of edit modifications of clause C_i and the weight value of the clause type to which clause C_i belongs; W is the sum of the weight values of all clause types; and S is the score of the target question;

In the above formula, penalty_a(Q_s) denotes the second correction cost; Q_s is the corrected SQL answer; C(Q_s) is the clause set of the SQL answer Q_s; I(C_i) is the indicator variable of clause C_i, with I(C_i) = 1 if the order of clause C_i was adjusted and I(C_i) = 0 otherwise; the weight term is the weight value of the clause type to which clause C_i belongs; W is the sum of the weight values of all clause types; and S is the score of the target question.

Optionally, the refinement score of the student's SQL answer calculated in step 4) is expressed by the following function:

In the above formula, G(Q_s) is the resulting refinement score of the student's SQL answer; S is the score of the target question answered by the original SQL answer Q_s submitted by the student; penalty_c(Q_s) denotes the first correction cost; penalty_a(Q_s) denotes the second correction cost; penalty_t(Q'_s, Q'_i) is the conversion cost between the normalized SQL statement Q'_s and the normalized SQL statement Q'_i; and the minimum term denotes the minimum of the conversion costs between Q'_s and all normalized SQL statements Q'_i.
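The scoring expression itself is not legible in this text, so the following Python sketch shows only one plausible reading of the definitions above: the two correction costs and the smallest conversion cost are subtracted from the question score S. The subtractive form, the clamping at zero, and the name `refined_score` are all assumptions of this sketch and are not taken from the source.

```python
from typing import Iterable


def refined_score(full_score: float,
                  penalty_c: float,
                  penalty_a: float,
                  conversion_costs: Iterable[float]) -> float:
    """Combine penalties into a refinement score (assumed subtractive form).

    full_score       -- S, the score of the target question
    penalty_c        -- first correction cost (keyword/schema errors)
    penalty_a        -- second correction cost (clause order errors)
    conversion_costs -- penalty_t(Q'_s, Q'_i) for every equivalent answer Q'_i
    """
    costs = list(conversion_costs)
    if not costs:
        raise ValueError("at least one equivalent answer is required")
    # Only the cheapest conversion counts: the student is compared with the
    # closest equivalent answer.
    score = full_score - penalty_c - penalty_a - min(costs)
    return max(0.0, score)  # clamping at zero is an assumption


example_score = refined_score(10, 1.0, 0.5, [4.0, 2.0, 3.0])
```

Taking the minimum over all equivalent answers is what lets an answer written in a different but equivalent form be compared against its closest reference.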

In addition, the invention also provides a refinement scoring system facing the SQL query, which comprises computer equipment, wherein the computer equipment is programmed or configured to execute the steps of the refinement scoring method facing the SQL query, or a computer program programmed or configured to execute the refinement scoring method facing the SQL query is stored in a memory of the computer equipment.

Furthermore, the invention also provides a computer readable storage medium, wherein the computer readable storage medium stores a computer program programmed or configured to execute the refinement scoring method facing SQL query.

Compared with the prior art, the invention has the following advantages. Learning the Structured Query Language (SQL) has long been a key and difficult point in learning database technology and requires students to carry out extensive SQL programming practice. In recent years, automatic scoring of SQL queries has become a research hotspot because it can greatly reduce the burden on teachers of grading students' SQL homework. However, existing automatic scoring techniques for SQL queries do not fully consider scoring fairness for answers of different forms submitted by students, nor scoring fairness for submitted answers that contain partial errors, which harms students' interest and confidence in learning SQL queries. To address these problems, the invention provides a refinement scoring method for SQL queries. On the one hand, it uses a set of equivalent SQL statements, obtained by equivalent transformation of the correct-answer SQL statement, to score answers of different forms submitted by students reasonably. On the other hand, it corrects partially incorrect student answers and quantifies the score of a student answer based on the correction cost and the conversion cost between the corrected answer and the correct answer. Student answer data were collected through online SQL programming practice activities attended by students from several teaching classes, and the refinement scoring of SQL queries was analyzed experimentally on the collected data. Compared with the related art, the method achieves better scoring fairness and significantly improves the accuracy of SQL scoring.

Drawings

Fig. 1 is a schematic diagram of the basic principle of the method according to the embodiment of the invention.

FIG. 2 is a schematic diagram of a query tree in an embodiment of the present invention.

Fig. 3 is the directed acyclic graph Graph obtained from the first recursion in an embodiment of the invention.

Fig. 4 is the directed acyclic graph Graph obtained from the second recursion in an embodiment of the invention.

Fig. 5 is the directed acyclic graph Graph obtained from the third recursion in an embodiment of the invention.

Fig. 6 is the resulting transformed directed acyclic graph Graph' in an embodiment of the invention.

FIG. 7 is a comparative schematic of the run-time analysis of three methods in the examples of the present invention.

FIG. 8 is a schematic diagram showing comparison of the number of wrong respondents in three methods according to the embodiment of the present invention.

Detailed Description

In the following, taking the target question "query all students of the History and Physics departments" as an example, the refinement scoring method for SQL queries of the invention (abbreviated as SQL-GRADER) is described in further detail. The original SQL answers Q_s submitted by student A and student B for the target question are shown in Table 1.

As shown in fig. 1, the refinement scoring method for the SQL query in this embodiment includes:

Referring to Fig. 1, as an optional implementation, this embodiment uses an SQL query equivalent transformation module as the execution body of step 1). Step 1) takes the SQL query answer provided by the teacher (as shown in Table 1) as input and applies the equivalent transformation rules of relational algebra expressions to the relational algebra expression of that answer. This yields several relational algebra expressions equivalent to it, and from them a set of SQL query answers equivalent to the teacher's answer, thereby guaranteeing scoring fairness for equivalent answers of different forms submitted by students.

In step 1) of this embodiment, the step of performing SQL query equivalent transformation to obtain the equivalent SQL query answer set Q⁺ comprises:

1.5) Outputting the equivalent SQL query answer set Q⁺.

Referring to steps 1.1) to 1.5), with the SQL answer Q given by the teacher and the equivalent transformation rule set R of relational algebra expressions as inputs, Q is first converted into the relational algebra expression RA(Q), which is added to the equivalent relational algebra expression set RA_T. RA(Q) is then equivalently transformed based on one conversion rule in the conversion rule set R. If the transformed relational algebra expression differs from RA(Q), it is an equivalent relational algebra expression of Q and is added to RA_T. Finally, all equivalent relational algebra expressions in RA_T are converted into SQL statements, and the resulting set Q⁺ of SQL queries equivalent to the teacher's SQL answer Q is returned.

It should be noted that the conversion rule set R may be specified manually as needed. For example, in this embodiment the conversion rule set R is shown in Table 1 and includes three types of rules: equivalent transformation rules based on operators, equivalent transformation rules based on operation laws, and sub-query equivalent transformation rules. Each type contains several specific rules.

Table 1: a set of conversion rules R of relational algebra expressions.

In this embodiment, for the target question "query all students of the History and Physics departments in the student table (student)", step 1) first converts the SQL answer provided by the teacher into a relational algebra expression RA. Then, based on the equivalent transformation rule R1.2 for relational algebra expressions shown in Table 1, RA is equivalently transformed to obtain an equivalent relational algebra expression RA₁. The SQL query corresponding to RA₁ is as follows:

SELECT *
FROM student
WHERE dept_name = 'Physics'
OR dept_name = 'History'

Because the above equivalent SQL answer has the same syntactic structure as the SQL answer given by student A (both are SQL queries expressed through conditions in the WHERE clause), the fairness of the refinement scoring of student A's SQL answer with respect to syntactic structure can be ensured. Note, however, that the SQL answers of both student A and student B contain partial errors that prevent them from being converted into relational algebra expressions.
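The equivalence of the set-union form and the WHERE-condition form can be checked on sample data with Python's built-in `sqlite3` module. This is only an illustrative check: the table layout, the sample rows, and the exact text of the union-based query are assumptions, since the teacher's answer is not reproduced in this text. Agreement on one data set does not prove equivalence in general.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER, name TEXT, dept_name TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Ann", "History"), (2, "Bo", "Physics"), (3, "Cy", "Music")],
)

# Assumed union-based form of the reference answer.
union_form = (
    "SELECT * FROM student WHERE dept_name = 'History' "
    "UNION "
    "SELECT * FROM student WHERE dept_name = 'Physics'"
)
# WHERE-condition form, as derived by the equivalent transformation.
or_form = (
    "SELECT * FROM student "
    "WHERE dept_name = 'Physics' OR dept_name = 'History'"
)


def result_set(sql: str) -> list:
    """Run a query and return its rows in a canonical (sorted) order."""
    return sorted(conn.execute(sql).fetchall())


same_result = result_set(union_form) == result_set(or_form)
```

On this sample data both forms return the History and Physics students, which is why an answer in either form deserves the same score.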

In step 1.4), the equivalent transformation of relational algebra RA(Q) according to the current conversion rule R_i (called SQLTransform in this embodiment) to obtain the equivalent relational algebra involves a relational algebra conversion function TransformDAG and an equivalent transformation function ApplyRules, which performs equivalent transformation according to the relational algebra and the conversion rules. The detailed steps are as follows:

1.4.1) Generating a directed acyclic graph Graph from the relational algebra RA(Q) by means of the relational algebra conversion function TransformDAG;

Rule matching on relational algebra requires the relational algebra to be represented as a directed acyclic graph, which is the purpose of the relational algebra conversion function TransformDAG. Its input is the relational algebra RA(Q), its output is the directed acyclic graph Graph, and its execution steps include:

S1, creating a directed acyclic graph Graph;

S2, acquiring the first node of the relational algebra RA(Q) (RA(Q).root) as the current node rel;

S3, judging whether the current node rel already exists in the directed acyclic graph Graph; if so, leaving the current round of recursion and executing step S6; otherwise, executing the next step;

S4, acquiring the node set inputs connected to the current node rel;

S5, for each node input in the node set inputs: recursively calling the relational algebra conversion function TransformDAG to convert the node input into a graph node child, adding the graph node child to the directed acyclic graph Graph, and setting a directed edge from the current node rel to the child node child;

S6, attempting to acquire the next node of the relational algebra RA(Q) as the current node rel; if successful, jumping to step S3; otherwise, determining that the recursion has ended and outputting the directed acyclic graph Graph.
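The recursion in steps S1 to S6 can be sketched in Python as follows. This is a minimal sketch, not the embodiment's code: the `RelNode` class, the adjacency-dictionary representation of Graph, and the function name `transform_dag` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelNode:
    """A relational algebra node; 'inputs' are the operators it reads from."""
    op: str
    inputs: tuple = ()


def transform_dag(rel: RelNode, graph: dict = None) -> dict:
    """Build an adjacency dictionary {node: [child, ...]} from a plan."""
    if graph is None:
        graph = {}                      # S1: create the graph
    if rel in graph:                    # S3: node already present, stop here
        return graph
    graph[rel] = []
    for node_input in rel.inputs:       # S4: nodes connected to rel
        transform_dag(node_input, graph)    # S5: convert the input recursively
        graph[rel].append(node_input)       # S5: directed edge rel -> child
    return graph


scan_instructor = RelNode("Scan(instructor)")
scan_teaches = RelNode("Scan(teaches)")
join = RelNode("Join", (scan_instructor, scan_teaches))
plan = RelNode("Project", (RelNode("Filter", (join,)),))
dag = transform_dag(plan)
```

Because nodes are hashable value objects, a sub-expression that appears twice in a plan is stored once, which is what makes the result a directed acyclic graph rather than a tree.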

1.4.2) With the conversion rule set R and the directed acyclic graph Graph as input, calling the equivalent transformation function ApplyRules to obtain the transformed directed acyclic graph Graph'.

The equivalent transformation function ApplyRules is a rule matching algorithm based on depth-first traversal. Its input is the conversion rule set R and the directed acyclic graph Graph, and its output is the transformed directed acyclic graph Graph'. The execution steps of the equivalent transformation function ApplyRules include:

S1, initializing the matching count matches to 0;

S2, if the matching count matches is smaller than or equal to the limit matchLimit, acquiring a current graph node vertex from the directed acyclic graph Graph; if the acquisition succeeds, proceeding to the next step; otherwise, jumping to step S4;

S3, traversing each rule in the conversion rule set R and processing it as follows: applying the rule to the current graph node vertex to obtain a new node newVertex; if the current graph node vertex differs from the new node newVertex, incrementing the matching count matches by 1, and, if matches has still not reached the limit matchLimit, performing a depth-first traversal conversion of the graph according to the conversion rule set R, the new node newVertex and the matching count matches to obtain the transformed directed acyclic graph Graph'; jumping to step S2;

S4, outputting the transformed directed acyclic graph Graph'.
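A depth-first rule application with a match limit, in the spirit of ApplyRules, can be sketched in Python as follows. This is a simplified illustration on a tuple-based plan rather than on the Graph structure of the embodiment; the rule `push_filter`, the plan encoding, and the function `apply_rules` are hypothetical.

```python
# Plan encoding (assumption): ("scan", table), ("join", left, right),
# ("filter", (table, condition), child)


def push_filter(expr: tuple) -> tuple:
    """Hypothetical rule: push a filter below a join onto the side it refers to."""
    if expr[0] == "filter" and expr[2][0] == "join":
        table, _condition = expr[1]
        _, left, right = expr[2]
        if left == ("scan", table):
            return ("join", ("filter", expr[1], left), right)
        if right == ("scan", table):
            return ("join", left, ("filter", expr[1], right))
    return expr


def apply_rules(expr: tuple, rules: list, match_limit: int = 10) -> tuple:
    matches = 0                                   # S1: no matches yet

    def visit(node: tuple) -> tuple:
        nonlocal matches
        changed = True
        while changed and matches < match_limit:  # S2: respect the limit
            changed = False
            for rule in rules:                    # S3: try every rule
                new_node = rule(node)
                if new_node != node:
                    matches += 1
                    node = new_node
                    changed = True
                    break
        # Depth-first descent into the children of the (possibly new) node.
        if node[0] == "filter":
            return ("filter", node[1], visit(node[2]))
        if node[0] == "join":
            return ("join", visit(node[1]), visit(node[2]))
        return node

    return visit(expr)                            # S4: transformed plan


plan = ("filter", ("instructor", "salary > 3000"),
        ("join", ("scan", "instructor"), ("scan", "teaches")))
transformed = apply_rules(plan, [push_filter])
```

The match limit bounds the number of successful rewrites, which guards against rule sets that could otherwise keep rewriting each other's output.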

Take the following SQL as an example:

SELECT instructor.name, instructor.salary
FROM instructor JOIN teaches ON instructor.id = teaches.id
WHERE instructor.salary > 3000 AND

In the first step, the above example SQL is converted into a relational algebra expression, and the directed acyclic graph Graph obtained by recursively processing this expression with the relational algebra conversion function TransformDAG is shown in Fig. 3.

In the second step, the equivalent transformation function ApplyRules traverses the directed acyclic graph Graph shown in Fig. 3 from its root node. When the traversal reaches the Filter node and the Filter node is matched as meeting the equivalent transformation requirement of rule R1.1 in Table 1, the Filter node is equivalently transformed according to rule R1.1, yielding the directed acyclic graph Graph shown in Fig. 4.

In the third step, rule matching is performed, following the depth-first traversal in the equivalent transformation function ApplyRules, on the Filter node pushed down in the second step. All rules are traversed for the Filter node of the directed acyclic graph in Fig. 3; if no rule fits the Filter node type, rule matching continues with the nodes after the Filter node, and stops when no node fits any equivalent transformation rule. Finally, the directed acyclic graph shown in Fig. 3 is restored to SQL-2, giving an SQL query that is equivalent to the input SQL but written differently, as follows:

SELECT name, salary FROM (SELECT * FROM public.instructor, public.teaches WHERE instructor.id = teaches.id) AS t WHERE t.salary > 3000 AND t.year > 2010

In the fourth step, after the first result has been obtained in the third step, rule matching is performed again on the Join node in FIG. 3, following the depth-first traversal in the equivalent transformation function ApplyRules. The Join node satisfies rule R3.1 in Table 1 and is transformed according to rule R3.1, giving the directed acyclic Graph shown in FIG. 5. Finally, the directed acyclic graph shown in FIG. 5 is restored to SQL-3, an SQL query that is equivalent to the input SQL but written differently, as follows:

SELECT t.name, t.salary FROM (SELECT * FROM public.instructor WHERE salary > 3000) AS t INNER JOIN (SELECT * FROM public.teaches WHERE course_id > 500) AS t0 ON t.id = t0.id

In the fifth step, after the second equivalent transformation result has been obtained in the fourth step, the equivalent transformation function ApplyRules tries the remaining matchable rules on the Join node. When the Join node satisfies rule R2.3 in Table 1, it is transformed according to rule R2.3, giving the transformed directed acyclic Graph' shown in FIG. 6. Finally, the directed acyclic graph shown in FIG. 6 is restored to SQL-4, an SQL query that has the same semantics as the teacher's SQL but is written differently, as follows:

SELECT instructor.name,instructor.salary

FROM (SELECT * FROM instructor, teaches WHERE instructor.id = teaches.id) AS t

WHERE t.salary > 3000 AND t.year > 2010

Referring to FIG. 1, as an optional implementation, step 2) of this embodiment is carried out by an SQL query modification module. Step 2) first uses SQL grammar rules and a query-related data dictionary to automatically correct SQL queries submitted by students that contain partial errors (keyword errors, schema information errors or clause order errors), thereby obtaining the corrected student answer SQL statement. The corrected statement can then be converted into a relational algebra expression in the subsequent refinement scoring module, so that partially incorrect student SQL answers can still receive a refined score, which ensures scoring fairness.

The syntactic-structure-based SQL refinement scoring strategy presupposes that the SQL query answer submitted by a student can be converted into a relational algebra expression; the refined score is then obtained by comparing the query tree determined by the relational algebra expression of the student SQL answer with the query tree of the correct answer. Actual teaching feedback shows, however, that SQL answers submitted by students often resemble the SQL answer given by the teacher but contain some errors. These errors fall mainly into two cases: (1) incorrect SQL keywords or database schema information; (2) incorrect SQL clause order. Such errors prevent the system from converting the submitted SQL answer into a relational algebra expression. In view of this, the present embodiment first automatically corrects keyword and schema information errors in partially incorrect student SQL answers based on SQL grammar rules and the query-related data dictionary, and then corrects clause order errors in the student SQL answers based on the SQL query grammar. The step of performing the SQL query modification in step 2) of this embodiment to obtain the modified SQL answer Qs includes:

2.1) Splitting the original SQL answer Q_s submitted by the student to obtain the clause set C of the original SQL answer Q_s;

2.2) Correcting the SQL keyword and database schema information errors in the original SQL answer Q_s based on the clause set C, and correcting the clause order errors in the SQL answer Q_s based on the clause set C.

In step 2.2) of this embodiment, the step of correcting the SQL keyword and database schema information errors in the original SQL answer Q_s based on the clause set C includes:

2.2.6A) Based on string edit distance, finding in the SQL query keyword set Dict_key of the data dictionary Dict the keyword ED_key with the minimum edit distance to the current token T_j, and finding in the database schema information set Dict_schema of the data dictionary Dict the schema element ED_schema with the minimum edit distance to the current token T_j; if the absolute value of the difference between the edit distances of ED_key and ED_schema is smaller than a preset threshold ε, determining the correction result of the current token T_j from its context in clause C_i to obtain the corrected current token T'_j; otherwise, correcting the current token T_j to whichever of ED_key and ED_schema has the smaller edit distance, that object being an element of the data dictionary Dict; then jumping to step 2.2.4A).

In step 2.2.1A), the data dictionary Dict is generated as follows. The query-related SQL keywords are extracted from the SQL standard (ISO/IEC 9075:2016) to obtain the SQL query keyword set Dict_key. The database schema information of the application database (base table names, view names, field names and index names) is then extracted to obtain the database schema information set Dict_schema. Merging the two sets gives the query-related data dictionary, denoted Dict = Dict_key ∪ Dict_schema. SQL answers submitted by students can then be corrected based on Dict. For a token to be corrected in the token sequence T, denoted the current token T_j, the embodiment uses string edit distance to find the query keyword with the minimum edit distance to T_j (denoted ED_key) and the database schema element with the minimum edit distance to T_j (denoted ED_schema), and corrects T_j to the element of Dict with the smallest edit distance. If the edit distances of ED_key and ED_schema to T_j differ by less than the threshold, the correction result of T_j is determined from the context in which T_j appears. While the student SQL answer is being corrected, the correction cost of correcting the SQL keywords and database schema information is calculated at the same time.

Referring to steps 2.2.1A) to 2.2.6A), for the correction of keywords and database schema information the present embodiment first performs word segmentation on each clause C_i in C to obtain the token set T of C_i. It then checks whether each token in T is an element of the query-related data dictionary Dict. If a token T_j is not in Dict, the minimum string edit distance ED_key between T_j and the elements of the query keyword set Dict_key and the minimum string edit distance ED_schema between T_j and the elements of the database schema information set Dict_schema are calculated separately. If ED_key and ED_schema differ only slightly, the correction result of T_j is determined from the context of T_j in clause C_i; otherwise T_j is corrected to the element of Dict with the smallest edit distance.
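The token correction described above can be sketched in Python as follows. This is an illustrative sketch rather than the embodiment's implementation: the tiny dictionaries `DICT_KEY` and `DICT_SCHEMA`, the case-insensitive comparison, the default threshold `epsilon=1`, and the `prefer_keyword` flag (a stand-in for the context-based decision, whose details the text does not give) are all assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def nearest(token: str, candidates: set) -> tuple:
    """Return (distance, element) for the closest dictionary element."""
    return min((edit_distance(token.lower(), c.lower()), c)
               for c in sorted(candidates))


def correct_token(token: str, dict_key: set, dict_schema: set,
                  epsilon: int = 1, prefer_keyword: bool = True) -> str:
    """Correct one token against Dict = Dict_key union Dict_schema."""
    if any(token.lower() == e.lower() for e in dict_key | dict_schema):
        return token                      # already a dictionary element
    d_key, ed_key = nearest(token, dict_key)
    d_schema, ed_schema = nearest(token, dict_schema)
    if abs(d_key - d_schema) < epsilon:
        # Ambiguous case: the text resolves it from the clause context;
        # the prefer_keyword flag is a hypothetical stand-in for that decision.
        return ed_key if prefer_keyword else ed_schema
    return ed_key if d_key < d_schema else ed_schema


DICT_KEY = {"SELECT", "FROM", "WHERE"}            # assumed toy keyword set
DICT_SCHEMA = {"student", "name", "dept_name"}    # assumed toy schema set

print(correct_token("SELET", DICT_KEY, DICT_SCHEMA))
print(correct_token("dept_names", DICT_KEY, DICT_SCHEMA))
```

With these toy dictionaries the two erroneous tokens from the embodiment's example, "SELET" and "dept_names", are mapped to "SELECT" and "dept_name" respectively.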

In step 2.2) of the present embodiment, the step of correcting the clause order errors in the SQL answer Q_s based on the clause set C includes:

2.2.7B) If the mismatched clause set C_error is not empty, returning the SQL answer Q_s submitted by the student as the modified SQL answer Qs; otherwise, adjusting the order of the clauses in the clause set C according to the determined order, converting the reordered clause set C into an SQL statement, and returning that statement as the corrected SQL answer Q_s.

Referring to steps 2.2.1B) to 2.2.7B), for the correction of clause order the present embodiment first traverses, in order, each clause C_i in the clause set C of Q_s and checks whether its successor clause C_i+1 satisfies the SQL query clause order rules shown in Table 2. If not, C_i+1 is added to the mismatched clause set C_error and C_i is added to the unmatched clause set C_unmatch; if so, the successor clause of C_i is set to C_i+1. For each unmatched clause C_l in C_unmatch, the mismatched clause set C_error is searched for the clause that satisfies the successor constraint rules of C_l (see Table 2) and has the strongest association with C_l, and that clause is set as the successor of C_l. If no matching successor clause can be found for C_l, the SQL answer Q_s submitted by the student cannot be corrected, and Q_s itself is returned. If a matching successor clause C_match can be found in C_error, the successor clause of C_l is set to C_match and C_match is removed from C_error. If, once the matching process is complete, the mismatched clause set C_error still contains clauses that could not be matched, the SQL answer Q_s submitted by the student cannot be corrected, and Q_s itself is returned. Otherwise Q_s has been corrected successfully: the clauses are reordered according to the successor clause information of each clause in the clause set C of Q_s, and the SQL statement generated from the reordered clause sequence is returned.

For clause order errors in the SQL answer submitted by the student (for example, an ORDER BY clause wrongly placed before the HAVING clause), the present embodiment corrects them based on the SQL query grammar in the SQL standard (ISO/IEC 9075:2016). The SQL clause order adjustment rules are shown in Table 2.

Table 2: SQL clause order adjustment rules.

Clause	Permitted successor clause
SELECT	FROM
FROM	JOIN, WHERE, <NULL>
WHERE	GROUP BY, aggregate clauses, <NULL>
JOIN	JOIN, WHERE, <NULL>
Set-operation keywords	SELECT
GROUP BY	HAVING, ORDER BY, <NULL>
HAVING	ORDER BY, <NULL>
ORDER BY	ORDER BY, <NULL>

Table 2 shows the grammatical order of the clauses of an SQL query statement as given in the SQL standard, where <NULL> indicates that the current clause need not be followed by any other clause. For example, as Table 2 shows, the HAVING clause may be followed by the ORDER BY clause, whereas the ORDER BY clause may be followed only by no clause or by another ORDER BY clause, and not by any other clause such as HAVING. Based on the SQL clause order adjustment rules shown in Table 2, the embodiment takes as input the student SQL answer after SQL keyword and database schema information correction, together with the clause set obtained earlier by splitting the student SQL answer, checks whether the position of each clause satisfies the SQL clause order rules in Table 2, and corrects any clause that does not.
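The position check against Table 2 can be sketched in Python as below. The sketch only detects violations and does not perform the successor re-matching of steps 2.2.1B) to 2.2.7B); the dictionary `SUCCESSORS` and the function `order_violations` are hypothetical names, `None` stands for <NULL>, and the aggregate and set-operation rows of Table 2 are omitted for brevity.

```python
# Permitted successors per clause, transcribed from Table 2 (subset).
SUCCESSORS = {
    "SELECT": {"FROM"},
    "FROM": {"JOIN", "WHERE", None},
    "WHERE": {"GROUP BY", None},
    "JOIN": {"JOIN", "WHERE", None},
    "GROUP BY": {"HAVING", "ORDER BY", None},
    "HAVING": {"ORDER BY", None},
    "ORDER BY": {"ORDER BY", None},
}


def order_violations(clauses):
    """Return (clause, successor) pairs that break the Table 2 rules."""
    violations = []
    # Pair every clause with the one that follows it; the last gets None.
    for current, following in zip(clauses, clauses[1:] + [None]):
        if following not in SUCCESSORS[current]:
            violations.append((current, following))
    return violations


good = ["SELECT", "FROM", "WHERE", "GROUP BY", "HAVING", "ORDER BY"]
bad = ["SELECT", "FROM", "WHERE", "GROUP BY", "ORDER BY", "HAVING"]
print(order_violations(good))
print(order_violations(bad))
```

For the second sequence the only violation reported is ORDER BY followed by HAVING, which is the example error mentioned in the text.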

In this embodiment, for the target question "query all students of the History department (History) and the Physics department (Physics) in the student table (student)", step 2) corrects the keyword and database schema information in the SQL answers of student A and student B based on the query-related data dictionary Dict. Specifically, in the SQL answer of student A the erroneous token "dept_names" is corrected to "dept_name", the element of Dict with the smallest string edit distance to it; in the SQL answer of student B the erroneous token "SELET" is corrected to "SELECT", the element of Dict with the smallest string edit distance to it. Since neither student's SQL answer contains a clause order error, no clause order adjustment is needed.

Referring to FIG. 1, as an optional implementation, steps 3) and 4) of this embodiment are carried out by an SQL query refinement scoring module. The module takes as input the set of SQL answers equivalent to the teacher's SQL answer, the corrected student SQL answer and the correction cost, and normalizes each equivalent SQL answer and the corrected student SQL answer to obtain canonical SQL statements. Each canonical SQL statement is then converted into a relational algebra expression, from which its query tree structure is obtained. Finally, the refinement score of the student SQL answer is determined from the correction cost of the student SQL answer and the edit distance between the student SQL answer query tree and the query tree of each equivalent SQL answer.

In order to reduce the syntax difference between SQL answers which is irrelevant to SQL equivalent transformation, thereby reducing the workload of subsequent analysis of SQL answers, the embodiment performs normalized preprocessing on the SQL answers provided by teachers and the SQL answers submitted by students. The normalization processing in step 3) of the present embodiment includes:

Normalizing relational predicates: equivalent relational predicates are rewritten into a single canonical form; for example, the predicate NOT (A < B) is replaced with the predicate A >= B, so that the NOT operator does not appear in the SQL answer.

Normalizing join queries: equivalent join queries are rewritten into a single specified form; for example, join queries introduced by NATURAL INNER JOIN are uniformly rewritten as join queries introduced by INNER JOIN.

Normalizing nested queries: equivalent nested queries are rewritten into a single specified form; for example, nested subqueries connected by the IN/ANY operators are uniformly rewritten as nested subqueries connected by EXISTS.

Removing BETWEEN predicates: equivalent range queries (BETWEEN) are rewritten using comparison operators; for example, predicates expressed with BETWEEN in an SQL query are uniformly rewritten as the equivalent predicates expressed with comparison operators (e.g., >= and <=).
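Two of the normalization rules above, removing NOT from comparison predicates and removing BETWEEN, can be sketched in Python on a small predicate tree. The tuple encoding of predicates and the names `NEGATED` and `normalize` are assumptions made for this illustration; the embodiment's actual normalizer is not described at this level of detail.

```python
# Comparison operator that results from negating each comparison.
NEGATED = {"<": ">=", "<=": ">", ">": "<=", ">=": "<", "=": "<>", "<>": "="}


def normalize(pred):
    """Rewrite NOT(comparison) and BETWEEN into plain comparisons."""
    op = pred[0]
    if op == "not" and pred[1][0] in NEGATED:
        inner_op, left, right = pred[1]
        return (NEGATED[inner_op], left, right)          # NOT (A < B) -> A >= B
    if op == "between":
        _, x, low, high = pred
        return ("and", (">=", x, low), ("<=", x, high))  # x BETWEEN low AND high
    if op in ("and", "or"):
        return (op,) + tuple(normalize(p) for p in pred[1:])
    return pred                                          # already canonical


print(normalize(("not", ("<", "A", "B"))))
print(normalize(("and", ("between", "salary", 3000, 5000), ("=", "dept", "'History'"))))
```

Working on a tree rather than on the SQL text keeps each rewrite local to one predicate node, which matches the later conversion of the canonical statement into a query tree.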

In order to quantify the difference in syntactic structure between the SQL answer provided by the teacher and the corrected student SQL answer, the embodiment converts each of them into a query tree with a flattened tree structure. Because mature heuristic optimization rules allow the query trees used in the query optimization process of a relational database system to be brought into a consistent organizational form, such trees effectively support comparing the syntactic structure of the SQL answer provided by the teacher with that of the corrected student SQL answer. FIG. 2 illustrates the query tree structure of the following SQL query:

SELECT student.name

FROM student,takes

WHERE student.id = takes.id

AND takes.course_id = '2';

It should be noted that query trees and their generation are prior art, so the specific implementation is not elaborated here. Once the SQL answer provided by the teacher and the corrected student SQL answer have been converted into query trees, the conversion cost between their syntactic structures can be quantified by the tree edit distance. The refinement score of the SQL answer submitted by the student is then obtained from the correction cost of that answer and the conversion cost between the query tree of the teacher's SQL answer and the query tree of the corrected student SQL answer. The refinement score therefore relies on the calculation methods defined for these two costs. Both calculations depend on the weights the teacher assigns to the different clause types, where a weight expresses how important a clause type is to the SQL query. Nine clause types are involved: the SELECT, FROM, WHERE, GROUP BY, JOIN, AGGREGATES, DISTINCT, HAVING and SUBQUERY clauses. The two cost calculation methods are introduced below on the basis of these clause weights.

The conversion cost between the canonical SQL statement Q'_s and the canonical SQL statement Q'_i in step 4) of this embodiment is calculated by the following function:

In the above formula, Penalty_t(Q'_s, Q'_i) is the conversion cost between the canonical SQL statement Q'_s and the canonical SQL statement Q'_i; Components(Q'_s) denotes the set of component elements of the query tree corresponding to Q'_s; O_i denotes the i-th component element of that query tree; ED(O_i, Q'_i) is the edit distance incurred by modifying the component element O_i in the query tree of Q'_s on the basis of the query tree of Q'_i; the weight of O_i is the weight value of the clause type to which O_i belongs; W is the sum of the weight values of all clause types; and S is the score of the target question.

Calculating the correction cost of the student SQL answer in step 2) of the present embodiment includes calculating a first correction cost, for correcting the SQL keyword and database schema information errors in the SQL answer Q_s, and a second correction cost, for correcting the clause order errors in the SQL answer Q_s; the first correction cost and the second correction cost are calculated by the following functions:

In step 4) of this embodiment, the refinement score of the student SQL answer is calculated by the following function:

In the above formula, G(Q_s) is the refined score of the student SQL answer; S is the score of the target question for which the student submitted the original SQL answer Q_s; Penalty_c(Q_s) is the first correction cost; Penalty_a(Q_s) is the second correction cost; Penalty_t(Q'_s, Q'_i) is the conversion cost between the canonical SQL statement Q'_s and the canonical SQL statement Q'_i; and the last term of the formula is the minimum of these conversion costs over all canonical SQL statements Q'_i, that is, the cost of converting the query tree of the corrected student SQL answer Q'_s into the query tree of the equivalent answer with the smallest conversion cost. The refined score of the student SQL answer is therefore obtained by deducting from the score of the SQL programming question both the correction cost of the SQL answer submitted by the student and the cost of converting the corrected student SQL answer into the equivalent SQL answer whose syntactic structure is closest to it.
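Read literally, the description above deducts three quantities from the question score. The following Python sketch follows that prose reading; it is an assumption about the shape of the scoring function, since the formula itself is not reproduced in this text, and the function name `refined_score` and the absence of any lower bound on the result are choices made here for illustration.

```python
def refined_score(question_score, penalty_c, penalty_a, conversion_costs):
    """Question score minus both correction costs and the smallest conversion cost.

    conversion_costs holds one Penalty_t value per equivalent SQL answer.
    """
    return question_score - penalty_c - penalty_a - min(conversion_costs)


# Hypothetical numbers: a 10-point question, small correction costs, and
# conversion costs against three equivalent answers.
print(refined_score(10, 1, 0.5, [4, 2, 3]))
```

Taking the minimum over the equivalent answers means a student is compared against whichever equivalent form of the teacher's answer is structurally closest to their own.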

In this embodiment, for the target question "query all students of the History department (History) and the Physics department (Physics) in the student table (student)", step 4) calculates, based on the two equivalent SQL answers of the programming question, the edit conversion cost between the SQL answer query tree of each student and the query tree of each equivalent SQL answer, and then obtains the refinement scores of the SQL answers of student A and student B from the SQL query correction cost and the edit conversion cost between query trees.

The complexity of the method of this embodiment is analyzed as follows. The method combines SQL query equivalent transformation (SQLTransform) and SQL query correction (SQLCorrection) to realize the refinement scoring of SQL queries. The embodiment first calls SQLTransform and SQLCorrection, and then calculates the correction cost Penalty_c of the SQL keywords and database schema information and the correction cost Penalty_a of the SQL clause order. Next, the corrected student SQL answer Q'_s and every equivalent SQL answer in the equivalent SQL answer set Q⁺ are normalized, and the conversion cost between the query tree of Q'_s and the query tree of each equivalent SQL answer in Q⁺ is calculated. Finally, the refinement score of the student SQL answer is calculated and returned.

The time and space complexity of the refinement scoring method of the present embodiment is analyzed as follows. The main computational cost comes from three parts: (1) the equivalent transformation of the SQL query (the SQLTransform method); (2) the correction of the SQL query (the SQLCorrection method); (3) the calculation of the conversion cost between the query tree of the corrected student SQL answer and the query tree of each equivalent SQL answer. The time complexity of the first part is O(|R|), where |R| is the cardinality of the equivalent transformation rule set R of relational algebra expressions. The time complexity of the second part is O(T_E · log L), where T_E is the total number of erroneous tokens contained in the clauses of the student SQL answer Q_s, L is the maximum length of an element name string in the data dictionary Dict, and O(log L) is the time complexity of looking up a string in Dict using a trie index. The time complexity of the third part is O(|Q⁺| · N log N), where |Q⁺| is the cardinality of the equivalent SQL answer set Q⁺, N is the number of component elements contained in a query tree, and O(N log N) is the time complexity of computing the query tree edit distance. In summary, the time complexity of the method of this embodiment is O(|R| + T_E · log L + |Q⁺| · N log N). Since the storage consumption of the method comes mainly from storing the data dictionary Dict, its space complexity is O(|Dict|), where |Dict| is the number of elements contained in the data dictionary Dict.

In addition, the embodiment also provides a refinement scoring system for the SQL query, which comprises a computer device, wherein the computer device is programmed or configured to execute the steps of the refinement scoring method for the SQL query, or a computer program programmed or configured to execute the refinement scoring method for the SQL query is stored in a memory of the computer device.

In addition, the embodiment also provides a computer readable storage medium, and the computer readable storage medium stores a computer program programmed or configured to execute the refinement scoring method facing the SQL query.

To test the effectiveness of the method of this embodiment (SQL-GRADER for short), it is compared with a variant of SQL-GRADER that omits the SQL query correction function (SQL-GRADER w/o correction for short) and with XData, the most representative related work. To collect student SQL answers, homework consisting of 12 SQL programming questions (covering typical SQL query types such as set queries, join queries and conditional queries) was assigned at the same time, on an online SQL learning platform, to five teaching classes (284 students in total) taking the database technology course. The three SQL scoring techniques are compared on the SQL answer data submitted by the students for the 12 SQL programming questions. The number of students in each of the five teaching classes is shown in Table 4.

Table 4: Information on the teaching classes involved in the experiment.

Course	Number of students	Number of SQL questions
DB-1	58	12
DB-2	52	12
DB-3	55	12
DB-4	57	12
DB-5	62	12

In the experiment of this embodiment, the teacher provides only one SQL answer for each SQL programming question. The weight values of the different query clause types, on which all three techniques rely for SQL refinement scoring, are all set to 1, as detailed in Table 5.

Table 5: Weight settings of the query clause types.

Clause type	Weight	Clause type	Weight
SELECT	1	AGGREGATES	1
FROM	1	DISTINCT	1
WHERE	1	HAVING	1
GROUP BY	1	SUBQUERY	1
JOINS	1

Because graders are often somewhat subjective when scoring student SQL answers, it is not appropriate to judge whether the scores given by an SQL refinement scoring technique are fair by directly comparing them numerically with the scores given by the graders. In view of this, following the experimental strategy described in Chandra B et al., Automated Grading of SQL Queries (ICDE, 2019: 1630-1633), several query pairs were randomly created for each question j. Each query pair contains two partially incorrect student SQL answers, namely the 1st SQL query and the 2nd SQL query of the n-th query pair of SQL programming question j. Two teachers with more than 8 years of experience teaching database technology courses and three teaching assistants were then invited as graders and asked to label each SQL query pair with one of the following three classes; the final label of each SQL query pair was determined by majority vote:

(1) the 1st SQL query should obtain a higher score;

(2) the 2nd SQL query should obtain a higher score;

(3) the two queries contain different errors and should obtain nearly the same score.

Next, each of the SQL query pairs is refined scored using three SQL refinement scoring techniques, respectively, and each query pair is classified into one of the three classes based on the results of the refinement scoring. It should be noted that if the difference between the refined scores of two SQL queries in an SQL query pair is less than 10% of the score of their corresponding SQL programming questions, the query pair is labeled as a third class.

For an SQL query pair SQ_j,n, if the graders and an SQL automatic scoring technique assign it to the same class, the score given by that technique to the SQL queries in SQ_j,n is regarded as a fair score. The scoring fairness of an SQL refinement scoring technique with respect to SQL programming question j can therefore be quantified by the following scoring accuracy formula.

In the above formula, Accuracy_j is the scoring fairness of an SQL refinement scoring technique with respect to SQL programming question j, m is the number of query pairs of SQL programming question j, and the remaining term is the label-match result of the n-th SQL query pair of SQL programming question j, which records whether the graders and the SQL refinement scoring technique assigned the query pair to the same class.
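The pair-labelling and accuracy computation can be sketched in Python as follows. The 10% threshold for the third class comes from the text; computing accuracy as the fraction of query pairs on which the two labels agree is an assumption consistent with the definitions above, and the function names `classify_pair` and `scoring_accuracy` are hypothetical.

```python
def classify_pair(score_1, score_2, question_score):
    """Label a query pair as class 1, 2 or 3 from its two refined scores."""
    if abs(score_1 - score_2) < 0.10 * question_score:
        return 3                      # nearly the same score
    return 1 if score_1 > score_2 else 2


def scoring_accuracy(grader_labels, technique_labels):
    """Fraction of query pairs on which graders and technique agree."""
    matches = sum(g == t for g, t in zip(grader_labels, technique_labels))
    return matches / len(grader_labels)


# Hypothetical labels for four query pairs of one question.
print(classify_pair(8, 5, 10))
print(scoring_accuracy([1, 2, 3, 3], [1, 2, 3, 1]))
```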

For the same SQL programming question, the SQL answers submitted by students take diverse forms, reflecting the students' different approaches to solving it. To analyze the diversity of student answers in this SQL programming practice, the 3167 student SQL answers collected in the experiment were counted. Assume that the k-th SQL answer type of SQL programming question E accounts for a proportion p_k, where k = 1, …, n; the student answer diversity index Diversity(E) of SQL programming question E is then defined, following the idea of information entropy, by the formula:

In the above formula, p_k is the proportion of the k-th SQL answer type of SQL programming question E, and n is the number of answer types of SQL programming question E.
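An entropy-based diversity index of this kind can be sketched in Python as below. The use of Shannon entropy with a base-2 logarithm is an assumption, since the formula is not reproduced in this text; it is chosen because, for the question 9 proportions reported in the experiment (20%, 57% and 23%), it gives about 1.414, close to the reported index of 1.415.

```python
import math


def diversity(proportions):
    """Shannon entropy (base 2) of the answer-type proportions p_1..p_n."""
    return -sum(p * math.log2(p) for p in proportions if p > 0)


# Proportions of set, conditional and nested query answers for question 9.
print(round(diversity([0.20, 0.57, 0.23]), 3))
```

The index is 0 when every student uses the same answer type and grows as the answers spread more evenly over more types.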

Table 6 shows, for the 12 SQL programming questions, the proportions of the student SQL answer types and the diversity index of the student SQL answers. As Table 6 shows, the student SQL answers to every SQL programming question exhibit significant diversity. Question 9 shows the greatest diversity, with a student answer diversity index of 1.415: 20% of students wrote their answers as set queries, 57% as conditional queries, and 23% as nested queries. This diversity undermines the scoring fairness of the XData technique, which relies on the single SQL answer provided by the teacher to produce a refined score. It also demonstrates the need for the method of this embodiment to automatically generate multiple equivalent SQL answers based on the equivalent transformation rules of relational algebra expressions.

Table 6: student answer diversity analysis table.

In this embodiment, the three SQL refinement scoring techniques are compared on the fairness of the scores they give to SQL answers submitted by students. 202 SQL query pairs are randomly drawn, without repetition, from the set of incorrect student SQL answers of each SQL programming question as evaluation objects. Table 7 shows, for each SQL programming question, the number of equivalent SQL answers obtained and the scoring accuracy of the three SQL refinement scoring techniques.

Table 7: score fairness analysis.

As Table 7 shows, the XData technique has the lowest scoring accuracy (which reflects scoring fairness), averaging only 61.55%, because it neither copes well with the diversity of student SQL answers nor corrects partially incorrect SQL answers submitted by students. In contrast, the method of this embodiment generates multiple semantically equivalent SQL answers in different forms as the basis for grading and corrects student SQL answers that contain partial syntax errors. It therefore achieves the highest scoring fairness on all 12 SQL programming questions, with an average scoring accuracy of 83.15%, an improvement of 33% over the XData technique on the 12 questions. The scoring accuracy of the SQL-GRADER w/o correction method is 9 percentage points lower than that of the method of this embodiment, because it does not correct the SQL answers submitted by students. This shows that correcting the SQL answers submitted by students is very effective in improving the fairness of SQL refinement scoring. Table 7 also shows that the scoring accuracy of the method of this embodiment on the 9th SQL programming question exceeds 90%; because the diversity of the SQL answers submitted by students is highest for this question (see Table 6), the advantage of the method of this embodiment over the XData technique is most pronounced there.

FIG. 7 shows the SQL refinement scoring running time of the different SQL refinement scoring techniques on sets of questions that have the same number of equivalent SQL answers. As shown in FIG. 7, the XData technique has the shortest running time, and its running time does not fluctuate significantly as the number of equivalent SQL answers changes. This is because XData scores SQL queries based on only the single SQL answer provided by the teacher. The other two scoring techniques, namely the method of this embodiment and the SQL-GRADER w/o correction method, generate multiple equivalent SQL answers from the SQL answer provided by the teacher and judge SQL queries accordingly, so their running time varies with the number of equivalent SQL answers a question has. The running time of both the method of this embodiment and the SQL-GRADER w/o correction method rises as the number of equivalent SQL answers increases, and this upward trend is broken only when the number of equivalent SQL answers is 4 or 6. The reason is that the running time is also positively correlated with the number of incorrect answers to a question: the more incorrect answers there are, the more time the two methods spend correcting the SQL answers submitted by students and calculating the edit distance between the query tree of each submitted SQL answer and the query tree of each equivalent SQL answer. FIG. 8 shows, for the SQL-GRADER w/o correction method, the number of students who answered incorrectly for each set of questions with the same number of equivalent SQL answers. As can be seen from FIGS. 7 and 8, the number of incorrect answers for the question set with 4 equivalent SQL answers is significantly smaller than that for the question set with 3 equivalent SQL answers, so the running time of the method of this embodiment and of the SQL-GRADER w/o correction method on the question set with 4 equivalent SQL answers is slightly shorter than on the question set with 3. Similarly, because of the difference in the number of students who answered incorrectly, the running time of the two methods on the question set with 6 equivalent SQL answers is slightly shorter than on the question set with 5. In addition, FIG. 7 shows that the SQL-GRADER w/o correction method has a slightly shorter running time than the method of this embodiment, because it omits the SQL query correction module that the method of this embodiment includes.
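The dependence of running time on both the number of equivalent SQL answers and the number of incorrect answers can be illustrated with a minimal sketch. It is not the implementation of this embodiment: it assumes that each incorrect answer is compared against every equivalent answer, and it substitutes a Levenshtein distance over token lists for the query-tree edit distance. The function names and the sample counts are hypothetical.

```python
from typing import List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two sequences.

    Used here only as a simplified stand-in for the edit distance
    between two query trees.
    """
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def min_distance_to_equivalents(student: Sequence[str],
                                equivalents: List[Sequence[str]]) -> int:
    """Smallest distance from one student answer to any equivalent answer."""
    return min(edit_distance(student, eq) for eq in equivalents)


def comparison_count(num_wrong_answers: int, num_equivalents: int) -> int:
    """Number of pairwise comparisons, assuming every incorrect answer
    is compared against every equivalent answer."""
    return num_wrong_answers * num_equivalents


# Hypothetical counts: a question set with more equivalent answers can
# still need fewer comparisons if it has far fewer incorrect answers.
few_wrong = comparison_count(num_wrong_answers=10, num_equivalents=4)
many_wrong = comparison_count(num_wrong_answers=30, num_equivalents=3)
```

Under these assumed counts, the set with 4 equivalent answers needs 40 comparisons and the set with 3 needs 90, which mirrors the kind of dip in running time described for FIG. 7.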

In summary, automatic scoring of SQL queries is of significant research value because it can greatly reduce the burden on teachers of grading students' SQL homework. However, existing automatic scoring techniques for SQL queries do not fully consider scoring fairness for answers submitted in different forms, or scoring fairness for submitted answers that contain partial errors, which affects students' interest and confidence in learning SQL queries. To address these shortcomings of existing work, this embodiment provides a refinement scoring technique for SQL queries, named SQL-GRADER. SQL-GRADER improves the fairness of SQL refinement scoring by using equivalent transformation of the SQL statement of the correct answer, together with an SQL correction strategy based on SQL syntax rules and a query-related data dictionary. Student SQL answer data were collected through SQL programming teaching activities in which students from several teaching classes participated, and experimental analysis of SQL query refinement scoring based on the collected data shows that the SQL-GRADER refinement scoring technique effectively improves the accuracy and fairness of scoring.

The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above examples, and all technical solutions belonging to the concept of the present invention belong to the protection scope of the present invention. It should be noted that modifications and adaptations to the present invention may occur to one skilled in the art without departing from the principles of the present invention and are intended to be within the scope of the present invention.

Claims

1. A refinement scoring method for SQL queries, characterized by comprising the following steps:

3) Normalize the corrected SQL answer Qs to obtain a canonical SQL statement Q'_s; convert the obtained canonical SQL statement Q'_s into a relational algebra expression, and further convert the relational algebra expression into a student SQL answer query tree;

4) For each equivalent SQL query answer Q_i in the equivalent SQL query answer set Q⁺: first, normalize the equivalent SQL query answer Q_i to obtain a canonical SQL statement Q'_i, convert the obtained canonical SQL statement Q'_i into a relational algebra expression, and further convert the relational algebra expression into an equivalent SQL answer query tree; then, calculate the conversion cost between the canonical SQL statement Q'_s and the canonical SQL statement Q'_i according to the edit distance between the student SQL answer query tree and the equivalent SQL answer query tree; and calculate the refinement score of the student SQL answer according to the correction cost of the student SQL answer and the conversion cost between the canonical SQL statement Q'_s and the canonical SQL statement Q'_i;

the step in step 1) of performing SQL query equivalent transformation to obtain the equivalent SQL query answer set Q⁺ comprises:

1.5) Output the equivalent SQL query answer set Q⁺;

the step in step 2) of performing SQL query correction to obtain the corrected SQL answer Qs comprises:

2.2) Based on the clause set C, correct the SQL keyword and database schema information errors in the original SQL answer Q_s, and, based on the clause set C, correct the clause order errors in the original SQL answer Q_s;

the step in step 2.2) of correcting, based on the clause set C, the SQL keyword and database schema information errors in the original SQL answer Q_s comprises:

2.2.6A) From the SQL query keyword set Dict_key in the data dictionary Dict, obtain, based on string edit distance, the keyword ED_key having the minimum edit distance to the current word T_j; from the database schema information set Dict_schema in the data dictionary Dict, obtain, based on string edit distance, the schema element ED_schema having the minimum edit distance to the current word T_j; if the absolute value of the difference between the edit distances of ED_key and ED_schema is smaller than a preset threshold ε, determine the corrected result T'_j of the current word T_j according to the context of T_j in clause C_i; otherwise, correct the current word T_j to whichever of ED_key and ED_schema has the smaller edit distance to T_j, that is, to the element of the data dictionary Dict closest to T_j; jump to step 2.2.4A);

the step in step 2.2) of correcting, based on the clause set C, the clause order errors in the original SQL answer Q_s comprises:

2.2.3B) Determine whether the next clause C_{i+1} of the current clause is a valid next clause of the current clause C_i; if not, add the next clause C_{i+1} to the erroneous clause set C_error and add the current clause C_i to the unmatched clause set C_unmatch; if so, take the next clause C_{i+1} as the next clause of the current clause C_i; jump to step 2.2.2B);

2.2.6B) If clause C_match is empty, return the original SQL answer Q_s submitted by the student as the corrected SQL answer Qs; otherwise, take clause C_match as the next clause of the current clause C_l; jump to step 2.2.4B);

2.2.7B) If the erroneous clause set C_error is non-empty, return the original SQL answer Q_s submitted by the student as the corrected SQL answer Qs; otherwise, adjust the clause order in the clause set C according to the determined order, convert the reordered clause set C into an SQL statement, and return it as the corrected SQL answer Q_s;

the calculation function for the conversion cost between the canonical SQL statement Q'_s and the canonical SQL statement Q'_i in step 4) is expressed as follows:

in the above formula, the symbols denote, in order: the conversion cost between the canonical SQL statement Q'_s and the canonical SQL statement Q'_i; the set of combined elements of the query tree corresponding to the canonical SQL statement Q'_s; O_i, the i-th combined element of the query tree corresponding to the canonical SQL statement Q'_s; the edit distance value introduced by modifying the combined element O_i in the query tree corresponding to Q'_s on the basis of the query tree corresponding to the canonical SQL statement Q'_i; the weight of the clause type to which the combined element O_i belongs; W, the sum of the weights of all clause types; and S, the score of the target question;

in the above formula, the symbols denote, in order: the first correction cost; Q_s, the corrected SQL answer; C(Q_s), the clause set of the SQL answer Q_s; the number of edit modifications made to clause C_i; and the weight of the clause type to which clause C_i belongs;

in the above formula, the symbols denote, in order: the second correction cost; Q_s, the corrected SQL answer; and I(C_i), the indicator variable of clause C_i, where I(C_i) = 1 if the order of clause C_i has been adjusted and I(C_i) = 0 otherwise.

2. The refinement scoring method for SQL queries according to claim 1, wherein the normalization in step 3) comprises: normalizing equivalent forms of a relational predicate into a relational predicate in a unified specified form; normalizing equivalent forms of a join query into a join query in a unified specified form; normalizing equivalent forms of a nested query into a nested query in a unified specified form; and normalizing equivalent forms of an interval query into a representation based on comparison operators.

3. The refinement scoring method for SQL queries according to claim 1, wherein the refinement score of the student SQL answer calculated in step 4) is expressed by the following formula:

in the above formula, G(Q_s) is the resulting refinement score of the student SQL answer and S is the score of the target question; the remaining symbols denote, in order, the first correction cost, the second correction cost, the conversion cost between the canonical SQL statement Q'_s and the canonical SQL statement Q'_i, and the minimum of the conversion costs between the canonical SQL statement Q'_s and all canonical SQL statements Q'_i.

4. A refinement scoring system for SQL queries, comprising a computer device, characterized in that the computer device is programmed or configured to perform the steps of the refinement scoring method for SQL queries of any one of claims 1 to 3, or a computer program programmed or configured to perform the refinement scoring method for SQL queries of any one of claims 1 to 3 is stored in a memory of the computer device.

5. A computer readable storage medium having stored therein a computer program programmed or configured to perform the SQL query-oriented refinement scoring method of any one of claims 1-3.
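By way of illustration only, the word-correction rule recited in step 2.2.6A) of claim 1 can be sketched as follows. This is a simplified sketch under stated assumptions, not the claimed implementation: the function names, the case-insensitive comparison, the sample dictionary contents, and the `resolve_by_context` callback (standing in for the context-based decision within clause C_i) are all hypothetical.

```python
from typing import Callable, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """String edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def nearest(word: str, candidates: List[str]) -> Tuple[str, int]:
    """Return the candidate closest to `word` and its edit distance.

    Comparison is case-insensitive (an assumption of this sketch).
    """
    best = min(candidates, key=lambda c: levenshtein(word.lower(), c.lower()))
    return best, levenshtein(word.lower(), best.lower())


def correct_word(word: str,
                 dict_key: List[str],
                 dict_schema: List[str],
                 epsilon: int,
                 resolve_by_context: Callable[[str, str, str], str]) -> str:
    """Correct one word against the keyword set and the schema set."""
    ed_key, d_key = nearest(word, dict_key)            # closest SQL keyword
    ed_schema, d_schema = nearest(word, dict_schema)   # closest schema name
    if abs(d_key - d_schema) < epsilon:
        # Too close to call: defer to the (hypothetical) context rule.
        return resolve_by_context(word, ed_key, ed_schema)
    # Otherwise take whichever candidate is nearer to the word.
    return ed_key if d_key <= d_schema else ed_schema


# Hypothetical dictionary contents for illustration.
keywords = ["SELECT", "FROM", "WHERE"]
schema_names = ["student", "sname"]
fixed = correct_word("SELCT", keywords, schema_names, epsilon=1,
                     resolve_by_context=lambda w, k, s: s)
```

With these sample inputs, "SELCT" is one edit away from the keyword "SELECT" and much further from any schema name, so the context callback is not consulted and the keyword is chosen.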