CN112132420A - SQL query-oriented refinement scoring method - Google Patents

SQL query-oriented refinement scoring method Download PDF

Info

Publication number
CN112132420A
CN112132420A CN202010922595.1A CN202010922595A CN112132420A CN 112132420 A CN112132420 A CN 112132420A CN 202010922595 A CN202010922595 A CN 202010922595A CN 112132420 A CN112132420 A CN 112132420A
Authority
CN
China
Prior art keywords
sql
answer
clause
query
student
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010922595.1A
Other languages
Chinese (zh)
Other versions
CN112132420B (en
Inventor
许嘉
莫晓琨
吕品
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangxi University
Original Assignee
Guangxi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi University filed Critical Guangxi University
Priority to CN202010922595.1A priority Critical patent/CN112132420B/en
Publication of CN112132420A publication Critical patent/CN112132420A/en
Application granted granted Critical
Publication of CN112132420B publication Critical patent/CN112132420B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393Score-carding, benchmarking or key performance indicator [KPI] analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2433Query languages
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24534Query rewriting; Transformation
    • G06F16/24542Plan optimisation
    • G06F16/24545Selectivity estimation or determination
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/20Education
    • G06Q50/205Education administration or guidance
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Educational Administration (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Databases & Information Systems (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Educational Technology (AREA)
  • Development Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Game Theory and Decision Science (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a SQL query-oriented refinement scoring method, which utilizes an equivalent SQL statement set obtained based on the equivalent transformation of a correct answer SQL statement to realize reasonable score judgment of different forms of answers submitted by students on one hand, and provides correction of partial wrong student answers on the other hand, and quantifies the score of the student answers on the basis of correction cost and conversion cost between the corrected answers and the correct answers. Answer data of students is collected through online SQL programming practice activities participated by students of a plurality of teaching classes, and relevant SQL query refinement scoring is carried out for experimental analysis based on the collected answer data of the students. According to the method, equivalent change of the teacher answer, correction of the student answers and subsequent detailed grading processing are combined, so that the accuracy of SQL query judgment can be improved on the basis that only simple judgment of wrong answers is available, and the SQL query judgment is fair and reasonable.

Description

SQL query-oriented refinement scoring method
Technical Field
The invention relates to a database technology, in particular to a refinement scoring method for SQL query.
Background
Database technology has been the core backbone of computer science, and whether to be proficient is also an important standard for measuring the skills of IT practitioners. Learning the structured query language (i.e., SQL) plays a crucial role in the learning process of database technology. Like other programming languages, SQL is highly demanding on programming practice, i.e., it requires a great deal of programming practice to really understand and master SQL. In the actual teaching process, the SQL programming questions are generally corrected by professional teachers. However, the modification of the SQL programming questions by teachers presents the following two important problems. First, the answers to the SQL programming questions are diverse, which undoubtedly increases the judgment burden of the teacher, and may cause the teacher to give different scores to the equivalent answers with different forms, affecting the fairness of scoring. Secondly, the judgment standard of the SQL programming questions has ambiguity, and teachers may give different scores to wrong answer judgments with consistent forms, so that the grading fairness is also influenced. Therefore, how to realize effective automatic scoring of SQL is an important problem that needs to be researched and solved when the database technology course is constructed.
In order to reduce the teacher correction burden of the SQL programming questions and improve the fairness of scoring, in recent years, many excellent research works emerge at the top level or important academic conferences in the database research fields such as VLDB, ICDE and the like, so that the automatic scoring of SQL queries is realized, and the methods can be specifically divided into three types: (1) SQL scoring techniques based on result set comparisons; (2) crowd-sourced based SQL scoring techniques; (3) SQL scoring techniques based on syntactic structural analysis. The SQL scoring technology based on result set comparison realizes the automatic scoring of SQL by comparing the query result set of correct answers with the query result set of student answers. SQL scoring techniques based on result set comparisons can give fair scores (i.e., full scores) for equivalent correct answers, but have the limitation of "not correct or wrong" correction. Specifically, for a part of wrong SQL statements given by a student (for example, the student simply spells a wrong field name), the SQL scoring technology based on result set comparison cannot analyze the SQL contents, and therefore cannot judge a refined score smaller than a full score, but can judge a refined score of 0 score, as a teacher does, and thus may discourage the student from being interested in learning SQL and from being confident. The SQL scoring technology based on crowdsourcing assigns the task of correcting the SQL programming questions to students and estimates the real scores of the SQL programming questions based on the scores of a plurality of students on the questions. Although the crowd-sourced SQL scoring technology not only reduces the correcting burden of teachers, but also can give detailed scoring of SQL, the crowd-sourced SQL scoring technology has the limitations that the correcting period is long, and scoring feedback cannot be given in time. In recent years, some researchers have worked to develop SQL scoring techniques based on syntactic structural analysis. The technology realizes the automatic grading of SQL by comparing the difference between the syntactic structure and the clause content of the correct answer and the syntactic structure and the clause content of the student answer. Although the SQL scoring technology based on syntactic structure analysis can provide a detailed scoring result of SQL, it is still impossible for teachers to provide all correct answers for each question due to the diversity of correct answers for SQL programming questions.
In summary, the focus and difficulty of SQL learning lies in the data query function, and the existing SQL automatic scoring technology intensively solves the problem of automatic scoring of SQL query. Although the prior art has made many research advances in the automatic scoring of SQL queries, in order to further improve the fairness of the automatic scoring of SQL queries, the following two challenge problems still need to be solved:
challenge one: automatic scoring techniques also need to be able to deduce other equivalent correct answers based on the correct answers given by the teacher, thereby ensuring fairness in scoring different forms of answers submitted by the students. The existing SQL automatic scoring technology cannot provide the guarantee. As shown in table 1, for the query semantic of "querying student information of a history system and a physical system", the teacher gives an SQL answer based on the merge operation, and the student a gives an SQL answer based on the WHERE clause condition judgment. Since the problem that the SQL answer of the student A has wrong spelling of the field name of the department can not be executed, the existing SQL scoring technology based on result set comparison can only judge 0 score. Moreover, the SQL answer structure of the student A is obviously different from the SQL answer structure given by the teacher, so that the existing SQL scoring technology based on the syntactic structure analysis cannot give fair refined scores.
Challenge two: for SQL statements that are partially wrong with respect to the answer, the automatic scoring technique needs to be able to give a refined scoring result, thereby guaranteeing the fairness in scoring the partially wrong answer. The existing SQL automatic scoring technology cannot provide the guarantee. As shown in table 1, the SQL answer submitted by student B is consistent with the answer given by the teacher, except that the keyword SELECT spelling is incorrect. However, the existing SQL scoring technology based on result set comparison can only judge 0 score because the statement cannot be executed by mistake. The existing SQL scoring technology based on syntactic structure analysis can only judge 0 point because syntactic analysis can not be implemented due to syntactic errors.
Table 1: unfair SQL scoring example.
Figure BDA0002667239540000021
Disclosure of Invention
The technical problems to be solved by the invention are as follows: aiming at the problems in the prior art, the invention provides a SQL query-oriented refinement scoring method, which can improve the SQL query judgment accuracy on the basis that only simple answer judgment is wrong at present by modifying the equivalent change of teacher answers and the student answers and combining with subsequent refinement scoring treatment, so that the SQL query judgment is fairer and more reasonable.
In order to solve the technical problems, the invention adopts the technical scheme that:
a refinement scoring method for SQL query comprises the following steps:
1) according to SQL query answer Q provided by the teacher for the target question and the input conversion rule set R, SQL query equivalent transformation is carried out to obtain an equivalent SQL query answer set Q+
2) According to the original SQL answer Q submitted by students for the target subjectsCarrying out SQL query correction on the data dictionary Dict related to the query to obtain a corrected SQL answer QsCalculating the correction cost of the student SQL answer;
3) carrying out normalization processing on the corrected SQL answer Qs to obtain a normalized SQL statement Q'sWill getConverting the obtained standard SQL statement into a relational algebra expression and further converting to obtain a student SQL answer query tree;
4) answer set Q for equivalent SQL queries+Each equivalent SQL query answer Q in (a)i: first, the answer Q to the equivalent SQL queryiCarrying out normalization processing to obtain a normalized SQL statement Q'iConverting the obtained standard SQL statement into a relational algebra expression and further converting the relational algebra expression into an equivalent SQL query answer query tree; then, calculating a standard SQL statement Q 'according to the edit distance between the student SQL answer query tree and the equivalent SQL answer query tree'sAnd a canonical SQL statement Q'iA transition cost between; according to the correction cost of student SQL answer and the normalized SQL statement Q'sAnd canonical SQL statement Q'iAnd calculating the conversion cost to obtain a refined score of the student SQL answer.
Optionally, in step 1), performing SQL query equivalent transformation to obtain an equivalent SQL query answer set Q+Comprises the following steps:
1.1) acquiring an SQL query answer Q provided by a teacher aiming at a target question and an input conversion rule set R;
1.2) converting the SQL query answer Q into a relational algebra RA (Q), and adding the relational algebra RA (Q) into a relational algebra set RA (Q)T
1.3) if the conversion rule set R is empty, skipping to execute the step 1.5); otherwise, the current conversion rule R is traversed and taken out from the conversion rule set RiSkipping to execute the next step;
1.4) algebraic relationship RA (Q) according to current conversion rule RiPerforming equivalence transformation to obtain equivalence relation algebra
Figure BDA0002667239540000031
If equivalence relation algebra
Figure BDA0002667239540000032
Different from the relationship algebra RA (Q), the equivalent relationship algebra
Figure BDA0002667239540000033
Adding relational algebra sets RAT(ii) a Collecting relationship algebra RATConverting into SQL sentence to obtain equivalent SQL query answer and adding equivalent SQL query answer set Q+(ii) a Skipping to execute step 1.3);
1.5) outputting equivalent SQL query answer set Q+
Optionally, the step of performing SQL query modification in step 2) to obtain a modified SQL answer Qs includes:
2.1) original SQL answer Q submitted by studentsThe original SQL answer Q is obtained by segmentationsA set of clauses C;
2.2) amending the original SQL answer Q based on the clause set CsThe SQL key words and the database mode information in the SQL database are wrong, and the SQL answer Q is corrected based on the clause set CsThe clauses in (1) are in wrong order.
Optionally, modifying the original SQL answer Q in step 2.2) based on the clause set CsThe step of the SQL key word and the database mode information error in (1) comprises the following steps:
2.2.1A) generating a data dictionary Dict from a set of SQL query keywords DictkeyDatabase schema information set DictschemaComposition, said database schema information set DictschemaThe database mode information comprises a basic table name, a view name, a field name and an index name;
2.2.2A) traversing clauses in the clause set C to obtain the current clause CiIf the clause set C is traversed completely, judging that the SQL key words and the database mode information are corrected completely, and returning; otherwise, skipping to execute the next step;
2.2.3A) for the current clause CiPerforming word segmentation processing to obtain a word and brand sequence T to be corrected;
2.2.4A) obtaining a current word tag T from the word tag sequence T in a traversal wayjIf the traversal is finished, skipping to execute the step 2.2.2A); otherwise, skipping to execute the next step;
2.2.5A) judging the current word brand TjWhether the dictionary Dict is true or not is judged, if true, the next step is executed by skipping; otherwise, skipping the execution step2.2.4A);
2.2.6A) SQL query key set Dict in data dictionary DictkeyThe distance between the current word plate and the word plate T is calculated based on the editing distance of the character stringjThe key word ED having the smallest edit distance therebetweenkeySet of database schema information Dict in data dictionary DictschemaThe distance between the current word plate and the word plate T is calculated based on the editing distance of the character stringjThe key word ED having the smallest edit distance therebetweenschemaIf the key word EDkeyKeyword EDschemaIf the absolute value of the difference is less than the preset threshold, the word tag T is selected according to the current word tagjIn clause CiContext of (1) determining the current word brand TjThe corrected result obtains the current word card T 'after modification'jOtherwise, the current word plate T is usedjCorrection to key words EDkeyKeyword EDschemaThe object with the minimum editing distance in the two elements is in the data dictionary Dict; the jump performs step 2.2.4A).
Optionally, amending the SQL answer Q in step 2.2) based on the clause set CsThe step of clause misordering in (1) comprises:
2.2.1B) initially set the set of error clauses CerrorSet of null, unmatched clauses CunmatchIs empty;
2.2.2B) traversing the clauses in the clause set C to obtain the current clause CiIf the clause set C is traversed, skipping to execute the step 2.2.4B); otherwise, skipping to execute the next step;
2.2.3B) judging the next clause C of the current clausei+1For the current clause CiIf the next word is true, the next clause C of the current clause is determined to be truei+1Adding a set of wrong clauses CerrorThe current clause C is addediAdding a set of unmatched clauses Cunmatch(ii) a If yes, the next clause C of the current clause is addedi+1As the current clause CiThe next word or sentence; jump execution 2.2.2B);
2.2.4B) traverse the set of unmatched clauses CunmatchThe clause in (1) obtains the current clause Cl(ii) a If not matched, clause setCunmatchAnd step 2.2.7B) is executed by jumping after traversing is finished; otherwise, skipping to execute the next step;
2.2.5B) in the wrong clause set CerrorFind current clause C inlMatched clause Cmathch
2.2.6B) if clause CmathchIf the answer is null, the SQL answer Q submitted by the student is transmittedsReturning as a revised SQL answer Qs; otherwise, clause C is setmathchAs the current clause ClThe next clause of (a); skipping to execute step 2.2.4B);
2.2.7B) if wrong clause set CerrorIf not, the SQL answer Q submitted by the studentsAs a revised SQL answer QsReturning; otherwise, adjusting the sequence of the clauses in the clause set C according to the determined sequence, and then converting the clause set C after the sequence is adjusted into an SQL statement to be used as a modified SQL answer QsAnd returning.
Optionally, the normalizing in step 3) includes: standardizing the form of the equivalent relationship assertion into the relationship assertion in a unified specified form; standardizing the equivalent connection query form into a connection query in a unified specified form; standardizing the form of the equivalent nested query into a nested query of a unified specified form; the form of the equivalent interval query is normalized to a representation based on a comparison operator.
Optionally, the SQL statement Q 'specified in step 4)'sAnd canonical SQL statement Q'iThe computational function expression of the conversion cost between is:
Figure BDA0002667239540000051
in the above formula, Penaltyt(Q′s,Q’i) Is a canonical SQL statement Q'sAnd a canonical SQL statement Q'iConversion cost between, Component (Q's) Express canonical SQL statement Q'sSet of composition elements of the corresponding query tree, OiExpress canonical SQL statement Q'sIth group of corresponding query treeElement of general formula ED (O)i,Q’i) Is a specification-based SQL statement Q'iModified canonical SQL statement Q 'of the corresponding query tree'sCorresponding combined element O in query treeiThe value of the edit distance introduced is,
Figure BDA0002667239540000052
is a combination element OiThe weight values of the clause types belong to, W is the sum of the weight values of all the clause types, and S is the score of the target topic;
calculating the correction cost of the student SQL answer in the step 2) comprises calculating a correction SQL answer QsThe first correction cost of SQL keyword and database mode information error in the database, and calculating the corrected SQL answer QsA second correction cost for a wrong order of clauses in (1); and the calculation function expressions of the first correction cost and the second correction cost are shown as the following formulas:
Figure BDA0002667239540000053
in the above formula, Penaltyc(Qs) Represents a first correction cost, QsAs a revised SQL answer, C (Q)s) Answer Q for SQLsThe set of clauses of (a) is,
Figure BDA0002667239540000054
is a clause CiThe number of editing modifications of (a) the number of editing modifications,
Figure BDA0002667239540000055
is a clause CiThe weight values of the clause types belong to, W is the sum of the weight values of all the clause types, and S is the score of the target topic;
Figure BDA0002667239540000056
in the above formula, Penaltya(Qs) Representing a second correction cost, QsAs a revised SQL answer, C (Q)s) SQL answer QsSet of clauses of, I (C)i) Is a clause CiIf clause CiIs adjusted to I (C)i) 1, and vice versa I (C)i)=0,
Figure BDA0002667239540000057
Is a clause CiAnd W is the sum of the weighted values of all clause types, and S is the score of the target topic.
Optionally, when the refined score of the student SQL answer is calculated in step 4), a functional expression is adopted as shown in the following formula:
Figure BDA0002667239540000061
in the above formula, G (Q)s) For the refined score of the student 'S SQL answer, S is the student' S submitted original SQL answer QsScore of (1), Penaltyc(Qs) Represents a first correction cost, Penaltya(Qs) Indicating a second correction cost, Penaltyt(Q′s,Q’i) Is a canonical SQL statement Q'sAnd a canonical SQL statement Q'iThe cost of the conversion between (a) and (b),
Figure BDA0002667239540000062
representing all canonical SQL statements Q'sAnd a canonical SQL statement Q'iThe minimum in the transition costs between.
In addition, the invention also provides a refinement scoring system for the SQL query, which comprises a computer device, wherein the computer device is programmed or configured to execute the steps of the refinement scoring method for the SQL query, or a computer program which is programmed or configured to execute the refinement scoring method for the SQL query is stored in a memory of the computer device.
Furthermore, the present invention also provides a computer readable storage medium having stored therein a computer program programmed or configured to execute the SQL query-oriented refinement scoring method.
Compared with the prior art, the invention has the following advantages: the learning of the structured query language SQL has been a major and difficult point of the database technology, requiring students to perform a large number of SQL programming practices. In recent years, the automatic scoring technology for SQL query has become a current research focus because the burden of teachers to modify student SQL jobs in batches can be greatly reduced. However, the existing SQL query automatic scoring technology does not fully consider the fairness of scoring for different forms of answers submitted by students and the fairness of scoring for answers submitted by students containing partial errors, thereby affecting the learning interest and confidence of students in SQL queries. Aiming at the problems, the invention provides a SQL query-oriented refinement scoring method, which on one hand utilizes an equivalent SQL statement set obtained based on the equivalent transformation of a correct answer SQL statement to realize reasonable score judgment of different forms of answers submitted by students, on the other hand, provides correction of partial wrong student answers and quantifies the scores of the student answers based on the correction cost and the conversion cost between the corrected answers and the correct answers. Answer data of students is collected through online SQL programming practice activities participated by students of a plurality of teaching classes, and relevant SQL query refinement scoring is carried out for experimental analysis based on the collected answer data of the students. Compared with the related technology, the technology of the invention has more advantages in the SQL scoring fairness, and the SQL scoring accuracy is obviously improved.
Drawings
FIG. 1 is a schematic diagram of the basic principle of the method according to the embodiment of the present invention.
FIG. 2 is a diagram illustrating a query tree according to an embodiment of the present invention.
Fig. 3 is a Graph of a directed acyclic Graph obtained by a first recursion in the embodiment of the present invention.
Fig. 4 is a directed acyclic Graph obtained by the second recursion in the embodiment of the present invention.
Fig. 5 is a Graph of a directed acyclic Graph obtained by a third recursion in the embodiment of the present invention.
Fig. 6 is a deformed directed acyclic Graph' finally obtained in the embodiment of the present invention.
FIG. 7 is a schematic diagram showing a comparison of the run-time analysis of the three methods in the embodiment of the present invention.
FIG. 8 is a diagram illustrating the comparison of the number of wrong answers by the three methods in the embodiment of the present invention.
Detailed Description
The detailed scoring method (abbreviated as SQL-graph) for SQL query according to the present invention will be further described in detail below by taking the target topic as "student querying all History systems (History) and physical systems (Physics) in student table (student)", wherein student a and student B submit original SQL answer Q for the target topicsAs shown in table 1.
As shown in fig. 1, the method for refining and scoring for SQL query in this embodiment includes:
1) according to SQL query answer Q provided by the teacher for the target question and the input conversion rule set R, SQL query equivalent transformation is carried out to obtain an equivalent SQL query answer set Q+
2) According to the original SQL answer Q submitted by students for the target subjectsCarrying out SQL query correction on the data dictionary Dict related to the query to obtain a corrected SQL answer QsCalculating the correction cost of the student SQL answer;
3) carrying out normalization processing on the corrected SQL answer Qs to obtain a normalized SQL statement Q'sConverting the obtained standard SQL statement into a relational algebra expression and further converting the relational algebra expression into a student SQL answer query tree;
4) answer set Q for equivalent SQL queries+Each equivalent SQL query answer Q in (a)i: first, the answer Q to the equivalent SQL queryiCarrying out normalization processing to obtain a normalized SQL statement Q'iConverting the obtained standard SQL statement into a relational algebra expression and further converting the relational algebra expression into an equivalent SQL query answer query tree; then, calculating a standard SQL statement Q 'according to the edit distance between the student SQL answer query tree and the equivalent SQL answer query tree'sAnd a canonical SQL statement Q'iA transition cost between; according to the correction cost of student SQL answer and the normalized SQL statement Q'sAnd canonical SQL statement Q'iAnd calculating the conversion cost to obtain a refined score of the student SQL answer.
As can be seen from fig. 1, as an optional implementation manner, in this embodiment, specifically, an SQL query equivalence transformation module is used as an execution main body of step 1), and in step 1), one SQL query answer (as shown in table 1) provided by a teacher is used as an input, and an equivalence transformation rule of a relational algebra expression is used to perform equivalence transformation on the relational algebra expression of the SQL query answer, so as to obtain a plurality of relational algebra expressions equivalent to the relational algebra expression of the SQL query answer, and further obtain an SQL query answer set equivalent to the SQL query answer, thereby ensuring fairness in scoring of equivalent answers of different forms submitted by students.
In this embodiment, in step 1), an equivalent SQL query transformation is performed to obtain an equivalent SQL query answer set Q+Comprises the following steps:
1.1) acquiring an SQL query answer Q provided by a teacher aiming at a target question and an input conversion rule set R;
1.2) converting the SQL query answer Q into a relational algebra RA (Q), and adding the relational algebra RA (Q) into a relational algebra set RA (Q)T
1.3) if the conversion rule set R is empty, skipping to execute the step 1.5); otherwise, the current conversion rule R is traversed and taken out from the conversion rule set RiSkipping to execute the next step;
1.4) algebraic relationship RA (Q) according to current conversion rule RiPerforming equivalence transformation to obtain equivalence relation algebra
Figure BDA0002667239540000071
If equivalence relation algebra
Figure BDA0002667239540000081
Different from the relationship algebra RA (Q), the equivalent relationship algebra
Figure BDA0002667239540000082
Adding relational algebra sets RAT(ii) a Collecting relationship algebra RATConverting into SQL sentence to obtain equivalent SQL query answer and adding equivalent SQL queryAnswer set Q+(ii) a Skipping to execute step 1.3);
1.5) outputting equivalent SQL query answer set Q+
Referring to steps 1.1) to 1.5), the SQL answer Q given by the teacher and the relational algebra expression equivalent transformation rule set R are used as input, Q is firstly converted into a relational algebra expression RA (Q), and the relational algebra expression is added into the equivalent relational algebra expression set RATIn (1). The relational algebra expression ra (Q) of Q is then equivalently transformed each time based on one of the transformation rules in the set of transformation rules R. If the transformed relational algebra expression
Figure BDA0002667239540000083
A difference is found between the relational algebraic expression RA (Q) and Q, then
Figure BDA0002667239540000084
Is an equivalent relation algebraic expression of Q, and is added into an equivalent relation algebraic expression set RATIn (1). Finally, RA is addedTAll equivalent relational algebra expressions in the system are converted into SQL sentences, and the obtained equivalent SQL query sentence set Q based on SQL answers Q given by teachers+And returning.
It should be noted that the conversion rule set R may be considered as specified as required, for example, as shown in table 1, the conversion rule set R in this embodiment includes three classes, i.e., an operator-based equivalent transformation rule, an operation-based equivalent transformation rule, and a sub-query equivalent transformation rule, and each class includes a plurality of specific rules.
Table 1: a set of conversion rules R for relational algebra expressions.
Figure BDA0002667239540000085
In this embodiment, for a target topic, "a student who queries all History systems (History) and physical systems (Physics) in a student table (student)", step 1) first converts a SQL answer provided by a teacher into a relational algebra expressionAnd (4) RA. Then, based on the relational algebra expression equivalent transformation rule R1.2 shown in Table 1, RA is subjected to equivalent transformation to obtain the relational algebra expression RA equivalent to RA1,RA1The corresponding SQL query is as follows:
SELECT*
FROM student
WHERE dept_name='Physics'
OR dept_name='History'
the equivalent SQL answer is consistent with the syntax structure of the SQL answer given by the student A and is the SQL query given based on the condition limit of the WHERE clause, so that the fairness of the refinement and grading of the SQL answer of the student A in the syntax structure aspect can be ensured. Consider that there are partial errors in the SQL answers for both student a and student B, resulting in their failure to be converted into relational algebra expressions.
Step 1.4) the relation algebra RA (Q) is converted according to the current conversion rule RiPerforming equivalence transformation (abbreviated as SQLTransform in this embodiment) to obtain equivalence relation algebra
Figure BDA0002667239540000091
Relating to a relational algebra conversion function transformDAG and an equivalent transformation function ApplyRules for performing equivalent transformation according to relational algebra and a transformation rule, the detailed steps comprise:
1.4.1) generating a directed acyclic Graph (Graph) by using a relational algebra RA (Q) and adopting a relational algebra conversion function transformDAG;
the relational algebra conversion function TransformDAG is used for carrying out rule matching on the relational algebra, the relational algebra is required to be expressed in a mode of a directed acyclic Graph, the input of the relational algebra conversion function TransformDAG is relational algebra RA (Q), and the output of the relational algebra conversion function TransformDAG is directed acyclic Graph, and the executing step of the relational algebra conversion function TransformDAG comprises the following steps:
s1, creating a directed acyclic graph;
s2, acquiring a first node (RA (Q) root) of a relation algebra RA (Q) as a current node rel;
s3, judging whether the directed acyclic Graph of the current node rel is true, if true, jumping out of the recursion round, and jumping to execute the step S6; otherwise, executing the next step;
s4, acquiring node set inputs connected with the current node rel;
s5, aiming at each node input in the node set inputs: calling a relational algebra conversion function transformDAG to recursively convert the node input into a Graph node child Vertex, adding the Graph node child Vertex into the directed acyclic Graph and setting a directed edge pointing to a child node child Vertex for a current node rel;
s6, trying to obtain the next node of the relation algebra RA (Q) as the current node rel, and if the node is successful, jumping to execute the step S3; otherwise, judging that the recursion is ended, and outputting the directed acyclic Graph.
1.4.2) taking the conversion rule set R and the directed acyclic Graph as input, and calling an equivalent transformation function ApplyRules to obtain the deformed directed acyclic Graph'.
The equivalent transformation function ApplyRules is a rule matching algorithm based on depth-first traversal, and the input of the equivalent transformation function ApplyRules is a transformation rule set R and a directed acyclic Graph, and the output of the equivalent transformation function ApplyRules is a deformed directed acyclic Graph. The executing step of the relational algebra conversion function transformDAG comprises the following steps:
s1, initializing and setting the matching times matches to be 0;
s2, if the matching times matches are less than or equal to the limit times matchLimit, obtaining the current Graph node vertex from the directed acyclic Graph, and if the obtaining is successful, skipping to execute the next step; otherwise, jumping to execute step S4;
s3, traversing each rule in the conversion rule set R, and carrying out the following processing aiming at the conversion rule R: equivalently deforming the current Graph node vertex by using a rule to obtain a new node newVertex, if the current Graph node vertex is different from the new node newVertex, increasing the matching times matches by 1, and if the matching times matches still do not reach the limit times matchLimit, performing traversal conversion on the Graph by using depth-first traversal according to a conversion rule set R, the new node newVertex and the matching times matches to obtain a deformed directed acyclic Graph; jumping to execute step S2;
and S4, outputting the deformed directed acyclic Graph'.
Take the following example SQL as an example:
SELECT instructor.name,instructor.salary
FROM instructor JOIN teaches ON instructor.id=teaches.id
WHERE instructor.salary>3000AND
first, the example SQL is converted into a relational algebra expression, and a directed acyclic Graph obtained by recursion of the relational algebra expression using a relational algebra conversion function TransformDAG is shown in fig. 3.
And secondly, traversing from the root node of the directed acyclic Graph shown in the figure 3 by using an equivalent transformation function ApplyRules, and when traversing to the Filter node and matching the Filter node to meet the equivalent deformation requirement of a rule R1.1 in the table 1, performing equivalent transformation on the Filter node according to the rule R1.1 to obtain the directed acyclic Graph shown in the figure 4.
And thirdly, performing rule matching on the Filter nodes pushed down in the second step according to depth-first traversal in an equivalent transformation function ApplyRules. And (3) traversing all the rules of the Filter nodes of the directed acyclic graph in the graph 3, if no rule which accords with the type of the Filter nodes exists, continuously performing rule matching on subsequent nodes of the Filter, and stopping until all the nodes do not accord with the equivalent deformation rule. Finally, the directed acyclic graph shown in fig. 3 is restored to SQL-2, and an SQL query that is equivalent to the input SQL and has a different writing method is obtained as follows:
SELECT name,salary FROM(SELECT*FROM public.instructor,public.teaches WHERE instructor.id=teaches.id)AS t WHERE t.salary>3000AND t.year>2010
and fourthly, after the first result is obtained according to the third step, performing rule matching on the node Join in the Graph 3 again according to depth-first traversal in an equivalent transformation function ApplyRules, wherein the node Join meets a rule R3.1 in the table 12, and performing equivalent transformation on the node Join according to the rule R3.1 to obtain the directed acyclic Graph shown in the Graph 5. Finally, the directed acyclic graph shown in fig. 5 is restored to SQL-3, and an SQL query that is equivalent to the input SQL and has a different writing method is obtained as follows:
SELECT t.name,t.salary FROM(SELECT*FROM public.instructor WHERE salary>3000)AS t INNER JOIN(SELECT*FROM public.teaches WHERE course_id>500)AS t0 ON t.id=t0.id
and fifthly, after a second equivalent transformation result is obtained according to the fourth step, traversing the residual matchable rules for the Join node according to an equivalent transformation function ApplyRules. Traversing the residual rules, if the Join node meets the rule R2.3 in the table 1, performing equivalent transformation on the Join node according to the rule R2.3, and obtaining the deformed directed acyclic Graph' as shown in FIG. 6. Finally, the directed acyclic graph shown in fig. 6 is restored to obtain SQL-4, and an SQL query with the same semantics as the SQL provided by the teacher and a different writing method is obtained, as follows:
SELECT instructor.name,instructor.salary
FROM(SELECT*FROM instructor,teaches WHERE instructor.id=teaches.id)AS t
WHERE t.salary>3000AND t.year>2010
referring to fig. 1, as an optional implementation manner, step 2) in this embodiment is specifically implemented by using an SQL query modification module. In step 2) of this embodiment, the SQL query containing partial errors (which may be a keyword error, a pattern information error, or a clause sequence error) submitted by the student is automatically corrected by using the SQL syntax rules and the data dictionary related to the query, so as to obtain a corrected student answer SQL statement. The corrected student SQL answers can be converted into relational algebra expressions in a subsequent refinement scoring module, so that the student SQL answers with partial errors are refined and judged, and the fairness of scoring is guaranteed.
The SQL refinement scoring strategy based on the syntactic structure is implemented on the premise that SQL query answers submitted by students can be converted into relational algebra expressions, and further refinement scoring of the student SQL answers can be given by comparing differences between query trees determined by the relational algebra expressions of the student SQL answers and the query trees of correct answers. However, as can be seen from the actual teaching feedback, the SQL answers submitted by students often appear similar to the SQL answers given by teachers, but there are some cases of errors. The errors that students are prone to include two main cases: (1) errors exist in SQL keywords or database schema information; (2) SQL clauses are in wrong order. These errors all result in the system's inability to convert student submitted SQL answers into relational algebra expressions. In view of this, the present embodiment first implements automatic correction of SQL answers submitted by students and containing partial errors based on SQL syntax rules and a data dictionary related to queries; and then correcting the clause sequence error existing in the SQL answer of the student based on the SQL query grammar. In this embodiment, the step 2) of performing SQL query modification to obtain a modified SQL answer Qs includes:
2.1) original SQL answer Q submitted by studentsThe original SQL answer Q is obtained by segmentationsA set of clauses C;
2.2) amending the original SQL answer Q based on the clause set CsThe SQL key words and the database mode information in the SQL database are wrong, and the SQL answer Q is corrected based on the clause set CsThe clauses in (1) are in wrong order.
This embodiment corrects the original SQL answer Q based on the clause set C in step 2.2)sThe step of the SQL key word and the database mode information error in (1) comprises the following steps:
2.2.1A) generating a data dictionary Dict from a set of SQL query keywords DictkeyDatabase schema information set DictschemaComposition, said database schema information set DictschemaThe database mode information comprises a basic table name, a view name, a field name and an index name;
2.2.2A) traversing clauses in the clause set C to obtain the current clause CiIf the clause set C is traversed completely, judging that the SQL key words and the database mode information are corrected completely, and returning; otherwise, skipping to execute the next step;
2.2.3A) for the current clause CiPerforming word segmentation processing to obtain a word and brand sequence T to be corrected;
2.2.4A) obtaining a current word tag T from the word tag sequence T in a traversal wayjIf the traversal is finished, skipping to execute the step 2.2.2A); otherwise, skipping to execute the next step;
2.2.5A) judging the current word brand TjWhether the dictionary Dict is true or not is judged, if true, the next step is executed by skipping; otherwise, skipping to execute the step 2.2.4A);
2.2.6A) SQL query key set Dict in data dictionary DictkeyThe distance between the current word plate and the word plate T is calculated based on the editing distance of the character stringjThe key word ED having the smallest edit distance therebetweenkeySet of database schema information Dict in data dictionary DictschemaThe distance between the current word plate and the word plate T is calculated based on the editing distance of the character stringjThe key word ED having the smallest edit distance therebetweenschemaIf the key word EDkeyKeyword EDschemaIf the absolute value of the difference is less than the preset threshold, the word tag T is selected according to the current word tagjIn clause CiContext of (1) determining the current word brand TjThe corrected result obtains the current word card T 'after modification'jOtherwise, the current word plate T is usedjCorrection to key words EDkeyKeyword EDschemaThe object with the minimum editing distance in the two elements is in the data dictionary Dict; the jump performs step 2.2.4A).
Step 2.2.1A), when the data dictionary Dict is generated, SQL keywords relevant to query are extracted based on an SQL standard (ISO/IEC9075)2016 to obtain an SQL query keyword set Dictkey. Then extracting the database mode information (including the name of the basic table, the name of the view, the name of the field and the name of the index) of an application database to obtain a database mode information set Dictschema. Gathering Dict with SQL query keyword setkeyAnd the database schema information set DictschemaIntegration can be performed to obtain a data dictionary (denoted as Dict ═ Dict) related to the querykey∪Dictschema). And then, correcting the SQL answer submitted by the student based on the data dictionary Dict related to the query: setting a word plate (token) to be corrected in the Chinese word plate sequence T as a current word plate TjIn this embodiment, the word tag T is obtained by calculation based on the edit distance of the character stringjQuery key word with minimum edit distance between them (denoted as ED)key) And the current word brand TjMinimum edit distance betweenDatabase schema information (denoted as ED)schema) And will present word brand TjThe element of Dict that is the smallest edit distance from it is corrected. If EDkeyWith EDschemaWith current word brand TjIf the difference between the editing distances is less than the threshold value, the current word brand T is usedjContext of department determines current word brand TjThe corrected result of (1). And the correction cost for correcting the SQL key words and the database mode information is calculated simultaneously in the process of correcting the student SQL answer.
See steps 2.2.1A) -2.2.6A), the present embodiment first performs the correction of the keyword and the database schema information for each clause C in CiPerforming word segmentation to obtain CiT. Secondly, whether each token in the T is an element in the data dictionary Dict related to the query is judged. If a token TjNot in Dict, T is calculated separatelyjWith the query key set Dict in DictkeyMinimum string edit distance ED for each elementkeyAnd TjAnd the database schema information set Dict in DictschemaMinimum string edit distance ED for each elementschema. If edit distance EDkeyAnd DictschemaThe difference between them is not large, and T is needed to be based onjIn clause CiTo determine T from the context information injIf not, then TjThe element of Dict that is the smallest edit distance from it is corrected.
In this embodiment, in step 2.2), SQL answer Q is corrected based on clause set CsThe step of clause misordering in (1) comprises:
2.2.1B) initially set the set of error clauses CerrorSet of null, unmatched clauses CunmatchIs empty;
2.2.2B) traversing the clauses in the clause set C to obtain the current clause CiIf the clause set C is traversed, skipping to execute the step 2.2.4B); otherwise, skipping to execute the next step;
2.2.3B) judging the next clause C of the current clausei+1For the current clause CiIf the next word is true, the current sub-sentence is determinedSentence next clause Ci+1Adding a set of wrong clauses CerrorThe current clause C is addediAdding a set of unmatched clauses Cunmatch(ii) a If yes, the next clause C of the current clause is addedi+1As the current clause CiThe next word or sentence; jump execution 2.2.2B);
2.2.4B) traverse the set of unmatched clauses CunmatchThe clause in (1) obtains the current clause Cl(ii) a If not, clause set CunmatchAnd step 2.2.7B) is executed by jumping after traversing is finished; otherwise, skipping to execute the next step;
2.2.5B) in the wrong clause set CerrorFind current clause C inlMatched clause Cmathch
2.2.6B) if clause CmathchIf the answer is null, the SQL answer Q submitted by the student is transmittedsReturning as a revised SQL answer Qs; otherwise, clause C is setmathchAs the current clause ClThe next clause of (a); skipping to execute step 2.2.4B);
2.2.7B) if wrong clause set CerrorIf not, the SQL answer Q submitted by the studentsReturning as a revised SQL answer Qs; otherwise, adjusting the sequence of the clauses in the clause set C according to the determined sequence, and then converting the clause set C after the sequence is adjusted into an SQL statement to be used as a modified SQL answer QsAnd returning.
See steps 2.2.1B) -2.2.7B), this embodiment first traverses Q sequentially for the clause-sequential correction portionsEach clause C in clause set CiJudging the successor clause Ci+1Whether the SQL query clause order rule shown in table 2 is satisfied. If not, C is addedi+1Adding a set of mismatching clauses CerrorAnd C isiJoining set of unsuccessfully matched clauses Cunmatch(ii) a If yes, clause C is addediIs set to Ci+1. For CunmatchEach unsuccessfully matched clause C in (a)lFrom the set of mismatching clauses CerrorIn is ClThe match satisfies the subsequent clause restriction rule (see table 2 for details) andthe clause with the strongest relevance is set as the successor clause. If not, and C can not be foundlThe matched subsequent clause indicates that the SQL answer Q which can not be submitted by the studentsCorrection is effected, at which point Q is returnedsItself; if C can be founderrorMatching successor clause C in (1)matchIs mixing Q withsIs set to CerrorWhile simultaneously adding CerrorFrom CerrorAnd (4) clearing. If the matching process is finished, finding the wrong matching clause set CerrorIf there is still an object that fails to match, it indicates that the SQL answer Q that cannot be submitted to the student is not the samesCorrection is effected, at which point Q is returnedsItself; otherwise, the description is QsSuccess of correction based on QsAnd reordering the clause set C according to the successor clause information of each clause in the clause set C, and returning to generate a corresponding SQL statement based on the reordered clause sequence.
The present embodiment will correct the SQL query syntax in the SQL standard (ISO/IEC9075)2016 for clause ORDER errors present in student submitted SQL answers (e.g., ORDER BY clause errors placed before the HAVING clause). The SQL clause order adjustment rule is shown in table 2.
Table 2: the SQL clause order adjusts the rules.
Clause Subsequent clause
SELECT FROM
FROM JOIN、WHERE、<NULL>
WHERE GROUP BY, collective clause,<NULL>
JOIN JOIN、WHERE、<NULL>
Set keywords SELECT
GROUP BY HAVING、ORDER BY、<NULL>
HAVING ORDER BY、<NULL>
ORDER BY ORDER BY、<NULL>
Table 2 gives the syntax order of each clause in the SQL query statement shown in the SQL standard, where < NULL > indicates that the current clause may be followed by no other clauses. For example, as can be seen from table 3, the HAVING clause may be followed BY the ORDER BY clause, which is followed BY either no clause or may be followed BY the ORDER BY clause and no other clauses such as HAVING. Based on the SQL clause order adjustment rule shown in table 2, in this embodiment, a student SQL answer corrected by SQL keywords and database mode information and a clause set of the student SQL answer obtained by division before are used as input, and then whether the position of each clause meets the SQL clause order rule shown in table 2 is checked, and if not, correction is performed according to the rule.
In this embodiment, for the target topic "students who query all the History systems (History) and physical systems (Physics) in the student table (student)", step 2) performs keyword and database schema information correction on SQL answers of the students a and B based on the data dictionary Dict related to the query. Specifically, the method comprises the following steps: for the SQL answer of student A, correcting the error token 'dept _ names' into the element 'dept _ name' with the minimum editing distance of the character string in the Dict; for student B's SQL answer, the error token "SELET" is modified to the element "SELECT" in Dict that is the smallest edit distance from its string. Because the SQL answers of two students have no clause sequence error, the clause sequence in the SQL answers of the two students does not need to be adjusted.
Referring to fig. 1, as an optional implementation manner, step 3) and step 4) in this embodiment are implemented by an SQL query refinement scoring module, which takes an SQL query answer set equivalent to an SQL query answer, a modified student SQL answer, and a modification cost as inputs, and obtains a normalized SQL statement by performing normalization processing on each equivalent SQL answer and each modified student SQL answer. Then, the SQL statements of each specification are converted into relational algebra expressions, and then a query tree structure corresponding to the SQL statements can be obtained. And then, determining the refined score of the student SQL answer by taking the correction cost of the student SQL answer and the editing distance between the student SQL answer query tree and each equivalent SQL answer query tree as input.
In order to reduce the syntax difference between the SQL answers irrelevant to the SQL equivalent transformation, and thus reduce the workload of analyzing the SQL answers subsequently, the present embodiment performs the normalization preprocessing on the SQL answers provided by the teacher and the SQL answers submitted by the students. The normalization process in step 3) of this embodiment includes:
normalized relationship assertion: standardizing the form of the equivalent relationship assertion into the relationship assertion in a unified specified form; for example, the assertion NOT (a < B) is replaced with the assertion a > -B, so that the NOT operator does NOT appear in the SQL answer.
Normalized connection query: standardizing the equivalent connection query form into a connection query in a unified specified form; for example, a connection query initiated by NATURAL INNER JOIN is uniformly rewritten to a connection query initiated by INNER JOIN.
Normalized nested queries: standardizing the form of the equivalent nested query into a nested query of a unified specified form; for example, uniformly rewriting those nested sub-queries connected by the operator IN/ANY into nested sub-queries connected by EXISTS.
Delete betweeen assertion: the form of the equivalent interval query (betweeen) is specified as a representation based on a comparison operator. For example, uniformly rewriting assertions represented by betweeen in SQL queries into their equivalent assertions represented by comparison operators (e.g., > and <).
In order to quantify the difference between the SQL answer provided by the teacher and the modified SQL answer of the student in the syntactic structure, the present embodiment converts the SQL answer provided by the teacher and the SQL answer submitted by the student into a query Tree in a fat Tree (Flattened Tree) structure, respectively. The query tree used in the query optimization process of the relational database system has mature heuristic optimization rules and can be optimized into a consistent organization form, so that the difference comparison of syntax structures between the SQL answer provided by a teacher and the modified student SQL answer is effectively supported. FIG. 2 illustrates the query tree structure of the following SQL query:
SELECT student.name
FROM student,takes
WHERE student.id=takes.id
AND takes.course_id=’2’;
it should be noted that, the query tree and the generation method thereof are prior art, and therefore, the specific implementation is not expanded here. After the SQL answer provided by the teacher and the modified student SQL answer are converted into the query tree structure, the conversion cost on the syntactic structure between the two answers can be quantized based on the editing distance of the tree. Based on the correction cost of the SQL answer submitted by the student and the conversion cost between the query tree of the SQL answer provided by the teacher and the corrected query tree of the SQL answer of the student, the detailed scoring of the SQL answer submitted by the student can be realized. It can be seen that the implementation of refinement scoring relies on a computational method that defines two costs. The calculation of the two costs depends on the weight setting given by the teacher aiming at different clause types, and the weight value expresses the importance degree of different clauses for SQL query. The related clause types include 9, which are respectively a SELECT clause, a FROM clause, a WHERE clause, a GROUP BY clause, a JOINS clause, an AGGREGATES clause, a DISTINCT clause, a HAVING clause and a SUBQUERY clause. Two cost calculation methods are introduced below based on the weight setting of clauses.
In this embodiment, the SQL statement Q 'specified in step 4) is'sAnd canonical SQL statement Q'iThe computational function expression of the conversion cost between is:
Figure BDA0002667239540000161
in the above formula, Penaltyt(Q′s,Q’i) Is a canonical SQL statement Q'sAnd a canonical SQL statement Q'iConversion cost between, Component (Q's) Express canonical SQL statement Q'sSet of composition elements of the corresponding query tree, OiExpress canonical SQL statement Q'sThe ith combined element, ED (O), of the corresponding query treei,Q’i) Is a specification-based SQL statement Q'iModified canonical SQL statement Q 'of the corresponding query tree'sCorresponding combined element O in query treeiThe value of the edit distance introduced is,
Figure BDA0002667239540000168
is a combination element OiAnd W is the sum of the weighted values of all clause types, and S is the score of the target topic.
Calculating the correction cost of the student SQL answer in the step 2) of the embodiment includes calculating a correction SQL answer QsThe first correction cost of SQL keyword and database mode information error in the database, and calculating the corrected SQL answer QsA second correction cost for a wrong order of clauses in (1); and the calculation function expressions of the first correction cost and the second correction cost are shown as the following formulas:
Figure BDA0002667239540000162
in the above formula, Penaltyc(Qs) Watch (A)Shows a first correction cost, QsAs a revised SQL answer, C (Q)s) Answer Q for SQLsThe set of clauses of (a) is,
Figure BDA0002667239540000163
is a clause CiThe number of editing modifications of (a) the number of editing modifications,
Figure BDA0002667239540000164
is a clause CiThe weight values of the clause types belong to, W is the sum of the weight values of all the clause types, and S is the score of the target topic;
Figure BDA0002667239540000165
in the above formula, Penaltya(Qs) Representing a second correction cost, QsAs a revised SQL answer, C (Q)s) SQL answer QsSet of clauses of, I (C)i) Is a clause CiIf clause CiIs adjusted to I (C)i) 1, and vice versa I (C)i)=0,
Figure BDA0002667239540000166
Is a clause CiAnd W is the sum of the weighted values of all clause types, and S is the score of the target topic.
In this embodiment, when the refined score of the student SQL answer is calculated in step 4), a function expression is adopted as follows:
Figure BDA0002667239540000167
in the above formula, G (Q)s) For the refined score of the student 'S SQL answer, S is the student' S submitted original SQL answer QsScore of (1), Penaltyc(Qs) Represents a first correction cost, Penaltya(Qs) Indicating a second correction cost, Penaltyt(Q′s,Q’i) Is standardizedSQL statement Q'sAnd a canonical SQL statement Q'iThe cost of the conversion between (a) and (b),
Figure BDA0002667239540000171
representing all canonical SQL statements Q'sAnd a canonical SQL statement Q'iThe minimum in the transition costs between. Wherein, the last item of the formula represents the student SQL answer Q 'after being corrected'sThe equivalent answer with the smallest conversion cost between the query trees. It can be seen that the refined score of the student SQL answer is obtained by deducting the correction cost of the SQL answer submitted by the student on the basis of the score of the SQL programming question and converting the corrected student SQL answer into the conversion cost of the SQL answer closest to the syntactic structure of the student SQL answer.
In this embodiment, for a target topic "a student who queries all historical systems (History) and physical systems (Physics) in a student table (student)", step 4) calculates an edit conversion cost between an SQL answer query tree of each student and each equivalent SQL answer query tree based on two equivalent SQL answers to the programming question, and further refines and obtains a refined score of an SQL answer of the student a and an SQL answer of the student B based on a correction cost of the SQL query and the edit conversion cost between the query trees.
The complexity analysis of the method of this example is as follows: the method of the embodiment integrates the SQL query equivalent transformation (SQLTranform) and the SQL query correction (SQLCorrection), and finally achieves the refinement scoring of the SQL querycAnd SQL clause order modification cost Penaltya. Then answer Q 'to student SQL'sAnd an equivalent SQL answer set Q+Each equivalent SQL answer in (1) is normalized, and then a corrected student SQL answer Q 'is calculated'sQuery tree and Q+The query tree of each equivalent SQL answer and the translation cost between them. And finally, calculating to obtain and return a refined score of the student SQL answer. The spatiotemporal complexity of the refined scoring method of the present embodiment is analyzed as follows. This example is detailedThe main calculation cost of the chemical scoring method comes from three parts: (1) the computation cost of the equivalent transformation method (SQLTranform method) of SQL queries; (2) the computation cost of the SQL query correction method (SQLCorrection method); (3) and calculating the cost of conversion between the corrected student SQL answer and the query tree of each equivalent SQL answer. The temporal complexity of the first partial SQLTranform method is O (| R |), where | R | is the cardinality of the equivalent transformation rule set R of the relational algebra expression. The temporal complexity of the second partial SQLCorrection method is O (T)ELog L), wherein TEIs student SQL answer QsThe total number of false tokens contained in each clause, L is the maximum length of the element name string in the data dictionary Dict, and O (Log L) represents the time complexity of searching a string in Dict based on the Tire tree index. And in the third part, the time complexity of conversion calculation between the modified student SQL answers and the query tree of each equivalent SQL answer is O (| Q)+| N log N), wherein | Q+I is SQL equivalent answer set Q+Is the average number of constituent elements included in the query tree, and O (N × log N) is the computational time complexity of the query tree edit distance. In summary, the computation time complexity of the method of the present embodiment is O (| R | + T)E*logL+|Q+| N | _ log N). Since the space consumption of the method of this embodiment is mainly from storing the data dictionary Dict, the space complexity of the method of this embodiment is O (| Dict |), wherein | Dict |, represents the number of elements included in the data dictionary Dict.
In addition, the present embodiment also provides an SQL-query-oriented refinement and scoring system, which includes a computer device programmed or configured to execute the steps of the SQL-query-oriented refinement and scoring method, or a computer program programmed or configured to execute the aforementioned SQL-query-oriented refinement and scoring method stored in a memory of the computer device.
Furthermore, the present embodiment also provides a computer-readable storage medium, in which a computer program programmed or configured to execute the above-mentioned SQL query-oriented refinement scoring method is stored.
In order to test the effectiveness of the method (abbreviated as SQL-graph) of the present embodiment, the present embodiment compares the variation technology of SQL-graph (abbreviated as SQL-graph w/o correction technology) that does not include the SQL query modification function with the most representative related work XData technology as the comparison of the method of the present embodiment. In order to obtain student SQL answers, five textbooks (including 284 students) with database technology courses are provided with jobs (covering typical SQL query types such as set query, connection query and condition query) including 12 SQL programming questions on an SQL online learning platform. Based on SQL answer data which are submitted by students and aim at 12 SQL programming questions, the advantages and disadvantages of the three SQL scoring technologies are compared. The information on the number of students from the five classes is shown in table 4.
Table 4: the experiment relates to information about the class.
Course(s) Number of students Number of SQL questions
DB-1 58 12
DB-2 52 12
DB-3 55 12
DB-4 57 12
DB-5 62 12
In the experiment of this embodiment, the teacher provides only one SQL answer to each SQL programming question. The weighted values of different query clause types which are commonly depended on by the three technologies for realizing the SQL refinement scoring function are all set to be 1, which is shown in table 5 in detail.
Table 5: and inquiring the weight setting of the clause type.
Sub-query type Weight of Sub-query type Weight of
SELECT 1 AGGREGATES 1
FROM 1 DISTINCT 1
WHERE 1 HAVING 1
GROUP BY 1 SUBQUERY 1
JOINS 1
Since scorers often have a certain subjectivity when scoring the SQL answers of students, it is not suitable for directly numerically comparing the SQL query scores given by scorers with the SQL query scores given by various technologies to determine whether the scores given by the SQL refinement scoring technology are fair. In view of this, following the experimental strategy described in Automated Grading of SQL Queries (ICDE,2019: 1630-1633) of Chandra B et al, several random creations of the form of multiple questions j are created for each question j
Figure BDA0002667239540000181
Each query pair contains two student SQL answers with partial errors, and the answer is recorded as
Figure BDA0002667239540000182
And
Figure BDA0002667239540000183
(wherein
Figure BDA0002667239540000184
The 1 st SQL query in the nth query pair representing the SQL programming question j,
Figure BDA0002667239540000185
then the 2 nd SQL query is represented). Then two teachers with the course teaching experience of database technology for more than 8 years and three assistant teachers are invited to serve as scorers, and the scorers are required to mark each SQL query pair as one of the following three categories, and the final mark of each SQL query pair is determined based on a few rules subject to majority:
(1)
Figure BDA0002667239540000191
a higher score should be obtained;
(2)
Figure BDA0002667239540000192
a higher score should be obtained;
(3) both contain different errors and they should get almost the same score.
Next, each SQL query in each SQL query pair is refined and scored using three SQL refinement scoring techniques, respectively, and each query pair is classified as one of the above three categories based on the results of the refinement scoring. It should be noted that if the difference between the refined scores of two SQL queries in a SQL query pair is less than 10% of the score of the SQL programming question to which they correspond, then the query pair is labeled as the third class.
For SQL query pair SQj,nIf the scorer and the SQL automatic scoring technology classify the scorer and the SQL automatic scoring technology into the same class, the SQL automatic scoring technology is determined to be SQj,nThe score for the medium SQL query is a fair score. Therefore, the fairness of scoring of a certain SQL refinement scoring technique with respect to the SQL programming question j can be quantified based on the following scoring accuracy formula.
Figure BDA0002667239540000193
In the above formula, AccurancyjThe fairness of scoring for a certain SQL refinement scoring technique relative to the SQL programming question j, m is the number of query pairs of the SQL programming question j,
Figure BDA0002667239540000194
then represents SThe label matching result of the nth SQL query pair of the QL programming question j is as follows: if the scorer and SQL refinement scoring technique mark the query pair as the same class, then
Figure BDA0002667239540000195
Otherwise, the reverse is carried out
Figure BDA0002667239540000196
For the same SQL programming question, the SQL answer submitted by students is diversified in form, and different solution ideas of the students are reflected. To analyze the existence of student answer diversity in this SQL programming practice, 3167 student SQL answers collected from the experiment were counted. Assuming that the ratio of the kth SQL type answer type of the SQL programming question E is pkAnd k is 1, …, n, then the formula for the student answer diversity index (E) of the SQL programming question E is given based on the idea of information entropy:
Figure BDA0002667239540000197
in the above formula, pkThe proportion of the kth SQL answer type of the SQL programming question E is, and n is the classification number of the SQL programming question E.
Table 6 shows student SQL answer type division ratios of 12-pass SQL programming questions and diversity index of student SQL answers. As can be seen from table 6, the student SQL answers corresponding to each SQL programming question exhibit significant diversity. The 9 th student SQL answer diversity expression is most obvious, and the student answer diversity index reaches 1.415: 20% of students use the idea of aggregate queries to compose answers, 57% of students use the idea of conditional queries to compose answers, and 23% of students use the idea of nested queries to compose answers. The diversity of student answers has affected the fairness in scoring for XData technologies that rely only on a single SQL answer provided by the teacher to implement SQL refinement scoring. Meanwhile, the necessity of automatically generating a plurality of equivalent SQL answers based on the equivalent transformation rule of the relational algebra expression in the method is also proved.
Table 6: student answer diversity analysis table.
Figure BDA0002667239540000201
In this embodiment, the fairness of scoring the SQL answers submitted by the students is compared with the three SQL refinement scoring techniques. And randomly extracting 202 SQL query pairs without repetition from the incorrect student SQL answer set of each SQL programming question as evaluation objects. Table 7 shows the number of equivalent SQL answers obtained for each SQL programming question and the accuracy of the SQL refinement scoring for the three SQL refinement scoring techniques.
Table 7: and (5) carrying out score fairness analysis.
Figure BDA0002667239540000202
As can be seen from table 7, the XData technique has the lowest scoring accuracy (embodying the fairness of scoring), and the average value of the scoring accuracy is only 61.55%, because it cannot cope well with the diversity of SQL answers of students, and does not correct the SQL answers submitted by students with partial errors. Compared with the XData technology, the method provided by the embodiment can generate a plurality of SQL answers with different semantic equivalent forms as a batch modification basis, and corrects the student SQL answers with partial grammar errors, so that the highest scoring fairness is realized on 12 SQL programming questions, the average scoring precision reaches 83.15%, and the average scoring precision is improved by 33% compared with the average scoring precision of the XData technology on the 12 questions. Meanwhile, it can be observed that the score accuracy of the SQL-graph w/o correction method is averagely lower by 9 percent than that of the method in the embodiment because the SQL answer submitted by the student is not corrected. This shows that modifying the student submitted SQL answers is very effective in improving the fairness of the SQL refinement scoring. Table 7 also shows that the scoring accuracy of the method of this embodiment on the SQL programming question of the 9 th channel exceeds 90%, because the SQL answers submitted by students are the most diverse (see table 6 in detail), which makes the superiority of the method of this embodiment over the XData technology more remarkable.
FIG. 7 shows SQL refinement scoring runtime for the same number of equivalent SQL answers under different SQL refinement scoring techniques. As shown in FIG. 7, XData technology has minimal runtime and its runtime does not fluctuate significantly with the number of equivalent SQL answers. This is because XData performs scoring of SQL queries based on only one teacher-supplied SQL answer. The two scoring techniques, namely the method of the present embodiment and the SQL-score w/o correction method, can generate a plurality of equivalent SQL answers based on the SQL answers provided by the teacher and accordingly achieve scoring of the SQL query, so that the running times of the two scoring techniques fluctuate with the change of the number of the equivalent SQL answers owned by the problem. Meanwhile, it can be observed that, as the number of equivalent SQL answers owned by the problem increases, the running times of the method of the present embodiment and the SQL-graph w/o correction method both show a rising trend, but when the number of equivalent SQL answers is 4 or 6, the rising trend of the running times fluctuates. This is because the running time of the method of the present embodiment and the SQL-graph w/o correction method has a positive correlation with the number of wrong answers to the problem, because the more wrong answers, the more time the two methods consume for correcting the SQL answer submitted by the student and calculating the edit distance between the query tree of the SQL answer submitted by the student and the query tree of each equivalent SQL answer. FIG. 8 shows the statistics of the number of wrong answers corresponding to the problem with the same number of equivalent SQL answers in the SQL-GRADER w/o correction method. As can be seen from the data in fig. 7 and fig. 8, for the problem set with the equivalent SQL answer number of 4, the number of wrong answers is significantly smaller than that of the problem set with the equivalent SQL answer number of 3, so the running time of the method of the present embodiment and the SQL-graph w/o correction method on the problem set with the equivalent SQL answer number of 4 is slightly smaller than that on the problem set with the equivalent SQL answer number of 3. Similarly, it is easy to know that the running time of the two methods on the problem set with the number of the equivalent SQL answers of 6 is slightly less than that on the problem set with the number of the equivalent SQL answers of 5 due to the difference of the number of wrong answers. In addition, it can be observed from FIG. 7 that the running time of the SQL-GRADER w/o correction method is slightly less than that of the present embodiment. This is because the SQL-GRADER w/o correction method removes the revision module of the SQL query, compared to the present embodiment.
In conclusion, the automatic scoring technology for SQL query can greatly reduce the burden of teachers on batch modification of student SQL work, and has important research significance. However, the existing SQL query automatic scoring technology does not fully consider the fairness of scoring for different forms of answers submitted by students and the fairness of scoring for answers submitted by students containing partial errors, thereby affecting the learning interest and confidence of students in SQL queries. Aiming at the defects of the existing research work, the embodiment provides a SQL query-oriented refinement scoring technology which is named as SQL-GRADER. SQL-GRADER utilizes SQL correction strategy based on correct answer SQL statement equivalent transformation, SQL grammar rule and query related data dictionary to raise the fairness of SQL refinement scoring. The method comprises the following steps of collecting student SQL answer data by a plurality of students in teaching classes participating in SQL programming teaching activities, and carrying out experimental analysis on related SQL query refinement scores based on the collected student answer data to show that: the SQL-GRADER refinement scoring technology can effectively improve the accuracy and fairness of scoring.
The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may occur to those skilled in the art without departing from the principle of the invention, and are considered to be within the scope of the invention.

Claims (10)

1. A refinement scoring method for SQL query is characterized by comprising the following steps:
1) according to SQL query answer Q provided by the teacher for the target question and the input conversion rule set R, SQL query equivalent transformation is carried out to obtain an equivalent SQL query answer set Q+
2) According to the source submitted by students for target questionsFirst SQL answer QsCarrying out SQL query correction on the data dictionary Dict related to the query to obtain a corrected SQL answer QsCalculating the correction cost of the student SQL answer;
3) the corrected SQL answer Q issCarrying out normalization processing to obtain a normalized SQL statement Q'sConverting the obtained standard SQL statement into a relational algebra expression and further converting the relational algebra expression into a student SQL answer query tree;
4) answer set Q for equivalent SQL queries+Each equivalent SQL query answer Q in (a)i: first, the answer Q to the equivalent SQL queryiCarrying out normalization processing to obtain a normalized SQL statement Q'iConverting the obtained standard SQL statement into a relational algebra expression and further converting the relational algebra expression into an equivalent SQL query answer query tree; then, calculating a standard SQL statement Q 'according to the edit distance between the student SQL answer query tree and the equivalent SQL answer query tree'sAnd a canonical SQL statement Q'iA transition cost between; according to the correction cost of student SQL answer and the normalized SQL statement Q'sAnd canonical SQL statement Q'iAnd calculating the conversion cost to obtain a refined score of the student SQL answer.
2. The SQL query-oriented refinement scoring method according to claim 1, wherein the SQL query equivalence transformation is performed in the step 1) to obtain an equivalent SQL query answer set Q+Comprises the following steps:
1.1) acquiring an SQL query answer Q provided by a teacher aiming at a target question and an input conversion rule set R;
1.2) converting the SQL query answer Q into a relational algebra RA (Q), and adding the relational algebra RA (Q) into a relational algebra set RA (Q)T
1.3) if the conversion rule set R is empty, skipping to execute the step 1.5); otherwise, the current conversion rule R is traversed and taken out from the conversion rule set RiSkipping to execute the next step;
1.4) algebraic relationship RA (Q) according to current conversion rule RiPerforming equivalence transformation to obtain equivalence relation algebra
Figure FDA0002667239530000013
If equivalence relation algebra
Figure FDA0002667239530000014
Different from the relationship algebra RA (Q), the equivalent relationship algebra
Figure FDA0002667239530000015
Adding relational algebra sets RAT(ii) a Collecting relationship algebra RATConverting into SQL sentence to obtain equivalent SQL query answer and adding equivalent SQL query answer set Q+(ii) a Skipping to execute step 1.3);
1.5) outputting equivalent SQL query answer set Q+
3. The SQL query-oriented refinement scoring method according to claim 1, wherein the step of modifying the SQL query in the step 2) to obtain a modified SQL answer Qs comprises the following steps:
2.1) original SQL answer Q submitted by studentsThe original SQL answer Q is obtained by segmentationsA set of clauses C;
2.2) amending the original SQL answer Q based on the clause set CsThe SQL key words and the database mode information in the SQL database are wrong, and the SQL answer Q is corrected based on the clause set CsThe clauses in (1) are in wrong order.
4. The SQL query-oriented refinement scoring method according to claim 3, wherein in the step 2.2), the original SQL answer Q is corrected based on the clause set CsThe step of the SQL key word and the database mode information error in (1) comprises the following steps:
2.2.1A) generating a data dictionary Dict from a set of SQL query keywords DictkeyDatabase schema information set DictschemaComposition, said database schema information set DictschemaThe database mode information comprises a basic table name, a view name, a field name and an index name;
2.2.2A) traversing clauses in clause set C to obtain current clause CiIf the clause set C is traversed completely, judging that the SQL key words and the database mode information are corrected completely, and returning; otherwise, skipping to execute the next step;
2.2.3A) for the current clause CiPerforming word segmentation processing to obtain a word and brand sequence T to be corrected;
2.2.4A) obtaining a current word tag T from the word tag sequence T in a traversal wayjIf the traversal is finished, skipping to execute the step 2.2.2A); otherwise, skipping to execute the next step;
2.2.5A) judging the current word brand TjWhether the dictionary Dict is true or not is judged, if true, the next step is executed by skipping; otherwise, skipping to execute the step 2.2.4A);
2.2.6A) SQL query key set Dict in data dictionary DictkeyThe distance between the current word plate and the word plate T is calculated based on the editing distance of the character stringjThe key word ED having the smallest edit distance therebetweenkeySet of database schema information Dict in data dictionary DictschemaThe distance between the current word plate and the word plate T is calculated based on the editing distance of the character stringjThe key word ED having the smallest edit distance therebetweenschemaIf the key word EDkeyKeyword EDschemaIf the absolute value of the difference is less than the preset threshold, the word tag T is selected according to the current word tagjIn clause CiContext of (1) determining the current word brand TjThe corrected result obtains the current word card T 'after modification'jOtherwise, the current word plate T is usedjCorrection to key words EDkeyKeyword EDschemaThe object with the minimum editing distance in the two elements is in the data dictionary Dict; the jump performs step 2.2.4A).
5. The SQL query-oriented refinement scoring method according to claim 3, wherein in the step 2.2), the SQL answer Q is corrected based on the clause set CsThe step of clause misordering in (1) comprises:
2.2.1B) initially set the set of error clauses CerrorSet of null, unmatched clauses CunmatchIs empty;
2.2.2B) traversing the clauses in the clause set C to obtain the current clause CiIf the clause set C is traversed, skipping to execute the step 2.2.4B); otherwise, skipping to execute the next step;
2.2.3B) judging the next clause C of the current clausei+1For the current clause CiIf the next word is true, the next clause C of the current clause is determined to be truei+1Adding a set of wrong clauses CerrorThe current clause C is addediAdding a set of unmatched clauses Cunmatch(ii) a If yes, the next clause C of the current clause is addedi+1As the current clause CiThe next word or sentence; jump execution 2.2.2B);
2.2.4B) traverse the set of unmatched clauses CunmatchThe clause in (1) obtains the current clause Cl(ii) a If not, clause set CunmatchAnd step 2.2.7B) is executed by jumping after traversing is finished; otherwise, skipping to execute the next step;
2.2.5B) in the wrong clause set CerrorFind current clause C inlMatched clause Cmathch
2.2.6B) if clause CmathchIf the answer is null, the SQL answer Q submitted by the student is transmittedsAs a revised SQL answer QsReturning; otherwise, clause C is setmathchAs the current clause ClThe next clause of (a); skipping to execute step 2.2.4B);
2.2.7B) if wrong clause set CerrorIf not, the SQL answer Q submitted by the studentsReturning as a revised SQL answer Qs; otherwise, adjusting the sequence of the clauses in the clause set C according to the determined sequence, and then converting the clause set C after the sequence is adjusted into an SQL statement to be used as a modified SQL answer QsAnd returning.
6. The SQL query-oriented refinement scoring method according to claim 1, wherein the normalization processing in the step 3) comprises: standardizing the form of the equivalent relationship assertion into the relationship assertion in a unified specified form; standardizing the equivalent connection query form into a connection query in a unified specified form; standardizing the form of the equivalent nested query into a nested query of a unified specified form; the form of the equivalent interval query is normalized to a representation based on a comparison operator.
7. The SQL query-oriented refinement scoring method according to claim 1, wherein the SQL statement Q 'specified in the step 4)'sAnd canonical SQL statement Q'iThe computational function expression of the conversion cost between is:
Figure FDA0002667239530000031
in the above formula, Penaltyt(Q′s,Q’i) Is a canonical SQL statement Q'sAnd a canonical SQL statement Q'iConversion cost between, Component (Q's) Express canonical SQL statement Q'sSet of composition elements of the corresponding query tree, OiExpress canonical SQL statement Q'sThe ith combined element, ED (O), of the corresponding query treei,Q’i) Is a specification-based SQL statement Q'iModified canonical SQL statement Q 'of the corresponding query tree'sCorresponding combined element O in query treeiThe value of the edit distance introduced is,
Figure FDA0002667239530000036
is a combination element OiThe weight values of the clause types belong to, W is the sum of the weight values of all the clause types, and S is the score of the target topic;
calculating the correction cost of the student SQL answer in the step 2) comprises calculating a correction SQL answer QsThe first correction cost of SQL keyword and database mode information error in the database, and calculating the corrected SQL answer QsA second correction cost for a wrong order of clauses in (1); and the calculation function expressions of the first correction cost and the second correction cost are shown as the following formulas:
Figure FDA0002667239530000032
in the above formula, Penaltyc(Qs) Represents a first correction cost, QsAs a revised SQL answer, C (Q)s) Answer Q for SQLsThe set of clauses of (a) is,
Figure FDA0002667239530000033
is a clause CiThe number of editing modifications of (a) the number of editing modifications,
Figure FDA0002667239530000034
is a clause CiThe weight values of the clause types belong to, W is the sum of the weight values of all the clause types, and S is the score of the target topic;
Figure FDA0002667239530000035
in the above formula, Penaltya(Qs) Representing a second correction cost, QsAs a revised SQL answer, C (Q)s) SQL answer QsSet of clauses of, I (C)i) Is a clause CiIf clause CiIs adjusted to I (C)i) 1, and vice versa I (C)i)=0,
Figure FDA0002667239530000041
Is a clause CiAnd W is the sum of the weighted values of all clause types, and S is the score of the target topic.
8. The SQL query-oriented refinement scoring method according to claim 1, wherein the function expression adopted when the refinement score of the student SQL answer is calculated in the step 4) is as follows:
Figure FDA0002667239530000042
in the above formula, G (Q)s) For the refined score of the student 'S SQL answer, S is the student' S submitted original SQL answer QsScore of (1), Penaltyc(Qs) Represents a first correction cost, Penaltya(Qs) Indicating a second correction cost, Penaltyt(Q′s,Q’i) Is a canonical SQL statement Q'sAnd a canonical SQL statement Q'iThe cost of the conversion between (a) and (b),
Figure FDA0002667239530000043
representing all canonical SQL statements Q'sAnd a canonical SQL statement Q'iThe minimum in the transition costs between.
9. A refined scoring system for SQL queries, comprising a computer device, wherein the computer device is programmed or configured to perform the steps of the refined scoring method for SQL queries according to any one of claims 1 to 8, or a computer program programmed or configured to perform the refined scoring method for SQL queries according to any one of claims 1 to 8 is stored in a memory of the computer device.
10. A computer-readable storage medium having stored thereon a computer program programmed or configured to perform the SQL query-oriented refinement scoring method according to any one of claims 1 to 8.
CN202010922595.1A 2020-09-04 2020-09-04 SQL query-oriented refinement scoring method Active CN112132420B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010922595.1A CN112132420B (en) 2020-09-04 2020-09-04 SQL query-oriented refinement scoring method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010922595.1A CN112132420B (en) 2020-09-04 2020-09-04 SQL query-oriented refinement scoring method

Publications (2)

Publication Number Publication Date
CN112132420A true CN112132420A (en) 2020-12-25
CN112132420B CN112132420B (en) 2023-11-28

Family

ID=73847324

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010922595.1A Active CN112132420B (en) 2020-09-04 2020-09-04 SQL query-oriented refinement scoring method

Country Status (1)

Country Link
CN (1) CN112132420B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112687140A (en) * 2021-01-06 2021-04-20 北京智联友道科技有限公司 Assessment automatic scoring method, device and system
CN113408298A (en) * 2021-06-30 2021-09-17 北京百度网讯科技有限公司 Semantic analysis method and device, electronic equipment and storage medium
CN114625750A (en) * 2022-02-17 2022-06-14 中国人民解放军空军工程大学 Method for automatically judging whether SQL query statement is correct or not by robust computer
CN116468577A (en) * 2023-03-20 2023-07-21 中慧云启科技集团有限公司 Teaching practical training management system based on B/S architecture

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070299836A1 (en) * 2006-06-23 2007-12-27 Xue Qiao Hou Database query language transformation method, transformation apparatus and database query system
CN102023921A (en) * 2010-12-17 2011-04-20 江苏大学 Automatic grading method and device of structured query language (SQL) program
CN108153894A (en) * 2017-12-29 2018-06-12 上海跬智信息技术有限公司 A kind of method of OLAP data model automatic modeling, grader
CN110164216A (en) * 2019-05-23 2019-08-23 福建工程学院 A kind of SQL Online Judge system
CN110378818A (en) * 2019-07-22 2019-10-25 广西大学 Personalized exercise recommended method, system and medium based on difficulty
CN110471936A (en) * 2019-08-19 2019-11-19 福建工程学院 A kind of hybrid SQL automatic scoring method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070299836A1 (en) * 2006-06-23 2007-12-27 Xue Qiao Hou Database query language transformation method, transformation apparatus and database query system
CN102023921A (en) * 2010-12-17 2011-04-20 江苏大学 Automatic grading method and device of structured query language (SQL) program
CN108153894A (en) * 2017-12-29 2018-06-12 上海跬智信息技术有限公司 A kind of method of OLAP data model automatic modeling, grader
CN110164216A (en) * 2019-05-23 2019-08-23 福建工程学院 A kind of SQL Online Judge system
CN110378818A (en) * 2019-07-22 2019-10-25 广西大学 Personalized exercise recommended method, system and medium based on difficulty
CN110471936A (en) * 2019-08-19 2019-11-19 福建工程学院 A kind of hybrid SQL automatic scoring method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王倩 等: "有语法错误的编程题自动评分方法研究――用局部语法分析和采分点匹配实现", 《计算机工程与应用》, vol. 46, no. 17, pages 239 - 242 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112687140A (en) * 2021-01-06 2021-04-20 北京智联友道科技有限公司 Assessment automatic scoring method, device and system
CN113408298A (en) * 2021-06-30 2021-09-17 北京百度网讯科技有限公司 Semantic analysis method and device, electronic equipment and storage medium
CN114625750A (en) * 2022-02-17 2022-06-14 中国人民解放军空军工程大学 Method for automatically judging whether SQL query statement is correct or not by robust computer
CN114625750B (en) * 2022-02-17 2024-04-12 中国人民解放军空军工程大学 Method for automatically judging correctness of SQL query statement by robust computer
CN116468577A (en) * 2023-03-20 2023-07-21 中慧云启科技集团有限公司 Teaching practical training management system based on B/S architecture
CN116468577B (en) * 2023-03-20 2024-03-08 中慧云启科技集团有限公司 Teaching practical training management system based on B/S architecture

Also Published As

Publication number Publication date
CN112132420B (en) 2023-11-28

Similar Documents

Publication Publication Date Title
CN112132420A (en) SQL query-oriented refinement scoring method
US11604792B2 (en) Method for constructing SQL statement based on actor-critic network
Karagiannis et al. Scrutinizer: A mixed-initiative approach to large-scale, data-driven claim verification
CN114036281B (en) Knowledge graph-based citrus control question-answering module construction method and question-answering system
CN112380325A (en) Knowledge graph question-answering system based on joint knowledge embedded model and fact memory network
CN111026884A (en) Dialog corpus generation method for improving quality and diversity of human-computer interaction dialog corpus
CN110471936A (en) A kind of hybrid SQL automatic scoring method
CN115934914A (en) Intention multi-classification question and answer method, device, equipment and medium based on knowledge graph
CN110164216B (en) SQL online evaluation system
WO2020168702A1 (en) Template-based automatic software defect question and answer method
CN112528011B (en) Open type mathematic operation correction method, system and equipment driven by multiple data sources
Chukhray et al. The method of student's query analysis while intelligent computer tutoring in SQL
CN112214507B (en) Automatic checking method for correctness of DML statement
CN117290376A (en) Two-stage Text2SQL model, method and system based on large language model
CN111309930A (en) Medical knowledge graph entity alignment method based on representation learning
CN111897829A (en) Natural language query method and equipment for medical software
CN115878814A (en) Knowledge graph question-answering method and system based on machine reading understanding
US6598019B1 (en) Evaluation method, apparatus, and recording medium using optimum template pattern determination method, apparatus and optimum template pattern
Chang et al. TREC 2003 Question Answering Track at CAS-ICT.
Deshmukh et al. Automatic text-to-SQL machine translation for scholarly publication database search
He et al. Application of Grammar Error Detection Method for English Composition Based on Machine Learning
CN115238705A (en) Semantic analysis result reordering method and system
CN114328823A (en) Database natural language query method and device, electronic equipment and storage medium
CN114282497A (en) Method and system for converting text into SQL
CN112905747A (en) Professional system archive question-answering robot system based on semantic analysis technology

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant