CN112700203B - Intelligent marking method and device - Google Patents


Info

Publication number
CN112700203B
Authority
CN
China
Prior art keywords
answer
student
sentence
standard
key
Prior art date
Legal status
Active
Application number
CN201911012221.XA
Other languages
Chinese (zh)
Other versions
CN112700203A (en)
Inventor
向宇
刘琼琼
彭守业
Current Assignee
Beijing Yizhen Xuesi Education Technology Co Ltd
Original Assignee
Beijing Yizhen Xuesi Education Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Yizhen Xuesi Education Technology Co Ltd
Priority to CN201911012221.XA
Publication of CN112700203A
Application granted
Publication of CN112700203B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/10Office automation; Time management

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Human Resources & Organizations (AREA)
  • Operations Research (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The embodiments of the present application provide an intelligent marking method and device. A standard text and a text to be marked that correspond to a question are obtained, where the standard text comprises at least one standard field and the text to be marked comprises at least one field to be marked. For each field to be marked in the text to be marked, the semantic similarity between that field and the standard fields in the standard text is calculated to obtain corresponding semantic similarity result data. A scoring point set of the text to be marked is obtained according to the semantic similarity result data, and the score of the text to be marked for the question is obtained from the scores of the standard fields corresponding to the fields to be marked in the scoring point set. Intelligent marking is thereby realized: it saves time and labor, improves marking efficiency, reduces the influence of subjective human factors on examination results during marking, and ensures that marking is objective, fair, and accurate.

Description

Intelligent marking method and device
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to an intelligent marking method and device.
Background
In the field of education, examinations are commonly used to assess how well students have mastered what they have learned: students sit an examination, their answers are scored, and their learning progress is judged from the scores they obtain.
Examination questions are generally divided into subjective questions and objective questions. For objective questions, a student selects one or more answers from several options. Because the answers to objective questions are fixed, they can easily be scored by a computer; compared with manual scoring, computer scoring shortens the scoring time, saves labor cost, and improves scoring efficiency. Subjective questions, however, are usually answered in a discursive way, and each student answers according to his or her own understanding and way of thinking. The standard answer to a subjective question can therefore only serve as a reference rather than an absolute standard, and a computer cannot simply decide how many points each sentence of a student's answer deserves.
When subjective questions are scored by manual marking, intermediate steps such as binding, marking, and transferring test papers consume a great deal of labor and time, so marking efficiency is low. Manual marking is also highly subjective: different markers may give different, sometimes widely different, scores to the same answer, and even the same marker may understand the same question differently at different times, which likewise leads to differences in the scores given.
Disclosure of Invention
In view of the above, an objective of the embodiments of the present application is to provide an intelligent marking method and device that overcome the above defects in the prior art.
In one aspect, an embodiment of the present application provides an intelligent marking method, including:
acquiring a standard text and a text to be marked corresponding to a question, where the standard text comprises at least one standard field and the text to be marked comprises at least one field to be marked; for each field to be marked in the text to be marked, calculating the semantic similarity between the field to be marked and the standard fields in the standard text to obtain corresponding semantic similarity result data; obtaining a scoring point set of the text to be marked according to the semantic similarity result data; and obtaining the score of the text to be marked for the question according to the scores corresponding to the standard fields that correspond to the fields to be marked in the scoring point set.
In another aspect, an embodiment of the present application provides an intelligent marking device, including:
an acquisition unit, configured to acquire a standard text and a text to be marked corresponding to a question, where the standard text comprises at least one standard field and the text to be marked comprises at least one field to be marked;
a similarity calculation unit, configured to calculate, for each field to be marked in the text to be marked, the semantic similarity between the field to be marked and the standard fields in the standard text to obtain corresponding semantic similarity result data;
a scoring point determining unit, configured to obtain a scoring point set of the text to be marked according to the semantic similarity result data;
and a scoring unit, configured to obtain the score of the text to be marked for the question according to the scores corresponding to the standard fields that correspond to the fields to be marked in the scoring point set.
In another aspect, an embodiment of the present application provides an electronic device, including:
one or more processors;
a storage device storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of the above embodiments.
In yet another aspect, the present application provides a computer-readable storage medium on which a computer program is stored, and the computer program, when executed by a processor, implements the method according to any one of the above embodiments.
According to the intelligent marking method and device, a standard text and a text to be marked corresponding to a question are obtained, where the standard text comprises at least one standard field and the text to be marked comprises at least one field to be marked; for each field to be marked in the text to be marked, the semantic similarity between the field to be marked and the standard fields in the standard text is calculated to obtain corresponding semantic similarity result data; a scoring point set of the text to be marked is obtained according to the semantic similarity result data; and the score of the text to be marked for the question is obtained according to the scores corresponding to the standard fields that correspond to the fields to be marked in the scoring point set. Intelligent marking is thereby realized, saving time and labor, improving marking efficiency, reducing the influence of subjective human factors on examination results during marking, and ensuring that marking is objective and fair.
Drawings
Some specific embodiments of the present application will be described in detail hereinafter by way of illustration and not limitation with reference to the accompanying drawings. The same reference numbers in the drawings identify the same or similar elements or components. Those skilled in the art will appreciate that the drawings are not necessarily to scale. In the drawings:
Fig. 1 is a schematic flowchart of an intelligent marking method according to an embodiment of the present application.
Fig. 2 is a schematic flowchart of an intelligent marking method in the second embodiment of the present application.
Fig. 3 is a schematic flowchart of an intelligent marking method in the third embodiment of the present application.
Fig. 4 is a schematic flowchart of an intelligent marking method in the fourth embodiment of the present application.
Fig. 5 is a schematic structural diagram of an intelligent marking device in the fifth embodiment of the present application.
Fig. 6 is a schematic structural diagram of an intelligent marking device in the sixth embodiment of the present application.
Fig. 7 is a schematic structural diagram of an intelligent marking device in the seventh embodiment of the present application.
Fig. 8 is a schematic structural diagram of an intelligent marking device in the eighth embodiment of the present application.
Fig. 9 is a schematic structural diagram of an electronic device according to a ninth embodiment of the present application.
Fig. 10 is a hardware structure of an electronic device in a tenth embodiment of the present application.
Detailed Description
It is not necessary for any particular embodiment of the invention to achieve all of the above advantages at the same time.
In order to make those skilled in the art better understand the technical solutions in the embodiments of the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application shall fall within the scope of protection of the embodiments in the present application.
A test paper generally contains both objective questions and subjective questions. The intelligent marking in the embodiments of the present application is mainly used for intelligently marking the subjective questions on a test paper, and it is applicable to subjective questions of any subject, such as Chinese, politics, or history, which is not limited here.
The following further describes specific implementations of embodiments of the present application with reference to the drawings of the embodiments of the present application.
Fig. 1 is a schematic flowchart of an intelligent marking method according to an embodiment of the present application. As shown in Fig. 1, the method includes:
Step S101, acquiring a standard text and a text to be marked corresponding to the question, where the standard text comprises at least one standard field and the text to be marked comprises at least one field to be marked.
In this embodiment, a corresponding standard text is preset for each question on the test paper and serves as the reference answer during marking. A subjective question has no single fixed reference answer, so one, two, or more reference answers may be set, and there may accordingly be one, two, or more corresponding standard texts, which is not limited here. In addition, since the answer to a subjective question may contain several statements, each standard text comprises at least one standard field.
It should be noted that the form of the standard text is not limited; it may be, for example, text, a picture, or a combination of text and pictures.
In this embodiment, the manner of obtaining the standard text is not limited. For example, a correspondence between the standard text and the question may be established, and the standard text and the correspondence may be stored. During marking, the question is determined first, and the standard text corresponding to the question is then determined according to the question and the stored correspondence.
Optionally, if the standard text and its correspondence with the question had to be stored before every marking session, the preparation work before marking would be cumbersome. To avoid this, a standard text library may be established in which the standard texts of all questions and the correspondences between standard texts and questions are stored. When the scoring points of a question are to be determined, the standard text corresponding to the question is looked up in the standard text library, which reduces the preparation work before marking and improves marking efficiency.
In this embodiment, each question on the test paper also corresponds to the answer content of a student, i.e., the text to be marked. Likewise, since the answer to a subjective question may contain several statements, each text to be marked comprises at least one field to be marked. It should be noted that the form of the text to be marked is not limited; it may be, for example, text, a picture, or a combination of text and pictures.
In this embodiment, the manner of obtaining the text to be marked is not limited. Before marking, the answers handwritten by students for the question may be scanned and stored as texts to be marked (which may be referred to as offline acquisition), or students may answer directly on a computer and their answers may be stored as texts to be marked (which may be referred to as online acquisition).
It should be noted that a field may be a sentence or a word, and may also be defined as required for marking, which is not limited here. When a field is a sentence, different sentences may be divided according to the punctuation in the standard text or the text to be marked; for example, the characters between two periods may be taken as one sentence, or the characters between any two punctuation marks may be taken as one sentence, which is not limited here.
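For illustration only, the following Python sketch shows one possible way to split a text into sentence fields at punctuation marks as described above; the function name and the exact punctuation set are assumptions and are not prescribed by this embodiment.

import re

def split_into_sentence_fields(text: str) -> list:
    # Illustrative assumption: split at common Chinese/English sentence-ending
    # punctuation; this embodiment leaves the exact splitting rule open.
    parts = re.split(r"[。！？!?；;]", text)
    return [part.strip() for part in parts if part.strip()]

# Example: a student's answer text becomes a list of fields to be marked.
fields = split_into_sentence_fields("中国地大物博。我很高兴去中国旅游！")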
Step S102, for each field to be marked in the text to be marked, calculating the semantic similarity between the field to be marked and the standard fields in the standard text to obtain corresponding semantic similarity result data.
In this embodiment, the semantic similarity calculated for each field to be marked is the semantic similarity between that field and each standard field in the standard text corresponding to the question. It should be noted that various methods may be used to calculate the semantic similarity, which is not limited here as long as the semantic similarity can be obtained. For example, the semantic similarity may be obtained by calculating the Euclidean distance between the standard field and the field to be marked, or by calculating the Pearson distance between the standard text and the text to be marked.
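As a minimal sketch of the Euclidean-distance option mentioned above, the two fields could first be mapped to vectors and the distance then converted into a similarity score; the 1/(1+distance) mapping is an assumption (the embodiment only names the distance), and the vectorization is assumed to be provided elsewhere, for example by the TF-IDF-weighted word-vector construction of the fourth embodiment.

import numpy as np

def euclidean_similarity(field_vec_a: np.ndarray, field_vec_b: np.ndarray) -> float:
    # Map the Euclidean distance between two field vectors into (0, 1],
    # so that identical vectors yield a similarity of 1.0.
    distance = float(np.linalg.norm(field_vec_a - field_vec_b))
    return 1.0 / (1.0 + distance)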
In addition, during marking, when the semantic similarity between the fields to be marked and the standard fields is calculated, the fields to be marked may be processed one by one, or all fields to be marked may be processed at the same time, which is not limited here.
In this embodiment, it is taken into account that even when a student gives a correct answer, the text to be marked and the standard text may not be literally identical, because their language and logic may differ. If intelligent marking simply judged whether the characters in the text to be marked were identical to those in the standard text, the accuracy of marking would suffer. Therefore, whether the two match is determined by calculating the semantic similarity between the text to be marked and the standard text, and the score of the text to be marked is determined on that basis, which improves the accuracy of the calculated score.
Step S103, obtaining a scoring point set of the text to be marked according to the semantic similarity result data.
In this embodiment, whether a field to be marked matches a standard field can be judged from the semantic similarity result data. If it matches, the field to be marked is added to the scoring point set of the text to be marked; if not, it is not added. The resulting scoring point set of the text to be marked contains the fields to be marked that match standard fields.
Step S104, obtaining the score of the text to be marked for the question according to the scores corresponding to the standard fields that correspond to the fields to be marked in the scoring point set.
In this embodiment, the scores corresponding to the standard fields that correspond to all fields to be marked in the scoring point set may simply be added up to obtain the score of the text to be marked for the question. Alternatively, a weight may be set for the score corresponding to each standard field, and the score of the text to be marked may be obtained by taking a weighted average of the scores corresponding to the standard fields that correspond to the fields to be marked, which is not limited here.
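Putting steps S101 to S104 together, a simplified end-to-end Python sketch might look as follows. The similarity function, the semantic similarity threshold, and the per-field scores are placeholders supplied by the caller, and only the simple summation branch of step S104 is shown; none of the concrete values are prescribed by this embodiment.

def mark_text(fields_to_mark, standard_fields, field_scores,
              similarity_fn, threshold=0.6):
    # fields_to_mark : fields of the text to be marked (S101)
    # standard_fields: fields of the standard text (S101)
    # field_scores   : score assigned to each standard field
    # similarity_fn  : callable returning the semantic similarity of two fields
    # threshold      : assumed semantic similarity threshold
    scoring_point_set = []
    total_score = 0.0
    for field in fields_to_mark:
        # S102: semantic similarity between this field and every standard field.
        sims = [similarity_fn(field, std) for std in standard_fields]
        best = max(range(len(standard_fields)), key=lambda i: sims[i])
        # S103: keep the field if it matches some standard field well enough.
        if sims[best] >= threshold:
            scoring_point_set.append((field, best))
            # S104: add the score of the matched standard field.
            total_score += field_scores[best]
    return scoring_point_set, total_score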
According to the intelligent marking method, a standard text and a text to be marked corresponding to a question are obtained, where the standard text comprises at least one standard field and the text to be marked comprises at least one field to be marked; for each field to be marked, the semantic similarity between the field to be marked and the standard fields in the standard text is calculated to obtain corresponding semantic similarity result data; a scoring point set of the text to be marked is obtained according to the semantic similarity result data; and the score of the text to be marked for the question is obtained according to the scores corresponding to the standard fields that correspond to the fields to be marked in the scoring point set. Intelligent marking is thereby realized, saving time and labor, improving marking efficiency, reducing the influence of subjective human factors on examination results during marking, and ensuring that marking is objective, fair, and accurate.
Fig. 2 is a schematic flowchart of an intelligent marking method in the second embodiment of the present application. This embodiment differs from the above embodiment in that the standard text is a standard answer text, the text to be marked is a student answer text, a standard field is a standard answer sentence, and a field to be marked is a student answer sentence. As shown in Fig. 2, the method includes:
step S201, obtaining a standard answer text and a student answer text corresponding to the question.
In this embodiment, the standard text is a standard answer text, and the text to be marked is a student answer text. Correspondingly, a standard field in the standard text corresponds to a standard answer sentence, and a field to be marked in the text to be marked corresponds to a student answer sentence.
Step S202, for each student answer sentence in the student answer text, calculating the semantic similarity between the student answer sentence and the standard answer sentences in the standard answer text to obtain corresponding semantic similarity result data.
In this embodiment, calculating the semantic similarity between the student answer sentence and the standard answer sentence in the standard answer text includes:
carrying out word segmentation on the student answer sentence, and extracting a student answer keyword set; traversing each standard answer sentence in the standard answer text, performing word segmentation processing on each traversed standard answer sentence, and extracting a standard answer keyword set corresponding to each standard answer sentence; calculating the similarity between the standard answer keyword set and the student answer keyword set; and determining semantic similarity between the student answer sentence and the standard answer sentence in the standard answer text according to the similarity between the standard answer keyword set and the student answer keyword set.
In this embodiment, the similarity between the standard answer keyword set and the student answer keyword set can be obtained through the Jaccard similarity coefficient of the two sets. The Jaccard similarity coefficient is used to compare the similarity and difference between finite sample sets.
In this embodiment, before the Jaccard similarity coefficient is calculated, the student answer sentence and the standard answer sentence are first segmented into words to obtain the student answer keyword set and the standard answer keyword set. In particular, word segmentation tools such as THULAC or NLPIR may be employed. To increase the word segmentation speed and improve its accuracy, the Jieba word segmentation tool is preferably used.
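As a small sketch of the keyword-extraction step, assuming the Jieba tool mentioned above and an illustrative minimal stop-word list (the stop-word filtering is an added assumption):

import jieba

STOP_WORDS = {"的", "了", "是", "和", "在"}  # assumed minimal stop-word list

def extract_keyword_set(sentence: str) -> set:
    # Segment the sentence with Jieba and keep the remaining tokens as the
    # (student or standard) answer keyword set.
    return {w for w in jieba.lcut(sentence)
            if w.strip() and w not in STOP_WORDS}

student_keywords = extract_keyword_set("我很开心地去中国旅游")
standard_keywords = extract_keyword_set("去中国旅游让人高兴")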
In this embodiment, the standard answer keyword set and the student answer keyword set are obtained by running a word segmentation tool on the standard answer sentence and the student answer sentence respectively, rather than being defined manually, which avoids the influence of human factors on the intelligent marking process and improves the accuracy of intelligent marking.
In this embodiment, if the Jaccard similarity coefficient of the standard answer keyword set and the student answer keyword set were calculated directly, two keywords would be counted as similar only when they are exactly identical. In practice, however, it is enough for a keyword in the student answer text to have a similar meaning to a keyword in the standard answer text: for example, if the student answer keyword and the standard answer keyword are synonyms (such as "glad" and "happy"), the directly calculated Jaccard similarity coefficient would be 0 although the true similarity is 1, which can lead to wrong or missed judgments.
To avoid this, a first key-value pair list may be established; key-value pairs are added to each element in the first key-value pair list according to the standard answer keywords in the standard answer keyword set and the student answer keywords in the student answer keyword set; the cosine similarity between the i-th element and each element after it in the first key-value pair list is calculated, and the key-value pairs of those later elements whose cosine similarity with the i-th element satisfies the element cosine similarity threshold are modified, giving a second key-value pair list, where i is a positive integer; a standard answer key-value pair set and a student answer key-value pair set are obtained from the second key-value pair list; and the similarity between the standard answer key-value pair set and the student answer key-value pair set is calculated, i.e., the Jaccard similarity coefficient of the two key-value pair sets.
In this case, determining the semantic similarity between the student answer sentence and the standard answer sentences in the standard answer text according to the similarity between the standard answer keyword set and the student answer keyword set includes: determining the semantic similarity between the student answer sentence and the standard answer sentences according to the similarity between the standard answer key-value pair set and the student answer key-value pair set. It should be noted that, in other embodiments, the semantic similarity between the student answer sentence and the standard answer sentences may also be determined in other ways, which is not limited here.
Optionally, adding key-value pairs to each element in the first key-value pair list according to the standard answer keywords in the standard answer keyword set and the student answer keywords in the student answer keyword set includes:
for the k-th student answer keyword, adding key-value pairs containing identification information characterizing student answer keywords and sequence information to the x-th element in the first key-value pair list, where 1 ≤ k ≤ m and m is the number of student answer keywords in the student answer keyword set;
for the r-th standard answer keyword, adding key-value pairs containing identification information characterizing standard answer keywords and sequence information to the y-th element in the first key-value pair list, where 1 ≤ r ≤ n and n is the number of standard answer keywords in the standard answer keyword set;
where either 1 ≤ x ≤ m and m + 1 ≤ y ≤ m + n, or 1 ≤ y ≤ n and n + 1 ≤ x ≤ n + m; and x, y, k, m, r and n are all positive integers.
In this embodiment, when key-value pairs are added to the elements of the first key-value pair list, key-value pairs containing identification information characterizing the student answer keywords may first be added, in order, to the 1st to m-th elements according to the student answer keywords, and key-value pairs containing identification information characterizing the standard answer keywords may then be added, in order, to the (m+1)-th to last (i.e., (m+n)-th) elements according to the standard answer keywords; in this case 1 ≤ x ≤ m and m + 1 ≤ y, i.e., in the first key-value pair list the first m elements correspond to the student answer keywords and the last n elements correspond to the standard answer keywords. Alternatively, key-value pairs containing identification information characterizing the standard answer keywords may first be added, in order, to the 1st to n-th elements according to the standard answer keywords, and key-value pairs containing identification information characterizing the student answer keywords may then be added, in order, to the (n+1)-th to last (i.e., (m+n)-th) elements according to the student answer keywords; in this case 1 ≤ y ≤ n and n + 1 ≤ x, i.e., the first n elements correspond to the standard answer keywords and the last m elements correspond to the student answer keywords. The specific order of addition is not limited here.
In this embodiment, the identification information characterizing a standard answer keyword and the sequence information may be carried in a single key-value pair, or in two or more key-value pairs, which is not limited here. The identification information characterizing a standard answer keyword includes a word attribute identifying the standard answer keyword and the standard answer keyword set to which it belongs; the sequence information may be the order of the standard answer keyword within the standard answer keyword set, or its order among all standard answer keywords in the first key-value pair list, or its order within the first key-value pair list as a whole. Similarly, the identification information characterizing a student answer keyword and the sequence information may be carried in a single key-value pair or in two or more key-value pairs, which is not limited here. The identification information characterizing a student answer keyword includes a word attribute identifying the student answer keyword and the student answer keyword set to which it belongs; the sequence information may be the order of the student answer keyword within the student answer keyword set, or its order among all student answer keywords in the first key-value pair list, or its order within the first key-value pair list as a whole. It should be noted that the sequence information only needs to reflect the order of the keywords in the set or the list, and its specific form is not limited here.
In this embodiment, taking the case 1 ≤ x ≤ m and m + 1 ≤ y as an example, the sequence information contained in the key-value pairs added for the student answer keywords may be 1 to m, and the sequence information contained in the key-value pairs added for the standard answer keywords may be -1 to -n or m + 1 to m + n, which is not limited here.
Optionally, in a specific implementation scenario, three key-value pairs are added to each element in the first key-value pair list: the key of the first key-value pair is the word attribute and its value is the keyword; the key of the second key-value pair is the category attribute and its value indicates the student answer keyword set or the standard answer keyword set; the key of the third key-value pair is the variable attribute and its value is the sequence information of the student answer keyword or of the standard answer keyword. For example, if the 3rd keyword in the standard answer keyword set is "happy", three key-value pairs are added to the corresponding element: the key of the first is the word attribute and its value is "happy"; the key of the second is the category attribute and its value is "standard answer keyword set" or a letter denoting the standard answer category attribute; the key of the third is the variable attribute and its value is "3".
Optionally, modifying the key-value pairs of the elements after the i-th element whose cosine similarity with the i-th element satisfies the element cosine similarity threshold includes: modifying the sequence information in the key-value pairs of such an element to the sequence information in the key-value pairs of the i-th element. For example, when 1 ≤ x ≤ m and m + 1 ≤ y, starting from the 1st element of the first key-value pair list, the cosine similarities between the word vectors of the 2nd to (m+n)-th elements and that of the 1st element are judged; if the cosine similarity between the word vector of the p-th element (2 ≤ p ≤ m + n) and that of the 1st element satisfies the element cosine similarity threshold, the sequence information of the key-value pairs of the p-th element is modified to the sequence information of the key-value pairs of the 1st element. The cosine similarities between the word vectors of the 3rd to (m+n)-th elements and that of the 2nd element are then judged, and each element is processed in turn until the cosine similarity between the word vector of the (m+n)-th element and that of the second-to-last element has been judged. It should be noted that the word vector of each element may be obtained by looking it up in a corpus word vector library, for example the People's Daily corpus word vector library. The cosine similarity threshold may be set as required and is not limited here.
Optionally, obtaining the standard answer key-value pair set and the student answer key-value pair set from the second key-value pair list includes: when 1 ≤ x ≤ m and m + 1 ≤ y, obtaining, according to the identification information characterizing student answers contained in the key-value pairs, the sequence information of the 1st to m-th elements from the second key-value pair list to form the student answer key-value pair set, and obtaining, according to the identification information characterizing standard answers, the sequence information of the (m+1)-th to (m+n)-th elements to form the standard answer key-value pair set;
when 1 ≤ y ≤ n and n + 1 ≤ x, obtaining, according to the identification information characterizing standard answers contained in the key-value pairs, the sequence information of the 1st to n-th elements from the second key-value pair list to form the standard answer key-value pair set, and obtaining, according to the identification information characterizing student answers, the sequence information of the (n+1)-th to (n+m)-th elements to form the student answer key-value pair set. It should be noted that, in other embodiments, the standard answer key-value pair set and the student answer key-value pair set may be obtained in other manners, which is not limited here.
It should be noted that, in other embodiments, the standard answer key-value pair set and the student answer key-value pair set may further include data representing standard answer identification information and data representing student answer identification information, respectively, besides the sequence information, and this is not limited herein.
In addition, when the standard answer key value pair set and the student answer key value pair set only contain sequence information, elements with the same sequence information in the sets can be merged according to the mutual difference of the sets in the process of generating the standard answer key value pair set and the student answer key value pair set.
Optionally, the specific process of calculating the similarity between the standard answer key-value pair set and the student answer key-value pair set includes: counting the number of pieces of sequence information in the intersection of the two sets and the number of pieces of sequence information in their union; the ratio of the former to the latter is the similarity of the standard answer key-value pair set and the student answer key-value pair set. For example, if the standard answer key-value pair set New-P = {1, 2, 3, 4} and the student answer key-value pair set New-Q = {1, 2, 3}, then the similarity of the two sets, i.e., the Jaccard similarity coefficient, is |New-P ∩ New-Q| / |New-P ∪ New-Q| = 3/4 = 0.75. Of course, in other embodiments, the similarity between the standard answer key-value pair set and the student answer key-value pair set may be calculated by other suitable methods, which are not limited here.
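The ratio described above is simply the Jaccard coefficient of the two sets of sequence information; a short sketch reproducing the 3/4 example is:

def jaccard(new_p: set, new_q: set) -> float:
    # |intersection| / |union| of the two sequence-information sets.
    return len(new_p & new_q) / len(new_p | new_q)

assert jaccard({1, 2, 3, 4}, {1, 2, 3}) == 0.75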
The following describes an example of calculating the similarity between the student answer keyword set and the standard answer keyword set.
Let the standard answer keyword set be P = {happy, China, travel} and the student answer keyword set be Q = {happy, Chinese}.
First, a first key-value pair list is established, and 3 key-value pairs are added in turn to the 1st to 3rd elements of the list according to the standard answer keywords. The key of the 1st key-value pair of the 1st element is the word attribute and its value is "happy"; the key of the 2nd key-value pair is the category attribute and its value is P; the key of the 3rd key-value pair is the variable attribute and its value is 1. The key of the 1st key-value pair of the 2nd element is the word attribute and its value is "China"; the key of the 2nd key-value pair is the category attribute and its value is P; the key of the 3rd key-value pair is the variable attribute and its value is 2. The key of the 1st key-value pair of the 3rd element is the word attribute and its value is "travel"; the key of the 2nd key-value pair is the category attribute and its value is P; the key of the 3rd key-value pair is the variable attribute and its value is 3.
Secondly, 3 key-value pairs are added in turn to the 4th and 5th elements of the first key-value pair list according to the student answer keywords. The key of the 1st key-value pair of the 4th element is the word attribute and its value is "happy"; the key of the 2nd key-value pair is the category attribute and its value is Q; the key of the 3rd key-value pair is the variable attribute and its value is -1. The key of the 1st key-value pair of the 5th element is the word attribute and its value is "Chinese"; the key of the 2nd key-value pair is the category attribute and its value is Q; the key of the 3rd key-value pair is the variable attribute and its value is -2.
Thirdly, starting from the 1st element, the cosine similarities between the 2nd to 5th elements and the 1st element are judged; if the cosine similarity of the 4th element with the 1st element satisfies the element cosine similarity threshold, the value of the 3rd key-value pair of the 4th element is modified to 1. The same procedure is then repeated in order to modify the values of the 3rd key-value pairs of the other elements, and the second key-value pair list is obtained after these modifications.
Fourthly, according to the 2nd key-value pair of each element in the second key-value pair list, the values of the 3rd key-value pairs of the 1st to 3rd elements are collected to form the standard answer key-value pair set New-P = {1, 2, 3}, and the values of the 3rd key-value pairs of the 4th and 5th elements are collected to form the student answer key-value pair set New-Q = {1, 2}; the Jaccard similarity coefficient is then |New-P ∩ New-Q| / |New-P ∪ New-Q| = 2/3 ≈ 0.667.
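The whole keyword-alignment procedure described above can be sketched compactly as follows. The sketch builds the first key-value pair list (word, category, and sequence attributes), rewrites the sequence information of later elements whose word vectors are cosine-similar to an earlier element above the threshold, and then computes the Jaccard coefficient of the resulting sequence-information sets. The word-vector lookup function and the threshold value are assumptions; any pretrained embedding table could stand in for the corpus word vector library.

import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def keyword_set_similarity(standard_kws, student_kws, word_vec, sim_threshold=0.8):
    # First key-value pair list: standard keywords first (category "P"),
    # then student keywords (category "Q"); "seq" holds the sequence information.
    elements = [{"word": w, "category": "P", "seq": i + 1}
                for i, w in enumerate(standard_kws)]
    elements += [{"word": w, "category": "Q", "seq": -(j + 1)}
                 for j, w in enumerate(student_kws)]
    # Modify later elements whose word vectors are cosine-similar to an
    # earlier element: copy that earlier element's sequence information.
    for i in range(len(elements)):
        for j in range(i + 1, len(elements)):
            if cosine(word_vec(elements[i]["word"]),
                      word_vec(elements[j]["word"])) >= sim_threshold:
                elements[j]["seq"] = elements[i]["seq"]
    # Second key-value pair list -> sequence-information sets -> Jaccard.
    new_p = {e["seq"] for e in elements if e["category"] == "P"}
    new_q = {e["seq"] for e in elements if e["category"] == "Q"}
    return len(new_p & new_q) / len(new_p | new_q)

With the example above (P = {happy, China, travel}, Q = {happy, Chinese}) and an embedding in which the "happy"/"happy" and "China"/"Chinese" pairs clear the threshold while the other pairs do not, the function returns 2/3 ≈ 0.667, matching the worked result.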
In this embodiment, after the similarity between the standard answer keyword set and the student answer keyword set has been calculated, it may be used directly as the semantic similarity between the student answer sentence and the standard answer sentence in the standard answer text; alternatively, the similarity may be normalized or weighted and averaged to obtain the semantic similarity, which is not limited here.
Step S203, obtaining the student's answer scoring point set according to each piece of semantic similarity result data.
In this embodiment, according to each piece of semantic similarity result data, a student answer sentence for which there exists a standard answer sentence in the standard answer text whose semantic similarity with it is not less than the semantic similarity threshold is added to the student's answer scoring point set, thereby obtaining the student's answer scoring point set. It should be noted that the semantic similarity threshold may be set as required and is not limited here.
Step S204, obtaining the student's score for the question according to the scores corresponding to the standard answer sentences that correspond to the student answer sentences in the student's answer scoring point set.
In this embodiment, steps S203 and S204 are similar to steps S103 and S104, and are not described again here.
Fig. 3 is a schematic flowchart of an intelligent marking method in the third embodiment of the present application. This embodiment differs from the above embodiment in that the sentence component similarity between the student answer sentences and the standard answer sentences is also calculated. As shown in Fig. 3, the method includes:
step S301, a standard answer text and a student answer text corresponding to the question are obtained.
Step S302, for each student answer sentence in the student answer text, calculating the semantic similarity between the student answer sentence and the standard answer sentences in the standard answer text to obtain corresponding semantic similarity result data.
Step S303, obtaining the semantically similar answer set of the student according to each piece of semantic similarity result data.
In this embodiment, according to each piece of semantic similarity result data, a student answer sentence for which there exists a standard answer sentence in the standard answer text whose semantic similarity with it is not less than the semantic similarity threshold is added to the semantically similar answer set of the student, thereby obtaining the semantically similar answer set of the student. It should be noted that the semantic similarity threshold may be set as required and is not limited here.
Step S304, for each student answer sentence in the semantically similar answer set of the student, calculating the sentence component similarity between the student answer sentence and the standard answer sentences in the standard answer text to obtain corresponding sentence component similarity result data.
In this embodiment, it is taken into account that if the student's score for the question were obtained directly from every student answer sentence whose semantic similarity result data is not less than the semantic similarity threshold, wrong or missed judgments could occur and the score could be inaccurate. To improve the accuracy of the score, the sentence component similarity between the student answer sentence and the standard answer sentences in the standard answer text is further calculated, and whether the student answer sentence matches a standard answer sentence is judged again according to the sentence component similarity result data. If it matches, the student answer sentence is added to the student's answer scoring point set; if not, it is not added. The resulting answer scoring point set contains the student answer sentences that match standard answer sentences in both semantic similarity and sentence component similarity, so the student's score for the question obtained from this scoring point set is more accurate.
In this embodiment, various methods may be used to calculate the sentence component similarity, which is not limited here as long as the sentence component similarity can be obtained. For example, the sentence component similarity may be obtained by calculating the Euclidean distance between the standard answer text and the student answer text, or by calculating the Pearson distance between them.
Optionally, calculating the sentence component similarity between a student answer sentence in the semantically similar answer set of the student and the standard answer sentences in the standard answer text includes: performing sentence component extraction on the student answer sentence to obtain its sentence components; traversing the standard answer sentences corresponding to the student answer sentence, and performing sentence component extraction on each traversed standard answer sentence to obtain its sentence components; calculating the cosine similarity between the sentence components of the student answer sentence and the sentence components of the corresponding standard answer sentence; and obtaining the sentence component similarity result data according to the cosine similarity of each sentence component.
Correspondingly, obtaining the student's answer scoring point set according to the sentence component similarity result data includes: according to the sentence component similarity result data, adding the student answer sentences whose sentence component similarity is not less than the sentence component similarity threshold to the student's answer scoring point set, thereby obtaining the student's answer scoring point set.
In this embodiment, the sentence component extraction of the student answer sentence and the standard answer sentence is performed with a sentence component extraction tool, for example the LTP tool, or Jieba part-of-speech tagging may be used, which is not limited here.
In this embodiment, it is taken into account that the sentence components of a sentence include single components such as the subject, the predicate, the object, and attributive modifiers. Therefore, when calculating the sentence component similarity, the cosine similarity between the subject of the standard answer sentence and the subject of the student answer sentence may be calculated first, and then the cosine similarity between the predicate of the standard answer sentence and the predicate of the student answer sentence, and so on. The sentence component similarity result data is then obtained from the cosine similarities of the individual components, i.e., the cosine similarity of the subjects, the cosine similarity of the predicates, and the cosine similarities of the other sentence components of the student answer sentence and the standard answer sentence.
In this embodiment, when calculating the cosine similarity between the subject of the student answer sentence and the subject of the standard answer sentence, the word vector of each subject may be determined first, and the cosine similarity between the two word vectors is then calculated. The word vector of the subject is determined in the same way as the word vector of a keyword in the above embodiment, which is not repeated here. The similarity of the other single sentence components is calculated in a similar way and is likewise not repeated here.
Optionally, in a specific implementation scenario, obtaining the sentence component similarity result data according to the cosine similarity of each sentence component includes: assigning each sentence component a certain component similarity score; if the cosine similarity between a sentence component of the student answer sentence and the corresponding sentence component of the standard answer sentence is not less than the sentence component cosine similarity threshold, adding the component similarity score of that component to the student answer sentence; and obtaining the sentence component similarity result data from all the component similarity scores of the student answer sentence.
It should be noted that, when the sentence component similarity result data is obtained from all the component similarity scores of a student answer sentence, the scores may simply be added up, or they may be combined by a weighted average, which is not limited here. In addition, the component similarity scores of the single sentence components may be equal or may be allocated in proportion, which is not limited here.
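A sketch of the sentence-component check follows. The component extractor is abstracted into a function parameter (the embodiment mentions LTP or Jieba part-of-speech tagging as possible tools); the per-component similarity scores, the cosine similarity threshold, and the simple summation branch are illustrative assumptions.

import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def sentence_component_similarity(student_sentence, standard_sentence,
                                  extract_components, word_vec,
                                  component_scores=None, cos_threshold=0.8):
    # extract_components: str -> dict of single components,
    #                      e.g. {"subject": ..., "predicate": ..., "object": ...}
    # word_vec          : str -> word vector (np.ndarray)
    # component_scores  : score granted per matching component; equal weights
    #                      are assumed when it is not supplied.
    student_parts = extract_components(student_sentence)
    standard_parts = extract_components(standard_sentence)
    if component_scores is None:
        component_scores = {name: 1.0 for name in standard_parts}
    total = 0.0
    for name, standard_word in standard_parts.items():
        student_word = student_parts.get(name)
        if student_word is None:
            continue
        # If this single component is similar enough, add its component score.
        if cosine(word_vec(student_word), word_vec(standard_word)) >= cos_threshold:
            total += component_scores.get(name, 0.0)
    return total  # sentence component similarity result data (summation variant)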
It should be noted that, if there are several student answer sentences, there are correspondingly several standard answer sentences, and the sentence component similarity between each student answer sentence and each of its corresponding standard answer sentences may be determined one by one or all at the same time, which is not limited here.
Optionally, if the sentence pattern type of the student answer sentence is inconsistent with that of the standard answer sentence, the sentence component similarity calculated from the extracted student answer sentence components and standard answer sentence components may deviate and lead to wrong judgments. To avoid this, before sentence component extraction it may be judged whether the student answer sentence and the corresponding standard answer sentence have the same sentence pattern type; if not, sentence pattern conversion is applied to the student answer sentence or the standard answer sentence so that the two have the same sentence pattern type.
Step S305, obtaining the student's answer scoring point set according to each piece of sentence component similarity result data.
Step S306, obtaining the student's score for the question according to the scores corresponding to the standard answer sentences that correspond to the student answer sentences in the student's answer scoring point set.
In this embodiment, step S305 and step S306 are similar to the above embodiments, and are not described again here.
In this embodiment, both the semantic similarity and the sentence component similarity between the student answer sentences in the answer scoring point set and the standard answer sentences are not less than the semantic similarity threshold and the sentence component similarity threshold respectively, which ensures the accuracy of the student answer sentences in the answer scoring point set and makes the student's score for the question more accurate.
Fig. 4 is a schematic flowchart of an intelligent marking method in the fourth embodiment of the present application. This embodiment differs from the above embodiments in that the answer scoring point set is expanded according to the obtained standard answer text and student answer text. As shown in Fig. 4, the method includes:
step S401, a standard answer text and a student answer text corresponding to the question are obtained.
Step S402, for each student answer sentence in the student answer text, calculating the semantic similarity between the student answer sentence and the standard answer sentences in the standard answer text to obtain corresponding semantic similarity result data.
Step S403, obtaining the semantically similar answer set of the student according to the semantic similarity result data.
Step S404, for each student answer sentence in the semantically similar answer set of the student, calculating the sentence component similarity between the student answer sentence and the standard answer sentences in the standard answer text to obtain corresponding sentence component similarity result data.
Step S405, obtaining the student's answer scoring point set according to each piece of sentence component similarity result data.
In this embodiment, steps S401 to S405 are similar to the above embodiment, and are not described herein again.
Step S406, for each student answer sentence in the student answer text, calculating the vector inner product of the student answer sentence and the standard answer sentences in the standard answer text to obtain corresponding vector inner product result data.
In this embodiment, it is considered that errors generally exist when screening student answer sentences by calculating the semantic similarity between the student answer sentences and the standard answer sentences, so that some correct student answer sentences are omitted; likewise, errors generally exist when screening student answer sentences by calculating the sentence component similarity between the student answer sentences and the standard answer sentences, so that some correct student answer sentences are also omitted. The errors of the two screening processes therefore accumulate in the answer point collection set, and the student scores obtained from it may deviate to a certain extent.
In this embodiment, the method of obtaining the student answer vector and the standard answer vector is not limited herein; for example, the TF-IDF method may be used. However, when the TF-IDF method is adopted, a situation may occur in which the inverse text frequency index (IDF) score is negative, which affects the accuracy of the obtained answer vector. Therefore, to avoid this, in one embodiment, calculating the vector inner product of the student answer sentence and the standard answer sentence in the standard answer text comprises:
vectorizing the student answer sentences to obtain student answer vectors;
vectorizing each standard answer sentence to obtain a standard answer vector;
and traversing each standard answer vector, and calculating the vector inner product of the student answer vector and the traversed standard answer vector to obtain the vector inner product of the student answer sentence and the standard answer sentence in the standard answer text.
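By way of illustration only, the traversal described above may be sketched in Python as follows; the vector representation, function name and toy data are assumptions made for the sketch and are not part of the disclosed method.

```python
import numpy as np

def inner_products(student_vec, standard_vecs):
    """Traverse each standard answer vector and compute its inner product
    with the student answer vector."""
    student_vec = np.asarray(student_vec, dtype=float)
    results = []
    for idx, std_vec in enumerate(standard_vecs):
        results.append((idx, float(np.dot(student_vec, np.asarray(std_vec, dtype=float)))))
    return results

# Toy usage with already-normalized vectors.
student = [0.6, 0.8, 0.0]
standards = [[0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]
print(inner_products(student, standards))  # [(0, 1.0), (1, 0.0)]
```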
Optionally, in a specific implementation scenario, vectorizing the student answer sentence to obtain a student answer vector includes: performing word segmentation on the student answer sentences to obtain a student answer word set; calculating the word Frequency (TF) score, the IDF score and the word vector of each word in the student answer word set, and calculating to obtain the answer vector of each word according to the word Frequency score, the IDF score and the word vector of each word; and carrying out normalization processing on the answer vector of each word to obtain a normalized answer vector of each word, and obtaining a student answer vector according to the normalized answer vector of each word.
In this embodiment, the manner of performing word segmentation processing on the student answer sentence may be the same as that in the above embodiment, and other word segmentation processing manners may also be adopted, which are not limited herein. The word vectors of the words are determined in a similar manner to the above embodiments, and are not limited herein.
In this embodiment, the calculation formula of the word answer vector is:
word answer vector = word TF score × word IDF score × word vector.
Here, word TF score = word frequency / total word frequency of all words in the sentence. The word frequency may be determined as the number of times the word occurs in the student answer word set, and the total word frequency of all words in the sentence may be obtained by determining the word frequency of each word in the student answer word set and summing these word frequencies.
In this embodiment, the inverse text frequency index score of each word in the student answer keyword set is calculated from d, wf and the constant C, where d is the total number of documents in a pre-established standard answer text library, wf is the total number of documents in the standard answer text library that include the word, and C is a constant greater than or equal to 2.
In this embodiment, in order to make the IDF score of a word more accurate, a standard answer text library may be established according to all standard answer texts of all questions.
Optionally, calculating the inverse text frequency index score of each word in the student answer word set according to the standard answer text library and the inverse text frequency index score formula includes: performing word segmentation on all standard answer sentences in the standard answer text library, counting, for each word, the number of documents in the standard answer text library that include the word to obtain wf, and then calculating the IDF score of each word according to the inverse text frequency index score formula.
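For illustration, one possible computation of the IDF score is sketched below. The exact formula of this embodiment is disclosed as a drawing, so the smoothed form log(d / (wf + 1)) + C used here is only an assumption that is consistent with the stated quantities d, wf and the constant C ≥ 2.

```python
import math

def idf_score(word, answer_docs, C=2.0):
    """Assumed smoothed IDF: log(d / (wf + 1)) + C, with C >= 2 keeping the
    score positive even when every document contains the word. The exact
    formula in the source may differ."""
    d = len(answer_docs)                               # total documents in the library
    wf = sum(1 for doc in answer_docs if word in doc)  # documents that include the word
    return math.log(d / (wf + 1)) + C

# Each document is the set of segmented words of one standard answer sentence.
library = [{"光合作用", "叶绿体"}, {"细胞", "呼吸作用"}, {"光合作用", "能量"}]
print(idf_score("光合作用", library))  # log(3/3) + 2 = 2.0 under the assumed formula
```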
In this embodiment, the method for vectorizing the standard answer sentence to obtain the standard answer vector is similar to the method for vectorizing the student answer sentence to obtain the student answer vector, and is not described herein again.
In this embodiment, when the IDF score of each word is calculated according to the conventional inverse text frequency index formula, the obtained student answer vector or standard answer vector may contain negative components. In that case, the actual vector inner product of the standard answer text and the student answer text may satisfy the vector inner product determination condition, yet because of the negative components the determination result is that the condition is not satisfied, which affects the accuracy of intelligent marking.
In this embodiment, in order to calculate the vector inner product from the standard answer vector and the student answer vector and improve its accuracy, data normalization may be performed on both vectors. Specifically, each word answer vector of the student answer is divided by its vector modulus (length) to obtain the normalized answer vector; the normalization of the word answer vectors of the standard answer is carried out in the same way and is not repeated here.
It should be noted that, in other embodiments, other methods may be adopted to perform vectorization processing on the student answer sentences and the standard answer sentences, which is not limited herein.
In this embodiment, the student answer vector is obtained according to the normalized answer vector of each word, and the student answer vector can be obtained by directly adding the normalized answer vectors of each word, or the student answer vector can be obtained by performing weighted average on the normalized answer vectors of each word.
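A minimal Python sketch of the sentence vectorization described above follows; the word embeddings, IDF values and the choice of scaling each word embedding to unit length before weighting are assumptions, since the source leaves these details open.

```python
import numpy as np

def sentence_vector(words, word_vectors, idf, combine="sum"):
    """Build an answer vector for one sentence from its segmented words.
    word_vectors maps word -> embedding, idf maps word -> IDF score (assumed
    inputs). Word answer vector = word TF score * word IDF score * word vector;
    each word's embedding is scaled to unit length before weighting (one
    reading of the normalization step), then the results are summed or averaged."""
    total = len(words)
    weighted = []
    for w in set(words):
        tf = words.count(w) / total                  # word frequency / total word frequency
        emb = np.asarray(word_vectors[w], dtype=float)
        norm = np.linalg.norm(emb)
        if norm > 0:
            weighted.append(tf * idf.get(w, 0.0) * emb / norm)
    if not weighted:
        return np.zeros(len(next(iter(word_vectors.values()))))
    stacked = np.vstack(weighted)
    return stacked.sum(axis=0) if combine == "sum" else stacked.mean(axis=0)

# Toy usage with two-dimensional embeddings.
vectors = {"植物": [1.0, 0.0], "生长": [0.0, 1.0]}
idf = {"植物": 2.1, "生长": 2.4}
print(sentence_vector(["植物", "生长", "生长"], vectors, idf))  # ≈ [0.7, 1.6]
```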
And step S407, obtaining an effective answer point collection set of the student according to the result data of each vector inner product and the answer point collection set of the student.
In this embodiment, according to each piece of vector inner product result data, a first student answer sentence is obtained for which there exists a standard answer sentence in the standard answer text such that their vector inner product is not less than a vector inner product threshold. It is then determined whether the first student answer sentence already exists in the student's answer point collection; if not, the first student answer sentence is added to the collection, so as to obtain the student's effective answer point collection.
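A possible sketch of this expansion step, with an illustrative inner product threshold and assumed data structures, is:

```python
def effective_point_set(point_set, inner_product_results, threshold=0.8):
    """Expand the student's answer point collection: a student answer sentence
    whose best inner product with some standard answer sentence reaches the
    threshold is added if it is not already present. The threshold is
    illustrative only."""
    effective = set(point_set)
    for student_sentence, best_inner_product in inner_product_results.items():
        if best_inner_product >= threshold:
            effective.add(student_sentence)  # add() is a no-op if already present
    return effective
```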
Step S408, according to the score corresponding to the standard answer sentence corresponding to each student answer sentence in the effective answer point collection set, obtaining the student score of the question.
In this embodiment, the scores corresponding to the standard answer sentences corresponding to all the student answer sentences in the effective answer point collection can be directly added to obtain the student scores of the questions, and certainly, the weight can be set for the score corresponding to each standard answer sentence, and the student scores of the questions can be obtained by performing weighted average on the scores corresponding to the standard answer sentences corresponding to the student answer sentences, which is not limited here.
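For illustration, the two scoring variants mentioned above might be sketched as follows; the data structures and weights are assumptions.

```python
def student_score(point_set, sentence_score, weight=None):
    """point_set maps each matched student answer sentence to its standard
    answer sentence; sentence_score gives the score of each standard answer
    sentence. Scores are summed directly, or weighted-averaged if weights
    are supplied (both variants are illustrative)."""
    matched = [sentence_score[std] for std in point_set.values()]
    if weight is None:
        return sum(matched)
    weights = [weight[std] for std in point_set.values()]
    return sum(w * s for w, s in zip(weights, matched)) / (sum(weights) or 1.0)
```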
Fig. 5 is a schematic structural diagram of an intelligent marking device according to a fifth embodiment of the present application. As shown in fig. 5, the device includes:
the obtaining unit 501 is configured to obtain a standard text and a text to be read corresponding to a question, where the standard text includes at least one standard field, and the text to be read includes at least one field to be read.
In this embodiment, a corresponding standard text is preset for each question on the test paper as the reference answer during marking. For subjective questions there is no single fixed reference answer, so one, two or more reference answers may be set, and correspondingly there may be one, two or more standard texts, which is not limited herein. In addition, considering that a subjective question may be answered through several points of discussion, each standard text includes at least one standard field.
The standard text is not limited in form, and may be, for example, a word or a picture, or may be a combination of a word and a picture.
In this embodiment, the manner of obtaining the standard text is not limited here; a corresponding relationship between the standard text and the question may be established, and the standard text and the corresponding relationship are stored. During marking, the question is determined first, and then the standard text corresponding to the question is determined according to the question and the corresponding relationship between the question and the standard text.
Optionally, if the standard text and the corresponding relationship between the standard text and the question need to be stored before each marking, the preparation work before marking is relatively cumbersome. To avoid this, the obtaining unit 501 is further configured to establish a standard text library and store all standard texts of all questions, together with the corresponding relationships between the standard texts and the questions, in the standard text library. During marking, after the question is determined, the standard text corresponding to the question is looked up in the standard text library, which reduces the preparation work before marking and improves marking efficiency.
In this embodiment, the questions on the test paper also correspond to the answer contents of the students, that is, the text to be read. Also, considering that many discussions may be made in answering the subject question, each of the texts to be read includes at least one field to be read. It should be noted that the form of the text to be read is not limited, and may be, for example, a character, a picture, or a combination of a picture and a character.
In this embodiment, the manner of obtaining the text to be read is not limited. Before marking, the students' written answers to the questions may be scanned and stored as the text to be read (which may be referred to as offline acquisition), or the students may answer directly on a computer and the answers may be stored as the text to be read (which may be referred to as online acquisition).
It should be noted that the field may be a sentence or a word, and may also be defined by itself according to the requirement of paper marking, which is not limited herein. When the field is a sentence, different sentences may be divided according to punctuations in the standard text or the text to be read, for example, characters between two periods may be divided into one sentence, or characters between any two punctuations may be divided into one sentence, which is not limited herein.
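By way of illustration, splitting a text into fields by punctuation might be sketched as follows; the punctuation sets and example text are assumptions.

```python
import re

def split_into_fields(text, by_period_only=True):
    """Split a standard text or text to be read into fields (sentences).
    Either only sentence-final punctuation or any punctuation may serve as
    the boundary, as the source leaves this choice open."""
    if by_period_only:
        pattern = r"[。！？!?]"           # characters between two periods form one sentence
    else:
        pattern = r"[。！？!?，,；;：:]"   # characters between any two punctuation marks
    return [seg.strip() for seg in re.split(pattern, text) if seg.strip()]

print(split_into_fields("光合作用在叶绿体中进行。它将光能转化为化学能。"))
```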
The similarity calculation unit 502 is configured to calculate, for each field to be read in the text to be read, semantic similarity between the field to be read and a standard field in the standard text, and obtain corresponding semantic similarity result data.
In this embodiment, the semantic similarity obtained through calculation for each field to be read is the semantic similarity between that field to be read and each standard field in the standard text corresponding to the question. It should be noted that various methods may be adopted to calculate the semantic similarity, which is not limited herein as long as the semantic similarity can be obtained. For example, the semantic similarity may be obtained by calculating the Euclidean distance between the standard field and the field to be read, or by calculating the Pearson distance between the standard text and the text to be read.
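For illustration, two of the mentioned measures are sketched below, assuming each field has already been mapped to a numeric vector; the mapping from distance to similarity is an assumption, since the source does not fix one.

```python
import numpy as np

def euclidean_similarity(vec_a, vec_b):
    """Turn the Euclidean distance between two field vectors into a similarity
    in (0, 1]; the mapping 1 / (1 + distance) is an assumption."""
    dist = np.linalg.norm(np.asarray(vec_a, dtype=float) - np.asarray(vec_b, dtype=float))
    return 1.0 / (1.0 + dist)

def pearson_similarity(vec_a, vec_b):
    """Pearson correlation coefficient between two field vectors
    (assumes the vectors are not constant)."""
    return float(np.corrcoef(vec_a, vec_b)[0, 1])

print(euclidean_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))  # 0.5
print(pearson_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.5]))
```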
In addition, in the process of marking, when calculating the semantic similarity between the field to be read and the standard field in the standard text, the semantic similarity between each field to be read and the standard field can be judged one by one, and the semantic similarity between all the fields to be read and the standard field can also be judged at the same time, which is not limited herein.
In this embodiment, although a student may give a correct answer, the text to be read may not be literally identical to the standard text because their language logic may differ. Therefore, if intelligent marking were performed by directly judging whether the characters in the text to be read are consistent with the characters in the standard text, the accuracy of marking would be affected. Instead, whether the two are consistent is determined by calculating the semantic similarity between the text to be read and the standard text, and the score of the text to be read is then determined accordingly, which improves the accuracy of the score calculation.
A sampling point determining unit 503, configured to obtain a sampling point set of the text to be read according to each semantic similarity result data.
In this embodiment, whether a field to be read matches a standard field can be judged according to the semantic similarity result data; if so, the field to be read is added to the score collection set of the text to be read, and if not, it is not added. The obtained score collection set of the text to be read thus contains the fields to be read that match standard fields.
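A possible sketch of building the score collection set from the semantic similarity result data, with an illustrative threshold and assumed data structures, is:

```python
def score_collection_set(similarity_results, threshold=0.75):
    """similarity_results: {field_to_be_read: {standard_field: semantic_similarity}}.
    A field to be read joins the score collection set when it matches at least
    one standard field at or above the threshold; the matched standard field is
    kept so that its score can be looked up later. Threshold is illustrative."""
    collection = {}
    for field, per_standard in similarity_results.items():
        if not per_standard:
            continue
        best_standard, best_sim = max(per_standard.items(), key=lambda kv: kv[1])
        if best_sim >= threshold:
            collection[field] = best_standard
    return collection
```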
The scoring unit 504 is configured to obtain a score of the text to be read of the subject according to a score corresponding to a standard field corresponding to each field to be read in the score collection of the text to be read.
In this embodiment, the scores corresponding to the standard fields corresponding to all the fields to be read in the score collection set of the text to be read may be directly added, so as to obtain the score of the text to be read of the topic, and of course, a weight may be set for the score corresponding to each standard field, and the score of the text to be read of the topic may be obtained by performing a weighted average on the scores corresponding to the standard fields corresponding to the fields to be read, which is not limited herein.
According to the intelligent marking device, the standard text and the text to be read corresponding to the subject are obtained, wherein the standard text comprises at least one standard field, and the text to be read comprises at least one field to be read; calculating the semantic similarity between the field to be read and the standard field in the standard text aiming at each field to be read in the text to be read to obtain corresponding semantic similarity result data; obtaining a collection of points of the text to be read according to the semantic similarity result data; and obtaining the score of the text to be read of the question according to the score corresponding to the standard field corresponding to each field to be read in the collection and division point set of the text to be read, so that the intelligent paper reading is realized, the time and the labor are saved, the paper reading efficiency is improved, the influence of human subjective factors on the examination result in the paper reading process is reduced, and the objective fairness and the accuracy of the paper reading are ensured.
Fig. 6 is a schematic structural diagram of an intelligent marking device according to a sixth embodiment of the present application. As shown in fig. 6, the device includes:
the obtaining unit 601 is configured to obtain a standard answer text and a student answer text corresponding to the question.
In this embodiment, the standard text is a standard answer text, and the text to be read is a student answer text. Correspondingly, the standard field in the standard text corresponds to a standard answer sentence, and the field to be read in the text to be read corresponds to a student answer sentence.
The similarity calculating unit 602 is configured to calculate, for each student answer sentence in the student answer text, a semantic similarity between the student answer sentence and a standard answer sentence in the standard answer text, so as to obtain corresponding semantic similarity result data.
In this embodiment, the similarity calculation unit 602 is further configured to perform word segmentation on the student answer sentence, and extract a student answer keyword set; traversing each standard answer sentence in the standard answer text, performing word segmentation processing on each traversed standard answer sentence, and extracting a standard answer keyword set corresponding to each standard answer sentence; calculating the similarity between the standard answer keyword set and the student answer keyword set; and determining semantic similarity between the student answer sentence and the standard answer sentence in the standard answer text according to the similarity between the standard answer keyword set and the student answer keyword set.
In this embodiment, the similarity between the standard answer keyword set and the student answer keyword set can be obtained through the Jaccard similarity coefficient between the two sets. The Jaccard similarity coefficient is used to compare the similarity and difference between finite sample sets.
In this embodiment, before calculating the Jaccard similarity coefficient, the student answer sentence and the standard answer sentence are first subjected to word segmentation to obtain a student answer keyword set and a standard answer keyword set. In particular, word segmentation tools, such as THULAC, NLPIR, etc., may be employed for word segmentation processing. In order to increase the word segmentation processing speed and improve the word segmentation accuracy, a Jieba word segmentation tool is preferably used for word segmentation processing.
In this embodiment, the standard keyword set and the student keyword set are obtained by performing word segmentation processing on the standard answer sentence and the student answer sentence respectively through a word segmentation tool, instead of manually defining the standard keyword set and the student keyword set, so that the influence of human factors on an intelligent paper marking process is avoided, and the accuracy of intelligent paper marking is improved.
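For illustration, keyword-set extraction with the jieba segmenter and the basic Jaccard similarity coefficient might be sketched as follows; the stop-word list and example sentences are assumptions, and any segmenter with a similar interface would do.

```python
import jieba

STOPWORDS = {"的", "了", "是", "在"}  # illustrative stop-word list

def keyword_set(sentence):
    """Segment a sentence and keep the content words as its keyword set."""
    return {w for w in jieba.cut(sentence) if w.strip() and w not in STOPWORDS}

def jaccard(set_a, set_b):
    """Jaccard similarity coefficient: |A ∩ B| / |A ∪ B|."""
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)

student = keyword_set("植物通过光合作用把光能转化为化学能")
standard = keyword_set("光合作用将光能转变成化学能")
print(jaccard(student, standard))
```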
In this embodiment, it is considered that if the Jaccard similarity coefficient of the standard answer keyword set and the student answer keyword set is calculated directly, two keywords are only judged to be the same when they are literally identical; in practice, however, a keyword in the student answer text may merely be semantically similar to a keyword in the standard answer text rather than identical to it.
So to avoid this, the similarity calculation unit 602 is further configured to build a first list of key-value pairs; adding a key value pair to each element in the first key value pair list according to the standard answer key words in the standard answer key word set and the student answer key words in the student answer key word set; calculating the cosine similarity of the ith element and each element behind the ith element in the first key-value pair list, and modifying the key-value pairs of the elements behind the ith element, of which the cosine similarity with the element behind the ith element meets an element cosine similarity threshold value, to obtain a second key-value pair list, wherein i is a positive integer; obtaining a standard answer key-value pair set and a student answer key-value pair set according to the second key-value pair list; and calculating the similarity of the standard answer key-value pair set and the student answer key-value pair set, namely calculating the Jaccard similarity coefficient of the standard answer key-value pair set and the student answer key-value pair set.
At this time, the similarity calculating unit 602 is further configured to determine semantic similarity between the student answer sentence and the standard answer sentence in the standard answer text according to the similarity between the standard answer key-value pair set and the student answer key-value pair set. It should be noted that, in other embodiments, the semantic similarity between the student answer sentence and the standard answer sentence in the standard answer text may also be determined in other ways, which is not limited herein.
Optionally, the similarity calculation unit 602 is further configured to add, for the kth student answer keyword, a key-value pair including identification information of the student answer keyword and sequence information to the xth element in the first key-value pair list, where k is greater than or equal to 1 and less than or equal to m, and m is the number of the student answer keywords in the student answer keyword set;
aiming at the r standard answer keyword, adding a key value pair containing the identification information of the representation standard answer keyword and the sequence information to the y element in the first key value pair list, wherein r is more than or equal to 1 and less than or equal to n, and n is the number of the standard answer keywords in the standard answer keyword set;
wherein either x is greater than or equal to 1 and less than or equal to m and y is greater than or equal to m+1 and less than or equal to m+n, or y is greater than or equal to 1 and less than or equal to n and x is greater than or equal to n+1 and less than or equal to n+m; and x, y, k, m, r and n are all positive integers.
In this embodiment, when adding a key-value pair to each element in the first key-value pair list, the similarity calculation unit 602 may first add, according to the student answer keywords, key-value pairs containing identification information representing the student answer keywords to the 1st to mth elements in sequence, and then add, according to the standard answer keywords, key-value pairs containing identification information representing the standard answer keywords to the (m+1)th to last (i.e., (m+n)th) elements in sequence; in this case x is greater than or equal to 1 and less than or equal to m and y is greater than or equal to m+1, that is, the first m elements of the first key-value pair list correspond to the student answer keywords and the last n elements correspond to the standard answer keywords. Alternatively, key-value pairs containing identification information representing the standard answer keywords may first be added to the 1st to nth elements in sequence according to the standard answer keywords, and then key-value pairs containing identification information representing the student answer keywords added to the (n+1)th to last (i.e., (m+n)th) elements in sequence according to the student answer keywords; in this case y is greater than or equal to 1 and less than or equal to n and x is greater than or equal to n+1, that is, the first n elements correspond to the standard answer keywords and the last m elements correspond to the student answer keywords. The specific order of addition is not limited herein.
Optionally, the similarity calculation unit 602 is further configured to modify, in elements subsequent to the ith element, sequence information of a key-value pair of an element whose cosine similarity with the ith element satisfies an element cosine similarity threshold to sequence information of a key-value pair of the ith element. It should be noted that, in other embodiments, a key-value pair may be added to each element in the first key-value pair list in other manners, which is not limited herein.
Optionally, the similarity calculation unit 602 is further configured to, when x is greater than or equal to 1 and less than or equal to m and m +1 is less than or equal to y, obtain, according to the representation student answer identification information included in the key-value pair, sequence information of the 1 st to m-th elements from the second key-value pair list to form a student answer key-value pair set, and obtain, according to the representation standard answer identification information included in the key-value pair, sequence information of the m +1 th to m + n-th elements from the second key-value pair list to form a standard answer key-value pair set; when y is larger than or equal to 1 and smaller than or equal to n and n +1 is smaller than or equal to x, obtaining sequence information of 1 st to nth elements from a second key-value pair list according to the representation standard answer identification information contained in the key-value pair to form a standard answer key-value pair set, and obtaining sequence information of n +1 th to n + m th elements from the second key-value pair list according to the representation student answer identification information contained in the key-value pair to form a student answer key-value pair set. It should be noted that, in other embodiments, the standard answer key-value pair set and the student answer key-value pair set may be obtained in other manners, which is not limited herein.
Optionally, the similarity calculation unit 602 is further configured to count the number of sequence information in the intersection of the standard answer key-value pair set and the student answer key-value pair set and the number of sequence information in the union of the standard answer key-value pair set and the student answer key-value pair set, where a ratio of the number of sequence information in the intersection of the standard answer key-value pair set and the student answer key-value pair set to the number of sequence information in the union of the standard answer key-value pair set and the student answer key-value pair set is the similarity between the standard answer key-value pair set and the student answer key-value pair set.
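A possible Python sketch of the key-value pair procedure described above is given below; the word embeddings, the element cosine similarity threshold and the concrete data structures are assumptions.

```python
import numpy as np

def cosine(u, v):
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / denom) if denom else 0.0

def keyword_pair_similarity(student_kw, standard_kw, word_vec, threshold=0.85):
    """Soft Jaccard via the key-value pair list: later elements whose keyword is
    close enough (by word-vector cosine similarity) to an earlier element inherit
    that element's sequence information, so near-synonyms count as the same key."""
    # First key-value pair list: student keywords occupy elements 1..m,
    # standard keywords occupy elements m+1..m+n.
    elements = [{"source": "student", "word": w, "seq": i}
                for i, w in enumerate(student_kw)]
    elements += [{"source": "standard", "word": w, "seq": len(student_kw) + j}
                 for j, w in enumerate(standard_kw)]
    # Second key-value pair list: merge sequence information of similar keywords.
    for i in range(len(elements)):
        for j in range(i + 1, len(elements)):
            if cosine(word_vec[elements[i]["word"]],
                      word_vec[elements[j]["word"]]) >= threshold:
                elements[j]["seq"] = elements[i]["seq"]
    student_seq = {e["seq"] for e in elements if e["source"] == "student"}
    standard_seq = {e["seq"] for e in elements if e["source"] == "standard"}
    union = student_seq | standard_seq
    return len(student_seq & standard_seq) / len(union) if union else 0.0
```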
In this embodiment, after the similarity between the standard answer keyword set and the student answer keyword set is obtained through calculation, the similarity between the standard answer keyword set and the student answer keyword set may be directly used as the semantic similarity between the student answer sentence and the standard answer sentence in the standard answer text, and certainly, the similarity between the standard answer keyword set and the student answer keyword set may also be normalized or weighted and averaged to obtain the semantic similarity between the student answer sentence and the standard answer sentence in the standard answer text, which is not limited here.
And a point-to-point acquisition determining unit 603, configured to obtain an answer point-to-point acquisition set of the student according to each semantic similarity result data.
In this embodiment, the point collection determination unit 603 may add, according to each semantic similarity result data, a student answer sentence in which a standard answer sentence exists in the standard answer text and the semantic similarity is not less than the semantic similarity threshold to the student answer point collection to obtain the student answer point collection. It should be noted that the semantic similarity threshold may be set by itself, and is not limited herein.
The scoring unit 604 is configured to obtain the student scores of the questions according to the scores corresponding to the standard answer sentences corresponding to each student answer sentence in the student answer point collection.
In this embodiment, the sampling point determining unit 603 and the scoring unit 604 in the step are similar to the sampling point determining unit 503 and the scoring unit 504, and are not described herein again.
Fig. 7 is a schematic structural diagram of an intelligent marking device according to a seventh embodiment of the present application. This embodiment differs from the above embodiment in that the sentence component similarity is also calculated between the student answer sentence and the standard answer sentence. As shown in fig. 7, the device includes:
the obtaining unit 701 is configured to obtain a standard answer text and a student answer text corresponding to a question.
The similarity calculation unit 702 is configured to calculate, for each student answer sentence in the student answer text, semantic similarity between the student answer sentence and a standard answer sentence in the standard answer text, so as to obtain corresponding semantic similarity result data.
And the semantic similar answer determining unit 703 is configured to obtain a semantic similar answer set of the student according to each semantic similarity result data.
In this embodiment, the semantic similar answer determining unit 703 may add, according to each semantic similarity result data, a student answer sentence in which a standard answer sentence exists in the standard answer text and the semantic similarity is not less than the semantic similarity threshold to the semantic similar answer set of the student, so as to obtain the semantic similar answer set of the student. It should be noted that the semantic similarity threshold may be set by itself, and is not limited herein.
The sentence component similarity calculation unit 704 is configured to calculate, for each student answer sentence in the semantic similar answer set of the student, a sentence component similarity between the student answer sentence and a standard answer sentence in the standard answer text, so as to obtain corresponding sentence component similarity result data.
In this embodiment, it is considered that if the student scores of the questions are directly obtained according to each student answer sentence of which the semantic similarity result data is not less than the semantic similarity threshold, a case of wrong judgment and missed judgment may occur, and the scores are inaccurate. In order to improve the score accuracy, the sentence component similarity calculation unit 704 may further calculate the sentence component similarity between the student answer sentence and the standard answer sentence in the standard answer text, determine whether the student answer sentence fits the standard answer sentence again according to the sentence component similarity result data, add the student answer sentence to the student answer point collection if the student answer sentence fits the standard answer sentence, and not add the student answer sentence to the student answer point collection if the student answer sentence does not fit. The obtained student answer point collection comprises student answer sentences matched with the standard answer sentence semantic similarity and the sentence component similarity, so that the student scores of the questions obtained according to the student point collection are more accurate.
Optionally, the sentence component similarity calculation unit 704 is further configured to perform sentence component extraction processing on the student answer sentences in the semantic similar answer set of the students to obtain sentence components of the student answer sentences; traversing standard answer sentences corresponding to student answer sentences in the semantic similar answer set of the students, and performing sentence component extraction processing on each traversed standard answer sentence to obtain a sentence component of each standard answer sentence; calculating the cosine similarity of sentence components of the student answer sentences and sentence components of corresponding standard answer sentences; and obtaining sentence component similarity result data according to the cosine similarity of each sentence component.
Correspondingly, the sentence component similarity calculation unit 704 is further configured to add the student answer sentence with the sentence component similarity not less than the sentence component similarity threshold to the student answer point collection according to the sentence component similarity result data, so as to obtain the student answer point collection.
In the present embodiment, it is considered that each sentence includes single sentence components such as the subject, the predicate, the object and the attributive. Therefore, when calculating the sentence component similarity, the cosine similarity between the subject of the standard answer sentence and the subject of the student answer sentence may be calculated first, then the cosine similarity between the predicate of the standard answer sentence and the predicate of the student answer sentence, and so on for the other components. The sentence component similarity result data is then obtained from the cosine similarity of each sentence component, that is, from the subject cosine similarity, the predicate cosine similarity and the cosine similarities of the other sentence components.
In this embodiment, when calculating the cosine similarity between the subject of the student answer sentence and the subject of the standard answer sentence, the word vector of the subject of the student answer sentence and the word vector of the subject of the standard answer sentence may be determined first, and then the cosine similarity between the two word vectors may be calculated. The method for determining the word vector of the subject is the same as the method for determining the word vector of a keyword in the above embodiments, and details are not repeated here. The similarity of the other single sentence components is calculated in a similar manner, which is not repeated here.
Optionally, the sentence component similarity calculation unit 704 is further configured to assign a certain component similarity score to each sentence component, and if the cosine similarity between one sentence component of the student answer sentence and the sentence component of the corresponding sentence component in the standard answer sentence is not less than the sentence component cosine similarity threshold, add the component similarity score corresponding to the sentence component to the student answer sentence, and obtain the sentence component similarity result data according to all the component similarity scores of the student answer sentence.
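For illustration, the component-score variant described above might be sketched as follows; the component extraction itself (e.g., by a syntactic parser), the per-component scores and the threshold are assumptions.

```python
import numpy as np

COMPONENT_SCORES = {"subject": 0.3, "predicate": 0.4, "object": 0.3}  # illustrative weights

def cosine(u, v):
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / denom) if denom else 0.0

def component_similarity(student_components, standard_components, word_vec,
                         component_threshold=0.8):
    """student_components / standard_components map a component name (subject,
    predicate, object, ...) to the extracted word, e.g. as produced by a
    syntactic parser (the extraction itself is not shown). A component whose
    cosine similarity reaches the threshold contributes its assigned score."""
    score = 0.0
    for name, weight in COMPONENT_SCORES.items():
        s_word = student_components.get(name)
        t_word = standard_components.get(name)
        if s_word and t_word and cosine(word_vec[s_word], word_vec[t_word]) >= component_threshold:
            score += weight
    return score
```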
Optionally, if the sentence pattern type of the student answer sentence is not consistent with the sentence pattern type of the standard answer sentence, the similarity of the sentence components calculated according to the extracted student answer sentence components and the standard answer sentence components may be deviated, resulting in a false judgment. Therefore, in order to avoid this situation, the sentence component similarity calculation unit 704 is further configured to determine whether the student answer sentence and the standard answer sentence corresponding thereto have the same sentence pattern type before performing the sentence component extraction process; if not, carrying out sentence pattern conversion processing on the student answer sentence or the standard answer sentence, so that the student answer sentence is the same as the standard answer sentence in sentence pattern type.
And a point sampling and dividing determining unit 705, configured to obtain an answer point sampling and dividing set of the students according to the result data of the similarity of each sentence component.
And the scoring unit 706 is used for obtaining the student scores of the questions according to the scores corresponding to the standard answer sentences corresponding to the student answer sentences in the student answer point collection sets.
In this embodiment, the scoring point determining unit 705 and the scoring unit 706 are similar to the above embodiments, and are not described herein again.
In the embodiment, the semantic similarity and sentence component similarity between the student answer sentences in the answer collection point set and the standard answer sentences are not less than the semantic similarity threshold and sentence component similarity threshold, so that the accuracy of the student answer sentences in the answer collection point set is ensured, and the student scores of the questions are more accurate.
Fig. 8 is a schematic structural diagram of an intelligent marking device according to an eighth embodiment of the present application. This embodiment differs from the above embodiments in that the answer point collection set is expanded according to the obtained standard answer text and student answer text. As shown in fig. 8, the device includes:
the obtaining unit 801 is configured to obtain a standard answer text and a student answer text corresponding to a topic.
The similarity calculation unit 802 is configured to calculate, for each student answer sentence in the student answer text, semantic similarity between the student answer sentence and a standard answer sentence in the standard answer text, and obtain corresponding semantic similarity result data.
And the semantic similar answer determining unit 803 is configured to obtain a semantic similar answer set of the student according to each semantic similarity result data.
The sentence component similarity calculation unit 804 is configured to calculate, for each student answer sentence in the semantic similar answer set of the student, a sentence component similarity between the student answer sentence and a standard answer sentence in the standard answer text, and obtain corresponding sentence component similarity result data.
And a point-to-point determining unit 805 configured to obtain a point-to-point set of answers from the students according to the result data of the similarity of each sentence component.
In this embodiment, the obtaining unit 801, the similarity calculating unit 802, the semantic similarity answer determining unit 803, the sentence component similarity calculating unit 804, and the segmentation point determining unit 805 are similar to those of the above embodiments, and are not described herein again.
The vector inner product calculating unit 806 is configured to calculate, for each student answer sentence in the student answer text, a vector inner product of the student answer sentence and a standard answer sentence in the standard answer text, so as to obtain corresponding vector inner product result data.
In this embodiment, it is considered that errors generally exist when screening student answer sentences by calculating the semantic similarity between the student answer sentences and the standard answer sentences, so that some correct student answer sentences are omitted; likewise, errors generally exist when screening student answer sentences by calculating the sentence component similarity, so that some correct student answer sentences are also omitted. Therefore, to avoid the above situation, the vector inner product calculating unit 806 calculates the vector inner product of the original student answer sentence and the standard answer sentence in the standard answer text, and expands the answer point collection according to the vector inner product result data, thereby reducing error accumulation and improving the accuracy of intelligent marking.
In this embodiment, the method of obtaining the student answer vector and the standard answer vector is not limited herein; for example, the TF-IDF method may be used. However, when the TF-IDF method is adopted, a situation may occur in which the inverse text frequency index (IDF) score is negative, which affects the accuracy of the obtained answer vector. Therefore, to avoid this situation, in a specific implementation scenario, the vector inner product calculation unit 806 is further configured to perform vectorization processing on the student answer sentence to obtain a student answer vector; perform vectorization processing on each standard answer sentence to obtain a standard answer vector; and traverse each standard answer vector, calculating the vector inner product of the student answer vector and the traversed standard answer vector to obtain the vector inner product of the student answer sentence and the standard answer sentence in the standard answer text.
Optionally, the vector inner product calculating unit 806 is further configured to perform word segmentation on the student answer sentences to obtain a student answer word set; calculating the word Frequency (TF) score, the IDF score and the word vector of each word in the student answer word set, and calculating to obtain the answer vector of each word according to the word Frequency score, the IDF score and the word vector of each word; and carrying out normalization processing on the answer vector of each word to obtain a normalized answer vector of each word, and obtaining a student answer vector according to the normalized answer vector of each word.
In this embodiment, the calculation formula of the word answer vector is:
word answer vector = word TF score × word IDF score × word vector.
Here, word TF score = word frequency / total word frequency of all words in the sentence. The word frequency may be determined as the number of times the word occurs in the student answer word set, and the total word frequency of all words in the sentence may be obtained by determining the word frequency of each word in the student answer word set and summing these word frequencies.
In this embodiment, the inverse text frequency index score of each word in the student answer keyword set is calculated from d, wf and the constant C, where d is the total number of documents in a pre-established standard answer text library, wf is the total number of documents in the standard answer text library that contain the word, and C is a constant not less than 2.
In this embodiment, in order to make the IDF score of a word more accurate, a standard answer text library may be established according to all standard answer texts of all questions.
Optionally, the vector inner product calculating unit 806 is further configured to perform word segmentation on all standard answer sentences in the standard answer text library, count, for each word, the number of documents in the standard answer text library that contain the word to obtain wf, and then calculate the IDF score of each word according to the inverse text frequency index score formula.
In this embodiment, when the IDF score of each word is calculated according to the conventional inverse text frequency index formula, the obtained student answer vector or standard answer vector may contain negative components. In that case, the actual vector inner product of the standard answer text and the student answer text may satisfy the vector inner product determination condition, yet because of the negative components the determination result is that the condition is not satisfied, which affects the accuracy of intelligent marking.
In this embodiment, in order to calculate the vector inner product from the standard answer vector and the student answer vector and improve its precision, the vector inner product calculating unit 806 is further configured to perform data normalization on both vectors. Specifically, each word answer vector of the student answer is divided by its vector modulus (length) to obtain the normalized answer vector; the normalization of the word answer vectors of the standard answer is carried out in the same way and is not repeated here.
The effective answer point collection determining unit 807 is configured to obtain an effective answer point collection set of the student according to the result data of each vector inner product and the answer point collection set of the student.
In this embodiment, according to each piece of vector inner product result data, a first student answer sentence is obtained for which there exists a standard answer sentence in the standard answer text such that their vector inner product is not less than a vector inner product threshold. It is then determined whether the first student answer sentence already exists in the student's answer point collection; if not, the first student answer sentence is added to the collection, so as to obtain the student's effective answer point collection.
The scoring unit 808 is configured to obtain the student scores of the questions according to the scores corresponding to the standard answer sentences corresponding to each student answer sentence in the effective answer point collection set.
In this embodiment, the scores corresponding to the standard answer sentences corresponding to all the student answer sentences in the effective answer point collection can be directly added to obtain the student scores of the questions, and certainly, the weight can be set for the score corresponding to each standard answer sentence, and the student scores of the questions can be obtained by performing weighted average on the scores corresponding to the standard answer sentences corresponding to the student answer sentences, which is not limited here.
Fig. 9 is a schematic structural diagram of an electronic device in a ninth embodiment of the present application. As shown in fig. 9, the electronic device includes:
one or more processors 901;
a storage device 902, which may be configured to store one or more programs that, when executed by the one or more processors 901, cause the one or more processors to implement the intelligent marking method according to any of the embodiments described above.
Fig. 10 is a hardware structure of an electronic device in an embodiment of the present application; as shown in fig. 10, the hardware structure of the electronic device may include: a processor 1001, a communication interface 1002, a computer-readable storage medium 1003, and a communication bus 1004;
wherein the processor 1001, the communication interface 1002, and the computer-readable storage medium 1003 complete communication with each other through the communication bus 1004;
optionally, the communication interface 1002 may be an interface of a communication module, such as an interface of a GSM module; the processor 1001 may be specifically configured to: acquiring a standard answer text and a student answer text corresponding to the question; calculating the semantic similarity between the student answer sentence and a standard answer sentence in the standard answer text aiming at each student answer sentence in the student answer text to obtain corresponding semantic similarity result data; obtaining an answer sampling point set of the students according to the result data of each semantic similarity; and according to the scores corresponding to the standard answer sentences corresponding to each student answer sentence in the student answer point collection, obtaining the student scores of the questions.
The processor 1001 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP) and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The various methods, steps and logic blocks disclosed in the embodiments of the present application may be implemented or performed by such a processor. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
In the above embodiment, the electronic device may be a front-end intelligent terminal, or may be a background server, and when the electronic device is a front-end intelligent terminal, the electronic device is an intelligent household appliance. The appliance may include at least one of the following, for example: a television, a Digital Versatile Disc (DVD) player, an audio device, a refrigerator, an air conditioner, a vacuum cleaner, an oven, a microwave oven, a washing machine, an air purifier, a set-top box, a home automation control panel, a security control panel, a television box, a game machine, an electronic dictionary, an electronic key, a camcorder, and an electronic photo frame.
In particular, the processes described above with reference to the flow diagrams may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program comprising program code configured to perform the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section, and/or installed from a removable medium. The computer program, when executed by a Central Processing Unit (CPU), performs the above-described functions defined in the method of the present application. It should be noted that the computer readable storage medium of the present application can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory medium (RAM), a read-only memory medium (ROM), an erasable programmable read-only memory medium (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory medium (CD-ROM), an optical storage medium, a magnetic storage medium, or any suitable combination of the foregoing. In the context of this application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable storage medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable storage medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, which may be described as: a processor comprising an acquisition unit, a similarity calculation unit, a collection point determining unit and a scoring unit, wherein the acquisition unit is used for acquiring a standard answer text and a student answer text corresponding to a question; the similarity calculation unit is used for calculating, for each student answer sentence in the student answer text, the semantic similarity between the student answer sentence and the standard answer sentences in the standard answer text to obtain corresponding semantic similarity result data; the collection point determining unit is used for obtaining the student's answer collection point set according to each piece of semantic similarity result data; and the scoring unit is used for obtaining the student score of the question according to the scores corresponding to the standard answer sentences corresponding to the student answer sentences in the student answer point collection. For example, the acquisition unit may also be described as a "unit for acquiring a standard answer text and a student answer text corresponding to a question".
As another aspect, the present application also provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the method as described in any of the embodiments above.
As another aspect, the present application also provides a computer-readable storage medium, which may be included in the apparatus described in the above embodiments; or may be present separately and not assembled into the device. The computer readable storage medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: acquiring a standard answer text and a student answer text corresponding to a question; calculating the semantic similarity between the student answer sentence and a standard answer sentence in a standard answer text aiming at each student answer sentence in the student answer text to obtain corresponding semantic similarity result data; obtaining an answer collecting point set of the student according to the result data of each semantic similarity; and according to the scores corresponding to the standard answer sentences corresponding to each student answer sentence in the student answer point collection, obtaining the student scores of the questions.
The term "module" or "functional unit" as used herein may mean, for example, a unit including hardware, software, and firmware, or a unit including a combination of two or more of hardware, software, and firmware. A "module" may be used interchangeably with the terms "unit," "logic block," "component," or "circuit," for example. A "module" or "functional unit" may be a minimal unit of an integrated component element or a portion of an integrated component element. A "module" may be a minimal unit or a portion thereof for performing one or more functions. A "module" or "functional unit" may be implemented mechanically or electrically. For example, a "module" or "functional unit" according to the present disclosure may include at least one of: application Specific Integrated Circuit (ASIC) chips, field Programmable Gate Arrays (FPGAs), and programmable logic devices known or later developed to perform operations.
The foregoing description is only exemplary of the preferred embodiments of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (15)

1. An intelligent scoring method is characterized by comprising the following steps:
acquiring a standard answer text and a student answer text corresponding to a question, wherein the standard answer text comprises at least one standard answer sentence, and the student answer text comprises at least one student answer sentence;
for each student answer sentence in the student answer text, calculating the semantic similarity between the student answer sentence and the standard answer sentence in the standard answer text to obtain corresponding semantic similarity result data;
obtaining a scoring point set of the student answer text according to each piece of semantic similarity result data;
obtaining the score of the student answer text of the question according to the score corresponding to the standard answer sentence corresponding to each student answer sentence in the scoring point set of the student answer text;
calculating semantic similarity between the student answer sentence and the standard answer sentence in the standard answer text comprises:
performing word segmentation processing on the student answer sentence, and extracting a student answer keyword set;
traversing each standard answer sentence in the standard answer text, performing word segmentation processing on each traversed standard answer sentence, and extracting a standard answer keyword set corresponding to each standard answer sentence;
establishing a first key value pair list;
adding a key-value pair to each element in the first key-value pair list according to the standard answer keywords in the standard answer keyword set and the student answer keywords in the student answer keyword set, wherein the key-value pair comprises sequence information;
calculating the cosine similarity between the ith element and each element after the ith element in the first key-value pair list, and modifying the sequence information of the key-value pair of each element after the ith element whose cosine similarity with the ith element meets an element cosine similarity threshold into the sequence information of the key-value pair of the ith element, to obtain a second key-value pair list, wherein i is a positive integer;
obtaining a standard answer key-value pair set and a student answer key-value pair set according to the second key-value pair list;
calculating a Jaccard similarity coefficient of the standard answer key-value pair set and the student answer key-value pair set according to the sequence information in the standard answer key-value pair set and the sequence information in the student answer key-value pair set, wherein the Jaccard similarity coefficient serves as the similarity of the standard answer keyword set and the student answer keyword set;
and determining the semantic similarity between the student answer sentence and the standard answer sentence in the standard answer text according to the similarity between the standard answer keyword set and the student answer keyword set.
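The following Python sketch illustrates one way the keyword-level comparison of claim 1 (with the element indexing of claims 3 and 4) could be realized. The `embeddings` lookup (word to vector) and the 0.8 merge threshold are assumptions of this sketch; neither is fixed by the claims.

```python
# Illustrative sketch of the keyword-level similarity in claim 1, with the
# element indexing of claims 3-4.  `embeddings` (a word -> vector lookup) and
# the merge threshold are assumptions.
import numpy as np


def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))


def keyword_set_similarity(student_keywords, standard_keywords, embeddings,
                           cos_threshold=0.8):
    # First key-value pair list: one element per keyword, tagged with its source
    # and an initial sequence id equal to its position (student keywords first,
    # as in one branch of claim 4).
    elements = [{"source": "student", "word": w, "seq": i}
                for i, w in enumerate(student_keywords)]
    elements += [{"source": "standard", "word": w, "seq": len(student_keywords) + j}
                 for j, w in enumerate(standard_keywords)]

    # Second key-value pair list: propagate the sequence id of element i to every
    # later element whose word vector is cosine-similar enough to element i's.
    for i in range(len(elements)):
        for j in range(i + 1, len(elements)):
            if cosine(embeddings[elements[i]["word"]],
                      embeddings[elements[j]["word"]]) >= cos_threshold:
                elements[j]["seq"] = elements[i]["seq"]

    # Split the sequence ids into the student set and the standard set, then take
    # their Jaccard coefficient as the similarity of the two keyword sets.
    student_seq = {e["seq"] for e in elements if e["source"] == "student"}
    standard_seq = {e["seq"] for e in elements if e["source"] == "standard"}
    union = student_seq | standard_seq
    return len(student_seq & standard_seq) / len(union) if union else 0.0
```

With this merging step, a student keyword that is sufficiently similar to an earlier standard keyword ends up carrying that keyword's sequence id, so near-synonyms contribute to the intersection of the Jaccard coefficient rather than only exact word matches.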
2. The method of claim 1, wherein obtaining the scoring point set of the student answer text according to each piece of semantic similarity result data comprises:
adding, according to each piece of semantic similarity result data, each student answer sentence for which there exists a standard answer sentence in the standard answer text whose semantic similarity with the student answer sentence is not less than a semantic similarity threshold into the scoring point set of the student answer text, to obtain the scoring point set of the student answer text.
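A minimal sketch of this selection step follows; the `semantic_similarity` callable and the threshold value are assumptions, not specified by the claim.

```python
# Sketch of claim 2's selection step: keep every student answer sentence that
# reaches the semantic similarity threshold against at least one standard
# answer sentence.  semantic_similarity() and the threshold are assumptions.
def build_scoring_point_set(student_sentences, standard_sentences,
                            semantic_similarity, threshold=0.75):
    return [stu for stu in student_sentences
            if any(semantic_similarity(stu, std) >= threshold
                   for std in standard_sentences)]
```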
3. The method of claim 1, wherein adding a key-value pair to each element in the first key-value pair list according to a standard answer keyword in the set of standard answer keywords and a student answer keyword in the set of student answer keywords comprises:
for the kth student answer keyword, adding, to the xth element in the first key-value pair list, a key-value pair containing identification information representing the student answer keyword and the sequence information, wherein k is more than or equal to 1 and less than or equal to m, and m is the number of student answer keywords in the student answer keyword set;
for the rth standard answer keyword, adding, to the yth element in the first key-value pair list, a key-value pair containing identification information representing the standard answer keyword and the sequence information, wherein r is more than or equal to 1 and less than or equal to n, and n is the number of standard answer keywords in the standard answer keyword set;
wherein either x is more than or equal to 1 and less than or equal to m and y is more than or equal to m+1 and less than or equal to m+n, or y is more than or equal to 1 and less than or equal to n and x is more than or equal to n+1 and less than or equal to n+m; and x, y, k, m, r and n are positive integers.
4. The method of claim 3, wherein obtaining a standard answer key-value pair set and a student answer key-value pair set according to the second key-value pair list comprises: when x is more than or equal to 1 and less than or equal to m and y is more than or equal to m+1, obtaining, according to the identification information representing the student answer keywords contained in the key-value pairs, the sequence information of the 1st to mth elements from the second key-value pair list to form the student answer key-value pair set, and obtaining, according to the identification information representing the standard answer keywords contained in the key-value pairs, the sequence information of the (m+1)th to (m+n)th elements from the second key-value pair list to form the standard answer key-value pair set;
when y is more than or equal to 1 and less than or equal to n and x is more than or equal to n+1, obtaining, according to the identification information representing the standard answer keywords contained in the key-value pairs, the sequence information of the 1st to nth elements from the second key-value pair list to form the standard answer key-value pair set, and obtaining, according to the identification information representing the student answer keywords contained in the key-value pairs, the sequence information of the (n+1)th to (n+m)th elements from the second key-value pair list to form the student answer key-value pair set.
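As a small complement to the sketch after claim 1, the two key-value pair sets can also be recovered purely by index range, which is what claim 4 describes; `second_list`, the field name `"seq"` and the ordering flag are assumptions of this sketch.

```python
# Recovering the student and standard key-value pair sets by index range, as
# claim 4 describes.  second_list is assumed to be the merged key-value pair
# list, ordered either student-first (m student elements, then n standard
# elements) or standard-first.
def split_key_value_pairs(second_list, m, n, student_first=True):
    seqs = [element["seq"] for element in second_list]
    if student_first:
        student_set, standard_set = set(seqs[:m]), set(seqs[m:m + n])
    else:
        standard_set, student_set = set(seqs[:n]), set(seqs[n:n + m])
    return student_set, standard_set
```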
5. The method of claim 1, wherein obtaining the scoring point set of the student answer text according to each piece of semantic similarity result data comprises:
obtaining a semantically similar answer set of the student according to each piece of semantic similarity result data;
for each student answer sentence in the semantically similar answer set of the student, calculating the sentence component similarity between the student answer sentence and the standard answer sentence in the standard answer text to obtain corresponding sentence component similarity result data;
and obtaining the scoring point set of the student answer text according to the sentence component similarity result data.
6. The method according to claim 5, wherein calculating the sentence component similarity between the student answer sentence in the semantically similar answer set of the student and the standard answer sentence in the standard answer text comprises:
performing sentence component extraction processing on the student answer sentence in the semantically similar answer set of the student to obtain the sentence components of the student answer sentence;
traversing the standard answer sentences corresponding to the student answer sentence in the semantically similar answer set of the student, and performing sentence component extraction processing on each traversed standard answer sentence to obtain the sentence components of each standard answer sentence;
calculating the cosine similarity between the sentence components of the student answer sentence and the sentence components of the corresponding standard answer sentence;
obtaining the sentence component similarity result data according to the cosine similarity of each sentence component;
wherein obtaining the scoring point set of the student answer text according to the sentence component similarity result data comprises:
adding, according to the sentence component similarity result data, the student answer sentences whose sentence component similarity is not less than a sentence component similarity threshold into the scoring point set of the student answer text, to obtain the scoring point set of the student answer text.
7. The method of claim 6, wherein obtaining the sentence component similarity result data according to the cosine similarity of each sentence component comprises:
assigning each sentence component a component similarity score; if the cosine similarity between a sentence component of the student answer sentence and the corresponding sentence component of the standard answer sentence is not less than the cosine similarity threshold of that sentence component, adding the component similarity score corresponding to the sentence component to the student answer sentence; and obtaining the sentence component similarity result data according to all the component similarity scores of the student answer sentence.
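The sketch below illustrates the per-component comparison of claims 6 and 7. The `extract_components` parser, the `encode` text-to-vector model, the per-component scores and the 0.7 threshold are all assumptions of this sketch.

```python
# Illustrative sketch of claims 6-7: compare extracted sentence components one
# by one and accumulate the scores of the components that match.
import numpy as np


def component_similarity(student_sentence, standard_sentence,
                         extract_components, encode,
                         component_scores=None, cos_threshold=0.7):
    # Assumed per-component scores; the claims leave the concrete values open.
    component_scores = component_scores or {"subject": 0.3, "predicate": 0.4, "object": 0.3}
    student_parts = extract_components(student_sentence)    # e.g. a dependency parser
    standard_parts = extract_components(standard_sentence)
    total = 0.0
    for name, part_score in component_scores.items():
        if name in student_parts and name in standard_parts:
            u, v = encode(student_parts[name]), encode(standard_parts[name])
            cos = float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))
            if cos >= cos_threshold:    # claim 7: only matching components add their score
                total += part_score
    return total
```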
8. The method according to claim 6, wherein before the sentence component extraction processing is performed on the standard answer sentence and the student answer sentence, the method further comprises:
judging whether the sentence pattern types of the student answer sentence and the standard answer sentence are the same;
if not, carrying out sentence pattern conversion processing on the student answer sentence or the standard answer sentence, so that the student answer sentence and the standard answer sentence have the same sentence pattern type.
9. The method according to any one of claims 1-8, further comprising:
for each student answer sentence in the student answer text, calculating a vector inner product of the student answer sentence and the standard answer sentence in the standard answer text to obtain corresponding vector inner product result data;
obtaining an effective scoring point set of the student answer text according to each piece of vector inner product result data and the scoring point set of the student answer text;
correspondingly, obtaining the score of the student answer text of the question according to the score corresponding to the standard answer sentence corresponding to each student answer sentence in the scoring point set of the student answer text comprises:
obtaining the score of the student answer text of the question according to the score corresponding to the standard answer sentence corresponding to each student answer sentence in the effective scoring point set.
10. The method of claim 9, wherein calculating the vector inner product of the student answer sentence and the standard answer sentence in the standard answer text comprises:
vectorizing the student answer sentences to obtain student answer vectors;
vectorizing each standard answer sentence to obtain a standard answer vector;
traversing each standard answer vector, and calculating the vector inner product of the student answer vector and the traversed standard answer vector to obtain the vector inner product of the student answer sentence and the standard answer sentence in the standard answer text;
wherein obtaining the effective scoring point set of the student answer text according to each piece of vector inner product result data and the scoring point set of the student answer text comprises:
according to each piece of vector inner product result data, acquiring a first student answer sentence for which there exists a standard answer sentence in the standard answer text such that the vector inner product is not less than a vector inner product threshold, judging whether the first student answer sentence already exists in the scoring point set of the student answer text, and if not, adding the first student answer sentence into the scoring point set to obtain the effective scoring point set of the student answer text.
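A sketch of this filtering step follows. `sentence_vector` corresponds to the vectorization described in claim 11; the inner product threshold value is an assumption of this sketch.

```python
# Sketch of claim 10: a student answer sentence whose inner product with some
# standard answer vector reaches the threshold is added to the scoring point
# set if it is not already present.
import numpy as np


def effective_scoring_point_set(student_sentences, standard_sentences,
                                sentence_vector, scoring_point_set,
                                inner_product_threshold=0.5):
    standard_vectors = [sentence_vector(s) for s in standard_sentences]
    effective = list(scoring_point_set)
    for student_sentence in student_sentences:
        student_vector = sentence_vector(student_sentence)
        hit = any(float(np.dot(student_vector, v)) >= inner_product_threshold
                  for v in standard_vectors)
        if hit and student_sentence not in effective:
            effective.append(student_sentence)
    return effective
```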
11. The method according to claim 10, wherein vectorizing the student answer sentence in the student answer text to obtain a student answer vector comprises:
performing word segmentation processing on the student answer sentence to obtain a student answer word set;
calculating the word frequency score, the inverse text frequency index score and the word vector of each word in the student answer word set, and calculating each word answer vector according to the word frequency score, the inverse text frequency index score and the word vector of each word;
and performing normalization processing on each word answer vector to obtain a normalized answer vector of each word, and obtaining the student answer vector according to the normalized answer vector of each word.
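The sketch below shows one possible reading of this vectorization. The `embeddings` lookup, the averaging of the normalized word vectors and the smoothed IDF form used here are assumptions; the exact inverse text frequency formula is the one given in claim 12.

```python
# Sketch of claim 11: weight each word vector by its word frequency score and
# inverse text frequency index score, normalize it, and combine the normalized
# vectors into the student answer vector.  The IDF form below is an assumed
# stand-in for the formula of claim 12, which appears as an image in the
# original document.
import math
import numpy as np
from collections import Counter


def student_answer_vector(words, embeddings, doc_count, doc_freq):
    """words: segmented student answer sentence; doc_count: D; doc_freq[w]: wf."""
    counts = Counter(words)
    normalized_vectors = []
    for word in counts:
        tf = counts[word] / len(words)                            # word frequency score
        idf = math.log(doc_count / (doc_freq.get(word, 0) + 1))   # assumed IDF form
        weighted = tf * idf * embeddings[word]                    # word answer vector
        norm = np.linalg.norm(weighted)
        if norm > 0:
            normalized_vectors.append(weighted / norm)            # normalized answer vector
    if not normalized_vectors:
        raise ValueError("sentence produced no usable word vectors")
    return np.mean(normalized_vectors, axis=0)                    # assumed combination: mean
```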
12. The method of claim 11, wherein the inverse text frequency index score of each word in the student answer word set is calculated according to the following formula:
[Formula image FDA0003773629760000031: calculation of the inverse text frequency index score]
wherein D is the total number of documents in a pre-established standard answer text library, wf is the number of documents in the standard answer text library that contain the word, and C is a constant not less than 2.
13. An intelligent scoring device, comprising:
an acquisition unit, configured to acquire a standard answer text and a student answer text corresponding to a question, wherein the standard answer text comprises at least one standard answer sentence, and the student answer text comprises at least one student answer sentence;
a similarity calculation unit, configured to calculate, for each student answer sentence in the student answer text, a semantic similarity between the student answer sentence and the standard answer sentence in the standard answer text, so as to obtain corresponding semantic similarity result data;
a scoring point determination unit, configured to obtain a scoring point set of the student answer text according to each piece of semantic similarity result data;
a scoring unit, configured to obtain the score of the student answer text of the question according to the score corresponding to the standard answer sentence corresponding to each student answer sentence in the scoring point set of the student answer text;
the similarity calculation unit is specifically configured to:
performing word segmentation processing on the student answer sentence, and extracting a student answer keyword set;
traversing each standard answer sentence in the standard answer text, performing word segmentation processing on each traversed standard answer sentence, and extracting a standard answer keyword set corresponding to each standard answer sentence;
establishing a first key value pair list;
adding a key-value pair to each element in the first key-value pair list according to the standard answer keywords in the standard answer keyword set and the student answer keywords in the student answer keyword set, wherein the key-value pair comprises sequence information;
calculating the cosine similarity between the ith element and each element after the ith element in the first key-value pair list, and modifying the sequence information of the key-value pair of each element after the ith element whose cosine similarity with the ith element meets an element cosine similarity threshold into the sequence information of the key-value pair of the ith element, to obtain a second key-value pair list, wherein i is a positive integer;
obtaining a standard answer key-value pair set and a student answer key-value pair set according to the second key-value pair list;
calculating a Jaccard similarity coefficient of the standard answer key-value pair set and the student answer key-value pair set according to the sequence information in the standard answer key-value pair set and the sequence information in the student answer key-value pair set, wherein the Jaccard similarity coefficient serves as the similarity of the standard answer keyword set and the student answer keyword set;
and determining semantic similarity between the student answer sentence and the standard answer sentence in the standard answer text according to the similarity between the standard answer keyword set and the student answer keyword set.
14. An electronic device, comprising:
one or more processors;
a storage device configured to store one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-12.
15. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-12.
CN201911012221.XA 2019-10-23 2019-10-23 Intelligent marking method and device Active CN112700203B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911012221.XA CN112700203B (en) 2019-10-23 2019-10-23 Intelligent marking method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911012221.XA CN112700203B (en) 2019-10-23 2019-10-23 Intelligent marking method and device

Publications (2)

Publication Number Publication Date
CN112700203A CN112700203A (en) 2021-04-23
CN112700203B (en) 2022-11-01

Family

ID=75505040

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911012221.XA Active CN112700203B (en) 2019-10-23 2019-10-23 Intelligent marking method and device

Country Status (1)

Country Link
CN (1) CN112700203B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113627722B (en) * 2021-07-02 2024-04-02 湖北美和易思教育科技有限公司 Simple answer scoring method based on keyword segmentation, terminal and readable storage medium
CN113822040B (en) * 2021-08-06 2024-07-02 深圳市卓帆技术有限公司 Subjective question scoring method, subjective question scoring device, computer equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9940367B1 (en) * 2014-08-13 2018-04-10 Google Llc Scoring candidate answer passages
CN104268603B (en) * 2014-09-16 2017-04-12 科大讯飞股份有限公司 Intelligent marking method and system for text objective questions
CN106980624B (en) * 2016-01-18 2021-03-26 阿里巴巴集团控股有限公司 Text data processing method and device
CN110196893A (en) * 2019-05-05 2019-09-03 平安科技(深圳)有限公司 Non- subjective item method to go over files, device and storage medium based on text similarity

Also Published As

Publication number Publication date
CN112700203A (en) 2021-04-23


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant