CN112700203A - Intelligent marking method and device - Google Patents
Intelligent marking method and device
- Publication number
- CN112700203A
- Authority
- CN
- China
- Prior art keywords
- answer
- student
- standard
- sentence
- text
- Prior art date
- Legal status
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Strategic Management (AREA)
- Entrepreneurship & Innovation (AREA)
- Human Resources & Organizations (AREA)
- Operations Research (AREA)
- Economics (AREA)
- Marketing (AREA)
- Data Mining & Analysis (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Electrically Operated Instructional Devices (AREA)
Abstract
The embodiments of the present application provide an intelligent marking method and device. A standard text and a text to be read corresponding to a question are obtained, wherein the standard text comprises at least one standard field and the text to be read comprises at least one field to be read. For each field to be read in the text to be read, the semantic similarity between the field to be read and the standard fields in the standard text is calculated to obtain corresponding semantic similarity result data. A scoring point set of the text to be read is obtained according to the semantic similarity result data. The score of the text to be read for the question is obtained according to the scores of the standard fields corresponding to the fields to be read in the scoring point set. Intelligent marking is thereby realized, which saves time and labor, improves marking efficiency, reduces the influence of human subjective factors on examination results during marking, and helps ensure that marking is objective, fair and accurate.
Description
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to an intelligent marking method and device.
Background
In the field of education, students are usually assessed so that their learning progress can be understood. The usual form of assessment is an examination: the examination results are graded, and each student's learning progress is judged from the grade.
Examination questions are generally classified into subjective questions and objective questions. Objective questions are multiple-choice questions: the student selects one or more answers from a plurality of options. Because the answers to objective questions are fixed, they can conveniently be scored by computer, which, compared with manual scoring, shortens scoring time, saves labor cost and improves scoring efficiency. Subjective questions, however, are usually answered in essay form, and students answer according to their own understanding and way of thinking. The standard answer to a subjective question can therefore serve only as a reference rather than as an absolute standard, and a computer cannot be used to determine how many points each sentence of a student's answer should receive.
When subjective questions are marked manually, intermediate steps such as binding, reading and transferring test papers consume a large amount of labor and time, and marking efficiency is low. In addition, manual marking is highly subjective. Different markers may score the same answer differently, sometimes by a large margin. Even the same marker may understand the same question differently at different times, which may also lead to differences in scoring results.
Disclosure of Invention
In view of the above, an objective of the embodiments of the present invention is to provide an intelligent paper marking method and device, so as to overcome the defects in the prior art.
In one aspect, an embodiment of the present application provides an intelligent paper marking method, including:
acquiring a standard text and a text to be read corresponding to a question, wherein the standard text comprises at least one standard field, and the text to be read comprises at least one field to be read; for each field to be read in the text to be read, calculating the semantic similarity between the field to be read and the standard fields in the standard text to obtain corresponding semantic similarity result data; obtaining a scoring point set of the text to be read according to the semantic similarity result data; and obtaining the score of the text to be read for the question according to the scores of the standard fields corresponding to the fields to be read in the scoring point set of the text to be read.
On the other hand, the embodiment of the present application provides an intelligent paper marking device, including:
an acquisition unit, configured to acquire a standard text and a text to be read corresponding to a question, wherein the standard text comprises at least one standard field, and the text to be read comprises at least one field to be read;
a similarity calculation unit, configured to calculate, for each field to be read in the text to be read, the semantic similarity between the field to be read and the standard fields in the standard text, so as to obtain corresponding semantic similarity result data;
a scoring point determining unit, configured to obtain a scoring point set of the text to be read according to the semantic similarity result data;
and a scoring unit, configured to obtain the score of the text to be read for the question according to the scores of the standard fields corresponding to the fields to be read in the scoring point set of the text to be read.
In another aspect, an embodiment of the present application provides an electronic device, including:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement a method as in any one of the embodiments described above.
In yet another aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, and the computer program is executed by a processor to implement the method as described in any one of the above embodiments.
According to the intelligent marking method and device, a standard text and a text to be read corresponding to a question are obtained, wherein the standard text comprises at least one standard field and the text to be read comprises at least one field to be read; for each field to be read in the text to be read, the semantic similarity between the field to be read and the standard fields in the standard text is calculated to obtain corresponding semantic similarity result data; a scoring point set of the text to be read is obtained according to the semantic similarity result data; and the score of the text to be read for the question is obtained according to the scores of the standard fields corresponding to the fields to be read in the scoring point set. Intelligent marking is thereby realized, which saves time and labor, improves marking efficiency, reduces the influence of human subjective factors on examination results during marking, and helps ensure that marking is objective and fair.
Drawings
Some specific embodiments of the present application will be described in detail hereinafter by way of illustration and not limitation with reference to the accompanying drawings. The same reference numbers in the drawings identify the same or similar elements or components. Those skilled in the art will appreciate that the drawings are not necessarily drawn to scale. In the drawings:
Fig. 1 is a schematic flow chart of an intelligent marking method in the first embodiment of the present application.
Fig. 2 is a schematic flow chart of an intelligent marking method in the second embodiment of the present application.
Fig. 3 is a schematic flow chart of an intelligent marking method in the third embodiment of the present application.
Fig. 4 is a schematic flow chart of an intelligent marking method in the fourth embodiment of the present application.
Fig. 5 is a schematic structural diagram of the fifth embodiment of the present application.
Fig. 6 is a schematic structural diagram of an intelligent marking device in the sixth embodiment of the present application.
Fig. 7 is a schematic structural diagram of the seventh embodiment of the present application.
Fig. 8 is a schematic structural diagram of an intelligent marking device in the eighth embodiment of the present application.
Fig. 9 is a schematic structural diagram of an electronic device in the ninth embodiment of the present application.
Fig. 10 is a schematic diagram of the hardware structure of an electronic device in the tenth embodiment of the present application.
Detailed Description
It is not necessary for any particular embodiment of the invention to achieve all of the above advantages at the same time.
In order to make those skilled in the art better understand the technical solutions in the embodiments of the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, but not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application shall fall within the scope of the protection of the embodiments in the present application.
A test paper is generally divided into objective questions and subjective questions. In the embodiments of the present application, intelligent marking is mainly used to mark the subjective questions in a test paper. It is applicable to subjective questions of any subject, such as Chinese, politics or history, which is not limited herein.
The following further describes specific implementations of embodiments of the present application with reference to the drawings of the embodiments of the present application.
Fig. 1 is a schematic flow chart of an intelligent marking method in the first embodiment of the present application. As shown in Fig. 1, the method includes:
Step S101: obtaining a standard text and a text to be read corresponding to a question, wherein the standard text comprises at least one standard field, and the text to be read comprises at least one field to be read.
In this embodiment, a corresponding standard text is preset for each question on a test paper and serves as the reference answer during marking. Because a subjective question has no single fixed answer, one, two or more reference answers may be set, and accordingly there may be one, two or more standard texts, which is not limited herein. In addition, because the standard text of a subjective question may make several points, each standard text includes at least one standard field.
The form of the standard text is not limited; it may be, for example, text, a picture, or a combination of text and pictures.
In this embodiment, the manner of obtaining the standard text is not limited. For example, a correspondence between the standard text and the question may be established, and the standard text and the correspondence stored. During marking, the question is determined first, and the standard text corresponding to the question is then determined according to the correspondence.
Optionally, if the standard text and its correspondence with the question had to be stored before every marking session, the preparation would be cumbersome. To avoid this, a standard text library can be established in which the standard texts of all questions and the correspondences between the standard texts and the questions are stored. When the scoring points of a question are determined, the standard text corresponding to the question is looked up in the standard text library, which reduces the preparation before marking and improves marking efficiency.
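The standard text library described above can be pictured as a simple lookup structure. The following is a minimal sketch only; the class names `StandardText` and `StandardTextLibrary`, the question identifier strings and the per-field score list are illustrative assumptions, not part of the application.

```python
from dataclasses import dataclass


@dataclass
class StandardText:
    """One reference answer (hypothetical structure)."""
    fields: list[str]      # the standard fields of this reference answer
    scores: list[float]    # assumed: one score per standard field


class StandardTextLibrary:
    """Stores standard texts together with their correspondence to questions."""

    def __init__(self) -> None:
        self._by_question: dict[str, list[StandardText]] = {}

    def add(self, question_id: str, standard_text: StandardText) -> None:
        # A question may have one, two or more reference answers.
        self._by_question.setdefault(question_id, []).append(standard_text)

    def lookup(self, question_id: str) -> list[StandardText]:
        # Return every standard text registered for the question.
        return self._by_question.get(question_id, [])
```

With such a structure, the correspondence is stored once and reused at every marking session.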
In this embodiment, each question on the test paper also corresponds to the student's answer, i.e., the text to be read. Likewise, because the answer to a subjective question may make several points, each text to be read includes at least one field to be read. It should be noted that the form of the text to be read is not limited; it may be, for example, text, a picture, or a combination of pictures and text.
In this embodiment, the manner of obtaining the text to be read is not limited. Before marking, the students' handwritten answers to the questions may be scanned and stored as the text to be read (which may be referred to as offline acquisition), or the students may answer directly on a computer and the answers be stored as the text to be read (which may be referred to as online acquisition).
It should be noted that a field may be a sentence or a word, or may be defined as required for marking, which is not limited herein. When the field is a sentence, the standard text or the text to be read may be divided into sentences according to its punctuation; for example, the characters between two periods may form one sentence, or the characters between any two punctuation marks may form one sentence, which is not limited herein.
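As an illustration of the punctuation-based division described above, the following sketch splits a text into sentence fields. The function name `split_fields` and the two delimiter sets are assumptions chosen for the example; the application does not prescribe them.

```python
import re

# Assumed delimiter sets (Western and Chinese marks); not taken from the application.
SENTENCE_END = ".!?。！？"
ANY_PUNCT = SENTENCE_END + ",;:，；：、"


def split_fields(text: str, any_punctuation: bool = False) -> list[str]:
    """Split text into fields at sentence-ending marks, or at any punctuation mark."""
    delimiters = ANY_PUNCT if any_punctuation else SENTENCE_END
    pattern = "[" + re.escape(delimiters) + "]+"
    # Drop empty pieces produced by trailing or repeated delimiters.
    return [part.strip() for part in re.split(pattern, text) if part.strip()]
```

Note that this simple rule also splits at a period inside a number such as "3.5"; a real system would need additional handling.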
Step S102: for each field to be read in the text to be read, calculating the semantic similarity between the field to be read and the standard fields in the standard text to obtain corresponding semantic similarity result data.
In this embodiment, the semantic similarity calculated for each field to be read is the semantic similarity between that field and each standard field in the standard text corresponding to the question. It should be noted that various methods may be used to calculate the semantic similarity, which is not limited herein as long as the semantic similarity can be obtained. For example, the semantic similarity may be obtained by calculating the Euclidean distance between the standard field and the field to be read, or by calculating the Pearson distance between the standard text and the text to be read.
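The two distance-based options mentioned above can be sketched as follows, under the assumption that each field has already been turned into a numeric vector (the application does not say how). The mapping `1 / (1 + d)` from Euclidean distance to similarity is an assumption of this sketch, and the Pearson function returns the correlation coefficient `r`; a Pearson distance is commonly taken as `1 - r`.

```python
import math


def euclidean_similarity(u: list[float], v: list[float]) -> float:
    """Map the Euclidean distance d to a similarity in (0, 1] via 1 / (1 + d)."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return 1.0 / (1.0 + d)


def pearson_similarity(u: list[float], v: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # correlation is undefined for a constant vector; 0 is assumed here
    return cov / (norm_u * norm_v)
```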
In addition, during marking, the semantic similarity between the fields to be read and the standard fields in the standard text may be evaluated one field at a time, or for all fields to be read at the same time, which is not limited herein.
In this embodiment, it is considered that even when a student gives the correct answer, the text to be read may not be identical to the standard text, because the wording and logic of the two may differ. If marking were done by directly checking whether the characters in the text to be read match those in the standard text, marking accuracy would suffer. Therefore, whether the two are consistent is determined by calculating their semantic similarity, and the score of the text to be read is determined accordingly, which improves the accuracy of the score.
Step S103: obtaining a scoring point set of the text to be read according to the semantic similarity result data.
In this embodiment, whether a field to be read matches a standard field can be judged according to the semantic similarity result data. If it does, the field to be read is added to the scoring point set of the text to be read; if not, it is not added. The resulting scoring point set of the text to be read therefore contains the fields to be read that match standard fields.
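One way to picture this step is as a threshold test on a similarity matrix. The sketch below assumes that a field to be read matches the standard field with which it has the highest similarity, provided that similarity reaches a threshold; both the best-match rule and the value 0.6 are assumptions, since the application does not fix the matching criterion.

```python
def build_scoring_point_set(similarity: list[list[float]],
                            threshold: float = 0.6) -> dict[int, int]:
    """similarity[i][j] is the similarity of field to be read i and standard field j.

    Returns {index of field to be read: index of its matched standard field}.
    """
    scoring_points: dict[int, int] = {}
    for i, row in enumerate(similarity):
        if not row:
            continue
        # Best-matching standard field for this field to be read.
        j = max(range(len(row)), key=row.__getitem__)
        if row[j] >= threshold:
            scoring_points[i] = j
    return scoring_points
```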
Step S104: obtaining the score of the text to be read for the question according to the scores of the standard fields corresponding to the fields to be read in the scoring point set of the text to be read.
In this embodiment, the score of the text to be read for the question can be obtained by directly adding the scores of the standard fields corresponding to all the fields to be read in the scoring point set. Alternatively, a weight can be set for the score of each standard field, and the score of the text to be read obtained as a weighted average of the scores of the standard fields corresponding to the fields to be read, which is not limited here.
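The two scoring options can be sketched as below. The scoring point set is assumed to be a mapping from each matched field to be read to the index of its standard field. Counting each standard field only once, and reading the weighted variant as a weighted sum, are assumptions of this sketch; the application does not give a formula.

```python
def score_by_sum(scoring_points: dict[int, int],
                 standard_scores: list[float]) -> float:
    """Add the scores of the matched standard fields."""
    matched = set(scoring_points.values())  # assumed: each standard field counts once
    return sum(standard_scores[j] for j in matched)


def score_by_weights(scoring_points: dict[int, int],
                     standard_scores: list[float],
                     weights: list[float]) -> float:
    """Weighted sum of the scores of the matched standard fields."""
    matched = set(scoring_points.values())
    return sum(weights[j] * standard_scores[j] for j in matched)
```

For example, if two student sentences both match the second standard field, that field's score is awarded once rather than twice.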
According to the intelligent marking method, a standard text and a text to be read corresponding to a question are obtained, wherein the standard text comprises at least one standard field and the text to be read comprises at least one field to be read; for each field to be read in the text to be read, the semantic similarity between the field to be read and the standard fields in the standard text is calculated to obtain corresponding semantic similarity result data; a scoring point set of the text to be read is obtained according to the semantic similarity result data; and the score of the text to be read for the question is obtained according to the scores of the standard fields corresponding to the fields to be read in the scoring point set. Intelligent marking is thereby realized, which saves time and labor, improves marking efficiency, reduces the influence of human subjective factors on examination results during marking, and helps ensure that marking is objective, fair and accurate.
Fig. 2 is a schematic flow chart of an intelligent marking method in the second embodiment of the present application. This embodiment differs from the above embodiment in that the standard text is a standard answer text, the text to be read is a student answer text, the standard field is a standard answer sentence, and the field to be read is a student answer sentence. As shown in Fig. 2, the method includes:
Step S201: obtaining a standard answer text and a student answer text corresponding to the question.
In this embodiment, the standard text is a standard answer text, and the text to be read is a student answer text. Correspondingly, the standard field in the standard text corresponds to a standard answer sentence, and the field to be read in the text to be read corresponds to a student answer sentence.
Step S202: for each student answer sentence in the student answer text, calculating the semantic similarity between the student answer sentence and the standard answer sentences in the standard answer text to obtain corresponding semantic similarity result data.
In this embodiment, calculating the semantic similarity between the student answer sentence and the standard answer sentences in the standard answer text includes:
performing word segmentation on the student answer sentence and extracting a student answer keyword set; traversing each standard answer sentence in the standard answer text, performing word segmentation on each traversed standard answer sentence, and extracting a standard answer keyword set corresponding to each standard answer sentence; calculating the similarity between the standard answer keyword set and the student answer keyword set; and determining the semantic similarity between the student answer sentence and the standard answer sentence in the standard answer text according to the similarity between the standard answer keyword set and the student answer keyword set.
In this embodiment, the similarity between the standard answer keyword set and the student answer keyword set can be obtained from the Jaccard similarity coefficient of the two sets. The Jaccard similarity coefficient is used to compare the similarity and difference between finite sample sets.
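For reference, the Jaccard similarity coefficient of two sets A and B is |A ∩ B| / |A ∪ B|. A minimal sketch follows; the function name `jaccard` and the convention of returning 1.0 for two empty sets are assumptions of this example.

```python
def jaccard(a, b) -> float:
    """Jaccard similarity coefficient |A ∩ B| / |A ∪ B| of two keyword collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumed convention: two empty sets are treated as identical
    return len(a & b) / len(a | b)
```

For instance, the keyword sets {"water", "cycle", "rain"} and {"water", "rain", "cloud"} share two of four distinct keywords, so the coefficient is 0.5.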
In this embodiment, before the Jaccard similarity coefficient is calculated, the student answer sentence and the standard answer sentence are first segmented into words to obtain the student answer keyword set and the standard answer keyword set. Specifically, a word segmentation tool such as THULAC or NLPIR may be used. To increase segmentation speed and improve segmentation accuracy, the Jieba word segmentation tool is preferably used.
In this embodiment, the standard answer keyword set and the student answer keyword set are obtained by segmenting the standard answer sentence and the student answer sentence with a word segmentation tool, rather than being defined manually. This avoids the influence of human factors on the intelligent marking process and improves the accuracy of intelligent marking.
In this embodiment, it is considered that if the Jaccard similarity coefficient of the standard answer keyword set and the student answer keyword set is calculated directly, two keywords are judged similar only when they are exactly identical. In practice, however, a keyword in the student answer text may merely be semantically similar to a keyword in the standard answer text. For example, the student answer keyword and the standard answer keyword may be two different words that both mean "happy"; if the Jaccard similarity coefficient is calculated directly, the result is 0, whereas the actual similarity should be 1, which may cause misjudgments and missed matches.
To avoid this, a first key-value pair list may be established; key-value pairs are added to each element in the first key-value pair list according to the standard answer keywords in the standard answer keyword set and the student answer keywords in the student answer keyword set; the cosine similarity between the i-th element and each element after it in the first key-value pair list is calculated, and the key-value pairs of those later elements whose cosine similarity with the i-th element satisfies the element cosine similarity threshold are modified, to obtain a second key-value pair list, where i is a positive integer; a standard answer key-value pair set and a student answer key-value pair set are obtained according to the second key-value pair list; and the similarity of the standard answer key-value pair set and the student answer key-value pair set is calculated, namely their Jaccard similarity coefficient.
In this case, determining the semantic similarity between the student answer sentence and the standard answer sentence in the standard answer text according to the similarity between the standard answer keyword set and the student answer keyword set includes: determining the semantic similarity between the student answer sentence and the standard answer sentence in the standard answer text according to the similarity of the standard answer key-value pair set and the student answer key-value pair set. It should be noted that, in other embodiments, the semantic similarity between the student answer sentence and the standard answer sentence in the standard answer text may also be determined in other ways, which is not limited herein.
Optionally, adding key-value pairs to each element in the first key-value pair list according to the standard answer keywords in the standard answer keyword set and the student answer keywords in the student answer keyword set includes:
for the k-th student answer keyword, adding a key-value pair containing identification information characterizing the student answer keyword and sequence information to the x-th element in the first key-value pair list, wherein 1 ≤ k ≤ m, and m is the number of student answer keywords in the student answer keyword set;
for the r-th standard answer keyword, adding a key-value pair containing identification information characterizing the standard answer keyword and sequence information to the y-th element in the first key-value pair list, wherein 1 ≤ r ≤ n, and n is the number of standard answer keywords in the standard answer keyword set;
wherein 1 ≤ x ≤ m and m+1 ≤ y ≤ m+n; or 1 ≤ y ≤ n and n+1 ≤ x ≤ m+n; and x, y, k, m, r and n are positive integers.
In this embodiment, when key-value pairs are added to the elements of the first key-value pair list, key-value pairs containing the identification information characterizing the student answer keywords may first be added, in order of the student answer keywords, to the 1st to m-th elements of the list, and key-value pairs containing the identification information characterizing the standard answer keywords may then be added, in order of the standard answer keywords, to the (m+1)-th to last, i.e. (m+n)-th, elements. In this case 1 ≤ x ≤ m and m+1 ≤ y; that is, in the first key-value pair list the first m elements correspond to the student answer keywords and the last n elements correspond to the standard answer keywords. Alternatively, key-value pairs containing the identification information characterizing the standard answer keywords may first be added, in order of the standard answer keywords, to the 1st to n-th elements, and key-value pairs containing the identification information characterizing the student answer keywords may then be added, in order of the student answer keywords, to the (n+1)-th to last, i.e. (m+n)-th, elements. In this case 1 ≤ y ≤ n and n+1 ≤ x; that is, the first n elements correspond to the standard answer keywords and the last m elements correspond to the student answer keywords. The specific order of addition is not limited herein.
In this embodiment, the identification information characterizing a standard answer keyword and its sequence information may be carried by one key-value pair, or by two or more key-value pairs, which is not limited herein. The identification information characterizing a standard answer keyword includes a word attribute identifying the keyword and the standard answer keyword set to which it belongs. The sequence information may be the position of the keyword within the standard answer keyword set, its position among all standard answer keywords in the first key-value pair list, or its position in the first key-value pair list as a whole. Similarly, the identification information characterizing a student answer keyword and its sequence information may be carried by one key-value pair, or by two or more key-value pairs, which is not limited herein. The identification information characterizing a student answer keyword includes a word attribute identifying the keyword and the student answer keyword set to which it belongs. The sequence information may be the position of the keyword within the student answer keyword set, its position among all student answer keywords in the first key-value pair list, or its position in the first key-value pair list as a whole. It should be noted that the sequence information need only indicate the order of the keywords in the set or the list; its specific form is not limited here.
In this embodiment, taking 1 ≤ x ≤ m and m+1 ≤ y as an example, the sequence information contained in the key-value pairs added for the student answer keywords may be 1 to m, and the sequence information contained in the key-value pairs added for the standard answer keywords may be -1 to -n or m+1 to m+n, which is not limited herein.
Optionally, in a specific implementation scenario, three key-value pairs are added to each element in the first key-value pair list. The key of the first key-value pair is the word attribute and its value is the keyword; the key of the second is the category attribute and its value is the student answer keyword set or the standard answer keyword set; the key of the third is the variable attribute and its value is the sequence information of the student answer keyword or of the standard answer keyword. For example, if the 3rd keyword in the standard answer keyword set is "happy", three key-value pairs are added to the corresponding element: the first has the word attribute as key and "happy" as value; the second has the category attribute as key and "standard answer keyword set", or a letter Q representing the standard answer category, as value; the third has the variable attribute as key and "3" as value.
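The three key-value pairs per element can be sketched with one dictionary per element. The key names `"word"`, `"category"` and `"variable"` and the category values `"student"` and `"standard"` are hypothetical labels for the word, category and variable attributes. This sketch places student keywords first and numbers the standard keywords m+1 to m+n, which is only one of the options the text allows.

```python
def build_first_list(student_keywords: list[str],
                     standard_keywords: list[str]) -> list[dict]:
    """Build the first key-value pair list: student keywords fill elements 1..m,
    standard keywords fill elements m+1..m+n."""
    m = len(student_keywords)
    elements = []
    for k, word in enumerate(student_keywords, start=1):
        # Sequence information 1..m for the student answer keywords.
        elements.append({"word": word, "category": "student", "variable": k})
    for r, word in enumerate(standard_keywords, start=1):
        # Sequence information m+1..m+n for the standard answer keywords.
        elements.append({"word": word, "category": "standard", "variable": m + r})
    return elements
```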
Optionally, modifying the key-value pairs of those elements after the i-th element whose element cosine similarity with the i-th element satisfies the element cosine similarity threshold includes: changing the sequence information in the key-value pair of each such element to the sequence information in the key-value pair of the i-th element. For example, when 1 ≤ x ≤ m and m + 1 ≤ y, starting from the 1st element of the first key-value pair list, the cosine similarity between the word vector of the 1st element and the word vector of each of the 2nd to (m + n)-th elements is evaluated. If the cosine similarity between the word vectors of the p-th element and the 1st element satisfies the element cosine similarity threshold, where 2 ≤ p ≤ m + n, the sequence information of the key-value pair of the p-th element is changed to that of the 1st element. The 3rd to (m + n)-th elements are then compared with the 2nd element in the same way, and so on, until the (m + n)-th element has been compared with the second-to-last element. It should be noted that the word vector of each element may be looked up in a corpus word vector library, for example one built from the People's Daily corpus. The element cosine similarity threshold may be set as required and is not limited herein.
Optionally, obtaining the standard answer key-value pair set and the student answer key-value pair set according to the second key-value pair list includes: when 1 ≤ x ≤ m and m + 1 ≤ y, obtaining, according to the identification information characterizing student answers contained in the key-value pairs, the sequence information of the 1st to m-th elements from the second key-value pair list to form the student answer key-value pair set, and obtaining, according to the identification information characterizing standard answers contained in the key-value pairs, the sequence information of the (m + 1)-th to (m + n)-th elements from the second key-value pair list to form the standard answer key-value pair set;
when 1 ≤ y ≤ n and n + 1 ≤ x, obtaining, according to the identification information characterizing standard answers contained in the key-value pairs, the sequence information of the 1st to n-th elements from the second key-value pair list to form the standard answer key-value pair set, and obtaining, according to the identification information characterizing student answers contained in the key-value pairs, the sequence information of the (n + 1)-th to (n + m)-th elements from the second key-value pair list to form the student answer key-value pair set. It should be noted that, in other embodiments, the standard answer key-value pair set and the student answer key-value pair set may be obtained in other manners, which is not limited herein.
It should be noted that, in other embodiments, the standard answer key-value pair set and the student answer key-value pair set may further include data representing standard answer identification information and data representing student answer identification information, respectively, besides the sequence information, and this is not limited herein.
In addition, when the standard answer key-value pair set and the student answer key-value pair set contain only sequence information, elements with identical sequence information may be merged while the two sets are being generated, because the elements of a set are mutually distinct.
Optionally, calculating the similarity between the standard answer key-value pair set and the student answer key-value pair set includes: counting the number of items of sequence information in the intersection of the two sets and the number of items of sequence information in their union; the ratio of the former to the latter is the similarity between the standard answer key-value pair set and the student answer key-value pair set. For example, if the standard answer key-value pair set is New-P = {1, 2, 3, 4} and the student answer key-value pair set is New-Q = {1, 2, 3}, then the similarity of the two sets, expressed as the Jaccard similarity coefficient, is |New-P ∩ New-Q| / |New-P ∪ New-Q| = 3/4 = 0.75. Of course, in other embodiments, the similarity between the standard answer key-value pair set and the student answer key-value pair set may be calculated by other suitable methods, which are not limited herein.
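The ratio described above can be sketched in a few lines. The function name `jaccard_similarity` and the handling of two empty sets (returning 0.0) are assumptions made for illustration; the embodiment does not say how empty sets are treated.

```python
def jaccard_similarity(standard_set, student_set):
    """Ratio of the intersection size to the union size of two sets."""
    standard_set, student_set = set(standard_set), set(student_set)
    union = standard_set | student_set
    if not union:
        return 0.0  # assumption: two empty sets are treated as dissimilar
    return len(standard_set & student_set) / len(union)


# The example from the text: New-P = {1, 2, 3, 4}, New-Q = {1, 2, 3}.
similarity = jaccard_similarity({1, 2, 3, 4}, {1, 2, 3})
```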
The following describes an example of calculating the similarity between the student answer keyword set and the standard answer keyword set.
Let the standard answer keyword set be P = (happy, Chinese, travel) and the student answer keyword set be Q = (happiness, China).
First, a first key-value pair list is established, and 3 key-value pairs are added in turn to each of the 1st to 3rd elements of the list according to the keywords in the standard answer keyword set. For the 1st element, the key of the 1st key-value pair is the word attribute and its value is "happy"; the key of the 2nd key-value pair is the category attribute and its value is P; the key of the 3rd key-value pair is the variable attribute and its value is 1. For the 2nd element, the key of the 1st key-value pair is the word attribute and its value is "Chinese"; the key of the 2nd key-value pair is the category attribute and its value is P; the key of the 3rd key-value pair is the variable attribute and its value is 2. For the 3rd element, the key of the 1st key-value pair is the word attribute and its value is "travel"; the key of the 2nd key-value pair is the category attribute and its value is P; the key of the 3rd key-value pair is the variable attribute and its value is 3.
Secondly, 3 key-value pairs are added in turn to each of the 4th and 5th elements of the first key-value pair list according to the keywords in the student answer keyword set. For the 4th element, the key of the 1st key-value pair is the word attribute and its value is "happiness"; the key of the 2nd key-value pair is the category attribute and its value is Q; the key of the 3rd key-value pair is the variable attribute and its value is -1. For the 5th element, the key of the 1st key-value pair is the word attribute and its value is "China"; the key of the 2nd key-value pair is the category attribute and its value is Q; the key of the 3rd key-value pair is the variable attribute and its value is -2.
Thirdly, starting from the 1st element, the cosine similarity between the 1st element and each of the 2nd to 5th elements is evaluated. If the cosine similarity between the 4th element and the 1st element satisfies the element cosine similarity threshold, the value of the 3rd key-value pair of the 4th element is changed to 1. The same steps are then repeated for the remaining elements to modify the values of their 3rd key-value pairs; in this example, the value of the 3rd key-value pair of the 5th element becomes 2 because the 5th element matches the 2nd element. The second key-value pair list is obtained after modification.
Fourthly, according to the 2nd key-value pair of each element in the second key-value pair list, the values of the 3rd key-value pairs of the 1st to 3rd elements are obtained in turn to form the standard answer key-value pair set New-P = {1, 2, 3}, and the values of the 3rd key-value pairs of the 4th and 5th elements are obtained in turn to form the student answer key-value pair set New-Q = {1, 2}. The Jaccard similarity coefficient is then |New-P ∩ New-Q| / |New-P ∪ New-Q| = 2/3 ≈ 0.667.
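The four steps of this example can be traced with the following sketch. The word vectors in `VECTORS` are made-up toy values (a real system would look them up in a corpus word vector library), the threshold 0.9 is arbitrary, and the names `cosine` and `merge_sequence_info` are hypothetical. Treating "satisfies the threshold" as "greater than or equal to" is also an assumption.

```python
import math

# Toy word vectors (assumed values, for illustration only).
VECTORS = {
    "happy": [1.0, 0.0, 0.0],
    "Chinese": [0.0, 1.0, 0.0],
    "travel": [0.0, 0.0, 1.0],
    "happiness": [0.9, 0.1, 0.0],
    "China": [0.0, 1.0, 0.0],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def merge_sequence_info(elements, threshold):
    """Build the second list: each later element that is similar enough to
    element i takes over the sequence information of element i."""
    merged = [dict(e) for e in elements]  # copy, leave the first list intact
    for i in range(len(merged) - 1):
        for p in range(i + 1, len(merged)):
            sim = cosine(VECTORS[merged[i]["word"]], VECTORS[merged[p]["word"]])
            if sim >= threshold:
                merged[p]["variable"] = merged[i]["variable"]
    return merged


first_list = [
    {"word": "happy", "category": "P", "variable": 1},
    {"word": "Chinese", "category": "P", "variable": 2},
    {"word": "travel", "category": "P", "variable": 3},
    {"word": "happiness", "category": "Q", "variable": -1},
    {"word": "China", "category": "Q", "variable": -2},
]
second_list = merge_sequence_info(first_list, threshold=0.9)

# Collect the sequence information per category, then compare the two sets.
new_p = {e["variable"] for e in second_list if e["category"] == "P"}
new_q = {e["variable"] for e in second_list if e["category"] == "Q"}
similarity = len(new_p & new_q) / len(new_p | new_q)
```

With these toy vectors, "happiness" is merged into "happy" and "China" into "Chinese", which reproduces the sets New-P and New-Q of the example.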
In this embodiment, once the similarity between the standard answer keyword set and the student answer keyword set has been calculated, it may be used directly as the semantic similarity between the student answer sentence and the standard answer sentence in the standard answer text. Alternatively, this similarity may first be normalized or weighted-averaged to obtain the semantic similarity between the student answer sentence and the standard answer sentence in the standard answer text, which is not limited here.
Step S203, obtaining the student's answer point collection according to each item of semantic similarity result data.
In this embodiment, according to each item of semantic similarity result data, each student answer sentence for which there exists a standard answer sentence in the standard answer text whose semantic similarity with it is not less than the semantic similarity threshold may be added to the student's answer point collection, thereby obtaining the student's answer point collection. It should be noted that the semantic similarity threshold may be set as required and is not limited herein.
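A minimal sketch of this filtering step follows. It assumes the semantic similarity result data is available as a mapping from (student sentence, standard sentence) pairs to scores, and that when several standard answer sentences pass the threshold, the best-scoring one is recorded. Both choices, and the name `collect_answer_points`, are illustrative assumptions rather than requirements of this embodiment.

```python
def collect_answer_points(similarities, threshold):
    """Map each qualifying student sentence to its best-matching standard sentence.

    similarities: dict keyed by (student_sentence, standard_sentence) -> score.
    """
    best = {}
    for (student, standard), score in similarities.items():
        if score < threshold:
            continue  # below the semantic similarity threshold
        if student not in best or score > best[student][1]:
            best[student] = (standard, score)
    return {student: standard for student, (standard, _) in best.items()}


similarities = {
    ("s1", "a1"): 0.82,
    ("s1", "a2"): 0.40,
    ("s2", "a1"): 0.30,
    ("s2", "a2"): 0.55,
}
answer_points = collect_answer_points(similarities, threshold=0.6)
```

Here only `s1` reaches the threshold (against `a1`), so `s2` is left out of the answer point collection.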
Step S204, obtaining the student score for the question according to the score of the standard answer sentence corresponding to each student answer sentence in the student's answer point collection.
In this embodiment, steps S203 and S204 are similar to steps S103 and S104, and are not described again here.
Fig. 3 is a schematic flow chart of an intelligent paper marking method in the third embodiment of the present application. This embodiment differs from the above embodiment in that the sentence component similarity between the student answer sentence and the standard answer sentence is also calculated. As shown in Fig. 3, the method includes:
Step S301, obtaining a standard answer text and a student answer text corresponding to the question.
Step S302, for each student answer sentence in the student answer text, calculating the semantic similarity between the student answer sentence and the standard answer sentences in the standard answer text to obtain corresponding semantic similarity result data.
Step S303, obtaining the student's semantic similar answer set according to each item of semantic similarity result data.
In this embodiment, according to each item of semantic similarity result data, each student answer sentence for which there exists a standard answer sentence in the standard answer text whose semantic similarity with it is not less than the semantic similarity threshold may be added to the student's semantic similar answer set, thereby obtaining the student's semantic similar answer set. It should be noted that the semantic similarity threshold may be set as required and is not limited herein.
Step S304, for each student answer sentence in the student's semantic similar answer set, calculating the sentence component similarity between the student answer sentence and the standard answer sentence in the standard answer text to obtain corresponding sentence component similarity result data.
In this embodiment, it is considered that if the student score for the question were obtained directly from the student answer sentences whose semantic similarity result data is not less than the semantic similarity threshold, misjudgments and missed judgments could occur and the score could be inaccurate. To improve the accuracy of the score, the sentence component similarity between the student answer sentence and the standard answer sentence in the standard answer text may be calculated, and whether the student answer sentence matches the standard answer sentence is judged again according to the sentence component similarity result data. If they are judged to match, the student answer sentence is added to the student's answer point collection; otherwise it is not added. The resulting answer point collection contains only student answer sentences that match standard answer sentences in both semantic similarity and sentence component similarity, so the student score for the question obtained from it is more accurate.
In this embodiment, the sentence component similarity may be calculated by various methods, which are not limited herein as long as the sentence component similarity can be obtained. For example, it may be obtained by calculating the Euclidean distance between the standard answer text and the student answer text, or by calculating the Pearson distance between them.
Optionally, the calculating the sentence component similarity between the student answer sentence in the semantic similar answer set of the student and the standard answer sentence in the standard answer text includes: sentence component extraction processing is carried out on student answer sentences in the semantic similar answer set of the students to obtain sentence components of the student answer sentences; traversing standard answer sentences corresponding to student answer sentences in the semantic similar answer set of the students, and performing sentence component extraction processing on each traversed standard answer sentence to obtain sentence components of each standard answer sentence; calculating the cosine similarity of sentence components of the student answer sentences and sentence components of corresponding standard answer sentences; and obtaining sentence component similarity result data according to the cosine similarity of each sentence component.
Correspondingly, obtaining the answer point collection set of the student according to the result data of the similarity of each sentence component comprises the following steps: and adding the student answer sentences of which the sentence component similarity is not less than the sentence component similarity threshold into the student answer point collection according to the sentence component similarity result data to obtain the student answer point collection.
In this embodiment, sentence component extraction on the student answer sentence and the standard answer sentence may be performed with a sentence component extraction tool, for example the LTP tool, or with the part-of-speech tagging provided by Jieba, which is not limited herein.
In this embodiment, it is considered that the sentence components of a sentence include single sentence components such as a subject, a predicate, and an object. Therefore, when calculating the sentence component similarity, the sentence component cosine similarity between the subject of the standard answer sentence and the subject of the student answer sentence may be calculated first, then that between the predicates of the two sentences, and so on for the other components. The sentence component similarity result data is then obtained from the subject cosine similarity, the predicate cosine similarity, and the cosine similarities of the other sentence components.
In this embodiment, when calculating the sentence component cosine similarity between the subject of the student answer sentence and the subject of the standard answer sentence, the word vector of each subject may be determined first, and the cosine similarity between the two word vectors is then calculated. The method of determining the word vector of a subject is the same as that for determining the word vector of a keyword in the above embodiments and is not repeated here. The similarity of the other single sentence components is calculated in a similar way and is not repeated here.
Optionally, in a specific implementation scenario, obtaining the sentence component similarity result data according to the cosine similarity of each sentence component includes: assigning each sentence component a component similarity score; if the cosine similarity between a sentence component of the student answer sentence and the corresponding sentence component of the standard answer sentence is not less than the sentence component cosine similarity threshold, crediting the component similarity score of that sentence component to the student answer sentence; and obtaining the sentence component similarity result data from all component similarity scores credited to the student answer sentence.
It should be noted that, when the sentence component similarity result data is obtained from all component similarity scores of a student answer sentence, the component similarity scores may be added directly, or they may be weighted-averaged, which is not limited herein. In addition, the component similarity scores assigned to the individual sentence components may be equal or may be allocated in proportion, which is not limited herein.
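The scoring rule above can be sketched as follows, using direct addition of the credited component similarity scores. The component names, the toy word vectors, the per-component scores, and the function name `component_similarity` are all illustrative assumptions. How a component that is missing from one of the two sentences should be handled is not stated in this embodiment; the sketch simply credits nothing for it.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def component_similarity(student_parts, standard_parts, component_scores, threshold):
    """Sum the scores of the components whose cosine similarity reaches the threshold.

    student_parts / standard_parts: dict component name -> word vector.
    component_scores: dict component name -> component similarity score.
    """
    total = 0.0
    for component, score in component_scores.items():
        if component not in student_parts or component not in standard_parts:
            continue  # assumption: a component missing on either side earns nothing
        if cosine(student_parts[component], standard_parts[component]) >= threshold:
            total += score
    return total


student = {"subject": [1.0, 0.0], "predicate": [0.0, 1.0], "object": [1.0, 1.0]}
standard = {"subject": [0.9, 0.1], "predicate": [1.0, 0.0], "object": [1.0, 0.9]}
scores = {"subject": 0.4, "predicate": 0.4, "object": 0.2}
result = component_similarity(student, standard, scores, threshold=0.9)
```

In this toy case the subjects and objects match while the predicates do not, so the result is the sum of the subject and object scores.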
It should be noted that, if there are a plurality of student answer sentences, there are a plurality of standard answer sentences corresponding to the plurality of student answer sentences, and at this time, the similarity between each student answer sentence and the sentence component of each standard answer sentence corresponding to the student answer sentence can be determined one by one, or the similarity between each student answer sentence and the sentence component of each standard answer sentence corresponding to the student answer sentence can be determined at the same time, which is not limited herein.
Optionally, if the sentence pattern type of the student answer sentence is not consistent with the sentence pattern type of the standard answer sentence, the similarity of the sentence components calculated according to the extracted student answer sentence components and the standard answer sentence components may be deviated, resulting in a false judgment. Therefore, in order to avoid the situation, whether the sentence pattern types of the student answer sentence and the corresponding standard answer sentence are the same or not can be judged before the sentence component extraction processing is carried out; if not, carrying out sentence pattern conversion processing on the student answer sentence or the standard answer sentence, so that the student answer sentence is the same as the standard answer sentence in sentence pattern type.
Step S305, obtaining the student's answer point collection according to each item of sentence component similarity result data.
Step S306, obtaining the student score for the question according to the score of the standard answer sentence corresponding to each student answer sentence in the student's answer point collection.
In this embodiment, step S305 and step S306 are similar to the above embodiments, and are not described again here.
In this embodiment, the semantic similarity and the sentence component similarity between each student answer sentence in the answer point collection and its standard answer sentence are not less than the semantic similarity threshold and the sentence component similarity threshold, respectively. This ensures that the student answer sentences in the answer point collection are accurate matches, so the student score for the question is more accurate.
Fig. 4 is a schematic flow chart of an intelligent paper marking method in the fourth embodiment of the present application. This embodiment differs from the above embodiments in that the answer point collection is expanded according to the obtained standard answer text and student answer text. As shown in Fig. 4, the method includes:
Step S401, obtaining a standard answer text and a student answer text corresponding to the question.
Step S402, for each student answer sentence in the student answer text, calculating the semantic similarity between the student answer sentence and the standard answer sentences in the standard answer text to obtain corresponding semantic similarity result data.
Step S403, obtaining the student's semantic similar answer set according to each item of semantic similarity result data.
Step S404, for each student answer sentence in the student's semantic similar answer set, calculating the sentence component similarity between the student answer sentence and the standard answer sentence in the standard answer text to obtain corresponding sentence component similarity result data.
Step S405, obtaining the student's answer point collection according to each item of sentence component similarity result data.
In this embodiment, steps S401 to S405 are similar to the above embodiments, and are not described again here.
Step S406, calculating a vector inner product of the student answer sentence and a standard answer sentence in the standard answer text for each student answer sentence in the student answer text, and obtaining corresponding vector inner product result data.
In this embodiment, it is considered that screening student answer sentences by their semantic similarity with the standard answer sentences generally involves errors, so that some correct student answer sentences are omitted, and that screening by sentence component similarity likewise involves errors and omits some correct student answer sentences. The answer point collection therefore accumulates the errors of both screening passes, and the student score obtained from it may deviate to some extent. For this reason, the vector inner product is used to recover correct student answer sentences that were omitted.
In this embodiment, the method of obtaining the student answer vector and the standard answer vector is not limited herein; for example, the TF-IDF method may be used. However, with the TF-IDF method the inverse text frequency index (IDF) score may be negative, which affects the accuracy of the obtained answer vector. To avoid this, in one implementation scenario, calculating the vector inner product of the student answer sentence and the standard answer sentence in the standard answer text includes:
vectorizing the student answer sentences to obtain student answer vectors;
vectorizing each standard answer sentence to obtain a standard answer vector;
and traversing each standard answer vector, and calculating the vector inner product of the student answer vector and the traversed standard answer vector to obtain the vector inner product of the student answer sentence and the standard answer sentence in the standard answer text.
Optionally, in a specific implementation scenario, vectorizing the student answer sentence to obtain a student answer vector includes: carrying out word segmentation on the student answer sentence to obtain a student answer word set; calculating a Term Frequency (TF) score, an IDF score and a word vector of each word in the student answer word set, and calculating to obtain each word answer vector according to the Term Frequency score, the IDF score and the word vector of each word; and carrying out normalization processing on the answer vector of each word to obtain a normalized answer vector of each word, and obtaining a student answer vector according to the normalized answer vector of each word.
In this embodiment, the manner of performing word segmentation processing on the student answer sentence may be the same as that in the above embodiment, and other word segmentation processing manners may also be adopted, which are not limited herein. The word vectors of the words are determined in a similar manner to the above embodiments, and are not limited herein.
In this embodiment, the word answer vector is calculated as:
word answer vector = word TF score × word IDF score × word vector.
Here, the word TF score = the word frequency of the word / the total word frequency of all words in the sentence. The word frequency of a word may be determined as the number of times the word occurs in the student answer word set. The total word frequency of all words in the sentence may be determined by determining the word frequency of each word in the student answer word set and summing these word frequencies.
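The TF score defined above can be computed directly from the segmented words of one sentence. The function name `tf_scores` and the sample words are illustrative assumptions.

```python
from collections import Counter


def tf_scores(words):
    """Word frequency of each word divided by the total word frequency."""
    counts = Counter(words)        # word frequency of each word
    total = sum(counts.values())   # total word frequency of all words
    return {word: count / total for word, count in counts.items()}


tf = tf_scores(["score", "point", "score", "answer"])
```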
In this embodiment, the inverse text frequency index score of each word in the student answer keyword set is calculated according to the following formula:
In the formula, d is the total number of documents in a pre-established standard answer text library, wf is the number of documents in the standard answer text library that contain the word, and C is a constant greater than or equal to 2.
In this embodiment, in order to make the word IDF score more accurate, a standard answer text library may be established according to all standard answer texts of all questions.
Optionally, calculating the inverse text frequency index score of each word in the student answer word set according to the standard answer text library and the inverse text frequency index score calculation formula includes: performing word segmentation on all standard answer sentences in the standard answer text library, counting the number wf of documents in the standard answer text library that contain the word, and then calculating the IDF score of each word according to the inverse text frequency index score calculation formula.
In this embodiment, the method for vectorizing the standard answer sentence to obtain the standard answer vector is similar to the method for vectorizing the student answer sentence to obtain the student answer vector, and is not described herein again.
In this embodiment, if the IDF score of each word were calculated with the conventional inverse text frequency index score calculation formula, the resulting student answer vector or standard answer vector might contain negative components. In that case, even though the actual vector inner product of the standard answer text and the student answer text satisfies the vector inner product determination condition, the negative components could cause the computed inner product to fail the condition, which would reduce the accuracy of intelligent marking.
In this embodiment, in order to calculate the vector inner product from the standard answer vector and the student answer vector and to improve its accuracy, data normalization may be performed on the standard answer vector and the student answer vector. Specifically, each word answer vector of the student answer is divided by its modulus (vector length) to obtain the normalized answer vector. The word answer vectors of the standard answer are normalized in the same way, which is not repeated here.
It should be noted that, in other embodiments, other methods may be adopted to perform vectorization processing on the student answer sentences and the standard answer sentences, which is not limited herein.
In this embodiment, when the student answer vector is obtained from the normalized answer vectors of the words, the normalized answer vectors may be added directly, or they may be weighted-averaged, to obtain the student answer vector.
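The vectorization and inner product steps can be sketched as below, using direct addition of the normalized word answer vectors. The IDF scores are passed in as precomputed values because the sketch does not assume any particular IDF formula; the toy word vectors and the names `answer_vector` and `inner_product` are likewise assumptions made for illustration.

```python
import math
from collections import Counter


def answer_vector(words, idf, word_vectors):
    """Sum of normalized word answer vectors (TF score x IDF score x word vector).

    idf: dict word -> precomputed IDF score (assumed given).
    word_vectors: dict word -> list of floats, all of the same length.
    """
    counts = Counter(words)
    total = sum(counts.values())
    dim = len(next(iter(word_vectors.values())))
    result = [0.0] * dim
    for word, count in counts.items():
        weight = (count / total) * idf[word]           # TF score x IDF score
        vec = [weight * x for x in word_vectors[word]]  # word answer vector
        norm = math.sqrt(sum(x * x for x in vec))       # modulus of the vector
        if norm == 0.0:
            continue  # a zero vector cannot be normalized; skip it
        for k, x in enumerate(vec):
            result[k] += x / norm                       # add the normalized vector
    return result


def inner_product(u, v):
    """Vector inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


word_vectors = {"a": [1.0, 0.0], "b": [0.0, 2.0]}
idf = {"a": 1.0, "b": 2.0}
student_vec = answer_vector(["a", "b"], idf, word_vectors)
standard_vec = answer_vector(["a"], idf, word_vectors)
product = inner_product(student_vec, standard_vec)
```

Because each word answer vector is divided by its own modulus, every word contributes a unit-length vector to the sum.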
Step S407, obtaining the student's effective answer point collection according to each item of vector inner product result data and the student's answer point collection.
In this embodiment, according to each item of vector inner product result data, first student answer sentences are obtained, a first student answer sentence being a student answer sentence for which there exists a standard answer sentence in the standard answer text such that their vector inner product is not less than the vector inner product threshold. It is then determined whether each first student answer sentence is already in the student's answer point collection; if not, the first student answer sentence is added to it, thereby obtaining the student's effective answer point collection.
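This expansion step can be sketched as follows. It assumes that the answer point collection is a mapping from student sentences to their matched standard sentences and that the vector inner product result data is a mapping from (student sentence, standard sentence) pairs to values. When several standard sentences pass the threshold for a newly added student sentence, the first one encountered is kept, which is an arbitrary choice of this sketch. The name `expand_answer_points` is hypothetical.

```python
def expand_answer_points(answer_points, inner_products, threshold):
    """Return the effective answer point collection.

    answer_points: dict student_sentence -> standard_sentence (existing collection).
    inner_products: dict (student_sentence, standard_sentence) -> inner product.
    """
    effective = dict(answer_points)  # keep everything already collected
    for (student, standard), value in inner_products.items():
        # Add only first student answer sentences that are not yet collected.
        if value >= threshold and student not in effective:
            effective[student] = standard
    return effective


answer_points = {"s1": "a1"}
inner_products = {
    ("s1", "a2"): 0.95,  # s1 is already collected, so nothing changes for it
    ("s2", "a2"): 0.91,  # s2 was missed by earlier screening and is recovered
    ("s3", "a3"): 0.20,  # below the threshold
}
effective_points = expand_answer_points(answer_points, inner_products, threshold=0.9)
```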
Step S408, obtaining the student score for the question according to the score of the standard answer sentence corresponding to each student answer sentence in the effective answer point collection.
In this embodiment, the scores of the standard answer sentences corresponding to all student answer sentences in the effective answer point collection may be added directly to obtain the student score for the question. Alternatively, a weight may be set for the score of each standard answer sentence, and the student score for the question may be obtained as the weighted average of the scores of the standard answer sentences corresponding to the student answer sentences, which is not limited here.
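The direct-addition variant can be sketched as below. The sketch credits each standard answer sentence at most once even if several student sentences match it; that de-duplication is an assumption of this example, since the embodiment does not say how such duplicates are handled. The name `question_score` and the sample scores are hypothetical.

```python
def question_score(effective_points, sentence_scores):
    """Add up the scores of the standard answer sentences that were matched.

    effective_points: dict student_sentence -> matched standard_sentence.
    sentence_scores: dict standard_sentence -> score of that scoring point.
    """
    # Assumption: each standard answer sentence is credited only once.
    credited = set(effective_points.values())
    return sum(sentence_scores[standard] for standard in credited)


score = question_score(
    {"s1": "a1", "s2": "a2", "s3": "a1"},
    {"a1": 3, "a2": 2, "a3": 5},
)
```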
Fig. 5 is a schematic structural diagram of an intelligent marking device in the fifth embodiment of the present application. As shown in Fig. 5, the device includes:
The obtaining unit 501 is configured to obtain a standard text and a text to be read corresponding to a question, where the standard text includes at least one standard field, and the text to be read includes at least one field to be read.
In this embodiment, a corresponding standard text is preset for each question on the test paper and serves as the reference answer during marking. A subjective question has no single fixed reference answer, so one, two, or more reference answers may be set, and correspondingly there may be one, two, or more standard texts, which is not limited herein. In addition, since the standard text of a subjective question may contain several points of discussion, each standard text includes at least one standard field.
The form of the standard text is not limited; it may be, for example, text, a picture, or a combination of text and pictures.
In this embodiment, the manner of obtaining the standard text is not limited here. For example, a correspondence between the standard text and the question may be established, and the standard text and the correspondence are stored. During marking, the question is determined first, and the standard text corresponding to the question is then determined according to the correspondence between the question and the standard text.
Optionally, if the standard text and its correspondence with the question had to be stored before every marking session, the preparation before marking would be cumbersome. To avoid this, the obtaining unit 501 is further configured to establish a standard text library and to store in it the standard texts of all questions together with the correspondences between the standard texts and the questions. When the scoring points of a question are to be determined, the standard text corresponding to the question is looked up in the standard text library, which reduces the preparation before marking and improves marking efficiency.
In this embodiment, each question on the test paper also corresponds to the student's answer content, i.e., the text to be read. Likewise, since the answer to a subjective question may contain several points of discussion, each text to be read includes at least one field to be read. It should be noted that the form of the text to be read is not limited; it may be, for example, text, a picture, or a combination of text and pictures.
In this embodiment, the manner of obtaining the text to be read is not limited. Before marking, the students' written answers to the question may be scanned and stored as the text to be read (which may be referred to as offline acquisition), or the students may answer directly on a computer and the answers are stored as the text to be read (which may be referred to as online acquisition).
It should be noted that the field may be a sentence or a word, and may also be defined by itself according to the requirement of paper marking, which is not limited herein. When the field is a sentence, different sentences may be divided according to punctuations in the standard text or the text to be read, for example, characters between two periods may be divided into one sentence, or characters between any two punctuations may be divided into one sentence, which is not limited herein.
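As an illustration of dividing a text into sentence fields by punctuation, a minimal sketch follows. The function name and the default delimiter set are assumptions; passing a wider delimiter set corresponds to splitting at any punctuation mark.

```python
import re


def split_sentences(text, delimiters="。.!?！？"):
    """Split text into sentence fields at the given punctuation marks.

    The default delimiter set is an illustrative assumption; commas or
    semicolons can be included to split at any punctuation mark.
    """
    pattern = "[" + re.escape(delimiters) + "]+"
    return [part.strip() for part in re.split(pattern, text) if part.strip()]


if __name__ == "__main__":
    print(split_sentences("Water boils at 100 degrees. Ice melts at 0 degrees."))
    print(split_sentences("Heat rises, cold air sinks.", delimiters=",."))
```

A naive split like this also breaks at decimal points, so a real marking system would likely need extra handling for numbers and abbreviations.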
The similarity calculation unit 502 is configured to calculate, for each field to be read in the text to be read, semantic similarity between the field to be read and a standard field in the standard text, and obtain corresponding semantic similarity result data.
In this embodiment, the semantic similarity calculated for each field to be read is the semantic similarity between that field and each standard field in the standard text corresponding to the question. It should be noted that various methods may be adopted to calculate the semantic similarity, which is not limited herein as long as the semantic similarity can be obtained. For example, the semantic similarity may be obtained by calculating the Euclidean distance between the standard field and the field to be read, or by calculating the Pearson distance between the standard text and the text to be read.
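A minimal sketch of the two measures mentioned above is given below, assuming each field has already been turned into a numeric vector of equal length. The vectorization step, the function names, and the mapping from distance to similarity, 1 / (1 + distance), are all assumptions; the application does not specify them.

```python
import math


def euclidean_similarity(u, v):
    """Map Euclidean distance to a similarity in (0, 1]; the mapping is an assumption."""
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return 1.0 / (1.0 + distance)


def pearson_correlation(u, v):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    covariance = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # correlation is undefined for a constant vector
    return covariance / (norm_u * norm_v)


if __name__ == "__main__":
    print(euclidean_similarity([0.0, 0.0], [3.0, 4.0]))
    print(pearson_correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```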
In addition, in the process of marking, when calculating the semantic similarity between the field to be read and the standard field in the standard text, the semantic similarity between each field to be read and the standard field can be judged one by one, and the semantic similarity between all the fields to be read and the standard field can also be judged at the same time, which is not limited herein.
In this embodiment, it is considered that even when a student gives a correct answer, the text to be read may not be completely identical to the standard text, because the wording and logic of the two may differ. If intelligent marking simply checked whether the characters in the text to be read are identical to the characters in the standard text, the accuracy of marking would suffer. Therefore, whether the two are consistent is determined by calculating the semantic similarity between the text to be read and the standard text, and the score of the text to be read is determined on that basis, which improves the accuracy of the score.
A scoring point determining unit 503, configured to obtain a scoring point set of the text to be read according to each semantic similarity result data.
In this embodiment, whether a field to be read matches a standard field can be judged according to the semantic similarity result data. If it matches, the field to be read is added to the scoring point set of the text to be read; if not, it is not added. The resulting scoring point set of the text to be read therefore contains the fields to be read that match standard fields.
The scoring unit 504 is configured to obtain the score of the text to be read for the question according to the score corresponding to the standard field matched by each field to be read in the scoring point set of the text to be read.
In this embodiment, the score of the text to be read for the question can be obtained by directly adding the scores corresponding to the standard fields matched by all the fields to be read in the scoring point set. Alternatively, a weight can be set for the score corresponding to each standard field, and the score of the text to be read can be obtained as a weighted average of the scores corresponding to the matched standard fields, which is not limited here.
The intelligent marking device obtains the standard text and the text to be read corresponding to the question, where the standard text includes at least one standard field and the text to be read includes at least one field to be read. For each field to be read in the text to be read, the device calculates the semantic similarity between the field to be read and the standard fields in the standard text to obtain corresponding semantic similarity result data. It obtains the scoring point set of the text to be read according to the semantic similarity result data, and obtains the score of the text to be read for the question according to the score corresponding to the standard field matched by each field to be read in the scoring point set. Intelligent marking is thereby realized, time and labor are saved, marking efficiency is improved, the influence of human subjective factors on examination results during marking is reduced, and the objectivity, fairness, and accuracy of marking are better ensured.
Fig. 6 is a schematic structural diagram of an intelligent marking device according to the sixth embodiment of the present application. As shown in Fig. 6, the device includes:
the obtaining unit 601, configured to obtain a standard answer text and a student answer text corresponding to the question.
In this embodiment, the standard text is a standard answer text, and the text to be read is a student answer text. Correspondingly, the standard field in the standard text corresponds to a standard answer sentence, and the field to be read in the text to be read corresponds to a student answer sentence.
The similarity calculating unit 602 is configured to calculate, for each student answer sentence in the student answer text, a semantic similarity between the student answer sentence and a standard answer sentence in the standard answer text, so as to obtain corresponding semantic similarity result data.
In this embodiment, the similarity calculation unit 602 is further configured to perform word segmentation on the student answer sentence, and extract a student answer keyword set; traversing each standard answer sentence in the standard answer text, performing word segmentation processing on each traversed standard answer sentence, and extracting a standard answer keyword set corresponding to each standard answer sentence; calculating the similarity between the standard answer keyword set and the student answer keyword set; and determining semantic similarity between the student answer sentence and the standard answer sentence in the standard answer text according to the similarity between the standard answer keyword set and the student answer keyword set.
In this embodiment, the similarity between the standard answer keyword set and the student answer keyword set can be obtained as the Jaccard similarity coefficient of the two sets. The Jaccard similarity coefficient is used to compare the similarity and difference between finite sample sets.
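For reference, the Jaccard similarity coefficient of two sets is the size of their intersection divided by the size of their union. A minimal sketch follows; the function name and the example keywords are illustrative, and the keyword sets are assumed to have been produced by a prior word segmentation step.

```python
def jaccard_similarity(student_keywords, standard_keywords):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two keyword collections."""
    a, b = set(student_keywords), set(standard_keywords)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


if __name__ == "__main__":
    student = ["water", "boils", "heat"]
    standard = ["water", "boils", "100"]
    print(jaccard_similarity(student, standard))
```

Here two of the four distinct keywords are shared, so the coefficient is 0.5.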
In this embodiment, before calculating the Jaccard similarity coefficient, the student answer sentence and the standard answer sentence are first subjected to word segmentation to obtain a student answer keyword set and a standard answer keyword set. In particular, word segmentation tools, such as THULAC, NLPIR, etc., may be employed for word segmentation processing. In order to increase the word segmentation processing speed and improve the word segmentation accuracy, a Jieba word segmentation tool is preferably used for word segmentation processing.
In this embodiment, the standard keyword set and the student keyword set are obtained by performing word segmentation processing on the standard answer sentence and the student answer sentence respectively through a word segmentation tool, instead of manually defining the standard keyword set and the student keyword set, so that the influence of human factors on an intelligent paper marking process is avoided, and the accuracy of intelligent paper marking is improved.
In this embodiment, it is considered that when the Jaccard similarity coefficient of the standard answer keyword set and the student answer keyword set is calculated directly, two keywords are determined to be similar only when they are completely identical. In practice, however, it is sufficient for the keywords in the student answer text to be semantically similar to those in the standard answer text.
To avoid this, the similarity calculation unit 602 is further configured to: build a first key-value pair list; add a key-value pair to each element in the first key-value pair list according to the standard answer keywords in the standard answer keyword set and the student answer keywords in the student answer keyword set; calculate the cosine similarity between the ith element and each element after the ith element in the first key-value pair list, and modify the key-value pairs of those later elements whose cosine similarity with the ith element satisfies the element cosine similarity threshold, to obtain a second key-value pair list, where i is a positive integer; obtain a standard answer key-value pair set and a student answer key-value pair set according to the second key-value pair list; and calculate the similarity of the standard answer key-value pair set and the student answer key-value pair set, that is, calculate the Jaccard similarity coefficient of the two sets.
At this time, the similarity calculation unit 602 is further configured to determine semantic similarity between the student answer sentence and the standard answer sentence in the standard answer text according to the similarity between the standard answer key-value pair set and the student answer key-value pair set. It should be noted that, in other embodiments, the semantic similarity between the student answer sentence and the standard answer sentence in the standard answer text may also be determined in other ways, which is not limited herein.
Optionally, the similarity calculation unit 602 is further configured to add, for the kth student answer keyword, a key-value pair containing identification information representing the student answer keyword and sequence information to the xth element in the first key-value pair list, where k is greater than or equal to 1 and less than or equal to m, and m is the number of student answer keywords in the student answer keyword set;
and to add, for the rth standard answer keyword, a key-value pair containing identification information representing the standard answer keyword and sequence information to the yth element in the first key-value pair list, where r is greater than or equal to 1 and less than or equal to n, and n is the number of standard answer keywords in the standard answer keyword set;
wherein either 1 ≤ x ≤ m and m + 1 ≤ y ≤ m + n, or 1 ≤ y ≤ n and n + 1 ≤ x ≤ n + m; and x, y, k, m, r, and n are positive integers.
In this embodiment, when adding a key-value pair to each element in the first key-value pair list, the similarity calculation unit 602 may first add, according to the student answer keywords, key-value pairs containing identification information representing the student answer keywords to the 1st through mth elements of the first key-value pair list in sequence, and then add, according to the standard answer keywords, key-value pairs containing identification information representing the standard answer keywords to the (m + 1)th through last (that is, (m + n)th) elements in sequence. In this case 1 ≤ x ≤ m and m + 1 ≤ y, that is, in the first key-value pair list the first m elements correspond to the student answer keywords and the last n elements correspond to the standard answer keywords. Alternatively, key-value pairs containing identification information representing the standard answer keywords may first be added, according to the standard answer keywords, to the 1st through nth elements of the first key-value pair list in sequence, and key-value pairs containing identification information representing the student answer keywords may then be added, according to the student answer keywords, to the (n + 1)th through last (that is, (m + n)th) elements in sequence. In this case 1 ≤ y ≤ n and n + 1 ≤ x, that is, in the first key-value pair list the first n elements correspond to the standard answer keywords and the last m elements correspond to the student answer keywords. The specific order of addition is not limited herein.
Optionally, the similarity calculation unit 602 is further configured to modify, in elements subsequent to the ith element, sequence information of a key-value pair of an element whose cosine similarity with the ith element satisfies an element cosine similarity threshold to sequence information of a key-value pair of the ith element. It should be noted that, in other embodiments, a key-value pair may be added to each element in the first key-value pair list in other manners, which is not limited herein.
Optionally, the similarity calculation unit 602 is further configured to, when x is greater than or equal to 1 and less than or equal to m and m +1 is less than or equal to y, obtain, according to the representation student answer identification information included in the key-value pair, sequence information of the 1 st to m-th elements from the second key-value pair list to form a student answer key-value pair set, and obtain, according to the representation standard answer identification information included in the key-value pair, sequence information of the m +1 th to m + n-th elements from the second key-value pair list to form a standard answer key-value pair set; when y is larger than or equal to 1 and smaller than or equal to n and n +1 is smaller than or equal to x, obtaining sequence information of the 1 st to nth elements from the second key-value pair list according to the representation standard answer identification information contained in the key-value pair to form a standard answer key-value pair set, and obtaining sequence information of the n +1 th to n + m th elements from the second key-value pair list according to the representation student answer identification information contained in the key-value pair to form a student answer key-value pair set. It should be noted that, in other embodiments, the standard answer key-value pair set and the student answer key-value pair set may be obtained in other manners, which is not limited herein.
Optionally, the similarity calculation unit 602 is further configured to count the number of sequence information in the intersection of the standard answer key-value pair set and the student answer key-value pair set and the number of sequence information in the union of the standard answer key-value pair set and the student answer key-value pair set, where a ratio of the number of sequence information in the intersection of the standard answer key-value pair set and the student answer key-value pair set to the number of sequence information in the union of the standard answer key-value pair set and the student answer key-value pair set is the similarity between the standard answer key-value pair set and the student answer key-value pair set.
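The key-value pair procedure described above can be sketched as follows for the arrangement in which the student keywords come first. This is a minimal illustration under stated assumptions: the dictionary fields `source`, `word`, and `seq`, the use of the list position as the initial sequence information, the toy word vectors, and the threshold value are all hypothetical, not taken from the application.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def key_value_jaccard(student_keywords, standard_keywords, vectors, threshold=0.8):
    # First key-value pair list: m student elements followed by n standard elements.
    items = [{"source": "student", "word": w, "seq": i}
             for i, w in enumerate(student_keywords)]
    offset = len(items)
    items += [{"source": "standard", "word": w, "seq": offset + j}
              for j, w in enumerate(standard_keywords)]

    # Second list: a later element that is close enough to element i
    # takes over the sequence information of element i.
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            sim = cosine(vectors[items[i]["word"]], vectors[items[j]["word"]])
            if sim >= threshold:
                items[j]["seq"] = items[i]["seq"]

    student_set = {e["seq"] for e in items if e["source"] == "student"}
    standard_set = {e["seq"] for e in items if e["source"] == "standard"}
    union = student_set | standard_set
    return len(student_set & standard_set) / len(union) if union else 0.0


if __name__ == "__main__":
    toy_vectors = {  # hypothetical word vectors
        "hot": [1.0, 0.0, 0.0],
        "warm": [0.95, 0.05, 0.0],
        "water": [0.0, 0.0, 1.0],
    }
    print(key_value_jaccard(["warm", "water"], ["hot", "water"], toy_vectors))
```

With these toy vectors, "warm" and "hot" share sequence information after the merge step, so the two sets coincide; a plain Jaccard coefficient on the raw keywords would only count "water" as shared.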
In this embodiment, after the similarity between the standard answer keyword set and the student answer keyword set is obtained through calculation, the similarity between the standard answer keyword set and the student answer keyword set may be directly used as the semantic similarity between the student answer sentence and the standard answer sentence in the standard answer text, and certainly, the similarity between the standard answer keyword set and the student answer keyword set may also be normalized or weighted and averaged to obtain the semantic similarity between the student answer sentence and the standard answer sentence in the standard answer text, which is not limited here.
The scoring point determining unit 603 is configured to obtain the student's answer scoring point set according to the semantic similarity result data.
In this embodiment, the scoring point determining unit 603 may, according to the semantic similarity result data, add to the student's answer scoring point set each student answer sentence for which a standard answer sentence exists in the standard answer text such that their semantic similarity is not less than the semantic similarity threshold, so as to obtain the student's answer scoring point set. It should be noted that the semantic similarity threshold may be set as required, and is not limited herein.
The scoring unit 604 is configured to obtain the student's score for the question according to the score corresponding to the standard answer sentence matched by each student answer sentence in the student's answer scoring point set.
In this embodiment, the scoring point determining unit 603 and the scoring unit 604 are similar to the scoring point determining unit 503 and the scoring unit 504 described above, and are not described herein again.
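The threshold test can be sketched as below. The matrix layout, the function name, the threshold value, and the choice to record the best-matching standard sentence for each student sentence are assumptions for illustration; the application only requires that some standard answer sentence reach the threshold.

```python
def collect_scoring_points(similarity, threshold=0.6):
    """Return {student_sentence_index: matched_standard_sentence_index}.

    similarity[s][t] is the semantic similarity between student sentence s
    and standard sentence t. A student sentence is kept when at least one
    standard sentence reaches the threshold; the best match is recorded.
    """
    matches = {}
    for student_idx, row in enumerate(similarity):
        if not row:
            continue
        best_idx = max(range(len(row)), key=lambda t: row[t])
        if row[best_idx] >= threshold:
            matches[student_idx] = best_idx
    return matches


if __name__ == "__main__":
    sims = [[0.2, 0.9], [0.1, 0.3], [0.7, 0.6]]
    print(collect_scoring_points(sims))
```

Recording the matched standard sentence makes it straightforward to look up the corresponding score in the scoring step.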
Fig. 7 is a schematic structural diagram of an intelligent marking device according to the seventh embodiment of the present application. This embodiment differs from the above embodiment in that the sentence component similarity between the student answer sentence and the standard answer sentence is also calculated. As shown in Fig. 7, the device includes:
the obtaining unit 701 is configured to obtain a standard answer text and a student answer text corresponding to a question.
The similarity calculation unit 702 is configured to calculate, for each student answer sentence in the student answer text, semantic similarity between the student answer sentence and a standard answer sentence in the standard answer text, so as to obtain corresponding semantic similarity result data.
The semantically similar answer determining unit 703 is configured to obtain the student's semantically similar answer set according to the semantic similarity result data.
In this embodiment, the semantically similar answer determining unit 703 may, according to the semantic similarity result data, add to the student's semantically similar answer set each student answer sentence for which a standard answer sentence exists in the standard answer text such that their semantic similarity is not less than the semantic similarity threshold, so as to obtain the student's semantically similar answer set. It should be noted that the semantic similarity threshold may be set as required, and is not limited herein.
The sentence component similarity calculation unit 704 is configured to calculate, for each student answer sentence in the semantic similar answer set of the student, a sentence component similarity between the student answer sentence and a standard answer sentence in the standard answer text, so as to obtain corresponding sentence component similarity result data.
In this embodiment, it is considered that if the student's score for the question were obtained directly from every student answer sentence whose semantic similarity is not less than the semantic similarity threshold, wrong judgments and missed judgments could occur and the score would be inaccurate. To improve the accuracy of the score, the sentence component similarity calculation unit 704 may further calculate the sentence component similarity between the student answer sentence and the standard answer sentence in the standard answer text, and determine again, according to the sentence component similarity result data, whether the student answer sentence matches the standard answer sentence. If it matches, the student answer sentence is added to the student's answer scoring point set; if not, it is not added. The resulting answer scoring point set contains student answer sentences that match standard answer sentences in both semantic similarity and sentence component similarity, so the student's score obtained from this set is more accurate.
Optionally, the sentence component similarity calculation unit 704 is further configured to perform sentence component extraction processing on the student answer sentences in the semantic similar answer set of the students to obtain sentence components of the student answer sentences; traversing standard answer sentences corresponding to student answer sentences in the semantic similar answer set of the students, and performing sentence component extraction processing on each traversed standard answer sentence to obtain sentence components of each standard answer sentence; calculating the cosine similarity of sentence components of the student answer sentences and sentence components of corresponding standard answer sentences; and obtaining sentence component similarity result data according to the cosine similarity of each sentence component.
Correspondingly, the sentence component similarity calculation unit 704 is further configured to add the student answer sentence with the sentence component similarity not less than the sentence component similarity threshold to the student answer point collection according to the sentence component similarity result data, so as to obtain the student answer point collection.
In this embodiment, it is considered that the sentence components of each sentence include individual components such as a subject, a predicate, and an object. Therefore, when calculating the sentence component similarity, the sentence component cosine similarity between the subject of the standard answer sentence and the subject of the student answer sentence may be calculated first, and then the sentence component cosine similarity between the predicate of the standard answer sentence and the predicate of the student answer sentence may be calculated, and so on. The sentence component similarity result data is then obtained from the cosine similarity of the subjects, the cosine similarity of the predicates, and the cosine similarities of the other sentence components of the student answer sentence and the standard answer sentence.
In this embodiment, when calculating the sentence component cosine similarity between the subject of the student answer sentence and the subject of the standard answer sentence, the word vector of the subject of the student answer sentence and the word vector of the subject of the standard answer sentence may be determined first, and the cosine similarity between these two word vectors is then calculated. The method for determining the word vector of the subject is the same as the method for determining the word vector of a keyword in the above embodiments, and details are not repeated here. The similarity of the other individual sentence components is calculated in a similar way, which is not repeated here.
Optionally, the sentence component similarity calculation unit 704 is further configured to assign a component similarity score to each sentence component. If the cosine similarity between a sentence component of the student answer sentence and the corresponding sentence component of the standard answer sentence is not less than the sentence component cosine similarity threshold, the component similarity score of that sentence component is credited to the student answer sentence. The sentence component similarity result data is obtained from all the component similarity scores credited to the student answer sentence.
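A minimal sketch of this per-component scoring follows. It assumes the sentence components have already been extracted into dictionaries keyed by component name and that a word vector is available for every component word; the component names, score values, toy vectors, and threshold are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def component_similarity(student, standard, vectors, component_scores, threshold=0.8):
    """Sum the scores of the components whose word vectors are close enough."""
    total = 0.0
    for name, score in component_scores.items():
        if name not in student or name not in standard:
            continue  # a component missing on either side earns nothing
        sim = cosine(vectors[student[name]], vectors[standard[name]])
        if sim >= threshold:
            total += score
    return total


if __name__ == "__main__":
    toy_vectors = {  # hypothetical word vectors
        "water": [1.0, 0.0, 0.0],
        "boils": [0.0, 1.0, 0.0],
        "simmers": [0.1, 0.99, 0.0],
        "ice": [0.0, 0.0, 1.0],
    }
    scores = {"subject": 0.5, "predicate": 0.5}
    standard = {"subject": "water", "predicate": "boils"}
    print(component_similarity({"subject": "water", "predicate": "simmers"},
                               standard, toy_vectors, scores))
    print(component_similarity({"subject": "ice", "predicate": "simmers"},
                               standard, toy_vectors, scores))
```

The resulting total can then be compared with the sentence component similarity threshold to decide whether the student answer sentence enters the answer scoring point set.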
Optionally, if the sentence pattern type of the student answer sentence is not consistent with the sentence pattern type of the standard answer sentence, the similarity of the sentence components calculated according to the extracted student answer sentence components and the standard answer sentence components may be deviated, resulting in a false judgment. Therefore, in order to avoid this situation, the sentence component similarity calculation unit 704 is further configured to determine whether the student answer sentence and the standard answer sentence corresponding thereto have the same sentence pattern type before performing the sentence component extraction process; if not, carrying out sentence pattern conversion processing on the student answer sentence or the standard answer sentence, so that the student answer sentence is the same as the standard answer sentence in sentence pattern type.
The scoring point determining unit 705 is configured to obtain the student's answer scoring point set according to the sentence component similarity result data.
The scoring unit 706 is configured to obtain the student's score for the question according to the score corresponding to the standard answer sentence matched by each student answer sentence in the student's answer scoring point set.
In this embodiment, the scoring point determining unit 705 and the scoring unit 706 are similar to those in the above embodiments, and are not described herein again.
In this embodiment, for every student answer sentence in the answer scoring point set, the semantic similarity and the sentence component similarity with the standard answer sentence are not less than the semantic similarity threshold and the sentence component similarity threshold, respectively. This ensures the accuracy of the student answer sentences in the answer scoring point set and makes the student's score for the question more accurate.
Fig. 8 is a schematic structural diagram of an intelligent marking device according to the eighth embodiment of the present application. This embodiment differs from the above embodiments in that the answer scoring point set is expanded according to the obtained standard answer text and student answer text. As shown in Fig. 8, the device includes:
the obtaining unit 801, configured to obtain a standard answer text and a student answer text corresponding to a question.
The similarity calculation unit 802 is configured to calculate, for each student answer sentence in the student answer text, semantic similarity between the student answer sentence and a standard answer sentence in the standard answer text, and obtain corresponding semantic similarity result data.
The semantically similar answer determining unit 803 is configured to obtain the student's semantically similar answer set according to the semantic similarity result data.
The sentence component similarity calculation unit 804 is configured to calculate, for each student answer sentence in the semantic similar answer set of the student, a sentence component similarity between the student answer sentence and a standard answer sentence in the standard answer text, and obtain corresponding sentence component similarity result data.
The scoring point determining unit 805 is configured to obtain the student's answer scoring point set according to the sentence component similarity result data.
In this embodiment, the obtaining unit 801, the similarity calculation unit 802, the semantically similar answer determining unit 803, the sentence component similarity calculation unit 804, and the scoring point determining unit 805 are similar to those of the above embodiments, and are not described herein again.
The vector inner product calculating unit 806 is configured to calculate, for each student answer sentence in the student answer text, a vector inner product of the student answer sentence and a standard answer sentence in the standard answer text, so as to obtain corresponding vector inner product result data.
In this embodiment, it is considered that screening student answer sentences by their semantic similarity with the standard answer sentences generally involves some error, so that some correct student answer sentences are missed, and that screening by sentence component similarity likewise involves some error and also misses some correct student answer sentences. To mitigate this, the vector inner product calculating unit 806 calculates the vector inner product of each original student answer sentence and the standard answer sentences, and the answer scoring point set is expanded according to the vector inner product result data, which reduces error accumulation and improves the accuracy of intelligent marking.
In this embodiment, the method of obtaining the student answer vector and the standard answer vector is not limited; for example, the TF-IDF method may be used. However, when the TF-IDF method is adopted, the inverse document frequency (IDF) score may become negative, which affects the accuracy of the obtained answer vector. To avoid this, in a specific implementation scenario, the vector inner product calculating unit 806 is further configured to: vectorize the student answer sentence to obtain a student answer vector; vectorize each standard answer sentence to obtain a standard answer vector; and traverse the standard answer vectors, calculating the vector inner product of the student answer vector and each traversed standard answer vector, to obtain the vector inner product of the student answer sentence and each standard answer sentence in the standard answer text.
Optionally, the vector inner product calculating unit 806 is further configured to: perform word segmentation on the student answer sentence to obtain a student answer word set; calculate the term frequency (TF) score, the IDF score, and the word vector of each word in the student answer word set, and calculate the word answer vector of each word from its TF score, IDF score, and word vector; and normalize the word answer vector of each word to obtain a normalized word answer vector, and obtain the student answer vector from the normalized word answer vectors.
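The inner product traversal, together with the expansion of the answer scoring point set, can be sketched as follows. The sentence vectors are assumed to be precomputed; the function names, the index-based representation of the scoring point set, and the threshold value are assumptions for illustration.

```python
def inner_product(u, v):
    """Vector inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def expand_scoring_points(student_vectors, standard_vectors, scoring_points, threshold):
    """Add student sentences whose inner product with any standard sentence
    reaches the threshold and which are not already in the scoring point set."""
    expanded = list(scoring_points)
    for idx, student_vec in enumerate(student_vectors):
        if idx in expanded:
            continue  # already a scoring point, nothing to add
        if any(inner_product(student_vec, standard_vec) >= threshold
               for standard_vec in standard_vectors):
            expanded.append(idx)
    return expanded


if __name__ == "__main__":
    students = [[1.0, 0.0], [0.6, 0.8], [0.0, 0.1]]
    standards = [[0.0, 1.0]]
    print(expand_scoring_points(students, standards, scoring_points=[0], threshold=0.5))
```

In this toy run the second student sentence is recovered by the inner product check even though it was not in the original scoring point set.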
In this embodiment, the calculation formula of the word answer vector is:
$$\text{word answer vector} = \text{word TF score} \times \text{word IDF score} \times \text{word vector}$$
Here, the word TF score = word frequency / total word frequency of all words in the sentence. The word frequency may be determined as the number of times the word occurs in the student answer word set. The total word frequency of all words in the sentence may be determined by determining the word frequency of each word in the student answer word set and summing the word frequencies of all words.
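As a non-limiting illustration, the TF score and the word answer vector described above might be computed as in the following Python sketch. The function names, the `idf` dictionary, and the `embeddings` dictionary are hypothetical; in particular, the IDF scores are taken as precomputed inputs, and the word vectors are assumed to come from some pre-trained word embedding, which this embodiment does not specify.

```python
from collections import Counter
from typing import Dict, List


def tf_scores(words: List[str]) -> Dict[str, float]:
    """Word TF score = word frequency / total word frequency of the sentence."""
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def word_answer_vectors(
    words: List[str],
    idf: Dict[str, float],                 # assumed: precomputed IDF score per word
    embeddings: Dict[str, List[float]],    # assumed: word vector per word
) -> Dict[str, List[float]]:
    """Word answer vector = TF score * IDF score * word vector, per distinct word."""
    tf = tf_scores(words)
    return {
        word: [tf[word] * idf[word] * x for x in embeddings[word]]
        for word in tf
    }


# Toy usage with made-up values.
words = ["cell", "wall", "cell"]
idf = {"cell": 1.0, "wall": 2.0}
embeddings = {"cell": [3.0, 0.0], "wall": [0.0, 3.0]}
vectors = word_answer_vectors(words, idf, embeddings)
```

In this toy input, "cell" occurs twice among three words, so its TF score is 2/3 and its word answer vector is the word vector scaled by 2/3 times its IDF score.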
In this embodiment, the inverse text frequency index score of each word in the student answer keyword set is calculated according to the following formula:
d is the total number of documents in a pre-established standard answer text library, wf is the total number of documents including words in the standard answer text library, C is a constant and is larger than or equal to 2.
In this embodiment, in order to make the word IDF score more accurate, a standard answer text library may be established according to all standard answer texts of all questions.
Optionally, the vector inner product calculating unit 806 is further configured to perform word segmentation on all standard answer sentences in the standard answer text library, count, for each word, the number wf of documents in the standard answer text library that contain the word, and calculate the IDF score of each word according to the inverse text frequency index score calculation formula.
In this embodiment, if the IDF score of each word were calculated according to the existing inverse text frequency index score calculation formula, the obtained student answer vector or standard answer vector might contain negative components. In that case, even when the actual vector inner product of the standard answer text and the student answer text satisfies the vector inner product determination condition, the negative components may cause the determination result to be that the condition is not satisfied, which affects the accuracy of intelligent marking.
In this embodiment, in order to calculate the vector inner product from the standard answer vector and the student answer vector and to improve the precision of the vector inner product, the vector inner product calculating unit 806 is further configured to perform data normalization processing on the standard answer vector and the student answer vector. Specifically, each word answer vector of the student answer is divided by its modulus (vector length) to obtain the normalized answer vector; the word answer vectors of the standard answer are normalized in the same way, which is not repeated here.
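As a non-limiting illustration, the normalization and the inner product might be sketched in Python as follows. The function names are hypothetical. The embodiment does not state how the normalized word answer vectors are combined into a sentence-level answer vector; summing them component-wise is an assumption made here only for illustration.

```python
import math
from typing import List, Sequence


def normalize(vec: Sequence[float]) -> List[float]:
    """Divide a vector by its modulus (Euclidean length); zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def sentence_vector(word_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Combine normalized word answer vectors into one answer vector.

    Assumption: component-wise sum; the source does not specify the aggregation.
    """
    normalized = [normalize(v) for v in word_vectors]
    return [sum(column) for column in zip(*normalized)]


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Vector inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Toy usage with made-up word answer vectors.
student_vector = sentence_vector([[3.0, 4.0], [0.0, 2.0]])
standard_vector = sentence_vector([[1.0, 0.0]])
product = inner_product(student_vector, standard_vector)
```

Because every word answer vector is scaled to unit length before aggregation, a word with a large TF or IDF score does not dominate the inner product merely through its magnitude.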
The effective answer point collection determining unit 807 is configured to obtain an effective answer point collection of the student according to the vector inner product result data and the answer point collection of the student.
In this embodiment, according to the vector inner product result data, a first student answer sentence is obtained, namely a student answer sentence for which there exists a standard answer sentence in the standard answer text such that their vector inner product is not less than a vector inner product threshold. It is then determined whether the first student answer sentence already exists in the student's answer point collection; if not, the first student answer sentence is added to the student's answer point collection, so as to obtain the effective answer point collection of the student.
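As a non-limiting illustration, the expansion of the answer point collection might be sketched in Python as follows. The function name and the data layout (a dictionary mapping each student answer sentence to its inner products with all standard answer sentences) are assumptions made for illustration.

```python
from typing import Dict, List


def effective_answer_points(
    inner_products: Dict[str, List[float]],  # student sentence -> inner products with each standard sentence
    answer_points: List[str],                # collection obtained from the earlier similarity screening
    threshold: float,                        # vector inner product threshold
) -> List[str]:
    """Add every student sentence that reaches the threshold for some standard sentence
    and is not yet in the collection."""
    effective = list(answer_points)
    for sentence, products in inner_products.items():
        reaches_threshold = any(p >= threshold for p in products)
        if reaches_threshold and sentence not in effective:
            effective.append(sentence)
    return effective


# Toy usage with made-up inner products.
inner_products = {"s1": [0.2, 0.9], "s2": [0.1, 0.3], "s3": [0.8, 0.1]}
result = effective_answer_points(inner_products, ["s3"], threshold=0.7)
```

Here "s1" is recovered by the inner product check, "s2" stays excluded, and "s3" is not duplicated because it is already in the collection.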
The scoring unit 808 is configured to obtain the student scores of the questions according to the scores corresponding to the standard answer sentences corresponding to each student answer sentence in the effective answer point collection set.
In this embodiment, the scores corresponding to the standard answer sentences corresponding to all the student answer sentences in the effective answer point collection may be added directly to obtain the student score of the question. Alternatively, a weight may be set for the score corresponding to each standard answer sentence, and the student score of the question may be obtained as a weighted average of the scores corresponding to the standard answer sentences corresponding to the student answer sentences; this is not limited here.
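As a non-limiting illustration, the two scoring options might be sketched in Python as follows. The function names are hypothetical, and the weighted variant assumes the usual definition of a weighted average (the weighted sum divided by the sum of the weights), which the embodiment does not spell out.

```python
from typing import Sequence


def summed_score(matched_scores: Sequence[float]) -> float:
    """Directly add the scores of the matched standard answer sentences."""
    return float(sum(matched_scores))


def weighted_average_score(
    matched_scores: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Weighted average of the matched scores (assumed: sum(w * s) / sum(w))."""
    if len(matched_scores) != len(weights):
        raise ValueError("each score needs exactly one weight")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    weighted_sum = sum(s * w for s, w in zip(matched_scores, weights))
    return weighted_sum / total_weight


# Toy usage with made-up scores and weights.
direct = summed_score([2.0, 3.0])
weighted = weighted_average_score([2.0, 3.0], [1.0, 3.0])
```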
Fig. 9 is a schematic structural diagram of an electronic device in a ninth embodiment of the present application. As shown in Fig. 9, the electronic device includes:
one or more processors 901;
a storage 902, which may be configured to store one or more programs,
wherein the one or more programs, when executed by the one or more processors 901, cause the one or more processors to implement the intelligent marking method of any of the embodiments described above.
Fig. 10 is a hardware structure of an electronic device according to a tenth embodiment of the present application; as shown in fig. 10, the hardware structure of the electronic device may include: a processor 1001, a communication interface 1002, a computer-readable storage medium 1003, and a communication bus 1004;
wherein the processor 1001, the communication interface 1002, and the computer-readable storage medium 1003 complete communication with each other through the communication bus 1004;
optionally, the communication interface 1002 may be an interface of a communication module, such as an interface of a GSM module; the processor 1001 may be specifically configured to: acquiring a standard answer text and a student answer text corresponding to the question; calculating the semantic similarity between the student answer sentence and a standard answer sentence in the standard answer text aiming at each student answer sentence in the student answer text to obtain corresponding semantic similarity result data; obtaining an answer collecting point set of the student according to the result data of each semantic similarity; and according to the scores corresponding to the standard answer sentences corresponding to each student answer sentence in the student answer point collection, obtaining the student scores of the questions.
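As a non-limiting illustration, the four operations performed by the processor 1001 might be sketched end to end in Python as follows. All names are hypothetical. The `word_overlap` function is only a toy stand-in for the semantic similarity calculation, and crediting each standard answer sentence at most once is an assumption that the embodiment does not state.

```python
from typing import Callable, Dict, List


def word_overlap(a: str, b: str) -> float:
    """Toy similarity: ratio of shared words to all words (stand-in only)."""
    set_a, set_b = set(a.split()), set(b.split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def mark_answer(
    student_sentences: List[str],
    standard_scores: Dict[str, float],        # standard answer sentence -> score
    similarity: Callable[[str, str], float],
    threshold: float,
) -> float:
    """Collect student sentences that match a standard sentence, then add up the scores."""
    answer_points: Dict[str, str] = {}  # student sentence -> matched standard sentence
    for sentence in student_sentences:
        best_standard, best_similarity = None, threshold
        for standard in standard_scores:
            value = similarity(sentence, standard)
            if value >= best_similarity:
                best_standard, best_similarity = standard, value
        if best_standard is not None:
            answer_points[sentence] = best_standard
    # Assumption: a standard sentence matched by several student sentences is credited once.
    credited = set(answer_points.values())
    return sum(standard_scores[standard] for standard in credited)


# Toy usage with made-up sentences and scores.
standard_scores = {"plants need light": 2.0, "water is absorbed by roots": 3.0}
student = ["plants need light", "roots absorb water", "light plants need"]
strict = mark_answer(student, standard_scores, word_overlap, threshold=0.5)
lenient = mark_answer(student, standard_scores, word_overlap, threshold=0.3)
```

With the stricter threshold only the first standard answer sentence is matched; lowering the threshold also admits the second student sentence, which shares two of six distinct words with the second standard answer sentence.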
The processor 1001 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
In the above embodiment, the electronic device may be a front-end intelligent terminal, or may be a background server, and when the electronic device is a front-end intelligent terminal, the electronic device is an intelligent household appliance. The appliance may include at least one of the following, for example: televisions, Digital Video Disc (DVD) players, audio devices, refrigerators, air conditioners, vacuum cleaners, ovens, microwave ovens, washing machines, air purifiers, set-top boxes, home automation control panels, security control panels, television boxes, game consoles, electronic dictionaries, electronic keys, camcorders, and electronic photo frames.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program comprising program code configured to perform the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section, and/or installed from a removable medium. The computer program, when executed by a Central Processing Unit (CPU), performs the above-described functions defined in the method of the present application. It should be noted that the computer readable medium of the present application can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code configured to carry out operations for the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (e.g., through the internet using an internet service provider).
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, which may be described as: a processor comprising an acquisition unit, a similarity calculation unit, a collection point determining unit, and a scoring unit, wherein the acquisition unit is used for acquiring a standard answer text and a student answer text corresponding to a question; the similarity calculation unit is used for calculating, for each student answer sentence in the student answer text, the semantic similarity between the student answer sentence and the standard answer sentence in the standard answer text, to obtain corresponding semantic similarity result data; the collection point determining unit is used for obtaining an answer point collection of the student according to the semantic similarity result data; and the scoring unit is used for obtaining the student score of the question according to the scores corresponding to the standard answer sentences corresponding to the student answer sentences in the student's answer point collection. For example, the acquisition unit may also be described as a "unit for acquiring a standard answer text and a student answer text corresponding to a question".
As another aspect, the present application also provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the method as described in any of the embodiments above.
As another aspect, the present application also provides a computer-readable storage medium, which may be contained in the apparatus described in the above embodiments; or may be present separately and not assembled into the device. The computer readable storage medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: acquiring a standard answer text and a student answer text corresponding to the question; calculating the semantic similarity between the student answer sentence and a standard answer sentence in the standard answer text aiming at each student answer sentence in the student answer text to obtain corresponding semantic similarity result data; obtaining an answer collecting point set of the student according to the result data of each semantic similarity; and according to the scores corresponding to the standard answer sentences corresponding to each student answer sentence in the student answer point collection, obtaining the student scores of the questions.
The term "module" or "functional unit" as used herein may mean, for example, a unit including hardware, software, and firmware, or a unit including a combination of two or more of hardware, software, and firmware. A "module" may be used interchangeably with the terms "unit," "logic block," "component," or "circuit," for example. A "module" or "functional unit" may be a minimal unit of an integrated component element or a portion of an integrated component element. A "module" may be the smallest unit or part thereof for performing one or more functions. A "module" or "functional unit" may be implemented mechanically or electrically. For example, a "module" or "functional unit" according to the present disclosure may include at least one of: application Specific Integrated Circuit (ASIC) chips, Field Programmable Gate Arrays (FPGAs), and programmable logic devices known or later developed to perform operations.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.
Claims (20)
1. An intelligent scoring method is characterized by comprising the following steps:
acquiring a standard text and a text to be read corresponding to a question, wherein the standard text comprises at least one standard field, and the text to be read comprises at least one field to be read;
calculating, for each field to be read in the text to be read, the semantic similarity between the field to be read and the standard field in the standard text, to obtain corresponding semantic similarity result data;
obtaining a collection point set of the text to be read according to the semantic similarity result data;
and obtaining the score of the text to be read for the question according to the score corresponding to the standard field corresponding to each field to be read in the collection point set of the text to be read.
2. The method according to claim 1, wherein the standard text is a standard answer text, the text to be read is a student answer text, the standard field is a standard answer sentence, and the field to be read is a student answer sentence;
the method comprises the following steps: calculating the semantic similarity between the student answer sentence and the standard answer sentence in the standard answer text aiming at each student answer sentence in the student answer text to obtain corresponding semantic similarity result data;
obtaining an answer point collection set of the student according to the semantic similarity result data;
and obtaining the student scores of the questions according to the scores corresponding to the standard answer sentences corresponding to the student answer sentences in the student answer point collection set.
3. The method according to claim 2, wherein the calculating the semantic similarity of the student answer sentence to the standard answer sentence in the standard answer text comprises:
performing word segmentation processing on the student answer sentence, and extracting a student answer keyword set;
traversing each standard answer sentence in the standard answer text, performing word segmentation processing on each traversed standard answer sentence, and extracting a standard answer keyword set corresponding to each standard answer sentence;
calculating the similarity between the standard answer keyword set and the student answer keyword set;
and determining semantic similarity between the student answer sentence and the standard answer sentence in the standard answer text according to the similarity between the standard answer keyword set and the student answer keyword set.
4. The method of claim 2, wherein obtaining a set of student answer scoring points according to each semantic similarity result data comprises:
according to the semantic similarity result data, adding each student answer sentence whose semantic similarity to a standard answer sentence in the standard answer text is not less than a semantic similarity threshold to an answer point collection of the student, to obtain the answer point collection of the student.
5. The method according to claim 3, wherein the calculating the similarity between the standard answer keyword set and the student answer keyword set comprises:
establishing a first key-value pair list;
adding a key value pair to each element in the first key value pair list according to the standard answer key words in the standard answer key word set and the student answer key words in the student answer key word set;
calculating the cosine similarity between the i-th element and each element after the i-th element in the first key-value pair list, and modifying the key-value pairs of those elements after the i-th element whose cosine similarity to the i-th element satisfies an element cosine similarity threshold, to obtain a second key-value pair list, wherein i is a positive integer;
obtaining a standard answer key-value pair set and a student answer key-value pair set according to the second key-value pair list;
and calculating the similarity of the standard answer key-value pair set and the student answer key-value pair set.
6. The method of claim 5, wherein adding a key-value pair to each element in the first key-value pair list according to a standard answer keyword in the set of standard answer keywords and a student answer keyword in the set of student answer keywords comprises:
aiming at the kth student answer keyword, adding a key value pair containing identification information representing the student answer keyword and sequence information to the xth element in the first key value pair list, wherein k is more than or equal to 1 and is less than or equal to m, and m is the number of the student answer keywords in the student answer keyword set;
aiming at the r standard answer keyword, adding a key value pair containing the identification information of the representation standard answer keyword and sequence information to the y element in the first key value pair list, wherein r is more than or equal to 1 and less than or equal to n, and n is the number of the standard answer keywords in the standard answer keyword set;
wherein 1 ≤ x ≤ m and m+1 ≤ y ≤ m+n; or 1 ≤ y ≤ n and n+1 ≤ x ≤ n+m; and x, y, k, m, r and n are positive integers.
7. The method of claim 6, wherein the modifying the key-value pairs of the elements after the i-th element whose cosine similarity to the i-th element satisfies an element cosine similarity threshold comprises: modifying the sequence information of the key-value pair of each element after the i-th element whose cosine similarity to the i-th element satisfies the element cosine similarity threshold into the sequence information of the key-value pair of the i-th element.
8. The method according to claim 6, wherein obtaining a set of standard answer key-value pairs and a set of student answer key-value pairs from the second list of key-value pairs comprises: when x is larger than or equal to 1 and smaller than or equal to m and m +1 is smaller than or equal to y, according to the representation student answer identification information contained in the key value pair, obtaining sequence information of 1 st to m th elements from the second key value pair list to form a student answer key value pair set, and according to the representation standard answer identification information contained in the key value pair, obtaining sequence information of m +1 th to m + n th elements from the second key value pair list to form a standard answer key value pair set;
when y is larger than or equal to 1 and smaller than or equal to n and n +1 is smaller than or equal to x, according to the representation standard answer identification information contained in the key value pair, sequence information of the 1 st to nth elements is obtained from the second key value pair list to form the standard answer key value pair set, and according to the representation student answer identification information contained in the key value pair, sequence information of the n +1 th to n + m th elements is obtained from the second key value pair list to form the student answer key value pair set.
9. The method of claim 8, wherein the calculating the similarity of the set of standard answer key-value pairs and the set of student answer key-value pairs comprises:
the ratio of the number of the sequence information in the intersection of the standard answer key-value pair set and the student answer key-value pair set to the number of the sequence information in the union of the standard answer key-value pair set and the student answer key-value pair set is the similarity of the standard answer key-value pair set and the student answer key-value pair set.
10. The method of claim 2, wherein obtaining a set of student answer scoring points according to each semantic similarity result data comprises:
obtaining a semantic similar answer set of the student according to the semantic similarity result data;
calculating sentence component similarity between the student answer sentence and the standard answer sentence in the standard answer text aiming at each student answer sentence in the semantic similar answer set of the student to obtain corresponding sentence component similarity result data;
and obtaining an answer scoring point set of the student according to the sentence component similarity result data.
11. The method according to claim 10, wherein the calculating the sentence component similarity between the student answer sentence in the semantic similar answer set of the student and the standard answer sentence in the standard answer text comprises:
carrying out sentence component extraction processing on student answer sentences in the semantic similar answer set of the students to obtain sentence components of the student answer sentences;
traversing the standard answer sentences corresponding to the student answer sentences in the semantic similar answer set of the student, and performing sentence component extraction processing on each traversed standard answer sentence to obtain a sentence component of each standard answer sentence;
calculating the cosine similarity of the sentence components of the student answer sentence and the sentence components of the corresponding standard answer sentence;
obtaining sentence component similarity result data according to the cosine similarity of each sentence component;
the obtaining of the answer scoring point set of the student according to the sentence component similarity result data comprises:
and adding the student answer sentences of which the sentence component similarity is not less than a sentence component similarity threshold value into an answer scoring point set of the student according to the sentence component similarity result data to obtain the answer scoring point set of the student.
12. The method of claim 11, wherein obtaining the sentence component similarity result data based on the cosine similarity of each sentence component comprises:
assigning each sentence component a component similarity score; if the cosine similarity between a sentence component of the student answer sentence and the corresponding sentence component of the standard answer sentence is not less than a sentence component cosine similarity threshold, adding the component similarity score corresponding to that sentence component to the student answer sentence; and obtaining the sentence component similarity result data according to all the component similarity scores of the student answer sentence.
13. The method according to claim 11, wherein the sentence component extraction processing for each of the standard answer sentence and the student answer sentence is preceded by:
judging whether the sentence pattern types of the student answer sentence and the standard answer sentence are the same;
if not, carrying out sentence pattern conversion processing on the student answer sentence or the standard answer sentence, so that the student answer sentence and the standard answer sentence have the same sentence pattern type.
14. The method according to any one of claims 1-13, further comprising:
calculating a vector inner product of the student answer sentence and the standard answer sentence in the standard answer text aiming at each student answer sentence in the student answer text to obtain corresponding vector inner product result data;
obtaining an effective answer point collection set of the student according to the vector inner product result data and the answer point collection set of the student;
correspondingly, obtaining the student score of the question according to the score corresponding to the standard answer sentence corresponding to each student answer sentence in the student answer point collection set comprises:
and obtaining the student scores of the questions according to the scores corresponding to the standard answer sentences corresponding to each student answer sentence in the effective answer point collection set.
15. The method of claim 14, wherein the calculating the vector inner product of the student answer sentence and the standard answer sentence in the standard answer text comprises:
vectorizing the student answer sentences to obtain student answer vectors;
vectorizing each standard answer sentence to obtain a standard answer vector;
traversing each standard answer vector, and calculating the vector inner product of the student answer vector and the traversed standard answer vector to obtain the vector inner product of the student answer sentence and the standard answer sentence in the standard answer text;
the obtaining of the effective answer point collection set of the student according to the vector inner product result data and the answer point collection set of the student comprises:
according to the vector inner product result data, acquiring a first student answer sentence for which there exists a standard answer sentence in the standard answer text such that the vector inner product is not less than a vector inner product threshold, determining whether the first student answer sentence exists in the answer point collection of the student, and if not, adding the first student answer sentence to the answer point collection of the student to obtain the effective answer point collection of the student.
16. The method according to claim 15, wherein the vectorizing the student answer sentence in the student answer text to obtain a student answer vector comprises:
performing word segmentation processing on the student answer sentence to obtain a student answer word set;
calculating the word frequency fraction, the inverse text frequency index fraction and the word vector of each word in the student answer word set, and calculating to obtain each word answer vector according to the word frequency fraction, the inverse text frequency index fraction and the word vector of each word;
and carrying out normalization processing on each word answer vector to obtain a normalization answer vector of each word, and obtaining the student answer vector according to the normalization answer vector of each word.
17. The method of claim 16, wherein the inverse text frequency index score for each term in the set of student answer keywords is calculated according to the formula:
d is the total number of documents in a pre-established standard answer text library, wf is the total number of documents including the words in the standard answer text library, C is a constant and is more than or equal to 2.
18. An intelligent scoring device, comprising:
an acquisition unit, used for acquiring a standard text and a text to be read corresponding to a question, wherein the standard text comprises at least one standard field, and the text to be read comprises at least one field to be read;
a similarity calculation unit, used for calculating, for each field to be read in the text to be read, the semantic similarity between the field to be read and the standard field in the standard text, to obtain corresponding semantic similarity result data;
a collection point determining unit, used for obtaining a collection point set of the text to be read according to the semantic similarity result data;
and a scoring unit, used for obtaining the score of the text to be read for the question according to the score corresponding to the standard field corresponding to each field to be read in the collection point set of the text to be read.
19. An electronic device, comprising:
one or more processors;
a storage device configured to store one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-17.
20. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the method according to any one of claims 1-17.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911012221.XA CN112700203B (en) | 2019-10-23 | 2019-10-23 | Intelligent marking method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911012221.XA CN112700203B (en) | 2019-10-23 | 2019-10-23 | Intelligent marking method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112700203A (en) | 2021-04-23 |
CN112700203B (en) | 2022-11-01 |
Family
ID=75505040
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911012221.XA Active CN112700203B (en) | 2019-10-23 | 2019-10-23 | Intelligent marking method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112700203B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113627722A (en) * | 2021-07-02 | 2021-11-09 | 湖北美和易思教育科技有限公司 | Simple answer scoring method based on keyword segmentation, terminal and readable storage medium |
CN113822040A (en) * | 2021-08-06 | 2021-12-21 | 深圳市卓帆技术有限公司 | Subjective question marking and scoring method and device, computer equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170206897A1 (en) * | 2016-01-18 | 2017-07-20 | Alibaba Group Holding Limited | Analyzing textual data |
US20170262738A1 (en) * | 2014-09-16 | 2017-09-14 | Iflytek Co., Ltd. | Intelligent scoring method and system for text objective question |
US9940367B1 (en) * | 2014-08-13 | 2018-04-10 | Google Llc | Scoring candidate answer passages |
CN110196893A (en) * | 2019-05-05 | 2019-09-03 | 平安科技(深圳)有限公司 | Non-subjective item method to go over files, device and storage medium based on text similarity |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113627722A (en) * | 2021-07-02 | 2021-11-09 | 湖北美和易思教育科技有限公司 | Simple answer scoring method based on keyword segmentation, terminal and readable storage medium |
CN113627722B (en) * | 2021-07-02 | 2024-04-02 | 湖北美和易思教育科技有限公司 | Simple answer scoring method based on keyword segmentation, terminal and readable storage medium |
CN113822040A (en) * | 2021-08-06 | 2021-12-21 | 深圳市卓帆技术有限公司 | Subjective question marking and scoring method and device, computer equipment and storage medium |
CN113822040B (en) * | 2021-08-06 | 2024-07-02 | 深圳市卓帆技术有限公司 | Subjective question scoring method, subjective question scoring device, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN112700203B (en) | 2022-11-01 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN108073568B (en) | Keyword extraction method and device | |
CN108304375B (en) | Information identification method and equipment, storage medium and terminal thereof | |
CN109815314B (en) | Intent recognition method, recognition device and computer readable storage medium | |
CN108108426B (en) | Understanding method and device for natural language question and electronic equipment | |
CN109299280B (en) | Short text clustering analysis method and device and terminal equipment | |
CN107436864A (en) | A kind of Chinese question and answer semantic similarity calculation method based on Word2Vec | |
CN106570109B (en) | Method for automatically generating question bank knowledge points through text analysis | |
CN109359290B (en) | Knowledge point determining method of test question text, electronic equipment and storage medium | |
US20160170993A1 (en) | System and method for ranking news feeds | |
CN111369980B (en) | Voice detection method, device, electronic equipment and storage medium | |
CN110263854A (en) | Live streaming label determines method, apparatus and storage medium | |
CN113486664A (en) | Text data visualization analysis method, device, equipment and storage medium | |
CN112700203B (en) | Intelligent marking method and device | |
CN109522396B (en) | Knowledge processing method and system for national defense science and technology field | |
CN116402166B (en) | Training method and device of prediction model, electronic equipment and storage medium | |
CN109298796B (en) | Word association method and device | |
CN113656575B (en) | Training data generation method and device, electronic equipment and readable medium | |
CN116151220A (en) | Word segmentation model training method, word segmentation processing method and device | |
CN110069772B (en) | Device, method and storage medium for predicting scoring of question-answer content | |
CN112541069A (en) | Text matching method, system, terminal and storage medium combined with keywords | |
CN110096708B (en) | Calibration set determining method and device | |
CN114842982B (en) | Knowledge expression method, device and system for medical information system | |
CN108763208B (en) | Topic information acquisition method, topic information acquisition device, server and computer-readable storage medium | |
CN108573025B (en) | Method and device for extracting sentence classification characteristics based on mixed template | |
CN116719950A (en) | Intelligent question-answering method and system based on knowledge graph sub-graph retrieval | |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |