CN116629270A

CN116629270A - Subjective question scoring method and device based on examination big data and text semantics

Info

Publication number: CN116629270A
Application number: CN202310690851.2A
Authority: CN
Inventors: 马赫; 董淑娟; 倪小明; 郭南明; 杜育林; 刘佳荣; 洪潜凯
Original assignee: Guangzhou Nanfang Human Resources Evaluation Center Co ltd
Current assignee: Guangzhou Nanfang Human Resources Evaluation Center Co ltd
Priority date: 2023-06-12
Filing date: 2023-06-12
Publication date: 2023-08-22
Anticipated expiration: 2043-06-12
Also published as: CN116629270B

Abstract

The invention claims a subjective question scoring method and device based on examination big data and text semantics. The answer content of an input subjective question is first preprocessed, which includes removing special characters and performing word segmentation. The preprocessed answer content is then mapped into a numerical sequence. This numerical sequence is input into a feature extraction model to obtain feature vectors of the answer content. Finally, semantic clustering is performed on the feature vectors, and the subjective question is scored according to reference data. The invention can clearly distinguish the current answering state and performs targeted analysis of the multidimensional characteristics of the answer content. It jointly considers the frame and content weight of the current answer. It is useful in artificial intelligence applications and can provide information and logical guidance for reference as decision support.
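The preprocessing and mapping steps in the abstract can be illustrated with a short sketch. It assumes three things that the abstract does not specify: a regular expression defines the "special characters", segmentation is by whitespace (with runs of Chinese characters split per character), and the vocabulary reserves ids for padding and unknown tokens. A production system for Chinese text would use a proper word segmenter.

```python
import re

def clean(text: str) -> str:
    """Remove 'special characters': keep Unicode word characters and whitespace."""
    return re.sub(r"[^\w\s]", " ", text)

def segment(text: str) -> list[str]:
    """Hypothetical segmenter: split on whitespace; pure CJK runs are split per character."""
    tokens: list[str] = []
    for chunk in text.split():
        if re.fullmatch(r"[\u4e00-\u9fff]+", chunk):
            tokens.extend(chunk)          # one token per Chinese character
        else:
            tokens.append(chunk.lower())  # Latin-script words are lower-cased
    return tokens

def build_vocab(corpus: list[list[str]]) -> dict[str, int]:
    """Assign integer ids in order of first appearance; 0 = padding, 1 = unknown."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for tokens in corpus:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab

def to_sequence(tokens: list[str], vocab: dict[str, int], max_len: int = 8) -> list[int]:
    """Map tokens to ids, then truncate or pad to a fixed length."""
    ids = [vocab.get(t, vocab["<unk>"]) for t in tokens][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))
```

For example, the answers "Photosynthesis releases O2!" and "光合作用" yield the first sequence `[2, 3, 4, 0, 0]` when `max_len=5`.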

Description

Subjective question scoring method and device based on examination big data and text semantics

Technical Field

The invention belongs to the field of big data processing, and particularly relates to a subjective question scoring method and device based on examination big data and text semantics.

Background

At present, subjective questions are scored by keyword matching. A large set of every conceivable answer keyword is set manually, and the examinee's answer is searched for these keywords. If a keyword appears, points are awarded; if it does not, no points are awarded.
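A minimal sketch of this keyword-matching baseline makes its limitation concrete. The function name and point values are illustrative assumptions.

```python
def keyword_score(answer: str, keywords: dict[str, float]) -> float:
    """Award a keyword's points whenever it occurs verbatim (case-insensitively) in the answer."""
    text = answer.lower()
    return sum(points for kw, points in keywords.items() if kw.lower() in text)

keys = {"chlorophyll": 2.0, "sunlight": 1.0}
# A synonym ("solar energy") earns nothing because it is not in the preset keyword set.
full = keyword_score("Plants use Chlorophyll and sunlight", keys)
partial = keyword_score("Plants use chlorophyll and solar energy", keys)
```

Here `full` is 3.0 and `partial` is 2.0, even though both answers express the same idea.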

However, scoring subjective questions purely by keywords depends excessively on the preset set of answer keywords. Once that set is incomplete, the probability of obvious errors in examinees' scores rises sharply. There therefore remains a need for a more objective technical scheme for the scoring standard of subjective questions.

Disclosure of Invention

To solve the problem that the current verification and output of subjective question answer content are inaccurate, the invention claims a subjective question scoring method and device based on examination big data and text semantics.

According to a first aspect, the invention claims a subjective question scoring method based on examination big data and text semantics, comprising:

collecting the subjective question answer content to be scored, and partitioning and standardizing it to obtain standardized answer content to be scored;

inputting the standardized answer content to be scored into a plurality of feature extraction models to obtain a semantic clustering result set of the subjective question answer content to be scored under each feature extraction model;

and obtaining the subjective question scoring result of the answer content to be scored according to its semantic clustering result set.
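The three claimed steps can be wired together as shown below. This is only a structural sketch. The patent does not define the interfaces of `standardize`, the model callables, or `aggregate`, so they are hypothetical placeholders.

```python
from typing import Callable, Sequence

Model = Callable[[list[str]], list[float]]

def score_answer(raw_answer: str,
                 standardize: Callable[[str], list[str]],
                 models: Sequence[Model],
                 aggregate: Callable[[list[list[float]]], float]) -> float:
    standard = standardize(raw_answer)          # step 1: partition and standardize
    result_set = [m(standard) for m in models]  # step 2: one result per feature extraction model
    return aggregate(result_set)                # step 3: derive the score from the result set

# Toy stand-ins, for illustration only.
standardize = lambda s: s.lower().split()
models = [lambda toks: [float(len(toks))], lambda toks: [1.0 if "energy" in toks else 0.0]]
aggregate = lambda rs: sum(r[0] for r in rs)
```

With these stand-ins, `score_answer("Light ENERGY", standardize, models, aggregate)` returns 3.0.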

Further, collecting the subjective question answer content to be scored, and partitioning and standardizing it to obtain the standardized answer content to be scored, specifically comprises:

performing semantic transfer on the selected subjective question standard answer according to the spatial position relationship between the subjective question answer area to be scored and the answer area of the selected standard answer, so that the answer area of the selected standard answer is aligned with the answer area to be scored. This comprises: performing word segmentation on the selected standard answer, via word-segmentation-based semantic transfer, according to the line vector of the answer area to be scored; partitioning the selected standard answer in equal proportions according to the size of the answer frame of the answer area to be scored; and comparing and pasting the selected standard answer according to the positions of detection points in the answer area to be scored;
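The equal-proportion partition can be sketched as splitting the standard answer's tokens into as many near-equal contiguous parts as the answer frame has lines. The patent does not state how frame size maps to the number of parts. The `lines_in_frame` rule below (frame height divided by line height) is a hypothetical assumption.

```python
def lines_in_frame(frame_height: float, line_height: float) -> int:
    """Hypothetical rule: number of writing lines that fit in the answer frame."""
    return max(1, int(frame_height // line_height))

def equal_partition(tokens: list[str], n_parts: int) -> list[list[str]]:
    """Split tokens into n_parts contiguous chunks whose sizes differ by at most one."""
    if n_parts <= 0:
        raise ValueError("n_parts must be positive")
    q, r = divmod(len(tokens), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        size = q + (1 if i < r else 0)  # the first r chunks take one extra token
        parts.append(tokens[start:start + size])
        start += size
    return parts
```

For a 120-unit frame with 40-unit lines, seven tokens `a` to `g` split as `[a b c] [d e] [f g]`.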

resampling the contents of the answer area to be scored according to the answer area points of the selected standard answer. After resampling, the answer area to be scored has the same number of vertices as the answer area of the selected standard answer, and the vertex positions correspond one to one. A score point correspondence between the two answer areas is then established according to the vertex serial numbers of the answer area to be scored and of the answer area of the selected standard answer;
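The resampling and serial-number correspondence can be sketched as follows. The sketch assumes each "vertex" is a 1-D position, such as a character offset of a score point, and uses linear interpolation as the resampling rule. Neither assumption comes from the patent.

```python
def resample(points: list[float], n: int) -> list[float]:
    """Linearly resample an ordered sequence of positions to exactly n points."""
    if n < 2 or len(points) < 2:
        raise ValueError("need at least two points and n >= 2")
    span = len(points) - 1
    out = []
    for k in range(n):
        t = k * span / (n - 1)          # fractional index into the original sequence
        i = min(int(t), span - 1)       # left neighbour, clamped so i + 1 stays valid
        frac = t - i
        out.append(points[i] * (1 - frac) + points[i + 1] * frac)
    return out

def correspondence(candidate: list[float], reference: list[float]) -> dict[int, tuple[float, float]]:
    """Pair vertices that share the same serial number after resampling."""
    cand = resample(candidate, len(reference))
    return {idx: (c, r) for idx, (c, r) in enumerate(zip(cand, reference))}
```

For example, `correspondence([0.0, 10.0], [1.0, 4.0, 9.0])` pairs vertex 1 of the answer (position 5.0) with vertex 1 of the standard answer (position 4.0).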

and, according to the subjective question offset line vector, bringing the main-line words of the subjective question answer content onto the same transverse line, and partitioning the grid model of the answer content in equal proportions on the three transverse lines, to obtain the standardized answer content to be scored.

Further, the plurality of feature extraction models at least comprise a first feature extraction model, a second feature extraction model, a third feature extraction model and a fourth feature extraction model;

inputting the standardized answer content to be scored into a plurality of feature extraction models to obtain a semantic clustering result set of the subjective question answer content to be scored under each feature extraction model specifically comprises:

collecting a first answer content set, a second answer content set, a third answer content set and a fourth answer content set from the standardized answer content to be scored, and inputting them into the first, second, third and fourth feature extraction models respectively;

running each feature extraction model to obtain a first candidate scoring content, a second candidate scoring content, a third candidate scoring content and a fourth candidate scoring content respectively;

and collecting the semantic clustering result set of the subjective question answer content to be scored under each feature extraction model according to the candidate scoring contents.
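The one-to-one routing of answer content sets to feature extraction models can be sketched as a keyed dispatch. The branch names (`frame`, `weight`, `validity`, `minutiae`) are hypothetical labels for the first to fourth sets. The stub models stand in for the trained networks and return fixed probability vectors.

```python
from typing import Callable

Branch = Callable[[list[str]], list[float]]

def run_branches(content_sets: dict[str, list[str]],
                 models: dict[str, Branch]) -> dict[str, list[float]]:
    """Feed each answer content set to the feature extraction model of the same branch."""
    missing = content_sets.keys() - models.keys()
    if missing:
        raise KeyError(f"no model for branches: {sorted(missing)}")
    return {name: models[name](content_sets[name]) for name in content_sets}

# Stub models returning fixed candidate scoring content (illustration only).
stubs = {
    "frame": lambda t: [0.3, 0.7],
    "weight": lambda t: [0.0, 0.1, 0.7, 0.1, 0.1],
    "validity": lambda t: [0.2, 0.8],
    "minutiae": lambda t: [0.1, 0.2, 0.7],
}
```

A keyed mapping, rather than positional lists, makes a missing or mismatched branch fail loudly instead of silently pairing the wrong set with the wrong model.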

Further, the first feature extraction model, the second feature extraction model, the third feature extraction model and the fourth feature extraction model are convolutional neural networks trained through deep learning;

the first answer content set is a frame-related answer content set, the second answer content set is a content weight-related answer content set, the third answer content set is a validity-related answer content set, and the fourth answer content set is a minutiae answer content set;

the first candidate scoring content comprises identification probability values for an unordered frame and a total-score frame;

the second candidate scoring content comprises identification probability values for infinitesimal weight, smaller weight, moderate weight, larger weight and infinite weight;

the third candidate scoring content comprises identification probability values for invalid and valid;

the fourth candidate scoring content comprises identification probability values for missing detail content, defective detail content and complete detail content;

the semantic clustering result set of the subjective question answer content to be scored is a tuple formed by the identification probability values of the elements in each candidate scoring content.
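Building the result set from per-branch outputs can be sketched as applying a softmax to each model's raw scores and collecting the probability vectors into a tuple. The label order follows the lists above. The dictionary keys and the use of raw logits as model output are assumptions for illustration.

```python
import math

# Element labels per branch, in the order listed above (keys are illustrative).
BRANCH_LABELS = {
    "frame": ("unordered frame", "total-score frame"),
    "weight": ("infinitesimal weight", "smaller weight", "moderate weight",
               "larger weight", "infinite weight"),
    "validity": ("invalid", "valid"),
    "minutiae": ("missing detail content", "defective detail content",
                 "complete detail content"),
}

def softmax(logits: list[float]) -> list[float]:
    """Convert raw model scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def clustering_result_set(branch_logits: dict[str, list[float]]) -> tuple[tuple[float, ...], ...]:
    """Return the tuple of identification probability vectors, one per branch."""
    result = []
    for branch, labels in BRANCH_LABELS.items():
        logits = branch_logits[branch]
        if len(logits) != len(labels):
            raise ValueError(f"{branch}: expected {len(labels)} scores, got {len(logits)}")
        result.append(tuple(softmax(logits)))
    return tuple(result)
```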

Further, obtaining the subjective question scoring result of the answer content to be scored according to its semantic clustering result set specifically comprises:

obtaining the verification output of each candidate scoring content according to the semantic clustering result set of the subjective question answer content to be scored;

obtaining the subjective question scoring result of the answer content to be scored according to the verification outputs;

and storing the subjective question scoring result together with the verification outputs as a five-tuple.
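The five-tuple storage can be sketched as one score plus the four verification outputs. The patent does not say how the score is derived from the verification outputs. The `WEIGHT_FACTOR` table and the "invalid answers score zero" rule below are hypothetical.

```python
from typing import NamedTuple

# Hypothetical mapping from the content-weight verdict to a fraction of full marks.
WEIGHT_FACTOR = {
    "infinitesimal weight": 0.0, "smaller weight": 0.25, "moderate weight": 0.5,
    "larger weight": 0.75, "infinite weight": 1.0,
}

class ScoreRecord(NamedTuple):
    score: float
    frame: str
    content_weight: str
    validity: str
    minutiae: str

def make_record(verification: dict[str, str], full_marks: float) -> ScoreRecord:
    """Derive a score from the four verification outputs and store all five values together."""
    if verification["validity"] == "invalid":
        score = 0.0  # hypothetical rule: an invalid answer earns nothing
    else:
        score = full_marks * WEIGHT_FACTOR[verification["weight"]]
    return ScoreRecord(score, verification["frame"], verification["weight"],
                       verification["validity"], verification["minutiae"])
```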

According to a second aspect, the invention claims a subjective question scoring device based on examination big data and text semantics, comprising:

the standard processing module, which is used for collecting the subjective question answer content to be scored, and partitioning and standardizing it to obtain the standardized answer content to be scored;

the branch processing module, which is used for inputting the standardized answer content to be scored into a plurality of feature extraction models to obtain a semantic clustering result set of the subjective question answer content to be scored under each feature extraction model;

and the result output module, which is used for obtaining the subjective question scoring result of the answer content to be scored according to its semantic clustering result set.

The invention claims a subjective question scoring method and device based on examination big data and text semantics. The subjective question answer content to be scored is collected, partitioned and standardized to obtain standardized answer content to be scored. The standardized answer content is input into a plurality of feature extraction models to obtain a semantic clustering result set under each feature extraction model. The subjective question scoring result is then obtained according to this semantic clustering result set. The invention can clearly distinguish the current answering state and performs targeted analysis of the multidimensional characteristics of the answer content. It jointly considers the frame, content weight, validity and magnification status of the current answer. It is useful in artificial intelligence applications and can provide information and logical guidance for reference as decision support.

Drawings

FIG. 1 is a first workflow diagram of the subjective question scoring method based on examination big data and text semantics claimed in the present application;

FIG. 2 is a second workflow diagram of the subjective question scoring method based on examination big data and text semantics claimed in the present application;

FIG. 3 is a third workflow diagram of the subjective question scoring method based on examination big data and text semantics claimed in the present application;

FIG. 4 is a fourth workflow diagram of the subjective question scoring method based on examination big data and text semantics claimed in the present application;

FIG. 5 is a structural block diagram of the subjective question scoring device based on examination big data and text semantics claimed in the present application.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. It will be understood that the terms "first," "second," and the like, as used herein, may be used to describe various elements, but these elements are not limited by these terms unless otherwise specified. These terms are only used to distinguish one element from another element. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

According to a first embodiment of the present invention, referring to fig. 1, the present invention claims a subjective question scoring method based on examination big data and text semantics, comprising:

The invention identifies test papers simply, quickly and accurately. Through data processing, it can also score objective, subjective and descriptive questions quickly and accurately. The method relies on the reference marks printed on the test paper and an identifiable answer sheet. Only the answer-sheet area needs to be examined rather than the whole image, so the test paper can be identified quickly and photographed automatically, which improves convenience for the user. The reference marks also allow the position of the test paper to be located accurately and any rotation to be detected. They provide an index for skew correction when the image is corrected, and an index for answer-position extraction when the answer region is extracted. Further, when objective questions are scored, the binary values of marks, checks and recorded positions in the answer region are not treated as absolute values. Instead, the marked or checked position is identified relatively by computing the lowest value for each question, which improves the reliability of the scoring result. For subjective or descriptive questions, the scorer sees only the answers, without the questions, which improves readability and makes scoring easier.
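Relative mark detection can be sketched as picking, for each question, the option whose region is darkest, instead of applying a fixed threshold. The sketch assumes that "lowest value" means the lowest mean grayscale intensity (0 = black) of each option region. The patent does not spell this out.

```python
def detect_marks(option_intensities: list[list[float]]) -> list[int]:
    """For each question, return the index of the darkest (lowest-intensity) option."""
    return [min(range(len(opts)), key=opts.__getitem__) for opts in option_intensities]

def detect_marks_absolute(option_intensities: list[list[float]], threshold: float = 100.0) -> list[int]:
    """Fixed-threshold variant for comparison: -1 means no option was dark enough."""
    result = []
    for opts in option_intensities:
        dark = [i for i, v in enumerate(opts) if v < threshold]
        result.append(dark[0] if dark else -1)
    return result
```

On a faint scan such as `[[150, 140, 152, 149]]`, the relative rule still finds option 1, while the fixed threshold finds nothing.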

"Subjective questions" are questions whose answers require subjective interpretation, such as short-answer questions, and which are judged against at least one reference answer, for example a text. The reference answer may consist of several keywords (i.e., scoring points). The judgment result may take at least three values: correct, incorrect, and unknown (i.e., it cannot be determined whether the answer is correct or incorrect). Non-limiting examples of such questions are short-answer questions, written compositions and fill-in-the-blank questions.

Semantic transfer is performed on the selected subjective question standard answer according to the spatial position relationship between the subjective question answer area to be scored and the answer area of the selected standard answer, so that the answer area of the selected standard answer is aligned with the answer area to be scored.

Referring to FIG. 2, this specifically includes: performing word segmentation on the selected standard answer, via word-segmentation-based semantic transfer, according to the line vector of the answer area to be scored; partitioning the selected standard answer in equal proportions according to the size of the answer frame of the answer area to be scored; and comparing and pasting the selected standard answer according to the positions of detection points in the answer area to be scored.

Referring to FIG. 3, inputting the standardized answer content to be scored into a plurality of feature extraction models to obtain a semantic clustering result set of the subjective question answer content to be scored under each feature extraction model specifically includes the following.

The number of recognition probabilities output by the frame feature extraction model may be equal to or smaller than the number of preset frame types. For example, if there are 3 preset frame types, the frame feature extraction model may output 3 recognition probabilities, namely the probabilities that the subjective question answer content to be scored belongs to each of the 3 preset frame types. Alternatively, the model may output only the 3 largest of the recognition probabilities that the answer content belongs to the respective preset frame types. As a further example, the model may output only 1 recognition probability, namely the largest of the recognition probabilities for the 3 preset frame types.
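All three output options described above reduce to keeping the k largest recognition probabilities. The frame type names below are hypothetical placeholders.

```python
def top_k(probs: dict[str, float], k: int) -> list[tuple[str, float]]:
    """Return the k (frame type, probability) pairs with the largest probabilities."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]

probs = {"frame type A": 0.2, "frame type B": 0.5, "frame type C": 0.3}
# k equal to the number of types outputs all probabilities; k = 1 outputs only the largest.
```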

In this embodiment, key points are extracted from the smaller-weight, moderate-weight and larger-weight content respectively. Based on these key points, an affine matrix mapping the larger-weight content to the smaller-weight and moderate-weight content is established;

answer content blocks are extracted from the smaller-weight and moderate-weight content, and a training answer content data set is built from these blocks. Whether each extracted answer content block lies in the content weight region is judged according to the affine matrix: if so, the block is marked as a positive sample; otherwise, it is marked as a negative sample.
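Positive/negative labeling can be sketched by mapping each block's centre through a 2x3 affine matrix and testing whether it falls inside the content weight region. The sketch makes two assumptions the patent leaves open: the matrix maps block coordinates into the frame in which the region is defined, and the region is an axis-aligned rectangle `(x0, y0, x1, y1)`.

```python
Affine = tuple[tuple[float, float, float], tuple[float, float, float]]

def apply_affine(a: Affine, x: float, y: float) -> tuple[float, float]:
    """Map (x, y) through the 2x3 affine matrix [[a b c], [d e f]]."""
    return (a[0][0] * x + a[0][1] * y + a[0][2],
            a[1][0] * x + a[1][1] * y + a[1][2])

def label_blocks(block_centers: list[tuple[float, float]], a: Affine,
                 region: tuple[float, float, float, float]) -> list[int]:
    """Label 1 (positive) if the mapped block centre lies in the region, else 0 (negative)."""
    x0, y0, x1, y1 = region
    labels = []
    for x, y in block_centers:
        u, v = apply_affine(a, x, y)
        labels.append(1 if x0 <= u <= x1 and y0 <= v <= y1 else 0)
    return labels
```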

Specifically, foreground information and background information are distinguished in the smaller-weight, moderate-weight and larger-weight content respectively. The positions of the tissues in the smaller-weight, moderate-weight and larger-weight foreground information are detected, and the tissues are completely extracted from that foreground information, forming smaller-weight, moderate-weight and larger-weight tissues. Key points are then extracted from each of these tissues and matched, and an affine matrix mapping the larger-weight tissue to the smaller-weight and moderate-weight tissues is established.

extracting key points from the smaller-weight, moderate-weight and larger-weight tissues respectively;

calculating the similarity between the key points of the smaller-weight and moderate-weight tissues and those of the larger-weight tissue;

establishing key point matching pairs between the smaller-weight and moderate-weight tissues and the larger-weight tissue according to the key point similarity, and generating an affine matrix based on the key point matching pairs;

extracting the positions of the larger-weight subjective question content in the larger-weight tissue, generating a larger-weight marking grid map, and applying the affine matrix to this marking grid map to obtain the corresponding positions in the smaller-weight and moderate-weight tissues.
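The key point matching and affine estimation steps above can be sketched as follows. The patent does not name a similarity measure, matching rule or solver. This sketch assumes cosine similarity between descriptor vectors, greedy best-match pairing above a threshold, and a least-squares affine fit via the normal equations (solved with Cramer's rule). These are common choices, not the patent's stated method.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two descriptor vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def match_keypoints(desc_src, desc_dst, min_sim: float = 0.9) -> list[tuple[int, int]]:
    """Greedy best-match pairs (i, j) whose similarity reaches min_sim."""
    pairs = []
    for i, d in enumerate(desc_src):
        j, s = max(((j, cosine(d, e)) for j, e in enumerate(desc_dst)), key=lambda t: t[1])
        if s >= min_sim:
            pairs.append((i, j))
    return pairs

def det3(m) -> float:
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(m, b) -> list[float]:
    """Solve the 3x3 system m @ p = b with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("degenerate key point configuration (need >= 3 non-collinear points)")
    out = []
    for c in range(3):
        mc = [list(row) for row in m]
        for r in range(3):
            mc[r][c] = b[r]  # replace column c with the right-hand side
        out.append(det3(mc) / d)
    return out

def estimate_affine(src_pts, dst_pts) -> list[list[float]]:
    """Least-squares 2x3 affine matrix mapping src_pts onto dst_pts."""
    rows = [(x, y, 1.0) for x, y in src_pts]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    affine = []
    for k in range(2):  # solve separately for the u and v output coordinates
        atb = [sum(r[i] * p[k] for r, p in zip(rows, dst_pts)) for i in range(3)]
        affine.append(solve3(ata, atb))
    return affine

def warp_grid(a, grid) -> list[tuple[float, float]]:
    """Apply the affine matrix to every point of a marking grid map."""
    return [(a[0][0] * x + a[0][1] * y + a[0][2], a[1][0] * x + a[1][1] * y + a[1][2])
            for x, y in grid]
```

With four matched corners of a unit square mapped to `(2,3) (4,3) (2,4) (4,4)`, the estimate is `[[2, 0, 2], [0, 1, 3]]`. At least three non-collinear pairs are needed, and more pairs average out matching noise.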

In this embodiment, the score point judgment information of the subjective question answer content to be scored is identified from its validity answer content. The scoring elements recorded in the identification sensor of the answer content are collected. Whether these recorded scoring elements are correct is then verified against the score point information obtained from the answer content identification module.

Specifically, the validity answer content of the subjective question to be scored is collected, and answer content feature information is extracted from the validity answer content acquired by the vision acquisition device. The extracted feature information is matched against the answer content feature information stored in memory. The validity information corresponding to the matched feature information is determined as the score point judgment information of the answer content to be scored.
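The matching step can be sketched as a nearest-neighbour lookup over stored feature vectors. The sketch assumes Euclidean distance, an acceptance radius `max_dist`, and stored entries of the form `(vector, validity_info)`. None of these are specified in the patent.

```python
import math

def lookup_validity(features: list[float], stored: list[tuple[list[float], str]],
                    max_dist: float = 0.5):
    """Return the validity info of the nearest stored feature vector, or None if none is close enough."""
    best_info, best_d = None, math.inf
    for vector, info in stored:
        d = math.dist(features, vector)  # Euclidean distance (Python 3.8+)
        if d < best_d:
            best_info, best_d = info, d
    return best_info if best_d <= max_dist else None
```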

The scoring element recorded in the identification sensor is read, and the determined score point judgment information of the answer content to be scored is obtained. The scoring element is compared with the score point judgment information to judge whether the scoring element recorded in the identification sensor is correct.

A first check value recorded in the identification sensor is read. A second check value is calculated from the scoring element using a preset second check algorithm, and it is judged whether the first check value matches the second check value.
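The check-value comparison can be sketched with CRC-32 (Python's `zlib.crc32`) standing in for the unspecified "second preset check algorithm". The UTF-8 string encoding of the scoring element is also an assumption.

```python
import zlib

def check_value(scoring_element: str) -> int:
    """Second check value: CRC-32 of the UTF-8 encoded scoring element (assumed algorithm)."""
    return zlib.crc32(scoring_element.encode("utf-8"))

def verify(scoring_element: str, first_check_value: int) -> bool:
    """True if the recorded first check value matches the recomputed second check value."""
    return check_value(scoring_element) == first_check_value
```

CRC-32 detects every single-bit change, so a scoring element altered from `...=1` to `...=0` fails verification.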

In this embodiment, identifying the probability values of the fourth candidate scoring content (missing detail content, defective detail content and complete detail content) includes:

storing, in a resolution memory, the stretched-resolution answer content and the configuration information of the subjective question answer content to be scored generated from it, where the stretched-resolution answer content is obtained by stretching the original-resolution answer content captured by the display screen device, and the answer content to be scored is to be displayed on the display screen;

obtaining the configuration information of the answer content to be scored and the stretched-resolution answer content according to a resolution data processing request;

restoring the stretched-resolution answer content according to the configuration information to obtain base-resolution answer content;

magnifying the base-resolution answer content according to the configuration information to obtain the subjective question answer content to be scored;

displaying the subjective question answer content to be scored;

searching the resolution memory, according to the position coordinates of each pixel to be scored in the displayed-resolution answer content within any answer content area on the display screen, for the answer content to be scored that matches those coordinates;

and comparing the pixel value of each pixel to be scored with the pixel value of the corresponding answer content to be scored, and judging from the comparison result whether magnification is present.
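The final comparison step can be sketched as counting pixels whose displayed value differs from the stored reference value at the same coordinates, and flagging magnification when the mismatch rate exceeds a threshold. The tolerance, the threshold, and the nearest-neighbour upscaling used to simulate a magnified display are all assumptions for illustration.

```python
def upscale_nearest(img: list[list[int]], factor: int) -> list[list[int]]:
    """Nearest-neighbour magnification of a 2-D grayscale image by an integer factor."""
    return [[row[x // factor] for x in range(len(row) * factor)]
            for row in img for _ in range(factor)]

def magnification_present(displayed: list[list[int]], reference: list[list[int]],
                          tol: int = 0, max_mismatch: float = 0.05) -> bool:
    """Compare each displayed pixel with the reference pixel at the same coordinates."""
    h = min(len(displayed), len(reference))
    w = min(len(displayed[0]), len(reference[0]))
    mismatches = sum(1 for y in range(h) for x in range(w)
                     if abs(displayed[y][x] - reference[y][x]) > tol)
    return mismatches / (h * w) > max_mismatch
```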

Further, referring to FIG. 4, obtaining the subjective question scoring result of the answer content to be scored according to its semantic clustering result set specifically includes:

In this embodiment, the candidate scoring content of each feature extraction model is essentially a vector produced by a softmax operation. The candidate scoring contents of the feature extraction models are combined into a tuple to form a verification output in tuple form.

For example, suppose the verification output of the content weight feature extraction model for one answer to a subjective question to be scored is [0, 0.1, 0.7, 0.2]. The four values are then the predicted probabilities of the four outcomes of the content weight branch:

the probability of infinitesimal weight is 0;

the probability of smaller weight is 0.1;

the probability of moderate weight is 0.7;

the probability of a larger weight is 0.2.

In this case, a winner-take-all strategy is adopted, under which the verification output of the content weight feature extraction model is taken to be "moderate weight".

Similarly, the frame recognition, validity recognition and minutiae feature extraction models all adopt the same strategy, so each prediction vector is converted into a single predicted category.
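The winner-take-all step is an argmax over the probability vector. The sketch below uses the four labels from the example above. On ties, Python's `max` keeps the first maximal index, which is one possible tie-breaking rule.

```python
WEIGHT_LABELS = ("infinitesimal weight", "smaller weight", "moderate weight", "larger weight")

def winner_take_all(probs: list[float], labels: tuple[str, ...]) -> str:
    """Convert a prediction vector into the single label with the highest probability."""
    if len(probs) != len(labels):
        raise ValueError("one probability per label is required")
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best]

print(winner_take_all([0, 0.1, 0.7, 0.2], WEIGHT_LABELS))  # moderate weight
```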

It should be emphasized that each frame has a corresponding examination type result. As a result, tens of thousands of five-tuple results (frame state, content weight state, validity state, magnification state, examination type) are obtained while an answer is examined or processed. The invention therefore makes it easy for the marker to quickly locate particular scenarios of a case when carrying out retrospective review afterwards.
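Retrospective lookup over stored five-tuples can be sketched as field-equality filtering on named tuples. The field names follow the five-tuple listed above. The state values used in the example are hypothetical.

```python
from typing import NamedTuple

class FrameResult(NamedTuple):
    frame_state: str
    content_weight_state: str
    validity_state: str
    magnification_state: str
    examination_type: str

def find(records: list[FrameResult], **criteria: str) -> list[FrameResult]:
    """Return the stored five-tuples whose fields equal every given criterion."""
    return [r for r in records
            if all(getattr(r, field) == value for field, value in criteria.items())]

records = [
    FrameResult("total-score frame", "moderate weight", "valid", "none", "short answer"),
    FrameResult("unordered frame", "smaller weight", "invalid", "magnified", "short answer"),
    FrameResult("unordered frame", "larger weight", "invalid", "none", "essay"),
]
# e.g. locate every invalid answer that was also viewed under magnification:
suspicious = find(records, validity_state="invalid", magnification_state="magnified")
```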

According to a second embodiment of the present invention, referring to fig. 5, the present invention claims a subjective question scoring device based on examination big data and text semantics, comprising:

the branch processing module inputs the standard answer contents to be scored into a plurality of feature extraction models to obtain a semantic clustering result set of the subjective question answer contents to be scored under each feature extraction model;

Those skilled in the art will appreciate that various modifications and improvements can be made to the disclosure. For example, the various devices or components described above may be implemented in hardware, or may be implemented in software, firmware, or a combination of some or all of the three.

A flowchart is used in this disclosure to describe the steps of a method according to embodiments of the present disclosure. It should be understood that the preceding or following steps need not be performed in the exact order shown. Rather, the steps may be processed in reverse order or simultaneously, and other operations may be added to these processes.

Those of ordinary skill in the art will appreciate that all or a portion of the steps of the methods described above may be implemented by a computer program to instruct related hardware, and the program may be stored in a computer readable storage medium, such as a read only memory, a magnetic disk, or an optical disk. Alternatively, all or part of the steps of the above embodiments may be implemented using one or more integrated circuits. Accordingly, each module/unit in the above embodiment may be implemented in the form of hardware, or may be implemented in the form of a software functional module. The present disclosure is not limited to any specific form of combination of hardware and software.

Unless defined otherwise, all terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

The foregoing is illustrative of the present disclosure and is not to be construed as limiting thereof. Although a few exemplary embodiments of this disclosure have been described, those skilled in the art will readily appreciate that many modifications are possible in the exemplary embodiments without materially departing from the novel teachings and advantages of this disclosure. Accordingly, all such modifications are intended to be included within the scope of this disclosure as defined in the claims. It is to be understood that the foregoing is illustrative of the present disclosure and is not to be construed as limited to the specific embodiments disclosed, and that modifications to the disclosed embodiments, as well as other embodiments, are intended to be included within the scope of the appended claims. The disclosure is defined by the claims and their equivalents.

In the description of the present specification, reference to the terms "one embodiment," "some embodiments," "illustrative embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the invention, the scope of which is defined by the claims and their equivalents.

Claims

1. A subjective question scoring method based on examination big data and text semantics, characterized by comprising the following steps:

collecting the subjective question answer content to be scored, and partitioning and standardizing it to obtain standardized answer content to be scored;

inputting the standardized answer content to be scored into a plurality of feature extraction models to obtain a semantic clustering result set of the subjective question answer content to be scored under each feature extraction model;

2. The subjective question scoring method based on examination big data and text semantics of claim 1, wherein

collecting the subjective question answer content to be scored, and partitioning and standardizing it to obtain the standardized answer content to be scored, specifically comprises:

performing semantic transfer on the selected subjective question standard answer according to the spatial position relationship between the subjective question answer area to be scored and the answer area of the selected standard answer, so that the answer area of the selected standard answer is aligned with the answer area to be scored, comprising: performing word segmentation on the selected standard answer, via word-segmentation-based semantic transfer, according to the line vector of the answer area to be scored; partitioning the selected standard answer in equal proportions according to the size of the answer frame of the answer area to be scored; and comparing and pasting the selected standard answer according to the positions of detection points in the answer area to be scored;

3. The subjective question scoring method based on examination big data and text semantics according to claim 1,

the plurality of feature extraction models comprise at least a first feature extraction model, a second feature extraction model, a third feature extraction model and a fourth feature extraction model;

the step of inputting the standardized answer content to be scored into the plurality of feature extraction models to obtain a semantic clustering result set of the answer content of the subjective question to be scored under each feature extraction model specifically comprises:

collecting a first answer content set, a second answer content set, a third answer content set and a fourth answer content set from the standardized answer content to be scored, and inputting them into the first, second, third and fourth feature extraction models, respectively;

and collecting, according to the candidate scoring content, the semantic clustering result set of the answer content of the subjective question to be scored under each feature extraction model.
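The routing in claim 3, where the i-th answer content set goes to the i-th feature extraction model and one result set is collected per model, can be sketched as below. Assumptions: each model is any callable returning a feature vector (claim 4 specifies trained convolutional neural networks; the keyword-count `keyword_model` here is only a hypothetical stand-in so the example runs without a deep learning framework), and semantic clustering is approximated by cosine similarity to per-model cluster centroids, one centroid per candidate scoring content. All names and centroid values are illustrative.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = List[float]
FeatureModel = Callable[[str], Vector]


def keyword_model(keywords: Sequence[str]) -> FeatureModel:
    """Hypothetical stand-in for a trained CNN: counts each keyword in the text."""
    def extract(text: str) -> Vector:
        tokens = text.lower().split()
        return [float(tokens.count(k)) for k in keywords]
    return extract


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def clustering_result_sets(
    content_sets: Sequence[str],
    models: Sequence[FeatureModel],
    centroids: Sequence[Dict[str, Vector]],
) -> List[Dict[str, float]]:
    """Feed the i-th content set to the i-th model and score the resulting
    feature vector against that model's centroids (one per candidate
    scoring content). Returns one {candidate: similarity} dict per model."""
    if not (len(content_sets) == len(models) == len(centroids)):
        raise ValueError("content sets, models and centroid tables must align one-to-one")
    results = []
    for text, model, table in zip(content_sets, models, centroids):
        features = model(text)
        results.append({label: cosine(features, c) for label, c in table.items()})
    return results
```

With four content sets, four models and four centroid tables, the function returns four dictionaries; the candidate with the highest similarity in each dictionary can be read as that model's cluster assignment.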

4. The subjective question scoring method based on examination big data and text semantics according to claim 3,

the first, second, third and fourth feature extraction models are convolutional neural networks trained by deep learning;

the fourth candidate scoring content comprises identification probability values for missing detail content, defective detail content and complete detail content; and the semantic clustering result set of the answer content of the subjective question to be scored is a tuple formed by the identification probability values of the elements in each candidate scoring content.
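Claim 4 describes the clustering result set as a tuple of identification probability values; for the fourth candidate scoring content, there is one value each for missing, defective and complete detail content. One way to produce such values, assuming each CNN ends in a layer emitting raw scores (logits) that are normalized with a softmax, is sketched below. The class order, candidate names and logits are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def softmax(logits: Sequence[float]) -> Tuple[float, ...]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return tuple(e / total for e in exps)


def build_result_set(
    logits_per_candidate: Dict[str, Sequence[float]],
) -> Tuple[Tuple[str, Tuple[float, ...]], ...]:
    """Turn raw model scores into a tuple of identification probability
    values, one (candidate, probabilities) pair per candidate scoring content.
    Assumed order for detail content: (missing, defective, complete)."""
    return tuple((name, softmax(scores)) for name, scores in logits_per_candidate.items())
```

Each inner probability tuple sums to 1, so the largest entry can be read directly as the model's preferred class for that candidate scoring content.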

5. The subjective question scoring method based on examination big data and text semantics according to claim 4,

the step of collecting the subjective question scoring result of the answer content to be scored according to the semantic clustering result set of that answer content specifically comprises:

collecting the verification output of each candidate scoring content according to the semantic clustering result set of the answer content of the subjective question to be scored;

obtaining the subjective question scoring result of the answer content to be scored according to the verification output;
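Claim 5 leaves open how the verification output is derived and turned into a score. A minimal sketch follows. It assumes that verification means picking the most probable class for each candidate scoring content and checking it against a confidence threshold, and that each class earns a fixed fraction of that content's marks. The threshold (0.6), the credit table and the mark allocation are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed class order and credit fractions (hypothetical).
CLASSES = ("missing", "defective", "complete")
CREDIT = {"missing": 0.0, "defective": 0.5, "complete": 1.0}


def verify(probs: Sequence[float], threshold: float = 0.6) -> Tuple[str, bool]:
    """Return the most probable class and whether its probability clears the threshold."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return CLASSES[best], probs[best] >= threshold


def score(
    result_set: Dict[str, Sequence[float]],
    marks: Dict[str, float],
    threshold: float = 0.6,
) -> Tuple[float, List[str]]:
    """Sum the credit for each candidate scoring content and list the
    contents whose verification output was not confident enough."""
    total = 0.0
    needs_review: List[str] = []
    for name, probs in result_set.items():
        label, confident = verify(probs, threshold)
        total += marks[name] * CREDIT[label]
        if not confident:
            needs_review.append(name)
    return total, needs_review
```

Under these assumptions, an answer judged complete on a 4-mark definition, defective on a 4-mark example and missing (at low confidence) on a 2-mark conclusion scores 6.0, and the conclusion is flagged for a human marker. Returning flagged items instead of silently scoring them keeps the result usable as an auxiliary decision aid.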

6. A subjective question scoring device based on examination big data and text semantics, characterized by comprising:

7. The subjective question scoring device based on examination big data and text semantics according to claim 6, wherein the collecting of the answer content of the subjective question to be scored, and the partitioning and standardizing of that answer content to obtain standardized answer content to be scored, specifically comprises:

8. The subjective question scoring device based on examination big data and text semantics according to claim 7,

9. The subjective question scoring device based on examination big data and text semantics according to claim 8,

10. The subjective question scoring device based on examination big data and text semantics according to claim 9, wherein the collecting of the subjective question scoring result of the answer content to be scored according to the semantic clustering result set of that answer content specifically comprises: