CN114550181A

CN114550181A - Method, device and medium for identifying question

Info

Publication number: CN114550181A
Application number: CN202210126218.6A
Authority: CN
Inventors: 秦曙光
Original assignee: Zhuhai Readboy Software Technology Co Ltd
Current assignee: Zhuhai Readboy Software Technology Co Ltd
Priority date: 2022-02-10
Filing date: 2022-02-10
Publication date: 2022-05-27
Anticipated expiration: 2042-02-10
Also published as: CN114550181B

Abstract

The invention provides a machine-learning-based sub-question identification method. It uses machine learning to identify all question areas and correction marks in a test paper, determines whether each question contains sub-questions, and then determines the correction result of each sub-question and its answer according to whether sub-questions and correction marks are present, so that the correction results of the test paper can be identified more accurately.

Description

Method, device and medium for identifying question

Technical Field

The invention relates to the technical field of education, in particular to a method, a device and a medium for identifying a question.

Background

Smart classrooms are developing rapidly, and functions such as unified test-paper marking and teaching aids have appeared, but the batch-processing functions for test papers and teaching aids still need improvement. Existing identification systems can basically only identify results at the level of a main question, which makes it harder for teachers to tally scores and hinders subsequent question recommendation based on fine-grained knowledge points.

Question identification mainly faces the following problems: 1) Teachers are used to marking a whole main question with a single correction symbol when all of its parts are right or all are wrong; for example, for a main question with several sub-questions, one check mark or one cross is drawn to show that all the sub-questions are right or wrong. 2) The sub-questions of a main question may be partly right and partly wrong, and the teacher may also mark each sub-question individually. 3) The sub-questions may be laid out vertically or horizontally, which makes it difficult to assign correction marks to the right questions.

The background description provided herein is for the purpose of generally presenting the context of the disclosure. Unless otherwise indicated herein, the material described in this section is not prior art to the claims in this application and is not admitted to be prior art by inclusion in this section.

Disclosure of Invention

To address the above technical problems in the related art, the invention provides a machine-learning-based question identification method, which comprises the following steps:

S1, obtaining template data of the target to be recognized, segmenting the image of the target to be recognized according to the template data, and extracting correction marks from each segmented image in turn; recognizing the result of each correction mark by using a preset recognition model;

S2, acquiring the number of valid correction marks in the segmented image; if the number of valid correction marks is one, the correction result of the segmented image is the recognition result of that correction mark;

S3, if the number of correction marks is not one, further acquiring the correction mark that occupies the largest area among all the correction marks, taking the recognition result of that mark as the default correction result of the question, and recording it as a default value;

S4, obtaining the area corresponding to each sub-question in the segmented area; if there are no sub-questions, judging whether a correction mark whose result is "wrong" exists within the range of the question, and if so, judging that the overall correction result of the question is wrong;

S5, if sub-questions exist, traversing the areas corresponding to all the sub-questions in turn and judging whether a correction mark exists within the range of each sub-question; if a correction mark exists, its result is used as the recognition result of that sub-question, and if not, the default value is used as the correction result.
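The decision flow of steps S2 to S5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `Mark`, `grade_question` and `center_inside` are hypothetical, a mark is assigned to a sub-question when the center of its bounding rectangle falls inside the sub-question area, and a question with no marks at all defaults to "wrong" (the description later states that unanswered, unmarked questions should default to wrong).

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); hypothetical convention


@dataclass
class Mark:
    box: Box     # bounding rectangle of the correction mark
    result: str  # "correct", "wrong" or "half", as predicted by the recognition model


def area(box: Box) -> int:
    left, top, right, bottom = box
    return max(0, right - left) * max(0, bottom - top)


def center_inside(inner: Box, outer: Box) -> bool:
    # Assumption: a mark belongs to a region if the mark's center lies in it.
    cx = (inner[0] + inner[2]) / 2
    cy = (inner[1] + inner[3]) / 2
    return outer[0] <= cx <= outer[2] and outer[1] <= cy <= outer[3]


def grade_question(marks: List[Mark], sub_boxes: List[Box]) -> List[str]:
    """Return one result per sub-question, or a single result if there are none."""
    slots = max(1, len(sub_boxes))
    # S2: exactly one valid mark decides the whole question.
    if len(marks) == 1:
        return [marks[0].result] * slots
    # S3: otherwise the largest mark supplies the default value
    # (assumed "wrong" when the question carries no mark at all).
    default = max(marks, key=lambda m: area(m.box)).result if marks else "wrong"
    # S4: no sub-questions, so any "wrong" mark makes the whole question wrong.
    if not sub_boxes:
        has_wrong = any(m.result == "wrong" for m in marks)
        return ["wrong" if has_wrong else default]
    # S5: each sub-question takes its own mark if it has one, else the default.
    results = []
    for sub in sub_boxes:
        hit = next((m for m in marks if center_inside(m.box, sub)), None)
        results.append(hit.result if hit is not None else default)
    return results
```

For example, a large check mark drawn across three vertically stacked sub-questions plus a small cross inside the third one yields "correct" for the first two sub-questions and "wrong" for the third.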

Specifically, when the default value is determined in step S3, a secondary verification may further be performed on the largest correction mark, where the secondary verification comprises:

S31, judging whether the question has sub-questions; if not, no secondary verification is needed; if so, further identifying how the sub-questions are laid out;

S32, if the sub-questions are laid out vertically, further calculating the ratio of the vertical height of the largest correction mark to the overall height of the sub-questions, and if the ratio exceeds a preset threshold, taking the result of the largest correction mark as the default value;

S33, if the sub-questions are laid out horizontally, further calculating the ratio of the horizontal length of the largest correction mark to the overall horizontal length of the sub-questions, and if the ratio exceeds a preset threshold, taking the result of the largest correction mark as the default value;

S34, if the sub-questions are laid out both vertically and horizontally, further judging, from the magnitude or ratio of the overall vertical height and horizontal length, whether the question as a whole leans toward a vertical or a horizontal layout, and performing the identification according to that dominant layout.
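A possible reading of the secondary verification S31 to S34 is sketched below. The helper names, the threshold of 0.5, and the rule "the layout is vertical when the sub-questions together span at least as much height as width" are all assumptions made for illustration; the source only states that a preset threshold and a comparison of height and length are used.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); hypothetical convention


def union(boxes: List[Box]) -> Box:
    """Smallest rectangle enclosing all sub-question rectangles."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def layout(sub_boxes: List[Box]) -> str:
    """S34 (assumed rule): pick the dominant direction from the overall extent."""
    left, top, right, bottom = union(sub_boxes)
    return "vertical" if (bottom - top) >= (right - left) else "horizontal"


def passes_secondary_check(mark: Box, sub_boxes: List[Box],
                           threshold: float = 0.5) -> bool:
    # S31: without sub-questions no secondary verification is needed.
    if not sub_boxes:
        return True
    left, top, right, bottom = union(sub_boxes)
    if layout(sub_boxes) == "vertical":
        # S32: compare the mark's height with the overall height.
        mark_extent, total_extent = mark[3] - mark[1], bottom - top
    else:
        # S33: compare the mark's width with the overall width.
        mark_extent, total_extent = mark[2] - mark[0], right - left
    if total_extent <= 0:
        return False  # degenerate template data; treat as not verified
    return mark_extent / total_extent > threshold
```

The direction test is deliberately simple; a real template might record the layout explicitly, in which case `layout` would not be needed.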

Specifically, the correction mark in step S5 is subjected to a secondary verification: the correction mark is considered a valid correction mark if and only if the proportion of the sub-question area that it occupies is greater than a preset threshold.
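The validity test for a mark inside a sub-question could look like the sketch below. It assumes that "proportion" means the part of the mark's bounding rectangle that overlaps the sub-question, divided by the sub-question area; the function names and the 5% threshold are hypothetical.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); hypothetical convention


def overlap_area(a: Box, b: Box) -> int:
    """Area of the intersection of two rectangles (0 if they do not overlap)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(0, width) * max(0, height)


def is_valid_sub_mark(mark: Box, sub_box: Box, threshold: float = 0.05) -> bool:
    """A mark counts for a sub-question only if it covers enough of its area."""
    sub_area = (sub_box[2] - sub_box[0]) * (sub_box[3] - sub_box[1])
    if sub_area <= 0:
        return False
    return overlap_area(mark, sub_box) / sub_area > threshold
```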

Specifically, the template data includes page data and question data.

Specifically, the page data includes the width and height of the page and/or a page number; the question data includes question coordinate data.

In a second aspect, another embodiment of the present invention discloses a machine-learning-based question identification device, which includes the following units:

a correction mark recognition unit, configured to acquire template data of a target to be recognized, segment the target to be recognized according to the template data, and extract correction marks from each segmented image in turn; and to recognize the result of each correction mark by using a preset recognition model;

a valid correction mark judging unit, configured to acquire the number of valid correction marks in the segmented image, wherein if the number of valid correction marks is one, the correction result of the segmented image is the recognition result of that correction mark;

a maximum correction mark judging unit, configured to, if the number of correction marks is not one, further acquire the correction mark that occupies the largest area among all the correction marks, take its recognition result as the default correction result of the question, and record it as a default value;

a sub-question judging unit, configured to acquire the area corresponding to each sub-question in the segmented area, and, if there are no sub-questions, judge whether a correction mark whose result is "wrong" exists within the range of the question, and if so, judge that the overall correction result of the question is wrong;

a correction result determining unit, configured to, if sub-questions exist, traverse the areas corresponding to all the sub-questions in turn and judge whether a correction mark exists within the range of each sub-question; if a correction mark exists, its result is used as the recognition result of that sub-question, and if not, the default value is used as the correction result.

Specifically, the maximum correction mark judging unit further includes:

a secondary verification unit, by which the default value determined in the maximum correction mark judging unit can be further verified against the largest correction mark, the secondary verification unit further comprising:

a second sub-question judging unit, configured to judge whether the question has sub-questions; if not, no secondary verification is needed; if so, the layout of the sub-questions is further identified;

a first sub-question direction processing unit, configured to, if the sub-questions are laid out vertically, further calculate the ratio of the vertical height of the largest correction mark to the overall vertical height of the sub-questions, and take the result of the largest correction mark as the default value if the ratio exceeds a preset threshold;

a second sub-question direction processing unit, configured to, if the sub-questions are laid out horizontally, further calculate the ratio of the horizontal length of the largest correction mark to the overall horizontal length of the question, and take the result of the largest correction mark as the default value if the ratio exceeds a preset threshold;

a third sub-question direction processing unit, configured to, if the sub-questions are laid out both vertically and horizontally, judge from the magnitude or ratio of the vertical height and horizontal length whether the question as a whole leans toward a vertical or a horizontal layout, and then perform the identification according to that dominant layout.

Specifically, the correction mark in the sub-question judging unit is subjected to a secondary verification: the correction mark is regarded as a valid correction mark if and only if the proportion of the sub-question area that it occupies is greater than a preset threshold.

Specifically, the template data includes page data and question data; the page data comprises the width and height of the page and/or a page number; the question data includes question coordinate data.

In a third aspect, another embodiment of the present invention discloses a non-volatile memory storing instructions which, when executed by a processor, implement the above machine-learning-based question identification method.

The invention uses machine learning to identify all question areas and correction marks in the test paper, determines whether each question contains sub-questions, and then determines the correction result of each sub-question and its answer according to whether sub-questions and correction marks are present, so that the correction results of the test paper can be identified more accurately.

Drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is provided by an embodiment of the present invention;

FIG. 2 is a schematic diagram provided by an embodiment of the present invention;

FIG. 3 is a schematic diagram provided by an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of the present invention.

Example one

Referring to FIG. 1, this embodiment discloses a method for identifying a question, which includes the following steps:

The target to be recognized is a test paper; specifically, it may be a paper test paper whose image is obtained through a scanner.

Specifically, the template data of the target to be recognized may be a template that is established in advance and corresponds to the target to be recognized.

The template data comprises at least page data and question data, and may further comprise one or more of the title, grade, class, subject, chapter, section and similar data of the target to be recognized.

The page data comprises at least the width and height of the page, and may further comprise the page number and other data.

The question data comprises at least the coordinate data of each question, and may further comprise one or more of score data, answer data, answer-explanation data, micro-course link data, knowledge point data, similar-question data and the like for the coordinate region corresponding to the question.

Preferably, when the coordinate data of a question is obtained, the structure type of the question is determined first. If the question is a main question only, only the coordinate data of the main question is obtained; if the question consists of a main question with corresponding sub-questions, the coordinate data of every level of the question is recorded.

The question coordinate data refers to the coordinates of the minimum bounding rectangle of the question. The storage format is not specifically limited: it may consist of the coordinates of the upper-left and lower-right corners of the rectangle, or of the coordinates of the upper-left corner together with the width and height of the rectangle. In this embodiment, the questions are cropped according to the acquired coordinate data, so that each question on the page is cut out separately.

Specifically, after the target to be recognized is obtained, it is cropped according to the coordinate information in its template data, so as to obtain the questions of the target to be recognized.
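One way to picture the template data and the cropping step is the sketch below. All class and field names (`Template`, `PageData`, `QuestionData`, `split_by_template`) are hypothetical; the source only requires that the template carry page width and height and a bounding rectangle per question. The image is modelled as a plain list of pixel rows so that the example needs no imaging library.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); hypothetical convention


@dataclass
class QuestionData:
    box: Box            # minimum bounding rectangle of the question
    score: float = 0.0  # optional extra data mentioned in the description


@dataclass
class PageData:
    width: int
    height: int
    page_number: int = 1


@dataclass
class Template:
    page: PageData
    questions: List[QuestionData] = field(default_factory=list)


def crop(image: List[List[int]], box: Box) -> List[List[int]]:
    """Cut the rectangle `box` out of an image stored as rows of pixels."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def split_by_template(image: List[List[int]], template: Template) -> List[List[List[int]]]:
    """Return one cropped sub-image per question listed in the template."""
    return [crop(image, q.box) for q in template.questions]
```

With a 6-by-4 test image whose pixel value encodes its position, cropping the rectangles `(0, 0, 3, 2)` and `(3, 2, 6, 4)` returns the upper-left and lower-right blocks of the page.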

In this embodiment, image segmentation cuts the target into units, each unit being the rectangular region of one main question.

In this embodiment, correction marks are extracted by restricting the HSV color range: each color corresponds to a region of HSV color space, so specifying the color of the correction pen allows the pixels of that color, and hence the correction marks, to be extracted.
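The HSV-based extraction can be illustrated with the standard-library `colorsys` module. This is only a sketch: the hue, saturation and value bounds below are hypothetical values chosen for a red pen, and a production system would more likely use an image library (for example OpenCV's `cv2.inRange`) on whole arrays rather than looping over pixels.

```python
import colorsys
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # RGB components in the range 0..255


def is_pen_color(pixel: Pixel,
                 hue_ranges: Sequence[Tuple[float, float]] = ((0.0, 0.05), (0.95, 1.0)),
                 min_sat: float = 0.4,
                 min_val: float = 0.3) -> bool:
    """True if the pixel falls inside the assumed HSV range of the correction pen.

    Red wraps around the hue circle, hence the two hue intervals.
    """
    r, g, b = (c / 255.0 for c in pixel)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)  # h, s, v are all in 0..1
    in_hue = any(lo <= h <= hi for lo, hi in hue_ranges)
    return in_hue and s >= min_sat and v >= min_val


def pen_mask(image: List[List[Pixel]]) -> List[List[bool]]:
    """Boolean mask marking the pixels that belong to the pen color."""
    return [[is_pen_color(p) for p in row] for row in image]
```

Saturation and value floors are included so that black print and white paper, whose hue is undefined or arbitrary, are not mistaken for pen strokes.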

Each extracted correction mark is then cropped according to its minimum bounding rectangle, and the coordinate position of each cropped rectangle in the image is recorded.

Preferably, when a correction mark is cropped according to its minimum bounding rectangle, a margin (redundancy value) can be set so that the cropped rectangle is slightly larger than the mark, for fault tolerance.
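The bounding rectangle with a tolerance margin might be computed as in the sketch below. It is a simplification: the function treats every set pixel of the mask as one mark, whereas separating several marks would require a connected-component step that is not shown. The name `bounding_box` and the `margin` parameter are hypothetical.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def bounding_box(mask: List[List[bool]], margin: int = 0) -> Optional[Box]:
    """Minimum bounding rectangle of the True pixels, enlarged by `margin`.

    The enlarged rectangle is clipped to the image so it never leaves the page.
    Returns None when the mask contains no mark pixels.
    """
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    height, width = len(mask), len(mask[0])
    cols = [x for x in range(width) if any(row[x] for row in mask)]
    left = max(0, cols[0] - margin)
    top = max(0, rows[0] - margin)
    right = min(width, cols[-1] + 1 + margin)
    bottom = min(height, rows[-1] + 1 + margin)
    return (left, top, right, bottom)
```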

Preferably, during extraction, the correction marks can be preliminarily filtered by size to remove noise. The filtering method is not specifically limited; for example, marks whose area is smaller than a preset minimum proportion of the original image can be discarded. If the area of a mark is less than 0.4% of the whole image area, the mark is considered too small and is treated as interference rather than a genuine correction mark.
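The size filter can be written in a few lines. In this sketch the area of a mark is taken to be the area of its bounding rectangle, which is an assumption; the 0.4% figure comes from the example in the text, and the function name is hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); hypothetical convention


def filter_small_marks(boxes: List[Box], image_width: int, image_height: int,
                       min_ratio: float = 0.004) -> List[Box]:
    """Drop candidate marks whose area is below `min_ratio` of the image area."""
    image_area = image_width * image_height
    kept = []
    for left, top, right, bottom in boxes:
        mark_area = (right - left) * (bottom - top)
        if mark_area >= min_ratio * image_area:
            kept.append((left, top, right, bottom))
    return kept
```

On a 1000-by-1000 image the cut-off is 4000 square pixels, so a 50-by-50 speck is discarded while a 100-by-100 mark is kept.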

Specifically, the recognition model of this embodiment is trained by machine learning. The machine learning approach includes, but is not limited to, neural network models.

The specific training algorithm of the recognition model is not limited; the training process of the recognition model further includes:

Determining the label types for training: four labels are set according to teachers' usual marking habits, namely a check mark (hook), which represents correct; a slash and a cross, which both represent wrong; and a circle, which represents half correct.

Collecting a large number of samples for each label type, training a classifier with a preset algorithm to generate the recognition model, and then using the recognition model to predict the correction result corresponding to each extracted correction mark.
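The four label types and the results they stand for can be captured in a small lookup table. The sketch below covers only the mapping from predicted label to correction result; the classifier itself is not shown, and the names `MarkLabel` and `to_result` are hypothetical.

```python
from enum import Enum


class MarkLabel(Enum):
    """The four mark shapes used as training labels."""
    CHECK = "check"    # hook / tick
    SLASH = "slash"    # oblique line
    CROSS = "cross"
    CIRCLE = "circle"


# Correction result represented by each label, as described in the text.
RESULT_BY_LABEL = {
    MarkLabel.CHECK: "correct",
    MarkLabel.SLASH: "wrong",
    MarkLabel.CROSS: "wrong",
    MarkLabel.CIRCLE: "half",
}


def to_result(label: MarkLabel) -> str:
    """Translate a predicted label into the correction result it represents."""
    return RESULT_BY_LABEL[label]
```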

Preferably, before recognition with the recognition model, a further check can be added to judge whether an extracted correction mark is a compliant correction mark. This check likewise relies on a model trained on a large number of samples; it provides a preliminary screening and improves recognition accuracy.

S3, if the number of correction marks is not one, further acquiring the correction mark that occupies the largest area among all the correction marks, taking the recognition result of that mark as the default correction result of the question, and recording it as a default value.

S4, obtaining the area corresponding to each sub-question in the segmented area; if there are no sub-questions, judging whether a correction mark whose result is "wrong" exists within the range of the question, and if so, judging that the overall correction result of the question is wrong.

S31, judging whether the question has sub-questions; if not, no secondary verification is needed; if so, further identifying how the sub-questions are laid out.

S32, if the sub-questions are laid out vertically, further calculating the ratio of the vertical height of the largest correction mark to the overall vertical height of the question, and if the ratio exceeds a preset threshold, taking the result of the largest correction mark as the default value (when a question has several sub-questions stacked vertically and the teacher intends to mark all of them correct with one mark, that mark is necessarily tall in the vertical direction).

S33, if the sub-questions are laid out horizontally, further calculating the ratio of the horizontal length of the largest correction mark to the overall horizontal length of the question, and if the ratio exceeds a preset threshold, taking the result of the largest correction mark as the default value (when a question has several sub-questions side by side and the teacher intends to mark all of them correct with one mark, that mark is necessarily long in the horizontal direction).

Specifically, if the largest correction mark does not satisfy the secondary verification condition, the default value is "wrong". In practice, students often leave questions unanswered and teachers leave those questions unmarked, so unanswered questions should default to wrong.
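The fallback rule just described amounts to a single conditional. In this sketch `largest_mark_result` is the recognized result of the largest mark (or `None` if the question carries no mark), and `passed_secondary_check` is the outcome of the secondary verification; both parameter names and the function name are hypothetical.

```python
from typing import Optional


def default_value(largest_mark_result: Optional[str],
                  passed_secondary_check: bool) -> str:
    """Default result for sub-questions that have no mark of their own.

    Falls back to "wrong" when there is no mark at all or when the largest
    mark fails the secondary verification.
    """
    if largest_mark_result is None or not passed_secondary_check:
        return "wrong"
    return largest_mark_result
```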

Specifically, the correction mark in step S5 may also be subjected to a secondary verification: the correction mark is considered a valid correction mark if and only if the proportion of the sub-question area that it occupies is greater than the preset threshold.

In this embodiment, machine learning is used to identify all question areas and correction marks in the test paper, to determine whether each question contains sub-questions, and then to determine the correction result of each sub-question and its answer according to whether sub-questions and correction marks are present, so that the correction results of the test paper can be identified more accurately.

Example two

Referring to FIG. 2, this embodiment discloses a machine-learning-based question identification device, which includes the following units:

Preferably, when the coordinate data of a question is obtained, the structure type of the question is determined first. If the question is a main question only, only the coordinate data of the main question is obtained. If the question consists of a main question with corresponding sub-questions, the coordinate data of every level of the question is recorded; for example, if a main question contains three sub-questions and the first sub-question itself contains two further sub-questions, then the overall coordinate region of the main question, the coordinate region of each sub-question, and the coordinate region of each second-level sub-question must all be recorded.
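The multi-level coordinate record in this example maps naturally onto a small tree. The sketch below is illustrative only: `QuestionNode` and `iter_regions` are hypothetical names, and the coordinates are invented to reproduce the case of one main question with three sub-questions, the first of which has two second-level sub-questions.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); hypothetical convention


@dataclass
class QuestionNode:
    label: str
    box: Box
    children: List["QuestionNode"] = field(default_factory=list)


def iter_regions(node: QuestionNode, depth: int = 0) -> Iterator[Tuple[int, str, Box]]:
    """Yield (depth, label, box) for the question and every nested sub-question."""
    yield depth, node.label, node.box
    for child in node.children:
        yield from iter_regions(child, depth + 1)


# Made-up coordinates for the example in the text.
main_question = QuestionNode("1", (0, 0, 600, 300), [
    QuestionNode("1.1", (0, 0, 600, 100), [
        QuestionNode("1.1.a", (0, 0, 300, 100)),
        QuestionNode("1.1.b", (300, 0, 600, 100)),
    ]),
    QuestionNode("1.2", (0, 100, 600, 200)),
    QuestionNode("1.3", (0, 200, 600, 300)),
])
```

Walking this tree visits six regions in total: the main question, its three sub-questions, and the two second-level sub-questions.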

The question coordinate data refers to the coordinates of the minimum bounding rectangle of the question. The storage format is not specifically limited: it may consist of the coordinates of the upper-left and lower-right corners of the rectangle, or of the coordinates of the upper-left corner together with the width and height of the rectangle. In this embodiment, the questions are cropped according to the obtained coordinate data, so that each question on the page is cut out separately.
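The two storage formats mentioned here carry the same information and convert into each other directly, as the following sketch shows. The function names are hypothetical.

```python
from typing import Tuple

Corners = Tuple[int, int, int, int]  # (left, top, right, bottom)
XYWH = Tuple[int, int, int, int]     # (left, top, width, height)


def corners_to_xywh(box: Corners) -> XYWH:
    """Upper-left and lower-right corners -> upper-left corner plus width and height."""
    left, top, right, bottom = box
    return (left, top, right - left, bottom - top)


def xywh_to_corners(box: XYWH) -> Corners:
    """Upper-left corner plus width and height -> upper-left and lower-right corners."""
    left, top, width, height = box
    return (left, top, left + width, top + height)
```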

Determining the label types for training: four labels are set according to teachers' usual marking habits, namely a check mark (hook), which represents correct; a slash and a cross, which both represent wrong; and a circle, which represents half correct.

a correction result determining unit, configured to, if sub-questions exist, traverse the areas corresponding to all the sub-questions in turn and judge whether a correction mark exists within the range of each sub-question; if a correction mark exists, its result is used as the recognition result of that sub-question, and if not, the default value is used as the correction result.

The maximum trace judging unit further includes:

Example three

Referring to FIG. 3, FIG. 3 is a schematic structural diagram of the machine learning-based topic identification apparatus of this embodiment. The machine learning-based topic identification apparatus 20 of this embodiment comprises a processor 21, a memory 22, and a computer program stored in the memory 22 and executable on the processor 21. When executing the computer program, the processor 21 implements the steps of the above method embodiments. Alternatively, when executing the computer program, the processor 21 implements the functions of the modules/units in the above device embodiments.

Illustratively, the computer program may be divided into one or more modules/units, which are stored in the memory 22 and executed by the processor 21 to accomplish the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program in the machine learning based topic identification apparatus 20. For example, the computer program may be divided into the modules in the second embodiment, and for the specific functions of the modules, reference is made to the working process of the apparatus in the foregoing embodiment, which is not described herein again.

The topic identification device 20 based on machine learning may include, but is not limited to, a processor 21 and a memory 22. Those skilled in the art will appreciate that the schematic diagram is merely an example of a machine learning based topic identification apparatus 20 and does not constitute a limitation of a machine learning based topic identification apparatus 20 and may include more or fewer components than shown, or combine certain components, or different components, for example, the machine learning based topic identification apparatus 20 may also include input output devices, network access devices, buses, and the like.

The processor 21 may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general purpose processor may be a microprocessor or any conventional processor. The processor 21 is the control center of the machine learning-based topic identification apparatus 20 and connects the various parts of the entire apparatus 20 through various interfaces and lines.

The memory 22 may be used to store the computer programs and/or modules, and the processor 21 implements the various functions of the machine learning-based topic identification apparatus 20 by running or executing the computer programs and/or modules stored in the memory 22 and calling data stored in the memory 22. The memory 22 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created through use of the device (such as audio data, a phonebook, etc.), and the like. In addition, the memory 22 may include high speed random access memory, and may also include non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other volatile solid state storage device.

If the integrated modules/units of the machine learning-based topic identification apparatus 20 are implemented in the form of software functional units and sold or used as stand-alone products, they can be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the methods in the above embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium and which, when executed by the processor 21, implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content covered by the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, legislation and patent practice provide that computer-readable media do not include electrical carrier signals and telecommunications signals.

It should be noted that the above-described device embodiments are merely illustrative, where the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the embodiment of the apparatus provided by the present invention, the connection relationship between the modules indicates that there is a communication connection between them, and may be specifically implemented as one or more communication buses or signal lines. One of ordinary skill in the art can understand and implement it without inventive effort.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and should not be taken as limiting the scope of the present invention, which is intended to cover any modifications, equivalents, improvements, etc. within the spirit and scope of the present invention.

Claims

1. A machine-learning-based sub-question identification method, comprising the following steps:

S5, if sub-questions exist, traversing the areas corresponding to all the sub-questions in turn and judging whether a correction mark exists within the range of each sub-question; if a correction mark exists, its result is used as the recognition result of that sub-question, and if not, the default value is used as the correction result.

2. The method according to claim 1, wherein, when the default value is determined in step S3, a secondary verification is further performed on the largest correction mark, the secondary verification further comprising:

S32, if the sub-questions are laid out vertically, further calculating the ratio of the vertical height of the largest correction mark to the overall vertical height of the question, and if the ratio exceeds a preset threshold, taking the result of the largest correction mark as the default value;

3. The method according to claim 2, wherein the correction mark in step S5 is subjected to a secondary verification, and the correction mark is considered a valid correction mark if and only if the proportion of the sub-question area that it occupies is greater than a preset threshold.

4. The method of claim 3, wherein the template data comprises page data and question data.

5. The method of claim 4, the page data comprising the width and height of the page and/or a page number; the question data includes question coordinate data.

6. A machine-learning-based question identification device, comprising the following units:

a maximum correction mark judging unit, configured to, if the number of correction marks is not one, further acquire the correction mark that occupies the largest area among all the correction marks, take its recognition result as the default correction result of the question, and record it as a default value;

7. The apparatus of claim 1, the maximum trace judging unit further comprising:

8. The apparatus according to claim 7, wherein the correction mark in the sub-question judging unit is subjected to a secondary verification, and the correction mark is considered a valid correction mark only when the proportion of the sub-question area that it occupies is greater than a preset threshold.

9. The apparatus of claim 3, the template data comprising page data and question data; the page data comprises the width and height of the page and/or a page number; the question data includes question coordinate data.

10. A non-volatile memory storing instructions which, when executed by a processor, are adapted to implement the machine learning based topic identification method of any one of claims 1-5.