CN117037187A - Test paper image extraction method and device and terminal equipment - Google Patents

Test paper image extraction method and device and terminal equipment

Info

Publication number
CN117037187A
Authority
CN
China
Prior art keywords
test paper
image
paper image
handwriting
extracting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311303753.5A
Other languages
Chinese (zh)
Inventor
陈之华
姚祖发
张候云
黄何列
李伟洪
岳玉美
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Science & Technology Infrastructure Center
Original Assignee
Guangdong Science & Technology Infrastructure Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Science & Technology Infrastructure Center filed Critical Guangdong Science & Technology Infrastructure Center
Priority to CN202311303753.5A
Publication of CN117037187A
Legal status (current): Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/22 - Character recognition characterised by the type of writing
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/19 - Recognition using electronic means
    • G06V30/191 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19147 - Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 - Document-oriented image-based pattern recognition
    • G06V30/41 - Analysis of document content
    • G06V30/418 - Document matching, e.g. of document images
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 - Document-oriented image-based pattern recognition
    • G06V30/42 - Document-oriented image-based pattern recognition based on the type of document
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application discloses a test paper image extraction method and device and a terminal device. A test paper image of a test paper to be extracted is collected; a handwriting characteristic region image in the test paper image is extracted according to a pre-constructed test paper template; identity data information in the test paper image is extracted and parsed; and the handwriting characteristic region image is matched with the identity data information to obtain a test paper sample of the test paper to be extracted and a student handwriting digital training sample. The application achieves high-precision, high-accuracy test paper image extraction, makes effective use of answered test papers, and efficiently extracts the handwritten digits in student identity information as training samples, providing a good training data basis for the student handwritten digit recognition model.

Description

Test paper image extraction method and device and terminal equipment
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method and apparatus for extracting a test paper image, and a terminal device.
Background
With the continuous development of science and technology, intelligent education concepts have gradually entered the teaching platform and brought new modes of testing students. Various online education platforms and online examination systems are increasingly applied in teaching activities, and marking is gradually shifting from manual review by teachers to automatic review by the system, so that ordinary class tests can be checked automatically and a great deal of teachers' time and energy is saved.
Education informatization is developing rapidly, and intelligent marking is widely used in schools: test paper images are uploaded to a marking system that assists teachers and improves teaching efficiency. However, test paper image extraction faces several problems, including matching students' answer content with their identity information and recognizing handwritten identity information, and the low quality of images acquired by ordinary scanners further reduces extraction accuracy. On the other hand, handwritten digit recognition is an important part of artificial intelligence and can be used to recognize examination numbers or student numbers, but handwritten digit sample data from students are difficult to collect, so recognition accuracy is hard to guarantee. If answered test papers could be used effectively, handwritten digits could be extracted efficiently as training samples, providing a good data basis for a student handwritten digit recognition model and ensuring its accuracy.
Disclosure of Invention
In order to solve the above technical problems, the application provides a test paper image extraction method and device and a terminal device, which achieve high-precision, high-accuracy test paper image extraction and efficiently extract the handwritten digits in student identity information as training samples.
The embodiment of the application provides a test paper image extraction method, which comprises the following steps:
collecting a test paper image of a test paper to be extracted;
extracting a handwriting characteristic region image in the test paper image according to a pre-constructed test paper template;
extracting and analyzing the identity data information in the test paper image;
and correspondingly matching the handwriting characteristic area image with the identity data information to obtain a test paper sample of the test paper to be extracted and a student handwriting digital training sample.
Preferably, the test paper template construction process includes:
determining locating point coordinate information of a standard test paper;
determining the marking data of different areas according to the positioning point coordinate information;
performing association mapping on the labeling data of different areas and the characteristic area attribute corresponding to the areas to obtain the test paper template;
the positioning point coordinate information comprises an upper left corner coordinate, a lower left corner coordinate, an upper right corner coordinate and a lower right corner coordinate; the annotation data comprises the reference origin coordinates of the region, the feature region width and the feature region height.
As a preferred solution, after collecting the test paper image of the test paper to be extracted, the method further includes:
performing rectangular correction on the test paper image through perspective transformation;
obtaining an inversion recognition result by recognizing a plurality of asymmetric auxiliary positioning points preset in the test paper to be extracted, and carrying out inversion correction on the test paper image according to the recognition result;
and carrying out characteristic alignment correction on the test paper image and a pre-stored standard test paper by adopting a preset correction algorithm.
Preferably, the extracting the handwriting feature area image in the test paper image according to the pre-constructed test paper template includes:
acquiring positioning point coordinate information on the test paper image;
calculating the region information of different characteristic regions according to the marking data in the test paper template and the coordinate information of the locating points on the test paper image;
and extracting the handwriting characteristic region images according to the region information of the different characteristic regions.
Further, the region information includes a width, a height, a reference origin abscissa and a reference origin ordinate;
wherein the width of the jth characteristic region in the test paper image is TPwidthj = Twidthj / Pw; the height of the jth characteristic region in the test paper image is TPhightj = Thightj / Ph; the upper left corner abscissa of the jth characteristic region in the test paper image is TPxj = TPx1 + Txj / Pw; the upper left corner ordinate of the jth characteristic region in the test paper image is TPyj = TPy1 + Tyj / Ph; the width ratio of the test paper template to the test paper image in the horizontal direction is Pw = (Dx2 - Dx1) / (TPx2 - TPx1); the height ratio of the test paper template to the test paper image in the vertical direction is Ph = (Dy4 - Dy1) / (TPy4 - TPy1); Dx1 and Dx2 are respectively the upper left corner abscissa and the upper right corner abscissa of the standard test paper; TPx1 and TPx2 are respectively the upper left corner abscissa and the upper right corner abscissa of the test paper image; Dy4 and Dy1 are respectively the lower right corner ordinate and the upper left corner ordinate of the standard test paper; TPy4 and TPy1 are respectively the lower right corner ordinate and the upper left corner ordinate of the test paper image; Twidthj and Thightj are the feature region width and the feature region height of the jth feature region in the labeling data; Txj and Tyj are respectively the abscissa and the ordinate of the reference origin of the jth feature region in the labeling data.
Preferably, after extracting the handwriting feature area image in the test paper image according to the pre-constructed test paper template, the method further comprises:
calculating the segmentation confidence of the handwriting feature area image through a preset target detection model;
if the calculated segmentation confidence is higher than a preset confidence threshold, judging that the regional feature mark is correct;
and if the calculated segmentation confidence is not higher than the confidence threshold, judging that the regional characteristic mark is incorrect.
As a preferable scheme, the extracting and analyzing the identity data information in the test paper image specifically includes:
extracting an identity area image in the test paper image;
identifying a two-dimensional code or a bar code in the identity area image, and acquiring the identity data information;
the identity data information comprises at least one of examination room number, seat number, admission ticket number and ticket number.
Further, the student handwriting digital training sample obtaining process specifically comprises the following steps:
extracting a handwriting characteristic area image comprising digital identity information of students;
and carrying out association binding on the identity data information and the handwriting characteristic area image comprising the student digital identity information to obtain a student handwriting digital training sample.
The embodiment of the application also provides a test paper image extraction device, which comprises:
the image acquisition module is used for acquiring a test paper image of the test paper to be extracted;
the region extraction module is used for extracting a handwriting characteristic region image in the test paper image according to a pre-constructed test paper template;
the identity extraction module is used for extracting and analyzing the identity data information in the test paper image;
and the sample construction module is used for correspondingly matching the handwriting characteristic area image with the identity data information to obtain a test paper sample of the test paper to be extracted and a student handwriting digital training sample.
Preferably, the process of constructing the test paper template by the area extraction module comprises the following steps:
determining locating point coordinate information of a standard test paper;
determining the marking data of different areas according to the positioning point coordinate information;
performing association mapping on the labeling data of different areas and the characteristic area attribute corresponding to the areas to obtain the test paper template;
the positioning point coordinate information comprises an upper left corner coordinate, a lower left corner coordinate, an upper right corner coordinate and a lower right corner coordinate; the annotation data comprises the reference origin coordinates of the region, the feature region width and the feature region height.
Preferably, the apparatus further comprises a correction module for:
after collecting a test paper image of a test paper to be extracted, performing rectangular correction on the test paper image through perspective transformation;
obtaining an inversion recognition result by recognizing a plurality of asymmetric auxiliary positioning points preset in the test paper to be extracted, and carrying out inversion correction on the test paper image according to the recognition result;
and carrying out characteristic alignment correction on the test paper image and a pre-stored standard test paper by adopting a preset correction algorithm.
Preferably, the region extraction module is configured to:
acquiring positioning point coordinate information on the test paper image;
calculating the region information of different characteristic regions according to the marking data in the test paper template and the coordinate information of the locating points on the test paper image;
and extracting the handwriting characteristic region images according to the region information of the different characteristic regions.
Further, the region information includes a width, a height, a reference origin abscissa and a reference origin ordinate;
wherein the width of the jth characteristic region in the test paper image is TPwidthj = Twidthj / Pw; the height of the jth characteristic region in the test paper image is TPhightj = Thightj / Ph; the upper left corner abscissa of the jth characteristic region in the test paper image is TPxj = TPx1 + Txj / Pw; the upper left corner ordinate of the jth characteristic region in the test paper image is TPyj = TPy1 + Tyj / Ph; the width ratio of the test paper template to the test paper image in the horizontal direction is Pw = (Dx2 - Dx1) / (TPx2 - TPx1); the height ratio of the test paper template to the test paper image in the vertical direction is Ph = (Dy4 - Dy1) / (TPy4 - TPy1); Dx1 and Dx2 are respectively the upper left corner abscissa and the upper right corner abscissa of the standard test paper; TPx1 and TPx2 are respectively the upper left corner abscissa and the upper right corner abscissa of the test paper image; Dy4 and Dy1 are respectively the lower right corner ordinate and the upper left corner ordinate of the standard test paper; TPy4 and TPy1 are respectively the lower right corner ordinate and the upper left corner ordinate of the test paper image; Twidthj and Thightj are the feature region width and the feature region height of the jth feature region in the labeling data; Txj and Tyj are respectively the abscissa and the ordinate of the reference origin of the jth feature region in the labeling data.
Preferably, the apparatus further comprises a verification module for:
after extracting a handwriting characteristic region image in the test paper image according to a pre-constructed test paper template, calculating the segmentation confidence coefficient of the handwriting characteristic region image through a preset target detection model;
if the calculated segmentation confidence is higher than a preset confidence threshold, judging that the regional feature mark is correct;
and if the calculated segmentation confidence is not higher than the confidence threshold, judging that the regional characteristic mark is incorrect.
Preferably, the identity extraction module is specifically configured to:
extracting an identity area image in the test paper image;
identifying a two-dimensional code or a bar code in the identity area image, and acquiring the identity data information;
the identity data information comprises at least one of examination room number, seat number, admission ticket number and ticket number.
Further, the student handwriting digital training sample obtaining process specifically comprises the following steps:
extracting a handwriting characteristic area image comprising digital identity information of students;
and carrying out association binding on the identity data information and the handwriting characteristic area image comprising the student digital identity information to obtain a student handwriting digital training sample.
The embodiment of the application also provides a computer readable storage medium, which comprises a stored computer program, wherein when the computer program runs, the equipment where the computer readable storage medium is located is controlled to execute the test paper image extraction method according to any one of the above embodiments.
The embodiment of the application also provides a terminal device, which comprises a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, wherein the processor realizes the test paper image extraction method according to any one of the above embodiments when executing the computer program.
The application provides a test paper image extraction method and device and a terminal device. A test paper image of a test paper to be extracted is collected; a handwriting characteristic region image in the test paper image is extracted according to a pre-constructed test paper template; identity data information in the test paper image is extracted and parsed; and the handwriting characteristic region image is matched with the identity data information to obtain a test paper sample of the test paper to be extracted and a student handwriting digital training sample. The application achieves high-precision, high-accuracy test paper image extraction, makes effective use of answered test papers, and efficiently extracts the handwritten digits in student identity information as training samples, providing a good training data basis for the student handwritten digit recognition model.
Drawings
FIG. 1 is a schematic flow chart of a test paper image extraction method provided by an embodiment of the application;
FIG. 2 is a schematic diagram of a test paper image according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a test paper image extracting device according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without creative effort fall within the scope of the application.
The embodiment of the application provides a test paper image extraction method, referring to fig. 1, which is a flow chart of the test paper image extraction method provided by the embodiment of the application, wherein the method comprises the steps of S1-S4:
s1, collecting a test paper image of a test paper to be extracted;
s2, extracting a handwriting characteristic region image in the test paper image according to a pre-constructed test paper template;
s3, extracting and analyzing the identity data information in the test paper image;
and S4, correspondingly matching the handwriting characteristic area image with the identity data information to obtain a test paper sample of the test paper to be extracted and a student handwriting digital training sample.
In a specific implementation, the test paper image of the test paper to be extracted is acquired first. In this process, the test papers of different students are generally placed in a preset photographing area in sequence, and a fixed sensor captures an image of each whole test paper.
During electronic marking, the handwritten answer areas of different questions need to be extracted separately so that they can be marked uniformly, which requires extracting images of the different regions.
In this process, the different areas of the test paper image are determined according to the standard test paper template, and the different handwriting characteristic area images are then extracted from these areas.
Extracting and analyzing the identity data information in the test paper image; and correspondingly matching the handwriting characteristic area image with the identity data information to obtain a test paper sample of the test paper to be extracted and a student handwriting digital training sample.
After the handwriting characteristic area images of each test paper to be extracted are obtained, each area must be matched with the identity of the corresponding student, so that score statistics can be produced per identity after subsequent marking.
By effectively extracting the handwriting characteristic area images and the student identity data information with an automated procedure, the application extracts test paper sample data accurately and efficiently. Answered test papers can also be used effectively: the handwritten digits in the student identity information are extracted as training samples, providing a good training data basis for the student handwritten digit recognition model.
In still another embodiment of the present application, the process of building a test paper template includes:
determining locating point coordinate information of a standard test paper;
determining the marking data of different areas according to the positioning point coordinate information;
performing association mapping on the labeling data of different areas and the characteristic area attribute corresponding to the areas to obtain the test paper template;
the positioning point coordinate information comprises an upper left corner coordinate, a lower left corner coordinate, an upper right corner coordinate and a lower right corner coordinate; the annotation data comprises the reference origin coordinates of the region, the feature region width and the feature region height.
In a specific implementation, a test paper template is constructed from the imported standard data and provides the basic data for subsequent test paper image processing. The construction comprises the following steps:
after the standard test paper is imported, the test paper editing function is opened; the editing function at least presents the original test paper image and provides a test paper labeling toolbar;
confirming positioning point coordinate information of a standard test paper, wherein the positioning point coordinate information at least comprises four corner coordinates of the test paper;
constructing a test paper template according to the coordinate information of the positioning points and the marks determined on the standard test paper;
the labeling data specifically refers to labeling points of the region to be extracted on the original test paper, and the labeling at least comprises: coordinate positions of each handwriting characteristic area corresponding to the current test paper.
Coordinate data (Dxi, Dyi) of the different positioning points are recorded, where i is an ordering label according to the position of the point: Dx1 is the upper left corner abscissa, Dx2 the upper right corner abscissa, Dx3 the lower right corner abscissa, and Dx4 the lower left corner abscissa; Dy1 is the upper left corner ordinate, Dy2 the upper right corner ordinate, Dy3 the lower right corner ordinate, and Dy4 the lower left corner ordinate.
Determining marking data of different area standards according to the positioning point coordinate information;
the labeling data comprises the reference origin coordinates of the region, the feature region width and the feature region height.
In a specific implementation, the upper left corner positioning point is taken as the reference origin for labeling, and each characteristic region is described by its upper left corner coordinates (Txj, Tyj), its feature region width Twidthj and its feature region height Thightj;
the labeling data of the different areas are then mapped to the characteristic area attribute of each area, where the attribute is the label assigned by the user during annotation, for example an identity information area or an answering area; labeling the standard test paper in this way yields the test paper template.
The template generated from this labeling is accurate, which makes it convenient to identify the images of the different areas and determine the test paper sample.
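For illustration, the following Python sketch shows one way the labeled data could be organized into a template structure; the function name, the field names (Tx, Ty, Twidth, Thight, attribute) and the dictionary layout are assumptions chosen for the example, not a data structure defined by the application.

def build_template(anchor_coords, labelled_regions):
    """Assemble a test paper template from the positioning points of the
    standard test paper and the regions labeled on it (a minimal sketch;
    field names are illustrative).

    anchor_coords: {'Dx1': ..., 'Dy1': ..., 'Dx2': ..., 'Dy2': ...,
                    'Dx3': ..., 'Dy3': ..., 'Dx4': ..., 'Dy4': ...}
    labelled_regions: list of dicts with keys Tx, Ty, Twidth, Thight
    (relative to the upper left positioning point) and an attribute label
    such as 'identity' or 'answer'.
    """
    return {
        "anchors": anchor_coords,
        "regions": [
            {
                "origin": (r["Tx"], r["Ty"]),        # reference origin of the region
                "size": (r["Twidth"], r["Thight"]),  # feature region width and height
                "attribute": r["attribute"],         # identity information area / answering area
            }
            for r in labelled_regions
        ],
    }

Keeping the anchors and the labeled regions together in one record is what later allows the region positions to be rescaled onto any captured test paper image.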
In yet another embodiment of the present application, after collecting the test paper image of the test paper to be extracted, the method further includes:
performing rectangular correction on the test paper image through perspective transformation;
obtaining an inversion recognition result by recognizing a plurality of asymmetric auxiliary positioning points preset in the test paper to be extracted, and carrying out inversion correction on the test paper image according to the recognition result;
and carrying out characteristic alignment correction on the test paper image and a pre-stored standard test paper by adopting a preset correction algorithm.
In a specific implementation, the acquired test paper image must be corrected so that all test paper images are unified, which improves the accuracy of the extracted test paper samples.
To avoid distortion caused by deviations in the sensor's shooting angle, rectangular correction is applied to the test paper image; this correction can be completed through perspective transformation.
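A minimal sketch of such a rectangular correction with OpenCV is given below; it assumes the four positioning points have already been detected and ordered, and the output size is a free choice, so it is only one possible realization of the perspective transformation mentioned above.

import cv2
import numpy as np

def rectify_sheet(image, corner_pts, out_w, out_h):
    """Warp the four detected positioning points of the sheet onto an
    upright rectangle (perspective correction).

    corner_pts: four (x, y) points ordered top-left, top-right,
    bottom-right, bottom-left.
    """
    src = np.float32(corner_pts)
    dst = np.float32([[0, 0], [out_w - 1, 0],
                      [out_w - 1, out_h - 1], [0, out_h - 1]])
    M = cv2.getPerspectiveTransform(src, dst)   # 3x3 projective transform
    return cv2.warpPerspective(image, M, (out_w, out_h))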
To prevent an upside-down sheet from corrupting the extracted regional images and affecting subsequent recognition or marking, inversion correction is applied after the test paper image is acquired.
Referring to fig. 2, which is a schematic diagram of a test paper image according to an embodiment of the present application, several asymmetric auxiliary positioning points are arranged on the test paper, and optical character recognition is performed on the test paper image so that these points can be recognized clearly and accurately. The sheet is recognized as upright only when the recognized auxiliary positioning points satisfy the preset upright positional relation; when the relation is not satisfied, the sheet is recognized as inverted. Inversion correction is then applied to the test paper image according to the transformation between the recognized positional relation and the upright positional relation.
As a parallel implementation, inversion correction can also be realized in other ways: optical character recognition is performed on a preset test paper title, whether the test paper is inverted is determined from the position of the title in the recognized image, and inversion correction is applied to the test paper image according to this result.
To avoid positional deviation of the test paper image caused by how the sheet is placed, image alignment correction is performed on the test paper: the target test paper is feature-aligned with the standard test paper stored with the template, and the correction algorithm can be homography estimation, MeshFlow or optical flow.
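Of the listed options, only the homography variant is sketched below, using ORB features and RANSAC in OpenCV; the feature detector, the number of matches kept and the RANSAC threshold are illustrative choices rather than parameters fixed by the application, and the MeshFlow and optical-flow variants are not shown.

import cv2
import numpy as np

def align_to_template(image, template):
    """Feature-alignment correction of the captured sheet against the stored
    standard test paper using an estimated homography."""
    g1 = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) if image.ndim == 3 else image
    g2 = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY) if template.ndim == 3 else template
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(g1, None)
    k2, d2 = orb.detectAndCompute(g2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(d1, d2), key=lambda m: m.distance)[:500]
    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)   # robust to outlier matches
    h, w = template.shape[:2]
    return cv2.warpPerspective(image, H, (w, h))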
In still another embodiment of the present application, the extracting the handwriting feature area image in the test paper image according to the pre-constructed test paper template includes:
acquiring positioning point coordinate information on the test paper image;
calculating the region information of different characteristic regions according to the marking data in the test paper template and the coordinate information of the locating points on the test paper image;
and extracting the handwriting characteristic region images according to the region information of the different characteristic regions.
In a specific implementation, each positioning point in the target image and its coordinate data (TPxm, TPym) are identified, where m is an ordering label according to the position of the point.
The region information, namely the position of each characteristic region in the test paper image as determined by the labeling data in the test paper template, is calculated from the labeling data in the test paper template and the positioning point data of the test paper image;
and extracting the handwriting characteristic region images according to the region information of the different characteristic regions.
Through the relation between the standard image and the test paper image locating points, the labeling data can be converted into the area information in the corresponding test paper image, and the handwriting characteristic area image is identified.
In yet another embodiment provided by the present application, the region information includes a width, a height, a reference origin abscissa, and a reference origin ordinate;
wherein the width of the jth characteristic region in the test paper image is TPwidthj = Twidthj / Pw; the height of the jth characteristic region in the test paper image is TPhightj = Thightj / Ph; the upper left corner abscissa of the jth characteristic region in the test paper image is TPxj = TPx1 + Txj / Pw; the upper left corner ordinate of the jth characteristic region in the test paper image is TPyj = TPy1 + Tyj / Ph; the width ratio of the test paper template to the test paper image in the horizontal direction is Pw = (Dx2 - Dx1) / (TPx2 - TPx1); the height ratio of the test paper template to the test paper image in the vertical direction is Ph = (Dy4 - Dy1) / (TPy4 - TPy1); Dx1 and Dx2 are respectively the upper left corner abscissa and the upper right corner abscissa of the standard test paper; TPx1 and TPx2 are respectively the upper left corner abscissa and the upper right corner abscissa of the test paper image; Dy4 and Dy1 are respectively the lower right corner ordinate and the upper left corner ordinate of the standard test paper; TPy4 and TPy1 are respectively the lower right corner ordinate and the upper left corner ordinate of the test paper image; Twidthj and Thightj are the feature region width and the feature region height of the jth feature region in the labeling data; Txj and Tyj are respectively the abscissa and the ordinate of the reference origin of the jth feature region in the labeling data.
In a specific implementation, because the test paper template and the acquired test paper image may differ in scale, or the sensor position may have been adjusted and introduced a size difference, the ratio between the test paper template and the test paper image must be calculated first when computing the region information.
When calculating the region information, the width ratio of the test paper template to the test paper image in the horizontal direction is calculated first:
Pw = (Dx2 - Dx1) / (TPx2 - TPx1)
The height ratio of the test paper template to the test paper image in the vertical direction is:
Ph = (Dy4 - Dy1) / (TPy4 - TPy1)
wherein Dx1 and Dx2 are respectively the upper left corner abscissa and the upper right corner abscissa of the standard test paper; TPx1 and TPx2 are respectively the upper left corner abscissa and the upper right corner abscissa of the test paper image; Dy4 and Dy1 are respectively the lower right corner ordinate and the upper left corner ordinate of the standard test paper; TPy4 and TPy1 are respectively the lower right corner ordinate and the upper left corner ordinate of the test paper image.
Then, according to the calculated ratios, the labeling data are converted into region information on the test paper image, namely the width, the height, the reference origin abscissa and the reference origin ordinate of each region;
wherein the width of the jth characteristic region in the test paper image is TPwidthj = Twidthj / Pw; the height of the jth characteristic region in the test paper image is TPhightj = Thightj / Ph; the upper left corner abscissa of the jth characteristic region in the test paper image is TPxj = TPx1 + Txj / Pw; the upper left corner ordinate of the jth characteristic region in the test paper image is TPyj = TPy1 + Tyj / Ph; Twidthj and Thightj are the feature region width and the feature region height of the jth feature region in the labeling data; Txj and Tyj are respectively the abscissa and the ordinate of the reference origin of the jth feature region in the labeling data.
The feature region position on the test paper image can be determined by converting the labeling data, and the corresponding handwriting feature region image in the test paper image can be accurately extracted.
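Read this way, converting the labeling data into image regions is a pure scaling step. The sketch below implements the formulas above; the input layout (tuples of anchor coordinates and a list of annotation dicts) is chosen only for illustration.

def extract_regions(image, tpl_anchors, img_anchors, annotations):
    """Crop the handwriting characteristic regions from a corrected test
    paper image using the template labeling data.

    tpl_anchors: (Dx1, Dy1, Dx2, Dy4) of the standard test paper.
    img_anchors: (TPx1, TPy1, TPx2, TPy4) of the test paper image.
    annotations: list of dicts with keys Tx, Ty, Twidth, Thight.
    """
    Dx1, Dy1, Dx2, Dy4 = tpl_anchors
    TPx1, TPy1, TPx2, TPy4 = img_anchors
    Pw = (Dx2 - Dx1) / (TPx2 - TPx1)    # template/image width ratio
    Ph = (Dy4 - Dy1) / (TPy4 - TPy1)    # template/image height ratio
    crops = []
    for a in annotations:
        w = int(a["Twidth"] / Pw)
        h = int(a["Thight"] / Ph)
        x = int(TPx1 + a["Tx"] / Pw)    # offset from the upper left positioning point
        y = int(TPy1 + a["Ty"] / Ph)
        crops.append(image[y:y + h, x:x + w])
    return crops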
In yet another embodiment of the present application, after extracting the handwriting feature area image in the test paper image according to the pre-constructed test paper template, the method further includes:
calculating the segmentation confidence of the handwriting feature area image through a preset target detection model;
if the calculated segmentation confidence is higher than a preset confidence threshold, judging that the regional feature mark is correct;
and if the calculated segmentation confidence is not higher than the confidence threshold, judging that the regional characteristic mark is incorrect.
When the embodiment is implemented, the verification is performed on different hand-written region feature images segmented according to region information, and the verification process is as follows:
identifying the segmentation confidence of the characteristic images of different handwriting areas through a target detection model;
if the calculated segmentation confidence is higher than a preset confidence threshold, judging that the regional feature mark is correct;
if the calculated segmentation confidence is not higher than the confidence threshold, judging that the regional feature labels are incorrect, and further manual verification is needed.
It should be noted that the target detection model may use the YOLO object detection algorithm or other algorithms.
And the accuracy of the segmentation area is improved by verifying the segmentation area through segmentation confidence.
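A minimal sketch of the confidence check using the Ultralytics YOLO package (one of the algorithms the text allows) follows; the weight file name, the confidence threshold of 0.8 and the rule of taking the best detection are assumptions, not values specified by the application.

from ultralytics import YOLO

model = YOLO("region_check.pt")   # hypothetical weights trained on region crops
CONF_THRESHOLD = 0.8              # assumed confidence threshold

def verify_region(crop):
    """Return True when the detector's best box confidence for the cropped
    region exceeds the preset threshold, i.e. the region label is correct."""
    result = model(crop, verbose=False)[0]
    if len(result.boxes) == 0:
        return False                  # nothing detected: flag for manual check
    return float(result.boxes.conf.max()) > CONF_THRESHOLD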
In still another embodiment of the present application, the extracting and analyzing the identity data information in the test paper image specifically includes:
extracting an identity area image in the test paper image;
identifying a two-dimensional code or a bar code in the identity area image, and acquiring the identity data information;
the identity data information comprises at least one of examination room number, seat number, admission ticket number and ticket number.
In a specific implementation, the test paper image is acquired and the identity area image, i.e. the handwriting characteristic area image at the designated identity position in the test paper image, is determined;
identifying a two-dimensional code or a bar code in the identity area image, and acquiring the identity data information;
the identity data information comprises at least one of examination room number, seat number, admission ticket number and ticket number.
The identity data information is then bound to the data of the matched handwriting characteristic area images, and the bound pair is used as a data item in the sample data set.
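Decoding the identity area can rely on standard libraries. The sketch below uses pyzbar (which reads both bar codes and QR codes) with OpenCV's QR detector as a fallback; this is one possible implementation rather than the one prescribed here, and the format of the returned payload depends on how the sheet encodes the identity data.

import cv2
from pyzbar import pyzbar

def read_identity(identity_crop):
    """Decode the two-dimensional code or bar code in the identity area image
    and return the embedded identity string, or None if nothing is found."""
    symbols = pyzbar.decode(identity_crop)
    if symbols:
        return symbols[0].data.decode("utf-8")
    # fall back to OpenCV's built-in QR detector
    data, _, _ = cv2.QRCodeDetector().detectAndDecode(identity_crop)
    return data or None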
In still another embodiment of the present application, the student handwriting digital training sample obtaining process specifically includes:
extracting a handwriting characteristic area image comprising digital identity information of students;
and carrying out association binding on the identity data information and the handwriting characteristic area image comprising the student digital identity information to obtain a student handwriting digital training sample.
When the embodiment is implemented, when a handwriting digital training sample of a student is acquired, a handwriting characteristic area image comprising digital identity information of the student needs to be extracted;
the student digital identity information is specifically information such as a test number, an examination room number, a seat number, an identity card number and the like which need to be input by students in a handwriting digital form, and a handwriting characteristic area image comprising the student digital identity information is extracted;
and carrying out association binding on the identity data information and the handwriting characteristic area image comprising the student digital identity information, so that a student handwriting digital training sample of the student can be confirmed, and a good training data basis is provided for the student handwriting digital recognition model.
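One way to materialize this binding is to store each digit crop next to a small record of the parsed identity data; the directory layout, the file naming scheme and the field names below are illustrative assumptions.

import json
from pathlib import Path

import cv2

def save_digit_sample(identity_info, digit_crop, out_dir="digit_samples"):
    """Bind a handwritten digit crop to its parsed identity data and store
    both as one training sample record."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # hypothetical naming scheme: ticket number plus seat number
    sample_id = f"{identity_info['ticket_number']}_{identity_info.get('seat_number', '0')}"
    cv2.imwrite(str(out / f"{sample_id}.png"), digit_crop)
    with open(out / f"{sample_id}.json", "w", encoding="utf-8") as f:
        json.dump(identity_info, f, ensure_ascii=False)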
In still another embodiment of the present application, referring to fig. 3, which is a schematic structural diagram of the test paper image extraction device provided in an embodiment of the present application, the device includes:
the image acquisition module is used for acquiring a test paper image of the test paper to be extracted;
the region extraction module is used for extracting a handwriting characteristic region image in the test paper image according to a pre-constructed test paper template;
the identity extraction module is used for extracting and analyzing the identity data information in the test paper image;
and the sample construction module is used for correspondingly matching the handwriting characteristic area image with the identity data information to obtain a test paper sample of the test paper to be extracted and a student handwriting digital training sample.
It should be noted that the test paper image extraction device provided in the embodiment of the present application can execute the test paper image extraction method described in any one of the foregoing embodiments; its specific functions are not repeated here.
Referring to fig. 4, a schematic structural diagram of a terminal device according to an embodiment of the present application is provided. The terminal device of this embodiment includes: a processor, a memory, and a computer program, such as a test paper image extraction program, stored in the memory and executable on the processor. The steps in the above embodiments of the method for extracting test paper images are implemented when the processor executes the computer program, for example, steps S1 to S4 shown in fig. 1. Alternatively, the processor may implement the functions of the modules in the above-described device embodiments when executing the computer program.
The computer program may be divided into one or more modules/units, which are stored in the memory and executed by the processor to accomplish the present application, for example. The one or more modules/units may be a series of computer program instruction segments capable of performing the specified functions, which instruction segments are used for describing the execution of the computer program in the terminal device. For example, the computer program may be divided into modules, and specific functions of each module are not described herein.
The terminal equipment can be computing equipment such as a desktop computer, a notebook computer, a palm computer, a cloud server and the like. The terminal device may include, but is not limited to, a processor, a memory. It will be appreciated by those skilled in the art that the schematic diagram is merely an example of a terminal device and does not constitute a limitation of the terminal device, and may include more or less components than illustrated, or may combine certain components, or different components, e.g., the terminal device may further include an input-output device, a network access device, a bus, etc.
The processor may be a central processing unit (Central Processing Unit, CPU), other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. The general purpose processor may be a microprocessor or the processor may be any conventional processor or the like, which is a control center of the terminal device, and which connects various parts of the entire terminal device using various interfaces and lines.
The memory may be used to store the computer program and/or module, and the processor implements various functions of the terminal device by running or executing the computer program and/or module stored in the memory and invoking data stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to the use of the terminal device (such as audio data, a phonebook, etc.). In addition, the memory may include high-speed random access memory, and may also include non-volatile memory such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card, at least one disk storage device, a flash memory device, or other volatile solid-state storage device.
The modules/units integrated in the terminal device may be stored in a computer readable storage medium if they are implemented in the form of software functional units and sold or used as stand-alone products. Based on such understanding, the present application may implement all or part of the flow of the method of the above embodiment, or may instruct related hardware through a computer program, where the computer program may be stored in a computer readable storage medium, and when the computer program is executed by a processor, the steps of each of the method embodiments described above may be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth.
It should be noted that the above-described apparatus embodiments are merely illustrative, and the units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the embodiment of the device provided by the application, the connection relation between the modules represents that the modules have communication connection, and can be specifically implemented as one or more communication buses or signal lines. Those of ordinary skill in the art will understand and implement the present application without undue burden.
While the foregoing is directed to the preferred embodiments of the present application, it will be appreciated by those skilled in the art that changes and modifications may be made without departing from the principles of the application, and such changes and modifications are also intended to fall within the scope of the application.

Claims (10)

1. A test paper image extraction method, characterized by comprising the following steps:
collecting a test paper image of a test paper to be extracted;
extracting a handwriting characteristic region image in the test paper image according to a pre-constructed test paper template;
extracting and analyzing the identity data information in the test paper image;
and correspondingly matching the handwriting characteristic area image with the identity data information to obtain a test paper sample of the test paper to be extracted and a student handwriting digital training sample.
2. The test paper image extraction method of claim 1, wherein the test paper template construction process comprises:
determining locating point coordinate information of a standard test paper;
determining the marking data of different areas according to the positioning point coordinate information;
performing association mapping on the labeling data of different areas and the characteristic area attribute corresponding to the areas to obtain the test paper template;
the positioning point coordinate information comprises an upper left corner coordinate, a lower left corner coordinate, an upper right corner coordinate and a lower right corner coordinate; the annotation data comprises the reference origin coordinates of the region, the feature region width and the feature region height.
3. The test paper image extraction method as claimed in claim 1, wherein after collecting the test paper image of the test paper to be extracted, the method further comprises:
performing rectangular correction on the test paper image through perspective transformation;
obtaining an inversion recognition result by recognizing a plurality of asymmetric auxiliary positioning points preset in the test paper to be extracted, and carrying out inversion correction on the test paper image according to the recognition result;
and carrying out characteristic alignment correction on the test paper image and a pre-stored standard test paper by adopting a preset correction algorithm.
4. The method for extracting a test paper image according to claim 1, wherein the step of extracting a handwritten feature area image in the test paper image according to a pre-constructed test paper template comprises:
acquiring positioning point coordinate information on the test paper image;
calculating the region information of different characteristic regions according to the marking data in the test paper template and the coordinate information of the locating points on the test paper image;
and extracting the handwriting characteristic region images according to the region information of the different characteristic regions.
5. The test paper image extraction method of claim 4, wherein the region information includes a width, a height, a reference origin abscissa, and a reference origin ordinate;
wherein the width of the jth characteristic region in the test paper image is TPwidthj = Twidthj / Pw; the height of the jth characteristic region in the test paper image is TPhightj = Thightj / Ph; the upper left corner abscissa of the jth characteristic region in the test paper image is TPxj = TPx1 + Txj / Pw; the upper left corner ordinate of the jth characteristic region in the test paper image is TPyj = TPy1 + Tyj / Ph; the width ratio of the test paper template to the test paper image in the horizontal direction is Pw = (Dx2 - Dx1) / (TPx2 - TPx1); the height ratio of the test paper template to the test paper image in the vertical direction is Ph = (Dy4 - Dy1) / (TPy4 - TPy1); Dx1 and Dx2 are respectively the upper left corner abscissa and the upper right corner abscissa of the standard test paper; TPx1 and TPx2 are respectively the upper left corner abscissa and the upper right corner abscissa of the test paper image; Dy4 and Dy1 are respectively the lower right corner ordinate and the upper left corner ordinate of the standard test paper; TPy4 and TPy1 are respectively the lower right corner ordinate and the upper left corner ordinate of the test paper image; Twidthj and Thightj are the feature region width and the feature region height of the jth feature region in the labeling data; Txj and Tyj are respectively the abscissa and the ordinate of the reference origin of the jth feature region in the labeling data.
6. The test paper image extraction method of claim 1, wherein after extracting the handwritten feature area image in the test paper image according to a pre-constructed test paper template, the method further comprises:
calculating the segmentation confidence of the handwriting feature area image through a preset target detection model;
if the calculated segmentation confidence is higher than a preset confidence threshold, judging that the regional feature mark is correct;
and if the calculated segmentation confidence is not higher than the confidence threshold, judging that the regional characteristic mark is incorrect.
7. The test paper image extraction method according to claim 1, wherein the extracting and analyzing the identity data information in the test paper image specifically comprises:
extracting an identity area image in the test paper image;
identifying a two-dimensional code or a bar code in the identity area image, and acquiring the identity data information;
the identity data information comprises at least one of examination room number, seat number, admission ticket number and ticket number.
8. The test paper image extraction method of claim 7, wherein the student handwriting digital training sample acquisition process specifically comprises:
extracting a handwriting characteristic area image comprising digital identity information of students;
and carrying out association binding on the identity data information and the handwriting characteristic area image comprising the student digital identity information to obtain a student handwriting digital training sample.
9. A test paper image extraction apparatus, the apparatus comprising:
the image acquisition module is used for acquiring a test paper image of the test paper to be extracted;
the region extraction module is used for extracting a handwriting characteristic region image in the test paper image according to a pre-constructed test paper template;
the identity extraction module is used for extracting and analyzing the identity data information in the test paper image;
and the sample construction module is used for correspondingly matching the handwriting characteristic area image with the identity data information to obtain a test paper sample of the test paper to be extracted and a student handwriting digital training sample.
10. A terminal device comprising a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the test paper image extraction method according to any one of claims 1 to 8 when the computer program is executed.
CN202311303753.5A 2023-10-10 2023-10-10 Test paper image extraction method and device and terminal equipment Pending CN117037187A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311303753.5A CN117037187A (en) 2023-10-10 2023-10-10 Test paper image extraction method and device and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311303753.5A CN117037187A (en) 2023-10-10 2023-10-10 Test paper image extraction method and device and terminal equipment

Publications (1)

Publication Number Publication Date
CN117037187A true CN117037187A (en) 2023-11-10

Family

ID=88623140

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311303753.5A Pending CN117037187A (en) 2023-10-10 2023-10-10 Test paper image extraction method and device and terminal equipment

Country Status (1)

Country Link
CN (1) CN117037187A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5597311A (en) * 1993-12-30 1997-01-28 Ricoh Company, Ltd. System for making examination papers and having an automatic marking function
CN105373978A (en) * 2015-08-12 2016-03-02 高学 Artificial test paper judgment processing device and artificial test paper judgment processing method based on OCR
CN109634961A (en) * 2018-12-05 2019-04-16 杭州大拿科技股份有限公司 A kind of paper sample generating method, device, electronic equipment and storage medium
CN110659584A (en) * 2019-08-30 2020-01-07 石家庄云松信息科技有限公司 Intelligent trace marking system based on image recognition
CN113095312A (en) * 2021-04-06 2021-07-09 中教云智数字科技有限公司 Method for acquiring handwritten answer area in test paper based on identification code
CN116824607A (en) * 2023-06-25 2023-09-29 湖南墨思博教育科技有限公司 High-precision answer sheet identification method



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination