CN113128486A - Construction method and device of handwritten mathematical formula sample library and terminal equipment - Google Patents

Construction method and device of handwritten mathematical formula sample library and terminal equipment

Info

Publication number
CN113128486A
CN113128486A (application CN202110350789.3A)
Authority
CN
China
Prior art keywords
image
formula
frame
recognition
question block
Prior art date
Legal status
Granted
Application number
CN202110350789.3A
Other languages
Chinese (zh)
Other versions
CN113128486B (en)
Inventor
郭蔚
魏亚恒
李冰欣
任庆云
周丙寅
田亮
韩娟娟
Current Assignee
Hebei Normal University
Original Assignee
Hebei Normal University
Priority date
Filing date
Publication date
Application filed by Hebei Normal University
Priority to CN202110350789.3A
Publication of CN113128486A
Application granted
Publication of CN113128486B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/22: Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267: Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/40: Extraction of image or video features
    • G06V 10/50: Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10: Character recognition
    • G06V 30/14: Image acquisition
    • G06V 30/148: Segmentation of character regions
    • G06V 30/153: Segmentation of character regions using recognition of characters or words

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Character Input (AREA)

Abstract

The invention relates to the technical field of image recognition and provides a construction method and device for a handwritten mathematical formula sample library, and a terminal device. The method comprises the following steps: cutting an image to be recognized into at least one question block image; performing position detection on the handwritten mathematical formulas through a first mathematical formula recognition program to determine an initial labeling frame for each handwritten mathematical formula; manually correcting erroneous initial labeling frames; extracting the formula images from each question block image according to the corrected labeling frames, and performing data recognition on each formula image through a second mathematical formula recognition program; performing data recognition on the target formula images using a preset deep learning model assisted by manual recognition, and outputting the second recognition data corresponding to the target formula images; and finally constructing the handwritten mathematical formula sample library from the first training samples and the second training samples. Through these steps, a sample library that supports accurate recognition can be built, thereby improving the accuracy of handwritten mathematical formula recognition.

Description

Construction method and device of handwritten mathematical formula sample library and terminal equipment
Technical Field
The invention belongs to the technical field of image recognition, and particularly relates to a method and a device for constructing a handwritten mathematical formula sample library and terminal equipment.
Background
In the field of education, students and teachers produce a large amount of handwritten mathematical content every day. This written content carries rich and valuable information, but most of it is recorded and stored on paper, so the information it contains is difficult to use efficiently.
Using machine learning to learn from handwritten mathematical formula samples makes it possible to process students' written work and to support functions such as intelligent reading, personalized recommendation and student profiling, which would greatly reduce teachers' workload. However, existing mathematical formula sample sets, such as the MNIST sample library and the CROHME online mathematical formula library, are limited in variety and do not reflect real handwritten mathematical formulas well.
Disclosure of Invention
In view of this, embodiments of the present invention provide a construction method and device for a handwritten mathematical formula sample library, and a terminal device, so as to address the problem in the prior art that the samples in handwritten mathematical formula libraries are too limited in variety.
The first aspect of the embodiments of the present invention provides a method for constructing a sample library of handwritten mathematical formulas, including:
acquiring a plurality of images to be recognized containing handwritten mathematical formulas, and cutting each image to be recognized into at least one question block image;
carrying out position detection on the handwritten mathematical formulas in the question block images through a first mathematical formula recognition program, and determining initial labeling frames of the handwritten mathematical formulas;
acquiring the correction operation of a user on the wrong initial labeling frame, correcting the corresponding initial labeling frame based on the correction operation, and extracting a formula image in each question block image according to the corrected labeling frame;
performing data recognition on each formula image through a second mathematical formula recognition program to determine first recognition data of a handwritten mathematical formula in each formula image;
performing data recognition on the target formula images by adopting a preset deep learning model assisted by manual recognition, and outputting second recognition data corresponding to the target formula images, wherein a target formula image is a formula image whose first recognition data is wrong;
constructing a handwritten mathematical formula sample library by adopting first training samples and second training samples, wherein a first training sample comprises a formula image whose first recognition data is correct together with the corresponding first recognition data, and a second training sample comprises a target formula image together with the corresponding second recognition data.
A second aspect of the embodiments of the present invention provides a device for constructing a sample library of handwritten mathematical formulas, including:
the question block image acquisition module is used for acquiring a plurality of images to be recognized containing handwritten mathematical formulas and cutting each image to be recognized into at least one question block image;
the initial labeling frame identification module is used for carrying out position detection on the handwritten mathematical formulas in the question block images through a first mathematical formula identification program and determining the initial labeling frames of the handwritten mathematical formulas;
the annotation frame correction module is used for acquiring correction operation of a user on the wrong initial annotation frame, correcting the corresponding initial annotation frame based on the correction operation, and extracting a formula image in each question block image according to the corrected annotation frame;
the first data identification module is used for carrying out data identification on each formula image through a second mathematical formula identification program and determining first identification data of a handwritten mathematical formula in each formula image;
the second data identification module is used for carrying out data identification on the target formula image by adopting a preset deep learning model and assisting in a manual identification mode and outputting second identification data corresponding to the target formula image; the target formula image is a formula image with first identification data errors;
the system comprises a sample base construction module, a handwriting mathematical formula sample base construction module and a handwriting mathematical formula recognition module, wherein the sample base construction module is used for constructing a handwritten mathematical formula sample base by adopting a first training sample and a second training sample, and the first training sample comprises a formula image with correct first recognition data and corresponding first recognition data; the second training sample includes a target formula image and corresponding second recognition data.
A third aspect of the embodiments of the present invention provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method for constructing the sample library of handwritten mathematical formulas as described above when executing the computer program.
A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium, which stores a computer program, and the computer program, when executed by a processor, implements the steps of the method for constructing the sample library of handwritten mathematical formulas as described above.
Compared with the prior art, the embodiments of the invention have the following beneficial effects: an image to be recognized is first cut into at least one question block image; position detection is performed on the handwritten mathematical formulas in the question block images through a first mathematical formula recognition program to determine the initial labeling frames of the handwritten mathematical formulas; erroneous initial labeling frames are then corrected manually; formula images are extracted from the question block images according to the corrected labeling frames, and data recognition is performed on the formula images through a second mathematical formula recognition program to determine the first recognition data of the handwritten mathematical formula in each formula image; data recognition is performed on the target formula images using a preset deep learning model assisted by manual recognition, and the corresponding second recognition data is output; finally, the handwritten mathematical formula sample library is constructed from the first and second training samples. Through these steps, labeling frame recognition, data recognition, and deep-learning recognition assisted by manual recognition together yield a sample library that supports accurate recognition, thereby improving the accuracy of handwritten mathematical formula recognition.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart of an implementation of a method for constructing a sample library of handwritten mathematical formulas according to an embodiment of the present invention;
FIG. 2 is a diagram of an image of a question block with a label box according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a data recognition display interface provided by an embodiment of the invention;
FIG. 4 is a schematic diagram of an apparatus for constructing a sample library of handwritten mathematical formulas according to an embodiment of the present invention;
fig. 5 is a schematic diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
In an embodiment, as shown in fig. 1, fig. 1 shows an implementation flow of a method for constructing a handwritten mathematical formula sample library provided in this embodiment, an execution subject of this embodiment is a terminal device, and a process thereof is detailed as follows:
s101: the method comprises the steps of obtaining a plurality of images to be recognized containing handwritten mathematical formulas, and cutting each image to be recognized into at least one question block image.
In order to create a sample library of handwritten mathematical formulas, this embodiment first needs to collect a large number of documents containing mathematical formulas; for example, math examination papers can be scanned to obtain PDF files, which the terminal device then acquires. The examination papers may cover primary school, middle school, university and higher levels, and may involve topics such as trigonometric functions, derivatives, probability and geometry, without being limited thereto.
After the PDF file of the math test paper is obtained, it is converted into PNG images.
Specifically, the getPixmap function of the fitz library is called on the collected document to convert the PDF file into PNG format.
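As a minimal sketch of this conversion step (file and directory names are illustrative; the patent only specifies that the fitz library's getPixmap is used):

```python
import os
import fitz  # PyMuPDF

def pdf_to_png(pdf_path, out_dir="pages", zoom=2.0):
    """Render every page of a scanned exam PDF to a PNG image."""
    os.makedirs(out_dir, exist_ok=True)
    doc = fitz.open(pdf_path)
    matrix = fitz.Matrix(zoom, zoom)  # upscale so handwriting strokes stay sharp
    for page_index, page in enumerate(doc):
        pix = page.get_pixmap(matrix=matrix)  # getPixmap() in older PyMuPDF releases
        pix.save(os.path.join(out_dir, f"page_{page_index:03d}.png"))
    doc.close()

pdf_to_png("math_exam.pdf")
```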
In an embodiment, the specific implementation process of S101 includes:
s201: and carrying out graying processing on each image to be identified respectively to obtain a corresponding grayscale image.
In this embodiment, the image to be recognized is treated as a standard rectangle: each pixel corresponds to a row and a column of a matrix, so the position of a pixel can be written as (i, j), denoting the pixel in row i and column j. Each pixel has three channels, the R, G and B components, each taking a value in [0, 255]. Graying means that, in the RGB model, a color with R = G = B is a gray color, and this common value is the gray value; mean graying assigns to the three components of each pixel the average of its R, G and B components, namely:
Gray(i, j) = [R(i, j) + G(i, j) + B(i, j)] / 3
S202: and carrying out binarization processing on each gray level image based on a maximum inter-class variance method to obtain a corresponding binarization image.
In this embodiment, after the image to be recognized has been grayed, the gray image is binarized. The Otsu algorithm (maximum inter-class variance method) selects a threshold automatically based on the statistical characteristics of the whole image: the gray-level histogram is divided into two classes by a threshold, and the threshold corresponding to the maximum inter-class variance is found, so that the background and the target in the image can be separated, giving a black-and-white result.
For example, for a gray image I(x, y), denote the segmentation threshold between foreground and background by T. Let ω0 be the proportion of foreground pixels in the whole image, with average gray μ0, and ω1 the proportion of background pixels, with average gray μ1. The overall average gray of the image is denoted μ and the inter-class variance is denoted g.
Assuming that the background of the image is dark and the image size is M × N, let N0 be the number of pixels whose gray value is less than the threshold T and N1 the number of pixels whose gray value is greater than T. Then:
ω0 = N0 / (M × N), ω1 = N1 / (M × N), N0 + N1 = M × N, ω0 + ω1 = 1,
μ = ω0 · μ0 + ω1 · μ1,
g = ω0 · (μ0 − μ)² + ω1 · (μ1 − μ)².
Substituting the expression for μ into the expression for g gives g = ω0 · ω1 · (μ0 − μ1)², and the threshold T that maximizes g is selected.
In this embodiment, the binarized image corresponding to each gray level image is obtained by the above method.
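A minimal sketch of S201 and S202, combining mean graying with Otsu thresholding; using OpenCV is an assumption, since the patent does not name an image-processing library:

```python
import cv2
import numpy as np

def binarize(image_path):
    """Mean-gray an image to be recognized and binarize it with Otsu's threshold."""
    img = cv2.imread(image_path)                       # BGR image, shape (H, W, 3)
    gray = img.mean(axis=2).astype(np.uint8)           # average of the three channels
    # THRESH_BINARY_INV: handwriting and frame lines become 255 on a 0 background
    t, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    return t, binary

threshold, binary_image = binarize("pages/page_000.png")
```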
S203: and respectively cutting each binary image into at least one question block image based on a projection method.
In an embodiment, the specific implementation flow of S203 includes:
calculating the horizontal numerical value of each binary image based on an improved horizontal projection formula, performing horizontal projection segmentation on each binary image according to the horizontal numerical value, calculating the vertical numerical value of each binary image based on an improved vertical projection formula, and performing vertical projection segmentation on each binary image according to the vertical numerical value to obtain a question block image;
wherein the improved horizontal projection formula is:
H(i) = [formula image]
and the improved vertical projection formula is:
V(j) = [formula image]
where H(i) denotes the horizontal projection value, V(j) denotes the vertical projection value, IMGθ(i, j) denotes the pixel value in row i and column j of the binarized image after it has been rotated by the horizontal rotation angle θ, and the corresponding term of the vertical formula denotes the pixel value in row i and column j of the binarized image after it has been rotated by the vertical rotation angle.
In this embodiment, each question block of a math test paper is surrounded by a frame, so the peaks obtained by horizontal or vertical projection should appear at the frame lines. Because the test paper may be tilted and the scanner may vibrate during scanning, the frame lines are not strictly vertical or horizontal, and may not even be parallel, which lowers the peaks and hampers segmentation. This embodiment therefore adds a jitter variable to the projection method so that the frame-line peaks are raised and the question blocks are easier to segment.
Considering that math test papers are generally typeset in two or three columns, this embodiment performs vertical projection segmentation first and then horizontal projection segmentation, thereby obtaining each question block.
Specifically, the horizontal rotation angle θ is the angle between a question block frame in the binarized image and the horizontal direction, the vertical rotation angle is the angle between the frame and the vertical direction, and (−k, k) is the range within which these angles vary.
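The improved projection formulas themselves appear only as images in the original filing, so the following sketch rests on an explicit assumption: the jitter variable is realized by taking, for each row or column, the largest projection value obtained over rotation angles in (−k, k), and question blocks are cut between consecutive frame-line peaks. Function names, thresholds and parameter values are illustrative.

```python
import cv2
import numpy as np

def rotate(binary, angle_deg):
    """Rotate a binarized page about its centre; the border is filled with 0."""
    h, w = binary.shape
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, 1.0)
    return cv2.warpAffine(binary, m, (w, h), flags=cv2.INTER_NEAREST)

def jittered_projection(binary, axis, k=2.0, steps=9):
    """Projection profile maximised over rotation angles in (-k, k) degrees,
    so that slightly tilted frame lines still produce sharp peaks."""
    best = None
    for angle in np.linspace(-k, k, steps):
        profile = (rotate(binary, angle) > 0).sum(axis=axis)
        best = profile if best is None else np.maximum(best, profile)
    return best

def frame_lines(profile, ratio=0.7, merge_gap=5):
    """Indices whose projection value is close to the maximum: candidate frame lines."""
    lines = []
    for i in np.flatnonzero(profile >= ratio * profile.max()):
        if lines and i - lines[-1] <= merge_gap:
            lines[-1] = int(i)          # same (thick) frame line, keep its last index
        else:
            lines.append(int(i))
    return lines

def cut_question_blocks(binary):
    """Vertical projection first (column layout), then horizontal projection."""
    blocks = []
    v_lines = frame_lines(jittered_projection(binary, axis=0))      # V(j): one value per column
    for x0, x1 in zip(v_lines[:-1], v_lines[1:]):
        column = binary[:, x0:x1]
        h_lines = frame_lines(jittered_projection(column, axis=1))  # H(i): one value per row
        for y0, y1 in zip(h_lines[:-1], h_lines[1:]):
            blocks.append(column[y0:y1, :])
    return blocks
```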
S102: and carrying out position detection on the handwritten mathematical formulas in the question block images through a first mathematical formula recognition program, and determining initial labeling frames of the handwritten mathematical formulas.
In this embodiment, the terminal device calls a first mathematical formula recognition program to detect the position of each handwritten mathematical formula in each question block image and to draw a frame around that position; the first mathematical formula recognition program may be any mathematical formula recognition program that is common in the prior art. Existing recognition programs, however, handle complex mathematical formulas such as multi-line formulas, radical signs or fractions poorly and generally only recognize approximate positions, so the frames produced by the first mathematical formula recognition program are taken as the initial labeling frames of the handwritten mathematical formulas in the question block images. As shown in FIG. 2, an initial labeling frame is a polygonal frame that encloses a formula position identified by the first mathematical formula recognition program.
In an embodiment, after S102, the method for constructing the sample library of handwritten mathematical formulas further includes:
and displaying the question block image marked with the initial marking frame.
In order to facilitate the user to modify the wrong initial labeling frame, the terminal device of this embodiment is further configured to display the question block image labeled with the initial labeling frame.
S103: and acquiring the correction operation of the user on the wrong initial labeling frame, correcting the corresponding initial labeling frame based on the correction operation, and extracting the formula image in each question block image according to the corrected labeling frame.
In one embodiment, the correction operations include horizontally cutting a labeling frame, vertically cutting a labeling frame, adding a labeling frame, deleting a labeling frame, merging labeling frames, and modifying a labeling frame.
In the present embodiment, the correction operation thus covers six kinds of correction: horizontal cutting, vertical cutting, addition, deletion, merging and modification.
Specifically, the horizontal cutting marking frame operation is specifically configured to:
horizontal cutting is used when a single line contains more than one formula, and it has three labels: a horizontal cutting labeling frame is added between the two parts, and its content is named Cut_math_math (cutting apart a left and a right mathematical formula), Cut_cn_math (cutting apart left Chinese text and a right mathematical formula) or Cut_math_cn (cutting apart a left mathematical formula and right Chinese text);
the vertical cut marking box operation is specifically used for:
vertical cutting is used when multiple lines of text are detected as a single detection frame, and the vertical cutting labeling frame likewise has three labels: a vertical cutting labeling frame is added between the two parts, and its content is named Cut_math_math_v (cutting apart an upper and a lower mathematical formula), Cut_cn_math_v (cutting apart upper Chinese text and a lower mathematical formula) or Cut_math_cn_v (cutting apart an upper mathematical formula and lower Chinese text).
The add label box operation is specifically for:
for the case where the question block image contains a mathematical formula that was not detected, a labeling frame needs to be added around the missed content and named. The missed content falls into two types: a missed mathematical formula is named add_math, and missed Chinese text is named add_chip, with the detected Chinese named "chip".
The merge marking box operation is specifically configured to:
for the case where one formula is detected as multiple parts, the several parts need to be merged into one large labeling frame that wraps the separate parts, and the large frame is named "unity".
The delete marking box operation is specifically for:
for places where a detection frame was produced although there is no text, a new labeling frame named "delete" is created to mark it for removal.
The modify labeling frame operation is specifically used for:
performing corner-point adjustment on a labeling frame that covers a mathematical formula only partially, and naming it "modify".
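For illustration only, the labeling-frame names above could be recorded with a simple annotation structure such as the following; this data format is an assumption and is not specified in the patent:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class CorrectionBox:
    """One manually drawn correction frame inside a question block image."""
    name: str                        # e.g. "Cut_math_math", "add_math", "unity", "delete", "modify"
    points: List[Tuple[int, int]]    # polygon corners (x, y) in question-block coordinates

corrections = [
    CorrectionBox("Cut_cn_math", [(120, 40), (124, 40), (124, 90), (120, 90)]),
    CorrectionBox("add_math", [(300, 200), (520, 200), (520, 260), (300, 260)]),
]
```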
S104: and performing data recognition on each formula image through a second mathematical formula recognition program to determine first recognition data of the handwritten mathematical formula in each formula image.
In an embodiment, the specific implementation flow of S103 includes:
s301: generating a manual marking frame based on the correction operation;
s302: determining a revised labeling frame of each question block image based on an intersection ratio algorithm, and an artificial labeling frame and an initial labeling frame corresponding to each question block image;
s303: cutting each question block image into a plurality of marking frame images according to the corrected marking frame; and using the image of the labeling frame containing the handwritten mathematical formula as a formula image.
The specific implementation process of S302 includes:
inputting the manual marking frame and the initial marking frame corresponding to each question block image into an intersection ratio formula to obtain an intersection ratio corresponding to each initial marking frame;
correcting the initial marking frame with the intersection ratio larger than a first preset threshold value based on the corresponding manual marking frame to obtain a corrected marking frame of each question block image;
the intersection ratio formula is as follows:
NIOU = [formula image]
wherein NIOU denotes the intersection ratio, S(A) denotes the area of the manual labeling frame, and S(B) denotes the area of the initial labeling frame.
In this embodiment, take the horizontal cutting labeling frame as an example. If a horizontal cutting labeling frame is manually added inside some initial labeling frame, the intersection ratio of the added cutting frame and the initial frame is computed; when the intersection ratio is greater than the first preset threshold, the initial labeling frame is divided into a left frame and a right frame with the cutting frame as the boundary. The types of the two resulting frames are determined by the name of the cutting frame: for example, if the cutting frame is named Cut_cn_math, the left frame after cutting contains Chinese text and the right frame contains a handwritten mathematical formula. The corresponding question block image is then cut according to the corrected labeling frames to obtain labeling frame images, and the labeling frame images named as handwritten mathematical formulas are screened out as formula images.
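A sketch of this frame-correction step follows. The exact NIOU formula appears only as an image in the original, so normalizing the intersection area by the smaller of the two frame areas is an assumption (it keeps the threshold meaningful when a thin cutting frame is matched against a large initial frame); box coordinates, names and the threshold value are illustrative.

```python
def area(box):
    """box = (x0, y0, x1, y1) with x0 < x1 and y0 < y1."""
    return max(0, box[2] - box[0]) * max(0, box[3] - box[1])

def niou(manual, initial):
    """Intersection ratio of a manual frame and an initial frame,
    normalised by the smaller area (an assumption, see the text above)."""
    x0, y0 = max(manual[0], initial[0]), max(manual[1], initial[1])
    x1, y1 = min(manual[2], initial[2]), min(manual[3], initial[3])
    if x0 >= x1 or y0 >= y1:
        return 0.0
    return area((x0, y0, x1, y1)) / min(area(manual), area(initial))

def split_horizontally(initial, cut, cut_name, threshold=0.5):
    """Split an initial frame into left/right parts along a horizontal Cut_* frame."""
    if niou(cut, initial) <= threshold:
        return [(initial, "unchanged")]
    x_cut = (cut[0] + cut[2]) // 2                  # cut along the middle of the cut frame
    left  = (initial[0], initial[1], x_cut, initial[3])
    right = (x_cut, initial[1], initial[2], initial[3])
    if cut_name == "Cut_cn_math":                   # left: Chinese text, right: formula
        return [(left, "chinese"), (right, "math")]
    if cut_name == "Cut_math_cn":                   # left: formula, right: Chinese text
        return [(left, "math"), (right, "chinese")]
    return [(left, "math"), (right, "math")]        # Cut_math_math
```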
In this embodiment, for the formula images, a second mathematical formula recognition program is used to perform data recognition, and first recognition data of the handwritten mathematical formula in each formula image is obtained, where the recognition data is a mathematical formula latex sentence. Since the current formula image has been modified and cut, the recognition accuracy is higher than that of the original question block image.
In an embodiment, the first identification data is a mathematical formula latex statement, and after S104, the method for constructing a sample library of handwritten mathematical formulas provided in this embodiment further includes:
and if the formula image selected by the user is acquired, displaying a mathematical formula latex sentence corresponding to the formula image selected by the user.
In this embodiment, as shown in FIG. 3, FIG. 3 shows the display interface for the data recognition in S104. The first frame from the top in FIG. 3 is the formula image, the second frame is the printed formula generated from the latex sentence, and the fifth frame is the latex sentence obtained by the second mathematical formula recognition program. Through the interface shown in FIG. 3 the user can compare the original handwritten formula in the formula image with the recognized formula, and can edit the fifth frame when the recognized formula is incorrect.
In this embodiment, the first mathematical formula identification program and the second mathematical formula identification program may be the same or different.
In this embodiment, in addition to the first recognition data, the second mathematical formula recognition program may also output a confidence for each piece of recognition data. If the confidence is greater than a preset confidence threshold, the corresponding recognition data is treated as correct; otherwise it is treated as incorrect, so that the formula images with correct recognition data and those with incorrect recognition data can be processed separately in the subsequent steps.
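A small sketch of this confidence-based split; the threshold value and the result format are assumptions:

```python
CONFIDENCE_THRESHOLD = 0.9   # preset confidence threshold; the value is an assumption

def split_by_confidence(results, threshold=CONFIDENCE_THRESHOLD):
    """results: iterable of (formula_image_path, latex, confidence) triples."""
    correct, suspect = [], []
    for path, latex, confidence in results:
        (correct if confidence > threshold else suspect).append((path, latex))
    return correct, suspect   # 'correct' feeds the first training samples, 'suspect' goes to S105
```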
In one embodiment, before S104, the method for constructing the sample library of handwritten mathematical formulas further includes:
constructing an initial deep learning model;
acquiring manual identification data of a user on a first formula image; the first formula image is a formula image with a first identification data error except the target formula image;
training an initial deep learning model by adopting a third training sample to obtain the preset deep learning model; the third training sample includes a first formula image and corresponding human recognition data.
In this embodiment, among the formula images whose first recognition data is wrong, a part of them (the first formula images) are corrected manually, i.e., their recognition data is corrected by hand. These first formula images and their manual recognition data are used to train the initial deep learning model. When the recognition accuracy of the deep learning model meets the requirement, the model is used to recognize the large number of remaining formula images, i.e., the target formula images.
S105: performing data recognition on the target formula image by adopting a preset deep learning model and assisting in a manual recognition mode, and outputting second recognition data corresponding to the target formula image; the target formula image is a formula image with first identification data errors.
In this embodiment, a target formula image is input into the deep learning model and a recognition result is output. If the result is correct, it is taken as the second recognition data of the target formula image; if the result is wrong, the target formula image is recognized manually, the manual recognition result together with the corresponding target formula image is fed back into the deep learning model, and the above process is repeated until the deep learning model's recognition result for the target formula image is correct.
With this approach, the recognition precision of the deep learning model keeps rising as the number of recognition rounds grows, so manual correction is needed less and less often. A large number of second training samples can thus be obtained through the deep learning model, which greatly enriches the handwritten mathematical formula sample library and improves the efficiency of constructing it.
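The iterative process of S105 can be sketched as follows; model, is_correct and ask_human are hypothetical callables standing in for the preset deep learning model, the correctness check and the manual recognition step:

```python
def recognize_with_human_in_the_loop(model, target_images, is_correct, ask_human):
    """S105 as a sketch.

    model.predict(image) -> latex string
    model.fine_tune(pairs) -> trains on a list of (image, latex) pairs
    is_correct(image, latex) -> bool (e.g. a reviewer's confirmation)
    ask_human(image) -> latex string entered by an annotator
    """
    second_samples = []
    pending = list(target_images)
    while pending:
        still_wrong, corrections = [], []
        for image in pending:
            latex = model.predict(image)
            if is_correct(image, latex):
                second_samples.append((image, latex))   # second recognition data
            else:
                latex = ask_human(image)                # manual recognition
                corrections.append((image, latex))
                still_wrong.append(image)
        if corrections:
            model.fine_tune(corrections)                # the model keeps improving
        pending = still_wrong
    return second_samples
```

In practice a cap on the number of rounds would be added; the loop simply mirrors the description above, where retraining continues until the model's output for each target formula image is correct.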
S106: constructing a handwritten mathematical formula sample library by adopting a first training sample and a second training sample, wherein the first training sample comprises a formula image with correct first identification data and corresponding first identification data; the second training sample includes a target formula image and corresponding second recognition data.
As can be seen from the above, this embodiment first cuts an image to be recognized into at least one question block image; performs position detection on the handwritten mathematical formulas in the question block images through a first mathematical formula recognition program to determine the initial labeling frames of the handwritten mathematical formulas; manually corrects erroneous initial labeling frames; extracts the formula images from the question block images according to the corrected labeling frames; performs data recognition on the formula images through a second mathematical formula recognition program to determine the first recognition data of the handwritten mathematical formula in each formula image; performs data recognition on the target formula images using a preset deep learning model assisted by manual recognition and outputs the corresponding second recognition data; and finally constructs the handwritten mathematical formula sample library from the first and second training samples. Through these steps, labeling frame recognition, data recognition, and deep-learning recognition assisted by manual recognition together yield a sample library that supports accurate recognition, thereby improving the accuracy of handwritten mathematical formula recognition.
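Assembling the two kinds of training samples into a library might look like the following sketch; the on-disk layout (an images directory plus a JSON manifest) is an assumption, since the patent only states that the first and second training samples together form the sample library:

```python
import json
import shutil
from pathlib import Path

def build_sample_library(first_samples, second_samples, out_dir="hmf_sample_library"):
    """Write (formula_image_path, latex) pairs into one library directory."""
    out = Path(out_dir)
    (out / "images").mkdir(parents=True, exist_ok=True)
    manifest = []
    for idx, (image_path, latex) in enumerate(list(first_samples) + list(second_samples)):
        target = out / "images" / f"{idx:06d}.png"
        shutil.copy(image_path, target)                 # keep the cropped formula image
        manifest.append({"image": target.name, "latex": latex})
    (out / "samples.json").write_text(json.dumps(manifest, ensure_ascii=False, indent=2))

# build_sample_library(first_samples, second_samples)
```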
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
In one embodiment, as shown in fig. 4, fig. 4 shows a structure of a construction apparatus 100 for a sample library of handwritten mathematical formulas provided in this embodiment, which includes:
the question block image obtaining module 110 is configured to obtain a plurality of to-be-identified images containing handwritten mathematical formulas, and cut each to-be-identified image into at least one question block image;
an initial labeling frame identification module 120, configured to perform position detection on the handwritten mathematical formulas in each question block image through a first mathematical formula identification program, and determine initial labeling frames of the handwritten mathematical formulas;
the annotation frame correction module 130 is configured to obtain a correction operation of the user on the erroneous initial annotation frame, correct the corresponding initial annotation frame based on the correction operation, and extract a formula image in each question block image according to the corrected annotation frame;
the first data identification module 140 is configured to perform data identification on each formula image through a second mathematical formula identification program, and determine first identification data of a handwritten mathematical formula in each formula image;
the second data identification module 150 is configured to perform data identification on the target formula image by using a preset deep learning model and assisting in manual identification, and output second identification data corresponding to the target formula image; the target formula image is a formula image with first identification data errors;
the sample library construction module 160 is configured to construct a handwritten mathematical formula sample library by using a first training sample and a second training sample, where the first training sample includes a formula image with correct first identification data and corresponding first identification data; the second training sample includes a target formula image and corresponding second recognition data.
In one embodiment, the question block image obtaining module 110 includes:
the graying unit is used for respectively carrying out graying processing on each image to be identified to obtain a corresponding grayscale image;
the binarization unit is used for carrying out binarization processing on each gray level image based on the maximum inter-class variance method to obtain a corresponding binarization image;
and the cutting unit is used for respectively cutting each binary image into at least one question block image based on a projection method.
In one embodiment, the cutting unit includes:
calculating the horizontal numerical value of each binary image based on an improved horizontal projection formula, performing horizontal projection segmentation on each binary image according to the horizontal numerical value, calculating the vertical numerical value of each binary image based on an improved vertical projection formula, and performing vertical projection segmentation on each binary image according to the vertical numerical value to obtain a question block image;
wherein the improved horizontal projection formula is:
H(i) = [formula image]
and the improved vertical projection formula is:
V(j) = [formula image]
where H(i) denotes the horizontal projection value, V(j) denotes the vertical projection value, IMGθ(i, j) denotes the pixel value in row i and column j of the binarized image after it has been rotated by the horizontal rotation angle θ, and the corresponding term of the vertical formula denotes the pixel value in row i and column j of the binarized image after it has been rotated by the vertical rotation angle.
In one embodiment, the correction operation includes horizontally cutting a labeling frame, vertically cutting a labeling frame, adding a labeling frame, deleting a labeling frame, merging labeling frames, and modifying a labeling frame. The labeling frame correction module 130 includes:
the correction operation acquisition unit is used for generating an artificial marking frame based on the correction operation;
the annotation frame correction unit is used for determining the corrected annotation frame of each question block image based on the cross-over comparison algorithm, the artificial annotation frame and the initial annotation frame corresponding to each question block image;
the formula image acquisition unit is used for cutting each question block image into a plurality of label frame images according to the corrected label frame; and using the image of the labeling frame containing the handwritten mathematical formula as a formula image.
The label frame correction unit includes:
inputting the manual marking frame and the initial marking frame corresponding to each question block image into an intersection ratio formula to obtain an intersection ratio corresponding to each initial marking frame;
correcting the initial marking frame with the intersection ratio larger than a first preset threshold value based on the corresponding manual marking frame to obtain a corrected marking frame of each question block image;
the intersection ratio formula is as follows:
NIOU = [formula image]
wherein NIOU denotes the intersection ratio, S(A) denotes the area of the manual labeling frame, and S(B) denotes the area of the initial labeling frame.
In one embodiment, the first identification data is a mathematical formula latex statement, and the constructing apparatus 100 for the sample library of handwritten mathematical formulas further includes:
and the second display module is used for displaying the mathematical formula latex sentence corresponding to the formula image selected by the user if the formula image selected by the user is obtained.
In one embodiment, the apparatus 100 for constructing a sample library of handwritten mathematical formulas further comprises:
the model building unit is used for building an initial deep learning model;
the manual identification data acquisition unit is used for acquiring manual identification data of a user on the first formula image; the first formula image is a formula image with a first identification data error except the target formula image;
the model obtaining unit is used for training an initial deep learning model by adopting a third training sample to obtain the preset deep learning model; the third training sample includes a first formula image and corresponding human recognition data.
As can be seen from the foregoing embodiment, through the above steps, labeling frame recognition, data recognition, and deep-learning recognition assisted by manual recognition together yield a sample library that supports accurate recognition, thereby improving the accuracy of handwritten mathematical formula recognition.
Fig. 5 is a schematic diagram of a terminal device according to an embodiment of the present invention. As shown in fig. 5, the terminal device 5 of this embodiment includes: a processor 50, a memory 51 and a computer program 52 stored in said memory 51 and executable on said processor 50. The processor 50, when executing the computer program 52, implements the steps in the above-described embodiments of the method for constructing a sample library of handwritten mathematical formulas, such as the steps 101 to 106 shown in fig. 1. Alternatively, the processor 50, when executing the computer program 52, implements the functions of each module/unit in the above-mentioned device embodiments, such as the functions of the modules 110 to 160 shown in fig. 4.
The computer program 52 may be divided into one or more modules/units, which are stored in the memory 51 and executed by the processor 50 to accomplish the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 52 in the terminal device 5. The terminal device 5 may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing devices. The terminal device may include, but is not limited to, a processor 50, a memory 51. Those skilled in the art will appreciate that fig. 5 is merely an example of a terminal device 5 and does not constitute a limitation of terminal device 5 and may include more or fewer components than shown, or some components may be combined, or different components, e.g., the terminal device may also include input-output devices, network access devices, buses, etc.
The Processor 50 may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, etc. A general purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
The memory 51 may be an internal storage unit of the terminal device 5, such as a hard disk or a memory of the terminal device 5. The memory 51 may also be an external storage device of the terminal device 5, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the terminal device 5. Further, the memory 51 may also include both an internal storage unit and an external storage device of the terminal device 5. The memory 51 is used for storing the computer program and other programs and data required by the terminal device. The memory 51 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow of the methods in the above embodiments may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments can be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately added or removed according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunication signals.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (10)

1. A method for constructing a sample library of handwritten mathematical formulas is characterized by comprising the following steps:
acquiring a plurality of images to be recognized containing handwritten mathematical formulas, and cutting each image to be recognized into at least one question block image;
carrying out position detection on the handwritten mathematical formulas in the question block images through a first mathematical formula recognition program, and determining initial labeling frames of the handwritten mathematical formulas;
acquiring the correction operation of a user on the wrong initial labeling frame, correcting the corresponding initial labeling frame based on the correction operation, and extracting a formula image in each question block image according to the corrected labeling frame;
performing data recognition on each formula image through a second mathematical formula recognition program to determine first recognition data of a handwritten mathematical formula in each formula image;
performing data recognition on the target formula image by adopting a preset deep learning model and assisting in a manual recognition mode, and outputting second recognition data corresponding to the target formula image; the target formula image is a formula image with first identification data errors;
constructing a handwritten mathematical formula sample library by adopting a first training sample and a second training sample, wherein the first training sample comprises a formula image with correct first identification data and corresponding first identification data; the second training sample includes a target formula image and corresponding second recognition data.
2. The method for constructing a sample library of handwritten mathematical formulas according to claim 1, wherein said cutting each image to be recognized into at least one question block image comprises:
carrying out graying processing on each image to be identified respectively to obtain corresponding grayscale images;
carrying out binarization processing on each gray level image based on a maximum inter-class variance method to obtain a corresponding binarization image;
and respectively cutting each binary image into at least one question block image based on a projection method.
3. The method for constructing a sample library of handwritten mathematical formulas as claimed in claim 2, wherein said separately segmenting each binarized image into at least one question block image based on projection comprises:
calculating the horizontal numerical value of each binary image based on an improved horizontal projection formula, performing horizontal projection segmentation on each binary image according to the horizontal numerical value, calculating the vertical numerical value of each binary image based on an improved vertical projection formula, and performing vertical projection segmentation on each binary image according to the vertical numerical value to obtain a question block image;
wherein the improved horizontal projection formula is:
H(i) = [formula image]
and the improved vertical projection formula is:
V(j) = [formula image]
where H(i) denotes the horizontal projection value, V(j) denotes the vertical projection value, IMGθ(i, j) denotes the pixel value in row i and column j of the binarized image after it has been rotated by the horizontal rotation angle θ, and the corresponding term of the vertical formula denotes the pixel value in row i and column j of the binarized image after it has been rotated by the vertical rotation angle.
4. The method according to claim 1, wherein the correction operation comprises horizontally cutting a labeling frame, vertically cutting a labeling frame, adding a labeling frame, deleting a labeling frame, merging labeling frames, and modifying a labeling frame;
the correcting the corresponding initial labeling frame based on the correction operation and extracting the formula image from each question block image according to the corrected labeling frame comprises:
generating a manual labeling frame based on the correction operation;
determining a corrected labeling frame of each question block image based on an intersection-over-union algorithm and the manual labeling frame and initial labeling frame corresponding to each question block image;
cutting each question block image into a plurality of labeling frame images according to the corrected labeling frame, and taking the labeling frame images containing handwritten mathematical formulas as the formula images.
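A rough sketch of two of the correction operations named in claim 4 is given below, representing a labeling frame by its corner coordinates. The `Box` layout, the merge rule (smallest enclosing frame), and the cut rule (split at a given row) are assumptions for illustration; the claims do not define the frame data structure.

```python
from dataclasses import dataclass

@dataclass
class Box:
    """A labeling frame as top-left and bottom-right corners (assumed layout)."""
    x1: int
    y1: int
    x2: int
    y2: int

def merge(a: Box, b: Box) -> Box:
    """'Merging labeling frames': the smallest frame covering both inputs."""
    return Box(min(a.x1, b.x1), min(a.y1, b.y1), max(a.x2, b.x2), max(a.y2, b.y2))

def horizontal_cut(box: Box, y: int) -> tuple[Box, Box]:
    """'Horizontally cutting a labeling frame': split the frame at row y."""
    return Box(box.x1, box.y1, box.x2, y), Box(box.x1, y, box.x2, box.y2)
```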
5. The method for constructing a handwritten mathematical formula sample library according to claim 4, wherein the determining the corrected labeling frame of each question block image based on the intersection-over-union algorithm and the manual labeling frame and initial labeling frame corresponding to each question block image comprises:
inputting the manual labeling frame and the initial labeling frame corresponding to each question block image into an intersection-over-union formula to obtain the intersection-over-union ratio corresponding to each initial labeling frame;
correcting, based on the corresponding manual labeling frame, each initial labeling frame whose intersection-over-union ratio is greater than a first preset threshold, so as to obtain the corrected labeling frame of each question block image;
wherein the intersection-over-union formula is:

NIOU = S(A ∩ B) / (S(A) + S(B) - S(A ∩ B))

wherein NIOU denotes the intersection-over-union ratio, S(A) denotes the area of the manual labeling frame, S(B) denotes the area of the initial labeling frame, and S(A ∩ B) denotes the area of their overlapping region.
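Reusing the `Box` sketch above, a plain intersection-over-union computation and the threshold-based replacement of claim 5 could look as follows. The 0.5 threshold is an assumed placeholder for the first preset threshold, and whether the patent's NIOU differs from the standard ratio is not recoverable from the published text.

```python
def iou(a: Box, b: Box) -> float:
    """S(A ∩ B) / (S(A) + S(B) - S(A ∩ B)) for a manual frame a and an initial frame b."""
    ix1, iy1 = max(a.x1, b.x1), max(a.y1, b.y1)
    ix2, iy2 = min(a.x2, b.x2), min(a.y2, b.y2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union else 0.0

def correct_frames(manual_frames, initial_frames, threshold=0.5):
    """Replace each initial frame whose best IoU with a manual frame exceeds the
    threshold by that manual frame; keep the other initial frames unchanged."""
    corrected = []
    for b in initial_frames:
        best = max(manual_frames, key=lambda a: iou(a, b), default=None)
        corrected.append(best if best is not None and iou(best, b) > threshold else b)
    return corrected
```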
6. The method for constructing a handwritten mathematical formula sample library according to claim 1, wherein the first recognition data is a LaTeX statement of a mathematical formula, and after the determining the first recognition data of the handwritten mathematical formula in each formula image, the method further comprises:
if a formula image selected by the user is acquired, displaying the mathematical formula LaTeX statement corresponding to the formula image selected by the user.
7. The method for constructing a handwritten mathematical formula sample library according to claim 1, wherein before the performing data recognition on the target formula image by adopting the preset deep learning model assisted by manual recognition, the method further comprises:
constructing an initial deep learning model;
acquiring manual recognition data provided by the user for first formula images, wherein the first formula images are formula images, other than the target formula image, whose first recognition data is erroneous;
training the initial deep learning model with a third training sample to obtain the preset deep learning model, wherein the third training sample comprises the first formula images and the corresponding manual recognition data.
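A minimal PyTorch-style training loop for the last step of claim 7 might look like the sketch below, assuming `initial_model(images, targets)` returns a training loss; the network architecture, loss, batch size, and learning rate are all placeholders, since the claims leave the deep learning model unspecified.

```python
import torch
from torch.utils.data import DataLoader

def train_preset_model(initial_model, third_training_samples, epochs=10, lr=1e-4):
    """Fine-tune the initial deep learning model on the third training sample
    (first formula images paired with their manual recognition data)."""
    loader = DataLoader(third_training_samples, batch_size=16, shuffle=True)
    optimizer = torch.optim.Adam(initial_model.parameters(), lr=lr)
    initial_model.train()
    for _ in range(epochs):
        for images, targets in loader:
            optimizer.zero_grad()
            loss = initial_model(images, targets)  # assumed to return the loss
            loss.backward()
            optimizer.step()
    return initial_model  # serves as the "preset deep learning model"
```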
8. An apparatus for constructing a handwritten mathematical formula sample library, comprising:
a question block image acquisition module, configured to acquire a plurality of images to be recognized containing handwritten mathematical formulas and cut each image to be recognized into at least one question block image;
an initial labeling frame recognition module, configured to perform position detection on the handwritten mathematical formulas in the question block images through a first mathematical formula recognition program and determine an initial labeling frame of each handwritten mathematical formula;
a labeling frame correction module, configured to acquire a correction operation performed by a user on an erroneous initial labeling frame, correct the corresponding initial labeling frame based on the correction operation, and extract a formula image from each question block image according to the corrected labeling frame;
a first data recognition module, configured to perform data recognition on each formula image through a second mathematical formula recognition program and determine first recognition data of the handwritten mathematical formula in each formula image;
a second data recognition module, configured to perform data recognition on a target formula image by adopting a preset deep learning model assisted by manual recognition and output second recognition data corresponding to the target formula image, wherein the target formula image is a formula image whose first recognition data is erroneous;
a sample library construction module, configured to construct the handwritten mathematical formula sample library by using a first training sample and a second training sample, wherein the first training sample comprises a formula image whose first recognition data is correct and the corresponding first recognition data, and the second training sample comprises the target formula image and the corresponding second recognition data.
9. A terminal device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
CN202110350789.3A 2021-03-31 2021-03-31 Construction method and device of handwritten mathematical formula sample library and terminal equipment Active CN113128486B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110350789.3A CN113128486B (en) 2021-03-31 2021-03-31 Construction method and device of handwritten mathematical formula sample library and terminal equipment

Publications (2)

Publication Number Publication Date
CN113128486A true CN113128486A (en) 2021-07-16
CN113128486B CN113128486B (en) 2022-12-27

Family

ID=76774394

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110350789.3A Active CN113128486B (en) 2021-03-31 2021-03-31 Construction method and device of handwritten mathematical formula sample library and terminal equipment

Country Status (1)

Country Link
CN (1) CN113128486B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108985297A (en) * 2018-06-04 2018-12-11 平安科技(深圳)有限公司 Handwriting model training, hand-written image recognition methods, device, equipment and medium
CN108921161A (en) * 2018-06-08 2018-11-30 Oppo广东移动通信有限公司 Model training method, device, electronic equipment and computer readable storage medium
CN110866499A (en) * 2019-11-15 2020-03-06 爱驰汽车有限公司 Handwritten text recognition method, system, device and medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHANG JIANCHENG et al., "A Survey of Online Handwritten Mathematical Expression Recognition Methods", Journal of Huaibei Coal Industry Teachers College (Natural Science Edition) *

Also Published As

Publication number Publication date
CN113128486B (en) 2022-12-27

Similar Documents

Publication Publication Date Title
CN110929573A (en) Examination question checking method based on image detection and related equipment
CN110956138B (en) Auxiliary learning method based on home education equipment and home education equipment
CN113486828B (en) Image processing method, device, equipment and storage medium
CN113705576B (en) Text recognition method and device, readable storage medium and equipment
WO2022161293A1 (en) Image processing method and apparatus, and electronic device and storage medium
CN111507330A (en) Exercise recognition method and device, electronic equipment and storage medium
CN107038438A (en) It is a kind of that method is read and appraised based on image recognition
CN112446259A (en) Image processing method, device, terminal and computer readable storage medium
CN110889406B (en) Method, system and terminal for acquiring information of problem data card
CN114399623B (en) Universal answer identification method, system, storage medium and computing device
CN110852131B (en) Examination card information acquisition method, system and terminal
CN113128486B (en) Construction method and device of handwritten mathematical formula sample library and terminal equipment
CN116704508A (en) Information processing method and device
CN111259888A (en) Image-based information comparison method and device and computer-readable storage medium
CN108509960A (en) A kind of text is towards detection method and device
CN114241486A (en) Method for improving accuracy rate of identifying student information of test paper
CN113378822A (en) System for marking handwritten answer area by using special mark frame in test paper
CN113936187A (en) Text image synthesis method and device, storage medium and electronic equipment
CN112597990A (en) Judging and reading method and system of handwriting formula, terminal device and storage medium
CN111627511A (en) Ophthalmologic report content identification method and device and readable storage medium
CN111476090A (en) Watermark identification method and device
CN110738522B (en) User portrait construction method and device, computer equipment and storage medium
CN113033400B (en) Method and device for identifying mathematical formulas, storage medium and electronic equipment
CN111046863B (en) Data processing method, device, equipment and computer readable storage medium
CN116303871A (en) Exercise book reading method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant