CN110781648A - Test paper automatic transcription system and method based on deep learning - Google Patents


Info

Publication number
CN110781648A
Authority
CN
China
Prior art keywords
test paper
detection
character
recognition
formula
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201910970234.1A
Other languages
Chinese (zh)
Inventor
严军峰
侯冲
陈家海
叶家鸣
吴波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Seven Days Education Technology Co Ltd
Original Assignee
Anhui Seven Days Education Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Seven Days Education Technology Co Ltd
Priority to CN201910970234.1A
Publication of CN110781648A
Legal status: Withdrawn

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 - Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/14 - Image acquisition
    • G06V30/148 - Segmentation of character regions
    • G06V30/153 - Segmentation of character regions using recognition of characters or words

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Character Input (AREA)

Abstract

The invention relates to the technical field of image object detection and recognition, and discloses an automatic test paper transcription system and method. The system is built on several deep learning techniques, and the method comprises the steps of automatic data generation, chart detection, text line detection, formula detection, OCR (optical character recognition), and post-processing. Aimed at photographed and scanned test paper images, including common mathematics, Chinese, and English papers, the system automatically transcribes test paper content from the image into a Word document. The automatic transcription method provided by the invention converts the content of test paper images acquired by scanning or photographing into an editable Word version, thereby turning the test paper from an image into an electronic document.

Description

Test paper automatic transcription system and method based on deep learning
Technical Field
The invention relates to the technical field of image target detection and identification, in particular to a test paper automatic transcription system and method based on deep learning.
Background
In recent years, deep learning based on convolutional neural networks has made breakthrough progress in computer vision and greatly advanced applied research in image processing. Techniques represented by object detection and OCR (optical character recognition) are now widely used in intelligent transportation, video surveillance, autonomous driving, AI education, and other fields. Deep learning is likewise increasingly applied in education, for example in face recognition, handwriting recognition, and photo-based question search.
At present, deep learning is applied to test paper document analysis only to a limited extent, mainly in scenarios such as test paper layout analysis, image-text separation, and handwriting identification. Automatically transcribing test paper content from a picture into an editable Word document has become a pressing need in teachers' question-setting work: automatic transcription of photographed test papers makes it easy to recombine and modify questions, greatly saving preparation time and improving efficiency. In current teaching practice, test paper transcription still requires manual intervention, a process that is time-consuming and inefficient. Against this background, the invention realizes automatic test paper transcription by means of deep learning, and provides a deep-learning-based automatic test paper transcription system and method.
The method integrates several existing deep learning techniques according to the layout characteristics of test papers to accomplish the full automatic transcription task. Test paper images acquired by photographing or scanning can be conveniently and automatically transcribed into Word format, providing support for subsequent question setting, similar-question recommendation, and knowledge-point assessment.
Disclosure of Invention
Technical problem to be solved
Aiming at the problems in current test paper transcription, the invention provides a deep-learning-based automatic test paper transcription method. By introducing deep learning into test paper transcription, the process is changed from manual to automatic, the time-consuming extraction of text and document information from test paper images is resolved, and transcription efficiency is greatly improved.
(II) technical scheme
In order to achieve the above purpose, the invention provides the following technical scheme: a deep-learning-based automatic test paper transcription method, characterized in that the system is based on deep learning techniques and comprises the steps of automatic data generation, chart detection, text line detection, formula detection, OCR recognition, and post-processing.
Preferably, the main features are as follows: a simulation program automatically generates the training data required by the text line detection, chart detection, and OCR algorithms; chart detection separates pictures from text regions in the test paper; text line and formula detection locates all text lines and formulas in the test paper; OCR recognition recognizes the detected text lines and formulas; and the post-processing stage rearranges the recognition and detection results and outputs a Word document following the original layout of the test paper.
Preferably, the automatic data generation is described as follows: training data highly similar to real samples is generated automatically by a program. Under program control, the simulation process randomly generates a specified number of test paper pictures of various layouts, together with label data covering charts, text lines, and formulas.
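As a rough illustration of the automatic data generation described above, the sketch below (an assumption about the generator's structure, not the patent's actual program) randomly places text-line bounding boxes on a blank page and records each as [xmin, ymin, xmax, ymax]; rendering the page image itself, e.g. with Pillow, is omitted.

```python
import random

def simulate_labels(num_samples, page_w=1280, page_h=1920, max_lines=20, seed=0):
    """Simulate label data for text line detection training samples.

    For each sample, random-length "text lines" are placed at random
    positions on a blank page, and each line's bounding box is recorded
    as [xmin, ymin, xmax, ymax], mirroring the txt label format
    described in the patent.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        boxes = []
        y = rng.randint(10, 60)
        for _ in range(rng.randint(1, max_lines)):
            line_h = rng.randint(28, 48)            # text line height in px
            line_w = rng.randint(200, page_w - 40)  # random-length corpus text
            xmin = rng.randint(10, page_w - line_w - 10)
            boxes.append([xmin, y, xmin + line_w, y + line_h])
            y += line_h + rng.randint(8, 30)        # line spacing
            if y + 48 > page_h:                     # no room for another line
                break
        samples.append(boxes)
    return samples

labels = simulate_labels(3)
```

Specifying the total number of samples, as the text describes, reduces here to the `num_samples` argument.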
Preferably, the chart detection is described as follows: a lightweight SSD-MobileNetV2 network serves as the chart detection network. The input image size is 224x224; MobileNetV2 acts as the backbone for feature extraction, and chart regions are detected from the multi-layer features of the SSD, yielding the position coordinates of each chart region.
Preferably, the text line detection is described as follows: the method uses AdvancedEAST, a text line detection algorithm for natural scenes, as the detection network. A test paper picture may be tilted during photographing or scanning, in which case an algorithm based on two-point (axis-aligned) localization would position text lines inaccurately; the method therefore uses four-point localization, and when the picture is tilted, a perspective transformation of the four corner coordinates straightens the text line region. Images of 1280x192 resolution are used as input to locate all text lines, giving their position coordinates in the test paper image, and the coordinates are mapped back to the original image for the perspective transformation. Because a formula can be taller than its text line, the text line coordinates are expanded outward by 5 pixels, ensuring that each line cropped from the original image contains the complete formula region.
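The four-point deskew can be sketched as follows. In practice OpenCV's cv2.getPerspectiveTransform and cv2.warpPerspective would typically do this work; the numpy homography solver below is an illustrative stand-in, and expand_box mirrors the 5-pixel outward expansion of text line coordinates.

```python
import numpy as np

def four_point_homography(src, dst):
    """Solve for the 3x3 perspective (homography) matrix H mapping four
    src corners onto four dst corners, with H[2,2] fixed to 1 --
    the same quantity cv2.getPerspectiveTransform computes."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def expand_box(xmin, ymin, xmax, ymax, pad=5, w=10**9, h=10**9):
    """Expand a text line box outward by `pad` pixels, clipped to the
    image, so tall formula glyphs are not cut off."""
    return (max(xmin - pad, 0), max(ymin - pad, 0),
            min(xmax + pad, w), min(ymax + pad, h))

# Deskew: map a tilted text-line quadrilateral onto an upright rectangle.
src = [(12, 8), (410, 22), (405, 60), (7, 46)]   # tilted text line corners
dst = [(0, 0), (400, 0), (400, 40), (0, 40)]     # axis-aligned target
H = four_point_homography(src, dst)
```

Pixels of the cropped region would then be resampled through H to obtain the straightened line image.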
Preferably, the formula detection is described as follows: the CTPN algorithm serves as the formula detection network, taking the text line detection output as input. It detects whether each text line contains a formula, distinguishing the position coordinates of the text region and the formula region within the current input line.
Preferably, the OCR recognition is described as follows: OCR recognition is split into character recognition and formula recognition. Text line detection and formula detection provide the position coordinates of the text and formula regions within each line; the corresponding regions are cropped from the original image, text regions are fed to a character recognition engine and formula regions to a formula recognition engine, and these two independent branches together recognize all characters and formulas in the test paper.
Preferably, the post-processing is described as follows: based on the results of chart detection, character recognition, and formula recognition, the recognition results are rearranged and a Word-version transcription is output following the original test paper layout.
Preferably, the method comprises the following specific steps:
Step one, simulating training data: the automatic transcription pipeline involves 5 detection and recognition models, each trained separately and each requiring a large amount of training data. Since manual annotation is time-consuming, the automatic data generation program in the method conveniently simulates the batch training data required by all 5 models.
Data generation is simulated in the order of chart detection, text line detection, formula detection, character recognition, and formula recognition. For the detection tasks the label information is the coordinates of a chart or text line; for OCR recognition the label information is the dictionary index of the string shown in the picture. The simulation program includes data augmentation functions such as blurring and noise injection; batch training data is generated simply by specifying the total number of samples and running the corresponding simulation program.
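A minimal sketch of the blurring and noise-injection augmentations mentioned above, assuming grayscale uint8 images; the real simulation program's augmentation details are not specified in the text.

```python
import numpy as np

def add_gaussian_noise(img, sigma=8.0, seed=0):
    """Additive Gaussian noise, clipped back to the valid pixel range."""
    rng = np.random.default_rng(seed)
    noisy = img.astype(float) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def box_blur(img, k=3):
    """Crude k x k mean blur, a simple stand-in for Gaussian blurring."""
    h, w = img.shape
    pad = k // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):          # accumulate the k*k shifted copies
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return (out / (k * k)).astype(np.uint8)
```

Either transform preserves the image shape, so it can be applied to a sample without touching its coordinate labels.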
Step two, data preprocessing: combining a common test paper board, wherein in the training data, the size of a chart detection input image is 224x224, the size of a character line detection input image is 1280x192, OCR (optical character recognition) identifies that the height of the input image is 32 pixels, normalizes the image to be between-1 and 1, the training process takes Batchsize as basic input, each Batchsize is randomly selected from an original image, and data enhancement operations such as Gaussian blurring, contrast, brightness, test paper cutting and the like are randomly added;
Step three, training the neural network: the chart detection, text line detection, formula detection, and OCR recognition models are trained in sequence in an end-to-end fashion, with network hyper-parameters set as follows:
(1) Learning rate: the initial learning rate is set to 0.01, reduced by 10% every 10 training rounds;
(2) Optimizer: Adam or SGD (chosen during implementation according to how training proceeds);
(3) Other: the batch size is set to 8, varying with available GPU memory; the total number of training rounds is 200;
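The stated schedule, a 10% reduction every 10 rounds from an initial 0.01, corresponds to a simple step decay:

```python
def step_decay_lr(epoch, base_lr=0.01, drop=0.9, every=10):
    """Learning rate reduced by 10% every 10 training rounds,
    matching the hyper-parameter settings above."""
    return base_lr * drop ** (epoch // every)
```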
Step four, post-processing: the models are converted to pb files and chained in sequence, the output of each model serving as input to the next; finally the recognition results are re-typeset and output in Word format.
(III) advantageous effects
The invention provides a test paper automatic transcription method based on deep learning, which has the following beneficial effects:
(1) Aiming at the current situation, the invention provides a deep-learning-based automatic test paper transcription method, mainly for common mathematics, Chinese, and English test papers. Automatic transcription here means automatically converting the content of test paper images acquired by scanning or photographing into a Word version, realizing the conversion of test paper content from picture to electronic document. By introducing deep learning into test paper transcription, the process changes from manual to automatic, the time-consuming extraction of text and document information from test paper pictures is resolved, and transcription efficiency is greatly improved.
(2) By introducing deep learning into test paper transcription, the invention automates the transcription of test paper document content. Tailored to the characteristics of test paper transcription, it integrates existing deep learning object detection and OCR recognition methods into a complete pipeline, enabling the automatic transcription of various complex test papers, including mathematics papers, and greatly improving transcription efficiency.
Drawings
FIG. 1 is a flow chart of the overall implementation of the present invention;
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example:
As shown in FIG. 1, the present invention provides the following technical solution: a deep-learning-based automatic test paper transcription method comprising the following parts:
Automatic data generation: this part describes the automatic generation of training data for the 5 deep learning models in the method. The method involves several network structures, each requiring its own training data, and manual annotation is time- and labor-intensive, so the data generation part produces corresponding training data for each network. Detection models: the chart detection, text line detection, and formula detection models use similar simulation programs. During simulation, a pure-white background picture and a random-length piece of corpus text are chosen at random; the corpus text is placed at a random position in the background picture, and its coordinates are recorded in a corresponding txt file. Since detection targets test papers, the corpus consists mainly of collected Word-format test papers, and the generation program automatically simulates question-style text with a certain probability. If the data is for the chart detection model, the program additionally inserts a chart at a random position in each simulated sample and records its position. With this simulation program, a large amount of varied training data can be produced in a short time, allowing models to be trained and deployed promptly.
Chart detection: this part describes how charts in the test paper are detected. The chart detection network is SSD-MobileNetV2: the test paper image is resized to 224x224 resolution, features are extracted by the lightweight MobileNetV2, and following the SSD layered prediction idea, chart regions are predicted independently at several scales, detecting chart targets of different sizes; a final global NMS yields the final chart region positions.
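The global NMS step can be sketched as a standard greedy suppression over the chart boxes predicted at all SSD scales (an illustrative implementation, not the patent's code):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring boxes
    and discard lower-scoring boxes that overlap them too much.
    boxes: (N, 4) array of [xmin, ymin, xmax, ymax]."""
    boxes = np.asarray(boxes, float)
    order = np.argsort(scores)[::-1]       # indices by descending score
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + areas - inter)
        order = rest[iou <= iou_thresh]    # drop heavily overlapping boxes
    return keep
```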
Text line detection: this part describes how all text lines are detected from the test paper image, using the AdvancedEAST algorithm. Test paper text line detection is complex: a text line may contain formulas or element symbols such as mathematical sets of various kinds, and a formula is often several pixels taller than its text line, so detecting strictly by the text line region may truncate the formula. Therefore, in the automatic data generation stage, the training data uses the height of the formula within the line as the ground-truth label, ensuring as far as possible that the detector covers the complete formula in each text line. AdvancedEAST is a natural-scene text line detection algorithm that localizes a target by four points, handling irregularly tilted targets. The input test paper image is 1280x192 (the input width and height must be integral multiples of 32). The backbone network is VGG; after several feature maps are extracted, each is passed through 1x1 and 3x3 convolutions in turn, upsampled to the scale of the preceding feature map, and concatenated with it. On the fused feature map the network applies three 1x1 convolutions, producing output maps of 1, 2, and 4 channels that represent, respectively, the pixel score, the probability that the pixel belongs to a text or non-text region, and the link predictions between the pixel and its four surrounding directions.
Formula detection: this part describes how to detect whether a formula exists in the detected text lines, so that the formula portion can be cropped and recognized separately. Formula detection uses CTPN, a network improved from Faster R-CNN that effectively detects horizontally arranged text in complex scenes. The method obtains feature maps from the first 5 conv stages of VGG16 and takes a 3x3xC window feature at each position of the Conv5 feature map; these features predict the category and position information of the k anchors at that position. The 3x3xC window features of each row are fed to an RNN, producing a W×256 output, which is passed to a 512-dimensional fully connected layer. The FC-layer features feed three classification or regression heads: 2k scores giving the class information (text or not) of the k anchors, 2k vertical coordinates, and k side-refinement offsets used to regress the anchor positions. Finally, the text proposals obtained from classification are merged into text lines by the text line construction algorithm, giving the formula detection output.
OCR recognition: this part describes how text in the character and formula regions is recognized. There are two recognition engines: a conventional OCR algorithm for characters and digits, and an algorithm dedicated to formulas. Both adopt a CNN + LSTM framework; formula recognition additionally uses an attention mechanism, while character recognition computes its loss with CTC. The character recognition model takes text line input of size 32x280, with text lengths between 5 and 15 characters. The image size coming out of formula detection is not fixed and a formula image is taller than a text line; the formula recognition result is output in LaTeX format and is rendered for display by the post-processing part.
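The CTC decoding used by the character recognition branch reduces per-frame predictions by collapsing repeated symbols and dropping blanks; a minimal greedy version:

```python
def ctc_greedy_decode(frame_ids, blank=0):
    """Collapse repeated symbols, then drop blanks -- the greedy
    decoding step paired with a CTC-trained recognition head.
    frame_ids: per-timestep argmax class indices."""
    out, prev = [], None
    for t in frame_ids:
        if t != prev and t != blank:
            out.append(t)
        prev = t
    return out
```

For example, the frame sequence blank, 3, 3, blank, 3, 5, 5, blank decodes to the labels 3, 3, 5: the blank between the two 3s is what keeps a genuine double character from collapsing.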
Post-processing: since the aim is fully automatic transcription, the system must not only transcribe the test paper image content into a Word document but also lay out the result according to the original image. Post-processing therefore takes the results of chart detection, text line detection, formula detection, and OCR recognition, sorts the detected targets by their coordinates, primarily by Y and then by X, and finally inserts each formula recognition result at the corresponding position in its text line according to the formula's coordinates, with global layout optimization.
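The sort primarily by Y and then by X can be sketched as follows, with a small tolerance grouping boxes into rows (the tolerance value is an assumption, not stated in the text):

```python
def reading_order(items, row_tol=10):
    """Arrange detected regions in reading order: sort by Y first, then
    by X within a row. items: (xmin, ymin, payload) tuples; boxes whose
    ymin values differ by less than row_tol count as the same row."""
    items = sorted(items, key=lambda it: (it[1], it[0]))
    rows, current = [], []
    for it in items:
        if current and it[1] - current[0][1] > row_tol:
            rows.append(sorted(current, key=lambda it: it[0]))
            current = []
        current.append(it)
    if current:
        rows.append(sorted(current, key=lambda it: it[0]))
    return [it for row in rows for it in row]
```

Two boxes on the same physical line rarely share an identical ymin after detection, which is why a plain (y, x) sort is not enough and the row tolerance is needed.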
A test paper automatic transcription method based on deep learning comprises the following steps:
Step one, producing training data: deep learning model training requires a large amount of data. The automatic transcription task involves 5 deep learning models, each needing a large volume of test paper data as training data, so the data simulation program is used: by specifying parameters such as total sample count, test paper data type, and simulation form, large amounts of training data are generated quickly. The program outputs a jpg picture and a corresponding txt file; the txt file stores the coordinates of test paper text lines, charts, and formula regions in [xmin, ymin, xmax, ymax] format, and when a jpg picture contains N text lines, formulas, or charts, the corresponding txt contains N lines of such coordinates;
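A reader for such a txt file might look like this; the exact delimiter is not specified in the text, so both comma- and space-separated rows are accepted here as an assumption:

```python
def parse_label_file(text):
    """Parse txt label content where each line holds one region as
    'xmin,ymin,xmax,ymax' (or space-separated), matching the
    [xmin, ymin, xmax, ymax] format produced by the simulation program."""
    boxes = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        xmin, ymin, xmax, ymax = (int(v) for v in line.replace(",", " ").split())
        boxes.append((xmin, ymin, xmax, ymax))
    return boxes
```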
Step two, data preprocessing: according to the 5 models' training data formats, text line detection data is resized uniformly to 1280x192 and chart detection data to 224x224, both as RGB pictures normalized to the range -1 to 1; the OCR character recognition input is a grayscale image 32 pixels high. Training proceeds in batches, each batch randomly selected from the original pictures, with data augmentation operations such as Gaussian blurring, contrast and brightness changes, and test paper cropping applied at random;
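The normalization of pixel values to the range -1 to 1 is a one-liner:

```python
import numpy as np

def normalize_image(img):
    """Scale uint8 pixels in [0, 255] to [-1.0, 1.0] for model input."""
    return img.astype(np.float32) / 127.5 - 1.0
```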
Step three, training the neural network: model training adopts an end-to-end network structure, with hyper-parameters set as follows:
(1) Learning rate: the detection models start at 0.01, reduced by 10% every 10 training rounds; the recognition models start at 0.0001, reduced by 10% every 10 epochs;
(2) Optimizer: Adam or SGD (chosen during implementation according to how training proceeds);
(3) Other: the batch size is set to 8, varying with available GPU memory; the total number of training rounds is 200;
Step four, post-processing: the models are converted to pb files and chained in sequence, the output of each model serving as input to the next; finally the recognition results are re-typeset and output in Word format.
Aiming at test paper images, the invention realizes automatic transcription of test paper document content through deep learning, automatically transcribing image-form test paper data into a Word version and laying a foundation for building large-scale test paper databases.
In summary, the invention provides a deep-learning-based automatic test paper transcription method, mainly for common mathematics, Chinese, and English test papers. Automatic transcription here refers to automatically converting the content of test paper images acquired by scanning or photographing into a Word version, realizing the conversion of test paper content from picture to electronic document. Tailored to the characteristics of the automatic transcription process, the method offers an integrated detection-and-recognition pipeline built on several deep-learning-based image processing techniques, yielding a new one-stop method for automatically transcribing test paper image content. It transcribes test papers with different question types well, especially when the questions contain charts and formulas. By introducing deep learning into test paper transcription, the process changes from manual to automatic, the time-consuming extraction of picture and document information is resolved, and transcription efficiency is greatly improved.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (8)

1. A deep-learning-based automatic test paper transcription system and method, characterized in that: the system is based on deep learning techniques and comprises the steps of automatic data generation, chart detection, text line detection, formula detection, OCR recognition, and post-processing.
2. The deep-learning-based automatic test paper transcription method of claim 1, characterized in that the main features are as follows: a simulation program automatically generates the training data required by the text line detection, chart detection, and OCR algorithms; chart detection separates pictures from text regions in the test paper; text line and formula detection locates all text lines and formulas in the test paper; OCR recognition recognizes the detected text lines and formulas; and the post-processing stage rearranges the recognition and detection results and outputs a Word document following the original layout of the test paper.
3. The method for automatic transcription of test paper based on deep learning of claim 1, wherein the automatic data generation is as follows: training data highly similar to real samples is generated programmatically; under program control, the simulator randomly produces a specified number of test paper images with varied layouts, together with label data for the charts, text lines, formulas, and other elements they contain.
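The label-generation side of claim 3 can be sketched as follows. This is a hypothetical, minimal simulator: the function name `generate_layout_labels`, the page dimensions, and the per-element heights are illustrative stand-ins, and the patent's actual program would also render a matching test paper image for each annotation set.

```python
import random

def generate_layout_labels(num_samples, page_w=1280, page_h=1920, seed=0):
    """Randomly generate layout annotations (chart / text line / formula
    bounding boxes) for a batch of simulated test paper pages.
    A sketch only; the real simulator also renders the images."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        boxes = []
        y = rng.randint(20, 60)
        while y < page_h - 80:
            kind = rng.choices(["text_line", "formula", "chart"],
                               weights=[6, 2, 1])[0]
            h = {"text_line": 40, "formula": 60, "chart": 300}[kind]
            x0 = rng.randint(10, 60)
            x1 = page_w - rng.randint(10, 60)
            boxes.append({"label": kind,
                          "bbox": (x0, y, x1, min(y + h, page_h))})
            y += h + rng.randint(10, 30)
        samples.append(boxes)
    return samples
```

Because the layout is drawn from a fixed random seed, the same annotation set can be regenerated deterministically, which matches the claim's point that only a total sample count needs to be specified.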
4. The method for automatic transcription of test paper based on deep learning of claim 1, wherein the chart detection is as follows: a lightweight SSD-MobileNetV2 network serves as the chart detection network; the input image size is 224×224, MobileNetV2 acts as the backbone to extract features, and chart regions are detected from the SSD's multi-layer feature maps, yielding the position coordinates of each chart region.
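Since the detector in claim 4 runs on a 224×224 resized input, its predicted boxes must be mapped back to the original test paper's coordinate frame before cropping. The sketch below shows only that coordinate arithmetic; the function name and the assumption of a square network input are illustrative, not from the patent.

```python
def rescale_box(box, net_size, orig_w, orig_h):
    """Map a box predicted on the square detector input (e.g. 224x224)
    back to the original image's pixel coordinates."""
    x0, y0, x1, y1 = box
    sx = orig_w / net_size  # horizontal scale factor
    sy = orig_h / net_size  # vertical scale factor
    return (round(x0 * sx), round(y0 * sy), round(x1 * sx), round(y1 * sy))
```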
5. The method for automatic transcription of test paper based on deep learning of claim 1, wherein the text line detection is as follows: the natural-scene text detection algorithm AdvancedEAST serves as the text line detection network. Because a test paper image may be tilted during shooting or scanning, a two-point (axis-aligned) localization algorithm would position text lines inaccurately; the method therefore localizes each text line with four points, and when the image is tilted, applies a perspective transformation to the four corner coordinates so that the text line region is rectified. Images at 1280×192 resolution are used as input to locate all text lines, yielding their position coordinates in the test paper image; the coordinates are then mapped back to the original image region for the perspective transformation. Because a formula can be taller than its surrounding text line, the text line coordinates are expanded outward by 5 pixels, ensuring that each line cropped from the original image contains any formula region in full.
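The 5-pixel outward expansion in claim 5 can be illustrated as below: each of the four quad corners is pushed away from the quad's centroid, clipped to the image bounds. This is a sketch under assumptions not stated in the patent (the corner ordering and the centroid-based direction test are mine); the perspective rectification itself would typically use something like OpenCV's `getPerspectiveTransform` and is omitted here.

```python
def expand_quad(quad, pad=5, img_w=1280, img_h=192):
    """Expand a 4-point text-line quad outward by `pad` pixels,
    clipped to the image bounds. `quad` is four (x, y) corners."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    cx = sum(xs) / 4.0  # quad centroid x
    cy = sum(ys) / 4.0  # quad centroid y
    out = []
    for x, y in quad:
        # push each corner away from the centroid
        nx = x + (pad if x >= cx else -pad)
        ny = y + (pad if y >= cy else -pad)
        out.append((max(0, min(img_w - 1, nx)),
                    max(0, min(img_h - 1, ny))))
    return out
```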
6. The method for automatic transcription of test paper based on deep learning of claim 1, wherein the formula detection is as follows: the CTPN algorithm serves as the formula detection network, taking the text line detection results as input; it detects whether each text line contains a formula and separates the position coordinates of the text regions and the formula regions within the current input line.
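Once the formula detector has produced x-spans inside a line, the line splits into alternating text and formula segments. The helper below is a hypothetical illustration of that separation step (the segment representation is mine, not the patent's), covering the whole line width with interleaved `(kind, x0, x1)` ranges.

```python
def split_line_regions(line_w, formula_spans):
    """Given the x-spans of detected formula boxes inside one text line,
    return interleaved ("text"/"formula", x0, x1) segments that cover
    the full line width in reading order."""
    segments = []
    cursor = 0
    for fx0, fx1 in sorted(formula_spans):
        if fx0 > cursor:
            segments.append(("text", cursor, fx0))
        segments.append(("formula", fx0, fx1))
        cursor = max(cursor, fx1)
    if cursor < line_w:
        segments.append(("text", cursor, line_w))
    return segments
```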
7. The method for automatic transcription of test paper based on deep learning of claim 1, wherein the OCR recognition is as follows: OCR recognition is divided into text recognition and formula recognition. Text line detection and formula detection yield the position coordinates of the text regions and formula regions within each line; the corresponding regions are cropped from the original image according to these coordinates, text regions are fed to a text recognition engine and formula regions to a formula recognition engine, and these two independent branches together recognize all text and formulas in the test paper.
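The two-branch dispatch in claim 7 amounts to routing each cropped segment to the matching recognizer and joining the outputs in reading order. In this sketch the engines are plain callables standing in for the real text and formula models; the function name is illustrative.

```python
def recognize_line(segments, text_engine, formula_engine):
    """Route each (kind, crop) segment of one line to the matching
    recognizer branch and join the results in reading order."""
    pieces = []
    for kind, crop in segments:
        engine = formula_engine if kind == "formula" else text_engine
        pieces.append(engine(crop))
    return "".join(pieces)
```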
8. The method for automatic transcription of test paper based on deep learning of claim 1, wherein the post-processing is as follows: based on the results of chart detection, text recognition, and formula recognition, the recognition output is re-typeset and a Word version of the transcription is produced following the original test paper layout. The specific steps are:
Step one, simulating training data: the automatic transcription process involves 5 detection and recognition models that must be trained separately, each requiring a large amount of training data; since manual annotation is time-consuming, the method's automatic data generation program conveniently simulates batch training data for all 5 models;
The data generation process simulates, in order, data for chart detection, text line detection, formula detection, text recognition, and formula recognition. The label for detection data is the coordinate information of a chart or text line; the label for OCR recognition data is the dictionary index of the string shown in the image. The simulation program includes data augmentation functions such as blurring and noise injection; batch training data is generated simply by specifying the total sample count and running the corresponding simulation program;
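The OCR label format described above, a string mapped to its indices in a dictionary, can be shown in a few lines. The charset contents here are illustrative; the patent does not specify the dictionary.

```python
def encode_label(text, charset):
    """Encode a ground-truth string as dictionary indices, the OCR label
    format described in step one. `charset` is the recognition dictionary."""
    index = {ch: i for i, ch in enumerate(charset)}
    return [index[ch] for ch in text]
```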
Step two, data preprocessing: based on common test paper layouts, the chart detection input is 224×224, the text line detection input is 1280×192, and the OCR recognition input is 32 pixels high; images are normalized to the range -1 to 1. Training proceeds in batches: each batch is randomly sampled from the original images, with data augmentation operations such as Gaussian blur, contrast and brightness jitter, and test paper cropping applied at random;
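The normalization in step two maps 8-bit pixel values into [-1, 1]. The patent does not give the exact formula; `p / 127.5 - 1` is the usual convention for MobileNet-style inputs and is assumed here.

```python
def normalize_pixels(pixels):
    """Scale 8-bit pixel values (0..255) into [-1, 1], the input range
    stated in step two. Assumes the common p / 127.5 - 1 convention."""
    return [p / 127.5 - 1.0 for p in pixels]
```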
Step three, training the neural networks: following the steps above, the chart detection, text line detection, formula detection, and OCR recognition models are trained in turn, each end to end, with the network hyperparameters set as follows:
(1) learning rate: the initial learning rate is set to 0.01, reduced by 10% every 10 training rounds;
(2) optimizer: Adam or SGD (chosen during implementation according to how training behaves);
(3) other: the batch size is set to 8, adjusted according to the available GPU memory; the total number of training rounds is 200;
Step four, post-processing: the trained models are converted to pb files and chained in sequence, each model's output feeding the next model's input; finally the recognition results are re-typeset and output in Word format.
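The steps above can be sketched end to end. `learning_rate` implements the schedule in item (1) exactly as stated (0.01 reduced 10% every 10 rounds); `transcribe` chains the stages of step four, with all five callables standing in for the exported .pb graphs, and a joined string standing in for the Word re-typesetting. Both function names are illustrative.

```python
def learning_rate(epoch, base_lr=0.01, decay=0.9, step=10):
    """Step three's schedule: start at 0.01, reduce 10% every 10 rounds."""
    return base_lr * decay ** (epoch // step)

def transcribe(image, chart_det, line_det, formula_det, text_ocr, formula_ocr):
    """Chain the frozen models as in step four: each stage's output is
    the next stage's input. The callables are stand-ins for the real
    chart detector, line detector, formula detector, and OCR engines."""
    charts = chart_det(image)          # chart regions, kept for layout
    lines = []
    for line in line_det(image):       # each detected text line crop
        parts = []
        for kind, crop in formula_det(line):  # split into text/formula
            parts.append(formula_ocr(crop) if kind == "formula"
                         else text_ocr(crop))
        lines.append("".join(parts))
    return {"charts": charts, "lines": lines}
```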
CN201910970234.1A 2019-10-12 2019-10-12 Test paper automatic transcription system and method based on deep learning Withdrawn CN110781648A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910970234.1A CN110781648A (en) 2019-10-12 2019-10-12 Test paper automatic transcription system and method based on deep learning


Publications (1)

Publication Number Publication Date
CN110781648A true CN110781648A (en) 2020-02-11

Family

ID=69385231

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910970234.1A Withdrawn CN110781648A (en) 2019-10-12 2019-10-12 Test paper automatic transcription system and method based on deep learning

Country Status (1)

Country Link
CN (1) CN110781648A (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111460782B (en) * 2020-04-01 2023-08-22 支付宝(杭州)信息技术有限公司 Information processing method, device and equipment
CN111460782A (en) * 2020-04-01 2020-07-28 支付宝(杭州)信息技术有限公司 Information processing method, device and equipment
CN111507250A (en) * 2020-04-16 2020-08-07 北京世纪好未来教育科技有限公司 Image recognition method, device and storage medium
CN111401342A (en) * 2020-06-04 2020-07-10 南京红松信息技术有限公司 Question type sample manufacturing method based on label automation
CN112347997A (en) * 2020-11-30 2021-02-09 广东国粒教育技术有限公司 Test question detection and identification method and device, electronic equipment and medium
CN112651315A (en) * 2020-12-17 2021-04-13 苏州超云生命智能产业研究院有限公司 Information extraction method and device of line graph, computer equipment and storage medium
CN112597878A (en) * 2020-12-21 2021-04-02 安徽七天教育科技有限公司 Sample making and identifying method for scanning test paper layout analysis
CN112766125A (en) * 2021-01-12 2021-05-07 徐州金林人工智能科技有限公司 Test question uploading tool based on machine learning algorithm and uploading method thereof
CN113537201A (en) * 2021-09-16 2021-10-22 江西风向标教育科技有限公司 Multi-dimensional hybrid OCR recognition method, device, equipment and storage medium
CN114120349A (en) * 2022-01-10 2022-03-01 深圳市菁优智慧教育股份有限公司 Test paper identification method and system based on deep learning
CN114120349B (en) * 2022-01-10 2022-05-03 深圳市菁优智慧教育股份有限公司 Test paper identification method and system based on deep learning
CN117894217A (en) * 2024-03-12 2024-04-16 中国科学技术大学 Mathematics topic guiding system for online learning system
CN117894217B (en) * 2024-03-12 2024-06-04 中国科学技术大学 Mathematics topic guiding system for online learning system

Similar Documents

Publication Publication Date Title
CN110781648A (en) Test paper automatic transcription system and method based on deep learning
CN112101357B (en) RPA robot intelligent element positioning and picking method and system
US7899249B2 (en) Media material analysis of continuing article portions
CN110765907A (en) System and method for extracting paper document information of test paper in video based on deep learning
RU2707147C1 (en) Neural network training by means of specialized loss functions
RU2760471C1 (en) Methods and systems for identifying fields in a document
CN111767927A (en) Lightweight license plate recognition method and system based on full convolution network
CN110175609B (en) Interface element detection method, device and equipment
CN110929746A (en) Electronic file title positioning, extracting and classifying method based on deep neural network
CN112434690A (en) Method, system and storage medium for automatically capturing and understanding elements of dynamically analyzing text image characteristic phenomena
CN112347997A (en) Test question detection and identification method and device, electronic equipment and medium
CN110705535A (en) Method for automatically detecting test paper layout character line
CN114550153A (en) Terminal block image detection and identification method
CN111832497B (en) Text detection post-processing method based on geometric features
CN110674721A (en) Method for automatically detecting test paper layout formula
CN111832551A (en) Text image processing method and device, electronic scanning equipment and storage medium
Ahmed et al. A generic method for automatic ground truth generation of camera-captured documents
Castillo et al. Object detection in digital documents based on machine learning algorithms
CN113111869B (en) Method and system for extracting text picture and description thereof
RU2703270C1 (en) Optical character recognition using specialized confidence functions, implemented on the basis of neural networks
CN114331932A (en) Target image generation method and device, computing equipment and computer storage medium
JP4807486B2 (en) Teaching material processing apparatus, teaching material processing method, and teaching material processing program
CN113409327A (en) Example segmentation improvement method based on ordering and semantic consistency constraint
CN113934922A (en) Intelligent recommendation method, device, equipment and computer storage medium
CN111488728A (en) Labeling method, device and storage medium for unstructured test question data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200211