CN110781648A - Test paper automatic transcription system and method based on deep learning - Google Patents


Info

Publication number
CN110781648A
Authority
CN
China
Prior art keywords
test paper
detection
character
recognition
formula
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201910970234.1A
Other languages
Chinese (zh)
Inventor
严军峰
侯冲
陈家海
叶家鸣
吴波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Seven Days Education Technology Co Ltd
Original Assignee
Anhui Seven Days Education Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Seven Days Education Technology Co Ltd
Priority to CN201910970234.1A
Publication of CN110781648A
Legal status: Withdrawn

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 - Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/14 - Image acquisition
    • G06V30/148 - Segmentation of character regions
    • G06V30/153 - Segmentation of character regions using recognition of characters or words

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Character Input (AREA)

Abstract

The invention relates to the technical field of image object detection and recognition, and discloses an automatic test paper transcription system and method. The system is built on several deep learning techniques, and the method comprises the steps of automatic data generation, chart detection, text line detection, formula detection, OCR (optical character recognition), and post-processing. Aimed at photographed and scanned test paper images, including common mathematics, Chinese, and English papers, the system automatically transcribes test paper content from the image into a Word document. The automatic transcription method provided by the invention converts the content of test paper images acquired by scanning or photographing into an editable Word version, thereby turning the test paper from an image into an electronic document.

Description

Test paper automatic transcription system and method based on deep learning
Technical Field
The invention relates to the technical field of image target detection and identification, in particular to a test paper automatic transcription system and method based on deep learning.
Background
In recent years, deep learning based on convolutional neural networks has made breakthrough progress in computer vision and greatly advanced applied research in image processing. Techniques represented by object detection and OCR (optical character recognition) are now widely used in intelligent transportation, video surveillance, autonomous driving, AI education, and other fields. Deep learning is likewise increasingly applied in education, for example in face recognition, handwriting recognition, and photo-based question search.
At present, deep learning is applied to test paper document analysis only to a limited extent, mainly in scenarios such as test paper layout analysis, image-text separation, and handwriting identification. Automatically transcribing test paper content from a picture into an editable Word document has become a pressing need in teachers' question-setting work: automatic transcription of photographed test papers makes it easy to recombine and modify questions, greatly saving preparation time and improving efficiency. In current teaching practice, test paper transcription still requires manual intervention, a process that is time-consuming and inefficient. Against this background, the invention realizes automatic test paper transcription by means of deep learning, and provides a deep-learning-based automatic test paper transcription system and method.
The method integrates several existing deep learning techniques according to the layout characteristics of test papers to accomplish the full automatic transcription task. Test paper images acquired by photographing or scanning can be conveniently and automatically transcribed into Word format, providing support for subsequent question setting, similar-question recommendation, and knowledge-point assessment.
Disclosure of Invention
Technical problem to be solved
Aiming at the problems in current test paper transcription, the invention provides a deep-learning-based automatic test paper transcription method. By introducing deep learning into test paper transcription, the process is changed from manual to automatic, the time-consuming extraction of text and document information from test paper images is resolved, and transcription efficiency is greatly improved.
(II) technical scheme
In order to achieve the above purpose, the invention provides the following technical scheme: a deep-learning-based automatic test paper transcription method, characterized in that the system is based on deep learning techniques and comprises the steps of automatic data generation, chart detection, text line detection, formula detection, OCR recognition, and post-processing.
Preferably, the main features are as follows: a simulation program automatically generates the training data required by the text line detection, chart detection, and OCR algorithms; chart detection separates pictures from text regions in the test paper; text line and formula detection locates all text lines and formulas in the test paper; OCR recognition recognizes the detected text lines and formulas; and the post-processing stage rearranges the recognition and detection results and outputs a Word document following the original layout of the test paper.
Preferably, the automatic data generation is described as follows: training data highly similar to real samples is generated automatically by a program. Under program control, the simulation process randomly generates a specified number of test paper pictures of various layouts, together with label data covering charts, text lines, and formulas.
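As a rough illustration of the automatic data generation described above, the sketch below (an assumption about the generator's structure, not the patent's actual program) randomly places text-line bounding boxes on a blank page and records each as [xmin, ymin, xmax, ymax]; rendering the page image itself, e.g. with Pillow, is omitted.

```python
import random

def simulate_labels(num_samples, page_w=1280, page_h=1920, max_lines=20, seed=0):
    """Simulate label data for text line detection training samples.

    For each sample, random-length "text lines" are placed at random
    positions on a blank page, and each line's bounding box is recorded
    as [xmin, ymin, xmax, ymax], mirroring the txt label format
    described in the patent.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        boxes = []
        y = rng.randint(10, 60)
        for _ in range(rng.randint(1, max_lines)):
            line_h = rng.randint(28, 48)            # text line height in px
            line_w = rng.randint(200, page_w - 40)  # random-length corpus text
            xmin = rng.randint(10, page_w - line_w - 10)
            boxes.append([xmin, y, xmin + line_w, y + line_h])
            y += line_h + rng.randint(8, 30)        # line spacing
            if y + 48 > page_h:                     # no room for another line
                break
        samples.append(boxes)
    return samples

labels = simulate_labels(3)
```

Specifying the total number of samples, as the text describes, reduces here to the `num_samples` argument.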
Preferably, the chart detection is described as follows: a lightweight SSD-MobileNetV2 network serves as the chart detection network. The input image size is 224x224; MobileNetV2 acts as the backbone for feature extraction, and chart regions are detected from the multi-layer features of the SSD, yielding the position coordinates of each chart region.
Preferably, the text line detection is described as follows: the method uses AdvancedEAST, a text line detection algorithm for natural scenes, as the detection network. A test paper picture may be tilted during photographing or scanning, in which case an algorithm based on two-point (axis-aligned) localization would position text lines inaccurately; the method therefore uses four-point localization, and when the picture is tilted, a perspective transformation of the four corner coordinates straightens the text line region. Images of 1280x192 resolution are used as input to locate all text lines, giving their position coordinates in the test paper image, and the coordinates are mapped back to the original image for the perspective transformation. Because a formula can be taller than its text line, the text line coordinates are expanded outward by 5 pixels, ensuring that each line cropped from the original image contains the complete formula region.
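The four-point deskew can be sketched as follows. In practice OpenCV's cv2.getPerspectiveTransform and cv2.warpPerspective would typically do this work; the numpy homography solver below is an illustrative stand-in, and expand_box mirrors the 5-pixel outward expansion of text line coordinates.

```python
import numpy as np

def four_point_homography(src, dst):
    """Solve for the 3x3 perspective (homography) matrix H mapping four
    src corners onto four dst corners, with H[2,2] fixed to 1 --
    the same quantity cv2.getPerspectiveTransform computes."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def expand_box(xmin, ymin, xmax, ymax, pad=5, w=10**9, h=10**9):
    """Expand a text line box outward by `pad` pixels, clipped to the
    image, so tall formula glyphs are not cut off."""
    return (max(xmin - pad, 0), max(ymin - pad, 0),
            min(xmax + pad, w), min(ymax + pad, h))

# Deskew: map a tilted text-line quadrilateral onto an upright rectangle.
src = [(12, 8), (410, 22), (405, 60), (7, 46)]   # tilted text line corners
dst = [(0, 0), (400, 0), (400, 40), (0, 40)]     # axis-aligned target
H = four_point_homography(src, dst)
```

Pixels of the cropped region would then be resampled through H to obtain the straightened line image.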
Preferably, the formula detection is described as follows: the CTPN algorithm serves as the formula detection network, taking the text line detection output as input. It detects whether each text line contains a formula, distinguishing the position coordinates of the text region and the formula region within the current input line.
Preferably, the OCR recognition is described as follows: OCR recognition is split into character recognition and formula recognition. Text line detection and formula detection provide the position coordinates of the text and formula regions within each line; the corresponding regions are cropped from the original image, text regions are fed to a character recognition engine and formula regions to a formula recognition engine, and these two independent branches together recognize all characters and formulas in the test paper.
Preferably, the post-processing is described as follows: based on the results of chart detection, character recognition, and formula recognition, the recognition results are rearranged and a Word-version transcription is output following the original test paper layout.
Preferably, the method comprises the following specific steps:
Step one, simulating training data: the automatic transcription pipeline involves 5 detection and recognition models, each trained separately and each requiring a large amount of training data. Since manual annotation is time-consuming, the automatic data generation program in the method conveniently simulates the batch training data required by all 5 models.
Data generation is simulated in the order of chart detection, text line detection, formula detection, character recognition, and formula recognition. For the detection tasks the label information is the coordinates of a chart or text line; for OCR recognition the label information is the dictionary index of the string shown in the picture. The simulation program includes data augmentation functions such as blurring and noise injection; batch training data is generated simply by specifying the total number of samples and running the corresponding simulation program.
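A minimal sketch of the blurring and noise-injection augmentations mentioned above, assuming grayscale uint8 images; the real simulation program's augmentation details are not specified in the text.

```python
import numpy as np

def add_gaussian_noise(img, sigma=8.0, seed=0):
    """Additive Gaussian noise, clipped back to the valid pixel range."""
    rng = np.random.default_rng(seed)
    noisy = img.astype(float) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def box_blur(img, k=3):
    """Crude k x k mean blur, a simple stand-in for Gaussian blurring."""
    h, w = img.shape
    pad = k // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):          # accumulate the k*k shifted copies
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return (out / (k * k)).astype(np.uint8)
```

Either transform preserves the image shape, so it can be applied to a sample without touching its coordinate labels.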
Step two, data preprocessing: combining a common test paper board, wherein in the training data, the size of a chart detection input image is 224x224, the size of a character line detection input image is 1280x192, OCR (optical character recognition) identifies that the height of the input image is 32 pixels, normalizes the image to be between-1 and 1, the training process takes Batchsize as basic input, each Batchsize is randomly selected from an original image, and data enhancement operations such as Gaussian blurring, contrast, brightness, test paper cutting and the like are randomly added;
Step three, training the neural network: the chart detection, text line detection, formula detection, and OCR recognition models are trained in sequence in an end-to-end fashion, with network hyper-parameters set as follows:
(1) Learning rate: the initial learning rate is set to 0.01, reduced by 10% every 10 training rounds;
(2) Optimizer: Adam or SGD (chosen during implementation according to how training proceeds);
(3) Other: the batch size is set to 8, varying with available GPU memory; the total number of training rounds is 200;
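The stated schedule, a 10% reduction every 10 rounds from an initial 0.01, corresponds to a simple step decay:

```python
def step_decay_lr(epoch, base_lr=0.01, drop=0.9, every=10):
    """Learning rate reduced by 10% every 10 training rounds,
    matching the hyper-parameter settings above."""
    return base_lr * drop ** (epoch // every)
```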
Step four, post-processing: the models are converted to pb files and chained in sequence, the output of each model serving as input to the next; finally the recognition results are re-typeset and output in Word format.
(III) advantageous effects
The invention provides a test paper automatic transcription method based on deep learning, which has the following beneficial effects:
(1) Aiming at the current situation, the invention provides a deep-learning-based automatic test paper transcription method, mainly for common mathematics, Chinese, and English test papers. Automatic transcription here means automatically converting the content of test paper images acquired by scanning or photographing into a Word version, realizing the conversion of test paper content from picture to electronic document. By introducing deep learning into test paper transcription, the process changes from manual to automatic, the time-consuming extraction of text and document information from test paper pictures is resolved, and transcription efficiency is greatly improved.
(2) By introducing deep learning into test paper transcription, the invention automates the transcription of test paper document content. Tailored to the characteristics of test paper transcription, it integrates existing deep learning object detection and OCR recognition methods into a complete pipeline, enabling the automatic transcription of various complex test papers, including mathematics papers, and greatly improving transcription efficiency.
Drawings
FIG. 1 is a flow chart of the overall implementation of the present invention;
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example:
As shown in FIG. 1, the present invention provides the following technical solution: a deep-learning-based automatic test paper transcription method comprising the following parts:
Automatic data generation: this part describes the automatic generation of training data for the 5 deep learning models in the method. The method involves several network structures, each requiring its own training data, and manual annotation is time- and labor-intensive, so the data generation part produces corresponding training data for each network. Detection models: the chart detection, text line detection, and formula detection models use similar simulation programs. During simulation, a pure-white background picture and a random-length piece of corpus text are chosen at random; the corpus text is placed at a random position in the background picture, and its coordinates are recorded in a corresponding txt file. Since detection targets test papers, the corpus consists mainly of collected Word-format test papers, and the generation program automatically simulates question-style text with a certain probability. If the data is for the chart detection model, the program additionally inserts a chart at a random position in each simulated sample and records its position. With this simulation program, a large amount of varied training data can be produced in a short time, allowing models to be trained and deployed promptly.
Chart detection: this part describes how charts in the test paper are detected. The chart detection network is SSD-MobileNetV2: the test paper image is resized to 224x224 resolution, features are extracted by the lightweight MobileNetV2, and following the SSD layered prediction idea, chart regions are predicted independently at several scales, detecting chart targets of different sizes; a final global NMS yields the final chart region positions.
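The global NMS step can be sketched as a standard greedy suppression over the chart boxes predicted at all SSD scales (an illustrative implementation, not the patent's code):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring boxes
    and discard lower-scoring boxes that overlap them too much.
    boxes: (N, 4) array of [xmin, ymin, xmax, ymax]."""
    boxes = np.asarray(boxes, float)
    order = np.argsort(scores)[::-1]       # indices by descending score
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + areas - inter)
        order = rest[iou <= iou_thresh]    # drop heavily overlapping boxes
    return keep
```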
Text line detection: this part describes how all text lines are detected from the test paper image, using the AdvancedEAST algorithm. Test paper text line detection is complex: a text line may contain formulas or element symbols such as mathematical sets of various kinds, and a formula is often several pixels taller than its text line, so detecting strictly by the text line region may truncate the formula. Therefore, in the automatic data generation stage, the training data uses the height of the formula within the line as the ground-truth label, ensuring as far as possible that the detector covers the complete formula in each text line. AdvancedEAST is a natural-scene text line detection algorithm that localizes a target by four points, handling irregularly tilted targets. The input test paper image is 1280x192 (the input width and height must be integral multiples of 32). The backbone network is VGG; after several feature maps are extracted, each is passed through 1x1 and 3x3 convolutions in turn, upsampled to the scale of the preceding feature map, and concatenated with it. On the fused feature map the network applies three 1x1 convolutions, producing output maps of 1, 2, and 4 channels that represent, respectively, the pixel score, the probability that the pixel belongs to a text or non-text region, and the link predictions between the pixel and its four surrounding directions.
Formula detection: this part describes how to detect whether a formula exists in the detected text lines, so that the formula portion can be cropped and recognized separately. Formula detection uses CTPN, a network improved from Faster R-CNN that effectively detects horizontally arranged text in complex scenes. The method obtains feature maps from the first 5 conv stages of VGG16 and takes a 3x3xC window feature at each position of the Conv5 feature map; these features predict the category and position information of the k anchors at that position. The 3x3xC window features of each row are fed to an RNN, producing a W×256 output, which is passed to a 512-dimensional fully connected layer. The FC-layer features feed three classification or regression heads: 2k scores giving the class information (text or not) of the k anchors, 2k vertical coordinates, and k side-refinement offsets used to regress the anchor positions. Finally, the text proposals obtained from classification are merged into text lines by the text line construction algorithm, giving the formula detection output.
OCR recognition: this part describes how text in the character and formula regions is recognized. There are two recognition engines: a conventional OCR algorithm for characters and digits, and an algorithm dedicated to formulas. Both adopt a CNN + LSTM framework; formula recognition additionally uses an attention mechanism, while character recognition computes its loss with CTC. The character recognition model takes text line input of size 32x280, with text lengths between 5 and 15 characters. The image size coming out of formula detection is not fixed and a formula image is taller than a text line; the formula recognition result is output in LaTeX format and is rendered for display by the post-processing part.
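The CTC decoding used by the character recognition branch reduces per-frame predictions by collapsing repeated symbols and dropping blanks; a minimal greedy version:

```python
def ctc_greedy_decode(frame_ids, blank=0):
    """Collapse repeated symbols, then drop blanks -- the greedy
    decoding step paired with a CTC-trained recognition head.
    frame_ids: per-timestep argmax class indices."""
    out, prev = [], None
    for t in frame_ids:
        if t != prev and t != blank:
            out.append(t)
        prev = t
    return out
```

For example, the frame sequence blank, 3, 3, blank, 3, 5, 5, blank decodes to the labels 3, 3, 5: the blank between the two 3s is what keeps a genuine double character from collapsing.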
Post-processing: since the aim is fully automatic transcription, the system must not only transcribe the test paper image content into a Word document but also lay out the result according to the original image. Post-processing therefore takes the results of chart detection, text line detection, formula detection, and OCR recognition, sorts the detected targets by their coordinates, primarily by Y and then by X, and finally inserts each formula recognition result at the corresponding position in its text line according to the formula's coordinates, with global layout optimization.
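The sort primarily by Y and then by X can be sketched as follows, with a small tolerance grouping boxes into rows (the tolerance value is an assumption, not stated in the text):

```python
def reading_order(items, row_tol=10):
    """Arrange detected regions in reading order: sort by Y first, then
    by X within a row. items: (xmin, ymin, payload) tuples; boxes whose
    ymin values differ by less than row_tol count as the same row."""
    items = sorted(items, key=lambda it: (it[1], it[0]))
    rows, current = [], []
    for it in items:
        if current and it[1] - current[0][1] > row_tol:
            rows.append(sorted(current, key=lambda it: it[0]))
            current = []
        current.append(it)
    if current:
        rows.append(sorted(current, key=lambda it: it[0]))
    return [it for row in rows for it in row]
```

Two boxes on the same physical line rarely share an identical ymin after detection, which is why a plain (y, x) sort is not enough and the row tolerance is needed.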
A test paper automatic transcription method based on deep learning comprises the following steps:
Step one, producing training data: deep learning model training requires a large amount of data. The automatic transcription task involves 5 deep learning models, each needing a large volume of test paper data as training data, so the data simulation program is used: by specifying parameters such as total sample count, test paper data type, and simulation form, large amounts of training data are generated quickly. The program outputs a jpg picture and a corresponding txt file; the txt file stores the coordinates of test paper text lines, charts, and formula regions in [xmin, ymin, xmax, ymax] format, and when a jpg picture contains N text lines, formulas, or charts, the corresponding txt contains N lines of such coordinates;
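A reader for such a txt file might look like this; the exact delimiter is not specified in the text, so both comma- and space-separated rows are accepted here as an assumption:

```python
def parse_label_file(text):
    """Parse txt label content where each line holds one region as
    'xmin,ymin,xmax,ymax' (or space-separated), matching the
    [xmin, ymin, xmax, ymax] format produced by the simulation program."""
    boxes = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        xmin, ymin, xmax, ymax = (int(v) for v in line.replace(",", " ").split())
        boxes.append((xmin, ymin, xmax, ymax))
    return boxes
```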
Step two, data preprocessing: according to the 5 models' training data formats, text line detection data is resized uniformly to 1280x192 and chart detection data to 224x224, both as RGB pictures normalized to the range -1 to 1; the OCR character recognition input is a grayscale image 32 pixels high. Training proceeds in batches, each batch randomly selected from the original pictures, with data augmentation operations such as Gaussian blurring, contrast and brightness changes, and test paper cropping applied at random;
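The normalization of pixel values to the range -1 to 1 is a one-liner:

```python
import numpy as np

def normalize_image(img):
    """Scale uint8 pixels in [0, 255] to [-1.0, 1.0] for model input."""
    return img.astype(np.float32) / 127.5 - 1.0
```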
Step three, training the neural network: model training adopts an end-to-end network structure, with hyper-parameters set as follows:
(1) Learning rate: the detection models start at 0.01, reduced by 10% every 10 training rounds; the recognition models start at 0.0001, reduced by 10% every 10 epochs;
(2) Optimizer: Adam or SGD (chosen during implementation according to how training proceeds);
(3) Other: the batch size is set to 8, varying with available GPU memory; the total number of training rounds is 200;
Step four, post-processing: the models are converted to pb files and chained in sequence, the output of each model serving as input to the next; finally the recognition results are re-typeset and output in Word format.
Aiming at test paper images, the invention realizes automatic transcription of test paper document content through deep learning, automatically transcribing image-form test paper data into a Word version and laying a foundation for building large-scale test paper databases.
In summary, the invention provides a deep-learning-based automatic test paper transcription method, mainly for common mathematics, Chinese, and English test papers. Automatic transcription here refers to automatically converting the content of test paper images acquired by scanning or photographing into a Word version, realizing the conversion of test paper content from picture to electronic document. Tailored to the characteristics of the automatic transcription process, the method offers an integrated detection-and-recognition pipeline built on several deep-learning-based image processing techniques, yielding a new one-stop method for automatically transcribing test paper image content. It transcribes test papers with different question types well, especially when the questions contain charts and formulas. By introducing deep learning into test paper transcription, the process changes from manual to automatic, the time-consuming extraction of picture and document information is resolved, and transcription efficiency is greatly improved.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (8)

1. A deep-learning-based automatic test paper transcription system and method, characterized in that: the system is based on deep learning techniques and comprises the steps of automatic data generation, chart detection, text line detection, formula detection, OCR recognition, and post-processing.
2. The deep-learning-based automatic test paper transcription method of claim 1, characterized in that the main features are as follows: a simulation program automatically generates the training data required by the text line detection, chart detection, and OCR algorithms; chart detection separates pictures from text regions in the test paper; text line and formula detection locates all text lines and formulas in the test paper; OCR recognition recognizes the detected text lines and formulas; and the post-processing stage rearranges the recognition and detection results and outputs a Word document following the original layout of the test paper.
3. The method for automatic transcription of test paper based on deep learning of claim 1, wherein the automatic data generation is as follows: training data highly similar to real samples is generated programmatically; under program control, the simulator randomly produces a specified number of test paper images with varied layouts, together with label data for the charts, text lines, formulas, and other elements they contain.
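The label-generation side of claim 3 can be sketched as follows. This is a hypothetical, minimal simulator: the function name `generate_layout_labels`, the page dimensions, and the per-element heights are illustrative stand-ins, and the patent's actual program would also render a matching test paper image for each annotation set.

```python
import random

def generate_layout_labels(num_samples, page_w=1280, page_h=1920, seed=0):
    """Randomly generate layout annotations (chart / text line / formula
    bounding boxes) for a batch of simulated test paper pages.
    A sketch only; the real simulator also renders the images."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        boxes = []
        y = rng.randint(20, 60)
        while y < page_h - 80:
            kind = rng.choices(["text_line", "formula", "chart"],
                               weights=[6, 2, 1])[0]
            h = {"text_line": 40, "formula": 60, "chart": 300}[kind]
            x0 = rng.randint(10, 60)
            x1 = page_w - rng.randint(10, 60)
            boxes.append({"label": kind,
                          "bbox": (x0, y, x1, min(y + h, page_h))})
            y += h + rng.randint(10, 30)
        samples.append(boxes)
    return samples
```

Because the layout is drawn from a fixed random seed, the same annotation set can be regenerated deterministically, which matches the claim's point that only a total sample count needs to be specified.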
4. The method for automatic transcription of test paper based on deep learning of claim 1, wherein the chart detection is as follows: a lightweight SSD-MobileNetV2 network serves as the chart detection network; the input image size is 224×224, MobileNetV2 acts as the backbone to extract features, and chart regions are detected from the SSD's multi-layer feature maps, yielding the position coordinates of each chart region.
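Since the detector in claim 4 runs on a 224×224 resized input, its predicted boxes must be mapped back to the original test paper's coordinate frame before cropping. The sketch below shows only that coordinate arithmetic; the function name and the assumption of a square network input are illustrative, not from the patent.

```python
def rescale_box(box, net_size, orig_w, orig_h):
    """Map a box predicted on the square detector input (e.g. 224x224)
    back to the original image's pixel coordinates."""
    x0, y0, x1, y1 = box
    sx = orig_w / net_size  # horizontal scale factor
    sy = orig_h / net_size  # vertical scale factor
    return (round(x0 * sx), round(y0 * sy), round(x1 * sx), round(y1 * sy))
```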
5. The method for automatic transcription of test paper based on deep learning of claim 1, wherein the text line detection is as follows: the natural-scene text detection algorithm AdvancedEAST serves as the text line detection network. Because a test paper image may be tilted during shooting or scanning, a two-point (axis-aligned) localization algorithm would position text lines inaccurately; the method therefore localizes each text line with four points, and when the image is tilted, applies a perspective transformation to the four corner coordinates so that the text line region is rectified. Images at 1280×192 resolution are used as input to locate all text lines, yielding their position coordinates in the test paper image; the coordinates are then mapped back to the original image region for the perspective transformation. Because a formula can be taller than its surrounding text line, the text line coordinates are expanded outward by 5 pixels, ensuring that each line cropped from the original image contains any formula region in full.
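The 5-pixel outward expansion in claim 5 can be illustrated as below: each of the four quad corners is pushed away from the quad's centroid, clipped to the image bounds. This is a sketch under assumptions not stated in the patent (the corner ordering and the centroid-based direction test are mine); the perspective rectification itself would typically use something like OpenCV's `getPerspectiveTransform` and is omitted here.

```python
def expand_quad(quad, pad=5, img_w=1280, img_h=192):
    """Expand a 4-point text-line quad outward by `pad` pixels,
    clipped to the image bounds. `quad` is four (x, y) corners."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    cx = sum(xs) / 4.0  # quad centroid x
    cy = sum(ys) / 4.0  # quad centroid y
    out = []
    for x, y in quad:
        # push each corner away from the centroid
        nx = x + (pad if x >= cx else -pad)
        ny = y + (pad if y >= cy else -pad)
        out.append((max(0, min(img_w - 1, nx)),
                    max(0, min(img_h - 1, ny))))
    return out
```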
6. The method for automatic transcription of test paper based on deep learning of claim 1, wherein the formula detection is as follows: the CTPN algorithm serves as the formula detection network, taking the text line detection results as input; it detects whether each text line contains a formula and separates the position coordinates of the text regions and the formula regions within the current input line.
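Once the formula detector has produced x-spans inside a line, the line splits into alternating text and formula segments. The helper below is a hypothetical illustration of that separation step (the segment representation is mine, not the patent's), covering the whole line width with interleaved `(kind, x0, x1)` ranges.

```python
def split_line_regions(line_w, formula_spans):
    """Given the x-spans of detected formula boxes inside one text line,
    return interleaved ("text"/"formula", x0, x1) segments that cover
    the full line width in reading order."""
    segments = []
    cursor = 0
    for fx0, fx1 in sorted(formula_spans):
        if fx0 > cursor:
            segments.append(("text", cursor, fx0))
        segments.append(("formula", fx0, fx1))
        cursor = max(cursor, fx1)
    if cursor < line_w:
        segments.append(("text", cursor, line_w))
    return segments
```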
7. The method for automatic transcription of test paper based on deep learning of claim 1, wherein the OCR recognition is as follows: OCR recognition is divided into text recognition and formula recognition. Text line detection and formula detection yield the position coordinates of the text regions and formula regions within each line; the corresponding regions are cropped from the original image according to these coordinates, text regions are fed to a text recognition engine and formula regions to a formula recognition engine, and these two independent branches together recognize all text and formulas in the test paper.
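The two-branch dispatch in claim 7 amounts to routing each cropped segment to the matching recognizer and joining the outputs in reading order. In this sketch the engines are plain callables standing in for the real text and formula models; the function name is illustrative.

```python
def recognize_line(segments, text_engine, formula_engine):
    """Route each (kind, crop) segment of one line to the matching
    recognizer branch and join the results in reading order."""
    pieces = []
    for kind, crop in segments:
        engine = formula_engine if kind == "formula" else text_engine
        pieces.append(engine(crop))
    return "".join(pieces)
```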
8. The method for automatic transcription of test paper based on deep learning of claim 1, wherein the post-processing is as follows: based on the results of chart detection, text recognition, and formula recognition, the recognition output is re-typeset and a Word version of the transcription is produced following the original test paper layout. The specific steps are:
Step one, simulating training data: the automatic transcription process involves 5 detection and recognition models that must be trained separately, each requiring a large amount of training data; since manual annotation is time-consuming, the method's automatic data generation program conveniently simulates batch training data for all 5 models;
The data generation process simulates, in order, data for chart detection, text line detection, formula detection, text recognition, and formula recognition. The label for detection data is the coordinate information of a chart or text line; the label for OCR recognition data is the dictionary index of the string shown in the image. The simulation program includes data augmentation functions such as blurring and noise injection; batch training data is generated simply by specifying the total sample count and running the corresponding simulation program;
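The OCR label format described above, a string mapped to its indices in a dictionary, can be shown in a few lines. The charset contents here are illustrative; the patent does not specify the dictionary.

```python
def encode_label(text, charset):
    """Encode a ground-truth string as dictionary indices, the OCR label
    format described in step one. `charset` is the recognition dictionary."""
    index = {ch: i for i, ch in enumerate(charset)}
    return [index[ch] for ch in text]
```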
Step two, data preprocessing: based on common test paper layouts, the chart detection input is 224×224, the text line detection input is 1280×192, and the OCR recognition input is 32 pixels high; images are normalized to the range -1 to 1. Training proceeds in batches: each batch is randomly sampled from the original images, with data augmentation operations such as Gaussian blur, contrast and brightness jitter, and test paper cropping applied at random;
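The normalization in step two maps 8-bit pixel values into [-1, 1]. The patent does not give the exact formula; `p / 127.5 - 1` is the usual convention for MobileNet-style inputs and is assumed here.

```python
def normalize_pixels(pixels):
    """Scale 8-bit pixel values (0..255) into [-1, 1], the input range
    stated in step two. Assumes the common p / 127.5 - 1 convention."""
    return [p / 127.5 - 1.0 for p in pixels]
```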
Step three, training the neural networks: following the steps above, the chart detection, text line detection, formula detection, and OCR recognition models are trained in turn, each end to end, with the network hyperparameters set as follows:
(1) learning rate: the initial learning rate is set to 0.01, reduced by 10% every 10 training rounds;
(2) optimizer: Adam or SGD (chosen during implementation according to how training behaves);
(3) other: the batch size is set to 8, adjusted according to the available GPU memory; the total number of training rounds is 200;
Step four, post-processing: the trained models are converted to pb files and chained in sequence, each model's output feeding the next model's input; finally the recognition results are re-typeset and output in Word format.
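The steps above can be sketched end to end. `learning_rate` implements the schedule in item (1) exactly as stated (0.01 reduced 10% every 10 rounds); `transcribe` chains the stages of step four, with all five callables standing in for the exported .pb graphs, and a joined string standing in for the Word re-typesetting. Both function names are illustrative.

```python
def learning_rate(epoch, base_lr=0.01, decay=0.9, step=10):
    """Step three's schedule: start at 0.01, reduce 10% every 10 rounds."""
    return base_lr * decay ** (epoch // step)

def transcribe(image, chart_det, line_det, formula_det, text_ocr, formula_ocr):
    """Chain the frozen models as in step four: each stage's output is
    the next stage's input. The callables are stand-ins for the real
    chart detector, line detector, formula detector, and OCR engines."""
    charts = chart_det(image)          # chart regions, kept for layout
    lines = []
    for line in line_det(image):       # each detected text line crop
        parts = []
        for kind, crop in formula_det(line):  # split into text/formula
            parts.append(formula_ocr(crop) if kind == "formula"
                         else text_ocr(crop))
        lines.append("".join(parts))
    return {"charts": charts, "lines": lines}
```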
CN201910970234.1A 2019-10-12 2019-10-12 Test paper automatic transcription system and method based on deep learning Withdrawn CN110781648A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910970234.1A CN110781648A (en) 2019-10-12 2019-10-12 Test paper automatic transcription system and method based on deep learning


Publications (1)

Publication Number Publication Date
CN110781648A true CN110781648A (en) 2020-02-11

Family

ID=69385231

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910970234.1A Withdrawn CN110781648A (en) 2019-10-12 2019-10-12 Test paper automatic transcription system and method based on deep learning

Country Status (1)

Country Link
CN (1) CN110781648A (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111460782B (en) * 2020-04-01 2023-08-22 支付宝(杭州)信息技术有限公司 Information processing method, device and equipment
CN111460782A (en) * 2020-04-01 2020-07-28 支付宝(杭州)信息技术有限公司 Information processing method, device and equipment
CN111507250A (en) * 2020-04-16 2020-08-07 北京世纪好未来教育科技有限公司 Image recognition method, device and storage medium
CN111401342A (en) * 2020-06-04 2020-07-10 南京红松信息技术有限公司 Question type sample manufacturing method based on label automation
CN112347997A (en) * 2020-11-30 2021-02-09 广东国粒教育技术有限公司 Test question detection and identification method and device, electronic equipment and medium
CN112651315A (en) * 2020-12-17 2021-04-13 苏州超云生命智能产业研究院有限公司 Information extraction method and device of line graph, computer equipment and storage medium
CN112597878A (en) * 2020-12-21 2021-04-02 安徽七天教育科技有限公司 Sample making and identifying method for scanning test paper layout analysis
CN112766125A (en) * 2021-01-12 2021-05-07 徐州金林人工智能科技有限公司 Test question uploading tool based on machine learning algorithm and uploading method thereof
CN113537201A (en) * 2021-09-16 2021-10-22 江西风向标教育科技有限公司 Multi-dimensional hybrid OCR recognition method, device, equipment and storage medium
CN114120349A (en) * 2022-01-10 2022-03-01 深圳市菁优智慧教育股份有限公司 Test paper identification method and system based on deep learning
CN114120349B (en) * 2022-01-10 2022-05-03 深圳市菁优智慧教育股份有限公司 Test paper identification method and system based on deep learning
CN117894217A (en) * 2024-03-12 2024-04-16 中国科学技术大学 Mathematics topic guiding system for online learning system
CN117894217B (en) * 2024-03-12 2024-06-04 中国科学技术大学 Mathematics topic guiding system for online learning system

Similar Documents

Publication Publication Date Title
CN110781648A (en) Test paper automatic transcription system and method based on deep learning
CN112101357B (en) RPA robot intelligent element positioning and picking method and system
US7899249B2 (en) Media material analysis of continuing article portions
CN110765907A (en) System and method for extracting paper document information of test paper in video based on deep learning
RU2707147C1 (en) Neural network training by means of specialized loss functions
RU2760471C1 (en) Methods and systems for identifying fields in a document
CN111767927A (en) Lightweight license plate recognition method and system based on full convolution network
CN110175609B (en) Interface element detection method, device and equipment
CN110929746A (en) Electronic file title positioning, extracting and classifying method based on deep neural network
CN112434690A (en) Method, system and storage medium for automatically capturing and understanding elements of dynamically analyzing text image characteristic phenomena
CN112347997A (en) Test question detection and identification method and device, electronic equipment and medium
CN110705535A (en) Method for automatically detecting test paper layout character line
CN114550153A (en) Terminal block image detection and identification method
CN111832497B (en) Text detection post-processing method based on geometric features
CN110674721A (en) Method for automatically detecting test paper layout formula
CN111832551A (en) Text image processing method and device, electronic scanning equipment and storage medium
Ahmed et al. A generic method for automatic ground truth generation of camera-captured documents
Castillo et al. Object detection in digital documents based on machine learning algorithms
CN113111869B (en) Method and system for extracting text picture and description thereof
RU2703270C1 (en) Optical character recognition using specialized confidence functions, implemented on the basis of neural networks
CN114331932A (en) Target image generation method and device, computing equipment and computer storage medium
JP4807486B2 (en) Teaching material processing apparatus, teaching material processing method, and teaching material processing program
CN113409327A (en) Example segmentation improvement method based on ordering and semantic consistency constraint
CN113934922A (en) Intelligent recommendation method, device, equipment and computer storage medium
CN111488728A (en) Labeling method, device and storage medium for unstructured test question data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200211