CN112446259A - Image processing method, device, terminal and computer-readable storage medium
- Publication number: CN112446259A
- Application number: CN201910824436.5A
- Authority: CN (China)
- Prior art keywords: image, text, information, mark, text content
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V30/40 — Character recognition; document-oriented image-based pattern recognition
- G06F18/24 — Pattern recognition; classification techniques
- G06N3/045 — Neural networks; combinations of networks
- G06N3/08 — Neural networks; learning methods
- G06V10/22 — Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide detection or recognition
- G06V10/267 — Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
Abstract
The invention provides an image processing method, an image processing device, a terminal and a computer-readable storage medium. The image processing method comprises the following steps: acquiring text content in an image according to a natural language processing algorithm; determining features of the text according to the text content; identifying mark information in a marked image according to a deep-learning-based target detection algorithm; and determining the association among the features of the text, the mark information and the text content to generate a recognition result. With this technical scheme, marked text content in an image can be quickly identified and recorded, so that wrong questions can be recorded or annotation information extracted; text content is thereby acquired efficiently and learning efficiency is improved.
Description
Technical Field
The present invention relates to the field of image processing and education, and in particular, to an image processing method, an image processing apparatus, a terminal, and a computer-readable storage medium.
Background
With economic, scientific and technological development, people's time becomes ever more precious, yet the time devoted to education remains indispensable. Cultivating highly skilled technical talent is essential, which increases the pressure on teachers; in particular, correcting students' homework and compiling statistics on their wrong answers is time-consuming and laborious. During study, students frequently need to browse the comments written in their books, or to organize the wrong questions from homework, exercises and test papers into a single notebook, a wrong-question book, which helps to expose weak links in learning, highlight key points, make study more targeted, and improve learning efficiency and scores. In existing schemes for collecting and analyzing wrong questions, students copy the wrong questions from their homework, exercises and test papers into a wrong-question book by hand, or cut them out and paste them in; and the existing way of reviewing comments is generally to leaf through the books again from memory. Consequently, the efficiency of obtaining the target content is low.
Disclosure of Invention
The present invention is directed to solving at least one of the problems of the prior art or the related art.
To this end, it is an object of the present invention to provide an image processing method.
Another object of the present invention is to provide an image processing apparatus.
Another object of the present invention is to provide a terminal.
It is another object of the present invention to provide a computer-readable storage medium.
In order to achieve the above object, according to a first aspect of the present invention, there is provided an image processing method including: acquiring text content in an image according to a natural language processing algorithm; determining features of the text according to the text content; identifying mark information in a marked image according to a deep-learning-based target detection algorithm; and determining the association among the features of the text, the mark information and the text content to generate a recognition result. The image and the marked image correspond to each other: they share the same layout and their contents are partially the same.
In this technical scheme, the image is detected and recognized by combining a plurality of image processing methods (at least a natural language processing algorithm and a deep-learning-based target detection algorithm), so that the mark information in the image (such as correction traces, annotation handwriting and annotation symbols) and the text content associated with it (such as test questions and character strings) can be identified, and the target content extracted. Specifically, the natural language processing algorithm performs text detection on the image and determines the text boxes (text regions) of the text content, and the text content is then analyzed to determine the features of the text; processing the marked image on this basis greatly improves efficiency. Further, positioning can be performed according to the features of the text (e.g. question number, position) to obtain the corresponding text content and mark information.
Natural language processing (NLP) is a sub-field of artificial intelligence; here text detection is performed on deep-learning principles, for example with a deep-learning-based OCR (Optical Character Recognition) algorithm or the CTPN (Connectionist Text Proposal Network) text-line detection algorithm, which detects text by connecting text proposals in an anchor network. An OCR model can be built as CRNN + CTC, or with any of the following convolutional backbones plus CTC: LeNet + CTC; AlexNet + CTC; ZF + CTC; VGG + CTC; GoogLeNet + CTC; ResNet + CTC; DenseNet + CTC.
Here, CRNN combines a CNN (Convolutional Neural Network) with an RNN (Recurrent Neural Network); CTC is Connectionist Temporal Classification; LeNet is a neural network proposed by Yann LeCun for handwritten character recognition and classification; AlexNet is a neural network proposed by Hinton and Alex Krizhevsky; ZF (ZFNet) is a neural network based on AlexNet; VGG (VGGNet) is a deep convolutional neural network proposed by the Oxford Visual Geometry Group together with Google DeepMind; GoogLeNet is a deep learning architecture proposed by Christian Szegedy; ResNet is the residual network; and DenseNet is the Densely Connected Convolutional Network.
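As an illustration of the CRNN + CTC recipe named above, the following is a minimal sketch in PyTorch; the layer sizes, alphabet size and input shape are illustrative assumptions, not values taken from the patent.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        # CNN backbone: turns the image into a left-to-right feature sequence.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),   # collapse height, keep width
        )
        # RNN head: models context along the sequence.
        self.rnn = nn.LSTM(256, 128, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(256, num_classes)  # num_classes includes the CTC blank

    def forward(self, x):                      # x: (B, 1, H, W)
        f = self.cnn(x).squeeze(2)             # (B, 256, W')
        out, _ = self.rnn(f.permute(0, 2, 1))  # (B, W', 256)
        return self.fc(out).log_softmax(2)     # per-column class log-probabilities

# CTC aligns per-column predictions with the unsegmented label string.
model = CRNN(num_classes=37)                   # e.g. 26 letters + 10 digits + blank
log_probs = model(torch.randn(4, 1, 32, 128)).permute(1, 0, 2)  # (T, B, C)
targets = torch.randint(1, 37, (4, 6))         # label 0 is reserved for blank
loss = nn.CTCLoss(blank=0)(
    log_probs, targets,
    input_lengths=torch.full((4,), log_probs.size(0), dtype=torch.long),
    target_lengths=torch.full((4,), 6, dtype=torch.long))
```

Any of the backbones listed above can replace the small CNN here; only the feature-sequence shape fed to the recurrent layer has to be preserved.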
The target detection algorithm is used to identify the mark information in the image. Mark information generally differs significantly from the text content, and models that can recognize it from these differences mainly include: the FPN (Feature Pyramid Network) model, the Faster R-CNN model, the R-FCN model, the YOLO model, and the SSD model.
Here, Faster R-CNN is the Faster Region-based CNN; R-FCN (Object Detection via Region-based Fully Convolutional Networks) is a target detection method; YOLO (You Only Look Once: Real-Time Object Detection) is a target detection system based on a single neural network; and SSD (Single Shot MultiBox Detector) is a multi-target detection algorithm that directly predicts object classes and bounding boxes.
According to the image processing method of the above technical solution, optionally, determining the association among the features of the text, the mark information and the text content to generate the recognition result specifically includes: extracting the corresponding text content from the marked image according to the text content and the features of the text; positioning the text in the marked image according to the features and determining the region of the text, and determining the region of the mark according to the mark information; if the ratio of the intersection area of the text region and the mark region to the area of the mark region is greater than or equal to a preset proportion, determining that the mark information is associated with the text content; and generating a recognition result from the text content and the mark information.
In this technical scheme, the correspondence between multiple pieces of mark information and multiple sections (lines) of text content is determined from their degree of overlap, so that the target text content (for example, a wrong question, or a text line with annotation handwriting) can be extracted from the image according to the mark information. This saves the time of manually searching, extracting and recording, and greatly improves the efficiency of information acquisition and of learning. Specifically, the area of the mark region is obtained from the detected mark information, and the area of the text region is obtained from the detected features of the text and the text content; if the intersection of the text region and the mark region is greater than or equal to a preset proportion of the mark region, for example 30%, i.e. the overlap accounts for more than 30% of the mark region, the two can be determined to be associated.
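A minimal Python sketch of this association rule follows; the (x, y, w, h) box layout and the 30% threshold mirror the description above, and the example coordinates are taken from the recognition output shown later in the embodiments.

```python
def boxes_associated(text_box, mark_box, min_ratio=0.3):
    """Boxes are (x, y, w, h); True when the intersection covers at least
    min_ratio of the mark box's own area."""
    tx, ty, tw, th = text_box
    mx, my, mw, mh = mark_box
    iw = max(0, min(tx + tw, mx + mw) - max(tx, mx))   # intersection width
    ih = max(0, min(ty + th, my + mh) - max(ty, my))   # intersection height
    mark_area = mw * mh
    return mark_area > 0 and (iw * ih) / mark_area >= min_ratio

# Question 4's region and the 'wrong' mark box from the embodiment below:
print(boxes_associated((318, 1678, 1422, 159), (1041, 1682, 123, 154)))  # True
```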
The image processing method according to any one of the above technical solutions, optionally, further includes: and determining the meaning of the marked information, and storing the text content corresponding to the marked information according to the meaning of the marked information.
In this embodiment, the meaning of the mark information is further determined after the mark information is identified. The mark information may include a question score, a correction symbol or comment characters, and its meaning can be determined, for example, from the correction symbol "√" or "×": the text content is evaluated as correct, or the text content is evaluated as wrong. Text content bearing the correction symbol "×" can thereby be filed into the wrong-question set.
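Continuing in Python, a small sketch of this classification step; the category strings ('right'/'wrong') match the model output shown later in the embodiments, while the storage (a plain list) is an illustrative assumption.

```python
MARK_MEANING = {'right': 'text content evaluated as correct',
                'wrong': 'text content evaluated as wrong'}

def file_by_mark(question_text, mark_category, wrong_question_book):
    """Return the meaning of a mark; file wrongly answered questions for review."""
    if mark_category == 'wrong':
        wrong_question_book.append(question_text)
    return MARK_MEANING.get(mark_category, 'score or annotation')

book = []
file_by_mark('4. (question text)', 'wrong', book)
print(book)   # ['4. (question text)']
```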
The image processing method according to any one of the above technical solutions, optionally, further includes: determining layout information of the image by using a straight line detection algorithm and a blank area detection algorithm to generate layout coordinates; and segmenting the image according to the layout coordinates.
In this technical scheme, the layout information of the image is detected with a straight-line detection algorithm and a blank-area detection algorithm, and the text content is then recognized layout by layout, which improves recognition accuracy for multi-column text.
According to the image processing method of any one of the above technical solutions, optionally, obtaining text content in the image according to a natural language processing algorithm specifically includes: performing text line detection on the image to determine text line information; and performing optical character recognition according to the text line information to determine text content.
In this technical scheme, edge contour detection is performed on a binarized version of the image to determine a plurality of rectangular boxes corresponding to the text in the image; text boxes can be predicted from these rectangles, and text line information is obtained once the rectangles are merged; OCR (optical character recognition) is then performed on the text line information to obtain the text content. Alternatively, text line detection can be performed on the image with a CTPN model to determine the text lines, and the text content can be determined from the text line information with an OCR model. The CTPN and OCR models can be updated and maintained on deep-learning principles, improving the accuracy of both text line detection and character recognition.
According to the image processing method of any one of the above technical solutions, optionally, identifying, according to a target detection algorithm based on deep learning, the mark information in the image with the mark specifically includes: building a marking information identification model according to the characteristic pyramid network; training a label information recognition model; and inputting the image with the mark into a mark information identification model, and outputting the mark information by the mark information identification model.
In the technical scheme, a target detection algorithm based on deep learning is embodied as a mark information recognition model, the mark information recognition model built according to a Feature Pyramid Network (FPN) model is trained by using a training set, and an image with marks is used as the input of the model, so that the mark information can be accurately recognized.
According to the image processing method of any of the above technical solutions, optionally, the mark information recognition model recognizes the mark information according to the color information, or the mark information recognition model recognizes the mark information according to a font difference or a font size difference between the text content and the mark information.
In this technical scheme, given the differences between mark information and text information, text and marks can be distinguished by colour, or by features such as typeface and font size, so that the mark information is recognized in the image. For example, marks on a test paper (correction handwriting) are generally red, so test-paper images with red correction handwriting are used as the training set, and the trained mark information recognition model recognizes mark information by colour. As another example, if a handwritten comment appears near a text line, the handwriting differs from the printed text in both typeface and size, so text images with handwritten comments are used as the training set, and the trained model recognizes mark information from the typeface or font-size difference between the text content and the mark information.
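The colour-based route can be illustrated with a hand-tuned sketch, assuming OpenCV 4 and a hypothetical file name; the HSV thresholds are assumptions, and in the patent's scheme a trained detection model plays this role rather than a fixed rule.

```python
import cv2

def red_mark_mask(bgr_image):
    """Binary mask of red strokes; red wraps around the hue axis, so two
    HSV ranges are combined."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    lower = cv2.inRange(hsv, (0, 80, 80), (10, 255, 255))
    upper = cv2.inRange(hsv, (170, 80, 80), (180, 255, 255))
    return cv2.bitwise_or(lower, upper)

# Bounding boxes of connected red strokes, i.e. candidate correction marks.
img = cv2.imread('corrected_page.png')                  # hypothetical input file
contours, _ = cv2.findContours(red_mark_mask(img), cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
boxes = [cv2.boundingRect(c) for c in contours]
```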
According to the image processing method of any one of the above technical solutions, optionally, the image includes at least one of: unanswered test question images, answered test question images, unmarked test question images, marked test question images, unmarked text images and marked text images, wherein the marked images comprise at least one of the following images: a test question image with a mark and a text image with a mark.
In the technical scheme, the step of extracting the text content according to the natural language processing algorithm can use an unanswered test question image, an answered test question image, a unmarked test question image, a marked test question image, an unmarked text image or a marked text image as a material to determine the characteristics of the text content and the text, and the target detection algorithm based on the deep learning takes the marked test question image and the marked text image as the material and can perform preprocessing by referring to the processing result of the natural language processing algorithm to extract the mark information. It should be noted that, in the step of extracting the text content according to the natural language processing algorithm, the unanswered test question image, the unmarked test question image or the unmarked text image is preferably used as a material, so that the interference factor can be reduced, and the accuracy and the efficiency of text recognition can be improved.
According to the image processing method of any one of the above technical solutions, the text content specifically includes a test question or a character string, and the feature of the text specifically includes a question number or character position information.
The image processing method according to any one of the above technical solutions optionally further includes: establishing a wrong-question set according to the mark information and the text content, or compiling answering statistics according to the relation between the mark information and the text content, where the image is a test-question image, the marked image is a corrected test-question image, the feature of the text is the question number, and the mark information corresponds to the correction traces.
In this technical scheme, a scanned image of a test paper is used as the material for image processing, the wrong questions in the paper can be extracted according to the mark information (correction traces), and the wrong questions can be gathered into a wrong-question set, helping learners consolidate error-prone knowledge points and improving learning efficiency. In addition, the questions with high error rates (the answering statistics) can be counted from the question numbers and the mark information; for teachers, weak links in teaching can be found from these statistics, so that teaching content can be arranged in a targeted way, which helps improve teaching quality.
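A sketch of the answering-statistics idea in Python; the data layout (one dict per student, question number mapped to 'right'/'wrong') is an illustrative assumption.

```python
from collections import Counter

def error_rates(class_results):
    """class_results: one dict per student, mapping question number to
    'right' or 'wrong'. Returns the per-question error rate."""
    wrong, total = Counter(), Counter()
    for student in class_results:
        for qno, category in student.items():
            total[qno] += 1
            wrong[qno] += (category == 'wrong')
    return {qno: wrong[qno] / total[qno] for qno in total}

print(error_rates([{'4': 'wrong', '5': 'right'},
                   {'4': 'wrong', '5': 'wrong'}]))   # {'4': 1.0, '5': 0.5}
```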
According to a second aspect of the present invention, there is provided an image processing apparatus including: a memory, a processor, and a program stored in the memory and executable on the processor, where the program, when executed by the processor, implements the steps of the image processing method according to any one of the above technical solutions. The image processing apparatus therefore has all the advantages of that method, which are not repeated here.
According to a third aspect of the present invention, there is also provided a terminal including the image processing apparatus of the second aspect. The terminal has all the advantages of the image processing method according to any one of the above technical solutions, which are not repeated here.
According to a fourth aspect of the present invention, there is also provided a computer-readable storage medium on which a computer program is stored, the computer program, when executed, implementing the image processing method defined in any one of the technical solutions of the first aspect.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 shows a schematic flow diagram of an image processing method according to an embodiment of the invention;
FIG. 2 shows a schematic flow diagram of an image processing method according to another embodiment of the invention;
FIG. 3 illustrates a blank job image of an image processing method according to one embodiment of the present invention;
FIG. 4 illustrates a layout analysis result of an image processing method according to an embodiment of the present invention;
FIG. 5 shows layout 1 of the results of layout analysis of an image processing method according to one embodiment of the present invention;
FIG. 6 shows layout 2 of the results of layout analysis of an image processing method according to one embodiment of the present invention;
fig. 7 shows a text line detection result of layout 1 of the image processing method according to an embodiment of the present invention;
FIG. 8 illustrates a corrected job image of an image processing method according to one embodiment of the present invention;
FIG. 9 illustrates a recognition result of a red-stroke trace recognition model of an image processing method according to an embodiment of the present invention;
FIG. 10 shows a schematic block diagram of an image processing apparatus according to an embodiment of the present invention;
FIG. 11 shows a schematic block diagram of a terminal according to one embodiment of the present invention;
FIG. 12 shows a schematic block diagram of a computer-readable storage medium according to an embodiment of the invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
Example one
As shown in fig. 1, an image processing method according to an embodiment of the present invention includes:
Step 102, acquiring text content in an image according to a natural language processing algorithm;
Step 104, determining features of the text according to the text content;
Step 106, identifying mark information in a marked image according to a deep-learning-based target detection algorithm;
Step 108, determining the association among the features of the text, the mark information and the text content to generate a recognition result.
In this embodiment, the image is detected and recognized by combining a plurality of image processing methods (at least a natural language processing algorithm and a deep-learning-based target detection algorithm), so that the mark information in the image (such as correction traces, annotation handwriting and annotation symbols) and the text content associated with it (such as test questions and character strings) can be identified, and the target content extracted. Specifically, the natural language processing algorithm performs text detection on the image and determines the text boxes (text regions) of the text content, and the text content is then analyzed to determine the features of the text; processing the marked image on this basis greatly improves efficiency. Further, positioning can be performed according to the features of the text (e.g. question number, position) to obtain the corresponding text content and mark information.
According to the image processing method of this embodiment, optionally, determining the association among the features of the text, the mark information and the text content to generate the recognition result specifically includes: extracting the corresponding text content from the marked image according to the text content and the features of the text; positioning the text in the marked image according to the features and determining the region of the text, and determining the region of the mark according to the mark information; if the ratio of the intersection area of the text region and the mark region to the area of the mark region is greater than or equal to a preset proportion, determining that the mark information is associated with the text content; and generating a recognition result from the text content and the mark information.
In this embodiment, the correspondence between multiple pieces of mark information and multiple sections (lines) of text content is determined from their degree of overlap, so that the target text content (for example, a wrong question, or a text line with annotation handwriting) can be extracted from the image according to the mark information. This saves the time of manually searching, extracting and recording, and greatly improves the efficiency of information acquisition and of learning. Specifically, the area of the mark region is obtained from the detected mark information, and the area of the text region is obtained from the detected features of the text and the text content; if the intersection of the text region and the mark region is greater than or equal to a preset proportion of the mark region, for example 30%, i.e. the overlap accounts for more than 30% of the mark region, the two can be determined to be associated.
The image processing method according to the above embodiment, optionally, further includes: and determining the meaning of the marked information, and storing the text content corresponding to the marked information according to the meaning of the marked information.
In this embodiment, the meaning of the mark information is further determined after the mark information is identified. The mark information can include a question score, a correction symbol or comment characters; for example, the meaning can be determined from the correction symbol "√" or "×": the text content is evaluated as correct, or the text content is evaluated as wrong. Text content bearing the correction symbol "×" can thereby be filed into the wrong-question set.
The image processing method according to the above embodiment, optionally, further includes: determining layout information of the image by using a straight line detection algorithm and a blank area detection algorithm to generate layout coordinates; and segmenting the image according to the layout coordinates.
In this embodiment, the layout information of the image is detected with a straight-line detection algorithm and a blank-area detection algorithm, and the text content is then recognized layout by layout, which improves recognition accuracy for multi-column text.
According to the image processing method of the above embodiment, optionally, obtaining text content in the image according to a natural language processing algorithm specifically includes: performing text line detection on the image to determine text line information; and performing optical character recognition according to the text line information to determine text content.
In this embodiment, edge contour detection is performed on a binarized version of the image to determine a plurality of rectangular boxes corresponding to the text in the image; text boxes can be predicted from these rectangles, and text line information is obtained once the rectangles are merged; OCR (optical character recognition) is then performed on the text line information to obtain the text content. Alternatively, text line detection can be performed on the image with a CTPN model to determine the text lines, and the text content can be determined from the text line information with an OCR model. The CTPN and OCR models can be updated and maintained on deep-learning principles, improving the accuracy of both text line detection and character recognition.
According to the image processing method of the embodiment, optionally, identifying the mark information in the image with the mark according to a target detection algorithm based on deep learning specifically includes: building a marking information identification model according to the characteristic pyramid network; training a label information recognition model; and inputting the image with the mark into a mark information identification model, and outputting the mark information by the mark information identification model.
In the embodiment, the target detection algorithm based on deep learning is embodied as a mark information recognition model, the mark information recognition model built according to a Feature Pyramid Network (FPN) model is trained by using a training set, and an image with marks is used as the input of the model, so that the mark information can be accurately recognized.
According to the image processing method of the above embodiment, optionally, the tag information recognition model recognizes the tag information based on the color information, or the tag information recognition model recognizes the tag information based on a font difference or a font size difference between the text content and the tag information.
In this embodiment, given the differences between mark information and text information, text and marks can be distinguished by colour, or by features such as typeface and font size, so that the mark information is recognized in the image. For example, marks on a test paper (correction handwriting) are generally red, so test-paper images with red correction handwriting are used as the training set, and the trained mark information recognition model recognizes mark information by colour. As another example, if a handwritten comment appears near a text line, the handwriting differs from the printed text in both typeface and size, so text images with handwritten comments are used as the training set, and the trained model recognizes mark information from the typeface or font-size difference between the text content and the mark information.
According to the image processing method of the above embodiment, optionally, the image includes at least one of: unanswered test question images, answered test question images, unmarked test question images, marked test question images, unmarked text images and marked text images, wherein the marked images comprise at least one of the following images: a test question image with a mark and a text image with a mark.
In this embodiment, the step of extracting text content with the natural language processing algorithm can use an unanswered test-question image, an answered test-question image, an unmarked test-question image, a marked test-question image, an unmarked text image or a marked text image as material to determine the text content and the features of the text, while the deep-learning-based target detection algorithm takes the marked test-question image and the marked text image as material and can preprocess them with reference to the results of the natural language processing algorithm in order to extract the mark information. It should be noted that the text-extraction step preferably uses an unanswered test-question image, an unmarked test-question image or an unmarked text image as material, which reduces interference and improves the accuracy and efficiency of text recognition.
According to the image processing method of the foregoing embodiment, optionally, the text content specifically includes a question or a character string, and the feature of the text specifically includes a question number or character position information.
The image processing method according to the above embodiment optionally further includes: establishing a wrong-question set according to the mark information and the text content, or compiling answering statistics according to the relation between the mark information and the text content, where the image is a test-question image, the marked image is a corrected test-question image, the feature of the text is the question number, and the mark information corresponds to the correction traces.
In this embodiment, a scanned image of a test paper is used as the material for image processing, the wrong questions in the paper can be extracted according to the mark information (correction traces), and the wrong questions can be gathered into a wrong-question set, helping learners consolidate error-prone knowledge points and improving learning efficiency. In addition, the questions with high error rates (the answering statistics) can be counted from the question numbers and the mark information; for teachers, weak links in teaching can be found from these statistics, so that teaching content can be arranged in a targeted way, which helps improve teaching quality.
Example two
Fig. 2 is a schematic flow chart of an image processing method in a question-recognition application scenario according to another embodiment of the present invention, which includes: step 202, inputting a blank job image; step 204, performing layout analysis on the image to obtain the rectangular area of each layout; step 206, performing text line detection on each layout; step 208, performing OCR on the text lines of each layout and combining the results to obtain the text; step 210, extracting question information from the text according to the question-number features; step 212, inputting the answered, red-pen-corrected job image and preprocessing it with the blank job image as the reference to obtain the information of each question; step 214, recognizing the red-pen correction traces in the image with a pre-trained deep-learning detection model to obtain the correction results; and step 216, comparing the coordinates of the correction results with those of each question to obtain the correction information of each question.
The image processing method of the present application is explained in detail with reference to figs. 3 to 9. The blank job image in step 202 is shown in FIG. 3; the blank job image and the answered, red-pen-corrected job image are the same test paper. In step 204, the layout analysis of the image is illustrated in fig. 4: the line in the middle of the test paper, i.e. the separation line, is used to divide the image, and the layout analysis produces two layouts, layout 1 (see fig. 5) and layout 2 (see fig. 6), with layout areas: [235,5,1505,2330],[1746,5,1559,2330]. The result of text line detection (for layout 1) in step 206 is shown in fig. 7. In step 208, OCR recognition yields the texts, which are combined to give the following result:
[ [' ______ school 2013-
[ [ 'seventh grade _______', [104,120,363,52] ]
[ '(test time minute, full mark', [104,410,363,32] ])
[[”,[211,467,1172,230]]]
[ [ 'Note: using blue, black steel' [85,713,846,33] ]
[ [ 'I, choice questions (total 9 small questions, total 45.0 points of the main subject)', [77,759,561,30] ]
If the set a [ ' 1] { xr \ ^ 2-good-3 <0}, B ═ X [ X-3>0}, then a ═ B ═ a ', [87,802,906,34] ] [ ' 1]
[['(-,-',[251,857,90,51]]]
[ [ '2. function V ═ 2x \ 2-e \ at [ -2', [84,932,634,34] ]
[['A.',[142,985,31,276]]]
[[”,[193,1291,307,272]],['D',[784,1291,24,272]]]
[ '3. known series of equidifferences { a-first 9 terms', [84,1584,817,32] ]
[['A.100',[142,1629,85,26]]]
[ '4.', [83,1683,1041,46] ], [ ", [628,1683,38,46] ] ] of the function V ═ 2sm (X ═ div) ]
[['A.',[141,1770,32,46]]]
[ [ '5. interior angles A, B, Ci' of dABC, [84,1856,1196,40] ]
[[”,[1096,1890,12,17]]]
[['v',[250,1930,34,32]]]
[[”,[1173,1976,295,346]]]
Part of [ [ '6. function y ═ Asim/ox-p'), [84,1974,689,32] ]
[['A.',[141,2032,31,44]]]
[ [ [ 'v ═ 2s such as/x-', [244,2119,198,44] ]
[ [ 'page 1/page 4' altogether, [703,2205,141,25] ]
[['C.',[79,209,28,44]]]
[['V=2snr-',[182,295,145,44]]]
[ '7. known even function) in the interval [ r,', [20,382,1114,51] ], [ ", [841,382,28,51] ] [ ])
[['I2',[200,469,38,17]]]
[[”,[505,469,69,51]],['C.',[720,468,29,52]]]
[[”,[186,484,68,36]]]
[ [ '8 ] let the straight line l pass through one' of the ellipses, [21,556,1388,47] ]
[[”,[504,643,10,17]],[”,[823,643,14,16]],[”,[1146,642,13,17]]]
[['A.',[78,654,31,26]]]
[[”,[504,677,12,16]],[”,[824,677,12,16]],[”,[1146,676,13,13]]]
[[”,[1041,719,162,219]],[”,[1306,718,97,220]]]
[' 9] is composed of a cylinder and a cone as shown in the figure, [20,717,961,29] ]
[ [ 'is:)', [79,754,93,36] ]
[ [ '20', [185,849,46,24] ]
[['B.',[78,893,29,26]]]
[['28T',[183,937,46,23]]]
[['327',[185,958,46,99]],[”,[1114,958,98,99]]]
[ 'two, space filling problem (the main problem is totally 4 small problem, totally 20.0 points)', [13,1116,560,30] ], [ ", [24,1116,21,30] ]
The opposite sides of the internal angles a, B, C of [ [ [ '10.1ABC are a, B, C, respectively, if cosA-, cosC', [23,1171,917,49] ], [ ", [816,1172,21,48] ], [", [950,1171,32,49] ], [ 'a', [1002,1171,261,49] ] [ ], respectively
[ [ '11 ] known hyperbolas C: has ', [23,1259,277,52] ], the right term ' of [ '5- [ > la >0, b > ] ', [280,1259,1119,52] ], [ ", [329,1259,25,52] ] ]
[ [ 'an asymptote intersects M, N two points.', [80,1332,883,33] ]
[ '12 ] if the straight line y-x-b is the curve p', [23,1376,1153,32] ]
[ [ '13. curve V ═ x \ 2-points/], 2 at,', [23,1432,637,51] ], [ ", [243,1432,38,51] ] [ ]
[ [ 'three, answer questions (10 questions in total and 120.0 points in total for the present general question)', [13,1506,588,30] ], [ ", [23,1506,23,30] ], [", [25,1506,19,30] ] [ ]
The opposite sides of the interior angles a, B, C of [ [ '14.dABC are a, B, C, respectively, known as 2 coscrosb-bcos 4) ═ C.', [23,1550,1057,32] ]
[ [ '(I calculating C;', [80,1594,111,30] ])
[ [' l II fibular ═ area of dABC, [80,1650,674,40] ]
[[”,[487,1686,12,16]]]
[ [ [ '15. internal angles A, B, C', [23,2041,978,50] ]of dABC
['ll) to obtain cosB; ',[79,2115,128,32]]]
[ [ '2) if a-c is 6, the area of Δ ABC', [79,2158,511,32] ]
[ [ 'page 2/page 4' altogether, [648,2205,142,25] ]
In step 210, the question information is extracted according to the question number features, and the result is as follows:
[ 'one', [ 'one, choice questions (total 9 small questions, total 45.0 points in the main subject)' ] [312,756,1428,43] ]
If the set a ═ xr \ Λ 2-good-3 <0} and B ═ X [ X-3>0} are given, a ═ B ═ a ', ' (, - ' ], [322,807,1418,114]
[ '2', [ '2 ] the function V ═ 2x \ 2-e \ at [ -2,', 'a.', 'D' ], [319,929,1421,645] ]
[ ' 3], [ ' 3] known arithmetic progression { a-first 9 term ', ' A.100' ], [319,1583,1421,87] ]
'4', '4' of a function V2 sm (X ÷ a.), [318,1678,1422,159]
[ '5.', [ '5. internal angle A, B, Ci', ", 'v' ], [319,1845,1421,124] ]ofdABC
The section of [ ' 6], [ Asim/ox-p) function, ", [ a ], [ v ], [ 2s such as/x- ', [ page 1/4 ' ], [ [319,1977,1421,350], [1766,214,1523,157] ], ]
[ '7', [ '7 ] known even functions) in the interval [ r,', 'c.', 'I2', "], [1766,379,1539,162] ]
[ '8', [ '8 ] a straight line l is taken through one of the ellipses,', ", [1767,550,1538,156] ]
The figure is composed of a cylinder and a cone, ',' is:), '20', 'B.', '28T', '327' ], [1766,714,1539,399] ], and
[ 'two', [ 'two, gap filling questions (total 4 small questions, total 20.0 points in the main subject)' ], [1759,1121,1546,38] ]
The opposite sides of the inner corners a, B and C of [ '10 ', [ '10.1ABC are a, B and C, respectively, and if cosA and cosC are l, B is ═ l, [1769,1168,1536 and 72] ]
The right term of a known hyperbola C: already 5- (la >0, b >), 'an asymptote intersects M, N two points', [1769,1249,1536,122]
[ '12 ], [ '12 ] if the straight line y ═ x-b is the curve p ' ], [1769,1380,1536,41] ]
[ '13 ], [ '13. curve V ═ x \ 2-points/], 2 points ' ], [1769,1429,1536,66] ]
[ ' three ', answer (total 10 small questions, total 120.0 points) of the main question) ' ] [1759,1504,1546,40]
The opposite sides of the inner angles a, B, C of [ '14 ], ['14.dABC are a, B, C, respectively, known as 2 coscracisb-bcos 4) ═ C ',' (I finds C; ', ' lIIfibec ═ area of dABC ', "], [1769,1552,1536,486] ]
Inner angles A, B, C ','ll of [ '15.', [ '15. dABC) to cosB; ', '2) if a-c is 6, Δ ABC area ', ' page 2/page 4 total ' ], [1769,2046,1536,189] ]
The answered, red-pen-corrected job image in step 212 is shown in FIG. 8. In step 214, the pre-trained deep-learning detection model recognizes the red-pen correction traces (see fig. 9), with the following output:
[[[(1747,1648),(2252,1932)],'right'],[[(677,1830),(1049,2085)],'right'],[[(1548,761),(1863,941)],'right'],[[(2562,1323),(2822,1476)],'right'],[[(2658,1144),(2884,1299)],'right'],[[(753,1517),(1014,1709)],'right'],[[(2775,605),(3045,745)],'right'],[[(596,927),(852,1129)],'right'],[[(2382,1273),(2594,1415)],'right'],[[(870,825),(1057,944)],'right'],[[(1173,1794),(1437,1999)],'right'],[[(2494,478),(2712,631)],'right'],[[(2289,700),(2359,752)],'right'],[[(2133,1393),(2273,1535)],'right'],[[(1041,1682),(1164,1836)],'wrong'],[[(245,1002),(348,1116)],'star'],[[(1173,817),(1286,946)],'star']]
In step 216, after the questions and the correction results are aligned by their coordinates, the question correction information is obtained (the association between the mark information and the text content is established). The results are as follows:
[ 'one', [ 'one, choice questions (total 9 small questions, total 45.0 points in the main subject)' ] [312,756,1428,43] ]
If the set a ═ xr \ Λ 2-good-3 <0} and B ═ X [ X-3>0} are given, a ═ B ═ a ', ' (, - ', [322,807,1418,114], ' right ' ]
[ '2', [ '2 ] the function V ═ 2x \ 2-e \ at [ -2,', 'a.', 'D' ], [319,929,1421,645], 'right' ]
[ ' 3], [ ' 3] known arithmetic series { a-first 9 term ', ' A.100' ], [319,1583,1421,87], ' right ' ]
'4', '4' of a function V2 sm (X ═ div), 'a' ], [318,1678,1422,159], 'wrong' ]
[ '5.', [ '5. internal angles A, B, Ci', ", 'v' ], [319,1845,1421,124], 'right' ]of dABC
The section of [ ' 6], [ Asim/ox-p) function y ], ", ' a. ', ' v ═ 2s such as/x- ', ' page 1/4 total ' ], [ [319,1977,1421,350], [1766,214,1523,157] ], ' right ' ]
[ '7', [ '7 ] known even functions) are in the interval [ r,', 'c.', 'I2', "], [1766,379,1539,162], 'right' ]
[ '8', [ '8 ] a straight line l is taken through one of the ellipses,', [1767,550,1538,156], 'right' ]
The figure is composed of a cylinder and a cone, ',' is:), '20', 'B.', '28T', '327', [1766,714,1539,399], 'right' ]
[ 'two', [ 'two, gap filling questions (total 4 small questions, total 20.0 points in the main subject)' ], [1759,1121,1546,38] ]
The opposite sides of the inner corners a, B and C of [ '10 ], [ '10.1ABC are a, B and C, respectively, and if cosA and cosC are l, B is ' ], [1769,1168,1536 and 72], ' right ' ]
The right term of the known hyperbola C: already 5- (la >0, b >), 'an asymptote intersecting M, N two points', [1769,1249,1536,122], 'right' ]
[ '12 ], [ '12 ] if the straight line y ═ x-b is the curve p ' ], [1769,1380,1536,41], ' right ' ]
[ '13 ], [ '13. curve V ═ x \ 2-points/], 2' ], [1769,1429,1536,66], ' right ' ]
[ ' three ', answer (total 10 small questions, total 120.0 points) of the main question) ' ] [1759,1504,1546,40]
The opposite sides of the inner angles a, B, C of [ '14 ], ['14.dABC are a, B, C, respectively, known as 2 coscracisb-bcos 4) ═ C ',' (I finds C; ', ' lIfibular ═ area of dABC ', "], [1769,1552,1536,486], ' wrong ' ]
Inner angles A, B, C ','ll of [ '15.', [ '15. dABC) to cosB; ', '2) if a-c is 6, Δ ABC area ', ' page 2/4 total pages ' ], [1769,2046,1536,189], ' right ' ]
Example three
The image processing method according to the second embodiment specifically includes the following steps:
1. Input an original image img and perform layout analysis to obtain all layout rectangular areas. The layout analysis method is as follows:
1.1 Definition of layout separators
The layout separators may be: blank areas, dashed lines, or straight lines exceeding a specified length.
1.2 Two-column layout detection (given priority)
1.2.1 Take the middle strip middle_img of the image img, with width 1/5 of img's width and the same height as img;
1.2.2 Detect the page dividing line in middle_img as follows (a code sketch of this step is given after step 1.3):
1.2.2.1 Blank-area detection: binarize middle_img to obtain image binary_img, and project binary_img vertically (counting the zero-valued pixels in each column) to obtain a projection array; if the array contains an interval wider than a preset value, the position of that interval is the layout divider layout_line;
1.2.2.2 Straight-line and dashed-line detection: apply Gaussian-blur denoising to middle_img to obtain image img3, detect straight lines with OpenCV's Hough line transform, and filter out lines whose inclination is outside [70, 110] degrees or whose length is below 50, giving a line set line_set. Traverse line_set; if line[i] satisfies either of the following conditions, the position of that line is the layout divider layout_line:
1) the line is longer than 4/5 of the height of image img3;
2) the line is longer than 2/3 of the height of image img3 and the area below it is entirely blank, while the area above it is a title, and the title height plus the line length plus the height of the blank area below exceeds 4/5 of the height of image img3;
1.2.2.3 If a divider layout_line is detected, split the image img along it into 2 rectangular areas rect1 and rect2;
1.2.3 If step 1.2.2 detects no dividing line, analyze for a three-column layout;
1.2.3.1 Take the image left_img at 1/3 of the width of image img, with width 1/5 of img's width and the same height as img;
1.2.3.2 Repeat step 1.2.2.2 to detect divider layout_line1. If detection succeeds, take the image right_img at 2/3 of the width of image img (width 1/5 of img's width, same height as img) and repeat step 1.2.2.2 to detect divider layout_line2. If both detections succeed, split the image img into three columns along the dividers layout_line1 and layout_line2;
1.3 If both the two-column and the three-column detection fail, the entire img is a single column.
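A condensed sketch of steps 1.2.2.1 and 1.2.2.2 follows, assuming OpenCV 4; the blank-run width, Canny thresholds and Hough parameters are illustrative assumptions, and the title-plus-blank variant of the line rule is omitted for brevity.

```python
import cv2
import numpy as np

def find_divider(img, blank_run=40):
    """Return the x coordinate of a two-column divider, or None."""
    h, w = img.shape[:2]
    middle = img[:, 2 * w // 5:3 * w // 5]              # central strip, width w/5
    gray = cv2.cvtColor(middle, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # 1.2.2.1 Blank-area rule: a wide run of ink-free columns is a divider.
    ink_per_column = binary.sum(axis=0)
    run = 0
    for x, ink in enumerate(ink_per_column):
        run = run + 1 if ink == 0 else 0
        if run >= blank_run:
            return 2 * w // 5 + x - run // 2
    # 1.2.2.2 Line rule: a long, near-vertical Hough line is a divider.
    edges = cv2.Canny(cv2.GaussianBlur(gray, (5, 5), 0), 50, 150)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, 80,
                            minLineLength=50, maxLineGap=10)
    for x1, y1, x2, y2 in (lines.reshape(-1, 4) if lines is not None else []):
        angle = abs(np.degrees(np.arctan2(y2 - y1, x2 - x1)))
        if 70 <= angle <= 110 and abs(y2 - y1) > 4 * h / 5:
            return 2 * w // 5 + (x1 + x2) // 2
    return None                                         # treat as single column
```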
2. Perform text line detection on each layout
Record the detected layout results as layout_records; traverse layout_records and analyze the text lines of layout_records[i]:
2.1 Binarize the picture img to obtain picture binary_img2;
2.2 Obtain the set of outer contours of picture binary_img2 with OpenCV's findContours function;
2.3 Traverse the contours and take the bounding rectangle of contours[i] to obtain a list of rectangular boxes rects;
2.4 Merge the rectangular boxes in rects: merge two rectangles if their vertical distance is less than 8 and the centre point of one lies within the horizontal range of the other;
2.5 Compute the text line height: take the heights of the rectangles in rects, remove abnormal maximum and minimum values, and count how many heights fall in the range [height, height + C]; the height with the largest count is the text line height (C is an empirically chosen constant);
2.6 Using text line height × F as a reference (F is a constant greater than 1, e.g. 1.4), remove the rectangular boxes exceeding text line height × F; the remaining boxes are candidate text boxes;
2.7 Merge the rectangular boxes in rects on the basis of text line height × 2: merge two rectangles if their vertical distance is less than 8 and their horizontal distance is less than text line height × 2;
2.8 Traverse the rectangles in rects from left to right; if the centre point of the current rectangle lies approximately on the same straight line as that of the previous one, merge the current rectangle into the first rectangle to obtain a partial text line;
2.9 Apply step 2.8 recursively to obtain the complete text lines (a simplified code sketch of steps 2.1-2.4 follows).
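A simplified sketch of steps 2.1-2.4, assuming OpenCV 4; the height statistics of 2.5-2.6 and the recursive merging of 2.7-2.9 are left out, and the same-line test approximates the text's rule (vertical distance below 8 px, centre containment).

```python
import cv2

def text_line_boxes(img):
    """Binarize, box the outer contours, and merge boxes on the same line."""
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    rects = sorted((cv2.boundingRect(c) for c in contours),
                   key=lambda r: (r[1], r[0]))          # top-to-bottom, left-to-right

    def same_line(a, b):
        ax, ay, aw, ah = a
        bx, by, bw, bh = b
        # Vertical distance below 8 px and one centre inside the other's span.
        return abs(ay - by) < 8 and (ax <= bx + bw // 2 <= ax + aw or
                                     bx <= ax + aw // 2 <= bx + bw)

    lines = []
    for r in rects:
        for i, line in enumerate(lines):
            if same_line(line, r):                      # grow the existing line box
                x = min(line[0], r[0]); y = min(line[1], r[1])
                x2 = max(line[0] + line[2], r[0] + r[2])
                y2 = max(line[1] + line[3], r[1] + r[3])
                lines[i] = (x, y, x2 - x, y2 - y)
                break
        else:
            lines.append(r)
    return lines
```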
3. Perform OCR on the text lines of each layout and merge the results into the final job text text_list
3.1 Traverse the text lines text_lines[i] of each layout, take the maximum bounding rectangular area max_line_rect, and crop the corresponding image part_img from img;
3.2 Feed part_img into a pre-trained OCR model to produce a text result;
3.3 Merge the texts to obtain the final text_list.
4. Extract topic information from text_list
Topic information definition: sequence number, region (rectangular coordinates), topic content, etc.;
4.1 traverse the text and take out each text line matching the following rule as the starting position of a candidate topic sequence number;
rule: "Arabic numeral | Chinese numeral" followed by a separator such as "." or "、";
4.2 filter the candidate sequence numbers by the following features;
4.2.1 take the abscissas of the regions obtained in 4.1, compute their average to obtain a threshold coordinate_x, and delete any candidate whose abscissa is clearly larger or smaller than coordinate_x;
4.2.2 if the Arabic-numeral sequence numbers do not increase in order, delete the offending candidates;
4.2.3 if the Chinese-numeral sequence numbers do not increase in order, delete the offending candidates;
4.2.4 the ending position of each topic is the starting position of the next topic's sequence number; if the end of the layout is reached, the end of the layout is taken as the end of the topic region;
4.3 through 4.2 the topic sequence numbers and their corresponding regions are obtained; the text within a topic region is the topic content, and together this constitutes the topic information. A sketch of steps 4.1 and 4.2.1 follows.
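A sketch of the candidate rule in 4.1 and the abscissa filter in 4.2.1. The regular expression, the separator set and the pixel tolerance are assumptions, since the patent states the rule only informally:

```python
import re

# 4.1 candidate rule: a line starting with an Arabic or Chinese numeral
# followed by a separator such as "." or "、" (this regex is an assumption).
SEQ_RE = re.compile(r"^\s*([0-9]+|[一二三四五六七八九十]+)\s*[.、,，．]")

def candidate_topics(text_lines_with_rects):
    """text_lines_with_rects: list of (text, (x, y, w, h)) per text line."""
    candidates = [(m.group(1), rect)
                  for text, rect in text_lines_with_rects
                  if (m := SEQ_RE.match(text))]
    if not candidates:
        return []
    # 4.2.1: keep candidates whose abscissa is close to the mean coordinate_x.
    coordinate_x = sum(r[0] for _, r in candidates) / len(candidates)
    tolerance = 30  # pixels; an empirical assumption
    return [(n, r) for n, r in candidates if abs(r[0] - coordinate_x) <= tolerance]

lines = [("1. Solve for x: ...", (52, 120, 300, 24)),
         ("2. Compute ...", (50, 400, 280, 24))]
print(candidate_topics(lines))
```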
5. Image pre-processing
5.1 input the blank assignment image (empty_img) and the corresponding assignment image answered by the student and corrected (corrected_img);
5.2 taking empty_img as the standard, scale corrected_img to the same size as empty_img;
5.3 the topic information of empty_img is then also the topic information of corrected_img, as in the sketch below.
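A minimal sketch of step 5.2; the file names are placeholders, and the alignment assumes the two scans differ only in scale:

```python
import cv2

empty_img = cv2.imread("empty.png")          # blank assignment page
corrected_img = cv2.imread("corrected.png")  # answered and red-pen corrected page
# 5.2: scale the answered page to the blank page's size, so the topic
# regions located on empty_img also apply to corrected_img.
h, w = empty_img.shape[:2]
corrected_img = cv2.resize(corrected_img, (w, h))
```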
6. Identifying red-pen correction traces
6.1 use the image corrected_img from 5.2 as the input of the model; the model's recognition result is the red-pen correction result, each result represented as [[(x1, y1), (x2, y2)], 'red_category'], where the former is the rectangle's coordinates and the latter is the category. The current categories consist of the scores (0-9) and the right/wrong marks (√, ×);
6.2 Red-pen recognition model
6.2.1 model data
(1) label data using 5000 pictures of real data;
(2) the labels are: 0-9, √, ×, for a total of 12 categories (red_category);
6.2.2 model network
The network model is built using an FPN (Feature Pyramid Network):
Paper: Feature Pyramid Networks for Object Detection
Link: https://arxiv.org/abs/1612.03144
The FPN architecture can be divided into three parts: a bottom-up convolutional neural network, a top-down pathway, and lateral connections between the two. The bottom-up part is simply the forward pass of the convolutional neural network. During the forward pass, the feature map changes size after some layers and keeps its size through others; the authors group the layers that do not change the feature map's size into one stage, so the feature extracted at each level is the output of the last layer of its stage, and these outputs form the feature pyramid. Specifically, for ResNets, the authors use the feature activations output by the last residual block of each stage. These residual block outputs are denoted {C2, C3, C4, C5}, corresponding to the outputs of conv2, conv3, conv4 and conv5.

The top-down pathway performs upsampling, almost always by interpolation: new elements are inserted between the pixels of the original feature map by a suitable interpolation algorithm, enlarging it. Upsampling a feature map makes it the same size as the feature map of the next, finer level.

The lateral connection process is shown in the following figures. Essentially, the upsampled result is fused with the feature map generated bottom-up: the corresponding feature map from the convolutional neural network is put through a 1 × 1 convolution and added to the upsampled feature map, giving a new feature map that fuses features of different levels and carries richer information. The purpose of the 1 × 1 convolution here is to change the number of channels so that they match the channels of the map it is fused with. After fusion, each fused result is convolved with a 3 × 3 kernel to eliminate the aliasing effect of upsampling, yielding the final new feature map. Iterating level by level produces several new feature maps, denoted P2, P3, P4 and P5, in one-to-one correspondence with the original bottom-up convolution results C2, C3, C4 and C5. All levels of the pyramid share the classification layer (and regression layer). A minimal sketch of the top-down pathway follows.
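The top-down pathway and lateral connections can be sketched as below. This is a generic FPN fragment in PyTorch, not the patent's model: the ResNet channel counts and the nearest-neighbor interpolation are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPNTopDown(nn.Module):
    """Top-down pathway and lateral connections over backbone outputs
    C2..C5, producing P2..P5 (a sketch; channel sizes are assumptions)."""
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        # 1x1 convolutions unify the channel count of each Ci.
        self.lateral = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels)
        # 3x3 convolutions smooth each fused map to reduce upsampling aliasing.
        self.smooth = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
            for _ in in_channels)

    def forward(self, c2, c3, c4, c5):
        laterals = [l(c) for l, c in zip(self.lateral, (c2, c3, c4, c5))]
        outs = [laterals[-1]]                # start from C5's lateral map
        for lat in reversed(laterals[:-1]):  # then C4, C3, C2
            up = F.interpolate(outs[0], size=lat.shape[-2:], mode="nearest")
            outs.insert(0, lat + up)         # fuse upsampled map with lateral
        return [s(o) for s, o in zip(self.smooth, outs)]  # P2..P5

# Usage with dummy ResNet-shaped feature maps:
c2 = torch.randn(1, 256, 200, 200)
c3 = torch.randn(1, 512, 100, 100)
c4 = torch.randn(1, 1024, 50, 50)
c5 = torch.randn(1, 2048, 25, 25)
p2, p3, p4, p5 = FPNTopDown()(c2, c3, c4, c5)
```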
6.2.3 model training
(1) divide the data labeled in 6.2.1 into a training set and a validation set in a 9:1 ratio, as in the sketch after this list;
(2) the number of model training rounds (epochs) is 200,000;
(3) the training loss of the final model drops below 0.002 and converges.
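A minimal sketch of the 9:1 split in (1); the sample list is a placeholder:

```python
import random

samples = [f"img_{i:04d}.png" for i in range(5000)]  # placeholder file names
random.seed(0)                        # fixed seed for a reproducible split
random.shuffle(samples)
split = int(len(samples) * 0.9)       # 9:1 train/validation ratio
train_set, val_set = samples[:split], samples[split:]
print(len(train_set), len(val_set))   # 4500 500
```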
7. Extracting correction information
7.1 traverse each topic's information; if the intersection area between a correction result's region and a topic region is greater than or equal to 0.5 times the area of the correction result's region, that correction trace is correction information for the topic;
7.2 when one topic has multiple pieces of correction information, the following rules determine the final correction result:
7.2.1 for a topic with score marks, take the maximum score as the final correction result;
7.2.2 for a topic containing both √ and ×, take × as the final correction result;
7.2.3 for a topic containing only ×, take × as the final correction result;
7.2.4 for a topic containing only √, take √ as the final correction result;
7.3 for a topic with no correction information, × is taken by default as the final correction result;
7.4 finally, the correction information of each topic is obtained, expressed as:
[topic number, topic content, region, correction result]. A sketch of steps 7.1-7.4 follows.
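A sketch of step 7: the 0.5 overlap threshold and the precedence of rules 7.2.1 to 7.3 follow the text, while the data shapes and function names are assumptions.

```python
def overlap_ratio(mark_rect, topic_rect):
    """Intersection area divided by the mark's own area (step 7.1)."""
    x1, y1, w1, h1 = mark_rect
    x2, y2, w2, h2 = topic_rect
    iw = max(0, min(x1 + w1, x2 + w2) - max(x1, x2))
    ih = max(0, min(y1 + h1, y2 + h2) - max(y1, y2))
    return iw * ih / (w1 * h1)

def final_result(marks):
    """Rules 7.2-7.3 over one topic's red_category strings, e.g. ['7', '×']."""
    scores = [int(m) for m in marks if m.isdigit()]
    if scores:
        return str(max(scores))    # 7.2.1: highest score wins
    if "×" in marks:
        return "×"                 # 7.2.2 / 7.2.3: any × makes the topic wrong
    if "√" in marks:
        return "√"                 # 7.2.4: only √ present
    return "×"                     # 7.3: no correction info defaults to ×

def assign_marks(topics, mark_results, threshold=0.5):
    """topics: [(number, content, rect)]; mark_results: [(rect, category)]."""
    records = []
    for number, content, topic_rect in topics:
        marks = [cat for rect, cat in mark_results
                 if overlap_ratio(rect, topic_rect) >= threshold]
        records.append([number, content, topic_rect, final_result(marks)])
    return records
```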
Example four
As shown in fig. 10, an image processing apparatus 1000 according to an embodiment of the present invention includes: a memory 1002, a processor 1004 and a program stored on the memory 1002 and executable on the processor 1004, the program implementing the steps of the image processing method according to any of the embodiments as described above when executed by the processor 1004. The image processing apparatus 1000 includes all the advantages of the image processing method according to any of the above embodiments, and will not be described herein again.
Example five
As shown in fig. 11, a terminal 1100 according to an embodiment of the present invention includes: the image processing apparatus 1000 described above. The terminal 1100 can implement: inputting the image with the specified format into a text detection model, and outputting the text content in the image and coordinate position information corresponding to the text content by the text detection model; and filtering out non-character information in the text content according to the coordinate position information. The terminal 1100 includes all the advantages of the image processing method according to any of the embodiments described above, and details are not repeated herein.
Example six
As shown in fig. 12, according to an embodiment of the present invention, there is further provided a computer-readable storage medium 1200, on which a computer program 1202 is stored, wherein the computer program 1202 implements the image processing method defined in any one of the above embodiments when executed.
In this embodiment, the computer program 1202 when executed implements: acquiring text content in the image according to a natural language processing algorithm; determining the features of the text according to the text content; identifying mark information in the image with the mark according to a target detection algorithm based on deep learning; and determining the association relation among the features of the text, the mark information and the text content to generate a recognition result. The image and the image with the mark correspond to each other: they have the same typesetting and partially similar content.
The embodiment of the application combines a plurality of image processing methods (at least including a natural language processing algorithm and a target detection algorithm based on deep learning) to detect and identify the image; it can identify the mark information in the image (such as correction traces, annotation handwriting and annotation symbols) and the text content associated with the mark information (such as test questions and character strings), and further extract the target content. Specifically, the natural language processing algorithm performs text detection on the image, determines the text boxes (text regions) of the text content in the image, and further analyzes the text content to determine the features of the text; when the image with the mark is processed on this basis, the processing efficiency is greatly improved. Further, positioning can be performed according to the features of the text (e.g. title, position), so as to obtain the corresponding text content and mark information. Natural Language Processing (NLP) is a sub-field of artificial intelligence; here text detection is performed based on the deep learning principle. The target detection algorithm is used to identify the mark information in the image; the mark information and the text content generally differ visibly, and the mark information can be identified from this difference.
According to the computer program 1202 of the foregoing embodiment, optionally, determining an association relationship among the feature of the text, the mark information, and the text content to generate the recognition result includes: extracting corresponding text content from the marked image according to the text content and the features of the text; positioning the text in the image with the mark according to the features, determining the region of the text, and determining the region of the mark according to the mark information; if the ratio of the intersection area of the text region and the mark region to the area of the mark region is greater than or equal to a preset proportion, determining that the mark information is associated with the text content; and generating a recognition result according to the text content and the mark information.
In this embodiment, the correspondence between multiple pieces of mark information and multiple sections (lines) of text content is determined from the degree of overlap between the mark information and the text content, so that the target text content (for example, a wrong question or a text line with annotation handwriting) can be extracted from the image according to the mark information. This saves the time of manually searching for and copying records, greatly improves the efficiency of information acquisition, and thus improves learning efficiency. Specifically, the area of the mark information's region is obtained from the detected mark information, and the area of the text content's region is obtained from the detected features of the text and the text content; if the intersection of the text region and the mark region is greater than or equal to a preset proportion of the mark region, for example 30%, the two can be determined to be associated.
According to any of the above embodiments, optionally, the method further comprises: and determining the meaning of the marked information, and storing the text content corresponding to the marked information according to the meaning of the marked information.
In this embodiment, the meaning of the mark information is further determined after the mark information is identified; the mark information can include a test question score, a correction symbol or an annotation character. For example, the meaning of the mark information can be determined according to the correction symbol '√' or '×' as follows: the text content is evaluated as correct, or the text content is evaluated as wrong. Text content bearing the correction symbol '×' is thereby classified into the wrong-question set.
According to any of the above embodiments, optionally, the method further comprises: determining layout information of the image by using a straight line detection algorithm and a blank area detection algorithm to generate layout coordinates; and segmenting the image according to the layout coordinates.
In the embodiment, the layout information of the image is detected according to the straight line detection algorithm and the blank area detection algorithm, and the text content of the image is identified according to the layout, so that the identification accuracy of the multi-layout text can be improved.
According to any of the embodiments, optionally, obtaining text content in the image according to a natural language processing algorithm specifically includes: performing text line detection on the image to determine text line information; and performing optical character recognition according to the text line information to determine text content.
In this embodiment, edge contour detection is performed on the binarized image of the image to determine a plurality of rectangular frames corresponding to the text in the image, the text frames can be predicted from these rectangular frames, text line information can be obtained after integrating the rectangular frames, and further, OCR (optical character recognition) is performed on the text line information to obtain text content. In addition, text line detection can also be performed on the image by using the CTPN model to determine text lines, and the content of the text can be determined by using the OCR model according to the text line information. The CTPN model and the OCR model can be updated and maintained based on the deep learning principle, and the text line detection accuracy and the character recognition accuracy are improved.
According to any of the embodiments, optionally, identifying, according to a target detection algorithm based on deep learning, marker information in an image with a marker specifically includes: building a marking information identification model according to the characteristic pyramid network; training a label information recognition model; and inputting the image with the mark into a mark information identification model, and outputting the mark information by the mark information identification model.
In the embodiment, the target detection algorithm based on deep learning is embodied as a mark information recognition model, the mark information recognition model built according to a Feature Pyramid Network (FPN) model is trained by using a training set, and an image with marks is used as the input of the model, so that the mark information can be accurately recognized.
According to any of the above embodiments, optionally, the tag information recognition model recognizes the tag information according to the color information, or the tag information recognition model recognizes the tag information according to a font difference or a font size difference between the text content and the tag information.
In this embodiment, considering the differences between the mark information and the text information, text and marks may be distinguished based on color information or based on font style, font size, and the like, and the mark information recognized from the image. For example, marks (correction handwriting) on a test paper are generally red; test paper images with red correction handwriting are used as a training set so that the mark information recognition model learns to recognize the mark information from color information. A minimal sketch of such a color rule follows. For another example, if a handwritten comment exists near a printed text line and differs from the print in both font and font size, text images with handwritten comments are used as a training set so that the mark information recognition model recognizes the mark information from the font or font size difference between the text content and the mark information.
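A minimal sketch of isolating red-pen pixels by color, as a rule-based stand-in for what the trained model learns; the HSV thresholds are empirical assumptions, not values from the patent:

```python
import cv2

def red_mark_mask(img_bgr):
    """Return a binary mask of red-pen pixels in a BGR image."""
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
    # Red hue wraps around 0 in OpenCV's 0-179 hue range, so use two bands.
    lower = cv2.inRange(hsv, (0, 80, 80), (10, 255, 255))
    upper = cv2.inRange(hsv, (170, 80, 80), (179, 255, 255))
    return cv2.bitwise_or(lower, upper)
```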
According to any of the above embodiments, optionally, the image comprises at least one of: unanswered test question images, answered test question images, unmarked test question images, marked test question images, unmarked text images and marked text images, wherein the marked images comprise at least one of the following images: a test question image with a mark and a text image with a mark.
In this embodiment, the extracting of the text contents according to the natural language processing algorithm may use an unanswered question image, an answered question image, an unmarked question image, a marked question image, an unmarked text image, or a marked text image as a material to determine the characteristics of the text contents and the text, and the target detection algorithm based on the deep learning may use the marked question image and the marked text image as a material while performing preprocessing with reference to the processing result of the natural language processing algorithm to extract the marking information. It should be noted that, in the step of extracting the text content according to the natural language processing algorithm, the unanswered test question image, the unmarked test question image or the unmarked text image is preferably used as a material, so that the interference factor can be reduced, and the accuracy and the efficiency of text recognition can be improved.
According to any of the above embodiments, the text content specifically includes test questions or character strings, and the characteristics of the text specifically include question numbers or character position information.
According to any of the above embodiments, optionally, the method further comprises: establishing a wrong-question set according to the mark information and the text content, or counting question-answering statistics according to the relation between the mark information and the text content, wherein the image is a test question image, the image with the mark is a corrected test question image, the feature of the text is a question number, and the mark information corresponds to the correction traces.
In this embodiment, the scanned image of the examination paper is used as the material for image processing, so the wrong questions in the paper can be extracted according to the mark information (correction traces) and integrated into a wrong-question set, helping learners consolidate error-prone knowledge points and improving learning efficiency. In addition, the questions with high error rates (question-answering statistics) can be counted from the question numbers and the mark information; teachers can find weak links in teaching from these statistics and arrange teaching content in a targeted way, which helps improve teaching quality. A sketch of both uses follows.
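A sketch of both uses, operating on the [question number, topic content, region, correction result] records produced in step 7.4; the function names are assumptions:

```python
def build_wrong_question_set(records):
    """records: [question number, topic content, region, correction result]
    lists from step 7.4; '×' marks a wrongly answered question."""
    return [r for r in records if r[3] == "×"]

def answer_statistics(all_students_records):
    """Count how many students answered each question number wrongly."""
    wrong_counts = {}
    for records in all_students_records:
        for number, _, _, result in records:
            if result == "×":
                wrong_counts[number] = wrong_counts.get(number, 0) + 1
    return wrong_counts
```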
By the image processing method, the image processing device, the terminal and the computer readable storage medium disclosed by the embodiment, the marked text content in the image can be quickly identified and automatically recorded, so that the information is efficiently extracted, the time investment is reduced, and the learning efficiency is favorably improved.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable image processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable image processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable image processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable image processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made therein without departing from the scope of the invention as defined in the appended claims and their equivalents, and it is intended that the invention encompass such changes and modifications as well.
Claims (12)
1. An image processing method, comprising:
acquiring text content in the image according to a natural language processing algorithm;
determining the characteristics of the text according to the text content;
identifying mark information in the image with the mark according to a target detection algorithm based on deep learning;
and determining the association relation among the features, the marking information and the text content to generate a recognition result.
2. The image processing method according to claim 1, wherein the determining of the association among the feature, the marking information, and the text content to generate the recognition result specifically includes:
extracting corresponding text content from the image with the mark according to the text content and the characteristics of the text;
positioning a text in the image with the mark according to the characteristics, determining the area of the text, and determining the area of the mark according to the mark information;
if the intersection area of the text and the area of the marked area is larger than or equal to a preset proportion compared with the area of the marked area, determining that the marking information is associated with the text content;
and generating a recognition result according to the text content and the marking information.
3. The image processing method according to claim 1, further comprising:
and determining the meaning of the marking information, and storing the text content corresponding to the marking information according to the meaning of the marking information.
4. The image processing method according to claim 1, further comprising:
determining layout information of the image by using a straight line detection algorithm and a blank area detection algorithm to generate layout coordinates;
and segmenting the image according to the layout coordinates.
5. The image processing method according to claim 1, wherein the obtaining of the text content in the image according to the natural language processing algorithm specifically comprises:
performing text line detection on the image to determine text line information;
and performing optical character recognition according to the text line information to determine text content.
6. The image processing method according to claim 1, wherein the identifying, according to the target detection algorithm based on deep learning, the mark information in the image with the mark specifically comprises:
building a marking information identification model according to the characteristic pyramid network;
training the label information recognition model;
and inputting the image with the mark into the mark information identification model, and outputting mark information by the mark information identification model.
7. The image processing method according to claim 6, wherein
the marking information identification model identifies the marking information according to color information; or
the marking information identification model identifies the marking information according to a font difference or a font size difference between the text content and the marking information.
8. The image processing method according to any one of claims 1 to 7, wherein
the image includes at least one of: unanswered test question images, answered test question images, unmarked test question images, marked test question images, unmarked text images and marked text images, wherein the marked images comprise at least one of the following images: a test question image with a mark and a text image with a mark.
9. The image processing method according to any one of claims 1 to 7, further comprising:
establishing an error question set according to the marking information and the text content; or
counting question answering conditions according to the relation between the marking information and the text content;
wherein the image is a test question image, the image with the mark is the corrected test question image, and the feature of the text is a question number.
10. An image processing apparatus characterized by comprising: memory, processor and program stored on the memory and executable on the processor, the program being capable of implementing the steps defined by the image processing method as claimed in any one of claims 1 to 9 when executed by the processor.
11. A terminal, comprising:
the image processing apparatus according to claim 10.
12. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed, implements the steps of the image processing method according to any one of claims 1 to 9.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201910824436.5A | 2019-09-02 | 2019-09-02 | Image processing method, device, terminal and computer readable storage medium
Publications (1)

Publication Number | Publication Date
---|---
CN112446259A | 2021-03-05
Family

ID=74733971
Legal Events

Code | Title | Description
---|---|---
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20210305