CN110598566A - Image processing method, device, terminal and computer readable storage medium - Google Patents

Image processing method, device, terminal and computer readable storage medium Download PDF

Info

Publication number
CN110598566A
Authority
CN
China
Prior art keywords
image
text
information
height
image processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910760632.0A
Other languages
Chinese (zh)
Inventor
贺涛
欧阳一村
曾志辉
邢军华
许文龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE ICT Technologies Co Ltd
Original Assignee
ZTE ICT Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE ICT Technologies Co Ltd filed Critical ZTE ICT Technologies Co Ltd
Priority to CN201910760632.0A
Publication of CN110598566A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/60: Type of objects
    • G06V20/62: Text, e.g. of license plates, overlay texts or captions on TV images
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40: Document-oriented image-based pattern recognition
    • G06V30/41: Analysis of document content
    • G06V30/413: Classification of content, e.g. text, photographs or tables
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Character Input (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an image processing method, an image processing device, a terminal and a computer-readable storage medium. The image processing method includes: inputting an image in a specified format into a text detection model, the text detection model outputting the text content in the image and coordinate position information corresponding to the text content; performing statistical analysis on the coordinate position information, setting thresholds according to the coordinate position information, determining the position frames corresponding to the text information according to the thresholds, and filtering out the non-text information outside the position frames. In this technical scheme, the text detection model serves as a filter that removes non-text information from the image, which improves both the accuracy and the computation speed of text content detection.

Description

Image processing method, device, terminal and computer readable storage medium
Technical Field
The present invention relates to the field of image processing, and in particular, to an image processing method, an image processing apparatus, a terminal, and a computer-readable storage medium.
Background
With the wide application of image-based character recognition, OCR (Optical Character Recognition) is widely used in the business field, for example to recognize the text in invoices and identity cards. In the education industry, examinations are an important means for teachers to gauge students' grasp of knowledge points, and analyzing examination statistics accounts for a large share of a teacher's workload. Unlike identity cards, bank cards and invoices, which have relatively fixed formats and contents, the examination papers of each school or educational institution have their own layout.
In the related art, extracting specific text information from a scanned test paper with conventional image processing runs into the following technical problems. First, conventional text-line detection is strongly affected by the quality of the scanned image, for example the color and quality of the paper and poor imaging by the scanner. Second, a test paper has a varied text structure and may contain characters, formulas, tables, images and much other information; when a conventional text-line detection method is used as the detector of text content, a large number of text-line screening steps must be added to remove interference such as tables and images. These preprocessing methods are mostly cumbersome, and test papers with different layouts require different screening procedures, which harms recognition accuracy and computation speed and reduces development efficiency.
Disclosure of Invention
The present invention is directed to solving at least one of the problems of the prior art or the related art.
To this end, it is an object of the present invention to provide an image processing method.
Another object of the present invention is to provide an image processing apparatus.
Another object of the present invention is to provide a terminal.
It is another object of the present invention to provide a computer-readable storage medium.
In order to achieve the above object, according to the first aspect of the present invention, there is provided an image processing method, including: inputting an image in a specified format into a text detection model, the text detection model outputting the text content in the image and coordinate position information corresponding to the text content; and performing statistical analysis on the coordinate position information, setting thresholds according to the coordinate position information, determining the position frames corresponding to the text information according to the thresholds, and filtering out the non-text information outside the position frames.
In this technical scheme, the image in the specified format corresponds to the image to be recognized: the image to be recognized can be converted into the specified format by preprocessing, and inputting it into the text detection model yields the text content of the image and the coordinate position information of the text; filtering the image according to the coordinate position information removes the non-text information from the text content. The present application uses a deep-learning text detection model as a filter, which is exactly the opposite of the conventional scheme (which uses the text detection model as a detector). With the deep-learning text-line detection algorithm acting as a filter, conventional text-line detection can then be performed on the filtered image, the text positioning result of the deep-learning algorithm is statistically fused with the conventional algorithm's result, skew correction is applied, and the final result (the coordinate position information) is extracted according to thresholds such as the width, height and spacing of the characters in the image.
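As an illustrative sketch (not the patented implementation), the "model as filter" step can be expressed as masking out everything outside the model's predicted boxes; the (x, y, w, h) box format and the white background value are assumptions of this sketch:

```python
import numpy as np

def apply_text_filter(image, boxes, background=255):
    """Use the model's text boxes as a *filter*: keep the pixels inside the
    predicted (x, y, w, h) boxes and blank everything else to the background
    value, so only text regions survive for downstream line detection."""
    out = np.full_like(image, background)
    for x, y, w, h in boxes:
        out[y:y + h, x:x + w] = image[y:y + h, x:x + w]
    return out
```

Conventional text-line detection would then run on the returned image, in which tables, figures and other non-text regions have already been blanked.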
The text detection model is a model from the family of deep-learning-based object detection algorithms, for example CTPN, Faster R-CNN or SSD, where CTPN (Connectionist Text Proposal Network, from "Detecting Text in Natural Image with Connectionist Text Proposal Network") detects text through connected pre-selected anchor boxes, Faster R-CNN (Faster Region-based Convolutional Neural Network) is a region-proposal detector, and SSD (Single Shot MultiBox Detector) is a multi-object detection algorithm that directly predicts object classes and bounding boxes.
In addition, some models dedicated to text detection can achieve similar technical effects, for example EAST, TextBoxes++ and SegLink, where EAST ("An Efficient and Accurate Scene Text detection pipeline") is a streamlined scene text detector, TextBoxes++ is an SSD-based, end-to-end trainable fast detector of oriented scene text, and SegLink (segment-link) is an oriented scene text detection algorithm.
It can be understood that a conventional text-line detection method needs several fixed thresholds, and setting those thresholds may require the imaging quality of the scanned image to be stable; if the imaging quality fluctuates, the thresholds may need readjustment, which affects the efficiency of using the project and the cost of maintaining it later. The robustness of a deep-learning text-line detection algorithm effectively compensates for the low environmental adaptability that conventional text-line detection cannot overcome; for example, the quality of the test paper and the imaging quality of the scanner strongly affect conventional text-line detection, and a deep-learning method can effectively reduce the influence of these factors.
According to the image processing method of the above technical solution, optionally, the method further includes: and identifying the filtered image by using an image connected region detection method, and identifying text content in the image.
In this technical scheme, the image is processed by a deep-learning text detection model, and the filtered image is then recognized with an image connected-region detection method. Because the deep-learning text detection model has filtered the non-text impurity information out of the image, statistically fusing the text positioning result of the deep-learning text-line detection algorithm with the result of the connected-region detection method improves the accuracy of image recognition.
According to the image processing method in any of the above technical solutions, optionally, the text detection model corresponds to a text recognition network model that determines a text region in the image and then determines a text line in the text region, and the inputting of the image in the specified format into the text detection model specifically includes: and inputting the image with the specified format into the trained character recognition network model for detection.
In this technical scheme, the text detection model is a CTPN model. The trained CTPN model is highly robust and can overcome the low environmental adaptability that conventional text-line detection methods cannot solve; for example, the quality of the test paper and the imaging quality of the scanner strongly affect conventional text-line detection, and a CTPN model trained with deep learning can effectively reduce the influence of these factors.
According to the image processing method of any one of the above technical solutions, optionally, the outputting, by the text detection model, of the text content in the image and the coordinate position information corresponding to the text content specifically includes: determining a scaling threshold according to the paper type of the image's carrier, scaling images that exceed the scaling threshold, and having the text detection model perform text recognition on the scaled image so as to output the text content in the image and the coordinate position information corresponding to the text content.
In this technical scheme, the scaling threshold is set according to the paper model of the carrier of the image to be recognized (standardized paper such as test papers, documents and questionnaires); the scaled image is more amenable to text detection, which improves the processing speed and accuracy of the character recognition network model.
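A minimal sketch of such a scaling threshold follows; the 2000 px limit and the (width, height, scale) return shape are illustrative assumptions, since the patent says only that the threshold is chosen per paper type:

```python
def rescale_if_needed(width, height, max_side=2000):
    """Scale image dimensions down so the longer side does not exceed
    max_side; images already within the threshold pass through unchanged."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height, 1.0
    scale = max_side / longest
    return round(width * scale), round(height * scale), scale
```

The returned scale factor would also be needed to map detected box coordinates back to the original image.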
According to the image processing method of any one of the above technical solutions, optionally, the text detection model outputs the text content in the image and coordinate position information corresponding to the text content, specifically including: adjusting the values of the aspect-ratio parameters of the character recognition network model according to the arrangement direction of the characters in the image, so as to enhance the model's information extraction capability in that direction; after the aspect-ratio parameters are adjusted, the text detection model performs text recognition on the image so as to output the text content in the image and the coordinate position information corresponding to the text content.
In this technical scheme, different text arrangement directions lead to different recognition capabilities in the character recognition network model; adjusting the aspect-ratio parameters of the model according to the arrangement direction of the text lets the model adapt to that direction and improves recognition accuracy and speed. For example, modifying SCALES_BASE in the aspect-ratio parameters from (0.25, 0.5, 1.0, 2.0, 3.0) to (1.0, 2.0, 3.0, 5.0, 10.0) enhances the information extraction capability of the character recognition network model in the horizontal direction.
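For illustration, the quoted scale change can be read as widening the set of anchor widths the region proposal network searches over; the 16 px base anchor size and the helper below are assumptions modeled on common Faster R-CNN-style configurations, not the patent's code:

```python
BASE_SIZE = 16  # assumed anchor base size, as in common Faster R-CNN configs
DEFAULT_SCALES_BASE = (0.25, 0.5, 1.0, 2.0, 3.0)
HORIZONTAL_SCALES_BASE = (1.0, 2.0, 3.0, 5.0, 10.0)  # values quoted above

def anchor_widths(scales, base=BASE_SIZE):
    """Anchor widths produced by a set of base scales."""
    return tuple(int(base * s) for s in scales)
```

With the horizontal setting, the widest anchor grows from 48 px to 160 px, which is what lets the model span long horizontal text runs.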
According to the image processing method of any one of the above technical solutions, optionally, the text detection model outputs the text content in the image and coordinate position information corresponding to the text content, and specifically includes: and after a plurality of prediction frames capable of detecting the text characters and the punctuations are determined, the text detection model performs text recognition on the image so as to output text content in the image and coordinate position information corresponding to the text content.
In this technical scheme, the prediction boxes are used to select text (characters), and the height of a prediction box determines the height of the text it can recognize. To ensure that both tall text characters and short punctuation marks are recognized accurately, the pixel heights of the prediction boxes of the character recognition network model are adjusted to obtain several prediction boxes of different heights, with the specific heights determined by the characters and punctuation marks in the images. For example, the prediction-box heights start at 8 pixels and grow by 4 pixels each time, giving 10 candidate boxes: the first prediction box is 8 pixels high and the tenth is 44 pixels high, which ensures that the effective text information, from characters down to punctuation marks, can be detected.
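The quoted height scheme can be sketched directly; this helper simply enumerates the example's values (8 px start, 4 px step, 10 boxes):

```python
def prediction_box_heights(start=8, step=4, count=10):
    """Candidate prediction-box heights from the example above: starting at
    8 px and adding 4 px each time, for 10 boxes (8 px up to 44 px)."""
    return [start + step * i for i in range(count)]
```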
The image processing method according to any one of the above technical solutions, optionally, further includes: converting a scanned image of an image to be recognized into a gray image; enhancing the image quality of the gray level image according to an image denoising algorithm and an image enhancement algorithm to obtain an enhanced image; processing the enhanced image according to a self-adaptive threshold segmentation algorithm to obtain a binary segmentation threshold; and converting the enhanced image into an image in a specified format according to the binary segmentation threshold value.
In this technical scheme, the image in the specified format is a binary image. A scanned image of the image to be recognized is acquired and converted into a grayscale image; the grayscale image is denoised and enhanced, a binary segmentation threshold is then determined, and the binary image is obtained from that threshold, completing the preprocessing. Binarization reduces the influence of interference factors in the color image on the text detection model and improves detection accuracy. Denoising and enhancement preserve the stroke information of the characters as much as possible and likewise improve detection accuracy. Denoising algorithms usable in preprocessing include mean filtering and median filtering, and image enhancement algorithms include linear transformation enhancement and histogram equalization. In practice, the original color image is first converted into a grayscale image; to remove salt-and-pepper noise effectively, a median filtering algorithm can be applied, preferably with only a 3 × 3 template. During image enhancement, to keep the image algorithm robust, histogram equalization can be used to enhance the global contrast of the image, countering problems caused by uneven illumination while scanning the test paper.
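As one concrete stand-in for the adaptive threshold segmentation step (the patent does not name a specific algorithm), Otsu's method picks the threshold that maximizes between-class variance; a numpy-only sketch:

```python
import numpy as np

def otsu_threshold(gray):
    """Adaptive binary segmentation threshold (Otsu) for a uint8 gray image:
    choose t maximizing the between-class variance of the two pixel classes."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = gray.size
    sum_all = np.dot(np.arange(256), hist)
    best_t, best_var = 0, -1.0
    w0 = 0.0   # weight (pixel count) of the dark class
    sum0 = 0.0  # intensity sum of the dark class
    for t in range(256):
        w0 += hist[t]
        if w0 == 0 or w0 == total:
            continue
        sum0 += t * hist[t]
        m0 = sum0 / w0                        # dark-class mean
        m1 = (sum_all - sum0) / (total - w0)  # bright-class mean
        var = w0 * (total - w0) * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def binarize(gray):
    """Convert a grayscale image to the binary 'specified format'."""
    return (gray > otsu_threshold(gray)).astype(np.uint8) * 255
```

In practice a library implementation (e.g. OpenCV's Otsu mode) would likely be used; this sketch only makes the segmentation step concrete.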
According to the image processing method of any one of the above technical solutions, optionally, the statistical analysis of the coordinate position information, the setting of thresholds according to the coordinate position information, the determination of the position frames corresponding to the text information according to the thresholds, and the filtering-out of the non-text information outside the position frames specifically include: determining text boxes according to the coordinate position information, collecting the height information of the text boxes, taking the mean of the height information, and setting a height threshold range around the mean; according to the height threshold range, extracting the normal position frames and the abnormal position frames whose height lies above or below the range; and filtering out the image pixels outside the normal position frames to obtain a first filtered image.
According to the technical scheme, the text boxes are determined from the coordinate position information, the mean of their heights is computed, and thresholds are set near the mean to separate the abnormal position frames (those too tall or too short) from the normal position frames; non-text information such as icons, formulas and noise falls into the abnormal frames, so the icons, formulas and similar non-text content can be identified from them. In addition, filtering out the image pixels outside the normal position frames yields an image of the text information (text lines). In this process, the text detection model combined with statistical analysis completes the extraction of text-line information from the image and improves the accuracy of text content recognition and positioning.
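A hedged sketch of the height-statistics filter described above, assuming (x, y, w, h) boxes and an illustrative ±50% band around the mean height (the patent says only that a threshold is set near the mean):

```python
def split_by_height(boxes, band=0.5):
    """Split (x, y, w, h) text boxes into normal and abnormal by height:
    a box is normal when its height lies within +/- band of the mean."""
    mean_h = sum(b[3] for b in boxes) / len(boxes)
    normal = [b for b in boxes if abs(b[3] - mean_h) <= band * mean_h]
    abnormal = [b for b in boxes if abs(b[3] - mean_h) > band * mean_h]
    return normal, abnormal
```

Abnormal boxes (icons, formulas, noise) are kept separately so their contents can be identified or discarded, while the normal boxes define the first filtered image.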
According to the image processing method of any one of the above technical solutions, optionally, after obtaining the first filtered image, the method further includes: counting the average value of the length of the position frame in the horizontal direction, setting a length threshold value according to the length average value, counting the average values of the starting position and the ending position of each line in the vertical direction, and setting a line threshold value according to the average values of the starting position and the ending position; extracting a plurality of horizontal position frames in the horizontal direction according to the length threshold value, and extracting a plurality of vertical position frames in the vertical direction according to the line threshold value; and filtering out the pixels of the image except the horizontal position frame and the vertical position frame to obtain a second filtered image.
According to the technical scheme, the corresponding thresholds are set by computing the mean length of the position frames in the horizontal direction and the means of the start and end positions of each line in the vertical direction; the horizontal and vertical position frames are determined from these thresholds, the pixels outside them are filtered out, the non-text information in the image is further removed, and detection accuracy is improved.
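The second statistical pass might be sketched as follows; the tolerance factors are illustrative assumptions, since the patent states only that thresholds are derived from the means:

```python
def filter_by_layout(boxes, tol_w=0.5, tol_pos=0.25):
    """Second-stage statistical filter on (x, y, w, h) position frames.

    Horizontal: keep frames whose length is near the mean frame length.
    Vertical: keep frames whose start (left) and end (right) positions are
    near the mean line start and end positions."""
    mean_w = sum(b[2] for b in boxes) / len(boxes)
    horizontal = [b for b in boxes if abs(b[2] - mean_w) <= tol_w * mean_w]
    mean_start = sum(b[0] for b in boxes) / len(boxes)
    mean_end = sum(b[0] + b[2] for b in boxes) / len(boxes)
    vertical = [b for b in boxes
                if abs(b[0] - mean_start) <= tol_pos * mean_w
                and abs(b[0] + b[2] - mean_end) <= tol_pos * mean_w]
    return horizontal, vertical
```

Pixels outside the surviving horizontal and vertical frames would then be blanked to obtain the second filtered image.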
According to the image processing method of any one of the above technical solutions, optionally, after obtaining the second filtered image, the method further includes: calculating the inclination angle of the straight line segment at the center of each text line of the second filtered image according to a morphological processing algorithm and a connected region rectangular frame position detection algorithm; determining layout information of the text according to the horizontal position frame and the vertical position frame; extracting a text region image from the first filtered image according to the layout information; and correcting the text region image according to the inclination angle to obtain a corrected image, wherein the corrected image corresponds to the filtered image.
According to the technical scheme, skew detection and skew correction are performed on the image with a morphological processing algorithm and a connected-region rectangular-frame position detection algorithm, and the layout information of the text is determined; combining the deep-learning text detection model with the conventional morphological and connected-region rectangular-frame algorithms acquires the text information accurately and improves recognition accuracy.
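As a simple numeric stand-in for estimating the inclination angle of the straight segments at the text-line centers (the patent uses morphology plus connected-region rectangle detection), a least-squares line fit over line-center points yields the skew angle; this substitute method is an assumption of the sketch:

```python
import math

def tilt_angle(centers):
    """Skew angle in degrees of text-line center points (x, y), estimated
    with a least-squares line fit through the points."""
    n = len(centers)
    mx = sum(x for x, _ in centers) / n
    my = sum(y for _, y in centers) / n
    num = sum((x - mx) * (y - my) for x, y in centers)
    den = sum((x - mx) ** 2 for x, _ in centers)
    return math.degrees(math.atan2(num, den))
```

The corrected image would then be produced by rotating the extracted text-region image by the negative of this angle.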
The image processing method according to any one of the above technical solutions, optionally, further includes: and performing connected region detection and extension detection on the corrected image to obtain rectangular frame information of the characters, and aggregating the rectangular frames of each line according to the width and height information of the preset printing font and the distance information between the characters to obtain coordinate position information of the text line.
According to the technical scheme, the non-text impurity information in the image is filtered out by the text detection model; conventional text-line detection is then performed on the image filtered by the deep-learning model, the text positioning result of the deep-learning text detection model is statistically fused with the conventional algorithm's result to correct skewed text, and the final result (the position coordinate information of the text content to be recognized in the image) is extracted according to thresholds such as the width and height of the characters in the image and the character spacing.
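The per-line aggregation of character rectangles can be sketched as grouping boxes by vertical center and merging each group into one line box; the row tolerance stands in for the preset font width/height and character-spacing thresholds, whose actual values the patent leaves open:

```python
def aggregate_lines(char_boxes, row_tol=5):
    """Aggregate per-character (x, y, w, h) rectangles into text-line boxes:
    boxes whose vertical centers fall within row_tol pixels of an existing
    line are merged into that line."""
    lines = []
    for x, y, w, h in sorted(char_boxes, key=lambda b: (b[1], b[0])):
        cy = y + h / 2
        for line in lines:
            if abs(line["cy"] - cy) <= row_tol:
                line["boxes"].append((x, y, w, h))
                break
        else:
            lines.append({"cy": cy, "boxes": [(x, y, w, h)]})
    merged = []
    for line in lines:
        xs = [b[0] for b in line["boxes"]]
        ys = [b[1] for b in line["boxes"]]
        xe = [b[0] + b[2] for b in line["boxes"]]
        ye = [b[1] + b[3] for b in line["boxes"]]
        merged.append((min(xs), min(ys), max(xe) - min(xs), max(ye) - min(ys)))
    return merged
```

Each merged tuple is the coordinate position information of one text line.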
According to the image processing method of any of the above technical solutions, optionally, the text detection model is trained with a data set produced from text-line pictures generated from an examination question bank.
According to the second aspect of the present invention, there is provided an image processing apparatus, including: a memory, a processor, and a program stored in the memory and executable on the processor, the program, when executed by the processor, implementing the steps of the image processing method according to any one of the above technical solutions. The image processing apparatus has all the advantages of the image processing method according to any of the above technical solutions, which are not repeated here.
According to a technical solution of the third aspect of the present invention, there is also provided a terminal, including: the image processing apparatus according to the second aspect of the present invention. The terminal includes all the advantages of the image processing method according to any of the above technical solutions, which are not described herein again.
According to the fourth aspect of the present invention, there is also provided a computer-readable storage medium on which a computer program is stored, the computer program, when executed, implementing the image processing method defined in any one of the technical solutions of the first aspect.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 shows a schematic flow diagram of an image processing method according to an embodiment of the invention;
FIG. 2 shows a schematic diagram of an image processing method according to another embodiment of the invention;
FIG. 3 shows a schematic block diagram of an image processing apparatus according to an embodiment of the present invention;
FIG. 4 shows a schematic block diagram of a terminal according to one embodiment of the present invention;
FIG. 5 shows a schematic block diagram of a computer-readable storage medium according to an embodiment of the invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
Example one
As shown in fig. 1, an image processing method according to an embodiment of the present invention includes:
Step 102, inputting an image in a specified format into a text detection model, the text detection model outputting the text content in the image and the coordinate position information corresponding to the text content;
Step 104, performing statistical analysis on the coordinate position information, setting thresholds according to the coordinate position information, determining the position frames corresponding to the text information according to the thresholds, and filtering out the non-text information outside the position frames.
In the embodiment, the image in the specified format corresponds to the image to be recognized: the image to be recognized can be converted into the specified format by preprocessing, and inputting it into the text detection model yields the text content of the image and the coordinate position information of the text; filtering the image according to the coordinate position information removes the non-text information from the text content. The present application uses a deep-learning text detection model as a filter, which is exactly the opposite of the conventional scheme (which uses the text detection model as a detector). With the deep-learning text-line detection algorithm acting as a filter, conventional text-line detection can then be performed on the filtered image, the text positioning result of the deep-learning algorithm is statistically fused with the conventional algorithm's result, skew correction is applied, and the final result (the coordinate position information) is extracted according to thresholds such as the width, height and spacing of the characters in the image.
The text detection model is a model from the family of deep-learning-based object detection algorithms, for example CTPN, Faster R-CNN or SSD.
In addition, some models dedicated to text detection can achieve similar technical effects, for example the EAST, TextBoxes++ and SegLink algorithm models.
It can be understood that a conventional text-line detection method needs several fixed thresholds, and setting those thresholds may require the imaging quality of the scanned image to be stable; if the imaging quality fluctuates, the thresholds may need readjustment, which affects the efficiency of using the project and the cost of maintaining it later. The robustness of a deep-learning text-line detection algorithm effectively compensates for the low environmental adaptability that conventional text-line detection cannot overcome; for example, the quality of the test paper and the imaging quality of the scanner strongly affect conventional text-line detection, and a deep-learning method can effectively reduce the influence of these factors.
The image processing method according to the above embodiment, optionally, further includes: and identifying the filtered image by using an image connected region detection method, and identifying text content in the image.
In the embodiment, the image is processed by a deep-learning text detection model, and the filtered image is then recognized with an image connected-region detection method. Because the deep-learning text detection model has filtered the non-text impurity information out of the image, statistically fusing the text positioning result of the deep-learning text-line detection algorithm with the result of the connected-region detection method improves the accuracy of image recognition.
According to the image processing method in the foregoing embodiment, optionally, the text detection model corresponds to a text recognition network model that determines a text region in the image and then determines a text line in the text region, and the inputting of the image in the specified format into the text detection model specifically includes: and inputting the image with the specified format into the trained character recognition network model for detection.
In the embodiment, the character recognition network model is a CTPN model. The trained CTPN model is highly robust and compensates for the low environmental adaptability that conventional text-line detection methods cannot solve; for example, the quality of the test paper and the imaging quality of the scanner strongly affect conventional text-line detection, and a CTPN model trained with deep learning can effectively reduce the influence of these factors.
According to the image processing method in the foregoing embodiment, optionally, the outputting, by the text detection model, of the text content in the image and the coordinate position information corresponding to the text content specifically includes: determining a scaling threshold according to the paper type of the carrier of the image, scaling any image exceeding the scaling threshold, and performing text recognition on the scaled image with the text detection model, so as to output the text content in the image and the coordinate position information corresponding to the text content.
In this embodiment, the scaling threshold is set according to the paper type of the carrier of the image to be recognized (standardized paper such as test papers, documents, and questionnaires). The scaled image is more amenable to text detection, improving the processing speed and accuracy of the character recognition network model.
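A minimal sketch of this scaling step (not part of the patent text), using as illustrative limits the A3 thresholds given later in the document (long side 3400 pixels, short side 2400 pixels); other paper types would substitute other limits:

```python
def scale_to_limits(width, height, max_long=3400, max_short=2400):
    """Scale (width, height) down, preserving aspect ratio, so that the
    long side <= max_long and the short side <= max_short.
    The default limits are the A3 thresholds mentioned in the text."""
    long_side, short_side = max(width, height), min(width, height)
    factor = min(1.0, max_long / long_side, max_short / short_side)
    return round(width * factor), round(height * factor)
```

An image already within the limits is returned unchanged, since the factor is capped at 1.0 (i.e., the method only scales down, never up).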
According to the image processing method in the foregoing embodiment, optionally, the text detection model outputs the text content in the image and the coordinate position information corresponding to the text content, specifically including: adjusting the value of the aspect ratio parameter of the character recognition network model according to the arrangement direction of the characters in the image, so as to enhance the information extraction capability of the model in that direction, and performing text recognition on the image with the text detection model after the aspect ratio parameter is adjusted, so as to output the text content in the image and the coordinate position information corresponding to the text content.
In this embodiment, different text arrangement directions lead to different recognition capabilities of the character recognition network model. Adjusting the aspect ratio parameter of the model according to the text arrangement direction lets the model adapt to that direction, improving recognition accuracy and speed. For example, modifying SCALES_BASE in the aspect ratio parameters from (0.25, 0.5, 1.0, 2.0, 3.0) to (1.0, 2.0, 3.0, 5.0, 10.0) enhances the information extraction capability of the character recognition network model in the horizontal direction.
According to the image processing method in the foregoing embodiment, optionally, the text detection model outputs the text content in the image and the coordinate position information corresponding to the text content, specifically including: after a plurality of prediction boxes capable of detecting both text characters and punctuation marks are determined, the text detection model performs text recognition on the image, so as to output the text content in the image and the coordinate position information corresponding to the text content.
In this embodiment, the prediction box frames the text (characters), and the height of a prediction box determines the height of the text it can recognize. To ensure that both taller text characters and shorter punctuation marks are recognized accurately, the pixel height of the prediction box of the character recognition network model is given several values, yielding a plurality of prediction boxes of different heights; the specific height values are determined according to the character height and the punctuation mark height in the image. For example, the prediction box heights start at 8 pixels and increase by 4 pixels each time, giving 10 candidate prediction boxes: the first prediction box is 8 pixels high and the tenth is 44 pixels high, ensuring that all effective text information, from text characters down to punctuation marks, can be detected.
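The height schedule in the example above (start at 8 pixels, add 4 pixels, 10 boxes) can be generated directly:

```python
def prediction_heights(start=8, step=4, count=10):
    """Candidate prediction-box heights: 8, 12, 16, ..., 44 pixels,
    matching the example in the text."""
    return [start + step * i for i in range(count)]
```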
The image processing method according to the above embodiment optionally further includes: converting a scanned image of the image to be recognized into a grayscale image; enhancing the image quality of the grayscale image according to an image denoising algorithm and an image enhancement algorithm to obtain an enhanced image; processing the enhanced image according to an adaptive threshold segmentation algorithm to obtain a binary segmentation threshold; and converting the enhanced image into the image in the specified format according to the binary segmentation threshold.
In this embodiment, the image in the specified format is a binary image. The image to be recognized is scanned to obtain a scanned image, the scanned image is converted into a grayscale image, the grayscale image is denoised and enhanced, a binary segmentation threshold for it is determined, and the binary image is obtained according to that threshold, completing the preprocessing of the image. Binarization reduces the influence of interference factors in the color image on the text detection model and improves detection accuracy. Denoising and enhancement preserve the stroke information of the characters as much as possible, also improving detection accuracy. Usable denoising algorithms include mean filtering and median filtering, and usable image enhancement algorithms include linear transformation enhancement and histogram equalization. In practice, the original color image is first converted into a grayscale image; a median filtering algorithm can then remove noise, in particular salt-and-pepper noise, and preferably only a 3 × 3 template is used in the median filter. During image enhancement, to keep the image algorithm robust, histogram equalization can be used to enhance the global contrast of the image, alleviating problems caused by uneven illumination when scanning the test paper.
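A pure-Python sketch of two pieces of this preprocessing pipeline: the 3 × 3 median filter and the binary segmentation threshold. The text calls for an adaptive threshold segmentation algorithm without naming one; Otsu's global method is used here only as a simple illustrative stand-in (in practice a library routine such as OpenCV's `medianBlur`/`threshold` would be used):

```python
from statistics import median

def median_filter3(img):
    """3x3 median filter on a 2D grayscale image (list of row lists);
    border pixels are left unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = median(img[yy][xx]
                               for yy in (y - 1, y, y + 1)
                               for xx in (x - 1, x, x + 1))
    return out

def otsu_threshold(img):
    """Binary segmentation threshold maximizing between-class variance."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0 = sum(hist[:t])          # pixels below threshold t
        w1 = total - w0             # pixels at or above t
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum(i * hist[i] for i in range(t)) / w0
        mu1 = sum(i * hist[i] for i in range(t, 256)) / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t
```

Binarizing then amounts to mapping each pixel to 0 or 255 depending on which side of the returned threshold it falls.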
According to the image processing method of the embodiment, optionally, the coordinate position information is statistically analyzed, a threshold is set according to the coordinate position information, the position frames corresponding to the text information are determined according to the threshold, and the non-text information outside the position frames is filtered out, specifically including: determining text boxes according to the coordinate position information, counting the height information of the text boxes, computing the mean of the heights, and setting a height threshold according to the mean; extracting the normal position frames, and extracting the abnormal position frames whose heights fall outside the height threshold; and filtering out the image pixels outside the normal position frames to obtain a first filtered image.
In this embodiment, the text boxes can be determined from the coordinate position information, the mean height of the text boxes is counted, and a threshold is set near the mean to extract the information of the abnormal position frames (abnormally large or small height) and of the normal position frames. The abnormal position frames contain non-character information such as icons, formulas, and noise, so such content can be identified from them. Filtering out the image pixels outside the normal position frames yields an image of the character information (the text lines). In this process, the text detection model is combined with a statistical analysis method to extract the text line information in the image, improving both the accuracy of text content recognition and the accuracy of positioning.
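The height-mean split can be sketched as follows. The 40% tolerance band around the mean is an illustrative value; the text only says a threshold is set "near the mean":

```python
def split_by_height(boxes, tolerance=0.4):
    """Split text boxes into normal / abnormal by height.

    boxes: list of (x, y, w, h) tuples. A box is 'normal' when its
    height lies in [mean*(1-tolerance), mean*(1+tolerance)]; the
    tolerance value is illustrative, not fixed by the method."""
    mean_h = sum(h for _, _, _, h in boxes) / len(boxes)
    lo, hi = mean_h * (1 - tolerance), mean_h * (1 + tolerance)
    normal = [b for b in boxes if lo <= b[3] <= hi]
    abnormal = [b for b in boxes if not (lo <= b[3] <= hi)]
    return normal, abnormal
```

Boxes landing in the abnormal list would correspond to icons, formulas, or noise; pixels outside the normal boxes are then filtered out to form the first filtered image.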
According to the image processing method of the foregoing embodiment, optionally, after obtaining the first filtered image, the method further includes: counting the mean length of the position frames in the horizontal direction and setting a length threshold according to it; counting the means of the start and end positions of each line in the vertical direction and setting a line threshold according to them; extracting a plurality of horizontal position frames in the horizontal direction according to the length threshold, and a plurality of vertical position frames in the vertical direction according to the line threshold; and filtering out the image pixels outside the horizontal and vertical position frames to obtain a second filtered image.
In this embodiment, the corresponding thresholds are set by counting the mean length of the position frames in the horizontal direction and the means of the start and end positions of each line in the vertical direction, and the horizontal and vertical position frames are determined according to these thresholds.
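A sketch of the horizontal-direction statistic: keep the lines whose length is near the mean length, then report the mean start and end positions of the kept lines (a rough stand-in for the per-line vertical statistics described above). The 30% tolerance is illustrative:

```python
def mean(values):
    return sum(values) / len(values)

def extract_layout_frames(boxes, tol=0.3):
    """boxes: (x_start, y, x_end) horizontal extents of text lines.
    Keep lines whose length is within tol of the mean length, then
    report the mean start and mean end positions of the kept lines."""
    lengths = [x2 - x1 for x1, _, x2 in boxes]
    m = mean(lengths)
    kept = [b for b, length in zip(boxes, lengths) if abs(length - m) <= tol * m]
    return kept, mean([b[0] for b in kept]), mean([b[2] for b in kept])
```

The short fourth line in the test below (e.g., a section heading or stray fragment) falls outside the length band and is dropped from the layout estimate.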
According to the image processing method of the foregoing embodiment, optionally, after obtaining the second filtered image, the method further includes: calculating the inclination angle of the straight line segment at the center of each text line of the second filtered image according to a morphological processing algorithm and a connected region rectangular frame position detection algorithm; determining layout information of the text according to the horizontal position frame and the vertical position frame; extracting a text region image from the first filtered image according to the layout information; and correcting the text region image according to the inclination angle to obtain a corrected image, wherein the corrected image corresponds to the filtered image.
In this embodiment, tilt detection and tilt correction are performed on the image according to the morphological processing algorithm and the connected-region rectangular frame position detection algorithm, and the layout information of the text is determined. Combining the deep-learning-based text detection model with the traditional morphological processing and connected-region rectangular frame position detection algorithms acquires the text information accurately and improves recognition accuracy.
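Once the straight line segment at the center of a text line is available, its inclination angle follows from the line's slope. A sketch that fits the slope of the center points by least squares and converts it to degrees (the fitting method is illustrative; the text only specifies that the slope and angle of each center line are obtained):

```python
import math

def tilt_angle_deg(points):
    """Least-squares slope of text-line center points -> tilt angle
    in degrees. points: list of (x, y) centers along one text line."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    num = sum((x - mx) * (y - my) for x, y in points)
    den = sum((x - mx) ** 2 for x, _ in points)
    k = num / den               # slope of the fitted center line
    return math.degrees(math.atan(k))
```

The returned angle is then used as the correction angle when rotating the text region image.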
The image processing method according to the above embodiment optionally further includes: performing connected-region detection and extension detection on the corrected image to obtain the rectangular frame information of the characters, and aggregating the rectangular frames of each line according to the preset width and height of the print font and the spacing between characters to obtain the coordinate position information of the text lines.
In this embodiment, the text detection model filters the non-text impurity information out of the image; the deep-learning text detection model thus acts as a filter, and traditional text line detection is performed on the filtered image. The text positioning result of the deep-learning model and the result of the traditional algorithm are statistically fused to correct the tilted text, and the final result (i.e., the position coordinate information of the text content to be recognized in the image) is extracted according to thresholds such as the width and height of the text characters and the character spacing.
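The aggregation of per-character rectangles into text lines can be sketched as a gap-based merge along the x axis. The 12-pixel maximum gap is an illustrative spacing threshold derived from the print font, not a value fixed by the text:

```python
def aggregate_line(char_boxes, max_gap=12):
    """Merge per-character boxes (x, y, w, h) on one line into text-line
    boxes, breaking wherever the horizontal gap exceeds max_gap."""
    boxes = sorted(char_boxes)           # sort left to right by x
    merged, cur = [], list(boxes[0])     # cur = [x, y, w, h]
    for x, y, w, h in boxes[1:]:
        if x - (cur[0] + cur[2]) <= max_gap:
            top = min(cur[1], y)
            bottom = max(cur[1] + cur[3], y + h)
            cur[2] = max(cur[0] + cur[2], x + w) - cur[0]
            cur[1], cur[3] = top, bottom - top
        else:
            merged.append(tuple(cur))
            cur = [x, y, w, h]
    merged.append(tuple(cur))
    return merged
```

Character boxes separated by more than the gap threshold start a new text-line box, which is how distinct lines or columns stay separate.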
According to the image processing method of the above embodiment, optionally, the text detection model is trained by a data set made of text row pictures generated from the test question library.
Example Two
According to another embodiment of the present invention, an image processing method applied to a scene of layout analysis and text information extraction of a test paper includes:
step 1: and converting the scanned image into a gray image, and then enhancing the image quality by using an image denoising algorithm and an image enhancement algorithm.
Step 2: and processing the enhanced test paper image by using an adaptive threshold segmentation algorithm to obtain a binary segmentation threshold, and converting the image into a binary image (an image with a specified format).
And step 3: the images are sent into a pre-trained CTPN model for detection (text detection model), and the detection result of the CTPN model is obtained and comprises coordinate position information L of text linesoriginal
The font size and text lines in a test paper have fixed characteristics. For example, the width and height of a single character in a scanned test paper are generally about 32 pixels, and the number of characters per text line is generally within 45 for a two-column layout and about 25 for a four-column layout. Because the general CTPN model is designed to detect text in photographs, for text with a fixed font size such as a test paper the algorithm is modified in a targeted way to improve the processing speed and accuracy of CTPN. The modifications are as follows:
the maximum type of the examination paper is A3 type, the maximum threshold of the image zooming long side is set to be 3400 pixel points, and the maximum threshold of the zooming short side is set to be 2400 pixel points.
Second, because the text information in the test paper is arranged horizontally, part of the feature extraction in the general algorithm is removed, and SCALES_BASE in the aspect ratio parameters is modified from (0.25, 0.5, 1.0, 2.0, 3.0) to (1.0, 2.0, 3.0, 5.0, 10.0), enhancing the information extraction capability in the horizontal direction.
Third, because the question text is the effective information in the test paper, the improved algorithm obtains 10 candidate prediction boxes by starting from 8 pixels and adding 4 pixels each time: the first prediction box is 8 pixels high, the second 12, the third 16, the fourth 20, the fifth 24, the sixth 28, the seventh 32, the eighth 36, the ninth 40, and the tenth 44 pixels. This ensures that all effective text information, from text characters to punctuation marks, is detected; for the detection result, refer to FIG. 2.
And 4, step 4: according to the detection result LoriginalCounting the height information of the text box in the detection result, calculating the average value of the height information, setting a reasonable threshold value near the average value and extracting the position information L of the abnormal box with too large or too small heightabnormalAnd normal position frame information LnormalThe images of these areas within the frame include non-text information such as icons, formulas, and noise in the test paper.
And 5: using image processing method to process the image in the step 2 to extract the normal position area LnormalThe pixel values of the other images are reset to 255 (i.e., the filling is white), and a new filtered test paper image a is obtained, wherein the image is an image in which the non-text information in the test paper is filtered. Counting the average value of the length of the frame in the horizontal direction and the average value of the starting position and the ending position of each line in the vertical direction from the non-abnormal position frame obtained by filtering in the step 4, and setting a threshold value to find out the bits of a plurality of frames near the average value in the horizontal directionPut frame information LH_meanAnd position frame information L of a plurality of frames in the vicinity of the vertical direction mean valueV_mean
The traditional text line detection method requires setting a number of fixed thresholds, which assumes that the imaging quality of the scanned image is stable; if the imaging quality fluctuates, the thresholds must be re-tuned, affecting the efficiency of later project use and the cost of project maintenance. Steps 3 to 5 above are the proposed solution to this situation: this implementation uses the robustness of the deep-learning text line detection algorithm to overcome the low environmental adaptability that the traditional text line detection method cannot solve. For example, the quality of the test paper and the imaging quality of the scanner strongly affect the traditional method, but combining it with the deep-learning method effectively reduces the influence of these factors.
Step 6: and (3) reserving image pixel values in a position frame area near the mean value extracted in the step (5) by using an image processing method for the image A in the step (5), and setting the pixel values of other areas to be 255 (namely, filling the image to be white), so as to obtain a new image B.
And 7: using the image B obtained in the step 6 to obtain the straight line segment of the center of each text line by using the traditional image processing algorithms such as morphological processing, rectangular frame position detection of a connected region and the like, obtaining the slope k of each straight line, and further calculating to obtain the inclination angle of each straight line
And 8: based on the position frame information L of a plurality of frames near the horizontal direction mean value obtained in the step 6H_meanAnd position frame information L of a plurality of frames in the vicinity of the vertical direction mean valueV_meanCombining and analyzing to obtain layout position information L of the text in the test papersectionBased on the layout position information LsectionA plurality of title text region images C are obtained on a test paper image A.
And step 9: performing tilt correction on the multiple images C obtained from one test paper at a correction angle ofThe inclination angle calculated in step 7A plurality of images D are obtained.
Step 10: and performing connected region detection on the corrected image D by using an image processing method, performing extension detection to obtain rectangular frame information of the characters, and aggregating the rectangular frame information of each line according to the width and height information of the preset test paper printing font and the distance information between the characters to obtain final position coordinate information of the text line to be recognized.
In Steps 1 to 5, a deep-learning-based text line detection algorithm filters the non-text impurity information out of the test paper. In the subsequent Steps 6 to 10, traditional text line detection is performed on the image filtered by this deep-learning "filter"; the text positioning result of the deep-learning text line detection algorithm and the result of the traditional algorithm are statistically fused to correct the tilted text, and the final result (i.e., the position coordinate information of the text content to be recognized in the scanned image of the test paper) is extracted according to thresholds such as the width and height of the text characters and the character spacing in the test paper.
Example Three
As shown in fig. 3, an image processing apparatus 300 according to an embodiment of the present invention includes: a memory 302, a processor 304, and a program stored on the memory 302 and executable on the processor 304, which, when executed by the processor 304, implements the steps of the image processing method according to any of the embodiments described above. The image processing apparatus 300 includes all the advantages of the image processing method according to any of the above embodiments, which will not be described here again.
Example Four
As shown in fig. 4, a terminal 400 according to an embodiment of the present invention includes: the image processing apparatus 300 according to the third embodiment. The terminal 400 is capable of implementing in operation: inputting the image with the specified format into a text detection model, and outputting the text content in the image and coordinate position information corresponding to the text content by the text detection model; and filtering out non-character information in the text content according to the coordinate position information. The terminal 400 includes all the advantages of the image processing method according to any of the above embodiments, and will not be described herein again.
Example Five
As shown in fig. 5, according to an embodiment of the present invention, there is further provided a computer readable storage medium 500, on which a computer program 502 is stored, wherein the computer program 502 implements the image processing method defined in any one of the above embodiments when executed.
In this embodiment, the computer program 502 when executed implements: inputting the image with the specified format into a text detection model, and outputting the text content in the image and coordinate position information corresponding to the text content by the text detection model; and carrying out statistical analysis on the coordinate position information, setting a threshold value according to the coordinate position information, determining a position frame corresponding to the character information according to the threshold value, and filtering out non-character information except the position frame.
The image in the specified format corresponds to the image to be recognized: the image to be recognized can be converted into the specified format by preprocessing, and the image in the specified format is input into the text detection model to obtain the text content of the image and the coordinate position information of the text; the image is then filtered according to the coordinate position information, removing the non-character information from the text content. The present application uses a deep-learning text detection model as a filter, which is exactly the opposite of the conventional scheme (which uses the text detection model as a detector). With the deep-learning text line detection algorithm acting as a filter, traditional text line detection can then be performed on the filtered image; the text positioning result of the deep-learning algorithm and the traditional algorithm's result are statistically fused, tilt correction is performed, and the final result (the coordinate position information) is extracted according to thresholds such as the width and height of the characters in the image and the character spacing.
The computer program 502 according to the above technical solution optionally further includes: and identifying the filtered image by using an image connected region detection method, and identifying text content in the image.
In this technical solution, the image is processed based on the deep-learning text detection model, and the filtered image is recognized with an image connected-region detection method. Because the deep-learning text detection model has already filtered out the non-character impurity information in the image, the text positioning result of the deep-learning text line detection algorithm and the result of the connected-region detection method can be statistically fused, improving the accuracy of image recognition.
According to the computer program 502 in any one of the above technical solutions, optionally, the text detection model corresponds to a text recognition network model that determines a text region in an image and then determines a text line in the text region, and inputting the image in a specified format into the text detection model specifically includes: and inputting the image with the specified format into the trained character recognition network model for detection.
In this technical solution, the character recognition network model is a CTPN model. The trained CTPN model has high robustness and solves the low environmental adaptability that the traditional text line detection method cannot. For example, the quality of the test paper and the imaging quality of the scanner strongly affect the traditional method, and a CTPN model trained with deep learning effectively reduces the influence of these factors.
According to the computer program 502 in any one of the above technical solutions, optionally, the outputting, by the text detection model, of the text content in the image and the coordinate position information corresponding to the text content specifically includes: determining a scaling threshold according to the paper type of the carrier of the image, scaling any image exceeding the scaling threshold, and performing text recognition on the scaled image with the text detection model, so as to output the text content in the image and the coordinate position information corresponding to the text content.
In this technical solution, the scaling threshold is set according to the paper type of the carrier of the image to be recognized (standardized paper such as test papers, documents, and questionnaires). The scaled image is more amenable to text detection, improving the processing speed and accuracy of the character recognition network model.
According to the computer program 502 of any one of the above technical solutions, optionally, the text detection model outputs the text content in the image and the coordinate position information corresponding to the text content, specifically including: adjusting the value of the aspect ratio parameter of the character recognition network model according to the arrangement direction of the characters in the image, so as to enhance the information extraction capability of the model in that direction, and performing text recognition on the image with the text detection model after the aspect ratio parameter is adjusted, so as to output the text content in the image and the coordinate position information corresponding to the text content.
In this technical solution, different text arrangement directions lead to different recognition capabilities of the character recognition network model. Adjusting the aspect ratio parameter of the model according to the text arrangement direction lets the model adapt to that direction, improving recognition accuracy and speed. For example, modifying SCALES_BASE in the aspect ratio parameters from (0.25, 0.5, 1.0, 2.0, 3.0) to (1.0, 2.0, 3.0, 5.0, 10.0) enhances the information extraction capability of the character recognition network model in the horizontal direction.
According to the computer program 502 of any one of the above technical solutions, optionally, the text detection model outputs the text content in the image and coordinate position information corresponding to the text content, and specifically includes: and after a plurality of prediction frames capable of detecting the text characters and the punctuations are determined, the text detection model performs text recognition on the image so as to output text content in the image and coordinate position information corresponding to the text content.
In this technical solution, the prediction box frames the text (characters), and the height of a prediction box determines the height of the text it can recognize. To ensure that both taller text characters and shorter punctuation marks are recognized accurately, the pixel height of the prediction box of the character recognition network model is given several values, yielding a plurality of prediction boxes of different heights; the specific height values are determined according to the character height and the punctuation mark height in the image. For example, the prediction box heights start at 8 pixels and increase by 4 pixels each time, giving 10 candidate prediction boxes: the first prediction box is 8 pixels high, the second 12, the third 16, the fourth 20, the fifth 24, the sixth 28, the seventh 32, the eighth 36, the ninth 40, and the tenth 44 pixels, ensuring that all effective text information, from text characters down to punctuation marks, can be detected.
The computer program 502 according to any of the above technical solutions optionally further includes: converting a scanned image of the image to be recognized into a grayscale image; enhancing the image quality of the grayscale image according to an image denoising algorithm and an image enhancement algorithm to obtain an enhanced image; processing the enhanced image according to an adaptive threshold segmentation algorithm to obtain a binary segmentation threshold; and converting the enhanced image into the image in the specified format according to the binary segmentation threshold.
In this technical solution, the image in the specified format is a binary image. The image to be recognized is scanned to obtain a scanned image, the scanned image is converted into a grayscale image, the grayscale image is denoised and enhanced, a binary segmentation threshold is then determined, and the binary image is obtained according to that threshold, completing the preprocessing of the image. Binarization reduces the influence of interference factors in the color image on the text detection model and improves detection accuracy. Denoising and enhancement preserve the stroke information of the characters as much as possible, also improving detection accuracy. Usable denoising algorithms include mean filtering and median filtering, and usable image enhancement algorithms include linear transformation enhancement and histogram equalization. In practice, the original color image is first converted into a grayscale image; a median filtering algorithm can then remove noise, in particular salt-and-pepper noise, and preferably only a 3 × 3 template is used in the median filter. During image enhancement, to keep the image algorithm robust, histogram equalization can be used to enhance the global contrast of the image, alleviating problems caused by uneven illumination when scanning the test paper.
According to the computer program 502 of any one of the above technical solutions, optionally, filtering out non-text information in the text content according to the coordinate position information specifically includes: determining text boxes according to the coordinate position information, collecting their height statistics, computing the mean height, and setting a height threshold around the mean; extracting normal position frames, and abnormal position frames whose height is abnormally large or small, according to the height threshold; and filtering out the image pixels outside the normal position frames to obtain a first filtered image.
In this technical scheme, text boxes can be determined from the coordinate position information, the mean height of the boxes is computed, and a threshold is set near the mean to separate the abnormal position frames (with abnormally large or small heights) from the normal position frames. Non-character information such as icons, formulas, and noise falls into the abnormal frames and can thus be identified from them. Filtering out the image pixels outside the normal position frames yields an image containing only character information (text lines). In this process, the text detection model is combined with statistical analysis to extract the text-line information in the image, improving both the accuracy of text content recognition and the accuracy of positioning.
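The height-statistics filter above can be sketched as follows. The [0.5·mean, 1.5·mean] band is an assumed choice (the description only says a threshold is set near the mean), and the function name and box format are hypothetical.

```python
import numpy as np

def filter_by_height(image, boxes, low=0.5, high=1.5):
    """Keep boxes whose height is near the mean; blank pixels outside them.

    image: 2-D grayscale/binary array; boxes: list of (x, y, w, h) text
    boxes from the detection model.  low/high define the assumed
    normal-height band around the mean.
    """
    heights = np.array([h for (_, _, _, h) in boxes], dtype=float)
    mean_h = heights.mean()
    normal = [b for b in boxes if low * mean_h <= b[3] <= high * mean_h]
    abnormal = [b for b in boxes if b not in normal]  # icons, formulas, noise
    # first filtered image: white everywhere except inside normal boxes
    out = np.full_like(image, 255)
    for (x, y, w, h) in normal:
        out[y:y + h, x:x + w] = image[y:y + h, x:x + w]
    return out, normal, abnormal
```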
The computer program 502 according to any of the above technical solutions optionally further includes, after obtaining the first filtered image: computing the mean length of the position frames in the horizontal direction and setting a length threshold from it, and computing the mean starting and ending positions of each line in the vertical direction and setting a line threshold from them; extracting a plurality of horizontal position frames according to the length threshold and a plurality of vertical position frames according to the line threshold; and filtering out the image pixels outside the horizontal and vertical position frames to obtain a second filtered image.
In this technical scheme, thresholds are set by computing the mean horizontal length of the position frames and the mean starting and ending positions of each line in the vertical direction. The horizontal and vertical position frames are determined from these thresholds, and pixels outside them are filtered out, further removing non-text information from the image and improving detection accuracy.
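This second statistical pass can be sketched as below. The translation leaves the exact start/end statistic ambiguous, so this sketch interprets it as the columns where each line begins and ends; the band factors and tolerance are illustrative assumptions, as are all names.

```python
import numpy as np

def second_pass_filter(image, boxes, len_band=(0.3, 3.0), line_tol=1.5):
    """Second filtering pass sketched from the description.

    boxes: (x, y, w, h) line boxes kept by the first pass.
    len_band and line_tol are illustrative threshold choices; the patent
    only states that thresholds are set from the statistics.
    """
    lengths = np.array([w for (_, _, w, _) in boxes], dtype=float)
    mean_len = lengths.mean()
    # horizontal frames: length close enough to the mean line length
    horizontal = [b for b in boxes
                  if len_band[0] * mean_len <= b[2] <= len_band[1] * mean_len]
    starts = np.array([x for (x, _, _, _) in boxes], dtype=float)
    ends = np.array([x + w for (x, _, w, _) in boxes], dtype=float)
    mean_start, mean_end = starts.mean(), ends.mean()
    col_std = max(starts.std(), ends.std(), 1.0)
    # vertical frames: start and end columns near the column means
    vertical = [b for b in boxes
                if abs(b[0] - mean_start) <= line_tol * col_std
                and abs(b[0] + b[2] - mean_end) <= line_tol * col_std]
    # second filtered image: keep pixels inside either set of frames
    keep = set(horizontal) | set(vertical)
    out = np.full_like(image, 255)
    for (x, y, w, h) in keep:
        out[y:y + h, x:x + w] = image[y:y + h, x:x + w]
    return out, horizontal, vertical
```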
The computer program 502 according to any of the above technical solutions optionally further includes, after obtaining the second filtered image: calculating the inclination angle of the straight line segment at the center of each text line of the second filtered image according to a morphological processing algorithm and a connected-region rectangular-frame position detection algorithm; determining the layout information of the text according to the horizontal and vertical position frames; extracting a text region image from the first filtered image according to the layout information; and correcting the text region image according to the inclination angle to obtain a corrected image, where the corrected image corresponds to the filtered image.
In this technical scheme, tilt detection and tilt correction are performed on the image using a morphological processing algorithm and a connected-region rectangular-frame position detection algorithm, and the layout information of the text is determined. By combining the deep-learning text detection model with traditional morphological processing and connected-region rectangular-frame position detection, the text information is acquired accurately and the recognition accuracy is improved.
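The tilt estimation and correction step can be sketched as follows. As a stand-in for the morphology and connected-region angle computation, the skew of one text line is estimated here by a least-squares fit through the centers of its character boxes, and the correction is a plain nearest-neighbor rotation; both choices and all names are assumptions for illustration.

```python
import numpy as np

def estimate_skew_deg(char_boxes):
    """Fit a line through the centers of one text line's character boxes
    (x, y, w, h) and return its inclination angle in degrees."""
    cx = np.array([x + w / 2 for (x, y, w, h) in char_boxes], dtype=float)
    cy = np.array([y + h / 2 for (x, y, w, h) in char_boxes], dtype=float)
    slope = np.polyfit(cx, cy, 1)[0]       # least-squares slope
    return np.degrees(np.arctan(slope))

def rotate_nn(img, angle_deg, fill=255):
    """Nearest-neighbor rotation of a 2-D image about its center,
    via the inverse mapping; out-of-range pixels become `fill`."""
    a = np.radians(angle_deg)
    c, s = np.cos(a), np.sin(a)
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    xc, yc = (w - 1) / 2, (h - 1) / 2
    xs = c * (xx - xc) + s * (yy - yc) + xc    # source x for each target pixel
    ys = -s * (xx - xc) + c * (yy - yc) + yc   # source y for each target pixel
    xi, yi = np.rint(xs).astype(int), np.rint(ys).astype(int)
    valid = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
    out = np.full_like(img, fill)
    out[valid] = img[yi[valid], xi[valid]]
    return out
```

A corrected text region would then be `rotate_nn(region, -estimate_skew_deg(boxes))`.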
The computer program 502 according to any of the above technical solutions optionally further includes: performing connected-region detection and extension detection on the corrected image to obtain rectangular-frame information for the characters, and aggregating the rectangular frames of each line according to preset print-font width and height information and inter-character spacing information to obtain the coordinate position information of the text lines.
In this technical scheme, non-text impurity information in the image is first filtered out by the text detection model; the image filtered by the deep-learning text detection model then undergoes traditional text-line detection; the positioning result of the deep-learning text detection model and the result of the traditional algorithm are statistically fused to correct tilted text; and the final result (the position coordinate information of the text content to be recognized in the image) is extracted according to thresholds such as the width and height of the text characters in the image and the character spacing.
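The line-aggregation step described above can be sketched as follows. A vertical-overlap ratio and a horizontal-gap limit stand in for the preset print-font width/height and character-spacing thresholds; the specific values and names are illustrative assumptions.

```python
def aggregate_rows(char_boxes, max_gap=12, min_overlap=0.5):
    """Merge character rectangles (x, y, w, h) into text-line boxes.

    A box joins an existing line when it vertically overlaps the line's
    last box by at least min_overlap of the smaller height and starts
    within max_gap pixels of that box's right edge.
    """
    lines = []
    for box in sorted(char_boxes):          # left-to-right order
        x, y, w, h = box
        placed = False
        for line in lines:
            lx, ly, lw, lh = line[-1]       # compare with the line's last box
            overlap = min(y + h, ly + lh) - max(y, ly)
            if overlap >= min_overlap * min(h, lh) and x - (lx + lw) <= max_gap:
                line.append(box)
                placed = True
                break
        if not placed:
            lines.append([box])
    # collapse each line to a single bounding rectangle
    rects = []
    for line in lines:
        xs = [b[0] for b in line]
        ys = [b[1] for b in line]
        xe = [b[0] + b[2] for b in line]
        ye = [b[1] + b[3] for b in line]
        rects.append((min(xs), min(ys), max(xe) - min(xs), max(ye) - min(ys)))
    return rects
```

The returned rectangles are the per-line coordinate position information that the recognition stage consumes.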
According to the computer program 502 of any of the above solutions, optionally, the text detection model is trained on a data set made of text-line images generated from an examination question bank.
This embodiment solves the problem of text-line detection in a test paper analysis system: text lines can be extracted accurately, the position of the text can be located precisely, and the text content can be recognized conveniently. Addressing the characteristics of optical characters in scanned test papers and the complexity of test paper content, the embodiment provides a test paper layout analysis and correction method built on a deep-learning text-line detection algorithm (the text detection model). A statistical analysis method combines this deep-learning algorithm with a contour extraction algorithm from traditional image processing to extract the text-line information in the test paper, improving both the accuracy of content extraction and the accuracy of positioning. Based on the same principle, the above embodiments can be used not only for detecting and recognizing test papers but also for detecting and recognizing arbitrary images.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable image processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable image processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable image processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable image processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second, third, etcetera does not indicate any ordering; these words may be interpreted as names.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made therein without departing from the scope of the invention as defined in the appended claims and their equivalents, and it is intended that the invention encompass such changes and modifications as well.

Claims (11)

1. An image processing method, comprising:
inputting an image with a specified format into a text detection model, wherein the text detection model outputs text content in the image and coordinate position information corresponding to the text content;
and carrying out statistical analysis on the coordinate position information, setting a threshold value according to the coordinate position information, determining a position frame corresponding to the text information according to the threshold value, and filtering out non-text information except the position frame.
2. The image processing method according to claim 1, wherein the text detection model outputs text content in the image and coordinate position information corresponding to the text content, and specifically comprises:
determining a zooming threshold value according to the paper type of the carrier of the image, zooming the image exceeding the zooming threshold value, and performing text recognition on the zoomed image by the text detection model so as to output text content in the image and coordinate position information corresponding to the text content.
3. The image processing method according to claim 1, wherein the text detection model outputs text content in the image and coordinate position information corresponding to the text content, and specifically further comprises:
and adjusting the value of a width-height ratio parameter of the character recognition network model according to the arrangement direction of the characters in the image so as to enhance the information extraction capability of the character recognition network model in the arrangement direction, and performing text recognition on the image by the text detection model after the width-height ratio parameter is adjusted so as to output text content in the image and coordinate position information corresponding to the text content.
4. The image processing method according to claim 1, wherein the text detection model outputs text content in the image and coordinate position information corresponding to the text content, and specifically further comprises:
and adjusting the value of the pixel height of a prediction frame of the character recognition network model to obtain a plurality of prediction frames with different heights so that the prediction frames can detect the text characters and punctuation marks, and after determining the plurality of prediction frames which can detect the text characters and the punctuation marks, performing text recognition on the image by the text detection model so as to output the text content in the image and the coordinate position information corresponding to the text content.
5. The image processing method according to any one of claims 1 to 4, wherein the statistically analyzing the coordinate position information, establishing a threshold according to the coordinate position information, determining a position frame corresponding to text information according to the threshold, and filtering out non-text information other than the position frame specifically includes:
determining a text box according to the coordinate position information, counting the height information of the text box, solving the mean value of the height information, and setting a height threshold value according to the mean value;
extracting a normal position frame, and an abnormal position frame whose height is abnormally large or abnormally small, according to the height threshold value;
and filtering out the image pixels outside the normal position frame to obtain a first filtered image.
6. The image processing method of claim 5, further comprising, after obtaining the first filtered image:
counting the average value of the length of the position frame in the horizontal direction, setting a length threshold value according to the length average value, counting the average values of the starting position and the ending position of each line in the vertical direction, and setting a line threshold value according to the average values of the starting position and the ending position;
extracting a plurality of horizontal position frames in the horizontal direction according to the length threshold value, and extracting a plurality of vertical position frames in the vertical direction according to the line threshold value;
and filtering out the image pixels outside the horizontal position frame and the vertical position frame to obtain a second filtered image.
7. The image processing method of claim 6, further comprising, after obtaining the second filtered image:
calculating the inclination angle of the straight line segment at the center of each text line of the second filtered image according to a morphological processing algorithm and a connected region rectangular frame position detection algorithm;
determining layout information of the text according to the horizontal position frame and the vertical position frame;
extracting a text region image from the first filtered image according to the layout information;
and correcting the text region image according to the inclination angle to obtain a corrected image.
8. The image processing method according to any one of claims 1 to 4,
the text detection model is trained by a data set made of text row pictures generated according to an examination question bank.
9. An image processing apparatus characterized by comprising: memory, processor and program stored on the memory and executable on the processor, the program being capable of implementing the steps defined by the image processing method as claimed in any one of claims 1 to 8 when executed by the processor.
10. A terminal, comprising:
an image processing apparatus as claimed in claim 9.
11. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed, implements the steps of the image processing method according to any one of claims 1 to 8.
CN201910760632.0A 2019-08-16 2019-08-16 Image processing method, device, terminal and computer readable storage medium Pending CN110598566A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910760632.0A CN110598566A (en) 2019-08-16 2019-08-16 Image processing method, device, terminal and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN110598566A true CN110598566A (en) 2019-12-20

Family

ID=68854474

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910760632.0A Pending CN110598566A (en) 2019-08-16 2019-08-16 Image processing method, device, terminal and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110598566A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111275051A (en) * 2020-02-28 2020-06-12 上海眼控科技股份有限公司 Character recognition method, character recognition device, computer equipment and computer-readable storage medium
CN111414816A (en) * 2020-03-04 2020-07-14 沈阳先进医疗设备技术孵化中心有限公司 Information extraction method, device, equipment and computer readable storage medium
CN112926565A (en) * 2021-02-25 2021-06-08 中国平安人寿保险股份有限公司 Picture text recognition method, system, device and storage medium
CN113392833A (en) * 2021-06-10 2021-09-14 沈阳派得林科技有限责任公司 Method for identifying type number of industrial radiographic negative image
WO2022006829A1 (en) * 2020-07-09 2022-01-13 国网电子商务有限公司 Bill image recognition method and system, electronic device, and storage medium
CN116188293A (en) * 2022-12-21 2023-05-30 北京海天瑞声科技股份有限公司 Image processing method, device, apparatus, medium, and program product
CN116563650B (en) * 2023-07-10 2023-10-13 邦世科技(南京)有限公司 Deep learning-based endplate inflammatory degeneration grading method and system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107748888A (en) * 2017-10-13 2018-03-02 众安信息技术服务有限公司 A kind of image text row detection method and device
WO2018223857A1 (en) * 2017-06-09 2018-12-13 科大讯飞股份有限公司 Text line recognition method and system
CN109376658A (en) * 2018-10-26 2019-02-22 信雅达系统工程股份有限公司 A kind of OCR method based on deep learning
CN109492643A (en) * 2018-10-11 2019-03-19 平安科技(深圳)有限公司 Certificate recognition methods, device, computer equipment and storage medium based on OCR
CN109697440A (en) * 2018-12-10 2019-04-30 浙江工业大学 A kind of ID card information extracting method
CN109993040A (en) * 2018-01-03 2019-07-09 北京世纪好未来教育科技有限公司 Text recognition method and device
CN110110585A (en) * 2019-03-15 2019-08-09 西安电子科技大学 Intelligently reading realization method and system based on deep learning, computer program

Similar Documents

Publication Publication Date Title
CN111814722B (en) Method and device for identifying table in image, electronic equipment and storage medium
CN110598566A (en) Image processing method, device, terminal and computer readable storage medium
US9542752B2 (en) Document image compression method and its application in document authentication
CN103310211B (en) A kind ofly fill in mark recognition method based on image procossing
US20130208986A1 (en) Character recognition
US20070253040A1 (en) Color scanning to enhance bitonal image
US11836969B2 (en) Preprocessing images for OCR using character pixel height estimation and cycle generative adversarial networks for better character recognition
CN101122953A (en) Picture words segmentation method
CN109784342A (en) A kind of OCR recognition methods and terminal based on deep learning model
CN116071763B (en) Teaching book intelligent correction system based on character recognition
CN112307919B (en) Improved YOLOv 3-based digital information area identification method in document image
CN112507782A (en) Text image recognition method and device
EP0949579A2 (en) Multiple size reductions for image segmentation
CN102737240B (en) Method of analyzing digital document images
RU2581786C1 (en) Determination of image transformations to increase quality of optical character recognition
CN112446259A (en) Image processing method, device, terminal and computer readable storage medium
CN109741273A (en) A kind of mobile phone photograph low-quality images automatically process and methods of marking
EP2545498B1 (en) Resolution adjustment of an image that includes text undergoing an ocr process
US20140086473A1 (en) Image processing device, an image processing method and a program to be used to implement the image processing
JP6630341B2 (en) Optical detection of symbols
CN112036294B (en) Method and device for automatically identifying paper form structure
CN113139535A (en) OCR document recognition method
KR20150099116A (en) Method for recognizing a color character using optical character recognition and apparatus thereof
CN115410191A (en) Text image recognition method, device, equipment and storage medium
Manlises et al. Expiry Date Character Recognition on Canned Goods Using Convolutional Neural Network VGG16 Architecture

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20191220