CN110598566A - Image processing method, device, terminal and computer readable storage medium - Google Patents

Image processing method, device, terminal and computer readable storage medium Download PDF

Info

Publication number
CN110598566A
Authority
CN
China
Prior art keywords
image
text
information
height
image processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910760632.0A
Other languages
Chinese (zh)
Inventor
贺涛
欧阳一村
曾志辉
邢军华
许文龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE ICT Technologies Co Ltd
Original Assignee
ZTE ICT Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE ICT Technologies Co Ltd filed Critical ZTE ICT Technologies Co Ltd
Priority to CN201910760632.0A
Publication of CN110598566A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/60: Type of objects
    • G06V20/62: Text, e.g. of license plates, overlay texts or captions on TV images
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40: Document-oriented image-based pattern recognition
    • G06V30/41: Analysis of document content
    • G06V30/413: Classification of content, e.g. text, photographs or tables
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Character Input (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an image processing method, an image processing device, a terminal and a computer-readable storage medium. The image processing method includes: inputting an image in a specified format into a text detection model, the text detection model outputting the text content in the image and coordinate position information corresponding to the text content; performing statistical analysis on the coordinate position information, setting thresholds according to the coordinate position information, determining the position frames corresponding to the text information according to the thresholds, and filtering out the non-text information outside the position frames. In this technical scheme, the text detection model serves as a filter that removes non-text information from the image, which improves both the accuracy and the computation speed of text content detection.

Description

Image processing method, device, terminal and computer readable storage medium
Technical Field
The present invention relates to the field of image processing, and in particular, to an image processing method, an image processing apparatus, a terminal, and a computer-readable storage medium.
Background
With the wide application of image-based character recognition, OCR (Optical Character Recognition) is widely used in the business field, for example to recognize the text in invoices and identity cards. In the education industry, examinations are an important means for teachers to gauge students' grasp of knowledge points, and analyzing examination statistics accounts for a large share of a teacher's workload. Unlike identity cards, bank cards and invoices, which have relatively fixed formats and contents, the examination papers of each school or educational institution have their own layout.
In the related art, extracting specific text information from a scanned test paper with conventional image processing runs into the following technical problems. First, conventional text-line detection is strongly affected by the quality of the scanned image, for example the color and quality of the paper and poor imaging by the scanner. Second, a test paper has a varied text structure and may contain characters, formulas, tables, images and much other information; when a conventional text-line detection method is used as the detector of text content, a large number of text-line screening steps must be added to remove interference such as tables and images. These preprocessing methods are mostly cumbersome, and test papers with different layouts require different screening procedures, which harms recognition accuracy and computation speed and reduces development efficiency.
Disclosure of Invention
The present invention is directed to solving at least one of the problems of the prior art or the related art.
To this end, it is an object of the present invention to provide an image processing method.
Another object of the present invention is to provide an image processing apparatus.
Another object of the present invention is to provide a terminal.
It is another object of the present invention to provide a computer-readable storage medium.
In order to achieve the above object, according to the first aspect of the present invention, there is provided an image processing method, including: inputting an image in a specified format into a text detection model, the text detection model outputting the text content in the image and coordinate position information corresponding to the text content; and performing statistical analysis on the coordinate position information, setting thresholds according to the coordinate position information, determining the position frames corresponding to the text information according to the thresholds, and filtering out the non-text information outside the position frames.
In this technical scheme, the image in the specified format corresponds to the image to be recognized: the image to be recognized can be converted into the specified format by preprocessing, and inputting it into the text detection model yields the text content of the image and the coordinate position information of the text; filtering the image according to the coordinate position information removes the non-text information from the text content. The present application uses a deep-learning text detection model as a filter, which is exactly the opposite of the conventional scheme (which uses the text detection model as a detector). With the deep-learning text-line detection algorithm acting as a filter, conventional text-line detection can then be performed on the filtered image, the text positioning result of the deep-learning algorithm is statistically fused with the conventional algorithm's result, skew correction is applied, and the final result (the coordinate position information) is extracted according to thresholds such as the width, height and spacing of the characters in the image.
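As an illustrative sketch (not the patented implementation), the "model as filter" step can be expressed as masking out everything outside the model's predicted boxes; the (x, y, w, h) box format and the white background value are assumptions of this sketch:

```python
import numpy as np

def apply_text_filter(image, boxes, background=255):
    """Use the model's text boxes as a *filter*: keep the pixels inside the
    predicted (x, y, w, h) boxes and blank everything else to the background
    value, so only text regions survive for downstream line detection."""
    out = np.full_like(image, background)
    for x, y, w, h in boxes:
        out[y:y + h, x:x + w] = image[y:y + h, x:x + w]
    return out
```

Conventional text-line detection would then run on the returned image, in which tables, figures and other non-text regions have already been blanked.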
The text detection model is a model from the family of deep-learning-based object detection algorithms, for example CTPN, Faster R-CNN or SSD, where CTPN (Connectionist Text Proposal Network, from "Detecting Text in Natural Image with Connectionist Text Proposal Network") detects text through connected pre-selected anchor boxes, Faster R-CNN (Faster Region-based Convolutional Neural Network) is a region-proposal detector, and SSD (Single Shot MultiBox Detector) is a multi-object detection algorithm that directly predicts object classes and bounding boxes.
In addition, some models dedicated to text detection can achieve similar technical effects, for example EAST, TextBoxes++ and SegLink, where EAST ("An Efficient and Accurate Scene Text detection pipeline") is a streamlined scene text detector, TextBoxes++ is an SSD-based, end-to-end trainable fast detector of oriented scene text, and SegLink (segment-link) is an oriented scene text detection algorithm.
It can be understood that a conventional text-line detection method needs several fixed thresholds, and setting those thresholds may require the imaging quality of the scanned image to be stable; if the imaging quality fluctuates, the thresholds may need readjustment, which affects the efficiency of using the project and the cost of maintaining it later. The robustness of a deep-learning text-line detection algorithm effectively compensates for the low environmental adaptability that conventional text-line detection cannot overcome; for example, the quality of the test paper and the imaging quality of the scanner strongly affect conventional text-line detection, and a deep-learning method can effectively reduce the influence of these factors.
According to the image processing method of the above technical solution, optionally, the method further includes: and identifying the filtered image by using an image connected region detection method, and identifying text content in the image.
In this technical scheme, the image is processed by a deep-learning text detection model, and the filtered image is then recognized with an image connected-region detection method. Because the deep-learning text detection model has filtered the non-text impurity information out of the image, statistically fusing the text positioning result of the deep-learning text-line detection algorithm with the result of the connected-region detection method improves the accuracy of image recognition.
According to the image processing method in any of the above technical solutions, optionally, the text detection model corresponds to a text recognition network model that determines a text region in the image and then determines a text line in the text region, and the inputting of the image in the specified format into the text detection model specifically includes: and inputting the image with the specified format into the trained character recognition network model for detection.
In this technical scheme, the text detection model is a CTPN model. The trained CTPN model is highly robust and can overcome the low environmental adaptability that conventional text-line detection methods cannot solve; for example, the quality of the test paper and the imaging quality of the scanner strongly affect conventional text-line detection, and a CTPN model trained with deep learning can effectively reduce the influence of these factors.
According to the image processing method of any one of the above technical solutions, optionally, the outputting, by the text detection model, of the text content in the image and the coordinate position information corresponding to the text content specifically includes: determining a scaling threshold according to the paper type of the image's carrier, scaling images that exceed the scaling threshold, and having the text detection model perform text recognition on the scaled image so as to output the text content in the image and the coordinate position information corresponding to the text content.
In this technical scheme, the scaling threshold is set according to the paper model of the carrier of the image to be recognized (standardized paper such as test papers, documents and questionnaires); the scaled image is more amenable to text detection, which improves the processing speed and accuracy of the character recognition network model.
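A minimal sketch of such a scaling threshold follows; the 2000 px limit and the (width, height, scale) return shape are illustrative assumptions, since the patent says only that the threshold is chosen per paper type:

```python
def rescale_if_needed(width, height, max_side=2000):
    """Scale image dimensions down so the longer side does not exceed
    max_side; images already within the threshold pass through unchanged."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height, 1.0
    scale = max_side / longest
    return round(width * scale), round(height * scale), scale
```

The returned scale factor would also be needed to map detected box coordinates back to the original image.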
According to the image processing method of any one of the above technical solutions, optionally, the text detection model outputs the text content in the image and coordinate position information corresponding to the text content, specifically including: adjusting the values of the aspect-ratio parameters of the character recognition network model according to the arrangement direction of the characters in the image, so as to enhance the model's information extraction capability in that direction; after the aspect-ratio parameters are adjusted, the text detection model performs text recognition on the image so as to output the text content in the image and the coordinate position information corresponding to the text content.
In this technical scheme, different text arrangement directions lead to different recognition capabilities in the character recognition network model; adjusting the aspect-ratio parameters of the model according to the arrangement direction of the text lets the model adapt to that direction and improves recognition accuracy and speed. For example, modifying SCALES_BASE in the aspect-ratio parameters from (0.25, 0.5, 1.0, 2.0, 3.0) to (1.0, 2.0, 3.0, 5.0, 10.0) enhances the information extraction capability of the character recognition network model in the horizontal direction.
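For illustration, the quoted scale change can be read as widening the set of anchor widths the region proposal network searches over; the 16 px base anchor size and the helper below are assumptions modeled on common Faster R-CNN-style configurations, not the patent's code:

```python
BASE_SIZE = 16  # assumed anchor base size, as in common Faster R-CNN configs
DEFAULT_SCALES_BASE = (0.25, 0.5, 1.0, 2.0, 3.0)
HORIZONTAL_SCALES_BASE = (1.0, 2.0, 3.0, 5.0, 10.0)  # values quoted above

def anchor_widths(scales, base=BASE_SIZE):
    """Anchor widths produced by a set of base scales."""
    return tuple(int(base * s) for s in scales)
```

With the horizontal setting, the widest anchor grows from 48 px to 160 px, which is what lets the model span long horizontal text runs.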
According to the image processing method of any one of the above technical solutions, optionally, the text detection model outputs the text content in the image and coordinate position information corresponding to the text content, and specifically includes: and after a plurality of prediction frames capable of detecting the text characters and the punctuations are determined, the text detection model performs text recognition on the image so as to output text content in the image and coordinate position information corresponding to the text content.
In this technical scheme, the prediction boxes are used to select text (characters), and the height of a prediction box determines the height of the text it can recognize. To ensure that both tall text characters and short punctuation marks are recognized accurately, the pixel heights of the prediction boxes of the character recognition network model are adjusted to obtain several prediction boxes of different heights, with the specific heights determined by the characters and punctuation marks in the images. For example, the prediction-box heights start at 8 pixels and grow by 4 pixels each time, giving 10 candidate boxes: the first prediction box is 8 pixels high and the tenth is 44 pixels high, which ensures that the effective text information, from characters down to punctuation marks, can be detected.
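The quoted height scheme can be sketched directly; this helper simply enumerates the example's values (8 px start, 4 px step, 10 boxes):

```python
def prediction_box_heights(start=8, step=4, count=10):
    """Candidate prediction-box heights from the example above: starting at
    8 px and adding 4 px each time, for 10 boxes (8 px up to 44 px)."""
    return [start + step * i for i in range(count)]
```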
The image processing method according to any one of the above technical solutions, optionally, further includes: converting a scanned image of an image to be recognized into a gray image; enhancing the image quality of the gray level image according to an image denoising algorithm and an image enhancement algorithm to obtain an enhanced image; processing the enhanced image according to a self-adaptive threshold segmentation algorithm to obtain a binary segmentation threshold; and converting the enhanced image into an image in a specified format according to the binary segmentation threshold value.
In this technical scheme, the image in the specified format is a binary image. A scanned image of the image to be recognized is acquired and converted into a grayscale image; the grayscale image is denoised and enhanced, a binary segmentation threshold is then determined, and the binary image is obtained from that threshold, completing the preprocessing. Binarization reduces the influence of interference factors in the color image on the text detection model and improves detection accuracy. Denoising and enhancement preserve the stroke information of the characters as much as possible and likewise improve detection accuracy. Denoising algorithms usable in preprocessing include mean filtering and median filtering, and image enhancement algorithms include linear transformation enhancement and histogram equalization. In practice, the original color image is first converted into a grayscale image; to remove salt-and-pepper noise effectively, a median filtering algorithm can be applied, preferably with only a 3 × 3 template. During image enhancement, to keep the image algorithm robust, histogram equalization can be used to enhance the global contrast of the image, countering problems caused by uneven illumination while scanning the test paper.
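As one concrete stand-in for the adaptive threshold segmentation step (the patent does not name a specific algorithm), Otsu's method picks the threshold that maximizes between-class variance; a numpy-only sketch:

```python
import numpy as np

def otsu_threshold(gray):
    """Adaptive binary segmentation threshold (Otsu) for a uint8 gray image:
    choose t maximizing the between-class variance of the two pixel classes."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = gray.size
    sum_all = np.dot(np.arange(256), hist)
    best_t, best_var = 0, -1.0
    w0 = 0.0   # weight (pixel count) of the dark class
    sum0 = 0.0  # intensity sum of the dark class
    for t in range(256):
        w0 += hist[t]
        if w0 == 0 or w0 == total:
            continue
        sum0 += t * hist[t]
        m0 = sum0 / w0                        # dark-class mean
        m1 = (sum_all - sum0) / (total - w0)  # bright-class mean
        var = w0 * (total - w0) * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def binarize(gray):
    """Convert a grayscale image to the binary 'specified format'."""
    return (gray > otsu_threshold(gray)).astype(np.uint8) * 255
```

In practice a library implementation (e.g. OpenCV's Otsu mode) would likely be used; this sketch only makes the segmentation step concrete.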
According to the image processing method of any one of the above technical solutions, optionally, the statistical analysis of the coordinate position information, the setting of thresholds according to the coordinate position information, the determination of the position frames corresponding to the text information according to the thresholds, and the filtering-out of the non-text information outside the position frames specifically include: determining text boxes according to the coordinate position information, collecting the height information of the text boxes, taking the mean of the height information, and setting a height threshold range around the mean; according to the height threshold range, extracting the normal position frames and the abnormal position frames whose height lies above or below the range; and filtering out the image pixels outside the normal position frames to obtain a first filtered image.
According to the technical scheme, the text boxes are determined from the coordinate position information, the mean of their heights is computed, and thresholds are set near the mean to separate the abnormal position frames (those too tall or too short) from the normal position frames; non-text information such as icons, formulas and noise falls into the abnormal frames, so the icons, formulas and similar non-text content can be identified from them. In addition, filtering out the image pixels outside the normal position frames yields an image of the text information (text lines). In this process, the text detection model combined with statistical analysis completes the extraction of text-line information from the image and improves the accuracy of text content recognition and positioning.
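A hedged sketch of the height-statistics filter described above, assuming (x, y, w, h) boxes and an illustrative ±50% band around the mean height (the patent says only that a threshold is set near the mean):

```python
def split_by_height(boxes, band=0.5):
    """Split (x, y, w, h) text boxes into normal and abnormal by height:
    a box is normal when its height lies within +/- band of the mean."""
    mean_h = sum(b[3] for b in boxes) / len(boxes)
    normal = [b for b in boxes if abs(b[3] - mean_h) <= band * mean_h]
    abnormal = [b for b in boxes if abs(b[3] - mean_h) > band * mean_h]
    return normal, abnormal
```

Abnormal boxes (icons, formulas, noise) are kept separately so their contents can be identified or discarded, while the normal boxes define the first filtered image.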
According to the image processing method of any one of the above technical solutions, optionally, after obtaining the first filtered image, the method further includes: counting the average value of the length of the position frame in the horizontal direction, setting a length threshold value according to the length average value, counting the average values of the starting position and the ending position of each line in the vertical direction, and setting a line threshold value according to the average values of the starting position and the ending position; extracting a plurality of horizontal position frames in the horizontal direction according to the length threshold value, and extracting a plurality of vertical position frames in the vertical direction according to the line threshold value; and filtering out the pixels of the image except the horizontal position frame and the vertical position frame to obtain a second filtered image.
According to the technical scheme, the corresponding thresholds are set by computing the mean length of the position frames in the horizontal direction and the means of the start and end positions of each line in the vertical direction; the horizontal and vertical position frames are determined from these thresholds, the pixels outside them are filtered out, the non-text information in the image is further removed, and detection accuracy is improved.
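The second statistical pass might be sketched as follows; the tolerance factors are illustrative assumptions, since the patent states only that thresholds are derived from the means:

```python
def filter_by_layout(boxes, tol_w=0.5, tol_pos=0.25):
    """Second-stage statistical filter on (x, y, w, h) position frames.

    Horizontal: keep frames whose length is near the mean frame length.
    Vertical: keep frames whose start (left) and end (right) positions are
    near the mean line start and end positions."""
    mean_w = sum(b[2] for b in boxes) / len(boxes)
    horizontal = [b for b in boxes if abs(b[2] - mean_w) <= tol_w * mean_w]
    mean_start = sum(b[0] for b in boxes) / len(boxes)
    mean_end = sum(b[0] + b[2] for b in boxes) / len(boxes)
    vertical = [b for b in boxes
                if abs(b[0] - mean_start) <= tol_pos * mean_w
                and abs(b[0] + b[2] - mean_end) <= tol_pos * mean_w]
    return horizontal, vertical
```

Pixels outside the surviving horizontal and vertical frames would then be blanked to obtain the second filtered image.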
According to the image processing method of any one of the above technical solutions, optionally, after obtaining the second filtered image, the method further includes: calculating the inclination angle of the straight line segment at the center of each text line of the second filtered image according to a morphological processing algorithm and a connected region rectangular frame position detection algorithm; determining layout information of the text according to the horizontal position frame and the vertical position frame; extracting a text region image from the first filtered image according to the layout information; and correcting the text region image according to the inclination angle to obtain a corrected image, wherein the corrected image corresponds to the filtered image.
According to the technical scheme, skew detection and skew correction are performed on the image with a morphological processing algorithm and a connected-region rectangular-frame position detection algorithm, and the layout information of the text is determined; combining the deep-learning text detection model with the conventional morphological and connected-region rectangular-frame algorithms acquires the text information accurately and improves recognition accuracy.
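As a simple numeric stand-in for estimating the inclination angle of the straight segments at the text-line centers (the patent uses morphology plus connected-region rectangle detection), a least-squares line fit over line-center points yields the skew angle; this substitute method is an assumption of the sketch:

```python
import math

def tilt_angle(centers):
    """Skew angle in degrees of text-line center points (x, y), estimated
    with a least-squares line fit through the points."""
    n = len(centers)
    mx = sum(x for x, _ in centers) / n
    my = sum(y for _, y in centers) / n
    num = sum((x - mx) * (y - my) for x, y in centers)
    den = sum((x - mx) ** 2 for x, _ in centers)
    return math.degrees(math.atan2(num, den))
```

The corrected image would then be produced by rotating the extracted text-region image by the negative of this angle.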
The image processing method according to any one of the above technical solutions, optionally, further includes: and performing connected region detection and extension detection on the corrected image to obtain rectangular frame information of the characters, and aggregating the rectangular frames of each line according to the width and height information of the preset printing font and the distance information between the characters to obtain coordinate position information of the text line.
According to the technical scheme, the non-text impurity information in the image is filtered out by the text detection model; conventional text-line detection is then performed on the image filtered by the deep-learning model, the text positioning result of the deep-learning text detection model is statistically fused with the conventional algorithm's result to correct skewed text, and the final result (the position coordinate information of the text content to be recognized in the image) is extracted according to thresholds such as the width and height of the characters in the image and the character spacing.
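The per-line aggregation of character rectangles can be sketched as grouping boxes by vertical center and merging each group into one line box; the row tolerance stands in for the preset font width/height and character-spacing thresholds, whose actual values the patent leaves open:

```python
def aggregate_lines(char_boxes, row_tol=5):
    """Aggregate per-character (x, y, w, h) rectangles into text-line boxes:
    boxes whose vertical centers fall within row_tol pixels of an existing
    line are merged into that line."""
    lines = []
    for x, y, w, h in sorted(char_boxes, key=lambda b: (b[1], b[0])):
        cy = y + h / 2
        for line in lines:
            if abs(line["cy"] - cy) <= row_tol:
                line["boxes"].append((x, y, w, h))
                break
        else:
            lines.append({"cy": cy, "boxes": [(x, y, w, h)]})
    merged = []
    for line in lines:
        xs = [b[0] for b in line["boxes"]]
        ys = [b[1] for b in line["boxes"]]
        xe = [b[0] + b[2] for b in line["boxes"]]
        ye = [b[1] + b[3] for b in line["boxes"]]
        merged.append((min(xs), min(ys), max(xe) - min(xs), max(ye) - min(ys)))
    return merged
```

Each merged tuple is the coordinate position information of one text line.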
According to the image processing method of any of the above technical solutions, optionally, the text detection model is trained with a data set produced from text-line pictures generated from an examination question bank.
According to the second aspect of the present invention, there is provided an image processing apparatus, including: a memory, a processor, and a program stored in the memory and executable on the processor, the program, when executed by the processor, implementing the steps of the image processing method according to any one of the above technical solutions. The image processing apparatus has all the advantages of the image processing method according to any of the above technical solutions, which are not repeated here.
According to a technical solution of the third aspect of the present invention, there is also provided a terminal, including: the image processing apparatus according to the second aspect of the present invention. The terminal includes all the advantages of the image processing method according to any of the above technical solutions, which are not described herein again.
According to the fourth aspect of the present invention, there is also provided a computer-readable storage medium on which a computer program is stored, the computer program, when executed, implementing the image processing method defined in any one of the technical solutions of the first aspect.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 shows a schematic flow diagram of an image processing method according to an embodiment of the invention;
FIG. 2 shows a schematic diagram of an image processing method according to another embodiment of the invention;
FIG. 3 shows a schematic block diagram of an image processing apparatus according to an embodiment of the present invention;
FIG. 4 shows a schematic block diagram of a terminal according to one embodiment of the present invention;
FIG. 5 shows a schematic block diagram of a computer-readable storage medium according to an embodiment of the invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
Example one
As shown in fig. 1, an image processing method according to an embodiment of the present invention includes:
Step 102, inputting an image in a specified format into a text detection model, the text detection model outputting the text content in the image and the coordinate position information corresponding to the text content;
Step 104, performing statistical analysis on the coordinate position information, setting thresholds according to the coordinate position information, determining the position frames corresponding to the text information according to the thresholds, and filtering out the non-text information outside the position frames.
In the embodiment, the image in the specified format corresponds to the image to be recognized: the image to be recognized can be converted into the specified format by preprocessing, and inputting it into the text detection model yields the text content of the image and the coordinate position information of the text; filtering the image according to the coordinate position information removes the non-text information from the text content. The present application uses a deep-learning text detection model as a filter, which is exactly the opposite of the conventional scheme (which uses the text detection model as a detector). With the deep-learning text-line detection algorithm acting as a filter, conventional text-line detection can then be performed on the filtered image, the text positioning result of the deep-learning algorithm is statistically fused with the conventional algorithm's result, skew correction is applied, and the final result (the coordinate position information) is extracted according to thresholds such as the width, height and spacing of the characters in the image.
The text detection model is a model from the family of deep-learning-based object detection algorithms, for example CTPN, Faster R-CNN or SSD.
In addition, some models dedicated to text detection can achieve similar technical effects, for example the EAST, TextBoxes++ and SegLink algorithm models.
It can be understood that a conventional text-line detection method needs several fixed thresholds, and setting those thresholds may require the imaging quality of the scanned image to be stable; if the imaging quality fluctuates, the thresholds may need readjustment, which affects the efficiency of using the project and the cost of maintaining it later. The robustness of a deep-learning text-line detection algorithm effectively compensates for the low environmental adaptability that conventional text-line detection cannot overcome; for example, the quality of the test paper and the imaging quality of the scanner strongly affect conventional text-line detection, and a deep-learning method can effectively reduce the influence of these factors.
The image processing method according to the above embodiment, optionally, further includes: and identifying the filtered image by using an image connected region detection method, and identifying text content in the image.
In the embodiment, the image is processed by a deep-learning text detection model, and the filtered image is then recognized with an image connected-region detection method. Because the deep-learning text detection model has filtered the non-text impurity information out of the image, statistically fusing the text positioning result of the deep-learning text-line detection algorithm with the result of the connected-region detection method improves the accuracy of image recognition.
According to the image processing method in the foregoing embodiment, optionally, the text detection model corresponds to a text recognition network model that determines a text region in the image and then determines a text line in the text region, and the inputting of the image in the specified format into the text detection model specifically includes: and inputting the image with the specified format into the trained character recognition network model for detection.
In the embodiment, the character recognition network model is a CTPN model. The trained CTPN model is highly robust and compensates for the low environmental adaptability that conventional text-line detection methods cannot solve; for example, the quality of the test paper and the imaging quality of the scanner strongly affect conventional text-line detection, and a CTPN model trained with deep learning can effectively reduce the influence of these factors.
According to the image processing method in the foregoing embodiment, optionally, the outputting, by the text detection model, of the text content in the image and the coordinate position information corresponding to the text content specifically includes: determining a scaling threshold according to the paper type of the carrier of the image, scaling any image exceeding the scaling threshold, and performing text recognition on the scaled image with the text detection model, so as to output the text content in the image and the coordinate position information corresponding to the text content.
In this embodiment, the scaling threshold is set according to the paper type of the carrier of the image to be recognized (standardized paper such as test papers, documents, and questionnaires). The scaled image is more amenable to text detection, improving the processing speed and accuracy of the character recognition network model.
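A minimal sketch of this scaling step (not part of the patent text), using as illustrative limits the A3 thresholds given later in the document (long side 3400 pixels, short side 2400 pixels); other paper types would substitute other limits:

```python
def scale_to_limits(width, height, max_long=3400, max_short=2400):
    """Scale (width, height) down, preserving aspect ratio, so that the
    long side <= max_long and the short side <= max_short.
    The default limits are the A3 thresholds mentioned in the text."""
    long_side, short_side = max(width, height), min(width, height)
    factor = min(1.0, max_long / long_side, max_short / short_side)
    return round(width * factor), round(height * factor)
```

An image already within the limits is returned unchanged, since the factor is capped at 1.0 (i.e., the method only scales down, never up).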
According to the image processing method in the foregoing embodiment, optionally, the text detection model outputs the text content in the image and the coordinate position information corresponding to the text content, specifically including: adjusting the value of the aspect ratio parameter of the character recognition network model according to the arrangement direction of the characters in the image, so as to enhance the information extraction capability of the model in that direction, and performing text recognition on the image with the text detection model after the aspect ratio parameter is adjusted, so as to output the text content in the image and the coordinate position information corresponding to the text content.
In this embodiment, different text arrangement directions lead to different recognition capabilities of the character recognition network model. Adjusting the aspect ratio parameter of the model according to the text arrangement direction lets the model adapt to that direction, improving recognition accuracy and speed. For example, modifying SCALES_BASE in the aspect ratio parameters from (0.25, 0.5, 1.0, 2.0, 3.0) to (1.0, 2.0, 3.0, 5.0, 10.0) enhances the information extraction capability of the character recognition network model in the horizontal direction.
According to the image processing method in the foregoing embodiment, optionally, the text detection model outputs the text content in the image and the coordinate position information corresponding to the text content, specifically including: after a plurality of prediction boxes capable of detecting both text characters and punctuation marks are determined, the text detection model performs text recognition on the image, so as to output the text content in the image and the coordinate position information corresponding to the text content.
In this embodiment, the prediction box frames the text (characters), and the height of a prediction box determines the height of the text it can recognize. To ensure that both taller text characters and shorter punctuation marks are recognized accurately, the pixel height of the prediction box of the character recognition network model is given several values, yielding a plurality of prediction boxes of different heights; the specific height values are determined according to the character height and the punctuation mark height in the image. For example, the prediction box heights start at 8 pixels and increase by 4 pixels each time, giving 10 candidate prediction boxes: the first prediction box is 8 pixels high and the tenth is 44 pixels high, ensuring that all effective text information, from text characters down to punctuation marks, can be detected.
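The height schedule in the example above (start at 8 pixels, add 4 pixels, 10 boxes) can be generated directly:

```python
def prediction_heights(start=8, step=4, count=10):
    """Candidate prediction-box heights: 8, 12, 16, ..., 44 pixels,
    matching the example in the text."""
    return [start + step * i for i in range(count)]
```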
The image processing method according to the above embodiment optionally further includes: converting a scanned image of the image to be recognized into a grayscale image; enhancing the image quality of the grayscale image according to an image denoising algorithm and an image enhancement algorithm to obtain an enhanced image; processing the enhanced image according to an adaptive threshold segmentation algorithm to obtain a binary segmentation threshold; and converting the enhanced image into the image in the specified format according to the binary segmentation threshold.
In this embodiment, the image in the specified format is a binary image. The image to be recognized is scanned to obtain a scanned image, the scanned image is converted into a grayscale image, the grayscale image is denoised and enhanced, a binary segmentation threshold for it is determined, and the binary image is obtained according to that threshold, completing the preprocessing of the image. Binarization reduces the influence of interference factors in the color image on the text detection model and improves detection accuracy. Denoising and enhancement preserve the stroke information of the characters as much as possible, also improving detection accuracy. Usable denoising algorithms include mean filtering and median filtering, and usable image enhancement algorithms include linear transformation enhancement and histogram equalization. In practice, the original color image is first converted into a grayscale image; a median filtering algorithm can then remove noise, in particular salt-and-pepper noise, and preferably only a 3 × 3 template is used in the median filter. During image enhancement, to keep the image algorithm robust, histogram equalization can be used to enhance the global contrast of the image, alleviating problems caused by uneven illumination when scanning the test paper.
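A pure-Python sketch of two pieces of this preprocessing pipeline: the 3 × 3 median filter and the binary segmentation threshold. The text calls for an adaptive threshold segmentation algorithm without naming one; Otsu's global method is used here only as a simple illustrative stand-in (in practice a library routine such as OpenCV's `medianBlur`/`threshold` would be used):

```python
from statistics import median

def median_filter3(img):
    """3x3 median filter on a 2D grayscale image (list of row lists);
    border pixels are left unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = median(img[yy][xx]
                               for yy in (y - 1, y, y + 1)
                               for xx in (x - 1, x, x + 1))
    return out

def otsu_threshold(img):
    """Binary segmentation threshold maximizing between-class variance."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0 = sum(hist[:t])          # pixels below threshold t
        w1 = total - w0             # pixels at or above t
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum(i * hist[i] for i in range(t)) / w0
        mu1 = sum(i * hist[i] for i in range(t, 256)) / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t
```

Binarizing then amounts to mapping each pixel to 0 or 255 depending on which side of the returned threshold it falls.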
According to the image processing method of the embodiment, optionally, the coordinate position information is statistically analyzed, a threshold is set according to the coordinate position information, the position frames corresponding to the text information are determined according to the threshold, and the non-text information outside the position frames is filtered out, specifically including: determining text boxes according to the coordinate position information, counting the height information of the text boxes, computing the mean of the heights, and setting a height threshold according to the mean; extracting the normal position frames, and extracting the abnormal position frames whose heights fall outside the height threshold; and filtering out the image pixels outside the normal position frames to obtain a first filtered image.
In this embodiment, the text boxes can be determined from the coordinate position information, the mean height of the text boxes is counted, and a threshold is set near the mean to extract the information of the abnormal position frames (abnormally large or small height) and of the normal position frames. The abnormal position frames contain non-character information such as icons, formulas, and noise, so such content can be identified from them. Filtering out the image pixels outside the normal position frames yields an image of the character information (the text lines). In this process, the text detection model is combined with a statistical analysis method to extract the text line information in the image, improving both the accuracy of text content recognition and the accuracy of positioning.
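The height-mean split can be sketched as follows. The 40% tolerance band around the mean is an illustrative value; the text only says a threshold is set "near the mean":

```python
def split_by_height(boxes, tolerance=0.4):
    """Split text boxes into normal / abnormal by height.

    boxes: list of (x, y, w, h) tuples. A box is 'normal' when its
    height lies in [mean*(1-tolerance), mean*(1+tolerance)]; the
    tolerance value is illustrative, not fixed by the method."""
    mean_h = sum(h for _, _, _, h in boxes) / len(boxes)
    lo, hi = mean_h * (1 - tolerance), mean_h * (1 + tolerance)
    normal = [b for b in boxes if lo <= b[3] <= hi]
    abnormal = [b for b in boxes if not (lo <= b[3] <= hi)]
    return normal, abnormal
```

Boxes landing in the abnormal list would correspond to icons, formulas, or noise; pixels outside the normal boxes are then filtered out to form the first filtered image.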
According to the image processing method of the foregoing embodiment, optionally, after obtaining the first filtered image, the method further includes: counting the mean length of the position frames in the horizontal direction and setting a length threshold according to it; counting the means of the start and end positions of each line in the vertical direction and setting a line threshold according to them; extracting a plurality of horizontal position frames in the horizontal direction according to the length threshold, and a plurality of vertical position frames in the vertical direction according to the line threshold; and filtering out the image pixels outside the horizontal and vertical position frames to obtain a second filtered image.
In this embodiment, the corresponding thresholds are set by counting the mean length of the position frames in the horizontal direction and the means of the start and end positions of each line in the vertical direction, and the horizontal and vertical position frames are determined according to these thresholds.
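A sketch of the horizontal-direction statistic: keep the lines whose length is near the mean length, then report the mean start and end positions of the kept lines (a rough stand-in for the per-line vertical statistics described above). The 30% tolerance is illustrative:

```python
def mean(values):
    return sum(values) / len(values)

def extract_layout_frames(boxes, tol=0.3):
    """boxes: (x_start, y, x_end) horizontal extents of text lines.
    Keep lines whose length is within tol of the mean length, then
    report the mean start and mean end positions of the kept lines."""
    lengths = [x2 - x1 for x1, _, x2 in boxes]
    m = mean(lengths)
    kept = [b for b, length in zip(boxes, lengths) if abs(length - m) <= tol * m]
    return kept, mean([b[0] for b in kept]), mean([b[2] for b in kept])
```

The short fourth line in the test below (e.g., a section heading or stray fragment) falls outside the length band and is dropped from the layout estimate.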
According to the image processing method of the foregoing embodiment, optionally, after obtaining the second filtered image, the method further includes: calculating the inclination angle of the straight line segment at the center of each text line of the second filtered image according to a morphological processing algorithm and a connected region rectangular frame position detection algorithm; determining layout information of the text according to the horizontal position frame and the vertical position frame; extracting a text region image from the first filtered image according to the layout information; and correcting the text region image according to the inclination angle to obtain a corrected image, wherein the corrected image corresponds to the filtered image.
In this embodiment, tilt detection and tilt correction are performed on the image according to the morphological processing algorithm and the connected-region rectangular frame position detection algorithm, and the layout information of the text is determined. Combining the deep-learning-based text detection model with the traditional morphological processing and connected-region rectangular frame position detection algorithms acquires the text information accurately and improves recognition accuracy.
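Once the straight line segment at the center of a text line is available, its inclination angle follows from the line's slope. A sketch that fits the slope of the center points by least squares and converts it to degrees (the fitting method is illustrative; the text only specifies that the slope and angle of each center line are obtained):

```python
import math

def tilt_angle_deg(points):
    """Least-squares slope of text-line center points -> tilt angle
    in degrees. points: list of (x, y) centers along one text line."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    num = sum((x - mx) * (y - my) for x, y in points)
    den = sum((x - mx) ** 2 for x, _ in points)
    k = num / den               # slope of the fitted center line
    return math.degrees(math.atan(k))
```

The returned angle is then used as the correction angle when rotating the text region image.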
The image processing method according to the above embodiment optionally further includes: performing connected-region detection and extension detection on the corrected image to obtain the rectangular frame information of the characters, and aggregating the rectangular frames of each line according to the preset width and height of the print font and the spacing between characters to obtain the coordinate position information of the text lines.
In this embodiment, the text detection model filters the non-text impurity information out of the image; the deep-learning text detection model thus acts as a filter, and traditional text line detection is performed on the filtered image. The text positioning result of the deep-learning model and the result of the traditional algorithm are statistically fused to correct the tilted text, and the final result (i.e., the position coordinate information of the text content to be recognized in the image) is extracted according to thresholds such as the width and height of the text characters and the character spacing.
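The aggregation of per-character rectangles into text lines can be sketched as a gap-based merge along the x axis. The 12-pixel maximum gap is an illustrative spacing threshold derived from the print font, not a value fixed by the text:

```python
def aggregate_line(char_boxes, max_gap=12):
    """Merge per-character boxes (x, y, w, h) on one line into text-line
    boxes, breaking wherever the horizontal gap exceeds max_gap."""
    boxes = sorted(char_boxes)           # sort left to right by x
    merged, cur = [], list(boxes[0])     # cur = [x, y, w, h]
    for x, y, w, h in boxes[1:]:
        if x - (cur[0] + cur[2]) <= max_gap:
            top = min(cur[1], y)
            bottom = max(cur[1] + cur[3], y + h)
            cur[2] = max(cur[0] + cur[2], x + w) - cur[0]
            cur[1], cur[3] = top, bottom - top
        else:
            merged.append(tuple(cur))
            cur = [x, y, w, h]
    merged.append(tuple(cur))
    return merged
```

Character boxes separated by more than the gap threshold start a new text-line box, which is how distinct lines or columns stay separate.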
According to the image processing method of the above embodiment, optionally, the text detection model is trained by a data set made of text row pictures generated from the test question library.
Example Two
According to another embodiment of the present invention, an image processing method applied to a scene of layout analysis and text information extraction of a test paper includes:
step 1: and converting the scanned image into a gray image, and then enhancing the image quality by using an image denoising algorithm and an image enhancement algorithm.
Step 2: and processing the enhanced test paper image by using an adaptive threshold segmentation algorithm to obtain a binary segmentation threshold, and converting the image into a binary image (an image with a specified format).
And step 3: the images are sent into a pre-trained CTPN model for detection (text detection model), and the detection result of the CTPN model is obtained and comprises coordinate position information L of text linesoriginal
The font size and text lines in a test paper have fixed characteristics. For example, the width and height of a single character in a scanned test paper are generally about 32 pixels, and the number of characters per text line is generally within 45 for a two-column layout and about 25 for a four-column layout. Because the general CTPN model is designed to detect text in photographs, for text with a fixed font size such as a test paper the algorithm is modified in a targeted way to improve the processing speed and accuracy of CTPN. The modifications are as follows:
the maximum type of the examination paper is A3 type, the maximum threshold of the image zooming long side is set to be 3400 pixel points, and the maximum threshold of the zooming short side is set to be 2400 pixel points.
Second, because the text information in the test paper is arranged horizontally, part of the feature extraction in the general algorithm is removed, and SCALES_BASE in the aspect ratio parameters is modified from (0.25, 0.5, 1.0, 2.0, 3.0) to (1.0, 2.0, 3.0, 5.0, 10.0), enhancing the information extraction capability in the horizontal direction.
Third, because the question text is the effective information in the test paper, the improved algorithm obtains 10 candidate prediction boxes by starting from 8 pixels and adding 4 pixels each time: the first prediction box is 8 pixels high, the second 12, the third 16, the fourth 20, the fifth 24, the sixth 28, the seventh 32, the eighth 36, the ninth 40, and the tenth 44 pixels. This ensures that all effective text information, from text characters to punctuation marks, is detected; for the detection result, refer to FIG. 2.
And 4, step 4: according to the detection result LoriginalCounting the height information of the text box in the detection result, calculating the average value of the height information, setting a reasonable threshold value near the average value and extracting the position information L of the abnormal box with too large or too small heightabnormalAnd normal position frame information LnormalThe images of these areas within the frame include non-text information such as icons, formulas, and noise in the test paper.
And 5: using image processing method to process the image in the step 2 to extract the normal position area LnormalThe pixel values of the other images are reset to 255 (i.e., the filling is white), and a new filtered test paper image a is obtained, wherein the image is an image in which the non-text information in the test paper is filtered. Counting the average value of the length of the frame in the horizontal direction and the average value of the starting position and the ending position of each line in the vertical direction from the non-abnormal position frame obtained by filtering in the step 4, and setting a threshold value to find out the bits of a plurality of frames near the average value in the horizontal directionPut frame information LH_meanAnd position frame information L of a plurality of frames in the vicinity of the vertical direction mean valueV_mean
The traditional text line detection method requires setting a number of fixed thresholds, which assumes that the imaging quality of the scanned image is stable; if the imaging quality fluctuates, the thresholds must be re-tuned, affecting the efficiency of later project use and the cost of project maintenance. Steps 3 to 5 above are the proposed solution to this situation: this implementation uses the robustness of the deep-learning text line detection algorithm to overcome the low environmental adaptability that the traditional text line detection method cannot solve. For example, the quality of the test paper and the imaging quality of the scanner strongly affect the traditional method, but combining it with the deep-learning method effectively reduces the influence of these factors.
Step 6: and (3) reserving image pixel values in a position frame area near the mean value extracted in the step (5) by using an image processing method for the image A in the step (5), and setting the pixel values of other areas to be 255 (namely, filling the image to be white), so as to obtain a new image B.
And 7: using the image B obtained in the step 6 to obtain the straight line segment of the center of each text line by using the traditional image processing algorithms such as morphological processing, rectangular frame position detection of a connected region and the like, obtaining the slope k of each straight line, and further calculating to obtain the inclination angle of each straight line
And 8: based on the position frame information L of a plurality of frames near the horizontal direction mean value obtained in the step 6H_meanAnd position frame information L of a plurality of frames in the vicinity of the vertical direction mean valueV_meanCombining and analyzing to obtain layout position information L of the text in the test papersectionBased on the layout position information LsectionA plurality of title text region images C are obtained on a test paper image A.
And step 9: performing tilt correction on the multiple images C obtained from one test paper at a correction angle ofThe inclination angle calculated in step 7A plurality of images D are obtained.
Step 10: and performing connected region detection on the corrected image D by using an image processing method, performing extension detection to obtain rectangular frame information of the characters, and aggregating the rectangular frame information of each line according to the width and height information of the preset test paper printing font and the distance information between the characters to obtain final position coordinate information of the text line to be recognized.
In Steps 1 to 5, a deep-learning-based text line detection algorithm filters the non-text impurity information out of the test paper. In the subsequent Steps 6 to 10, traditional text line detection is performed on the image filtered by this deep-learning "filter"; the text positioning result of the deep-learning text line detection algorithm and the result of the traditional algorithm are statistically fused to correct the tilted text, and the final result (i.e., the position coordinate information of the text content to be recognized in the scanned image of the test paper) is extracted according to thresholds such as the width and height of the text characters and the character spacing in the test paper.
Example Three
As shown in fig. 3, an image processing apparatus 300 according to an embodiment of the present invention includes: a memory 302, a processor 304, and a program stored on the memory 302 and executable on the processor 304, which, when executed by the processor 304, implements the steps of the image processing method according to any of the embodiments described above. The image processing apparatus 300 includes all the advantages of the image processing method according to any of the above embodiments, which will not be described here again.
Example Four
As shown in fig. 4, a terminal 400 according to an embodiment of the present invention includes: the image processing apparatus 300 according to the third embodiment. The terminal 400 is capable of implementing in operation: inputting the image with the specified format into a text detection model, and outputting the text content in the image and coordinate position information corresponding to the text content by the text detection model; and filtering out non-character information in the text content according to the coordinate position information. The terminal 400 includes all the advantages of the image processing method according to any of the above embodiments, and will not be described herein again.
Example Five
As shown in fig. 5, according to an embodiment of the present invention, there is further provided a computer readable storage medium 500, on which a computer program 502 is stored, wherein the computer program 502 implements the image processing method defined in any one of the above embodiments when executed.
In this embodiment, the computer program 502 when executed implements: inputting the image with the specified format into a text detection model, and outputting the text content in the image and coordinate position information corresponding to the text content by the text detection model; and carrying out statistical analysis on the coordinate position information, setting a threshold value according to the coordinate position information, determining a position frame corresponding to the character information according to the threshold value, and filtering out non-character information except the position frame.
The image in the specified format corresponds to the image to be recognized: the image to be recognized can be converted into the specified format by preprocessing, and the image in the specified format is input into the text detection model to obtain the text content of the image and the coordinate position information of the text; the image is then filtered according to the coordinate position information, removing the non-character information from the text content. The present application uses a deep-learning text detection model as a filter, which is exactly the opposite of the conventional scheme (which uses the text detection model as a detector). With the deep-learning text line detection algorithm acting as a filter, traditional text line detection can then be performed on the filtered image; the text positioning result of the deep-learning algorithm and the traditional algorithm's result are statistically fused, tilt correction is performed, and the final result (the coordinate position information) is extracted according to thresholds such as the width and height of the characters in the image and the character spacing.
The computer program 502 according to the above technical solution optionally further includes: and identifying the filtered image by using an image connected region detection method, and identifying text content in the image.
In this technical solution, the image is processed based on the deep-learning text detection model, and the filtered image is recognized with an image connected-region detection method. Because the deep-learning text detection model has already filtered out the non-character impurity information in the image, the text positioning result of the deep-learning text line detection algorithm and the result of the connected-region detection method can be statistically fused, improving the accuracy of image recognition.
According to the computer program 502 in any one of the above technical solutions, optionally, the text detection model corresponds to a text recognition network model that determines a text region in an image and then determines a text line in the text region, and inputting the image in a specified format into the text detection model specifically includes: and inputting the image with the specified format into the trained character recognition network model for detection.
In this technical solution, the character recognition network model is a CTPN model. The trained CTPN model has high robustness and solves the low environmental adaptability that the traditional text line detection method cannot. For example, the quality of the test paper and the imaging quality of the scanner strongly affect the traditional method, and a CTPN model trained with deep learning effectively reduces the influence of these factors.
According to the computer program 502 in any one of the above technical solutions, optionally, the outputting, by the text detection model, of the text content in the image and the coordinate position information corresponding to the text content specifically includes: determining a scaling threshold according to the paper type of the carrier of the image, scaling any image exceeding the scaling threshold, and performing text recognition on the scaled image with the text detection model, so as to output the text content in the image and the coordinate position information corresponding to the text content.
In this technical solution, the scaling threshold is set according to the paper type of the carrier of the image to be recognized (standardized paper such as test papers, documents, and questionnaires). The scaled image is more amenable to text detection, improving the processing speed and accuracy of the character recognition network model.
According to the computer program 502 of any one of the above technical solutions, optionally, the text detection model outputs the text content in the image and the coordinate position information corresponding to the text content, specifically including: adjusting the value of the aspect ratio parameter of the character recognition network model according to the arrangement direction of the characters in the image, so as to enhance the information extraction capability of the model in that direction, and performing text recognition on the image with the text detection model after the aspect ratio parameter is adjusted, so as to output the text content in the image and the coordinate position information corresponding to the text content.
In this technical solution, different text arrangement directions lead to different recognition capabilities of the character recognition network model. Adjusting the aspect ratio parameter of the model according to the text arrangement direction lets the model adapt to that direction, improving recognition accuracy and speed. For example, modifying SCALES_BASE in the aspect ratio parameters from (0.25, 0.5, 1.0, 2.0, 3.0) to (1.0, 2.0, 3.0, 5.0, 10.0) enhances the information extraction capability of the character recognition network model in the horizontal direction.
According to the computer program 502 of any one of the above technical solutions, optionally, the text detection model outputs the text content in the image and coordinate position information corresponding to the text content, and specifically includes: and after a plurality of prediction frames capable of detecting the text characters and the punctuations are determined, the text detection model performs text recognition on the image so as to output text content in the image and coordinate position information corresponding to the text content.
In this technical solution, the prediction box frames the text (characters), and the height of a prediction box determines the height of the text it can recognize. To ensure that both taller text characters and shorter punctuation marks are recognized accurately, the pixel height of the prediction box of the character recognition network model is given several values, yielding a plurality of prediction boxes of different heights; the specific height values are determined according to the character height and the punctuation mark height in the image. For example, the prediction box heights start at 8 pixels and increase by 4 pixels each time, giving 10 candidate prediction boxes: the first prediction box is 8 pixels high, the second 12, the third 16, the fourth 20, the fifth 24, the sixth 28, the seventh 32, the eighth 36, the ninth 40, and the tenth 44 pixels, ensuring that all effective text information, from text characters down to punctuation marks, can be detected.
The computer program 502 according to any of the above technical solutions optionally further includes: converting a scanned image of the image to be recognized into a grayscale image; enhancing the image quality of the grayscale image according to an image denoising algorithm and an image enhancement algorithm to obtain an enhanced image; processing the enhanced image according to an adaptive threshold segmentation algorithm to obtain a binary segmentation threshold; and converting the enhanced image into the image in the specified format according to the binary segmentation threshold.
In this technical solution, the image in the specified format is a binary image. The image to be recognized is scanned to obtain a scanned image, the scanned image is converted into a grayscale image, the grayscale image is denoised and enhanced, a binary segmentation threshold is then determined, and the binary image is obtained according to that threshold, completing the preprocessing of the image. Binarization reduces the influence of interference factors in the color image on the text detection model and improves detection accuracy. Denoising and enhancement preserve the stroke information of the characters as much as possible, also improving detection accuracy. Usable denoising algorithms include mean filtering and median filtering, and usable image enhancement algorithms include linear transformation enhancement and histogram equalization. In practice, the original color image is first converted into a grayscale image; a median filtering algorithm can then remove noise, in particular salt-and-pepper noise, and preferably only a 3 × 3 template is used in the median filter. During image enhancement, to keep the image algorithm robust, histogram equalization can be used to enhance the global contrast of the image, alleviating problems caused by uneven illumination when scanning the test paper.
According to the computer program 502 of any one of the above technical solutions, optionally, filtering out non-text information in the text content according to the coordinate position information specifically includes: determining text boxes according to the coordinate position information, collecting their height statistics, computing the mean height, and setting a height threshold around the mean; extracting normal position frames, and abnormal position frames whose height is abnormally large or small, according to the height threshold; and filtering out the image pixels outside the normal position frames to obtain a first filtered image.
In this technical scheme, text boxes can be determined from the coordinate position information, the mean height of the boxes is computed, and a threshold is set near the mean to separate the abnormal position frames (with abnormally large or small heights) from the normal position frames. Non-character information such as icons, formulas, and noise falls into the abnormal frames and can thus be identified from them. Filtering out the image pixels outside the normal position frames yields an image containing only character information (text lines). In this process, the text detection model is combined with statistical analysis to extract the text-line information in the image, improving both the accuracy of text content recognition and the accuracy of positioning.
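The height-statistics filter above can be sketched as follows. The [0.5·mean, 1.5·mean] band is an assumed choice (the description only says a threshold is set near the mean), and the function name and box format are hypothetical.

```python
import numpy as np

def filter_by_height(image, boxes, low=0.5, high=1.5):
    """Keep boxes whose height is near the mean; blank pixels outside them.

    image: 2-D grayscale/binary array; boxes: list of (x, y, w, h) text
    boxes from the detection model.  low/high define the assumed
    normal-height band around the mean.
    """
    heights = np.array([h for (_, _, _, h) in boxes], dtype=float)
    mean_h = heights.mean()
    normal = [b for b in boxes if low * mean_h <= b[3] <= high * mean_h]
    abnormal = [b for b in boxes if b not in normal]  # icons, formulas, noise
    # first filtered image: white everywhere except inside normal boxes
    out = np.full_like(image, 255)
    for (x, y, w, h) in normal:
        out[y:y + h, x:x + w] = image[y:y + h, x:x + w]
    return out, normal, abnormal
```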
The computer program 502 according to any of the above technical solutions optionally further includes, after obtaining the first filtered image: computing the mean length of the position frames in the horizontal direction and setting a length threshold from it, and computing the mean starting and ending positions of each line in the vertical direction and setting a line threshold from them; extracting a plurality of horizontal position frames according to the length threshold and a plurality of vertical position frames according to the line threshold; and filtering out the image pixels outside the horizontal and vertical position frames to obtain a second filtered image.
In this technical scheme, thresholds are set by computing the mean horizontal length of the position frames and the mean starting and ending positions of each line in the vertical direction. The horizontal and vertical position frames are determined from these thresholds, and pixels outside them are filtered out, further removing non-text information from the image and improving detection accuracy.
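This second statistical pass can be sketched as below. The translation leaves the exact start/end statistic ambiguous, so this sketch interprets it as the columns where each line begins and ends; the band factors and tolerance are illustrative assumptions, as are all names.

```python
import numpy as np

def second_pass_filter(image, boxes, len_band=(0.3, 3.0), line_tol=1.5):
    """Second filtering pass sketched from the description.

    boxes: (x, y, w, h) line boxes kept by the first pass.
    len_band and line_tol are illustrative threshold choices; the patent
    only states that thresholds are set from the statistics.
    """
    lengths = np.array([w for (_, _, w, _) in boxes], dtype=float)
    mean_len = lengths.mean()
    # horizontal frames: length close enough to the mean line length
    horizontal = [b for b in boxes
                  if len_band[0] * mean_len <= b[2] <= len_band[1] * mean_len]
    starts = np.array([x for (x, _, _, _) in boxes], dtype=float)
    ends = np.array([x + w for (x, _, w, _) in boxes], dtype=float)
    mean_start, mean_end = starts.mean(), ends.mean()
    col_std = max(starts.std(), ends.std(), 1.0)
    # vertical frames: start and end columns near the column means
    vertical = [b for b in boxes
                if abs(b[0] - mean_start) <= line_tol * col_std
                and abs(b[0] + b[2] - mean_end) <= line_tol * col_std]
    # second filtered image: keep pixels inside either set of frames
    keep = set(horizontal) | set(vertical)
    out = np.full_like(image, 255)
    for (x, y, w, h) in keep:
        out[y:y + h, x:x + w] = image[y:y + h, x:x + w]
    return out, horizontal, vertical
```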
The computer program 502 according to any of the above technical solutions optionally further includes, after obtaining the second filtered image: calculating the inclination angle of the straight line segment at the center of each text line of the second filtered image according to a morphological processing algorithm and a connected-region rectangular-frame position detection algorithm; determining the layout information of the text according to the horizontal and vertical position frames; extracting a text region image from the first filtered image according to the layout information; and correcting the text region image according to the inclination angle to obtain a corrected image, where the corrected image corresponds to the filtered image.
In this technical scheme, tilt detection and tilt correction are performed on the image using a morphological processing algorithm and a connected-region rectangular-frame position detection algorithm, and the layout information of the text is determined. By combining the deep-learning text detection model with traditional morphological processing and connected-region rectangular-frame position detection, the text information is acquired accurately and the recognition accuracy is improved.
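The tilt estimation and correction step can be sketched as follows. As a stand-in for the morphology and connected-region angle computation, the skew of one text line is estimated here by a least-squares fit through the centers of its character boxes, and the correction is a plain nearest-neighbor rotation; both choices and all names are assumptions for illustration.

```python
import numpy as np

def estimate_skew_deg(char_boxes):
    """Fit a line through the centers of one text line's character boxes
    (x, y, w, h) and return its inclination angle in degrees."""
    cx = np.array([x + w / 2 for (x, y, w, h) in char_boxes], dtype=float)
    cy = np.array([y + h / 2 for (x, y, w, h) in char_boxes], dtype=float)
    slope = np.polyfit(cx, cy, 1)[0]       # least-squares slope
    return np.degrees(np.arctan(slope))

def rotate_nn(img, angle_deg, fill=255):
    """Nearest-neighbor rotation of a 2-D image about its center,
    via the inverse mapping; out-of-range pixels become `fill`."""
    a = np.radians(angle_deg)
    c, s = np.cos(a), np.sin(a)
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    xc, yc = (w - 1) / 2, (h - 1) / 2
    xs = c * (xx - xc) + s * (yy - yc) + xc    # source x for each target pixel
    ys = -s * (xx - xc) + c * (yy - yc) + yc   # source y for each target pixel
    xi, yi = np.rint(xs).astype(int), np.rint(ys).astype(int)
    valid = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
    out = np.full_like(img, fill)
    out[valid] = img[yi[valid], xi[valid]]
    return out
```

A corrected text region would then be `rotate_nn(region, -estimate_skew_deg(boxes))`.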
The computer program 502 according to any of the above technical solutions optionally further includes: performing connected-region detection and extension detection on the corrected image to obtain rectangular-frame information for the characters, and aggregating the rectangular frames of each line according to preset print-font width and height information and inter-character spacing information to obtain the coordinate position information of the text lines.
In this technical scheme, non-text impurity information in the image is first filtered out by the text detection model; the image filtered by the deep-learning text detection model then undergoes traditional text-line detection; the positioning result of the deep-learning text detection model and the result of the traditional algorithm are statistically fused to correct tilted text; and the final result (the position coordinate information of the text content to be recognized in the image) is extracted according to thresholds such as the width and height of the text characters in the image and the character spacing.
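The line-aggregation step described above can be sketched as follows. A vertical-overlap ratio and a horizontal-gap limit stand in for the preset print-font width/height and character-spacing thresholds; the specific values and names are illustrative assumptions.

```python
def aggregate_rows(char_boxes, max_gap=12, min_overlap=0.5):
    """Merge character rectangles (x, y, w, h) into text-line boxes.

    A box joins an existing line when it vertically overlaps the line's
    last box by at least min_overlap of the smaller height and starts
    within max_gap pixels of that box's right edge.
    """
    lines = []
    for box in sorted(char_boxes):          # left-to-right order
        x, y, w, h = box
        placed = False
        for line in lines:
            lx, ly, lw, lh = line[-1]       # compare with the line's last box
            overlap = min(y + h, ly + lh) - max(y, ly)
            if overlap >= min_overlap * min(h, lh) and x - (lx + lw) <= max_gap:
                line.append(box)
                placed = True
                break
        if not placed:
            lines.append([box])
    # collapse each line to a single bounding rectangle
    rects = []
    for line in lines:
        xs = [b[0] for b in line]
        ys = [b[1] for b in line]
        xe = [b[0] + b[2] for b in line]
        ye = [b[1] + b[3] for b in line]
        rects.append((min(xs), min(ys), max(xe) - min(xs), max(ye) - min(ys)))
    return rects
```

The returned rectangles are the per-line coordinate position information that the recognition stage consumes.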
According to the computer program 502 of any of the above solutions, optionally, the text detection model is trained on a data set made of text-line images generated from an examination question bank.
This embodiment solves the problem of text-line detection in a test paper analysis system: text lines can be extracted accurately, the position of the text can be located precisely, and the text content can be recognized conveniently. Addressing the characteristics of optical characters in scanned test papers and the complexity of test paper content, the embodiment provides a test paper layout analysis and correction method built on a deep-learning text-line detection algorithm (the text detection model). A statistical analysis method combines this deep-learning algorithm with a contour extraction algorithm from traditional image processing to extract the text-line information in the test paper, improving both the accuracy of content extraction and the accuracy of positioning. Based on the same principle, the above embodiments can be used not only for detecting and recognizing test papers but also for detecting and recognizing arbitrary images.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable image processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable image processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable image processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable image processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second, third, etcetera does not indicate any ordering; these words may be interpreted as names.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made therein without departing from the scope of the invention as defined in the appended claims and their equivalents, and it is intended that the invention encompass such changes and modifications as well.

Claims (11)

1. An image processing method, comprising:
inputting an image with a specified format into a text detection model, wherein the text detection model outputs text content in the image and coordinate position information corresponding to the text content;
and carrying out statistical analysis on the coordinate position information, setting a threshold value according to the coordinate position information, determining a position frame corresponding to the text information according to the threshold value, and filtering out non-text information except the position frame.
2. The image processing method according to claim 1, wherein the text detection model outputs text content in the image and coordinate position information corresponding to the text content, and specifically comprises:
determining a zooming threshold value according to the paper type of the carrier of the image, zooming the image exceeding the zooming threshold value, and performing text recognition on the zoomed image by the text detection model so as to output text content in the image and coordinate position information corresponding to the text content.
3. The image processing method according to claim 1, wherein the text detection model outputs text content in the image and coordinate position information corresponding to the text content, and specifically further comprises:
and adjusting the value of a width-height ratio parameter of the character recognition network model according to the arrangement direction of the characters in the image so as to enhance the information extraction capability of the character recognition network model in the arrangement direction, and performing text recognition on the image by the text detection model after the width-height ratio parameter is adjusted so as to output text content in the image and coordinate position information corresponding to the text content.
4. The image processing method according to claim 1, wherein the text detection model outputs text content in the image and coordinate position information corresponding to the text content, and specifically further comprises:
and adjusting the value of the pixel height of a prediction frame of the character recognition network model to obtain a plurality of prediction frames with different heights so that the prediction frames can detect the text characters and punctuation marks, and after determining the plurality of prediction frames which can detect the text characters and the punctuation marks, performing text recognition on the image by the text detection model so as to output the text content in the image and the coordinate position information corresponding to the text content.
5. The image processing method according to any one of claims 1 to 4, wherein the statistically analyzing the coordinate position information, establishing a threshold according to the coordinate position information, determining a position frame corresponding to text information according to the threshold, and filtering out non-text information other than the position frame specifically includes:
determining a text box according to the coordinate position information, counting the height information of the text box, solving the mean value of the height information, and setting a height threshold value according to the mean value;
extracting a normal position frame, and an abnormal position frame whose height is abnormally large or abnormally small, according to the height threshold value;
and filtering out the image pixels outside the normal position frame to obtain a first filtered image.
6. The image processing method of claim 5, further comprising, after obtaining the first filtered image:
counting the average value of the length of the position frame in the horizontal direction, setting a length threshold value according to the length average value, counting the average values of the starting position and the ending position of each line in the vertical direction, and setting a line threshold value according to the average values of the starting position and the ending position;
extracting a plurality of horizontal position frames in the horizontal direction according to the length threshold value, and extracting a plurality of vertical position frames in the vertical direction according to the line threshold value;
and filtering out the image pixels outside the horizontal position frame and the vertical position frame to obtain a second filtered image.
7. The image processing method of claim 6, further comprising, after obtaining the second filtered image:
calculating the inclination angle of the straight line segment at the center of each text line of the second filtered image according to a morphological processing algorithm and a connected region rectangular frame position detection algorithm;
determining layout information of the text according to the horizontal position frame and the vertical position frame;
extracting a text region image from the first filtered image according to the layout information;
and correcting the text region image according to the inclination angle to obtain a corrected image.
8. The image processing method according to any one of claims 1 to 4,
the text detection model is trained by a data set made of text row pictures generated according to an examination question bank.
9. An image processing apparatus characterized by comprising: memory, processor and program stored on the memory and executable on the processor, the program being capable of implementing the steps defined by the image processing method as claimed in any one of claims 1 to 8 when executed by the processor.
10. A terminal, comprising:
an image processing apparatus as claimed in claim 9.
11. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed, implements the steps of the image processing method according to any one of claims 1 to 8.
CN201910760632.0A 2019-08-16 2019-08-16 Image processing method, device, terminal and computer readable storage medium Pending CN110598566A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910760632.0A CN110598566A (en) 2019-08-16 2019-08-16 Image processing method, device, terminal and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN110598566A true CN110598566A (en) 2019-12-20

Family

ID=68854474

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910760632.0A Pending CN110598566A (en) 2019-08-16 2019-08-16 Image processing method, device, terminal and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110598566A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111275051A (en) * 2020-02-28 2020-06-12 上海眼控科技股份有限公司 Character recognition method, character recognition device, computer equipment and computer-readable storage medium
CN111414816A (en) * 2020-03-04 2020-07-14 沈阳先进医疗设备技术孵化中心有限公司 Information extraction method, device, equipment and computer readable storage medium
CN112926565A (en) * 2021-02-25 2021-06-08 中国平安人寿保险股份有限公司 Picture text recognition method, system, device and storage medium
CN113392833A (en) * 2021-06-10 2021-09-14 沈阳派得林科技有限责任公司 Method for identifying type number of industrial radiographic negative image
WO2022006829A1 (en) * 2020-07-09 2022-01-13 国网电子商务有限公司 Bill image recognition method and system, electronic device, and storage medium
CN116188293A (en) * 2022-12-21 2023-05-30 北京海天瑞声科技股份有限公司 Image processing method, device, apparatus, medium, and program product
CN116563650B (en) * 2023-07-10 2023-10-13 邦世科技(南京)有限公司 Deep learning-based endplate inflammatory degeneration grading method and system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107748888A (en) * 2017-10-13 2018-03-02 众安信息技术服务有限公司 A kind of image text row detection method and device
WO2018223857A1 (en) * 2017-06-09 2018-12-13 科大讯飞股份有限公司 Text line recognition method and system
CN109376658A (en) * 2018-10-26 2019-02-22 信雅达系统工程股份有限公司 A kind of OCR method based on deep learning
CN109492643A (en) * 2018-10-11 2019-03-19 平安科技(深圳)有限公司 Certificate recognition methods, device, computer equipment and storage medium based on OCR
CN109697440A (en) * 2018-12-10 2019-04-30 浙江工业大学 A kind of ID card information extracting method
CN109993040A (en) * 2018-01-03 2019-07-09 北京世纪好未来教育科技有限公司 Text recognition method and device
CN110110585A (en) * 2019-03-15 2019-08-09 西安电子科技大学 Intelligently reading realization method and system based on deep learning, computer program

Similar Documents

Publication Publication Date Title
CN111814722B (en) Method and device for identifying table in image, electronic equipment and storage medium
CN110598566A (en) Image processing method, device, terminal and computer readable storage medium
US9542752B2 (en) Document image compression method and its application in document authentication
CN103310211B (en) A kind ofly fill in mark recognition method based on image procossing
US20130208986A1 (en) Character recognition
US20070253040A1 (en) Color scanning to enhance bitonal image
US11836969B2 (en) Preprocessing images for OCR using character pixel height estimation and cycle generative adversarial networks for better character recognition
CN101122953A (en) Picture words segmentation method
CN109784342A (en) A kind of OCR recognition methods and terminal based on deep learning model
CN116071763B (en) Teaching book intelligent correction system based on character recognition
CN112307919B (en) Improved YOLOv 3-based digital information area identification method in document image
CN112507782A (en) Text image recognition method and device
EP0949579A2 (en) Multiple size reductions for image segmentation
CN102737240B (en) Method of analyzing digital document images
RU2581786C1 (en) Determination of image transformations to increase quality of optical character recognition
CN112446259A (en) Image processing method, device, terminal and computer readable storage medium
CN109741273A (en) A kind of mobile phone photograph low-quality images automatically process and methods of marking
EP2545498B1 (en) Resolution adjustment of an image that includes text undergoing an ocr process
US20140086473A1 (en) Image processing device, an image processing method and a program to be used to implement the image processing
JP6630341B2 (en) Optical detection of symbols
CN112036294B (en) Method and device for automatically identifying paper form structure
CN113139535A (en) OCR document recognition method
KR20150099116A (en) Method for recognizing a color character using optical character recognition and apparatus thereof
CN115410191A (en) Text image recognition method, device, equipment and storage medium
Manlises et al. Expiry Date Character Recognition on Canned Goods Using Convolutional Neural Network VGG16 Architecture

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20191220