CN113763313A - Text image quality detection method, device, medium and electronic equipment - Google Patents

Text image quality detection method, device, medium and electronic equipment Download PDF

Info

Publication number
CN113763313A
CN113763313A (application CN202110484548.8A)
Authority
CN
China
Prior art keywords
text
text image
image
scale
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110484548.8A
Other languages
Chinese (zh)
Inventor
黄鹏程 (Huang Pengcheng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202110484548.8A priority Critical patent/CN113763313A/en
Publication of CN113763313A publication Critical patent/CN113763313A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G06T3/4046 Scaling the whole image or part thereof using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30168 Image quality inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30176 Document

Abstract

The application belongs to the technical field of computers, and particularly relates to a text image quality detection method and apparatus, a computer-readable medium and electronic equipment. The method comprises the following steps: detecting the character scale of a text image; enlarging the text image when its character scale is smaller than a first preset scale, and shrinking it when its character scale is larger than a second preset scale; inputting the text image into a first neural network to detect one or more text regions in the text image; predicting a quality score for each text region; and performing a weighted summation of the quality scores of the text regions based on preset weights to obtain the quality score of the text image. The method and apparatus improve the accuracy of quality score prediction for text regions and, in turn, for the whole text image.

Description

Text image quality detection method, device, medium and electronic equipment
Technical Field
The application belongs to the technical field of computers, and particularly relates to a text image quality detection method and device, a computer readable medium and electronic equipment.
Background
As OCR (Optical Character Recognition) technology is applied ever more widely, the quality of captured text images receives growing attention, and text image quality assessment has attracted broad interest in both academia and industry.
Most existing image quality assessment methods target natural-scene images and are not well suited to text images, so a quality assessment method dedicated to text images is needed.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present application and therefore may include information that does not constitute prior art known to a person of ordinary skill in the art.
Disclosure of Invention
The application aims to provide a text image quality detection method, a text image quality detection apparatus, a computer-readable medium and electronic equipment, which overcome, at least to some extent, technical problems in the related art such as inaccurate text line recognition.
Other features and advantages of the present application will be apparent from the following detailed description, or may be learned by practice of the application.
According to an aspect of an embodiment of the present application, there is provided a method for detecting quality of a text image, including:
detecting the character scale of the text image, the character scale of the text image being the average scale of the characters in the text image;
enlarging the text image when its character scale is smaller than a first preset scale, and shrinking it when its character scale is larger than a second preset scale, so that its character scale falls within a preset scale range, the preset scale range being larger than the first preset scale and smaller than the second preset scale;
inputting the text image into a first neural network, and performing feature extraction and mapping on the text image through the first neural network to detect one or more text regions in the text image; the first neural network is configured with a sensing window within the preset scale range, and the sensing window moves over the text image to extract its features; a text region is a continuous image region composed of some or all of the characters in the text image;
predicting a quality score for each text region in the text image to obtain the quality score of that text region;
and acquiring the preset weight corresponding to each text region, and performing a weighted summation of the quality scores of the text regions based on the preset weights to obtain the quality score of the text image.
According to an aspect of the embodiments of the present application, there is provided a text image quality detection apparatus, including:
a character scale detection module configured to detect the character scale of the text image, the character scale of the text image being the average scale of the characters in the text image;
a zooming module configured to enlarge the text image when its character scale is smaller than a first preset scale, and to shrink it when its character scale is larger than a second preset scale, so that its character scale falls within a preset scale range, the preset scale range being larger than the first preset scale and smaller than the second preset scale;
a text region detection module configured to input the text image into a first neural network and to perform feature extraction and mapping on the text image through the first neural network to detect one or more text regions in the text image; the first neural network is configured with a sensing window within the preset scale range, and the sensing window moves over the text image to extract its features; a text region is a continuous image region composed of some or all of the characters in the text image;
a quality score prediction module configured to predict a quality score for each text region in the text image to obtain the quality score of that text region;
and a weighted summation module configured to acquire the preset weight corresponding to each text region and to perform a weighted summation of the quality scores of the text regions based on the preset weights to obtain the quality score of the text image.
In some embodiments of the present application, based on the above technical solutions, the weighted summation module includes:
a character count acquisition unit configured to acquire the character count of each text region and to sum the character counts to obtain the total character count of the text image;
a preset weight calculation unit configured to take, for each text region, the ratio of its character count to the total character count of the text image as the preset weight corresponding to that region.
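The two units above reduce to a few lines of code. This is a minimal sketch in plain Python (the function name is illustrative, not from the patent), assuming the per-region character counts are already known:

```python
def region_weights(char_counts):
    """Preset weight of each text region: its character count divided by
    the total character count of the image, so the weights sum to 1."""
    total = sum(char_counts)
    return [count / total for count in char_counts]
```

For example, `region_weights([30, 10])` gives `[0.75, 0.25]`, so a region holding most of the image's characters dominates the weighted quality score.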
In some embodiments of the present application, based on the above technical solutions, the text region is a text line, a text line being a continuous image region of one or more characters arranged in sequence in the text image, and the quality score prediction module includes:
an aspect ratio detection unit configured to detect the text lines in the text image and the aspect ratio of each text line, the aspect ratio being the ratio of the line's length to its height, where the length is measured along the direction in which the line's characters are arranged and the height is measured perpendicular to that direction;
a text line segmentation unit configured to segment a text line whose aspect ratio exceeds a preset value into several text lines whose aspect ratios are smaller than or equal to the preset value;
and a quality score prediction unit configured to perform feature extraction and quality score prediction separately on each text line whose aspect ratio is smaller than or equal to the preset value, obtaining the quality score of each such line.
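As a small worked example of the segmentation criterion, the minimum number of pieces a long line must be cut into follows directly from the preset aspect-ratio value (the default 10.0 below is an assumed preset, not a value given in the patent):

```python
import math

def num_segments(length, height, max_ratio=10.0):
    """Minimum number of pieces a text line of the given length and height
    must be cut into so that each piece's aspect ratio (length/height) is
    at most max_ratio; max_ratio stands in for the patent's preset value."""
    return max(1, math.ceil(length / (max_ratio * height)))
```

For instance, a 500-pixel-long, 20-pixel-high line (aspect ratio 25) needs at least 3 pieces when the preset value is 10.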
In some embodiments of the present application, based on the above technical solutions, the text line segmentation unit includes:
a text line projection subunit configured to project a text line whose aspect ratio exceeds the preset value onto its length direction to form a one-dimensional set of projection points, where pixels at character positions project to solid points and pixels at other positions project to empty points, the solid and empty points together forming the projection point set;
and a text line segmentation subunit configured to take cut points on the segments where the empty projection points cluster, and to segment the text line at these cut points into several text lines whose aspect ratios are smaller than or equal to the preset value.
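The projection step can be sketched as follows, assuming the text line is available as a binary mask (1 = character pixel). The function returns the midpoints of interior empty-column runs, which serve as candidate cut points; the name and representation are illustrative assumptions:

```python
def gap_midpoints(mask):
    """Project a binary text-line mask (list of rows, 1 = character pixel)
    onto its length axis and return the midpoints of interior runs of
    empty columns, i.e. the candidate cut points for segmenting the line."""
    cols = len(mask[0])
    # a column is "solid" if any character pixel projects onto it
    solid = [any(row[c] for row in mask) for c in range(cols)]
    cuts, c = [], 0
    while c < cols and not solid[c]:
        c += 1                      # skip a leading empty run
    while c < cols:
        if solid[c]:
            c += 1
            continue
        start = c
        while c < cols and not solid[c]:
            c += 1
        if c < cols:                # interior gap only (ignore a trailing run)
            cuts.append((start + c - 1) // 2)
    return cuts
```

For the one-row mask `[[1, 1, 1, 0, 0, 0, 1, 1, 1]]` the single interior gap spans columns 3 to 5, so the cut point is column 4.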
In some embodiments of the present application, based on the above technical solutions, the quality score prediction module may include:
an input unit configured to input a text region of the text image into a second neural network;
a feature extraction unit configured to perform feature extraction on the text region through the convolutional layers of the second neural network to obtain plane features;
a dimensionality reduction unit configured to reduce the dimensionality of the plane features through a pooling layer of the second neural network to obtain a feature vector;
and a fully connected calculation unit configured to perform a fully connected calculation on the feature vector through a fully connected layer of the second neural network to obtain the predicted quality score of the text region.
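Assuming the convolutional layers have already produced a feature map, the pooling and fully connected steps of this (hypothetical) scoring head can be sketched in NumPy; global average pooling and the sigmoid squashing are illustrative choices, since the patent does not specify them:

```python
import numpy as np

def quality_head(feature_map, w, b):
    """Pooling + fully connected head of the second network (sketch):
    feature_map has shape (C, H, W) and is assumed to come from the
    convolutional layers; w has shape (C,) and b is a scalar bias.
    Returns a quality score squashed into (0, 1)."""
    vec = feature_map.mean(axis=(1, 2))   # pooling layer: (C, H, W) -> (C,)
    logit = float(vec @ w + b)            # fully connected layer -> scalar
    return 1.0 / (1.0 + np.exp(-logit))   # squash to a (0, 1) quality score
```

With all-zero weights and bias the head is uninformative and outputs 0.5; training (described below in the patent) fits `w` and `b` so the score tracks recognition accuracy.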
In some embodiments of the present application, based on the above technical solutions, the quality score prediction module may further include:
a data set acquisition unit configured to acquire a text recognition data set comprising text images and their recognition accuracies, the recognition accuracy of a text image being the ratio of the number of characters that a character recognition model can correctly recognize in the image to the actual number of characters, where the actual number of characters is the number of characters the text image actually contains;
a data set labeling unit configured to apply a geometric conversion to the recognition accuracy of a text image to obtain its quality score, and to label the text recognition data set with that quality score;
and a neural network training unit configured to input the text recognition data set into the second neural network and train the second neural network.
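A minimal sketch of the labeling step, assuming character-level ground truth is available. The patent only states that the accuracy undergoes a "geometric conversion" to become the quality score, so the power transform below (and its gamma) is an illustrative assumption, not the patent's formula:

```python
def recognition_accuracy(chars_correct, chars_actual):
    """Training label: characters the recognition model got right,
    divided by the characters the image actually contains."""
    return chars_correct / chars_actual

def accuracy_to_quality(accuracy, gamma=2.0):
    """One plausible 'geometric conversion' of accuracy to a quality
    score: a power transform that penalizes low accuracies more; the
    choice of transform and gamma is an assumption."""
    return accuracy ** gamma
```

For example, an image with 45 of 50 characters recognized gets accuracy 0.9 and, under this assumed transform, quality score 0.81.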
In some embodiments of the present application, based on the above technical solution, the text region is a single character region, and the single character region is a continuous image region including one character in the text image.
According to an aspect of the embodiments of the present application, there is provided a computer-readable medium, on which a computer program is stored, which when executed by a processor, implements a method for detecting quality of a text image as in the above technical solutions.
According to an aspect of an embodiment of the present application, there is provided an electronic apparatus including: a processor; and a memory for storing executable instructions for the processor; wherein the processor is configured to execute the method for detecting the quality of the text image as in the above technical solution via executing the executable instructions.
According to an aspect of embodiments herein, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions, so that the computer device executes the method for detecting the quality of the text image according to the above technical scheme.
In the technical solution provided by the embodiments of the application, a text image whose character scale falls outside the preset scale range is enlarged or shrunk so that its character scale lies within that range, and the sensing window of the first neural network also lies within that range. The sensing window therefore matches the character size of the text image well, which improves the accuracy of text region detection and, in turn, of the quality score prediction for text regions and for the whole text image, while also improving the robustness of the text image quality detection scheme of the embodiments of the application.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
Fig. 1 schematically shows a block diagram of an exemplary system architecture to which the solution of the present application applies.
Fig. 2 schematically illustrates a flow chart of steps of a quality detection method according to certain embodiments of the present application.
Fig. 3 schematically shows a comparison between the detection effect of an embodiment of the present application and the detection effect of the related art.
Fig. 4 schematically shows a flowchart of a step of performing quality score prediction on a text region in a text image to obtain a quality score of the text region in an embodiment of the present application.
Fig. 5 schematically illustrates a process diagram of a second neural network performing feature extraction, dimensionality reduction, and full-connection computation on a text region and obtaining a quality score of the text region according to an embodiment of the present application.
Fig. 6 schematically shows a partial flow chart of steps before a text region in a text image is input into a second neural network in the quality detection method according to an embodiment of the present application.
Fig. 7 schematically illustrates a flowchart of a step of performing quality score prediction on a text region in a text image to obtain a quality score of the text region in an embodiment of the present application.
Fig. 8 is a flowchart schematically illustrating a step of dividing a text line having an aspect ratio larger than a preset value into a plurality of text lines having an aspect ratio smaller than or equal to a preset value in an embodiment of the present application.
Fig. 9 schematically shows a flowchart of steps of obtaining preset weights corresponding to respective text regions in an embodiment of the present application.
Fig. 10 schematically illustrates a scene diagram for detecting text lines in a text image and the quality scores and the number of characters of the text lines according to an embodiment of the present application.
Fig. 11 schematically illustrates a scene diagram for detecting text lines and the aspect ratio of the text lines in a text image according to an embodiment of the present application.
Fig. 12 schematically shows a block diagram of a text image quality detection apparatus according to an embodiment of the present application.
Fig. 13 schematically shows a structural block diagram of a computer system of an electronic device for implementing the embodiment of the present application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the subject matter of the present application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the application.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Computer Vision technology (CV) is the science of studying how to make machines "see": cameras and computers are used in place of human eyes to recognize, track and measure targets, and the images are further processed into images better suited to human observation or to transmission to instruments for inspection. As a scientific discipline, computer vision studies theories and techniques that attempt to build artificial intelligence systems capable of extracting information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technologies, virtual reality, augmented reality, simultaneous localization and mapping, and other technologies, and also include common biometric technologies such as face recognition and fingerprint recognition.
Fig. 1 schematically shows a block diagram of an exemplary system architecture to which the solution of the present application applies.
As shown in fig. 1, system architecture 100 may include a terminal device 110, a network 120, and a server 130. The terminal device 110 may include various electronic devices such as a smart phone, a tablet computer, a notebook computer, and a desktop computer. The server 130 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services. Network 120 may be a communication medium of various connection types capable of providing a communication link between terminal device 110 and server 130, such as a wired communication link or a wireless communication link.
The system architecture in the embodiments of the present application may have any number of terminal devices, networks, and servers, according to implementation needs. For example, the server 130 may be a server group composed of a plurality of server devices. In addition, the technical solution provided in the embodiment of the present application may be applied to the terminal device 110, or may be applied to the server 130, or may be implemented by both the terminal device 110 and the server 130, which is not particularly limited in this application.
For example, the server 130 may be loaded with the text image quality detection method according to the embodiment of the present application, and after the user uploads the text image through the terminal device 110, the server may implement the quality detection method according to the embodiment of the present application to detect the quality of the text image. Therefore, the matching degree of the sensing window of the first neural network and the character size of the text image is high, the accuracy of detecting the text region in the text image can be improved, the accuracy of predicting the text region and the quality score of the text image can be further improved, and meanwhile the robustness of the text image quality detection scheme of the embodiment of the application is improved.
In addition, according to the technical scheme provided by the embodiment of the application, quality score prediction is performed on the text regions in the text image to obtain quality scores of the text regions, then the preset weights corresponding to the text regions are obtained, and the quality scores of the text regions are subjected to weighted summation processing based on the preset weights to obtain the quality scores of the text image, so that the problem of quality score prediction of the text image is converted into the problems of quality score prediction of the text regions and weight obtaining corresponding to each text region. Therefore, compared with the method of directly performing quality score prediction on the text image, the method performs quality score prediction on the text region first, can eliminate the influence of the region which does not contain characters in the text image on the quality score prediction result, and has higher quality score prediction accuracy. Meanwhile, different text regions can have different preset weights, and the weighted sum processing is carried out on the quality scores of the text regions based on the preset weights, so that the robustness of the text image quality detection scheme of the embodiment of the application can be further improved.
The server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud service, a cloud database, cloud computing, a cloud function, cloud storage, network service, cloud communication, middleware service, domain name service, security service, CDN, and a big data and artificial intelligence platform. The terminal may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, a vehicle-mounted device, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein.
With the widespread use of smart devices in daily life, business processes often require users to submit text images captured on mobile devices, so the number of such images is growing rapidly. Intelligent document recognition has therefore become increasingly important for business process automation. However, intelligent document recognition is very sensitive to text image quality: unavoidable distortion during capture can lower the image quality, which usually reduces recognition accuracy and may seriously hinder subsequent business processes. For example, in an online insurance underwriting business, a low-quality document image submitted for a claim needs to be detected as early as possible so that it can be recaptured immediately; once the paper document is lost, or the user no longer cooperates in providing the image, the underwriting document can no longer be obtained and key information may be lost from the business process. Since the quality of text images uploaded by users varies widely, it is necessary to evaluate their quality in advance and reject those of low quality.
The following describes the quality detection method provided by the present application in detail with reference to specific embodiments.
Fig. 2 schematically illustrates a flow chart of steps of a quality detection method according to certain embodiments of the present application. The execution main body of the quality detection method can be terminal equipment, a server and the like, and the quality detection method is not limited in the application. As shown in fig. 2, the quality detection method may mainly include the following steps S210 to S250.
S210: detecting the character scale of the text image, the character scale of the text image being the average scale of the characters in the text image;
S220: enlarging the text image when its character scale is smaller than a first preset scale, and shrinking it when its character scale is larger than a second preset scale, so that its character scale falls within a preset scale range, the preset scale range being larger than the first preset scale and smaller than the second preset scale;
S230: inputting the text image into a first neural network, and performing feature extraction and mapping on the text image through the first neural network to detect one or more text regions in the text image; the first neural network is configured with a sensing window within the preset scale range, and the sensing window moves over the text image to extract its features; a text region is a continuous image region composed of some or all of the characters in the text image;
S240: predicting a quality score for each text region in the text image to obtain the quality score of that text region;
S250: acquiring the preset weight corresponding to each text region, and performing a weighted summation of the quality scores of the text regions based on the preset weights to obtain the quality score of the text image.
The text image is an image containing text, that is, an image containing characters. The character scale of the text image is the average scale of the characters in the text image; specifically, the scale may be height, area, or the like. In some embodiments, the average scale of the characters may be the average of the character heights of the characters in the text image; in other embodiments, it may be the average of the character areas. In a specific embodiment, an MSER (Maximally Stable Extremal Regions) detector may be used as a single-character detector to detect individual characters in the text image. The average height of all characters in the text image is then obtained, and, as described in step S220, the original image is adaptively scaled according to this average height so that the character scale of the text image falls within the preset scale range and matches the scale of the sensing window of the first neural network, thereby improving the accuracy of the first neural network in detecting text regions. The sensing window is the matrix over which a convolutional layer of the convolutional neural network performs convolution on the data, namely the receptive field of the convolutional neural network.
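A minimal sketch of the character-scale check and adaptive scaling decision described above, in Python. The box format `(x, y, w, h)`, the preset scales 16 and 48, and the choice of the midpoint of the preset range as the rescaling target are all illustrative assumptions; a real pipeline would obtain the character boxes from an MSER detector.

```python
def average_char_height(boxes):
    """boxes: list of (x, y, w, h) single-character bounding boxes."""
    if not boxes:
        return 0.0
    return sum(h for (_, _, _, h) in boxes) / len(boxes)

def scale_factor(avg_height, first_preset=16.0, second_preset=48.0):
    """Return the factor by which to resize the image so the average
    character height falls inside (first_preset, second_preset)."""
    if avg_height <= 0:
        return 1.0  # nothing detected; leave the image unchanged
    target = (first_preset + second_preset) / 2.0
    if avg_height < first_preset or avg_height > second_preset:
        return target / avg_height  # enlarge small text, shrink large text
    return 1.0
```

Applying the returned factor to both image dimensions brings the average character height into the preset range without distorting the aspect ratio.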
In some embodiments, after the text image is input into the first neural network, the feature extraction unit of the first neural network performs feature extraction on the text image, and the mapping processing unit of the first neural network then performs mapping processing, so that one or more text regions in the text image can be detected. In a specific embodiment, the first neural network may be a PSE (Progressive Scale Expansion) segmentation network, that is, PSENet. This enables multi-direction, multi-angle and multi-scale detection of the text image: large-scale and small-scale characters can be detected and distinguished even in text images whose character scales differ by several times, and text images with slanted, curved, inverted or otherwise irregular character arrangements can also be detected and distinguished. Referring to fig. 3, fig. 3 schematically illustrates a comparison between the detection effects of an embodiment of the present application and the related art. As shown in fig. 3, text region detection is performed on the same text image using the related art and using some embodiments of the present application, where the text regions are text lines. The text image 310 shows that the related art can only detect text lines in a single direction (the horizontal direction in fig. 3), cannot detect vertical text lines, and cannot completely detect the larger-scale characters "honor teachers and religion", leaving a certain edge loss. The text image 320 shows that the technical solution of the present application can detect the text image in multiple directions (horizontal and vertical), at multiple angles (horizontal and vertical angles) and at multiple scales (large and small), can detect text lines completely, and can greatly reduce the edge loss of text lines and characters during detection.
Besides the PSE segmentation network, the first neural network may also be another CNN (Convolutional Neural Network) based network for detecting text regions in a text image, which is not limited in this application.
It will be appreciated that unlike scene images, text images are inherently more focused on text. Therefore, in the embodiment of the application, the quality scores of the text images are obtained by performing weighted summation processing on the quality scores of the text regions based on the preset weights, and the quality of the text images can be reflected more accurately.
Fig. 4 schematically shows a flowchart of a step of performing quality score prediction on a text region in a text image to obtain a quality score of the text region in an embodiment of the present application. As shown in fig. 4, on the basis of the above embodiment, the performing quality score prediction on the text region in the text image in step S240 to obtain the quality score of the text region may further include the following steps S410 to S440.
S410, inputting a text region in the text image into a second neural network;
S420, extracting features of the text region through the convolutional layer of the second neural network to obtain plane features;
S430, performing dimensionality reduction on the plane features through a pooling layer of the second neural network to obtain feature vectors;
S440, performing full-connection calculation on the feature vectors through a full-connection layer of the second neural network to obtain the prediction quality score of the text region.
In some embodiments, the structure of the second neural network may include a convolutional layer, a pooling layer, and a fully-connected layer.
Specifically, fig. 5 schematically illustrates a process of performing feature extraction, dimensionality reduction processing, and full-connection calculation on a text region by using a second neural network according to an embodiment of the present application, and obtaining a predicted quality score of the text region. Referring to fig. 5, the convolutional layer of the second neural network performs feature extraction on the text region through the steps of convolution, maximum pooling and convolution to obtain the first planar feature. Then, the pooling layer of the second neural network obtains corresponding feature vectors by pooling the maximum and minimum values of the first planar features. And then, the full-connection layer of the second neural network obtains the quality score corresponding to the text region through full-connection calculation of the feature vector.
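The pooling and fully-connected stages just described can be sketched with NumPy. This is an illustrative sketch, not the patent's network: the channel count, feature-map size, and the sigmoid squashing of the score are assumptions; only the structure, global max/min pooling of the planar features into a feature vector followed by a fully-connected mapping to a scalar quality score, follows the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def global_min_max_pool(feature_maps):
    """feature_maps: (C, H, W) planar features from the convolutional layer.
    Returns a (2*C,) vector of per-channel maxima and minima."""
    maxs = feature_maps.max(axis=(1, 2))
    mins = feature_maps.min(axis=(1, 2))
    return np.concatenate([maxs, mins])

def fc_quality_score(vector, weights, bias):
    """Fully-connected layer collapsing the feature vector to a single
    score, squashed into (0, 1) with a sigmoid (an assumption here)."""
    z = vector @ weights + bias
    return 1.0 / (1.0 + np.exp(-z))

C, H, W = 4, 8, 32
features = rng.standard_normal((C, H, W))        # stand-in conv features
weights = rng.uniform(-0.1, 0.1, size=2 * C)     # init in (-0.1, 0.1)
score = fc_quality_score(global_min_max_pool(features), weights, 0.0)
```

The min/max pooling makes the feature vector's length independent of the text region's spatial size, which is what lets variable-width text lines share one fully-connected head.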
In some implementations, the text regions may be text lines. The second neural network may be a text-line-based DIQA (Deep CNN-Based Blind Image Quality Predictor) framework. In a specific embodiment, the second neural network may be built on ResNet (residual network); because ResNet has excellent feature representation capability, it can extract features from the text image more accurately, thereby improving the accuracy of quality score prediction. The second neural network may also be another CNN, such as a VGG (Visual Geometry Group) network.
In some embodiments, a text line having an aspect ratio greater than a preset value may be segmented into a plurality of text lines having aspect ratios less than or equal to the preset value. Notably, the size of a text image is typically much larger than the image size accepted by a deep convolutional neural network. If, to satisfy the relatively fixed input size of the deep convolutional neural network, the text image were resized by undersampling before detection, text with smaller characters might become blurry or even unrecognizable after undersampling, making text line detection inaccurate and, in turn, the quality score prediction of the text image inaccurate.
Therefore, in order to avoid the deterioration caused by undersampling text lines, in the embodiment of the present application the first neural network performs feature extraction and mapping on the text image to detect the text regions, and the text regions are then input into the second neural network, i.e., the deep convolutional neural network, for quality score prediction. Detecting the text regions first and then feeding them into the second neural network decomposes the text image into one or more text regions before they reach the second neural network; this avoids the problem of an oversized image being input directly into the neural network, and also avoids the image deterioration caused by undersampling the text image before detection. Moreover, since the input images are text regions, most of the information they contain is text information, which facilitates feature extraction and analysis by the second neural network and can improve the accuracy and robustness of its image quality prediction. Based on the above effects, it can be understood that the text image quality detection method of the embodiment of the present application yields relatively accurate quality detection both for business document images such as insurance underwriting documents and contracts, and in natural text image quality evaluation scenarios with stronger interference and harder character recognition, such as photographs of bank cards, medical record sheets and invoices.
In the second neural network, the L2 loss may be used as the estimation loss to describe the difference between the predicted quality and the true quality. Specifically, the estimation loss is defined as:

$$L_2 = (q - q_{gt})^2$$

wherein $q$ is the predicted quality score and $q_{gt}$ is the quality score labeled for the text image in the text recognition data set; when the conversion ratio of the geometric conversion is 1, $q_{gt}$ is exactly the text recognition accuracy of the text image.
Before training the second neural network, the parameters of the fully-connected layer may be randomly initialized from a uniform distribution over the range (-0.1, 0.1).
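A minimal sketch of the L2 estimation loss and the uniform initialization of the fully-connected layer, assuming an illustrative layer shape and a fixed seed:

```python
import numpy as np

def l2_loss(q_pred, q_gt):
    """Squared difference between the predicted quality score and the
    labelled (ground-truth) quality score."""
    return (q_pred - q_gt) ** 2

def init_fc_params(in_dim, out_dim, low=-0.1, high=0.1, seed=0):
    """Uniform random initialization of fully-connected weights in (low, high)."""
    rng = np.random.default_rng(seed)
    return rng.uniform(low, high, size=(in_dim, out_dim))

params = init_fc_params(8, 1)
```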
Fig. 6 schematically shows a partial flow chart of steps before a text region in a text image is input into a second neural network in the quality detection method according to an embodiment of the present application. As shown in fig. 6, on the basis of the above embodiment, before the text region in the text image is input into the second neural network in step S410, the following steps S610 to S630 may be further included.
S610, acquiring a text recognition data set, wherein the text recognition data set comprises a text image and the recognition accuracy of the text image, the recognition accuracy of the text image is the ratio of the number of recognizable characters of the text image to the actual number of characters, the number of recognizable characters is the number of characters which can be correctly recognized by a character recognition model of the text image, and the actual number of characters is the number of characters actually included in the text image;
S620, carrying out geometric proportion conversion on the identification accuracy of the text image to obtain the quality score of the text image, and marking the quality score of the text image into the text identification data set;
and S630, inputting the text recognition data set into a second neural network, and training the second neural network.
Specifically, the identification accuracy of the text image is subjected to geometric conversion to obtain the quality score of the text image, which may be that the identification accuracy of the text image is subjected to geometric conversion to obtain a percentage quality score, and the quality score of the text image is labeled in the text identification data set. The text recognition dataset is used to train the second neural network to improve the accuracy of the prediction of the quality score by the second neural network.
The conversion ratio of the geometric conversion may be 0.1, 1, 10, 100, or the like, and the present application is not limited thereto.
In some embodiments, the text images in the text recognition data set may be text line images, i.e., images containing one or more serially arranged characters, that is, characters arranged one after another in a single row or a single column. Training the second neural network on text line images can therefore reduce background clutter and image noise, reduce their influence on the quality score, and improve the accuracy of quality score prediction. In addition, in some embodiments of the present application the text region is a text line: text lines in the text image are detected first, and quality score prediction is then performed on them by the second neural network. In this case, if the second neural network is trained on single-line text images, the properties of its training inputs are close to those of its prediction inputs, since both are images containing one or more serially arranged characters, which improves the accuracy of the quality scores predicted for text regions.
In some embodiments, the text images in the text recognition data set may be artificial images synthesized with blur, noise and similar algorithms, in which case the quality scores of the text images in the data set can be generated automatically during the synthesis process.
In other embodiments, the text images in the text recognition dataset may be from real text images and the quality scores of the text images in the text recognition dataset may be from manual scoring.
In still other embodiments, the text images in the text recognition data set may come from the recognition data of real text images produced by an OCR (Optical Character Recognition) model in a text recognition task. In this embodiment, the recognition accuracy of a text image is geometrically converted into its quality score, and the quality score is labeled into the text recognition data set. Since subjective scoring of text images is difficult and data sets of image quality scores are relatively scarce, while data sets recording the recognition accuracy of text recognition tasks are very common, this solution adopts such a recognition accuracy data set and converts the recognition accuracy of each text image into its quality score. Meanwhile, real text images match actual scenarios better than artificial images synthesized with blur, noise and similar algorithms, which benefits the parameter training of the second neural network and thereby improves the accuracy with which the second neural network predicts quality scores for text regions.
Fig. 7 schematically illustrates a flowchart of a step of performing quality score prediction on a text region in a text image to obtain a quality score of the text region in an embodiment of the present application. As shown in fig. 7, based on the above embodiment, the text region is a text line, the text line is a continuous image region containing one or more characters arranged in series in the text image, and the quality score of the text region is obtained by performing quality score prediction on the text region in the text image in step S240, which may further include the following steps S710 to S730.
S710, detecting text lines in the text image and the length-height ratio of the text lines, wherein the length-height ratio is the ratio of the length to the height of the text lines, the length of the text lines is the length of extension lines extending along the arrangement direction of characters in the text lines, and the height of the text lines is the height of the text lines perpendicular to the extension lines;
S720, dividing the text lines with the length-height ratio larger than the preset value into a plurality of text lines with the length-height ratio smaller than or equal to the preset value;
and S730, respectively carrying out feature extraction and quality score prediction on a plurality of text lines with the length-height ratio smaller than or equal to a preset value to obtain quality scores of the plurality of text lines with the length-height ratio smaller than or equal to the preset value.
According to some embodiments of the present application, a text line whose length-height ratio is larger than the preset value is divided into a plurality of text lines whose length-height ratios are smaller than or equal to the preset value. In this way, long text lines with many characters are divided into shorter text lines within the preset ratio, so long text is adaptively segmented and overly long text lines are never input into the second neural network, which can greatly improve the prediction accuracy of the second neural network.
In some embodiments, the text images in the text recognition data set may be text line images, i.e., images containing one or more serially arranged characters; more specifically, the aspect ratio of each text line image may be less than or equal to the preset value. In this way, the images processed by the second neural network have aspect ratios no larger than the preset value both when it is trained on the text recognition data set and when it predicts quality scores for text lines, which facilitates optimization of the second neural network, reduces its computation, and improves the efficiency of text line quality score prediction.
Fig. 8 is a flowchart schematically illustrating a step of dividing a text line having an aspect ratio larger than a preset value into a plurality of text lines having an aspect ratio smaller than or equal to a preset value in an embodiment of the present application. As shown in fig. 8, on the basis of the above embodiment, the step S720 of dividing the text line with the aspect ratio larger than the preset value into a plurality of text lines with the aspect ratio smaller than or equal to the preset value may further include the following steps S810 to S820.
S810, projecting the text line with the length-height ratio larger than the preset value to the length direction of the text line to form a one-dimensional projection point set, wherein the projection of the pixel point at the position of the character in the length direction of the text line forms a real point projection, the projection of the pixel points at the positions except the position of the character in the length direction of the text line forms an imaginary point projection, and the real point projection and the imaginary point projection form a projection point set;
and S820, taking segmentation points on the line segments projected and gathered by the virtual points, and segmenting the text lines according to the segmentation points so as to segment the text lines with the length-height ratio larger than the preset value into a plurality of text lines with the length-height ratio smaller than or equal to the preset value.
Specifically, the real-point projections may be black pixels and the virtual-point projections white pixels, and the one-dimensional projection point set may be a line segment composed of black pixels, of white pixels, or of both. Taking segmentation points on the line segments where the virtual-point projections gather and segmenting the text line at those points can be understood as follows: those segments are the projections, along the length direction of the text line, of pixels at positions other than character positions, so taking segmentation points there means the text line is cut at positions other than character positions. This avoids splitting a single character into two different text lines, which would make character recognition difficult and reduce the accuracy of quality score prediction.
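A minimal sketch of the projection-based segmentation, assuming a binarized text-line image where nonzero pixels are character pixels; the `max_aspect` threshold and the greedy choice of the furthest background column are illustrative assumptions:

```python
import numpy as np

def split_columns(line_img, max_aspect=5.0):
    """line_img: 2-D array, nonzero where character pixels are.
    Returns cut x-positions chosen only in background-only ("virtual
    point") columns, so each resulting segment's length/height ratio
    is at most max_aspect and no character is cut in two."""
    h, w = line_img.shape
    profile = line_img.sum(axis=0)        # one-dimensional projection
    gaps = np.flatnonzero(profile == 0)   # virtual-point columns
    max_len = int(max_aspect * h)
    cuts, start = [], 0
    while w - start > max_len:
        # furthest background column reachable from this segment's start
        ok = gaps[(gaps > start) & (gaps <= start + max_len)]
        if ok.size == 0:
            break                         # no safe cut; leave the rest whole
        cut = int(ok[-1])
        cuts.append(cut)
        start = cut
    return cuts
```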
Fig. 9 schematically shows a flowchart of steps of obtaining preset weights corresponding to respective text regions in an embodiment of the present application. As shown in fig. 9, on the basis of the above embodiment, acquiring the preset weights corresponding to the respective text regions in step S250 may further include the following steps S910 to S920.
S910, acquiring the word number of each text area, and summing the word numbers of the text areas to obtain the total word number of the text image;
and S920, respectively obtaining the ratio of the word number of each text area to the total word number of the text image, and taking the ratio as a preset weight corresponding to each text area.
It can be understood that longer text lines carry more of the content of a text image, so their quality has a larger influence on the quality of the text image. Taking the ratio of the word count of each text region to the total word count of the text image as the preset weight of that region therefore gives longer text lines larger preset weights, so that the text image quality detection method of the embodiment of the present application follows the decision logic of actual quality judgment, which can improve the accuracy of text image quality detection and the efficiency of the detection method.
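Steps S910 and S920, combined with the weighted summation of step S250, can be sketched as follows; the word counts and region quality scores are made-up illustrative values:

```python
def word_count_weights(word_counts):
    """S910/S920: each region's preset weight is its word count divided
    by the total word count of the text image."""
    total = sum(word_counts)
    return [n / total for n in word_counts]

def image_quality_score(region_scores, weights):
    """S250: weighted sum of the regions' quality scores."""
    return sum(q * w for q, w in zip(region_scores, weights))

weights = word_count_weights([2, 6, 2])          # three text regions
score = image_quality_score([0.8, 0.9, 0.5], weights)
```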
In a particular embodiment, the quality score $\hat{q}$ of the text image may be:

$$\hat{q} = \sum_j w_j q_j \qquad (1)$$

wherein $q_j$ is the predicted quality score of the jth text line and $w_j$ is the preset weight of the jth text line in the text image. In some embodiments, $\hat{q}$ may be equal to the sum, over all text lines, of each text line's predicted quality score multiplied by its corresponding preset weight. The definition of $w_j$ may be as follows:

$$w_j = \frac{R^{(j)}}{\sum_k R^{(k)}}$$

wherein $R^{(j)}$ is the number of characters of the jth text line and $\sum_k R^{(k)}$ is the sum of the character counts of the $k$ text lines in the text image. In some embodiments, the number of words in a text line may be detected by a single-character detector, such as an MSER detector. In other embodiments, the number of words $R^{(j)}$ in a text line may be approximated by the aspect ratio of the text line, for example:

$$R^{(j)} \approx \frac{line\_w}{line\_h}$$

wherein line_w is the length of the text line and line_h is the height of the text line. The length of the text line is the length of an extension line extending along the direction in which the characters in the text line are arranged, and the height of the text line is the height of the text line perpendicular to that extension line. Approximating the number of characters in a text line by its aspect ratio can therefore simplify the detection flow of the quality detection method of the embodiment of the present application and reduce the amount of detection computation.
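The aspect-ratio approximation of the character count and the resulting preset weights can be sketched as follows; the line sizes are made-up illustrative values:

```python
def approx_word_count(line_w, line_h):
    """Approximate the number of characters in a text line by its
    length-height ratio, R(j) ~= line_w / line_h."""
    return line_w / line_h

lines = [(200, 20), (100, 20)]            # (length, height) of each text line
counts = [approx_word_count(w, h) for w, h in lines]
total = sum(counts)
weights = [c / total for c in counts]     # w_j = R(j) / sum_k R(k)
```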
Alternatively, in some embodiments, obtaining the preset weight corresponding to each text region includes: acquiring the length-height ratio of each text region, and summing the length-height ratios of the text regions to obtain a total length-height ratio; and acquiring the ratio of the length-height ratio of each text region to the total length-height ratio, and taking the ratio as a preset weight corresponding to each text region. Therefore, the preset weight corresponding to each text region is directly calculated by adopting the aspect ratio of the text line, the calculated amount can be reduced, and the detection efficiency of the quality detection method is improved.
Fig. 10 schematically illustrates a scene of detecting text lines in a text image together with the quality scores and character counts of the text lines according to an embodiment of the present application. Referring to fig. 10, text line detection is performed on the text image, quality score prediction and character-count detection are then performed on each text line, and the results are displayed below the corresponding text line. For example, the text line "woman" has 2 characters and a predicted quality of 0.8. In this case, the predicted quality score of the text image may be calculated from formula (1) for the quality score of the text image together with the quality score prediction and character-count detection results of the text lines.
Fig. 11 schematically illustrates a scene of detecting text lines in a text image together with the aspect ratios of the text lines according to an embodiment of the present application. Referring to fig. 11, text line detection is performed on the text image, quality score prediction and aspect ratio detection are then performed on each text line, and the results are displayed below the corresponding text line. For example, the text line "ultrasound description:" has an aspect ratio of 4.4 and a predicted quality of 0.848. In this case, the predicted quality score of the text image may be calculated from formula (1) for the quality score of the text image together with the quality scores and aspect ratios of the text lines.
In some embodiments, the text region is a single-word region, that is, a continuous image region containing one character in the text image. Quality score prediction is performed on the single-word regions in the text image to obtain the quality score of each single-word region; then the preset weight corresponding to each single-word region is obtained, and the quality scores of the single-word regions are weighted and summed based on the preset weights to obtain the quality score of the text image. The preset weight of each single-word region may be 1. Alternatively, the preset weight of a single-word region may be determined by the ratio of the area of that single-word region to the sum of the areas of the single-word regions, for example:

the quality score $\hat{q}$ of the text image may be:

$$\hat{q} = \sum_j v_j q_j$$

wherein $q_j$ is the predicted quality score of the jth single-word region and $v_j$ is the preset weight of the jth single-word region in the text image. In some embodiments, $\hat{q}$ may be equal to the sum, over all single-word regions, of each region's predicted quality score multiplied by its corresponding preset weight. The definition of $v_j$ may be as follows:

$$v_j = \frac{S^{(j)}}{\sum_k S^{(k)}}$$

wherein $S^{(j)}$ is the area of the jth single-word region and $\sum_k S^{(k)}$ is the sum of the areas of the $k$ single-word regions in the text image. In this way, characters occupying a larger area in the text image are treated as more important, and the quality of large-area characters has a larger influence on the quality of the text image; since the preset weights are obtained for the individual single-word regions and the quality scores of the single-word regions are weighted and summed based on those preset weights, larger single-word regions receive larger preset weights, so the text image quality detection method of the embodiment of the present application follows the decision logic of actual quality judgment and can improve the accuracy of text image quality detection.
In some embodiments, the first neural network and the second neural network may be merged into the same deep neural network, and feature values obtained by feature extraction of the text image by the first neural network are shared in the second neural network. Specifically, the method for detecting the quality of the text image may further include:
and inputting a characteristic value obtained by the first neural network through characteristic extraction on the text image into the second neural network, wherein the characteristic value is used for helping the second neural network to obtain one or more of plane characteristics, characteristic vectors and prediction quality scores through the convolution layer through the characteristic extraction on the text region.
Therefore, the calculation amount of the text image quality detection method according to the embodiment of the application can be reduced, and the text image quality score detection efficiency can be improved.
It should be noted that although the various steps of the methods in this application are depicted in the drawings in a particular order, this does not require or imply that these steps must be performed in this particular order, or that all of the shown steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
The following describes embodiments of the apparatus of the present application, which can be used to perform the method for detecting the quality of text images in the above embodiments of the present application. Fig. 12 schematically shows a block diagram of a text image quality detection apparatus according to an embodiment of the present application. As shown in fig. 12, the text image quality detection apparatus 1200 includes:
a character scale detection module 1210 configured to detect a character scale of the text image, the character scale of the text image being an average scale of characters of the text image;
a scaling module 1220, configured to, when a character scale of the text image is smaller than a first preset scale, perform an enlargement process on the text image so that the character scale of the text image is within a preset scale range, and when the character scale of the text image is larger than a second preset scale, perform a reduction process on the text image so that the character scale of the text image is within the preset scale range, where the preset scale range is a scale range that is larger than the first preset scale and smaller than the second preset scale;
a text region detection module 1230 configured to input the text image into the first neural network, and perform feature extraction and mapping processing on the text image through the first neural network to detect a text region in the text image; the first neural network is configured with a sensing window, the sensing window is in a preset scale range, and the sensing window is used for moving on the text image to extract the characteristics of the text image; the text area is a continuous image area consisting of part or all of characters in the text image;
the quality score prediction module 1240 is configured to perform quality score prediction on the text region in the text image to obtain the quality score of the text region;
and a weighted summation module 1250 configured to obtain preset weights corresponding to the text regions, and perform weighted summation processing on the quality scores of the text regions based on the preset weights to obtain the quality scores of the text images.
In some embodiments of the present application, based on the above embodiments, the weighted sum module includes:
a word number acquisition unit configured to acquire the word numbers of the respective text regions and perform a summation operation on the word numbers of the respective text regions to obtain a total word number of the text image;
a preset weight calculation unit configured to obtain a ratio of the number of words of each text region to the total number of words of the text image, respectively, as a preset weight corresponding to each text region.
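The word-count weighting of these two units, combined with the weighted summation of module 1250, can be sketched in a few lines (all names are illustrative):

```python
def region_weights(word_counts):
    """Preset weight of each text region: its word count divided by
    the total word count of the text image."""
    total = sum(word_counts)
    return [n / total for n in word_counts]

def image_quality(scores, word_counts):
    """Quality score of the text image: the weighted sum of the
    per-region quality scores under the word-count weights."""
    return sum(q * w for q, w in zip(scores, region_weights(word_counts)))
```

So a region holding three quarters of the image's words contributes three quarters of the image-level score.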
In some embodiments of the present application, based on the above embodiments, the text region is a text line, and the text line is a continuous image region in the text image containing one or more serially arranged characters, and the quality score prediction module includes:
an aspect ratio detection unit configured to detect a text line in the text image and an aspect ratio of the text line, the aspect ratio being the ratio of the length of the text line to its height, the length of the text line being the length of an extension line extending along the character arrangement direction in the text line, and the height of the text line being its height perpendicular to the extension line;
a text line segmentation unit configured to segment a text line having an aspect ratio larger than a preset value into a plurality of text lines having an aspect ratio smaller than or equal to the preset value;
and the quality score prediction unit is configured to respectively perform feature extraction and quality score prediction on a plurality of text lines with the length-height ratio smaller than or equal to a preset value to obtain a plurality of quality scores of the text lines with the length-height ratio smaller than or equal to the preset value.
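One simple way to realize the split into sub-lines of bounded aspect ratio is to cut an over-long line into the minimum number of equal-length segments. This is a sketch under the assumption of uniform cuts (the projection-based cut placement is a separate refinement), and `max_ratio` is an assumed preset value:

```python
import math

def split_line(length, height, max_ratio=10.0):
    """Split a text line (given by its length and height) into the minimum
    number of equal-length segments whose length-height ratio is
    <= max_ratio. Lines already within the limit are returned unchanged."""
    if length / height <= max_ratio:
        return [length]
    k = math.ceil(length / (height * max_ratio))  # minimum segment count
    return [length / k] * k
```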
In some embodiments of the present application, based on the above embodiments, the text line segmentation unit includes:
a text line projection unit configured to project a text line with a length-height ratio larger than the preset value onto the length direction of the text line to form a one-dimensional projection point set, wherein the projections of pixel points at the positions of characters form real point projections, the projections of pixel points at positions other than the character positions form virtual point projections, and the real point projections and the virtual point projections together form the projection point set;
and the text line segmentation subunit is configured to take segmentation points on the line segments of the virtual point projection aggregation and segment the text lines according to the segmentation points so as to segment the text lines with the length-height ratio larger than the preset value into a plurality of text lines with the length-height ratio smaller than or equal to the preset value.
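The projection-based cut placement can be sketched as follows: columns of the line with no character pixels are the "virtual point" projections, and segmentation points are taken in the middle of sufficiently wide empty runs, so cuts fall between characters rather than through them. `min_gap` is an assumed parameter:

```python
def cut_points(column_ink, min_gap=3):
    """column_ink[i] is the number of character pixels projected into
    column i of the text line (the one-dimensional projection point set).
    Returns the midpoints of runs of empty ('virtual point') columns of
    length >= min_gap: the candidate segmentation points of the line."""
    cuts, run_start = [], None
    for i, ink in enumerate(column_ink + [1]):  # sentinel closes a trailing run
        if ink == 0:
            if run_start is None:
                run_start = i            # an empty run begins
        else:
            if run_start is not None and i - run_start >= min_gap:
                cuts.append((run_start + i - 1) // 2)  # midpoint of the run
            run_start = None
    return cuts
```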
In some embodiments of the present application, based on the above embodiments, the quality score prediction module may include:
an input unit configured to input a text region in the text image into a second neural network;
the feature extraction unit is configured to perform feature extraction on the text region through the convolutional layer of the second neural network to obtain a plane feature;
the dimensionality reduction processing unit is configured to perform dimensionality reduction processing on the plane features through a pooling layer of the second neural network to obtain feature vectors;
and the full-connection calculating unit is configured to perform full-connection calculation on the feature vectors through a full-connection layer of the second neural network to obtain the prediction quality score of the text region.
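The three stages of the second network (convolutional layer → pooling layer → fully connected layer) can be sketched in pure Python. The single-filter convolution and global average pooling below are deliberate simplifications of a real CNN, and all weights are illustrative:

```python
def conv2d(img, kernel):
    """Convolutional layer: single-channel 'valid' 2-D cross-correlation,
    producing a planar feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(img) - kh + 1):
        row = []
        for j in range(len(img[0]) - kw + 1):
            row.append(sum(img[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out

def global_avg_pool(fmap):
    """Pooling layer: reduce a planar feature map to a single scalar
    (dimensionality reduction toward the feature vector)."""
    vals = [v for row in fmap for v in row]
    return sum(vals) / len(vals)

def fully_connected(features, weights, bias=0.0):
    """Fully connected layer: weighted sum of the feature vector."""
    return sum(f * w for f, w in zip(features, weights)) + bias

def predict_quality(img, kernels, fc_weights, bias=0.0):
    """Conv -> planar features -> pooled feature vector -> FC quality score."""
    features = [global_avg_pool(conv2d(img, k)) for k in kernels]
    return fully_connected(features, fc_weights, bias)
```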
In some embodiments of the present application, based on the above embodiments, the quality score prediction module may further include:
a data set acquisition unit configured to acquire a text recognition data set including a text image and a recognition accuracy of the text image, wherein the recognition accuracy of the text image is a ratio of the number of recognizable characters of the text image to the number of actual characters, the number of recognizable characters is a number of characters that the text image can be correctly recognized by a character recognition model, and the number of actual characters is a number of characters actually included in the text image;
the data set labeling unit is configured to perform geometric conversion on the identification accuracy of the text image to obtain the quality score of the text image, and label the quality score of the text image into the text identification data set;
and the neural network training unit is configured to input the text recognition data set into a second neural network and train the second neural network.
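The derivation of the training labels can be sketched as follows. The power-law mapping in `accuracy_to_quality` is one assumed form of the "geometric conversion", since the text does not specify the exact transform, and `gamma` is an illustrative parameter:

```python
def recognition_accuracy(recognized, actual):
    """Recognition accuracy of a text image: the ratio of the number of
    characters correctly recognized by a character recognition model to
    the number of characters actually contained in the image."""
    return recognized / actual if actual else 0.0

def accuracy_to_quality(acc, gamma=2.0):
    """One plausible 'geometric conversion' of recognition accuracy into
    a quality score label: a power-law mapping with assumed exponent."""
    return acc ** gamma
```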
In some embodiments of the present application, based on the above embodiments, the text region is a single-word region, and the single-word region is a continuous image region including one character in the text image.
The specific details of the text image quality detection apparatus provided in each embodiment of the present application have been described in detail in the corresponding method embodiment, and are not described herein again.
Fig. 13 schematically shows a structural block diagram of a computer system of an electronic device for implementing the embodiment of the present application.
It should be noted that the computer system 1300 of the electronic device shown in fig. 13 is only an example, and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in fig. 13, the computer system 1300 includes a Central Processing Unit (CPU) 1301, which can perform various appropriate actions and processes according to a program stored in a Read-Only Memory (ROM) 1302 or a program loaded from a storage section 1308 into a Random Access Memory (RAM) 1303. The RAM 1303 also stores the programs and data necessary for system operation. The CPU 1301, the ROM 1302, and the RAM 1303 are connected to each other via a bus 1304. An Input/Output (I/O) interface 1305 is also connected to the bus 1304.
The following components are connected to the input/output interface 1305: an input section 1306 including a keyboard, a mouse, and the like; an output section 1307 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 1308 including a hard disk and the like; and a communication section 1309 including a network interface card such as a local area network card or a modem. The communication section 1309 performs communication processing via a network such as the internet. A drive 1310 is also connected to the input/output interface 1305 as necessary. A removable medium 1311, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1310 as necessary, so that a computer program read therefrom is installed into the storage section 1308 as necessary.
In particular, according to embodiments of the present application, the processes described in the various method flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated by the flowchart. In such embodiments, the computer program may be downloaded and installed from a network via the communication section 1309 and/or installed from the removable medium 1311. When executed by the CPU 1301, the computer program performs the various functions defined in the system of the present application.
It should be noted that the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash Memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the application. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which can be a personal computer, a server, a touch terminal, or a network device, etc.) to execute the method according to the embodiments of the present application.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (10)

1. A method for detecting the quality of a text image is characterized by comprising the following steps:
detecting a character scale of the text image, wherein the character scale of the text image is an average scale of characters of the text image;
when the character scale of the text image is smaller than a first preset scale, carrying out magnification processing on the text image so that the character scale of the text image is within a preset scale range, and when the character scale of the text image is larger than a second preset scale, carrying out reduction processing on the text image so that the character scale of the text image is within the preset scale range, wherein the preset scale range is a scale range which is larger than the first preset scale and smaller than the second preset scale;
inputting the text image into a first neural network, and performing feature extraction and mapping processing on the text image through the first neural network to detect one or more text regions in the text image; a sensing window is configured in the first neural network, the sensing window is in a preset scale range, and the sensing window is used for moving on the text image to perform feature extraction on the text image; the text area is a continuous image area which is composed of part or all of characters in the text image;
performing quality score prediction on a text region in the text image to obtain the quality score of the text region;
and acquiring preset weights corresponding to the text regions, and performing weighted summation processing on the quality scores of the text regions based on the preset weights to obtain the quality scores of the text images.
2. The method according to claim 1, wherein the obtaining of the preset weight corresponding to each text region comprises:
acquiring the word number of each text region, and summing the word numbers of the text regions to obtain the total word number of the text image;
and respectively acquiring the ratio of the word number of each text region to the total word number of the text image, and taking the ratio as a preset weight corresponding to each text region.
3. The method according to claim 1, wherein the text region is a text line, the text line is a continuous image region containing one or more characters arranged in series in the text image, and the quality score of the text region is obtained by performing quality score prediction on the text region in the text image, the method comprising:
detecting a text line in the text image and a length-to-height ratio of the text line, wherein the length-to-height ratio is a ratio of a length of the text line to a height of the text line, the length of the text line is a length of an extension line extending along a character arrangement direction in the text line, and the height of the text line is a height of the text line perpendicular to the extension line;
dividing the text lines with the length-height ratio larger than a preset value into a plurality of text lines with the length-height ratio smaller than or equal to the preset value;
and respectively carrying out feature extraction and quality score prediction on the text lines with the length-height ratios smaller than or equal to the preset value to obtain the quality scores of the text lines with the length-height ratios smaller than or equal to the preset value.
4. The method according to claim 3, wherein the dividing the text line having the aspect ratio greater than the predetermined value into a plurality of text lines having the aspect ratio less than or equal to the predetermined value comprises:
projecting the text line with the length-height ratio larger than the preset value to the length direction of the text line to form a one-dimensional projection point set, wherein the projection of the pixel points at the positions of the characters in the length direction of the text line forms a real point projection, the projection of the pixel points at positions other than the positions of the characters forms a virtual point projection, and the real point projection and the virtual point projection form the projection point set;
and taking segmentation points on the line segments projected and gathered by the virtual points, and segmenting the text lines according to the segmentation points so as to segment the text lines with the length-height ratio larger than a preset value into a plurality of text lines with the length-height ratio smaller than or equal to the preset value.
5. The method according to claim 1, wherein the performing quality score prediction on the text region in the text image to obtain the quality score of the text region comprises:
inputting a text region in the text image into a second neural network;
performing feature extraction on the text region through the convolutional layer of the second neural network to obtain a plane feature;
performing dimensionality reduction processing on the plane features through a pooling layer of the second neural network to obtain feature vectors;
and performing full-connection calculation on the feature vectors through a full-connection layer of the second neural network to obtain the prediction quality score of the text region.
6. The method of detecting the quality of a text image according to claim 5, wherein before the text region in the text image is input into a second neural network, the method further comprises:
acquiring a text recognition data set, wherein the text recognition data set comprises a text image and the recognition accuracy of the text image, the recognition accuracy of the text image is the ratio of the number of recognizable characters of the text image to the actual number of characters, the number of recognizable characters is the number of characters which can be correctly recognized by a character recognition model of the text image, and the actual number of characters is the number of characters actually included in the text image;
carrying out geometric conversion on the identification accuracy of the text image to obtain the quality score of the text image, and marking the quality score of the text image into the text identification data set;
inputting the text recognition data set into the second neural network, and training the second neural network.
7. The method according to claim 1, wherein the text region is a single character region, and the single character region is a continuous image region containing one character in the text image.
8. An apparatus for detecting a quality of a text image, comprising:
a character scale detection module configured to detect a character scale of the text image, the character scale of the text image being an average scale of characters of the text image;
the zooming module is configured to perform zooming-in processing on the text image so that the character scale of the text image is within a preset scale range when the character scale of the text image is smaller than a first preset scale, and perform zooming-out processing on the text image so that the character scale of the text image is within the preset scale range when the character scale of the text image is larger than a second preset scale, wherein the preset scale range is a scale range which is larger than the first preset scale and smaller than the second preset scale;
the text region detection module is configured to input the text image into a first neural network, and perform feature extraction and mapping processing on the text image through the first neural network so as to detect a text region in the text image; a sensing window is configured in the first neural network, the sensing window is in a preset scale range, and the sensing window is used for moving on the text image to perform feature extraction on the text image; the text area is a continuous image area which is formed by part or all of characters in the text image;
the quality score prediction module is configured to perform quality score prediction on a text region in the text image to obtain a quality score of the text region;
and the weighted summation module is configured to acquire preset weights corresponding to the text regions, and perform weighted summation processing on the quality scores of the text regions based on the preset weights to obtain the quality scores of the text images.
9. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out a method of quality detection of a text image according to any one of claims 1 to 7.
10. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method of quality detection of a text image of any one of claims 1 to 7 via execution of the executable instructions.
CN202110484548.8A 2021-04-30 2021-04-30 Text image quality detection method, device, medium and electronic equipment Pending CN113763313A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110484548.8A CN113763313A (en) 2021-04-30 2021-04-30 Text image quality detection method, device, medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN113763313A true CN113763313A (en) 2021-12-07

Family

ID=78786949

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110484548.8A Pending CN113763313A (en) 2021-04-30 2021-04-30 Text image quality detection method, device, medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN113763313A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114898409A (en) * 2022-07-14 2022-08-12 深圳市海清视讯科技有限公司 Data processing method and device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination