CN117095417A - Screen shot form image text recognition method, device, equipment and storage medium - Google Patents

Screen shot form image text recognition method, device, equipment and storage medium

Info

Publication number
CN117095417A
CN117095417A CN202311076101.2A CN202311076101A CN 117095417 A
Authority
CN
China
Prior art keywords
target
pixel matrix
preset
image
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311076101.2A
Other languages
Chinese (zh)
Inventor
程浩宇
刘子星
丁乐
徐煌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Servyou Software Group Co ltd
Original Assignee
Servyou Software Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Servyou Software Group Co ltd filed Critical Servyou Software Group Co ltd
Priority to CN202311076101.2A priority Critical patent/CN117095417A/en
Publication of CN117095417A publication Critical patent/CN117095417A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40Document-oriented image-based pattern recognition
    • G06V30/41Analysis of document content
    • G06V30/412Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/1444Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/148Segmentation of character regions
    • G06V30/15Cutting or merging image elements, e.g. region growing, watershed or clustering-based techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/18Extraction of features or characteristics of the image
    • G06V30/1801Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/18Extraction of features or characteristics of the image
    • G06V30/18105Extraction of features or characteristics of the image related to colour
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/02Recognising information on displays, dials, clocks

Abstract

The application discloses a screen shot form image text recognition method, a device, equipment and a storage medium, relating to the technical field of image recognition, comprising the following steps: inputting a target screen shot form image into a preset moire elimination model to obtain a first target pixel matrix with moire removed, and performing exposure processing on the first target pixel matrix to determine a second target pixel matrix corresponding to a target form area on the target screen shot form image; inputting the second target pixel matrix into a preset table detection model to obtain target cell vertex coordinates, and determining a cell pixel matrix based on the target cell vertex coordinates; and splicing the cells in the cell pixel matrix to obtain a target image, and performing text detection on the target image by using a preset optical character recognition technology to obtain the text on the target form area. Therefore, automatic image calibration and character recognition are realized, manual intervention is reduced, and recognition efficiency is improved.

Description

Screen shot form image text recognition method, device, equipment and storage medium
Technical Field
The present application relates to the field of image recognition technologies, and in particular, to a method, an apparatus, a device, and a storage medium for recognizing a screen shot form image text.
Background
Today, image processing techniques are widely used across industries. However, owing to technical and physical limitations, the recognition of low-quality screen shot form images has remained a major concern. Such images often suffer from high noise, distortion, and low contrast, which poses a significant challenge for form detection and text content recognition. In the prior art, the form on the image is detected after image preprocessing operations such as gamma transformation and perspective transformation. Since photographing form data on an electronic screen is one of the actual scenes of natural photography, array aliasing occurs between the display elements of the display device and the photosensitive elements of the photographing device, making the moire phenomenon difficult to avoid. Prior-art image preprocessing methods (such as graying, gamma transformation, and perspective transformation) can hardly eliminate the moire patterns on the image, and the moire may cover the text content and the table frame lines, causing missed detection, false detection, and similar problems in the recognition results of OCR (Optical Character Recognition) models. In view of this, low-quality screen shot form image recognition still faces a number of difficulties and challenges, including high noise and moire, that conventional techniques cannot adequately address.
Disclosure of Invention
In view of the above, the present application aims to provide a screen shot form image text recognition method, apparatus, device, and storage medium, which can improve the accuracy and efficiency of automatic image calibration and text recognition. The specific scheme is as follows:
in a first aspect, the application discloses a screen shot form image text recognition method, which comprises the following steps:
inputting a target screen shot form image into a preset moire elimination model to obtain a first target pixel matrix with moire removed, and performing exposure processing on the first target pixel matrix to determine a second target pixel matrix corresponding to a target form area on the target screen shot form image;
inputting the second target pixel matrix into a preset table detection model to obtain target cell vertex coordinates, and determining a cell pixel matrix based on the target cell vertex coordinates;
and splicing the cells in the cell pixel matrix to obtain a target image, and performing text detection on the target image by using a preset optical character recognition technology to obtain the text on the target form area.
Optionally, before inputting the target screen shot form image to the preset moire elimination model to obtain the first target pixel matrix with the moire removed, the method further includes:
and adjusting the initial moire elimination model through a preset screen shot image data set to obtain the preset moire elimination model.
Optionally, the exposing the first target pixel matrix to determine a second target pixel matrix corresponding to a target form area on the target screen shot form image includes:
performing exposure processing on the first target pixel matrix based on a preset linear transformation formula to obtain an exposed pixel matrix, and cutting the exposed pixel matrix based on a preset pixel cutting range to obtain a target exposed pixel matrix;
and determining a target form area on the target screen shot form image according to the outline vertex coordinates and the preset vertex coordinates corresponding to the target exposed pixel matrix, and calibrating the target form area through a preset perspective transformation algorithm to obtain a second target pixel matrix corresponding to the target form area.
Optionally, before performing exposure processing on the first target pixel matrix based on the preset linear transformation formula to obtain an exposed pixel matrix and clipping the exposed pixel matrix based on a preset pixel clipping range to obtain the target exposed pixel matrix, the method further includes:
converting the first target pixel matrix from an RGB color space to an HLS color space;
correspondingly, after performing exposure processing on the first target pixel matrix based on a preset linear transformation formula to obtain an exposed pixel matrix and cutting the exposed pixel matrix based on a preset pixel cutting range to obtain a target exposed pixel matrix, the method further comprises:
the first target pixel matrix is restored from HLS color space to RGB color space.
Optionally, the inputting the second target pixel matrix to a preset table detection model to obtain a target cell vertex coordinate, and determining the cell pixel matrix based on the target cell vertex coordinate includes:
inputting the second target pixel matrix to a preset table detection model to obtain the vertex coordinates of the target cells;
determining a corresponding cell pixel matrix from the second target pixel matrix based on the target cell vertex coordinates;
and ordering the cell pixel matrix based on a preset ordering rule to determine the cell pixel matrix.
Optionally, the stitching the cells in the cell pixel matrix to obtain a target image, and performing text detection on the target image by using a preset optical character recognition technology to obtain a text on the target form area, including:
adjusting the widths of the cells in the cell pixel matrix to obtain a new cell pixel matrix, and splicing the cells in the new cell pixel matrix in the vertical direction to obtain a target image;
and carrying out text detection on the target image by using a preset optical character recognition technology to obtain text content on the target form area and coordinates corresponding to the text content.
Optionally, after performing text detection on the target image by using a preset optical character recognition technology to obtain text content on the target form area and coordinates corresponding to the text content, the method further includes:
integrating the text content and the coordinates corresponding to the text content to generate a target table, and storing the target table through a preset file format.
In a second aspect, the present application discloses a screen shot form image text recognition device, comprising:
the image preprocessing module is used for inputting a target screen shot form image into a preset moire elimination model to obtain a first target pixel matrix with moire removed, and carrying out exposure processing on the first target pixel matrix to determine a second target pixel matrix corresponding to a target form area on the target screen shot form image;
the table detection module is used for inputting the second target pixel matrix into a preset table detection model to obtain target cell vertex coordinates, and determining a cell pixel matrix based on the target cell vertex coordinates;
and the cell text recognition module is used for splicing cells in the cell pixel matrix to obtain a target image, and performing text detection on the target image by utilizing a preset optical character recognition technology to obtain the text on the target form area.
In a third aspect, the present application discloses an electronic device, comprising:
a memory for storing a computer program;
and the processor is used for executing the computer program to realize the screen shot form image text recognition method.
In a fourth aspect, the present application discloses a computer readable storage medium storing a computer program which, when executed by a processor, implements the aforementioned method for identifying a screen shot form image text.
In the method, a target screen shot form image is input into a preset moire elimination model to obtain a first target pixel matrix with moire removed, and exposure processing is performed on the first target pixel matrix to determine a second target pixel matrix corresponding to a target form area on the target screen shot form image; the second target pixel matrix is input into a preset table detection model to obtain target cell vertex coordinates, and a cell pixel matrix is determined based on the target cell vertex coordinates; the cells in the cell pixel matrix are spliced to obtain a target image, and text detection is performed on the target image using a preset optical character recognition technology to obtain the text on the target form area. That is, moire on the target screen shot form image is eliminated through the preset moire elimination model to obtain the first target pixel matrix; exposure processing and calibration are applied to the first target pixel matrix to determine the second target pixel matrix corresponding to the target form area; the second target pixel matrix is detected through the preset form detection model to obtain the vertex coordinate information of each cell; the pixels of the cells are extracted and arranged to generate the target image; and the target image is detected through the preset optical character recognition technology, in effect detecting each cell on the target image individually, so as to obtain the text on the target form area. According to the application, the performance of character recognition by the preset optical character recognition technology is improved through operations such as moire elimination and image exposure, and because each cell on the target image is recognized individually, the result is more accurate than whole-image recognition, reducing the occurrence of false detection and missed detection.
Therefore, the problems of low accuracy, low efficiency and poor using effect of form detection and character recognition caused by high noise, moire and the like can be solved, the recognition accuracy and efficiency are improved, and the method can be widely applied to the fields of image processing, document management, financial tax management and the like.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a screen shot form image text recognition method disclosed in the present application;
FIG. 2 is a flowchart of image preprocessing for a specific screen shot form image disclosed in the present application;
FIG. 3 is a flowchart of a specific screen shot form image text recognition method disclosed in the present application;
FIG. 4 is a flowchart of form detection for a specific screen shot form image disclosed in the present application;
FIG. 5 is a flowchart of a specific screen shot form image text recognition method disclosed in the present application;
FIG. 6 is a schematic diagram of a screen shot form image text recognition device according to the present application;
fig. 7 is a block diagram of an electronic device according to the present disclosure.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
In the aspect of low-quality screen shot form image recognition, the conventional technology has a plurality of difficulties and challenges, and the embodiment specifically introduces a screen shot form image text recognition method, so that the problems can be effectively overcome, and the recognition precision and efficiency are improved.
Referring to fig. 1, the embodiment of the application discloses a screen shot form image text recognition method, which comprises the following steps:
step S11: inputting a target screen shot form image into a preset moire elimination model to obtain a first target pixel matrix with moire removed, and carrying out exposure processing on the first target pixel matrix to determine a second target pixel matrix corresponding to a target form area on the target screen shot form image.
In this embodiment, as shown in fig. 2, before inputting the target screen shot form image to the preset moire elimination model to obtain the first target pixel matrix with the moire removed, the method further includes: adjusting the initial moire elimination model through a preset screen shot image data set to obtain the preset moire elimination model. The initial moire elimination model used in the application is constructed based on MRNet, the champion scheme of NTIRE 2021 (a convolutional neural network). After the initial moire elimination model is obtained, it is fine-tuned on a data set of real captured screen shot images to obtain the preset moire elimination model, so that the preset moire elimination model better matches the conditions of current image processing. The acquired target screen shot form image I_raw(x, y) is then input into the preset moire elimination model, which removes the moire of the target screen shot form image to obtain the first target pixel matrix I_mr(x, y) with the moire removed. Exposure processing is then performed on the first target pixel matrix based on a preset linear transformation formula to obtain an exposed pixel matrix: the first target pixel matrix I_mr(x, y) is converted from the RGB (red, green, blue) color space to the HLS (hue, lightness, saturation) color space as I_hls(x, y), and the brightness is then adjusted using a linear transformation with the following formula:
I_hls(x, y) = α · I_mr(x, y) + β;
where α is the scaling factor of the linear transformation and β is the offset of the linear transformation, both of which can be obtained by analyzing the image histogram. The exposed pixel matrix is then clipped based on a preset pixel clipping range [c_min, c_max] to obtain the target exposed pixel matrix. The operation of clipping pixel values can be expressed as: I_clip(x, y) = min(max(I_hls(x, y), c_min), c_max).
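As a minimal illustration (not part of the claimed embodiment), the linear brightness transform and pixel clipping above can be sketched in numpy. The α, β, and clipping bounds below are assumed example values, and the RGB-to-HLS conversion that precedes this step (e.g. via an OpenCV color conversion) is omitted so only the lightness-channel arithmetic is shown:

```python
import numpy as np

def expose_and_clip(lightness, alpha=1.3, beta=10, c_min=0, c_max=255):
    """Apply the linear brightness transform I' = alpha * I + beta to the
    L channel of an HLS image, then clip to the preset pixel range.
    alpha/beta/c_min/c_max are assumed example values."""
    exposed = alpha * lightness.astype(np.float64) + beta  # linear transform
    clipped = np.clip(exposed, c_min, c_max)               # preset clipping range
    return clipped.astype(np.uint8)

# A 2x2 toy lightness channel
L = np.array([[100, 200], [0, 250]], dtype=np.uint8)
out = expose_and_clip(L)
```

Values that the transform pushes above the upper bound (here 255) are saturated by the clip rather than wrapping around, which is the point of the preset clipping range.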
The first target pixel matrix is then restored from the HLS color space to the RGB color space, i.e. I_hls(x, y) is converted back from the HLS color space to the RGB color space as I_rgb(x, y). Graying and Gaussian blur are then applied in turn to I_rgb(x, y), an edge detection operator is constructed to complete edge and contour detection and contour polygon fitting, and the four vertex coordinates of the outermost contour of the form area are finally obtained. According to the outermost contour vertex coordinates of the form and the target vertex coordinates, the mapping transformation matrix M between them is solved; finally, based on the mapping transformation matrix M, I_rgb(x, y) is calibrated through a perspective transformation algorithm to obtain the calibrated second target pixel matrix I_cal(x, y) corresponding to the target form area. The perspective transformation can be expressed as [x', y', w']^T = M · [x, y, 1]^T, with the calibrated coordinates given by (x'/w', y'/w').
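The contour-to-rectangle calibration can be illustrated with a small numpy sketch that solves the 3x3 mapping matrix M from four vertex correspondences via the standard eight-equation linear system and then applies the perspective transformation; the vertex coordinates below are hypothetical, and a production pipeline would typically delegate both steps to an image library:

```python
import numpy as np

def solve_homography(src, dst):
    """Solve the 3x3 mapping matrix M from four point correspondences
    (form outermost contour vertices -> target rectangle vertices),
    fixing M[2][2] = 1 to get an 8-unknown linear system."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(M, x, y):
    """Apply [x', y', w']^T = M [x, y, 1]^T and dehomogenize."""
    xp, yp, w = M @ np.array([x, y, 1.0])
    return xp / w, yp / w

src = [(10, 5), (300, 20), (290, 210), (15, 200)]  # hypothetical contour vertices
dst = [(0, 0), (320, 0), (320, 240), (0, 240)]     # target rectangle vertices
M = solve_homography(src, dst)
```

Applying `warp_point` with M maps each detected contour vertex onto its target rectangle vertex, which is exactly the calibration the perspective transformation performs on the whole image.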
step S12: and inputting the second target pixel matrix into a preset table detection model to obtain target cell vertex coordinates, and determining a cell pixel matrix based on the target cell vertex coordinates.
In this embodiment, the second target pixel matrix I_cal(x, y) is input into a preset table detection model to obtain the vertex coordinate information of each cell in the second target pixel matrix, and the corresponding cell pixel matrices are extracted according to the vertex coordinate information of the cells to obtain a list of cell pixel matrices.
Step S13: and splicing the cells in the cell pixel matrix to obtain a target image, and performing text detection on the target image by using a preset optical character recognition technology to obtain the text on the target form area.
In this embodiment, the elements in the list of cell pixel matrices are padded accordingly so that the widths of all cells are equal, creating a new list of cell pixel matrices. The list elements are then spliced along the vertical direction to obtain a new image I_new(x, y). Text detection is then performed on the target image using a preset optical character recognition technology to obtain the text on the target form area. It should be noted that, when the text is detected using the preset optical character recognition technology, the text is detected cell by cell, which greatly improves the recognition accuracy.
In this embodiment, a target screen shot form image is input into a preset moire elimination model to obtain a first target pixel matrix with moire removed, and exposure processing is performed on the first target pixel matrix to determine a second target pixel matrix corresponding to a target form area on the target screen shot form image; the second target pixel matrix is input into a preset table detection model to obtain target cell vertex coordinates, and a cell pixel matrix is determined based on the target cell vertex coordinates; the cells in the cell pixel matrix are spliced to obtain a target image, and text detection is performed on the target image using a preset optical character recognition technology to obtain the text on the target form area. That is, moire on the target screen shot form image is eliminated through the preset moire elimination model to obtain the first target pixel matrix; exposure processing and calibration are applied to the first target pixel matrix to determine the second target pixel matrix corresponding to the target form area; the second target pixel matrix is detected through the preset form detection model to obtain the vertex coordinate information of each cell; the pixels of the cells are extracted and arranged to generate the target image; and the target image is detected through the preset optical character recognition technology, in effect detecting each cell on the target image individually, so as to obtain the text on the target form area. According to the application, the performance of character recognition by the preset optical character recognition technology is improved through operations such as moire elimination and image exposure, and because each cell on the target image is recognized individually, the result is more accurate than whole-image recognition, reducing the occurrence of false detection and missed detection.
Therefore, the problems of low accuracy, low efficiency and poor using effect of form detection and character recognition caused by high noise, moire and the like can be solved, the recognition accuracy and efficiency are improved, and the method can be widely applied to the fields of image processing, document management, financial tax management and the like.
The above embodiment mainly introduces a screen shot form image text recognition method from the aspect of picture preprocessing, and the present embodiment introduces a screen shot form image text recognition method from the aspect of form detection and text recognition.
Referring to fig. 3, the embodiment of the application discloses a specific screen shot form image text recognition method, which comprises the following steps:
step S21: and inputting the second target pixel matrix into a preset table detection model to obtain the vertex coordinates of the target cells.
In this embodiment, as shown in FIG. 4, the second target pixel matrix I_cal(x, y) is input into a preset table detection model, which outputs the vertex coordinate information V = {v_1, v_2, …, v_n} of each cell in the table, where v_i = (x_i1, y_i1, x_i2, y_i2, x_i3, y_i3, x_i4, y_i4) represents the four vertex coordinates (upper left, lower left, upper right, lower right, in order) of the i-th cell. It should be noted that the preset table detection model is developed from the Table-OCR model commonly used for automatic table detection tasks; Table-OCR realizes automatic detection and table reconstruction of document tables based on the darknet framework. In this way, the accuracy of table detection can be improved.
Step S22: and determining a corresponding cell pixel matrix from the second target pixel matrix based on the target cell vertex coordinates.
In this embodiment, the pixel matrix of the corresponding cell is extracted from I_cal(x, y) using the output vertex coordinate information of the cell. For the i-th cell, the corresponding cell pixel matrix P_i can be extracted from I_cal(x, y) as P_i = {I_cal(x, y) | x ∈ [x_i1, x_i3], y ∈ [y_i1, y_i2]}.
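The extraction P_i = {I_cal(x, y) | x ∈ [x_i1, x_i3], y ∈ [y_i1, y_i2]} amounts to a simple array slice; a toy numpy sketch follows, with hypothetical vertex values (numpy indexes rows, i.e. y, first):

```python
import numpy as np

def extract_cell(I_cal, v):
    """Slice the cell pixel matrix P_i out of the calibrated image, given
    the vertex tuple v = (x1, y1, x2, y2, x3, y3, x4, y4): columns span
    [x1, x3) and rows span [y1, y2)."""
    x1, y1, x2, y2, x3, y3, x4, y4 = v
    return I_cal[y1:y2, x1:x3]

I_cal = np.arange(100).reshape(10, 10)  # toy 10x10 "image"
v = (2, 1, 2, 4, 6, 1, 6, 4)            # hypothetical cell vertices
P = extract_cell(I_cal, v)
```

The resulting P is a 3x4 sub-array whose top-left pixel is I_cal at row 1, column 2.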
Step S23: and ordering the cell pixel matrix based on a preset ordering rule to determine the cell pixel matrix.
In this embodiment, the cell pixel matrices are ordered based on a preset ordering rule to determine the cell pixel matrix list: assuming the form has m rows and n columns of cells, the cells may be arranged in the order "from left to right and from top to bottom", finally yielding a list P = {P_1, P_2, …, P_mn} of cell pixel matrices.
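The "left to right, top to bottom" ordering rule can be sketched as follows. The row tolerance is an assumed parameter, since the patent does not specify how cells with slightly different top edges are grouped into the same row:

```python
def sort_cells(vertices, row_tol=10):
    """Order cells 'left to right, top to bottom': group vertex tuples into
    rows by their top edge (within row_tol pixels), then sort each row by
    the left edge. Each tuple starts (x1, y1, ...) with (x1, y1) upper left."""
    cells = sorted(vertices, key=lambda v: v[1])      # rough top-down order
    rows, current = [], [cells[0]]
    for v in cells[1:]:
        if abs(v[1] - current[0][1]) <= row_tol:      # same row as current group
            current.append(v)
        else:                                         # new row begins
            rows.append(current)
            current = [v]
    rows.append(current)
    return [v for row in rows for v in sorted(row, key=lambda v: v[0])]

# three cells: two on the first row (tops slightly jittered), one below
cells = [(50, 12), (0, 10), (0, 60)]
ordered = sort_cells(cells)
```

The jittered first-row cells (tops at y = 10 and y = 12) are grouped into one row and ordered by x, before the lower cell.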
Step S24: and adjusting the cell width in the cell pixel matrix to obtain a new cell pixel matrix, and splicing the cells in the new cell pixel matrix in the vertical direction to obtain a target image.
In this embodiment, as shown in fig. 5, the elements in the cell pixel matrix list P = {P_1, P_2, …, P_mn} are padded: both sides of each cell are filled in turn with a white background with a pixel value of 255 so that the widths of all cells are equal, generating a new list of cell pixel matrices. Since the serial, single-image inference mode of the OCR model would lead to an overlong inference time, the multiple cell pixel matrices in the form are spliced into a new image I_new(x, y) by vertical stacking, so as to reduce the number of inference passes and the total time. Specifically, for the new cell pixel matrix list, the list elements are spliced along the vertical direction to obtain the new image I_new(x, y).
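The padding-and-vertical-splicing step can be sketched in numpy as follows; the cell sizes are toy values:

```python
import numpy as np

def stitch_cells(cells):
    """Pad each cell symmetrically with white (pixel value 255) until all
    cells share the maximum width, then stack them vertically into one
    image so the OCR model can run a single inference pass."""
    max_w = max(c.shape[1] for c in cells)
    padded = []
    for c in cells:
        extra = max_w - c.shape[1]
        left, right = extra // 2, extra - extra // 2
        padded.append(np.pad(c, ((0, 0), (left, right)), constant_values=255))
    return np.vstack(padded)

a = np.zeros((2, 4), dtype=np.uint8)  # 2x4 cell, already at max width
b = np.zeros((3, 2), dtype=np.uint8)  # 3x2 cell, padded out to width 4
img = stitch_cells([a, b])
```

The spliced image is 5 rows by 4 columns; the narrower cell gains a white column on each side.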
Step S25: and carrying out text detection on the target image by using a preset optical character recognition technology to obtain text content on a target form area and coordinates corresponding to the text content.
In this embodiment, a preset optical character recognition model is constructed, in which the text detection module is built on the EAST (Efficient and Accurate Scene Text) model with a MobileNetV3 backbone network, and the text recognition model is built on the RARE (Robust text recognizer with Automatic REctification) model with a MobileNetV3 backbone network. Text detection is then performed on each cell in the target image using the preset optical character recognition technology to obtain the text content on the target form area and the coordinates corresponding to the text content. That is, for the image I_new(x, y), the recognized text content and its coordinates are T = {(t_1, x_1, y_1), (t_2, x_2, y_2), …, (t_k, x_k, y_k)}, where t_i represents the text content of the i-th cell, and x_i and y_i represent the upper-left corner coordinates of the i-th cell's text box.
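The patent does not spell out how text boxes detected on the spliced image I_new(x, y) are attributed back to their source cells, but since the cells are stacked vertically, a plausible sketch uses cumulative cell heights; the `boxes` list below is a hypothetical stand-in for real OCR output:

```python
def assign_to_cells(boxes, cell_heights):
    """Map text boxes detected on the vertically stacked image back to
    their source cells: a box whose top y falls in [offset_i, offset_i + h_i)
    belongs to cell i. `boxes` holds (text, x, y) tuples; this is an
    illustrative stand-in, not the patent's disclosed procedure."""
    offsets = [0]
    for h in cell_heights:
        offsets.append(offsets[-1] + h)          # cumulative top edge of each cell
    result = []
    for text, x, y in boxes:
        for i in range(len(cell_heights)):
            if offsets[i] <= y < offsets[i + 1]:
                result.append((i, text, x, y - offsets[i]))  # cell-local y
                break
    return result

# three stacked cells of heights 30, 40, 30; two hypothetical OCR hits
boxes = [("Tax ID", 5, 10), ("2023", 8, 45)]
assigned = assign_to_cells(boxes, [30, 40, 30])
```

The second box (top y = 45) lands inside the second cell's span [30, 70), and its y coordinate is rebased to that cell.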
Step S26: integrating the text content and the coordinates corresponding to the text content to generate a target table, and storing the target table through a preset file format.
In this embodiment, the text content and the coordinates corresponding to the text content are integrated and output, and finally presented as a table of m rows and n columns. The result can be stored as an Excel table according to actual requirements, or subjected to data processing and analysis in other forms.
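The final integration step, arranging the recognized (text, x, y) triples into an m x n table and serializing it, can be sketched as follows. CSV is used here to keep the sketch dependency-free; an Excel file could instead be produced with e.g. pandas' `DataFrame.to_excel`:

```python
import csv
import io

def to_table(entries, m, n):
    """Sort (text, x, y) entries into reading order (top to bottom by y,
    then left to right by x), reshape into an m x n grid, and serialize
    the grid as CSV text."""
    ordered = sorted(entries, key=lambda e: (e[2], e[1]))  # by y, then x
    grid = [[ordered[r * n + c][0] for c in range(n)] for r in range(m)]
    buf = io.StringIO()
    csv.writer(buf).writerows(grid)
    return grid, buf.getvalue()

# hypothetical recognized cells of a 2x2 form
entries = [("name", 0, 0), ("amount", 50, 0), ("widget", 0, 30), ("9.99", 50, 30)]
grid, text = to_table(entries, 2, 2)
```

The coordinate sort reconstructs the original row and column layout of the form before the grid is written out.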
Therefore, in the embodiment of the application, the performance of character recognition by the preset optical character recognition technology is improved through operations such as moire elimination and image exposure, and when the character is recognized by the preset optical character recognition technology, each cell on the target image is recognized, so that the method is more accurate compared with the whole recognition, and the occurrence of false detection or omission detection is reduced. Therefore, the problems of low accuracy, low efficiency and poor using effect of form detection and character recognition caused by high noise, moire and the like can be solved, the recognition accuracy and efficiency are improved, and the method can be widely applied to the fields of image processing, document management, financial tax management and the like.
As described with reference to fig. 6, the embodiment of the present application further correspondingly discloses a device for identifying text of a screen shot form image, including:
the image preprocessing module 11 is configured to input a target screen shot form image to a preset moire elimination model to obtain a first target pixel matrix with moire removed, and perform exposure processing on the first target pixel matrix to determine a second target pixel matrix corresponding to a target form area on the target screen shot form image;
a table detection module 12, configured to input the second target pixel matrix to a preset table detection model to obtain a target cell vertex coordinate, and determine a cell pixel matrix based on the target cell vertex coordinate;
and the cell text recognition module 13 is used for splicing cells in the cell pixel matrix to obtain a target image, and performing text detection on the target image by utilizing a preset optical character recognition technology to obtain the text on the target form area.
In this embodiment, a target screen shot form image is input into a preset moire elimination model to obtain a first target pixel matrix with moire removed, and exposure processing is performed on the first target pixel matrix to determine a second target pixel matrix corresponding to a target form area on the target screen shot form image; the second target pixel matrix is input into a preset table detection model to obtain target cell vertex coordinates, and a cell pixel matrix is determined based on the target cell vertex coordinates; the cells in the cell pixel matrix are spliced to obtain a target image, and text detection is performed on the target image by using a preset optical character recognition technology to obtain the text on the target form area. That is, moire on the target screen shot form image is eliminated by the preset moire elimination model to obtain the first target pixel matrix; exposure processing is performed on the first target pixel matrix to determine the second target pixel matrix corresponding to the target form area; the second target pixel matrix is detected by the preset table detection model to obtain the vertex coordinate information of each cell in the second target pixel matrix; the pixels of the cells are extracted and arranged to generate the target image; and the target image is detected by the preset optical character recognition technology, which in effect detects each cell on the target image individually, so as to obtain the text on the target form area. According to the application, operations such as moire elimination and image exposure improve the performance of character recognition by the preset optical character recognition technology, and because each cell on the target image is recognized individually rather than the image as a whole, recognition is more accurate and false detections and missed detections are reduced.
Therefore, the problems of low accuracy, low efficiency and poor usability of form detection and character recognition caused by heavy noise, moire and the like can be solved, recognition accuracy and efficiency are improved, and the method can be widely applied to fields such as image processing, document management, and finance and tax management.
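At a high level, the three modules compose into a single pipeline. The sketch below threads an image through five stage functions; every function name and signature is hypothetical, since the embodiment describes the stages but not a code-level API.

```python
# A high-level sketch of how the disclosed stages compose into one pipeline.
# All stage functions are hypothetical stand-ins for the embodiment's modules.

def recognize_form(image, demoire, expose, detect_cells, stitch, ocr):
    m1 = demoire(image)        # preset moire elimination model -> first pixel matrix
    m2 = expose(m1)            # exposure processing -> second target pixel matrix
    cells = detect_cells(m2)   # preset table detection model -> cell pixel matrices
    target = stitch(cells)     # splice cells into one target image
    return ocr(target)         # preset OCR -> text on the target form area

# With identity stand-ins, the pipeline simply threads the input through:
result = recognize_form("img", lambda x: x, lambda x: x,
                        lambda x: [x], lambda cs: cs[0], lambda x: x)
```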
In some specific embodiments, the device for identifying text of a screen shot form image may further include:
and the model fine-tuning module is used for adjusting the initial moire elimination model through a preset screen shot image data set to obtain the preset moire elimination model.
In some specific embodiments, the image preprocessing module 11 may specifically include:
the image exposure unit is used for performing exposure processing on the first target pixel matrix based on a preset linear transformation formula to obtain an exposed pixel matrix, and clipping the exposed pixel matrix based on a preset pixel clipping range to obtain a target exposed pixel matrix;
and the image calibration unit is used for determining a target form area on the target screen shot form image according to the outline vertex coordinates and the preset vertex coordinates corresponding to the target exposed pixel matrix, and calibrating the target form area through a preset perspective transformation algorithm to obtain a second target pixel matrix corresponding to the target form area.
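The exposure step above can be sketched as a linear transformation on the pixel values followed by clipping to a preset range. The snippet below assumes a transform of the form L' = α·L + β applied to the lightness channel; the concrete α, β and the clipping bounds are illustrative, as the embodiment only states that a preset linear transformation formula and a preset clipping range are used.

```python
import numpy as np

# Exposure as a linear transform on the lightness channel, then clipping.
# alpha, beta and the [lo, hi] range are illustrative assumptions.

def expose_and_clip(lightness, alpha=1.3, beta=10.0, lo=0, hi=255):
    exposed = alpha * lightness.astype(np.float64) + beta
    return np.clip(exposed, lo, hi).astype(np.uint8)

L = np.array([[0, 100], [200, 255]], dtype=np.uint8)
out = expose_and_clip(L)  # bright pixels saturate at the clipping upper bound
```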
In some specific embodiments, the device for identifying text of a screen shot form image may further include:
a color space conversion module for converting the first target pixel matrix from an RGB color space to an HLS color space;
and the color space restoring module is used for restoring the first target pixel matrix from the HLS color space to the RGB color space.
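The round trip between the RGB and HLS color spaces around the exposure step can be illustrated per pixel with Python's standard colorsys module; a real implementation would more likely convert the whole pixel matrix at once (for example with OpenCV's cv2.cvtColor and the COLOR_RGB2HLS / COLOR_HLS2RGB codes). colorsys is used here only to keep the example dependency-free.

```python
import colorsys

# Per-pixel RGB -> HLS -> RGB round trip. colorsys works on floats in [0, 1],
# so 8-bit channel values are scaled before conversion and rescaled after.

def rgb_to_hls_pixel(r, g, b):
    return colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)

def hls_to_rgb_pixel(h, l, s):
    r, g, b = colorsys.hls_to_rgb(h, l, s)
    return round(r * 255), round(g * 255), round(b * 255)

h, l, s = rgb_to_hls_pixel(200, 100, 50)
restored = hls_to_rgb_pixel(h, l, s)  # the round trip recovers the original pixel
```

Working in HLS lets the exposure step modify only the lightness channel l before restoring to RGB.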
In some specific embodiments, the table detection module 12 may specifically include:
the vertex coordinate determining unit is used for inputting the second target pixel matrix into a preset table detection model to obtain the target cell vertex coordinates;
the cell pixel matrix determining unit is used for determining a corresponding cell pixel matrix from the second target pixel matrix based on the target cell vertex coordinates;
and the cell sorting unit is used for sorting the cell pixel matrices based on a preset sorting rule to determine the ordered cell pixel matrix.
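One plausible "preset sorting rule" is row-major ordering with a vertical tolerance: cells whose top edges lie within a few pixels of each other are treated as one row and then sorted left to right. The tolerance value and the (x, y) vertex representation below are assumptions, not specified by the embodiment.

```python
# Sort detected cells top-to-bottom (with a row tolerance), then left-to-right.
# Each cell is represented by the (x, y) of its top-left vertex; row_tol is an
# assumed threshold for grouping cells into the same row.

def sort_cells(cells, row_tol=10):
    cells = sorted(cells, key=lambda c: (c["y"], c["x"]))
    rows, current = [], [cells[0]]
    for cell in cells[1:]:
        if abs(cell["y"] - current[0]["y"]) <= row_tol:
            current.append(cell)           # same row: y within tolerance
        else:
            rows.append(sorted(current, key=lambda c: c["x"]))
            current = [cell]
    rows.append(sorted(current, key=lambda c: c["x"]))
    return [c for row in rows for c in row]

cells = [{"x": 90, "y": 52}, {"x": 10, "y": 50}, {"x": 10, "y": 120}]
ordered = sort_cells(cells)
```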
In some specific embodiments, the cell text recognition module 13 may specifically include:
the cell adjusting unit is used for adjusting the widths of the cells in the cell pixel matrix to obtain a new cell pixel matrix, and splicing the cells in the new cell pixel matrix in the vertical direction to obtain a target image;
and the text detection unit is used for carrying out text detection on the target image by utilizing a preset optical character recognition technology so as to obtain text content on the target form area and coordinates corresponding to the text content.
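The width adjustment and vertical splicing can be sketched as padding every cell image to a common width and stacking the results into one tall target image for OCR. Padding with white (255) is an assumption; the embodiment only states that cell widths are adjusted before the cells are spliced in the vertical direction.

```python
import numpy as np

# Pad each cell image (a 2-D grayscale array) to the widest cell's width, then
# stack all cells vertically into a single target image. The white pad value
# is an assumption for illustration.

def stitch_cells(cells, pad_value=255):
    width = max(c.shape[1] for c in cells)
    padded = [
        np.pad(c, ((0, 0), (0, width - c.shape[1])), constant_values=pad_value)
        for c in cells
    ]
    return np.vstack(padded)

cells = [np.zeros((2, 3), dtype=np.uint8), np.zeros((1, 5), dtype=np.uint8)]
target = stitch_cells(cells)  # widths normalized to 5, heights summed to 3
```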
In some specific embodiments, the device for identifying text of a screen shot form image may further include:
and the text storage module is used for integrating the text content and the coordinates corresponding to the text content to generate a target table and storing the target table through a preset file format.
Further, the embodiment of the present application also discloses an electronic device. Fig. 7 is a block diagram of an electronic device 20 according to an exemplary embodiment, and the content of the figure should not be construed as limiting the scope of use of the present application in any way.
Fig. 7 is a schematic structural diagram of an electronic device 20 according to an embodiment of the present application. The electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input output interface 25, and a communication bus 26. Wherein the memory 22 is configured to store a computer program that is loaded and executed by the processor 21 to implement the relevant steps in the on-screen form image text recognition method disclosed in any of the foregoing embodiments. In addition, the electronic device 20 in the present embodiment may be specifically an electronic computer.
In this embodiment, the power supply 23 is configured to provide an operating voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and an external device, and the communication protocol to be followed is any communication protocol applicable to the technical solution of the present application, which is not specifically limited herein; the input/output interface 25 is used for acquiring external input data or outputting external output data, and the specific interface type thereof may be selected according to the specific application requirement, which is not limited herein.
The memory 22, as a carrier for storing resources, may be a read-only memory, a random access memory, a magnetic disk, an optical disk, or the like; the resources stored thereon may include an operating system 221, a computer program 222, and the like, and the storage may be temporary or permanent.
The operating system 221 is used for managing and controlling the hardware devices on the electronic device 20 and the computer program 222, and may be Windows Server, Netware, Unix, Linux, or the like. In addition to the computer program that performs the screen shot form image text recognition method disclosed in any of the foregoing embodiments, the computer program 222 may further include computer programs for performing other specific tasks.
Further, the application also discloses a computer readable storage medium for storing a computer program; the computer program, when executed by the processor, implements the screen shot form image text recognition method disclosed above. For specific steps of the method, reference may be made to the corresponding contents disclosed in the foregoing embodiments, and no further description is given here.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, so that the same or similar parts between the embodiments are referred to each other. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it is further noted that relational terms such as first and second are used herein solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element preceded by "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises that element.
The foregoing describes in detail the screen shot form image text recognition method, device, equipment and storage medium provided by the present application. Specific examples are used herein to illustrate the principles and embodiments of the application, and the above description of the embodiments is only intended to aid understanding of the method and its core idea. Meanwhile, those skilled in the art may make changes to the specific embodiments and the application scope in accordance with the ideas of the present application; in view of the above, the contents of this description should not be construed as limiting the present application.

Claims (10)

1. A method for identifying text of a screen shot form image, comprising the steps of:
inputting a target screen shot form image into a preset moire elimination model to obtain a first target pixel matrix with moire removed, and performing exposure processing on the first target pixel matrix to determine a second target pixel matrix corresponding to a target form area on the target screen shot form image;
inputting the second target pixel matrix into a preset table detection model to obtain target cell vertex coordinates, and determining a cell pixel matrix based on the target cell vertex coordinates;
and splicing the cells in the cell pixel matrix to obtain a target image, and performing text detection on the target image by using a preset optical character recognition technology to obtain the text on the target form area.
2. The method for recognizing text in a screen shot form image according to claim 1, wherein before inputting the target screen shot form image to a preset moire elimination model to obtain a first target pixel matrix with moire removed, the method further comprises:
and adjusting the initial moire elimination model through a preset screen shot image data set to obtain the preset moire elimination model.
3. The method of claim 1, wherein exposing the first matrix of target pixels to determine a second matrix of target pixels corresponding to a target form area on the target screen form image comprises:
performing exposure processing on the first target pixel matrix based on a preset linear transformation formula to obtain an exposed pixel matrix, and clipping the exposed pixel matrix based on a preset pixel clipping range to obtain a target exposed pixel matrix;
and determining a target form area on the target screen shot form image according to the outline vertex coordinates and the preset vertex coordinates corresponding to the target exposed pixel matrix, and calibrating the target form area through a preset perspective transformation algorithm to obtain a second target pixel matrix corresponding to the target form area.
4. The method for recognizing text in a screen shot form image according to claim 3, wherein before performing exposure processing on the first target pixel matrix based on a preset linear transformation formula to obtain an exposed pixel matrix and clipping the exposed pixel matrix based on a preset pixel clipping range to obtain a target exposed pixel matrix, the method further comprises:
converting the first target pixel matrix from an RGB color space to an HLS color space;
correspondingly, after performing exposure processing on the first target pixel matrix based on a preset linear transformation formula to obtain an exposed pixel matrix and clipping the exposed pixel matrix based on a preset pixel clipping range to obtain a target exposed pixel matrix, the method further comprises:
the first target pixel matrix is restored from HLS color space to RGB color space.
5. The method of claim 1, wherein inputting the second matrix of target pixels into a predetermined table detection model to obtain target cell vertex coordinates, and determining the matrix of cell pixels based on the target cell vertex coordinates, comprises:
inputting the second target pixel matrix to a preset table detection model to obtain the vertex coordinates of the target cells;
determining a corresponding cell pixel matrix from the second target pixel matrix based on the target cell vertex coordinates;
and sorting the cell pixel matrices based on a preset sorting rule to determine the ordered cell pixel matrix.
6. The method for recognizing text in a screen shot form image according to any one of claims 1 to 5, wherein the stitching the cells in the cell pixel matrix to obtain a target image, and performing text detection on the target image by using a preset optical character recognition technology to obtain text on the target form area, includes:
adjusting the widths of the cells in the cell pixel matrix to obtain a new cell pixel matrix, and splicing the cells in the new cell pixel matrix in the vertical direction to obtain a target image;
and carrying out text detection on the target image by using a preset optical character recognition technology to obtain text content on the target form area and coordinates corresponding to the text content.
7. The method for text recognition of a screen shot form image according to claim 6, wherein after text detection is performed on the target image by using a preset optical character recognition technology to obtain text content on the target form area and coordinates corresponding to the text content, further comprising:
integrating the text content and the coordinates corresponding to the text content to generate a target table, and storing the target table through a preset file format.
8. A screen shot form image text recognition device, comprising:
the image preprocessing module is used for inputting a target screen shot form image into a preset moire elimination model to obtain a first target pixel matrix with moire removed, and carrying out exposure processing on the first target pixel matrix to determine a second target pixel matrix corresponding to a target form area on the target screen shot form image;
the table detection module is used for inputting the second target pixel matrix into a preset table detection model to obtain target cell vertex coordinates, and determining a cell pixel matrix based on the target cell vertex coordinates;
and the cell text recognition module is used for splicing cells in the cell pixel matrix to obtain a target image, and performing text detection on the target image by utilizing a preset optical character recognition technology to obtain the text on the target form area.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the on-screen form image text recognition method of any one of claims 1 to 7.
10. A computer readable storage medium for storing a computer program which when executed by a processor implements the on-screen form image text recognition method of any one of claims 1 to 7.
CN202311076101.2A 2023-08-24 2023-08-24 Screen shot form image text recognition method, device, equipment and storage medium Pending CN117095417A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311076101.2A CN117095417A (en) 2023-08-24 2023-08-24 Screen shot form image text recognition method, device, equipment and storage medium


Publications (1)

Publication Number Publication Date
CN117095417A true CN117095417A (en) 2023-11-21

Family

ID=88769527

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311076101.2A Pending CN117095417A (en) 2023-08-24 2023-08-24 Screen shot form image text recognition method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117095417A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117350264A (en) * 2023-12-04 2024-01-05 税友软件集团股份有限公司 PPT file generation method, device, equipment and storage medium
CN117350264B (en) * 2023-12-04 2024-02-23 税友软件集团股份有限公司 PPT file generation method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109993086B (en) Face detection method, device and system and terminal equipment
CN110400278B (en) Full-automatic correction method, device and equipment for image color and geometric distortion
US10477128B2 (en) Neighborhood haze density estimation for single-image dehaze
US11790499B2 (en) Certificate image extraction method and terminal device
JP2010045613A (en) Image identifying method and imaging device
CN105339951A (en) Method for detecting a document boundary
CN107749986B (en) Teaching video generation method and device, storage medium and computer equipment
WO2021029423A4 (en) Image processing method and apparatus and non-transitory computer-readable medium
CN101983507A (en) Automatic redeye detection
CN117095417A (en) Screen shot form image text recognition method, device, equipment and storage medium
US20220398698A1 (en) Image processing model generation method, processing method, storage medium, and terminal
KR20180127913A (en) Image processing apparatus, image processing method, and storage medium
CN108182398B (en) Method and device for adjusting direction of scanned image based on scanning equipment
CN113112511B (en) Method and device for correcting test paper, storage medium and electronic equipment
CN111507181B (en) Correction method and device for bill image and computer equipment
CN116777769A (en) Method and device for correcting distorted image, electronic equipment and storage medium
US20210281742A1 (en) Document detections from video images
CN113965814B (en) Multi-conference-place key frame extraction method and system based on video conference scene
JP3348898B2 (en) Degraded image restoration device, degraded image restoration system, and degraded image restoration method
CN110751135A (en) Drawing checking method and device, electronic equipment and storage medium
CN114399623B (en) Universal answer identification method, system, storage medium and computing device
CN113920513B (en) Text recognition method and equipment based on custom universal template
CN117456371B (en) Group string hot spot detection method, device, equipment and medium
CN111383172B (en) Training method and device of neural network model and intelligent terminal
CN112825141B (en) Method and device for recognizing text, recognition equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination