CN116030472A - Text coordinate determining method and device - Google Patents

Text coordinate determining method and device

Info

Publication number
CN116030472A
Authority
CN
China
Prior art keywords
text
text line
target
line image
initial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310108392.2A
Other languages
Chinese (zh)
Inventor
陈丽娟
李道振
陈华华
高晶晶
张芸菲
项蕾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hundsun Technologies Inc
Original Assignee
Hundsun Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hundsun Technologies Inc filed Critical Hundsun Technologies Inc
Priority to CN202310108392.2A priority Critical patent/CN116030472A/en
Publication of CN116030472A publication Critical patent/CN116030472A/en
Pending legal-status Critical Current

Abstract

The application provides a text coordinate determining method and a text coordinate determining device, wherein the text coordinate determining method comprises the following steps: extracting a text outline of a target text line in a text line image, and determining text box coordinates of the target text line in the text line image based on the text outline; determining position information corresponding to initial characters in the target text line, performing vertical segmentation processing on the target text line, and determining segmentation interval coordinates corresponding to the target text line according to the processing result; updating the segmentation interval coordinates according to the position information, and determining target segmentation interval coordinates corresponding to the initial characters based on the updating result; and calculating character coordinates corresponding to the initial characters based on the text box coordinates and the target segmentation interval coordinates. By calculating the character coordinates of the initial characters precisely from the text box coordinates and the target segmentation interval coordinates, the coordinate recognition accuracy of the initial characters is improved.

Description

Text coordinate determining method and device
Technical Field
The application relates to the technical field of image processing, in particular to a text coordinate determining method. The present application also relates to a text coordinate determining apparatus, a computing device, and a computer-readable storage medium.
Background
With the rapid development of technology, character recognition has advanced continuously and is widely applied across industries, for example in image-based information extraction and contract comparison scenarios. Character coordinate calculation is extremely important in character recognition: in an information extraction scenario, accurate character coordinates allow the extraction result to be presented to the user precisely. At present, single-character coordinate calculation usually combines an OCR text detection model with a text recognition algorithm to detect single-character coordinates, but the existing related technology depends too heavily on the model output, so the position of each character within a text line cannot be located accurately and the calculated single-character coordinates are inaccurate. Therefore, how to improve the accuracy of single-character coordinate detection is a problem that urgently needs to be solved.
Disclosure of Invention
In view of this, embodiments of the present application provide a text coordinate determining method for identifying the specific position of text in an image and calculating the coordinates of that text in the image. The present application also relates to a text coordinate determining apparatus, a computing device, and a computer-readable storage medium, so as to solve the problem of low text coordinate recognition accuracy in the prior art.
According to a first aspect of an embodiment of the present application, there is provided a text coordinate determining method, including:
extracting a text outline of a target text line in a text line image, and determining text box coordinates of the target text line in the text line image based on the text outline;
determining position information corresponding to initial characters in the target text line, performing vertical segmentation processing on the target text line, and determining segmentation interval coordinates corresponding to the target text line according to a processing result;
updating the segmentation interval coordinates according to the position information, and determining target segmentation interval coordinates corresponding to the initial characters based on an updating result;
and calculating the character coordinates corresponding to the initial characters based on the text box coordinates and the target segmentation interval coordinates.
According to a second aspect of embodiments of the present application, there is provided a text coordinate determining apparatus, including:
an extraction module configured to extract a text outline of a target text line in a text line image, and determine text box coordinates of the target text line in the text line image based on the text outline;
the segmentation module is configured to determine the position information corresponding to the initial characters in the target text line, perform vertical segmentation processing on the target text line, and determine the segmentation interval coordinates corresponding to the target text line according to the processing result;
the determining module is configured to update the segmentation interval coordinates according to the position information and determine target segmentation interval coordinates corresponding to the initial characters based on an updating result;
and the calculating module is configured to calculate character coordinates corresponding to the initial characters based on the text box coordinates and the target segmentation interval coordinates.
According to a third aspect of embodiments of the present application, there is provided a computing device comprising a memory, a processor and computer instructions stored on the memory and executable on the processor, the processor implementing the steps of the text coordinate determination method when executing the computer instructions.
According to a fourth aspect of embodiments of the present application, there is provided a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the text coordinate determining method.
The text coordinate determining method provided by the application comprises the following steps: extracting a text outline of a target text line in a text line image, and determining text box coordinates of the target text line in the text line image based on the text outline; determining position information corresponding to initial characters in the target text line, performing vertical segmentation processing on the target text line, and determining segmentation interval coordinates corresponding to the target text line according to the processing result; updating the segmentation interval coordinates according to the position information, and determining target segmentation interval coordinates corresponding to the initial characters based on an updating result; and calculating the character coordinates corresponding to the initial characters based on the text box coordinates and the target segmentation interval coordinates.
According to the method and the device, the text box coordinates in the text line image are first extracted, and the segmentation interval coordinates of the target text line are determined by vertical segmentation. The segmentation intervals are then updated based on the position information of the initial characters in the target text line, so that the vertical segmentation result of the target text line is more accurate and the target segmentation interval coordinates corresponding to each initial character can subsequently be obtained more accurately. Finally, the character coordinates of the initial characters in the text line image are calculated precisely from the extracted text box coordinates and the target segmentation interval coordinates, improving the coordinate recognition accuracy of each initial character in the text line image.
Drawings
FIG. 1 is a schematic diagram of the effect of a text coordinate determining method according to an embodiment of the present application;
FIG. 2 is a flowchart of a text coordinate determination method according to an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating a text coordinate determining method according to an embodiment of the present disclosure;
FIG. 4 is a process flow diagram of a text coordinate determination method for information extraction according to one embodiment of the present application;
FIG. 5 is a schematic diagram of a text coordinate determining apparatus according to an embodiment of the present disclosure;
FIG. 6 is a block diagram of a computing device according to one embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, this application can be implemented in many ways other than those described herein, and those skilled in the art can make similar generalizations without departing from the spirit of the application; the application is therefore not limited to the specific embodiments disclosed below.
The terminology used in one or more embodiments of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of one or more embodiments of the application. As used in this application in one or more embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present application refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that, although the terms first, second, etc. may be used in one or more embodiments of the present application to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first may also be referred to as a second, and similarly, a second may also be referred to as a first, without departing from the scope of one or more embodiments of the present application. The word "if" as used herein may be interpreted as "when", "upon", or "in response to determining", depending on the context.
First, terms related to one or more embodiments of the present application will be explained.
Vertical projection: in the field of computer vision, vertical projection is often used when segmenting an image; it projects the target image accurately and thereby facilitates the subsequent segmentation. A commonly used counterpart is horizontal projection. The algorithmic idea of vertical projection is as follows: take a straight line (axis) along the projection direction of the image, and for each position on that axis count the number of black (foreground) pixels in the column of the image perpendicular to the axis, accumulating the sum as the value at that position. Cutting based on the projection maps the image into this feature, and the cutting positions of the image are then determined from the feature, so as to obtain the segmented target images.
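As a minimal sketch of the idea (not code from the patent), assuming a binarized NumPy array in which text pixels are non-zero and the background is zero, the column-wise projection and gap-based segmentation described above could look like:

```python
import numpy as np

def vertical_projection(binary):
    """Column-wise count of foreground (non-zero) pixels in a binarized line image."""
    return (binary > 0).sum(axis=0)

def split_columns(projection):
    """Runs of zero-projection columns, returned as (start, end) pairs.

    Each run is a candidate segmentation interval between characters.
    """
    gaps, start = [], None
    for x, count in enumerate(projection):
        if count == 0 and start is None:
            start = x                      # a gap begins
        elif count > 0 and start is not None:
            gaps.append((start, x - 1))    # the gap just ended
            start = None
    if start is not None:                  # gap running to the right edge
        gaps.append((start, len(projection) - 1))
    return gaps

# Toy "text line": two 3-column glyphs separated by a 2-column gap.
line = np.array([
    [1, 1, 1, 0, 0, 1, 1, 1],
    [1, 0, 1, 0, 0, 1, 0, 1],
    [1, 1, 1, 0, 0, 1, 1, 1],
])
proj = vertical_projection(line)
print(proj.tolist())        # [3, 2, 3, 0, 0, 3, 2, 3]
print(split_columns(proj))  # [(3, 4)]
```

A real line image would first undergo the preprocessing described later (inversion, denoising, line removal) so that the zero-runs genuinely correspond to inter-character gaps.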
OCR: optical character recognition, a technique that extracts the characters on an electronic document through computer vision techniques and converts them into computer-readable text data.
At present, in application scenarios such as image-based information extraction and contract comparison, an image often needs to be recognized so that the text content and text coordinates in the image are acquired accurately, whether for extracting image information or for information comparison; the accuracy of the text coordinates in this process is particularly important. The character coordinate recognition methods commonly used at present include the following three:
(1) Training a detection model for detecting single characters: mainstream OCR character detection models mostly work at the level of text lines, so the model output is the coordinates of a text line and the coordinate information of a single character cannot be obtained. A model for detecting single-character coordinates can therefore be trained to output single-character coordinates, but its training data must be annotated character by character, which makes the annotation cost extremely high and consumes a large amount of manpower and material resources. In addition, the running environment requirements of a single-character detection model are high: hardware with a certain configuration is needed for normal use, otherwise the processing efficiency drops.
(2) Traditional character extraction methods: these include connected-component analysis, maximally stable extremal regions, projection analysis, and the like. However, under conditions such as a complex image background, distorted characters, character adhesion, and noise interference, these methods lose extraction accuracy and cannot obtain accurate single-character coordinates.
(3) Estimating single-character coordinates from the output of a character recognition model: this method can obtain character coordinates without training a character detection model, and because it combines the output of the recognition model it can alleviate, to a certain extent, the lack of robustness of the traditional methods. However, the method relies only on the recognition model's output, and the resulting single-character coordinates, including the character's height, width, and center point, are estimates, so the output coordinate values have low accuracy; if the characters differ in size, width, height, or margins, the error of the output coordinate values is relatively large. Meanwhile, the method depends heavily on the output of the recognition model, so if the recognition model is iteratively updated or replaced, the quality of the original single-character coordinate calculation drops greatly and the iteration cost is high.
Based on this, in the present application, a text coordinate determining method is provided for solving the technical problems existing in the prior art, and the present application relates to a text coordinate determining apparatus, a computing device, and a computer-readable storage medium, which are described in detail in the following embodiments one by one.
FIG. 1 shows a schematic diagram of the effect of a text coordinate determining method according to an embodiment of the present application. The text line image contains the four characters of "I love China", and the specific position of each of the four characters in the text line image needs to be recognized, taking the lower left corner of the text line image as the origin of the coordinate system. First, the text line in the text line image is recognized and the text outline of the target text line "I love China" is extracted, shown as the solid-line box around the text line "I love China" in FIG. 1, and the text box coordinates of the target text line in the text line image are calculated from the side lengths of the text box, i.e. the coordinates of the "I love China" text box in the text line image are obtained. Next, the position information of each of the four characters in the text line image is determined, shown as the dotted line over each character in FIG. 1, such as position information 1 over "I". The target text line is then vertically projected and segmented, which yields the segmentation interval between adjacent characters, i.e. the black pixel interval between characters in FIG. 1, such as segmentation interval 1 between "I" and "love", from which the segmentation interval coordinates of segmentation interval 1 are known. The segmentation interval coordinates are updated based on the position information, that is, when some segmentation intervals are recognized incorrectly, they can be corrected according to the position information. After it is confirmed that the segmentation intervals are correct, the target segmentation interval coordinates corresponding to each initial character, i.e. the segmentation interval coordinates on both sides of the character, are determined; for example, segmentation interval 1 on the left and segmentation interval 2 on the right in FIG. 1 are the target segmentation intervals of "love". For a character at the beginning or end of a text line, the target segmentation interval coordinate on its left or right side can be the starting point x1 or the ending point x2 of the text box coordinates. After the target segmentation interval coordinates corresponding to each character are determined, the character coordinates of each character in the text line image can be calculated according to the text box coordinates and the target segmentation interval coordinates. This method reduces the dependence on the output of the character recognition model, and the vertical projection segmentation result is further adjusted and updated according to the position information corresponding to the characters, so that the target segmentation interval coordinates corresponding to each initial character can be determined accurately and accurate character coordinates can be calculated.
Fig. 2 shows a flowchart of a text coordinate determining method according to an embodiment of the present application, which specifically includes the following steps:
step 202: and extracting a text outline of a target text line in a text line image, and determining text box coordinates of the target text line in the text line image based on the text outline.
The text line image may be understood as an image containing text; it may be a scan of a paper document obtained by a scanner, a photograph of a paper document taken with a camera or mobile phone, or an electronic document. The text line image may include multiple lines of text, and each line of text can be processed in turn. After the target text line to be processed is determined, its text outline can be extracted, that is, the peripheral outline of the target text line in the text line image is determined, and the text box coordinates of the target text line in the text line image can be determined from the side length information of the text outline. Referring to FIG. 3, FIG. 3 is a schematic processing diagram of a text coordinate determining method according to an embodiment of the present application. In FIG. 3A, the text outline of the target text line in the text line image is extracted as the text box outline around the target text line, and the coordinates of the text box, (x1, x2, y1, y2), are determined from the side length information of the text outline.
In practical application, if a character recognition model is used to determine the position of each character in the target text line, content recognition must first be performed on the target text line, the characters must be segmented, and the coordinates of each character calculated from the segmentation result. In that process, recognition errors of the model mean that accurate character coordinates cannot be obtained. By instead extracting the text outline in the text line image and calculating the text box coordinates from that outline, the text box coordinates can serve as the ordinates of every character, improving the accuracy of the subsequent character coordinate calculation. The text box coordinates are expressed in the image coordinate system of the text line image, and the character coordinates calculated later lie in that same coordinate system.
In a specific embodiment of the present application, as shown in FIG. 3A, the text outline of the target text line "I love China" is extracted, and the text box coordinates of the target text line in the text line image are determined based on the text outline as (x1, x2, y1, y2). Subsequently, the ordinates of each character in the target text line may be determined to be (y1, y2).
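As an illustrative sketch only (the patent gives no code), assuming a NumPy grayscale array in which text pixels are non-zero, the box can be read straight off the foreground mask; in a real pipeline the outline would more likely come from a contour extractor such as OpenCV's findContours followed by boundingRect. The result follows the document's (x1, x2, y1, y2) convention:

```python
import numpy as np

def text_box_coords(line_img):
    """Axis-aligned bounding box (x1, x2, y1, y2) of all text pixels.

    Assumes text pixels are non-zero; returns None for a blank image.
    """
    ys, xs = np.nonzero(line_img)
    if xs.size == 0:
        return None
    return int(xs.min()), int(xs.max()), int(ys.min()), int(ys.max())

# 5x8 toy image whose "text" occupies rows 1..3 and columns 2..6.
img = np.zeros((5, 8), dtype=np.uint8)
img[1:4, 2:7] = 255
print(text_box_coords(img))  # (2, 6, 1, 3)
```

The y-pair (y1, y2) of this box is what the embodiment reuses as the ordinates of every character in the line.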
Further, since the text line image may be obtained through line-level text detection, a number of interference factors affecting character coordinate calculation may be present in it, so some preprocessing can be applied to reduce noise interference. Before the text outline of the target text line in the text line image is extracted, the method therefore further includes: determining an initial text line image; and performing inverse color processing, denoising processing, and/or line removal processing on the initial text line image to obtain the text line image.
The initial text line image may be understood as a text line image with noise interference factors, that is, a text line image before preprocessing, where preprocessing may include performing inverse color processing, denoising processing, line removing processing, and the like on the text line image, and through one or more combinations of the above processing methods, interference noise in the initial text line image may be removed, so as to obtain a clear text line image.
In practical application, in a document processing scenario, the initial text line image produced by line text detection may exhibit the following noise interference factors, all of which affect the accuracy of the subsequent character coordinate calculation: text lines not flush with the image border, interference from table lines, inconsistent text colors, interference from noise points, differing font sizes, and so on. For these cases, various preprocessing methods can be used, such as cropping text lines flush to their borders, removing table lines, and inverting black and white, so as to improve the accuracy of the finally output character coordinates. Therefore, after the initial text line image is obtained, it can be inspected to judge which types of noise interference factors it contains, and the corresponding preprocessing is applied to it to obtain the text line image. In a specific implementation, the initial text line image can also be rescaled to facilitate the subsequent preprocessing operations; note that the aspect ratio of the image needs to be maintained during rescaling.
In a specific embodiment of the present application, when it is detected that the initial text line image has a black background with white text, inverse color processing is performed on it; when noise is detected in the initial text line image, denoising processing is performed on it; and if table grid lines exist in the initial text line image, line removal processing is performed on them.
In summary, after the initial text line image is obtained, preprocessing operations such as inverse color processing, denoising processing, and line removal processing are performed on it to obtain the preprocessed text line image, which ensures that the characters of the text line image are clear, reduces interference factors, and improves the accuracy of the subsequent character coordinate calculation.
Further, in order to eliminate the interference factor in which differing font colors would prevent vertical projection segmentation from working normally later, the initial text line image may be subjected to inverse color processing. Specifically, performing inverse color processing on the initial text line image includes: extracting the pixel values of the pixel points in the initial text line image; determining a black pixel point set and a white pixel point set based on the pixel values; and, when the number of pixels in the black pixel point set is larger than the number of pixels in the white pixel point set, inverting the pixel values to obtain an inverse-color text line image.
The pixel values of the pixel points in the initial text line image are the values of each individual pixel, for example 0 for a black pixel and 255 for a white pixel. By determining the value of each pixel, the black and white pixels in the initial text line image can be identified. The black pixel point set is the set of all black pixels in the initial text line image and the white pixel point set is the set of all white pixels; comparing the sizes of the two sets determines whether the current initial text line image needs inverse color processing. When it does, the value of each pixel is inverted: for example, a black pixel with value 0 becomes, after inversion, a pixel with value 255, i.e. a white pixel. Applying this to every pixel in the initial text line image yields the inverse-color text line image.
In practical application, if the number of pixels in the black pixel point set is greater than that in the white pixel point set, the background of the image is black and the font color is white, so the initial text line image needs inverse color processing to facilitate the subsequent vertical projection analysis. In particular, to handle pixels with other values in the initial text line image, grayscale processing can be performed first, and the black and white pixel point sets are then determined from the grayscaled image using a preset pixel threshold: for example, pixels above a preset threshold of 200 are counted as white and pixels below it as black, thereby determining the black pixel point set and the white pixel point set. The inverse color step is then carried out as above, yielding the inverse-color text line image.
In a specific embodiment of the application, the initial text line image is determined to be a text line image with a black background and white text. The initial text line image is first grayscaled and the pixel values of all pixels are extracted; based on a preset pixel threshold and these values, all pixels in the initial text line image are divided into black pixels and white pixels, forming a black pixel point set and a white pixel point set. Since the number of black pixels is larger than the number of white pixels, the value of every pixel is inverted, so that a text line image with a white background and black text is obtained.
Based on the method, the font color can be adjusted to be white by performing the inverse color processing on the initial text line image, so that the subsequent vertical projection segmentation processing is facilitated, and the accuracy of text coordinate calculation is improved.
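The black/white counting rule above can be sketched as follows (a hedged illustration, not the patent's code; the 200 threshold mirrors the example in the text, and pixels at or above the threshold count as white here):

```python
import numpy as np

def maybe_invert(gray, white_thresh=200):
    """Invert a grayscale line image when black pixels outnumber white ones.

    Pixels >= white_thresh count as white, the rest as black. If black
    dominates, the image is assumed to be light-on-dark and every value
    is inverted (v -> 255 - v), giving dark text on a light background.
    Returns (image, inverted_flag).
    """
    white = int((gray >= white_thresh).sum())
    black = gray.size - white
    if black > white:
        return 255 - gray, True
    return gray, False

# Mostly-black image with one white "text" pixel: gets inverted.
img = np.zeros((4, 4), dtype=np.uint8)
img[1, 1] = 255
out, inverted = maybe_invert(img)
print(inverted)        # True
print(int(out[1, 1]))  # 0   (the text pixel is now black)
print(int(out[0, 0]))  # 255 (the background is now white)
```

Running the same function on an already white-background image leaves it untouched, so the check is safe to apply unconditionally.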
Further, in order to avoid the influence of noise points in the initial text line image on the subsequent character coordinate calculation, the initial text line image is denoised. Specifically, denoising the initial text line image includes: performing binarization processing on the initial text line image; and performing a denoising calculation on the initial text line image and the binarized initial text line image to obtain a denoised text line image.
The binarization processing can be understood as converting the initial text line image into a binary image containing only black and white using the Otsu thresholding method, and the denoising calculation can be understood as performing an AND operation between the initial text line image and the binarized text line image, thereby obtaining the denoised initial text line image, that is, the denoised text line image.
In practical application, after the initial text line image is binarized, a binarized initial text line image is obtained. The binarized image and the original initial text line image then undergo the denoising operation, namely an AND operation: the pixel values under the white portions of the binarized image are kept, the values under the black portions are set directly to 0, and the values of noise points are likewise set to 0, thereby filtering out the noise. It should be noted that the denoising may be applied to the result of the inverse color processing, that is, the denoising operation is performed on the inverse-color text line image.
In a specific embodiment of the present application, when it is determined that noise exists in the initial text line image, the image is denoised: binarization is first performed on the initial text line image to obtain a binarized text line image, and an AND operation between the binarized text line image and the original initial text line image then filters out the noise, thereby obtaining the denoised text line image.
Based on the method, through denoising the initial text line image, noise points in the initial text line image can be filtered, the influence of the noise points in the image on subsequent recognition of characters and calculation of character coordinates is eliminated, and the accuracy of character coordinate calculation is improved.
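A compact sketch of this Otsu-plus-AND idea (an illustration under the assumption of a grayscale NumPy image; production code would typically call cv2.threshold with THRESH_OTSU and cv2.bitwise_and instead):

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: the threshold that maximizes between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total, sum_all = hist.sum(), float(np.dot(np.arange(256), hist))
    best_t, best_var, w0, sum0 = 0, -1.0, 0.0, 0.0
    for t in range(256):
        w0 += hist[t]               # weight of the class <= t
        sum0 += t * hist[t]         # intensity sum of that class
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def denoise(gray):
    """Binarize with Otsu, then AND the mask with the original image.

    Values under the white part of the mask are kept; everything the mask
    marks black -- including isolated noise specks -- drops to 0.
    """
    mask = gray > otsu_threshold(gray)
    return np.where(mask, gray, 0)

# White background (255), dark text (10), one mid-gray noise speck (120).
img = np.full((4, 4), 255, dtype=np.uint8)
img[1, 1] = img[1, 2] = img[2, 1] = 10
img[3, 3] = 120
out = denoise(img)
print(int(out[3, 3]))  # 0   (noise speck filtered out)
print(int(out[0, 0]))  # 255 (background preserved)
```

The mid-gray speck falls below the Otsu threshold, so the AND step drives it to 0 along with the text strokes, exactly as the embodiment describes.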
Further, in order to avoid the influence of table lines in the initial text line image on identifying the characters in the text line image, the initial text line image may be subjected to line removal processing. Specifically, the line removal processing of the initial text line image includes: performing projection processing on the initial text line image, and determining the lines to be deleted in it; and deleting the lines to be deleted to obtain the de-lined text line image.
The projection processing may be understood as vertical projection processing or horizontal projection processing. After the projection processing is performed on the initial text line image, the longitudinal lines or transverse lines in the initial text line image can be determined; the lines to be deleted are the lines thus determined in the initial text line image, and deleting them yields the line-removed text line image.
In practice, the line removal process includes vertically projecting the initial text line image and, for each position i after the vertical projection, calculating the ratio of the number of pixels with a non-zero value to the image height h, i.e. r_i = |A_i| / h, where A_i : {p_ij > 0, i ∈ (0, 1, ..., w), j ∈ (0, 1, ..., h)} and p_ij represents the pixel value at j on the y-axis and i on the x-axis. If r_i is larger than a preset hyper-parameter threshold, position i is a longitudinal line, and the line is deleted using an open operation; the process of deleting lines by the open operation can be understood as an erosion operation followed by a dilation operation on the image. Correspondingly, the transverse lines can be determined by horizontally projecting the initial text line image according to the same method, and deleted using the open operation, so that the line-removed text line image is obtained. It should be noted that the line removal processing may be applied to the inverse color text line image or to the denoised text line image.
In a specific embodiment of the present application, vertical projection processing and horizontal projection processing are performed on the initial text line image, the longitudinal lines and transverse lines in the initial text line image are determined, and an open operation is applied to both, so that the lines to be deleted are removed and the line-removed text line image is obtained.
In conclusion, redundant lines in the image can be deleted by carrying out line removal processing on the initial text line image, so that the influence of the lines in the image on the subsequent text content extraction is avoided, and the accuracy of text coordinate calculation is improved. It should be noted that the text line image may be an inverse text line image, a denoised text line image, or a line removed text line image, or may be a text line image obtained after performing the above three preprocessing operations on the initial text line image.
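The vertical-line detection step can be sketched as below: for each column i, the ratio r_i of non-zero pixels to the image height is computed, and columns whose ratio exceeds a preset hyper-parameter threshold are treated as table lines. Clearing the column directly is a simplification standing in for the erosion-then-dilation (open) operation; the function name and threshold value are illustrative assumptions.

```python
# Minimal sketch of line removal via vertical projection: r_i = |{p_ij > 0}| / h.
# Columns with r_i above the threshold are treated as longitudinal lines and cleared
# (a stand-in for the open operation described in the text).

def remove_vertical_lines(image, threshold=0.9):
    h, w = len(image), len(image[0])
    ratios = [sum(1 for j in range(h) if image[j][i] > 0) / h for i in range(w)]
    for i, r in enumerate(ratios):
        if r > threshold:
            for j in range(h):
                image[j][i] = 0
    return image, ratios

# Column 2 is a full-height table line; column 0 holds partial character strokes.
img = [
    [255, 0, 255, 0],
    [255, 0, 255, 0],
    [0,   0, 255, 0],
    [0,   0, 255, 0],
]
cleaned, ratios = remove_vertical_lines(img)
print(ratios)                                # column 2 has ratio 1.0
print(all(row[2] == 0 for row in cleaned))   # True: the line is removed
```

The character strokes in column 0 only span half the height (ratio 0.5), so they fall below the threshold and are preserved; horizontal lines would be handled symmetrically on rows.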
Further, in order to prevent the extracted text outline from being inaccurate, text outline extraction may be performed by adopting an edge-filling manner, specifically extracting a text outline of a target text line in a text line image, determining text box coordinates of the target text line in the text line image based on the text outline, including: performing edge-repairing processing on the text line image based on a preset rectangular frame, and determining a target text line in the text line image after edge-repairing; determining the text outline of the target text line based on a preset rectangular box in the text line image after edge repair; and determining an image coordinate system corresponding to the text line image, and calculating the text box coordinates of the target text line in the image coordinate system according to the side length information of the text outline.
The preset rectangular frame can be understood as a preset rectangle. Edge-repairing processing is performed on the text line image using the preset rectangular frames, so that the target text line in the text line image and the text contour corresponding to the target text line are determined. In fig. 3B, the text line image is subjected to edge-repairing processing with the preset rectangular frames, that is, the blank portions of the text line image are filled in; the shaded portions are the preset rectangular frames, where the height of the larger rectangular frames and the width of the smaller rectangular frames are both a. After the edge-repairing process, the target text line in the figure can be determined, namely, the text line within the rectangle inscribed by the four rectangular frames is the target text line, and the text outline of the target text line can be determined based on the four preset rectangular frames, that is, the rectangular box formed by inscribing the four preset rectangular frames serves as the text outline of the target text line, so that the text box around the target text line in fig. 3A can be obtained.
In practical application, in order to ensure that the text box coordinates and the subsequent text coordinates share the same coordinate system, a corresponding image coordinate system can be determined based on the text line image, namely, the pixel point at the lower left corner of the text line image is taken as the origin of the coordinate system, with the horizontal direction as the x-axis and the vertical direction as the y-axis. After the text outline of the target text line is determined, the text box coordinates corresponding to the target text line may be calculated based on the side length information of the text outline, as shown by (x1, x2, y1, y2) in fig. 3A.
Based on the text line image, the text line image is subjected to edge-supplementing processing through the preset rectangular frame, a target text line in the text line image and a text outline corresponding to the target text line can be determined, text frame coordinates corresponding to the target text line are calculated according to the text outline, and then text coordinates can be calculated based on the text frame coordinates, so that the accuracy of text coordinate calculation is improved.
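As a simplified illustration of the text box coordinates (x1, x2, y1, y2), the sketch below takes the bounding box of the foreground pixels in a binarized line image, expressed in a coordinate system with the lower-left corner as origin as the text describes. This replaces the patent's edge-padding with four preset rectangles by a direct bounding-box computation, so it is an assumption-laden stand-in, not the described method.

```python
# Hedged sketch: text box coordinates as the foreground bounding box,
# with the y-axis flipped so the origin is the image's lower-left corner.

def text_box_coords(binary):
    h = len(binary)
    xs = [i for row in binary for i, p in enumerate(row) if p > 0]
    ys = [h - 1 - j for j, row in enumerate(binary) for p in row if p > 0]
    return min(xs), max(xs), min(ys), max(ys)  # (x1, x2, y1, y2)

# A 4x5 line image whose foreground occupies rows 1-2, columns 1 and 3.
line = [
    [0, 0,   0, 0,   0],
    [0, 255, 0, 255, 0],
    [0, 255, 0, 255, 0],
    [0, 0,   0, 0,   0],
]
x1, x2, y1, y2 = text_box_coords(line)
print((x1, x2, y1, y2))  # (1, 3, 1, 2)
```

The ordinate pair (y1, y2) from this box is exactly what step 208 later combines with the segmentation interval abscissas.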
Step 204: determining position information corresponding to the initial text in the target text line, performing vertical segmentation processing on the target text line, and determining segmentation interval coordinates corresponding to the target text line according to a processing result.
The initial text can be understood as each single character in the target text line, and the position information corresponding to the initial text can be understood as the coordinate information of the initial text in the text line image. The vertical segmentation processing of the target text line can be understood as vertically projecting the target text line to determine the segmentation intervals between characters; after the segmentation intervals are determined, the corresponding segmentation interval coordinates can be determined. The segmentation intervals can be seen as the black sections between characters in fig. 1.
In practical application, the location information corresponding to the initial text may be obtained by means of a text recognition model CRNN, and specifically determining the location information corresponding to the initial text in the target text line includes: and inputting the text line image into a character recognition model to obtain position information corresponding to the initial characters in the target text line output by the character recognition model.
After the text line image is input into the text recognition model, the output result of the text recognition model includes information such as the position of each character on the x-axis, the text content, and the confidence level, and the position information corresponding to the initial characters in the target text line is obtained from the model output result. Since the position information output by the text recognition model may be inaccurate due to the influence of image quality, the output position may not fall in the middle of the character. Therefore, the text coordinates of the characters in the text line image are not calculated by relying on this position information alone; instead, the target text line is vertically segmented, and the position information is used to assist in determining the segmentation intervals, thereby improving the accuracy of text coordinate calculation.
In a specific embodiment of the present application, a text line image is input to a text recognition model, position information corresponding to each initial text in a target text line output by the text recognition model is obtained, and vertical segmentation processing is performed on the target text line, so as to obtain a segmentation interval between the text and the text, and a segmentation interval coordinate corresponding to the segmentation interval.
In summary, by identifying the target text line by means of the text recognition model, the position information corresponding to each initial text in the target text line can be obtained, and the position information can be used for determining the partition section subsequently, so that the accuracy of determining the partition section is improved.
Further, since there may be an error in the position information output by the model, the text coordinates corresponding to the initial text cannot be calculated based on the position information output by the model, so that the vertical segmentation processing can be performed on the target text line to obtain a segmentation region between the text and the text, the text coordinates are calculated based on the segmentation region, the accuracy of text coordinate calculation is improved, the vertical segmentation processing is performed on the target text line, and the segmentation region coordinates corresponding to the target text line are determined according to the processing result, including: performing binarization processing on the text line image, and performing vertical segmentation processing on the binarized text line image; determining an initial segmentation interval in the vertically segmented text line image; under the condition that the pixel points in the initial segmentation interval are black pixel points, determining the initial segmentation interval as a segmentation interval corresponding to the target text line; and determining an image coordinate system corresponding to the text line image, and calculating a partition interval coordinate of the partition interval in the image coordinate system, wherein the partition interval coordinate and the text box coordinate are positioned in the same coordinate system.
Binarization processing is performed on the text line image to obtain a binarized text line image consisting of black pixel points and white pixel points, vertical projection segmentation processing is then performed on the binarized text line image, and initial segmentation intervals are determined in the vertically segmented text line image. An initial segmentation interval can be understood as a candidate interval obtained by the vertical segmentation processing; since an initial segmentation interval may still contain strokes of characters, further judgment is needed, and only when all pixel points in the initial segmentation interval are black pixel points is the initial segmentation interval taken as a segmentation interval corresponding to the target text line.
In practical application, after the image is binarized by the OTSU threshold method, a binarized text line image is obtained, and vertical projection is then performed to obtain the initial segmentation intervals S = {S_0, S_1, ..., S_n}. The number c_i of pixel points with a pixel value of 255 in each initial segmentation interval S_i is counted; when c_i = 0, all pixel points in the initial segmentation interval are black pixel points, indicating that a segmentation interval s_i between characters exists at position i, and the initial segmentation interval can be determined as a segmentation interval corresponding to the target text line. It should be noted that two initial segmentation intervals may be consecutive; in this case, after the segmentation interval s_i is obtained, it is further judged whether it is consecutive with the previous segmentation interval, and if so, the two are merged before the next initial segmentation interval is judged.
In a specific embodiment of the present application, binarizing a text line image, and vertically dividing the binarized text line image, determining an initial dividing section in the text line image, selecting the initial dividing sections each of which is a black pixel point as a dividing section of a target text line, and calculating dividing section coordinates in the text line image according to a start point and an end point of the dividing section.
Based on the above, the segmentation interval between the characters in the target text line can be obtained by performing vertical segmentation processing on the text line image, and then the character coordinates can be calculated based on the segmentation interval, so that the accuracy of calculating the character coordinates is improved.
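The vertical segmentation step above can be sketched as follows: a column belongs to a segmentation interval when its count of white (255) pixels is zero, and consecutive all-black columns are merged into a single interval, which corresponds to the merging of adjacent initial segmentation intervals. The function name and the trailing sentinel are implementation conveniences, not from the original.

```python
# Sketch of vertical segmentation on a binarized line image:
# collect maximal runs of columns that contain no white pixels.

def segmentation_intervals(binary):
    h, w = len(binary), len(binary[0])
    white_counts = [sum(1 for j in range(h) if binary[j][i] == 255) for i in range(w)]
    intervals, start = [], None
    for i, c in enumerate(white_counts + [1]):  # sentinel closes a trailing run
        if c == 0 and start is None:
            start = i                            # a new all-black run begins
        elif c != 0 and start is not None:
            intervals.append((start, i - 1))     # merged run of black columns
            start = None
    return intervals

# Two characters (columns 0-1 and 5-6) separated by an all-black gap (columns 2-4).
img = [
    [255, 255, 0, 0, 0, 255, 255],
    [255, 255, 0, 0, 0, 255, 255],
]
print(segmentation_intervals(img))  # [(2, 4)]
```

Each returned pair is a segmentation interval's start and end column; converting them into the image coordinate system yields the segmentation interval coordinates used in the following steps.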
Step 206: and updating the partition interval coordinates according to the position information, and determining the target partition interval coordinates corresponding to the initial text based on an updating result.
Updating the segmentation interval coordinates based on the position information can be understood as filtering the segmentation intervals based on the position information. In practical application, because of the left-right structure of some Chinese characters and the interference of background factors, the vertical segmentation processing often over-segments, for example splitting a Chinese character with a left-right structure into two parts, so the segmentation intervals can be further judged based on the position information, thereby realizing the updating of the segmentation interval coordinates.
In practice, each initial character may be processed in turn. First, it is determined whether a segmentation interval exists between initial character a and initial character b: the position information of initial character a is denoted x_a, and the position information of initial character b (the previous character) is denoted x_b. When x_b < j_start and j_end < x_a, a segmentation interval exists between initial character a and initial character b, where j_start is the starting point coordinate of the segmentation interval and j_end is the end point coordinate of the segmentation interval.
In order to further eliminate the occurrence of a partition interval due to the interference of the left-right structure of the Chinese character, the partition interval needs to be judged according to the super parameter, specifically, the partition interval coordinates are updated according to the position information, and the target partition interval coordinates corresponding to the initial character are determined based on the updating result, which comprises the following steps: determining target characters and reference characters in the target text line; determining a to-be-selected partition interval of the target text and a to-be-selected partition interval coordinate corresponding to the to-be-selected partition interval according to the target position information corresponding to the target text and the reference position information corresponding to the reference text; acquiring calculation parameters corresponding to the text line image, and calculating a verification interval coordinate according to the calculation parameters and the target position information; carrying out coordinate verification on the coordinates of the to-be-selected partition interval based on the verification interval coordinates; and under the condition that the coordinate verification is passed, taking the interval to be selected and the interval coordinate to be selected as the target interval and the target interval coordinate of the target text.
The target character can be understood as the character whose two-side segmentation intervals need to be determined, namely the initial character a illustrated above, and the reference character can be understood as the character preceding or following the target character; continuing the above example, the reference character is the initial character b. According to the target position information x_a corresponding to the target character and the reference position information x_b corresponding to the reference character, the candidate segmentation interval of the target character and its candidate segmentation interval coordinates (j_start, j_end) are determined. In practical application, since each initial character corresponds to segmentation intervals on both sides, only the segmentation interval on the left side of the target character is described in this embodiment; the segmentation interval on the right side of the target character can be judged and selected by the same method. The calculation parameters corresponding to the text line image can be understood as hyper-parameters set and calculated in advance, and include a first calculation parameter and a second calculation parameter. The approximate position of the left segmentation interval of the target character, namely the verification interval, can be estimated from the calculation parameters and the target position information; the candidate segmentation interval can be verified by comparing the verification interval coordinates with the candidate segmentation interval coordinates, so as to judge whether the candidate segmentation interval is the left segmentation interval of the target character, and if not, the verification interval is directly used as the left segmentation interval of the target character.
In a specific implementation, the characters are first classified by category into Chinese, digit, English, punctuation, and other categories, and each category has an associated character aspect ratio P_char_ratio and an approximate position P_left_ratio of the character recognition model's output position over the character width. The aspect ratio is used as the first calculation parameter and the position as the second calculation parameter; when the first calculation parameter is set, the ratio for the Chinese and other categories is set to 1, and for digits, English, and punctuation to 0.5. The second hyper-parameter is obtained as follows: a batch of labeled text line image data sets is prepared, where the labeling information of each text line image includes the text content and the coordinates of each character. The character recognition model is used to output the position information of each character on the width axis, and for each character the ratio of the distance between the model output position and the left boundary of the labeled character to the width of the character is calculated, namely h_left_ratio = (pred_x − gt_xmin) / w, where pred_x represents the position information output by the character recognition model (converted to the coordinate system of the text line image), gt_xmin represents the x-axis coordinate of the upper left corner of the labeled character coordinate frame, and w represents the width of the character itself (gt_xmax − gt_xmin). The category of each character (Chinese, digit, English, punctuation, or other) is judged, and the h_left_ratio values are collected per category; k-means is used to perform 5-class clustering to obtain 5 class center values. The 5 class center values are traversed, the h_left_ratio values within twenty percent of each class center are counted, and the class center with the largest count is taken as the selected hyper-parameter.
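The hyper-parameter selection for one character category can be sketched as below. This is a hedged illustration under stated assumptions: a plain 1-D k-means (the embodiment does not specify its k-means variant), a tiny hand-made ratio sample in place of a real labeled data set, and "within twenty percent" interpreted as a relative neighborhood around each class center.

```python
# Hedged sketch: cluster h_left_ratio samples of one category into 5 classes,
# then keep the class center whose 20%-relative neighborhood holds the most samples.

def kmeans_1d(values, k=5, iters=50):
    """Plain 1-D k-means; initial centers are evenly spaced sorted samples."""
    centers = sorted(values)[:: max(1, len(values) // k)][:k]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for v in values:
            groups[min(range(len(centers)), key=lambda c: abs(v - centers[c]))].append(v)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers

def pick_parameter(ratios, k=5):
    centers = kmeans_1d(ratios, k)
    def support(c):  # samples within twenty percent of the class center
        return sum(1 for r in ratios if abs(r - c) <= 0.2 * abs(c))
    return max(centers, key=support)

# Illustrative h_left_ratio samples for one category, concentrated near 0.5.
ratios = [0.48, 0.50, 0.52, 0.49, 0.51, 0.10, 0.92, 0.30, 0.70, 0.51]
p = pick_parameter(ratios)
print(round(p, 2))  # a class center near 0.5, where most samples concentrate
```

Outliers such as 0.10 or 0.92 form their own sparsely supported clusters, so the densest class center survives as P_left_ratio for that category.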
In summary, after the calculation parameters are determined by the above method, the verification interval coordinates can be calculated according to the calculation parameters and the target position information, so as to judge whether the candidate segmentation interval is the left segmentation interval of the target character. When the verification interval coordinate estimated from x_a, P_left_ratio, P_char_ratio, and h_m falls within the candidate segmentation interval coordinates (j_start, j_end), the candidate segmentation interval is the left segmentation interval of the target character, where h_m is the height of the text box calculated according to the text box coordinates.
In a specific embodiment of the present application, a target character a and a reference character b are determined in the target text line, the candidate segmentation interval between the two characters and the candidate segmentation interval coordinates corresponding to it are determined according to the target position information corresponding to the target character a and the reference position information corresponding to the reference character b, the pre-calculated calculation parameters are obtained, the candidate segmentation interval coordinates are verified against the verification interval coordinates calculated from the calculation parameters and the target position information, and when the verification passes, the candidate segmentation interval is determined to be the left segmentation interval of the target character a. Similarly, the target character a and a reference character c are determined, and the right segmentation interval of the target character a is determined according to the position information of the target character a and the reference character c.
In summary, the segmentation intervals obtained by vertical segmentation can be further judged through the calculation parameters and the position information of the initial characters, erroneous segmentation intervals produced by over-segmentation in the vertical projection are eliminated, and the accuracy of the subsequent character coordinate calculation is further improved.
Further, if it is determined that the verification section to be selected does not pass the verification, the estimated verification partition section may be directly used as the target partition section of the target text. The method specifically comprises the following steps: and under the condition that the coordinate verification is not passed, taking the verification interval coordinate as the target segmentation interval coordinate corresponding to the initial character.
In practical application, when it is determined that the candidate segmentation interval fails verification, the previously calculated verification interval is used as the segmentation interval of the target character, and the segmentation interval coordinate corresponding to the verification interval is taken as the target segmentation interval coordinate corresponding to the initial character.
In a specific embodiment of the present application, when the coordinate verification fails, the verification interval is taken as the target segmentation interval corresponding to the initial character, that is, the verification interval coordinate is taken as the target segmentation interval coordinate corresponding to the initial character, and the character coordinate corresponding to the initial character is subsequently calculated based on the target segmentation interval coordinate and the text box coordinates.
Based on the above, the segmentation section corresponding to the target text line can be obtained by carrying out vertical projection segmentation on the text line image, and the segmentation section is further screened by calculating the position information of the parameters corresponding to the initial characters, so that the target segmentation section on two sides of each initial character is determined, the character coordinates of each initial character can be calculated based on the target segmentation section, and the accuracy of character coordinate calculation is improved.
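The verification step can be sketched as follows. Since the exact verification formula is not fully recoverable from the text, this sketch assumes the estimated character width is P_char_ratio × h_m, the estimated left boundary of the target character is pred_x − P_left_ratio × (P_char_ratio × h_m), and a candidate interval passes when that estimate falls inside it; all names and the fallback behavior mirror the description but are illustrative.

```python
# Hedged sketch of verifying a candidate left segmentation interval.
# Assumption: the verification coordinate is the estimated left boundary of the
# target character, derived from the model output position and the hyper-parameters.

def verify_left_interval(pred_x, h_m, p_char_ratio, p_left_ratio, candidate):
    j_start, j_end = candidate
    estimated_width = p_char_ratio * h_m            # aspect ratio x text box height
    estimated_left = pred_x - p_left_ratio * estimated_width
    return j_start <= estimated_left <= j_end

# A Chinese character (aspect ratio 1), model output roughly mid-character.
passes = verify_left_interval(pred_x=100, h_m=32, p_char_ratio=1.0,
                              p_left_ratio=0.5, candidate=(80, 90))
print(passes)  # True: the estimated left boundary (84.0) lies within (80, 90)
```

When the check fails, the text's fallback applies: the estimated verification interval itself is used as the target segmentation interval instead of the candidate.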
Step 208: and calculating the text coordinates corresponding to the initial text based on the text box coordinates and the target segmentation interval coordinates.
The ordinate of the initial text can be obtained according to the text box coordinates, and the abscissa of the initial text can be obtained according to the target division interval coordinates of the target division intervals on two sides of the initial text, so that the text coordinates of the initial text can be calculated based on the text box coordinates and the target division interval coordinates.
In a specific embodiment of the present application, the ordinate of the initial text is determined to be (y 1, y 2) according to the text box coordinates (x 1, x2, y1, y 2), the abscissa of the initial text is determined to be (a 2, b 1) according to the coordinates (a 1, a 2) of the left-side divided section and the coordinates (b 1, b 2) of the right-side divided section, and then the text coordinates corresponding to the initial text are determined to be (a 2, b1, y1, y 2).
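The composition in the embodiment above reduces to a one-line rule, sketched here with the same coordinate layout: the ordinate pair comes from the text box, and the abscissa pair comes from the inner edges of the two adjacent target segmentation intervals.

```python
# Sketch of the final coordinate composition: given the text box (x1, x2, y1, y2),
# the left interval (a1, a2), and the right interval (b1, b2), the character
# coordinates are (a2, b1, y1, y2).

def char_coords(text_box, left_interval, right_interval):
    x1, x2, y1, y2 = text_box
    a1, a2 = left_interval    # target segmentation interval left of the character
    b1, b2 = right_interval   # target segmentation interval right of the character
    return (a2, b1, y1, y2)

print(char_coords((0, 120, 5, 37), (10, 14), (42, 47)))  # (14, 42, 5, 37)
```

The interval values here are illustrative; in the method they come from the vertical segmentation and verification steps described earlier.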
The text coordinate determining method comprises the steps of extracting text outlines of target text lines in a text line image, and determining text box coordinates of the target text lines in the text line image based on the text outlines; determining position information corresponding to initial characters in the target text line, performing vertical segmentation processing on the target text line, and determining segmentation interval coordinates corresponding to the target text line according to a processing result; updating the partition interval coordinates according to the position information, and determining target partition interval coordinates corresponding to the initial text based on an updating result; and calculating the text coordinates corresponding to the initial text based on the text box coordinates and the target segmentation interval coordinates. The method comprises the steps of extracting text box coordinates in a text line image, determining partition interval coordinates of a target text line by adopting a vertical partition method, updating the partition interval based on position information of initial characters in the target text line, enabling a vertical partition result of the target text line to be more accurate, obtaining target partition interval coordinates corresponding to the initial characters more accurately, and finally accurately calculating character coordinates of the initial characters in the text line image based on the extracted text box coordinates and the target partition interval coordinates, so that the coordinate recognition accuracy of each initial character in the text line image is improved.
The text coordinate determining method provided in the present application is taken as an example in information extraction, and the text coordinate determining method is further described below with reference to fig. 4. Fig. 4 shows a process flow chart of a text coordinate determining method applied to information extraction according to an embodiment of the present application, which specifically includes the following steps:
step 402: and carrying out edge-repairing processing on the text line image based on a preset rectangular frame, and determining a target text line in the text line image after edge repairing.
In one implementation, information needs to be extracted from the content of a paper document. After character recognition is performed on the paper document, an initial text line image is obtained; preprocessing operations such as inverse color processing, denoising processing, and line removal processing are performed on the initial text line image to obtain a text line image, edge-repairing processing is performed on the text line image, and the target text line in the text line image is determined.
Step 404: and determining the text outline of the target text line based on a preset rectangular box in the text line image after edge repair.
In one implementation, a text outline around the target text line is determined in the edge-supplemented text line image according to a preset rectangular box.
Step 406: and determining an image coordinate system corresponding to the text line image, and calculating the text box coordinates of the target text line in the image coordinate system according to the side length information of the text outline.
In one implementation, the lower left corner of the text line image is taken as the origin of the image coordinate system, the coordinates of the text outline are calculated according to the side length information of the text outline, and the coordinates are taken as the text box coordinates of the target text line in the image coordinate system.
Step 408: and inputting the text line image into the text recognition model to obtain the position information corresponding to the initial text in the target text line output by the text recognition model.
In one implementation, a text line image is input into a text recognition model, and position information corresponding to each initial text in a target text line output by the text recognition model is obtained.
Step 410: and carrying out binarization processing on the text line image, and carrying out vertical segmentation processing on the binarized text line image.
In one implementation, a binarization process is performed on a text line image to obtain a binarized text line image, and a vertical projection segmentation process is performed on the binarized text line image.
Step 412: determining an initial segmentation interval in the vertically segmented text line image; and determining the initial segmentation interval as the segmentation interval corresponding to the target text line under the condition that the pixel points in the initial segmentation interval are all black pixel points.
In one possible implementation, after the vertical projection segmentation process is performed, an initial segmentation section corresponding to the target text line is determined, judgment is performed according to the pixel point in each initial segmentation section, and if the pixels are all black pixels, the initial segmentation section is determined to be the segmentation section corresponding to the target text line.
Step 414: and calculating the partition interval coordinates of the partition interval in the image coordinate system, wherein the partition interval coordinates and the text box coordinates are positioned in the same coordinate system.
In one implementation, the segmentation interval coordinates of each segmentation interval in the image coordinate system are calculated.
Step 416: and updating the partition interval coordinates according to the position information, and determining the target partition interval coordinates corresponding to the initial characters based on the updating result.
In one implementation manner, a target text and a reference text are determined in a target text line, a to-be-selected partition interval between the two texts and corresponding to-be-selected partition interval coordinates are determined according to target position information of the target text and reference position information of the reference text, calculation parameters corresponding to a text line image are obtained, verification interval coordinates are calculated according to the calculation parameters and the target position information, the verification interval coordinates and the to-be-selected partition interval coordinates are compared, the to-be-selected partition interval is determined to be the target partition interval of the target text under the condition that verification is passed, and the to-be-selected partition interval coordinates are taken as the target partition interval coordinates of the target text. If the verification is not passed, the verification segment is set as a target segment of the target character, and the verification segment coordinates are set as target segment coordinates of the target character.
Step 418: calculate the character coordinates corresponding to the initial characters based on the text box coordinates and the target segmentation interval coordinates.
In one implementation, after the target segmentation interval coordinates of each initial character in the target text line are determined by the above method, the character coordinates corresponding to each initial character are calculated based on the text box coordinates and the target segmentation interval coordinates, and the character coordinates and character content are output as the information extraction result.
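Combining the text box coordinates with the segmentation positions is then straightforward; a minimal sketch with a hypothetical helper, coordinates given as (x0, y0, x1, y1) in the image coordinate system:

```python
def char_boxes(text_box, split_xs):
    """Combine the text-box coordinates of the whole line with per-character
    split positions to get one bounding box per character. `split_xs` are the
    x-coordinates separating adjacent characters, in ascending order."""
    x0, y0, x1, y1 = text_box
    edges = [x0] + list(split_xs) + [x1]
    # Each character spans from one split edge to the next, at full line height.
    return [(edges[i], y0, edges[i + 1], y1) for i in range(len(edges) - 1)]
```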
According to the text coordinate determining method applied to information extraction described above, the text box coordinates in the text line image are extracted, and the segmentation interval coordinates of the target text line are determined by a vertical segmentation method. The segmentation intervals are then updated based on the position information of the initial characters in the target text line, so that the vertical segmentation result of the target text line is more accurate and the target segmentation interval coordinates corresponding to the initial characters can subsequently be obtained more accurately. Finally, the character coordinates of the initial characters in the text line image are accurately calculated based on the extracted text box coordinates and the target segmentation interval coordinates, which improves the coordinate recognition accuracy of each initial character in the text line image.
Corresponding to the above method embodiment, the present application further provides an embodiment of a text coordinate determining device. Fig. 5 shows a schematic structural diagram of a text coordinate determining device according to an embodiment of the present application. As shown in fig. 5, the apparatus includes:
an extraction module 502, configured to extract a text outline of a target text line in a text line image, and determine text box coordinates of the target text line in the text line image based on the text outline;
the segmentation module 504 is configured to determine position information corresponding to the initial text in the target text line, perform vertical segmentation processing on the target text line, and determine segmentation interval coordinates corresponding to the target text line according to a processing result;
a determining module 506, configured to update the partition coordinates according to the location information, and determine target partition coordinates corresponding to the initial text based on an update result;
and a calculating module 508, configured to calculate text coordinates corresponding to the initial text based on the text box coordinates and the target segmentation interval coordinates.
Optionally, the apparatus further comprises a processing module configured to: determining an initial text line image; and performing inverse color processing, denoising processing and/or line removing processing on the initial text line image to obtain a text line image.
Optionally, the processing module is further configured to: extracting pixel values of pixel points in the initial text line image; determining a black pixel point set and a white pixel point set based on the pixel values; and, under the condition that the number of pixels in the black pixel point set is greater than the number of pixels in the white pixel point set, inverting the pixel values to obtain an inverse-color text line image.
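A minimal sketch of this inverse-color check, assuming a binarized image with values 0 and 255 (the exact counting and inversion code is not given by the patent):

```python
import numpy as np

def maybe_invert(binary_img):
    """Invert a binarized image when black pixels outnumber white ones, so the
    line ends up with the majority background colour consistent across inputs."""
    black = int((binary_img == 0).sum())
    white = int((binary_img == 255).sum())
    if black > white:
        return 255 - binary_img     # flip black <-> white
    return binary_img
```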
Optionally, the processing module is further configured to: performing binarization processing on the initial text line image; and performing a denoising calculation on the initial text line image and the binarized initial text line image to obtain a denoised text line image.
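The patent does not spell out the "denoising calculation" between the two images; one plausible sketch is to use the binarized copy as a mask over the original grayscale image, which wipes light speckle while keeping the original stroke intensities. The threshold value is an assumption.

```python
import numpy as np

def denoise_with_mask(gray, threshold=128):
    """One plausible reading of 'a denoising calculation on the image and its
    binarized copy': binarize to get a text mask, then reset every background
    pixel of the grayscale image to pure white."""
    binary = np.where(gray < threshold, 0, 255).astype(np.uint8)  # dark = text
    cleaned = gray.copy()
    cleaned[binary == 255] = 255      # background pixels forced to white
    return cleaned
```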
Optionally, the processing module is further configured to: performing projection processing on the initial text line image, and determining lines to be deleted in the initial text line image; and deleting the lines to be deleted to obtain a line-removed text line image.
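The projection-based line removal can be sketched as below; the fill-ratio threshold is an assumption, not a value from the patent.

```python
import numpy as np

def remove_horizontal_rules(binary_img, min_fill=0.9):
    """Detect long horizontal rules by row projection: any row whose fraction
    of ink pixels reaches `min_fill` is treated as a line to delete and reset
    to the white background."""
    out = binary_img.copy()
    ink = (out != 255)                        # dark pixels count as ink
    row_fill = ink.mean(axis=1)               # per-row ink fraction
    out[row_fill >= min_fill, :] = 255        # wipe full-width rules
    return out
```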
Optionally, the extracting module 502 is further configured to: performing edge-repairing processing on the text line image based on a preset rectangular frame, and determining a target text line in the text line image after edge-repairing; determining the text outline of the target text line based on a preset rectangular box in the text line image after edge repair; and determining an image coordinate system corresponding to the text line image, and calculating the text box coordinates of the target text line in the image coordinate system according to the side length information of the text outline.
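The edge-repair and text-box extraction can be sketched with plain NumPy (in practice `cv2.findContours`/`cv2.boundingRect` would serve the same purpose); here the padding width stands in for the "preset rectangular frame", whose exact size the patent does not give.

```python
import numpy as np

def padded_text_box(binary_line, pad=5):
    """Pad the line image with a white background border (one reading of the
    'edge-repairing' step), then take the tight bounding box of the remaining
    ink pixels as the text outline's box. Returns (x0, y0, x1, y1) in the
    padded image's coordinate system, or None for an empty image."""
    padded = np.pad(binary_line, pad, constant_values=255)  # white border
    ys, xs = np.nonzero(padded != 255)                      # ink pixels
    if len(xs) == 0:
        return None
    return (int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1)
```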
Optionally, the segmentation module 504 is further configured to: and inputting the text line image into a character recognition model to obtain position information corresponding to the initial characters in the target text line output by the character recognition model.
Optionally, the segmentation module 504 is further configured to: performing binarization processing on the text line image, and performing vertical segmentation processing on the binarized text line image; determining an initial segmentation interval in the vertically segmented text line image; under the condition that the pixel points in the initial segmentation interval are black pixel points, determining the initial segmentation interval as a segmentation interval corresponding to the target text line; and determining an image coordinate system corresponding to the text line image, and calculating a partition interval coordinate of the partition interval in the image coordinate system, wherein the partition interval coordinate and the text box coordinate are positioned in the same coordinate system.
Optionally, the determining module 506 is further configured to: determining a target character and a reference character in the target text line; determining a to-be-selected segmentation interval of the target character and its corresponding to-be-selected segmentation interval coordinates according to the target position information corresponding to the target character and the reference position information corresponding to the reference character; acquiring calculation parameters corresponding to the text line image, and calculating verification interval coordinates according to the calculation parameters and the target position information; carrying out coordinate verification on the to-be-selected segmentation interval coordinates based on the verification interval coordinates; and, under the condition that the coordinate verification is passed, taking the to-be-selected segmentation interval and the to-be-selected segmentation interval coordinates as the target segmentation interval and the target segmentation interval coordinates of the target character.
Optionally, the determining module 506 is further configured to: and under the condition that the coordinate verification is not passed, taking the verification interval coordinate as the target segmentation interval coordinate corresponding to the initial character.
The text coordinate determining device described above comprises: an extraction module configured to extract a text outline of a target text line in a text line image, and determine text box coordinates of the target text line in the text line image based on the text outline; a segmentation module configured to determine position information corresponding to the initial characters in the target text line, perform vertical segmentation processing on the target text line, and determine segmentation interval coordinates corresponding to the target text line according to the processing result; a determining module configured to update the segmentation interval coordinates according to the position information and determine target segmentation interval coordinates corresponding to the initial characters based on the update result; and a calculating module configured to calculate character coordinates corresponding to the initial characters based on the text box coordinates and the target segmentation interval coordinates. With this device, the text box coordinates in the text line image are extracted, the segmentation interval coordinates of the target text line are determined by a vertical segmentation method, and the segmentation intervals are updated based on the position information of the initial characters in the target text line, so that the vertical segmentation result of the target text line is more accurate and the target segmentation interval coordinates corresponding to the initial characters can subsequently be obtained more accurately. Finally, the character coordinates of the initial characters in the text line image are accurately calculated based on the extracted text box coordinates and the target segmentation interval coordinates, which improves the coordinate recognition accuracy of each initial character in the text line image.
The above is an exemplary embodiment of a text coordinate determining apparatus of the present embodiment. It should be noted that, the technical solution of the text coordinate determining device and the technical solution of the text coordinate determining method belong to the same concept, and details of the technical solution of the text coordinate determining device, which are not described in detail, can be referred to the description of the technical solution of the text coordinate determining method.
Fig. 6 illustrates a block diagram of a computing device 600 provided in accordance with an embodiment of the present application. The components of computing device 600 include, but are not limited to, a memory 610 and a processor 620. The processor 620 is coupled to the memory 610 via a bus 630, and a database 650 is used to store data.
Computing device 600 also includes an access device 640, which enables computing device 600 to communicate via one or more networks 660. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. The access device 640 may include one or more of any type of network interface (e.g., a Network Interface Card (NIC)), whether wired or wireless, such as an IEEE 802.11 Wireless Local Area Network (WLAN) wireless interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the present application, the above-described components of computing device 600, as well as other components not shown in FIG. 6, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device illustrated in FIG. 6 is for exemplary purposes only and is not intended to limit the scope of the present application. Those skilled in the art may add or replace other components as desired.
Computing device 600 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smart phone), wearable computing device (e.g., smart watch, smart glasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 600 may also be a mobile or stationary server.
Wherein the processor 620 performs the steps of the text coordinate determination method when executing the computer instructions.
The foregoing is a schematic illustration of a computing device of this embodiment. It should be noted that, the technical solution of the computing device and the technical solution of the above text coordinate determining method belong to the same concept, and details of the technical solution of the computing device, which are not described in detail, can be referred to the description of the technical solution of the above text coordinate determining method.
An embodiment of the present application also provides a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the steps of the text coordinate determination method as described above.
The above is an exemplary version of a computer-readable storage medium of the present embodiment. It should be noted that, the technical solution of the storage medium and the technical solution of the above text coordinate determining method belong to the same concept, and details of the technical solution of the storage medium which are not described in detail can be referred to the description of the technical solution of the above text coordinate determining method.
The foregoing describes specific embodiments of the present application. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The computer instructions include computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer readable medium may be added to or removed as appropriate according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunications signals.
It should be noted that, for the sake of simplicity of description, the foregoing method embodiments are all expressed as a series of combinations of actions, but it should be understood by those skilled in the art that the present application is not limited by the order of actions described, as some steps may be performed in other order or simultaneously in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily all necessary for the present application.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
The above-disclosed preferred embodiments of the present application are provided only as an aid to the elucidation of the present application. Alternative embodiments are not intended to be exhaustive or to limit the invention to the precise form disclosed. Obviously, many modifications and variations are possible in light of the teaching of this application. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best understand and utilize the invention. This application is to be limited only by the claims and the full scope and equivalents thereof.

Claims (13)

1. A text coordinate determination method, comprising:
extracting a text outline of a target text line in a text line image, and determining text box coordinates of the target text line in the text line image based on the text outline;
determining position information corresponding to initial characters in the target text line, performing vertical segmentation processing on the target text line, and determining segmentation interval coordinates corresponding to the target text line according to a processing result;
updating the partition interval coordinates according to the position information, and determining target partition interval coordinates corresponding to the initial text based on an updating result;
and calculating the text coordinates corresponding to the initial text based on the text box coordinates and the target segmentation interval coordinates.
2. The method of claim 1, wherein prior to extracting the text outline of the target text line in the text line image, the method further comprises:
determining an initial text line image;
and performing inverse color processing, denoising processing and/or line removing processing on the initial text line image to obtain a text line image.
3. The method of claim 2, wherein the inverse processing of the initial text line image comprises:
extracting pixel values of pixel points in the initial text line image;
determining a black pixel point set and a white pixel point set based on the pixel values;
and under the condition that the number of pixels in the black pixel point set is greater than the number of pixels in the white pixel point set, inverting the pixel values to obtain an inverse-color text line image.
4. The method of claim 2, wherein denoising the initial text line image comprises:
performing binarization processing on the initial text line image;
and performing a denoising calculation on the initial text line image and the binarized initial text line image to obtain a denoised text line image.
5. The method of claim 2, wherein performing line removal processing on the initial text line image comprises:
performing projection processing on the initial text line image, and determining lines to be deleted in the initial text line image;
and deleting the lines to be deleted to obtain a line-removed text line image.
6. The method of claim 1, wherein extracting a text outline of a target text line in a text line image, determining text box coordinates of the target text line in the text line image based on the text outline, comprises:
performing edge-repairing processing on the text line image based on a preset rectangular frame, and determining a target text line in the text line image after edge-repairing;
determining the text outline of the target text line based on a preset rectangular box in the text line image after edge repair;
and determining an image coordinate system corresponding to the text line image, and calculating the text box coordinates of the target text line in the image coordinate system according to the side length information of the text outline.
7. The method of claim 1, wherein determining location information corresponding to an initial text in the target text line comprises:
and inputting the text line image into a character recognition model to obtain position information corresponding to the initial characters in the target text line output by the character recognition model.
8. The method of claim 1, wherein performing vertical segmentation processing on the target text line, and determining the segmentation interval coordinates corresponding to the target text line according to the processing result, comprises:
performing binarization processing on the text line image, and performing vertical segmentation processing on the binarized text line image;
determining an initial segmentation interval in the vertically segmented text line image;
under the condition that the pixel points in the initial segmentation interval are black pixel points, determining the initial segmentation interval as a segmentation interval corresponding to the target text line;
and determining an image coordinate system corresponding to the text line image, and calculating a partition interval coordinate of the partition interval in the image coordinate system, wherein the partition interval coordinate and the text box coordinate are positioned in the same coordinate system.
9. The method of claim 1, wherein updating the partition coordinates according to the location information, and determining the target partition coordinates corresponding to the initial text based on the update result, comprises:
determining target characters and reference characters in the target text line;
determining a to-be-selected partition interval of the target text and a to-be-selected partition interval coordinate corresponding to the to-be-selected partition interval according to the target position information corresponding to the target text and the reference position information corresponding to the reference text;
acquiring calculation parameters corresponding to the text line image, and calculating a verification interval coordinate according to the calculation parameters and the target position information;
carrying out coordinate verification on the coordinates of the to-be-selected partition interval based on the verification interval coordinates;
and under the condition that the coordinate verification is passed, taking the to-be-selected segmentation interval and the to-be-selected segmentation interval coordinates as the target segmentation interval and the target segmentation interval coordinates of the target character.
10. The method of claim 9, wherein the method further comprises:
and under the condition that the coordinate verification is not passed, taking the verification interval coordinate as the target segmentation interval coordinate corresponding to the initial character.
11. A text coordinate determining apparatus, comprising:
an extraction module configured to extract a text outline of a target text line in a text line image, and determine text box coordinates of the target text line in the text line image based on the text outline;
the segmentation module is configured to determine the position information corresponding to the initial text in the target text line, perform vertical segmentation processing on the target text line, and determine the segmentation interval coordinates corresponding to the target text line according to the processing result;
the determining module is configured to update the partition interval coordinates according to the position information and determine target partition interval coordinates corresponding to the initial text based on an updating result;
and the calculating module is configured to calculate character coordinates corresponding to the initial characters based on the text box coordinates and the target segmentation interval coordinates.
12. A computing device comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, wherein the processor, when executing the computer instructions, performs the steps of the method of any one of claims 1-10.
13. A computer readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the method of any one of claims 1-10.
CN202310108392.2A 2023-02-02 2023-02-02 Text coordinate determining method and device Pending CN116030472A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310108392.2A CN116030472A (en) 2023-02-02 2023-02-02 Text coordinate determining method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310108392.2A CN116030472A (en) 2023-02-02 2023-02-02 Text coordinate determining method and device

Publications (1)

Publication Number Publication Date
CN116030472A true CN116030472A (en) 2023-04-28

Family

ID=86070626

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310108392.2A Pending CN116030472A (en) 2023-02-02 2023-02-02 Text coordinate determining method and device

Country Status (1)

Country Link
CN (1) CN116030472A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117079282A (en) * 2023-08-16 2023-11-17 读书郎教育科技有限公司 Intelligent dictionary pen based on image processing

Similar Documents

Publication Publication Date Title
CN110210413B (en) Multidisciplinary test paper content detection and identification system and method based on deep learning
CN106446896B (en) Character segmentation method and device and electronic equipment
CN106960208B (en) Method and system for automatically segmenting and identifying instrument liquid crystal number
CN111259878A (en) Method and equipment for detecting text
CN112183038A (en) Form identification and typing method, computer equipment and computer readable storage medium
JPH08305803A (en) Operating method of learning machine of character template set
CN112070649B (en) Method and system for removing specific character string watermark
CN111680690A (en) Character recognition method and device
CN113158977B (en) Image character editing method for improving FANnet generation network
CN110738030A (en) Table reconstruction method and device, electronic equipment and storage medium
CN112507876A (en) Wired table picture analysis method and device based on semantic segmentation
CN112101386B (en) Text detection method, device, computer equipment and storage medium
CN111507337A (en) License plate recognition method based on hybrid neural network
CN113033558A (en) Text detection method and device for natural scene and storage medium
CN109741273A (en) A kind of mobile phone photograph low-quality images automatically process and methods of marking
CN111626145A (en) Simple and effective incomplete form identification and page-crossing splicing method
Kaundilya et al. Automated text extraction from images using OCR system
CN116030472A (en) Text coordinate determining method and device
CN109508716B (en) Image character positioning method and device
CN114495141A (en) Document paragraph position extraction method, electronic equipment and storage medium
CN112634288A (en) Equipment area image segmentation method and device
CN115797939A (en) Two-stage italic character recognition method and device based on deep learning
CN113989481A (en) Contract text image seal detection and removal method
CN110298350B (en) Efficient printing body Uyghur word segmentation algorithm
Rani et al. Object Detection in Natural Scene Images Using Thresholding Techniques

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination