CN110210467B — Formula positioning method of text image, image processing device and storage medium
Publication number: CN110210467B (China)
Legal status: Active
Classifications

 G—PHYSICS
 G06—COMPUTING; CALCULATING; COUNTING
 G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
 G06T5/00—Image enhancement or restoration
 G06T5/001—Image restoration
 G06T5/002—Denoising; Smoothing

 G—PHYSICS
 G06—COMPUTING; CALCULATING; COUNTING
 G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
 G06T7/00—Image analysis
 G06T7/10—Segmentation; Edge detection
 G06T7/11—Region-based segmentation

 G—PHYSICS
 G06—COMPUTING; CALCULATING; COUNTING
 G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
 G06T7/00—Image analysis
 G06T7/10—Segmentation; Edge detection
 G06T7/13—Edge detection

 G—PHYSICS
 G06—COMPUTING; CALCULATING; COUNTING
 G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
 G06T7/00—Image analysis
 G06T7/90—Determination of colour characteristics

 G—PHYSICS
 G06—COMPUTING; CALCULATING; COUNTING
 G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
 G06V10/00—Arrangements for image or video recognition or understanding
 G06V10/20—Image preprocessing
 G06V10/22—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
 G06V10/225—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition based on a marking or identifier characterising the area
Abstract
The application discloses a formula positioning method of a text image, an image processing device and a storage medium, wherein the formula positioning method of the text image comprises the following steps: acquiring text positioning information and attention information of a text line; calculating a formula coordinate set and a formula boundary set of the text line according to the text positioning information and the attention information; and calculating the positioning coordinates of the formulas in the text lines according to the formula coordinate set and the formula boundary set. By the method, the formula in the text image can be accurately positioned.
Description
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a formula positioning method for text images, an image processing apparatus, and a storage medium.
Background
With the development of mobile internet technology, a large number of handheld mobile terminals such as smart phones and tablet computers enter our lives and become an indispensable part of our lives. The handheld terminals have the camera shooting function, and great convenience is provided for acquiring document information at any time.
Scientific formulas as a special information carrier are also widely stored in text documents. In practical application, scientific formulas often need to be positioned and extracted, and how to position the formulas in the text images becomes an urgent problem to be solved.
Disclosure of Invention
In order to solve the above problems, the present application provides a formula positioning method for a text image, an image processing apparatus, and a storage medium, which can accurately position a formula in a text image.
The technical scheme adopted by the application is as follows: a formula positioning method of a text image is provided, and the method comprises the following steps: acquiring text positioning information and attention information of a text line; calculating a formula coordinate set and a formula boundary set of the text line according to the text positioning information and the attention information; and calculating the positioning coordinates of the formulas in the text lines according to the formula coordinate set and the formula boundary set.
The step of calculating the formula coordinate set and the formula boundary set of the text line according to the text positioning information and the attention information comprises the following steps: acquiring an attention information vector of a target special character in a target text line according to the attention information; wherein, the special characters are characters in a formula; judging whether an index value corresponding to the maximum value in the attention information vector is 0 or not; and if so, adding the adjacent formula information of the target special character into the formula boundary set.
Wherein, the method also comprises: calculating the width value of the target text line according to the text positioning information of the target text line; calculating an abscissa value of the target special character according to the width value of the target text line and the index value corresponding to the maximum value in the attention information vector; determining a maximum abscissa value and a minimum abscissa value according to the abscissa value of the target special character; and calculating a formula coordinate set according to the text positioning information, the maximum abscissa value and the minimum abscissa value.
The step of calculating the width value of the target text line according to the text positioning information of the target text line includes: calculating the normalized width value of the target text line using the following formula: w_m = ⌈w / h⌉, where w is the coordinate width of the target text line and h is the height of the target text line.
The step of calculating the abscissa value of the target special character according to the width value of the target text line and the index value corresponding to the maximum value in the attention information vector includes: calculating the abscissa value of the target special character using the following formula: C_w = ⌈(a_idx / w_m) · w⌉, where w is the width of the target text line, a_idx is the index value corresponding to the maximum value in the attention information vector, and w_m is the normalized width value of the target text line.
Wherein, according to the abscissa value of the target special character, the step of determining the maximum and minimum abscissa values includes: determining an initial maximum abscissa value and an initial minimum abscissa value according to the initial abscissa value of the target special character; when a new abscissa value of the target special character is obtained, comparing the new abscissa value with the maximum and minimum abscissa values; if the new abscissa value is smaller than the minimum abscissa value, updating the minimum abscissa value; and if the new abscissa value is larger than the maximum abscissa value, updating the maximum abscissa value.
Wherein, according to the text positioning information, the maximum abscissa value and the minimum abscissa value, the step of calculating the formula coordinate set includes: calculating the formula coordinate set using the following formula: x_1 = b_i0 + w_min, x_2 = b_i0 + w_max, y_1 = b_i2, y_2 = b_i3, where x_1 is the left abscissa of the formula, x_2 is the right abscissa of the formula, y_1 is the upper ordinate of the formula, y_2 is the lower ordinate of the formula, b_i0 is the left abscissa in the text positioning information, b_i2 is the upper ordinate in the text positioning information, b_i3 is the lower ordinate in the text positioning information, w_min is the minimum abscissa value, and w_max is the maximum abscissa value.
Wherein, according to the formula coordinate set and the formula boundary set, the step of calculating the positioning coordinate of the formula in the text line comprises the following steps: judging whether the previous line of text of the target line of text is in a formula boundary set or not; if yes, judging whether the target line text has adjacent formula information; and if so, fusing the first formula coordinate in the target line text with the last formula coordinate in the previous line text.
Wherein, the method also comprises: acquiring a binary image of a target formula area according to the formula coordinate set; carrying out longitudinal coordinate projection on the binary image to obtain a longitudinal coordinate of the target formula; and updating the formula coordinate set by adopting the ordinate of the target formula.
Another technical scheme adopted by the application is as follows: provided is an image processing apparatus including: the acquisition module is used for acquiring text positioning information and attention information of a text line; the first calculation module is used for calculating a formula coordinate set and a formula boundary set of the text line according to the text positioning information and the attention information; and the second calculation module is used for calculating the positioning coordinates of the formula in the text line according to the formula coordinate set and the formula boundary set.
Another technical scheme adopted by the application is as follows: there is provided an image processing apparatus comprising a processor and a memory for storing program data, the processor being arranged to execute the program data to implement the method as described above.
Another technical scheme adopted by the application is as follows: a computer storage medium is provided for storing program data for implementing the method as described above when executed by a processor.
The formula positioning method of the text image comprises the following steps: acquiring text positioning information and attention information of a text line; calculating a formula coordinate set and a formula boundary set of the text line according to the text positioning information and the attention information; and calculating the positioning coordinates of the formulas in the text lines according to the formula coordinate set and the formula boundary set. By the mode, the formula in the text image can be positioned by using the attention information, so that a foundation is laid for subsequent formula identification, and the image of the formula can be accurately obtained.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts. Wherein:
FIG. 1 is a schematic flowchart of a formula positioning method for text images according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of first location coordinates of a text provided in an embodiment of the present application;
fig. 3 is a schematic flowchart of acquiring a boundary set according to an embodiment of the present application;
FIG. 4 is a schematic flow chart illustrating obtaining a formula set according to an embodiment of the present application;
FIG. 5 is a schematic flow chart illustrating the determination of the maximum abscissa and the minimum abscissa in the embodiment of the present application;
FIG. 6 is a schematic diagram of second location coordinates of a text provided in an embodiment of the present application;
FIG. 7 is a schematic flow chart illustrating calculation of formula positioning coordinates according to an embodiment of the present application;
FIG. 8 is a logic diagram of a formula positioning method for text images according to an embodiment of the present disclosure;
FIG. 9 is a logic diagram for obtaining a formula location coordinate set according to an embodiment of the present application;
FIG. 10 is a logic diagram of formula coordinate consolidation provided by an embodiment of the present application;
fig. 11 is a schematic diagram of a first structure of an image processing apparatus according to an embodiment of the present application;
fig. 12 is a second schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 13 is a schematic structural diagram of a computer storage medium provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be further noted that, for the convenience of description, only some of the structures related to the present application are shown in the drawings, not all of the structures. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first", "second", etc. in this application are used to distinguish between different objects and not to describe a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover nonexclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
Referring to fig. 1, fig. 1 is a schematic flowchart of a formula positioning method for a text image according to an embodiment of the present application, where the method includes:
step 11: text positioning information and attention information of a text line are acquired.
A text image, also called a document image, is a document in image format: a paper document converted into an image in some manner for electronic reading. Typical text image formats include JPG (JPEG), BMP, PNG, GIF, FSP, TIFF, TGA, EPS, and the like.
Alternatively, the text location information may be location coordinates of the text. It will be appreciated that the text is generally arranged line by line in the form of "lines", and the location coordinates are generally the coordinates of the upper left and lower right points in the rectangular area in which a line of text is located.
As shown in fig. 2, fig. 2 is a schematic diagram of first location coordinates of a text provided in an embodiment of the present application, where A(x_1, y_1) represents the coordinate point of the top left corner of the line of text, and B(x_2, y_2) represents the coordinate point of the lower right corner of the line of text.
In a specific operation, the text image may be subjected to a gradation process.
Grayscale is the most direct visual feature describing the content of a grayscale image. It refers to the color depth of the dots in a black-and-white image, generally ranging from 0 to 255, with white being 255 and black being 0, so a black-and-white image is also called a grayscale image. Grayscale image matrix elements typically take values in [0, 255], and thus their data type is typically an 8-bit unsigned integer, known as 256 levels of grayscale. When a color image is converted into a grayscale image, the effective brightness value of each pixel in the image needs to be calculated, using the formula: Y = 0.3R + 0.59G + 0.11B.
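The per-pixel brightness calculation above can be sketched as follows (a minimal pure-Python illustration; the function name is ours, not the patent's):

```python
def rgb_to_gray(r, g, b):
    """Effective brightness of one pixel, per Y = 0.3R + 0.59G + 0.11B."""
    return round(0.3 * r + 0.59 * g + 0.11 * b)

# White maps to 255, black to 0, exactly as the grayscale range requires.
print(rgb_to_gray(255, 255, 255))  # 255
print(rgb_to_gray(0, 0, 0))        # 0
```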
Then, the text image is subjected to denoising processing.
Alternatively, the grayscale image may be gaussian smoothed using a gaussian filtering algorithm. The gaussian filtering is a process of weighted average of the whole image, and the value of each pixel point is obtained by weighted average of the value of each pixel point and other pixel values in the neighborhood. The specific operation of gaussian filtering is: each pixel in the image is scanned using a template (or convolution, mask), and the weighted average gray value of the pixels in the neighborhood determined by the template is used to replace the value of the pixel in the center of the template.
And secondly, carrying out binarization and reverse color processing on the text image.
Image Binarization is the process of setting the gray value of each pixel in an image to either 0 or 255, so that the whole image presents an obvious black-and-white effect.
The inverse color is the color that, superimposed with the primary color, yields white; i.e., the primary color subtracted from white (RGB: 255, 255, 255). For example, the inverse of red (RGB: 255, 0, 0) is cyan (RGB: 0, 255, 255). Correspondingly, in the binarized image, gray value 0 is changed to gray value 255, and gray value 255 is changed to gray value 0.
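The binarization and reverse-color steps together can be sketched as follows (a minimal illustration with an assumed threshold of 128; the patent does not specify the threshold):

```python
def binarize(gray, threshold=128):
    """Map every gray value to 0 or 255 by thresholding."""
    return [[255 if px >= threshold else 0 for px in row] for row in gray]

def invert(binary):
    """Reverse-color step: swap 0 and 255 so dark text becomes foreground."""
    return [[255 - px for px in row] for row in binary]

page = [[30, 200], [250, 10]]      # dark strokes on a light page
print(invert(binarize(page)))      # [[255, 0], [0, 255]]
```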
Finally, edge calculations may be performed on the text image.
Alternatively, a Canny edge algorithm can be used, which aims to find an optimal edge detection algorithm, which means:
(1) optimal detection: the algorithm can identify actual edges in the image as much as possible, and the probability of missing actual edges and the probability of falsely detecting non-edges are both as small as possible;
(2) optimal positioning criterion: the position of the detected edge point is closest to the position of the actual edge point, or the degree that the detected edge deviates from the real edge of the object due to the influence of noise is minimum;
(3) the detection points correspond to the edge points one by one: the edge points detected by the operator should have a onetoone correspondence with the actual edge points.
The Canny edge algorithm may include the following steps:
(1) finding intensity gradients (intensity gradients) of the image;
(2) applying a non-maximum suppression technique to eliminate spurious edge responses (points detected as edges that are not actually edges);
(3) applying a dual threshold approach to determine possible (potential) boundaries;
(4) the boundaries are tracked using a hysteresis technique.
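The dual-threshold step (3) can be sketched as a per-pixel classification (threshold values here are illustrative assumptions; weak pixels are kept only if the hysteresis step links them to a strong edge):

```python
def classify_gradient(mag, low, high):
    """Canny dual-threshold classification of one gradient magnitude:
    'strong' edges are certain, 'weak' edges are kept only via hysteresis
    tracking, and everything below `low` is suppressed."""
    if mag >= high:
        return "strong"
    if mag >= low:
        return "weak"
    return "suppressed"

print([classify_gradient(m, 50, 150) for m in (200, 100, 20)])
# ['strong', 'weak', 'suppressed']
```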
And preprocessing the text image to be corrected in the above mode, and starting to acquire first inclination information.
It can be understood that, through the above preprocessing of the text image, the points at the upper left corner and the lower right corner of the region where the text line is located can be identified and positioned through the identification of the image.
OCR (Optical Character Recognition) refers to a process in which an electronic device (e.g., a scanner or a digital camera) examines characters printed on paper, determines their shapes by detecting dark and light patterns, and then translates the shapes into computer text by a character recognition method. For printed characters, the characters in a paper document are optically converted into an image file of a black-and-white dot matrix, and the characters in the image are converted into a text format through recognition software for further editing and processing by word processing software.
AOCR (Attention OCR) is an algorithm for recognizing a single line of text using an attention mechanism. It generally takes CNN (Convolutional Neural Network) features as input, and uses an attention model over the attention weights of the current and previous states of an RNN (Recurrent Neural Network) to calculate the attention weight of the new state. The CNN features and the weight are then input into the RNN, and a result is obtained through encoding and decoding.
Step 12: and calculating a formula coordinate set and a formula boundary set of the text lines according to the text positioning information and the attention information.
Step 12 may specifically include two aspects, that is, first, a formula boundary set is obtained; second, a formula coordinate set is calculated.
Referring to fig. 3, fig. 3 is a schematic flowchart of acquiring a boundary set according to an embodiment of the present application, where the method includes:
step 31: acquiring an attention information vector of a target special character in a target text line according to the attention information; wherein, the special characters are characters in a formula.
Optionally, the extracted special characters may be encoded to obtain encoding characteristics; calculating a prediction probability for the coding features; and calculating weights of different encoding characteristics by using an attention mechanism to obtain an encoded attention information vector.
Step 32: and judging whether the index value corresponding to the maximum value in the attention information vector is 0 or not.
The attention information vector is indexed 0, 1, 2, …; if the index value corresponding to the maximum value is 0, the maximum lies at the head of the vector, which further indicates that the special character is located at the head of the text line.
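The check in step 32 can be sketched as follows (the attention values are made-up illustration data):

```python
def is_line_head(attention_vector):
    """True when the index of the maximum attention value is 0, i.e. the
    special character sits at the head of its text line (step 32's test)."""
    aidx = max(range(len(attention_vector)), key=attention_vector.__getitem__)
    return aidx == 0

print(is_line_head([0.7, 0.2, 0.1]))  # True  -> add adjacent-formula info
print(is_line_head([0.1, 0.8, 0.1]))  # False
```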
If the determination result in step 32 is yes, step 33 is executed.
Step 33: the formula information adjacent to the target special character is added to the formula boundary set.
The adjacent formula information indicates that the special character is located at the head of the text line, and that the tail of the previous text line and the head of this text line may together form one formula.
Referring to fig. 4, fig. 4 is a schematic flowchart of obtaining a formula set according to an embodiment of the present application, where the method includes:
step 41: and calculating the width value of the target text line according to the text positioning information of the target text line.
Alternatively, the width value of the target text line may be calculated using the following formula: w_m = ⌈w / h⌉, where w is the coordinate width of the target text line and h is the height of the target text line. The computed w_m is rounded up; e.g., if the computed value of w_m is 1.5, it takes the value 2.
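The original formula image did not survive extraction; the sketch below implements the reading implied by the surrounding definitions — the line width measured in character-height units, rounded up — which is an assumption on our part:

```python
import math

def normalized_width(w, h):
    """Normalized width of a text line: w_m = ceil(w / h), i.e. the line
    width in character-height units, rounded up (1.5 -> 2 as described)."""
    return math.ceil(w / h)

print(normalized_width(300, 200))  # 2  (300/200 = 1.5, rounded up)
```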
Step 42: and calculating the abscissa value of the target special character according to the width value of the target text line and the index value corresponding to the maximum value in the attention information vector.
Alternatively, the abscissa value of the target special character may be calculated using the following formula: C_w = ⌈(a_idx / w_m) · w⌉, where w is the width of the target text line, a_idx is the index value corresponding to the maximum value in the attention information vector, and w_m is the normalized width value of the target text line. The computed C_w is rounded up.
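Since the exact formula image is lost, the sketch below implements one plausible reading of the definitions around it — the peak attention index, scaled from the normalized width back to pixel width — and should be treated as a hypothetical reconstruction:

```python
import math

def char_abscissa(w, aidx, w_m):
    """Abscissa of the character whose attention peaks at index aidx:
    scale the index from normalized-width units (w_m) to pixel width (w),
    rounding up. Reconstruction assumed from the surrounding text."""
    return math.ceil(aidx / w_m * w)

print(char_abscissa(400, 5, 20))  # 100: index 5 of 20 cells over 400 px
```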
Step 43: and determining the maximum abscissa value and the minimum abscissa value according to the abscissa value of the target special character.
Optionally, referring to fig. 5, fig. 5 is a schematic flowchart of determining a maximum abscissa and a minimum abscissa in an embodiment of the present application, where the method includes:
step 431: and determining an initial maximum abscissa value and an initial minimum abscissa value according to the initial abscissa value of the target special character.
The abscissa value of the target special character is the value C_w calculated in step 42 above; here a maximum abscissa value W_max and a minimum abscissa value W_min are set.
Alternatively, the initial maximum and minimum abscissa values may be determined by traversing the initial abscissa values of the target special characters obtained over one text line.
Step 432: and when a new abscissa value of the target special character is obtained, comparing the new abscissa value with the maximum and minimum abscissa values.
Step 433: and if the new abscissa value is smaller than the minimum abscissa value, updating the minimum abscissa value.
If C_w is smaller than W_min, W_min is updated; optionally, W_min may be replaced with the value of C_w.
Step 434: and if the new abscissa value is larger than the maximum abscissa value, updating the maximum abscissa value.
If C_w is larger than W_max, W_max is updated; optionally, W_max may be replaced with the value of C_w.
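Steps 431–434 together can be sketched as a running minimum/maximum over the per-character abscissas (the input values are illustrative):

```python
def track_extremes(abscissas):
    """Maintain running minimum and maximum abscissa values (W_min, W_max)
    while new C_w values arrive for a text line (steps 431-434)."""
    w_min = w_max = abscissas[0]   # step 431: initialize from the first C_w
    for c_w in abscissas[1:]:
        if c_w < w_min:
            w_min = c_w            # step 433: new value is smaller
        if c_w > w_max:
            w_max = c_w            # step 434: new value is larger
    return w_min, w_max

print(track_extremes([120, 40, 310, 95]))  # (40, 310)
```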
Step 44: and calculating a formula coordinate set according to the text positioning information, the maximum abscissa value and the minimum abscissa value.
Alternatively, the formula coordinate set may be calculated using the following formula: x_1 = b_i0 + w_min, x_2 = b_i0 + w_max, y_1 = b_i2, y_2 = b_i3, where x_1 is the left abscissa of the formula, x_2 is the right abscissa of the formula, y_1 is the upper ordinate of the formula, y_2 is the lower ordinate of the formula; b_i0 is the left abscissa in the text positioning information, b_i1 is the right abscissa in the text positioning information, b_i2 is the upper ordinate in the text positioning information, b_i3 is the lower ordinate in the text positioning information; w_min is the minimum abscissa value and w_max is the maximum abscissa value.
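The coordinate-set formula image is also lost; the sketch below encodes one plausible reconstruction from the variable definitions around it (horizontal extent = line's left edge offset by the min/max character abscissas, vertical extent = the line's own ordinates) and is not guaranteed to be the patent's exact formula:

```python
def formula_box(b_i0, b_i2, b_i3, w_min, w_max):
    """Hypothetical reconstruction of the formula's bounding box from the
    text line's positioning info and the character-abscissa extremes."""
    x1 = b_i0 + w_min   # left abscissa of the formula
    x2 = b_i0 + w_max   # right abscissa of the formula
    y1 = b_i2           # upper ordinate: the text line's own
    y2 = b_i3           # lower ordinate: the text line's own
    return (x1, y1, x2, y2)

print(formula_box(10, 5, 40, 30, 200))  # (40, 5, 210, 40)
```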
Step 13: and calculating the positioning coordinates of the formulas in the text lines according to the formula coordinate set and the formula boundary set.
Referring to fig. 6, fig. 6 is a schematic diagram of a second location coordinate of a text provided in an embodiment of the present application, and it can be understood that, in some embodiments, a formula to be located may not be in the same text line, for example, a previous part of the formula is in a previous text line, and a next part of the formula is in a next text line. As shown in fig. 6, "exemplary" in "exemplary text" is on the upper line, and "text" is on the lower line.
Optionally, as shown in fig. 7, fig. 7 is a schematic flowchart of calculating a formulabased positioning coordinate in an embodiment of the present application, where the method includes:
step 71: and judging whether the text of the previous line of the target line of text is in the formula boundary set.
If the determination result in step 71 is yes, step 72 is executed.
It is understood that, through the determination process of step 71, it is known whether a formula located at the head or tail of a line exists in the previous line of text, and thus whether it may belong to the same formula as the one in the current line.
Step 72: and judging whether the target line text has adjacent formula information.
Wherein the adjacent formula information is the adjacent formula information added in the above step 33.
If the determination result in step 72 is yes, step 73 is executed.
Step 73: and fusing the first formula coordinate in the target line text with the last formula coordinate in the previous line text.
As shown in FIG. 6, the upper left coordinate of "exemplary" in "exemplary text" is C(x_3, y_3) and the lower right coordinate is D(x_4, y_4); the upper left coordinate of "text" is E(x_5, y_5) and the lower right coordinate is F(x_6, y_6). The coordinates of the whole formula can then be obtained by combining these coordinates.
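One simple interpretation of "fusing" the two coordinates (step 73) is taking the bounding box of both pieces; this is our reading, the patent does not spell out the combination rule:

```python
def fuse_boxes(box_a, box_b):
    """Fuse the last formula box of the previous line with the first formula
    box of the current line by taking their common bounding box.
    Boxes are (x1, y1, x2, y2) with (x1, y1) the top-left corner."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    return (min(ax1, bx1), min(ay1, by1), max(ax2, bx2), max(ay2, by2))

# "exemplary" on the upper line, "text" on the lower line:
print(fuse_boxes((100, 10, 300, 40), (20, 50, 120, 80)))  # (20, 10, 300, 80)
```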
In addition, in the process of coordinate calculation, the ordinate may be updated, specifically: acquiring a binary image of a target formula area according to the formula coordinate set; carrying out longitudinal coordinate projection on the binary image to obtain a longitudinal coordinate of the target formula; and updating the formula coordinate set by adopting the ordinate of the target formula.
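The ordinate-projection update can be sketched as follows (assuming 255 marks foreground ink in the binary crop, consistent with the reverse-color step earlier):

```python
def refine_ordinates(binary):
    """Project a binary formula crop onto the vertical axis: a row counts as
    'ink' when it contains any foreground (255) pixels. The first and last
    ink rows give the refined top and bottom ordinates of the formula."""
    ink_rows = [y for y, row in enumerate(binary) if any(px == 255 for px in row)]
    return (ink_rows[0], ink_rows[-1]) if ink_rows else None

crop = [[0, 0, 0],
        [0, 255, 0],
        [255, 255, 0],
        [0, 0, 0]]
print(refine_ordinates(crop))  # (1, 2): formula occupies rows 1..2
```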
Different from the prior art, the formula positioning method for text images provided by the embodiment includes: acquiring text positioning information and attention information of a text line; calculating a formula coordinate set and a formula boundary set of the text line according to the text positioning information and the attention information; and calculating the positioning coordinates of the formulas in the text lines according to the formula coordinate set and the formula boundary set. By the mode, the formula in the text image can be positioned by using the attention information, so that a foundation is laid for subsequent formula identification, and the image of the formula can be accurately obtained.
The above embodiments are described below in several detail steps:
referring to fig. 8, fig. 8 is a logic diagram of a formula positioning method for text images according to an embodiment of the present application, where the method includes:
step 81: inputting a text image S containing a formula, a positioning coordinate information set B of each line of text, an identification information set T of AOCR on each line of text, and an attention information set A of each line of text.
Step 82 a: and acquiring the ith row of text information ti from T.
Step 82 b: and carrying out binarization on the image S to obtain a binarized image St.
Step 82a and step 82b may be executed simultaneously or sequentially.
Step 83: and judging whether the ti text characters have the mathematical keywords or not. If yes, go to step 84, otherwise go back to step 82 a.
Step 84: the corresponding attention information ai is obtained from a.
Step 85: and obtaining a formula coordinate set AB, a formula boundary set FB and a corresponding number k of the ith line of text according to the ti and the ai and the corresponding text positioning coordinate information bi.
Step 86: and calculating the formula positioning coordinate of the line by using AB, FB, k and St.
Step 87: all formula sets FBL are output.
Referring to fig. 9, fig. 9 is a schematic logic diagram for obtaining a formula location coordinate set according to an embodiment of the present application, where the method includes:
step 901: find all mathematical key characters of ti.
Step 902: taking each mathematical key character as a center, search all non-Chinese characters to its left and right, and acquire the corresponding index set FS.
Step 903: calculate the width w and height h of the line of text, normalize the width to w_m, and set a minimum abscissa value W_min and a maximum abscissa value W_max.
Step 904: traverse FS to obtain each index fs, and extract from ai the attention information vector a corresponding to fs.
Step 905: obtain the index aidx corresponding to the maximum value in a, and calculate the abscissa value C_w.
Step 906: query whether aidx is the first index (i.e., 0). If yes, go to step 907; otherwise, go to step 908.
Step 907: add the adjacent formula information to the corresponding position in FB, and record the number k corresponding to the i-th line of text.
Step 908: judge whether C_w is smaller than W_min. If yes, go to step 909; otherwise, go to step 910.
Step 909: update W_min with C_w.
Step 910: judge whether C_w is larger than W_max. If yes, go to step 911; otherwise, go to step 912.
Step 911: update W_max with C_w.
Step 912: return to step 904 until FS is fully processed.
Step 913: calculate the current formula coordinates and add them to the formula coordinate set AB.
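The per-character scan of steps 904-912 can be sketched as follows. The patent gives the abscissa formula for C_w only as an image, so the linear mapping from the attention argmax aidx to a pixel abscissa used here is an assumption for illustration:

```python
def formula_span(attention_vectors, line_width):
    """Steps 904-912: track the minimum and maximum abscissa W_min/W_max
    over the special characters' attention vectors.
    attention_vectors: one attention vector per special character (set FS)."""
    w_min, w_max = float("inf"), float("-inf")
    n = max(len(a) for a in attention_vectors)  # assumed attention length
    for a in attention_vectors:                 # step 904: traverse FS
        aidx = max(range(len(a)), key=a.__getitem__)  # step 905: argmax index
        c_w = line_width * aidx / n                   # assumed C_w mapping
        if c_w < w_min:    # steps 908-909
            w_min = c_w
        if c_w > w_max:    # steps 910-911
            w_max = c_w
    return w_min, w_max
```

The two running extrema then bound the formula region horizontally, which is what step 913 turns into a coordinate entry of AB.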
Referring to fig. 10, fig. 10 is a logic diagram of formula coordinate combination provided in the embodiment of the present application, where the method includes:
step 101: acquire the j-th formula coordinate from AB, and crop the temporary binary image tt from St.
Step 102: project tt along the vertical axis using a projection method to obtain the actual ordinate, and update the ordinate of the j-th formula.
Step 103: return to step 101 until all formula coordinates are processed, then execute the next step.
Step 104: judge whether an entry corresponding to the number k-1 exists in FB. If yes, go to step 105; otherwise, go to step 107.
Step 105: determine whether the k-th fb has adjacent formula information. If yes, go to step 106; otherwise, go to step 107.
Step 106: fuse the last formula coordinate in FBL and the first formula coordinate of the current AB into a new formula coordinate, replace the last formula coordinate in FBL with it, and add the remaining formulas in AB to FBL.
Step 107: add the current corresponding formula coordinate set AB to the formula set FBL.
Step 108: output the formula coordinate set FBL of the line of text.
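Steps 104-107 (merging a formula that straddles a line break) can be sketched as follows. Treating "fusion" as a bounding-box union is an assumption here, since the patent does not spell out the fusion rule:

```python
def merge_lines(FBL, AB, prev_line_in_fb, first_box_adjacent):
    """Steps 104-107. FBL: formula boxes accumulated so far; AB: the current
    line's boxes, each (x1, y1, x2, y2). If the previous line appears in the
    formula boundary set and the current line's first box carries
    adjacent-formula information, fuse the straddling boxes; else append."""
    if prev_line_in_fb and first_box_adjacent and FBL and AB:
        last, first = FBL.pop(), AB[0]
        fused = (min(last[0], first[0]), min(last[1], first[1]),
                 max(last[2], first[2]), max(last[3], first[3]))  # box union (assumed)
        FBL.append(fused)      # step 106: replace the last coordinate in FBL
        FBL.extend(AB[1:])     # add the remaining formulas of AB
    else:
        FBL.extend(AB)         # step 107
    return FBL
```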
It is to be understood that the above logic steps are based on the foregoing embodiments; the principles and calculation methods are similar and are not repeated here.
Referring to fig. 11, fig. 11 is a schematic diagram of a first structure of an image processing apparatus according to an embodiment of the present disclosure, where the image processing apparatus 110 includes an obtaining module 111, a first calculating module 112, and a second calculating module 113.
The obtaining module 111 is configured to obtain text positioning information and attention information of a text line; the first calculating module 112 is configured to calculate a formula coordinate set and a formula boundary set of the text line according to the text positioning information and the attention information; the second calculating module 113 is configured to calculate the location coordinates of the formula in the text line according to the formula coordinate set and the formula boundary set.
Referring to fig. 12, fig. 12 is a schematic diagram of a second structure of the image processing apparatus according to the embodiment of the present application, where the image processing apparatus 120 includes a processor 121 and a memory 122, the memory 122 is used for storing program data, and the processor 121 is used for executing the program data to implement the following method:
acquiring text positioning information and attention information of a text line; calculating a formula coordinate set and a formula boundary set of the text line according to the text positioning information and the attention information; and calculating the positioning coordinates of the formulas in the text lines according to the formula coordinate set and the formula boundary set.
Optionally, the processor 121 is further configured to execute the program data to implement the following method: acquiring an attention information vector of a target special character in a target text line according to the attention information, where the special characters are characters in a formula; judging whether the index value corresponding to the maximum value in the attention information vector is 0; and if so, adding the adjacent formula information of the target special character into the formula boundary set.
Optionally, the processor 121 is further configured to execute the program data to implement the following method: calculating the width value of the target text line according to the text positioning information of the target text line; calculating an abscissa value of the target special character according to the width value of the target text line and the index value corresponding to the maximum value in the attention information vector; determining a maximum abscissa value and a minimum abscissa value according to the abscissa value of the target special character; and calculating a formula coordinate set according to the text positioning information, the maximum abscissa value and the minimum abscissa value.
Optionally, the processor 121 is further configured to execute the program data to implement the following method: calculating the normalized width value of the target text line using the following formula, where w is the coordinate width of the target text line and h is the height of the target text line.
Optionally, the processor 121 is further configured to execute the program data to implement the following method: calculating the abscissa value of the target special character using the following formula, where w is the width of the target text line, aidx is the index value corresponding to the maximum value in the attention information vector, and w_m is the normalized width value of the target text line.
Optionally, the processor 121 is further configured to execute the program data to implement the following method: determining an initial maximum abscissa value and an initial minimum abscissa value according to the initial abscissa value of the target special character; when a new abscissa value of the target special character is obtained, comparing the new abscissa value with the maximum and minimum abscissa values; if the new abscissa value is smaller than the minimum abscissa value, updating the minimum abscissa value; and if the new abscissa value is larger than the maximum abscissa value, updating the maximum abscissa value.
Optionally, the processor 121 is further configured to execute the program data to implement the following method: calculating the formula coordinate set using the following formula, where x_1 is the left abscissa value of the formula, x_2 is the right abscissa value of the formula, y_1 is the upper ordinate value of the formula, y_2 is the lower ordinate value of the formula, b_i0 is the left abscissa value in the text positioning information, b_i2 is the upper ordinate value in the text positioning information, b_i3 is the lower ordinate value in the text positioning information, w_min is the minimum abscissa value, and w_max is the maximum abscissa value.
Optionally, the processor 121 is further configured to execute the program data to implement the following method: judging whether the previous line of text of the target line of text is in the formula boundary set; if yes, judging whether the target line of text has adjacent formula information; and if so, fusing the first formula coordinate in the target line of text with the last formula coordinate in the previous line of text.
Optionally, the processor 121 is further configured to execute the program data to implement the following method: acquiring a binary image of a target formula area according to the formula coordinate set; performing ordinate projection on the binary image to obtain the ordinate of the target formula; and updating the formula coordinate set with the ordinate of the target formula.
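The ordinate refinement described in the last paragraph can be sketched as follows — a pure-Python stand-in over a binary crop of the formula region (a real implementation would crop the binarized page image St at the formula's coordinates):

```python
def refine_ordinate(binary_crop):
    """Project a binary crop onto the vertical axis and return the
    (y_top, y_bottom) of the rows that actually contain ink; these become
    the formula's updated ordinate values.
    binary_crop: list of pixel rows, each a list of 0/1 values (1 = ink)."""
    row_sums = [sum(row) for row in binary_crop]       # ordinate projection
    inked = [y for y, s in enumerate(row_sums) if s > 0]
    if not inked:
        return None                                    # blank crop: keep old box
    return inked[0], inked[-1]
```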
Referring to fig. 13, fig. 13 is a schematic structural diagram of a computer storage medium according to an embodiment of the present application, the computer storage medium 130 stores program data 131, and the program data 131 is executed by a processor to implement the following method:
acquiring text positioning information and attention information of a text line; calculating a formula coordinate set and a formula boundary set of the text line according to the text positioning information and the attention information; and calculating the positioning coordinates of the formulas in the text lines according to the formula coordinate set and the formula boundary set.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other manners. For example, the above-described device embodiments are merely illustrative; the division into modules or units is only one logical division, and other divisions may be used in practice. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
If the integrated units in the above embodiments are implemented in the form of software functional units and sold or used as separate products, they may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part contributing over the prior art, or in whole or in part, may be embodied in a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the purpose of illustrating embodiments of the present application and is not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made according to the content of the present specification and the accompanying drawings, or which are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.
Claims (11)
1. A formula positioning method of a text image is characterized by comprising the following steps:
acquiring text positioning information and attention information of a text line; wherein the attention information is obtained by identifying the text line by using an attention mechanism;
calculating a formula coordinate set and a formula boundary set of the text line according to the text positioning information and the attention information;
calculating the positioning coordinates of the formulas in the text lines according to the formula coordinate set and the formula boundary set;
the step of calculating a formula coordinate set and a formula boundary set of the text line according to the text positioning information and the attention information includes:
acquiring an attention information vector of a target special character in a target text line according to the attention information; wherein the special characters are characters in a formula, and the attention information vector is obtained by encoding the target special character to obtain encoding features and calculating the weight of the encoding features according to an attention mechanism;
judging whether an index value corresponding to the maximum value in the attention information vector is 0 or not;
and if so, adding the adjacent formula information of the target special character into a formula boundary set.
2. The method of claim 1,
the method further comprises the following steps:
calculating the width value of the target text line according to the text positioning information of the target text line;
calculating an abscissa value of the target special character according to the width value of the target text line and an index value corresponding to the maximum value in the attention information vector;
determining a maximum abscissa value and a minimum abscissa value according to the abscissa value of the target special character;
and calculating a formula coordinate set according to the text positioning information, the maximum abscissa value and the minimum abscissa value.
3. The method of claim 2,
the step of calculating the width value of the target text line according to the text positioning information of the target text line comprises the following steps:
calculating the normalized width value of the target text line using the following formula:
wherein w is the coordinate width of the target text line, and h is the height of the target text line.
4. The method of claim 2,
the step of calculating the abscissa value of the target special character according to the width value of the target text line and the index value corresponding to the maximum value in the attention information vector includes:
calculating the abscissa value of the target special character by using the following formula:
wherein w is the width of the target text line, aidx is the index value corresponding to the maximum value in the attention information vector, and w_m is the normalized width value of the target text line.
5. The method of claim 2,
the step of determining a maximum abscissa value and a minimum abscissa value according to the abscissa value of the target special character comprises:
determining an initial maximum abscissa value and an initial minimum abscissa value according to the initial abscissa value of the target special character;
when a new abscissa value of the target special character is obtained, comparing the new abscissa value with the maximum and minimum abscissa values;
if the new abscissa value is smaller than the minimum abscissa value, updating the minimum abscissa value;
and if the new abscissa value is larger than the maximum abscissa value, updating the maximum abscissa value.
6. The method of claim 2,
the step of calculating a formula coordinate set according to the text positioning information, the maximum abscissa value and the minimum abscissa value includes:
the formula coordinate set is calculated using the following formula:
wherein x_1 is the left abscissa value of the formula, x_2 is the right abscissa value of the formula, y_1 is the upper ordinate value of the formula, y_2 is the lower ordinate value of the formula, b_i0 is the left abscissa value in the positioning information, b_i2 is the upper ordinate value in the positioning information, b_i3 is the lower ordinate value in the text positioning information, w_min is the minimum abscissa value, and w_max is the maximum abscissa value.
7. The method of claim 1,
the step of calculating the location coordinates of the formula in the text line according to the formula coordinate set and the formula boundary set includes:
judging whether the previous line of text of the target line of text is in the formula boundary set or not;
if yes, judging whether the target line text has adjacent formula information;
and if so, fusing the first formula coordinate in the target line text with the last formula coordinate in the previous line text.
8. The method of claim 1,
the method further comprises the following steps:
acquiring a binary image of a target formula area according to the formula coordinate set;
carrying out longitudinal coordinate projection on the binary image to obtain a longitudinal coordinate of the target formula;
and updating the formula coordinate set by adopting the ordinate of the target formula.
9. An image processing apparatus characterized by comprising:
the acquisition module is used for acquiring text positioning information and attention information of a text line;
the first calculation module is used for calculating a formula coordinate set and a formula boundary set of the text line according to the text positioning information and the attention information;
the second calculation module is used for calculating the positioning coordinates of the formula in the text line according to the formula coordinate set and the formula boundary set; wherein the attention information is obtained by identifying the text line by using an attention mechanism;
the step of calculating a formula coordinate set and a formula boundary set of the text line according to the text positioning information and the attention information includes:
acquiring an attention information vector of a target special character in a target text line according to the attention information; wherein the special characters are characters in a formula, and the attention information vector is obtained by encoding the target special character to obtain encoding features and calculating the weight of the encoding features according to an attention mechanism;
judging whether an index value corresponding to the maximum value in the attention information vector is 0 or not;
and if so, adding the adjacent formula information of the target special character into a formula boundary set.
10. An image processing apparatus, characterized in that the image processing apparatus comprises a processor and a memory for storing program data, the processor being adapted to execute the program data to implement the method according to any one of claims 1-8.
11. A computer storage medium for storing program data, which when executed by a processor is adapted to carry out the method of any one of claims 1 to 8.
Priority Applications (1)
Application Number  Priority Date  Filing Date  Title 

CN201910452711.5A CN110210467B (en)  2019-05-28  2019-05-28  Formula positioning method of text image, image processing device and storage medium 
Publications (2)
Publication Number  Publication Date 

CN110210467A CN110210467A (en)  20190906 
CN110210467B true CN110210467B (en)  20210730 
Family
ID=67789041
Family Applications (1)
Application Number  Title  Priority Date  Filing Date 

CN201910452711.5A Active CN110210467B (en)  2019-05-28  2019-05-28  Formula positioning method of text image, image processing device and storage medium 
Country Status (1)
Country  Link 

CN (1)  CN110210467B (en) 
Families Citing this family (1)
Publication number  Priority date  Publication date  Assignee  Title 

CN112699337B (en) *  2019-10-22  2022-07-29  北京易真学思教育科技有限公司  Equation correction method, electronic device and computer storage medium 
Citations (9)
Publication number  Priority date  Publication date  Assignee  Title 

CN104751148A (en) *  2015-04-16  2015-07-01  同方知网数字出版技术股份有限公司  Method for recognizing scientific formulas in layout files 
CN105913057A (en) *  2016-04-12  2016-08-31  中国传媒大学  Projection and structure feature-based method for detecting mathematical formulas in images 
CN107169485A (en) *  2017-03-28  2017-09-15  北京捷通华声科技股份有限公司  Mathematical formula recognition method and device 
CN107798321A (en) *  2017-12-04  2018-03-13  海南云江科技有限公司  Examination paper analysis method and computing device 
CN108399386A (en) *  2018-02-26  2018-08-14  阿博茨德（北京）科技有限公司  Method and device for extracting information from pie charts 
CN109241861A (en) *  2018-08-14  2019-01-18  科大讯飞股份有限公司  Mathematical formula recognition method, device, equipment and storage medium 
CN109471583A (en) *  2014-03-20  2019-03-15  卡西欧计算机株式会社  Electronic device, mathematical expression display control method and recording medium 
CN109614944A (en) *  2018-12-17  2019-04-12  科大讯飞股份有限公司  Mathematical formula recognition method, device, equipment and readable storage medium 
CN111340020A (en) *  2019-12-12  2020-06-26  科大讯飞股份有限公司  Formula recognition method, device, equipment and storage medium 
Family Cites Families (1)
Publication number  Priority date  Publication date  Assignee  Title 

US9078093B2 (en) *  2011-10-19  2015-07-07  Electronics And Telecommunications Research Institute  Apparatus and method for recognizing target mobile communication terminal 

2019
 2019-05-28 CN CN201910452711.5A patent/CN110210467B/en active Active
Non-Patent Citations (3)
Title 

"Attentionbased Extraction of Structured Information from Street View Imagery";Zbigniew Wojna， Alex Gorban et.al.;《arXiv》;20170820;第17页 * 
"Embedding a Mathematical OCR Module into OCRopus";Shinpei Yamazaki, Fumihiro Furukori et.al.;《2011 International Conference on Document Analysis and Recognition》;20111231;第880884页 * 
"中文电子文档的数学公式定位研究";林晓燕，高良才，汤帜;《北京大学学报(自然科学版)》;20140131;第50卷(第1期);第1724 * 
Also Published As
Publication number  Publication date 

CN110210467A (en)  2019-09-06 
Similar Documents
Publication  Publication Date  Title 

US11302109B2 (en)  Range and/or polaritybased thresholding for improved data extraction  
US9754164B2 (en)  Systems and methods for classifying objects in digital images captured using mobile devices  
US9769354B2 (en)  Systems and methods of processing scanned data  
KR100339691B1 (en)  Apparatus for recognizing code and method therefor  
CN108345880B (en)  Invoice identification method and device, computer equipment and storage medium  
US20070253040A1 (en)  Color scanning to enhance bitonal image  
EP2974261A2 (en)  Systems and methods for classifying objects in digital images captured using mobile devices  
US8331670B2 (en)  Method of detection document alteration by comparing characters using shape features of characters  
CN110046529B (en)  Twodimensional code identification method, device and equipment  
US9235779B2 (en)  Method and apparatus for recognizing a character based on a photographed image  
JP2013042415A (en)  Image processing apparatus, image processing method, and computer program  
US9171224B2 (en)  Method of improving contrast for text extraction and recognition applications  
US9626601B2 (en)  Identifying image transformations for improving optical character recognition quality  
CN110956171A (en)  Automatic nameplate identification method and device, computer equipment and storage medium  
US8848984B2 (en)  Dynamic thresholds for document tamper detection  
CN110210467B (en)  Formula positioning method of text image, image processing device and storage medium  
RU2603495C1 (en)  Classification of document images based on parameters of colour layers  
US8705134B2 (en)  Method of processing an image to clarify text in the image  
US11341739B2 (en)  Image processing device, image processing method, and program recording medium  
US20210064859A1 (en)  Image processing system, image processing method, and storage medium  
US20140086473A1 (en)  Image processing device, an image processing method and a program to be used to implement the image processing  
JP2010074342A (en)  Image processing apparatus, image forming apparatus, and program  
JP2008252877A (en)  Image processing method, image processing apparatus, image reading apparatus, image forming apparatus, computer program and computer readable recording medium  
CN110390643B (en)  License plate enhancement method and device and electronic equipment  
Konya et al.  Adaptive methods for robust document image understanding 
Legal Events
Date  Code  Title  Description 

PB01  Publication  
SE01  Entry into force of request for substantive examination  
GR01  Patent grant 