CN110210467B - Formula positioning method of text image, image processing device and storage medium - Google Patents

Formula positioning method of text image, image processing device and storage medium

Info

Publication number
CN110210467B
CN110210467B (application CN201910452711.5A)
Authority
CN
China
Prior art keywords
formula
text
value
target
information
Prior art date
Legal status
Active
Application number
CN201910452711.5A
Other languages
Chinese (zh)
Other versions
CN110210467A (en
Inventor
黄家冕
梁炎
王卫锋
Current Assignee
Guangzhou Huanju Mark Network Information Co ltd
Original Assignee
Guangzhou Huaduo Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Huaduo Network Technology Co Ltd filed Critical Guangzhou Huaduo Network Technology Co Ltd
Priority to CN201910452711.5A priority Critical patent/CN110210467B/en
Publication of CN110210467A publication Critical patent/CN110210467A/en
Application granted granted Critical
Publication of CN110210467B publication Critical patent/CN110210467B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06T5/70
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G06T7/13 Edge detection
    • G06T7/90 Determination of colour characteristics
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V10/225 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition based on a marking or identifier characterising the area

Abstract

The application discloses a formula positioning method for a text image, an image processing device, and a storage medium. The formula positioning method comprises: acquiring text positioning information and attention information of a text line; calculating a formula coordinate set and a formula boundary set for the text line according to the text positioning information and the attention information; and calculating the positioning coordinates of the formulas in the text line according to the formula coordinate set and the formula boundary set. In this way, a formula in a text image can be accurately positioned.

Description

Formula positioning method of text image, image processing device and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a formula positioning method for text images, an image processing apparatus, and a storage medium.
Background
With the development of mobile internet technology, handheld mobile terminals such as smartphones and tablet computers have entered our lives in large numbers and become an indispensable part of them. These handheld terminals all have a camera function, which makes it very convenient to capture document information at any time.
Scientific formulas, as a special information carrier, are widely found in text documents. In practical applications, formulas often need to be located and extracted, so how to locate the formulas in a text image has become an urgent problem to be solved.
Disclosure of Invention
In order to solve the above problems, the present application provides a formula positioning method for a text image, an image processing apparatus, and a storage medium, which can accurately position a formula in a text image.
The technical scheme adopted by the application is as follows: a formula positioning method of a text image is provided, and the method comprises the following steps: acquiring text positioning information and attention information of a text line; calculating a formula coordinate set and a formula boundary set of the text line according to the text positioning information and the attention information; and calculating the positioning coordinates of the formulas in the text lines according to the formula coordinate set and the formula boundary set.
The step of calculating the formula coordinate set and the formula boundary set of the text line according to the text positioning information and the attention information comprises the following steps: acquiring an attention information vector of a target special character in a target text line according to the attention information; wherein, the special characters are characters in a formula; judging whether an index value corresponding to the maximum value in the attention information vector is 0 or not; and if so, adding the adjacent formula information of the target special character into the formula boundary set.
Wherein, the method also comprises: calculating the width value of the target text line according to the text positioning information of the target text line; calculating an abscissa value of the target special character according to the width value of the target text line and the index value corresponding to the maximum value in the attention information vector; determining a maximum abscissa value and a minimum abscissa value according to the abscissa value of the target special character; and calculating a formula coordinate set according to the text positioning information, the maximum abscissa value and the minimum abscissa value.
The step of calculating the width value of the target text line according to the text positioning information of the target text line comprises: calculating the normalized width value of the target text line using the following formula:

w_m = ⌈w / h⌉

where w is the coordinate width of the target text line and h is the height of the target text line.
The step of calculating the abscissa value of the target special character according to the width value of the target text line and the index value corresponding to the maximum value in the attention information vector comprises: calculating the abscissa value of the target special character using the following formula:

C_w = ⌈(w / w_m) · aidx⌉

where w is the width of the target text line, aidx is the index value corresponding to the maximum value in the attention information vector, and w_m is the normalized width value of the target text line.
Wherein, according to the abscissa value of the target special character, the step of determining the maximum and minimum abscissa values includes: determining an initial maximum abscissa value and an initial minimum abscissa value according to the initial abscissa value of the target special character; when a new abscissa value of the target special character is obtained, comparing the new abscissa value with the maximum and minimum abscissa values; if the new abscissa value is smaller than the minimum abscissa value, updating the minimum abscissa value; and if the new abscissa value is larger than the maximum abscissa value, updating the maximum abscissa value.
Wherein, according to the text positioning information, the maximum abscissa value and the minimum abscissa value, the step of calculating the formula coordinate set comprises: calculating the formula coordinate set using the following formulas:

x_1 = b_i0 + w_min, x_2 = b_i0 + w_max, y_1 = b_i2, y_2 = b_i3

where x_1 is the left abscissa value of the formula, x_2 is the right abscissa value of the formula, y_1 is the upper ordinate value of the formula, y_2 is the lower ordinate value of the formula, b_i0 is the left abscissa value in the text positioning information, b_i2 is the upper ordinate value in the text positioning information, b_i3 is the lower ordinate value in the text positioning information, w_min is the minimum abscissa value, and w_max is the maximum abscissa value.
Wherein, according to the formula coordinate set and the formula boundary set, the step of calculating the positioning coordinate of the formula in the text line comprises the following steps: judging whether the previous line of text of the target line of text is in a formula boundary set or not; if yes, judging whether the target line text has adjacent formula information; and if so, fusing the first formula coordinate in the target line text with the last formula coordinate in the previous line text.
Wherein, the method also comprises: acquiring a binary image of a target formula area according to the formula coordinate set; carrying out longitudinal coordinate projection on the binary image to obtain a longitudinal coordinate of the target formula; and updating the formula coordinate set by adopting the ordinate of the target formula.
Another technical scheme adopted by the application is as follows: provided is an image processing apparatus including: the acquisition module is used for acquiring text positioning information and attention information of a text line; the first calculation module is used for calculating a formula coordinate set and a formula boundary set of the text line according to the text positioning information and the attention information; and the second calculation module is used for calculating the positioning coordinates of the formula in the text line according to the formula coordinate set and the formula boundary set.
Another technical scheme adopted by the application is as follows: there is provided an image processing apparatus comprising a processor and a memory for storing program data, the processor being arranged to execute the program data to implement the method as described above.
Another technical scheme adopted by the application is as follows: a computer storage medium is provided for storing program data for implementing the method as described above when executed by a processor.
The formula positioning method of the text image comprises: acquiring text positioning information and attention information of a text line; calculating a formula coordinate set and a formula boundary set for the text line according to the text positioning information and the attention information; and calculating the positioning coordinates of the formulas in the text line according to the formula coordinate set and the formula boundary set. In this way, the formula in the text image can be positioned using the attention information, which lays a foundation for subsequent formula recognition and allows the image of the formula to be accurately obtained.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description are only some embodiments of the present application; those skilled in the art can obtain other drawings based on these drawings without creative effort. Wherein:
FIG. 1 is a schematic flowchart of a formula positioning method for text images according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of first location coordinates of a text provided in an embodiment of the present application;
fig. 3 is a schematic flowchart of acquiring a boundary set according to an embodiment of the present application;
FIG. 4 is a schematic flow chart illustrating obtaining a formula set according to an embodiment of the present application;
FIG. 5 is a schematic flow chart illustrating the determination of the maximum abscissa and the minimum abscissa in the embodiment of the present application;
FIG. 6 is a schematic diagram of second location coordinates of a text provided in an embodiment of the present application;
FIG. 7 is a schematic flow chart illustrating calculation of formula positioning coordinates according to an embodiment of the present application;
FIG. 8 is a logic diagram of a formula positioning method for text images according to an embodiment of the present disclosure;
FIG. 9 is a logic diagram for obtaining a formula location coordinate set according to an embodiment of the present application;
FIG. 10 is a logic diagram of formula coordinate consolidation provided by an embodiment of the present application;
fig. 11 is a schematic diagram of a first structure of an image processing apparatus according to an embodiment of the present application;
fig. 12 is a second schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 13 is a schematic structural diagram of a computer storage medium provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be further noted that, for the convenience of description, only some of the structures related to the present application are shown in the drawings, not all of the structures. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first", "second", etc. in this application are used to distinguish between different objects and not to describe a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
Referring to fig. 1, fig. 1 is a schematic flowchart of a formula positioning method for a text image according to an embodiment of the present application, where the method includes:
step 11: text positioning information and attention information of a text line are acquired.
Text images, also called document images, are documents in image format: a paper document converted, in some manner, into an image-format document for electronic reading. Typical text image formats include JPG (JPEG), BMP, PNG, GIF, FSP, TIFF, TGA, EPS, and the like.
Alternatively, the text positioning information may be the positioning coordinates of the text. It will be appreciated that text is generally arranged line by line, and the positioning coordinates are generally the coordinates of the upper-left and lower-right points of the rectangular area in which a line of text is located.
As shown in fig. 2, fig. 2 is a schematic diagram of first location coordinates of a text provided in an embodiment of the present application, where A(x_1, y_1) is the coordinate point of the upper-left corner of the text line and B(x_2, y_2) is the coordinate point of the lower-right corner of the text line.
In a specific operation, the text image may first be subjected to grayscale processing.
Grayscale is the most direct visual feature describing the content of a grayscale image. It refers to the color depth of a point in a black-and-white image, generally ranging from 0 to 255, with white being 255 and black being 0; hence a black-and-white image is also called a grayscale image. Grayscale image matrix elements typically take values in [0, 255], so their data type is usually an 8-bit unsigned integer, giving 256 levels of gray. When a color image is converted into a grayscale image, the effective brightness value of each pixel needs to be calculated, using the formula: Y = 0.3R + 0.59G + 0.11B.
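As an illustrative sketch (not code from the patent), the weighted-sum conversion above can be written with NumPy as follows; the function name and the H x W x 3 array layout are assumptions:

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an H x W x 3 uint8 RGB image to grayscale using the
    luminance weights stated above: Y = 0.3R + 0.59G + 0.11B."""
    rgb = rgb.astype(np.float64)
    y = 0.3 * rgb[..., 0] + 0.59 * rgb[..., 1] + 0.11 * rgb[..., 2]
    # Round and clamp back into the 8-bit range [0, 255].
    return np.clip(np.round(y), 0, 255).astype(np.uint8)
```

Since the weights sum to 1, pure white maps to 255 and pure black to 0, as expected.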
Then, the text image is subjected to denoising processing.
Alternatively, the grayscale image may be smoothed using a Gaussian filtering algorithm. Gaussian filtering is a weighted-averaging process over the whole image: the value of each pixel is obtained as a weighted average of its own value and the values of the other pixels in its neighborhood. The specific operation is: scan each pixel in the image with a template (also called a convolution kernel or mask), and replace the value of the pixel at the template's center with the weighted average gray value of the pixels in the neighborhood determined by the template.
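A minimal NumPy sketch of the template scan just described, assuming a fixed 3x3 Gaussian kernel and edge padding; a production implementation would expose kernel size and sigma as parameters:

```python
import numpy as np

def gaussian_smooth(gray):
    """Smooth a 2-D grayscale image with a fixed 3x3 Gaussian template.
    Each output pixel is the weighted average of its 3x3 neighborhood."""
    kernel = np.array([[1, 2, 1],
                       [2, 4, 2],
                       [1, 2, 1]], dtype=np.float64) / 16.0  # weights sum to 1
    padded = np.pad(gray.astype(np.float64), 1, mode="edge")
    out = np.zeros(gray.shape, dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + gray.shape[0],
                                           dx:dx + gray.shape[1]]
    return out
```

Because the kernel weights sum to 1, a uniform image passes through unchanged.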
Next, binarization and color-inversion processing are performed on the text image.
Image Binarization (Image Binarization) is a process of setting the gray value of a pixel point on an Image to be 0 or 255, namely, the whole Image presents an obvious black-white effect.
The inverse of a color is the color that, superimposed on it, yields white; that is, it is obtained by subtracting the original color from white (RGB: 255, 255, 255). For example, the inverse of red (RGB: 255, 0, 0) is cyan (RGB: 0, 255, 255). Accordingly, in the binarized image obtained above, inversion changes the gray value 0 to 255 and the gray value 255 to 0.
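The binarization and inversion steps can be sketched together as follows; the global threshold of 128 is an assumption, since the patent does not specify the thresholding rule:

```python
import numpy as np

def binarize_and_invert(gray, threshold=128):
    """Binarize a grayscale image to {0, 255}, then invert it so that
    dark (text) pixels end up as 255. Threshold of 128 is illustrative."""
    binary = np.where(gray >= threshold, 255, 0).astype(np.uint8)
    return (255 - binary).astype(np.uint8)  # swap 0 <-> 255
```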
Finally, edge calculations may be performed on the text image.
Alternatively, the Canny edge algorithm can be used. Canny's aim was to find an optimal edge detection algorithm, where optimal means:
(1) optimal detection: the algorithm can identify actual edges in the image as much as possible, and the probability of missing detection of the actual edges and the probability of false detection of the non-edges are both as small as possible;
(2) optimal positioning criterion: the position of the detected edge point is closest to the position of the actual edge point, or the degree that the detected edge deviates from the real edge of the object due to the influence of noise is minimum;
(3) the detection points correspond to the edge points one by one: the edge points detected by the operator should have a one-to-one correspondence with the actual edge points.
The Canny edge algorithm may include the following steps:
(1) finding intensity gradients (intensity gradients) of the image;
(2) applying a non-maximum suppression technique to eliminate false edge detections (points reported as edges that are not actually edges);
(3) applying a dual threshold approach to determine possible (potential) boundaries;
(4) the boundaries are tracked using a hysteresis technique.
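As a hedged illustration of steps (3) and (4) above, the double-threshold classification and a one-pass hysteresis check can be sketched as follows; full Canny also computes gradients and applies non-maximum suppression first, and iterates the hysteresis tracking to convergence:

```python
import numpy as np

def double_threshold(grad_mag, low, high):
    """Step (3): classify gradient magnitudes into strong and weak
    edge pixels; everything below `low` is suppressed."""
    strong = grad_mag >= high
    weak = (grad_mag >= low) & ~strong
    return strong, weak

def hysteresis(strong, weak):
    """Step (4), one pass: keep a weak pixel only if one of its
    8-neighbours is a strong pixel."""
    padded = np.pad(strong, 1)  # pad with False
    neighbour = np.zeros_like(strong)
    h, w = strong.shape
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            neighbour |= padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
    return strong | (weak & neighbour)
```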
The text image is preprocessed in the above manner before subsequent information acquisition begins.
It can be understood that, through the above preprocessing of the text image, the points at the upper-left and lower-right corners of the region where a text line is located can be identified and positioned from the image.
OCR (Optical Character Recognition) refers to the process in which an electronic device (e.g., a scanner or digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then translates the shapes into computer text by character recognition methods; that is, printed characters in a paper document are converted optically into a black-and-white dot-matrix image file, and the characters in the image are converted into a text format by recognition software for further editing and processing by word-processing software.
AOCR (Attention OCR) is an algorithm that recognizes a single line of text using an attention mechanism. It generally takes CNN (Convolutional Neural Network) features as input and computes the attention weights of a new state from the attention weights of the current and previous states of an RNN (Recurrent Neural Network). The CNN features and the weights are then fed into the RNN, and the result is obtained through encoding and decoding.
Step 12: and calculating a formula coordinate set and a formula boundary set of the text lines according to the text positioning information and the attention information.
Step 12 may specifically include two aspects, that is, first, a formula boundary set is obtained; second, a formula coordinate set is calculated.
Referring to fig. 3, fig. 3 is a schematic flowchart of acquiring a boundary set according to an embodiment of the present application, where the method includes:
step 31: acquiring an attention information vector of a target special character in a target text line according to the attention information; wherein, the special characters are characters in a formula.
Optionally, the extracted special characters may be encoded to obtain encoding characteristics; calculating a prediction probability for the coding features; and calculating weights of different encoding characteristics by using an attention mechanism to obtain an encoded attention information vector.
Step 32: and judging whether the index value corresponding to the maximum value in the attention information vector is 0 or not.
The attention information vector is indexed 0, 1, 2, and so on. If the index value corresponding to the maximum value is 0, the maximum lies at the head of the vector, which further indicates that the special character is located at the head of the text line.
If the determination result in step 32 is yes, step 33 is executed.
Step 33: the formula information adjacent to the target special character is added to the formula boundary set.
The adjacent-formula information indicates that the special character is located at the head of the text line, and that the tail of the previous line of text and the head of this line may together form one formula.
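Steps 31 to 33 can be sketched as follows; the container names are illustrative, not from the patent:

```python
import numpy as np

def update_boundary_set(attention_vector, line_index, boundary_set):
    """If the largest attention weight sits at index 0, the special
    character is at the head of the line, so record adjacent-formula
    information for this line (steps 31-33)."""
    if int(np.argmax(attention_vector)) == 0:
        boundary_set.add(line_index)
    return boundary_set
```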
Referring to fig. 4, fig. 4 is a schematic flowchart of obtaining a formula set according to an embodiment of the present application, where the method includes:
step 41: and calculating the width value of the target text line according to the text positioning information of the target text line.
Alternatively, the width value of the target text line may be calculated using the following formula:

w_m = ⌈w / h⌉

where w is the coordinate width of the target text line and h is the height of the target text line. The result w_m is rounded up; for example, if w / h evaluates to 1.5, then w_m takes the value 2.
Step 42: and calculating the abscissa value of the target special character according to the width value of the target text line and the index value corresponding to the maximum value in the attention information vector.
Alternatively, the abscissa value of the target special character may be calculated using the following formula:

C_w = ⌈(w / w_m) · aidx⌉

where w is the width of the target text line, aidx is the index value corresponding to the maximum value in the attention information vector, and w_m is the normalized width value of the target text line. The result C_w is rounded up.
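Assuming the reconstructed formulas w_m = ⌈w / h⌉ and C_w = ⌈(w / w_m) · aidx⌉ (the patent's own equations are available only as images, so this reading is an assumption), steps 41 and 42 reduce to:

```python
import math

def normalized_width(w, h):
    """Step 41: w_m = ceil(w / h), the line width measured in
    character-height units, rounded up."""
    return math.ceil(w / h)

def abscissa(w, w_m, aidx):
    """Step 42: C_w = ceil((w / w_m) * aidx), the horizontal position of
    the character whose attention weight peaks at index aidx."""
    return math.ceil(w / w_m * aidx)
```

For a line of width 30 and height 20, w / h is 1.5, so w_m is 2, matching the rounding example in the text.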
Step 43: and determining the maximum abscissa value and the minimum abscissa value according to the abscissa value of the target special character.
Optionally, referring to fig. 5, fig. 5 is a schematic flowchart of determining a maximum abscissa and a minimum abscissa in an embodiment of the present application, where the method includes:
step 431: and determining an initial maximum abscissa value and an initial minimum abscissa value according to the initial abscissa value of the target special character.
The abscissa value of the target special character is the value C_w calculated in step 42 above; here a maximum abscissa value W_max and a minimum abscissa value W_min are set.
Alternatively, the initial maximum and minimum abscissa values may be determined by traversing the abscissa values of the target special characters obtained for one text line.
Step 432: and when a new abscissa value of the target special character is obtained, comparing the new abscissa value with the maximum and minimum abscissa values.
Step 433: and if the new abscissa value is smaller than the minimum abscissa value, updating the minimum abscissa value.
If C_w is smaller than W_min, W_min is updated; optionally, the value of W_min may be replaced by the value of C_w.
Step 434: and if the new abscissa value is larger than the maximum abscissa value, updating the maximum abscissa value.
If C_w is larger than W_max, W_max is updated; optionally, the value of W_max may be replaced by the value of C_w.
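Steps 432 to 434 amount to maintaining a running minimum and maximum; a minimal sketch:

```python
def update_extrema(c_w, w_min, w_max):
    """Fold a new abscissa C_w into the running minimum and maximum
    abscissa values (steps 432-434)."""
    return min(w_min, c_w), max(w_max, c_w)
```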
Step 44: and calculating a formula coordinate set according to the text positioning information, the maximum abscissa value and the minimum abscissa value.
Alternatively, the formula coordinate set may be calculated using the following formulas:

x_1 = b_i0 + w_min, x_2 = b_i0 + w_max, y_1 = b_i2, y_2 = b_i3

where x_1 is the left abscissa value of the formula, x_2 is the right abscissa value of the formula, y_1 is the upper ordinate value of the formula, y_2 is the lower ordinate value of the formula, b_i0 is the left abscissa value in the text positioning information, b_i1 is the right abscissa value in the text positioning information, b_i2 is the upper ordinate value in the text positioning information, b_i3 is the lower ordinate value in the text positioning information, w_min is the minimum abscissa value, and w_max is the maximum abscissa value.
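One plausible reading of the image-only coordinate formula (an assumption, since the original equation is not recoverable from the page) offsets the min/max abscissae by the line's left edge and reuses the line's vertical extent, with b_i taken as [left, right, top, bottom]:

```python
def formula_box(b_i, w_min, w_max):
    """Build the formula bounding box from the line's positioning
    coordinates b_i = [left, right, top, bottom] and the min/max
    abscissae of the special characters found in that line."""
    x1 = b_i[0] + w_min   # left edge of the formula
    x2 = b_i[0] + w_max   # right edge of the formula
    y1 = b_i[2]           # upper ordinate, inherited from the line
    y2 = b_i[3]           # lower ordinate, inherited from the line
    return x1, y1, x2, y2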
Step 13: and calculating the positioning coordinates of the formulas in the text lines according to the formula coordinate set and the formula boundary set.
Referring to fig. 6, fig. 6 is a schematic diagram of a second location coordinate of a text provided in an embodiment of the present application, and it can be understood that, in some embodiments, a formula to be located may not be in the same text line, for example, a previous part of the formula is in a previous text line, and a next part of the formula is in a next text line. As shown in fig. 6, "exemplary" in "exemplary text" is on the upper line, and "text" is on the lower line.
Optionally, as shown in fig. 7, fig. 7 is a schematic flowchart of calculating a formula-based positioning coordinate in an embodiment of the present application, where the method includes:
step 71: and judging whether the text of the previous line of the target line of text is in the formula boundary set.
If the determination result in step 71 is yes, step 72 is executed.
It is understood that, through the determination of step 71, it can be known whether a formula is located at the head or tail of the previous line of text, and thus whether it may belong to the same formula as one in the present line.
Step 72: and judging whether the target line text has adjacent formula information.
Wherein the adjacent formula information is the adjacent formula information added in the above step 33.
If the determination result in step 72 is yes, step 73 is executed.
Step 73: and fusing the first formula coordinate in the target line text with the last formula coordinate in the previous line text.
As shown in FIG. 6, the upper-left coordinate of "exemplary" in "exemplary text" is C(x_3, y_3) and its lower-right coordinate is D(x_4, y_4); the upper-left coordinate of "text" is E(x_5, y_5) and its lower-right coordinate is F(x_6, y_6). The coordinates of the whole formula can then be obtained by combining these coordinates.
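The coordinate fusion of step 73 can be sketched as taking the enclosing box of the two formula fragments; the (x1, y1, x2, y2) box layout is an assumption:

```python
def fuse_boxes(box_a, box_b):
    """Step 73: merge the last formula box of the previous line with the
    first formula box of the current line into one enclosing box.
    Boxes are (x1, y1, x2, y2) with y growing downward."""
    return (min(box_a[0], box_b[0]), min(box_a[1], box_b[1]),
            max(box_a[2], box_b[2]), max(box_a[3], box_b[3]))
```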
In addition, in the process of coordinate calculation, the ordinate may be updated, specifically: acquiring a binary image of a target formula area according to the formula coordinate set; carrying out longitudinal coordinate projection on the binary image to obtain a longitudinal coordinate of the target formula; and updating the formula coordinate set by adopting the ordinate of the target formula.
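The ordinate-projection update can be sketched as follows, assuming a binarized patch in which text pixels are 255 and background pixels are 0:

```python
import numpy as np

def refine_ordinates(binary_patch):
    """Project a binarized formula patch onto the vertical axis and
    return the first and last rows containing text; these replace the
    line-level ordinates in the formula coordinate set."""
    row_profile = binary_patch.sum(axis=1)          # per-row ink count
    rows = np.flatnonzero(row_profile > 0)
    if rows.size == 0:
        return None                                  # empty patch
    return int(rows[0]), int(rows[-1])
```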
Different from the prior art, the formula positioning method for text images provided by this embodiment comprises: acquiring text positioning information and attention information of a text line; calculating a formula coordinate set and a formula boundary set for the text line according to the text positioning information and the attention information; and calculating the positioning coordinates of the formulas in the text line according to the formula coordinate set and the formula boundary set. In this way, the formula in the text image can be positioned using the attention information, which lays a foundation for subsequent formula recognition and allows the image of the formula to be accurately obtained.
The above embodiments are described below in several detail steps:
referring to fig. 8, fig. 8 is a logic diagram of a formula positioning method for text images according to an embodiment of the present application, where the method includes:
step 81: inputting a text image S containing a formula, a positioning coordinate information set B of each line of text, an identification information set T of AOCR on each line of text, and an attention information set A of each line of text.
Step 82 a: and acquiring the ith row of text information ti from T.
Step 82 b: and carrying out binarization on the image S to obtain a binarized image St.
Step 82a and step 82b may be executed simultaneously or sequentially.
Step 83: and judging whether the ti text characters have the mathematical keywords or not. If yes, go to step 84, otherwise go back to step 82 a.
Step 84: the corresponding attention information ai is obtained from a.
Step 85: and obtaining a formula coordinate set AB, a formula boundary set FB and a corresponding number k of the ith line of text according to the ti and the ai and the corresponding text positioning coordinate information bi.
Step 86: and calculating the formula positioning coordinate of the line by using AB, FB, k and St.
Step 87: all formula sets FBL are output.
Referring to fig. 9, fig. 9 is a schematic logic diagram for obtaining a formula location coordinate set according to an embodiment of the present application, where the method includes:
step 901: all mathematical key characters for ti are sought.
Step 902: and taking the mathematical key character as a center, searching all non-Chinese characters to the left and the right, and acquiring a corresponding number set FS.
Step 903: calculate the width w and height h of the line of text and normalize the width to wmSetting a minimum abscissa value WminAnd the maximum abscissa value Wmax
Step 904: and traversing the FS to obtain a corresponding number FS, and extracting an attention information vector a of ai corresponding to FS.
Step 905: obtaining the index aidx corresponding to the maximum value in a, and calculating the abscissa value Cw
Step 906: it is queried whether aidx is first. If yes, go to step 907, otherwise go to step 908.
Step 907: and adding adjacent formula information into the FB corresponding position, and reserving the number k corresponding to the ith line of text.
Step 908: judgment CwWhether or not to compare WminIs small. If yes, go to step 909, otherwise go to step 910.
Step 909: by CwUpdating Wmin
Step 910: judgment CwWhether or not to compare WmaxIs large. If yes, go to step 911, otherwise go to step 912.
Step 911: by CwUpdating Wmax
Step 912: return to step 904 until the FS is processed.
Step 913: and calculating the current formula AB and adding the current formula AB into a formula coordinate set AB.
Referring to fig. 10, fig. 10 is a logic diagram of formula coordinate combination provided in the embodiment of the present application, where the method includes:
step 101: and acquiring the j-th formula coordinate from AB, and intercepting the temporary binary image tt from St.
Step 102: and (5) carrying out vertical coordinate projection on tt by using a projection method to obtain an actual vertical coordinate, and updating the vertical coordinate of the jth formula.
Step 103: and returning to the step 101 until the left and right formula coordinates are completely processed, and executing the next step.
Step 104: and judging whether the FB corresponding to the k-1 number exists. If yes, go to step 105, otherwise go to step 107.
Step 105: it is determined whether the kth fb has adjacent formula information. If yes, go to step 106, otherwise go to step 107.
Step 106: and fusing the last formula coordinate in the FBL and the first formula coordinate of the current AB into a new formula coordinate, replacing the last formula coordinate in the FBL, and adding the rest formulas in the AB into the FBL.
Step 107: the current corresponding AB formula coordinate set is added to the formula set FBL.
Step 108: and outputting the formula coordinate set FBL of the line text.
It is to be understood that the logic steps described above are based on the above embodiments, and the principles and calculation methods are similar, and are not described herein again.
Referring to fig. 11, fig. 11 is a schematic diagram of a first structure of an image processing apparatus according to an embodiment of the present disclosure, where the image processing apparatus 110 includes an obtaining module 111, a first calculating module 112, and a second calculating module 113.
The obtaining module 111 is configured to obtain text positioning information and attention information of a text line; the first calculating module 112 is configured to calculate a formula coordinate set and a formula boundary set of the text line according to the text positioning information and the attention information; the second calculating module 113 is configured to calculate the location coordinates of the formula in the text line according to the formula coordinate set and the formula boundary set.
Referring to fig. 12, fig. 12 is a schematic diagram of a second structure of the image processing apparatus according to the embodiment of the present application, where the image processing apparatus 120 includes a processor 121 and a memory 122, the memory 122 is used for storing program data, and the processor 121 is used for executing the program data to implement the following method:
acquiring text positioning information and attention information of a text line; calculating a formula coordinate set and a formula boundary set of the text line according to the text positioning information and the attention information; and calculating the positioning coordinates of the formulas in the text lines according to the formula coordinate set and the formula boundary set.
Optionally, the processor 121 is further configured to execute the program data to implement the following method: acquiring an attention information vector of a target special character in a target text line according to the attention information, wherein the special characters are characters in a formula; judging whether the index value corresponding to the maximum value in the attention information vector is 0; and if so, adding the adjacent formula information of the target special character into the formula boundary set.
Optionally, the processor 121 is further configured to execute the program data to implement the following method: calculating the width value of the target text line according to the text positioning information of the target text line; calculating an abscissa value of the target special character according to the width value of the target text line and the index value corresponding to the maximum value in the attention information vector; determining a maximum abscissa value and a minimum abscissa value according to the abscissa value of the target special character; and calculating a formula coordinate set according to the text positioning information, the maximum abscissa value and the minimum abscissa value.
Optionally, the processor 121 is further configured to execute the program data to implement the following method: calculating the normalized width value of the target text line using the following formula:
[Formula shown as image BDA0002075644940000131 in the original; not reproduced here.]
wherein w is the coordinate width of the target text line, and h is the height of the target text line.
Optionally, the processor 121 is further configured to execute the program data to implement the following method: calculating the abscissa value of the target special character by using the following formula:
[Formula shown as image BDA0002075644940000132 in the original; not reproduced here.]
where w is the width of the target text line, aidx is the index value corresponding to the maximum value in the attention information vector, and w_m is the normalized width value of the target text line.
Optionally, the processor 121 is further configured to execute the program data to implement the following method: determining an initial maximum abscissa value and an initial minimum abscissa value according to the initial abscissa value of the target special character; when a new abscissa value of the target special character is obtained, comparing the new abscissa value with the maximum and minimum abscissa values; if the new abscissa value is smaller than the minimum abscissa value, updating the minimum abscissa value; and if the new abscissa value is larger than the maximum abscissa value, updating the maximum abscissa value.
Optionally, the processor 121 is further configured to execute the program data to implement the following method: calculating the formula coordinate set using the following formula:
[Formula shown as image BDA0002075644940000141 in the original; not reproduced here.]
where x_1 is the left abscissa value of the formula, x_2 is the right abscissa value of the formula, y_1 is the upper ordinate value of the formula, y_2 is the lower ordinate value of the formula, bi_0 is the left abscissa value in the text positioning information, bi_2 is the upper ordinate value in the text positioning information, bi_3 is the lower ordinate value in the text positioning information, w_min is the minimum abscissa value, and w_max is the maximum abscissa value.
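A sketch of how such a coordinate set might be assembled from the text positioning information. Since the exact formula appears only as an image in the source, the offsets below (abscissa range placed relative to the line's left edge bi_0, ordinates taken directly from the line box) are assumptions:

```python
def formula_box(bi, w_min, w_max):
    """Assemble one formula box (x1, y1, x2, y2) from the line's text
    positioning information bi and the min/max abscissa values.

    bi: line box with bi[0] = left abscissa, bi[2] = upper ordinate,
        bi[3] = lower ordinate (layout assumed for illustration).
    """
    x1 = bi[0] + w_min   # assumed: left edge of formula within the line
    x2 = bi[0] + w_max   # assumed: right edge of formula within the line
    y1 = bi[2]           # upper ordinate (later refined by projection)
    y2 = bi[3]           # lower ordinate
    return (x1, y1, x2, y2)
```

The ordinates are deliberately taken from the whole line box here, matching the spec's note that they are refined afterwards by vertical projection of the binary image.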
Optionally, the processor 121 is further configured to execute the program data to implement the following method: judging whether the previous line of text of the target line of text is in a formula boundary set; if yes, judging whether the target line text has adjacent formula information; and if so, fusing the first formula coordinate in the target line text with the last formula coordinate in the previous line text.
Optionally, the processor 121 is further configured to execute the program data to implement the following method: acquiring a binary image of a target formula area according to the formula coordinate set; carrying out longitudinal coordinate projection on the binary image to obtain a longitudinal coordinate of the target formula; and updating the formula coordinate set by adopting the ordinate of the target formula.
Referring to fig. 13, fig. 13 is a schematic structural diagram of a computer storage medium according to an embodiment of the present application, the computer storage medium 130 stores program data 131, and the program data 131 is executed by a processor to implement the following method:
acquiring text positioning information and attention information of a text line; calculating a formula coordinate set and a formula boundary set of the text line according to the text positioning information and the attention information; and calculating the positioning coordinates of the formulas in the text lines according to the formula coordinate set and the formula boundary set.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other manners. For example, the above-described device embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the purpose of illustrating embodiments of the present application and is not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made according to the content of the present specification and the accompanying drawings, or which are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.

Claims (11)

1. A formula positioning method of a text image is characterized by comprising the following steps:
acquiring text positioning information and attention information of a text line; wherein the attention information is obtained by identifying the text line by using an attention mechanism;
calculating a formula coordinate set and a formula boundary set of the text line according to the text positioning information and the attention information;
calculating the positioning coordinates of the formulas in the text lines according to the formula coordinate set and the formula boundary set;
the step of calculating a formula coordinate set and a formula boundary set of the text line according to the text positioning information and the attention information includes:
acquiring an attention information vector of a target special character in a target text line according to the attention information; the special characters are characters in a formula, the attention information vector is obtained by encoding the target special characters to obtain encoding characteristics, and the weight of the encoding characteristics is calculated according to an attention mechanism;
judging whether an index value corresponding to the maximum value in the attention information vector is 0 or not;
and if so, adding the adjacent formula information of the target special character into a formula boundary set.
2. The method of claim 1,
the method further comprises the following steps:
calculating the width value of the target text line according to the text positioning information of the target text line;
calculating an abscissa value of the target special character according to the width value of the target text line and an index value corresponding to the maximum value in the attention information vector;
determining a maximum abscissa value and a minimum abscissa value according to the abscissa value of the target special character;
and calculating a formula coordinate set according to the text positioning information, the maximum abscissa value and the minimum abscissa value.
3. The method of claim 2,
the step of calculating the width value of the target text line according to the text positioning information of the target text line comprises the following steps:
calculating the normalized width value of the target text line using the following formula:
[Formula shown as image FDA0003016419760000021 in the original; not reproduced here.]
wherein w is the coordinate width of the target text line, and h is the height of the target text line.
4. The method of claim 2,
the step of calculating the abscissa value of the target special character according to the width value of the target text line and the index value corresponding to the maximum value in the attention information vector includes:
calculating the abscissa value of the target special character by using the following formula:
[Formula shown as image FDA0003016419760000022 in the original; not reproduced here.]
wherein w is the width of the target text line, aidx is the index value corresponding to the maximum value in the attention information vector, and w_m is the normalized width value of the target text line.
5. The method of claim 2,
the step of determining the maximum and minimum abscissa values of the target special character, comprising:
determining an initial maximum abscissa value and an initial minimum abscissa value according to the initial abscissa value of the target special character;
when a new abscissa value of the target special character is obtained, comparing the new abscissa value with the maximum and minimum abscissa values;
if the new abscissa value is smaller than the minimum abscissa value, updating the minimum abscissa value;
and if the new abscissa value is larger than the maximum abscissa value, updating the maximum abscissa value.
6. The method of claim 2,
the step of calculating a formula coordinate set according to the text positioning information, the maximum abscissa value and the minimum abscissa value includes:
the formula coordinate set is calculated using the following formula:
[Formula shown as image FDA0003016419760000031 in the original; not reproduced here.]
wherein x_1 is the left abscissa value of the formula, x_2 is the right abscissa value of the formula, y_1 is the upper ordinate value of the formula, y_2 is the lower ordinate value of the formula, bi_0 is the left abscissa value in the text positioning information, bi_2 is the upper ordinate value in the text positioning information, bi_3 is the lower ordinate value in the text positioning information, w_min is the minimum abscissa value, and w_max is the maximum abscissa value.
7. The method of claim 1,
the step of calculating the location coordinates of the formula in the text line according to the formula coordinate set and the formula boundary set includes:
judging whether the previous line of text of the target line of text is in the formula boundary set or not;
if yes, judging whether the target line text has adjacent formula information;
and if so, fusing the first formula coordinate in the target line text with the last formula coordinate in the previous line text.
8. The method of claim 1,
the method further comprises the following steps:
acquiring a binary image of a target formula area according to the formula coordinate set;
carrying out longitudinal coordinate projection on the binary image to obtain a longitudinal coordinate of the target formula;
and updating the formula coordinate set by adopting the ordinate of the target formula.
9. An image processing apparatus characterized by comprising:
the acquisition module is used for acquiring text positioning information and attention information of a text line;
the first calculation module is used for calculating a formula coordinate set and a formula boundary set of the text line according to the text positioning information and the attention information;
the second calculation module is used for calculating the positioning coordinates of the formula in the text line according to the formula coordinate set and the formula boundary set; wherein the attention information is obtained by identifying the text line by using an attention mechanism;
the step of calculating a formula coordinate set and a formula boundary set of the text line according to the text positioning information and the attention information includes:
acquiring an attention information vector of a target special character in a target text line according to the attention information; the special characters are characters in a formula, the attention information vector is obtained by encoding the target special characters to obtain encoding characteristics, and the weight of the encoding characteristics is calculated according to an attention mechanism;
judging whether an index value corresponding to the maximum value in the attention information vector is 0 or not;
and if so, adding the adjacent formula information of the target special character into a formula boundary set.
10. An image processing apparatus, characterized in that the image processing apparatus comprises a processor and a memory for storing program data, the processor being adapted to execute the program data to implement the method according to any of claims 1-8.
11. A computer storage medium for storing program data, which when executed by a processor is adapted to carry out the method of any one of claims 1 to 8.
CN201910452711.5A 2019-05-28 2019-05-28 Formula positioning method of text image, image processing device and storage medium Active CN110210467B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910452711.5A CN110210467B (en) 2019-05-28 2019-05-28 Formula positioning method of text image, image processing device and storage medium


Publications (2)

Publication Number Publication Date
CN110210467A CN110210467A (en) 2019-09-06
CN110210467B true CN110210467B (en) 2021-07-30

Family

ID=67789041


Country Status (1)

Country Link
CN (1) CN110210467B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112699337B (en) * 2019-10-22 2022-07-29 北京易真学思教育科技有限公司 Equation correction method, electronic device and computer storage medium
CN112613279A (en) * 2020-12-24 2021-04-06 北京乐学帮网络技术有限公司 File conversion method and device, computer device and readable storage medium
CN112712075B (en) * 2020-12-30 2023-12-01 科大讯飞股份有限公司 Arithmetic detection method, electronic equipment and storage device

Citations (9)

Publication number Priority date Publication date Assignee Title
CN104751148A (en) * 2015-04-16 2015-07-01 同方知网数字出版技术股份有限公司 Method for recognizing scientific formulas in layout file
CN105913057A (en) * 2016-04-12 2016-08-31 中国传媒大学 Projection and structure characteristic-based in-image mathematical formula detection method
CN107169485A (en) * 2017-03-28 2017-09-15 北京捷通华声科技股份有限公司 A kind of method for identifying mathematical formula and device
CN107798321A (en) * 2017-12-04 2018-03-13 海南云江科技有限公司 A kind of examination paper analysis method and computing device
CN108399386A (en) * 2018-02-26 2018-08-14 阿博茨德(北京)科技有限公司 Information extracting method in pie chart and device
CN109241861A (en) * 2018-08-14 2019-01-18 科大讯飞股份有限公司 A kind of method for identifying mathematical formula, device, equipment and storage medium
CN109471583A (en) * 2014-03-20 2019-03-15 卡西欧计算机株式会社 Electronic equipment, mathematical expression display control method and recording medium
CN109614944A (en) * 2018-12-17 2019-04-12 科大讯飞股份有限公司 A kind of method for identifying mathematical formula, device, equipment and readable storage medium storing program for executing
CN111340020A (en) * 2019-12-12 2020-06-26 科大讯飞股份有限公司 Formula identification method, device, equipment and storage medium

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US9078093B2 (en) * 2011-10-19 2015-07-07 Electronics And Telecommunications Research Institute Apparatus and method for recognizing target mobile communication terminal


Non-Patent Citations (3)

Title
"Attention-based Extraction of Structured Information from Street View Imagery";Zbigniew Wojna, Alex Gorban et.al.;《arXiv》;20170820;第1-7页 *
"Embedding a Mathematical OCR Module into OCRopus";Shinpei Yamazaki, Fumihiro Furukori et.al.;《2011 International Conference on Document Analysis and Recognition》;20111231;第880-884页 *
"中文电子文档的数学公式定位研究";林晓燕,高良才,汤帜;《北京大学学报(自然科学版)》;20140131;第50卷(第1期);第17-24 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230831

Address after: No. 79 Wanbo Second Road, Nancun Town, Panyu District, Guangzhou City, Guangdong Province, 5114303802 (self declared)

Patentee after: Guangzhou Huanju Mark Network Information Co.,Ltd.

Address before: 511449 28th floor, block B1, Wanda Plaza, Nancun Town, Panyu District, Guangzhou City, Guangdong Province

Patentee before: GUANGZHOU HUADUO NETWORK TECHNOLOGY Co.,Ltd.