CN110210467B - Formula positioning method of text image, image processing device and storage medium - Google Patents

Formula positioning method of text image, image processing device and storage medium

Info

Publication number
CN110210467B
CN110210467B (application CN201910452711.5A)
Authority
CN
China
Prior art keywords
formula
text
value
target
information
Prior art date
Legal status
Active
Application number
CN201910452711.5A
Other languages
Chinese (zh)
Other versions
CN110210467A (en
Inventor
黄家冕
梁炎
王卫锋
Current Assignee
Guangzhou Huanju Mark Network Information Co ltd
Original Assignee
Guangzhou Huaduo Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Huaduo Network Technology Co Ltd filed Critical Guangzhou Huaduo Network Technology Co Ltd
Priority to CN201910452711.5A priority Critical patent/CN110210467B/en
Publication of CN110210467A publication Critical patent/CN110210467A/en
Application granted granted Critical
Publication of CN110210467B publication Critical patent/CN110210467B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06T5/70
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G06T7/13 Edge detection
    • G06T7/90 Determination of colour characteristics
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V10/225 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition based on a marking or identifier characterising the area

Abstract

The application discloses a formula positioning method for a text image, an image processing device, and a storage medium. The formula positioning method comprises: acquiring text positioning information and attention information of a text line; calculating a formula coordinate set and a formula boundary set for the text line according to the text positioning information and the attention information; and calculating the positioning coordinates of the formulas in the text line according to the formula coordinate set and the formula boundary set. In this way, a formula in a text image can be accurately positioned.

Description

Formula positioning method of text image, image processing device and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a formula positioning method for text images, an image processing apparatus, and a storage medium.
Background
With the development of mobile internet technology, handheld mobile terminals such as smartphones and tablet computers have entered our lives in large numbers and become an indispensable part of them. These handheld terminals all have a camera function, which makes it very convenient to capture document information at any time.
Scientific formulas, as a special information carrier, are widely found in text documents. In practical applications, formulas often need to be located and extracted, so how to locate the formulas in a text image has become an urgent problem to be solved.
Disclosure of Invention
In order to solve the above problems, the present application provides a formula positioning method for a text image, an image processing apparatus, and a storage medium, which can accurately position a formula in a text image.
The technical scheme adopted by the application is as follows: a formula positioning method of a text image is provided, and the method comprises the following steps: acquiring text positioning information and attention information of a text line; calculating a formula coordinate set and a formula boundary set of the text line according to the text positioning information and the attention information; and calculating the positioning coordinates of the formulas in the text lines according to the formula coordinate set and the formula boundary set.
The step of calculating the formula coordinate set and the formula boundary set of the text line according to the text positioning information and the attention information comprises the following steps: acquiring an attention information vector of a target special character in a target text line according to the attention information; wherein, the special characters are characters in a formula; judging whether an index value corresponding to the maximum value in the attention information vector is 0 or not; and if so, adding the adjacent formula information of the target special character into the formula boundary set.
Wherein, the method also comprises: calculating the width value of the target text line according to the text positioning information of the target text line; calculating an abscissa value of the target special character according to the width value of the target text line and the index value corresponding to the maximum value in the attention information vector; determining a maximum abscissa value and a minimum abscissa value according to the abscissa value of the target special character; and calculating a formula coordinate set according to the text positioning information, the maximum abscissa value and the minimum abscissa value.
The step of calculating the width value of the target text line according to the text positioning information of the target text line comprises: calculating the normalized width value of the target text line using the following formula:

w_m = ⌈w / h⌉

where w is the coordinate width of the target text line and h is the height of the target text line.
The step of calculating the abscissa value of the target special character according to the width value of the target text line and the index value corresponding to the maximum value in the attention information vector comprises: calculating the abscissa value of the target special character using the following formula:

C_w = ⌈(w / w_m) · aidx⌉

where w is the width of the target text line, aidx is the index value corresponding to the maximum value in the attention information vector, and w_m is the normalized width value of the target text line.
Wherein, according to the abscissa value of the target special character, the step of determining the maximum and minimum abscissa values includes: determining an initial maximum abscissa value and an initial minimum abscissa value according to the initial abscissa value of the target special character; when a new abscissa value of the target special character is obtained, comparing the new abscissa value with the maximum and minimum abscissa values; if the new abscissa value is smaller than the minimum abscissa value, updating the minimum abscissa value; and if the new abscissa value is larger than the maximum abscissa value, updating the maximum abscissa value.
Wherein, according to the text positioning information, the maximum abscissa value and the minimum abscissa value, the step of calculating the formula coordinate set comprises: calculating the formula coordinate set using the following formulas:

x_1 = b_i0 + w_min, x_2 = b_i0 + w_max, y_1 = b_i2, y_2 = b_i3

where x_1 is the left abscissa value of the formula, x_2 is the right abscissa value of the formula, y_1 is the upper ordinate value of the formula, y_2 is the lower ordinate value of the formula, b_i0 is the left abscissa value in the text positioning information, b_i2 is the upper ordinate value in the text positioning information, b_i3 is the lower ordinate value in the text positioning information, w_min is the minimum abscissa value, and w_max is the maximum abscissa value.
Wherein, according to the formula coordinate set and the formula boundary set, the step of calculating the positioning coordinate of the formula in the text line comprises the following steps: judging whether the previous line of text of the target line of text is in a formula boundary set or not; if yes, judging whether the target line text has adjacent formula information; and if so, fusing the first formula coordinate in the target line text with the last formula coordinate in the previous line text.
Wherein, the method also comprises: acquiring a binary image of a target formula area according to the formula coordinate set; carrying out longitudinal coordinate projection on the binary image to obtain a longitudinal coordinate of the target formula; and updating the formula coordinate set by adopting the ordinate of the target formula.
Another technical scheme adopted by the application is as follows: provided is an image processing apparatus including: the acquisition module is used for acquiring text positioning information and attention information of a text line; the first calculation module is used for calculating a formula coordinate set and a formula boundary set of the text line according to the text positioning information and the attention information; and the second calculation module is used for calculating the positioning coordinates of the formula in the text line according to the formula coordinate set and the formula boundary set.
Another technical scheme adopted by the application is as follows: there is provided an image processing apparatus comprising a processor and a memory for storing program data, the processor being arranged to execute the program data to implement the method as described above.
Another technical scheme adopted by the application is as follows: a computer storage medium is provided for storing program data for implementing the method as described above when executed by a processor.
The formula positioning method of the text image comprises: acquiring text positioning information and attention information of a text line; calculating a formula coordinate set and a formula boundary set for the text line according to the text positioning information and the attention information; and calculating the positioning coordinates of the formulas in the text line according to the formula coordinate set and the formula boundary set. In this way, the formula in the text image can be positioned using the attention information, which lays a foundation for subsequent formula recognition and allows the image of the formula to be accurately obtained.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description are only some embodiments of the present application; those skilled in the art can obtain other drawings based on these drawings without creative effort. Wherein:
FIG. 1 is a schematic flowchart of a formula positioning method for text images according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of first location coordinates of a text provided in an embodiment of the present application;
fig. 3 is a schematic flowchart of acquiring a boundary set according to an embodiment of the present application;
FIG. 4 is a schematic flow chart illustrating obtaining a formula set according to an embodiment of the present application;
FIG. 5 is a schematic flow chart illustrating the determination of the maximum abscissa and the minimum abscissa in the embodiment of the present application;
FIG. 6 is a schematic diagram of second location coordinates of a text provided in an embodiment of the present application;
FIG. 7 is a schematic flow chart illustrating calculation of formula positioning coordinates according to an embodiment of the present application;
FIG. 8 is a logic diagram of a formula positioning method for text images according to an embodiment of the present disclosure;
FIG. 9 is a logic diagram for obtaining a formula location coordinate set according to an embodiment of the present application;
FIG. 10 is a logic diagram of formula coordinate consolidation provided by an embodiment of the present application;
fig. 11 is a schematic diagram of a first structure of an image processing apparatus according to an embodiment of the present application;
fig. 12 is a second schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 13 is a schematic structural diagram of a computer storage medium provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be further noted that, for the convenience of description, only some of the structures related to the present application are shown in the drawings, not all of the structures. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first", "second", etc. in this application are used to distinguish between different objects and not to describe a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
Referring to fig. 1, fig. 1 is a schematic flowchart of a formula positioning method for a text image according to an embodiment of the present application, where the method includes:
step 11: text positioning information and attention information of a text line are acquired.
Text images, also called document images, are documents in image format: a paper document converted, in some manner, into an image-format document for electronic reading. Typical text image formats include JPG (JPEG), BMP, PNG, GIF, FSP, TIFF, TGA, EPS, and the like.
Alternatively, the text positioning information may be the positioning coordinates of the text. It will be appreciated that text is generally arranged line by line, and the positioning coordinates are generally the coordinates of the upper-left and lower-right points of the rectangular area in which a line of text is located.
As shown in fig. 2, fig. 2 is a schematic diagram of first location coordinates of a text provided in an embodiment of the present application, where A(x_1, y_1) is the coordinate point of the upper-left corner of the text line and B(x_2, y_2) is the coordinate point of the lower-right corner of the text line.
In a specific operation, the text image may first be subjected to grayscale processing.
Grayscale is the most direct visual feature describing the content of a grayscale image. It refers to the color depth of a point in a black-and-white image, generally ranging from 0 to 255, with white being 255 and black being 0; hence a black-and-white image is also called a grayscale image. Grayscale image matrix elements typically take values in [0, 255], so their data type is usually an 8-bit unsigned integer, giving 256 levels of gray. When a color image is converted into a grayscale image, the effective brightness value of each pixel needs to be calculated, using the formula: Y = 0.3R + 0.59G + 0.11B.
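As an illustrative sketch (not code from the patent), the weighted-sum conversion above can be written with NumPy as follows; the function name and the H x W x 3 array layout are assumptions:

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an H x W x 3 uint8 RGB image to grayscale using the
    luminance weights stated above: Y = 0.3R + 0.59G + 0.11B."""
    rgb = rgb.astype(np.float64)
    y = 0.3 * rgb[..., 0] + 0.59 * rgb[..., 1] + 0.11 * rgb[..., 2]
    # Round and clamp back into the 8-bit range [0, 255].
    return np.clip(np.round(y), 0, 255).astype(np.uint8)
```

Since the weights sum to 1, pure white maps to 255 and pure black to 0, as expected.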
Then, the text image is subjected to denoising processing.
Alternatively, the grayscale image may be smoothed using a Gaussian filtering algorithm. Gaussian filtering is a weighted-averaging process over the whole image: the value of each pixel is obtained as a weighted average of its own value and the values of the other pixels in its neighborhood. The specific operation is: scan each pixel in the image with a template (also called a convolution kernel or mask), and replace the value of the pixel at the template's center with the weighted average gray value of the pixels in the neighborhood determined by the template.
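A minimal NumPy sketch of the template scan just described, assuming a fixed 3x3 Gaussian kernel and edge padding; a production implementation would expose kernel size and sigma as parameters:

```python
import numpy as np

def gaussian_smooth(gray):
    """Smooth a 2-D grayscale image with a fixed 3x3 Gaussian template.
    Each output pixel is the weighted average of its 3x3 neighborhood."""
    kernel = np.array([[1, 2, 1],
                       [2, 4, 2],
                       [1, 2, 1]], dtype=np.float64) / 16.0  # weights sum to 1
    padded = np.pad(gray.astype(np.float64), 1, mode="edge")
    out = np.zeros(gray.shape, dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + gray.shape[0],
                                           dx:dx + gray.shape[1]]
    return out
```

Because the kernel weights sum to 1, a uniform image passes through unchanged.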
Next, binarization and color-inversion processing are performed on the text image.
Image Binarization (Image Binarization) is a process of setting the gray value of a pixel point on an Image to be 0 or 255, namely, the whole Image presents an obvious black-white effect.
The inverse of a color is the color that, superimposed on it, yields white; that is, it is obtained by subtracting the original color from white (RGB: 255, 255, 255). For example, the inverse of red (RGB: 255, 0, 0) is cyan (RGB: 0, 255, 255). Accordingly, in the binarized image obtained above, inversion changes the gray value 0 to 255 and the gray value 255 to 0.
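The binarization and inversion steps can be sketched together as follows; the global threshold of 128 is an assumption, since the patent does not specify the thresholding rule:

```python
import numpy as np

def binarize_and_invert(gray, threshold=128):
    """Binarize a grayscale image to {0, 255}, then invert it so that
    dark (text) pixels end up as 255. Threshold of 128 is illustrative."""
    binary = np.where(gray >= threshold, 255, 0).astype(np.uint8)
    return (255 - binary).astype(np.uint8)  # swap 0 <-> 255
```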
Finally, edge calculations may be performed on the text image.
Alternatively, the Canny edge algorithm can be used. Canny's aim was to find an optimal edge detection algorithm, where optimal means:
(1) optimal detection: the algorithm can identify actual edges in the image as much as possible, and the probability of missing detection of the actual edges and the probability of false detection of the non-edges are both as small as possible;
(2) optimal positioning criterion: the position of the detected edge point is closest to the position of the actual edge point, or the degree that the detected edge deviates from the real edge of the object due to the influence of noise is minimum;
(3) the detection points correspond to the edge points one by one: the edge points detected by the operator should have a one-to-one correspondence with the actual edge points.
The Canny edge algorithm may include the following steps:
(1) finding intensity gradients (intensity gradients) of the image;
(2) applying a non-maximum suppression technique to eliminate false edge detections (points reported as edges that are not actually edges);
(3) applying a dual threshold approach to determine possible (potential) boundaries;
(4) the boundaries are tracked using a hysteresis technique.
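As a hedged illustration of steps (3) and (4) above, the double-threshold classification and a one-pass hysteresis check can be sketched as follows; full Canny also computes gradients and applies non-maximum suppression first, and iterates the hysteresis tracking to convergence:

```python
import numpy as np

def double_threshold(grad_mag, low, high):
    """Step (3): classify gradient magnitudes into strong and weak
    edge pixels; everything below `low` is suppressed."""
    strong = grad_mag >= high
    weak = (grad_mag >= low) & ~strong
    return strong, weak

def hysteresis(strong, weak):
    """Step (4), one pass: keep a weak pixel only if one of its
    8-neighbours is a strong pixel."""
    padded = np.pad(strong, 1)  # pad with False
    neighbour = np.zeros_like(strong)
    h, w = strong.shape
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            neighbour |= padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
    return strong | (weak & neighbour)
```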
The text image is preprocessed in the above manner before subsequent information acquisition begins.
It can be understood that, through the above preprocessing of the text image, the points at the upper-left and lower-right corners of the region where a text line is located can be identified and positioned from the image.
OCR (Optical Character Recognition) refers to the process in which an electronic device (e.g., a scanner or digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then translates the shapes into computer text by character recognition methods; that is, printed characters in a paper document are converted optically into a black-and-white dot-matrix image file, and the characters in the image are converted into a text format by recognition software for further editing and processing by word-processing software.
AOCR (Attention OCR) is an algorithm that recognizes a single line of text using an attention mechanism. It generally takes CNN (Convolutional Neural Network) features as input and computes the attention weights of a new state from the attention weights of the current and previous states of an RNN (Recurrent Neural Network). The CNN features and the weights are then fed into the RNN, and the result is obtained through encoding and decoding.
Step 12: and calculating a formula coordinate set and a formula boundary set of the text lines according to the text positioning information and the attention information.
Step 12 may specifically include two aspects, that is, first, a formula boundary set is obtained; second, a formula coordinate set is calculated.
Referring to fig. 3, fig. 3 is a schematic flowchart of acquiring a boundary set according to an embodiment of the present application, where the method includes:
step 31: acquiring an attention information vector of a target special character in a target text line according to the attention information; wherein, the special characters are characters in a formula.
Optionally, the extracted special characters may be encoded to obtain encoding characteristics; calculating a prediction probability for the coding features; and calculating weights of different encoding characteristics by using an attention mechanism to obtain an encoded attention information vector.
Step 32: and judging whether the index value corresponding to the maximum value in the attention information vector is 0 or not.
The attention information vector is indexed 0, 1, 2, and so on. If the index value corresponding to the maximum value is 0, the maximum lies at the head of the vector, which further indicates that the special character is located at the head of the text line.
If the determination result in step 32 is yes, step 33 is executed.
Step 33: the formula information adjacent to the target special character is added to the formula boundary set.
The adjacent-formula information indicates that the special character is located at the head of the text line, and that the tail of the previous line of text and the head of this line may together form one formula.
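Steps 31 to 33 can be sketched as follows; the container names are illustrative, not from the patent:

```python
import numpy as np

def update_boundary_set(attention_vector, line_index, boundary_set):
    """If the largest attention weight sits at index 0, the special
    character is at the head of the line, so record adjacent-formula
    information for this line (steps 31-33)."""
    if int(np.argmax(attention_vector)) == 0:
        boundary_set.add(line_index)
    return boundary_set
```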
Referring to fig. 4, fig. 4 is a schematic flowchart of obtaining a formula set according to an embodiment of the present application, where the method includes:
step 41: and calculating the width value of the target text line according to the text positioning information of the target text line.
Alternatively, the width value of the target text line may be calculated using the following formula:

w_m = ⌈w / h⌉

where w is the coordinate width of the target text line and h is the height of the target text line. The result w_m is rounded up; for example, if w / h evaluates to 1.5, then w_m takes the value 2.
Step 42: and calculating the abscissa value of the target special character according to the width value of the target text line and the index value corresponding to the maximum value in the attention information vector.
Alternatively, the abscissa value of the target special character may be calculated using the following formula:

C_w = ⌈(w / w_m) · aidx⌉

where w is the width of the target text line, aidx is the index value corresponding to the maximum value in the attention information vector, and w_m is the normalized width value of the target text line. The result C_w is rounded up.
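Assuming the reconstructed formulas w_m = ⌈w / h⌉ and C_w = ⌈(w / w_m) · aidx⌉ (the patent's own equations are available only as images, so this reading is an assumption), steps 41 and 42 reduce to:

```python
import math

def normalized_width(w, h):
    """Step 41: w_m = ceil(w / h), the line width measured in
    character-height units, rounded up."""
    return math.ceil(w / h)

def abscissa(w, w_m, aidx):
    """Step 42: C_w = ceil((w / w_m) * aidx), the horizontal position of
    the character whose attention weight peaks at index aidx."""
    return math.ceil(w / w_m * aidx)
```

For a line of width 30 and height 20, w / h is 1.5, so w_m is 2, matching the rounding example in the text.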
Step 43: and determining the maximum abscissa value and the minimum abscissa value according to the abscissa value of the target special character.
Optionally, referring to fig. 5, fig. 5 is a schematic flowchart of determining a maximum abscissa and a minimum abscissa in an embodiment of the present application, where the method includes:
step 431: and determining an initial maximum abscissa value and an initial minimum abscissa value according to the initial abscissa value of the target special character.
The abscissa value of the target special character is the value C_w calculated in step 42 above; here a maximum abscissa value W_max and a minimum abscissa value W_min are set.
Alternatively, the initial maximum and minimum abscissa values may be determined by traversing the abscissa values of the target special characters obtained for one text line.
Step 432: and when a new abscissa value of the target special character is obtained, comparing the new abscissa value with the maximum and minimum abscissa values.
Step 433: and if the new abscissa value is smaller than the minimum abscissa value, updating the minimum abscissa value.
If C_w is smaller than W_min, W_min is updated; optionally, the value of W_min may be replaced by the value of C_w.
Step 434: and if the new abscissa value is larger than the maximum abscissa value, updating the maximum abscissa value.
If C_w is larger than W_max, W_max is updated; optionally, the value of W_max may be replaced by the value of C_w.
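Steps 432 to 434 amount to maintaining a running minimum and maximum; a minimal sketch:

```python
def update_extrema(c_w, w_min, w_max):
    """Fold a new abscissa C_w into the running minimum and maximum
    abscissa values (steps 432-434)."""
    return min(w_min, c_w), max(w_max, c_w)
```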
Step 44: and calculating a formula coordinate set according to the text positioning information, the maximum abscissa value and the minimum abscissa value.
Alternatively, the formula coordinate set may be calculated using the following formulas:

x_1 = b_i0 + w_min, x_2 = b_i0 + w_max, y_1 = b_i2, y_2 = b_i3

where x_1 is the left abscissa value of the formula, x_2 is the right abscissa value of the formula, y_1 is the upper ordinate value of the formula, y_2 is the lower ordinate value of the formula, b_i0 is the left abscissa value in the text positioning information, b_i1 is the right abscissa value in the text positioning information, b_i2 is the upper ordinate value in the text positioning information, b_i3 is the lower ordinate value in the text positioning information, w_min is the minimum abscissa value, and w_max is the maximum abscissa value.
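One plausible reading of the image-only coordinate formula (an assumption, since the original equation is not recoverable from the page) offsets the min/max abscissae by the line's left edge and reuses the line's vertical extent, with b_i taken as [left, right, top, bottom]:

```python
def formula_box(b_i, w_min, w_max):
    """Build the formula bounding box from the line's positioning
    coordinates b_i = [left, right, top, bottom] and the min/max
    abscissae of the special characters found in that line."""
    x1 = b_i[0] + w_min   # left edge of the formula
    x2 = b_i[0] + w_max   # right edge of the formula
    y1 = b_i[2]           # upper ordinate, inherited from the line
    y2 = b_i[3]           # lower ordinate, inherited from the line
    return x1, y1, x2, y2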
Step 13: and calculating the positioning coordinates of the formulas in the text lines according to the formula coordinate set and the formula boundary set.
Referring to fig. 6, fig. 6 is a schematic diagram of a second location coordinate of a text provided in an embodiment of the present application, and it can be understood that, in some embodiments, a formula to be located may not be in the same text line, for example, a previous part of the formula is in a previous text line, and a next part of the formula is in a next text line. As shown in fig. 6, "exemplary" in "exemplary text" is on the upper line, and "text" is on the lower line.
Optionally, as shown in fig. 7, fig. 7 is a schematic flowchart of calculating a formula-based positioning coordinate in an embodiment of the present application, where the method includes:
step 71: and judging whether the text of the previous line of the target line of text is in the formula boundary set.
If the determination result in step 71 is yes, step 72 is executed.
It is understood that, through the determination of step 71, it can be known whether a formula is located at the head or tail of the previous line of text, and thus whether it may belong to the same formula as one in the present line.
Step 72: and judging whether the target line text has adjacent formula information.
Wherein the adjacent formula information is the adjacent formula information added in the above step 33.
If the determination result in step 72 is yes, step 73 is executed.
Step 73: and fusing the first formula coordinate in the target line text with the last formula coordinate in the previous line text.
As shown in FIG. 6, the upper-left coordinate of "exemplary" in "exemplary text" is C(x_3, y_3) and its lower-right coordinate is D(x_4, y_4); the upper-left coordinate of "text" is E(x_5, y_5) and its lower-right coordinate is F(x_6, y_6). The coordinates of the whole formula can then be obtained by combining these coordinates.
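The coordinate fusion of step 73 can be sketched as taking the enclosing box of the two formula fragments; the (x1, y1, x2, y2) box layout is an assumption:

```python
def fuse_boxes(box_a, box_b):
    """Step 73: merge the last formula box of the previous line with the
    first formula box of the current line into one enclosing box.
    Boxes are (x1, y1, x2, y2) with y growing downward."""
    return (min(box_a[0], box_b[0]), min(box_a[1], box_b[1]),
            max(box_a[2], box_b[2]), max(box_a[3], box_b[3]))
```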
In addition, in the process of coordinate calculation, the ordinate may be updated, specifically: acquiring a binary image of a target formula area according to the formula coordinate set; carrying out longitudinal coordinate projection on the binary image to obtain a longitudinal coordinate of the target formula; and updating the formula coordinate set by adopting the ordinate of the target formula.
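The ordinate-projection update can be sketched as follows, assuming a binarized patch in which text pixels are 255 and background pixels are 0:

```python
import numpy as np

def refine_ordinates(binary_patch):
    """Project a binarized formula patch onto the vertical axis and
    return the first and last rows containing text; these replace the
    line-level ordinates in the formula coordinate set."""
    row_profile = binary_patch.sum(axis=1)          # per-row ink count
    rows = np.flatnonzero(row_profile > 0)
    if rows.size == 0:
        return None                                  # empty patch
    return int(rows[0]), int(rows[-1])
```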
Different from the prior art, the formula positioning method for text images provided by this embodiment comprises: acquiring text positioning information and attention information of a text line; calculating a formula coordinate set and a formula boundary set for the text line according to the text positioning information and the attention information; and calculating the positioning coordinates of the formulas in the text line according to the formula coordinate set and the formula boundary set. In this way, the formula in the text image can be positioned using the attention information, which lays a foundation for subsequent formula recognition and allows the image of the formula to be accurately obtained.
The above embodiments are described below in several detail steps:
referring to fig. 8, fig. 8 is a logic diagram of a formula positioning method for text images according to an embodiment of the present application, where the method includes:
step 81: inputting a text image S containing a formula, a positioning coordinate information set B of each line of text, an identification information set T of AOCR on each line of text, and an attention information set A of each line of text.
Step 82 a: and acquiring the ith row of text information ti from T.
Step 82 b: and carrying out binarization on the image S to obtain a binarized image St.
Step 82a and step 82b may be executed simultaneously or sequentially.
Step 83: and judging whether the ti text characters have the mathematical keywords or not. If yes, go to step 84, otherwise go back to step 82 a.
Step 84: the corresponding attention information ai is obtained from a.
Step 85: and obtaining a formula coordinate set AB, a formula boundary set FB and a corresponding number k of the ith line of text according to the ti and the ai and the corresponding text positioning coordinate information bi.
Step 86: and calculating the formula positioning coordinate of the line by using AB, FB, k and St.
Step 87: all formula sets FBL are output.
Referring to fig. 9, fig. 9 is a schematic logic diagram for obtaining a formula location coordinate set according to an embodiment of the present application, where the method includes:
step 901: all mathematical key characters for ti are sought.
Step 902: and taking the mathematical key character as a center, searching all non-Chinese characters to the left and the right, and acquiring a corresponding number set FS.
Step 903: calculate the width w and height h of the line of text and normalize the width to wmSetting a minimum abscissa value WminAnd the maximum abscissa value Wmax
Step 904: and traversing the FS to obtain a corresponding number FS, and extracting an attention information vector a of ai corresponding to FS.
Step 905: obtaining the index aidx corresponding to the maximum value in a, and calculating the abscissa value Cw
Step 906: it is queried whether aidx is first. If yes, go to step 907, otherwise go to step 908.
Step 907: and adding adjacent formula information into the FB corresponding position, and reserving the number k corresponding to the ith line of text.
Step 908: judgment CwWhether or not to compare WminIs small. If yes, go to step 909, otherwise go to step 910.
Step 909: by CwUpdating Wmin
Step 910: judgment CwWhether or not to compare WmaxIs large. If yes, go to step 911, otherwise go to step 912.
Step 911: by CwUpdating Wmax
Step 912: return to step 904 until the FS is processed.
Step 913: and calculating the current formula AB and adding the current formula AB into a formula coordinate set AB.
Referring to fig. 10, fig. 10 is a logic diagram of formula coordinate combination provided in the embodiment of the present application, where the method includes:
step 101: and acquiring the j-th formula coordinate from AB, and intercepting the temporary binary image tt from St.
Step 102: and (5) carrying out vertical coordinate projection on tt by using a projection method to obtain an actual vertical coordinate, and updating the vertical coordinate of the jth formula.
Step 103: and returning to the step 101 until the left and right formula coordinates are completely processed, and executing the next step.
Step 104: and judging whether the FB corresponding to the k-1 number exists. If yes, go to step 105, otherwise go to step 107.
Step 105: it is determined whether the kth fb has adjacent formula information. If yes, go to step 106, otherwise go to step 107.
Step 106: and fusing the last formula coordinate in the FBL and the first formula coordinate of the current AB into a new formula coordinate, replacing the last formula coordinate in the FBL, and adding the rest formulas in the AB into the FBL.
Step 107: the current corresponding AB formula coordinate set is added to the formula set FBL.
Step 108: and outputting the formula coordinate set FBL of the line text.
It is to be understood that the logic steps described above are based on the above embodiments, and the principles and calculation methods are similar, and are not described herein again.
Referring to fig. 11, fig. 11 is a schematic diagram of a first structure of an image processing apparatus according to an embodiment of the present disclosure, where the image processing apparatus 110 includes an obtaining module 111, a first calculating module 112, and a second calculating module 113.
The obtaining module 111 is configured to obtain text positioning information and attention information of a text line; the first calculating module 112 is configured to calculate a formula coordinate set and a formula boundary set of the text line according to the text positioning information and the attention information; the second calculating module 113 is configured to calculate the location coordinates of the formula in the text line according to the formula coordinate set and the formula boundary set.
Referring to fig. 12, fig. 12 is a schematic diagram of a second structure of the image processing apparatus according to the embodiment of the present application, where the image processing apparatus 120 includes a processor 121 and a memory 122, the memory 122 is used for storing program data, and the processor 121 is used for executing the program data to implement the following method:
acquiring text positioning information and attention information of a text line; calculating a formula coordinate set and a formula boundary set of the text line according to the text positioning information and the attention information; and calculating the positioning coordinates of the formulas in the text lines according to the formula coordinate set and the formula boundary set.
Optionally, the processor 121 is further configured to execute the program data to implement the following method: acquiring an attention information vector of a target special character in a target text line according to the attention information, wherein the special characters are characters in a formula; judging whether the index value corresponding to the maximum value in the attention information vector is 0; and if so, adding the adjacent formula information of the target special character into the formula boundary set.
Optionally, the processor 121 is further configured to execute the program data to implement the following method: calculating the width value of the target text line according to the text positioning information of the target text line; calculating an abscissa value of the target special character according to the width value of the target text line and the index value corresponding to the maximum value in the attention information vector; determining a maximum abscissa value and a minimum abscissa value according to the abscissa value of the target special character; and calculating a formula coordinate set according to the text positioning information, the maximum abscissa value and the minimum abscissa value.
Optionally, the processor 121 is further configured to execute the program data to implement the following method: calculating the normalized width value of the target text line using the following formula:
[Formula shown as image BDA0002075644940000131 in the original; not reproduced here.]
wherein w is the coordinate width of the target text line, and h is the height of the target text line.
Optionally, the processor 121 is further configured to execute the program data to implement the following method: calculating the abscissa value of the target special character by using the following formula:
[Formula shown as image BDA0002075644940000132 in the original; not reproduced here.]
where w is the width of the target text line, aidx is the index value corresponding to the maximum value in the attention information vector, and w_m is the normalized width value of the target text line.
Optionally, the processor 121 is further configured to execute the program data to implement the following method: determining an initial maximum abscissa value and an initial minimum abscissa value according to the initial abscissa value of the target special character; when a new abscissa value of the target special character is obtained, comparing the new abscissa value with the maximum and minimum abscissa values; if the new abscissa value is smaller than the minimum abscissa value, updating the minimum abscissa value; and if the new abscissa value is larger than the maximum abscissa value, updating the maximum abscissa value.
Optionally, the processor 121 is further configured to execute the program data to implement the following method: calculating the formula coordinate set using the following formula:
[Formula shown as image BDA0002075644940000141 in the original; not reproduced here.]
where x_1 is the left abscissa value of the formula, x_2 is the right abscissa value of the formula, y_1 is the upper ordinate value of the formula, y_2 is the lower ordinate value of the formula, bi_0 is the left abscissa value in the text positioning information, bi_2 is the upper ordinate value in the text positioning information, bi_3 is the lower ordinate value in the text positioning information, w_min is the minimum abscissa value, and w_max is the maximum abscissa value.
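A sketch of how such a coordinate set might be assembled from the text positioning information. Since the exact formula appears only as an image in the source, the offsets below (abscissa range placed relative to the line's left edge bi_0, ordinates taken directly from the line box) are assumptions:

```python
def formula_box(bi, w_min, w_max):
    """Assemble one formula box (x1, y1, x2, y2) from the line's text
    positioning information bi and the min/max abscissa values.

    bi: line box with bi[0] = left abscissa, bi[2] = upper ordinate,
        bi[3] = lower ordinate (layout assumed for illustration).
    """
    x1 = bi[0] + w_min   # assumed: left edge of formula within the line
    x2 = bi[0] + w_max   # assumed: right edge of formula within the line
    y1 = bi[2]           # upper ordinate (later refined by projection)
    y2 = bi[3]           # lower ordinate
    return (x1, y1, x2, y2)
```

The ordinates are deliberately taken from the whole line box here, matching the spec's note that they are refined afterwards by vertical projection of the binary image.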
Optionally, the processor 121 is further configured to execute the program data to implement the following method: judging whether the previous line of text of the target line of text is in a formula boundary set; if yes, judging whether the target line text has adjacent formula information; and if so, fusing the first formula coordinate in the target line text with the last formula coordinate in the previous line text.
Optionally, the processor 121 is further configured to execute the program data to implement the following method: acquiring a binary image of a target formula area according to the formula coordinate set; carrying out longitudinal coordinate projection on the binary image to obtain a longitudinal coordinate of the target formula; and updating the formula coordinate set by adopting the ordinate of the target formula.
Referring to fig. 13, fig. 13 is a schematic structural diagram of a computer storage medium according to an embodiment of the present application, the computer storage medium 130 stores program data 131, and the program data 131 is executed by a processor to implement the following method:
acquiring text positioning information and attention information of a text line; calculating a formula coordinate set and a formula boundary set of the text line according to the text positioning information and the attention information; and calculating the positioning coordinates of the formulas in the text lines according to the formula coordinate set and the formula boundary set.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other manners. For example, the above-described device embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the purpose of illustrating embodiments of the present application and is not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made according to the content of the present specification and the accompanying drawings, or which are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.

Claims (11)

1. A formula positioning method of a text image is characterized by comprising the following steps:
acquiring text positioning information and attention information of a text line; wherein the attention information is obtained by identifying the text line by using an attention mechanism;
calculating a formula coordinate set and a formula boundary set of the text line according to the text positioning information and the attention information;
calculating the positioning coordinates of the formulas in the text lines according to the formula coordinate set and the formula boundary set;
the step of calculating a formula coordinate set and a formula boundary set of the text line according to the text positioning information and the attention information includes:
acquiring an attention information vector of a target special character in a target text line according to the attention information; the special characters are characters in a formula, the attention information vector is obtained by encoding the target special characters to obtain encoding characteristics, and the weight of the encoding characteristics is calculated according to an attention mechanism;
judging whether an index value corresponding to the maximum value in the attention information vector is 0 or not;
and if so, adding the adjacent formula information of the target special character into a formula boundary set.
2. The method of claim 1,
the method further comprises the following steps:
calculating the width value of the target text line according to the text positioning information of the target text line;
calculating an abscissa value of the target special character according to the width value of the target text line and an index value corresponding to the maximum value in the attention information vector;
determining a maximum abscissa value and a minimum abscissa value according to the abscissa value of the target special character;
and calculating a formula coordinate set according to the text positioning information, the maximum abscissa value and the minimum abscissa value.
3. The method of claim 2,
the step of calculating the width value of the target text line according to the text positioning information of the target text line comprises the following steps:
calculating the normalized width value of the target text line using the following formula:
[Formula shown as image FDA0003016419760000021 in the original; not reproduced here.]
wherein w is the coordinate width of the target text line, and h is the height of the target text line.
4. The method of claim 2,
the step of calculating the abscissa value of the target special character according to the width value of the target text line and the index value corresponding to the maximum value in the attention information vector includes:
calculating the abscissa value of the target special character by using the following formula:
[Formula shown as image FDA0003016419760000022 in the original; not reproduced here.]
wherein w is the width of the target text line, aidx is the index value corresponding to the maximum value in the attention information vector, and w_m is the normalized width value of the target text line.
5. The method of claim 2,
the step of determining the maximum and minimum abscissa values of the target special character, comprising:
determining an initial maximum abscissa value and an initial minimum abscissa value according to the initial abscissa value of the target special character;
when a new abscissa value of the target special character is obtained, comparing the new abscissa value with the maximum and minimum abscissa values;
if the new abscissa value is smaller than the minimum abscissa value, updating the minimum abscissa value;
and if the new abscissa value is larger than the maximum abscissa value, updating the maximum abscissa value.
6. The method of claim 2,
the step of calculating a formula coordinate set according to the text positioning information, the maximum abscissa value and the minimum abscissa value includes:
the formula coordinate set is calculated using the following formula:
[Formula shown as image FDA0003016419760000031 in the original; not reproduced here.]
wherein x_1 is the left abscissa value of the formula, x_2 is the right abscissa value of the formula, y_1 is the upper ordinate value of the formula, y_2 is the lower ordinate value of the formula, bi_0 is the left abscissa value in the text positioning information, bi_2 is the upper ordinate value in the text positioning information, bi_3 is the lower ordinate value in the text positioning information, w_min is the minimum abscissa value, and w_max is the maximum abscissa value.
7. The method of claim 1,
the step of calculating the location coordinates of the formula in the text line according to the formula coordinate set and the formula boundary set includes:
judging whether the previous line of text of the target line of text is in the formula boundary set or not;
if yes, judging whether the target line text has adjacent formula information;
and if so, fusing the first formula coordinate in the target line text with the last formula coordinate in the previous line text.
8. The method of claim 1,
the method further comprises the following steps:
acquiring a binary image of a target formula area according to the formula coordinate set;
carrying out longitudinal coordinate projection on the binary image to obtain a longitudinal coordinate of the target formula;
and updating the formula coordinate set by adopting the ordinate of the target formula.
9. An image processing apparatus characterized by comprising:
the acquisition module is used for acquiring text positioning information and attention information of a text line;
the first calculation module is used for calculating a formula coordinate set and a formula boundary set of the text line according to the text positioning information and the attention information;
the second calculation module is used for calculating the positioning coordinates of the formula in the text line according to the formula coordinate set and the formula boundary set; wherein the attention information is obtained by identifying the text line by using an attention mechanism;
the step of calculating a formula coordinate set and a formula boundary set of the text line according to the text positioning information and the attention information includes:
acquiring an attention information vector of a target special character in a target text line according to the attention information; the special characters are characters in a formula, the attention information vector is obtained by encoding the target special characters to obtain encoding characteristics, and the weight of the encoding characteristics is calculated according to an attention mechanism;
judging whether an index value corresponding to the maximum value in the attention information vector is 0 or not;
and if so, adding the adjacent formula information of the target special character into a formula boundary set.
10. An image processing apparatus, characterized in that the image processing apparatus comprises a processor and a memory for storing program data, the processor being adapted to execute the program data to implement the method according to any of claims 1-8.
11. A computer storage medium for storing program data, which when executed by a processor is adapted to carry out the method of any one of claims 1 to 8.
CN201910452711.5A 2019-05-28 2019-05-28 Formula positioning method of text image, image processing device and storage medium Active CN110210467B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910452711.5A CN110210467B (en) 2019-05-28 2019-05-28 Formula positioning method of text image, image processing device and storage medium


Publications (2)

Publication Number Publication Date
CN110210467A CN110210467A (en) 2019-09-06
CN110210467B true CN110210467B (en) 2021-07-30

Family

ID=67789041


Country Status (1)

Country Link
CN (1) CN110210467B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112699337B (en) * 2019-10-22 2022-07-29 北京易真学思教育科技有限公司 Equation correction method, electronic device and computer storage medium
CN112613279A (en) * 2020-12-24 2021-04-06 北京乐学帮网络技术有限公司 File conversion method and device, computer device and readable storage medium
CN112712075B (en) * 2020-12-30 2023-12-01 科大讯飞股份有限公司 Arithmetic detection method, electronic equipment and storage device

Citations (9)

Publication number Priority date Publication date Assignee Title
CN104751148A (en) * 2015-04-16 2015-07-01 同方知网数字出版技术股份有限公司 Method for recognizing scientific formulas in layout file
CN105913057A (en) * 2016-04-12 2016-08-31 中国传媒大学 Projection and structure characteristic-based in-image mathematical formula detection method
CN107169485A (en) * 2017-03-28 2017-09-15 北京捷通华声科技股份有限公司 A kind of method for identifying mathematical formula and device
CN107798321A (en) * 2017-12-04 2018-03-13 海南云江科技有限公司 A kind of examination paper analysis method and computing device
CN108399386A (en) * 2018-02-26 2018-08-14 阿博茨德(北京)科技有限公司 Information extracting method in pie chart and device
CN109241861A (en) * 2018-08-14 2019-01-18 科大讯飞股份有限公司 A kind of method for identifying mathematical formula, device, equipment and storage medium
CN109471583A (en) * 2014-03-20 2019-03-15 卡西欧计算机株式会社 Electronic equipment, mathematical expression display control method and recording medium
CN109614944A (en) * 2018-12-17 2019-04-12 科大讯飞股份有限公司 A kind of method for identifying mathematical formula, device, equipment and readable storage medium storing program for executing
CN111340020A (en) * 2019-12-12 2020-06-26 科大讯飞股份有限公司 Formula identification method, device, equipment and storage medium

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US9078093B2 (en) * 2011-10-19 2015-07-07 Electronics And Telecommunications Research Institute Apparatus and method for recognizing target mobile communication terminal


Non-Patent Citations (3)

Title
"Attention-based Extraction of Structured Information from Street View Imagery";Zbigniew Wojna, Alex Gorban et.al.;《arXiv》;20170820;第1-7页 *
"Embedding a Mathematical OCR Module into OCRopus";Shinpei Yamazaki, Fumihiro Furukori et.al.;《2011 International Conference on Document Analysis and Recognition》;20111231;第880-884页 *
"中文电子文档的数学公式定位研究";林晓燕,高良才,汤帜;《北京大学学报(自然科学版)》;20140131;第50卷(第1期);第17-24 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230831

Address after: No. 79 Wanbo Second Road, Nancun Town, Panyu District, Guangzhou City, Guangdong Province, 5114303802 (self declared)

Patentee after: Guangzhou Huanju Mark Network Information Co.,Ltd.

Address before: 511449 28th floor, block B1, Wanda Plaza, Nancun Town, Panyu District, Guangzhou City, Guangdong Province

Patentee before: GUANGZHOU HUADUO NETWORK TECHNOLOGY Co.,Ltd.