CN113610073A - Method and device for identifying formula in picture and storage medium

Method and device for identifying formula in picture and storage medium

Info

Publication number
CN113610073A
CN113610073A
Authority
CN
China
Prior art keywords
formula
identification
area
target
areas
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110730258.7A
Other languages
Chinese (zh)
Inventor
赵志勇
王杰
辛晓哲
秦波
苏雪峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd filed Critical Beijing Sogou Technology Development Co Ltd
Priority to CN202110730258.7A
Publication of CN113610073A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)
  • Character Input (AREA)

Abstract

The invention discloses a method, a device, and a storage medium for identifying a formula in a picture. It relates to the field of recognition technology and mainly addresses the currently poor recognition of formulas in pictures. The method comprises the following steps: acquiring a picture to be identified and determining at least one target area from it, each target area being an area containing a formula; dividing the target area to obtain a plurality of identification areas, each being an area containing at least one formula, the formula containing a plurality of formula units; identifying one or more formula units to obtain a corresponding formula identification result; and outputting the formula identification result. On this basis, a good formula recognition effect can still be achieved even when the picture to be identified contains much content and is relatively complex, solving the problem that formulas in such pictures are currently recognized poorly.

Description

Method and device for identifying formula in picture and storage medium
Technical Field
The invention relates to the technical field of identification, in particular to a method and a device for identifying a formula in a picture and a storage medium.
Background
With the gradual development of intelligent recognition technology, the scenarios in which content in pictures must be recognized, and the demand for such recognition, are steadily increasing.
In a conventional approach to identifying a formula in a picture, a picture containing only the formula must be supplied separately, and the formula must be relatively simple, so that the picture can be recognized by a recognition model. In practice, however, when a picture contains not only the formula itself but also other content such as images and text, or when the formula itself is composed of several simple formulas arranged in different rows and columns, the existing formula identification method may fail because the content of the picture is too complex.
Disclosure of Invention
In view of the above problems, the present invention provides a method, an apparatus and a storage medium for recognizing a formula in a picture, and mainly aims to solve the problem that the recognition effect of the formula in the current picture is poor.
In order to solve the above technical problem, in a first aspect, the present invention provides a method for identifying a formula in a picture, where the method includes:
acquiring pictures to be identified, and determining target areas from the pictures to be identified, wherein the number of the target areas is at least one; the target area is an area containing a formula;
dividing the target area to obtain one or more identification areas, wherein the identification areas comprise areas of at least one formula, and the formula comprises a plurality of formula units;
identifying one or more formula units to obtain a corresponding formula identification result;
and outputting the formula identification result.
Optionally, the dividing the target area to obtain one or more identification areas includes:
determining each of the identified regions from the target region;
acquiring position information corresponding to each identification area;
before the outputting the formula identification result, the method further comprises:
combining the identification results corresponding to the plurality of identification areas according to the position information to obtain a combined result, wherein the combined result is used for representing the formula consisting of the plurality of formula units;
the outputting the formula identification result comprises:
and outputting the combination result.
Optionally, before identifying one or more formula units to obtain a corresponding formula identification result, the method further includes:
outputting the formula units and receiving indication information fed back by a user, wherein the indication information is triggered by the user after the formula units are output, and the indication information is used for selecting a target identification area from the formula units;
the identifying one or more formula units to obtain a corresponding formula identification result includes:
and executing formula identification operation on the target identification area according to the preset identification model to obtain a formula identification result corresponding to the target identification area.
Optionally, the dividing the target area to obtain one or more identification areas includes:
when the number of the target areas is multiple, respectively identifying the multiple target areas to obtain one or more identification areas corresponding to each target area;
alternatively,
and when the target areas are multiple, identifying based on at least one first target area selected by a user to respectively obtain one or more identification areas.
Optionally, the determining each identification region from the target region includes:
adding corresponding classification marks in the target area according to the data types, wherein the classification marks are used for representing the types of the data in the target area and comprise bracket type marks and text type marks;
and dividing the target area according to the classification identification to obtain each identification area.
Optionally, the dividing the target area according to the classification identifier to obtain each identification area includes:
adding a boundary box to the target area to obtain an identification unit, wherein the boundary box is used for distinguishing different identification areas in the target area;
the identifying one or more formula units to obtain the corresponding formula identification result comprises:
and executing a formula recognition operation on the recognition unit according to a preset recognition model to obtain a recognition result corresponding to the recognition unit.
Optionally, the obtaining the picture to be recognized and determining the target area from the picture to be recognized includes:
dividing the picture to be identified into a first area and a second area, wherein the first area is an area containing the formula, and the second area is an area except the area where the formula is located in the picture to be identified;
determining the first area as the target area.
Optionally, the outputting the formula identification result includes:
and outputting the formula recognition result in a preset mode, wherein the preset mode comprises an image mode, a code mode and a voice mode, and the code mode comprises a LaTeX source code.
In a second aspect, an embodiment of the present invention further provides an apparatus for identifying a formula in a picture, including:
the device comprises a determining unit, a judging unit and a judging unit, wherein the determining unit is used for acquiring pictures to be identified and determining target areas from the pictures to be identified, and the number of the target areas is at least one; the target area is an area containing a formula;
the dividing unit is used for dividing the target area to obtain one or more identification areas, wherein the identification areas comprise areas of at least one formula, and the formula comprises a plurality of formula units;
the execution unit is used for identifying one or more formula units to obtain a corresponding formula identification result;
and the output unit is used for outputting the formula identification result.
Optionally, the dividing unit is specifically configured to:
determining each of the identified regions from the target region;
acquiring position information corresponding to each identification area;
before the outputting the formula identification result, the method further comprises:
combining the identification results corresponding to the plurality of identification areas according to the position information to obtain a combined result, wherein the combined result is used for representing the formula consisting of the plurality of formula units;
the outputting the formula identification result comprises:
and outputting the combination result.
Optionally, the apparatus further includes a processing unit, where the processing unit is configured to:
outputting the formula units and receiving indication information fed back by a user, wherein the indication information is triggered by the user after the formula units are output, and the indication information is used for selecting a target identification area from the formula units;
the identifying one or more formula units to obtain a corresponding formula identification result includes:
and executing formula identification operation on the target identification area according to the preset identification model to obtain a formula identification result corresponding to the target identification area.
Optionally, the dividing unit is specifically configured to:
when the number of the target areas is multiple, respectively identifying the multiple target areas to obtain one or more identification areas corresponding to each target area;
alternatively,
and when the target areas are multiple, identifying based on at least one first target area selected by a user to respectively obtain one or more identification areas.
Optionally, the dividing unit is specifically configured to:
adding corresponding classification marks in the target area according to the data types, wherein the classification marks are used for representing the types of the data in the target area and comprise bracket type marks and text type marks;
and dividing the target area according to the classification identification to obtain each identification area.
In a third aspect, the present invention provides a storage medium, where the storage medium includes a stored program, where when the program runs, a device in which the storage medium is located is controlled to execute the method for identifying a formula in a picture according to any one of the foregoing first aspects.
In a fourth aspect, the present invention provides an apparatus for identifying a formula in a picture, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors to perform the method for identifying a formula in a picture according to any one of the first aspect.
The method acquires a picture to be identified and determines a target area from it; then divides the target area to obtain one or more identification areas; then identifies one or more formula units to obtain a corresponding formula identification result; and finally outputs that result, thereby implementing formula recognition within a picture. In this scheme there is at least one target area, each target area is an area containing a formula, each identification area contains at least one formula area, and a formula contains a plurality of formula units. Consequently, when the formula in a picture is complex, it can be split according to the distribution of identification areas into several small formulas, or parts of a formula, and recognized piece by piece, so recognition remains possible even for complex formulas and the recognition effect is improved. In addition, because the target area containing the formula can be determined from the acquired picture to be identified, the formula can be recognized even when the picture contains other content besides the formula, which solves the problem of poor recognition when the picture to be identified contains much complex content and further improves the formula recognition effect.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1-a is a flow chart of a method for identifying a formula in a picture according to an embodiment of the present invention;
FIG. 1-b is a schematic diagram illustrating an identification process in an execution process of an identification method for a formula in a picture according to an embodiment of the present invention;
FIG. 1-c is a schematic diagram illustrating an identification process in an execution process of an identification method for a formula in a picture according to an embodiment of the present invention;
FIG. 1-d is a schematic diagram illustrating an identification process in an execution process of an identification method for a formula in a picture according to an embodiment of the present invention;
FIG. 1-e is a schematic diagram illustrating an identification process in an execution process of an identification method for a formula in a picture according to an embodiment of the present invention;
FIG. 1-f is a schematic diagram illustrating an identification process in an execution process of an identification method for a formula in a picture according to an embodiment of the present invention;
FIG. 1-g is a schematic diagram illustrating an identification process in an execution process of an identification method for a formula in a picture according to an embodiment of the present invention;
FIG. 2 is a block diagram illustrating an apparatus for identifying formulas in a picture according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a client according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
The embodiment of the invention provides a method for identifying a formula in a picture, which comprises the following steps of:
101. acquiring a picture to be identified, and determining a target area from the picture to be identified.
Wherein the target area is at least one; the target area contains an area of a formula.
In this embodiment, in an actual situation the formula to be recognized may be located within text or beside a schematic diagram in the picture to be recognized; that is, the picture to be recognized often contains content other than the formula, so recognizing and determining the position and area where the formula is located is particularly important. In this embodiment, the whole picture to be recognized may be segmented, and the region where a formula is located may be determined as the target area.
It should be noted that the target area may include only the area of the formula, or may include a part of a text area adjacent to the formula, where the text may be a formula remark, or may be a text description of an adjacent context, and the like.
Of course, the picture to be recognized may contain a plurality of formula areas, which may or may not be adjacent to one another; that is, there may be one or more target areas. In other words, the acquired picture to be recognized may contain several formulas, so determining the target area is really the process of locating every region of the picture that contains a formula, and those regions may or may not be adjacent. The distribution of the target areas is not limited here; the formula content actually contained in the picture to be recognized is the standard.
The target area is an area which needs formula identification in the subsequent identification process, and other areas do not need identification. Because the target area can be determined from the picture to be recognized in the step, the target area to be recognized can be directly determined from the picture to be recognized when the picture to be recognized contains various different contents, and the problem of interference of the contents in other areas can be avoided in the process of recognizing the subsequent formula.
It should be noted that, in this embodiment, in the process of determining the target region based on the to-be-identified picture after acquiring the to-be-identified picture, the target region may be automatically determined from the to-be-identified picture; the target area may also be determined based on a formula area selected by the user, for example, the user may select an area desired to be identified in the picture to be identified, and the user may select one or more areas, which is equivalent to reducing the identification range of the image to be identified.
Certainly, in practical applications, the confirmation and determination may also be performed in combination with the selection of the user, for example, all regions that may need to be identified in the picture to be identified may be determined first to obtain a plurality of regions, such as the regions a to F, and then the regions a to F are output to the user, and after the user knows the regions, the user selects one or some regions from the regions to determine the target region. In the process of outputting the areas A to F to the user, the areas A to F can be displayed to the user in a frame selection or marking mode, so that the user can conveniently know the number of areas which can be subjected to subsequent formula identification in the current picture to be identified.
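As one plausible (hypothetical) realization of the automatic determination of target areas in step 101, the sketch below treats the picture as a binary ink mask and returns the bounding box of each connected component. The patent does not fix a particular detection algorithm, so the function name, the 0/1 mask encoding, and 4-connectivity are all assumptions:

```python
from collections import deque

def find_target_regions(mask):
    """Return bounding boxes (top, left, bottom, right) of connected
    foreground components in a binary mask (rows of 0/1 ints).
    A stand-in for the patent's target-area detection step."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                # BFS flood fill to gather one component's extent
                top, left, bottom, right = y, x, y, x
                q = deque([(y, x)])
                seen[y][x] = True
                while q:
                    cy, cx = q.popleft()
                    top, bottom = min(top, cy), max(bottom, cy)
                    left, right = min(left, cx), max(right, cx)
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            q.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes
```

In practice each returned box would then be filtered (by a classifier or by the user's selection) to keep only those that actually contain formula content.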
102. And dividing the target area to obtain one or more identification areas.
Wherein the identification area contains at least one formula area, the formula comprising one or more formula units.
In this embodiment, a formula unit may be understood as part of the content of the formula in the foregoing step; it may be a simple formula, that is, when the formula is a compound formula, a formula unit is one of the simple formulas within it. As shown in FIG. 1-b, the formula occupies the whole of the figure, so the corresponding target area actually contains a large amount of formula content and the formula is relatively complex. In this case, the method of this step divides the target area to obtain identification areas corresponding to multiple formula units. That is, this step actually splits the formula into a plurality of formula units, which facilitates subsequent identification.
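One simple way the division in step 102 could work — an assumption for illustration, not the patent's stated algorithm — is a projection-profile split: columns containing no ink separate neighbouring identification areas within the target area:

```python
def split_into_identification_regions(mask, min_gap=1):
    """Split a target-area mask (rows of 0/1 ints) into identification
    regions by blank-column gaps of at least min_gap columns.
    Returns (left, right) column spans, one per region."""
    w = len(mask[0])
    col_has_ink = [any(row[x] for row in mask) for x in range(w)]
    regions, start, gap = [], None, 0
    for x in range(w):
        if col_has_ink[x]:
            if start is None:
                start = x          # a new region begins
            end = x                # extend the current region
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:     # gap wide enough: close the region
                regions.append((start, end))
                start, gap = None, 0
    if start is not None:          # close a region that runs to the edge
        regions.append((start, end))
    return regions
```

A real implementation would likely also split on horizontal gaps and tune `min_gap` to the glyph size, so that a fraction bar or wide spacing inside one formula unit is not mistaken for a boundary.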
103. And identifying one or more formula units to obtain a corresponding formula identification result.
After the plurality of identification areas are obtained, each identification area contains either a simple formula or part of a split complex formula, namely a formula unit. At this point, a recognition operation may be performed on each identification area based on the preset recognition model; the recognition process may be as follows:
firstly, the identification area is convolved with a plurality of convolution kernels in the preset recognition model to generate feature images of the identification area;
then, each feature image is deconvolved to generate a plurality of segmented images of the identification area, where the segmented images correspond one-to-one to the formula symbols in the identification area;
then, the association relationships between the formula symbols are determined, and the recognition result of the formula is output based on those relationships. The recognition process described in this embodiment can be understood as scanning and recognizing the formula in the picture in order to extract the formula content, that is, converting pixels into the corresponding data; as shown in FIG. 1-c, a formula is recognized as the corresponding formula data.
It should be noted that in this embodiment the preset recognition model and the recognition process follow conventional practice; for example, the preset recognition model may be a character recognition model capable of analyzing and recognizing the characters, numbers, and symbols in the area to be recognized. The recognition mode naturally varies with the type of preset recognition model, which is not described in detail here: any existing preset recognition model, together with its corresponding recognition mode, may be used. The recognition process above is only exemplary, is not specifically limited here, and may be chosen according to the user's actual needs in practical applications.
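The feature-extraction step of the recognition process above — convolving the identification area with convolution kernels — can be illustrated with a minimal valid-mode 2-D convolution in pure Python. This is only the first stage; the deconvolution, per-symbol segmentation, and symbol-association stages of the preset model are not reproduced here:

```python
def conv2d(img, kernel):
    """Valid-mode 2-D convolution (cross-correlation, as deep-learning
    frameworks implement it) of one channel with one kernel. A preset
    recognition model would apply many such kernels to an identification
    area to produce its feature images."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(ow)]
            for y in range(oh)]
```

Applying a bank of kernels to the same identification area yields one feature image per kernel; those feature images are what the deconvolution stage would upsample into segmented images, one per formula symbol.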
104. And outputting the formula identification result.
After the formula in the identification area is recognized, the formula identification result obtained in the foregoing step 103 may be output. It should be noted that the output mode used can be selected based on the user's needs: for example, when the user wants the result delivered by voice, the recognition result may be output as audio, so that the user obtains the result in the desired manner.
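Step 104's selectable output modes (image, code, and voice, per the claims) could be dispatched as in the following sketch. The placeholder strings stand in for real renderers, which the patent does not specify; only the "code" branch, returning LaTeX source directly, needs no further machinery:

```python
def output_result(latex, mode="code"):
    """Deliver a formula identification result in the requested preset
    mode. Mode names follow the patent's list; the image and voice
    renderers here are placeholders for a rasterizer and a TTS engine."""
    if mode == "code":
        return latex                                   # LaTeX source as-is
    if mode == "image":
        return f"[rendered image of: {latex}]"         # would rasterize the LaTeX
    if mode == "voice":
        return f"[speech audio for: {latex}]"          # would synthesize speech
    raise ValueError(f"unknown output mode: {mode}")
```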
Based on this, the method for identifying a formula in a picture provided by the above embodiment addresses the poor recognition effect of prior methods when a picture contains much complex content. The method acquires a picture to be identified and determines a target area from it; divides the target area to obtain one or more identification areas; identifies one or more formula units to obtain a corresponding formula identification result; and finally outputs that result, thereby implementing formula recognition within the picture. Since there is at least one target area, each target area is an area containing a formula, each identification area contains at least one formula area, and a formula contains a plurality of formula units, a complex formula can be split according to the distribution of identification areas into several small formulas, or parts of a formula, and recognized piece by piece. Recognition is thus ensured even when the formula is relatively complex, and the recognition effect is improved. In addition, because the target area containing the formula is determined from the acquired picture to be identified, the formula can be recognized even when the picture also contains other content, which solves the problem of poor recognition when the picture to be identified contains much complex content and further improves the recognition of formulas in pictures.
In some embodiments, a user needs to identify all the content in a picture; that is, after the identification areas are recognized, the plurality of identification areas need to be combined to obtain the identification result of the formula in the whole target area. Hence, in step 102 of the foregoing embodiment, dividing the target area to obtain one or more identification areas includes:
firstly, determining each identification area from the target area;
then, position information corresponding to each of the identification areas is acquired.
Before outputting the formula identification result in the aforementioned step 104, the method further includes:
combining the identification results corresponding to the plurality of identification areas according to the position information to obtain a combined result, wherein the combined result is used for representing the formula consisting of the plurality of formula units;
based on this, the formula identification result output in the foregoing step 104 may specifically be: and outputting the combination result.
Thus, in the step above, each identification area is determined from the target area. This can be done with an existing target detection algorithm, which segments the original image based on the geometric and statistical features of the target, identifies parts of the picture, and detects their positions; each object in the image can be located and classified during segmentation. Because the recognition result covering all content of the entire target area must be output later, this step divides the target area into identification areas and records the position information of each. After each identification area is recognized, the results can be combined based on that position information: the recognition results of multiple formula units are merged, according to the positions of their corresponding identification areas, into a combined result representing the formula. The formula in the entire target area can then be output as a whole, achieving integral output after recognition and satisfying the user's need to obtain all the formula content in the target area at once.
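The position-based combination just described might look like the sketch below, where each unit's result carries the (top, left) corner of its identification area. The row tolerance and the LaTeX line-break join are illustrative assumptions; the patent only requires that results be merged according to position information:

```python
def combine_results(unit_results, row_tol=5):
    """Merge per-unit recognition results into one formula string using
    each identification area's position. unit_results is a list of
    ((top, left), latex_fragment). Units whose top edges lie within
    row_tol pixels of the row's first unit share a row; rows are joined
    with LaTeX line breaks."""
    ordered = sorted(unit_results, key=lambda r: (r[0][0], r[0][1]))
    rows, current, current_top = [], [], None
    for (top, _left), frag in ordered:
        if current_top is None or abs(top - current_top) <= row_tol:
            if current_top is None:
                current_top = top
            current.append(frag)
        else:                       # a new row of formula units begins
            rows.append(" ".join(current))
            current, current_top = [frag], top
    if current:
        rows.append(" ".join(current))
    return r" \\ ".join(rows)
```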
It should be noted that in practical applications, the layout of content in the picture to be recognized may leave large distances between parts of a single formula, so errors can occur when determining the identification areas within the target area; for example, the target area corresponding to a single formula may be wrongly split into two identification areas.
For this purpose, after the identification areas are obtained in this step, they may be output to the user and calibrated by the user through clicking; for example, when the user finds that identification area a and identification area b are actually one formula, the user may merge the two into a single identification area A by selection. This avoids recognition errors caused by overly large distances between parts of a formula due to typesetting, printing, and similar processes in the picture to be recognized, and can improve the accuracy of recognizing the formula as a whole.
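The user-driven merge just described amounts to taking the union of two bounding boxes, assuming identification areas are stored as (top, left, bottom, right) tuples — an encoding the patent does not specify:

```python
def merge_regions(box_a, box_b):
    """Union of two bounding boxes, as when a user combines two
    mistakenly split identification areas into one.
    Boxes are (top, left, bottom, right) tuples."""
    return (min(box_a[0], box_b[0]), min(box_a[1], box_b[1]),
            max(box_a[2], box_b[2]), max(box_a[3], box_b[3]))
```

After the merge, the combined box would replace the two originals in the list of identification areas before recognition proceeds.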
In addition, when the identification areas are output to the user in this step, and the user finds that a certain identification area is actually wrong or does not need to be identified, the corresponding identification areas can be deleted or ignored upon receiving an instruction issued by the user. When the user thus indicates that one or more identification areas can be deleted or ignored, the subsequent identification and analysis of those areas is skipped. The method disclosed in this embodiment therefore avoids the time and system resources that would otherwise be consumed by identifying unnecessary identification areas.
In some embodiments, the user may only need to recognize part of the content in the target area, that is, only one or some of the formula units in the formula rather than all of them, and may select the identification regions to be recognized based on this need. Therefore, before step 103 of the previous embodiment identifies one or more formula units to obtain the corresponding formula recognition result, the method further includes:

outputting the formula units and receiving indication information fed back by the user, where the indication information is triggered by the user after the formula units are output and is used to select a target identification region from the formula units.
The formula units output in this step may be shown as in fig. 1-d, where each formula unit is surrounded by a frame. When the user determines which one needs to be recognized, the corresponding frame can be selected through a preset instruction, thereby generating the indication information.
In the foregoing embodiment, the identifying one or more formula units in step 103 to obtain a corresponding formula identification result includes:
and executing formula identification operation on the target identification area according to the preset identification model to obtain a formula identification result corresponding to the target identification area.
Based on this step, after the identification regions are output to the user, the target identification region to be recognized subsequently is determined by the user's indication information. The recognition process therefore operates only on content the user needs, avoiding the unnecessary work of recognizing everything, reducing resource occupation in the recognition process, and satisfying the user's targeted recognition requirements.
In some embodiments, there may be one or multiple target areas. Especially when there are multiple target areas, either all of them may need to be divided into identification regions for the subsequent recognition operation, or only part of them. Based on this, in step 102 of the foregoing embodiment, dividing the target areas to obtain one or more identification regions includes:
On one hand, when there are multiple target areas, the multiple target areas are divided respectively to obtain one or more identification regions corresponding to each target area. In this case, all the target areas require the subsequent recognition operation, so each target area is divided to obtain its identification regions before recognition continues. Because all the target areas can be recognized, multiple formulas present in one picture can be recognized together, satisfying the user's need to obtain all the formulas in a picture at the same time.
On the other hand, when there are multiple target areas, division is performed based on at least one first target area selected by the user, obtaining one or more identification regions respectively. In this case, the target areas are displayed to the user through a preset interactive interface, so that the user selects which formulas need to be recognized. Each target area in the interface can be marked in a specific manner, so that on viewing the interface the user immediately knows how many formulas in the current picture are available for recognition. The user's selection in the interface generates a corresponding feedback instruction that determines which formula is to be recognized subsequently, that is, the first target area. In the subsequent recognition process this avoids the system burden of recognizing all detected target areas: the feedback instruction issued through the preset interactive interface restricts division and recognition to the first target areas the user requires, making the method of this embodiment more targeted.
In some embodiments, determining each of the identification regions from the target region in the foregoing embodiments includes, upon execution:
Firstly, corresponding classification marks are added in the target area according to the data types, where the classification marks represent the types of data in the target area and include bracket-category marks and text-category marks. A classification mark can be understood as a label determined from the data content of the target area: since a formula generally includes a text portion (letters and numbers) and a symbol portion (brackets and the like), the text portion in the target area is given a text-category mark and the symbol portion a bracket-category mark.
Then, the target area is divided according to the classification marks to obtain each identification region. The classification marks determine what content each part contains, so the division into identification regions can be performed directly on their basis.
In this embodiment, because the identification regions are obtained by adding classification marks according to the data types in the target area and dividing on the basis of those marks, the division is driven by the type of the data content. Different data contents and symbols in the formula are thus separated into their corresponding formula units, ensuring the accuracy of the divided identification regions, avoiding division errors that would affect subsequent recognition, and improving the overall accuracy of formula recognition in the picture.
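The two-step mark-then-divide idea can be illustrated on a token stream. This sketch assumes the target area has already been tokenized into characters; the bracket set, the function names, and the rule that each bracket starts its own region are illustrative choices, not the patent's actual segmentation rule.

```python
BRACKETS = set("()[]{}")


def add_classification_marks(tokens):
    """Label each detected token in the target area as 'bracket' (symbol
    portion) or 'text' (letters and numbers)."""
    return [(t, "bracket" if t in BRACKETS else "text") for t in tokens]


def divide_by_marks(marked):
    """Divide the marked stream into identification regions: each bracket
    becomes its own region, and runs of text tokens stay together."""
    regions, run = [], []
    for tok, mark in marked:
        if mark == "bracket":
            if run:
                regions.append("".join(run))
                run = []
            regions.append(tok)
        else:
            run.append(tok)
    if run:
        regions.append("".join(run))
    return regions
```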
In some embodiments, in order to further improve the accuracy of the recognition process, the identification regions may be further processed after they are determined. This may include, but is not limited to, adding a bounding box that facilitates subsequent recognition. Dividing the target area based on the classification marks in the foregoing steps to obtain each identification region may then be executed as follows:
adding a boundary box to the target area to obtain an identification unit, wherein the boundary box is used for distinguishing different identification areas in the target area.
Based on this, identifying one or more formula units to obtain a corresponding formula recognition result may specifically be: executing a formula recognition operation on the recognition unit according to a preset recognition model to obtain the recognition result corresponding to the recognition unit. After the identification regions are determined in the target area, a corresponding bounding box is added to each, forming the unit areas to be recognized subsequently, that is, the recognition units. Recognition then proceeds per recognition unit, so each formula unit to be recognized is conveniently obtained in the formula recognition operation.
In addition, in practical applications, the size and form of the bounding box may affect the subsequent recognition result. For example, when an identification region is small, adding a large bounding box may interfere with the recognition of other regions; when several identification regions are adjacent, large bounding boxes may overlap one another. In view of this, in the present embodiment, when a bounding box is added to each identification region in the target area, the recognition units with their bounding boxes may be output to the user, who judges whether the subsequent recognition process can proceed on the basis of the currently determined recognition units. Bounding boxes of different sizes and forms can also be provided in this process, so that the user can adjust the bounding box of each recognition unit when individual units influence one another, which benefits the accuracy of subsequent recognition.
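The padding and overlap concerns above can be made concrete with two small helpers. This is an assumed sketch (regions as `(x1, y1, x2, y2)` tuples, a fixed pixel padding, and hypothetical function names), illustrating why an oversized box on one region can intrude on its neighbor.

```python
def add_bounding_box(region, pad, img_w, img_h):
    """Expand an identification region by `pad` pixels on every side,
    clipped to the image bounds, producing a recognition unit box."""
    x1, y1, x2, y2 = region
    return (max(0, x1 - pad), max(0, y1 - pad),
            min(img_w, x2 + pad), min(img_h, y2 + pad))


def boxes_overlap(a, b):
    """True when two boxes intersect, signalling that the user may need
    to choose a smaller bounding box for one of the recognition units."""
    return not (a[2] <= b[0] or b[2] <= a[0] or
                a[3] <= b[1] or b[3] <= a[1])
```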
In some embodiments, the picture to be recognized may contain many different contents: not only formulas but also other text, images and so on that do not need to be recognized. As shown in fig. 1-e, only the formula in the upper half of the figure needs to be recognized, while the characters and other content in the lower half do not. Therefore, in the method of this embodiment, step 101 of the foregoing embodiment, acquiring the picture to be recognized and determining the target area from it, may include, when executed:
firstly, dividing the picture to be recognized into a first area and a second area, wherein the first area is an area containing the formula, and the second area is an area except the area where the formula is located in the picture to be recognized; then, the first region is determined as the target region.
Based on the description of the foregoing embodiment, the picture can be divided into different regions according to the characteristics of its content. From all the data content in the picture to be recognized, the part containing a formula is determined, based on the characteristics of formulas, as the first region; the remaining part that does not need recognition is the second region; and the first region is determined as the target area to be recognized subsequently. This separates the formula part and the non-formula part of the picture, so that the target area containing the formula is obtained accurately and the content of the second region need not be recognized, which both reduces unnecessary resource occupation in the recognition process and prevents the content of the second region from affecting the formula recognition effect in the target area.
In addition, in this embodiment, because limited display definition in the picture to be recognized may cause the first and second regions to be divided inaccurately, the second region may be output to the user after it is determined, and the user may analyze it manually to judge whether it also contains a formula. It is then detected whether the user has issued an instruction to re-determine the first region. This avoids omitting formulas when the first region is identified and improves recognition accuracy.
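The first/second-region split can be illustrated with a toy heuristic. This is not the patent's actual criterion: it assumes the picture has already been OCR'd into text lines and classifies a line as formula-like by the share of math symbols it contains; the symbol set, threshold, and function name are all illustrative.

```python
MATH_CHARS = set("=+-*/^_\\{}()[]")


def split_formula_regions(lines, threshold=0.15):
    """Partition the text lines of a picture into a first (formula) region
    and a second (non-formula) region by per-line math-symbol density."""
    first, second = [], []
    for ln in lines:
        density = sum(c in MATH_CHARS for c in ln) / max(1, len(ln))
        (first if density >= threshold else second).append(ln)
    return first, second
```

A production system would make this decision on image features with a trained classifier, but the output contract is the same: a first region passed on as the target area, and a second region that can be shown to the user for manual confirmation.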
In some embodiments, users may have different output requirements; for example, the formula may be output as voice or as a picture. Step 104 of the foregoing embodiments, outputting the formula recognition result, may therefore be executed as follows:
and outputting the formula identification result in a preset mode.
The preset mode comprises an image mode, a code mode and a voice mode, and the code mode comprises a LaTeX source code.
When outputting in the image manner, it can be understood that the user only needs the formula part extracted from the original picture to be recognized, for example as shown in fig. 1-f.
In addition, when outputting in the code manner, the type of code may be selected before output. In this embodiment the formula may be output as LaTeX source code, as shown in fig. 1-g. LaTeX (pronounced roughly "lah-tech") is a typesetting system based on TeX, developed by an American computer scientist, and can express complex tables and mathematical formulas. Because it is well suited for computer systems to generate scientific and mathematical documents of high typographic quality, it is widely used to express texts with complex structure such as formulas. Since LaTeX source code is complex to read, the LaTeX source obtained after recognition may also be converted into an image for output. The specific output manner may be chosen according to the preset mode selected by the user and is not specifically limited here; actual needs prevail.
Because the output is performed in a preset mode, the corresponding mode can be selected when users have different requirements, ensuring the flexibility of the output manner so that the user can learn the recognition result of the formula in the picture in a suitable way.
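Dispatching on the preset mode can be sketched as below. The 'image' and 'voice' branches are deliberately stand-ins (a real system would call a LaTeX renderer and a text-to-speech engine); the function name and the toy spoken-form mapping are assumptions for illustration.

```python
def output_result(latex, mode="code"):
    """Return the formula recognition result in the user's preset mode.

    'code'  -> the raw LaTeX source,
    'image' -> a placeholder for a LaTeX-to-image rendering step,
    'voice' -> a plain-text reading suitable for a TTS engine (toy mapping).
    """
    if mode == "code":
        return latex
    if mode == "image":
        return f"<render:{latex}>"  # stand-in for an actual renderer
    if mode == "voice":
        return (latex.replace("\\frac", " fraction ")
                     .replace("^", " to the power "))
    raise ValueError(f"unknown mode: {mode}")
```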
Further, as an implementation of the method shown in fig. 1 and various embodiments, an embodiment of the present invention further provides an apparatus for identifying a formula in a picture, which is used to implement the method shown in fig. 1 and various embodiments. The embodiment of the apparatus corresponds to the embodiment of the method, and for convenience of reading, details in the embodiment of the apparatus are not repeated one by one, but it should be clear that the apparatus in the embodiment can correspondingly implement all the contents in the embodiment of the method. As shown in fig. 2, the apparatus includes: a determination unit 21, a division unit 22, an execution unit 23 and an output unit 24, wherein
The determining unit 21 may be configured to acquire a to-be-identified picture, and determine a target area from the to-be-identified picture, where the target area is at least one; the target area is an area containing a formula;
the dividing unit 22 may be configured to divide the target region to obtain one or more identification regions, where the identification region includes a region of at least one formula, and the formula includes a plurality of formula units;
the execution unit 23 may be configured to identify one or more formula units to obtain a corresponding formula identification result;
and the output unit 24 can be used for outputting the formula identification result.
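The four units of the apparatus in fig. 2 form a simple pipeline. The sketch below mirrors that structure only; the class name and the injected callables are hypothetical, and each stage would in practice wrap a detection model, a segmentation step, a recognition model, and an output formatter respectively.

```python
class FormulaRecognizer:
    """Pipeline mirroring the four units of fig. 2:
    determining unit 21 -> dividing unit 22 -> execution unit 23 -> output unit 24."""

    def __init__(self, detect, split, recognize, emit):
        self.detect = detect        # determining unit 21: picture -> target areas
        self.split = split          # dividing unit 22: target area -> identification regions
        self.recognize = recognize  # execution unit 23: region -> formula result
        self.emit = emit            # output unit 24: results -> final output

    def run(self, picture):
        target_areas = self.detect(picture)
        regions = [r for t in target_areas for r in self.split(t)]
        results = [self.recognize(r) for r in regions]
        return self.emit(results)
```

Wiring in trivial stand-ins for each stage shows the data flow, e.g. `FormulaRecognizer(lambda p: [p], str.split, str.upper, " ".join).run("a b")`.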
By means of the above technical scheme, the method and device for identifying a formula in a picture provided by the present invention solve the prior-art problem of poor recognition when the picture contains much and complex content. A picture to be recognized is acquired and the target area is determined from it; the target area is divided to obtain one or more identification regions; one or more formula units are recognized to obtain the corresponding formula recognition result; and finally the recognition result is output, realizing the function of recognizing the formula in the picture. Because there is at least one target area containing a formula, each identification region contains at least one formula region, and a formula contains multiple formula units, a relatively complex formula can be split during recognition into several small formulas or partial contents according to the distribution of the identification regions. Recognition based on the method of the present invention is therefore ensured even when the formula is relatively complex, improving the recognition effect. In addition, since the target area containing the formula is determined from the acquired picture to be recognized, the formula can be recognized even when the picture contains other content besides it, solving the problem of poor formula recognition when the picture to be recognized has much and complex content and further improving the recognition effect of the formula in the picture.
Optionally, the dividing unit is specifically configured to:
determining each of the identified regions from the target region;
acquiring position information corresponding to each identification area;
before the outputting the formula identification result, the method further comprises:
combining the identification results corresponding to the plurality of identification areas according to the position information to obtain a combined result, wherein the combined result is used for representing the formula consisting of the plurality of formula units;
the outputting the formula identification result comprises:
and outputting the combination result.
Optionally, the apparatus further includes a processing unit, where the processing unit is configured to:
outputting the formula units and receiving indication information fed back by a user, wherein the indication information is triggered by the user after the formula units are output, and the indication information is used for selecting a target identification area from the formula units;
the identifying one or more formula units to obtain a corresponding formula identification result includes:
and executing formula identification operation on the target identification area according to the preset identification model to obtain a formula identification result corresponding to the target identification area.
Optionally, the dividing unit is specifically configured to:
when the number of the target areas is multiple, respectively identifying the multiple target areas to obtain one or more identification areas corresponding to each target area;
alternatively, the first and second electrodes may be,
and when the target areas are multiple, identifying based on at least one first target area selected by a user to respectively obtain one or more identification areas.
Optionally, the dividing unit is specifically configured to:
adding corresponding classification marks in the target area according to the data types, wherein the classification marks are used for representing the types of the data in the target area and comprise bracket type marks and text type marks;
and dividing the target area according to the classification identification to obtain each identification area.
The method provided by the embodiment of the present application may be executed by a client or a server, and the client and the server that execute the method are described below separately.
Fig. 3 shows a block diagram of a client 300. For example, the client 300 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, and the like.
Referring to fig. 3, client 300 may include one or more of the following components: processing component 302, memory 304, power component 306, multimedia component 308, audio component 310, input/output (I/O) interface 33, sensor component 314, and communication component 316.
The processing component 302 generally controls overall operation of the client 300, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing elements 302 may include one or more processors 320 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 302 can include one or more modules that facilitate interaction between the processing component 302 and other components. For example, the processing component 302 can include a multimedia module to facilitate interaction between the multimedia component 308 and the processing component 302.
The memory 304 is configured to store various types of data to support operations at the client 300. Examples of such data include instructions for any application or method operating on the client 300, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 304 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power component 306 provides power to the various components of the client 300. The power components 306 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the client 300.
The multimedia component 308 comprises a screen providing an output interface between the client 300 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 308 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the client 300 is in an operation mode, such as a photographing mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 310 is configured to output and/or input audio signals. For example, the audio component 310 includes a Microphone (MIC) configured to receive external audio signals when the client 300 is in an operating mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 304 or transmitted via the communication component 316. In some embodiments, audio component 310 also includes a speaker for outputting audio signals.
The I/O interface provides an interface between the processing component 302 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
Sensor component 314 includes one or more sensors for providing status assessment of various aspects to the client 300. For example, sensor component 314 may detect an open/closed state of the client 300 and the relative positioning of components, such as the display and keypad of the client 300; sensor component 314 may also detect a change in the position of the client 300 or a component of the client 300, the presence or absence of user contact with the client 300, the orientation or acceleration/deceleration of the client 300, and a change in the temperature of the client 300. Sensor component 314 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor component 314 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 314 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 316 is configured to facilitate communications between the client 300 and other devices in a wired or wireless manner. The client 300 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication section 316 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 316 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the client 300 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the following methods:
acquiring pictures to be identified, and determining target areas from the pictures to be identified, wherein the number of the target areas is at least one; the target area is an area containing a formula;
dividing the target area to obtain one or more identification areas, wherein the identification areas comprise areas of at least one formula, and the formula comprises a plurality of formula units;
identifying one or more formula units to obtain a corresponding formula identification result;
and outputting the formula identification result.
Optionally, the dividing the target area to obtain one or more identification areas includes:
determining each of the identified regions from the target region;
acquiring position information corresponding to each identification area;
before the outputting the formula identification result, the method further comprises:
combining the identification results corresponding to the plurality of identification areas according to the position information to obtain a combined result, wherein the combined result is used for representing the formula consisting of the plurality of formula units;
the outputting the formula identification result comprises:
and outputting the combination result.
Optionally, before identifying one or more formula units to obtain a corresponding formula identification result, the method further includes:
outputting the formula units and receiving indication information fed back by a user, wherein the indication information is triggered by the user after the formula units are output, and the indication information is used for selecting a target identification area from the formula units;
the identifying one or more formula units to obtain a corresponding formula identification result includes:
and executing formula identification operation on the target identification area according to the preset identification model to obtain a formula identification result corresponding to the target identification area.
Optionally, the determining each identification region from the target region includes:
adding corresponding classification marks in the target area according to the data types, wherein the classification marks are used for representing the types of the data in the target area and comprise bracket type marks and text type marks;
and dividing the target area according to the classification identification to obtain each identification area.
Optionally, the dividing the target area according to the classification identifier to obtain each identification area includes:
adding a boundary box to the target area to obtain an identification unit, wherein the boundary box is used for distinguishing different identification areas in the target area;
the identifying one or more formula units to obtain the corresponding formula identification result comprises:
and executing a formula recognition operation on the recognition unit according to a preset recognition model to obtain a recognition result corresponding to the recognition unit.
Optionally, the obtaining the picture to be recognized and determining the target area from the picture to be recognized includes:
dividing the picture to be identified into a first area and a second area, wherein the first area is an area containing the formula, and the second area is an area except the area where the formula is located in the picture to be identified;
determining the first area as the target area.
Optionally, the outputting the formula identification result includes:
and outputting the formula recognition result in a preset mode, wherein the preset mode comprises an image mode, a code mode and a voice mode, and the code mode comprises a LaTeX source code.
Fig. 4 is a schematic structural diagram of a server in an embodiment of the present application. The server 400 may vary significantly due to configuration or performance, and may include one or more Central Processing Units (CPUs) 422 (e.g., one or more processors) and memory 432, one or more storage media 430 (e.g., one or more mass storage devices) storing applications 442 or data 444. Wherein the memory 432 and storage medium 430 may be transient or persistent storage. The program stored on the storage medium 430 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Still further, the central processor 422 may be arranged to communicate with the storage medium 430, and execute a series of instruction operations in the storage medium 430 on the server 400.
Still further, the central processor 422 may perform the following method:
acquiring pictures to be identified, and determining target areas from the pictures to be identified, wherein the number of the target areas is at least one; the target area is an area containing a formula;
dividing the target area to obtain one or more identification areas, wherein the identification areas comprise areas of at least one formula, and the formula comprises a plurality of formula units;
identifying one or more formula units to obtain a corresponding formula identification result;
and outputting the formula identification result.
Optionally, the dividing the target area to obtain one or more identification areas includes:
determining each of the identified regions from the target region;
acquiring position information corresponding to each identification area;
before the outputting the formula identification result, the method further comprises:
combining the identification results corresponding to the plurality of identification areas according to the position information to obtain a combined result, wherein the combined result is used for representing the formula consisting of the plurality of formula units;
the outputting the formula identification result comprises:
and outputting the combination result.
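Combining per-area identification results according to their position information can be sketched as sorting the recognized fragments top-to-bottom and then left-to-right before joining them. The line tolerance used to decide which fragments share a line is an assumption for illustration.

```python
# Sketch of combining identification results by position information.
# recognized: list of ((x, y), latex_fragment) pairs; line_tol (assumed)
# groups fragments whose y-coordinates fall on the same text line.

def combine_results(recognized, line_tol=10):
    ordered = sorted(recognized,
                     key=lambda r: (round(r[0][1] / line_tol), r[0][0]))
    return " ".join(frag for _, frag in ordered)

parts = [((120, 52), "="), ((10, 50), r"\frac{a}{b}"), ((200, 48), "c")]
print(combine_results(parts))  # prints \frac{a}{b} = c
```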
Optionally, before identifying one or more formula units to obtain a corresponding formula identification result, the method further includes:
outputting the formula units and receiving indication information fed back by a user, wherein the indication information is triggered by the user after the formula units are output, and the indication information is used for selecting a target identification area from the formula units;
the identifying one or more formula units to obtain a corresponding formula identification result includes:
and executing a formula identification operation on the target identification area according to a preset identification model to obtain a formula identification result corresponding to the target identification area.
Optionally, the determining each identification region from the target region includes:
adding corresponding classification marks in the target area according to the data types, wherein the classification marks are used for representing the types of the data in the target area and comprise bracket type marks and text type marks;
and dividing the target area according to the classification marks to obtain each identification area.
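The two mark types named in the description (bracket type and text type) suggest a tagging pass followed by a split. The character-level rule below is an illustrative assumption of how such marks might drive the division into identification areas.

```python
# Sketch of adding classification marks (bracket type / text type) to the
# data in a target area, then dividing the area at bracket marks. The
# tokenization and split rule are illustrative assumptions.

BRACKETS = set("()[]{}")

def classify(token):
    return "bracket" if token in BRACKETS else "text"

def add_classification_marks(tokens):
    return [(tok, classify(tok)) for tok in tokens]

def split_by_marks(marked):
    # Start a new identification area at every bracket mark.
    areas, current = [], []
    for tok, mark in marked:
        if mark == "bracket" and current:
            areas.append(current)
            current = []
        current.append((tok, mark))
    if current:
        areas.append(current)
    return areas

marked = add_classification_marks(["f", "(", "x", ")"])
print(split_by_marks(marked))  # three identification areas
```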
Optionally, the dividing the target area according to the classification marks to obtain each identification area includes:
adding a boundary box to the target area to obtain an identification unit, wherein the boundary box is used for distinguishing different identification areas in the target area;
the identifying one or more formula units to obtain the corresponding formula identification result comprises:
and executing a formula recognition operation on the recognition unit according to a preset recognition model to obtain a recognition result corresponding to the recognition unit.
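A boundary box that distinguishes different identification areas inside one target area can be sketched as the axis-aligned rectangle enclosing an area's points, plus a check that two such boxes do not collide. The padding and coordinates are illustrative assumptions.

```python
# Sketch of adding a boundary box around each identification area so that
# different areas inside one target area can be told apart. Padding and
# point coordinates are illustrative.

def bounding_box(points, pad=2):
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs) - pad, min(ys) - pad, max(xs) + pad, max(ys) + pad)

def boxes_overlap(a, b):
    # Boxes are (x_min, y_min, x_max, y_max); disjoint on either axis = no overlap.
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

left = bounding_box([(10, 10), (30, 25)])
right = bounding_box([(50, 10), (70, 25)])
print(left, right, boxes_overlap(left, right))  # the two boxes do not overlap
```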
Optionally, the obtaining the picture to be recognized and determining the target area from the picture to be recognized includes:
dividing the picture to be identified into a first area and a second area, wherein the first area is an area containing the formula, and the second area is an area except the area where the formula is located in the picture to be identified;
determining the first area as the target area.
Optionally, the outputting the formula identification result includes:
and outputting the formula recognition result in a preset mode, wherein the preset mode comprises an image mode, a code mode and a voice mode, and the code mode comprises a LaTeX source code.
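The three preset output modes named above (image, code, voice) can be dispatched on a single result. Only the code mode (LaTeX source) is concrete here; the image and voice branches are placeholders for a renderer and a TTS back end, both assumptions for illustration.

```python
# Sketch of outputting a formula recognition result in one of the three
# preset modes described above. Image and voice modes are placeholders.

def output_result(latex, mode="code"):
    if mode == "code":
        return latex                           # LaTeX source, e.g. for copy-paste
    if mode == "image":
        return f"<rendered image of {latex}>"  # placeholder for a renderer
    if mode == "voice":
        return f"<speech for {latex}>"         # placeholder for a TTS engine
    raise ValueError(f"unknown mode: {mode}")

print(output_result(r"\frac{a}{b} = c"))  # prints the LaTeX source
```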
The server 400 may also include one or more power supplies 426, one or more wired or wireless network interfaces 450, one or more input-output interfaces 456, one or more keyboards 456, and/or one or more operating systems 441, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so forth.
Embodiments of the present application also provide a computer-readable medium having stored thereon instructions, which, when executed by one or more processors, cause an apparatus to perform the method for identifying a formula in a picture provided in the above method embodiments.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice in the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the application is limited only by the appended claims.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (15)

1. A method for identifying a formula in a picture is characterized by comprising the following steps:
acquiring pictures to be identified, and determining target areas from the pictures to be identified, wherein the number of the target areas is at least one; the target area comprises an area of a formula;
dividing the target area to obtain one or more identification areas, wherein the identification areas comprise areas of at least one formula, and the formula comprises one or more formula units;
identifying one or more formula units to obtain a corresponding formula identification result;
and outputting the formula identification result.
2. The method of claim 1, wherein the dividing the target area to obtain one or more identification areas comprises:
determining each of the identified regions from the target region;
acquiring position information corresponding to each identification area;
before the outputting the formula identification result, the method further comprises:
combining the identification results corresponding to the plurality of identification areas according to the position information to obtain a combined result, wherein the combined result is used for representing the formula consisting of the plurality of formula units;
the outputting the formula identification result comprises:
and outputting the combination result.
3. The method of claim 1, wherein before said identifying one or more of said formula units to obtain a corresponding formula identification result, said method further comprises:
outputting the formula units and receiving indication information fed back by a user, wherein the indication information is triggered by the user after the formula units are output, and the indication information is used for selecting a target identification area from the formula units;
the identifying one or more formula units to obtain a corresponding formula identification result includes:
and executing a formula identification operation on the target identification area according to a preset identification model to obtain a formula identification result corresponding to the target identification area.
4. The method of claim 1, wherein the dividing the target area to obtain one or more identification areas comprises:
when there are a plurality of target areas, respectively identifying the plurality of target areas to obtain one or more identification areas corresponding to each target area;
or,
when there are a plurality of target areas, identifying based on at least one first target area selected by a user to respectively obtain one or more identification areas.
5. The method of claim 2, wherein said determining each of said identified regions from said target region comprises:
adding corresponding classification marks in the target area according to the data types, wherein the classification marks are used for representing the types of the data in the target area and comprise bracket type marks and text type marks;
and dividing the target area according to the classification marks to obtain each identification area.
6. The method of claim 5, wherein the dividing the target area according to the classification marks to obtain each identification area comprises:
adding a boundary box to the target area to obtain an identification unit, wherein the boundary box is used for distinguishing different identification areas in the target area;
the identifying one or more formula units to obtain the corresponding formula identification result comprises:
and executing a formula recognition operation on the recognition unit according to a preset recognition model to obtain a recognition result corresponding to the recognition unit.
7. The method according to any one of claims 1-6, wherein the obtaining a picture to be recognized and determining a target region from the picture to be recognized comprises:
dividing the picture to be identified into a first area and a second area, wherein the first area is an area containing the formula, and the second area is an area except the area where the formula is located in the picture to be identified;
determining the first area as the target area.
8. The method of claim 7, wherein outputting the formula identification result comprises:
and outputting the formula recognition result in a preset mode, wherein the preset mode comprises an image mode, a code mode and a voice mode, and the code mode comprises a LaTeX source code.
9. An apparatus for identifying a formula in a picture, comprising:
the device comprises a determining unit, a judging unit and a judging unit, wherein the determining unit is used for acquiring pictures to be identified and determining target areas from the pictures to be identified, and the number of the target areas is at least one; the target area comprises an area of a formula;
the dividing unit is used for dividing the target area to obtain one or more identification areas, wherein the identification areas comprise areas of at least one formula, and the formula comprises one or more formula units;
the execution unit is used for identifying one or more formula units to obtain a corresponding formula identification result;
and the output unit is used for outputting the formula identification result.
10. The apparatus according to claim 9, wherein the dividing unit is specifically configured to:
determining each of the identified regions from the target region;
acquiring position information corresponding to each identification area;
before the outputting the formula identification result, the method further comprises:
combining the identification results corresponding to the plurality of identification areas according to the position information to obtain a combined result, wherein the combined result is used for representing the formula consisting of the plurality of formula units;
the outputting the formula identification result comprises:
and outputting the combination result.
11. The apparatus of claim 9, further comprising a processing unit to:
outputting the formula units and receiving indication information fed back by a user, wherein the indication information is triggered by the user after the formula units are output, and the indication information is used for selecting a target identification area from the formula units;
the identifying one or more formula units to obtain a corresponding formula identification result includes:
and executing a formula identification operation on the target identification area according to a preset identification model to obtain a formula identification result corresponding to the target identification area.
12. The apparatus according to claim 9, wherein the dividing unit is specifically configured to:
when there are a plurality of target areas, respectively identifying the plurality of target areas to obtain one or more identification areas corresponding to each target area;
or,
when there are a plurality of target areas, identifying based on at least one first target area selected by a user to respectively obtain one or more identification areas.
13. The apparatus according to claim 10, wherein the dividing unit is specifically configured to:
adding corresponding classification marks in the target area according to the data types, wherein the classification marks are used for representing the types of the data in the target area and comprise bracket type marks and text type marks;
and dividing the target area according to the classification marks to obtain each identification area.
14. A storage medium, characterized in that the storage medium comprises a stored program, wherein when the program runs, a device where the storage medium is located is controlled to execute the method for identifying the formula in the picture according to any one of claims 1 to 8.
15. An apparatus for identifying a formula in a picture, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for performing the method for identifying a formula in a picture according to any one of claims 1 to 8.
CN202110730258.7A 2021-06-29 2021-06-29 Method and device for identifying formula in picture and storage medium Pending CN113610073A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110730258.7A CN113610073A (en) 2021-06-29 2021-06-29 Method and device for identifying formula in picture and storage medium


Publications (1)

Publication Number Publication Date
CN113610073A 2021-11-05

Family

ID=78336945

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110730258.7A Pending CN113610073A (en) 2021-06-29 2021-06-29 Method and device for identifying formula in picture and storage medium

Country Status (1)

Country Link
CN (1) CN113610073A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106980856A (en) * 2016-01-15 2017-07-25 上海谦问万答吧云计算科技有限公司 Formula identification method and system and symbolic reasoning computational methods and system
CN107169485A (en) * 2017-03-28 2017-09-15 北京捷通华声科技股份有限公司 A kind of method for identifying mathematical formula and device
CN109726661A (en) * 2018-12-21 2019-05-07 网易有道信息技术(北京)有限公司 Image processing method and device, medium and calculating equipment
WO2020010547A1 (en) * 2018-07-11 2020-01-16 深圳前海达闼云端智能科技有限公司 Character identification method and apparatus, and storage medium and electronic device
CN111275038A (en) * 2020-01-17 2020-06-12 平安医疗健康管理股份有限公司 Image text recognition method and device, computer equipment and computer storage medium
CN111340020A (en) * 2019-12-12 2020-06-26 科大讯飞股份有限公司 Formula identification method, device, equipment and storage medium


Similar Documents

Publication Publication Date Title
US11455491B2 (en) Method and device for training image recognition model, and storage medium
CN109005283B (en) Method, device, terminal and storage medium for displaying notification message
CN108062547B (en) Character detection method and device
CN107122113B (en) Method and device for generating picture
RU2648616C2 (en) Font addition method and apparatus
CN107688399B (en) Input method and device and input device
CN108717542B (en) Method and device for recognizing character area and computer readable storage medium
CN111666941B (en) Text detection method and device and electronic equipment
CN107229403B (en) Information content selection method and device
CN111461182A (en) Image processing method, image processing apparatus, and storage medium
CN110619325B (en) Text recognition method and device
US11367232B2 (en) Method and device for generating stickers
CN107992894B (en) Image recognition method, image recognition device and computer-readable storage medium
CN113920293A (en) Information identification method and device, electronic equipment and storage medium
CN111797746B (en) Face recognition method, device and computer readable storage medium
CN112449110B (en) Image processing method and device and electronic equipment
CN110650364B (en) Video attitude tag extraction method and video-based interaction method
RU2636673C2 (en) Method and device for line saving
CN115562539A (en) Control display method and device, electronic equipment and readable storage medium
CN113610073A (en) Method and device for identifying formula in picture and storage medium
CN113157184A (en) Content display method and device, electronic equipment and readable storage medium
CN112036247A (en) Expression package character generation method and device and storage medium
US10423706B2 (en) Method and device for selecting information
CN113885713A (en) Method and device for generating handwriting formula
CN114155320A (en) Method and device for identifying content of structure diagram, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40055318

Country of ref document: HK