CN111767889A

CN111767889A - Formula recognition method, electronic device and computer readable medium

Info

Publication number: CN111767889A
Application number: CN202010653266.1A
Authority: CN
Inventors: 明卫鹏; 田意翔; 刘子韬
Original assignee: Beijing Century TAL Education Technology Co Ltd
Current assignee: Beijing Century TAL Education Technology Co Ltd
Priority date: 2020-07-08
Filing date: 2020-07-08
Publication date: 2020-10-13

Abstract

An embodiment of the present invention discloses a formula recognition method: a picture containing a formula is preprocessed; formula symbol detection is performed on the preprocessed picture to obtain category information and position information of the formula symbols contained in the formula; a mixed feature vector is constructed based on the category information and position information of the formula symbols; and, based on the mixed feature vector, the formula symbols are recognized and converted to obtain the character string corresponding to the formula contained in the picture. Because this scheme constructs the detected category information and position information of the formula symbols into a mixed feature vector, the formula symbols are recognized and converted with high accuracy.

Description

Formula recognition method, electronic device and computer readable medium

Technical Field

Embodiments of the present invention relate to the technical field of text recognition, and in particular to a formula recognition method for natural scenes, an electronic device, and a computer-readable medium.

Background

Formula recognition in natural scenes is the process of obtaining a picture containing a formula through operations such as photographing or scanning in a natural scene, and then recognizing the formula in the picture as a LaTeX string.

At present, a variety of algorithms and neural network models can be used to recognize formulas in pictures. However, formulas have complex structures and appear in natural scenes in extremely varied forms: for example, a formula may differ in size, font, color, brightness, and contrast, and may be bent, rotated, or distorted. As a result, the formula in a picture is recognized with low precision, and the recognition result is not accurate enough.

Summary of the Invention

The present invention provides a formula recognition scheme to at least partially solve the above problems.

According to a first aspect of the embodiments of the present invention, a formula recognition method is provided. The method includes: preprocessing a picture containing a formula to obtain a preprocessed picture; performing formula symbol detection on the preprocessed picture to obtain category information and position information of the formula symbols contained in the formula; constructing a mixed feature vector based on the category information and position information of the formula symbols; and, based on the mixed feature vector, recognizing and converting the formula symbols to obtain the character string corresponding to the formula contained in the picture.

According to a second aspect of the embodiments of the present invention, an electronic device is provided. The device includes: one or more processors; and a computer-readable medium configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the formula recognition method according to the first aspect.

According to a third aspect of the embodiments of the present invention, a computer-readable medium is provided, on which a computer program is stored; when executed by a processor, the program implements the formula recognition method according to the first aspect.

According to the scheme provided by the embodiments of the present invention, a picture containing a formula is preprocessed, and formula symbol detection is performed on the preprocessed picture to obtain category information and position information of the formula symbols contained in the formula; a mixed feature vector is constructed based on the category information and position information of the formula symbols; and, based on the mixed feature vector, the formula symbols are recognized and converted to obtain the character string corresponding to the formula contained in the picture. Because the mixed feature vector constructed in this scheme includes both the position information and the category information of the formula symbols, the category of a formula symbol can be determined fairly accurately from the category information, while the position information explicitly indicates where the symbol is located. The information used to recognize and convert the formula symbols is therefore more comprehensive and complete, the formula symbols can be recognized more accurately, and both the accuracy and the efficiency of recognition and conversion are higher.

Brief Description of the Drawings

Other features, objects, and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments, made with reference to the accompanying drawings:

FIG. 1 is a flowchart of the steps of a formula recognition method according to Embodiment 1 of the present invention;

FIG. 2 is a schematic diagram of a neural network model based on the Yolo structure according to Embodiment 1 of the present invention;

FIG. 3 is a schematic diagram of an attention-based sequence-to-sequence model according to Embodiment 1 of the present invention;

FIG. 4 is a flowchart of a formula recognition method according to Embodiment 2 of the present invention;

FIG. 5 is a schematic structural diagram of an electronic device according to Embodiment 3 of the present invention.

Detailed Description

The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts relevant to the invention.

It should be noted that, provided there is no conflict, the embodiments of the present invention and the features in the embodiments may be combined with each other. The present invention is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.

Embodiment 1

Referring to FIG. 1, a flowchart of the steps of a formula recognition method according to Embodiment 1 of the present invention is shown.

The formula recognition method of this embodiment includes the following steps:

Step 101: Preprocess a picture containing a formula to obtain a preprocessed picture.

The picture containing the formula may be preprocessed in order to reduce the influence that the picture's scale, the proportion of the picture occupied by the formula symbol region, and the size of the formula symbols have on the recognition and localization of the formula symbols.

In this embodiment, optionally, the picture containing the formula may be preprocessed as follows:

First, the picture containing the formula is binarized to obtain a binarized picture; then the picture region where the formula is located is determined from the binarized picture; finally, the preprocessed picture is obtained from the picture region cut out of the binarized picture. In a preprocessed picture obtained this way, on the one hand, binarization removes a large amount of unnecessary information, especially non-formula information; on the other hand, the region where the formula is located is retained and the redundant picture regions are removed, which improves detection efficiency in the subsequent formula symbol detection.

In this embodiment, the picture containing the formula is binarized. The specific manner of binarization may be implemented by those skilled in the art in any appropriate way, and the embodiments of the present invention place no limitation on it. In this embodiment, in the binarized picture, pixels belonging to formula symbols have the value 1 and all other pixels have the value 0.
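Since the text leaves the binarization method open, the following is only a minimal sketch of one possible choice: a fixed global threshold, assuming dark formula strokes on a light background in a grayscale picture. The function name `binarize` and the default threshold of 128 are hypothetical and do not come from the original.

```python
def binarize(gray, threshold=128):
    """Map a grayscale picture (rows of 0-255 values) to 0/1 pixels.

    Assumption: formula strokes are darker than the background, so pixels
    below the threshold become 1 (formula symbol) and the rest become 0.
    """
    return [[1 if value < threshold else 0 for value in row] for row in gray]
```

An adaptive or Otsu-style threshold could be swapped in for pictures with uneven lighting; the output convention (1 for symbol pixels, 0 otherwise) is the only part taken from the description.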

Then, a plurality of coordinate points with a pixel value of 1 may be obtained from the binarized picture, and the cutting range of the binarized picture is determined from these coordinate points and taken as the picture region where the formula is located. In this way, the positions of the pixels with value 1 are obtained and the approximate region of the formula in the picture is determined. It can be understood that the more coordinate points are obtained, the more accurately the region of the formula in the picture is determined.

Optionally, the vertex coordinates of the circumscribed quadrilateral of the formula may be obtained from the maximum abscissa, minimum abscissa, maximum ordinate, and minimum ordinate among the coordinate points; the cutting range is then determined from the vertex coordinates, and the binarized picture is cut accordingly. In this way, the picture of the region where the formula is located can be obtained efficiently.

For example, from all coordinate points with a pixel value of 1 in the binarized picture, the minimum abscissa x1, maximum abscissa x2, minimum ordinate y1, and maximum ordinate y2 can be found, which gives the upper-left corner (x1, y2), lower-left corner (x1, y1), upper-right corner (x2, y2), and lower-right corner (x2, y1) of the picture region whose pixel value is 1. Finally, the quadrilateral formed by these four coordinate points is used as the cutting range for cutting the binarized picture. Because the four points cover the minimum and maximum abscissas and ordinates of the pixels with value 1, the picture region where the formula is located can be obtained quickly and completely.
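The cropping step described above can be sketched as follows. This is an illustrative sketch, not the patented implementation: the picture is assumed to be a list of rows of 0/1 values, the function name `crop_to_formula` is hypothetical, and the row index is used as the ordinate (so y grows downward, unlike the corner naming in the text, which does not affect the cropped result).

```python
def crop_to_formula(binary):
    """Crop a binarized picture to the bounding box of its 1-valued pixels."""
    # Collect the (x, y) coordinates of every pixel whose value is 1.
    points = [(x, y)
              for y, row in enumerate(binary)
              for x, value in enumerate(row)
              if value == 1]
    if not points:
        return []  # no formula pixels, nothing to crop

    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x1, x2 = min(xs), max(xs)  # minimum / maximum abscissa
    y1, y2 = min(ys), max(ys)  # minimum / maximum ordinate

    # Keep rows y1..y2 and columns x1..x2 (both ends inclusive).
    return [row[x1:x2 + 1] for row in binary[y1:y2 + 1]]
```

For a 4x4 picture whose only 1-pixels sit in the central 2x2 block, the function returns just that 2x2 block.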

Optionally, the picture region cut out of the binarized picture may be scaled by a preset ratio and then padded at the edges to obtain the preprocessed picture. The preset ratio may be set by those skilled in the art according to the actual situation, as long as it meets the needs of subsequent processing.

In one feasible manner, the picture region cut out of the binarized picture may be scaled by the factors (nx/(x2-x1), ny/(y2-y1)), where nx denotes the product of n and x, and ny denotes the product of n and y. Here n is the number of components of the formula in the binarized picture, implemented concretely as the number of connected domains of the formula, which can be found with any appropriate connected-domain detection method. In the formula, one connected domain corresponds to one component, and one or more components make up one part of the formula. For example, in the formula 6+3=9 the number of components is 6: "6", "3", "9", and "+" each correspond to one component, while "=" has two connected domains and therefore corresponds to two components. The parts of the formula, in turn, are formed by the individual formula symbols; the formula above has 5 parts, namely "6", "3", "9", "+", and "=".

The values of x and y can be obtained through large-scale data statistics. For example, a preset number of formulas may be selected, the size of the formula symbols in each formula calculated, and the sizes of all the formula symbols averaged; this embodiment places no limit on the preset number. Here x may represent the height of a formula symbol and y its width, so (x, y) may represent the pixel size of a formula symbol.

In this embodiment, scaling the picture region cut out of the binarized picture by the factors (nx/(x2-x1), ny/(y2-y1)) brings the formula symbols contained in the formula in the picture to a similar size.
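As a rough illustration of how n and the scaling factors might be computed, the sketch below counts connected domains with a breadth-first search and then evaluates (nx/(x2-x1), ny/(y2-y1)). The use of 8-connectivity, the function names, and the sample values of x and y are assumptions made for illustration; the text allows any suitable connected-domain detection method.

```python
from collections import deque


def count_components(binary):
    """Count connected domains of 1-pixels (8-connectivity is an assumption)."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    count = 0
    for start_y in range(height):
        for start_x in range(width):
            if binary[start_y][start_x] != 1 or seen[start_y][start_x]:
                continue
            count += 1  # found a new, unvisited connected domain
            seen[start_y][start_x] = True
            queue = deque([(start_x, start_y)])
            while queue:
                cx, cy = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        px, py = cx + dx, cy + dy
                        if (0 <= px < width and 0 <= py < height
                                and binary[py][px] == 1 and not seen[py][px]):
                            seen[py][px] = True
                            queue.append((px, py))
    return count


def scale_factors(n, x, y, x1, x2, y1, y2):
    """Return (n*x/(x2-x1), n*y/(y2-y1)); requires x2 > x1 and y2 > y1."""
    return (n * x / (x2 - x1), n * y / (y2 - y1))
```

A shape like "=" (two separate bars) yields 2 components, matching the example in the text. With the hypothetical values n = 6, x = y = 20 and a cropped region spanning 240 by 60 pixels, `scale_factors` gives (0.5, 2.0).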

Next, the scaled picture may be padded at the edges to obtain the preprocessed picture, which may, for example, have a size of (1024, 256). This size is only illustrative; in practical applications, those skilled in the art can bring the preprocessed picture to whatever size is required. Padding the scaled picture ensures that the outermost pixels of the preprocessed picture can also be detected completely in the subsequent formula symbol detection.
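The padding step could look like the sketch below. It assumes that (1024, 256) means width 1024 and height 256, that the scaled picture is centered on the canvas, and that the fill value is 0 (background); none of these details are stated in the original, and `pad_to_size` is a hypothetical name.

```python
def pad_to_size(image, width=1024, height=256, fill=0):
    """Center a picture (list of rows) on a width x height canvas of `fill`."""
    img_h = len(image)
    img_w = len(image[0]) if img_h else 0
    if img_w > width or img_h > height:
        raise ValueError("picture is larger than the target canvas")

    top = (height - img_h) // 2    # rows of padding above the picture
    left = (width - img_w) // 2    # columns of padding to its left
    canvas = [[fill] * width for _ in range(height)]
    for r, row in enumerate(image):
        canvas[top + r][left:left + img_w] = row
    return canvas
```

Centering leaves a margin on every side whenever the picture is smaller than the canvas, which is what keeps the outermost strokes away from the picture border.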

In this embodiment, preprocessing the picture containing the formula reduces the influence that the picture's scale, the proportion of the picture occupied by the formula symbol region, and the size of the formula symbols have on the recognition and localization of the formula symbols, thereby improving both the recognition precision and the localization precision of the formula.

Step 102: Perform formula symbol detection on the preprocessed picture to obtain category information and position information of the formula symbols contained in the formula.

In this embodiment, the category information of a formula symbol indicates the category of the symbol, including but not limited to a digit category, a letter category, an operator category, a punctuation category, and so on; this embodiment places no limitation on it. The position information may represent the position coordinates of the formula symbol in the picture.

Optionally, the preprocessed picture may be input into a first neural network model for formula symbol detection to obtain the category information and position information of the formula symbols contained in the formula. Performing formula symbol detection with the first neural network model gives high detection accuracy.

Specifically, the preprocessed picture may be input into the first neural network model for formula symbol detection, and the first neural network model performs multi-scale feature extraction and symbol detection on the preprocessed picture to obtain the category information and position information of the formula symbols contained in the formula. Multi-scale feature extraction by the first neural network model helps ensure higher precision in formula symbol detection.

In this embodiment, the first neural network model may be a neural network model based on the Yolo structure. Taking Yolo_v3 as an example, the existing Yolo_v3 uses feature maps at 3 different scales for object detection. To avoid losing shallow-layer information as the network deepens, which would degrade the precision of formula recognition and localization, the Yolo-based neural network model in this embodiment can perform feature extraction at four or more scales and obtain the corresponding feature maps, where the feature extraction at these scales includes low-scale feature extraction. Symbol box detection and symbol recognition are then performed on the basis of these feature maps to obtain the category information and position information of the formula symbols contained in the formula. Low-scale feature extraction is feature extraction based on relationships between pixels; the low-scale features extracted are basic features that can be extracted automatically from the image without any shape or spatial-relationship information. In this embodiment, adding low-scale feature extraction allows effective feature extraction for symbols with unconventional shapes and proportions, such as small-target symbols like the dot symbol "." and the long horizontal line symbol "————".

The existing Yolo_v3 sets 9 anchor boxes of different scales: (10, 13), (16, 30), (33, 23), (30, 61), (62, 45), (59, 119), (116, 90), (156, 198), and (373, 326). These anchor boxes do not recognize certain target objects accurately enough, for example certain mathematical symbols. In this embodiment, the Yolo-based neural network model may be configured with 12 anchor boxes of different scales, including anchor boxes for detecting set symbols; the anchor box sizes can be set or adjusted according to actual needs. As one set of example data, the anchor box sizes may be: (10, 10), (6, 40), (40, 40), (40, 6), (40, 80), (80, 40), (80, 80), (12, 100), (12, 120), (120, 120), (100, 12), and (228, 228). The set symbols include at least one of the following: symbols whose size falls within a preset size range, such as "=", "+", "x", and other common symbols; and symbols whose aspect ratio is greater than a preset ratio, such as "." and similar symbols. The preset ratio may be set appropriately by those skilled in the art according to actual needs, and the embodiments of the present invention place no limitation on it. Setting anchor boxes of the above sizes in the Yolo-based neural network model can improve the recognition accuracy and localization precision for uncommon formula symbols such as "Ω", "─", and "∫".

FIG. 2 is a schematic diagram of the Yolo-based neural network model described above. For an input picture, the model maps it to feature maps at multiple scales. DBL is the basic component of Yolo_v3 and consists of convolution + BN + Leaky ReLU; BN and Leaky ReLU, together with the convolution layer, form the smallest component of Yolo_v3. The n in resn in FIG. 2 stands for a number (res1, res2, ..., res8, and so on) and indicates how many res_units the res_block contains. The res_block is a large component of Yolo_v3: Yolo_v3 borrows the residual structure of ResNet, which allows the network to be made deeper, and the basic component of a res_block is again DBL. The concat operation performs tensor concatenation, joining an intermediate darknet layer with the upsampled output of a later layer. Concatenation differs from the add operation of the residual layers: concatenation expands the dimensions of the tensor, whereas add sums the tensors directly and leaves the tensor dimensions unchanged.

Formula symbols include some with unconventional shapes and proportions, such as the dot symbol "." and the long horizontal line symbol "————", which the conventional Yolo_v3 does not detect easily. To ensure the recognition accuracy of these symbols, this embodiment sets up a low-scale feature extraction channel. FIG. 2 shows feature extraction at four scales by the Yolo-based neural network model; the dashed boxes in the figure mark the 4 feature maps, of which y4 is the low-scale feature map. For an input preprocessed picture, the Yolo-based neural network model maps it to feature maps at 4 scales. Based on these feature maps and the 12 anchor boxes described above, symbol recognition and symbol box detection can be performed, yielding the category information and position information of the formula symbols contained in the formula in the input preprocessed picture.
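To see why elongated anchors help with symbols such as long horizontal lines, the sketch below matches a symbol box to the closest of the 12 example anchor sizes using a shape-only IoU (both boxes aligned at a common corner), a heuristic commonly used with Yolo-style detectors. Treating each pair as (width, height) and the function names are assumptions; the original does not state how anchors are matched.

```python
# The 12 example anchor sizes from the description, read here as (width, height).
ANCHORS = [
    (10, 10), (6, 40), (40, 40), (40, 6), (40, 80), (80, 40),
    (80, 80), (12, 100), (12, 120), (120, 120), (100, 12), (228, 228),
]


def shape_iou(a, b):
    """IoU of two boxes compared by size only, as if sharing a top-left corner."""
    intersection = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - intersection
    return intersection / union


def best_anchor(box):
    """Return the anchor whose shape overlaps `box` the most."""
    return max(ANCHORS, key=lambda anchor: shape_iou(anchor, box))
```

Under these assumptions, a wide, flat 90 by 10 box (for instance a fraction bar) matches the (100, 12) anchor, and a small 8 by 8 dot matches (10, 10); neither shape has a close counterpart among the 9 default Yolo_v3 anchors.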

In this embodiment, performing multi-scale feature extraction and symbol detection on the preprocessed picture with the Yolo-based neural network model avoids missed detections of unconventional target symbols and thus avoids degrading the translation process in the next stage. This improves the accuracy of formula recognition, and the method also shows good robustness and generalization on pictures containing formulas from different scenes.

Step 103: Construct a mixed feature vector based on the category information and position information of the formula symbols.

The position information of a formula symbol obtained in the above step may include the coordinates of the four corners of the symbol box: upper left, lower left, upper right, and lower right. Optionally, these coordinates are expressed as 8 values between 0 and 1, which form a vector representing the position information of the formula symbol.

In this embodiment, the category information of the formula symbol may be vectorized to obtain a category vector; the category vector is then concatenated with the position information of the formula symbol to obtain the mixed feature vector. For example, the category information of the formula symbol may first be vectorized into a 122-dimensional category vector in which the value of each dimension lies between 0 and 1. The category vector may then be concatenated with the position information to produce a 130-dimensional mixed feature vector containing both the category information and the position information of the formula symbol. Of the 130 dimensions, 122 represent the category information of the formula symbol and the remaining 8 represent its position information. It should be noted that these specific values are only illustrative; in practical applications, those skilled in the art can give the mixed feature vector a different number of dimensions as needed.
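The concatenation can be sketched as below, using the illustrative dimensions from the text (122 category values plus 8 position values). Normalizing corner coordinates by the picture width and height to obtain values between 0 and 1 is an assumption, as are the function name and the corner ordering.

```python
NUM_CLASSES = 122  # illustrative category-vector size from the description


def mixed_feature_vector(class_scores, corners, img_w, img_h):
    """Concatenate a category vector with 8 normalized corner coordinates.

    class_scores: 122 values in [0, 1], one per symbol category.
    corners: four (x, y) pixel points of the symbol box.
    """
    if len(class_scores) != NUM_CLASSES:
        raise ValueError("expected a 122-dimensional category vector")
    if len(corners) != 4:
        raise ValueError("expected four corner points")

    position = []
    for x, y in corners:
        # Assumed normalization: divide by picture size to land in [0, 1].
        position.extend([x / img_w, y / img_h])

    return list(class_scores) + position  # 122 + 8 = 130 dimensions
```

Each detected symbol yields one such 130-dimensional vector, and the sequence of these vectors is what the subsequent recognition and conversion step consumes.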

In this embodiment, optionally, the mixed feature vector may also be obtained in other ways. For example, the category vector and the position information of the formula symbol may be stacked vertically, that is, the position information is stacked in the rows below the category vector. The mixed feature vector may also be obtained by overlapped merging, that is, by supplementing one of the two sets of data (the category vector and the position information of the formula symbol) into the other.

In this embodiment, because the constructed mixed feature vector includes both the position information and the category information of the formula symbols, formula symbols are located more quickly and accurately than in the prior art, which recognizes formula symbols based on category information alone; position confusion and similar problems do not occur, and the accuracy is higher.

Step 104: Based on the mixed feature vector, recognize and convert the formula symbols to obtain the character string corresponding to the formula contained in the picture.

In this embodiment, the character string may be a LaTeX string. Specifically, the mixed feature vector may be input into a second neural network model, which recognizes and converts the formula symbols to obtain the character string corresponding to the formula contained in the picture.

Optionally, the second neural network model may be a sequence-to-sequence model based on an attention mechanism.

FIG. 3 is a schematic diagram of the attention-based sequence-to-sequence model, which is a neural network with an Encoder-Decoder structure augmented with an attention mechanism. The input sequence of the encoder is the mixed feature vector described above. If the formula represented by the mixed feature vector is x²+y²=125÷5, the mixed feature vector passes through the encoder to produce an encoding vector, which then enters the Attention module for feature extraction; the decoder then outputs a LaTeX string, namely the LaTeX string corresponding to the formula x²+y²=125÷5.
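The description does not say which attention variant the model uses. Purely as an illustration of what an attention module computes at one decoding step, the sketch below uses plain dot-product scoring followed by a softmax; the scoring function, the function names, and the toy vectors are all assumptions rather than details from the original.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_outputs):
    """One attention step: weight encoder outputs by similarity to the state.

    Returns (weights, context), where context is the weighted sum of the
    encoder outputs. Dot-product scoring is an assumed choice.
    """
    scores = [sum(d * e for d, e in zip(decoder_state, enc))
              for enc in encoder_outputs]
    weights = softmax(scores)
    dim = len(encoder_outputs[0])
    context = [sum(w * enc[i] for w, enc in zip(weights, encoder_outputs))
               for i in range(dim)]
    return weights, context
```

At each output token, the decoder would use the context vector to focus on the encoded symbols most relevant to that token, for example the symbol "÷" when emitting "\div".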

The traditional method of converting the mathematical symbols in a formula into a LaTeX string uses a position-based context-free grammar, which is computationally inefficient when many mathematical symbols are recognized. By contrast, this embodiment uses an attention-based sequence-to-sequence model, treats the conversion as a translation process, and constructs a mixed feature vector from the category information and position information of the formula symbols. Because the category and position information contained in the mixed feature vector is more precise, explicit, and complete, the extracted formula symbol feature information suffers no loss or deviation compared with the traditional approach; the translated character string is therefore more accurate, and the computation is more efficient.

通过本实施例，对包含公式的图片进行预处理，将预处理后的图片进行公式符号检测，得到上述公式包含的公式符号的类别信息以及位置信息；基于该公式符号的类别信息以及位置信息，构造混合特征向量；基于混合特征向量，进行上述公式符号的识别和转换，获得上述图片中包含的公式对应的字符串。由于本方案构造的混合特征向量包括了公式符号的位置信息以及类别信息，通过类别信息可以较为准确地确定公式符号的类别，而通过位置信息则可明确指示该公式符号的位置，由此，使得用于进行公式符号的识别和转换的信息更全面更完整，可以更为准确地对公式符号进行识别，进行公式符号的识别和转换的准确率和效率都更高。Through this embodiment, the pictures containing the formula are preprocessed, and the formula symbols are detected on the preprocessed pictures to obtain the category information and position information of the formula symbols contained in the above formula; based on the category information and position information of the formula symbols, Construct a mixed feature vector; based on the mixed feature vector, identify and convert the above formula symbols, and obtain the character string corresponding to the formula contained in the above picture. Since the hybrid feature vector constructed in this scheme includes the position information and category information of the formula symbol, the category of the formula symbol can be more accurately determined by the category information, and the position of the formula symbol can be clearly indicated by the position information, so that the The information used for the identification and conversion of formula symbols is more comprehensive and complete, the formula symbols can be identified more accurately, and the accuracy and efficiency of the identification and conversion of formula symbols are higher.

The formula recognition method of this embodiment may be executed by any suitable electronic device with data processing capability, including but not limited to a server, a mobile terminal (such as a mobile phone or a PAD), and a PC.

Embodiment 2

FIG. 4 shows a flowchart of a formula recognition method according to Embodiment 2 of the present invention.

For example, if the formula in the preprocessed picture is x²+y²=125÷5, the preprocessed picture is used as the input of a neural network model based on the Yolo structure, which performs multi-scale feature extraction and symbol detection on the preprocessed picture to obtain feature maps at the corresponding scales. Symbol box detection and symbol recognition are then performed based on these feature maps to obtain the category information and position information of the formula symbols contained in the formula x²+y²=125÷5. As shown by the dashed boxes in FIG. 4, the result is the individual formula symbols in the preprocessed picture, "x", "2", "+", "y", "2", "=", "1", "2", "5", "÷", "5", together with the position of each symbol in the preprocessed picture, for example the coordinates of the four corners of each symbol box in FIG. 4. The category information and position information output by the Yolo-based neural network model can then be constructed into a mixed feature vector, which is used as the input of the seq2seq model, that is, the attention-based sequence-to-sequence model. The seq2seq model recognizes and converts the mixed feature vector, finally yielding the character string corresponding to the formula contained in the picture: "x", "^", "2", "+", "y", "^", "2", "=", "1", "2", "5", "\div", "5".
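As an illustration of the output in this example, the following Python sketch takes the token sequence listed above and joins it into a single LaTeX string. The helper name `tokens_to_latex` and the convention of inserting a space after a control word such as `\div` are assumptions made for illustration; this embodiment does not specify how the tokens are joined.

```python
# Token sequence obtained for x²+y²=125÷5, as listed in the example above.
TOKENS = ["x", "^", "2", "+", "y", "^", "2", "=", "1", "2", "5", "\\div", "5"]


def tokens_to_latex(tokens):
    """Join output tokens into one LaTeX string (hypothetical helper).

    A space is added after a control word (backslash followed by letters)
    so that a following letter cannot be absorbed into the command name.
    """
    parts = []
    for i, tok in enumerate(tokens):
        parts.append(tok)
        is_control_word = tok.startswith("\\") and tok[1:].isalpha()
        if is_control_word and i + 1 < len(tokens):
            parts.append(" ")
    return "".join(parts)


print(tokens_to_latex(TOKENS))  # x^2+y^2=125\div 5
```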

The preprocessing procedure and the specific processing performed by each model are described in the relevant parts of Embodiment 1 above and are not repeated here.

With this embodiment, when performing formula recognition, especially for formulas containing unconventional symbols such as small-target symbols or symbols whose aspect ratio is greater than a preset ratio, the picture containing the formula can first be preprocessed to reduce the influence that pictures of different scales have on the recognition and localization of formula symbols, which improves both recognition accuracy and localization accuracy. Formula symbol detection is then performed on the preprocessed picture by the neural network model based on the Yolo structure to obtain the category information and position information of the formula symbols contained in the formula; this avoids missed detection of target symbols, and hence any effect on the next stage, and improves the recognition accuracy for formulas containing small-target symbols. Finally, a mixed feature vector is constructed based on the category information and position information of the formula symbols, and the formula symbols are recognized and converted based on the mixed feature vector to obtain the character string corresponding to the formula contained in the picture. Because the category information and position information of the detected formula symbols are constructed into a mixed feature vector, the recognition and conversion of the formula symbols are more accurate.

It should be noted that the formula recognition scheme of the embodiments of the present invention can be applied in a wide variety of scenarios, including but not limited to: formula recognition on pictures containing only printed formulas, formula recognition on pictures containing only handwritten formulas, and formula recognition on images containing both printed and handwritten formulas. The scheme is therefore broadly applicable to different kinds of formula pictures and offers better compatibility.

Embodiment 3

FIG. 5 shows the hardware structure of an electronic device in Embodiment 3 of the present invention. As shown in FIG. 5, the electronic device may include a processor 301, a communications interface 302, a memory 303, and a communication bus 304.

Wherein:

The processor 301, the communications interface 302, and the memory 303 communicate with one another through the communication bus 304.

The communications interface 302 is configured to communicate with other electronic devices or servers.

The processor 301 is configured to execute a program 305, and may specifically execute the relevant steps of the formula recognition method embodiments described above.

Specifically, the program 305 may include program code, and the program code includes computer operation instructions.

The processor 301 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention. The one or more processors included in the electronic device may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.

The memory 303 is configured to store the program 305. The memory 303 may include high-speed RAM and may also include non-volatile memory, for example at least one disk memory.

The program 305 may specifically be used to cause the processor 301 to perform the following operations: preprocessing a picture containing a formula to obtain a preprocessed picture; performing formula symbol detection on the preprocessed picture to obtain category information and position information of the formula symbols contained in the formula; constructing a mixed feature vector based on the category information and position information of the formula symbols; and, based on the mixed feature vector, recognizing and converting the formula symbols to obtain the character string corresponding to the formula contained in the picture.

In an optional implementation, the program 305 is further configured to cause the processor 301, when constructing the mixed feature vector based on the category information and position information of the formula symbols, to: vectorize the category information of the formula symbols to obtain a category vector; and concatenate the category vector with the position information of the formula symbols to obtain the mixed feature vector.
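A minimal sketch of this construction follows. It assumes a one-hot encoding for the category vector and a symbol box given as (x_min, y_min, x_max, y_max) normalized by the picture size; the embodiment fixes neither choice, and the names `one_hot` and `mixed_feature_vector` are hypothetical.

```python
def one_hot(index, size):
    """Return a one-hot list of length `size` with a 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def mixed_feature_vector(label, box, vocab, image_size):
    """Concatenate a category vector with position information.

    label      -- detected symbol category, e.g. "x" (must be in vocab)
    box        -- (x_min, y_min, x_max, y_max) in pixels (assumed layout)
    vocab      -- list of all symbol categories (assumed to be known)
    image_size -- (width, height) of the preprocessed picture
    """
    width, height = image_size
    x_min, y_min, x_max, y_max = box
    category_vector = one_hot(vocab.index(label), len(vocab))
    position = [x_min / width, y_min / height, x_max / width, y_max / height]
    return category_vector + position


vocab = ["x", "2", "+"]
vector = mixed_feature_vector("2", (10, 0, 20, 10), vocab, (100, 50))
print(vector)  # [0.0, 1.0, 0.0, 0.1, 0.0, 0.2, 0.2]
```

One such vector per detected symbol would then form the input sequence of the second model.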

In an optional implementation, the program 305 is further configured to cause the processor 301, when preprocessing the picture containing the formula to obtain the preprocessed picture, to: binarize the picture containing the formula to obtain a binarized picture; determine, from the binarized picture, the picture area where the formula is located; and obtain the preprocessed picture according to the picture area cut out of the binarized picture.
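The binarization step can be sketched as follows, assuming a grayscale picture stored as a list of rows with values from 0 to 255 and a fixed threshold. The threshold value, the function name `binarize`, and the convention that dark ink maps to 1 are assumptions; the embodiment does not state which binarization method is used.

```python
def binarize(gray, threshold=128):
    """Map each grayscale pixel to 1 (ink) or 0 (background).

    Pixels darker than `threshold` are treated as formula ink and set
    to 1, so that later steps can collect the points with value 1.
    """
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]


gray = [
    [255, 0],
    [100, 200],
]
print(binarize(gray))  # [[0, 1], [1, 0]]
```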

In an optional implementation, the program 305 is further configured to cause the processor 301, when determining the picture area where the formula is located from the binarized picture, to: obtain, from the binarized picture, a plurality of coordinate points whose pixel value is 1; and determine the cutting range of the binarized picture according to the plurality of coordinate points, as the picture area where the formula is located.

In an optional implementation, the program 305 is further configured to cause the processor 301, when obtaining the preprocessed picture according to the picture area cut out of the binarized picture, to: scale the picture area cut out of the binarized picture according to a preset ratio and then perform edge padding to obtain the preprocessed picture.
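The scaling and edge-padding step might look like the sketch below. It assumes that the "preset ratio" means fitting the cropped area into a fixed target size while preserving its aspect ratio, that nearest-neighbour sampling is acceptable, and that padding uses background value 0 on all sides; none of these details is given in the embodiment, and the function names are hypothetical.

```python
def resize_nearest(img, new_w, new_h):
    """Resize a 2-D list of pixels with nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    return [
        [img[y * h // new_h][x * w // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]


def scale_and_pad(img, target_w, target_h, fill=0):
    """Scale `img` to fit the target size, then pad the borders."""
    h, w = len(img), len(img[0])
    scale = min(target_w / w, target_h / h)  # keep the aspect ratio
    new_w = max(1, min(target_w, round(w * scale)))
    new_h = max(1, min(target_h, round(h * scale)))
    resized = resize_nearest(img, new_w, new_h)

    # Centre the resized area on a canvas filled with the background value.
    left = (target_w - new_w) // 2
    top = (target_h - new_h) // 2
    canvas = [[fill] * target_w for _ in range(target_h)]
    for y, row in enumerate(resized):
        canvas[top + y][left:left + new_w] = row
    return canvas


print(scale_and_pad([[1, 1], [1, 1]], 4, 2))  # [[0, 1, 1, 0], [0, 1, 1, 0]]
```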

In an optional implementation, the program 305 is further configured to cause the processor 301, when determining the cutting range of the picture according to the plurality of coordinate points, as the picture area where the formula is located, to: obtain the vertex coordinates of the circumscribed quadrilateral of the formula according to the maximum abscissa, the minimum abscissa, the maximum ordinate, and the minimum ordinate among the plurality of coordinate points; and determine the cutting range according to the vertex coordinates and cut the binarized picture.
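The following sketch shows one way to derive the cutting range from the extreme coordinates, treating the circumscribed quadrilateral as an axis-aligned rectangle. The binarized picture is assumed to be a list of rows of 0 and 1 values; the function names `formula_bounding_box` and `crop` are hypothetical.

```python
def formula_bounding_box(binary):
    """Return (x_min, y_min, x_max, y_max) over all pixels equal to 1.

    Returns None when the picture contains no pixel with value 1.
    """
    points = [
        (x, y)
        for y, row in enumerate(binary)
        for x, value in enumerate(row)
        if value == 1
    ]
    if not points:
        return None
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def crop(binary, box):
    """Cut the rectangle described by `box` (inclusive) out of the picture."""
    x_min, y_min, x_max, y_max = box
    return [row[x_min:x_max + 1] for row in binary[y_min:y_max + 1]]


binary = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
box = formula_bounding_box(binary)
print(box)                # (1, 1, 2, 2)
print(crop(binary, box))  # [[1, 0], [0, 1]]
```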

In an optional implementation, the program 305 is further configured to cause the processor 301, when performing formula symbol detection on the preprocessed picture to obtain the category information and position information of the formula symbols contained in the formula, to: input the preprocessed picture into a first neural network model for formula symbol detection to obtain the category information and position information of the formula symbols contained in the formula.

In an optional implementation, the program 305 is further configured to cause the processor 301, when inputting the preprocessed picture into the first neural network model for formula symbol detection to obtain the category information and position information of the formula symbols contained in the formula, to: input the preprocessed picture into the first neural network model for formula symbol detection, and perform multi-scale feature extraction and symbol detection on the preprocessed picture through the first neural network model to obtain the category information and position information of the formula symbols contained in the formula.

In an optional implementation, the first neural network model is a neural network model based on the Yolo structure, and the program 305 is further configured to cause the processor 301, when performing multi-scale feature extraction and symbol detection on the preprocessed picture through the first neural network model to obtain the category information and position information of the formula symbols contained in the formula, to: perform feature extraction at at least four scales through the neural network model based on the Yolo structure to obtain corresponding feature maps at the at least four scales, where the feature extraction at the at least four scales includes low-scale feature extraction; and, based on the feature maps at the at least four scales, perform symbol recognition and symbol box detection to obtain, respectively, the category information and position information of the formula symbols contained in the formula.

In an optional implementation, 12 anchor boxes of different scales are set in the neural network model, and the anchor boxes include anchor boxes for detecting set symbols.

In an optional implementation, the set symbols include at least one of the following: symbols whose size is within a preset size range, and symbols whose aspect ratio is greater than a preset ratio.
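To make the two criteria concrete, the sketch below flags a symbol box as a set symbol when its area falls within a size range or when its long side divided by its short side exceeds a ratio. The numeric thresholds, the use of area as the size measure, and the function name `is_set_symbol` are illustrative assumptions only; the embodiment leaves the preset values open.

```python
def is_set_symbol(box, size_range=(0, 100), max_aspect_ratio=3.0):
    """Decide whether a symbol box needs a dedicated anchor box.

    box              -- (x_min, y_min, x_max, y_max) in pixels
    size_range       -- hypothetical preset range for the box area
    max_aspect_ratio -- hypothetical preset ratio (long side / short side)
    """
    x_min, y_min, x_max, y_max = box
    width = x_max - x_min
    height = y_max - y_min
    area = width * height

    in_size_range = size_range[0] <= area <= size_range[1]
    long_side = max(width, height)
    short_side = max(1, min(width, height))  # avoid division by zero
    elongated = long_side / short_side > max_aspect_ratio
    return in_size_range or elongated


print(is_set_symbol((0, 0, 5, 5)))    # True: small symbol such as a dot
print(is_set_symbol((0, 0, 200, 4)))  # True: long, thin symbol such as a fraction bar
print(is_set_symbol((0, 0, 30, 40)))  # False: ordinary symbol
```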

In an optional implementation, the program 305 is further configured to cause the processor 301, when recognizing and converting the formula symbols based on the mixed feature vector to obtain the character string corresponding to the formula contained in the picture, to: input the mixed feature vector into a second neural network model to recognize and convert the formula symbols and obtain the character string corresponding to the formula contained in the picture, where the second neural network model is an attention-based sequence-to-sequence model.
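The attention mechanism at the core of such a model can be illustrated with a single decoding step. The sketch below uses plain dot-product attention over the encoded symbol features; the embodiment does not state which attention variant is used, so the scoring function, the function name `attention_step`, and the toy inputs are assumptions.

```python
import math


def attention_step(query, keys, values):
    """Compute one dot-product attention step.

    query  -- decoder state for the current output token
    keys   -- one vector per input symbol (encoded mixed features)
    values -- one vector per input symbol, combined into the context
    Returns (weights, context).
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]

    # Softmax over the scores, shifted by the maximum for stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: weighted sum of the value vectors.
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context


weights, context = attention_step(
    [1.0, 0.0],
    [[10.0, 0.0], [0.0, 10.0]],
    [[1.0, 0.0], [0.0, 1.0]],
)
print(weights[0] > 0.99)  # True: the first input symbol dominates
```

At each output position the decoder would attend in this way over all input symbols before emitting the next token of the character string.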

For the specific implementation of each step in the program 305, refer to the descriptions of the corresponding steps in the formula recognition method embodiments above; details are not repeated here. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the devices and modules described above may be found in the corresponding process descriptions in the foregoing method embodiments and are not repeated here.

With the electronic device of this embodiment, a picture containing a formula is preprocessed, and formula symbol detection is performed on the preprocessed picture to obtain the category information and position information of the formula symbols contained in the formula; a mixed feature vector is constructed based on the category information and position information of the formula symbols; and, based on the mixed feature vector, the formula symbols are recognized and converted to obtain the character string corresponding to the formula contained in the picture. Because the mixed feature vector constructed in this scheme includes both the position information and the category information of each formula symbol, the category information allows the category of the symbol to be determined fairly accurately, while the position information explicitly indicates where the symbol is located. The information used for recognizing and converting the formula symbols is therefore more comprehensive and complete, the symbols can be recognized more accurately, and both the accuracy and the efficiency of recognition and conversion are higher.

In particular, according to embodiments of the present invention, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present invention includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code configured to execute the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through a communication part, and/or installed from a removable medium. When the computer program is executed by a central processing unit (CPU), the above-described functions defined in the method of the embodiments of the present invention are performed. It should be noted that the computer-readable medium described in the embodiments of the present invention may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the embodiments of the present invention, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device. A computer-readable signal medium, by contrast, may include a data signal propagated in baseband or as part of a carrier wave and carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program configured for use by or in combination with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted using any suitable medium, including but not limited to wireless, wire, optical cable, RF, and the like, or any suitable combination of the above.

Computer program code configured to perform the operations of the embodiments of the present invention may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions configured to implement the specified logical function. The specific embodiments above describe particular orderings, but these orderings are only examples; in a specific implementation, there may be fewer or more steps, or the execution order may be adjusted. That is, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The modules described in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor; for example, it may be described as: a processor includes an access module and a sending module. The names of these modules do not, in certain cases, constitute a limitation on the modules themselves.

In another aspect, an embodiment of the present invention further provides a computer-readable medium on which a computer program is stored; when executed by a processor, the program implements the formula recognition method described in the foregoing embodiments.

In another aspect, an embodiment of the present invention further provides a computer-readable medium, which may be included in the apparatus described in the above embodiments or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: preprocess a picture containing a formula to obtain a preprocessed picture; perform formula symbol detection on the preprocessed picture to obtain category information and position information of the formula symbols contained in the formula; construct a mixed feature vector based on the category information and position information of the formula symbols; and, based on the mixed feature vector, recognize and convert the formula symbols to obtain the character string corresponding to the formula contained in the picture.

The expressions "first", "second", "the first", and "the second" used in the various embodiments of the present invention may modify various components regardless of order and/or importance, but these expressions do not limit the corresponding components. They are used only to distinguish one element from another.

The above description is only of the preferred embodiments of the present invention and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the embodiments of the present invention is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the embodiments of the present invention.

Claims

1. A formula recognition method, characterized in that the method comprises:

Preprocess the image containing the formula to obtain the preprocessed image;

Performing formula symbol detection on the preprocessed picture to obtain category information and position information of the formula symbol contained in the formula;

Construct a mixed feature vector based on the category information and position information of the formula symbol;

Based on the mixed feature vector, the identification and conversion of the formula symbol is performed, and the character string corresponding to the formula contained in the picture is obtained.

2. The method according to claim 1, wherein the constructing a mixed feature vector based on the category information and the position information of the formula symbol comprises:

Vectorizing the category information of the formula symbol to obtain a category vector;

The category vector is spliced with the position information of the formula symbol to obtain a mixed feature vector.

3. The method according to claim 1, wherein the preprocessing is performed on the picture containing the formula to obtain the preprocessed picture, comprising:

Binarize the image containing the formula to obtain a binarized image;

From the binarized picture, determine the picture area where the formula is located;

The preprocessed picture is obtained according to the picture region cut out from the binarized picture.

4. The method according to claim 3, wherein, determining the picture area where the formula is located from the binarized picture, comprising:

Obtain a plurality of coordinate points with a pixel value of 1 from the binarized image;

The cutting range of the binarized picture is determined according to the plurality of coordinate points as the picture area where the formula is located.

5. The method according to claim 3, wherein the obtaining the preprocessed picture according to the picture region cut out from the binarized picture, comprising:

The image area cut out from the binarized image is scaled according to a preset ratio, and then edge-filling processing is performed to obtain a preprocessed image.

6. The method according to claim 4, wherein the determining the cutting range of the picture according to the plurality of coordinate points, as the picture area where the formula is located, comprises:

According to the maximum abscissa, the minimum abscissa, the maximum ordinate, and the minimum ordinate in the plurality of coordinate points, the vertex coordinates of the circumscribed quadrilateral of the formula are obtained;

The cutting range is determined according to the vertex coordinates, and the binarized image is cut.

7. The method according to claim 1, characterized in that, performing formula symbol detection in the preprocessed picture to obtain category information and position information of the formula symbol contained in the formula, including:

Inputting the preprocessed picture into a first neural network model for formula symbol detection, to obtain category information and position information of formula symbols contained in the formula.

8. The method according to claim 7, wherein the inputting the preprocessed picture into a first neural network model for formula symbol detection to obtain category information and position information of the formula symbols contained in the formula comprises:

Inputting the preprocessed picture into the first neural network model for formula symbol detection, and performing multi-scale feature extraction and symbol detection on the preprocessed picture through the first neural network model to obtain the category information and position information of the formula symbols contained in the formula.

9. The method according to claim 8, characterized in that the first neural network model is a neural network model based on the Yolo structure;

The performing multi-scale feature extraction and symbol detection on the preprocessed picture through the first neural network model to obtain the category information and position information of the formula symbols contained in the formula comprises:

Perform feature extraction of at least four scales by using the neural network model based on the Yolo structure, and obtain corresponding feature maps of at least four scales, wherein the feature extraction of at least four scales includes low-scale feature extraction;

Based on the feature maps of the at least four scales, symbol recognition and symbol frame detection are performed, and category information and position information of the formula symbols contained in the formula are correspondingly obtained.

10. The method according to claim 9, wherein 12 anchor boxes of different scales are set in the neural network model, and the anchor boxes include anchor boxes for detecting set symbols.

11. The method according to claim 10, wherein the set symbols comprise at least one of the following: a symbol whose size is within a preset size range, and a symbol whose aspect ratio is greater than a preset ratio.

12. The method according to claim 1, wherein the recognizing and converting the formula symbols based on the mixed feature vector to obtain the character string corresponding to the formula contained in the picture comprises:

Inputting the mixed feature vector into a second neural network model to recognize and convert the formula symbols, and obtaining the character string corresponding to the formula contained in the picture, wherein the second neural network model is an attention-based sequence-to-sequence model.

13. An electronic device, characterized in that the device comprises:

one or more processors;

A computer-readable medium configured to store one or more programs,

When the one or more programs are executed by the one or more processors, the one or more processors implement the formula recognition method according to any one of claims 1-12.

14. A computer-readable medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the formula recognition method according to any one of claims 1-12 is implemented.