WO2020164281A1 - Form parsing method based on character location and recognition, and medium and computer device - Google Patents

Form parsing method based on character location and recognition, and medium and computer device Download PDF

Info

Publication number
WO2020164281A1
Authority
WO
WIPO (PCT)
Prior art keywords
picture
layout
position information
recognition
table layout
Prior art date
Application number
PCT/CN2019/118422
Other languages
French (fr)
Chinese (zh)
Inventor
周罡
卢波
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Priority claimed from CN201910115364.7A external-priority patent/CN109961008B/en
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2020164281A1 publication Critical patent/WO2020164281A1/en

Links

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/10 - Segmentation; Edge detection
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/40 - Document-oriented image-based pattern recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/40 - Document-oriented image-based pattern recognition
    • G06V 30/41 - Analysis of document content
    • G06V 30/412 - Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables

Definitions

  • This application relates to the field of computer processing technology, and in particular to a table analysis method, medium, and computer device based on text positioning and recognition.
  • Deep learning is developing rapidly in the field of image recognition. It has surpassed traditional methods in both accuracy and efficiency, and has attracted wide attention in the field. Deep learning is a new area of machine learning research; its motivation lies in building and simulating a neural network that mimics the human brain for analysis and learning, imitating the mechanism by which the human brain interprets data such as images, sounds, and text.
  • The recognition of a table refers to converting the table in a table picture into editable table text, a process that requires both text recognition and image recognition.
  • The existing technical solution performs table analysis based on the presence of table lines; when there are no table lines, the table cannot be extracted from the table picture.
  • The present application provides a form analysis method and corresponding device based on text positioning and recognition, which locate and recognize the text in form pictures using an established deep learning model, improving the efficiency and accuracy of form picture recognition.
  • This application also provides a computer device and a readable storage medium for executing the table analysis method based on text positioning and recognition of this application.
  • the present application provides a method for analyzing table images based on text positioning and recognition, the method including:
  • Inputting the form picture into a pre-trained text positioning network to obtain the position information of the characters in the form picture includes:
  • a rectangular coordinate system is established, and the coordinates of each vertex of the rectangular frame are obtained as the position information.
  • This application provides a form analysis method based on text positioning and recognition: a form picture is input into a pre-trained text positioning network to obtain the position information of the characters in the form picture; the form picture is segmented according to the position information, and the resulting cell pictures are recognized to obtain the cell character content; a first table layout is extracted according to the position information; and a table file of the form picture is generated according to the first table layout and the cell character content.
  • the established deep learning model can be used to locate and recognize the text in the table image, which improves the efficiency and accuracy of the table image recognition.
  • This application can detect whether the table picture contains grid lines; if it does, a second table layout of the table picture is extracted and compared with the first table layout, and when the comparison shows that the first table layout is consistent with the second table layout, the first table layout is verified as valid.
  • This application can additionally detect whether there are table lines in the table picture. When the table picture has table lines, the table lines are extracted directly, and the obtained first table layout is compared with the second table layout formed from the extracted table lines to verify whether the first table layout is valid.
  • This application uses the text positioning network and the text recognition network to parse table pictures, so it is compatible with cases where table lines are absent, present, or incomplete, and its scope of application is wide.
  • The present application may further calculate a comparison result between the second table layout and the first table layout, expressed as the points of difference between the two layouts. When the number of points of difference is greater than a preset value, the text positioning network is retrained. Through this mechanism, the application can learn flexibly and intelligently adjust the pre-trained text positioning network, so that the analysis results for table pictures become increasingly accurate.
  • FIG. 1 is a flowchart of a table parsing method based on text positioning recognition in an embodiment
  • Figure 2 shows a prior-art text positioning network based on scene text detection
  • FIG. 3 is a schematic diagram of obtaining position information of characters in the table picture in an embodiment
  • FIG. 4 is a structural block diagram of a table analysis device based on text positioning recognition in an embodiment
  • Fig. 5 is a block diagram of the internal structure of a computer device in an embodiment.
  • An embodiment of the present application provides a table analysis method based on text positioning and recognition. As shown in FIG. 1, the method includes the following steps:
  • Deep network training is performed in advance on multiple input target samples to obtain a text positioning network capable of locating the text in table pictures and a text recognition network capable of recognizing that text. Specifically, feature point extraction and feature fusion are performed on the sample pictures, and the text positioning network and the text recognition network are finally output.
  • the target sample includes at least a picture sample and the coordinates of a marked rectangular frame with text.
  • Deep network training is a new area of machine learning research. Its motivation is to build and simulate a neural network that mimics the human brain for analysis and learning, imitating the mechanism by which the human brain interprets data such as images, sounds, and text.
  • The general idea of this application is a text detection and recognition process based on deep network training: positioning networks such as Faster R-CNN (deep-learning-based object detection) and CTPN (natural scene text detection) locate the text in the picture to obtain its location information, and the region indicated by the location information is then input into an RNN-based text recognition network, such as CRNN, to obtain the character string corresponding to the location information.
  • Figure 2 is a text positioning network based on EAST (scene text detection).
  • the text positioning network used in this application is an improvement based on the EAST text positioning network.
  • The text positioning network used in this application connects an LSTM (Long Short-Term Memory network) after the score map in the network structure shown in FIG. 2, which makes the score map sharper and more uniform; during training, dice loss is used in place of focal loss.
  • LSTM is a recurrent neural network, suitable for processing and predicting important events with relatively long intervals and delays in a time series.
  • Inputting the form picture described in this application into the pre-trained text positioning network to obtain the position information of the characters in the form picture specifically includes: inputting the form picture into the pre-trained text positioning network; taking several character strings as a character string combination; obtaining the smallest rectangular frame surrounding the character string combination; and establishing a rectangular coordinate system and taking the coordinates of each vertex of the rectangular frame as the position information.
  • FIG. 3 is a schematic diagram of obtaining position information of characters in the table picture.
  • the table picture contains several character string combinations. After the text positioning network is used, the smallest rectangular frame wrapping each character string combination is output.
  • the position information of the characters in the table picture is expressed as the coordinate value of the smallest rectangular frame that wraps the combination of character strings.
  • the coordinates of the four vertices of the rectangular frame surrounding the character string combination can be directly obtained through the character positioning network.
  • the position information is expressed as the coordinate values of the upper left corner and the lower right corner of the rectangular frame.
  • the minimum and maximum values of the X axis and the minimum and maximum values of the Y axis constitute the coordinates of the upper left corner and the lower right corner of the rectangular frame, thereby obtaining a standard rectangular frame.
  • The coordinates of the four vertices of the smallest rectangular frame surrounding a certain string combination, obtained through the text positioning network, are A(X1, Y1), B(X1, Y2), C(X2, Y1), and D(X2, Y2); according to the magnitudes of X1, X2, Y1, and Y2, the coordinates of the upper-left and lower-right corners of the rectangle are selected.
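To make the corner selection concrete, here is a minimal Python sketch (the helper name is an assumption for illustration, not from the patent) that derives the standard rectangle's upper-left and lower-right corners from the four located vertices:

```python
# Derive a standard rectangle from four detected vertices: the upper-left
# corner is (min X, min Y) and the lower-right corner is (max X, max Y),
# assuming image coordinates with Y increasing downward.
def standard_rect(vertices):
    """vertices: list of (x, y) tuples for the four detected corners."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (min(xs), min(ys)), (max(xs), max(ys))

top_left, bottom_right = standard_rect([(10, 5), (10, 40), (90, 5), (90, 40)])
```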
  • a rectangular frame is determined according to the position information, and a cell picture is determined according to the rectangular frame.
  • the present application performs image segmentation on the form picture according to the rectangular frame, and cuts out the cell picture corresponding to the rectangular frame from the form picture, wherein each cell picture contains a character string combination.
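As a rough illustration of this segmentation step (a toy model; a real implementation would slice an OpenCV or PIL image array), each cell picture can be cut out of the form picture by its rectangle's corner coordinates:

```python
# Hypothetical sketch, not the patent's code: the "picture" is modeled as a
# plain row-major list of pixel rows; cropping is simple row/column slicing.
def crop_cell(image, top_left, bottom_right):
    (x1, y1), (x2, y2) = top_left, bottom_right
    return [row[x1:x2] for row in image[y1:y2]]

form = [[(r, c) for c in range(6)] for r in range(4)]  # toy 6x4 "picture"
cell = crop_cell(form, (1, 1), (4, 3))  # 3 columns wide, 2 rows tall
```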
  • the present application inputs the cell picture to the text recognition network to recognize the content of the character string combination in the cell picture to obtain the cell character content.
  • The character recognition network is the classic CRNN character recognition network, through which editable cell character content is obtained.
  • Extracting the first table layout of the table picture according to the position information specifically includes: extracting from the position information the coordinate values of the upper-left and lower-right corner points of each rectangular frame; according to those coordinate values, dividing the rectangular frames whose points share the same abscissa into the same column and those whose points share the same ordinate into the same row; and counting the total number of rows and the total number of columns as the first table layout.
  • the rectangular frame wrapping each character string combination is divided into the positions of the rows and columns corresponding to the table pictures according to the overlap ratio of the position information in the horizontal direction and the vertical direction.
  • the ordinates of the vertices of the rectangular boxes in the same row are the same or similar
  • the abscissas of the rectangular boxes in the same column are the same or similar.
  • This application can determine that two points are located in the same row when their ordinates are the same or the difference between their ordinates is within a preset range, and that two points are located in the same column when their abscissas are the same or the difference between their abscissas is within the preset range.
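The row/column assignment described above can be sketched as follows; the clustering helper, box representation, and tolerance value are assumptions for illustration:

```python
# Cluster 1-D coordinates into groups: a value joins the current group when
# it is within `tol` of the group's seed, mirroring the "same or similar
# ordinate/abscissa" rule described in the text.
def group_by_axis(values, tol):
    groups = []
    for v in sorted(values):
        if groups and v - groups[-1][0] <= tol:
            groups[-1].append(v)
        else:
            groups.append([v])
    return groups

# Boxes as (x1, y1, x2, y2); the first table layout = (total rows, total cols).
boxes = [(10, 5, 40, 20), (50, 6, 90, 21), (10, 35, 40, 50)]
rows = group_by_axis([b[1] for b in boxes], tol=3)   # cluster top edges
cols = group_by_axis([b[0] for b in boxes], tol=3)   # cluster left edges
first_table_layout = (len(rows), len(cols))
```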
  • This application divides the vertices of the rectangular frames with the same or similar ordinates into the same row, and those with the same or similar abscissas into the same column.
  • the first table layout includes at least the number of rows and columns of the table.
  • As for the table's title content, its text length spans multiple columns, so it can be removed first.
  • Generating a table file of the table picture according to the first table layout and the cell character content specifically includes: drawing a table according to the first table layout; and filling the cell character content into the corresponding cells of the drawn table to generate a table file of the table picture.
  • The table corresponding to the table picture is drawn, and the table contains the same number of cells as there are character string combinations. Further, this application fills the recognized cell character content into the cells of the table to generate a table file, whose content can be saved in csv or json format for data analysis and processing by a program, thereby realizing the parsing of the table picture.
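The final step can be sketched in a few lines; the helper name, layout structure, and cell dictionary are assumed for illustration (the csv format mirrors the one mentioned above):

```python
import csv
import io

# Draw an empty rows x cols grid, fill recognized strings by (row, col),
# and serialize the result as CSV text.
def build_table_file(layout, cells):
    """layout: (rows, cols); cells: dict {(row, col): text}."""
    rows, cols = layout
    grid = [[cells.get((r, c), "") for c in range(cols)] for r in range(rows)]
    buf = io.StringIO()
    csv.writer(buf).writerows(grid)
    return buf.getvalue()

csv_text = build_table_file((2, 2), {(0, 0): "Name", (0, 1): "Age",
                                     (1, 0): "Zhou", (1, 1): "30"})
```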
  • Before the form picture is input into the pre-trained text positioning network and the position information of the characters in the form picture is obtained, the method further includes: detecting whether the form picture contains grid lines; if the form picture contains grid lines, extracting a second table layout of the form picture; and comparing the second table layout with the first table layout, and verifying that the first table layout is valid when the comparison result shows that the first table layout is consistent with the second table layout.
  • The second table layout can be extracted through morphological opening and closing operations in image processing.
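A toy illustration of how morphological opening can isolate table lines: eroding and then dilating a binary image with a long horizontal kernel keeps long horizontal runs (table lines) and removes isolated marks (characters). A real system would use a library such as OpenCV (`cv2.morphologyEx`); this pure-Python version is only a sketch under that assumption:

```python
# Binary image as lists of 0/1 rows; horizontal structuring element of width k.
def erode_h(img, k):
    w = len(img[0])
    return [[1 if c + k <= w and all(row[c:c + k]) else 0
             for c in range(w)] for row in img]

def dilate_h(img, k):
    w = len(img[0])
    return [[1 if any(row[max(0, c - k + 1):c + 1]) else 0
             for c in range(w)] for row in img]

def open_h(img, k):  # opening = erosion followed by dilation
    return dilate_h(erode_h(img, k), k)

img = [[1, 1, 1, 1, 1],   # a horizontal line survives opening
       [0, 1, 0, 0, 0]]   # an isolated dot is removed
lines = open_h(img, k=3)
```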
  • the present application can verify the reliability of the first table layout and the second table layout by comparing the first table layout with the second table layout.
  • The present application may also calculate a comparison result between the second table layout and the first table layout, expressed as the difference between the first table layout and the second table layout.
  • When the comparison result shows that the number of points of difference between the first table layout and the second table layout is greater than a preset value, the text positioning network is retrained to improve the recognition accuracy of the solution.
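A hedged sketch of this verification-and-retraining check, with the layout representation (sets of occupied cell positions) and the threshold assumed for illustration:

```python
# Count the points of difference between two layouts; when the count exceeds
# a preset value, flag the text positioning network for retraining.
def compare_layouts(first, second):
    """Each layout: set of (row, col) occupied-cell positions."""
    return len(first ^ second)  # symmetric difference = points of difference

first = {(0, 0), (0, 1), (1, 0), (1, 1)}
second = {(0, 0), (0, 1), (1, 0)}  # one cell missing in the line-based layout
PRESET = 2
needs_retraining = compare_layouts(first, second) > PRESET
```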
  • the present application provides a form image analysis device based on text positioning recognition, including:
  • the input module 11 is used to input form pictures to a pre-trained text positioning network to obtain position information of characters in the form pictures.
  • Deep network training is performed in advance on multiple input target samples to obtain a text positioning network capable of locating the text in table pictures and a text recognition network capable of recognizing that text. Specifically, feature point extraction and feature fusion are performed on the sample pictures, and the text positioning network and the text recognition network are finally output.
  • the target sample includes at least a picture sample and the coordinates of a marked rectangular frame with text.
  • Deep network training is a new area of machine learning research. Its motivation is to build and simulate a neural network that mimics the human brain for analysis and learning, imitating the mechanism by which the human brain interprets data such as images, sounds, and text.
  • The general idea of this application is a text detection and recognition process based on deep network training: positioning networks such as Faster R-CNN (deep-learning-based object detection) and CTPN (natural scene text detection) locate the text in the picture to obtain its location information, and the region indicated by the location information is then input into an RNN-based text recognition network, such as CRNN, to obtain the character string corresponding to the location information.
  • Figure 2 is a text positioning network based on EAST (scene text detection).
  • the text positioning network used in this application is an improvement based on the EAST text positioning network.
  • The text positioning network used in this application connects an LSTM (Long Short-Term Memory network) after the score map in the network structure shown in FIG. 2, which makes the score map sharper and more uniform; during training, dice loss is used in place of focal loss.
  • LSTM is a recurrent neural network, suitable for processing and predicting important events with relatively long intervals and delays in a time series.
  • Inputting the form picture described in this application into the pre-trained text positioning network to obtain the position information of the characters in the form picture specifically includes: inputting the form picture into the pre-trained text positioning network; taking several character strings as a character string combination; obtaining the smallest rectangular frame surrounding the character string combination; and establishing a rectangular coordinate system and taking the coordinates of each vertex of the rectangular frame as the position information.
  • FIG. 3 is a schematic diagram of obtaining position information of characters in the table picture.
  • the table picture contains several character string combinations. After the text positioning network is used, the smallest rectangular frame wrapping each character string combination is output.
  • the position information of the characters in the table picture is expressed as the coordinate value of the smallest rectangular frame that wraps the combination of character strings.
  • the coordinates of the four vertices of the rectangular frame surrounding the character string combination can be directly obtained through the character positioning network.
  • the position information is expressed as the coordinate values of the upper left corner and the lower right corner of the rectangular frame.
  • the minimum and maximum values of the X axis and the minimum and maximum values of the Y axis constitute the coordinates of the upper left corner and the lower right corner of the rectangular frame, thereby obtaining a standard rectangular frame.
  • The coordinates of the four vertices of the smallest rectangular frame surrounding a certain string combination, obtained through the text positioning network, are A(X1, Y1), B(X1, Y2), C(X2, Y1), and D(X2, Y2); according to the magnitudes of X1, X2, Y1, and Y2, the coordinates of the upper-left and lower-right corners of the rectangle are selected.
  • The segmentation module 12 is configured to perform graphic segmentation on the table picture according to the position information, segment out the cell picture corresponding to the position information, and input the cell picture into a pre-trained text recognition network for character recognition to obtain the cell character content.
  • a rectangular frame is determined according to the position information, and a cell picture is determined according to the rectangular frame.
  • the present application performs image segmentation on the form picture according to the rectangular frame, and cuts out the cell picture corresponding to the rectangular frame from the form picture, wherein each cell picture contains a character string combination.
  • the present application inputs the cell picture to the text recognition network to recognize the content of the character string combination in the cell picture to obtain the cell character content.
  • The character recognition network is the classic CRNN character recognition network, through which editable cell character content is obtained.
  • the extraction module 13 is configured to extract the first table layout of the table picture according to the position information.
  • Extracting the first table layout of the table picture according to the position information specifically includes: extracting from the position information the coordinate values of the upper-left and lower-right corner points of each rectangular frame; according to those coordinate values, dividing the rectangular frames whose points share the same abscissa into the same column and those whose points share the same ordinate into the same row; and counting the total number of rows and the total number of columns as the first table layout.
  • the rectangular frame wrapping each character string combination is divided into the positions of the rows and columns corresponding to the table pictures according to the overlap ratio of the position information in the horizontal direction and the vertical direction.
  • the ordinates of the vertices of the rectangular boxes in the same row are the same or similar
  • the abscissas of the rectangular boxes in the same column are the same or similar.
  • This application can determine that two points are located in the same row when their ordinates are the same or the difference between their ordinates is within a preset range, and that two points are located in the same column when their abscissas are the same or the difference between their abscissas is within the preset range.
  • This application divides the vertices of the rectangular frames with the same or similar ordinates into the same row, and those with the same or similar abscissas into the same column.
  • the first table layout includes at least the number of rows and columns of the table.
  • As for the table's title content, its text length spans multiple columns, so it can be removed first.
  • the generating module 14 is configured to generate a table file of the table picture according to the first table layout and the cell character content.
  • Generating a table file of the table picture according to the first table layout and the cell character content specifically includes: drawing a table according to the first table layout; and filling the cell character content into the corresponding cells of the drawn table to generate a table file of the table picture.
  • The table corresponding to the table picture is drawn, and the table contains the same number of cells as there are character string combinations. Further, this application fills the recognized cell character content into the cells of the table to generate a table file, whose content can be saved in csv or json format for data analysis and processing by a program, thereby realizing the parsing of the table picture.
  • Before the form picture is input into the pre-trained text positioning network and the position information of the characters in the form picture is obtained, the method further includes: detecting whether the form picture contains grid lines; if the form picture contains grid lines, extracting a second table layout of the form picture; and comparing the second table layout with the first table layout, and verifying that the first table layout is valid when the comparison result shows that the first table layout is consistent with the second table layout.
  • The second table layout can be extracted through morphological opening and closing operations in image processing.
  • the present application can verify the reliability of the first table layout and the second table layout by comparing the first table layout with the second table layout.
  • The present application may also calculate a comparison result between the second table layout and the first table layout, expressed as the difference between the first table layout and the second table layout.
  • When the comparison result shows that the number of points of difference between the first table layout and the second table layout is greater than a preset value, the text positioning network is retrained to improve the recognition accuracy of the solution.
  • an embodiment of the present application provides a computer-readable storage medium, and the computer-readable storage medium may be a non-volatile readable storage medium.
  • The computer-readable storage medium stores computer-readable instructions, and when the instructions are executed by a processor, the table analysis method based on text positioning and recognition according to any one of the technical solutions is implemented.
  • The computer-readable storage medium includes, but is not limited to, any type of disk (including floppy disks, hard disks, optical disks, CD-ROMs, and magneto-optical disks), ROM (Read-Only Memory), RAM (Random Access Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, magnetic cards, or optical cards.
  • a storage device includes any medium that stores or transmits information in a readable form by a device (for example, a computer, a mobile phone), and may be a read-only memory, a magnetic disk, or an optical disk.
  • The computer-readable storage medium provided by the embodiment of the application can realize: inputting form pictures into a pre-trained text positioning network to obtain the position information of the characters in the form pictures; performing graphic segmentation on the form pictures according to the position information to segment out the cell pictures corresponding to the position information; inputting the cell pictures into a pre-trained text recognition network for character recognition to obtain the cell character content; extracting the first table layout of the table picture according to the position information; and generating a table file of the table picture according to the first table layout and the cell character content.
  • the established deep learning model can be used to locate and recognize the text in the table image, which improves the efficiency and accuracy of the table image recognition.
  • the present application provides a computer device.
  • the computer device includes a processor 303, a memory 305, an input unit 307, and a display unit 309.
  • the memory 305 may be used to store the application program 301 and various functional modules, and the processor 303 runs the application program 301 stored in the memory 305 to execute various functional applications and data processing of the device.
  • the memory 305 may be internal memory or external memory, or include both internal memory and external memory.
  • the internal memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, or random access memory.
  • External storage can include hard disks, floppy disks, ZIP disks, USB flash drives, magnetic tapes, etc.
  • the memory disclosed in this application includes but is not limited to these types of memory.
  • the memory 305 disclosed in this application is merely an example and not a limitation.
  • the input unit 307 is used for receiving input of signals and receiving keywords input by the user.
  • the input unit 307 may include a touch panel and other input devices.
  • The touch panel can collect the user's touch operations on or near it (for example, operations performed on or near the touch panel with a finger, stylus, or any other suitable object or accessory) and drive the corresponding connection device according to a preset program. Other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as playback control keys and power keys), a trackball, a mouse, and a joystick.
  • the display unit 309 may be used to display information input by the user or information provided to the user and various menus of the computer device.
  • the display unit 309 may take the form of a liquid crystal display, an organic light emitting diode, or the like.
  • The processor 303 is the control center of the computer device. It connects the various parts of the entire computer through various interfaces and lines, and executes various functions and processes data by running or executing the software programs and/or modules stored in the memory 305 and calling the data stored in the memory.
  • The one or more processors 303 shown in FIG. 5 can execute and realize the functions of the input module 11, the segmentation module 12, the extraction module 13, and the generation module 14 shown in FIG. 4.
  • the computer device includes a memory 305 and a processor 303.
  • the memory 305 stores computer-readable instructions.
  • the processor 303 executes the steps of a table analysis method based on character positioning recognition described in the above embodiment.
  • The computer device provided by the embodiment of the application can input form pictures into a pre-trained text positioning network to obtain the position information of the characters in the form pictures; perform graphic segmentation on the form pictures according to the position information to segment out the cell pictures corresponding to the position information; input the cell pictures into a pre-trained text recognition network for character recognition to obtain the cell character content; extract the first table layout of the table picture according to the position information; and generate a table file of the table picture according to the first table layout and the cell character content.
  • the established deep learning model can be used to locate and recognize the text in the table image, which improves the efficiency and accuracy of the table image recognition.
  • the present application can also detect whether the table picture contains grid lines; if the table picture contains grid lines, extract the second table layout of the table picture; compare the second table layout with the first table layout, and when the comparison result is that the first table layout is consistent with the second table layout, verify that the first table layout is valid.
  • this application can additionally detect whether there are table lines in the table picture; where table lines are present, the table lines are extracted directly, and the second table layout formed by the extracted table lines is compared with the obtained first table layout to verify whether the first table layout is valid.
  • this application parses table pictures with the text positioning network and the text recognition network, so it is compatible with pictures that have no table lines as well as pictures with complete or incomplete table lines, and has a wide scope of application.
  • the computer-readable storage medium provided in the embodiment of the present application can implement the above-mentioned embodiment of the table analysis method based on text positioning and recognition.
  • the aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).


Abstract

A form parsing method based on character location and recognition. The method comprises: inputting a form picture into a pre-trained character location network to obtain location information of characters in the form picture (S11); performing graph segmentation on the form picture according to the location information to obtain a cell picture corresponding to the location information, and inputting the cell picture into a pre-trained character recognition network to perform character recognition so as to obtain the cell character content (S12); extracting a first form layout of the form picture according to the location information (S13); and generating a form file of the form picture according to the first form layout and the cell character content (S14). An established deep learning model can be used for locating and recognizing characters in a form picture, thereby improving the efficiency and accuracy of form picture recognition.

Description

Form parsing method based on character location and recognition, and medium and computer device

This application claims priority to Chinese patent application No. 201910115364.7, filed with the China National Intellectual Property Administration on February 13, 2019 and entitled "Form parsing method based on character location and recognition, and medium and computer device", the entire contents of which are incorporated herein by reference.
Technical Field

This application relates to the field of computer processing technology, and in particular to a form parsing method, medium and computer device based on character location and recognition.
Background

At present, deep learning is developing rapidly in the field of image recognition; it has surpassed traditional methods in both accuracy and efficiency and has attracted wide attention in the field. Deep learning is a new area of machine learning research whose motivation is to build neural networks that simulate the way the human brain analyzes and learns, mimicking the mechanisms of the brain to interpret data such as images, sound and text. Table recognition refers to converting the table in a table picture into editable table text, a process that requires both text recognition and image recognition.

In the prior art, deep learning has also been applied to parse the tables in table pictures, but the existing solutions detect and recognize the table lines in the picture by deep learning, which has at least the following defect:

The existing solutions perform table parsing on the assumption that table lines are present; when a table-format picture has no table lines, the table cannot be extracted.
Summary of the Invention

This application provides a form parsing method based on character location and recognition and a corresponding device, which use an established deep learning model to locate and recognize the text in a table picture, improving the efficiency and accuracy of table picture recognition.

This application further provides a computer device and a readable storage medium for executing the form parsing method based on character location and recognition of this application.

To solve the above problems, this application adopts the following technical solutions:
In a first aspect, this application provides a table picture parsing method based on character location and recognition, the method comprising:

inputting a table picture into a pre-trained text location network to obtain position information of the characters in the table picture;

performing graphic segmentation on the table picture according to the position information to segment out the cell picture corresponding to the position information, and inputting the cell picture into a pre-trained text recognition network for character recognition to obtain the cell character content;

extracting a first table layout of the table picture according to the position information; and

generating a table file of the table picture according to the first table layout and the cell character content;

wherein inputting the table picture into the pre-trained text location network to obtain the position information of the characters in the table picture comprises:

inputting the table picture into the pre-trained text location network;

taking several consecutive character strings in the table picture as one character string combination;

obtaining the smallest rectangular box enclosing the character string combination; and

establishing a rectangular coordinate system and taking the coordinates of the vertices of the rectangular box as the position information.
Compared with the prior art, the technical solution of this application has at least the following advantages:

1. This application provides a form parsing method based on character location and recognition: a table picture is input into a pre-trained text location network to obtain the position information of the characters in the table picture; the table picture is graphically segmented according to the position information to segment out the cell picture corresponding to the position information, and the cell picture is input into a pre-trained text recognition network for character recognition to obtain the cell character content; a first table layout of the table picture is extracted according to the position information; and a table file of the table picture is generated according to the first table layout and the cell character content. This application uses an established deep learning model to locate and recognize the text in a table picture, improving the efficiency and accuracy of table picture recognition.

2. This application inputs the table picture into the pre-trained text location network; takes several consecutive character strings in the table picture as one character string combination; obtains the smallest rectangular box enclosing the character string combination; and establishes a rectangular coordinate system, taking the coordinates of the vertices of the rectangular box as the position information. Through this mechanism, the position information of the text in the table picture is obtained, improving the accuracy and efficiency of text location.

3. This application can detect whether the table picture contains grid lines; if the table picture contains grid lines, extract a second table layout of the table picture; and compare the second table layout with the first table layout, and when the comparison result is that the first table layout is consistent with the second table layout, verify that the first table layout is valid. Where the table picture has table lines, the table lines are extracted directly, and the second table layout formed by the extracted table lines is compared with the obtained first table layout to verify whether the first table layout is valid. Because this application parses table pictures with a text location network and a text recognition network, it is compatible with pictures that have no table lines as well as pictures with complete or incomplete table lines, giving it a wide scope of application.

4. This application can further compute the comparison result of the second table layout and the first table layout, the comparison result being expressed as the points of difference between the first table layout and the second table layout; when the number of points of difference is greater than a preset value, the text location network is retrained. Through this mechanism, the pre-trained text location network can be adjusted by flexible, intelligent learning, so that the parsing results of table pictures become increasingly accurate.
Brief Description of the Drawings

FIG. 1 is a flowchart of a form parsing method based on character location and recognition in an embodiment;

FIG. 2 shows a prior-art text location network based on scene text detection;

FIG. 3 is a schematic diagram of the position information of characters obtained from a table picture in an embodiment;

FIG. 4 is a structural block diagram of a form parsing device based on character location and recognition in an embodiment;

FIG. 5 is a block diagram of the internal structure of a computer device in an embodiment.

The realization of the objectives, functional features and advantages of this application will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description

To enable those skilled in the art to better understand the solutions of this application, the technical solutions in the embodiments of this application will be described clearly and completely below in conjunction with the accompanying drawings.

Some of the flows described in the specification, the claims and the above drawings contain operations that appear in a specific order, but it should be clearly understood that these operations may be executed out of the order in which they appear herein, or in parallel. Operation numbers such as S11 and S12 merely distinguish the different operations and do not themselves imply any execution order. In addition, these flows may include more or fewer operations, which may be executed sequentially or in parallel. It should be noted that descriptions such as "first" and "second" herein are used to distinguish different messages, devices, modules, etc.; they neither indicate a sequence nor require that the "first" and "second" be of different types.

Those of ordinary skill in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said" and "the" used herein may also include the plural. It should be further understood that the word "comprising" used in this specification refers to the presence of the stated features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof. When an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present; moreover, "connected" or "coupled" as used herein may include wireless connection or wireless coupling. The term "and/or" as used herein includes all or any units and all combinations of one or more of the associated listed items.

Those of ordinary skill in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by those of ordinary skill in the art to which this application belongs. It should also be understood that terms such as those defined in general dictionaries should be understood to have meanings consistent with their meanings in the context of the prior art and, unless specifically defined as herein, are not to be interpreted in an idealized or overly formal sense.

The technical solutions in the embodiments of this application will be described clearly and completely below in conjunction with the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements, or elements having the same or similar functions, throughout. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by those skilled in the art based on the embodiments of this application without creative work fall within the protection scope of this application.
Referring to FIG. 1, an embodiment of this application provides a form parsing method based on character location and recognition. As shown in FIG. 1, the method includes the following steps:

S11. Input a table picture into a pre-trained text location network to obtain the position information of the characters in the table picture.

In this embodiment, a deep network is trained in advance by inputting multiple target samples, producing the text location network capable of locating text in table pictures and a text recognition network capable of recognizing text in table pictures. Specifically, feature point extraction and feature fusion are performed on the sample pictures, and the text location network and the text recognition network are finally output. The target samples include at least picture samples and the annotated coordinates of rectangular boxes containing text.

Deep network training is a new area of machine learning research whose motivation is to build neural networks that simulate the way the human brain analyzes and learns, mimicking the mechanisms of the brain to interpret data such as images, sound and text.

The general idea of this application is a text detection and recognition process based on deep network training: location networks such as Faster R-CNN (a deep-learning object detection technique) or CTPN (natural scene text detection) detect and locate the text in the picture to obtain its position information, and the region indicated by that position information is then input into an RNN-based text recognition network (such as CRNN) to recognize the text and obtain the character string corresponding to the position information.

Referring to FIG. 2, FIG. 2 shows a text location network based on EAST (scene text detection). The text location network used in this application is an improvement on the EAST network. Specifically, an LSTM (long short-term memory network) is attached after the score map in the network structure shown in FIG. 2 to brighten and smooth the score map, and dice loss replaces focal loss during training. An LSTM is a recurrent neural network suitable for processing and predicting important events with relatively long intervals and delays in a time series.
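The training change mentioned above is the substitution of dice loss for focal loss on the score map. The patent gives no formula, so the following is only a minimal, framework-free sketch of the standard dice loss it presumably refers to, over score maps flattened to sequences of values in [0, 1]:

```python
def dice_loss(pred, target, eps=1e-6):
    """Dice loss between a predicted score map and a binary target map.

    loss = 1 - (2*|P∩T| + eps) / (|P| + |T| + eps); it is 0 when the maps
    agree exactly and approaches 1 when they do not overlap at all.
    """
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 - (2.0 * inter + eps) / (total + eps)
```

In practice this would be computed per pixel on network tensors; dice loss is often preferred over focal loss when the positive (text) region is a small fraction of the map.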
Further, inputting the table picture into the pre-trained text location network to obtain the position information of the characters in the table picture specifically includes: inputting the table picture into the pre-trained text location network; taking several consecutive character strings in the table picture as one character string combination; obtaining the smallest rectangular box enclosing the character string combination; and establishing a rectangular coordinate system and taking the coordinates of the vertices of the rectangular box as the position information.

Referring to FIG. 3, FIG. 3 is a schematic diagram of the position information of the characters obtained from the table picture. As shown in FIG. 3, the table picture contains several character string combinations, and the text location network outputs the smallest rectangular box wrapping each character string combination. In this embodiment, the position information of the characters in the table picture is expressed as the coordinate values of the smallest rectangular box wrapping the character string combination. The coordinates of the four vertices of that rectangular box can be obtained directly from the text location network; specifically, the position information is expressed as the coordinate values of the upper-left and lower-right corners of the rectangular box. In actual use, because table text is essentially horizontal, the minimum and maximum X-axis values and the minimum and maximum Y-axis values of the four coordinates in the network's Quad Geometry output form the upper-left and lower-right corners of the rectangular box, yielding a standard axis-aligned rectangle.

For example, if the coordinates of the four vertices of the smallest rectangular box enclosing a certain character string combination obtained through the text location network are (X1, Y1), (X1, Y2), (X2, Y1) and (X2, Y2), then the coordinate values of the upper-left and lower-right corner points of the rectangle are selected according to the magnitudes of X1, X2, Y1 and Y2.
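The corner selection just described can be sketched as follows; `quad` is a hypothetical list standing in for the four vertex coordinates read from the Quad Geometry output:

```python
def quad_to_rect(quad):
    """Collapse four predicted text-box vertices into an axis-aligned
    rectangle: the minimum X/Y values give the upper-left corner and the
    maximum values the lower-right corner (table text is nearly horizontal)."""
    xs = [x for x, _ in quad]
    ys = [y for _, y in quad]
    return (min(xs), min(ys)), (max(xs), max(ys))
```

Even when the network's quadrilateral is slightly skewed, this yields a standard rectangle suitable for cropping.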
S12. Perform graphic segmentation on the table picture according to the position information to segment out the cell picture corresponding to the position information, and input the cell picture into a pre-trained text recognition network for character recognition to obtain the cell character content.

In this embodiment, a rectangular box is determined from the position information, and a cell picture is determined from the rectangular box. Specifically, the table picture is segmented according to the rectangular box, and the cell picture corresponding to that rectangular box is cut out of the table picture, each cell picture containing one character string combination.

Further, the cell picture is input into the text recognition network to recognize the content of the character string combination in the cell picture and obtain the cell character content. In this embodiment, the text recognition network is the classic CRNN text recognition network, through which the editable cell character content is obtained.
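A minimal sketch of the segmentation step, with the page image represented here as a row-major 2-D list of pixels (a stand-in for real image data; the crop would then be fed to the recognition network):

```python
def crop_cell(image, top_left, bottom_right):
    """Cut out the cell picture bounded by an axis-aligned rectangle.
    Coordinates are (x, y) with the origin at the top-left of the image."""
    (x1, y1), (x2, y2) = top_left, bottom_right
    return [row[x1:x2] for row in image[y1:y2]]
```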
S13. Extract a first table layout of the table picture according to the position information.

In this embodiment, extracting the first table layout of the table picture according to the position information specifically includes: extracting the coordinate values of the upper-left and lower-right corner points of the rectangular boxes from the position information; according to those coordinate values, assigning rectangular boxes whose points share the same abscissa to the same column and rectangular boxes whose points share the same ordinate to the same row; and counting the total number of rows and the total number of columns as the first table layout.

In this embodiment, the rectangular boxes wrapping the character string combinations are assigned to the row and column positions of the table picture according to the overlap ratio of the position information in the horizontal and vertical directions. The ordinates of the vertices of rectangular boxes in the same row are identical or close, and the abscissas of rectangular boxes in the same column are identical or close. Two points may be judged to lie in the same row when their ordinates are identical or their difference is within a preset range, and in the same column when their abscissas are identical or their difference is within a preset range. Following this principle, rectangular boxes whose vertices have identical or close ordinates are placed in the same row, and those with identical or close abscissas in the same column.

Continuing to refer to FIG. 3, the abscissas of the vertices of rectangular boxes in the same column are identical or close, while the abscissa ranges of different columns do not intersect; rectangular boxes in the same row have overlapping ordinate ranges, while the ordinate ranges of different rows do not intersect.

In this embodiment, the first table layout includes at least the number of rows and the number of columns of the table. The table title, whose text spans multiple columns, can be removed first. Through the above rules, the number of rows N and the number of columns M of the table picture can be extracted, and further the N×M layout format of the table picture.
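The row/column grouping above can be sketched by clustering box corners within a tolerance. This is an illustrative reading, not the patent's exact algorithm; `tol` is a hypothetical pixel threshold standing in for the "preset range":

```python
def infer_layout(boxes, tol=5):
    """Infer (rows, columns) -- the first table layout -- from axis-aligned
    text boxes given as ((x1, y1), (x2, y2)) pairs: boxes whose top edges
    lie within `tol` of each other share a row, and boxes whose left edges
    lie within `tol` of each other share a column."""
    def cluster(values):
        groups, prev = 0, None
        for v in sorted(values):
            if prev is None or v - prev > tol:
                groups += 1          # start a new row/column group
            prev = v
        return groups

    rows = cluster(y1 for (_, y1), _ in boxes)
    cols = cluster(x1 for (x1, _), _ in boxes)
    return rows, cols
```

For the 2x3 grid of boxes in the test below this yields (2, 3); a production version would use the full overlap ratio of box intervals rather than corner points alone.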
S14. Generate a table file of the table picture according to the first table layout and the cell character content.

In this embodiment, generating the table file of the table picture according to the first table layout and the cell character content specifically includes: drawing a table according to the first table layout; and filling the cell characters into the corresponding cells of the drawn table to generate the table file of the table picture.

In this embodiment, after the first table layout of the table picture is extracted, the table corresponding to the table picture is drawn, containing the same number of cells as there are character string combinations. Further, the recognized cell character content is filled into the corresponding cells of the table to generate a table file, whose content can be saved in csv or json format for data analysis and processing by a program, thereby realizing the parsing of the table picture.
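Writing out the final table file can be sketched with the standard csv module; `cells`, mapping (row, column) to recognized text, is a hypothetical intermediate structure, not an interface defined by the patent:

```python
import csv

def write_table(layout, cells, out):
    """Fill recognized cell strings into a rows x cols grid and write it
    to `out` (any file-like object) as CSV; missing cells stay blank."""
    n_rows, n_cols = layout
    grid = [[cells.get((r, c), "") for c in range(n_cols)]
            for r in range(n_rows)]
    csv.writer(out).writerows(grid)
```

A json variant would simply `json.dump(grid, out)` with the same grid.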
In this embodiment, before the table picture is input into the pre-trained text location network to obtain the position information of the characters, the method further includes: detecting whether the table picture contains grid lines; if the table picture contains grid lines, extracting a second table layout of the table picture; and comparing the second table layout with the first table layout, and when the comparison result is that the first table layout is consistent with the second table layout, verifying that the first table layout is valid. In one possible design, if the table in the table picture has grid lines, the second table layout can be extracted through morphological opening and closing operations.

In fact, by comparing the first table layout with the second table layout, this application can verify the reliability of both layouts at the same time.

Preferably, this application can also compute the comparison result of the second table layout and the first table layout, the comparison result being expressed as the points of difference between the first table layout and the second table layout; when the number of points of difference is greater than a preset value, the text location network is retrained to improve the recognition accuracy of this solution.
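The grid-line check above relies on morphological opening/closing over the image. As a dependency-free toy stand-in for that step, the ruling lines of a binary image can be found by scanning for long unbroken ink runs; `min_run` is a hypothetical length threshold, and a real implementation would use morphological operations on the actual image:

```python
def line_layout(binary, min_run=20):
    """Count horizontal and vertical ruling lines in a binary image
    (1 = ink) by looking for unbroken runs of at least `min_run` pixels,
    then derive the second table layout as (rows, columns)."""
    def ruled(lines):
        count = 0
        for line in lines:
            run = best = 0
            for px in line:
                run = run + 1 if px else 0
                best = max(best, run)
            if best >= min_run:
                count += 1
        return count

    h = ruled(binary)          # scan image rows for horizontal lines
    v = ruled(zip(*binary))    # scan image columns for vertical lines
    return max(h - 1, 0), max(v - 1, 0)
```

The resulting (rows, columns) can then be compared with the first table layout; a mismatch count above the preset value would trigger retraining of the location network.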
请参考图4,在另一种实施例中,本申请提供了一种基于文字定位识别的表格图片解析装置,包括:Please refer to Fig. 4, in another embodiment, the present application provides a form image analysis device based on text positioning recognition, including:
输入模块11,用于输入表格图片至预先训练的文字定位网络,得到所述表格图片中字符的位置信息。The input module 11 is used to input form pictures to a pre-trained text positioning network to obtain position information of characters in the form pictures.
本申请实施例中,预先通过输入多个目标样本进行深度网络的训练,训练出能够进行表格图片的文字定位的所述文字定位网络和能够进行表格图片文字识别的文字识别网络。具体的,对所述样本图片进行特征点提取以及特征融合,最终输出所述文字定位网络和所述文字识别网络。其中,所述目标样本至少包括图片样本以及标注的有文字的矩形框坐标。In the embodiment of the present application, the deep network training is performed by inputting multiple target samples in advance, and the text positioning network capable of positioning the text of the table picture and the text recognition network capable of recognizing the text of the table picture are trained. Specifically, feature point extraction and feature fusion are performed on the sample picture, and finally the text positioning network and the text recognition network are output. Wherein, the target sample includes at least a picture sample and the coordinates of a marked rectangular frame with text.
深度网络的训练是机器学习研究中的一个新的领域,其动机在于建立、模拟人脑进行分析学习的神经网络,它模仿人脑的机制来解释数据,例如图像,声音和文本。Deep network training is a new field in machine learning research. Its motivation is to build neural networks that simulate the way the human brain analyzes and learns, interpreting data such as images, sounds, and text by mimicking the brain's mechanisms.
本申请的总体思路为基于深度网络训练的文字检测与识别过程,具体是通过Faster RCNN(基于深度学习的目标检测技术)、CTPN(自然场景文本检测)等定位网络对图片中的文字进行检测和定位,得到文字的位置信息,然后将该位置信息所指向的区域输入到基于RNN的文字识别网络(如CRNN等)进行文字的识别,得到该位置信息对应的字符串。The general idea of the present application is a text detection and recognition process based on deep network training: positioning networks such as Faster RCNN (a deep-learning-based object detection technique) and CTPN (natural scene text detection) detect and locate the text in a picture to obtain the position information of the text; the region indicated by the position information is then fed into an RNN-based text recognition network (such as CRNN) for character recognition, yielding the character string corresponding to that position information.
请参考图2,图2为基于EAST(场景文字检测)的文字定位网络。本申请所应用的文字定位网络是基于EAST文字定位网络改进而成。具体的,本申请所应用的文字定位网络是在图2所示的网络结构中的score map后接入LSTM(长短期记忆网络),将score map提亮抹均匀,训练时使用dice loss替换focal loss。其中,LSTM是一种时间递归神经网络,适合于处理和预测时间序列中间隔和延迟相对较长的重要事件。Please refer to Fig. 2, which shows a text positioning network based on EAST (scene text detection). The text positioning network used in this application is an improvement on the EAST network: an LSTM (long short-term memory network) is attached after the score map in the network structure shown in Fig. 2, the score map is brightened and smoothed, and dice loss is used in place of focal loss during training. LSTM is a recurrent neural network suited to processing and predicting important events with relatively long intervals and delays in a time series.
进一步的,本申请所述输入表格图片至预先训练的文字定位网络,得到所述表格图片中字符的位置信息,具体包括:输入表格图片至预先训练的文字定位网络;获取所述表格图片中连续的若干个字符串作为一个字符串组合;获取包围所述字符串组合的最小的矩形框;建立直角坐标系,获取所述矩形框的各个顶点的坐标作为所述位置信息。Further, inputting the table picture into the pre-trained text positioning network to obtain the position information of the characters in the table picture specifically includes: inputting the table picture into the pre-trained text positioning network; taking several consecutive character strings in the table picture as one character string combination; obtaining the smallest rectangular box enclosing the character string combination; and establishing a rectangular coordinate system and taking the coordinates of each vertex of the rectangular box as the position information.
请继续参考图3,图3为获取到所述表格图片中字符的位置信息示意图。如图3所示,所述表格图片中包含若干个字符串组合,通过所述文字定位网络后输出包裹各个字符串组合的最小矩形框。本申请实施例中,所述表格图片中字符的位置信息被表达为包裹所述字符串组合的最小矩形框的坐标值。本申请通过所述文字定位网络可以直接得到包裹所述字符串组合的矩形框的四个顶点的坐标。具体的,所述位置信息被表达为该矩形框的左上角以及右下角的坐标值。在实际使用时,因为表格文字基本是水平的,所以取Quad Geometry输出的四个坐标的X轴最小值与最大值、Y轴最小值与最大值,组成所述矩形框的左上角与右下角的坐标,从而得到标准的矩形框。例如,通过所述文字定位网络得到包裹某个字符串组合的最小矩形框的四个顶点的坐标分别为:A(X1,Y1)、B(X1,Y2)、C(X2,Y1)以及D(X2,Y2),依据X1、X2、Y1以及Y2的大小值,选取该矩形的左上角以及右下角的点的坐标值。Please continue to refer to Fig. 3, which is a schematic diagram of the position information obtained for the characters in the table picture. As shown in Fig. 3, the table picture contains several character string combinations, and the text positioning network outputs the smallest rectangular box wrapping each combination. In the embodiment of the present application, the position information of the characters in the table picture is expressed as the coordinate values of the smallest rectangular box wrapping the character string combination; the coordinates of the four vertices of that box are obtained directly through the text positioning network. Specifically, the position information is expressed as the coordinate values of the upper left and lower right corners of the rectangular box. In practice, because table text is essentially horizontal, the minimum and maximum X values and the minimum and maximum Y values of the four coordinates in the Quad Geometry output are taken to form the coordinates of the upper left and lower right corners, thereby obtaining a standard rectangular box.
For example, the coordinates of the four vertices of the smallest rectangular box wrapping a certain character string combination, as obtained through the text positioning network, are A(X1, Y1), B(X1, Y2), C(X2, Y1), and D(X2, Y2); according to the magnitudes of X1, X2, Y1, and Y2, the coordinate values of the upper left and lower right corner points of the rectangle are selected.
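The min/max reduction described above can be sketched as follows; the function name `quad_to_rect` is illustrative and not taken from the application:

```python
def quad_to_rect(vertices):
    """Collapse the four vertices of a (roughly horizontal) text quad
    into an axis-aligned box (x_min, y_min, x_max, y_max)."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (min(xs), min(ys), max(xs), max(ys))

# Four vertices as output by the positioning network (order irrelevant):
quad = [(10, 5), (10, 25), (90, 5), (90, 25)]
print(quad_to_rect(quad))  # (10, 5, 90, 25)
```

Because the reduction takes per-axis minima and maxima, it also yields a standard upright box when the detected quad is slightly skewed.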
分割模块12,用于依据所述位置信息对所述表格图片进行图形分割,分割出所述位置信息对应的单元格图片,将所述单元格图片输入预先训练的文字识别网络进行字符识别,得到单元格字符内容。The segmentation module 12 is configured to graphically segment the table picture according to the position information, segment out the cell picture corresponding to the position information, and input the cell picture into a pre-trained text recognition network for character recognition to obtain the cell character content.
本申请实施例中,依据所述位置信息确定一个矩形框,依据所述矩形框确定一个单元格图片。具体的,本申请依据所述矩形框对所述表格图片进行图像分割,从所述表格图片中截取出该矩形框对应的单元格图片,其中,每个单元格图片中包含一个字符串组合。In the embodiment of the present application, a rectangular frame is determined according to the position information, and a cell picture is determined according to the rectangular frame. Specifically, the present application performs image segmentation on the form picture according to the rectangular frame, and cuts out the cell picture corresponding to the rectangular frame from the form picture, wherein each cell picture contains a character string combination.
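A minimal sketch of the cropping step, assuming the image is represented as a row-major grid of pixel values and the rectangle uses the (x_min, y_min, x_max, y_max) form above; the function name `crop_cell` is ours, not the application's:

```python
def crop_cell(image, rect):
    """Cut the sub-image covered by rect = (x_min, y_min, x_max, y_max)
    out of `image`, a row-major list of pixel rows."""
    x_min, y_min, x_max, y_max = rect
    return [row[x_min:x_max] for row in image[y_min:y_max]]

# A tiny 6x4 "image" whose pixel value encodes its (y, x) position:
image = [[y * 10 + x for x in range(6)] for y in range(4)]
cell = crop_cell(image, (1, 1, 4, 3))
print(cell)  # [[11, 12, 13], [21, 22, 23]]
```

Each cropped cell picture would then be passed to the recognition network; the cropping itself is just two slice operations.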
进一步的,本申请将所述单元格图片输入至所述文字识别网络,以对所述单元格图片中的字符串组合的内容进行识别得到所述单元格字符内容。本申请实施例中,所述文字识别网络是经典的文字识别CRNN网络,通过该网络后得到可供编辑的所述单元格字符内容。Further, the present application inputs the cell picture to the text recognition network to recognize the content of the character string combination in the cell picture to obtain the cell character content. In the embodiment of the present application, the character recognition network is a classic character recognition CRNN network, and the cell character content that can be edited is obtained through the network.
提取模块13,用于依据所述位置信息,提取所述表格图片的第一表格布局。The extraction module 13 is configured to extract the first table layout of the table picture according to the position information.
本申请实施例中,所述依据所述位置信息,提取所述表格图片的第一表格布局,具体包括:提取所述位置信息中所述矩形框的左上角以及右下角的点的坐标值;依据所述左上角以及右下角的点的坐标值将相同横坐标的点对应的矩形框分为同一列,将相同纵坐标的点对应的矩形框分为同一行;计算总的行数以及总的列数作为所述第一表格布局。In the embodiment of the present application, extracting the first table layout of the table picture according to the position information specifically includes: extracting the coordinate values of the upper left and lower right corner points of the rectangular boxes from the position information; dividing the rectangular boxes corresponding to points with the same abscissa into the same column and the rectangular boxes corresponding to points with the same ordinate into the same row according to those coordinate values; and calculating the total number of rows and the total number of columns as the first table layout.
本申请实施例中,通过所述位置信息在水平方向上和垂直方向上的重叠比例将包裹各个字符串组合的矩形框划分到表格图片对应的行列的位置。其中,相同行中矩形框的顶点的纵坐标相同或者相近,相同列的矩形框的横坐标相同或者相近。本申请可以设定当两个点的纵坐标相同或者两个点的纵坐标的差值在预设范围内时判断该两个点位于同一行,以及设定当两个点的横坐标相同或者两个点的横坐标的差值在预设范围内时判断该两个点位于同一列。本申请依据该原理,将矩形框的顶点的纵坐标相同或相近的划分为同一行,将横坐标相同或相近的划分为同一列。In the embodiment of the present application, the rectangular boxes wrapping the character string combinations are assigned to row and column positions of the table picture according to the overlap ratio of their position information in the horizontal and vertical directions. The vertices of rectangular boxes in the same row have identical or similar ordinates, and those in the same column have identical or similar abscissas. The present application may stipulate that two points lie in the same row when their ordinates are identical or differ by no more than a preset range, and that two points lie in the same column when their abscissas are identical or differ by no more than a preset range. Based on this principle, vertices with identical or similar ordinates are grouped into the same row, and those with identical or similar abscissas into the same column.
请继续参考图3,如图3所示,同一列的矩形框的顶点的横坐标存在相同或相近的,而不同列的横坐标范围没有交集。同一行的矩形框的纵坐标范围相互重叠,而不同行的纵坐标范围不存在交集。Please continue to refer to Fig. 3: the abscissas of the vertices of rectangular boxes in the same column are identical or similar, while the abscissa ranges of different columns do not intersect; likewise, the ordinate ranges of rectangular boxes in the same row overlap, while the ordinate ranges of different rows do not intersect.
本申请实施例中,所述第一表格布局至少包括表格的行数以及列数。对于表格的名称内容,由于其文字长度跨列,可以将其先去除。通过以上规则,可以提取所述表格图片的行的数量N以及列的数量M,进一步的,提取出所述表格图片的N×M布局格式。In the embodiment of the present application, the first table layout includes at least the numbers of rows and columns of the table. The table title, whose text spans multiple columns, can be removed first. Through the above rules, the number of rows N and the number of columns M of the table picture can be extracted, and further the N×M layout format of the table picture is obtained.
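The row/column grouping rule above can be sketched as follows; the tolerance value and the function names `group_axis`/`layout_of` are illustrative assumptions, not values from the application:

```python
def group_axis(values, tol):
    """Cluster sorted 1-D coordinates: values within `tol` of the last
    member of the current cluster share the same row (or column) index."""
    groups = []
    for v in sorted(values):
        if groups and v - groups[-1][-1] <= tol:
            groups[-1].append(v)
        else:
            groups.append([v])
    return {v: i for i, g in enumerate(groups) for v in g}

def layout_of(rects, tol=5):
    """Return (n_rows, n_cols) for boxes given as (x_min, y_min, x_max, y_max):
    similar y_min values form a row, similar x_min values form a column."""
    rows = group_axis([r[1] for r in rects], tol)
    cols = group_axis([r[0] for r in rects], tol)
    return (len(set(rows.values())), len(set(cols.values())))

rects = [(0, 0, 40, 10), (50, 1, 90, 11),    # row 0
         (1, 30, 41, 40), (51, 31, 91, 41)]  # row 1
print(layout_of(rects))  # (2, 2)
```

The tolerance plays the role of the "preset range" above: coordinates that are identical or close are merged into one row or column index.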
生成模块14,用于依据所述第一表格布局以及所述单元格字符内容,生成所述表格图片的表格文件。The generating module 14 is configured to generate a table file of the table picture according to the first table layout and the cell character content.
本申请实施例中,所述依据所述第一表格布局以及所述单元格字符内容,生成所述表格图片的表格文件,具体包括:依据所述第一表格布局绘制表格;将所述单元格字符对应填入绘制的表格的单元格中,生成所述表格图片的表格文件。In the embodiment of the present application, generating the table file of the table picture according to the first table layout and the cell character content specifically includes: drawing a table according to the first table layout; and filling the cell characters into the corresponding cells of the drawn table to generate the table file of the table picture.
本申请实施例中,提取所述表格图片的第一表格布局之后绘制所述表格图片对应的表格,所述表格中包含与所述字符串组合数量相同的单元格。进一步的,本申请将识别出的单元格字符内容对应填入所述表格的单元格中生成表格文件,其内容可保存为csv或者json格式可供程序进行数据分析处理,从而实现表格图片的解析。In the embodiment of the present application, after the first table layout of the table picture is extracted, the table corresponding to the table picture is drawn, and the table contains the same number of cells as there are character string combinations. Further, the present application fills the recognized cell character content into the corresponding cells of the table to generate a table file, whose content can be saved in csv or json format for data analysis and processing by a program, thereby realizing the parsing of the table picture.
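A minimal sketch of this final step, filling the recognized cell contents into an N×M grid and serializing it as csv; the helper name `to_csv` and the (row, col) cell-addressing scheme are our assumptions:

```python
import csv
import io

def to_csv(layout, cells):
    """layout = (n_rows, n_cols); cells maps (row, col) -> recognized text.
    Cells with no recognized content are emitted as empty strings."""
    n_rows, n_cols = layout
    buf = io.StringIO()
    writer = csv.writer(buf)
    for r in range(n_rows):
        writer.writerow([cells.get((r, c), "") for c in range(n_cols)])
    return buf.getvalue()

cells = {(0, 0): "name", (0, 1): "score", (1, 0): "Zhou", (1, 1): "98"}
table = to_csv((2, 2), cells)
print(table.splitlines())  # ['name,score', 'Zhou,98']
```

A json variant would serialize the same grid with `json.dumps`; either format gives downstream programs an editable representation of the original table picture.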
本申请实施例中,所述输入表格图片至预先训练的文字定位网络,得到所述表格图片中字符的位置信息之前,还包括:检测所述表格图片中是否包含网格线;若所述表格图片包含网格线,则提取所述表格图片的第二表格布局;将所述第二表格布局与所述第一表格布局进行比对,当比对结果为所述第一表格布局与所述第二表格布局一致时,则验证所述第一表格布局有效。一种可能的设计中,如果所述表格图片中表格有网格线,可以通过形态学开闭运算提取出所述第二表格布局。In the embodiment of the present application, before inputting the table picture into the pre-trained text positioning network to obtain the position information of the characters in the table picture, the method further includes: detecting whether the table picture contains grid lines; if it does, extracting the second table layout of the table picture; and comparing the second table layout with the first table layout, where the first table layout is verified as valid when the comparison shows the two layouts to be consistent. In one possible design, if the table in the table picture has grid lines, the second table layout can be extracted through morphological opening and closing operations.
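A toy illustration of the morphological idea: an opening (erosion followed by dilation) with a wide 1×k kernel keeps long horizontal runs such as grid lines and removes short text strokes. A real implementation would use an image-processing library; the pure-Python functions below are only a sketch of the technique on a binary image:

```python
def erode_h(img, k):
    """Horizontal erosion: a pixel survives only if the k pixels
    starting at it in its row are all set."""
    h, w = len(img), len(img[0])
    return [[1 if x + k <= w and all(img[y][x + i] for i in range(k)) else 0
             for x in range(w)] for y in range(h)]

def dilate_h(img, k):
    """Horizontal dilation: re-grow each surviving pixel into a run of k."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for i in range(min(k, w - x)):
                    out[y][x + i] = 1
    return out

def horizontal_lines(img, k):
    """Opening with a 1×k kernel: keeps only horizontal runs of >= k pixels."""
    return dilate_h(erode_h(img, k), k)

img = [[1, 1, 1, 1, 1, 1],   # a grid line: survives
       [0, 1, 1, 0, 0, 0],   # a short text stroke: removed
       [1, 1, 1, 1, 1, 1]]
print(horizontal_lines(img, 5))
```

Running the same opening with a tall 1-pixel-wide kernel would extract vertical lines; intersecting the two line images yields the grid from which the second table layout can be read off.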
实际上,本申请可以通过将所述第一表格布局与所述第二表格布局进行比对,同时验证所述第一表格布局与所述第二表格布局的可靠性。In fact, by comparing the first table layout with the second table layout, the present application can simultaneously verify the reliability of both layouts.
优选的,本申请还可以计算所述第二表格布局与所述第一表格布局的比对结果,所述比对结果被表达为所述第一表格布局与所述第二表格布局的差异点,当比对结果为所述第一表格布局与所述第二表格布局的差异点的数量大于预置值时,则重新训练所述文字定位网络,以提高本方案的识别精度。Preferably, the present application may also calculate a comparison result between the second table layout and the first table layout, the comparison result being expressed as the points of difference between the first table layout and the second table layout. When the number of points of difference between the two layouts is greater than a preset value, the text positioning network is retrained to improve the recognition accuracy of the solution.
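The retraining rule above can be sketched as follows; representing a layout simply as its (rows, columns) pair and using a preset value of 0 are our simplifications for illustration:

```python
def diff_points(layout_a, layout_b):
    """Count the positions at which the two layout descriptions disagree.
    Layouts are given here as (n_rows, n_cols) pairs for simplicity."""
    return sum(1 for a, b in zip(layout_a, layout_b) if a != b)

def needs_retraining(layout_a, layout_b, preset=0):
    """Mirror the rule above: retrain the positioning network when the
    number of difference points exceeds the preset value."""
    return diff_points(layout_a, layout_b) > preset

print(needs_retraining((4, 3), (4, 3)))  # False
print(needs_retraining((4, 3), (5, 2)))  # True
```

A fuller comparison could diff cell-by-cell occupancy rather than just the row/column counts; the threshold logic stays the same.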
在另一种实施例中,本申请实施例提供一种计算机可读存储介质,计算机可读存储介质可以为非易失性可读存储介质。所述计算机可读存储介质上存储有计算机可读指令,该程序被处理器执行时实现任一项技术方案所述的基于文字定位识别的表格解析方法。其中,所述计算机可读存储介质包括但不限于任何类型的盘(包括软盘、硬盘、光盘、CD-ROM、和磁光盘)、ROM(Read-Only Memory,只读存储器)、RAM(Random Access Memory,随机存储器)、EPROM(Erasable Programmable Read-Only Memory,可擦写可编程只读存储器)、EEPROM(Electrically Erasable Programmable Read-Only Memory,电可擦可编程只读存储器)、闪存、磁性卡片或光学卡片。也就是,存储设备包括由设备(例如,计算机、手机)以能够读的形式存储或传输信息的任何介质,可以是只读存储器,磁盘或光盘等。In another embodiment, an embodiment of the present application provides a computer-readable storage medium, which may be a non-volatile readable storage medium. The computer-readable storage medium stores computer-readable instructions, and when the instructions are executed by a processor, the table parsing method based on text positioning and recognition according to any one of the technical solutions is implemented. The computer-readable storage medium includes, but is not limited to, any type of disk (including floppy disks, hard disks, optical disks, CD-ROMs, and magneto-optical disks), ROM (Read-Only Memory), RAM (Random Access Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, magnetic cards, or optical cards. That is, the storage device includes any medium that stores or transmits information in a form readable by a device (for example, a computer or a mobile phone), such as a read-only memory, a magnetic disk, or an optical disk.
本申请实施例提供的一种计算机可读存储介质,可实现输入表格图片至预先训练的文字定位网络,得到所述表格图片中字符的位置信息;依据所述位置信息对所述表格图片进行图形分割,分割出所述位置信息对应的单元格图片,将所述单元格图片输入预先训练的文字识别网络进行字符识别,得到单元格字符内容;依据所述位置信息,提取所述表格图片的第一表格布局;依据所述第一表格布局以及所述单元格字符内容,生成所述表格图片的表格文件。本申请可以利用建立好的深度学习模型进行表格图片中文字的定位与识别,提高了表格图片识别的效率以及准确率。The computer-readable storage medium provided by the embodiment of the present application can realize: inputting a table picture into a pre-trained text positioning network to obtain the position information of the characters in the table picture; graphically segmenting the table picture according to the position information, segmenting out the cell picture corresponding to the position information, and inputting the cell picture into a pre-trained text recognition network for character recognition to obtain the cell character content; extracting the first table layout of the table picture according to the position information; and generating a table file of the table picture according to the first table layout and the cell character content. The present application uses established deep learning models to locate and recognize the text in table pictures, improving the efficiency and accuracy of table picture recognition.
此外,在又一种实施例中,本申请提供了一种计算机设备,如图5所示,所述计算机设备包括处理器303、存储器305、输入单元307以及显示单元309等器件。本领域技术人员可以理解,图5示出的结构器件并不构成对所有计算机设备的限定,可以包括比图示更多或更少的部件,或者组合某些部件。存储器305可用于存储应用程序301以及各功能模块,处理器303运行存储在存储器305的应用程序301,从而执行设备的各种功能应用以及数据处理。存储器305可以是内存储器或外存储器,或者包括内存储器和外存储器两者。内存储器可以包括只读存储器(ROM)、可编程ROM(PROM)、电可编程ROM(EPROM)、电可擦写可编程ROM(EEPROM)、快闪存储器、或者随机存储器。外存储器可以包括硬盘、软盘、ZIP盘、U盘、磁带等。本申请所公开的存储器包括但不限于这些类型的存储器。本申请所公开的存储器305只作为例子而非作为限定。In addition, in yet another embodiment, the present application provides a computer device. As shown in FIG. 5, the computer device includes a processor 303, a memory 305, an input unit 307, a display unit 309, and other components. Those skilled in the art can understand that the structure shown in FIG. 5 does not constitute a limitation on all computer devices, which may include more or fewer components than shown, or combine certain components. The memory 305 may be used to store the application program 301 and various functional modules; the processor 303 runs the application program 301 stored in the memory 305 to execute the various functional applications and data processing of the device. The memory 305 may be an internal memory or an external memory, or include both. The internal memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, or random access memory. The external memory may include a hard disk, a floppy disk, a ZIP disk, a U disk, a magnetic tape, and the like. The memory disclosed in this application includes, but is not limited to, these types of memory. The memory 305 disclosed in this application is merely an example and not a limitation.
输入单元307用于接收信号的输入,以及接收用户输入的关键字。输入单元307可包括触控面板以及其它输入设备。触控面板可收集用户在其上或附近的触摸操作(比如用户使用手指、触笔等任何适合的物体或附件在触控面板上或在触控面板附近的操作),并根据预先设定的程序驱动相应的连接装置;其它输入设备可以包括但不限于物理键盘、功能键(比如播放控制按键、开关按键等)、轨迹球、鼠标、操作杆等中的一种或多种。显示单元309可用于显示用户输入的信息或提供给用户的信息以及计算机设备的各种菜单。显示单元309可采用液晶显示器、有机发光二极管等形式。处理器303是计算机设备的控制中心,利用各种接口和线路连接整个电脑的各个部分,通过运行或执行存储在存储器305内的软件程序和/或模块,以及调用存储在存储器内的数据,执行各种功能和处理数据。图5中所示的一个或多个处理器303能够执行、实现图4中所示的输入模块11、识别模块12、提取模块13以及生成模块14的功能。The input unit 307 is used to receive signal input and keywords entered by the user, and may include a touch panel and other input devices. The touch panel can collect the user's touch operations on or near it (for example, operations performed on or near the touch panel with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection device according to a preset program; other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as playback control keys and switch keys), a trackball, a mouse, and a joystick. The display unit 309 may be used to display information entered by the user, information provided to the user, and the various menus of the computer device, and may take the form of a liquid crystal display, organic light-emitting diodes, or the like. The processor 303 is the control center of the computer device: it connects the various parts of the entire computer through various interfaces and lines, and executes various functions and processes data by running or executing the software programs and/or modules stored in the memory 305 and calling the data stored in the memory. The one or more processors 303 shown in FIG. 5 can execute and realize the functions of the input module 11, the recognition module 12, the extraction module 13, and the generation module 14 shown in FIG. 4.
在一种实施方式中,所述计算机设备包括存储器305和处理器303,所述存储器305中存储有计算机可读指令,所述计算机可读指令被所述处理器执行时,使得所述处理器303执行以上实施例所述的一种基于文字定位识别的表格解析方法的步骤。In one embodiment, the computer device includes a memory 305 and a processor 303. The memory 305 stores computer-readable instructions. When the computer-readable instructions are executed by the processor, the processor 303 executes the steps of a table analysis method based on character positioning recognition described in the above embodiment.
本申请实施例提供的一种计算机设备,可实现输入表格图片至预先训练的文字定位网络,得到所述表格图片中字符的位置信息;依据所述位置信息对所述表格图片进行图形分割,分割出所述位置信息对应的单元格图片,将所述单元格图片输入预先训练的文字识别网络进行字符识别,得到单元格字符内容;依据所述位置信息,提取所述表格图片的第一表格布局;依据所述第一表格布局以及所述单元格字符内容,生成所述表格图片的表格文件。本申请可以利用建立好的深度学习模型进行表格图片中文字的定位与识别,提高了表格图片识别的效率以及准确率。The computer device provided by the embodiment of the present application can realize: inputting a table picture into a pre-trained text positioning network to obtain the position information of the characters in the table picture; graphically segmenting the table picture according to the position information, segmenting out the cell picture corresponding to the position information, and inputting the cell picture into a pre-trained text recognition network for character recognition to obtain the cell character content; extracting the first table layout of the table picture according to the position information; and generating a table file of the table picture according to the first table layout and the cell character content. The present application uses established deep learning models to locate and recognize the text in table pictures, improving the efficiency and accuracy of table picture recognition.
另一种实施例中,本申请还可以实现检测所述表格图片中是否包含网格线;若所述表格图片包含网格线,则提取所述表格图片的第二表格布局;将所述第二表格布局与所述第一表格布局进行比对,当比对结果为所述第一表格布局与所述第二表格布局一致时,则验证所述第一表格布局有效。本申请还可以另外检测所述表格图片是否存在表格线,在所述表格图片存在表格线的情况下,直接提取所述表格线,然后将得到的第一表格布局与提取的表格线构成的第二表格布局进行比对以校验所述第一表格布局是否有效。本申请通过文字定位网络以及文字识别网络解析表格图片,可以兼容无表格线和有表格线或表格线残缺的情况,适用范围广。In another embodiment, the present application can also detect whether the table picture contains grid lines; if it does, extract the second table layout of the table picture; and compare the second table layout with the first table layout, verifying the first table layout as valid when the comparison shows the two to be consistent. The present application can additionally detect whether table lines exist in the table picture; when they do, the table lines are extracted directly, and the obtained first table layout is compared with the second table layout formed by the extracted table lines to verify whether the first table layout is valid. By parsing table pictures through the text positioning network and the text recognition network, the present application is compatible with pictures that have no table lines as well as those with complete or incomplete table lines, and thus has a wide range of application.
本申请实施例提供的计算机可读存储介质可以实现上述基于文字定位识别的表格解析方法的实施例,具体功能实现请参见方法实施例中的说明,在此不再赘述。The computer-readable storage medium provided in the embodiment of the present application can implement the above embodiment of the table parsing method based on text positioning and recognition; for the specific function realization, please refer to the description in the method embodiment, which will not be repeated here.
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机可读指令来指令相关的硬件来完成,该计算机可读指令可存储于一计算机可读取存储介质中,该程序在执行时,可包括如上述各方法的实施例的流程。其中,前述的存储介质可为磁碟、光盘、只读存储记忆体(Read-Only Memory,ROM)等非易失性存储介质,或随机存储记忆体(Random Access Memory,RAM)等。A person of ordinary skill in the art can understand that all or part of the processes in the above embodiment methods can be completed by instructing the relevant hardware through computer-readable instructions, which can be stored in a computer-readable storage medium; when executed, the program may include the processes of the above method embodiments. The aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a read-only memory (ROM), or a random access memory (RAM).
以上所述实施例的各技术特征可以进行任意的组合,为使描述简洁,未对上述实施例中的各个技术特征所有可能的组合都进行描述,然而,只要这些技术特征的组合不存在矛盾,都应当认为是本说明书记载的范围。The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments have been described; however, as long as a combination of these technical features contains no contradiction, it should be considered within the scope of this specification.
以上所述实施例仅表达了本申请的几种实施方式,其描述较为具体和详细,但并不能因此而理解为对本申请专利范围的限制。应当指出的是,对于本领域的普通技术人员来说,在不脱离本申请构思的前提下,还可以做出若干变形和改进,这些都属于本申请的保护范围。因此,本申请专利的保护范围应以所附权利要求为准。The above-mentioned embodiments only express several implementation manners of this application, and their descriptions are more specific and detailed, but they should not be construed as limiting the scope of this application. It should be pointed out that for those of ordinary skill in the art, without departing from the concept of this application, several modifications and improvements can be made, and these all fall within the protection scope of this application. Therefore, the scope of protection of the patent of this application shall be subject to the appended claims.

Claims (20)

  1. 一种基于文字定位识别的表格图片解析方法,其特征在于,所述方法包括: A method for analyzing table pictures based on text positioning and recognition, characterized in that the method includes:
    输入表格图片至预先训练的文字定位网络,得到所述表格图片中字符的位置信息;Input the form picture to the pre-trained text positioning network to obtain the position information of the characters in the form picture;
    依据所述位置信息对所述表格图片进行图形分割,分割出所述位置信息对应的单元格图片,将所述单元格图片输入预先训练的文字识别网络进行字符识别,得到单元格字符内容;Graphically segment the table picture according to the position information, segment the cell picture corresponding to the position information, and input the cell picture into a pre-trained text recognition network for character recognition to obtain cell character content;
    依据所述位置信息,提取所述表格图片的第一表格布局;Extracting the first table layout of the table picture according to the position information;
    依据所述第一表格布局以及所述单元格字符内容,生成所述表格图片的表格文件;Generating a table file of the table picture according to the first table layout and the cell character content;
    其中,所述输入表格图片至预先训练的文字定位网络,得到所述表格图片中字符的位置信息,包括:Wherein, the input form picture to a pre-trained text positioning network to obtain position information of characters in the form picture includes:
    输入表格图片至预先训练的文字定位网络;Input form picture to pre-trained text positioning network;
    获取所述表格图片中连续的若干个字符串作为一个字符串组合;Acquiring several consecutive character strings in the table picture as a character string combination;
    获取包围所述字符串组合的最小的矩形框;Obtaining the smallest rectangular frame surrounding the character string combination;
    建立直角坐标系,获取所述矩形框的各个顶点的坐标作为所述位置信息。A rectangular coordinate system is established, and the coordinates of each vertex of the rectangular frame are obtained as the position information.
  2. 根据权利要求1所述的基于文字定位识别的表格图片解析方法,其特征在于,还包括:The method for analyzing table images based on text positioning and recognition according to claim 1, characterized in that it further comprises:
    输入表格图片的样本进行深度网络的训练,训练出所述文字定位网络以及所述文字识别网络。Input the sample of the table picture to train the deep network to train the text positioning network and the text recognition network.
  3. 根据权利要求1所述的基于文字定位识别的表格图片解析方法,其特征在于,所述依据所述位置信息,提取所述表格图片的第一表格布局,包括:The method for analyzing table pictures based on text positioning and recognition according to claim 1, wherein said extracting a first table layout of said table pictures according to said position information comprises:
    提取所述位置信息中所述矩形框的左上角以及右下角的点的坐标值;Extracting the coordinate values of the points at the upper left corner and the lower right corner of the rectangular frame in the position information;
    依据所述左上角以及右下角的点的坐标值将相同横坐标的点对应的矩形框分为同一列,将相同纵坐标的点对应的矩形框分为同一行;Divide the rectangular boxes corresponding to the points with the same abscissa into the same column according to the coordinate values of the points on the upper left corner and the lower right corner, and divide the rectangular boxes corresponding to the points with the same ordinate into the same row;
    计算总的行数以及总的列数作为所述第一表格布局。Calculate the total number of rows and the total number of columns as the first table layout.
  4. 根据权利要求1所述的基于文字定位识别的表格图片解析方法,其特征在于,所述依据所述第一表格布局以及所述单元格字符内容,生成所述表格图片的表格文件,包括:The method for analyzing table pictures based on text positioning and recognition according to claim 1, wherein the generating a table file of the table pictures according to the first table layout and the cell character content includes:
    依据所述第一表格布局绘制表格;Draw a table according to the first table layout;
    将所述单元格字符对应填入绘制的表格的单元格中,生成所述表格图片的表格文件。Filling the cell characters into the cells of the drawn table correspondingly to generate a table file of the table picture.
  5. 根据权利要求1所述的基于文字定位识别的表格图片解析方法,其特征在于,所述依据所述位置信息,提取所述表格图片的第一表格布局之后,包括:The method for analyzing table pictures based on text positioning and recognition according to claim 1, wherein after extracting the first table layout of the table pictures according to the position information, the method comprises:
    检测所述表格图片中是否包含网格线;Detecting whether the table picture contains grid lines;
    若所述表格图片包含网格线,则提取所述表格图片的第二表格布局;If the table picture contains grid lines, extract the second table layout of the table picture;
    将所述第二表格布局与所述第一表格布局进行比对,当比对结果为所述第一表格布局与所述第二表格布局一致时,则验证所述第一表格布局有效。The second table layout is compared with the first table layout, and when the comparison result is that the first table layout is consistent with the second table layout, it is verified that the first table layout is valid.
  6. 根据权利要求5所述的基于文字定位识别的表格图片解析方法,其特征在于,所述依据所述位置信息,生成所述表格图片的第一表格布局之后,包括:The method for analyzing table pictures based on text positioning and recognition according to claim 5, wherein after generating the first table layout of the table pictures according to the position information, the method comprises:
    计算所述第二表格布局与所述第一表格布局的比对结果,当对比结果为所述第一表格布局与所述第二表格布局的差异点的数量大于预置值时,则重新训练所述文字定位网络。Calculate the comparison result between the second table layout and the first table layout, and retrain the text positioning network when the comparison result shows that the number of points of difference between the first table layout and the second table layout is greater than a preset value.
  7. 一种基于文字定位识别的表格图片解析装置,其特征在于,所述基于文字定位识别的表格图片解析装置包括:A form picture analysis device based on text location recognition, characterized in that the form picture analysis device based on text location recognition includes:
    输入模块,用于输入表格图片至预先训练的文字定位网络,得到所述表格图片中字符的位置信息;The input module is used to input the form picture to the pre-trained text positioning network to obtain the position information of the characters in the form picture;
    识别模块,用于依据所述位置信息对所述表格图片进行图形分割,分割出所述位置信息对应的单元格图片,将所述单元格图片输入预先训练的文字识别网络进行字符识别,得到单元格字符内容;The recognition module is configured to graphically segment the table picture according to the position information, segment out the cell picture corresponding to the position information, and input the cell picture into a pre-trained text recognition network for character recognition to obtain the cell character content;
    提取模块,用于依据所述位置信息,提取所述表格图片的第一表格布局;An extraction module, configured to extract the first table layout of the table picture according to the position information;
    生成模块,用于依据所述第一表格布局以及所述单元格字符内容,生成所述表格图片的表格文件;A generating module, configured to generate a table file of the table picture according to the first table layout and the cell character content;
    其中,所述输入模块,还用于:Wherein, the input module is also used for:
    输入表格图片至预先训练的文字定位网络;Input form picture to pre-trained text positioning network;
    获取所述表格图片中连续的若干个字符串作为一个字符串组合;Acquiring several consecutive character strings in the table picture as a character string combination;
    获取包围所述字符串组合的最小的矩形框;Obtaining the smallest rectangular frame surrounding the character string combination;
    建立直角坐标系,获取所述矩形框的各个顶点的坐标作为所述位置信息。A rectangular coordinate system is established, and the coordinates of each vertex of the rectangular frame are obtained as the position information.
  8. 如权利要求7所述的基于文字定位识别的表格图片解析装置,其特征在于,所述基于文字定位识别的表格图片解析装置还包括:8. The form image analysis device based on text location recognition according to claim 7, wherein the form image analysis device based on text location recognition further comprises:
    训练模块,输入表格图片的样本进行深度网络的训练,训练出所述文字定位网络以及所述文字识别网络。The training module inputs the samples of the table pictures to train the deep network, and trains the text positioning network and the text recognition network.
9. The table picture parsing apparatus based on text positioning and recognition according to claim 7, wherein the extraction module is further configured to:
    extract, from the position information, the coordinate values of the upper-left and lower-right corner points of the rectangular boxes;
    according to the coordinate values of the upper-left and lower-right corner points, group rectangular boxes whose points have the same abscissa into the same column, and group rectangular boxes whose points have the same ordinate into the same row; and
    calculate the total number of rows and the total number of columns as the first table layout.
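The row/column grouping in claim 9 can be sketched as coordinate clustering. This is one plausible reading rather than the claimed algorithm itself; the tolerance parameter is an assumption added here, since real detections rarely align pixel-perfectly.

```python
# Illustrative sketch: infer the first table layout by grouping boxes whose
# top-left corners share an abscissa into columns and boxes sharing an
# ordinate into rows, then counting the groups.

def infer_layout(boxes, tol=5):
    """boxes: list of (x1, y1, x2, y2) upper-left/lower-right corner points.
    Returns (total_rows, total_cols)."""
    def cluster(values):
        groups = []
        for v in sorted(values):
            if groups and v - groups[-1][-1] <= tol:
                groups[-1].append(v)  # same row/column within tolerance
            else:
                groups.append([v])    # start a new row/column
        return len(groups)
    n_cols = cluster([b[0] for b in boxes])  # same abscissa -> same column
    n_rows = cluster([b[1] for b in boxes])  # same ordinate -> same row
    return n_rows, n_cols
```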
10. The table picture parsing apparatus based on text positioning and recognition according to claim 7, further configured to:
    detect whether the table picture contains grid lines;
    if the table picture contains grid lines, extract a second table layout of the table picture; and
    compare the second table layout with the first table layout, and when the comparison result is that the first table layout is consistent with the second table layout, verify that the first table layout is valid.
11. The table picture parsing apparatus based on text positioning and recognition according to claim 10, further configured to:
    calculate a comparison result between the second table layout and the first table layout, and when the comparison result is that the number of differences between the first table layout and the second table layout is greater than a preset value, retrain the text positioning network.
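The validation and retraining decision of claims 10 and 11 can be sketched as a layout comparison. The cell-level diff below is one plausible reading of "number of differences" (the claims do not fix the metric), and the layout representation is an assumption introduced for illustration.

```python
# Illustrative sketch: compare the grid-line-derived (second) layout with the
# text-derived (first) layout. Identical layouts validate the first layout;
# too many differences signal that the text positioning network needs retraining.

def compare_layouts(first, second, preset_value=2):
    """first, second: dicts with 'rows', 'cols', and a set of occupied 'cells'.
    Returns 'valid', 'retrain', or 'mismatch'."""
    if first == second:
        return "valid"
    # Count cells present in one layout but not the other, plus any
    # disagreement in the total row or column counts.
    differences = len(first["cells"] ^ second["cells"])
    differences += int(first["rows"] != second["rows"])
    differences += int(first["cols"] != second["cols"])
    return "retrain" if differences > preset_value else "mismatch"
```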
12. A computer-readable storage medium, wherein computer-readable instructions are stored on the computer-readable storage medium, and when the computer-readable instructions are executed by a processor, the following steps are implemented:
    inputting a table picture into a pre-trained text positioning network to obtain position information of characters in the table picture;
    performing image segmentation on the table picture according to the position information, segmenting out the cell pictures corresponding to the position information, and inputting the cell pictures into a pre-trained text recognition network for character recognition to obtain cell character content;
    extracting a first table layout of the table picture according to the position information; and
    generating a table file of the table picture according to the first table layout and the cell character content;
    wherein the inputting a table picture into a pre-trained text positioning network to obtain position information of characters in the table picture comprises:
    inputting the table picture into the pre-trained text positioning network;
    acquiring several consecutive character strings in the table picture as one character string combination;
    acquiring the smallest rectangular box enclosing the character string combination; and
    establishing a rectangular coordinate system, and acquiring the coordinates of each vertex of the rectangular box as the position information.
13. The computer-readable storage medium according to claim 12, wherein the extracting a first table layout of the table picture according to the position information comprises:
    extracting, from the position information, the coordinate values of the upper-left and lower-right corner points of the rectangular boxes;
    according to the coordinate values of the upper-left and lower-right corner points, grouping rectangular boxes whose points have the same abscissa into the same column, and grouping rectangular boxes whose points have the same ordinate into the same row; and
    calculating the total number of rows and the total number of columns as the first table layout.
14. The computer-readable storage medium according to claim 12, wherein the generating a table file of the table picture according to the first table layout and the cell character content comprises:
    drawing a table according to the first table layout; and
    filling the cell character content into the corresponding cells of the drawn table to generate the table file of the table picture.
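The table-file generation of claim 14 can be sketched as follows. The claims do not fix an output format; CSV is chosen here purely for illustration, and the `(row, col)`-keyed cell mapping is an assumption.

```python
# Illustrative sketch: "draw" a table from the first table layout (row and
# column counts) and fill the recognized cell character content into the
# corresponding cells, emitting a CSV table file as one possible format.
import csv
import io

def generate_table_file(n_rows, n_cols, cell_contents):
    """cell_contents: dict mapping (row, col) -> recognized text.
    Returns the table file contents as a CSV string."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for r in range(n_rows):
        # Cells with no recognized content stay empty.
        writer.writerow([cell_contents.get((r, c), "") for c in range(n_cols)])
    return buf.getvalue()
```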
15. The computer-readable storage medium according to claim 12, wherein after the extracting a first table layout of the table picture according to the position information, the steps comprise:
    detecting whether the table picture contains grid lines;
    if the table picture contains grid lines, extracting a second table layout of the table picture; and
    comparing the second table layout with the first table layout, and when the comparison result is that the first table layout is consistent with the second table layout, verifying that the first table layout is valid.
16. The computer-readable storage medium according to claim 15, wherein after the generating a first table layout of the table picture according to the position information, the steps comprise:
    calculating a comparison result between the second table layout and the first table layout, and when the comparison result is that the number of differences between the first table layout and the second table layout is greater than a preset value, retraining the text positioning network.
17. A computer device, comprising a memory and a processor, wherein computer-readable instructions are stored in the memory, and when the computer-readable instructions are executed by the processor, the following steps are implemented:
    inputting a table picture into a pre-trained text positioning network to obtain position information of characters in the table picture;
    performing image segmentation on the table picture according to the position information, segmenting out the cell pictures corresponding to the position information, and inputting the cell pictures into a pre-trained text recognition network for character recognition to obtain cell character content;
    extracting a first table layout of the table picture according to the position information; and
    generating a table file of the table picture according to the first table layout and the cell character content;
    wherein the inputting a table picture into a pre-trained text positioning network to obtain position information of characters in the table picture comprises:
    inputting the table picture into the pre-trained text positioning network;
    acquiring several consecutive character strings in the table picture as one character string combination;
    acquiring the smallest rectangular box enclosing the character string combination; and
    establishing a rectangular coordinate system, and acquiring the coordinates of each vertex of the rectangular box as the position information.
18. The computer device according to claim 17, wherein the steps further comprise:
    inputting samples of table pictures for deep network training, so as to train the text positioning network and the text recognition network.
19. The computer device according to claim 17, wherein the extracting a first table layout of the table picture according to the position information comprises:
    extracting, from the position information, the coordinate values of the upper-left and lower-right corner points of the rectangular boxes;
    according to the coordinate values of the upper-left and lower-right corner points, grouping rectangular boxes whose points have the same abscissa into the same column, and grouping rectangular boxes whose points have the same ordinate into the same row; and
    calculating the total number of rows and the total number of columns as the first table layout.
20. The computer device according to claim 17, wherein the generating a table file of the table picture according to the first table layout and the cell character content comprises:
    drawing a table according to the first table layout; and
    filling the cell character content into the corresponding cells of the drawn table to generate the table file of the table picture.
PCT/CN2019/118422 2019-02-13 2019-11-14 Form parsing method based on character location and recognition, and medium and computer device WO2020164281A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910115364.7 2019-02-13
CN201910115364.7A CN109961008B (en) 2019-02-13 Table analysis method, medium and computer equipment based on text positioning recognition

Publications (1)

Publication Number Publication Date
WO2020164281A1 true WO2020164281A1 (en) 2020-08-20

Family

ID=67023672

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/118422 WO2020164281A1 (en) 2019-02-13 2019-11-14 Form parsing method based on character location and recognition, and medium and computer device

Country Status (1)

Country Link
WO (1) WO2020164281A1 (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101908136A (en) * 2009-06-08 2010-12-08 比亚迪股份有限公司 Table identifying and processing method and system
US20150169972A1 (en) * 2013-12-12 2015-06-18 Aliphcom Character data generation based on transformed imaged data to identify nutrition-related data or other types of data
CN105512611A (en) * 2015-11-25 2016-04-20 成都数联铭品科技有限公司 Detection and identification method for form image
CN108805076A (en) * 2018-06-07 2018-11-13 浙江大学 The extracting method and system of environmental impact assessment report table word
CN109961008A (en) * 2019-02-13 2019-07-02 平安科技(深圳)有限公司 Form analysis method, medium and computer equipment based on text location identification


Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112036304A (en) * 2020-08-31 2020-12-04 平安医疗健康管理股份有限公司 Medical bill layout identification method and device and computer equipment
CN112132794A (en) * 2020-09-14 2020-12-25 杭州安恒信息技术股份有限公司 Text positioning method, device and equipment for audit video and readable storage medium
CN111985459B (en) * 2020-09-18 2023-07-28 北京百度网讯科技有限公司 Table image correction method, apparatus, electronic device and storage medium
CN111985459A (en) * 2020-09-18 2020-11-24 北京百度网讯科技有限公司 Table image correction method, device, electronic equipment and storage medium
CN112200117A (en) * 2020-10-22 2021-01-08 长城计算机软件与系统有限公司 Form identification method and device
CN112200117B (en) * 2020-10-22 2023-10-13 长城计算机软件与系统有限公司 Form identification method and device
CN112364726B (en) * 2020-10-27 2024-06-04 重庆大学 Part code-spraying character positioning method based on improved EAST
CN112364726A (en) * 2020-10-27 2021-02-12 重庆大学 Part code spraying character positioning method based on improved EAST
CN112686258A (en) * 2020-12-10 2021-04-20 广州广电运通金融电子股份有限公司 Physical examination report information structuring method and device, readable storage medium and terminal
CN112712014A (en) * 2020-12-29 2021-04-27 平安健康保险股份有限公司 Table picture structure analysis method, system, equipment and readable storage medium
CN112712014B (en) * 2020-12-29 2024-04-30 平安健康保险股份有限公司 Method, system, device and readable storage medium for parsing table picture structure
CN112800904A (en) * 2021-01-19 2021-05-14 深圳市玩瞳科技有限公司 Method and device for identifying character strings in picture according to finger pointing
CN113128490A (en) * 2021-04-28 2021-07-16 湖南荣冠智能科技有限公司 Prescription information scanning and automatic identification method
CN113128490B (en) * 2021-04-28 2023-12-05 湖南荣冠智能科技有限公司 Prescription information scanning and automatic identification method
CN113392811B (en) * 2021-07-08 2023-08-01 北京百度网讯科技有限公司 Table extraction method and device, electronic equipment and storage medium
CN113378789B (en) * 2021-07-08 2023-09-26 京东科技信息技术有限公司 Cell position detection method and device and electronic equipment
CN113392811A (en) * 2021-07-08 2021-09-14 北京百度网讯科技有限公司 Table extraction method and device, electronic equipment and storage medium
CN113378789A (en) * 2021-07-08 2021-09-10 京东数科海益信息科技有限公司 Cell position detection method and device and electronic equipment
CN113538291A (en) * 2021-08-02 2021-10-22 广州广电运通金融电子股份有限公司 Card image tilt correction method and device, computer equipment and storage medium
CN113538291B (en) * 2021-08-02 2024-05-14 广州广电运通金融电子股份有限公司 Card image inclination correction method, device, computer equipment and storage medium
CN114170616A (en) * 2021-11-15 2022-03-11 嵊州市光宇实业有限公司 Electric power engineering material information acquisition and analysis system and method based on graph paper set
CN114612921B (en) * 2022-05-12 2022-07-19 中信证券股份有限公司 Form recognition method and device, electronic equipment and computer readable medium
CN114612921A (en) * 2022-05-12 2022-06-10 中信证券股份有限公司 Form recognition method and device, electronic equipment and computer readable medium
CN115841679A (en) * 2023-02-23 2023-03-24 江西中至科技有限公司 Drawing sheet extraction method, system, computer and readable storage medium
CN115841679B (en) * 2023-02-23 2023-05-05 江西中至科技有限公司 Drawing form extraction method, drawing form extraction system, computer and readable storage medium

Also Published As

Publication number Publication date
CN109961008A (en) 2019-07-02

Similar Documents

Publication Publication Date Title
WO2020164281A1 (en) Form parsing method based on character location and recognition, and medium and computer device
WO2020164267A1 (en) Text classification model construction method and apparatus, and terminal and storage medium
WO2020107765A1 (en) Statement analysis processing method, apparatus and device, and computer-readable storage medium
WO2020253112A1 (en) Test strategy acquisition method, device, terminal, and readable storage medium
WO2014069741A1 (en) Apparatus and method for automatic scoring
WO2012161359A1 (en) Method and device for user interface
WO2019156332A1 (en) Device for producing artificial intelligence character for augmented reality and service system using same
WO2018090740A1 (en) Method and apparatus for implementing company based on mixed reality technology
WO2015065006A1 (en) Multimedia apparatus, online education system, and method for providing education content thereof
WO2020107761A1 (en) Advertising copy processing method, apparatus and device, and computer-readable storage medium
WO2011068284A1 (en) Language learning electronic device driving method, system, and simultaneous interpretation system applying same
WO2020159140A1 (en) Electronic device and control method therefor
WO2014069815A1 (en) Display apparatus for studying mask and method for displaying studying mask
WO2016182393A1 (en) Method and device for analyzing user's emotion
WO2023224433A1 (en) Information generation method and device
CN112016077A (en) Page information acquisition method and device based on sliding track simulation and electronic equipment
WO2022145723A1 (en) Method and apparatus for detecting layout
WO2015109772A1 (en) Data processing device and data processing method
WO2020022645A1 (en) Method and electronic device for configuring touch screen keyboard
WO2020045909A1 (en) Apparatus and method for user interface framework for multi-selection and operation of non-consecutive segmented information
CN111860083A (en) Character relation completion method and device
WO2021177719A1 (en) Translation platform operating method
WO2021003922A1 (en) Page information input optimization method, device and apparatus, and storage medium
WO2024048881A1 (en) Learning system, and method for operating learning application
WO2023068495A1 (en) Electronic device and control method thereof

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19915547

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 05.10.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19915547

Country of ref document: EP

Kind code of ref document: A1