WO2023279847A1 - Cell position detection method and apparatus, and electronic device - Google Patents


Info

Publication number
WO2023279847A1
Authority
WO
WIPO (PCT)
Prior art keywords
cell
prediction
predicted
cells
adjacency matrix
Prior art date
Application number
PCT/CN2022/092571
Other languages
French (fr)
Chinese (zh)
Inventor
陶大程 (Tao Dacheng)
薛文元 (Xue Wenyuan)
Original Assignee
京东科技信息技术有限公司 (Jingdong Technology Information Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 京东科技信息技术有限公司
Publication of WO2023279847A1 publication Critical patent/WO2023279847A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Definitions

  • the present disclosure relates to the technical field of computer applications, and in particular to a cell position detection method, device, electronic equipment, computer-readable storage medium, computer program product and computer program.
  • tabular data has the advantages of simplicity, intuition, and ease of processing, and is widely used in people's office life.
  • with the development of artificial intelligence technology, the requirements for automatic recognition of tabular data are becoming ever higher.
  • the position of cells is automatically detected from the table image, so that information extraction can be performed based on the position of the cells.
  • the detected cell position information is incomplete and has poor robustness.
  • the embodiment of the first aspect of the present disclosure proposes a cell position detection method, which can use the predicted cells as nodes, obtain an adjacency matrix based on the first positions of the predicted cells, and then obtain the fusion node features of the predicted cells according to the first positions and the adjacency matrix, so that the fusion node features match the first positions of the predicted cells and the positional relationships between them and have better representation; the second position of each predicted cell is then obtained from its fusion node feature, so that the first and second positions of the cells are obtained at the same time, and the detected cell positions are more comprehensive and more robust.
  • the embodiment of the second aspect of the present disclosure provides a device for detecting the position of a cell.
  • the embodiment of the third aspect of the present disclosure provides an electronic device.
  • Embodiments of a fourth aspect of the present disclosure provide a computer-readable storage medium.
  • the embodiment of the fifth aspect of the present disclosure provides a computer program product.
  • the embodiment of the sixth aspect of the present disclosure provides a computer program.
  • the embodiment of the first aspect of the present disclosure proposes a cell position detection method, including: obtaining the first position of each predicted cell in a table image, wherein the first position represents the position of the area occupied by the predicted cell in the table image; obtaining an adjacency matrix of the table image according to the first positions, wherein each predicted cell in the table image is a node and the adjacency matrix represents the positional relationships between the predicted cells; obtaining the fusion node feature of any predicted cell according to the first position of that predicted cell and the adjacency matrix; and obtaining the second position of that predicted cell according to its fusion node feature, wherein the second position represents the row and/or column to which the predicted cell belongs.
  • the cell position detection method of the embodiment of the present disclosure can use the predicted cells as nodes and obtain an adjacency matrix based on their first positions, then obtain the fusion node features of the predicted cells according to the first positions and the adjacency matrix, so that the fusion node features match the first positions of the predicted cells and the positional relationships between them and have better representation; the second position of each predicted cell is then obtained from its fusion node feature, so that the first and second positions of the cells are obtained at the same time, and the detected cell positions are more comprehensive and more robust.
  • the first position includes at least one of the two-dimensional coordinates of the center point of the predicted cell, the width of the predicted cell, and the height of the predicted cell.
  • the obtaining the adjacency matrix of the table image according to the first position includes: determining the value of the corresponding element in the adjacency matrix based on the first position and the number of the predicted cell.
  • the determining the value of the corresponding element in the adjacency matrix based on the first position and the number of the predicted cell includes: obtaining the number n of predicted cells, and numbering the predicted cells consecutively from 1 to n, wherein n is an integer greater than 1; extracting from the first positions the abscissa and ordinate of the center points of the predicted cells numbered i and j, wherein 1 ≤ i ≤ n and 1 ≤ j ≤ n; obtaining the width and height of the table image, and an adjustment parameter; obtaining a first ratio of the difference between the abscissas of the center points of the predicted cells numbered i and j to the width, and determining the value of the row dimension of the element in row i and column j of the adjacency matrix based on the product of the first ratio and the adjustment parameter; and obtaining a second ratio of the difference between the ordinates of the center points of the predicted cells numbered i and j to the height, and determining the value of the column dimension of the element in row i and column j based on the product of the second ratio and the adjustment parameter.
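The adjacency-matrix construction above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name is hypothetical, the absolute value of the coordinate difference and the shape `(n, n, 2)` (row dimension and column dimension stored per element) are assumptions, and the adjustment parameter value is arbitrary.

```python
import numpy as np

def build_adjacency_matrix(centers, img_w, img_h, adjust=1.0):
    """Sketch of the adjacency-matrix construction described above.

    centers : (n, 2) array of predicted-cell center points (x, y).
    adjust  : the adjustment parameter; per the text it is positively
              correlated with the number of table rows/columns.
    Element (i, j) holds two values: a row-dimension entry from the
    horizontal distance between centers i and j divided by the image
    width, and a column-dimension entry from the vertical distance
    divided by the image height, each scaled by the parameter.
    """
    centers = np.asarray(centers, dtype=float)
    n = centers.shape[0]
    adj = np.zeros((n, n, 2))
    for i in range(n):
        for j in range(n):
            # first ratio: |x_i - x_j| / width, times the adjustment parameter
            adj[i, j, 0] = adjust * abs(centers[i, 0] - centers[j, 0]) / img_w
            # second ratio: |y_i - y_j| / height, times the adjustment parameter
            adj[i, j, 1] = adjust * abs(centers[i, 1] - centers[j, 1]) / img_h
    return adj
```

Because the ratios are normalized by the image size, the resulting entries are comparable across tables of different resolutions.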
  • the obtaining the fusion node feature of any predicted cell according to its first position and the adjacency matrix includes: obtaining the node feature of the predicted cell according to its first position; and inputting the node features and the adjacency matrix into a graph convolutional network (GCN), where the graph convolutional network fuses the node features with the adjacency matrix to generate the fusion node feature of the predicted cell.
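One GCN propagation step of the kind referenced above can be sketched as follows. This is a textbook graph-convolution update under stated assumptions, not the disclosed network: the function name, a single layer, a 2-D adjacency matrix, symmetric normalization with self-loops, and the ReLU activation are all illustrative choices.

```python
import numpy as np

def gcn_layer(node_feats, adj, weight):
    """One graph-convolution step (minimal sketch, not the patented model).

    node_feats : (n, d) node features built from the first positions.
    adj        : (n, n) adjacency matrix encoding pairwise cell positions.
    weight     : (d, d_out) learnable projection matrix.
    """
    n = adj.shape[0]
    a_hat = adj + np.eye(n)                    # add self-loops
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt   # symmetric normalization
    # propagate features over the graph, project, and apply ReLU
    return np.maximum(a_norm @ node_feats @ weight, 0.0)
```

Stacking such layers lets each cell's fused feature absorb the positions of its neighbors, which is what lets the second-position head reason about rows and columns.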
  • the obtaining the node feature of any predicted cell according to its first position includes: linearly mapping the first position of the predicted cell to obtain its spatial feature; extracting the visual semantic feature of the predicted cell from the table image based on its first position; and splicing the spatial feature and the visual semantic feature of the predicted cell to obtain its node feature.
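The three steps above (linear mapping, visual-feature extraction, splicing) can be sketched like this. The function name, all tensor shapes, and the mean-pooling of feature-map pixels inside the cell are assumptions for illustration; the disclosure only specifies a linear mapping plus visual semantic features concatenated into one node feature.

```python
import numpy as np

def node_feature(first_pos, feat_map, w_spatial, b_spatial):
    """Build one cell's node feature: a linear mapping of the first
    position (spatial feature) spliced with visual-semantic features
    pooled from the cell's region of an image feature map.

    first_pos : (4,) sequence [cx, cy, w, h] of the predicted cell.
    feat_map  : (H, W, C) visual feature map of the table image.
    """
    # spatial feature: linear projection of the first position
    spatial = w_spatial @ np.asarray(first_pos, dtype=float) + b_spatial
    # visual-semantic feature: pool the feature-map pixels inside the cell
    cx, cy, w, h = first_pos
    x0, x1 = int(cx - w / 2), int(cx + w / 2)
    y0, y1 = int(cy - h / 2), int(cy + h / 2)
    visual = feat_map[y0:y1, x0:x1].mean(axis=(0, 1))
    # splice (concatenate) the two parts into the node feature
    return np.concatenate([spatial, visual])
```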
  • the extracting the visual semantic feature of any predicted cell from the table image based on its first position includes: determining, based on the first position of the predicted cell, the target pixels contained in the predicted cell from the pixels of the table image; and extracting the visual semantic features of the target pixels from the table image as the visual semantic feature of the predicted cell.
  • the obtaining the second position of any predicted cell according to its fusion node feature includes: obtaining, based on the fusion node feature, the predicted probability of the predicted cell at each candidate second position; and obtaining the maximum predicted probability among these, and determining the candidate second position corresponding to the maximum predicted probability as the second position of the predicted cell.
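This argmax-over-candidates decoding can be sketched as a softmax classifier head. The function name and the linear-plus-softmax head are assumptions standing in for whatever trained layer produces the per-candidate probabilities; only the "take the candidate with maximum predicted probability" step is from the text.

```python
import numpy as np

def predict_second_position(fused_feat, w_cls, candidates):
    """Pick the candidate second position with maximum probability.

    fused_feat : (d,) fusion node feature of one predicted cell.
    w_cls      : (k, d) classifier weights, one row per candidate.
    candidates : list of k candidate second positions (e.g. row labels).
    """
    logits = w_cls @ fused_feat
    probs = np.exp(logits - logits.max())   # stable softmax
    probs /= probs.sum()
    return candidates[int(np.argmax(probs))]
```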
  • the obtaining the second position of any predicted cell according to its fusion node feature includes: establishing a target vector for the predicted cell, the target vector including n dimensions, where n is the number of candidate second positions of the predicted cell; obtaining, based on the fusion node feature, the predicted probability that the value of each vector dimension is 0 or 1; obtaining the maximum of these two predicted probabilities and determining the corresponding value as the target value of that vector dimension; and obtaining the second position of the predicted cell based on the sum of the target values over the vector dimensions.
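The target-vector decoding can be sketched as below. The function name is hypothetical, and reading the second position directly off the sum assumes a unary (thermometer-style) encoding in which the leading dimensions are 1; the disclosure states only that the second position is obtained from the sum of the per-dimension target values.

```python
import numpy as np

def second_position_from_vector(prob_one):
    """Decode a second position from per-dimension binary predictions.

    prob_one : (n,) predicted probability that each target-vector
               dimension equals 1 (so 1 - prob_one is the probability
               of 0). Each dimension takes whichever value has the
               larger probability; the second position is read off the
               sum of the resulting target values.
    """
    prob_one = np.asarray(prob_one, dtype=float)
    target = (prob_one >= 0.5).astype(int)   # max of P(0), P(1) per dim
    return int(target.sum())
```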
  • the obtaining the first position of the predicted cell in the table image includes: extracting a detection frame of each predicted cell from the table image, and obtaining the first position of the predicted cell based on the detection frame.
  • the second position includes at least one of a starting row number, an ending row number, a starting column number, and an ending column number of the prediction cell.
  • the embodiment of the second aspect of the present disclosure proposes a cell position detection device, including: a first acquisition module configured to acquire the first position of each predicted cell in a table image, wherein the first position represents the position of the area occupied by the predicted cell in the table image; a second acquisition module configured to obtain an adjacency matrix of the table image according to the first positions, wherein each predicted cell in the table image is a node and the adjacency matrix represents the positional relationships between the predicted cells; a third acquisition module configured to obtain the fusion node feature of any predicted cell according to its first position and the adjacency matrix; and a fourth acquisition module configured to obtain the second position of any predicted cell according to its fusion node feature, wherein the second position represents the row and/or column to which the predicted cell belongs.
  • the cell position detection device of the embodiment of the disclosure can use the predicted cells as nodes and obtain an adjacency matrix based on their first positions, then obtain the fusion node features of the predicted cells according to the first positions and the adjacency matrix, so that the fusion node features match the first positions of the predicted cells and the positional relationships between them and have better representation; the second position of each predicted cell is then obtained from its fusion node feature, so that the first and second positions of the cells are obtained at the same time, and the detected cell positions are more comprehensive and more robust.
  • the first position includes at least one of the two-dimensional coordinates of the center point of the predicted cell, the width of the predicted cell, and the height of the predicted cell.
  • the second obtaining module is further configured to: determine the value of the corresponding element in the adjacency matrix based on the first position and the number of the predicted cell.
  • the second obtaining module is further configured to: obtain the number n of predicted cells, and number the predicted cells consecutively from 1 to n, wherein n is an integer greater than 1; extract from the first positions the abscissa and ordinate of the center points of the predicted cells numbered i and j, wherein 1 ≤ i ≤ n and 1 ≤ j ≤ n; obtain the width and height of the table image, and an adjustment parameter; obtain a first ratio of the difference between the abscissas of the center points of the predicted cells numbered i and j to the width, and determine the value of the row dimension of the element in row i and column j of the adjacency matrix based on the product of the first ratio and the adjustment parameter; and obtain a second ratio of the difference between the ordinates of the center points of the predicted cells numbered i and j to the height, and determine the value of the column dimension of the element in row i and column j based on the product of the second ratio and the adjustment parameter.
  • the third acquisition module includes: an acquisition unit configured to obtain the node feature of any predicted cell according to its first position; and a fusion unit configured to input the node features and the adjacency matrix into a graph convolutional network (GCN), where the graph convolutional network fuses the node features with the adjacency matrix to generate the fusion node feature of the predicted cell.
  • the acquisition unit includes: a mapping subunit configured to linearly map the first position of any predicted cell to obtain its spatial feature; an extraction subunit configured to extract the visual semantic feature of the predicted cell from the table image based on its first position; and a splicing subunit configured to splice the spatial feature and the visual semantic feature of the predicted cell to obtain its node feature.
  • the extraction subunit is further configured to: determine, based on the first position of any predicted cell, the target pixels contained in the predicted cell from the pixels of the table image; and extract the visual semantic features of the target pixels from the table image as the visual semantic feature of the predicted cell.
  • the fourth obtaining module is further configured to: obtain, based on the fusion node feature of any predicted cell, the predicted probability of the predicted cell at each candidate second position; and obtain the maximum predicted probability among these, and determine the corresponding candidate second position as the second position of the predicted cell.
  • the fourth acquisition module is further configured to: establish a target vector for any predicted cell, the target vector including n dimensions, where n is the number of candidate second positions of the predicted cell; obtain, based on the fusion node feature, the predicted probability that the value of each vector dimension is 0 or 1; obtain the maximum of these two predicted probabilities and determine the corresponding value as the target value of that vector dimension; and obtain the second position of the predicted cell based on the sum of the target values over the vector dimensions.
  • the first acquisition module is further configured to: extract the detection frame of each predicted cell from the table image, and obtain the first position of the predicted cell based on the detection frame.
  • the second position includes at least one of a starting row number, an ending row number, a starting column number, and an ending column number of the prediction cell.
  • the embodiment of the third aspect of the present disclosure proposes an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the cell position detection method described in the embodiment of the first aspect is implemented.
  • when the computer program stored in the memory is executed by the processor, the predicted cells can be used as nodes and an adjacency matrix can be obtained based on their first positions; the fusion node features of the predicted cells are then obtained according to the first positions and the adjacency matrix, so that the fusion node features match the first positions of the predicted cells and the positional relationships between them and have better representation; the second position of each predicted cell is obtained from its fusion node feature, so that the first and second positions of the cells are obtained at the same time, and the detected cell positions are more comprehensive and more robust.
  • the embodiment of the fourth aspect of the present disclosure provides a computer-readable storage medium on which a computer program is stored.
  • the program is executed by a processor, the method for detecting the position of a cell as described in the embodiment of the first aspect is implemented.
  • when the computer program stored on the computer-readable storage medium of the embodiment of the present disclosure is executed by a processor, the predicted cells can be used as nodes and an adjacency matrix can be obtained based on their first positions; the fusion node features of the predicted cells are then obtained according to the first positions and the adjacency matrix, so that the fusion node features match the first positions of the predicted cells and the positional relationships between them and have better representation; the second position of each predicted cell is obtained from its fusion node feature, so that the first and second positions of the cells are obtained at the same time, and the detected cell positions are more comprehensive and more robust.
  • the embodiment of the fifth aspect of the present disclosure proposes a computer program product, wherein the computer program product includes computer program code, and when the computer program code is run on a computer, the cell position detection method described in the embodiment of the first aspect is implemented.
  • when the computer program code of the computer program product is run on a computer, the predicted cells can be used as nodes and an adjacency matrix can be obtained based on their first positions; the fusion node features of the predicted cells are then obtained according to the first positions and the adjacency matrix, so that the fusion node features match the first positions of the predicted cells and the positional relationships between them and have better representation; the second position of each predicted cell is obtained from its fusion node feature, so that the first and second positions of the cells are obtained at the same time, and the detected cell positions are more comprehensive and more robust.
  • the embodiment of the sixth aspect of the present disclosure proposes a computer program, wherein the computer program includes computer program code, and when the computer program code is run on a computer, the computer executes the cell position detection method described in the embodiment of the first aspect.
  • when the computer program code of the computer program of the embodiment of the present disclosure is run on a computer, the predicted cells can be used as nodes and an adjacency matrix can be obtained based on their first positions; the fusion node features of the predicted cells are then obtained according to the first positions and the adjacency matrix, so that the fusion node features match the first positions of the predicted cells and the positional relationships between them and have better representation; the second position of each predicted cell is obtained from its fusion node feature, so that the first and second positions of the cells are obtained at the same time, and the detected cell positions are more comprehensive and more robust.
  • FIG. 1 is a schematic flowchart of a method for detecting a cell position according to an embodiment of the present disclosure
  • FIG. 2 is a schematic flow diagram of determining values of corresponding elements in an adjacency matrix in a cell position detection method according to an embodiment of the present disclosure
  • FIG. 3 is a schematic flow diagram of obtaining fusion node features of any predicted cell in a cell position detection method according to an embodiment of the present disclosure
  • FIG. 4 is a schematic flow diagram of obtaining the node characteristics of any predicted cell in the cell position detection method according to an embodiment of the present disclosure
  • FIG. 5 is a schematic flow diagram of obtaining the second position of any predicted cell in the cell position detection method according to an embodiment of the present disclosure
  • FIG. 6 is a schematic flowchart of obtaining the second position of any predicted cell in a cell position detection method according to another embodiment of the present disclosure
  • FIG. 7 is a schematic diagram of a cell position detection model according to an embodiment of the present disclosure.
  • FIG. 8 is a schematic structural diagram of a device for detecting cell positions according to an embodiment of the present disclosure.
  • FIG. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
  • FIG. 1 is a schematic flowchart of a method for detecting a cell position according to an embodiment of the present disclosure.
  • the method for detecting a cell position in an embodiment of the present disclosure includes steps S101-S104.
  • the executor of the cell position detection method in the embodiment of the present disclosure may be a cell position detection device, which can be configured in any electronic device, so that the electronic device can execute the cell position detection method of the embodiment of the present disclosure.
  • the electronic device may be a personal computer (PC), a cloud device, a mobile device, etc.; the mobile device may be a hardware device with an operating system, a touch screen and/or a display, such as a mobile phone, tablet computer, personal digital assistant, wearable device, or vehicle-mounted device.
  • the first position of the predicted cell in the table image may be obtained. It can be understood that a table image may contain at least one prediction cell, and different prediction cells may correspond to different first positions.
  • the first position is used to represent the position of the area occupied by the predicted cell in the table image, that is, the area occupied by the predicted cell in the table image can be determined according to the first position, so the predicted cell can be located according to the first position.
  • the first position includes at least one of the two-dimensional coordinates of the center point of the predicted cell, the width of the predicted cell, and the height of the predicted cell. At this time, the area occupied by the predicted cell is a rectangle.
  • cell recognition can be performed on the table image to generate the detection frames of the predicted cells; obtaining the first position of each predicted cell in the table image may then include extracting the detection frame of each predicted cell from the table image and obtaining the first position of the predicted cell based on the detection frame.
  • performing cell recognition on the table image to generate the detection frames of the predicted cells may include applying a cell recognition algorithm to the table image, so that the predicted cells can be located in the table image and their detection frames generated.
  • the cell identification algorithm can be set according to the actual situation, and there is no excessive limitation here.
  • obtaining the first position of the predicted cell based on the detection frame may include obtaining the two-dimensional coordinates of the center point of the detection frame, the width and height of the detection frame, and taking the two-dimensional coordinates of the center point of the detection frame as The two-dimensional coordinates of the center point of the predicted cell, and the width and height of the detection frame are respectively used as the width and height of the predicted cell.
  • each prediction cell in the table image may be regarded as a node, the prediction cells and the nodes have a one-to-one correspondence, and each node is used to represent the corresponding prediction cell.
  • the adjacency matrix is used to represent the positional relationship between predicted cells.
  • the adjacency matrix of the table image can be obtained according to the first position. It can be understood that, according to the first positions of any two predicted cells, the positional relationship between any two predicted cells can be obtained, and then the value of the corresponding element in the adjacency matrix can be obtained.
  • the location relationship includes but is not limited to Euclidean distance, Manhattan distance, etc., which are not limited here.
  • elements in the adjacency matrix may be used to represent undirected edges between nodes corresponding to any two prediction cells.
  • the fusion node feature of any predicted cell can be obtained according to the first position of that predicted cell and the adjacency matrix. This method therefore obtains the fusion node features based on the first positions of the predicted cells and the adjacency matrix, so that the fusion node features match the first positions of the predicted cells and the positional relationships between them, and the obtained fusion node features have better representation.
  • n fusion node features can be obtained according to the n first positions and the adjacency matrix.
  • the second position of any predicted cell can be obtained according to its fusion node feature, that is, the second position of the predicted cell is predicted from its fusion node feature.
  • the second position is used to represent the row and/or column to which the predicted cell belongs, that is, the row and/or column to which the predicted cell belongs in the table can be determined according to the second position, so the predicted cell can be located according to the second position.
  • the second position includes at least one of the number of the starting row, the number of the ending row, the number of the starting column, and the number of the ending column of the predicted cell. It can be understood that the rows and columns in the table can be numbered respectively in advance.
  • the row to which the predicted cell belongs may be determined according to the number of the start row and the number of the end row of the predicted cell. For example, the candidate numbers between the number of the start row and the number of the end row can be obtained, and the number of the start row, the candidate number, and the number of the end row can be determined as the number of the corresponding row, so that according to the number of the determined row Determine the row to which the forecasted cell belongs.
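The row enumeration described above (start number, the candidate numbers in between, and the end number) can be sketched as follows; the function name is hypothetical and row numbers are assumed to be consecutive integers.

```python
def rows_of_cell(start_row, end_row):
    """Enumerate the rows a predicted cell belongs to from its starting
    and ending row numbers: the start number, every candidate number in
    between, and the end number."""
    return list(range(start_row, end_row + 1))
```

The columns to which a cell belongs can be enumerated the same way from its starting and ending column numbers.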
  • the manner of determining the column to which the prediction cell belongs may refer to the manner of determining the row to which the prediction cell belongs, and details are not repeated here.
  • obtaining the second position of any predicted cell according to its fusion node feature may include inputting the fusion node feature of the predicted cell into a position prediction algorithm, which performs position prediction based on the fusion node feature and generates the second position of the predicted cell.
  • the location prediction algorithm can be set according to actual conditions, and there is no excessive limitation here.
  • the predicted cells can be used as nodes, and an adjacency matrix can be obtained based on the first positions of the predicted cells; the fusion node features of the predicted cells are then obtained according to the first positions and the adjacency matrix, so that the fusion node features match the first positions of the predicted cells and the positional relationships between them and have better representation; the second position of each predicted cell is obtained from its fusion node feature, so that the first and second positions of the cells are obtained at the same time, and the detected cell positions are more comprehensive and more robust.
  • obtaining the adjacency matrix of the table image according to the first position in step S102 may include determining values of corresponding elements in the adjacency matrix based on the first position and the number of the predicted cell.
  • the positional relationship between any two predicted cells can be obtained based on the first positions of any two predicted cells, and the target number of the corresponding element in the adjacency matrix can be determined according to the numbers of any two predicted cells , and then the value of the element of the target number in the adjacency matrix can be determined according to the positional relationship between any two predicted cells.
  • the value of the corresponding element in the adjacency matrix is determined, including steps S201-S205.
  • the predicted cells can be numbered consecutively from 1 to n, and the assignment of the numbers 1 to n to cells may be arbitrary. For example, if there are 10 predicted cells, they may be numbered consecutively from 1 to 10.
  • the first position includes the abscissa and ordinate of the center point of the predicted cell, and the abscissas and ordinates of the center points of the predicted cells numbered i and j can be extracted from the first positions.
  • the first position has a corresponding relationship with the number of the predicted cell, and the above corresponding relationship can be queried according to the numbers i and j to obtain the abscissa and ordinate of the center point of the predicted cell with the numbers i and j.
  • a mapping relationship or a mapping table between the first position and the number of the predicted cell can be established in advance, wherein the first position includes the abscissa and ordinate of the center point of the predicted cell; the above mapping relationship or mapping table can then be queried according to the number of the predicted cell to obtain the abscissa and ordinate of the center point of the predicted cell. It should be noted that the above mapping relationship or mapping table can be set according to actual conditions, and is not particularly limited here.
  • obtaining the width and height of the table image may include performing size recognition on the table image according to an image size recognition algorithm to obtain the width and height of the table image.
  • the image size recognition algorithm can be set according to the actual situation, which is not limited here.
  • the adjustment parameters may be set according to actual conditions, and are not limited here.
  • the adjustment parameter is positively correlated with the number of rows and/or columns of the table.
  • the following formula is used to calculate the value of the row dimension of the element in row i and column j in the adjacency matrix:
  • the following formula is used to calculate the value of the column dimension of the element in the i-th row and the j-th column in the adjacency matrix:
  • this method comprehensively considers the influence of the abscissas of the center points of the predicted cells numbered i and j, the width of the table image, and the adjustment parameter on the value of the row dimension of the element in row i and column j of the adjacency matrix, and likewise comprehensively considers the influence of the ordinates of the center points of the predicted cells numbered i and j, the height of the table image, and the adjustment parameter on the value of the column dimension of that element.
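The exact formula image is not reproduced in this text; as a hedged sketch, the following assumes the element values are simply the coordinate-difference ratio scaled by the adjustment parameter `lam` (a plausible reading of the verbal description above, not the patent's verbatim formula):

```python
import numpy as np

def build_adjacency(centers, width, height, lam=1.0):
    """Build a two-channel adjacency matrix from predicted-cell centers.

    centers: (n, 2) sequence of (x, y) center coordinates.
    The linear form lam * |delta| / size below is an assumption made for
    illustration; only the inputs (coordinate differences, image width and
    height, adjustment parameter) follow the description.
    """
    centers = np.asarray(centers, dtype=float)
    dx = np.abs(centers[:, 0][:, None] - centers[:, 0][None, :])
    dy = np.abs(centers[:, 1][:, None] - centers[:, 1][None, :])
    a_row = lam * dx / width   # row dimension: abscissa difference over image width
    a_col = lam * dy / height  # column dimension: ordinate difference over image height
    return np.stack([a_row, a_col], axis=-1)  # shape (n, n, 2)
```

For two cells centered at (10, 20) and (30, 20) in a 100 x 50 image with `lam=2.0`, the row-dimension entry between them is 2 * 20 / 100 = 0.4 and the column-dimension entry is 0.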
  • in step S103, the fusion node features of any predicted cell are obtained according to the first position of the predicted cell and the adjacency matrix, which may include steps S301-S302.
  • the node feature of any predicted cell can be obtained according to the first position of any predicted cell, so that the node feature can match the first position of the predicted cell.
  • obtaining the node features of any predicted cell may include inputting the first position of the predicted cell into a feature extraction algorithm, where the feature extraction algorithm extracts the node features of the predicted cell from the first position.
  • the feature extraction algorithm can be set according to the actual situation, and is not particularly limited here.
  • the node features and the adjacency matrix can be input into a graph convolutional network (Graph Convolutional Network, GCN), and the graph convolutional network fuses the node features with the adjacency matrix to generate the fusion node features of any predicted cell; that is, the graph convolutional network reconstructs the node features using the adjacency matrix to generate the fusion node features.
  • the graph convolutional network can be set according to the actual situation, and is not particularly limited here.
  • the fusion node features are calculated using the following formula: X' = ReLU(AX)
  • where X' is the fusion node feature, X is the node feature, A is the adjacency matrix, and ReLU(·) is the activation function.
  • this method can obtain the node features of any predicted cell according to its first position, input the node features and the adjacency matrix into the graph convolutional network GCN, and have the graph convolutional network fuse the node features with the adjacency matrix to generate the fusion node features of the predicted cell.
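A minimal sketch of this fusion step, assuming a single graph-convolution layer of the form X' = ReLU(AXW); the weight matrix `W` is a hypothetical learnable parameter not named in the text (when omitted, the layer reduces to X' = ReLU(AX) as in the formula above):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def fuse_node_features(X, A, W=None):
    """Fuse node features X (n x d) with the adjacency matrix A (n x n),
    GCN-style: X' = ReLU(A @ X @ W). W defaults to the identity, which
    reduces the layer to X' = ReLU(A @ X)."""
    X = np.asarray(X, dtype=float)
    A = np.asarray(A, dtype=float)
    if W is None:
        W = np.eye(X.shape[1])
    return relu(A @ X @ W)
```

For example, with X = [[1, -1], [0, 2]] and A = [[1, 0], [1, 1]], the product AX is [[1, -1], [1, 1]] and the ReLU clips the negative entry to zero.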
  • in step S301, the node features of any predicted cell are obtained according to the first position of the predicted cell, which may include steps S401-S403.
  • the first position may be a one-dimensional or multi-dimensional vector.
  • when the first position includes the two-dimensional coordinates of the center point of the predicted cell and the width and height of the predicted cell, the first position is a 4-dimensional vector, which can be represented as b_i = (x_i, y_i, w_i, h_i), where b_i is the first position of the predicted cell numbered i, x_i is the abscissa of the center point of the predicted cell numbered i, y_i is the ordinate of the center point of the predicted cell numbered i, w_i is the width of the predicted cell numbered i, and h_i is the height of the predicted cell numbered i.
  • a linear mapping may be performed on the first position of any prediction cell to obtain the spatial characteristics of any prediction cell. It can be appreciated that the spatial characteristics of any predicted cell match the first location.
  • performing linear mapping on the first position of any predicted cell to obtain its spatial features may include inputting the first position of the predicted cell into a linear mapping algorithm, where the linear mapping algorithm performs linear mapping on the first position to obtain the spatial features of the predicted cell.
  • the linear mapping algorithm can be set according to the actual situation, and there is no excessive limitation here.
  • the visual semantic feature of any predicted cell can be extracted from the table image, so that the visual semantic feature can match the first position of the predicted cell.
  • extracting the visual semantic features of any predicted cell from the table image may include determining, based on the first position of the predicted cell, the area occupied by the predicted cell on the table image, and extracting the visual semantic features from the corresponding area of the table image as the visual semantic features of the predicted cell.
  • extracting the visual semantic features of any predicted cell from the table image may alternatively include determining, based on the first position of the predicted cell, the target pixels contained in the predicted cell from among the pixels of the table image, and extracting the visual semantic features of the target pixels from the table image as the visual semantic features of the predicted cell.
  • the table image includes a plurality of pixel points, and based on the first position of any prediction cell, the target pixel point included in any prediction cell can be determined from the pixels included in the table image. It should be noted that the target pixel point refers to a pixel point located in the area occupied by the prediction cell.
  • extracting the visual semantic features of the target pixels from the table image as the visual semantic features of any predicted cell may include extracting the visual semantic features of each pixel from the table image, and extracting the visual semantic features of the target pixels from these visual semantic features according to a preset extraction algorithm.
  • the extraction algorithm can be set according to the actual situation, and is not particularly limited here; for example, it may be the RoIAlign algorithm.
  • the spatial features and visual semantic features of any prediction cell can be combined horizontally to obtain the node features of any prediction cell.
  • for example, if the spatial features and visual semantic features of any predicted cell are X_s and X_v, which are 256-dimensional and 1024-dimensional vectors respectively, then X_s and X_v can be concatenated horizontally to obtain the node features of the predicted cell as a 1280-dimensional vector.
  • this method can obtain the spatial features and the visual semantic features respectively based on the first position of any predicted cell, and concatenate the spatial features and the visual semantic features to obtain the node features of the predicted cell.
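The linear mapping and concatenation above can be sketched as follows; the mapping matrix `W_map` stands in for a hypothetical learned parameter, and the 256/1024 dimensions follow the text's example:

```python
import numpy as np

def spatial_feature(first_position, W_map):
    """Linearly map the 4-d first position (x, y, w, h) to a spatial
    feature; W_map is a hypothetical learned 4 x d matrix."""
    return np.asarray(first_position, dtype=float) @ W_map

def node_feature(spatial, visual):
    """Concatenate the spatial feature (e.g. 256-d) with the visual
    semantic feature (e.g. 1024-d) into one node feature (1280-d)."""
    return np.concatenate([np.ravel(spatial), np.ravel(visual)])
```

With a 256-dimensional spatial feature and a 1024-dimensional visual semantic feature, the concatenated node feature is 1280-dimensional, as in the example above.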
  • in step S104, the second position of any predicted cell is obtained according to its fusion node features, which may include the following two possible implementations:
  • obtaining the second position of any predicted cell according to its fusion node features in step S104 may include steps S501-S502.
  • taking the second position being the starting row of the predicted cell as an example, if the number of rows in the table is T, the candidate second positions include rows 1, 2, ..., T; based on the fusion node features of any predicted cell, the prediction probabilities of the predicted cell under rows 1, 2, ..., T can then be obtained.
  • the prediction probability of any prediction unit under each candidate second position may be different, and the greater the prediction probability, the greater the possibility that the candidate second position is the second position.
  • the maximum prediction probability is obtained from the prediction probabilities of any predicted cell under the candidate second positions, and the candidate second position corresponding to the maximum prediction probability is determined as the second position of the predicted cell.
  • for example, if the candidate second positions include rows 1, 2, ..., T, the prediction probabilities of any predicted cell under rows 1, 2, ..., T are P_1, P_2, ..., P_T respectively, and the maximum value among P_1, P_2, ..., P_T is P_2, then row 2 can be used as the starting row of the predicted cell.
  • this method can obtain, based on the fusion node features of any predicted cell, the prediction probability of the predicted cell at each candidate second position, obtain the maximum prediction probability from these prediction probabilities, and determine the candidate second position corresponding to the maximum prediction probability as the second position of the predicted cell.
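As a small illustration with hypothetical probabilities, the argmax selection in steps S501-S502 amounts to:

```python
def second_position_from_probs(probs):
    """Return the 1-based candidate row whose prediction probability is
    largest; probs[k] is the probability of candidate row k + 1."""
    best = max(range(len(probs)), key=lambda k: probs[k])
    return best + 1  # rows are numbered from 1
```

For probabilities (0.1, 0.7, 0.2) over rows 1-3, the maximum is at row 2, so row 2 is taken as the second position.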
  • alternatively, obtaining the second position of any predicted cell according to its fusion node features in step S104 may include steps S601-S604.
  • the target vector includes T dimensions.
  • for example, if the candidate second positions include rows 1, 2, ..., T and the target vector includes T dimensions, then based on the fusion node features of any predicted cell, the prediction probabilities that the 1st, 2nd, ..., T-th vector dimensions of the target vector take the value 0 or 1 can be obtained.
  • the prediction probabilities of any vector dimension taking the value 0 or 1 may differ; a larger prediction probability for the value 0 indicates that the vector dimension is more likely to take the value 0, and conversely, a larger prediction probability for the value 1 indicates that it is more likely to take the value 1. The maximum prediction probability can therefore be obtained from the prediction probabilities of the vector dimension taking the value 0 or 1, and the value corresponding to the maximum prediction probability is determined as the target value of the vector dimension.
  • for example, if the candidate second positions include rows 1, 2, ..., T, the target vector includes T dimensions, the prediction probabilities of the m-th vector dimension of the target vector taking the value 0 or 1 are P_m^0 and P_m^1 respectively, and the maximum of the two is P_m^1, then the target value of the m-th vector dimension of the target vector is 1, where 1 ≤ m ≤ T.
  • the sum of the target values over the vector dimensions of the target vector has a corresponding relationship with the second position; the corresponding relationship can therefore be queried based on the sum of the target values to determine the corresponding second position.
  • the above corresponding relationship may be set according to actual conditions, and is not limited here.
  • the number of each candidate second position can be converted into a candidate vector by using the following formula: v_t = 1 if t ≤ r_i, and v_t = 0 otherwise,
  • where the candidate vector includes n dimensions, n is the number of candidate second positions, v_t is the value of the t-th vector dimension of the candidate vector, r_i is the number of the candidate second position, 0 ≤ r_i ≤ n-1, and 1 ≤ t ≤ n.
  • for example, if the candidate second positions include rows 1, 2, and 3, that is, the numbers of the candidate second positions are 0, 1, and 2, corresponding to rows 1, 2, and 3 respectively, then the numbers 0, 1, and 2 can be converted into the candidate vectors (0,0,0), (1,0,0), and (1,1,0).
  • the number of the second position may be determined as the sum of the target values over all vector dimensions of the target vector plus 1. For example, if the sum of the target values over all vector dimensions of the target vector is 2, it can be determined that the number of the starting row of the predicted cell is 3, that is, the starting row of the predicted cell is the third row.
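The example above (0 → (0,0,0), 1 → (1,0,0), 2 → (1,1,0)) is a thermometer code, and decoding is the sum of the target values plus one; a sketch:

```python
def encode_candidate(r, n):
    """Thermometer-encode candidate number r (0-based) as an n-dim
    vector: dimension t (1-based) is 1 iff t <= r."""
    return tuple(1 if t <= r else 0 for t in range(1, n + 1))

def decode_target_vector(values):
    """Recover the 1-based second position as the sum of the target
    values plus 1 (e.g. sum 2 -> row 3)."""
    return sum(values) + 1
```

Encoding candidate numbers 0, 1, 2 with n = 3 reproduces the three candidate vectors in the text, and decoding (1, 1, 0) gives row 3.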
  • this method can establish a target vector for any predicted cell, determine the value of each vector dimension of the target vector based on the fusion node features of the predicted cell, and obtain the second position of the predicted cell according to the sum of the target values over the vector dimensions; the accuracy of the second position obtained in this way is better.
  • the method for acquiring a second location in the embodiments of the present disclosure is applicable to any type of second location.
  • the method for obtaining the second position in the embodiment of the present disclosure is suitable for determining the number of the start row, the number of the end row, the number of the start column, and the number of the end column of the prediction cell.
  • obtaining the first positions of the predicted cells in the table image in step S101 may include: extracting the visual semantic features of each pixel from the table image; obtaining the recognition probability of each pixel under each category based on the visual semantic features; obtaining the maximum recognition probability from the recognition probabilities of any pixel under the categories, and determining the category corresponding to the maximum recognition probability as the target category of the pixel; identifying the connected domains formed by the pixels whose target category is cell; determining the minimum bounding rectangle of each connected domain as the detection frame of a predicted cell; and obtaining the first position of the predicted cell based on the detection frame.
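A hedged sketch of the connected-domain step: finding 4-connected components of pixels classified as "cell" and taking each component's minimum bounding rectangle as a detection frame (the per-pixel classifier itself is assumed to exist upstream):

```python
from collections import deque

def cell_boxes(mask):
    """Find 4-connected components of 'cell' pixels (value 1) in a 2-D
    mask and return the minimum bounding rectangle of each component as
    (x_min, y_min, x_max, y_max) -- the detection frame of a predicted
    cell. Pure-Python BFS for illustration."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] == 1 and not seen[sy][sx]:
                q = deque([(sy, sx)])
                seen[sy][sx] = True
                ys, xs = [], []
                while q:
                    y, x = q.popleft()
                    ys.append(y)
                    xs.append(x)
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
                boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes
```

For a mask with a 2x2 block of cell pixels in the top-left corner and a single cell pixel at the bottom-right, this yields one 2x2 detection frame and one 1x1 detection frame.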
  • the categories include but are not limited to background, cell, and border line.
  • obtaining the recognition probability of each pixel under each category based on the visual semantic features may include inputting the visual semantic features of any pixel into a classification algorithm, where the classification algorithm performs category prediction according to the visual semantic features to generate the recognition probabilities of the pixel under the categories.
  • the classification algorithm can be set according to the actual situation, and is not particularly limited here.
  • the present disclosure also provides a detection model of the cell position.
  • the input of the detection model is a table image, and the output is the predicted cell in the table image.
  • the detection model includes a visual semantic feature extraction layer, a first classification layer, a node feature extraction layer, a graph reconstruction network layer, and a second classification layer.
  • the visual semantic feature extraction layer is used to extract the visual semantic feature of each pixel from the table image.
  • the first classification layer is used to obtain the recognition probability of each pixel in each category based on the visual semantic features, and then determine the target category corresponding to any pixel according to the recognition probability, and identify the pixel whose target category is the cell A connected domain composed of points, the smallest bounding rectangle of the connected domain is determined as the detection frame of the predicted cell, and the first position of the predicted cell is obtained based on the detection frame.
  • the node feature extraction layer is used to obtain the node feature of any predicted cell according to the first position of any predicted cell.
  • the graph reconstruction network layer is used to fuse the node features with the adjacency matrix to generate the fusion node features of any predicted cell.
  • the second classification layer is used to obtain the second position of any predicted cell according to its fusion node features.
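Putting the layers together, a forward-pass skeleton might look like the following; every callable name and signature here is illustrative only, standing in for the five layers described above plus an adjacency builder for step S102:

```python
def detect_cells(table_image, layers):
    """Sketch of the detection model's forward pass. `layers` maps
    illustrative names to callables for the layers described above."""
    feats = layers["visual_semantic"](table_image)           # per-pixel visual semantic features
    first_positions = layers["first_classification"](feats)  # detection frames -> first positions
    node_feats = layers["node_feature"](first_positions)     # spatial + visual node features
    adjacency = layers["adjacency"](first_positions)         # adjacency matrix from first positions
    fused = layers["graph_reconstruction"](node_feats, adjacency)
    second_positions = layers["second_classification"](fused)
    return first_positions, second_positions
```

The model's input is the table image and its output is both position types for each predicted cell, matching the description above.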
  • the present disclosure also provides a device for detecting a position of a cell.
  • the implementation of the cell position detection method provided in the embodiment of FIG. 6 is also applicable to the cell position detection device provided in the embodiments of the present disclosure, and will not be described in detail here.
  • FIG. 8 is a schematic structural diagram of a device for detecting cell positions according to an embodiment of the present disclosure.
  • the cell position detection device 100 of the embodiment of the present disclosure may include: a first acquisition module 110 , a second acquisition module 120 , a third acquisition module 130 and a fourth acquisition module 140 .
  • the first acquisition module 110 is configured to acquire the first position of a predicted cell in the table image, wherein the first position is used to represent the position of the area occupied by the predicted cell in the table image;
  • the second obtaining module 120 is configured to obtain the adjacency matrix of the table image according to the first position, wherein each of the prediction cells in the table image is a node, and the adjacency matrix is used for Representing the positional relationship between the predicted cells;
  • the third acquisition module 130 is configured to obtain the fusion node features of any predicted cell according to the first position of the predicted cell and the adjacency matrix;
  • the fourth obtaining module 140 is configured to obtain the second position of any predicted cell according to its fusion node features, wherein the second position is used to characterize the row and/or column to which the predicted cell belongs.
  • the first position includes at least one of the two-dimensional coordinates of the center point of the predicted cell, the width of the predicted cell, and the height of the predicted cell.
  • the second obtaining module 120 is further configured to: determine the value of a corresponding element in the adjacency matrix based on the first position and the number of the predicted cell.
  • the second obtaining module 120 is further configured to: obtain the number n of predicted cells, and number each predicted cell consecutively from 1 to n, where n is an integer greater than 1; extract, from the first positions, the abscissas and ordinates of the center points of the predicted cells numbered i and j, where 1 ≤ i ≤ n and 1 ≤ j ≤ n; obtain the width and height of the table image and the adjustment parameter; obtain the first ratio of the difference between the abscissas of the center points of the predicted cells numbered i and j to the width, and determine, based on the product of the first ratio and the adjustment parameter, the value of the row dimension of the element in row i and column j of the adjacency matrix; and obtain the second ratio of the difference between the ordinates of the center points of the predicted cells numbered i and j to the height, and determine, based on the product of the second ratio and the adjustment parameter, the value of the column dimension of the element in row i and column j of the adjacency matrix.
  • the third acquisition module 130 includes: an acquisition unit, configured to obtain the node features of any predicted cell according to the first position of the predicted cell;
  • and a fusion unit, configured to input the node features and the adjacency matrix into the graph convolutional network GCN, where the graph convolutional network fuses the node features with the adjacency matrix to generate the fusion node features of the predicted cell.
  • the acquisition unit includes: a mapping subunit, configured to linearly map the first position of any prediction cell to obtain the spatial characteristics of any prediction cell;
  • the extraction subunit is used to extract the visual semantic features of any prediction cell from the table image based on the first position of any prediction cell;
  • and a splicing subunit, configured to concatenate the spatial features and the visual semantic features of any predicted cell to obtain the node features of the predicted cell.
  • the extracting subunit is further configured to: determine, based on the first position of any predicted cell, the target pixels contained in the predicted cell from among the pixels of the table image; and extract the visual semantic features of the target pixels from the table image as the visual semantic features of the predicted cell.
  • the fourth acquisition module 140 is further configured to: obtain, based on the fusion node features of any predicted cell, the prediction probability of the predicted cell under each candidate second position; obtain the maximum prediction probability from these prediction probabilities; and determine the candidate second position corresponding to the maximum prediction probability as the second position of the predicted cell.
  • the fourth obtaining module 140 is further configured to: establish a target vector for any predicted cell, the target vector including n dimensions, where n is the number of candidate second positions of the predicted cell; obtain, based on the fusion node features of the predicted cell, the prediction probability that each vector dimension of the target vector takes the value 0 or 1; obtain the maximum prediction probability from the prediction probabilities of the vector dimension taking the value 0 or 1, and determine the value corresponding to the maximum prediction probability as the target value of the vector dimension; and obtain the second position of the predicted cell based on the sum of the target values over the vector dimensions.
  • the first obtaining module 110 is further configured to: extract the detection frame of each predicted cell from the table image, and obtain the prediction based on the detection frame The first position of the cell.
  • the second position includes at least one of a starting row number, an ending row number, a starting column number, and an ending column number of the prediction cell.
  • the cell position detection device of the embodiments of the present disclosure can use the predicted cells as nodes, obtain the adjacency matrix based on the first positions of the predicted cells, and then obtain the fusion node features of the predicted cells according to the first positions and the adjacency matrix, so that the fusion node features match both the first positions of the predicted cells and the positional relationships between the predicted cells and have a better representation effect; the second positions of the predicted cells are then obtained from the fusion node features, so that the first position and the second position of each cell are obtained at the same time, making the obtained cell positions more comprehensive and more robust.
  • an embodiment of the present disclosure also proposes an electronic device 200, including a memory 210, a processor 220, and a computer program stored in the memory 210 and executable on the processor 220; when the processor 220 executes the program, the cell position detection method proposed in the foregoing embodiments of the present disclosure is implemented.
  • when the computer program stored in the memory is executed by the processor, the predicted cells can be used as nodes, the adjacency matrix can be obtained based on the first positions of the predicted cells, and the fusion node features of the predicted cells can then be obtained according to the first positions and the adjacency matrix, so that the fusion node features match both the first positions of the predicted cells and the positional relationships between the predicted cells and have a better representation effect; the second positions of the predicted cells are obtained according to the fusion node features, so the first position and the second position of each cell can be obtained at the same time, and the obtained cell positions are more comprehensive and more robust.
  • the embodiments of the present disclosure also propose a computer-readable storage medium on which a computer program is stored.
  • when the program is executed by a processor, the cell position detection method proposed in the foregoing embodiments of the present disclosure is implemented.
  • the computer readable storage medium is a non-transitory computer readable storage medium.
  • when the computer program stored in the computer-readable storage medium of the embodiments of the present disclosure is executed by the processor, the predicted cells can be used as nodes, the adjacency matrix can be obtained based on the first positions of the predicted cells, and the fusion node features of the predicted cells can then be obtained according to the first positions and the adjacency matrix, so that the fusion node features match both the first positions of the predicted cells and the positional relationships between the predicted cells and have a better representation effect; the second positions of the predicted cells are obtained according to the fusion node features, so the first position and the second position of each cell can be obtained at the same time, and the obtained cell positions are more comprehensive and more robust.
  • the embodiments of the present disclosure also propose a computer program product including computer program code; when the computer program code is run on a computer, the cell position detection method proposed in the foregoing embodiments of the present disclosure is implemented.
  • the computer program product includes computer program code; when the computer program code is run on a computer, the predicted cells can be used as nodes, the adjacency matrix can be obtained based on the first positions of the predicted cells, and the fusion node features of the predicted cells can then be obtained according to the first positions and the adjacency matrix, so that the fusion node features match both the first positions of the predicted cells and the positional relationships between the predicted cells and have a better representation effect; the second positions of the predicted cells can then be obtained according to the fusion node features, so the first position and the second position of each cell are obtained at the same time, and the obtained cell positions are more comprehensive and more robust.
  • the embodiments of the present disclosure also propose a computer program, wherein the computer program includes computer program code; when the computer program code is run on a computer, the computer executes the cell position detection method proposed in the foregoing embodiments of the present disclosure.
  • when run on a computer, the computer program code enables the computer to use the predicted cells as nodes, obtain the adjacency matrix based on the first positions of the predicted cells, and then obtain the fusion node features of the predicted cells according to the first positions and the adjacency matrix, so that the fusion node features match both the first positions of the predicted cells and the positional relationships between the predicted cells and have a better representation effect; the second positions of the predicted cells can then be obtained according to the fusion node features, so the first position and the second position of each cell are obtained at the same time, and the obtained cell positions are more comprehensive and more robust.
  • first and second are used for descriptive purposes only, and cannot be interpreted as indicating or implying relative importance or implicitly specifying the quantity of indicated technical features.
  • the features defined as “first” and “second” may explicitly or implicitly include at least one of these features.
  • “plurality” means at least two, such as two, three, etc., unless otherwise specifically defined.
  • a "computer-readable medium” may be any device that can contain, store, communicate, propagate or transmit a program for use in or in conjunction with an instruction execution system, device or device.
  • examples of computer-readable media include the following: an electrical connection with one or more wires (electronic device), a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a fiber optic device, and a portable compact disc read-only memory (CD-ROM).
  • the computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
  • various parts of the present disclosure may be implemented in hardware, software, firmware or a combination thereof.
  • various steps or methods may be implemented by software or firmware stored in memory and executed by a suitable instruction execution system.
  • for example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques known in the art may be used: discrete logic circuits with logic gates for implementing logic functions on data signals, application-specific integrated circuits (ASICs) with suitable combinational logic gates, programmable gate arrays (PGA), field programmable gate arrays (FPGA), and so on.
  • each functional unit in the embodiments of the present disclosure may be integrated into one processing module; alternatively, each unit may exist physically on its own, or two or more units may be integrated into one module.
  • the above integrated modules may be implemented in the form of hardware or in the form of software functional modules. If the integrated modules are implemented in the form of software functional modules and sold or used as independent products, they may also be stored in a computer-readable storage medium.
  • the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like.

Abstract

Provided are a cell position detection method and apparatus, and an electronic device. The cell position detection method comprises: obtaining a first position of a prediction cell in a table image, wherein the first position is used for representing the position of a region occupied by the prediction cell in the table image; obtaining an adjacency matrix of the table image according to the first position, wherein each prediction cell in the table image is used as a node, and the adjacency matrix is used for indicating a positional relationship between prediction cells; obtaining a fused node feature of any prediction cell according to the first position of any prediction cell and the adjacency matrix; and obtaining a second position of any prediction cell according to the fused node feature of any prediction cell, wherein the second position is used for representing the row and/or the column to which the prediction cell belongs.

Description

Cell position detection method, apparatus and electronic device
Cross-Reference to Related Applications
This application is based on, and claims priority to, Chinese patent application No. 202110772902.7, filed on July 8, 2021, the entire content of which is hereby incorporated into this application by reference.
Technical Field
The present disclosure relates to the field of computer application technology, and in particular to a cell position detection method and apparatus, an electronic device, a computer-readable storage medium, a computer program product, and a computer program.
Background
Tabular data is concise, intuitive, and easy to process, and is therefore widely used in office work. With the development of artificial intelligence technology, the requirements for automatic recognition of tabular data are growing; for example, cell positions are detected automatically from a table image so that operations such as information extraction can be performed based on those positions. However, the cell position information detected by related-art cell position detection methods is incomplete and lacks robustness.
Summary
To this end, an embodiment of the first aspect of the present disclosure provides a cell position detection method. Each predicted cell is taken as a node, and an adjacency matrix is obtained based on the first positions of the predicted cells; a fused node feature of each predicted cell is then obtained from the first position and the adjacency matrix, so that the fused node feature matches both the first position of the predicted cell and the positional relationships between predicted cells, yielding a better feature representation. A second position of the predicted cell is obtained from the fused node feature, so the first and second positions of a cell are obtained together, making the detected cell position more comprehensive and more robust.
An embodiment of the second aspect of the present disclosure provides a cell position detection apparatus.
An embodiment of the third aspect of the present disclosure provides an electronic device.
An embodiment of the fourth aspect of the present disclosure provides a computer-readable storage medium.
An embodiment of the fifth aspect of the present disclosure provides a computer program product.
An embodiment of the sixth aspect of the present disclosure provides a computer program.
An embodiment of the first aspect of the present disclosure provides a cell position detection method, including: obtaining a first position of a predicted cell in a table image, where the first position represents the position, in the table image, of the region occupied by the predicted cell; obtaining an adjacency matrix of the table image according to the first position, where each predicted cell in the table image is a node and the adjacency matrix represents the positional relationships between the predicted cells; obtaining a fused node feature of any predicted cell according to the first position of that predicted cell and the adjacency matrix; and obtaining a second position of that predicted cell according to its fused node feature, where the second position represents the row and/or column to which the predicted cell belongs.
With the cell position detection method of the embodiments of the present disclosure, each predicted cell can be taken as a node and an adjacency matrix obtained based on the first positions of the predicted cells; a fused node feature of each predicted cell is then obtained from the first position and the adjacency matrix, so that the fused node feature matches both the first position of the predicted cell and the positional relationships between predicted cells, yielding a better feature representation. The second position of the predicted cell is obtained from the fused node feature, so the first and second positions of a cell are obtained together, making the detected cell position more comprehensive and more robust.
In an embodiment of the present disclosure, the first position includes at least one of: the two-dimensional coordinates of the center point of the predicted cell, the width of the predicted cell, and the height of the predicted cell.
In an embodiment of the present disclosure, obtaining the adjacency matrix of the table image according to the first position includes: determining the values of the corresponding elements in the adjacency matrix based on the first position and the numbers of the predicted cells.
In an embodiment of the present disclosure, determining the values of the corresponding elements in the adjacency matrix based on the first position and the numbers of the predicted cells includes: obtaining the number n of predicted cells, and numbering the predicted cells consecutively from 1 to n, where n is an integer greater than 1; extracting, from the first positions, the abscissas and ordinates of the center points of the predicted cells numbered i and j, where 1≤i≤n and 1≤j≤n; obtaining the width and height of the table image, and an adjustment parameter; obtaining a first ratio of the difference between the abscissas of the center points of the predicted cells numbered i and j to the width, and determining the value of the row dimension of the element in row i and column j of the adjacency matrix based on the product of the first ratio and the adjustment parameter; and obtaining a second ratio of the difference between the ordinates of the center points of the predicted cells numbered i and j to the height, and determining the value of the column dimension of the element in row i and column j of the adjacency matrix based on the product of the second ratio and the adjustment parameter.
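The adjacency-matrix computation described in this embodiment can be sketched as follows. This is an illustrative sketch only: the function name `build_adjacency`, the NumPy layout (an n×n×2 array holding the row and column dimensions of each element), and treating the adjustment parameter as a single scalar `scale` are assumptions, not part of the disclosure.

```python
import numpy as np

def build_adjacency(centers, img_w, img_h, scale=1.0):
    """Build an n x n x 2 adjacency tensor from predicted-cell center points.

    centers: (n, 2) array of (x, y) center coordinates of the n predicted cells
    img_w, img_h: width and height of the table image
    scale: stand-in for the "adjustment parameter" of the embodiment
    """
    centers = np.asarray(centers, dtype=float)
    dx = centers[:, 0][:, None] - centers[:, 0][None, :]  # x_i - x_j for all i, j
    dy = centers[:, 1][:, None] - centers[:, 1][None, :]  # y_i - y_j for all i, j
    # Element (i, j): row dimension = scale * (x_i - x_j) / width,
    #                 column dimension = scale * (y_i - y_j) / height.
    return np.stack([scale * dx / img_w, scale * dy / img_h], axis=-1)
```

For example, for two cells centered at (10, 20) and (30, 20) in a 100×50 image, the element at row 1, column 2 has row dimension (10 − 30)/100 = −0.2 and column dimension 0.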
In an embodiment of the present disclosure, obtaining the fused node feature of any predicted cell according to the first position of that predicted cell and the adjacency matrix includes: obtaining a node feature of the predicted cell according to its first position; and inputting the node feature and the adjacency matrix into a graph convolutional network (GCN), which fuses the node feature with the adjacency matrix to generate the fused node feature of the predicted cell.
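One way to picture the fusion a GCN performs is a single graph-convolution step: each node's feature is mixed with its neighbors' features through the adjacency matrix and then linearly projected. The sketch below is a minimal single-layer illustration under simplifying assumptions (a plain (n, n) adjacency assumed already normalized, and a fixed weight matrix); a real GCN would normalize the adjacency, learn the weights, and typically stack several layers.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One graph-convolution step: neighbor aggregation, projection, ReLU.

    adj:    (n, n) adjacency matrix (assumed already normalized)
    feats:  (n, d_in) node features of the n predicted cells
    weight: (d_in, d_out) projection matrix (learned in a real network)
    """
    agg = adj @ feats                     # fuse each node's feature with its neighbors'
    return np.maximum(agg @ weight, 0.0)  # linear projection followed by ReLU
```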
In an embodiment of the present disclosure, obtaining the node feature of any predicted cell according to its first position includes: linearly mapping the first position of the predicted cell to obtain a spatial feature of the predicted cell; extracting, based on the first position, a visual semantic feature of the predicted cell from the table image; and concatenating the spatial feature and the visual semantic feature of the predicted cell to obtain its node feature.
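The three steps of this embodiment (linear mapping, visual feature extraction, concatenation) can be sketched as follows; the learned linear layer is stood in for by a plain matrix `proj`, and the visual semantic feature is taken as a given vector (both are assumptions for illustration).

```python
import numpy as np

def node_feature(first_pos, visual_feat, proj):
    """Concatenate the spatial feature with the visual semantic feature.

    first_pos:   (4,) first position (cx, cy, w, h) of one predicted cell
    visual_feat: (dv,) visual semantic feature extracted from the table image
    proj:        (4, ds) matrix standing in for the learned linear mapping
    """
    spatial = np.asarray(first_pos, dtype=float) @ proj  # linear mapping -> spatial feature
    return np.concatenate([spatial, np.asarray(visual_feat, dtype=float)])
```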
In an embodiment of the present disclosure, extracting the visual semantic feature of any predicted cell from the table image based on its first position includes: determining, based on the first position, the target pixels contained in the predicted cell from among the pixels of the table image; and extracting the visual semantic features of the target pixels from the table image as the visual semantic feature of the predicted cell.
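A sketch of this extraction under simple assumptions: the per-pixel visual semantic features are given as an (H, W, C) feature map, the cell region is taken as an axis-aligned box around the center point, and the target pixels' features are average-pooled. The pooling choice and the feature-map representation are assumptions for illustration, not stated in the disclosure.

```python
import numpy as np

def cell_visual_feature(feature_map, first_pos):
    """Pool the visual semantic feature of one predicted cell.

    feature_map: (H, W, C) per-pixel visual semantic features of the table image
    first_pos:   (cx, cy, w, h) of the predicted cell, in pixel coordinates
                 (the box is assumed to contain at least one pixel)
    """
    cx, cy, w, h = first_pos
    H, W, _ = feature_map.shape
    # Target pixels: pixels inside the cell's region, clipped to the image.
    x0, x1 = max(int(cx - w / 2), 0), min(int(cx + w / 2), W)
    y0, y1 = max(int(cy - h / 2), 0), min(int(cy + h / 2), H)
    region = feature_map[y0:y1, x0:x1]
    # Average-pool the target pixels' features into one (C,) vector.
    return region.reshape(-1, feature_map.shape[2]).mean(axis=0)
```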
In an embodiment of the present disclosure, obtaining the second position of any predicted cell according to its fused node feature includes: obtaining, based on the fused node feature, a predicted probability of the predicted cell for each candidate second position; and taking the maximum of these predicted probabilities and determining the candidate second position corresponding to the maximum predicted probability as the second position of the predicted cell.
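The selection in this embodiment is a plain argmax over the candidate second positions; a minimal sketch (function and argument names are illustrative, and the probabilities are taken as already computed from the fused node feature):

```python
import numpy as np

def pick_second_position(probs, candidates):
    """Return the candidate second position with the maximum predicted probability.

    probs:      per-candidate predicted probabilities of one predicted cell
    candidates: the candidate second positions, aligned with probs
    """
    return candidates[int(np.argmax(probs))]
```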
In an embodiment of the present disclosure, obtaining the second position of any predicted cell according to its fused node feature includes: establishing, for the predicted cell, a target vector with n dimensions, where n is the number of candidate second positions of the predicted cell; obtaining, based on the fused node feature, the predicted probability that each vector dimension takes the value 0 or 1; for each vector dimension, taking the value corresponding to the maximum of the two predicted probabilities as the target value of that dimension; and obtaining the second position of the predicted cell based on the sum of the target values of the vector dimensions.
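The alternative decoding of this embodiment can be sketched as follows: each of the n dimensions keeps the value (0 or 1) with the larger predicted probability, and the second position is the sum of the kept values. The function name and the tie-breaking toward 1 are assumptions for illustration.

```python
import numpy as np

def second_position_from_vector(prob_zero, prob_one):
    """Decode a second position from per-dimension 0/1 probabilities.

    prob_zero[k] / prob_one[k]: predicted probability that dimension k of the
    target vector takes the value 0 / 1.
    """
    # Each dimension's target value is the value with the larger probability
    # (ties resolve to 1 here, an arbitrary choice).
    target = (np.asarray(prob_one) >= np.asarray(prob_zero)).astype(int)
    # The second position is the sum of the target values.
    return int(target.sum())
```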
In an embodiment of the present disclosure, obtaining the first position of a predicted cell in the table image includes: extracting a detection box of each predicted cell from the table image, and obtaining the first position of the predicted cell based on the detection box.
In an embodiment of the present disclosure, the second position includes at least one of: the number of the starting row, the number of the ending row, the number of the starting column, and the number of the ending column of the predicted cell.
An embodiment of the second aspect of the present disclosure provides a cell position detection apparatus, including: a first obtaining module, configured to obtain a first position of a predicted cell in a table image, where the first position represents the position, in the table image, of the region occupied by the predicted cell; a second obtaining module, configured to obtain an adjacency matrix of the table image according to the first position, where each predicted cell in the table image is a node and the adjacency matrix represents the positional relationships between the predicted cells; a third obtaining module, configured to obtain a fused node feature of any predicted cell according to the first position of that predicted cell and the adjacency matrix; and a fourth obtaining module, configured to obtain a second position of that predicted cell according to its fused node feature, where the second position represents the row and/or column to which the predicted cell belongs.
With the cell position detection apparatus of the embodiments of the present disclosure, each predicted cell can be taken as a node and an adjacency matrix obtained based on the first positions of the predicted cells; a fused node feature of each predicted cell is then obtained from the first position and the adjacency matrix, so that the fused node feature matches both the first position of the predicted cell and the positional relationships between predicted cells, yielding a better feature representation. The second position of the predicted cell is obtained from the fused node feature, so the first and second positions of a cell are obtained together, making the detected cell position more comprehensive and more robust.
In an embodiment of the present disclosure, the first position includes at least one of: the two-dimensional coordinates of the center point of the predicted cell, the width of the predicted cell, and the height of the predicted cell.
In an embodiment of the present disclosure, the second obtaining module is further configured to determine the values of the corresponding elements in the adjacency matrix based on the first position and the numbers of the predicted cells.
In an embodiment of the present disclosure, the second obtaining module is further configured to: obtain the number n of predicted cells, and number the predicted cells consecutively from 1 to n, where n is an integer greater than 1; extract, from the first positions, the abscissas and ordinates of the center points of the predicted cells numbered i and j, where 1≤i≤n and 1≤j≤n; obtain the width and height of the table image, and an adjustment parameter; obtain a first ratio of the difference between the abscissas of the center points of the predicted cells numbered i and j to the width, and determine the value of the row dimension of the element in row i and column j of the adjacency matrix based on the product of the first ratio and the adjustment parameter; and obtain a second ratio of the difference between the ordinates of the center points of the predicted cells numbered i and j to the height, and determine the value of the column dimension of the element in row i and column j of the adjacency matrix based on the product of the second ratio and the adjustment parameter.
In an embodiment of the present disclosure, the third obtaining module includes: an obtaining unit, configured to obtain a node feature of any predicted cell according to its first position; and a fusion unit, configured to input the node feature and the adjacency matrix into a graph convolutional network (GCN), which fuses the node feature with the adjacency matrix to generate the fused node feature of the predicted cell.
In an embodiment of the present disclosure, the obtaining unit includes: a mapping subunit, configured to linearly map the first position of any predicted cell to obtain a spatial feature of the predicted cell; an extraction subunit, configured to extract, based on the first position, a visual semantic feature of the predicted cell from the table image; and a concatenation subunit, configured to concatenate the spatial feature and the visual semantic feature of the predicted cell to obtain its node feature.
In an embodiment of the present disclosure, the extraction subunit is further configured to: determine, based on the first position of any predicted cell, the target pixels contained in the predicted cell from among the pixels of the table image; and extract the visual semantic features of the target pixels from the table image as the visual semantic feature of the predicted cell.
In an embodiment of the present disclosure, the fourth obtaining module is further configured to: obtain, based on the fused node feature of any predicted cell, a predicted probability of the predicted cell for each candidate second position; and take the maximum of these predicted probabilities and determine the candidate second position corresponding to the maximum predicted probability as the second position of the predicted cell.
In an embodiment of the present disclosure, the fourth obtaining module is further configured to: establish, for any predicted cell, a target vector with n dimensions, where n is the number of candidate second positions of the predicted cell; obtain, based on the fused node feature of the predicted cell, the predicted probability that each vector dimension takes the value 0 or 1; for each vector dimension, take the value corresponding to the maximum of the two predicted probabilities as the target value of that dimension; and obtain the second position of the predicted cell based on the sum of the target values of the vector dimensions.
In an embodiment of the present disclosure, the first obtaining module is further configured to extract a detection box of each predicted cell from the table image and obtain the first position of the predicted cell based on the detection box.
In an embodiment of the present disclosure, the second position includes at least one of: the number of the starting row, the number of the ending row, the number of the starting column, and the number of the ending column of the predicted cell.
An embodiment of the third aspect of the present disclosure provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the cell position detection method described in the embodiment of the first aspect.
With the electronic device of the embodiments of the present disclosure, by the processor executing the computer program stored in the memory, each predicted cell can be taken as a node and an adjacency matrix obtained based on the first positions of the predicted cells; a fused node feature of each predicted cell is then obtained from the first position and the adjacency matrix, so that the fused node feature matches both the first position of the predicted cell and the positional relationships between predicted cells, yielding a better feature representation. The second position of the predicted cell is obtained from the fused node feature, so the first and second positions of a cell are obtained together, making the detected cell position more comprehensive and more robust.
An embodiment of the fourth aspect of the present disclosure provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the cell position detection method described in the embodiment of the first aspect.
With the computer-readable storage medium of the embodiments of the present disclosure, by storing a computer program that is executed by a processor, each predicted cell can be taken as a node and an adjacency matrix obtained based on the first positions of the predicted cells; a fused node feature of each predicted cell is then obtained from the first position and the adjacency matrix, so that the fused node feature matches both the first position of the predicted cell and the positional relationships between predicted cells, yielding a better feature representation. The second position of the predicted cell is obtained from the fused node feature, so the first and second positions of a cell are obtained together, making the detected cell position more comprehensive and more robust.
An embodiment of the fifth aspect of the present disclosure provides a computer program product including computer program code which, when run on a computer, implements the cell position detection method described in the embodiment of the first aspect.
With the computer program product of the embodiments of the present disclosure, which includes computer program code, when the computer program code is run on a computer, each predicted cell can be taken as a node and an adjacency matrix obtained based on the first positions of the predicted cells; a fused node feature of each predicted cell is then obtained from the first position and the adjacency matrix, so that the fused node feature matches both the first position of the predicted cell and the positional relationships between predicted cells, yielding a better feature representation. The second position of the predicted cell is obtained from the fused node feature, so the first and second positions of a cell are obtained together, making the detected cell position more comprehensive and more robust.
An embodiment of the sixth aspect of the present disclosure provides a computer program including computer program code which, when run on a computer, causes the computer to execute the cell position detection method described in the embodiment of the first aspect.
With the computer program of the embodiments of the present disclosure, which includes computer program code, when the computer program code is run on a computer, the computer can take each predicted cell as a node and obtain an adjacency matrix based on the first positions of the predicted cells; a fused node feature of each predicted cell is then obtained from the first position and the adjacency matrix, so that the fused node feature matches both the first position of the predicted cell and the positional relationships between predicted cells, yielding a better feature representation. The second position of the predicted cell is obtained from the fused node feature, so the first and second positions of a cell are obtained together, making the detected cell position more comprehensive and more robust.
Additional aspects and advantages of the present disclosure will be set forth in part in the following description, and will in part become apparent from the description or be learned through practice of the present disclosure.
Brief Description of the Drawings
The above and/or additional aspects and advantages of the present disclosure will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic flowchart of a cell position detection method according to an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of determining the values of corresponding elements in an adjacency matrix in a cell position detection method according to an embodiment of the present disclosure;
FIG. 3 is a schematic flowchart of obtaining the fused node feature of any predicted cell in a cell position detection method according to an embodiment of the present disclosure;
FIG. 4 is a schematic flowchart of obtaining the node feature of any predicted cell in a cell position detection method according to an embodiment of the present disclosure;
FIG. 5 is a schematic flowchart of obtaining the second position of any predicted cell in a cell position detection method according to an embodiment of the present disclosure;
FIG. 6 is a schematic flowchart of obtaining the second position of any predicted cell in a cell position detection method according to another embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a cell position detection model according to an embodiment of the present disclosure;
FIG. 8 is a schematic structural diagram of a cell position detection apparatus according to an embodiment of the present disclosure;
FIG. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure are described in detail below, examples of which are illustrated in the accompanying drawings, where the same or similar reference numerals denote the same or similar elements, or elements having the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present disclosure and should not be construed as limiting it.
The cell position detection method and apparatus, electronic device, and storage medium of the embodiments of the present disclosure are described below with reference to the accompanying drawings.
图1为根据本公开一个实施例的单元格位置的检测方法的流程示意图。FIG. 1 is a schematic flowchart of a method for detecting a cell position according to an embodiment of the present disclosure.
如图1所示,本公开实施例的单元格位置的检测方法,包括步骤S101-S104。As shown in FIG. 1 , the method for detecting a cell position in an embodiment of the present disclosure includes steps S101-S104.
S101,获取表格图像中预测单元格的第一位置,其中,第一位置用于表征预测单元格占用的区域在表格图像中的位置。S101. Acquire a first position of a predicted cell in the table image, where the first position is used to represent a position in the table image of an area occupied by the predicted cell.
需要说明的是,本公开实施例的单元格位置的检测方法的执行主体可为单元格位置的检测装置,本公开实施例的单元格位置的检测装置可以配置在任意电子设备中,以使该电子设备可以执行本公开实施例的单元格位置的检测方法。其中,电子设备可以为个人电脑(Personal Computer,简称PC)、云端设备、移动设备等,移动设备例如可以为手机、平板电脑、个人数字助理、穿戴式设备、车载设备等具有各种操作系统、触摸屏和/或显示屏的硬件设备。It should be noted that the executor of the cell position detection method in the embodiment of the present disclosure may be a cell position detection device, and the cell position detection device in the embodiment of the present disclosure can be configured in any electronic device, so that the The electronic device can execute the detection method of the cell position in the embodiment of the present disclosure. Among them, the electronic device can be a personal computer (Personal Computer, referred to as PC), cloud device, mobile device, etc., and the mobile device can be a mobile phone, tablet computer, personal digital assistant, wearable device, vehicle-mounted device, etc., with various operating systems, Hardware devices for touch screens and/or displays.
本公开的实施例中,可获取表格图像中预测单元格的第一位置。可以理解的是,一个表格图像中可包含至少一个预测单元格,不同的预测单元格可对应不同的第一位置。In the embodiment of the present disclosure, the first position of the predicted cell in the table image may be obtained. It can be understood that a table image may contain at least one prediction cell, and different prediction cells may correspond to different first positions.
需要说明的是,本公开的实施例中,第一位置用于表征预测单元格占用的区域在表格图像中的位置,即可根据第一位置确定预测单元格占用的区域在表格图像中的位置,即可根据第一位置实现预测单元格的定位。It should be noted that, in the embodiments of the present disclosure, the first position is used to represent the position of the area occupied by the predicted cell in the table image, that is, the position of the area occupied by the predicted cell in the table image can be determined according to the first position , the location of the predicted cell can be realized according to the first position.
在一种实施方式中,第一位置包括预测单元格的中心点的二维坐标、预测单元格的宽度、预测单元格的高度中的至少一种,此时预测单元格占用的区域为矩形。In one embodiment, the first position includes at least one of the two-dimensional coordinates of the center point of the predicted cell, the width of the predicted cell, and the height of the predicted cell. At this time, the area occupied by the predicted cell is a rectangle.
在一种实施方式中,可对表格图像进行单元格识别,以生成预测单元格的检测框,则获取表格图像中预测单元格的第一位置,可包括从表格图像中提取出每个预测单元格的检测框,并基于检测框获取预测单元格的第一位置。In one embodiment, the cell recognition can be performed on the table image to generate the detection frame of the predicted cell, then obtaining the first position of the predicted cell in the table image may include extracting each predicted cell from the table image The detection frame of the cell, and obtain the first position of the predicted cell based on the detection frame.
在一种实施方式中,对表格图像进行单元格识别,以生成预测单元格的检测框,可包括按照单元格识别算法对表格图像进行单元格识别,从而可从表格图像中定位到预测单元格,以生成预测单元格的检测框。其中,单元格识别算法可根据实际情况进行设置,这里不做过多限定。In one embodiment, performing cell recognition on the table image to generate a detection frame for the predicted cell may include performing cell recognition on the table image according to a cell recognition algorithm, so that the predicted cell can be located from the table image , to generate detection boxes for predicted cells. Wherein, the cell identification algorithm can be set according to the actual situation, and there is no excessive limitation here.
在一种实施方式中,基于检测框获取预测单元格的第一位置,可包括获取检测框的中心点的二维坐标、检测框的宽度和高度,将检测框的中心点的二维坐标作为预测单元格的中心点的二维坐标,将检测框的宽度和高度分别作为预测单元格的宽度和高度。In one embodiment, obtaining the first position of the predicted cell based on the detection frame may include obtaining the two-dimensional coordinates of the center point of the detection frame, the width and height of the detection frame, and taking the two-dimensional coordinates of the center point of the detection frame as The two-dimensional coordinates of the center point of the predicted cell, and the width and height of the detection frame are respectively used as the width and height of the predicted cell.
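The mapping from a detection frame to a first position described above can be sketched as follows; the `DetBox` container and its field names are illustrative assumptions, not part of the disclosure:

```python
from dataclasses import dataclass

# Hypothetical container for a detected cell box; names are illustrative only.
@dataclass
class DetBox:
    cx: float  # abscissa of the box center point
    cy: float  # ordinate of the box center point
    w: float   # box width
    h: float   # box height

def first_position(box: DetBox) -> tuple:
    """Take the detection frame's center coordinates, width and height
    directly as the first position (cx, cy, w, h) of the predicted cell."""
    return (box.cx, box.cy, box.w, box.h)
```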
S102，根据第一位置，得到表格图像的邻接矩阵，其中，表格图像中的每个预测单元格为一个结点，邻接矩阵用于表示预测单元格之间的位置关系。S102. Obtain an adjacency matrix of the table image according to the first position, wherein each prediction cell in the table image is a node, and the adjacency matrix is used to represent the positional relationship between the prediction cells.
本公开的实施例中,可将表格图像中的每个预测单元格作为一个结点,预测单元格和结点具有一一对应关系,每个结点用于表征对应的预测单元格。相应的,邻接矩阵用于表示预测单元格之间的位置关系。In the embodiments of the present disclosure, each prediction cell in the table image may be regarded as a node, the prediction cells and the nodes have a one-to-one correspondence, and each node is used to represent the corresponding prediction cell. Correspondingly, the adjacency matrix is used to represent the positional relationship between predicted cells.
本公开的实施例中,可根据第一位置得到表格图像的邻接矩阵。可以理解的是,可根据任意两个预测单元格的第一位置,得到任意两个预测单元格之间的位置关系,进而得到邻接矩阵中对应元素的取值。其中,位置关系包括但不限于欧式距离、曼哈顿距离等,这里不做过多限定。In the embodiment of the present disclosure, the adjacency matrix of the table image can be obtained according to the first position. It can be understood that, according to the first positions of any two predicted cells, the positional relationship between any two predicted cells can be obtained, and then the value of the corresponding element in the adjacency matrix can be obtained. Wherein, the location relationship includes but is not limited to Euclidean distance, Manhattan distance, etc., which are not limited here.
在一种实施方式中,邻接矩阵中的元素可用于表示任意两个预测单元格对应的结点之间的无向边。In one embodiment, elements in the adjacency matrix may be used to represent undirected edges between nodes corresponding to any two prediction cells.
S103,根据任一预测单元格的第一位置和邻接矩阵,得到任一预测单元格的融合结点特征。S103. According to the first position and the adjacency matrix of any prediction cell, obtain the fusion node feature of any prediction cell.
本公开的实施例中,可根据任一预测单元格的第一位置和邻接矩阵,得到任一预测单元格的融合结点特征。由此,该方法可基于预测单元格的第一位置和邻接矩阵得到融合结点特征,使得融合结点特征可与预测单元格的第一位置和预测单元格之间的位置关系相匹配,得到的预测单元格的融合结点特征的表示效果更好。In the embodiment of the present disclosure, the fusion node feature of any prediction unit can be obtained according to the first position and the adjacency matrix of any prediction unit. Therefore, this method can obtain the fusion node features based on the first position of the prediction cell and the adjacency matrix, so that the fusion node feature can match the first position of the prediction cell and the positional relationship between the prediction cells, and obtain The representation of the fusion node features of predicted cells is better.
例如,假设预测单元格的数量为n个,则获取的预测单元格的第一位置为n个,则可根据n个第一位置和邻接矩阵,得到n个融合结点特征。For example, assuming that the number of prediction cells is n, and the number of first positions of the prediction cells obtained is n, then n fusion node features can be obtained according to the n first positions and the adjacency matrix.
S104,根据任一预测单元格的融合结点特征,得到任一预测单元格的第二位置,其中,第二位置用于表征预测单元格的所属行和/或所属列。S104. Obtain a second position of any prediction cell according to the fusion node feature of any prediction cell, where the second position is used to represent the row and/or column to which the prediction cell belongs.
本公开的实施例中,可根据任一预测单元格的融合结点特征,得到任一预测单元格的第二位置,即可根据任一预测单元格的融合结点特征,对任一预测单元格的第二位置进行预测,得到任一预测单元格的第二位置。In the embodiment of the present disclosure, the second position of any prediction unit can be obtained according to the fusion node characteristics of any prediction unit, that is, according to the fusion node characteristics of any prediction unit, any prediction unit predict the second position of any predicted cell, and obtain the second position of any predicted cell.
需要说明的是,本公开的实施例中,第二位置用于表征预测单元格的所属行和/或所属列,即可根据第二位置确定预测单元格在表格中的所属行和/或所属列,即可根据第二位置实现预测单元格的定位。It should be noted that, in the embodiments of the present disclosure, the second position is used to represent the row and/or column to which the prediction cell belongs, that is, the row and/or column to which the prediction cell belongs in the table can be determined according to the second position. column, the positioning of the predicted cell can be realized according to the second position.
在一种实施方式中,第二位置包括预测单元格的起始行的编号、终止行的编号、起始列的编号、终止列的编号中的至少一种。可以理解的是,可预先对表格中的行、列分别进行编号。In one embodiment, the second position includes at least one of the number of the starting row, the number of the ending row, the number of the starting column, and the number of the ending column of the predicted cell. It can be understood that the rows and columns in the table can be numbered respectively in advance.
在一种实施方式中,可根据预测单元格的起始行的编号、终止行的编号确定预测单元格的所属行。例如,可获取处于起始行的编号和终止行的编号之间的候选编号,将起始行的编号、候选编号、终止行的编号确定为所属行的编号,从而根据确定的所属行的编号确定预测单元格的所属行。需要说明的是,确定预测单元格的所属列的方式可参照上述确定预测单元格的所属行的方式,这里不再赘述。In one embodiment, the row to which the predicted cell belongs may be determined according to the number of the start row and the number of the end row of the predicted cell. For example, the candidate numbers between the number of the start row and the number of the end row can be obtained, and the number of the start row, the candidate number, and the number of the end row can be determined as the number of the corresponding row, so that according to the number of the determined row Determine the row to which the forecasted cell belongs. It should be noted that, the manner of determining the column to which the prediction cell belongs may refer to the manner of determining the row to which the prediction cell belongs, and details are not repeated here.
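The row-membership rule above (the start row number, every candidate number between start and end, and the end row number) can be sketched as follows; the function name is an assumption:

```python
def rows_of_cell(start_row: int, end_row: int) -> list:
    """Rows the predicted cell belongs to: the start row, every candidate
    row number strictly between start and end, and the end row.
    The same logic applies to columns."""
    return list(range(start_row, end_row + 1))
```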
本公开的实施例中,根据任一预测单元格的融合结点特征,得到任一预测单元格的第二位置,可包括将任一预测单元格的融合结点特征输入至位置预测算法中,由位置预测算法根据融合结点特征进行位置预测,生成任一预测单元格的第二位置。其中,位置预测算法可根据实际情况进行设置,这里不做过多限定。In the embodiment of the present disclosure, obtaining the second position of any prediction cell according to the fusion node characteristics of any prediction unit may include inputting the fusion node characteristics of any prediction unit into the position prediction algorithm, The position prediction algorithm is used to predict the position according to the fusion node features, and the second position of any predicted cell is generated. Wherein, the location prediction algorithm can be set according to actual conditions, and there is no excessive limitation here.
综上，根据本公开实施例的单元格位置的检测方法，可将预测单元格作为结点，并基于预测单元格的第一位置得到邻接矩阵，进而根据第一位置和邻接矩阵得到预测单元格的融合结点特征，使得融合结点特征可与预测单元格的第一位置和预测单元格之间的位置关系相匹配，得到的预测单元格的融合结点特征的表示效果更好，并根据融合结点特征得到预测单元格的第二位置，可同时获取单元格的第一位置和第二位置，得到的单元格的位置更加全面，鲁棒性更好。To sum up, according to the cell position detection method of the embodiments of the present disclosure, the predicted cells can be used as nodes, the adjacency matrix can be obtained based on the first positions of the predicted cells, and the fusion node features of the predicted cells can then be obtained from the first positions and the adjacency matrix, so that the fusion node features match both the first positions of the predicted cells and the positional relationships between them, yielding a better representation. The second positions of the predicted cells are then obtained from the fusion node features, so the first position and the second position of each cell can be acquired at the same time, making the obtained cell positions more comprehensive and more robust.
在上述任一实施例的基础上,步骤S102中根据第一位置,得到表格图像的邻接矩阵,可包括基于第一位置和预测单元格的编号,确定邻接矩阵中对应元素的取值。On the basis of any of the above embodiments, obtaining the adjacency matrix of the table image according to the first position in step S102 may include determining values of corresponding elements in the adjacency matrix based on the first position and the number of the predicted cell.
可以理解的是,可基于任意两个预测单元格的第一位置,得到任意两个预测单元格之间的位置关系,并根据任意两个预测单元格的编号确定邻接矩阵中对应元素的目标编号,进而可根据任意两个预测单元格之间的位置关系确定邻接矩阵中目标编号的元素的取值。It can be understood that the positional relationship between any two predicted cells can be obtained based on the first positions of any two predicted cells, and the target number of the corresponding element in the adjacency matrix can be determined according to the numbers of any two predicted cells , and then the value of the element of the target number in the adjacency matrix can be determined according to the positional relationship between any two predicted cells.
在一种实施方式中,如图2所示,基于第一位置和预测单元格的编号,确定邻接矩阵中对应元素的取值,包括步骤S201-S205。In one embodiment, as shown in FIG. 2 , based on the first position and the number of the predicted cell, the value of the corresponding element in the adjacency matrix is determined, including steps S201-S205.
S201,获取预测单元格的数量n,并按照编号1至n对每个预测单元格进行连续编号,其中,n为大于1的整数。S201. Acquire the number n of predicted cells, and sequentially number each predicted cell according to the numbers 1 to n, wherein n is an integer greater than 1.
本公开的实施例中,可按照编号1至n对预测单元格进行连续编号,编号1至n可随机分配。例如,若预测单元格的数量为10,则可按照编号1至10对每个预测单元格进行连续编号。In the embodiment of the present disclosure, the prediction cells can be numbered consecutively according to numbers 1 to n, and the numbers 1 to n can be randomly assigned. For example, if the number of predicted cells is 10, each predicted cell may be numbered consecutively according to numbers 1 to 10.
S202,从第一位置中提取出编号为i、j的预测单元格的中心点的横坐标和纵坐标,其中,1≤i≤n,1≤j≤n。S202. Extract the abscissa and ordinate of the central point of the prediction cell numbered i and j from the first position, where 1≤i≤n, 1≤j≤n.
本公开的实施例中,第一位置包括预测单元格的中心点的横坐标和纵坐标,可从第一位置中提取出编号为i、j的预测单元格的中心点的横坐标和纵坐标。In the embodiment of the present disclosure, the first position includes the abscissa and ordinate of the center point of the predicted cell, and the abscissa and ordinate of the center point of the predicted cell numbered i and j can be extracted from the first position .
其中,1≤i≤n,1≤j≤n,且i、j均为整数。Wherein, 1≤i≤n, 1≤j≤n, and both i and j are integers.
可以理解的是,第一位置与预测单元格的编号具有对应关系,则可根据编号i、j查询上述对应关系,得到编号为i、j的预测单元格的中心点的横坐标和纵坐标。It can be understood that the first position has a corresponding relationship with the number of the predicted cell, and the above corresponding relationship can be queried according to the numbers i and j to obtain the abscissa and ordinate of the center point of the predicted cell with the numbers i and j.
在一种实施方式中,可预先建立第一位置与预测单元格的编号之间的映射关系或者映射表,其中,第一位置包括预测单元格的中心点的横坐标和纵坐标,则可根据预测单元格的编号查询上述映射关系或者映射表,获取预测单元格的中心点的横坐标和纵坐标。应说明的是,上述映射关系或者映射表均可根据实际情况进行设置,这里不做过多限定。In one embodiment, a mapping relationship or a mapping table between the first position and the number of the predicted cell can be established in advance, wherein the first position includes the abscissa and ordinate of the center point of the predicted cell, then it can be calculated according to The number of the predicted cell queries the above mapping relationship or mapping table to obtain the abscissa and ordinate of the center point of the predicted cell. It should be noted that the above mapping relationship or mapping table can be set according to actual conditions, and there is no excessive limitation here.
S203,获取表格图像的宽度和高度,以及调整参数。S203, acquiring the width and height of the table image, and adjusting parameters.
在一种实施方式中,获取表格图像的宽度和高度,可包括按照图像尺寸识别算法对表格图像进行尺寸识别,得到表格图像的宽度和高度。其中,图像尺寸识别算法可根据实际情况进行设置,这里不做过多限定。In one embodiment, obtaining the width and height of the form image may include performing size recognition on the form image according to an image size recognition algorithm to obtain the width and height of the form image. Wherein, the image size recognition algorithm can be set according to the actual situation, which is not limited here.
需要说明的是,本公开的实施例中,调整参数可根据实际情况进行设置,这里不做过多限定。在一种实施方式中,调整参数与表格的行数和/或列数正相关。It should be noted that, in the embodiments of the present disclosure, the adjustment parameters may be set according to actual conditions, and are not limited here. In one embodiment, the adjustment parameter is positively correlated with the number of rows and/or columns of the table.
S204,获取编号为i、j的预测单元格的中心点的横坐标的差值与宽度的第一比值,并基于第一比值和调整参数的乘积确定邻接矩阵中第i行第j列的元素的行维度的取值。S204. Obtain the first ratio of the difference between the abscissa of the central point of the prediction cell numbered i and j to the width, and determine the element in row i and column j in the adjacency matrix based on the product of the first ratio and the adjustment parameter The value of the row dimension of .
在一种实施方式中，采用如下公式计算邻接矩阵中第i行第j列的元素的行维度的取值：In one implementation, the following formula is used to calculate the value of the row dimension of the element in row i and column j of the adjacency matrix:

$A_{ij}^{row} = c \cdot \dfrac{x_i - x_j}{w}$

其中，$A_{ij}^{row}$ 为邻接矩阵中第i行第j列的元素的行维度的取值，$x_i$、$x_j$ 分别为编号为i、j的预测单元格的中心点的横坐标，w为表格图像的宽度，c为调整参数。Here, $A_{ij}^{row}$ is the value of the row dimension of the element in row i and column j of the adjacency matrix, $x_i$ and $x_j$ are the abscissas of the center points of the prediction cells numbered i and j, w is the width of the table image, and c is the adjustment parameter.
可以理解的是,确定邻接矩阵中第i行第j列的元素的行维度的取值还可为其他方式,这里不再赘述。It can be understood that other methods may be used to determine the value of the row dimension of the element in row i and column j in the adjacency matrix, which will not be repeated here.
S205,获取编号为i、j的预测单元格的中心点的纵坐标的差值与高度的第二比值,并基于第二比值和调整参数的乘积确定邻接矩阵中第i行第j列的元素的列维度的取值。S205. Obtain the second ratio of the difference between the vertical coordinates of the center point of the prediction cell numbered i and j to the height, and determine the element in row i and column j in the adjacency matrix based on the product of the second ratio and the adjustment parameter The value of the column dimension of .
在一种实施方式中，采用如下公式计算邻接矩阵中第i行第j列的元素的列维度的取值：In one embodiment, the following formula is used to calculate the value of the column dimension of the element in row i and column j of the adjacency matrix:

$A_{ij}^{col} = c \cdot \dfrac{y_i - y_j}{H}$

其中，$A_{ij}^{col}$ 为邻接矩阵中第i行第j列的元素的列维度的取值，$y_i$、$y_j$ 分别为编号为i、j的预测单元格的中心点的纵坐标，H为表格图像的高度，c为调整参数。Here, $A_{ij}^{col}$ is the value of the column dimension of the element in row i and column j of the adjacency matrix, $y_i$ and $y_j$ are the ordinates of the center points of the prediction cells numbered i and j, H is the height of the table image, and c is the adjustment parameter.
可以理解的是,确定邻接矩阵中第i行第j列的元素的列维度的取值还可为其他方式,这里不再赘述。It can be understood that other methods may be used to determine the value of the column dimension of the element in row i and column j in the adjacency matrix, which will not be repeated here.
由此,该方法可综合考虑编号为i、j的预测单元格的中心点的横坐标、表格图像的宽度、调整参数对邻接矩阵中第i行第j列的元素的行维度的取值的影响,以及综合考虑编号为i、j的预测单元格的中心点的纵坐标、表格图像的高度、调整参数对邻接矩阵中第i行第j列的元素的列维度的取值的影响。Therefore, this method can comprehensively consider the abscissa of the center point of the prediction cell numbered i and j, the width of the table image, and the value of the adjustment parameter to the row dimension of the element in the i-th row and j-th column in the adjacency matrix. influence, and comprehensively consider the ordinate of the center point of the prediction cell numbered i and j, the height of the table image, and the influence of adjustment parameters on the value of the column dimension of the element in row i and column j in the adjacency matrix.
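Steps S204-S205 above can be sketched as follows, assuming the element value is taken directly as the product of the coordinate-difference ratio and the adjustment parameter c (the disclosure notes other variants are possible); all names are illustrative:

```python
import numpy as np

def build_adjacency(centers: np.ndarray, img_w: float, img_h: float,
                    c: float) -> np.ndarray:
    """centers: (n, 2) array of predicted-cell center (x, y) coordinates.
    Returns an (n, n, 2) array whose [..., 0] slice holds the row-dimension
    values c*(x_i - x_j)/w and whose [..., 1] slice holds the column-dimension
    values c*(y_i - y_j)/H, as in steps S204-S205."""
    x = centers[:, 0]
    y = centers[:, 1]
    n = len(centers)
    a = np.empty((n, n, 2))
    a[..., 0] = c * (x[:, None] - x[None, :]) / img_w  # row dimension
    a[..., 1] = c * (y[:, None] - y[None, :]) / img_h  # column dimension
    return a
```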
在上述任一实施例的基础上,如图3所示,步骤S103中根据任一预测单元格的第一位置和邻接矩阵,得到任一预测单元格的融合结点特征,包括步骤S301-S302。On the basis of any of the above embodiments, as shown in Figure 3, in step S103, according to the first position and adjacency matrix of any prediction cell, the fusion node features of any prediction cell are obtained, including steps S301-S302 .
S301,根据任一预测单元格的第一位置,得到任一预测单元格的结点特征。S301. According to the first position of any predicted cell, obtain the node feature of any predicted cell.
本公开的实施例中,可根据任一预测单元格的第一位置,得到任一预测单元格的结点特征,使得结点特征可与预测单元格的第一位置相匹配。In the embodiment of the present disclosure, the node feature of any predicted cell can be obtained according to the first position of any predicted cell, so that the node feature can match the first position of the predicted cell.
在一种实施方式中,根据任一预测单元格的第一位置,得到任一预测单元格的结点特征,可包括将任一预测单元格的第一位置输入至特征提取算法中,由特征提取算法从第一位置中提取出任一预测单元格的结点特征。其中,特征提取算法可根据实际情况进行设置,这里不做过多限定。In one embodiment, according to the first position of any predicted cell, obtaining the node feature of any predicted cell may include inputting the first position of any predicted cell into the feature extraction algorithm, and the feature The extraction algorithm extracts the node features of any prediction cell from the first position. Wherein, the feature extraction algorithm can be set according to the actual situation, and there is no excessive limitation here.
S302,将结点特征和邻接矩阵输入至图卷积网络GCN中,由图卷积网络将结点特征与邻接矩阵进行特征融合,生成任一预测单元格的融合结点特征。S302. Input the node features and adjacency matrix into the graph convolutional network GCN, and perform feature fusion of the node features and the adjacency matrix by the graph convolutional network to generate fusion node features of any prediction unit.
本公开的实施例中,可将结点特征和邻接矩阵输入至图卷积网络(Graph Convolutional Network,GCN)中,由图卷积网络将结点特征与邻接矩阵进行特征融合,生成任一预测单元格的融合结点特征,即可通过图卷积网络利用邻接矩阵重构结点特征,生成融合结点特征。其中,图卷积网络可根据实际情况进行设置,这里不做过多限定。In the embodiment of the present disclosure, the node features and adjacency matrix can be input into the graph convolutional network (Graph Convolutional Network, GCN), and the node feature and the adjacency matrix are fused by the graph convolutional network to generate any prediction The fused node feature of the cell can be reconstructed by using the adjacency matrix through the graph convolutional network to generate the fused node feature. Among them, the graph convolutional network can be set according to the actual situation, and there are no too many restrictions here.
在一种实施方式中,采用如下公式计算融合结点特征:In one embodiment, the fusion node features are calculated using the following formula:
X'=ReLU(GCN(X,A))X'=ReLU(GCN(X,A))
其中,X'为融合结点特征,X为结点特征,A为邻接矩阵,ReLU(·)为激活函数。Among them, X' is the fusion node feature, X is the node feature, A is the adjacency matrix, and ReLU(·) is the activation function.
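A minimal sketch of the fusion step X' = ReLU(GCN(X, A)) above: the disclosure does not fix the internals of the GCN, so a single propagation layer of the common form ReLU(Â X W), with a learned weight W and a nonnegative adjacency for the normalization, is assumed here purely for illustration:

```python
import numpy as np

def gcn_fuse(X: np.ndarray, A: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One illustrative GCN layer: X' = ReLU(A_hat @ X @ W), where A_hat is
    the row-normalized adjacency with self-loops added. The layer form is an
    assumption; the patent only specifies X' = ReLU(GCN(X, A))."""
    n = A.shape[0]
    a_hat = A + np.eye(n)                              # add self-loops
    a_hat = a_hat / a_hat.sum(axis=1, keepdims=True)   # row-normalize
    return np.maximum(a_hat @ X @ W, 0.0)              # ReLU activation
```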
由此,该方法可根据任一预测单元格的第一位置,得到任一预测单元格的结点特征,并将结点特征和邻接矩阵输入至图卷积网络GCN中,由图卷积网络将结点特征与邻接矩阵进行特征融合,生成任一预测单元格的融合结点特征。Therefore, this method can obtain the node features of any prediction cell according to the first position of any prediction cell, and input the node features and adjacency matrix into the graph convolutional network GCN, and the graph convolutional network The feature fusion of the node features and the adjacency matrix is performed to generate the fusion node features of any prediction cell.
在上述任一实施例的基础上,如图4所示,步骤S301中根据任一预测单元格的第一位置,得到任一预测单元格的结点特征,包括步骤S401-S403。On the basis of any of the above embodiments, as shown in FIG. 4 , in step S301 , according to the first position of any predicted cell, the node features of any predicted cell are obtained, including steps S401 - S403 .
S401,对任一预测单元格的第一位置进行线性映射,得到任一预测单元格的空间特征。S401. Perform linear mapping on the first position of any prediction cell to obtain the spatial characteristics of any prediction cell.
可以理解的是，第一位置可为一维或者多维向量。例如，第一位置包括预测单元格的中心点的二维坐标、预测单元格的宽度和高度时，第一位置为4维向量，可用 $b_i = (x_i, y_i, w_i, h_i)$ 来表示，其中，$b_i$ 为编号为i的预测单元格的第一位置，$x_i$、$y_i$ 分别为编号为i的预测单元格的中心点的横坐标和纵坐标，$w_i$、$h_i$ 分别为编号为i的预测单元格的宽度和高度。It can be understood that the first position may be a one-dimensional or multi-dimensional vector. For example, when the first position includes the two-dimensional coordinates of the center point of the predicted cell together with the width and height of the predicted cell, the first position is a 4-dimensional vector that can be written as $b_i = (x_i, y_i, w_i, h_i)$, where $b_i$ is the first position of the prediction cell numbered i, $x_i$ and $y_i$ are the abscissa and ordinate of its center point, and $w_i$ and $h_i$ are its width and height.
本公开的实施例中,可对任一预测单元格的第一位置进行线性映射,得到任一预测单元格的空间特征。可以理解的是,任一预测单元格的空间特征与第一位置相匹配。In the embodiments of the present disclosure, a linear mapping may be performed on the first position of any prediction cell to obtain the spatial characteristics of any prediction cell. It can be appreciated that the spatial characteristics of any predicted cell match the first location.
在一种实施方式中,对任一预测单元格的第一位置进行线性映射,得到任一预测单元格的空间特征,可包括将任一预测单元格的第一位置输入至线性映射算法,由线性映射算法对第一位置进行线性映射,得到任一预测单元格的空间特征。其中,线性映射算法可根据实际情况进行设置,这里不做过多限定。In one embodiment, performing linear mapping on the first position of any predicted cell to obtain the spatial characteristics of any predicted cell may include inputting the first position of any predicted cell into a linear mapping algorithm, by The linear mapping algorithm performs linear mapping on the first position to obtain the spatial characteristics of any predicted cell. Wherein, the linear mapping algorithm can be set according to the actual situation, and there is no excessive limitation here.
S402,基于任一预测单元格的第一位置,从表格图像中提取出任一预测单元格的视觉语义特征。S402. Based on the first position of any prediction cell, extract the visual semantic features of any prediction cell from the table image.
本公开的实施例中,可基于任一预测单元格的第一位置,从表格图像中提取出任一预测单元格的视觉语义特征,使得视觉语义特征可与预测单元格的第一位置相匹配。In the embodiments of the present disclosure, based on the first position of any predicted cell, the visual semantic feature of any predicted cell can be extracted from the table image, so that the visual semantic feature can match the first position of the predicted cell.
本公开的实施例中,基于任一预测单元格的第一位置,从表格图像中提取出任一预测单元格的视觉语义特征,可包括基于任一预测单元格的第一位置,确定任一预测单元格在表格图像上占用的区域,并从表格图像中对应的区域中提取出视觉语义特征,作为任一预测单元格的视觉语义特征。In the embodiment of the present disclosure, extracting the visual semantic features of any predicted cell from the table image based on the first position of any predicted cell may include determining any predicted The area occupied by the cell on the table image, and the visual semantic feature is extracted from the corresponding area in the table image as the visual semantic feature of any predicted cell.
在一种实施方式中,基于任一预测单元格的第一位置,从表格图像中提取出任一预测单元格的视觉语义特征,可包括基于任一预测单元格的第一位置,从表格图像包含的像素点中确定任一预测单元格包含的目标像素点,并从表格图像中提取出目标像素点的视觉语义特征,作为任一预测单元格的视觉语义特征。In one embodiment, based on the first position of any predicted cell, extracting the visual semantic features of any predicted cell from the table image may include, based on the first position of any predicted cell, extracting from the table image Determine the target pixel contained in any prediction cell from the pixels in the table image, and extract the visual semantic feature of the target pixel from the table image as the visual semantic feature of any prediction cell.
可以理解的是,表格图像包含多个像素点,可基于任一预测单元格的第一位置,从表格图像包含的像素点中确定任一预测单元格包含的目标像素点。应说明的是,目标像素点指的是位于预测单元格占用的区域内的像素点。It can be understood that the table image includes a plurality of pixel points, and based on the first position of any prediction cell, the target pixel point included in any prediction cell can be determined from the pixels included in the table image. It should be noted that the target pixel point refers to a pixel point located in the area occupied by the prediction cell.
在一种实施方式中,从表格图像中提取出目标像素点的视觉语义特征,作为任一预测单元格的视觉语义特征,可包括从表格图像中提取出每个像素点的视觉语义特征,按照预设提取算法从视觉语义特征中提取出目标像素点的视觉语义特征。其中,提取算法可根据实际情况进行设置,这里不做过多限定, 例如可为RoIAlign算法。In one embodiment, extracting the visual semantic feature of the target pixel from the table image as the visual semantic feature of any predicted cell may include extracting the visual semantic feature of each pixel from the table image, according to The preset extraction algorithm extracts the visual semantic features of the target pixels from the visual semantic features. Wherein, the extraction algorithm can be set according to the actual situation, and it is not limited here too much, for example, it can be the RoIAlign algorithm.
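As a rough stand-in for the region feature extraction above (the disclosure names RoIAlign as one option; simple mean pooling over the target pixels inside the cell's box is used here purely for illustration, and all names are assumptions):

```python
import numpy as np

def cell_visual_feature(feat_map: np.ndarray, box: tuple) -> np.ndarray:
    """feat_map: (H, W, C) per-pixel visual semantic features of the table
    image. box: (cx, cy, w, h) first position of the predicted cell.
    Mean-pools the features of the target pixels located inside the box --
    a crude substitute for RoIAlign, for illustration only."""
    cx, cy, w, h = box
    x0, x1 = int(round(cx - w / 2)), int(round(cx + w / 2))
    y0, y1 = int(round(cy - h / 2)), int(round(cy + h / 2))
    region = feat_map[max(y0, 0):y1, max(x0, 0):x1]
    return region.reshape(-1, feat_map.shape[-1]).mean(axis=0)
```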
S403,将任一预测单元格的空间特征和视觉语义特征进行拼接,得到任一预测单元格的结点特征。S403. Concatenate the spatial features and visual semantic features of any prediction cell to obtain node features of any prediction cell.
在一种实施方式中，可将任一预测单元格的空间特征和视觉语义特征进行横向拼接，得到任一预测单元格的结点特征。例如，任一预测单元格的空间特征、视觉语义特征分别为 $X_s$、$X_v$，$X_s$、$X_v$ 分别为256维、1024维的向量，则可将 $X_s$、$X_v$ 进行横向拼接，得到任一预测单元格的结点特征为1280维的向量。In one embodiment, the spatial feature and the visual semantic feature of any prediction cell can be concatenated horizontally to obtain its node feature. For example, if the spatial feature and the visual semantic feature of a prediction cell are $X_s$ (a 256-dimensional vector) and $X_v$ (a 1024-dimensional vector) respectively, then $X_s$ and $X_v$ can be concatenated horizontally to obtain a 1280-dimensional node feature.
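The horizontal concatenation just described can be sketched as follows (the 256/1024 dimensions are the example from the text; the function name is an assumption):

```python
import numpy as np

def node_feature(x_s: np.ndarray, x_v: np.ndarray) -> np.ndarray:
    """Concatenate the spatial feature x_s (e.g. 256-d) and the visual
    semantic feature x_v (e.g. 1024-d) along the feature axis to form
    the node feature (e.g. 1280-d)."""
    return np.concatenate([x_s, x_v], axis=-1)
```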
由此,该方法可分别基于任一预测单元格的第一位置得到空间特征和视觉语义特征,并将空间特征和视觉语义特征进行拼接,得到任一预测单元格的结点特征。Therefore, the method can obtain spatial features and visual semantic features based on the first position of any prediction cell respectively, and splicing the spatial features and visual semantic features to obtain the node features of any prediction cell.
在上述任一实施例的基础上,步骤S104中根据任一预测单元格的融合结点特征,得到任一预测单元格的第二位置,可包括如下两种可能的实施方式:On the basis of any of the above-mentioned embodiments, in step S104, the second position of any prediction unit is obtained according to the fusion node features of any prediction unit, which may include the following two possible implementation modes:
方式1、如图5所示,步骤S104中根据任一预测单元格的融合结点特征,得到任一预测单元格的第二位置,可包括步骤S501-S502。Method 1. As shown in FIG. 5 , in step S104 , according to the fusion node features of any prediction unit, the second position of any prediction unit is obtained, which may include steps S501 - S502 .
S501,基于任一预测单元格的融合结点特征,得到任一预测单元格在每个候选第二位置下的预测概率。S501. Based on the fusion node feature of any prediction unit, obtain the prediction probability of any prediction unit at each candidate second position.
以第二位置为预测单元格的起始行为例,若表格的行数为T,候选第二位置包括第1、2至T行,则可基于任一预测单元格的融合结点特征,得到任一预测单元格在第1、2至T行下的预测概率。Taking the second position as the starting line of the predicted cell as an example, if the number of rows in the table is T, and the candidate second position includes rows 1, 2 to T, then based on the fusion node characteristics of any predicted cell, we can get The predicted probability of any predicted cell under rows 1, 2 to T.
S502,从任一预测单元格在每个候选第二位置下的预测概率中获取最大预测概率,并将最大预测概率对应的候选第二位置确定为任一预测单元格的第二位置。S502. Obtain the maximum prediction probability from the prediction probability of any prediction unit at each candidate second position, and determine the candidate second position corresponding to the maximum prediction probability as the second position of any prediction unit.
In the embodiments of the present disclosure, the prediction probabilities of a predicted cell at the candidate second positions may differ, and a larger prediction probability indicates that the corresponding candidate second position is more likely to be the second position. Therefore, the maximum prediction probability can be obtained from the prediction probabilities of the predicted cell at the candidate second positions, and the candidate second position corresponding to the maximum prediction probability can be determined as the second position of the predicted cell.
Continuing the example in which the second position is the starting row of the predicted cell, if the table has T rows, the candidate second positions include rows 1, 2, ..., T, and the prediction probabilities of a predicted cell at rows 1, 2, ..., T are P1, P2, ..., PT respectively, with P2 being the maximum among them, then row 2 is taken as the starting row of the predicted cell.
Thus, this method can obtain the prediction probability of any predicted cell at each candidate second position based on its fusion node feature, obtain the maximum prediction probability among those probabilities, and determine the candidate second position corresponding to the maximum prediction probability as the second position of the predicted cell.
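The argmax selection of steps S501-S502 can be sketched as follows; the helper name and the probability values are illustrative only, not part of the disclosure:

```python
import numpy as np

def pick_second_position(probs):
    """Implementation 1: choose the candidate second position with the
    highest predicted probability. probs[t] is the probability that the
    cell's second position (e.g. its starting row) is row t+1."""
    return int(np.argmax(probs)) + 1  # convert 0-based index to a row number

# Example: T = 4 candidate rows; row 2 has the largest probability.
probs = [0.1, 0.6, 0.2, 0.1]
print(pick_second_position(probs))  # 2
```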
Implementation 2. As shown in FIG. 6, obtaining the second position of any predicted cell according to the fusion node feature of that predicted cell in step S104 may include steps S601-S604.
S601: for any predicted cell, establish a target vector, the target vector including n dimensions, where n is the number of candidate second positions of the predicted cell.
Taking the second position being the starting row of the predicted cell as an example, if the table has T rows and the candidate second positions include rows 1, 2, ..., T, the target vector includes T dimensions.
S602: based on the fusion node feature of the predicted cell, obtain the prediction probability of each vector dimension of the target vector taking the value 0 or 1.
Continuing the example in which the second position is the starting row of the predicted cell, if the table has T rows, the candidate second positions include rows 1, 2, ..., T, and the target vector includes T dimensions, then based on the fusion node feature of the predicted cell, the prediction probabilities of dimensions 1, 2, ..., T of the target vector taking the value 0 or 1 can be obtained.
S603: obtain the maximum prediction probability between the prediction probabilities of a vector dimension taking the value 0 and taking the value 1, and determine the value corresponding to the maximum prediction probability as the target value of that vector dimension.
In the embodiments of the present disclosure, the prediction probabilities of a vector dimension taking the value 0 or 1 may differ. A larger probability of the value 0 indicates that the dimension is more likely to take the value 0; conversely, a larger probability of the value 1 indicates that the dimension is more likely to take the value 1. Therefore, the maximum prediction probability can be obtained between the two probabilities, and the value corresponding to the maximum prediction probability is determined as the target value of the vector dimension.
Continuing the example in which the second position is the starting row of the predicted cell, if the table has T rows, the candidate second positions include rows 1, 2, ..., T, the target vector includes T dimensions, and the prediction probabilities of the m-th vector dimension of the target vector taking the value 0 and taking the value 1 are P_m^0 and P_m^1 respectively, then if the maximum of P_m^0 and P_m^1 is P_m^1, the target value of the m-th vector dimension of the target vector is 1, where 1 ≤ m ≤ T.
S604: obtain the second position of the predicted cell based on the sum of the target values of the vector dimensions.
In the embodiments of the present disclosure, the sum of the target values of the vector dimensions of the target vector has a correspondence with the second position; this correspondence can therefore be queried based on the sum of the target values to determine the corresponding second position. It should be noted that the above correspondence may be set according to the actual situation and is not limited here.
In one implementation, for the predicted cell numbered i, the number of each candidate second position can be converted into a candidate vector by the following formula:

v_i^t = 1 if t ≤ r_i, and v_i^t = 0 if t > r_i,

where the candidate vector includes n dimensions, n is the number of candidate second positions, v_i^t is the value of the t-th vector dimension of the candidate vector, r_i is the number of the candidate second position, 0 ≤ r_i ≤ n-1, and 1 ≤ t ≤ n.
Continuing the example in which the second position is the starting row of the predicted cell, if the table has 3 rows and the candidate second positions include rows 1, 2, and 3, the numbers of the candidate second positions are 0, 1, and 2, corresponding to rows 1, 2, and 3 respectively. According to the above formula, the candidate position numbers 0, 1, and 2 are converted into the candidate vectors (0, 0, 0), (1, 0, 0), and (1, 1, 0).
In this case, the number of the second position can be determined based on the sum of the target values over all vector dimensions of the target vector plus 1. If the sum of the target values over all vector dimensions of the target vector is 2, the number of the starting row of the predicted cell is determined to be 3, that is, the starting row of the predicted cell is row 3.
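The number-to-vector conversion and the sum-based decoding of Implementation 2 can be sketched as follows (the function names are illustrative, and the "+1" decoding assumes the correspondence used in the 3-row example above):

```python
def number_to_candidate_vector(r, n):
    """Encode the candidate-position number r (0-based) as an n-dimensional
    0/1 vector whose first r dimensions are 1: v_t = 1 if t <= r, else 0."""
    return [1 if t <= r else 0 for t in range(1, n + 1)]

def decode_second_position(target_vector):
    """Decode: second-position number = sum of the target values + 1,
    per the correspondence illustrated in the example above."""
    return sum(target_vector) + 1

print(number_to_candidate_vector(2, 3))   # [1, 1, 0]
print(decode_second_position([1, 1, 0]))  # 3
```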
Thus, for any predicted cell this method can establish a target vector, determine the value of each vector dimension of the target vector based on the fusion node feature of the predicted cell, and obtain the second position of the predicted cell according to the sum of the target values of the vector dimensions, so that the obtained second position is more accurate.
It should be noted that the method for obtaining the second position in the embodiments of the present disclosure is applicable to any type of second position. In one implementation, it is applicable to determining the number of the starting row, the number of the ending row, the number of the starting column, and the number of the ending column of the predicted cell.
On the basis of any of the above embodiments, obtaining the first position of the predicted cells in the table image in step S101 may include: extracting the visual semantic feature of each pixel from the table image; obtaining the recognition probability of each pixel in each category based on its visual semantic feature; obtaining the maximum recognition probability among the recognition probabilities of a pixel in the categories, and determining the category corresponding to the maximum recognition probability as the target category of that pixel; identifying connected domains formed by pixels whose target category is "cell"; determining the minimum bounding rectangle of each connected domain as the detection box of a predicted cell; and obtaining the first position of the predicted cell based on the detection box.
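The connected-domain step above can be sketched as follows, assuming the per-pixel target categories have already been predicted; the integer encoding (0 = background, 1 = cell, 2 = border line) is an assumption for illustration:

```python
from collections import deque

def cell_boxes(label_map):
    """Find 4-connected components of 'cell' pixels (label 1) in a
    per-pixel category map and return the minimum bounding rectangle of
    each component as (x_min, y_min, x_max, y_max)."""
    h, w = len(label_map), len(label_map[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if label_map[y][x] == 1 and not seen[y][x]:
                # BFS over one connected domain of cell pixels.
                q = deque([(y, x)])
                seen[y][x] = True
                ys, xs = [], []
                while q:
                    cy, cx = q.popleft()
                    ys.append(cy)
                    xs.append(cx)
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and label_map[ny][nx] == 1
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            q.append((ny, nx))
                boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes

# Two cell regions separated by background columns.
grid = [[1, 1, 0, 0, 1, 1],
        [1, 1, 0, 0, 1, 1],
        [0, 0, 0, 0, 0, 0]]
print(cell_boxes(grid))  # [(0, 0, 1, 1), (4, 0, 5, 1)]
```

In practice the connected-component search would run on the classifier's output map; a library routine such as a labeling function from an image-processing package could replace the hand-rolled BFS.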
The categories include, but are not limited to, background, cell, and border line.
Obtaining the recognition probability of each pixel in each category based on the visual semantic features may include inputting the visual semantic feature of a pixel into a classification algorithm, which performs category prediction according to the visual semantic feature and generates the recognition probability of the pixel in each category. The classification algorithm may be set according to the actual situation and is not limited here.
It should be noted that, for details of obtaining the first position of a predicted cell based on its detection box, reference may be made to the above embodiments, which are not repeated here.
Corresponding to the cell position detection methods provided by the embodiments of FIG. 1 to FIG. 6 above, the present disclosure further provides a cell position detection model. The input of the detection model is a table image, and the output is the first position and the second position of each predicted cell in the table image.
As shown in FIG. 7, the detection model includes a visual semantic feature extraction layer, a first classification layer, a node feature extraction layer, a graph reconstruction network layer, and a second classification layer.
The visual semantic feature extraction layer is used to extract the visual semantic feature of each pixel from the table image.
The first classification layer is used to obtain the recognition probability of each pixel in each category based on the visual semantic features, determine the target category of each pixel according to the recognition probabilities, identify connected domains formed by pixels whose target category is "cell", determine the minimum bounding rectangle of each connected domain as the detection box of a predicted cell, and obtain the first position of the predicted cell based on the detection box.
The node feature extraction layer is used to obtain the node feature of any predicted cell according to the first position of that predicted cell.
The graph reconstruction network layer is used to fuse the node features with the adjacency matrix to generate the fusion node feature of each predicted cell.
The second classification layer is used to obtain the second position of any predicted cell according to the fusion node feature of that predicted cell.
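The fusion performed by the graph reconstruction network layer can be sketched as a single graph-convolution step; the ReLU activation and the row normalization with self-loops are assumptions for illustration, and the actual network may use a different propagation rule or more layers:

```python
import numpy as np

def gcn_fuse(node_feats, adj, weight):
    """One graph-convolution step fusing node features with the adjacency
    matrix: H' = ReLU(A_hat @ H @ W), where A_hat is the row-normalized
    adjacency with self-loops added."""
    a = adj + np.eye(adj.shape[0])            # add self-loops
    a_hat = a / a.sum(axis=1, keepdims=True)  # row-normalize
    return np.maximum(a_hat @ node_feats @ weight, 0.0)

# 3 predicted cells (nodes) with 4-dim node features, projected to 2 dims.
rng = np.random.default_rng(0)
h = rng.normal(size=(3, 4))
adj = np.array([[0., 1., 0.],
                [1., 0., 1.],
                [0., 1., 0.]])
w = rng.normal(size=(4, 2))
fused = gcn_fuse(h, adj, w)
print(fused.shape)  # (3, 2)
```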
Corresponding to the cell position detection methods provided by the embodiments of FIG. 1 to FIG. 6 above, the present disclosure further provides a cell position detection apparatus. Since the apparatus provided by the embodiments of the present disclosure corresponds to the methods provided by the embodiments of FIG. 1 to FIG. 6, the implementations of the detection method are also applicable to the detection apparatus and are not described in detail in the embodiments of the present disclosure.
FIG. 8 is a schematic structural diagram of a cell position detection apparatus according to an embodiment of the present disclosure.
As shown in FIG. 8, the cell position detection apparatus 100 of the embodiment of the present disclosure may include a first acquisition module 110, a second acquisition module 120, a third acquisition module 130, and a fourth acquisition module 140.
The first acquisition module 110 is configured to acquire the first position of each predicted cell in a table image, where the first position represents the position, in the table image, of the region occupied by the predicted cell.
The second acquisition module 120 is configured to obtain an adjacency matrix of the table image according to the first positions, where each predicted cell in the table image is a node and the adjacency matrix represents the positional relationships between the predicted cells.
The third acquisition module 130 is configured to obtain the fusion node feature of any predicted cell according to the first position of that predicted cell and the adjacency matrix.
The fourth acquisition module 140 is configured to obtain the second position of any predicted cell according to the fusion node feature of that predicted cell, where the second position represents the row and/or column to which the predicted cell belongs.
In an embodiment of the present disclosure, the first position includes at least one of the two-dimensional coordinates of the center point of the predicted cell, the width of the predicted cell, and the height of the predicted cell.
In an embodiment of the present disclosure, the second acquisition module 120 is further configured to determine the values of corresponding elements in the adjacency matrix based on the first positions and the numbers of the predicted cells.
In an embodiment of the present disclosure, the second acquisition module 120 is further configured to: acquire the number n of predicted cells and consecutively number each predicted cell from 1 to n, where n is an integer greater than 1; extract, from the first positions, the abscissas and ordinates of the center points of the predicted cells numbered i and j, where 1 ≤ i ≤ n and 1 ≤ j ≤ n; acquire the width and height of the table image, as well as an adjustment parameter; acquire a first ratio of the difference between the abscissas of the center points of the predicted cells numbered i and j to the width, and determine the value of the row dimension of the element at row i, column j of the adjacency matrix based on the product of the first ratio and the adjustment parameter; and acquire a second ratio of the difference between the ordinates of the center points of the predicted cells numbered i and j to the height, and determine the value of the column dimension of the element at row i, column j of the adjacency matrix based on the product of the second ratio and the adjustment parameter.
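The adjacency-matrix construction above can be sketched as follows. The signed center-point difference and the value of the adjustment parameter are assumptions (the text does not state whether the difference is signed), and cell numbering is 0-based here for convenience:

```python
def build_adjacency(centers, img_w, img_h, adjust=1.0):
    """Two-channel adjacency matrix: A[i][j] holds
    (row-dim, col-dim) = (adjust * (x_i - x_j) / img_w,
                          adjust * (y_i - y_j) / img_h),
    where centers[i] = (x, y) is the center point of cell i."""
    n = len(centers)
    return [[(adjust * (centers[i][0] - centers[j][0]) / img_w,
              adjust * (centers[i][1] - centers[j][1]) / img_h)
             for j in range(n)] for i in range(n)]

centers = [(10, 20), (30, 20)]
A = build_adjacency(centers, img_w=100, img_h=50)
print(A[0][1])  # (-0.2, 0.0)
```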
In an embodiment of the present disclosure, the third acquisition module 130 includes: an acquisition unit, configured to obtain the node feature of any predicted cell according to the first position of that predicted cell; and a fusion unit, configured to input the node features and the adjacency matrix into a graph convolutional network (GCN), which fuses the node features with the adjacency matrix to generate the fusion node feature of each predicted cell.
In an embodiment of the present disclosure, the acquisition unit includes: a mapping subunit, configured to linearly map the first position of any predicted cell to obtain the spatial feature of that predicted cell; an extraction subunit, configured to extract the visual semantic feature of the predicted cell from the table image based on its first position; and a splicing subunit, configured to splice the spatial feature and the visual semantic feature of the predicted cell to obtain the node feature of the predicted cell.
In an embodiment of the present disclosure, the extraction subunit is further configured to: determine, based on the first position of any predicted cell, the target pixels contained in that predicted cell from the pixels of the table image; and extract the visual semantic features of the target pixels from the table image as the visual semantic feature of the predicted cell.
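The node-feature construction described by the mapping and splicing subunits can be sketched as a linear projection of the first position followed by concatenation with the cell's visual semantic feature; the projection matrix and all concrete values below are placeholders, not learned parameters from the disclosure:

```python
import numpy as np

def node_feature(first_pos, visual_feat, proj):
    """Node feature of one predicted cell: linearly map its first position
    (cx, cy, w, h) to a spatial feature, then splice it with the visual
    semantic feature pooled from the cell's target pixels."""
    spatial = np.asarray(first_pos) @ proj         # linear mapping
    return np.concatenate([spatial, visual_feat])  # splice the two parts

pos = [0.5, 0.4, 0.2, 0.1]     # normalized cx, cy, width, height
proj = np.full((4, 3), 0.25)   # placeholder projection weights
vis = np.array([1.0, 2.0])     # placeholder pooled visual feature
feat = node_feature(pos, vis, proj)
print(feat.shape)  # (5,)
```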
In an embodiment of the present disclosure, the fourth acquisition module 140 is further configured to: obtain, based on the fusion node feature of any predicted cell, the prediction probability of that predicted cell at each candidate second position; obtain the maximum prediction probability among the prediction probabilities of the predicted cell at the candidate second positions; and determine the candidate second position corresponding to the maximum prediction probability as the second position of the predicted cell.
In an embodiment of the present disclosure, the fourth acquisition module 140 is further configured to: establish, for any predicted cell, a target vector including n dimensions, where n is the number of candidate second positions of the predicted cell; obtain, based on the fusion node feature of the predicted cell, the prediction probability of each vector dimension of the target vector taking the value 0 or 1; obtain the maximum prediction probability between the prediction probabilities of a vector dimension taking the value 0 and taking the value 1, and determine the value corresponding to the maximum prediction probability as the target value of that vector dimension; and obtain the second position of the predicted cell based on the sum of the target values of the vector dimensions.
In an embodiment of the present disclosure, the first acquisition module 110 is further configured to extract the detection box of each predicted cell from the table image, and obtain the first position of the predicted cell based on the detection box.
In an embodiment of the present disclosure, the second position includes at least one of the number of the starting row, the number of the ending row, the number of the starting column, and the number of the ending column of the predicted cell.
The cell position detection apparatus of the embodiments of the present disclosure can take the predicted cells as nodes, obtain the adjacency matrix based on the first positions of the predicted cells, and then obtain the fusion node feature of each predicted cell according to the first positions and the adjacency matrix, so that the fusion node feature matches both the first position of the predicted cell and the positional relationships between predicted cells, giving a better representation. The second position of the predicted cell is then obtained according to the fusion node feature, so that the first position and the second position of a cell can be obtained at the same time, and the obtained cell position is more comprehensive and more robust.
To implement the above embodiments, as shown in FIG. 9, an embodiment of the present disclosure further provides an electronic device 200, including a memory 210, a processor 220, and a computer program stored in the memory 210 and executable on the processor 220. When the processor 220 executes the program, the cell position detection method proposed in the foregoing embodiments of the present disclosure is implemented.
By executing, via the processor, the computer program stored in the memory, the electronic device of the embodiments of the present disclosure can take the predicted cells as nodes, obtain the adjacency matrix based on the first positions of the predicted cells, and then obtain the fusion node feature of each predicted cell according to the first positions and the adjacency matrix, so that the fusion node feature matches both the first position of the predicted cell and the positional relationships between predicted cells, giving a better representation. The second position of the predicted cell is then obtained according to the fusion node feature, so that the first position and the second position of a cell can be obtained at the same time, and the obtained cell position is more comprehensive and more robust.
To implement the above embodiments, an embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, the cell position detection method proposed in the foregoing embodiments of the present disclosure is implemented.
In an embodiment of the present disclosure, the computer-readable storage medium is a non-transitory computer-readable storage medium.
By storing a computer program that is executed by a processor, the computer-readable storage medium of the embodiments of the present disclosure can take the predicted cells as nodes, obtain the adjacency matrix based on the first positions of the predicted cells, and then obtain the fusion node feature of each predicted cell according to the first positions and the adjacency matrix, so that the fusion node feature matches both the first position of the predicted cell and the positional relationships between predicted cells, giving a better representation. The second position is then obtained according to the fusion node feature, so that the first position and the second position of a cell can be obtained at the same time, and the obtained cell position is more comprehensive and more robust.
To implement the above embodiments, an embodiment of the present disclosure further provides a computer program product including computer program code. When the computer program code runs on a computer, the cell position detection method proposed in the foregoing embodiments of the present disclosure is implemented.
When the computer program code of the computer program product of the embodiments of the present disclosure runs on a computer, the predicted cells can be taken as nodes, the adjacency matrix can be obtained based on the first positions of the predicted cells, and the fusion node feature of each predicted cell can then be obtained according to the first positions and the adjacency matrix, so that the fusion node feature matches both the first position of the predicted cell and the positional relationships between predicted cells, giving a better representation. The second position is obtained according to the fusion node feature, so that the first position and the second position of a cell can be obtained at the same time, and the obtained cell position is more comprehensive and more robust.
To implement the above embodiments, an embodiment of the present disclosure further provides a computer program including computer program code. When the computer program code runs on a computer, the computer is caused to execute the cell position detection method proposed in the foregoing embodiments of the present disclosure.
When the computer program code of the computer program of the embodiments of the present disclosure runs on a computer, the computer can take the predicted cells as nodes, obtain the adjacency matrix based on the first positions of the predicted cells, and then obtain the fusion node feature of each predicted cell according to the first positions and the adjacency matrix, so that the fusion node feature matches both the first position of the predicted cell and the positional relationships between predicted cells, giving a better representation. The second position is obtained according to the fusion node feature, so that the first position and the second position of a cell can be obtained at the same time, and the obtained cell position is more comprehensive and more robust.
It should be noted that the foregoing explanations of the embodiments of the cell position detection method are also applicable to the computer device, the computer-readable storage medium, the computer program product, and the computer program in the above embodiments, and are not repeated here.
In the description of this specification, descriptions with reference to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" mean that a specific feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the described specific features, structures, materials, or characteristics may be combined in a suitable manner in any one or more embodiments or examples. In addition, where there is no mutual contradiction, those skilled in the art may combine the different embodiments or examples described in this specification, as well as the features of different embodiments or examples.
In addition, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly specifying the number of the indicated technical features. Thus, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present disclosure, "a plurality of" means at least two, for example two or three, unless otherwise specifically defined.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code including one or more executable instructions for implementing the steps of a custom logical function or process. The scope of the preferred embodiments of the present disclosure includes additional implementations in which functions may be executed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order according to the functions involved, which should be understood by those skilled in the art to which the embodiments of the present disclosure belong.
The logic and/or steps represented in the flowcharts or otherwise described herein may, for example, be considered a sequenced listing of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in conjunction with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transmit a program for use by, or in conjunction with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (an electronic device) with one or more wires, a portable computer diskette (a magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a fiber-optic device, and a portable compact disc read-only memory (CD-ROM).
In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it as necessary, and then stored in a computer memory.
It should be understood that various parts of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented by software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented by any one of the following techniques known in the art, or a combination thereof: a discrete logic circuit having logic gates for implementing logical functions on data signals, an application-specific integrated circuit with suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or part of the steps carried by the methods of the above embodiments can be completed by a program instructing the relevant hardware; the program can be stored in a computer-readable storage medium and, when executed, includes one of the steps of the method embodiments or a combination thereof.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing module, or each unit may exist physically on its own, or two or more units may be integrated into one module. The above integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like. Although embodiments of the present disclosure have been shown and described above, it should be understood that the above embodiments are exemplary and shall not be construed as limiting the present disclosure; those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present disclosure.
All the embodiments of the present disclosure may be implemented independently or in combination with other embodiments, all of which fall within the scope of protection claimed by the present disclosure.

Claims (16)

  1. A method for detecting a cell position, comprising:
    acquiring a first position of each predicted cell in a table image, wherein the first position characterizes the position, in the table image, of the region occupied by the predicted cell;
    obtaining an adjacency matrix of the table image according to the first positions, wherein each predicted cell in the table image is a node and the adjacency matrix represents the positional relationships between the predicted cells;
    obtaining a fused node feature of any one of the predicted cells according to the first position of the predicted cell and the adjacency matrix; and
    obtaining a second position of the predicted cell according to its fused node feature, wherein the second position characterizes the row and/or column to which the predicted cell belongs.
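The four steps of claim 1 form a pipeline: detected cell boxes → adjacency matrix → graph-fused node features → logical row/column positions. A minimal end-to-end sketch (the difference-based adjacency, the mean aggregation standing in for the learned fusion, and the rank-based readout are illustrative assumptions, not the claimed model):

```python
import numpy as np

def detect_logical_positions(boxes, img_w, img_h):
    """End-to-end sketch of the four steps of claim 1.

    boxes: (n, 4) array of first positions (cx, cy, w, h), one per
    predicted cell. Returns a crude logical column index per cell as a
    stand-in for the learned second-position prediction.
    """
    boxes = np.asarray(boxes, dtype=float)
    cx = boxes[:, 0]
    # Step 2: adjacency matrix from normalized pairwise center offsets.
    adj = (cx[:, None] - cx[None, :]) / img_w
    # Step 3: "fuse" each node's own feature with its neighborhood
    # (mean aggregation standing in for the GCN).
    fused = cx / img_w + adj.mean(axis=1)
    # Step 4: read a logical column index (a "second position") off the
    # fused features -- here simply the cell's rank from left to right.
    return np.argsort(np.argsort(fused))
```

With three cells at x-centers 10, 50, and 90 in a 100-pixel-wide image, the sketch assigns column indices 0, 1, 2 in left-to-right order.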
  2. The method according to claim 1, wherein the first position comprises at least one of the two-dimensional coordinates of the center point of the predicted cell, the width of the predicted cell, and the height of the predicted cell.
  3. The method according to claim 1 or 2, wherein obtaining the adjacency matrix of the table image according to the first positions comprises:
    determining the values of the corresponding elements in the adjacency matrix based on the first positions and the numbers of the predicted cells.
  4. The method according to claim 3, wherein determining the values of the corresponding elements in the adjacency matrix based on the first positions and the numbers of the predicted cells comprises:
    acquiring the number n of predicted cells and numbering the predicted cells consecutively from 1 to n, where n is an integer greater than 1;
    extracting, from the first positions, the abscissas and ordinates of the center points of the predicted cells numbered i and j, where 1 ≤ i ≤ n and 1 ≤ j ≤ n;
    acquiring the width and height of the table image, as well as an adjustment parameter;
    acquiring a first ratio of the difference between the abscissas of the center points of the predicted cells numbered i and j to the width, and determining the value of the row dimension of the element in row i, column j of the adjacency matrix based on the product of the first ratio and the adjustment parameter; and
    acquiring a second ratio of the difference between the ordinates of the center points of the predicted cells numbered i and j to the height, and determining the value of the column dimension of the element in row i, column j of the adjacency matrix based on the product of the second ratio and the adjustment parameter.
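Claim 4 fully specifies the adjacency computation: the element at row i, column j holds, per dimension, the adjustment parameter times the normalized center-point offset between cells i and j. A sketch, under the assumption that the row and column dimensions are stored as two channels of one tensor:

```python
import numpy as np

def build_adjacency(centers, img_w, img_h, adjust=1.0):
    """Adjacency tensor of claim 4.

    centers: (n, 2) array of predicted-cell center points (x, y).
    Element [i, j] holds adjust * (x_i - x_j) / img_w in its row
    dimension and adjust * (y_i - y_j) / img_h in its column dimension.
    """
    centers = np.asarray(centers, dtype=float)
    dx = centers[:, 0][:, None] - centers[:, 0][None, :]  # pairwise x offsets
    dy = centers[:, 1][:, None] - centers[:, 1][None, :]  # pairwise y offsets
    row_dim = adjust * dx / img_w
    col_dim = adjust * dy / img_h
    return np.stack([row_dim, col_dim], axis=-1)  # shape (n, n, 2)
```

The broadcasting trick computes all pairwise differences at once; the diagonal is zero because each cell has no offset from itself.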
  5. The method according to any one of claims 1-4, wherein obtaining the fused node feature of any one of the predicted cells according to the first position of the predicted cell and the adjacency matrix comprises:
    obtaining a node feature of the predicted cell according to its first position; and
    inputting the node features and the adjacency matrix into a graph convolutional network (GCN), which fuses the node features with the adjacency matrix to generate the fused node feature of the predicted cell.
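Claim 5 names a graph convolutional network without fixing the variant. One layer of the standard GCN propagation H' = ReLU(Â X W) is sketched below with symmetric degree normalization; the normalization scheme and the ReLU activation are assumptions, and the claimed network may differ (in particular, the two-channel signed adjacency of claim 4 would need adapting before this normalization applies):

```python
import numpy as np

def gcn_layer(node_feats, adj, weight):
    """One graph-convolution step fusing node features with an adjacency matrix.

    node_feats: (n, d) node features; adj: (n, n) adjacency; weight: (d, d') weights.
    """
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))      # D^{-1/2}
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt      # symmetric normalization
    return np.maximum(a_norm @ node_feats @ weight, 0.0)  # ReLU
```

Stacking two or three such layers lets each cell's feature absorb information from increasingly distant neighbors in the table graph.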
  6. The method according to claim 5, wherein obtaining the node feature of the predicted cell according to its first position comprises:
    performing a linear mapping on the first position of the predicted cell to obtain a spatial feature of the predicted cell;
    extracting, based on the first position of the predicted cell, a visual-semantic feature of the predicted cell from the table image; and
    concatenating the spatial feature and the visual-semantic feature of the predicted cell to obtain its node feature.
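The node-feature construction of claim 6 can be sketched as a linear projection of the box geometry concatenated with a visual feature vector. The projection size and the source of `visual_feat` (e.g. pooling over the cell's pixels, as in claim 7) are assumptions:

```python
import numpy as np

def node_feature(first_pos, visual_feat, w, b):
    """Claim 6 sketch: linearly map the cell's first position
    (cx, cy, width, height) to a spatial feature, then concatenate it
    with the cell's visual-semantic feature to form the node feature."""
    spatial = np.asarray(first_pos) @ w + b        # linear mapping of the box
    return np.concatenate([spatial, visual_feat])  # spatial ++ visual-semantic
```

In practice `w` and `b` would be learned jointly with the GCN, and `visual_feat` would come from a CNN backbone over the table image.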
  7. The method according to claim 6, wherein extracting the visual-semantic feature of the predicted cell from the table image based on its first position comprises:
    determining, based on the first position of the predicted cell, the target pixels contained in the predicted cell from among the pixels contained in the table image; and
    extracting the visual-semantic features of the target pixels from the table image as the visual-semantic feature of the predicted cell.
  8. The method according to any one of claims 1-7, wherein obtaining the second position of the predicted cell according to its fused node feature comprises:
    obtaining, based on the fused node feature of the predicted cell, a prediction probability of the predicted cell for each candidate second position; and
    taking the maximum of the prediction probabilities over the candidate second positions, and determining the candidate second position corresponding to the maximum prediction probability as the second position of the predicted cell.
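Claim 8 describes ordinary single-label classification over the candidate second positions: score each candidate, normalize, and take the argmax. A sketch (the softmax normalization is an assumption; the claim only requires a probability per candidate):

```python
import numpy as np

def second_position_argmax(logits):
    """Claim 8 sketch: softmax the per-candidate scores into prediction
    probabilities and pick the candidate second position (e.g. a row or
    column index) with the maximum probability."""
    probs = np.exp(logits - logits.max())  # subtract max for numerical stability
    probs /= probs.sum()
    return int(np.argmax(probs)), float(probs.max())
```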
  9. The method according to any one of claims 1-8, wherein obtaining the second position of the predicted cell according to its fused node feature comprises:
    establishing, for the predicted cell, a target vector with n dimensions, where n is the number of candidate second positions of the predicted cell;
    obtaining, based on the fused node feature of the predicted cell, a prediction probability that each vector dimension of the target vector takes the value 0 or 1;
    taking, for each vector dimension, the maximum of the prediction probabilities over the values 0 and 1, and determining the value corresponding to the maximum prediction probability as the target value of that vector dimension; and
    obtaining the second position of the predicted cell based on the sum of the target values of the vector dimensions.
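Claim 9 instead predicts a binary target vector and derives the second position from the sum of its entries; per dimension, taking the more probable of the values 0 and 1 reduces to thresholding the probability of 1 at 0.5. A sketch (reading the sum directly as the second position is an assumption about how the sum maps to a row/column index):

```python
import numpy as np

def second_position_count(p_one):
    """Claim 9 sketch: p_one[k] is the predicted probability that dimension
    k of the n-dimensional target vector equals 1. Each dimension takes its
    more probable value, and the second position is derived from the sum of
    the resulting target values."""
    target = (np.asarray(p_one) >= 0.5).astype(int)  # per-dimension argmax over {0, 1}
    return int(target.sum())
```

Compared with the argmax formulation of claim 8, this counting scheme tolerates tables whose number of rows or columns exceeds the number of classes seen at training time.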
  10. The method according to any one of claims 1-9, wherein acquiring the first position of each predicted cell in the table image comprises:
    extracting a detection box of each predicted cell from the table image, and acquiring the first position of the predicted cell based on the detection box.
  11. The method according to any one of claims 1-10, wherein the second position comprises at least one of the number of the starting row, the number of the ending row, the number of the starting column, and the number of the ending column of the predicted cell.
  12. An apparatus for detecting a cell position, comprising:
    a first acquisition module configured to acquire a first position of each predicted cell in a table image, wherein the first position characterizes the position, in the table image, of the region occupied by the predicted cell;
    a second acquisition module configured to obtain an adjacency matrix of the table image according to the first positions, wherein each predicted cell in the table image is a node and the adjacency matrix represents the positional relationships between the predicted cells;
    a third acquisition module configured to obtain a fused node feature of any one of the predicted cells according to the first position of the predicted cell and the adjacency matrix; and
    a fourth acquisition module configured to obtain a second position of the predicted cell according to its fused node feature, wherein the second position characterizes the row and/or column to which the predicted cell belongs.
  13. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method for detecting a cell position according to any one of claims 1-11.
  14. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method for detecting a cell position according to any one of claims 1-11.
  15. A computer program product comprising computer program code which, when run on a computer, implements the method according to any one of claims 1-11.
  16. A computer program comprising computer program code which, when run on a computer, causes the computer to perform the method according to any one of claims 1-11.
PCT/CN2022/092571 2021-07-08 2022-05-12 Cell position detection method and apparatus, and electronic device WO2023279847A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110772902.7 2021-07-08
CN202110772902.7A CN113378789B (en) 2021-07-08 2021-07-08 Cell position detection method and device and electronic equipment

Publications (1)

Publication Number Publication Date
WO2023279847A1 (en)

Family

ID=77581423

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/092571 WO2023279847A1 (en) 2021-07-08 2022-05-12 Cell position detection method and apparatus, and electronic device

Country Status (2)

Country Link
CN (1) CN113378789B (en)
WO (1) WO2023279847A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116071771A (en) * 2023-03-24 2023-05-05 南京燧坤智能科技有限公司 Table reconstruction method and device, nonvolatile storage medium and electronic equipment
CN116503888A (en) * 2023-06-29 2023-07-28 杭州同花顺数据开发有限公司 Method, system and storage medium for extracting form from image

Families Citing this family (2)

Publication number Priority date Publication date Assignee Title
CN113378789B (en) * 2021-07-08 2023-09-26 京东科技信息技术有限公司 Cell position detection method and device and electronic equipment
CN114639107B (en) * 2022-04-21 2023-03-24 北京百度网讯科技有限公司 Table image processing method, apparatus and storage medium

Citations (7)

Publication number Priority date Publication date Assignee Title
US20120260152A1 (en) * 2011-03-01 2012-10-11 Ubiquitous Entertainment Inc. Spreadsheet control program, spreadsheet control apparatus and spreadsheet control method
CN110705213A (en) * 2019-08-23 2020-01-17 平安科技(深圳)有限公司 PDF (Portable document Format) table extraction method and device, terminal and computer readable storage medium
CN110751038A (en) * 2019-09-17 2020-02-04 北京理工大学 PDF table structure identification method based on graph attention machine mechanism
CN111492370A (en) * 2020-03-19 2020-08-04 香港应用科技研究院有限公司 Device and method for recognizing text images of a structured layout
CN111639637A (en) * 2020-05-29 2020-09-08 北京百度网讯科技有限公司 Table identification method and device, electronic equipment and storage medium
CN112200117A (en) * 2020-10-22 2021-01-08 长城计算机软件与系统有限公司 Form identification method and device
CN113378789A (en) * 2021-07-08 2021-09-10 京东数科海益信息科技有限公司 Cell position detection method and device and electronic equipment

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
CN109961008A (en) * 2019-02-13 2019-07-02 平安科技(深圳)有限公司 Form analysis method, medium and computer equipment based on text location identification
CN109934226A (en) * 2019-03-13 2019-06-25 厦门美图之家科技有限公司 Key area determines method, apparatus and computer readable storage medium
CN109948507B (en) * 2019-03-14 2021-05-07 北京百度网讯科技有限公司 Method and device for detecting table
CN112100426A (en) * 2020-09-22 2020-12-18 哈尔滨工业大学(深圳) Method and system for searching general table information based on visual and text characteristics
CN112668566A (en) * 2020-12-23 2021-04-16 深圳壹账通智能科技有限公司 Form processing method and device, electronic equipment and storage medium



Also Published As

Publication number Publication date
CN113378789B (en) 2023-09-26
CN113378789A (en) 2021-09-10


Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE