CN111611990A - Method and device for identifying table in image


Info

Publication number: CN111611990A
Authority: CN (China)
Prior art keywords: field, vector, field value, semantic, name
Prior art date: 2020-05-22
Legal status: Granted
Application number: CN202010444345.1A
Other languages: Chinese (zh)
Other versions: CN111611990B (en)
Inventors: 黄相凯, 李乔伊, 刘明浩, 秦铎浩, 郭江亮
Current Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (filing date): 2020-05-22
2020-05-22: Application filed by Beijing Baidu Netcom Science and Technology Co Ltd; priority to CN202010444345.1A (granted as CN111611990B)
2020-09-01: Publication of CN111611990A
2023-10-31: Application granted; publication of CN111611990B
Legal status: Active

Classifications

    • G06V 10/22: Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition
    • G06F 18/24: Pattern recognition; classification techniques
    • G06N 3/045: Neural network architectures; combinations of networks
    • G06N 3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/08: Neural network learning methods
    • G06V 10/267: Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application disclose a method and a device for identifying a table in an image, which can be applied to the technical field of image processing. The specific implementation scheme is as follows: a picture to be processed is acquired; the field names and field values included in the picture are identified; a semantic vector of each field name and a semantic vector of each field value are obtained; the matching relationship between the field names and the field values is determined based on these semantic vectors and a pre-trained matching model; and a table is generated according to the matching relationship between the field names and the field values. This improves the efficiency of identifying tables in images.

Description

Method and device for identifying table in image
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to the technical field of image processing.
Background
Tables are a very common document form in daily work, but in many scenarios a table exists only in the form of an image, and how to convert a table in picture form into a format that can be stored in a structured manner has become an urgent problem.
The traditional approach to structured storage of table images is mostly manual entry: the information in the image is typed into a data system by visual comparison, which consumes a large amount of manpower on highly repetitive work. With the development of Optical Character Recognition (OCR) technology, image-to-text conversion has matured, but OCR alone cannot determine the correspondence between field names and field values.
Disclosure of Invention
The embodiment of the application provides a method, a device, equipment and a storage medium for identifying a table in an image.
In a first aspect, some embodiments of the present application provide a method for identifying a table in an image, the method comprising: acquiring a picture to be processed; identifying field names and field values included in the picture to be processed; acquiring a semantic vector of a field name and a semantic vector of a field value; determining a matching relation between the field names and the field values based on the semantic vectors of the field names and the semantic vectors of the field values and a pre-trained matching model; and generating a table according to the matching relation between the field names and the field values.
In a second aspect, some embodiments of the present application provide an apparatus for identifying a table in an image, the apparatus comprising: a first acquisition unit configured to acquire a picture to be processed; an identifying unit configured to identify a field name and a field value included in the picture to be processed; a second acquisition unit configured to acquire a semantic vector of a field name and a semantic vector of a field value; a determining unit configured to determine a matching relationship of the field name and the field value based on a semantic vector of the field name and a semantic vector of the field value and a pre-trained matching model; and a generating unit configured to generate a table according to a matching relationship of the field name and the field value.
In a third aspect, some embodiments of the present application provide an apparatus comprising: one or more processors; a storage device, on which one or more programs are stored, which, when executed by the one or more processors, cause the one or more processors to implement the method as described above in the first aspect.
In a fourth aspect, some embodiments of the present application provide a computer readable medium having stored thereon a computer program which, when executed by a processor, implements the method as described above in the first aspect.
According to the technology of the application, the efficiency of identifying the table in the image is improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a diagram of an exemplary system architecture to which some embodiments of the present application may be applied;
FIG. 2 is a schematic diagram according to a first embodiment of the present application;
FIG. 3 is a schematic diagram of a to-be-processed picture in an embodiment of the present application;
FIG. 4 is a schematic illustration according to a second embodiment of the present application;
FIG. 5 is a schematic illustration according to a third embodiment of the present application;
FIG. 6 is a schematic structural diagram of an electronic device suitable for implementing the method for identifying a table in an image according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of those embodiments to aid understanding; these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted below for clarity and conciseness. It should be noted that, in the absence of conflict, the embodiments of the present application and the features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the present method for identifying a table in an image or an apparatus for identifying a table in an image may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various client applications, such as a text recognition type application, a social type application, a search type application, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be hardware or software. When they are hardware, they may be various electronic devices with display screens, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like. When they are software, they may be installed in the electronic devices listed above and implemented as multiple pieces of software or software modules, or as a single piece of software or software module; no particular limitation is imposed here.
The server 105 may be a server providing various services, for example, a background server providing support for applications installed on the terminal devices 101, 102, and 103, and the server 105 may obtain to-be-processed pictures uploaded by the terminal devices 101, 102, and 103; identifying field names and field values included in the picture to be processed; acquiring a semantic vector of a field name and a semantic vector of a field value; determining a matching relation between the field names and the field values based on the semantic vectors of the field names and the semantic vectors of the field values and a pre-trained matching model; and generating a table according to the matching relation between the field names and the field values.
The method for identifying the table in the image provided in the embodiment of the present application may be executed by the server 105, or may be executed by the terminal devices 101, 102, and 103, and accordingly, the apparatus for identifying the table in the image may be disposed in the server 105, or may be disposed in the terminal devices 101, 102, and 103.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed cluster of multiple servers or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module; no particular limitation is imposed here.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for identifying a table in an image according to the present application is shown. The method for identifying the table in the image comprises the following steps:
step 201, a picture to be processed is obtained.
In this embodiment, the execution subject of the method for identifying a table in an image (for example, the terminal or server shown in FIG. 1) may acquire a picture to be processed. The picture contains an image of the table to be identified, may be obtained by scanning or photographing, and may originate from a medical institution, a financial institution, or another type of institution. In addition, the execution subject may apply preprocessing operations to the picture, such as camera-distortion removal, cropping, and rotation, to facilitate subsequent identification.
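For illustration only (this is not part of the original disclosure), a minimal sketch of such preprocessing in Python, assuming OpenCV and a rotation angle estimated elsewhere, might look as follows:

    import cv2

    def preprocess(path, angle=0.0):
        # Load the picture to be processed and convert it to grayscale.
        img = cv2.imread(path)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        # Correct a known rotation; the angle is assumed to have been
        # estimated elsewhere (e.g., from detected text baselines).
        h, w = gray.shape
        m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
        return cv2.warpAffine(gray, m, (w, h))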
Step 202, identifying field names and field values included in the picture to be processed.
In this embodiment, the execution subject may identify the field names and field values included in the picture to be processed acquired in step 201. A table includes at least field names and field values: the field names represent certain fixed attributes, and the field values represent the content corresponding to those attributes. Taking FIG. 3 as an example, fixed attributes such as "item category", "item name", "quantity", "unit price", "amount", and "category" are field names, and the content corresponding to them constitutes the field values, such as "contrast catheter (210 in taylon japan)" corresponding to "item name" and "3.00" corresponding to "quantity".
Here, the execution subject may identify the field names and field values included in the picture to be processed using OCR technology or another image detection method based on convolutional neural networks, such as the Region-based Convolutional Neural Network (R-CNN) algorithm or the Faster R-CNN algorithm.
Optical character recognition refers to the process of analyzing and recognizing an image file containing text in order to obtain the characters and layout information; that is, the characters in the image are recognized and returned in the form of text. A typical OCR solution can be divided into two parts: text detection and text recognition. Text detection locates the position, extent, and layout of the text in the image, and usually includes layout analysis, text-line detection, and the like; it mainly determines where text appears in the image and how large its extent is. Text recognition then recognizes the text content on the basis of text detection and converts the text in the image into textual information; it mainly determines what each detected character is.
The Faster R-CNN algorithm uses a Region Proposal Network (RPN) to assist in generating candidate samples. Its structure is divided into two parts: the RPN judges whether a candidate box is a target box, and the type of the target box is then judged by a multi-task loss function for classification and localization. The whole pipeline can share the features extracted by the convolutional neural network, which saves computation. For text detection in constrained scenarios, Faster R-CNN performs very well, and text regions of different granularities can be determined through multiple detections.
In addition, the execution subject may segment the text within a line using the text extents obtained by text detection, or may segment the text within a line using a pre-trained semantic model, for example a pre-trained Long Short-Term Memory network (LSTM) or Bi-directional Long Short-Term Memory network (BiLSTM).
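For illustration only, one way to obtain text fragments and their bounding boxes is an off-the-shelf OCR engine. The sketch below assumes pytesseract, which is not named in the text; deciding which fragments are field names and which are field values would be a subsequent step.

    import pytesseract
    from pytesseract import Output

    def detect_text_fragments(image):
        # Run OCR and collect each non-empty fragment with its bounding box.
        data = pytesseract.image_to_data(image, output_type=Output.DICT)
        fragments = []
        for text, x, y, w, h in zip(data["text"], data["left"], data["top"],
                                    data["width"], data["height"]):
            if text.strip():
                fragments.append({"text": text, "box": (x, y, x + w, y + h)})
        return fragments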
Step 203, the semantic vector of the field name and the semantic vector of the field value are obtained.
In this embodiment, the execution subject may obtain the semantic vectors of the field names and field values identified in step 202. The semantic vector of a field name and the semantic vector of a field value can be determined by a bag-of-words model, a Word2Vec (word-to-vector) model, a topic model, or the like.
In some optional implementations of this embodiment, the semantic vector of the field value includes a semantic vector determined as follows: the field value is input into a pre-trained encoding network to obtain a semantic code for each individual character in the field value, and the semantic codes of the individual characters are then fused to obtain the semantic vector of the field value. Likewise, the semantic vector of a field name may be determined in the same way. Because a table may contain words that are not included in a dictionary, compared with obtaining semantic vectors by dictionary lookup, obtaining the semantic vector of a field value by fusing the semantic codes of its individual characters yields a vector that better fits the actual semantics of each field value, and on that basis the matching model can determine a more accurate matching relationship.
As an example, the encoding network may include a forward LSTM and a backward LSTM. Their outputs are concatenated to obtain a contextual representation of each character, and the semantic vector of the field value is then obtained by max pooling, average pooling, or the like. Specifically, for the current character t, the representation vectors of the first t-1 characters serve as the preceding context of t and are encoded by the forward LSTM, yielding the forward hidden-layer output at time t; similarly, the following context of t is encoded by the backward LSTM, yielding the backward hidden-layer output at time t. The two hidden-layer outputs at time t are concatenated at the vector level to obtain the bidirectional LSTM representation at time t, and the semantic codes of the individual characters in the field value are then fused by a max-pooling operation to obtain the semantic vector of the field value.
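A minimal sketch of such an encoding network in Python with PyTorch (an implementation choice, not the patent's reference implementation; the hyperparameters are illustrative assumptions):

    import torch
    import torch.nn as nn

    class CharBiLSTMEncoder(nn.Module):
        def __init__(self, vocab_size, emb_dim=64, hidden=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                                  bidirectional=True)

        def forward(self, char_ids):
            # char_ids: (batch, seq_len) character indices of a field value.
            # Each step of the BiLSTM output concatenates the forward and
            # backward hidden states, i.e., the per-character semantic code.
            codes, _ = self.bilstm(self.embed(char_ids))  # (batch, seq, 2*hidden)
            # Fuse the per-character codes by max pooling over the sequence.
            semantic_vector, _ = codes.max(dim=1)         # (batch, 2*hidden)
            return semantic_vector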
Step 204, the matching relationship between the field names and the field values is determined based on the semantic vectors of the field names and the semantic vectors of the field values and a pre-trained matching model.
In this embodiment, the execution subject may determine the matching relationship between a field name and a field value based on their semantic vectors and a pre-trained matching model. The input of the matching model may be generated from the semantic vector of the field name, the semantic vector of the field value, and other related vectors: for example, the two semantic vectors may be used directly as input vectors of the matching model, they may be concatenated into a single input vector, or they may be concatenated together with other related vectors to form the input vector.
Here, the matching model may represent the correspondence between the input vector and the matching relationship of the field name and the field value. The matching model may be obtained by training an initial matching model on samples. It may also be a correspondence table, preset by technicians based on statistics over a large number of input parameter values and matching results, that stores the correspondence between multiple input parameter values and matching results; or a calculation formula, preset by technicians based on statistics over a large amount of data and stored in the electronic device, that performs a numerical calculation on one or more input parameter values to obtain a result representing the matching result. For example, the formula may compute a weighted average of the input parameter values, and a result greater than a predetermined value indicates a match.
The matching model may include models used for classification such as logistic regression, random forest, iterative decision trees, and support vector machines, and may also include a fully-connected network together with functions such as softmax logistic regression or argmax.
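As a sketch of one such classifier (a fully-connected network followed by softmax, written in Python with PyTorch; the dimensions are assumptions, and this is not the patent's reference implementation):

    import torch.nn as nn

    class MatchingModel(nn.Module):
        def __init__(self, input_dim, hidden=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(input_dim, hidden),
                nn.ReLU(),
                nn.Linear(hidden, 2),  # two classes: no-match / match
            )

        def forward(self, input_vector):
            # Softmax yields class probabilities; argmax picks the decision.
            probs = self.net(input_vector).softmax(dim=-1)
            return probs.argmax(dim=-1)  # 1 indicates that the pair matches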
Step 205, a table is generated according to the matching relationship between the field names and the field values.
In this embodiment, the execution subject may generate a table according to the matching relationship between the field names and the field values. If a field name matches a field value, the field value can be filled in under that field name; if it does not match, whether other field names match the field value may continue to be determined. In addition, the execution subject may determine which field values are located in the same row according to the position information of the field values or by referring to the direction of the text.
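A toy sketch of this assembly step in Python (grouping matched pairs into rows by vertical position; the row tolerance is an illustrative assumption):

    from collections import defaultdict

    def build_table(matched_pairs, row_tolerance=10):
        # matched_pairs: (field_name, field_value, y) triples, where y is
        # the vertical coordinate of the field value's region.
        rows = defaultdict(dict)
        for name, value, y in matched_pairs:
            rows[round(y / row_tolerance)][name] = value  # close y -> same row
        return [rows[key] for key in sorted(rows)]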
In the process 200 of the method for identifying a table in an image in this embodiment, the matching relationship between field names and field values is determined from their semantic vectors by a pre-trained matching model, and the table is then generated automatically according to that matching relationship, which avoids manual entry and improves the efficiency of identifying tables in images.
With further reference to FIG. 4, a flow 400 of yet another embodiment of a method for identifying a table in an image is shown. The process 400 of the method for identifying a form in an image includes the steps of:
step 401, acquiring a picture to be processed.
Step 402, identifying field names and field values included in the picture to be processed.
Step 403, the semantic vector of the field name and the semantic vector of the field value are acquired.
Step 404, the location information of the field name and the location information of the field value are acquired.
In this embodiment, the execution subject of the method for identifying a table in an image (for example, the terminal or server shown in FIG. 1) may acquire the location information of the field name and of the field value through OCR or another character detection algorithm, such as Faster R-CNN. The location information may indicate the locations of the field name and the field value. It may include the coordinates, in the picture to be processed, of key points of the areas where the field name and the field value are located, or information on the boundary lines of those areas. The key points may include center points, boundary points, and the like; taking a rectangular area as an example, the key points may include the upper-left, lower-left, upper-right, and lower-right corner points.
Step 405, generating a distance vector of the field name and the field value according to the position information of the field name and the position information of the field value.
In the present embodiment, the execution body described above may generate a distance vector of a field name and a field value from position information of the field name and position information of the field value. The distance vector may characterize the distance between the field name and the field value, and the dimension of the distance vector may directly include the distance between the field name and the field value, or other information that may characterize the distance between the field name and the field value, such as the difference between the horizontal and vertical coordinates.
In some optional implementations of the present embodiment, generating a distance vector of the field name and the field value according to the location information of the field name and the location information of the field value includes: and generating a distance vector of the field name and the field value according to the difference value of the coordinates of the key point of the area where the field name is located and the coordinates of the key point of the area where the field value is located in the preset direction. The predetermined direction may include an abscissa direction and/or an ordinate direction. Specifically, taking the shape of the region as a rectangle as an example, the four dimensions of the distance vector may be: the difference between the abscissa of the upper left corner of the field value and the abscissa of the upper left corner of the field name, the difference between the ordinate of the upper left corner of the field value and the ordinate of the upper left corner of the field name, the difference between the abscissa of the lower right corner of the field value and the abscissa of the lower right corner of the field name, and the difference between the ordinate of the lower right corner of the field value and the ordinate of the lower right corner of the field name.
Compared with generating the distance vector directly from the distance between the field name and the field value, a distance vector generated in this implementation reflects their positional relationship in each predetermined direction, contains richer position information, and helps determine a more accurate matching relationship later.
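A sketch of this four-dimensional distance vector in Python, assuming each area is represented by the coordinates (x1, y1, x2, y2) of its upper-left and lower-right corners:

    def distance_vector(name_box, value_box):
        nx1, ny1, nx2, ny2 = name_box
        vx1, vy1, vx2, vy2 = value_box
        return [vx1 - nx1,  # upper-left abscissa difference
                vy1 - ny1,  # upper-left ordinate difference
                vx2 - nx2,  # lower-right abscissa difference
                vy2 - ny2]  # lower-right ordinate difference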
Step 406, generating an input vector of the matching model according to the semantic vector of the field name, the semantic vector of the field value and the distance vector.
In this embodiment, the execution subject may generate the input vector of the matching model according to the semantic vector of the field name, the semantic vector of the field value, and the distance vector. It may directly concatenate the three to obtain the input vector, or it may first process the semantic vector of the field name, the semantic vector of the field value, and the distance vector, and then fuse the processed vectors, for example by concatenation, to obtain the input vector.
In some optional implementations of this embodiment, generating the input vector of the matching model according to the semantic vector of the field name, the semantic vector of the field value, and the distance vector includes: fusing the semantic vector of the field name and the semantic vector of the field value to obtain a first vector; performing a dimension transformation on the distance vector to obtain a second vector with the same dimension as the first vector; and concatenating the first vector and the second vector to obtain the input vector. As an example, the execution subject may concatenate the semantic vector of the field name and the semantic vector of the field value to obtain the first vector, or may, after concatenation, fully fuse and learn the information of the two through a fully connected layer to obtain the first vector. Because both position information and semantic information are important when determining the matching relationship between a field name and a field value, this implementation makes the dimension of the second vector, which carries the position information, the same as the dimension of the first vector, which carries the semantic information, and concatenates the two as the input vector of the matching model. The position information and the semantic information carried in the input vector are thus more balanced, and the matching model can take both into account to produce a more accurate output.
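A minimal PyTorch sketch of this fusion (the dimensions and the ReLU are assumptions):

    import torch
    import torch.nn as nn

    class InputVectorBuilder(nn.Module):
        def __init__(self, sem_dim, dist_dim=4, fused_dim=256):
            super().__init__()
            # Fully-connected fusion of the two semantic vectors -> first vector.
            self.fuse = nn.Linear(2 * sem_dim, fused_dim)
            # Dimension transformation of the distance vector -> second vector,
            # with the same dimension as the first vector.
            self.project = nn.Linear(dist_dim, fused_dim)

        def forward(self, name_vec, value_vec, dist_vec):
            first = torch.relu(self.fuse(torch.cat([name_vec, value_vec], dim=-1)))
            second = self.project(dist_vec)
            return torch.cat([first, second], dim=-1)  # input vector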
Step 407, inputting the input vector into the matching model, and determining the matching relationship between the field name and the field value according to the output of the matching model.
In this embodiment, the execution agent may input the input vector into the matching model, and determine a matching relationship between the field name and the field value according to an output of the matching model. The output of the matching model may indicate whether the field name and the field value match.
Step 408, a table is generated according to the matching relationship between the field names and the field values.
In this embodiment, the operations of step 401, step 402, step 403, and step 408 are substantially the same as the operations of step 201, step 202, step 203, and step 205, and are not described herein again.
As can be seen from fig. 4, compared with the embodiment corresponding to fig. 2, in the process 400 of the method for identifying a table in an image in the present embodiment, an input vector of a matching model is generated according to a semantic vector of a field name, a semantic vector of a field value, and a distance vector, where the input vector includes not only semantic information but also distance information, and the two kinds of information are combined, so that a matching relationship between the field name and the field value can be determined more accurately, and thus, accuracy of identifying the table in the image is improved.
With further reference to fig. 5, as an implementation of the method shown in the above figures, the present application provides an embodiment of an apparatus for identifying a table in an image, which corresponds to the embodiment of the method shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 5, the apparatus 500 for recognizing a table in an image of the present embodiment includes: a first acquisition unit 501, a recognition unit 502, a second acquisition unit 503, a determination unit 504, and a generation unit 505. The first acquisition unit is configured to acquire a picture to be processed; the recognition unit is configured to identify the field names and field values included in the picture to be processed; the second acquisition unit is configured to acquire the semantic vector of the field name and the semantic vector of the field value; the determination unit is configured to determine the matching relationship between the field name and the field value based on their semantic vectors and a pre-trained matching model; and the generation unit is configured to generate a table according to the matching relationship between the field name and the field value.
In the present embodiment, specific processing of the first acquiring unit 501, the identifying unit 502, the second acquiring unit 503, the determining unit 504 and the generating unit 505 of the apparatus 500 for identifying a table in an image may refer to step 201, step 202, step 203, step 204 and step 205 in the corresponding embodiment of fig. 2.
In some optional implementations of this embodiment, the determining unit includes: an acquisition subunit configured to acquire position information of a field name and position information of a field value; a first generation subunit configured to generate a field name and a distance vector of a field value from position information of the field name and position information of the field value; a second generation subunit configured to generate an input vector of the matching model from the semantic vector of the field name, the semantic vector of the field value, and the distance vector; and a determining subunit configured to input the input vector into the matching model, and determine a matching relationship between the field name and the field value according to an output of the matching model.
In some optional implementations of this embodiment, the second generating subunit is further configured to: fusing a semantic vector of the field name and a semantic vector of the field value to obtain a first vector; performing dimensionality transformation on the distance vector to obtain a second vector, wherein the dimensionality of the second vector is the same as that of the first vector; and splicing the first vector and the second vector to obtain an input vector.
In some optional implementations of this embodiment, the location information includes: the coordinates, in the picture to be processed, of key points of the areas where the field name and the field value are located; and the first generation subunit is further configured to: generate the distance vector of the field name and the field value according to the difference, in a predetermined direction, between the coordinates of the key points of the area where the field name is located and the coordinates of the key points of the area where the field value is located.
In some optional implementations of this embodiment, the apparatus further comprises a semantic vector determination unit configured to: inputting the field value into a pre-trained coding network to obtain semantic codes of each single character in the field value; and fusing the semantic codes of the single characters in the field value to obtain a semantic vector of the field value.
The apparatus provided by the above embodiment of the present application acquires a picture to be processed; identifies the field names and field values included in the picture; acquires the semantic vector of the field name and the semantic vector of the field value; determines the matching relationship between the field names and the field values based on these semantic vectors and a pre-trained matching model; and generates a table according to the matching relationship between the field names and the field values, thereby improving the efficiency of identifying tables in images.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
As shown in fig. 6, it is a block diagram of an electronic device for identifying a table in an image according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the present application described and/or claimed herein.
As shown in fig. 6, the electronic apparatus includes: one or more processors 601, a memory 602, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, as desired, along with multiple memories. Also, multiple electronic devices may be connected, with each device providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 6, one processor 601 is taken as an example.
The memory 602 is a non-transitory computer readable storage medium as provided herein. Wherein the memory stores instructions executable by at least one processor to cause the at least one processor to perform the method for identifying a table in an image provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method for identifying a table in an image provided by the present application.
The memory 602, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the method for identifying a table in an image in the embodiment of the present application (for example, the first acquisition unit 501, the identification unit 502, the second acquisition unit 503, the determination unit 504, and the generation unit 505 shown in fig. 5). The processor 601 executes various functional applications of the server and data processing, i.e., implements the method for identifying a table in an image in the above-described method embodiments, by running non-transitory software programs, instructions, and modules stored in the memory 602.
The memory 602 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created from use of an electronic device for identifying a table in an image, and the like. Further, the memory 602 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 602 optionally includes memory remotely located from the processor 601, and these remote memories may be connected over a network to an electronic device for identifying the tables in the image. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for the method of identifying a form in an image may further include: an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603 and the output device 604 may be connected by a bus or other means, and fig. 6 illustrates the connection by a bus as an example.
The input device 603 may receive input numeric or character information and generate key signal inputs related to user settings and function controls of the electronic apparatus for recognizing the form in the image, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointer, one or more mouse buttons, a track ball, a joystick, or other input devices. The output devices 604 may include a display device, auxiliary lighting devices (e.g., LEDs), and tactile feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical scheme of the embodiment of the application, the efficiency of identifying the table in the image is improved.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present application can be achieved; no limitation is imposed herein.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (12)

1. A method for identifying a table in an image, comprising:
acquiring a picture to be processed;
identifying a field name and a field value included in the picture to be processed;
acquiring a semantic vector of the field name and a semantic vector of the field value;
determining a matching relation between the field names and the field values based on the semantic vectors of the field names and the semantic vectors of the field values and a pre-trained matching model;
and generating a table according to the matching relationship between the field names and the field values.
2. The method of claim 1, wherein the determining the matching relationship between the field name and the field value based on the semantic vector of the field name and the semantic vector of the field value and a pre-trained matching model comprises:
acquiring the position information of the field name and the position information of the field value;
generating a distance vector of the field name and the field value according to the position information of the field name and the position information of the field value;
generating an input vector of the matching model according to the semantic vector of the field name, the semantic vector of the field value and the distance vector;
and inputting the input vector into the matching model, and determining the matching relation between the field name and the field value according to the output of the matching model.
3. The method of claim 2, wherein the generating an input vector for the matching model from the semantic vector for the field name, the semantic vector for the field value, and the distance vector comprises:
fusing the semantic vector of the field name and the semantic vector of the field value to obtain a first vector;
performing dimension transformation on the distance vector to obtain a second vector, wherein the dimension of the second vector is the same as that of the first vector;
and splicing the first vector and the second vector to obtain the input vector.
4. The method of claim 2, wherein the location information comprises: coordinates, in the picture to be processed, of key points of the areas where the field name and the field value are located; and
the generating of the distance vector of the field name and the field value according to the position information of the field name and the position information of the field value includes:
and generating a distance vector of the field name and the field value according to the difference value of the coordinates of the key point of the area where the field name is located and the coordinates of the key point of the area where the field value is located in a preset direction.
5. The method of any of claims 1-4, wherein the semantic vector of the field value comprises a semantic vector determined via:
inputting the field value into a pre-trained coding network to obtain semantic codes of individual characters in the field value;
and fusing the semantic codes of the single characters in the field value to obtain a semantic vector of the field value.
6. An apparatus for identifying a table in an image, comprising:
a first acquisition unit configured to acquire a picture to be processed;
an identification unit configured to identify a field name and a field value included in the picture to be processed;
a second acquisition unit configured to acquire a semantic vector of the field name and a semantic vector of the field value;
a determining unit configured to determine a matching relationship between the field name and the field value based on a semantic vector of the field name and a semantic vector of the field value and a pre-trained matching model;
a generating unit configured to generate a table according to a matching relationship of the field name and the field value.
7. The apparatus of claim 6, wherein the determining unit comprises:
an acquisition subunit configured to acquire position information of the field name and position information of the field value;
a first generation subunit configured to generate a distance vector of the field name and the field value from position information of the field name and position information of the field value;
a second generation subunit configured to generate an input vector of the matching model from the semantic vector of the field name, the semantic vector of the field value, and the distance vector;
a determining subunit configured to input the input vector into the matching model, and determine a matching relationship between the field name and the field value according to an output of the matching model.
8. The apparatus of claim 7, wherein the second generation subunit is further configured to:
fusing the semantic vector of the field name and the semantic vector of the field value to obtain a first vector;
performing dimension transformation on the distance vector to obtain a second vector, wherein the dimension of the second vector is the same as that of the first vector;
and splicing the first vector and the second vector to obtain the input vector.
9. The apparatus of claim 7, wherein the location information comprises: coordinates, in the picture to be processed, of key points of the areas where the field name and the field value are located; and
the first generation subunit further configured to:
and generating a distance vector of the field name and the field value according to the difference value of the coordinates of the key point of the area where the field name is located and the coordinates of the key point of the area where the field value is located in a preset direction.
10. The apparatus according to any one of claims 6-9, wherein the apparatus further comprises a semantic vector determination unit configured to:
inputting the field value into a pre-trained coding network to obtain semantic codes of individual characters in the field value;
and fusing the semantic codes of the single characters in the field value to obtain a semantic vector of the field value.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-5.
CN202010444345.1A (filed 2020-05-22, priority date 2020-05-22): Method and device for identifying tables in images. Active; granted as CN111611990B.

Priority Applications (1)

  • CN202010444345.1A (priority date 2020-05-22, filing date 2020-05-22): Method and device for identifying tables in images

Applications Claiming Priority (1)

  • CN202010444345.1A (priority date 2020-05-22, filing date 2020-05-22): Method and device for identifying tables in images

Publications (2)

CN111611990A: published 2020-09-01
CN111611990B: granted, published 2023-10-31

Family

ID: 72203770

Family Applications (1)

  • CN202010444345.1A (priority date 2020-05-22, filing date 2020-05-22): Method and device for identifying tables in images; granted as CN111611990B (Active)

Country Status (1)

CN: CN111611990B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06290272A (en) * 1993-04-02 1994-10-18 Sharp Corp High-speed matching system
US20040078755A1 (en) * 2002-10-21 2004-04-22 Hitachi, Ltd. System and method for processing forms
US20170147552A1 (en) * 2015-11-19 2017-05-25 Captricity, Inc. Aligning a data table with a reference table
CN107967482A (en) * 2017-10-24 2018-04-27 广东中科南海岸车联网技术有限公司 Icon-based programming method and device
CN110109918A (en) * 2018-02-02 2019-08-09 兴业数字金融服务(上海)股份有限公司 For verifying the method, apparatus, equipment and computer storage medium of list data
US20200074169A1 (en) * 2018-08-31 2020-03-05 Accenture Global Solutions Limited System And Method For Extracting Structured Information From Image Documents
CN109902724A (en) * 2019-01-31 2019-06-18 平安科技(深圳)有限公司 Character recognition method, device and computer equipment based on support vector machines
CN110569846A (en) * 2019-09-16 2019-12-13 北京百度网讯科技有限公司 Image character recognition method, device, equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
任通 et al., "Research on data extraction methods for electronically scanned form images" (电子扫描表格图像中数据提取方法研究), pages 6-10 *
黄锦德 et al., "A new feature extraction method for table recognition" (一种新的表格识别特征提取方法), vol. 32, no. 32, pages 215-217 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112052825A (en) * 2020-09-18 2020-12-08 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for processing image
CN112115865A (en) * 2020-09-18 2020-12-22 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for processing image
CN112115865B (en) * 2020-09-18 2024-04-12 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for processing image
CN112052825B (en) * 2020-09-18 2024-04-16 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for processing image
CN112270222A (en) * 2020-10-14 2021-01-26 招商银行股份有限公司 Information standardization processing method, equipment and computer readable storage medium
CN112364857A (en) * 2020-10-23 2021-02-12 中国平安人寿保险股份有限公司 Image recognition method and device based on numerical extraction and storage medium
CN112364857B (en) * 2020-10-23 2024-04-26 中国平安人寿保险股份有限公司 Image recognition method, device and storage medium based on numerical extraction
CN113011144A (en) * 2021-03-30 2021-06-22 中国工商银行股份有限公司 Form information acquisition method and device and server
CN113011144B (en) * 2021-03-30 2024-01-30 中国工商银行股份有限公司 Form information acquisition method, device and server
WO2023087702A1 (en) * 2021-11-22 2023-05-25 深圳前海微众银行股份有限公司 Text recognition method for form certificate image file, and computing device

Also Published As

CN111611990B: published 2023-10-31


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant