CN115797941A - Information extraction method and device

Info

Publication number: CN115797941A
Application number: CN202211312254.8A
Authority: CN (China)
Inventor: 杨小猛 (Yang Xiaomeng)
Assignee: Pacific Insurance Technology Co., Ltd.
Priority / filing date: 2022-10-25
Publication date: 2023-03-14
Legal status: Pending
Prior art keywords: entity information, matrix, head, characteristic value, head matrix
Classification: Information Retrieval; Database and File System Structures Therefor

Abstract

The present application provides an information extraction method and apparatus. The key entity information is first determined according to the task scene; all entity information in the task scene is then extracted and encoded, and a multi-head matrix is constructed as the carrier of each entity information characteristic value. The multi-head matrices together form a characteristic value matrix, the characteristic values in the multi-head matrices where the key entity information is located are decoded, and the extraction of the key entity information is thereby completed.

Description

Information extraction method and device
Technical Field
The present application relates to the field of information processing technologies, and in particular, to a method and an apparatus for extracting information.
Background
With the development of science and technology, high technology has gradually become part of people's daily study and life. When users need to take notes in a classroom and the notes are too long, most choose to photograph the content and write up the notes from the images later. This is inefficient, and when the image content is extensive, rearranging it consumes a great deal of time, so extracting information from document images is particularly important.
Optical Character Recognition (OCR) provides a possible implementation for the above document image information extraction, but it inevitably recognizes other, unwanted content. For example, when something like a text billboard appears in the image, OCR extracts all recognized text, which complicates post-processing of the data. A method capable of extracting only the key information is therefore needed.
In the prior art, key information is often extracted with a method based on a graph neural network. The method uses the OCR result, with the output text organized by text detection box; each text detection box is taken as a node, and nodes are connected by edges, so that the whole image forms a graph structure, which is iterated through a graph convolutional network to obtain the features of the graph structure. Finally, each node is classified and the connection relations of the edges are predicted to obtain the final result. However, this method cannot distinguish nodes that are stuck together, so it can only be used in simple scenes.
Disclosure of Invention
In view of this, embodiments of the present application provide an information extraction method and apparatus that aim to handle information extraction tasks in any general scene, where general scenes include both simple scenes and complex scenes.
In a first aspect, an embodiment of the present application provides an information extraction method, where the method includes:
determining at least one piece of key entity information according to a task scene;
identifying an entity information pair contained in a target image in a task scene, wherein the entity information pair comprises entity information content and coordinate data corresponding to the entity information content;
encoding the entity information pairs to obtain a characteristic value corresponding to each entity information pair;
constructing a multi-head matrix corresponding to each characteristic value, and forming a characteristic value matrix from the multi-head matrices;
identifying the position, in the characteristic value matrix, of the multi-head matrix corresponding to the entity information content containing the key entity information;
and determining a target multi-head matrix corresponding to the position, and decoding the characteristic value corresponding to the target multi-head matrix to obtain the key entity information.
Optionally, identifying the entity information pairs contained in the target image in the task scene includes:
identifying the entity information pairs contained in the target image in the task scene by optical character recognition.
Preferably, before the entity information pairs are encoded to obtain the characteristic values corresponding to the entity information pairs, the method further includes:
pre-training a layout model with the entity information pairs contained in the identified target image in the task scene, where the layout model is used for encoding data and obtaining characteristic values.
Optionally, constructing a multi-head matrix corresponding to each characteristic value and forming a characteristic value matrix from the multi-head matrices includes:
filling the characteristic value corresponding to each entity information pair into a multi-head matrix, arranging the multi-head matrices horizontally and vertically according to the coordinate data, and forming the characteristic value matrix from the multi-head matrices.
Optionally, identifying the position, in the characteristic value matrix, of the multi-head matrix corresponding to the entity information content containing the key entity information includes:
marking, as a first position, each multi-head matrix in the characteristic value matrix that contains the characteristic values corresponding to the starting field and the ending field of an entity information content;
judging whether a multi-head matrix marked as a first position has a characteristic value corresponding to the key entity information;
marking the multi-head matrices without a characteristic value corresponding to the key entity information as second positions, and marking the multi-head matrices with a characteristic value corresponding to the key entity information as third positions;
and determining the position of each third position in the characteristic value matrix.
In a second aspect, an embodiment of the present application provides an information extraction apparatus, including:
a key entity information determining module, used for determining at least one piece of key entity information according to the task scene;
an entity information pair identification module, used for identifying the entity information pairs contained in the target image in the task scene, where an entity information pair comprises entity information content and coordinate data corresponding to the entity information content;
an information encoding module, used for encoding the entity information pairs to obtain the characteristic value corresponding to each entity information pair;
a multi-head matrix construction module, used for constructing a multi-head matrix corresponding to each characteristic value and forming a characteristic value matrix from the multi-head matrices;
a position identification module, used for identifying the position, in the characteristic value matrix, of the multi-head matrix corresponding to the entity information content containing the key entity information;
and a decoding module, used for determining the target multi-head matrix corresponding to the position and decoding the characteristic value corresponding to the target multi-head matrix to obtain the key entity information.
Preferably, the apparatus further comprises:
a layout model pre-training module, used for pre-training the layout model with the entity information pairs contained in the identified target image in the task scene, where the layout model is used for encoding data and obtaining characteristic values.
Optionally, the position identification module includes:
a first position marking unit, used for marking, as a first position, each multi-head matrix in the characteristic value matrix that contains the characteristic values corresponding to the starting field and the ending field of an entity information content;
a judging unit, used for judging whether a multi-head matrix marked as a first position has a characteristic value corresponding to the key entity information;
a second position marking unit, used for marking the multi-head matrices without a characteristic value corresponding to the key entity information as second positions;
a third position marking unit, used for marking the multi-head matrices with a characteristic value corresponding to the key entity information as third positions;
and a position determination unit, used for determining the position of each third position in the characteristic value matrix.
In a third aspect, an embodiment of the present application provides a device that includes a memory and a processor, where the memory is configured to store instructions or code and the processor is configured to execute the instructions or code, so as to cause the device to perform the information extraction method according to any one of the foregoing first aspects.
In a fourth aspect, an embodiment of the present application provides a computer storage medium in which code is stored; when the code is executed, the device running the code implements the information extraction method according to any one of the foregoing first aspects.
Compared with the prior art, the present application first determines at least one piece of key entity information according to the task scene; it then identifies the entity information pairs contained in the target image in the task scene, where an entity information pair comprises entity information content and coordinate data corresponding to the entity information content; the entity information pairs are encoded to obtain a characteristic value corresponding to each entity information pair; a multi-head matrix is then constructed for each characteristic value, and the multi-head matrices form a characteristic value matrix; finally, the position, in the characteristic value matrix, of the multi-head matrix corresponding to the entity information content containing the key entity information is identified, the target multi-head matrix corresponding to that position is determined, and the characteristic value corresponding to the target multi-head matrix is decoded to obtain the key entity information.
Because each multi-head matrix contains only one characteristic value, and only the characteristic values in the multi-head matrices containing the key entity information are finally decoded, the entity information contents do not interfere with one another. The method provided by the present application can therefore be applied to complex scenes, and it also extracts key entity information more effectively in simple scenes.
Drawings
To illustrate the technical solutions in the present embodiments or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description are obviously only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of the information extraction method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of the method for constructing a characteristic value matrix provided by an embodiment of the present application;
FIG. 3 is a labeling schematic diagram of the first positions provided by an embodiment of the present application;
FIG. 4 is a labeling schematic diagram of the second positions and the third positions provided by an embodiment of the present application;
FIG. 5 is a flowchart of another information extraction method provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of the information extraction apparatus provided by an embodiment of the present application.
Detailed Description
In the prior art, key information is often extracted with a method based on a graph neural network. The method uses the result of Optical Character Recognition (OCR), with the output text organized by text detection box; each text detection box is taken as a node, and nodes are connected by edges, so that the whole image forms a graph structure, which is iterated through a graph convolutional network to obtain the features of the graph structure. Finally, each node is classified and the connection relations of the edges are predicted to obtain the final result.
Research shows that the prior art method cannot distinguish nodes that are stuck together, for example when a machine-issued ticket prints several entities together. Because the prior art cannot separate two stuck nodes, detections are missed, so the prior art method can only be used in simple scenes.
Compared with the prior art, the present application first determines at least one piece of key entity information according to the task scene; it then identifies the entity information pairs contained in the target image in the task scene, where an entity information pair comprises entity information content and coordinate data corresponding to the entity information content; the entity information pairs are encoded to obtain a characteristic value corresponding to each entity information pair; a multi-head matrix is then constructed for each characteristic value, and the multi-head matrices form a characteristic value matrix; finally, the position, in the characteristic value matrix, of the multi-head matrix corresponding to the entity information content containing the key entity information is identified, the target multi-head matrix corresponding to that position is determined, and the characteristic value corresponding to the target multi-head matrix is decoded to obtain the key entity information.
Because each multi-head matrix contains only one characteristic value, and only the characteristic values in the multi-head matrices containing the key entity information are finally decoded, the entity information contents do not interfere with one another. The method provided by the present application can therefore be applied to complex scenes, and it also extracts key entity information more effectively in simple scenes.
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a flowchart of the information extraction method provided in an embodiment of the present application, which includes:
S101: At least one piece of key entity information is determined according to the task scene.
The task scene is the scene from which information needs to be extracted, for example, extracting the information contained in a target image.
The key entity information is the specific entity information that needs to be extracted in the task scene. For example, in an identity card scanning scene, the identity card number and the name can be determined as the key entity information, and these two pieces of key entity information are then extracted with the information extraction method provided in any embodiment of the present application.
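Purely as an illustration (the patent does not prescribe any data structure for this step), the key entity information for each task scene could be declared as a simple mapping. The scene names and field lists below are examples drawn from this description, not a schema from the claims:

```python
# Hypothetical declaration of key entity information per task scene.
# Scene names and field lists are illustrative examples only.
KEY_ENTITIES = {
    "identity_card_scanning": ["name", "identity_card_number"],
    "receipt_extraction": ["hamburger", "total"],
}
```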
S102: The entity information pairs contained in the target image in the task scene are identified.
The entity information pair comprises entity information content and coordinate data corresponding to the entity information content. For example, in the task scene of extracting identity card information, the entity information content includes the name, the identity card number, the home address, and so on; the coordinate data corresponding to the entity information content are obtained by establishing a coordinate system from the image edges and computing the position coordinates of each field of the entity information content relative to the origin.
The effect of this step is to extract all the information contained in the target image.
In one possible implementation manner, identifying the entity information pairs contained in the target image in the task scene includes:
identifying the entity information pairs contained in the target image in the task scene by optical character recognition.
Optical Character Recognition (OCR) refers to the process in which an electronic device (e.g., a scanner or a digital camera) examines characters printed on paper, determines their shapes by detecting dark and light patterns, and then translates the shapes into computer text using a character recognition method.
Using OCR, all the entity information content with recognizable shape features in the target image can be obtained, and its position can be restored from the coordinates.
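As an illustrative sketch only, the entity information pairs could be collected with an off-the-shelf OCR library such as pytesseract; the engine choice and the helper name extract_entity_pairs are assumptions for illustration, not part of the claimed method:

```python
# Illustrative sketch: collecting entity information pairs (content + box)
# with an off-the-shelf OCR engine. pytesseract is an assumption; the
# patent does not prescribe a particular OCR implementation.
from PIL import Image
import pytesseract

def extract_entity_pairs(image_path):
    """Return a list of (content, (x, y, w, h)) entity information pairs."""
    image = Image.open(image_path)
    data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
    pairs = []
    for i, text in enumerate(data["text"]):
        if text.strip():  # skip empty detections
            box = (data["left"][i], data["top"][i],
                   data["width"][i], data["height"][i])
            pairs.append((text, box))
    return pairs
```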
S103: The entity information pairs are encoded to obtain the characteristic value corresponding to each entity information pair.
Since entity information pairs cannot be directly recognized by intelligent devices such as computers, the purpose of the encoding is to convert the entity information pairs into characteristic values that an intelligent device can recognize, so that the subsequent information extraction steps can proceed.
The characteristic values are generally obtained by assigning different values to different entity information pairs during encoding. For example, when scanning student class information, class one may be assigned the value 0, class two the value 1, and so on.
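The following is a minimal sketch of this value assignment, assuming the lookup-table encoding of the example above; a real system would obtain characteristic values from a pre-trained layout model instead (see the pre-training step later in this description):

```python
# Minimal sketch of the encoding step: each distinct entity information
# content is assigned an integer characteristic value, i.e. "different
# values for different pairs". A layout model would replace this table.
def encode_pairs(pairs):
    vocab = {}
    encoded = []
    for content, box in pairs:
        if content not in vocab:
            vocab[content] = len(vocab)  # e.g. "class one" -> 0, "class two" -> 1
        encoded.append((vocab[content], box))
    return encoded, vocab
```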
S104: A multi-head matrix corresponding to each characteristic value is constructed, and the multi-head matrices form the characteristic value matrix.
Each multi-head matrix can be regarded as a data storage address in which the characteristic value corresponding to one entity information pair is stored. Forming the multi-head matrices into the characteristic value matrix can be regarded as a data integration process: all the scattered multi-head matrices are combined into one data set for the subsequent extraction steps.
In a possible implementation manner, constructing a multi-head matrix corresponding to each characteristic value and forming the characteristic value matrix from the multi-head matrices includes:
filling the characteristic value corresponding to each entity information pair into a multi-head matrix;
and arranging the multi-head matrices horizontally and vertically according to the coordinate data, and forming the characteristic value matrix from the multi-head matrices.
Referring to fig. 2, fig. 2 is a schematic diagram of the method for constructing a characteristic value matrix provided in the embodiment of the present application. In the figure, the entity information contents (hamburger, french fries, total, and the numbers, i.e., the price information) are arranged along both the horizontal and vertical axes; each small square is a multi-head matrix, and all the multi-head matrices together form the characteristic value matrix.
When the multi-head matrices are arranged horizontally and vertically, they may be ordered by the coordinate data or arranged randomly; the finally formed characteristic value matrix is a square matrix with the same number of rows and columns.
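The sketch below is one possible reading of this construction, under the assumptions of the earlier sketches (fields ordered by coordinates, lookup-table characteristic values); the per-cell layout is an interpretation of fig. 2, not a verbatim specification from the patent:

```python
import numpy as np

# One possible reading of S104: order the N encoded fields by their
# coordinates, then build an N x N characteristic value matrix whose cell
# (i, j) is the "multi-head matrix" slot for the field pair (i, j). Here
# each slot stores the characteristic values of its row and column fields.
def build_value_matrix(encoded):
    # top-to-bottom, then left-to-right, by box coordinates (x, y, w, h)
    ordered = sorted(encoded, key=lambda e: (e[1][1], e[1][0]))
    values = [v for v, _ in ordered]
    n = len(values)
    matrix = np.empty((n, n, 2), dtype=int)
    for i in range(n):
        for j in range(n):
            matrix[i, j] = (values[i], values[j])  # slot for fields i and j
    return matrix, ordered
```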
S105: The position, in the characteristic value matrix, of the multi-head matrix corresponding to the entity information content containing the key entity information is identified.
This position is the relative position of the multi-head matrix within the characteristic value matrix; by establishing a coordinate system, coordinate information can be read for each multi-head matrix.
In a possible implementation manner, identifying the position, in the characteristic value matrix, of the multi-head matrix corresponding to the entity information content containing the key entity information includes:
marking, as a first position, each multi-head matrix in the characteristic value matrix that contains the characteristic values corresponding to the starting field and the ending field of an entity information content;
judging whether a multi-head matrix marked as a first position has a characteristic value corresponding to the key entity information;
marking the multi-head matrices without a characteristic value corresponding to the key entity information as second positions, and marking the multi-head matrices with a characteristic value corresponding to the key entity information as third positions;
and determining the position of each third position in the characteristic value matrix.
Marking, as a first position, each multi-head matrix containing the characteristic value corresponding to the starting field and the characteristic value corresponding to the ending field of an entity information content serves the following purpose in the subsequent steps:
all the characteristic values between the starting field characteristic value and the ending field characteristic value that belong to the same entity information content can be read out. For example, referring to fig. 3, fig. 3 is a labeling schematic diagram of the first positions provided in the embodiment of the present application. In fig. 3, the multi-head matrix holding the starting field characteristic value and the ending field characteristic value of "hamburger" is marked 1; that is, the multi-head matrices capable of forming an entity information content are identified. Since the characteristic value matrix has equal numbers of rows and columns, it is sufficient to mark the entity information contents in the upper triangular matrix. Specifically, the characteristic value of the starting field of "hamburger" is scanned horizontally, the characteristic value corresponding to its ending field is scanned vertically, and the position of the corresponding multi-head matrix is labeled 1. The scanning order may also be vertical first and then horizontal, and a label value other than 1 may be used.
As for marking the multi-head matrices without a characteristic value corresponding to the key entity information as second positions and those with such a characteristic value as third positions, see fig. 4, a labeling schematic diagram of the second and third positions provided in the embodiment of the present application. Assuming the key entity information is "hamburger", then among the first positions labeled 1 in fig. 3, the multi-head matrix containing the key entity information "hamburger" is labeled 2, and its position is the third position; the label may be 2 or any other value different from the first-position label. The multi-head matrices not containing the key entity information "hamburger" are labeled with a value different from the first- and third-position labels, or not labeled at all, and their positions are the second positions. When the multi-head matrix at the key entity information position is subsequently determined, only the positions of the multi-head matrices labeled as third positions need to be scanned.
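A compact sketch of this marking scheme follows, under the same assumptions as the earlier sketches (integer span indices over the coordinate-ordered fields). The label values 1 and 2 follow figs. 3 and 4, but as noted above other values could be used:

```python
# Sketch of S105. label[i][j] = 1 marks a first position: field i is the
# starting field and field j the ending field of one entity information
# content (only the upper triangle j >= i is used, since the matrix is
# square). First positions whose span is the key entity information are
# re-marked 2 (third positions); the rest remain second positions.
def mark_positions(n, spans, key_spans):
    """spans / key_spans: lists of (start_index, end_index) field spans."""
    label = [[0] * n for _ in range(n)]
    for i, j in spans:
        label[min(i, j)][max(i, j)] = 1      # first position
    for i, j in key_spans:
        if label[min(i, j)][max(i, j)] == 1:
            label[min(i, j)][max(i, j)] = 2  # third position
    return label
```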
S106: The target multi-head matrix corresponding to the position is determined, and the characteristic value corresponding to the target multi-head matrix is decoded to obtain the key entity information.
Step S105 determines the position of the key entity information in the characteristic value matrix. Since the characteristic value matrix is formed from at least one multi-head matrix and the position is a relative position within the characteristic value matrix, each position corresponds to one multi-head matrix, so the multi-head matrix at that position can be determined.
Once the multi-head matrix is determined, the characteristic values in it can be decoded to obtain the entity information content containing the key entity information. The purpose of the decoding step is to convert content that an intelligent device can recognize back into content that people can read, which completes the key entity information extraction task.
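Continuing the assumptions of the earlier sketches (third positions labeled 2, a lookup-table encoding to invert), the decoding could look like this; a real system would invert its layout-model encoding instead:

```python
# Sketch of S106: scan for third positions (label 2) and decode the field
# span they cover back into readable text. inverse_vocab reverses the
# illustrative lookup-table encoding used earlier.
def decode_key_entities(label, ordered, inverse_vocab):
    results = []
    for i, row in enumerate(label):
        for j, mark in enumerate(row):
            if mark == 2:  # span [i, j] holds key entity information
                fields = [inverse_vocab[v] for v, _ in ordered[i:j + 1]]
                results.append("".join(fields))
    return results
```

For instance, with inverse_vocab built as {v: k for k, v in vocab.items()} from the earlier encoding sketch, the span marked 2 in fig. 4 would decode back to the text of the key entity information "hamburger".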
In the embodiments of the present application, each multi-head matrix contains only one characteristic value, and only the characteristic values in the multi-head matrices containing the key entity information are finally decoded to obtain the key entity information, so the entity information contents do not interfere with one another. The method provided by the present application can therefore be applied to complex scenes, and it also extracts key entity information more effectively in simple scenes.
In this embodiment, a layout model pre-training step may be added before step S103 of fig. 1, as described below. It should be noted that the implementation manners given in the following description are only exemplary and do not represent all implementation manners of the embodiments of the present application.
Referring to fig. 5, fig. 5 is a flowchart of the information extraction method after the layout model pre-training step is added. Some steps are the same as in the foregoing method embodiment and are therefore not described in detail again. The method includes the following steps:
S501: At least one piece of key entity information is determined according to the task scene.
S502: The entity information pairs contained in the target image in the task scene are identified, where an entity information pair comprises entity information content and coordinate data corresponding to the entity information content.
S503: and pre-training the layout model by utilizing the entity information pair contained in the target image in the identified task scene.
The layout model is used for coding data and obtaining a characteristic value.
The layout model is set in advance according to a task scene, for example, in a scene of scanning the identity card, the layout model is set to be a layout similar to the size of the identity card in advance, the information position is set, when the identity card is scanned to acquire an information task, the entity identity card frame can be directly scanned, and then the information extraction and other steps are performed.
In a possible implementation manner, pre-training the layout model with the entity information pairs contained in the identified target image in the task scene includes:
taking the entity information pairs contained in the identified target image in the task scene as training samples and inputting them into the layout model;
and pre-training the layout model with the training samples to obtain a layout model capable of encoding the entity information pairs contained in the target image in the task scene.
S504: The entity information pairs are encoded to obtain the characteristic value corresponding to each entity information pair.
S505: A multi-head matrix corresponding to each characteristic value is constructed, and the multi-head matrices form the characteristic value matrix.
S506: The position, in the characteristic value matrix, of the multi-head matrix corresponding to the entity information content containing the key entity information is identified.
S507: The target multi-head matrix corresponding to the position is determined, and the characteristic value corresponding to the target multi-head matrix is decoded to obtain the key entity information.
By adding the step of pre-training the layout model with the entity information pairs contained in the identified target image in the task scene, the present application improves the encoding accuracy in the specific task scene, so that the information extraction method can be applied to different task scenes, which broadens its applicability.
The foregoing describes some specific implementation manners of the information extraction method provided in the embodiments of the present application. Based on these, the present application also provides a corresponding apparatus, which is described below in terms of its functional modules.
Referring to fig. 6, the information extraction apparatus includes:
a key entity information determining module 601, configured to determine at least one piece of key entity information according to the task scene;
an entity information pair identification module 602, configured to identify the entity information pairs contained in the target image in the task scene, where an entity information pair includes entity information content and coordinate data corresponding to the entity information content;
an information encoding module 603, configured to encode the entity information pairs to obtain the characteristic value corresponding to each entity information pair;
a multi-head matrix construction module 604, configured to construct a multi-head matrix corresponding to each characteristic value and form a characteristic value matrix from the multi-head matrices;
a position identification module 605, configured to identify the position, in the characteristic value matrix, of the multi-head matrix corresponding to the entity information content containing the key entity information;
and a decoding module 606, configured to determine the target multi-head matrix corresponding to the position and decode the characteristic value corresponding to the target multi-head matrix to obtain the key entity information.
In a possible implementation manner, the entity information pair identification module 602 is specifically configured to:
identify the entity information pairs contained in the target image in the task scene by optical character recognition.
In one possible implementation, the apparatus further includes:
a layout model pre-training module, configured to pre-train the layout model with the entity information pairs contained in the identified target image in the task scene.
The layout model is used for encoding data and obtaining characteristic values.
In one possible implementation, the multi-head matrix construction module 604 is specifically configured to:
fill the characteristic value corresponding to each entity information pair into a multi-head matrix, arrange the multi-head matrices horizontally and vertically according to the coordinate data, and form the characteristic value matrix from the multi-head matrices.
In one possible implementation, the position identification module 605 includes:
a first position marking unit, configured to mark, as a first position, each multi-head matrix in the characteristic value matrix that contains the characteristic values corresponding to the starting field and the ending field of an entity information content;
a judging unit, configured to judge whether a multi-head matrix marked as a first position has a characteristic value corresponding to the key entity information;
a second position marking unit, configured to mark the multi-head matrices without a characteristic value corresponding to the key entity information as second positions;
a third position marking unit, configured to mark the multi-head matrices with a characteristic value corresponding to the key entity information as third positions;
and a position determination unit, configured to determine the position of each third position in the characteristic value matrix.
In the embodiments of the present application, each multi-head matrix contains only one characteristic value, and only the characteristic values in the multi-head matrices containing the key entity information are finally decoded to obtain the key entity information, so the entity information contents do not interfere with one another. The method provided by the present application can therefore be applied to complex scenes, and it also extracts key entity information more effectively in simple scenes.
The embodiment of the application also provides corresponding equipment and a computer storage medium, which are used for realizing the scheme provided by the embodiment of the application.
The device comprises a memory and a processor, wherein the memory is used for storing instructions or codes, and the processor is used for executing the instructions or codes so as to enable the device to execute the information extraction method in any embodiment of the application.
The computer storage medium has code stored therein, and when the code is executed, an apparatus for executing the code implements the method for information extraction according to any embodiment of the present application.
In the embodiments of the present application, the designations "first" and "second" (where present) are used only for identification and do not indicate an order.
As can be seen from the above description of the embodiments, those skilled in the art can clearly understand that all or part of the steps in the above embodiment methods can be implemented by software plus a general hardware platform. Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which may be stored in a storage medium, such as a read-only memory (ROM)/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network communication device such as a router, etc.) to execute the method according to the embodiments or some parts of the embodiments of the present application.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus embodiment, since it is substantially similar to the method embodiment, it is relatively simple to describe, and reference may be made to some descriptions of the method embodiment for relevant points. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement without inventive effort.
The above description is only an exemplary embodiment of the present application, and is not intended to limit the scope of the present application.

Claims (10)

1. A method of information extraction, the method comprising:
determining at least one piece of key entity information according to a task scene;
identifying an entity information pair contained in a target image in the task scene, wherein the entity information pair comprises entity information content and coordinate data corresponding to the entity information content;
encoding the entity information pairs to obtain a characteristic value corresponding to each entity information pair;
constructing a multi-head matrix corresponding to each characteristic value, and forming a characteristic value matrix from the multi-head matrices;
identifying the position, in the characteristic value matrix, of the multi-head matrix corresponding to the entity information content containing the key entity information;
and determining a target multi-head matrix corresponding to the position, and decoding the characteristic value corresponding to the target multi-head matrix to obtain the key entity information.
2. The method of claim 1, wherein the identifying the entity information pairs contained in the target image in the task scene comprises:
identifying the entity information pairs contained in the target image in the task scene by optical character recognition.
3. The method of claim 1, wherein before the entity information pairs are encoded to obtain the characteristic values corresponding to the entity information pairs, the method further comprises:
pre-training a layout model with the identified entity information pairs contained in the target image in the task scene, wherein the layout model is used for encoding data and obtaining characteristic values.
4. The method according to claim 1, wherein the constructing a multi-head matrix corresponding to each characteristic value and the forming a characteristic value matrix from the multi-head matrices comprises:
filling the characteristic value corresponding to each entity information pair into a multi-head matrix, arranging the multi-head matrices horizontally and vertically according to the coordinate data, and forming the characteristic value matrix from the multi-head matrices.
5. The method of claim 1, wherein the identifying the position, in the characteristic value matrix, of the multi-head matrix corresponding to the entity information content containing the key entity information comprises:
marking, as a first position, each multi-head matrix in the characteristic value matrix that contains the characteristic value corresponding to the starting field and the characteristic value corresponding to the ending field of an entity information content;
judging whether a multi-head matrix marked as a first position has a characteristic value corresponding to the key entity information;
marking the multi-head matrices without a characteristic value corresponding to the key entity information as second positions, and marking the multi-head matrices with a characteristic value corresponding to the key entity information as third positions;
and determining the position of each third position in the characteristic value matrix.
6. An information extraction apparatus, characterized in that the apparatus comprises:
a key entity information determining module, used for determining at least one piece of key entity information according to the task scene;
an entity information pair identification module, used for identifying an entity information pair contained in a target image in the task scene, wherein the entity information pair comprises entity information content and coordinate data corresponding to the entity information content;
an information encoding module, used for encoding the entity information pairs to obtain the characteristic value corresponding to each entity information pair;
a multi-head matrix construction module, used for constructing a multi-head matrix corresponding to each characteristic value and forming a characteristic value matrix from the multi-head matrices;
a position identification module, used for identifying the position, in the characteristic value matrix, of the multi-head matrix corresponding to the entity information content containing the key entity information;
and a decoding module, used for determining a target multi-head matrix corresponding to the position and decoding the characteristic value corresponding to the target multi-head matrix to obtain the key entity information.
7. The apparatus of claim 6, further comprising:
a layout model pre-training module, used for pre-training the layout model with the identified entity information pairs contained in the target image in the task scene, wherein the layout model is used for encoding data and obtaining characteristic values.
8. The apparatus of claim 6, wherein the position identification module comprises:
a first position marking unit, used for marking, as a first position, each multi-head matrix in the characteristic value matrix that contains the characteristic values corresponding to the starting field and the ending field of an entity information content;
a judging unit, used for judging whether a multi-head matrix marked as a first position has a characteristic value corresponding to the key entity information;
a second position marking unit, used for marking the multi-head matrices without a characteristic value corresponding to the key entity information as second positions;
a third position marking unit, used for marking the multi-head matrices with a characteristic value corresponding to the key entity information as third positions;
and a position determining unit, used for determining the position of each third position in the characteristic value matrix.
9. An information extraction device, characterized in that the device comprises:
a memory, used for storing information extraction instructions or code;
and a processor, used for executing the information extraction instructions or code to implement the information extraction method of any one of claims 1-5.
10. A computer storage medium having code stored therein, wherein, when the code is executed, the device running the code implements the information extraction method of any one of claims 1-5.
Priority Application (1)

Application number: CN202211312254.8A
Priority / filing date: 2022-10-25
Title: Information extraction method and device
Status: Pending

Publication (1)

Publication number: CN115797941A
Publication date: 2023-03-14

Family ID: 85433709
Country: CN (China)


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination