CN116611450A - Method, device and equipment for extracting document information and readable storage medium - Google Patents


Info

Publication number
CN116611450A
Authority
CN
China
Prior art keywords
information
semantic
extracting
target document
document
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310645296.1A
Other languages
Chinese (zh)
Inventor
李树凯
张颖
杜新凯
田强
刘润玉
赵泽通
王足根
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sunshine Insurance Group Co Ltd
Original Assignee
Sunshine Insurance Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sunshine Insurance Group Co Ltd filed Critical Sunshine Insurance Group Co Ltd
Priority to CN202310645296.1A priority Critical patent/CN116611450A/en
Publication of CN116611450A publication Critical patent/CN116611450A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40Document-oriented image-based pattern recognition
    • G06V30/41Analysis of document content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Machine Translation (AREA)

Abstract

The application provides a method, an apparatus, a device and a readable storage medium for extracting document information. The method comprises: labeling semantic entities that have an association relationship in a target document within the same detection frame to obtain a plurality of local detection frames; extracting position information of the plurality of local detection frames to obtain a plurality of pieces of local position information; and performing information extraction on the plurality of pieces of local position information and the semantic entity identification information of the target document through a dual affine (biaffine) attention mechanism to obtain an information extraction result. The method can improve the accuracy of the information extracted from documents.

Description

Method, device and equipment for extracting document information and readable storage medium
Technical Field
The present application relates to the field of document information extraction, and in particular, to a method, apparatus, device, and readable storage medium for extracting document information.
Background
At present, document data is growing geometrically and the demand for document recognition keeps increasing. Because document layouts are complex and varied and documents contain multiple data modalities, customized template-based development can hardly meet this demand. In recent years, multimodal information extraction that fuses modalities such as document content, image features and layout information has been widely applied, and current text information extraction can directly extract all the information in a document.
However, during extraction the information of the entire document may be gathered together, which leads to information extraction errors.
Therefore, how to improve the accuracy of the information extracted from documents is a technical problem to be solved.
Disclosure of Invention
The embodiment of the application aims to provide a method for extracting document information, and the effect of improving the accuracy of extracting the document information can be achieved through the technical scheme of the embodiment of the application.
In a first aspect, an embodiment of the present application provides a method for extracting document information, including: labeling semantic entities that have an association relationship in a target document within the same detection frame to obtain a plurality of local detection frames; extracting position information of the plurality of local detection frames to obtain a plurality of pieces of local position information; and performing information extraction on the plurality of pieces of local position information and the semantic entity identification information of the target document through a dual affine attention mechanism to obtain an information extraction result.
In the embodiment of the application, the position information of the local detection frames is combined with the semantic entity identification information of the document and input into the dual affine attention mechanism, so that the mechanism extracts the semantic entity identification information while taking into account where each piece of local information sits in the target document. This avoids the situation in which the results extracted from the whole document are lumped together and unclear, and thereby improves the accuracy of the extracted document information.
In some embodiments, before labeling semantic entities with association relations in the target document in the same detection frame to obtain a plurality of local detection frames, the method further includes:
labeling text fields of the target document to obtain a plurality of text field detection boxes;
extracting text features of a plurality of text field detection boxes and visual features of a target document, wherein the text features comprise text semantic features and text position features;
and carrying out entity recognition on the text features and the visual features through a convolutional neural network to obtain semantic entity recognition information.
In the embodiment of the application, detection frames are labeled across the whole target document and the content in each detection frame is recognized, so that the semantic entity identification information present in the whole target document can be obtained quickly.
In some embodiments, after extracting information from the plurality of local position information and semantic entity identification information of the target document by using a dual affine attention mechanism, obtaining an information extraction result, the method further includes:
training the basic information extraction model through an information extraction result to obtain an information extraction model, wherein the information extraction result comprises the relation between semantic entity identification information of the target document and the semantic entity;
and inputting the document to be extracted into an information extraction model to obtain an extraction result.
In the embodiment of the application, the extraction result obtained from the target document can be used as a training sample for the information extraction model, so that the information in a document can then be extracted directly by the model. When the document information is extracted by this model, the position of local entity information in the document is taken into account in producing the final extraction result.
In some embodiments, the information extraction is performed on the plurality of local position information and semantic entity identification information of the target document through a dual affine attention mechanism to obtain an information extraction result, including:
screening a plurality of local position information and semantic entity identification information of a target document according to a preset proportion to obtain an information set;
and extracting the semantic entity relationship from the information set to obtain the relationship among a plurality of semantic entities.
In the embodiment of the application, semantic entity relations are extracted from the local position information and the semantic entity identification information of the target document, screened according to the preset proportion, to obtain the relations among a plurality of semantic entities. The information extraction model can then take these relations into account when extracting information, which improves the accuracy of information extraction.
In a second aspect, an embodiment of the present application provides an apparatus for extracting document information, including:
the labeling module is used for labeling semantic entities with association relations in the target document in the same detection frame to obtain a plurality of local detection frames;
the extraction module is used for extracting the position information of the plurality of local detection frames to obtain a plurality of local position information;
and the extraction module is used for carrying out information extraction on the plurality of local position information and the semantic entity identification information of the target document through a dual affine attention mechanism to obtain an information extraction result.
Optionally, the apparatus further includes:
the identification module is used for marking text fields of the target document before marking semantic entities with association relations in the target document in the same detection frame to obtain a plurality of local detection frames, so as to obtain a plurality of text field detection frames;
extracting text features of a plurality of text field detection boxes and visual features of a target document, wherein the text features comprise text semantic features and text position features;
and carrying out entity recognition on the text features and the visual features through a convolutional neural network to obtain semantic entity recognition information.
Optionally, the apparatus further includes:
the training module is used for training the basic information extraction model through the information extraction result, after the information extraction result has been obtained by carrying out information extraction on the plurality of local position information and the semantic entity identification information of the target document through the dual affine attention mechanism, to obtain an information extraction model, wherein the information extraction result comprises the relationship between the semantic entity identification information of the target document and the semantic entity;
and inputting the document to be extracted into an information extraction model to obtain an extraction result.
Optionally, the extraction module is specifically configured to:
screening a plurality of local position information and semantic entity identification information of a target document according to a preset proportion to obtain an information set;
and extracting the semantic entity relationship from the information set to obtain the relationship among a plurality of semantic entities.
In a third aspect, an embodiment of the present application provides an electronic device comprising a processor and a memory storing computer readable instructions which, when executed by the processor, perform the steps of the method as provided in the first aspect above.
In a fourth aspect, embodiments of the present application provide a readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method as provided in the first aspect above.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of the embodiments of the application. The objectives and other advantages of the application will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and should not be considered as limiting the scope, and other related drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a method for extracting document information according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a detailed method for extracting document information according to an embodiment of the present application;
FIG. 3 is a schematic block diagram of an apparatus for extracting document information according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an apparatus for extracting document information according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present application.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only to distinguish the description, and are not to be construed as indicating or implying relative importance.
The method applies to document information extraction scenarios, specifically to extracting the textual entity content of a document while taking into account the position of local information within the document.
At present, document data is growing geometrically and the demand for document recognition keeps increasing. Because document layouts are complex and varied and documents contain multiple data modalities, customized template-based development can hardly meet this demand. In recent years, multimodal information extraction that fuses modalities such as document content, image features and layout information has been widely applied, and current text information extraction can directly extract all the information in a document. However, during extraction the information of the entire document may be gathered together, which leads to information extraction errors.
To this end, in the present application, semantic entities that have an association relationship in the target document are labeled within the same detection frame to obtain a plurality of local detection frames; position information of the plurality of local detection frames is extracted to obtain a plurality of pieces of local position information; and information extraction is performed on the plurality of pieces of local position information and the semantic entity identification information of the target document through a dual affine attention mechanism to obtain an information extraction result. Because the position information of the local detection frames is combined with the semantic entity identification information of the document and input into the dual affine attention mechanism, the mechanism extracts the semantic entity identification information while taking into account where each piece of local information sits in the target document. This avoids the situation in which the results extracted from the whole document are lumped together and unclear, and thereby improves the accuracy of the extracted document information.
In the embodiment of the present application, the execution body may be a document information extraction apparatus in a document information extraction system. In practical applications, the apparatus may be an electronic device such as a terminal device or a server, which is not limited herein.
The method of extracting document information according to an embodiment of the present application will be described in detail with reference to fig. 1.
Referring to fig. 1, fig. 1 is a flowchart of a method for extracting document information according to an embodiment of the present application, where the method for extracting document information shown in fig. 1 includes:
Step 110: labeling semantic entities that have an association relationship in the target document within the same detection frame to obtain a plurality of local detection frames.
A semantic entity comprises text content such as words and sentences. Semantic entities are considered to have an association relationship when the content they carry belongs to the same type, the same attribute, the same object or the like. Taking an insurance document as an example, the applicant's basic information, such as age, gender and certificate, forms semantic entities with an association relationship, as does the age, gender and certificate information of the insured person and of the beneficiary. By contrast, the applicant's basic information, such as age, gender and certificate, and information such as the insured amount, term and payment type are regarded as entities without an association relationship.
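The grouping in step 110 can be illustrated with a minimal sketch. In the source the local detection frames are produced by labeling; the sketch below instead assumes each entity already carries a hypothetical association label (`group`) and simply takes the union of the boxes sharing that label. The `Entity` class, the `group` field and the sample coordinates are illustrative assumptions, not part of the application.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# A box is (x_min, y_min, x_max, y_max) in page coordinates.
Box = Tuple[float, float, float, float]


@dataclass
class Entity:
    text: str
    box: Box
    group: str  # hypothetical association label, e.g. "applicant"


def local_detection_frames(entities: List[Entity]) -> Dict[str, Box]:
    """Return one enclosing frame per association group (union of member boxes)."""
    frames: Dict[str, Box] = {}
    for e in entities:
        if e.group not in frames:
            frames[e.group] = e.box
        else:
            x0, y0, x1, y1 = frames[e.group]
            frames[e.group] = (
                min(x0, e.box[0]),
                min(y0, e.box[1]),
                max(x1, e.box[2]),
                max(y1, e.box[3]),
            )
    return frames


# Illustrative data only: two applicant fields and one policy field.
entities = [
    Entity("Age: 35", (10, 10, 80, 25), "applicant"),
    Entity("Gender: F", (10, 30, 95, 45), "applicant"),
    Entity("Insured amount: 100000", (10, 200, 220, 215), "policy"),
]
frames = local_detection_frames(entities)
```

Here the two applicant fields end up inside one local detection frame, while the insured amount, which has no association with them, gets a frame of its own.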
In some embodiments of the present application, before labeling semantic entities having association relationships in a target document in the same detection frame to obtain a plurality of local detection frames, the method shown in fig. 1 further includes: labeling text fields of the target document to obtain a plurality of text field detection boxes; extracting text features of a plurality of text field detection boxes and visual features of a target document, wherein the text features comprise text semantic features and text position features; and carrying out entity recognition on the text features and the visual features through a convolutional neural network to obtain semantic entity recognition information.
In this process, detection frames are labeled across the whole target document and the content in each detection frame is recognized, so that the semantic entity identification information present in the whole target document can be obtained quickly.
A detection frame contains a text field, which may be a word, a sentence, a code, a number or other information. The detection frames can be labeled directly with a labeling tool for tilted text detection (labelimage). The visual features may be information such as color, shape and size in the document image. The semantic features may be the entity content of the text, the meaning of that content, and so on. The position feature may be the position of the entity content in the document. The convolutional neural network may use target detection algorithms such as master rcnn and dbnet to identify the text features and visual features. The text recognition result can be obtained by directly recognizing the content in each detection frame with a commonly used text recognition model (e.g., crnn). The semantic features of the text can be extracted with a common word embedding model, and the position features of the text can be extracted through spatial feature encoding of the text; the visual features of the document can be extracted by a convolutional neural network.
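As one possible reading of "spatial feature encoding of the text", the sketch below normalizes a detection frame to a fixed integer grid so that positions are comparable across pages of different sizes. The source does not specify the encoding; the 0 to 1000 grid, the function name `normalize_box` and the sample page size are assumptions made for illustration.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def normalize_box(box: Box, page_width: float, page_height: float,
                  scale: int = 1000) -> Tuple[int, int, int, int]:
    """Map page coordinates onto an integer grid of size `scale` (assumed convention)."""
    x0, y0, x1, y1 = box
    return (
        int(scale * x0 / page_width),
        int(scale * y0 / page_height),
        int(scale * x1 / page_width),
        int(scale * y1 / page_height),
    )


# Illustrative values: a text field on a 500 x 1000 page.
position_feature = normalize_box((50, 100, 250, 120), page_width=500, page_height=1000)
```

The resulting integers could then be looked up in embedding tables or fed to the network directly; which of these the application intends is not stated.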
Step 120: extracting position information of the plurality of local detection frames to obtain a plurality of pieces of local position information.
The detection frames corresponding to the local positions of the document can be obtained with a commonly used target detection algorithm, such as master rcnn or dbnet, and the position information of each detection frame can then be extracted directly. The coordinates of the center point of a detection frame may be used as the local position information of that frame.
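Using the center point of a detection frame as its local position information amounts to the following small computation. The frame names and coordinates are illustrative assumptions.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def box_center(box: Box) -> Tuple[float, float]:
    """Return the center point (cx, cy) of an axis-aligned detection frame."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


# Illustrative local detection frames.
local_frames: Dict[str, Box] = {
    "applicant": (10, 10, 95, 45),
    "policy": (10, 200, 220, 215),
}
local_positions = {name: box_center(box) for name, box in local_frames.items()}
```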
Step 130: performing information extraction on the plurality of pieces of local position information and the semantic entity identification information of the target document through a dual affine attention mechanism to obtain an information extraction result.
The dual affine (biaffine) attention mechanism can consider two input features at the same time and output an information extraction result. Performing information extraction on the plurality of pieces of local position information and the semantic entity identification information of the target document through the dual affine attention mechanism includes: extracting the entity identification information of the target document with the local position information as a reference, so that in the extraction result the semantic entities are classified according to the local position information, giving the final classification result.
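To make "considering two input features at the same time" concrete, the sketch below computes a generic biaffine score between every key vector and every value vector: a bilinear term plus a linear term over the concatenated pair plus a bias. This is the textbook form of biaffine scoring, not necessarily the exact layer used in the application; the feature vectors are assumed to already encode both semantic and position information, and all shapes and random parameters are illustrative.

```python
import numpy as np


def biaffine_scores(keys: np.ndarray, values: np.ndarray,
                    U: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    """score[i, j] = keys[i] @ U @ values[j] + w @ concat(keys[i], values[j]) + b."""
    d = keys.shape[1]
    bilinear = keys @ U @ values.T                      # shape (n_keys, n_values)
    # The linear term over the concatenation splits into a key part and a value part.
    linear = (keys @ w[:d])[:, None] + (values @ w[d:])[None, :]
    return bilinear + linear + b


# Illustrative inputs: 3 key entities and 2 value entities with 4-dim features.
rng = np.random.default_rng(0)
d = 4
keys = rng.normal(size=(3, d))
values = rng.normal(size=(2, d))
U = rng.normal(size=(d, d))      # bilinear weight (would be learned)
w = rng.normal(size=2 * d)       # linear weight (would be learned)
scores = biaffine_scores(keys, values, U, w, b=0.1)
```

Each entry of `scores` can be read as the strength of the relation between one key entity and one value entity; a trained model would learn `U`, `w` and `b` rather than sample them.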
In some embodiments of the present application, the information extraction is performed on a plurality of local location information and semantic entity identification information of a target document by a dual affine attention mechanism to obtain an information extraction result, including: screening a plurality of local position information and semantic entity identification information of a target document according to a preset proportion to obtain an information set; and extracting the semantic entity relationship from the information set to obtain the relationship among a plurality of semantic entities.
In this process, semantic entity relations are extracted from the local position information and the semantic entity identification information of the target document, screened according to the preset proportion, to obtain the relations among a plurality of semantic entities. The information extraction model can then take these relations into account when extracting information, which improves the accuracy of information extraction.
The preset proportion can be learned continuously according to the accuracy of the actual extraction results. The application combines the semantic entity identification information with the local position information of the document and provides a way of combining them into the keys and values input to the dual affine attention mechanism. The identified semantic entity identification information can be divided into key-value pairs, for example key: "applicant", value: "XX". On the one hand, the mapping relation between global keys and values is considered; on the other hand, the local position information of the document is added, which strengthens the keys and values that have an association relationship, so that the semantic entity identification information can be classified with the local position information taken into account during information extraction. The combined calculation formula for the keys and values input to the dual affine attention mechanism is as follows:
kv_relations = α1 * kv_relations_whole + α2 * kv_relations_part_set;
where kv_relations denotes the mapping value, kv_relations_whole is the mapping value of the global keys and values, and kv_relations_part_set is the set of mapping values of the keys and values under the local position information; α1 and α2 are weight coefficients with α1 + α2 = 1, and the extraction accuracy of the information extraction model can be improved by configuring the weights α1 and α2.
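The weighted combination in the formula can be sketched as follows. The source does not say how the set of local mapping values is aligned with the global ones; this sketch assumes both are arranged as matrices of the same shape (rows are keys, columns are values), with local entries set to zero for pairs that do not share a local detection frame. The function name and sample numbers are illustrative.

```python
import numpy as np


def combine_kv_relations(kv_relations_whole, kv_relations_part_set,
                         alpha1: float) -> np.ndarray:
    """Blend global and local key-value mapping values with alpha1 + alpha2 = 1."""
    if not 0.0 <= alpha1 <= 1.0:
        raise ValueError("alpha1 must lie in [0, 1]")
    alpha2 = 1.0 - alpha1
    whole = np.asarray(kv_relations_whole, dtype=float)
    part = np.asarray(kv_relations_part_set, dtype=float)
    if whole.shape != part.shape:
        raise ValueError("global and local mapping values must have the same shape")
    return alpha1 * whole + alpha2 * part


# Illustrative values: the global scores are ambiguous, while the local scores
# say that key 0 pairs with value 0 and key 1 pairs with value 1.
kv_relations_whole = [[0.6, 0.4], [0.5, 0.5]]
kv_relations_part_set = [[1.0, 0.0], [0.0, 1.0]]
kv_relations = combine_kv_relations(kv_relations_whole, kv_relations_part_set, alpha1=0.7)
# kv_relations is approximately [[0.72, 0.28], [0.35, 0.65]]
```

In this example the second key is tied between the two values under the global scores alone; adding the local term breaks the tie in favor of the value in the same local detection frame.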
In some embodiments of the present application, after extracting information from the plurality of local location information and semantic entity identification information of the target document by using the dual affine attention mechanism, the method shown in fig. 1 further includes: training the basic information extraction model through an information extraction result to obtain an information extraction model, wherein the information extraction result comprises the relation between semantic entity identification information of the target document and the semantic entity; and inputting the document to be extracted into an information extraction model to obtain an extraction result.
In this process, the extraction result obtained from the target document can be used as a training sample for the information extraction model, so that the information in a document can then be extracted directly by the model. When the document information is extracted by this model, the position of local entity information in the document is taken into account in producing the final extraction result.
The extraction result comprises classified semantic entities, and semantic entities in the same region have an association relationship. The information extraction result consists of semantic entities extracted with the local position information of the document taken into account, and the trained information extraction model can likewise consider the positional relations of semantic entities in the document, so that semantic entities with an association relationship are extracted into the same class.
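What "semantic entities in the same region are extracted into the same class" means can be illustrated with a purely geometric stand-in: each entity is assigned to the local detection frame that contains its center point. This is not the trained information extraction model described in the source, only a sketch of the expected shape of its output; the function name and sample data are assumptions.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Point = Tuple[float, float]


def assign_to_frames(entity_centers: Dict[str, Point],
                     frames: Dict[str, Box]) -> Dict[str, List[str]]:
    """Group entity texts by the first local frame containing their center point."""
    grouped: Dict[str, List[str]] = {name: [] for name in frames}
    for text, (cx, cy) in entity_centers.items():
        for name, (x0, y0, x1, y1) in frames.items():
            if x0 <= cx <= x1 and y0 <= cy <= y1:
                grouped[name].append(text)
                break  # an entity belongs to at most one frame in this sketch
    return grouped


# Illustrative frames and entity centers.
frames = {"applicant": (10, 10, 95, 45), "policy": (10, 200, 220, 215)}
entity_centers = {
    "Age: 35": (45.0, 17.5),
    "Gender: F": (52.5, 37.5),
    "Insured amount: 100000": (115.0, 207.5),
}
grouped = assign_to_frames(entity_centers, frames)
```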
In the process shown in FIG. 1 above, semantic entities that have an association relationship in the target document are labeled within the same detection frame to obtain a plurality of local detection frames; position information of the plurality of local detection frames is extracted to obtain a plurality of pieces of local position information; and information extraction is performed on the plurality of pieces of local position information and the semantic entity identification information of the target document through a dual affine attention mechanism to obtain an information extraction result. Because the position information of the local detection frames is combined with the semantic entity identification information of the document and input into the dual affine attention mechanism, the mechanism extracts the semantic entity identification information while taking into account where each piece of local information sits in the target document. This avoids the situation in which the results extracted from the whole document are lumped together and unclear, and thereby improves the accuracy of the extracted document information.
The method of extracting document information according to an embodiment of the present application will be described in detail with reference to fig. 2.
Referring to fig. 2, fig. 2 is a schematic diagram of a detailed method for extracting document information according to an embodiment of the present application, where the detailed method for extracting document information shown in fig. 2 includes:
labeling text fields of the target document to obtain a plurality of text field detection frames and a plurality of document local position detection frames;
recognizing the text fields in the plurality of text field detection frames to obtain text position features, text semantic features and visual features of the target document, wherein the text semantic features can be extracted with a common word embedding model, the text position features can be extracted through spatial feature encoding of the text, and the visual features of the document can be extracted by a convolutional neural network;
inputting the extracted text position features, text semantic features and visual features of the target document into a Transformer layer for entity recognition to obtain entity identification information;
extracting position information of the plurality of local detection frames to obtain a plurality of pieces of document local position information;
and inputting the entity identification information and the plurality of pieces of document local position information into the dual affine attention mechanism to obtain an information extraction result.
In addition, for the specific steps and methods shown in fig. 2, reference may be made to the method shown in fig. 1; they are not described in detail here.
The method of extracting document information has been described above with reference to figs. 1 and 2; the apparatus for extracting document information is described below with reference to figs. 3 and 4.
Referring to fig. 3, a schematic block diagram of an apparatus 300 for extracting document information according to an embodiment of the present application is provided, where the apparatus 300 may be a module, a program segment, or a code on an electronic device. The apparatus 300 corresponds to the embodiment of the method of fig. 1 described above, and is capable of performing the steps involved in the embodiment of the method of fig. 1. Specific functions of the apparatus 300 will be described below, and detailed descriptions thereof will be omitted herein as appropriate to avoid redundancy.
Optionally, the apparatus 300 includes:
a labeling module 310, configured to label semantic entities having an association relationship in a target document in the same detection frame, so as to obtain a plurality of local detection frames;
an extracting module 320, configured to extract position information of the plurality of local detection frames to obtain a plurality of pieces of local position information;
an extraction module 330, configured to perform information extraction on the plurality of pieces of local position information and the semantic entity identification information of the target document through a dual affine attention mechanism, so as to obtain an information extraction result.
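The way the three modules hand data to one another can be sketched as a simple pipeline. This is only an illustration of the data flow described above: the class name ExtractionApparatus, its field names, and the use of plain callables for the modules are all hypothetical, and the stub lambdas stand in for real models.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class ExtractionApparatus:
    """Hypothetical container wiring the three modules together."""
    labeling_module: Callable[[Any], List[Any]]         # document -> local detection frames
    extracting_module: Callable[[List[Any]], List[Any]]  # frames -> local position information
    extraction_module: Callable[[List[Any], Any], Any]   # (positions, entity info) -> result

    def run(self, document: Any, entity_info: Any) -> Any:
        """Run the modules in the order given in the description."""
        frames = self.labeling_module(document)
        positions = self.extracting_module(frames)
        return self.extraction_module(positions, entity_info)


# Stub modules for illustration only; real modules would wrap trained models.
apparatus = ExtractionApparatus(
    labeling_module=lambda doc: [doc, doc],
    extracting_module=lambda frames: [len(frames)],
    extraction_module=lambda positions, entities: (positions, entities),
)
result = apparatus.run("doc", "entities")
```

Keeping each module behind a callable makes it possible to swap one stage (for example, the position extraction) without touching the others, which matches the statement later in the description that the modules may be integrated or exist separately.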
Optionally, the apparatus further includes:
an identification module, configured to, before the semantic entities having an association relationship in the target document are labeled in the same detection frame to obtain the plurality of local detection frames: label text fields of the target document to obtain a plurality of text field detection boxes; extract text features of the plurality of text field detection boxes and visual features of the target document, wherein the text features comprise text semantic features and text position features; and perform entity recognition on the text features and the visual features through a convolutional neural network to obtain the semantic entity identification information.
Optionally, the apparatus further includes:
a training module, configured to, after information extraction is performed on the plurality of pieces of local position information and the semantic entity identification information of the target document through the dual affine attention mechanism to obtain the information extraction result: train a basic information extraction model with the information extraction result to obtain an information extraction model, wherein the information extraction result comprises the semantic entity identification information of the target document and the relationships between semantic entities; and input a document to be extracted into the information extraction model to obtain an extraction result.
Optionally, the extraction module is specifically configured to:
screening a plurality of local position information and semantic entity identification information of a target document according to a preset proportion to obtain an information set; and extracting the semantic entity relationship from the information set to obtain the relationship among a plurality of semantic entities.
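The application does not say what the "preset proportion" applies to. One possible reading, shown here purely as an assumption, is that candidate entity pairs are screened so that unlinked pairs are kept only up to a fixed ratio relative to linked pairs before relation extraction. The function screen_pairs, its parameters, and the pair representation are hypothetical.

```python
import random
from typing import List, Tuple

Pair = Tuple[int, int]  # (head entity index, tail entity index); assumed layout


def screen_pairs(linked: List[Pair], unlinked: List[Pair],
                 unlinked_per_linked: float, seed: int = 0) -> List[Pair]:
    """Keep all linked pairs and a random sample of unlinked pairs.

    `unlinked_per_linked` plays the role of the preset proportion: at most
    that many unlinked pairs are kept for each linked pair.
    """
    if unlinked_per_linked < 0:
        raise ValueError("proportion must be non-negative")
    rng = random.Random(seed)  # seeded so the screening is repeatable
    keep = min(len(unlinked), int(len(linked) * unlinked_per_linked))
    return list(linked) + rng.sample(unlinked, keep)


# Hypothetical candidates: 2 linked pairs, 5 unlinked pairs, proportion 1.5.
linked = [(0, 1), (2, 3)]
unlinked = [(0, 2), (0, 3), (1, 2), (1, 3), (2, 1)]
info_set = screen_pairs(linked, unlinked, 1.5)
```

Under this reading, the resulting information set is what the relation extraction step would consume; other interpretations of the preset proportion (for example, keeping a fixed fraction of all candidates) are equally compatible with the text.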
Referring to fig. 4, which is a schematic block diagram of an apparatus for extracting document information according to an embodiment of the present application, the apparatus may include a memory 410 and a processor 420. Optionally, the apparatus may further include a communication interface 430 and a communication bus 440. The apparatus corresponds to the embodiment of the method of fig. 1 described above and is capable of performing the steps involved in that embodiment; the specific functions of the apparatus are described below.
In particular, the memory 410 is used to store computer readable instructions.
The processor 420, which processes the readable instructions stored in the memory, is capable of performing the various steps in the method of fig. 1.
The communication interface 430 is used for signaling or data communication with other node devices, for example with a server or a terminal, or with other device nodes; the embodiments of the application are not limited in this regard.
A communication bus 440 for enabling direct connection communication of the above-described components.
The communication interface 430 of the device in the embodiment of the present application is used for signaling or data communication with other node devices. The memory 410 may be a high-speed RAM or a non-volatile memory, such as at least one disk memory. Optionally, the memory 410 may also be at least one storage device located remotely from the aforementioned processor. The memory 410 stores computer readable instructions which, when executed by the processor 420, cause the device to perform the method process shown in fig. 1 above. The processor 420 may be used in the apparatus 300 to perform the functions described in the present application. By way of example, the processor 420 may be a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component; the embodiments of the application are not limited in this regard.
Embodiments of the present application also provide a readable storage medium storing a computer program which, when executed by a processor, performs the method process performed by the electronic device in the method embodiment shown in fig. 1.
It will be clear to those skilled in the art that, for convenience and brevity of description, reference may be made to the corresponding procedure in the foregoing method for the specific working procedure of the apparatus described above, and this will not be repeated here.
In summary, the embodiments of the present application provide a method, an apparatus, an electronic device, and a readable storage medium for extracting document information. The method includes: labeling semantic entities having an association relationship in a target document in the same detection frame to obtain a plurality of local detection frames; extracting position information of the plurality of local detection frames to obtain a plurality of pieces of local position information; and performing information extraction on the plurality of pieces of local position information and the semantic entity identification information of the target document through a dual affine attention mechanism to obtain an information extraction result. The method improves the accuracy of document information extraction.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods, and computer program products according to various embodiments of the present application. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and comprises several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and variations will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application. It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

Claims (10)

1. A method of extracting document information, comprising:
marking semantic entities with association relations in a target document in the same detection frame to obtain a plurality of local detection frames;
extracting the position information of the local detection frames to obtain a plurality of local position information;
and extracting information from the plurality of local position information and the semantic entity identification information of the target document through a dual-affine attention mechanism to obtain an information extraction result.
2. The method according to claim 1, wherein before labeling semantic entities having an association relationship in the target document in the same detection frame to obtain a plurality of local detection frames, the method further comprises:
labeling text fields of the target document to obtain a plurality of text field detection boxes;
extracting text features of the plurality of text field detection boxes and visual features of the target document, wherein the text features comprise text semantic features and text position features;
and carrying out entity recognition on the text features and the visual features through a convolutional neural network to obtain the semantic entity identification information.
3. The method according to claim 1 or 2, wherein after the information extraction is performed on the plurality of local position information and the semantic entity identification information of the target document through the dual affine attention mechanism to obtain the information extraction result, the method further comprises:
training a basic information extraction model with the information extraction result to obtain an information extraction model, wherein the information extraction result comprises the semantic entity identification information of the target document and the relationships between semantic entities;
and inputting the document to be extracted into the information extraction model to obtain an extraction result.
4. The method according to claim 1 or 2, wherein performing information extraction on the plurality of local position information and the semantic entity identification information of the target document through the dual affine attention mechanism to obtain the information extraction result comprises:
screening the plurality of local position information and semantic entity identification information of the target document through a preset proportion to obtain an information set;
and extracting the semantic entity relationship from the information set to obtain the relationship among a plurality of semantic entities.
5. An apparatus for extracting document information, comprising:
a labeling module, configured to label semantic entities having association relations in a target document in the same detection frame to obtain a plurality of local detection frames;
an extracting module, configured to extract position information of the plurality of local detection frames to obtain a plurality of local position information;
and an extraction module, configured to perform information extraction on the plurality of local position information and semantic entity identification information of the target document through a dual affine attention mechanism to obtain an information extraction result.
6. The apparatus of claim 5, wherein the apparatus further comprises:
an identification module, configured to, before the semantic entities having association relations in the target document are labeled in the same detection frame to obtain the plurality of local detection frames, label text fields of the target document to obtain a plurality of text field detection boxes;
extract text features of the plurality of text field detection boxes and visual features of the target document, wherein the text features comprise text semantic features and text position features;
and perform entity recognition on the text features and the visual features through a convolutional neural network to obtain the semantic entity identification information.
7. The apparatus according to claim 5 or 6, characterized in that the apparatus further comprises:
a training module, configured to, after the extraction module performs information extraction on the plurality of local position information and the semantic entity identification information of the target document through the dual affine attention mechanism to obtain the information extraction result, train a basic information extraction model with the information extraction result to obtain an information extraction model, wherein the information extraction result comprises the semantic entity identification information of the target document and the relationships between semantic entities;
and input a document to be extracted into the information extraction model to obtain an extraction result.
8. The apparatus according to claim 5 or 6, wherein the extraction module is specifically configured to:
screening the plurality of local position information and semantic entity identification information of the target document through a preset proportion to obtain an information set;
and extracting the semantic entity relationship from the information set to obtain the relationship among a plurality of semantic entities.
9. An electronic device, comprising:
a memory and a processor, the memory storing computer readable instructions that, when executed by the processor, perform the steps in the method of any of claims 1-4.
10. A computer-readable storage medium, comprising:
a computer program which, when run on a computer, causes the computer to perform the method according to any one of claims 1-4.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310645296.1A CN116611450A (en) 2023-06-01 2023-06-01 Method, device and equipment for extracting document information and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310645296.1A CN116611450A (en) 2023-06-01 2023-06-01 Method, device and equipment for extracting document information and readable storage medium

Publications (1)

Publication Number Publication Date
CN116611450A true CN116611450A (en) 2023-08-18

Family

ID=87685214

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310645296.1A Pending CN116611450A (en) 2023-06-01 2023-06-01 Method, device and equipment for extracting document information and readable storage medium

Country Status (1)

Country Link
CN (1) CN116611450A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117496542A (en) * 2023-12-29 2024-02-02 恒生电子股份有限公司 Document information extraction method, device, electronic equipment and storage medium
CN117496542B (en) * 2023-12-29 2024-03-15 恒生电子股份有限公司 Document information extraction method, device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CA3124358C (en) Method and system for identifying citations within regulatory content
KR101865102B1 (en) Systems and methods for visual question answering
CN107808011B (en) Information classification extraction method and device, computer equipment and storage medium
CN112131920B (en) Data structure generation for table information in scanned images
EP3869385B1 (en) Method for extracting structural data from image, apparatus and device
CN112185520B (en) Text structuring processing system and method for medical pathology report picture
CN110598001A (en) Method, device and storage medium for extracting association entity relationship
CN112949476B (en) Text relation detection method, device and storage medium based on graph convolution neural network
CN113762309B (en) Object matching method, device and equipment
CN111488732B (en) Method, system and related equipment for detecting deformed keywords
CN111753082A (en) Text classification method and device based on comment data, equipment and medium
CN113449801B (en) Image character behavior description generation method based on multi-level image context coding and decoding
CN112784009B (en) Method and device for mining subject term, electronic equipment and storage medium
CN114596566A (en) Text recognition method and related device
CN112287100A (en) Text recognition method, spelling error correction method and voice recognition method
CN113449528A (en) Address element extraction method and device, computer equipment and storage medium
CN116611450A (en) Method, device and equipment for extracting document information and readable storage medium
CN114332893A (en) Table structure identification method and device, computer equipment and storage medium
Dutta et al. Cnn based extraction of panels/characters from bengali comic book page images
CN112597997A (en) Region-of-interest determining method, image content identifying method and device
CN114973286A (en) Document element extraction method, device, equipment and storage medium
CN117271759A (en) Text abstract generation model training method, text abstract generation method and device
CN114428860A (en) Pre-hospital emergency case text recognition method and device, terminal and storage medium
CN110851597A (en) Method and device for sentence annotation based on similar entity replacement
CN112966676B (en) Document key information extraction method based on zero sample learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination