CN115934928A - Information extraction method, device, equipment and storage medium - Google Patents


Info

Publication number
CN115934928A
Authority
CN
China
Prior art keywords
fund
poster
layout
coordinates
extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211696330.XA
Other languages
Chinese (zh)
Inventor
董儒汲
郭焕阳
彭锃
丁波
刘超
纪传俊
纪达麒
陈运文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Daguan Intelligent Shenzhen Co ltd
Original Assignee
Daguan Intelligent Shenzhen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Daguan Intelligent Shenzhen Co ltd
Priority to CN202211696330.XA
Publication of CN115934928A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention provides an information extraction method, apparatus, device and storage medium. The method comprises: converting a fund poster into an editable file; performing layout identification on the fund poster to obtain the coordinates of layout blocks; for each layout block, extracting each target field from the text of the editable file whose coordinates lie within that layout block; screening the target fields to obtain an extraction result; and combining the extraction results to obtain structured data, and presenting the structured data. The technical solution provided by the embodiment can improve working efficiency and save manual reading time.

Description

Information extraction method, device, equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to an information extraction method, an information extraction device, information extraction equipment and a storage medium.
Background
With the rapid development of the Chinese economy in recent years, the financial assets available for residents to allocate have increased, and the public fund industry has developed vigorously. Fund publicity and promotion material is a necessary means of promoting fund products, and the relevant personnel or fund sales organizations may produce a variety of such material to improve its publicity effect. To protect the legitimate rights and interests of consumers and promote the healthy development of the market, such material is required to have objective content and truthful data.
Fund promotional material is usually presented as a "fund poster" with no fixed format or content. Computer-based techniques are urgently needed to assist manual processing, so as to improve working efficiency and reduce labor cost. Information extraction is an important step in such processing, but the methods in the related art have difficulty handling fund posters.
Disclosure of Invention
The embodiment of the invention provides an information extraction method, an information extraction device, information extraction equipment and a storage medium, which can improve the working efficiency and save the manual reading time.
In a first aspect, an embodiment of the present invention provides an information extraction method, including:
converting the fund poster into an editable file;
performing layout identification on the fund poster to obtain coordinates of layout blocks;
for each layout block, extracting each target field from the text of the editable file whose coordinates lie within that layout block;
screening the target field to obtain an extraction result;
and combining the extraction results to obtain structured data, and presenting the structured data.
In a second aspect, an embodiment of the present invention provides an information extraction apparatus, including:
the conversion module is used for converting the fund poster into an editable file;
the identification module is used for identifying the layout of the fund poster to obtain the coordinates of the layout block;
the extraction module is used for extracting, for each layout block, each target field from the text of the editable file whose coordinates lie within that layout block;
the screening module is used for screening the target field to obtain an extraction result;
and the combination and presentation module is used for combining the extraction results to obtain structured data and presenting the structured data.
In a third aspect, an embodiment of the present invention provides an electronic device, where the electronic device includes:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to enable the at least one processor to perform the method provided by the embodiments of the present invention.
In a fourth aspect, the embodiment of the present invention provides a computer-readable storage medium, where computer instructions are stored, and the computer instructions are used to enable a processor to implement the method provided by the embodiment of the present invention when executed.
According to the technical solution provided by the embodiment of the invention, the fund poster is converted into an editable file; layout identification is performed on the fund poster to obtain the coordinates of layout blocks; for each layout block, each target field is extracted from the text of the editable file whose coordinates lie within that layout block; the target fields are screened to obtain an extraction result; and the extraction results are combined to obtain structured data, which is then presented. Working efficiency can thereby be improved and manual reading time saved.
It should be understood that the statements in this section are not intended to identify key or critical features of the embodiments of the present invention, nor are they intended to limit the scope of the invention. Other features of the present invention will become apparent from the following description.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a flowchart of an information extraction method according to an embodiment of the present invention;
Fig. 2a is a flowchart of an information extraction method according to an embodiment of the present invention;
Fig. 2b is a flowchart of an information extraction method according to an embodiment of the present invention;
Fig. 3 is a block diagram of an information extraction apparatus according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solutions of the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Fig. 1 is a flowchart of an information extraction method according to an embodiment of the present invention, where the embodiment is applicable to information extraction of poster material of a fund, and the method may be executed by an information extraction apparatus, which may be implemented in the form of hardware and/or software, and the apparatus may be configured in an electronic device. As shown in fig. 1, the method includes:
S110: the fund poster is converted into an editable file.
In the embodiment of the present invention, before S110, the method may further include dividing the key information to be identified into several target fields and configuring the target fields. That is, the key information to be identified is analyzed and divided into a plurality of target fields, and the target fields are set. After the setup is complete, the fund poster can be uploaded for processing.
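The field configuration step can be pictured with a minimal sketch. The `TargetField` structure, the field names and the regular expressions below are illustrative assumptions; the source does not specify how target fields are represented.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetField:
    """One piece of key information to look for (hypothetical structure)."""
    name: str     # label used later in the structured output
    pattern: str  # regular expression that loosely matches candidates

    def find_all(self, text: str) -> list[str]:
        """Return every substring of text that matches this field's pattern."""
        return [m.group(0) for m in re.finditer(self.pattern, text)]


# Illustrative configuration; names and patterns are assumptions, not from the source.
TARGET_FIELDS = [
    TargetField("fund_name", r"[A-Z][A-Za-z]*(?: [A-Z][A-Za-z]*)* Fund"),
    TargetField("year", r"\b20\d{2}\b"),
    TargetField("performance", r"[-+]?\d+(?:\.\d+)?%"),
]
```

Changing what is extracted then only requires editing `TARGET_FIELDS`, which is one way to read the extensibility the description attributes to field settings.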
In an implementation of the embodiment of the present invention, optionally, converting the fund poster into an editable file includes: converting fund posters in various formats into single-layer PDF files; and converting each single-layer PDF file into a double-layer PDF file through optical character recognition, and recording the coordinates of each character. In other words, the content of the fund posters is normalized: fund posters in various formats, such as pictures, documents and scanned copies, are converted into single-layer PDF files and stored. The text in the single-layer PDF file is then recognized by Optical Character Recognition (OCR) and added to the PDF file to form a double-layer PDF file, and the coordinates of each character are recorded.
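As a rough illustration of what "recording the coordinates of each character" could look like in memory, the sketch below stores one box per character. It assumes, purely for illustration, that the OCR step reports one bounding box per recognized line and that characters are evenly spaced within it; `CharBox` and `chars_from_line` are hypothetical names, and an OCR engine that reports per-character boxes directly would make the splitting unnecessary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharBox:
    """A recognized character and its bounding box in page coordinates."""
    char: str
    x0: float
    y0: float
    x1: float
    y1: float


def chars_from_line(text, x0, y0, x1, y1):
    """Split one recognized line box evenly into per-character boxes.

    Even spacing is an assumption made for this sketch only.
    """
    if not text:
        return []
    width = (x1 - x0) / len(text)
    return [
        CharBox(ch, x0 + i * width, y0, x0 + (i + 1) * width, y1)
        for i, ch in enumerate(text)
    ]
```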
S120: layout identification is performed on the fund poster to obtain the coordinates of layout blocks.
In the embodiment of the present invention, a fund poster is generally laid out in columns, but OCR cannot identify the column boundaries. This easily produces an incorrect reading order (for example, two sentences that do not belong to the same paragraph may be joined together) and causes the subsequent target fields to be extracted incorrectly. Layout identification is therefore needed to obtain each layout block, so that the target fields can be extracted within each block.
In an implementation of the embodiment of the present invention, optionally, performing layout identification on the fund poster to obtain the coordinates of layout blocks includes: performing binarization processing on the fund poster; and performing line scanning on the binarized fund poster to obtain the coordinates of the layout blocks. Optionally, performing binarization processing on the fund poster includes: converting the color picture of the fund poster into a black and white picture; correspondingly, performing line scanning on the binarized fund poster to obtain the coordinates of the layout blocks includes: performing line scanning on the black and white picture using OpenCV to obtain the coordinates of the layout blocks. OpenCV line scanning divides the picture into "information islands" that contain information, which realizes block division and yields the coordinates of the layout blocks.
Therefore, by carrying out layout identification on the fund poster, the coordinates of each layout block are obtained, so that the target field can be conveniently and correctly extracted subsequently, and the information can be correctly extracted.
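The binarize-then-scan idea can be sketched as follows. The source performs the scan with OpenCV but does not give the exact procedure; this sketch swaps in plain Python lists and a simple projection scan (rows first, then columns inside each row band), so the function names, the threshold of 128 and the `min_gap` parameter are all assumptions.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 1 for ink, 0 for background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def runs(flags, min_gap=1):
    """Return (start, end) spans of lines that contain ink; end is exclusive.

    Spans separated by fewer than min_gap blank lines are merged.
    """
    spans = []
    start = None
    last_ink = None
    for i, has_ink in enumerate(flags):
        if not has_ink:
            continue
        if start is None:
            start = i
        elif i - last_ink - 1 >= min_gap:
            spans.append((start, last_ink + 1))
            start = i
        last_ink = i
    if start is not None:
        spans.append((start, last_ink + 1))
    return spans


def layout_blocks(binary, min_gap=1):
    """Scan rows, then columns within each row band; return (x0, y0, x1, y1) blocks."""
    blocks = []
    row_flags = [any(row) for row in binary]
    for y0, y1 in runs(row_flags, min_gap):
        band = binary[y0:y1]
        col_flags = [any(row[x] for row in band) for x in range(len(band[0]))]
        for x0, x1 in runs(col_flags, min_gap):
            blocks.append((x0, y0, x1, y1))
    return blocks
```

On a real poster, `min_gap` would presumably be set larger than the spacing between characters and text lines so that only column gutters and section gaps split blocks.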
S130: for each layout block, each target field is extracted from the text of the editable file with coordinates located in each layout block.
In the embodiment of the invention, the coordinates of each character in the editable file have been recorded and the coordinates of each layout block have been obtained. The text of the editable file whose coordinates lie within each layout block can therefore be collected, and each target field is extracted from that text.
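A minimal sketch of this step, assuming each character is stored as `(char, x, y)` with `(x, y)` the center of its box, that characters on one text line share the same `y`, and that target fields are expressed as regular expressions (all assumptions; the source does not fix these details):

```python
import re


def text_in_block(chars, block):
    """Join the characters whose center lies inside block = (x0, y0, x1, y1).

    Characters are ordered top to bottom, then left to right.
    """
    x0, y0, x1, y1 = block
    inside = [(y, x, ch) for ch, x, y in chars if x0 <= x < x1 and y0 <= y < y1]
    inside.sort()
    return "".join(ch for _, _, ch in inside)


def extract_fields(text, patterns):
    """Apply each named pattern to the block text and collect all matches."""
    return {name: re.findall(pattern, text) for name, pattern in patterns.items()}
```

Because the text is gathered per block, characters from a neighboring column on the same line never end up in the same string, which is the reading-order problem layout identification is meant to avoid.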
S140: the target fields are screened to obtain an extraction result.
In the embodiment of the invention, some of the extracted target fields do not meet the requirements, so the target fields need to be screened. Screening acts as a second, refining pass over the extracted target fields and yields the extraction result.
In an implementation of the embodiment of the present invention, optionally, a database is queried to determine whether each target field meets the requirements, and the target fields that do not meet the requirements are filtered out to obtain the extraction result. Specifically, each target field is matched against the database to determine whether it is a field that needs to be extracted and meets the requirements; a target field that does not meet the requirements is filtered out, and the target fields that do are retained to obtain the extraction result. For example, to extract fund names, a target field matching "XX fund" is configured. During extraction, every field of the form "XX fund" is obtained, but a field such as "essential fund" may not be a fund name. The database is queried to determine whether "essential fund" is a fund name; because the query shows that it is not, the field is filtered out, and the target fields that are fund names are retained to obtain the extraction result.
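The database check might look like the sketch below. It uses an in-memory SQLite table as a stand-in for whatever database an implementation would query; the table name `funds`, its schema and the sample fund names are assumptions for illustration.

```python
import sqlite3


def demo_database():
    """Build a tiny in-memory database of known fund names (sample data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE funds (name TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO funds VALUES (?)",
        [("Alpha Growth Fund",), ("Beta Income Fund",)],
    )
    return conn


def screen_fund_names(candidates, conn):
    """Keep only the candidates that appear in the funds table."""
    kept = []
    for name in candidates:
        row = conn.execute(
            "SELECT 1 FROM funds WHERE name = ? LIMIT 1", (name,)
        ).fetchone()
        if row is not None:
            kept.append(name)
    return kept
```

A loosely matched candidate such as "Essential Fund" would be dropped here simply because no row with that name exists.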
S150: the extraction results are combined to obtain structured data, and the structured data is presented.
In an implementation of the embodiment of the present invention, optionally, combining the extraction results to obtain the structured data includes: combining the extraction results based on the positional relationship of the layout blocks to obtain the structured data. Specifically, in the extraction result, fields within the same layout block that have a specified relationship are combined, and structured data is formed based on which field belongs to which. For example, within the same layout block, fund performance x is found together with the year 2021, and fund performance y is found together with the year 2022; both groups belong to the fund performance, and the fund performance belongs to fund A. Therefore, fund A, the fund performance, and the two groups under the fund performance form structured data.
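The fund A example can be sketched as a small combining function. Pairing years with performance values by their order of appearance inside a block is an assumption of this sketch, as are the key names; the source only says that fields in the same layout block with a specified relationship are combined.

```python
def combine(block_results):
    """Combine per-block extraction results into nested structured data.

    block_results holds one dict per layout block, mapping a field name to the
    list of values extracted from that block.
    """
    structured = []
    for fields in block_results:
        names = fields.get("fund_name", [])
        if not names:
            continue  # a block with no fund name has nothing to attach to
        years = fields.get("year", [])
        values = fields.get("performance", [])
        structured.append({
            "fund": names[0],
            # Pair each year with the performance value in the same position.
            "performance": [
                {"year": y, "value": v} for y, v in zip(years, values)
            ],
        })
    return structured
```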
According to the technical solution provided by the embodiment of the invention, the fund poster is converted into an editable file; layout identification is performed on the fund poster to obtain the coordinates of layout blocks; for each layout block, each target field is extracted from the text of the editable file whose coordinates lie within that layout block; the target fields are screened to obtain an extraction result; and the extraction results are combined to obtain structured data, which is then presented. Extraction efficiency can thereby be improved and manual reading time saved.
Fig. 2a is a flowchart of an information extraction method provided in an embodiment of the present invention, where in this embodiment, optionally, the performing layout identification on the fund poster to obtain coordinates of layout blocks includes:
carrying out binarization processing on the fund poster;
and performing line scanning on the binarized fund poster to obtain the coordinates of the layout blocks.
Optionally, the converting the fund poster into an editable file comprises:
converting the fund posters in various formats into single-layer PDF files;
and converting the single-layer PDF file into a double-layer PDF file through optical character recognition, and recording the coordinate of each character.
The screening of the target field to obtain an extraction result comprises:
judging whether the target field meets the requirement or not by inquiring a database;
and filtering the target fields which do not meet the requirements to obtain an extraction result.
As shown in fig. 2a, the technical solution provided by the embodiment of the present invention includes:
S210: fund posters in various formats are converted into single-layer PDF files.
S220: each single-layer PDF file is converted into a double-layer PDF file through optical character recognition, and the coordinates of each character are recorded.
S230: binarization processing is performed on the fund poster.
S240: line scanning is performed on the binarized fund poster to obtain the coordinates of the layout blocks.
S250: for each layout block, each target field is extracted from the text of the editable file whose coordinates lie within that layout block.
S260: whether each target field meets the requirements is determined by querying a database.
S270: the target fields that do not meet the requirements are filtered out to obtain an extraction result.
S280: the extraction results are combined to obtain structured data, and the structured data is presented.
For S210 to S280, reference may be made to the description of the above embodiments.
The technical solution provided by the embodiment of the present invention is also illustrated in Fig. 2b. As shown in Fig. 2b, the method includes:
Field setting: analyze the key information to be identified and divide it into a plurality of target fields.
Data normalization: fund posters in various formats, such as pictures, documents and scanned copies, are converted into PDF files and stored locally.
OCR processing: recognize the characters in picture-type content, add them to the PDF file, and record the coordinates of the characters.
Binarization processing: convert the color fund poster picture into black and white.
Layout identification: obtain the coordinates of the layout blocks using OpenCV line scanning.
Fuzzy extraction: for each layout block, extract each target field from the text whose coordinates lie within the layout block.
Back-check and precise extraction: query the database to match the extracted information and obtain the precise extraction result.
Relation extraction: combine the extracted information according to the positional relationship of the layout blocks.
Result presentation: present the extraction result according to the combined structure.
According to the technical solution provided by the embodiment of the invention, the layout of the fund poster and the characters in it are identified automatically, and specific key information is extracted on that basis. The information is combined according to its correlations to obtain structured data for the key information of the fund poster, and the structured data is displayed to the user according to the structure of the information. The solution supports fund posters in any file format and identifies them efficiently, so the key information in a fund poster can be read in a short time, which saves manual reading time and improves processing efficiency with high accuracy. It also extends well: different extraction rules can be met simply by adjusting the field settings.
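One simple way to present structured data "according to the structure of the information" is to serialize it with indentation that mirrors the nesting. The source does not say how results are displayed, so the JSON rendering below is only an assumed stand-in for a real user interface.

```python
import json


def present(structured):
    """Render structured data as indented JSON text.

    ensure_ascii=False keeps non-ASCII text, such as Chinese fund names, readable.
    """
    return json.dumps(structured, ensure_ascii=False, indent=2)
```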
Fig. 3 is a block diagram of an information extraction apparatus according to an embodiment of the present invention. As shown in Fig. 3, the apparatus includes: a conversion module 310, an identification module 320, an extraction module 330, a screening module 340, and a combining and presenting module 350.
A conversion module 310 for converting the fund poster into an editable file;
the identification module 320 is used for identifying the layout of the fund poster to obtain the coordinates of a layout block;
an extracting module 330, configured to extract, for each of the layout blocks, each target field from the text of the editable file whose coordinates are located in each of the layout blocks;
the screening module 340 is configured to screen the target field to obtain an extraction result;
a combining and presenting module 350, configured to combine the extraction results to obtain structured data, and present the structured data.
Optionally, the performing layout identification on the fund poster to obtain coordinates of layout blocks includes:
performing binarization processing on the fund poster;
and performing line scanning on the binarized fund poster to obtain the coordinates of the layout blocks.
Optionally, the binarizing processing of the fund poster includes:
converting the color picture of the fund poster into a black and white picture of the fund poster;
correspondingly, the performing line scanning on the binarized fund poster to obtain the coordinates of the layout blocks includes:
performing line scanning on the black and white picture using OpenCV to obtain the coordinates of the layout blocks.
Optionally, the converting the fund poster into an editable file includes:
converting the fund posters in various formats into single-layer PDF files;
and converting the single-layer PDF file into a double-layer PDF file through optical character recognition, and recording the coordinate of each character.
Optionally, the screening the target field to obtain an extraction result includes:
judging whether the target field meets the requirement or not by querying a database;
and filtering the target fields which do not meet the requirements to obtain an extraction result.
Optionally, the combining the extraction results to obtain structured data includes:
and combining the extracted results based on the position relation of the layout blocks to obtain structured data.
Optionally, the apparatus further includes a setting module, configured to:
dividing key information to be identified into a plurality of target fields, and configuring the target fields.
The device provided by the embodiment of the invention can execute the method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
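The module structure of Fig. 3 can be sketched as a class that wires five callables together. The class name, the callable signatures and the `run` method are assumptions; the source only states what each module is used for.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class InformationExtractionApparatus:
    """Hypothetical wiring of modules 310 to 350 as plain callables."""
    convert: Callable[[Any], Any]               # conversion module 310
    identify: Callable[[Any], list]             # identification module 320
    extract: Callable[[Any, Any], Any]          # extraction module 330
    screen: Callable[[Any], Any]                # screening module 340
    combine_and_present: Callable[[list], Any]  # combining and presenting module 350

    def run(self, poster):
        """Run the modules in the order of the method steps."""
        editable = self.convert(poster)
        blocks = self.identify(poster)
        results = [self.screen(self.extract(editable, b)) for b in blocks]
        return self.combine_and_present(results)
```

Keeping each module behind a callable means a module can be replaced, for example with a different layout identifier, without touching the others.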
FIG. 4 shows a schematic block diagram of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 4, the electronic device 10 includes at least one processor 11, and a memory communicatively connected to the at least one processor 11, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, and the like, wherein the memory stores a computer program executable by the at least one processor, and the processor 11 can perform various suitable actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from a storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data necessary for the operation of the electronic apparatus 10 can also be stored. The processor 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
A number of components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, or the like; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, or the like. The processor 11 performs the various methods and processes described above, such as the information extraction method.
In some embodiments, the information extraction method may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the information extraction method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the information extraction method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for implementing the methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be performed. A computer program can execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine or entirely on a remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the Internet.
The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical host and VPS service are overcome.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present invention may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solution of the present invention can be achieved.
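By way of non-limiting illustration, the following sketch shows one case in which the order of steps does not affect the result: converting the fund poster into an editable file and performing layout identification both take only the poster as input, so they may run one after the other or concurrently. The functions `convert_to_editable` and `identify_layout` are hypothetical placeholders introduced here for illustration, and their return values are dummy data rather than the output of any real OCR or layout engine.

```python
from concurrent.futures import ThreadPoolExecutor


def convert_to_editable(poster):
    # Hypothetical placeholder for the conversion step (e.g., OCR).
    return {"source": poster, "chars": []}


def identify_layout(poster):
    # Hypothetical placeholder for layout identification; returns
    # dummy block coordinates as (left, top, right, bottom).
    return [(0, 0, 100, 50)]


def run_sequential(poster):
    # Execute the two independent steps one after the other.
    return convert_to_editable(poster), identify_layout(poster)


def run_parallel(poster):
    # Execute the same two steps concurrently; neither step reads
    # the other's output, so the results are unchanged.
    with ThreadPoolExecutor(max_workers=2) as pool:
        editable = pool.submit(convert_to_editable, poster)
        layout = pool.submit(identify_layout, poster)
        return editable.result(), layout.result()
```

The later steps (extraction, screening, and combination) consume both outputs and therefore would still need to wait for both of these steps to finish.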
The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. An information extraction method, comprising:
converting the fund poster into an editable file;
performing layout identification on the fund poster to obtain coordinates of layout blocks;
for each layout block, extracting each target field from the text of the editable file whose coordinates lie within that layout block;
screening the target field to obtain an extraction result;
and combining the extraction results to obtain structured data, and presenting the structured data.
2. The method of claim 1, wherein performing layout identification on the fund poster to obtain coordinates of layout blocks comprises:
carrying out binarization processing on the fund poster;
and performing line scanning on the binarized fund poster to obtain the coordinates of the layout blocks.
3. The method according to claim 2, wherein performing binarization processing on the fund poster comprises:
converting the color picture of the fund poster into a black and white picture of the fund poster;
correspondingly, performing line scanning on the binarized fund poster to obtain the coordinates of the layout blocks comprises:
performing line scanning on the black and white picture using OpenCV to obtain the coordinates of the layout blocks.
4. The method of claim 1, wherein converting the fund poster into an editable file comprises:
converting the fund posters in various formats into single-layer PDF files;
and converting the single-layer PDF file into a double-layer PDF file through optical character recognition, and recording the coordinates of each character.
5. The method of claim 1, wherein the screening the target field to obtain an extraction result comprises:
judging whether the target field meets the requirement or not by inquiring a database;
and filtering the target fields which do not meet the requirements to obtain an extraction result.
6. The method of claim 1, wherein said combining the extracted results to obtain structured data comprises:
and combining the extracted results based on the position relation of the layout blocks to obtain structured data.
7. The method of claim 1, further comprising:
dividing key information to be identified into a plurality of target fields, and configuring the target fields.
8. An information extraction apparatus, characterized by comprising:
the conversion module is used for converting the fund poster into an editable file;
the identification module is used for identifying the layout of the fund poster to obtain the coordinates of a layout block;
the extraction module is used for extracting, for each layout block, each target field from the text of the editable file whose coordinates lie within that layout block;
the screening module is used for screening the target field to obtain an extraction result;
and the combination and presentation module is used for combining the extraction results to obtain structured data and presenting the structured data.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
10. A computer-readable storage medium storing computer instructions for causing a processor to perform the method of any one of claims 1-7 when executed.
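For illustration only, and without limiting the claims above, the claimed flow can be sketched in simplified form as follows. The sketch makes several assumptions that are not part of the disclosure: the poster is given as a grayscale pixel grid in plain Python lists rather than processed with OpenCV, layout blocks are treated as horizontal bands found by scanning pixel rows, the only target field is a six-digit fund code matched by a regular expression, and the database query used for screening is replaced by an in-memory set. All names (`Block`, `Char`, `scan_rows`, `FIELD_PATTERNS`, `KNOWN_FUND_CODES`, and so on) are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Block:
    top: int     # first pixel row of the block (inclusive)
    bottom: int  # last pixel row of the block (inclusive)


@dataclass
class Char:
    text: str  # one recognized character
    x: int     # horizontal coordinate recorded during OCR
    y: int     # vertical coordinate recorded during OCR


def binarize(gray, threshold=128):
    # Map each grayscale pixel to 1 (ink) or 0 (background).
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def scan_rows(binary, min_gap=2):
    # Scan pixel rows top to bottom; inked rows separated by at
    # least min_gap blank rows are assigned to different blocks.
    blocks, start, last_ink = [], None, None
    for y, row in enumerate(binary):
        if any(row):
            if start is None:
                start = y
            elif y - last_ink > min_gap:
                blocks.append(Block(start, last_ink))
                start = y
            last_ink = y
    if start is not None:
        blocks.append(Block(start, last_ink))
    return blocks


def text_in_block(chars, block):
    # Keep the characters whose y coordinate lies inside the block,
    # in reading order (top to bottom, then left to right).
    inside = [c for c in chars if block.top <= c.y <= block.bottom]
    inside.sort(key=lambda c: (c.y, c.x))
    return "".join(c.text for c in inside)


# Assumed field configuration: a single field matched by a regex.
FIELD_PATTERNS = {"fund_code": re.compile(r"\d{6}")}
# Assumed stand-in for the database consulted during screening.
KNOWN_FUND_CODES = {"000001"}


def extract(gray, chars):
    records = []
    for block in scan_rows(binarize(gray)):
        text = text_in_block(chars, block)
        codes = FIELD_PATTERNS["fund_code"].findall(text)
        # Screening: drop candidates that the lookup does not confirm.
        kept = [c for c in codes if c in KNOWN_FUND_CODES]
        if kept:
            records.append({"rows": (block.top, block.bottom),
                            "fund_code": kept})
    # Records follow block order, i.e. the blocks' vertical position.
    return records


# Toy input: two inked bands separated by four blank rows.
gray = [[0] * 4] * 2 + [[255] * 4] * 4 + [[0] * 4] * 2
chars = [Char(ch, i, 0) for i, ch in enumerate("000001")]
chars += [Char(ch, i, 6) for i, ch in enumerate("999999")]
print(extract(gray, chars))
```

In this toy input the second band contains a six-digit string that the stand-in lookup does not recognize, so screening removes it and only the first band contributes a record.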
CN202211696330.XA 2022-12-28 2022-12-28 Information extraction method, device, equipment and storage medium Pending CN115934928A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211696330.XA CN115934928A (en) 2022-12-28 2022-12-28 Information extraction method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211696330.XA CN115934928A (en) 2022-12-28 2022-12-28 Information extraction method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115934928A true CN115934928A (en) 2023-04-07

Family

ID=86557569

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211696330.XA Pending CN115934928A (en) 2022-12-28 2022-12-28 Information extraction method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115934928A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113159969A (en) * 2021-05-17 2021-07-23 广州故新智能科技有限责任公司 Financial long text rechecking system
CN118550891A (en) * 2024-05-10 2024-08-27 北京度友信息技术有限公司 Portable file format document processing method, portable file format document processing device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN115934928A (en) Information extraction method, device, equipment and storage medium
CN113705554A (en) Training method, device and equipment of image recognition model and storage medium
US11341319B2 (en) Visual data mapping
CN113408323B (en) Extraction method, device and equipment of table information and storage medium
US12118770B2 (en) Image recognition method and apparatus, electronic device and readable storage medium
CN113239807B (en) Method and device for training bill identification model and bill identification
WO2023231380A1 (en) Electrode plate defect recognition method and apparatus, and electrode plate defect recognition model training method and apparatus, and electronic device
CN115098440A (en) Electronic archive query method, device, storage medium and equipment
CN114924959A (en) Page testing method and device, electronic equipment and medium
EP3869398A2 (en) Method and apparatus for processing image, device and storage medium
CN114187448A (en) Document image recognition method and device, electronic equipment and computer readable medium
CN113610809A (en) Fracture detection method, fracture detection device, electronic device, and storage medium
CN112528610A (en) Data labeling method and device, electronic equipment and storage medium
CN112801016A (en) Vote data statistical method, device, equipment and medium
CN115393870A (en) Text information processing method, device, equipment and storage medium
CN114049686A (en) Signature recognition model training method and device and electronic equipment
CN115116070A (en) Method, device and equipment for accurately cutting PDF and storage medium
CN115116080A (en) Table analysis method and device, electronic equipment and storage medium
CN113515280A (en) Page code generation method and device
CN114998906B (en) Text detection method, training method and device of model, electronic equipment and medium
CN114328242B (en) Form testing method and device, electronic equipment and medium
CN116644724B (en) Method, device, equipment and storage medium for generating bid
CN114911963A (en) Template picture classification method, device, equipment, storage medium and product
CN116884023A (en) Image recognition method, device, electronic equipment and storage medium
CN115757739A (en) Information extraction model training method, information extraction device, information extraction equipment and information extraction medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination