CN112800761A - Information backfill method and related electronic equipment and storage medium thereof - Google Patents

Information backfill method and related electronic equipment and storage medium thereof

Publication number
CN112800761A
Authority
CN
China
Legal status
Pending
Application number
CN202011565819.4A
Other languages
Chinese (zh)
Inventor
徐美君
昕宇
昌玮
路姚
Current Assignee
Iflytek Information Technology Co Ltd
Original Assignee
Iflytek Information Technology Co Ltd
Application filed by Iflytek Information Technology Co Ltd
Priority to CN202011565819.4A
Publication of CN112800761A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services; Handling legal documents
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Abstract

The application discloses an information backfill method and a related electronic device and storage medium. The information backfill method comprises: acquiring a plurality of files to be extracted; performing element extraction on each of the files to be extracted to obtain an element extraction result comprising at least one group of label extraction results; selecting one or more extraction elements from the group of label extraction results that matches a target label of a target file; and taking the selected extraction element as backfill information for the target label of the target file. Because the extraction elements in each group of label extraction results correspond to the same label and are extracted from different files to be extracted, content can be selected from different files to be extracted as the backfill content of the target file. The target file no longer has to be filled in manually after the files to be extracted have been screened manually, which saves labor and improves the efficiency of information backfill.

Description

Information backfill method and related electronic equipment and storage medium thereof
Technical Field
The present disclosure relates to the field of information technologies, and in particular, to an information backfill method and related electronic device and storage medium.
Background
Relevant institutions generate different types of files at different stages of a case, but within the same case, information such as names and identification numbers is shared, so part of the content of a later file can be obtained from existing files and applied directly. Generally, to acquire the shareable information, a person has to read the existing files, find the corresponding content, and then fill in the target file by hand, which takes considerable time and labor.
Disclosure of Invention
The technical problem mainly solved by the application is to provide an information backfilling method and related electronic equipment and storage media thereof, so that labor can be saved, and the information backfilling efficiency is improved.
In order to solve the above problem, a first aspect of the present application provides an information backfilling method, including: acquiring a plurality of files to be extracted; respectively extracting elements of the plurality of files to be extracted to obtain element extraction results, wherein the element extraction results comprise at least one group of label extraction results, each group of label extraction results comprise a plurality of extraction elements corresponding to the same label, and different extraction elements corresponding to the same label are respectively extracted from different files to be extracted; selecting one or more extraction elements from a group of label extraction results matched with a target label of a target file; and taking the selected extraction element as backfill information of a target label of the target file.
Each group of label extraction results further comprises a plurality of extraction marks respectively corresponding to the plurality of extraction elements, each extraction mark comprises a label name, and the label names in each group of label extraction results all point to the same label; before selecting one or more of the extracted elements from the set of extracted results matching the target label of the target document, the method further comprises: and selecting a group of label extraction results of which the contained label names are matched with the target label as a group of label extraction results matched with the target label.
Wherein the tag name comprises at least one of: document number, document date, residence place, party information.
Each group of label extraction results further comprises a plurality of extraction marks respectively corresponding to the plurality of extraction elements, and each extraction mark comprises the file type of the file to be extracted from which the corresponding extraction element comes; the selecting one or more extraction elements from a group of label extraction results matched with the target label of the target file comprises: taking each extraction element in the group of label extraction results matched with the target label as a candidate extraction element in turn, in order of file type priority; judging whether the candidate extraction element is empty; and if not, taking the candidate extraction element as the selected extraction element, and executing the step of taking the selected extraction element as the backfill information of the target label of the target file.
Wherein the taking the selected extraction element as the backfill information of the target label of the target file comprises: performing format processing on the selected extraction element to obtain a standard element, wherein the format processing includes at least one of the following: conversion into a matching preset enumeration element, date format conversion, and numeric conversion; and taking the standard element as the backfill information of the target label of the target file.
Wherein, after the format processing is performed on the selected extraction element to obtain the standard element, the method further comprises: judging whether the standard element meets the backfill requirement of the target label; if not, removing the extraction element that does not meet the backfill requirement from the group of label extraction results matched with the target label of the target file, and re-executing the step of selecting one or more extraction elements from the group of label extraction results matched with the target label of the target file and its subsequent steps, until the standard element corresponding to the selected extraction element meets the backfill requirement, whereupon the standard element is taken as the backfill information of the target label of the target file.
Each group of label extraction results further comprises a plurality of pieces of position information respectively corresponding to the plurality of extraction elements, and the position information is used for indexing the position of the corresponding extraction element in the file to be extracted; after the selected extraction element is taken as the backfill information of the target label of the target file, the method further comprises: in response to a user's instruction to consult the backfill information, acquiring the position information corresponding to the extraction element used as the backfill information; and displaying the page of the file to be extracted to which the acquired position information points.
After the displaying the corresponding page of the file to be extracted pointed by the acquired location information, the method further includes: marking the content extracted as the backfill information on the displayed page; and/or acquiring the modification content of the backfill information by a user, and taking the modification content as the final backfill information of the target label of the target file.
The element extraction result is obtained by utilizing a file extraction engine to extract elements of the plurality of files to be extracted; after the taking the modified content as the final backfill information of the target label of the target file, the method further comprises: and taking the target file backfilled with the final backfilling information as a training sample of the file extraction engine.
Wherein, the acquiring a plurality of files to be extracted comprises: acquiring a plurality of original electronic files; identifying a file type of the original electronic file; screening out the original electronic file with a preset file type as the file to be extracted based on the file type; before the element extraction is performed on the plurality of documents to be extracted respectively to obtain element extraction results, the method further includes: and converting the file to be extracted into a text format by using a character recognition algorithm.
To solve the above problem, a second aspect of the present application provides an electronic device, comprising: a memory and a processor coupled to each other; the processor is used for executing the program instructions stored in the memory to realize the information backfilling method of the first aspect.
In order to solve the above problem, a third aspect of the present application provides a computer-readable storage medium on which program instructions are stored, the program instructions, when executed by a processor, implement the information backfilling method of the first aspect.
In the above manner, element extraction is performed on each of the files to be extracted to obtain an element extraction result comprising at least one group of label extraction results, and the extraction elements in each group of label extraction results correspond to the same label and are extracted from different files to be extracted. Therefore, when one or more extraction elements are selected from a group of label extraction results as backfill information, content can be selected from different files to be extracted as the backfill content of the target file. The target file does not need to be filled in manually after the files to be extracted have been screened manually, which saves labor and improves the efficiency of information backfill.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating an embodiment of an information backfilling method according to the present application;
FIG. 2 is a schematic flow chart diagram illustrating another embodiment of an information backfilling method according to the present application;
FIG. 3 is a schematic flow chart diagram illustrating a further embodiment of the information backfilling method of the present application;
FIG. 4 is a block diagram of an embodiment of an electronic device of the present application;
FIG. 5 is a block diagram of an embodiment of a computer-readable storage medium of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, and not all of them. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present application. The term "and/or" herein merely describes an association between associated objects and means that three kinds of relationships may exist; for example, "A and/or B" may mean A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates that the objects before and after it are in an "or" relationship. Further, the term "plurality" herein means two or more.
Referring to fig. 1, fig. 1 is a schematic flow chart illustrating an embodiment of an information backfilling method according to the present application.
Specifically, the method of the present embodiment includes the following steps:
Step S11: acquiring a plurality of files to be extracted.
The files to be extracted provide the content used as backfill information. Because different files may contain the same content, existing files can serve as the files to be extracted and the file to be filled in serves as the target file, so that content from the files to be extracted is filled into the target file, realizing information backfill of the target file. A file to be extracted can be any file containing text, pictures, and the like. To facilitate subsequent element extraction, the format of the files to be extracted may be standardized; for example, before element extraction is performed on the files to be extracted in step S12, each file to be extracted may be converted into a text format by using a character recognition algorithm. The character recognition algorithm may be any algorithm capable of performing this format conversion, including but not limited to an OCR (Optical Character Recognition) algorithm.
A file to be extracted may be an electronic file generated in fields such as teaching and medicine, or a picture obtained by scanning or photographing a paper document, and is not specifically limited herein. In one application scenario, when a relevant institution handles a case, files of different document types are generated at each stage; to fill in a target file, backfill content can be extracted from the files already generated and filled into the target file. The files to be extracted may be documents such as an opinion book and a decision book.
The file to be extracted may be derived from the original electronic file, i.e. the file to be extracted may be at least a part of the original electronic file. In one embodiment, after a plurality of original electronic files are obtained, the file types of the original electronic files are identified, and therefore the original electronic files of preset file types are screened out to serve as files to be extracted. The reliability of the content contained in the original electronic files of different file types may be different, so that the original electronic file of the preset file type can be set as the file to be extracted in a customized manner, the file type of the file to be extracted is customized, and the reliability of the content is improved.
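The screening described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the `OriginalFile` class, the `PRESET_FILE_TYPES` set, and the assumption that each file's type has already been identified are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical preset file types; the source only says that they are configurable.
PRESET_FILE_TYPES = {"opinion book", "decision book"}


@dataclass
class OriginalFile:
    name: str
    file_type: str  # assumed to have been identified by an earlier step


def screen_files(originals, preset_types=PRESET_FILE_TYPES):
    """Keep only the original files whose identified type is a preset type."""
    return [f for f in originals if f.file_type in preset_types]


originals = [
    OriginalFile("a.pdf", "opinion book"),
    OriginalFile("b.pdf", "receipt"),
    OriginalFile("c.pdf", "decision book"),
]
to_extract = screen_files(originals)
print([f.name for f in to_extract])  # ['a.pdf', 'c.pdf']
```

Keeping the preset types in one configurable set mirrors the idea that the trusted file types can be customized per deployment.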
Step S12: performing element extraction on each of the plurality of files to be extracted to obtain an element extraction result, wherein the element extraction result comprises at least one group of label extraction results, each group of label extraction results comprises a plurality of extraction elements corresponding to the same label, and different extraction elements corresponding to the same label are extracted from different files to be extracted.
After the plurality of files to be extracted are obtained, element extraction is respectively carried out on the plurality of files to be extracted, and an element extraction result can be obtained. The method for extracting the elements includes, but is not limited to, performing element extraction on a plurality of files to be extracted by using a file extraction engine to obtain an element extraction result.
The element extraction result comprises at least one group of label extraction results. Each group of label extraction results comprises a plurality of extraction elements corresponding to the same label, and different extraction elements corresponding to the same label are extracted from different files to be extracted. For example, in an application scenario there are three files to be extracted, A, B, and C, each of which contains an "address" label. Element extraction is performed on files A, B, and C respectively to obtain a group of label extraction results that includes three extraction elements corresponding to the "address" label: address A1, address B1, and address C1, extracted from file A, file B, and file C respectively. It is understood that a plurality of label extraction results can be included under the same label.
In an embodiment, the element extraction result may further include, but is not limited to, a plurality of extraction marks respectively corresponding to the plurality of extraction elements. An extraction mark indicates information associated with its extraction element, such as a label name and the file type of the file to be extracted from which the corresponding extraction element comes. The label name corresponds one-to-one to the extraction element in the file to be extracted; for example, the label name is "name" and the extraction element is "Zhang San". The label name depends on the specific content of the file to be extracted and is not specifically limited herein. In an embodiment, the label name may include at least one of: document number, document date, residence place, party information.
Each group of label extraction results comprise a plurality of extraction elements corresponding to the same label, and different extraction elements corresponding to the same label are extracted from different files to be extracted respectively, so that one-to-many matching relationship exists between the label and the extraction elements, and one or more extraction elements are selected from the extraction elements conveniently as information backfilled to the target file.
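One way to picture the one-to-many relationship between a label and its extraction elements is the sketch below. The `ExtractionElement` fields and the file types assigned to files A, B, and C are assumptions made for illustration; the source does not prescribe a data structure.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ExtractionElement:
    label_name: str   # extraction mark: label name
    file_type: str    # extraction mark: type of the source file (assumed values)
    source_file: str  # which file to be extracted the element came from
    value: str        # the extracted content


def group_by_label(elements):
    """Build one group of label extraction results per label name."""
    groups = defaultdict(list)
    for element in elements:
        groups[element.label_name].append(element)
    return dict(groups)


elements = [
    ExtractionElement("address", "opinion book", "A", "address A1"),
    ExtractionElement("address", "decision book", "B", "address B1"),
    ExtractionElement("address", "opinion book", "C", "address C1"),
    ExtractionElement("name", "opinion book", "A", "Zhang San"),
]
results = group_by_label(elements)
print([e.value for e in results["address"]])
# ['address A1', 'address B1', 'address C1']
```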
Step S13: selecting one or more extraction elements from the group of label extraction results that matches the target label of the target file.
The target file may have several target labels. Each target label of the target file corresponds to a group of label extraction results, so that one or more extraction elements may be selected from the group of label extraction results that matches the target label. In the foregoing application scenario, the group of label extraction results includes address A1, address B1, and address C1; when the target label of the target file is "address", one of them, for example address A1, may be selected.
In an embodiment, each set of tag extraction results may further include a plurality of extraction marks respectively corresponding to a plurality of extraction elements, and each extraction mark includes a tag name, so that before one or more extraction elements are selected from a set of tag extraction results matching a target tag of a target document, a set of tag extraction results including tag names matching the target tag may be selected as a set of tag extraction results matching the target tag. Because the label names in each group of label extraction results all point to the same label, if the label name of the label extraction result is matched with the target label, the group of label extraction results is matched with the target label.
In an embodiment, each group of label extraction results further includes a plurality of extraction marks respectively corresponding to the plurality of extraction elements, and each extraction mark includes the file type of the file to be extracted from which the corresponding extraction element comes. When one or more extraction elements are selected from the group of label extraction results matched with the target label of the target file, each extraction element in that group can be taken as a candidate extraction element in turn, in order of file type priority; whether the candidate extraction element is empty is judged; and if not, the candidate extraction element is taken as the selected extraction element, and the step of taking the selected extraction element as the backfill information of the target label of the target file is executed.
In an embodiment, association relations among a plurality of extraction elements are preset, so that when element extraction is performed on a plurality of documents to be extracted respectively, the plurality of extraction elements with association relations can be classified into the same group of label extraction results, and therefore the plurality of extraction elements can be selected from a group of label extraction results matched with a target label of a target document.
Step S14: taking the selected extraction element as the backfill information of the target label of the target file.
The selected extraction element comes from the label extraction result matched with the target label of the target file, so taking it as the backfill information of the target label fills content from the files to be extracted into the target file, and the information under the corresponding labels in the file to be extracted and in the target file is the same.
In the above manner, element extraction is performed on each of the files to be extracted to obtain an element extraction result comprising at least one group of label extraction results, and the extraction elements in each group of label extraction results correspond to the same label and are extracted from different files to be extracted. Content can therefore be selected from different files to be extracted as the backfill content of the target file, without manually screening the files to be extracted and manually filling in the target file, which saves labor and improves the efficiency of information backfill.
As described above, when one or more extraction elements are selected from the group of label extraction results matching the target label of the target file, each extraction element in that group may be taken as a candidate extraction element in turn, in order of file type priority; whether the candidate extraction element is empty is judged; and if not, the candidate extraction element is taken as the selected extraction element and used as the backfill information of the target label of the target file. To further normalize the content backfilled into the target file, format processing can be performed on the selected extraction element to obtain a standard element, and the standard element is used as the backfill information of the target label of the target file. It will be appreciated that an extraction element from a file type with high priority may be incomplete, so that even after format processing it may not satisfy the backfill requirement of the target label. For example, the backfill content of the "address" target label may require province, city, district, county, and village, while the extraction element from the high-priority file type includes only province, city, district, and county. Therefore, to make the backfill information of the target label more complete and accurate, after the element extraction result is obtained, format processing may be performed and whether the backfill requirement of the target label is met may be judged, so that the best extraction element is selected from the group of label extraction results matched with the target label of the target file.
Referring to fig. 2, fig. 2 is a schematic flow chart illustrating another embodiment of the information backfilling method according to the present application. Specifically, the method of the present embodiment includes the following steps:
Step S21: acquiring a plurality of files to be extracted.
The description of step S21 can refer to the detailed description of step S11 shown in fig. 1, and is not repeated herein.
Step S22: performing element extraction on each of the plurality of files to be extracted to obtain an element extraction result.
In this embodiment, the element extraction result includes at least one group of label extraction results. Each group of label extraction results comprises a plurality of extraction elements corresponding to the same label and a plurality of extraction marks respectively corresponding to the extraction elements. Each extraction mark comprises a label name and the file type of the file to be extracted from which the corresponding extraction element comes, so that extraction elements obtained from different files to be extracted can be distinguished by file type, while the label names in each group of label extraction results all point to the same label. For example, one extraction mark is "opinion book address", where "opinion book" is the file type and "address" is the label name; another extraction mark is "decision book address", where "decision book" is the file type and "address" is the label name. The extraction elements corresponding to "opinion book address" and "decision book address" are both addresses, but they come from different files to be extracted. The same group of label extraction results corresponds to one label name and to a plurality of extraction elements, so the label name and the extraction elements can have a one-to-one or one-to-many matching relationship.
Step S23: selecting the group of label extraction results whose label names match the target label as the group of label extraction results matched with the target label.
The plurality of label names in each group of label extraction results all point to the same label, that is, the label names respectively corresponding to the extraction elements in each group of label extraction results all point to the same label, so that if the label name of the label extraction result is matched with the target label, the group of label extraction results is matched with the target label.
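The matching in step S23 can be sketched as below, assuming each extraction mark is represented as a (file type, label name) pair. The `Mark` type, the sample groups, and the exact-equality match are illustrative assumptions; the source does not state how a label name is compared with the target label.

```python
from typing import NamedTuple


class Mark(NamedTuple):
    file_type: str   # e.g. "opinion book"
    label_name: str  # e.g. "address"


# Each group is a list of (extraction mark, extraction element) pairs.
groups = [
    [
        (Mark("opinion book", "address"), "address A1"),
        (Mark("decision book", "address"), "address B1"),
    ],
    [(Mark("opinion book", "name"), "Zhang San")],
]


def match_group(groups, target_label):
    """Return the first group whose label names all equal the target label."""
    for group in groups:
        if group and all(mark.label_name == target_label for mark, _ in group):
            return group
    return None  # no group of label extraction results matches


matched = match_group(groups, "address")
print([value for _, value in matched])  # ['address A1', 'address B1']
```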
Step S24: taking each extraction element in the group of label extraction results matched with the target label as a candidate extraction element in turn, in order of file type priority.
In the same group of label extraction results, different extraction elements corresponding to the same label are extracted from different documents to be extracted respectively, and the documents to be extracted of different document types have differences of authority, normalization and the like, so that under the condition that the extraction mark comprises the document type of the document to be extracted from the corresponding extraction element, each extraction element in a group of label extraction results matched with the target label can be sequentially used as a candidate extraction element according to the priority of the document type, and the extraction element of a specific document type can be preferentially considered as the backfill information of the target label.
Step S25: judging whether the candidate extraction element is empty.
In the document to be extracted, there may be some extraction elements that are omitted, so that there is no substantial content that can be used as backfill information, and therefore, after the extraction element in the tag extraction result is used as a candidate extraction element, it may be determined whether the candidate extraction element is empty, and if not, step S26 is executed; if so, continuing to take the next extraction element after the candidate extraction element in the group of label extraction results matched with the target label as a new candidate extraction element according to the priority of the document type until the candidate extraction element is not empty and the candidate extraction element is taken as the selected extraction element.
Through the above steps S23-S25, a set of tag extraction results matching the target tag can be determined by using the tag name in the tag extraction results, and the extraction elements obtained from the document to be extracted of the specific document type are preferentially taken as the selected extraction elements by using the document type in the tag extraction results, so that in the present embodiment, one or more extraction elements are selected from the set of tag extraction results matching the target tag of the target document by using both the tag name and the document type extraction flag.
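Steps S24 and S25 amount to walking the matched group in priority order and skipping empty candidates. The sketch below assumes a hypothetical priority list in which "decision book" outranks "opinion book"; the actual ordering is not given in the source.

```python
# Hypothetical priority order of file types, highest priority first.
FILE_TYPE_PRIORITY = ["decision book", "opinion book"]


def select_element(group, priority=FILE_TYPE_PRIORITY):
    """Return the first non-empty (file_type, value) pair in priority order.

    File types missing from the priority list are tried last.
    """
    rank = {file_type: i for i, file_type in enumerate(priority)}
    ordered = sorted(group, key=lambda item: rank.get(item[0], len(rank)))
    for file_type, value in ordered:
        if value and value.strip():  # step S25: skip empty candidates
            return file_type, value
    return None  # every candidate was empty


group = [
    ("opinion book", "address A1"),
    ("decision book", ""),  # omitted in the source file, so empty
    ("other", "address C1"),
]
print(select_element(group))  # ('opinion book', 'address A1')
```

Because the decision book element is empty here, the selection falls through to the next file type in the priority list.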
Step S26: taking the candidate extraction element as the selected extraction element, and performing format processing on the selected extraction element to obtain a standard element.
The format processing includes at least one of: conversion into a matching preset enumeration element, date format conversion, and numeric conversion. For conversion into a matching preset enumeration element, whether the selected extraction element matches a preset enumeration element is judged, and if not, the selected extraction element is converted into the matching preset enumeration element; for example, the extraction element "home-living-fertilizer-united county" is converted into "Anhui-province-fertilizer-united county". For date format conversion, for example, an extraction element that writes the date 1 January 2018 out in Chinese numerals is converted into "2018-01-01". For numeric conversion, for example, the extraction element "three" is converted into "3". The format processing includes, but is not limited to, these three conversions and is not specifically limited herein. Extraction elements can be written in many different ways; when an extraction element does not conform to the verification format of the backfill information, it is difficult to backfill it directly into the target file, so format processing is used to unify the diverse and irregular formats of the extraction elements.
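The three kinds of format processing can be illustrated with the simplified converters below. They are a sketch under stated assumptions: the alias table `ENUM_ALIASES` is invented for illustration, the numeric converter handles only a single Chinese digit, and the date converter handles only dates already written with Arabic digits, not dates spelled out in Chinese numerals.

```python
import re

# Hypothetical alias table mapping free-form values to preset enumeration elements.
ENUM_ALIASES = {"male": "M", "man": "M", "female": "F", "woman": "F"}

CHINESE_DIGITS = {"零": 0, "〇": 0, "一": 1, "二": 2, "三": 3, "四": 4,
                  "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}


def to_enum(value):
    """Map a value onto its preset enumeration element, if one is known."""
    return ENUM_ALIASES.get(value.strip().lower(), value)


def to_number(value):
    """Convert a single Chinese digit to an Arabic digit string."""
    value = value.strip()
    if value in CHINESE_DIGITS:
        return str(CHINESE_DIGITS[value])
    return value  # left unchanged when no conversion applies


def to_iso_date(value):
    """Convert a date such as '2018年1月1号' to 'YYYY-MM-DD'."""
    match = re.fullmatch(r"\s*(\d{4})年(\d{1,2})月(\d{1,2})[日号]\s*", value)
    if not match:
        return value  # left unchanged when the pattern does not apply
    year, month, day = (int(part) for part in match.groups())
    return f"{year:04d}-{month:02d}-{day:02d}"


print(to_enum("Man"))             # M
print(to_number("三"))            # 3
print(to_iso_date("2018年1月1号"))  # 2018-01-01
```

Each converter returns its input unchanged when it cannot apply, so a later backfill-requirement check can still reject the value.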
Step S27: Judge whether the standard element meets the backfill requirement of the target label.
The standard element is the result of format processing of the extraction element, but its information may still be incomplete, so it is judged whether the standard element meets the backfill requirement of the target label. If so, step S28 is executed to take the standard element as the backfill information of the target label of the target file. If not, the extraction element that does not meet the backfill requirement is removed from the group of label extraction results matching the target label of the target file, and step S24 and the subsequent steps are executed again until the standard element corresponding to the selected extraction element meets the backfill requirement, whereupon step S28 is executed.
Removing extraction elements that fail the backfill requirement and reselecting from the same group of label extraction results in this way improves the accuracy of the backfill information.
Step S28: Take the standard element as the backfill information of the target label of the target file.
Through the above-mentioned format processing and the judgment of the backfill requirement of the target label, the selected standard element not only meets the format requirement, but also meets the backfill requirement of the target label, so that the standard element is used as the backfill information of the target label of the target file, and the accuracy of backfill content can be improved.
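The selection, format processing and backfill-requirement check described above can be sketched as a single loop. This is a minimal illustration under stated assumptions: the class `ExtractionElement`, the function `select_backfill`, and the two callbacks `format_element` and `meets_requirement` are hypothetical names, and the source does not prescribe this structure.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ExtractionElement:
    label_name: str  # extraction mark: label name
    file_type: str   # extraction mark: type of the file the element came from
    value: str       # extracted text, possibly empty


def select_backfill(
    group: List[ExtractionElement],
    type_priority: List[str],
    format_element: Callable[[str], str],
    meets_requirement: Callable[[str], bool],
) -> Optional[str]:
    """Return the first standard element that satisfies the backfill requirement."""
    rank = {file_type: i for i, file_type in enumerate(type_priority)}
    # Unknown file types sort after all listed ones.
    remaining = sorted(group, key=lambda e: rank.get(e.file_type, len(rank)))
    while remaining:
        candidate = remaining.pop(0)          # next candidate by file type priority
        if not candidate.value.strip():       # empty candidates are skipped
            continue
        standard = format_element(candidate.value)  # S26: format processing
        if meets_requirement(standard):             # S27: backfill requirement
            return standard                         # S28: use as backfill information
        # Otherwise the element has been removed from the group; try the next one.
    return None
```

Because each rejected element is popped from the working list, the loop terminates even when no element in the group qualifies, in which case it returns `None` and the label is left for manual filling.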
After the selected extraction element is taken as the backfill information of the target label of the target file, the backfill information can be further refined and utilized. Please refer to fig. 3, which is a schematic flow chart of a further embodiment of the information backfilling method of the present application. Specifically, the method of this embodiment includes the following steps:
Step S31: Acquire a plurality of files to be extracted.
For a description of step S31, refer to the detailed description of step S11 shown in fig. 1, which is not repeated here.
Step S32: Perform element extraction on the plurality of files to be extracted respectively to obtain an element extraction result.
In this embodiment, the element extraction result may be obtained by performing element extraction on the plurality of files to be extracted by using a file extraction engine. Specifically, when the file extraction engine extracts elements from the files to be extracted, it analyzes, extracts and infers the elements by means such as semantic analysis, logical relation judgment and numerical operation. The file extraction engine may, but need not, use a neural network model to extract the elements of the files to be extracted.
The element extraction result comprises at least one group of label extraction results, and each group of label extraction results comprises a plurality of extraction elements corresponding to the same label and a plurality of pieces of position information respectively corresponding to the extraction elements. The position information indexes the position of the corresponding extraction element in the file to be extracted, for example the line number of the extraction element in the file to be extracted, or the line number together with the character position, and is not specifically limited herein.
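One possible in-memory shape for such an element extraction result is sketched below. The class names, field names and the choice of line number plus character offsets are assumptions for illustration; the source only requires that each extraction element carry position information pointing back into its file.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Position:
    file_name: str   # which file to be extracted the element came from
    line: int        # 1-based line number in that file
    char_start: int  # character offset where the element starts in the line
    char_end: int    # character offset just past the end of the element


@dataclass(frozen=True)
class ExtractedElement:
    label: str
    value: str
    position: Position


def group_by_label(elements: Iterable[ExtractedElement]) -> Dict[str, List[ExtractedElement]]:
    """Build one label extraction result group per label."""
    groups: Dict[str, List[ExtractedElement]] = defaultdict(list)
    for element in elements:
        groups[element.label].append(element)
    return dict(groups)
```

Keeping the position on the element itself means that whichever element is finally chosen as backfill information can be traced to its source without a second lookup.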
Step S33: Select one or more extraction elements from the group of label extraction results matching the target label of the target file.
For a description of step S33, refer to the detailed description of step S13 shown in fig. 1 or of steps S23 to S25 shown in fig. 2, which is not repeated here.
Step S34: Take the selected extraction element as the backfill information of the target label of the target file.
Step S35: In response to a review instruction of the user on the backfill information, acquire the position information corresponding to the extraction element used as the backfill information.
Unlike the above embodiments, after the selected extraction element is taken as the backfill information of the target label of the target file, this embodiment can also present the target file to the user in response to a review instruction of the user on the backfill information, so that the user can review, modify or otherwise operate on the backfill information of the target label.
Each group of label extraction results also comprises a plurality of pieces of position information respectively corresponding to the extraction elements, and the position information indexes the position of the corresponding extraction element in the file to be extracted. Therefore, by acquiring the position information corresponding to the extraction element used as the backfill information in response to the user's review instruction, the user can learn from the position information which file to be extracted the extraction element comes from and where in that file it is located, which makes it convenient to review, check or modify the backfill information.
Step S36: Display the corresponding page of the file to be extracted to which the acquired position information points.
Because the position information indexes the position of the corresponding extraction element in the file to be extracted, once the position information corresponding to the extraction element used as the backfill information has been acquired, the corresponding page of the file to be extracted to which it points can be displayed, so that the user can conveniently check the source of the extraction element.
Step S37: Acquire the user's modification of the backfill information, and take the modified content as the final backfill information of the target label of the target file.
After the corresponding page of the file to be extracted is displayed, the user can check the backfill information against that page and modify it if it is found to be wrong. The user's modified content is then acquired and taken as the final backfill information of the target label of the target file, which realizes manual verification and confirmation of the backfill information.
After the corresponding page of the file to be extracted to which the acquired position information points is displayed, the content extracted as the backfill information may also be marked on the displayed page. Specific forms of marking include, but are not limited to, bold, underline and highlighting in a preset color, and are not particularly limited herein.
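Locating and marking the source of a backfilled value can be sketched as follows. The function name, the line-plus-offset form of the position information and the `**` marker are assumptions for illustration; a real viewer would render bold, underline or a highlight color instead of inserting marker text.

```python
from typing import List, Tuple


def locate_and_mark(
    lines: List[str],
    line_no: int,
    start: int,
    end: int,
    marker: str = "**",
) -> Tuple[str, str]:
    """Return (extracted_text, marked_line).

    line_no is 1-based; start and end are character offsets in that line,
    with end exclusive.
    """
    line = lines[line_no - 1]
    extracted = line[start:end]
    marked = line[:start] + marker + extracted + marker + line[end:]
    return extracted, marked
```

For example, with `lines = ["Case No. 12", "Residence: Hefei, Anhui"]`, calling `locate_and_mark(lines, 2, 11, 23)` returns the extracted text `"Hefei, Anhui"` together with the line `"Residence: **Hefei, Anhui**"`.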
Step S38: Take the target file backfilled with the final backfill information as a training sample of the file extraction engine.
To improve the accuracy with which the file extraction engine extracts elements from the plurality of files to be extracted, after the modified content is taken as the final backfill information of the target label of the target file, the target file backfilled with the final backfill information can be used as a training sample of the file extraction engine.
In the above manner, each group of label extraction results further comprises a plurality of pieces of position information respectively corresponding to the plurality of extraction elements, and the position information indexes the positions of the corresponding extraction elements in the files to be extracted. The position information corresponding to the extraction element used as the backfill information is acquired in response to a review instruction of the user on the backfill information, and the corresponding page of the file to be extracted to which it points is displayed, which makes it convenient for the user to review, check or modify the backfill information. In addition, the target file backfilled with the final backfill information modified by the user can be used as a training sample of the file extraction engine, which improves the accuracy with which the file extraction engine extracts elements from the plurality of files to be extracted.
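How a user-confirmed backfill might be recorded as a training sample is sketched below. The source does not describe the sample format, so the JSON layout, the field names and the function `to_training_sample` are all assumptions for illustration.

```python
import json


def to_training_sample(
    source_text: str,
    label: str,
    extracted_value: str,
    final_value: str,
) -> str:
    """Serialize one confirmed backfill as a JSON line (hypothetical format)."""
    record = {
        "text": source_text,                          # text the element was extracted from
        "label": label,                               # target label that was backfilled
        "value": final_value,                         # final backfill information
        "corrected": final_value != extracted_value,  # True if the user changed it
    }
    return json.dumps(record, ensure_ascii=False)
```

Flagging corrected samples separately would let a training pipeline weight the cases where the engine was wrong, though whether to do so is a design choice outside the source.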
To facilitate understanding of the present application, a specific application example is described below:
A relevant institution generates a large number of files while handling cases. Part of the case information in these files is the same and can be shared among different files, so existing files can be used as files to be extracted to fill in a target file. The files to be extracted may be files generated by the relevant institution throughout the handling of a case, including but not limited to opinion books, decision books, answer forms and evidence materials.
Take as an example the case where the files to be extracted are an opinion book and an answer form and the target file is a decision book. The label names of the opinion book and the answer form include residence and party information, and the target labels of the decision book also include residence and party information. A file extraction engine is used to perform element extraction on the opinion book and the answer form respectively to obtain an element extraction result. Using the file extraction engine to extract case information from the large amount of unstructured information in files such as the opinion book facilitates the subsequent automatic filling of case information, improves the efficiency of information backfilling, reduces missed and wrong entries, and can improve case handling efficiency.
The element extraction result comprises two groups of label extraction results. One group comprises two extraction elements corresponding to residence, namely residence a1 and residence a2, where residence a1 is extracted from the opinion book and residence a2 is extracted from the answer form. The other group comprises two extraction elements corresponding to party information, namely party information b1 and party information b2, where party information b1 is extracted from the opinion book and party information b2 is extracted from the answer form. That is, different extraction elements corresponding to the same label are extracted from different files to be extracted.
The following steps take as an example selecting the group of label extraction results whose label name is residence as the group matching the residence label of the target file. The priority of the opinion book is set in advance to be higher than that of the answer form. Therefore, according to the priority of the file type, residence a1 in the group of label extraction results matching residence is first taken as the candidate extraction element, and it is judged whether the candidate extraction element is empty; if not, residence a1 is taken as the selected extraction element.
When residence a1 is not empty, format processing is performed on the selected residence a1 to obtain a standard element, and it is judged whether the standard element meets the backfill requirement of the target label; if so, the standard element is taken as the backfill information of the target label of the target file.
If residence a1 is empty, residence a2 in the group of label extraction results matching residence is taken as the candidate extraction element, and it is judged whether the candidate extraction element is empty; if not, residence a2 is taken as the selected extraction element.
By judging whether each candidate extraction element is empty and taking a non-empty candidate extraction element as the selected extraction element, the backfill rate and the degree to which the file extraction results are used can be improved, and the workload of manual searching and filling is reduced.
Of course, when the standard element is judged not to meet the backfill requirement of the target label, the residence that does not meet the backfill requirement is removed from the group of label extraction results for residence, the other residences in that group are taken as candidate extraction elements in order of file type priority, and the subsequent steps are executed, which is not repeated here.
To allow the backfill information to be confirmed manually, after the extraction elements are automatically backfilled into the target file, the position information corresponding to the extraction elements can be returned to the background server at the same time, so that the position information corresponding to the extraction element used as the backfill information can be acquired in response to a review instruction of the user on the backfill information. The corresponding page of the file to be extracted to which the acquired position information points is displayed, and the content extracted as the backfill information is marked on the displayed page. The file to be extracted is thus located quickly and the backfill information is marked and displayed, which makes it convenient to confirm the accuracy of the backfill information. The user's modification of the backfill information is then acquired, and the modified content is taken as the final backfill information of the target label of the target file. To improve the accuracy of the file extraction engine, the target file backfilled with the final backfill information can be used as a training sample of the file extraction engine.
This application example has at least the following beneficial effects. Business practice is fully considered: labels and extraction elements can stand in one-to-one and one-to-many relations, and the relation can be configured so that the same backfill information is obtained from different files to be extracted. The file extraction engine extracts the element extraction result from the unstructured information of the files to be extracted, which facilitates the subsequent automatic backfilling of information, improves the efficiency of information backfilling, reduces missed and wrong entries, and can improve case handling efficiency. Using the target file backfilled with the final backfill information as a training sample of the file extraction engine can improve the accuracy of the file extraction engine. The selected extraction elements can undergo format processing and/or a judgment against the backfill requirement of the target label, so that the formats of diverse and irregular extraction elements are unified and the backfill requirement of the target label is better met.
It will be understood by those skilled in the art that in the method of the present invention, the order of writing the steps does not imply a strict order of execution and any limitations on the implementation, and the specific order of execution of the steps should be determined by their function and possible inherent logic.
Please refer to fig. 4, which is a schematic block diagram of an embodiment of an electronic device 40 according to the present application. The electronic device 40 includes a memory 41 and a processor 42 coupled to each other, and the processor 42 is configured to execute program instructions stored in the memory 41 to implement the steps of any of the above embodiments of the information backfilling method. In one specific implementation scenario, the electronic device 40 may include, but is not limited to, a microcomputer or a server, and may also include a mobile device such as a notebook computer or a tablet computer, which is not limited herein.
Specifically, the processor 42 is configured to control itself and the memory 41 to implement the steps of any of the above embodiments of the information backfilling method. The processor 42 may also be referred to as a CPU (Central Processing Unit). The processor 42 may be an integrated circuit chip having signal processing capabilities. The processor 42 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general purpose processor may be a microprocessor or any conventional processor. In addition, the processor 42 may be jointly implemented by a plurality of integrated circuit chips.
Please refer to fig. 5, which is a block diagram of an embodiment of a computer-readable storage medium 50 according to the present application. The computer-readable storage medium 50 stores program instructions 501 executable by a processor, and the program instructions 501 are used to implement the steps of any of the above embodiments of the information backfilling method.
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, and for brevity, will not be described again here.
The foregoing description of the various embodiments is intended to highlight various differences between the embodiments, and the same or similar parts may be referred to each other, and for brevity, will not be described again herein.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a module or a unit is merely one type of logical division, and an actual implementation may have another division, for example, a unit or a component may be combined or integrated with another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some interfaces, and may be in an electrical, mechanical or other form.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.

Claims (12)

1. An information backfilling method, the method comprising:
acquiring a plurality of files to be extracted;
respectively extracting elements of the plurality of files to be extracted to obtain element extraction results, wherein the element extraction results comprise at least one group of label extraction results, each group of label extraction results comprise a plurality of extraction elements corresponding to the same label, and different extraction elements corresponding to the same label are respectively extracted from different files to be extracted;
selecting one or more extraction elements from a group of label extraction results matched with a target label of a target file;
and taking the selected extraction element as backfill information of a target label of the target file.
2. The method according to claim 1, wherein each group of the tag extraction results further includes a plurality of extraction marks respectively corresponding to the plurality of extraction elements, the extraction marks include tag names, and the tag names in each group of the tag extraction results all point to the same tag;
before selecting one or more of the extracted elements from the set of extracted results matching the target label of the target document, the method further comprises:
and selecting a group of label extraction results of which the contained label names are matched with the target label as a group of label extraction results matched with the target label.
3. The method of claim 2, wherein the tag name comprises at least one of: document number, document date, residence place, party information.
4. The method according to claim 1, wherein each set of the label extraction results further includes a plurality of extraction marks respectively corresponding to the plurality of extraction elements, and the extraction marks include the document types of the documents to be extracted from which the corresponding extraction elements come;
selecting one or more of the extracted elements from a set of label extraction results matching the target label of the target document, including:
according to the priority of the file type, sequentially taking each extraction element in a group of label extraction results matched with the target label as a candidate extraction element;
judging whether the candidate extraction element is empty or not;
and if not, taking the candidate extraction element as the selected extraction element, and executing the step of taking the selected extraction element as backfill information of the target label of the target file.
5. The method according to claim 1, wherein the using the selected extracted element as backfill information of a target label of the target document comprises:
performing format processing on the selected extraction element to obtain a standard element, wherein the format processing includes at least one of the following: conversion into a matching preset enumeration element, date format conversion and number conversion;
and taking the standard element as backfill information of a target label of the target file.
6. The method of claim 5, wherein after said formatting said selected said extracted elements to obtain standard elements, said method further comprises:
judging whether the standard element meets the backfill requirement of the target label or not;
if not, removing the extraction element that does not meet the backfill requirement from the group of label extraction results matched with the target label of the target file, and executing again the step of selecting one or more extraction elements from the group of label extraction results matched with the target label of the target file and the subsequent steps, until the standard element corresponding to the selected extraction element meets the backfill requirement, and taking the standard element as backfill information of the target label of the target file.
7. The method according to claim 1, wherein each group of the tag extraction results further includes a plurality of position information respectively corresponding to the plurality of extraction elements, and the position information is used for indexing the positions of the corresponding extraction elements in the document to be extracted;
after the selected extraction element is taken as backfill information of a target label of the target file, the method further comprises the following steps:
in response to a review instruction of a user on the backfill information, acquiring the position information corresponding to the extraction element used as the backfill information;
and displaying the corresponding page of the file to be extracted pointed by the acquired position information.
8. The method according to claim 7, wherein after the displaying the corresponding page of the file to be extracted to which the obtained location information points, the method further comprises:
marking the content extracted as the backfill information on the displayed page; and/or,
and acquiring the modification content of the backfill information by a user, and taking the modification content as the final backfill information of the target label of the target file.
9. The method according to claim 8, wherein the element extraction result is obtained by performing element extraction on the plurality of documents to be extracted by using a document extraction engine;
after the taking the modified content as the final backfill information of the target label of the target file, the method further comprises:
and taking the target file backfilled with the final backfilling information as a training sample of the file extraction engine.
10. The method according to claim 1, wherein the obtaining a plurality of files to be extracted comprises:
acquiring a plurality of original electronic files;
identifying a file type of the original electronic file;
screening out the original electronic file with a preset file type as the file to be extracted based on the file type;
before the element extraction is performed on the plurality of documents to be extracted respectively to obtain element extraction results, the method further includes:
and converting the file to be extracted into a text format by using a character recognition algorithm.
11. An electronic device, comprising a memory and a processor coupled to each other, wherein the processor is configured to execute program instructions stored in the memory to implement the information backfilling method according to any one of claims 1 to 10.
12. A computer-readable storage medium having stored thereon program instructions, which when executed by a processor, implement the information backfilling method according to any one of claims 1 through 10.
CN202011565819.4A 2020-12-25 2020-12-25 Information backfill method and related electronic equipment and storage medium thereof Pending CN112800761A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011565819.4A CN112800761A (en) 2020-12-25 2020-12-25 Information backfill method and related electronic equipment and storage medium thereof

Publications (1)

Publication Number Publication Date
CN112800761A true CN112800761A (en) 2021-05-14

Family

ID=75804916

Country Status (1)

Country Link
CN (1) CN112800761A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109815471A (en) * 2019-01-04 2019-05-28 深圳壹账通智能科技有限公司 Contract text generation method, device, computer equipment and storage medium
CN110334217A (en) * 2019-05-10 2019-10-15 科大讯飞股份有限公司 A kind of element abstracting method, device, equipment and storage medium
CN110688349A (en) * 2019-08-29 2020-01-14 重庆小雨点小额贷款有限公司 Document sorting method, device, terminal and computer readable storage medium
CN110765770A (en) * 2019-09-04 2020-02-07 平安科技(深圳)有限公司 Automatic contract generation method and device
CN111932413A (en) * 2020-09-14 2020-11-13 平安国际智慧城市科技股份有限公司 Case element extraction method, case element extraction device, case element extraction equipment and case element extraction medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination