CN117272970B - Document generation method, device, equipment and storage medium - Google Patents

Info

Publication number
CN117272970B
Authority
CN
China
Prior art keywords
target
paragraph
document
chapter
keyword
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311559212.9A
Other languages
Chinese (zh)
Other versions
CN117272970A (en
Inventor
刘桂松
张龙辉
王晓麟
王加煜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taiping Pension Insurance Co ltd
Taiping Financial Technology Services Shanghai Co Ltd Shenzhen Branch
Original Assignee
Taiping Pension Insurance Co ltd
Taiping Financial Technology Services Shanghai Co Ltd Shenzhen Branch
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Taiping Pension Insurance Co ltd, Taiping Financial Technology Services Shanghai Co Ltd Shenzhen Branch
Priority to CN202311559212.9A
Publication of CN117272970A
Application granted
Publication of CN117272970B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates
    • G06F40/177 Editing, e.g. inserting or deleting of tables; using ruled lines
    • G06F40/18 Editing, e.g. inserting or deleting of tables; using ruled lines of spreadsheets

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a document generation method, apparatus, device and storage medium, belonging to the technical field of computer communication. The method comprises: determining, according to a target keyword of a target object, a matching object of the target object from a document to be matched; determining target content corresponding to the target object according to the target keyword and the matching object; and generating a target document from the target content based on a preset document template. The whole process requires no manual operation, so document content is extracted automatically and both the efficiency and the quality of document generation are improved. In particular, in the information disclosure report generation scenario, keyword matching enables automatic extraction from offline reports, which improves the generation efficiency and quality of information disclosure reports and, in turn, the timeliness of the information disclosure service.

Description

Document generation method, device, equipment and storage medium
Technical Field
The present invention relates to the field of computer communications technologies, and in particular, to a method, an apparatus, a device, and a storage medium for generating a document.
Background
At present, the investment supervision and operation of occupational annuity management is generally based on a trustee management model and involves multiple management institutions, such as custodian institutions, investment institutions, trustee institutions and account management institutions. These institutions exchange annuity data with one another in the form of information disclosure reports, via interfaces, the Shenzhen Securities Communication service, or offline channels.
The generation of an information disclosure report depends on the offline annuity data reports of each institution, whose file formats are not fully unified. Traditional information disclosure report generation involves a large amount of manual work, which is error-prone, lowers the efficiency and quality of report generation, and leads to poor timeliness of the information disclosure service.
Disclosure of Invention
The invention provides a document generation method, a document generation device, document generation equipment and a storage medium, which are used for improving the generation efficiency and the generation quality of an information disclosure report and further improving the timeliness of an information disclosure service.
According to an aspect of the present invention, there is provided a document generating method including:
determining a matching object of the target object from the document to be matched according to the target keyword of the target object;
determining target content corresponding to the target object according to the target keyword and the matching object;
and generating a target document according to the target content based on the preset document template.
According to another aspect of the present invention, there is provided a document generating apparatus including:
a matching object determining module, configured to determine a matching object of the target object from the document to be matched according to the target keyword of the target object;
a target content determining module, configured to determine target content corresponding to the target object according to the target keyword and the matching object;
and a target document determining module, configured to generate a target document according to the target content based on the preset document template.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the document generation method of any one of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to execute a document generation method of any one of the embodiments of the present invention.
According to the technical solution of the invention, the matching object of the target object is determined from the document to be matched according to the target keyword of the target object; the target content corresponding to the target object is determined according to the target keyword and the matching object; and the target document is generated from the target content based on the preset document template. Because the matching object is determined by keyword matching, the target content can be located from the keyword and the matching object and the target document generated without manual operation at any step. Document content is thus extracted automatically, and both the efficiency and the quality of document generation are improved. In particular, in the information disclosure report generation scenario, keyword matching enables automatic extraction from offline reports, which improves the generation efficiency and quality of information disclosure reports and, in turn, the timeliness of the information disclosure service.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a document generation method provided in accordance with a first embodiment of the present invention;
FIG. 2 is a flow chart of a document generation method provided according to a second embodiment of the present invention;
FIG. 3 is a flowchart of a document generation method provided according to a third embodiment of the present invention;
FIG. 4 is a schematic diagram of a document generating apparatus according to a fourth embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an electronic device implementing a document generating method according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It should be noted that the terms "target" and "candidate" and the like in the description of the present invention and the claims and the above-described drawings are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In addition, it should be noted that, in the technical solution of the invention, the collection, storage, use, processing, transmission, provision and disclosure of the documents to be matched, the preset document template and the like all comply with relevant laws and regulations and do not violate public order and good morals.
Example 1
Fig. 1 is a flowchart of a document generation method according to an embodiment of the present invention, where the method may be implemented by a document generation device, and the device may be implemented in hardware and/or software, and may be configured in an electronic device, which may be a server or a workstation. As shown in fig. 1, the method includes:
S101, determining a matching object of the target object from the document to be matched according to the target keyword of the target object.
The target object refers to an object in a preset document template that needs to be filled with content; it may be a table or a paragraph chapter in the preset document template. The preset document template refers to a pre-built document template that can be generated in advance according to actual business requirements; for example, the preset document template is the information disclosure report template of a trustee institution.
The target keyword refers to text that uniquely identifies a target object, and may be a table field or a paragraph chapter name. The document to be matched refers to an offline annuity data report of an institution after file type unification; documents to be matched include, but are not limited to, the offline annuity data report of a custodian institution, the offline annuity data report of an account management institution, and the offline annuity data report of an investment institution. The matching object refers to an object matched with the target object: if the target object is a table, the matching object is a table; if the target object is a paragraph chapter, the matching object is a paragraph chapter.
File type unification refers to converting the offline annuity data reports of each institution into the doc type or the xls type; for example, reports of types such as rtf, wps and docx are converted into the doc type, and reports of types such as xlsx, xml and html are converted into the xls type.
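The type mapping described above can be sketched as a lookup table. This is only an illustration: the function name `unified_name` and the dictionary `UNIFIED_TYPE` are hypothetical, and the sketch computes the target file name only; the actual format conversion would need a separate converter that the source does not name.

```python
from pathlib import PurePath

# Hypothetical mapping built from the extensions listed in the text.
UNIFIED_TYPE = {
    ".rtf": ".doc", ".wps": ".doc", ".docx": ".doc", ".doc": ".doc",
    ".xlsx": ".xls", ".xml": ".xls", ".html": ".xls", ".xls": ".xls",
}


def unified_name(filename: str) -> str:
    """Return the file name a report would carry after type unification."""
    path = PurePath(filename)
    suffix = path.suffix.lower()
    if suffix not in UNIFIED_TYPE:
        # Types outside the mapping are rejected rather than guessed.
        raise ValueError(f"unsupported report type: {suffix or '(none)'}")
    return str(path.with_suffix(UNIFIED_TYPE[suffix]))
```

Rejecting unknown extensions keeps unsupported reports out of the later matching steps instead of silently mislabeling them.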
Specifically, in the case that the target object is a table, the table names of all tables and the key fields corresponding to each table name may be extracted from the documents to be matched (for example, Table 1: field 1, field 2, field 3) and stored in a preset data table; then, according to each target keyword of the target object, key fields completely consistent with the target keywords are queried from the preset data table, and the table corresponding to those key fields is determined as the matching object of the target object. In the case that the target object is a paragraph chapter, the institution related to the target object may be acquired; the document to be matched corresponding to that institution is screened from the documents to be matched; and the matching object of the target object is extracted from that document according to the target keyword of the target object.
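A minimal sketch of the table branch follows, assuming the preset data table is an in-memory list of rows and that "completely consistent" means exact string equality for every target keyword. The names `build_preset_table` and `find_matching_table` and the row layout are hypothetical.

```python
def build_preset_table(documents):
    """Flatten {document: {table_name: [field, ...]}} into a list of rows."""
    rows = []
    for doc_name, tables in documents.items():
        for table_name, fields in tables.items():
            rows.append({"document": doc_name, "table": table_name,
                         "fields": set(fields)})
    return rows


def find_matching_table(target_keywords, preset_table):
    """Return (document, table) for the first row containing every keyword."""
    wanted = set(target_keywords)
    for row in preset_table:
        # Exact string equality only; fuzzy matching is not attempted here.
        if wanted.issubset(row["fields"]):
            return row["document"], row["table"]
    return None
```

Returning `None` when nothing matches leaves the decision about fallbacks, such as a similarity-based search, to the caller.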
S102, determining target content corresponding to the target object according to the target keyword and the matching object.
The target content refers to content corresponding to a target keyword of the target object. For example, if the target object is a table and the target keyword is a table field, the target content is a value of the table field; if the target object is a paragraph chapter, the target keyword is a chapter field, and the target content is the content of the chapter field.
Specifically, the target keywords are matched with each keyword in the matched object one by one, the successfully matched keywords are determined according to the matching result, and the content corresponding to the matched keywords is extracted and used as the target content corresponding to the target object.
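The one-by-one matching in S102 can be sketched as follows, assuming the matching object has already been parsed into a keyword-to-content mapping; that representation and the function name `extract_target_content` are assumptions made for illustration.

```python
def extract_target_content(target_keywords, matching_object):
    """Return {keyword: content} for target keywords found in the matching object.

    matching_object is assumed to map each keyword of the matched table or
    paragraph chapter to its content.
    """
    content = {}
    for keyword in target_keywords:
        if keyword in matching_object:  # successful match
            content[keyword] = matching_object[keyword]
    return content
```

Keywords without a match are simply omitted, so the caller can tell which regions of the template remain unfilled.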
S103, generating a target document according to the target content based on a preset document template.
Wherein the target document refers to a document generated based on a preset document template.
Specifically, the target content is filled into the region corresponding to the target object in the preset document template to obtain the target document.
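As an illustration of the filling step, the sketch below treats the preset document template as plain text with `${name}` placeholders, which is an assumption; the source does not describe how template regions are marked. It uses `string.Template` from the Python standard library, and the function name `fill_template` is hypothetical.

```python
from string import Template


def fill_template(template_text, target_content):
    """Fill ${name} placeholders with target content.

    safe_substitute leaves placeholders without content untouched, so
    unfilled regions stay visible in the generated document.
    """
    return Template(template_text).safe_substitute(target_content)
```

For example, `fill_template("Net value: ${net_value}", {"net_value": "1.0523"})` yields `"Net value: 1.0523"`.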
According to the technical solution of this embodiment, the matching object of the target object is determined from the document to be matched according to the target keyword of the target object; the target content corresponding to the target object is determined according to the target keyword and the matching object; and the target document is generated from the target content based on the preset document template. Because the matching object is determined by keyword matching, the target content can be located from the keyword and the matching object and the target document generated without manual operation at any step. Document content is thus extracted automatically, and both the efficiency and the quality of document generation are improved. In particular, in the information disclosure report generation scenario, keyword matching enables automatic extraction from offline reports, which improves the generation efficiency and quality of information disclosure reports and, in turn, the timeliness of the information disclosure service.
On the basis of the above embodiment, as an optional implementation of the embodiment of the invention, the target document may be audited based on a preset auditing rule; if the audit passes, the stamping page of the target business seal is determined according to the number of pages of the target document; and the stamping position of the target business seal is determined according to the pixel information in the stamping page.
The preset auditing rules can be set in advance according to actual business requirements, for example, table data relation auditing rules, text format auditing rules and data type auditing rules; the embodiment of the invention does not specifically limit them. The table data relation auditing rule refers to a rule for auditing relations among table data, such as mutual dependence and mutual influence; for example, for product A, the value of field 1 of Table A provided by the custodian institution must be consistent with the value of field 3 of Table B provided by the investment institution. The text format auditing rule refers to a rule for auditing the content of a document, for example, detecting whether the document contains garbled characters or whether the document content is overlong. The data type auditing rule refers to a rule for auditing the data types of the tables in the document, for example, detecting whether those data types meet actual business requirements. The number of pages refers to the number of pages of the target document. The target business seal refers to the business seal that must be stamped on the target document. The stamping page refers to the document page on which the target business seal is placed. The pixel information refers to information about the pixels in the stamping page, and may include the total number of pixels in the stamping page and the coordinates of each pixel. The stamping position refers to the position in the stamping page at which the target business seal is stamped.
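The three kinds of auditing rules can be sketched as small predicate functions. All names, the `max_length` default and the use of the Unicode replacement character U+FFFD as a sign of garbled text are assumptions made for illustration; the source leaves the concrete rules to business requirements.

```python
def audit_relation(tables, left, right):
    """Pass when two (table, field) references hold the same value."""
    (left_table, left_field), (right_table, right_field) = left, right
    return tables[left_table][left_field] == tables[right_table][right_field]


def audit_text(text, max_length=5000):
    """Pass when the text has no U+FFFD character and is not overlong."""
    return "\ufffd" not in text and len(text) <= max_length


def audit_types(row, expected_types):
    """Pass when every listed field holds a value of the expected type."""
    return all(isinstance(row.get(name), kind)
               for name, kind in expected_types.items())
```

A document would pass the audit only when every configured rule returns `True`.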
Specifically, the target document is audited based on the preset auditing rule; if the audit passes, the stamping page of the target business seal is determined according to the number of pages of the target document and a preset target business seal keyword; the stamping position of the target business seal is then determined according to the pixel coordinates of the preset target business seal keyword in the stamping page, and the business seal is stamped on the target document.
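A minimal sketch of locating the stamping page and position is given below. It assumes each page has already been reduced to a list of `(text, x, y)` entries with pixel coordinates, and that pages are searched from the last page backwards; both the page representation and the search order are assumptions, as is the function name `locate_seal`.

```python
def locate_seal(pages, seal_keyword):
    """Return (page_number, (x, y)) for the seal keyword, or None.

    pages: list of pages, each a list of (text, x, y) tuples in pixels.
    Page numbers are 1-based; the search starts at the last page.
    """
    for page_number in range(len(pages), 0, -1):
        for text, x, y in pages[page_number - 1]:
            if seal_keyword in text:
                return page_number, (x, y)
    return None
```

The returned coordinates are those of the keyword itself; any offset between the keyword and the seal image would be a further business-specific setting.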
It should be noted that the target business seal keyword may be set in advance according to actual business requirements, which is not specifically limited in the embodiment of the present invention.
It can be understood that auditing the target document based on the preset auditing rule removes the need for manual review, which reduces manual involvement and audit errors. Meanwhile, when the target document passes the audit, the business seal is stamped on the target document automatically according to the number of pages of the target document and the pixel information in the stamping page, which avoids the cumbersome process of printing and stamping offline and improves the timeliness of the information disclosure service.
Example 2
Fig. 2 is a flowchart of a document generation method according to a second embodiment of the present invention. On the basis of the foregoing embodiment, the target object is optionally a target table, and the step of "determining, according to a target keyword of the target object, a matching object of the target object from the document to be matched" is further optimized to provide an optional implementation. For parts not described in detail in this embodiment, reference may be made to the related descriptions of other embodiments. As shown in fig. 2, the method includes:
S201, calculating the table similarity between the target table and the candidate table in the candidate document according to the target keyword of the target table when the target object is the target table.
The target table refers to a table in a preset document template that needs to be filled with content. The preset document template refers to a pre-built document template that can be generated in advance according to actual business requirements; for example, the preset document template is the information disclosure report template of a trustee institution. The target keyword refers to a table field that uniquely identifies the target table. Optionally, the target table has at least five target keywords, so that the matching table of the target table can be determined more accurately from the documents to be matched. The candidate document is a document storing the table mapping configuration of all tables in the documents to be matched; it may include each table's name, its field keywords and the document to be matched to which it belongs, for example, Table A: key1, key2, key3, belonging to document 1 to be matched. Candidate tables are the tables in the candidate document that may be selected. Table similarity refers to the degree of similarity between tables and can be expressed as a decimal between 0 and 1; the larger the value, the more similar the tables.
Specifically, assume that the target table is Table 1 and its target keywords are A, B, C and D; assume that the candidate tables in the candidate document include Table 2, Table 3 and Table 4, where the field keywords of Table 2 are A1, B1, C1 and D1, those of Table 3 are A2, B2, C2, D2 and E2, and those of Table 4 are A3, B3, C3 and D3. Based on the target keywords of the target table, the table similarities between Table 1 and each of Tables 2, 3 and 4 are calculated using a cosine similarity algorithm. For example, the table similarity between Table 1 and Table 2 is determined by the following formula:
[Formula image not reproduced in this text]
wherein $S_{12}$ is the table similarity between Table 1 and Table 2, $\cos(A, A1)$ is the cosine similarity of the target keyword A in Table 1 with the field keyword A1 in Table 2, $\cos(B, B1)$ is the cosine similarity of the target keyword B in Table 1 with the field keyword B1 in Table 2, $\cos(C, C1)$ is the cosine similarity of the target keyword C in Table 1 with the field keyword C1 in Table 2, and $\cos(D, D1)$ is the cosine similarity of the target keyword D in Table 1 with the field keyword D1 in Table 2. Similarly, the table similarity $S_{13}$ between Table 1 and Table 3 and the table similarity $S_{14}$ between Table 1 and Table 4 can be determined.
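The calculation can be sketched as follows. The source names only "a cosine similarity algorithm", so three choices here are assumptions: each keyword is vectorized by its character counts, keywords are paired by position (A with A1, B with B1, and so on), and the keyword-level similarities are combined by an arithmetic mean over the target keywords. The function names are hypothetical.

```python
import math
from collections import Counter


def cosine(a: str, b: str) -> float:
    """Cosine similarity of two strings over character-count vectors."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[ch] * vb[ch] for ch in va)
    norm = math.sqrt(sum(n * n for n in va.values())
                     * sum(n * n for n in vb.values()))
    return min(1.0, dot / norm) if norm else 0.0


def table_similarity(target_keywords, field_keywords):
    """Mean of position-paired cosine similarities (assumed combination).

    Target keywords without a counterpart contribute 0; extra candidate
    fields (such as E2 in the example) are ignored.
    """
    if not target_keywords:
        return 0.0
    total = sum(cosine(t, f) for t, f in zip(target_keywords, field_keywords))
    return total / len(target_keywords)
```

Dividing by the number of target keywords keeps the result between 0 and 1, in line with the range stated for table similarity.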
Optionally, for each target keyword of the target table, if the cosine similarity between the target keyword and a field keyword of a certain candidate table in the candidate document is greater than or equal to a preset similarity threshold, the field keyword of the candidate table is used as an alias of the target keyword and stored in the target table, so that the matching recognition success rate of other types of target tables is improved. The preset similarity threshold may be preset according to actual service requirements, which is not specifically limited in the embodiment of the present invention.
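The alias step can be sketched as below, again assuming character-count cosine similarity and a hypothetical default threshold of 0.9; the source leaves both the vectorization and the threshold value open. Field keywords identical to the target keyword are skipped, since they add no new alias.

```python
import math
from collections import Counter


def char_cosine(a, b):
    """Cosine similarity over character counts (an assumed vectorization)."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[ch] * vb[ch] for ch in va)
    norm = math.sqrt(sum(n * n for n in va.values())
                     * sum(n * n for n in vb.values()))
    return min(1.0, dot / norm) if norm else 0.0


def collect_aliases(target_keywords, field_keywords, threshold=0.9):
    """Map each target keyword to the field keywords at or above the threshold."""
    aliases = {}
    for target in target_keywords:
        aliases[target] = [
            field for field in field_keywords
            if field != target and char_cosine(target, field) >= threshold
        ]
    return aliases
```

Character-level cosine is a crude measure, so the threshold would need tuning on real field names before aliases are stored.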
S202, determining a matching table of the target table from the documents to be matched according to the table similarity.
The document to be matched refers to an offline annuity data report of an institution after file type unification; documents to be matched include, but are not limited to, the offline annuity data report of a custodian institution, the offline annuity data report of an account management institution, and the offline annuity data report of an investment institution. The matching table refers to the table in the document to be matched that matches the target table.
Specifically, the calculated table similarities are compared pairwise and the maximum table similarity is determined; then, based on the correspondence between candidate tables and documents to be matched, the matching table of the target table is determined from the documents to be matched according to the candidate table corresponding to the maximum table similarity. For example, suppose the target table is Table 1, the candidate tables in the candidate document include Table 2, Table 3 and Table 4, the table similarity between Table 1 and Table 2 is $S_{12}$, that between Table 1 and Table 3 is $S_{13}$, and that between Table 1 and Table 4 is $S_{14}$; Table 2 belongs to document 1 to be matched, Table 3 to document 2 to be matched, and Table 4 to document 3 to be matched. $S_{12}$, $S_{13}$ and $S_{14}$ are compared pairwise; if $S_{12}$ is the largest, Table 2 is taken as the candidate table corresponding to the maximum table similarity, the document to be matched is determined to be document 1 to be matched, and the matching table of the target table is therefore Table 2 in document 1 to be matched.
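Selecting the matching table from precomputed similarities can be sketched as follows. The function name `select_matching_tables` and the dictionary inputs are hypothetical, and the similarity values in the usage note are illustrative only.

```python
def select_matching_tables(similarities, table_to_document):
    """Return [(document, table), ...] for the maximum table similarity.

    similarities: {candidate table name: table similarity}
    table_to_document: {candidate table name: document to be matched}
    More than one pair is returned when candidates tie for the maximum.
    """
    if not similarities:
        return []
    best = max(similarities.values())
    return [(table_to_document[name], name)
            for name, score in similarities.items() if score == best]
```

With `{"Table 2": 0.91, "Table 3": 0.47, "Table 4": 0.62}` and Table 2 mapped to document 1, the result is `[("document 1", "Table 2")]`. Returning a list rather than a single pair keeps ties visible so that a further criterion can resolve them.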
Alternatively, in the case where the maximum table similarity corresponds to at least two candidate tables in the candidate documents, the matching table of the target table may be determined from the documents to be matched according to the table similarity and the target data type of the target keyword.
The target data type refers to the data type of each target keyword in the target table.
Specifically, under the condition that the maximum table similarity corresponds to at least two candidate tables in the candidate documents, taking the at least two candidate tables corresponding to the maximum table similarity as tables to be matched; and comparing the target data types of the target keywords of the target table with the data types of the field keywords of the tables to be matched one by one, and determining the matching table of the target table from the documents to be matched according to the tables to be matched based on the corresponding relation between the candidate table and the documents to be matched if the target data types of the target keywords of the target table are completely consistent with the data types of the field keywords of a certain table to be matched.
It can be understood that, according to the similarity of the tables and the target data type of the target keywords, the matching table of the target tables is determined from the document to be matched, so that the determined matching table is more accurate, and the accuracy of the target content of the subsequent target tables is improved.
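The data-type tie-break can be sketched as a position-by-position comparison of type names. The representation of types as strings such as "date" or "decimal", the reliance on dictionary insertion order for field positions, and the function name `break_tie_by_types` are assumptions made for illustration.

```python
def break_tie_by_types(target_types, tied_tables):
    """Return the first tied table whose field types equal the target's.

    target_types: {target keyword: type name}, in field order
    tied_tables: {table name: {field keyword: type name}}, in field order
    """
    wanted = list(target_types.values())
    for table_name, field_types in tied_tables.items():
        # "Completely consistent" is read as equal type at every position.
        if list(field_types.values()) == wanted:
            return table_name
    return None
```

If no tied table has fully consistent types, `None` is returned and the tie remains unresolved.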
S203, determining target content corresponding to the target object according to the target keyword and the matching object.
Specifically, under the condition that the target object is a target table, matching each field keyword in the target keyword and the matching table one by one respectively, determining the successfully matched matching field keyword according to the matching result, and extracting the content corresponding to the matching field keyword as the target content corresponding to the target object.
S204, generating a target document according to the target content based on a preset document template.
According to the technical solution of this embodiment, in the case that the target object is a target table, the table similarity between the target table and the candidate tables in the candidate document is calculated according to the target keywords of the target table, and the matching table of the target table is determined from the documents to be matched according to the table similarity. By distinguishing the type of the target object and, for a target table, determining the matching object in a manner tailored to the characteristics of tables, the obtained matching object is more accurate.
Example 3
Fig. 3 is a flowchart of a document generating method according to a third embodiment of the present invention, where, based on the foregoing embodiment, optionally, the target object may be a target paragraph, and further, an alternative implementation is provided by optimizing "determining, according to a target keyword of the target object, a matching object of the target object from the documents to be matched". In the embodiments of the present invention, parts not described in detail may be referred to for related expressions of other embodiments. As shown in fig. 3, the method includes:
s301, calculating the keyword similarity between the target keyword of the target paragraph and the candidate paragraph keyword of the candidate paragraph under the condition that the target object is the target paragraph.
The target paragraph section refers to a paragraph section which needs to be filled with content in a preset document template. The objective paragraph section includes, but is not limited to, a major event description, a performance situation, a market analysis, and an investment review and hope. The preset document template refers to a pre-built document template, and can be pre-generated according to actual service requirements, for example, the preset document template is an information disclosure report template of a trusted organization. The target keyword refers to a chapter keyword that can uniquely identify a target paragraph chapter. The candidate paragraph sections refer to paragraph sections in the document to be matched that can be selected. The document to be matched refers to an off-line annuity data report after the document types of all institutions are processed uniformly; documents to be matched include, but are not limited to, an offline annuity data report for a escrow institution, an offline annuity data report for a ledger institution, and an offline annuity data report for an investment institution. The candidate chapter key refers to a chapter key that can uniquely identify a candidate paragraph chapter. Keyword similarity refers to the similarity of chapter keywords between paragraph chapters, and can be expressed by decimal numbers between 0 and 1, and the larger the numerical value is, the higher the keyword similarity between paragraph chapters is.
Specifically, when the target object is a target paragraph chapter, the keyword similarity between the target keyword of the target paragraph chapter and the candidate chapter keyword of the candidate paragraph chapter is calculated based on a cosine similarity algorithm. For example, if the target keyword is "major event description" and the candidate chapter keyword is also "major event description", the keyword similarity is determined to be 1 based on the cosine similarity algorithm.
S302, determining the matching paragraph chapter of the target paragraph chapter from the document to be matched according to the keyword similarity.
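As a minimal sketch of the cosine similarity step, the snippet below compares two chapter keywords as character-count vectors. The description does not say how keywords are vectorized, so the character-level representation and the function name `keyword_similarity` are assumptions made only for illustration.

```python
import math
from collections import Counter


def keyword_similarity(target_keyword: str, candidate_keyword: str) -> float:
    """Cosine similarity between character-count vectors of two keywords.

    Assumption: keywords are vectorized by counting characters; the source
    only states that a cosine similarity algorithm is used.
    """
    a = Counter(target_keyword)
    b = Counter(candidate_keyword)
    # Dot product over the characters both keywords share.
    dot = sum(a[ch] * b[ch] for ch in a.keys() & b.keys())
    sq_a = sum(v * v for v in a.values())
    sq_b = sum(v * v for v in b.values())
    if sq_a == 0 or sq_b == 0:
        return 0.0  # an empty keyword matches nothing
    return dot / math.sqrt(sq_a * sq_b)
```

With non-negative counts the result always lies between 0 and 1, and identical keywords score 1, which agrees with the example in the text.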
The matching paragraph chapter refers to a paragraph chapter matched with the target paragraph chapter in the document to be matched.
Specifically, the calculated keyword similarities are compared with one another to determine the maximum keyword similarity; the candidate paragraph chapter in the document to be matched that corresponds to the maximum keyword similarity is then taken as the matching paragraph chapter of the target paragraph chapter.
Optionally, in the case that the maximum keyword similarity corresponds to at least two candidate paragraph chapters in the document to be matched, the matching paragraph chapter of the target paragraph chapter may be determined from the candidate paragraph chapters according to the keyword similarity and the target attribution party of the target paragraph chapter.
The target attribution party is a party to which the target paragraph section belongs, and may be a hosting mechanism, an account management mechanism or an investment mechanism, which is not particularly limited in the embodiment of the present invention.
Specifically, under the condition that the maximum keyword similarity corresponds to at least two candidate paragraph chapters in the document to be matched, the at least two candidate paragraph chapters corresponding to the maximum keyword similarity are taken as paragraph chapters to be matched; the attribution party of each paragraph chapter to be matched is then compared with the target attribution party of the target paragraph chapter, and if the attribution party of a paragraph chapter to be matched is consistent with the target attribution party, that paragraph chapter is taken as the matching paragraph chapter of the target paragraph chapter.
It can be understood that determining the matching paragraph chapter of the target paragraph chapter from the candidate paragraph chapters according to both the keyword similarity and the target attribution party of the target paragraph chapter makes the determined matching paragraph chapter more accurate, which in turn improves the accuracy of the target content subsequently determined for the target paragraph chapter.
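The selection logic of S302, including the optional tie-break by attribution party, can be sketched as follows. The `CandidateChapter` structure, the tolerance used to detect ties, and the choice to return `None` when no tied candidate has the target attribution party are all assumptions; the description does not specify that last case.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CandidateChapter:
    keyword: str       # candidate chapter keyword
    owner: str         # attribution party of the document the chapter comes from
    similarity: float  # keyword similarity to the target keyword, already computed


def select_matching_chapter(
    candidates: List[CandidateChapter],
    target_owner: str,
    tol: float = 1e-9,
) -> Optional[CandidateChapter]:
    """Pick the candidate with maximum similarity; break ties by attribution party."""
    if not candidates:
        return None
    best = max(c.similarity for c in candidates)
    tied = [c for c in candidates if abs(c.similarity - best) <= tol]
    if len(tied) == 1:
        return tied[0]
    # At least two candidates share the maximum: keep the one whose
    # attribution party is consistent with the target attribution party.
    for c in tied:
        if c.owner == target_owner:
            return c
    return None  # assumption: the unmatched case is left to the caller
```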
S303, determining target content corresponding to the target object according to the target keyword and the matching object.
Specifically, under the condition that the target object is a target paragraph chapter, matching the target keyword with each chapter keyword in the matched paragraph chapter one by one, determining the matched chapter keyword successfully matched according to a matching result, and extracting the content corresponding to the matched chapter keyword as target content corresponding to the target object.
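The one-by-one keyword matching in S303 might look like the sketch below, assuming the matching paragraph chapter has already been parsed into a mapping from chapter keyword to chapter content. That mapping, the exact-match comparison, and the function name are illustrative assumptions rather than details given in the description.

```python
from typing import Dict, Optional


def extract_target_content(
    target_keyword: str, matched_chapter: Dict[str, str]
) -> Optional[str]:
    """Return the content whose chapter keyword matches the target keyword."""
    wanted = target_keyword.strip()
    # Compare the target keyword with each chapter keyword in turn.
    for chapter_keyword, content in matched_chapter.items():
        if chapter_keyword.strip() == wanted:
            return content
    return None  # no chapter keyword matched
```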
Optionally, in the case that the target object is the target paragraph chapter, the matching paragraph chapter position corresponding to the matching paragraph chapter may be determined according to the target keyword and the target attribution party, based on the correspondence among the target keyword, the document attribution party and the paragraph chapter position; the target content of the target paragraph chapter is then determined from the matching paragraph chapter according to the matching paragraph chapter position.
The document attribution party refers to a party to which the document belongs, and can be a hosting mechanism, an account management mechanism or an investment mechanism. Paragraph chapter locations refer to the specific locations of paragraph chapters in a document. Alternatively, the paragraph chapter positions may be determined using a start identifier and an end identifier. The matching paragraph chapter position refers to the position of the matching paragraph chapter in the document to be matched.
Specifically, when the target object is a target paragraph chapter, the correspondence among the target keyword, the document attribution party and the paragraph chapter position is consulted. For example, if the target keyword is "performance situation", the document attribution party corresponding to the target keyword is the hosting mechanism, and the paragraph chapter position corresponding to the target keyword is the position of the performance situation chapter in the offline annuity data report of the hosting mechanism. According to the target keyword and the target attribution party of the target paragraph chapter, the matching paragraph chapter position corresponding to the matching paragraph chapter can be determined, and the target content of the target paragraph chapter can then be extracted from the matching paragraph chapter.
It can be understood that, in the case that the target object is the target paragraph chapter, determining the matching paragraph chapter position from the target keyword and the target attribution party based on this correspondence, and then determining the target content from the matching paragraph chapter according to that position, makes the target content of the target paragraph chapter more accurate and improves the generation quality of the subsequent target document.
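A minimal sketch of the position-based lookup follows, assuming the correspondence is stored as a table keyed by (target keyword, document attribution party) and that each paragraph chapter position is expressed as a start identifier and an end identifier, as the text allows. The table contents, the marker strings and the function name are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical correspondence table:
# (target keyword, document attribution party) -> (start identifier, end identifier)
POSITION_MAP: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("performance situation", "hosting mechanism"): ("[[PERF_START]]", "[[PERF_END]]"),
}


def locate_and_extract(
    keyword: str,
    owner: str,
    document_text: str,
    position_map: Dict[Tuple[str, str], Tuple[str, str]] = POSITION_MAP,
) -> Optional[str]:
    """Find the chapter delimited by the mapped identifiers and return its text."""
    markers = position_map.get((keyword, owner))
    if markers is None:
        return None  # no correspondence recorded for this keyword and party
    start_marker, end_marker = markers
    start = document_text.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    end = document_text.find(end_marker, start)
    if end == -1:
        return None
    return document_text[start:end].strip()
```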
S304, generating a target document according to the target content based on a preset document template.
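Step S304 can be illustrated with a simple placeholder substitution, assuming the preset document template marks each paragraph chapter to be filled with a `$name` placeholder. The placeholder syntax and the function name are assumptions; the description does not fix a template format.

```python
from string import Template
from typing import Dict


def generate_target_document(template_text: str, target_contents: Dict[str, str]) -> str:
    """Fill the template placeholders with the extracted target content.

    safe_substitute leaves placeholders without content untouched, so a
    chapter that could not be matched remains visible for later review.
    """
    return Template(template_text).safe_substitute(target_contents)
```

Leaving unmatched placeholders in place, rather than raising an error, is a design choice of this sketch only.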
According to the technical scheme of this embodiment, when the target object is the target paragraph chapter, the keyword similarity between the target keyword of the target paragraph chapter and the candidate chapter keyword of the candidate paragraph chapter is calculated, and the matching paragraph chapter of the target paragraph chapter is determined from the document to be matched according to the keyword similarity. In other words, the type of the target object is distinguished, and when the target object is a target paragraph chapter, the matching object is determined from the document to be matched in a targeted manner according to the characteristics of paragraph chapters, so that the obtained matching object is more accurate.
Example IV
Fig. 4 is a schematic structural diagram of a document generating apparatus according to a fourth embodiment of the present invention. The embodiment is applicable to the case of automatic generation of information disclosure reports, and the apparatus may be implemented in hardware and/or software, and may be configured in an electronic device, which may be a server or a workstation. As shown in Fig. 4, the apparatus includes:
a matching object determining module 401, configured to determine a matching object of the target object from the document to be matched according to the target keyword of the target object;
a target content determining module 402, configured to determine target content corresponding to the target object according to the target keyword and the matching object;
a target document determining module 403, configured to generate a target document according to the target content based on a preset document template.
According to the technical scheme, the matching object of the target object is determined through the matching object determining module; determining target content corresponding to the target object through a target content determining module; the target document is generated by a target document determination module. According to the technical scheme, the matching object is determined from the document to be matched based on the keyword matching mode, so that the target content is positioned based on the keyword and the matching object, the target document is generated, manual operation is not needed in the whole process, intelligent extraction of the document content is realized, and the generation efficiency and the generation quality of the document are improved; especially in the information disclosure report generation scene, based on a keyword matching mode, intelligent extraction of the offline report is realized, the generation efficiency and the generation quality of the information disclosure report are improved, and the timeliness of the information disclosure service is further improved.
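The cooperation of the three modules can be sketched as a small pipeline in which each module is a pluggable callable. The class name, the callable signatures and the dictionary-based candidate representation are assumptions chosen for illustration; the description only defines the responsibilities of modules 401, 402 and 403.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class DocumentGenerationApparatus:
    # Module 401: choose the matching object for a target keyword.
    determine_matching_object: Callable[[str, List[dict]], Optional[dict]]
    # Module 402: extract the target content from the matching object.
    determine_target_content: Callable[[str, dict], Optional[str]]
    # Module 403: render the target document from the template and contents.
    generate_document: Callable[[str, Dict[str, str]], str]

    def run(self, template: str, target_keywords: List[str], candidates: List[dict]) -> str:
        contents: Dict[str, str] = {}
        for keyword in target_keywords:
            match = self.determine_matching_object(keyword, candidates)
            if match is None:
                continue  # nothing in the documents to be matched fits this keyword
            content = self.determine_target_content(keyword, match)
            if content is not None:
                contents[keyword] = content
        return self.generate_document(template, contents)
```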
Optionally, the matching object determining module 401 includes:
A table similarity determining unit, configured to calculate, when the target object is a target table, a table similarity between the target table and a candidate table in the candidate document according to a target keyword of the target table;
and the matching table determining unit is used for determining a matching table of the target table from the documents to be matched according to the table similarity.
Optionally, the matching table determining unit is specifically configured to:
and determining a matching table of the target table from the documents to be matched according to the table similarity and the target data type of the target keyword.
Optionally, the matching object determining module 401 includes:
the keyword similarity determining unit is used for calculating the keyword similarity between the target keyword of the target paragraph chapter and the candidate chapter keyword of the candidate paragraph chapter under the condition that the target object is the target paragraph chapter;
and the paragraph section determining unit is used for determining the matched paragraph section of the target paragraph section from the document to be matched according to the keyword similarity.
Optionally, the paragraph chapter determining unit is specifically configured to:
and determining the matching paragraph section of the target paragraph section from the candidate paragraph sections according to the keyword similarity and the target attribution of the target paragraph section.
Optionally, the target content determining module 402 is specifically configured to:
under the condition that the target object is a target paragraph chapter, determining a matched paragraph chapter position corresponding to the matched paragraph chapter according to the target keyword and the target attribution based on the corresponding relation among the target keyword, the document attribution and the paragraph chapter position;
and determining the target content of the target paragraph chapter from the matched paragraph chapter according to the matched paragraph chapter position.
Optionally, the apparatus further comprises:
the document auditing module is used for auditing the target document based on a preset auditing rule;
the stamping page determining module is used for determining a stamping page of the target business seal according to the document page number of the target document under the condition that the audit is passed;
and the stamping position determining module is used for determining the stamping position of the target business seal according to the pixel information in the stamping page.
The document generation device provided by the embodiment of the invention can execute the document generation method provided by any embodiment of the invention, and has the functional modules and beneficial effects corresponding to the executed document generation method.
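As a rough illustration of the stamping modules, the sketch below assumes that the stamping page is the last page of the target document and that the stamping position is the blank area just below the last row containing non-white pixels. Neither rule is stated in this section; both, along with the function names and the grayscale pixel grid representation, are assumptions.

```python
from typing import List, Optional, Tuple


def stamping_page(page_count: int) -> int:
    """Assumed rule: the business seal goes on the last page (1-based)."""
    if page_count < 1:
        raise ValueError("document has no pages")
    return page_count


def stamping_position(
    pixels: List[List[int]],
    stamp_h: int,
    stamp_w: int,
    white_threshold: int = 250,
) -> Optional[Tuple[int, int]]:
    """Return (top_row, left_col) for a stamp placed below the last inked row.

    pixels is a grid of grayscale values (0 = black, 255 = white).
    Returns None when the stamp does not fit on the page.
    """
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    last_inked = -1
    for r, row in enumerate(pixels):
        if any(v < white_threshold for v in row):
            last_inked = r
    top = last_inked + 1
    left = width - stamp_w  # right-aligned, a common place for a seal
    if left < 0 or top + stamp_h > height:
        return None
    return top, left
```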
Example V
Fig. 5 shows a schematic diagram of the structure of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the invention described and/or claimed herein.
As shown in Fig. 5, the electronic device 10 includes at least one processor 11 and a memory, such as a Read Only Memory (ROM) 12 or a Random Access Memory (RAM) 13, communicatively connected to the at least one processor 11. The memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the ROM 12 or the computer program loaded from the storage unit 18 into the RAM 13. The RAM 13 may also store various programs and data required for the operation of the electronic device 10. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to the bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the respective methods and processes described above, such as the document generation method.
In some embodiments, the document generation method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the document generation method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the document generation method in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer-readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer-readable storage medium may be a machine-readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system that overcomes the high management difficulty and weak service scalability of traditional physical hosts and virtual private server (VPS) services.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and the present invention is not limited herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (6)

1. A document generation method, comprising:
determining a matching object of the target object from the document to be matched according to the target keyword of the target object;
determining target content corresponding to the target object according to the target keyword and the matching object;
generating a target document according to the target content based on a preset document template;
the determining the matching object of the target object from the document to be matched according to the target keyword of the target object comprises the following steps:
under the condition that the target object is a target table, calculating the table similarity between the target table and a candidate table in the candidate document according to a target keyword of the target table; the target table is a table which needs to be filled with content in a preset document template; the candidate documents are documents storing the table mapping configuration of all tables in the document to be matched;
according to the table similarity, determining a matching table of the target table from the document to be matched, including: extracting the maximum table similarity from at least one table similarity, and taking a candidate table corresponding to the maximum table similarity as a matching table of a target table; the document to be matched comprises at least one of an offline annuity data report of a hosting mechanism, an offline annuity data report of an account management mechanism and an offline annuity data report of an investment mechanism;
under the condition that the target object is a target paragraph chapter, calculating the keyword similarity between the target keyword of the target paragraph chapter and the candidate chapter keyword of the candidate paragraph chapter; the target paragraph chapter is a paragraph chapter which needs to be filled with content in a preset document template; the candidate paragraph chapters are paragraph chapters which can be selected in the document to be matched;
and determining the matching paragraph chapter of the target paragraph chapter from the document to be matched according to the keyword similarity, including: extracting the maximum keyword similarity from at least one keyword similarity, and taking a candidate paragraph chapter corresponding to the maximum keyword similarity as a matching paragraph chapter of a target paragraph chapter; if at least two candidate paragraph chapters corresponding to the maximum keyword similarity exist, determining a matching paragraph chapter of the target paragraph chapter from the at least two candidate paragraph chapters according to the maximum keyword similarity and a target attribution party of the target paragraph chapter;
the determining the target content corresponding to the target object according to the target keyword and the matching object comprises the following steps:
when the target object is a target paragraph chapter, determining a matching paragraph chapter position corresponding to the matching paragraph chapter according to the target keyword and the target attribution party based on the corresponding relation among the target keyword, the document attribution party and the paragraph chapter position;
and determining target content of the target paragraph chapter from the matching paragraph chapter according to the matching paragraph chapter position.
2. The method of claim 1, wherein determining a matching table of the target table from the documents to be matched according to the table similarity comprises:
and determining a matching table of the target table from the document to be matched according to the table similarity and the target data type of the target keyword.
3. The method according to any one of claims 1-2, wherein the method further comprises:
based on a preset auditing rule, auditing the target document;
under the condition that the auditing is passed, determining a stamping page of the target business seal according to the document page number of the target document;
and determining the stamping position of the target business seal according to the pixel information in the stamping page.
4. A document generating apparatus, comprising:
the matching object determining module is used for determining a matching object of the target object from the document to be matched according to the target keyword of the target object;
the target content determining module is used for determining target content corresponding to the target object according to the target keyword and the matching object;
the target document determining module is used for generating a target document according to the target content based on a preset document template;
wherein the matching object determining module includes:
a table similarity determining unit, configured to calculate, when the target object is a target table, a table similarity between the target table and a candidate table in the candidate document according to a target keyword of the target table; the target table is a table which needs to be filled with content in a preset document template; the candidate documents are documents storing the table mapping configuration of all tables in the document to be matched;
a matching table determining unit for determining a matching table of the target table from the documents to be matched according to the table similarity, including: extracting the maximum table similarity from at least one table similarity, and taking a candidate table corresponding to the maximum table similarity as a matching table of a target table; the document to be matched comprises at least one of an offline annuity data report of a hosting mechanism, an offline annuity data report of an account management mechanism and an offline annuity data report of an investment mechanism;
a keyword similarity determining unit, configured to calculate the keyword similarity between the target keyword of the target paragraph chapter and the candidate chapter keyword of the candidate paragraph chapter under the condition that the target object is the target paragraph chapter; the target paragraph chapter is a paragraph chapter which needs to be filled with content in a preset document template; the candidate paragraph chapters are paragraph chapters which can be selected in the document to be matched;
a paragraph chapter determining unit, configured to determine, according to the keyword similarity, a matching paragraph chapter of the target paragraph chapter from the document to be matched, including: extracting the maximum keyword similarity from at least one keyword similarity, and taking a candidate paragraph chapter corresponding to the maximum keyword similarity as a matching paragraph chapter of a target paragraph chapter; if at least two candidate paragraph chapters corresponding to the maximum keyword similarity exist, determining a matching paragraph chapter of the target paragraph chapter from the at least two candidate paragraph chapters according to the maximum keyword similarity and a target attribution party of the target paragraph chapter;
the target content determining module is specifically configured to:
when the target object is a target paragraph chapter, determining a matching paragraph chapter position corresponding to the matching paragraph chapter according to the target keyword and the target attribution party based on the corresponding relation among the target keyword, the document attribution party and the paragraph chapter position;
and determining target content of the target paragraph chapter from the matching paragraph chapter according to the matching paragraph chapter position.
5. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the document generation method of any one of claims 1-3.
6. A computer readable storage medium storing computer instructions for causing a processor to perform the document generation method of any one of claims 1-3.
CN202311559212.9A 2023-11-22 2023-11-22 Document generation method, device, equipment and storage medium Active CN117272970B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311559212.9A CN117272970B (en) 2023-11-22 2023-11-22 Document generation method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN117272970A CN117272970A (en) 2023-12-22
CN117272970B true CN117272970B (en) 2024-03-01

Family

ID=89203042

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311559212.9A Active CN117272970B (en) 2023-11-22 2023-11-22 Document generation method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117272970B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110795919A (en) * 2019-11-07 2020-02-14 达而观信息科技(上海)有限公司 Method, device, equipment and medium for extracting table in PDF document
CN114417820A (en) * 2022-01-26 2022-04-29 盟浪可持续数字科技(深圳)有限责任公司 Content filtering method for target object
CN114444465A (en) * 2022-02-09 2022-05-06 北京百度网讯科技有限公司 Information extraction method, device, equipment and storage medium
CN116776850A (en) * 2023-06-09 2023-09-19 兴业银行股份有限公司 Document processing method, device, computer equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9448992B2 (en) * 2013-06-04 2016-09-20 Google Inc. Natural language search results for intent queries
US20190139642A1 (en) * 2016-04-26 2019-05-09 Ascend Hit Llc System and methods for medical image analysis and reporting

CN118193809A (en) Image generation method, device, electronic apparatus, storage medium, and program product
CN116229489A (en) Document comparison method, device, equipment and medium
CN115525614A (en) Data access method, device, equipment, system and storage medium
CN117009356A (en) Method, device and equipment for determining application success of public data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant