CN115146030A - Official document writing method and system based on knowledge graph

Info

Publication number: CN115146030A
Application number: CN202210794241.2A
Authority: CN
Inventors: 陈刚
Original assignee: Shanghai Yanshu Computer Technology Co ltd
Current assignee: Shanghai Yanshu Computer Technology Co ltd
Priority date: 2022-07-05
Filing date: 2022-07-05
Publication date: 2022-10-04

Abstract

The invention discloses an official document writing method and system based on a knowledge graph. The method comprises the following steps: importing reference materials into the system, extracting a knowledge graph from the material content, generating a basic information table according to the article outline, sorting the basic information table, and pushing text segments. The system comprises a storage module, a transmission module, a graph database module, a recall query module, a sorting module and a pushing module. By applying NLP, knowledge graph and big data technologies, the invention segments the content of the reference materials and extracts a knowledge graph from it. While a user writes an official document, the generated knowledge graph is used to push matching text segments for reference, which reduces the time and effort staff spend on writing and improves their writing efficiency.

Description

Official document writing method and system based on knowledge graph

Technical Field

The invention relates to the field of official document writing, in particular to an official document writing method and system based on a knowledge graph.

Background

Official documents are the written materials that official agencies and organizations form and use in public activities, according to a specific style and through a defined processing procedure. Writing them efficiently helps work run correctly and efficiently.

Official documents are difficult to write for two reasons: a large volume of material has to be consulted, and the content distilled from it has to be highly precise. Because of these two difficulties, a staff member writing a document must spend a great deal of time collecting materials, reading them repeatedly, and screening out the parts that are needed. This tedious and repetitive preparation work consumes much of the staff's time and energy, so official documents are written inefficiently. The present application therefore provides an official document writing method and system based on a knowledge graph to solve the problems described above.

Disclosure of Invention

In order to solve the technical problems, the invention provides a method for writing official documents based on a knowledge graph, which comprises the following steps:

step 1: importing materials for reference into a system;

step 2: extracting a knowledge graph from the material content;

step 3: generating a basic information table according to the article outline;

step 4: sorting the basic information table;

step 5: pushing the text segments.

The invention also provides an official document writing system based on a knowledge graph, which comprises a storage module, a transmission module, a graph database module, a recall query module, a sorting module and a pushing module.

Preferably, the storage module is used for storing data;

the storage module comprises a Neo4j database and an ES non-standard database, wherein the Neo4j database is used for storing data in an 'entity-relationship-entity' format, and the ES non-standard database is used for storing ES table data.
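As a minimal sketch of the 'entity-relationship-entity' format, a triple can be modelled as a small record and rendered into a parameterized Cypher MERGE statement. The `Triple` type, the `Entity` label, the `REL` relationship type and the `pid` property are illustrative assumptions; the source does not specify a graph schema.

```python
from typing import Dict, NamedTuple, Tuple


class Triple(NamedTuple):
    """One 'entity-relationship-entity' record (hypothetical schema)."""
    head: str
    relation: str
    tail: str
    pid: str  # index of the paragraph the triple was extracted from


def to_cypher(triple: Triple) -> Tuple[str, Dict[str, str]]:
    """Build a parameterized Cypher MERGE statement for one triple.

    The Entity label and REL relationship type are assumptions made for
    illustration. Values are passed as parameters, never interpolated.
    """
    query = (
        "MERGE (h:Entity {name: $head}) "
        "MERGE (t:Entity {name: $tail}) "
        "MERGE (h)-[r:REL {name: $relation, pid: $pid}]->(t)"
    )
    return query, dict(triple._asdict())
```

The returned pair could then be handed to a Neo4j client, for example the `session.run(query, params)` call of the Neo4j Python driver. MERGE rather than CREATE is chosen here so that re-importing the same material does not duplicate nodes.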

Preferably, the transmission module is used for transmitting data between different modules;

the transmission module comprises a receiving unit, a sending unit and a transmission unit, wherein the receiving unit is used for receiving data sent by other modules, the sending unit is used for sending data to other units, and the transmission unit is used for transporting data between the receiving unit and the sending unit.

Preferably, the graph database module is used for extracting a knowledge graph;

the graph database module comprises a document preprocessing unit, a hash calculation unit, a triple extraction unit and a Neo4j database storage unit, wherein the document preprocessing unit divides the acquired article content into paragraphs using the outline as the delimiter, the hash calculation unit calculates a hash value for each paragraph to generate a Pid that serves as the index of the paragraph, and the triple extraction unit divides the paragraph data in the ES table into sentence-level granularity, performs word segmentation, extracts knowledge triples at the noun and verb level according to the syntactic dependency relationships, and submits the triples to the Neo4j database storage unit.
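The triple extraction step can be illustrated with a deliberately simplified sketch. It does not perform real dependency parsing: it assumes each sentence has already been segmented and part-of-speech tagged by some external tool, and it substitutes a nearest-noun heuristic for the syntactic dependency analysis. The tag names `"n"` and `"v"` and the function name are assumptions for illustration only.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag): "n" = noun, "v" = verb


def extract_triples(tagged_sentence: List[Token]) -> List[Tuple[str, str, str]]:
    """Return (noun, verb, noun) triples from one tagged sentence.

    Heuristic stand-in for dependency parsing: each verb is linked to the
    nearest noun on its left and the nearest noun on its right.
    """
    triples = []
    for i, (word, tag) in enumerate(tagged_sentence):
        if tag != "v":
            continue
        # Nearest noun before the verb (scan leftwards).
        left = next((w for w, t in reversed(tagged_sentence[:i]) if t == "n"), None)
        # Nearest noun after the verb (scan rightwards).
        right = next((w for w, t in tagged_sentence[i + 1:] if t == "n"), None)
        if left is not None and right is not None:
            triples.append((left, word, right))
    return triples
```

A verb without a noun on both sides yields no triple, so fragments such as a bare subject and verb are skipped rather than stored as incomplete relations.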

Preferably, the recall query module is used for generating a basic information table;

the recall query module comprises a user retrieval unit, a Neo4j database fuzzy query unit and a recall result storage unit, wherein the user retrieval unit acquires the outline data input by a user, calls a word segmentation tool to segment the data, and transmits the segmented data to the ES database to generate and store a basic information table of the outline data; the Neo4j database fuzzy query unit reads the basic information table of the outline data, generates a query result table by fuzzy query of the Neo4j database, and removes duplicate items from the query result table; and the recall result storage unit receives the query result table, marks articles according to the query result table to create article IDs as supplementary information, and generates the basic information table for subsequent use.

Preferably, the sorting module is used for calculating similarity;

the sorting module comprises a TF-IDF matrix transformation unit and a similarity calculation unit, wherein the TF-IDF matrix transformation unit establishes a large matrix and a query index according to the basic information table and establishes a search matrix according to the user's outline data, and the similarity calculation unit calculates the Euclidean distance between the large matrix and the search matrix and returns the texts of the top-k paragraphs according to the calculation results.

Preferably, the pushing module is used for pushing the document segment content;

the pushing module comprises a recall unit and a sorting unit, wherein the recall unit calculates the similarity between all data in the ES non-standard database and the outline, pushes the similar content and stores it in a text table, and the sorting unit, when a user inputs a search term, obtains the most similar content through similarity calculation and pushes it.

The technical effects and advantages of the invention are as follows:

By applying NLP, knowledge graph and big data technologies, the invention segments the content of the reference materials and extracts a knowledge graph from it. While a user writes an official document, the generated knowledge graph is used to push matching text segments for reference, which reduces the time and effort staff spend on writing and improves their writing efficiency.

Drawings

FIG. 1 is a diagram of the method steps provided herein;

FIG. 2 is a flow chart of the system provided in the present application.

Detailed Description

The invention is described in further detail below with reference to the drawings and the detailed description. The embodiments of the present invention have been presented for purposes of illustration and description, and are not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to practitioners skilled in this art. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

Referring to FIGS. 1-2, the present embodiment provides an official document writing method based on a knowledge graph, the method comprising:

step 1: importing materials for reference into a system;

step 2: extracting a knowledge graph from the material content;

step 3: generating a basic information table according to the article outline;

step 4: sorting the basic information table;

step 5: pushing the text segments.

1. Importing the reference materials;

The materials to be used in writing the official document are imported into the system in preparation for writing.

2. Extracting a knowledge graph;

The graph database module acquires the materials imported into the system. The document preprocessing unit divides the content of each acquired article into paragraphs, using the outline as the delimiter, and hands the paragraphs to the hash calculation unit. The hash calculation unit calculates a hash value for each paragraph to generate a Pid, which serves as the index of that paragraph; each paragraph is stored together with its Pid in an ES table of the ES non-standard database as input for the triple extraction unit. The triple extraction unit divides the paragraph data in the ES table into sentence-level granularity, performs word segmentation, extracts knowledge triples at the noun and verb level according to the syntactic dependency relationships, and submits the extracted triples to the Neo4j database storage unit. The Neo4j database storage unit transmits the triples to the Neo4j database, where they are stored in the 'entity-relationship-entity' format and made available to the recall query module.
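The preprocessing and hashing steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the outline is taken to be a list of heading strings that appear on their own lines, SHA-256 is an assumed hash function (the source names none), and a plain dictionary stands in for the ES table. All function names are hypothetical.

```python
import hashlib
from typing import Dict, List


def split_by_outline(article: str, headings: List[str]) -> List[str]:
    """Split an article into paragraphs, starting a new one at each outline heading."""
    paragraphs: List[str] = []
    current: List[str] = []
    for line in article.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines
        if stripped in headings and current:
            # A heading closes the paragraph collected so far.
            paragraphs.append("\n".join(current))
            current = []
        current.append(stripped)
    if current:
        paragraphs.append("\n".join(current))
    return paragraphs


def paragraph_pid(paragraph: str) -> str:
    """Derive a Pid from the paragraph text (SHA-256 is an assumed choice)."""
    return hashlib.sha256(paragraph.encode("utf-8")).hexdigest()


def index_paragraphs(article: str, headings: List[str]) -> Dict[str, str]:
    """Map each Pid to its paragraph, standing in for rows of the ES table."""
    return {paragraph_pid(p): p for p in split_by_outline(article, headings)}
```

Because the Pid is derived from the paragraph text alone, identical paragraphs imported from different articles collapse to one entry in this sketch; whether the described system behaves the same way is not stated in the source.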

3. Generating a basic information table;

The user retrieval unit in the recall query module calls a crawler framework to acquire the outline data input by the user, then calls the word segmentation tool on the server to segment the outline data, and, once segmentation is finished, transmits the segmented outline data to the ES database to generate and store a basic information table of the outline data. The Neo4j database fuzzy query unit reads the basic information table of the outline data, queries the Neo4j database in fuzzy-query mode using the Cypher query language to generate a query result table, cleans the query result table to remove duplicate items, and transmits the cleaned table to the recall result storage unit. The recall result storage unit receives the cleaned query result table, marks the article currently being processed according to that table, creates an article ID as supplementary information, generates the basic information table used for subsequent user searches, and provides it to the sorting module.
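One plausible shape for the fuzzy query and the de-duplication step is sketched below. The source says only that Cypher is used in a fuzzy-query mode; matching with the Cypher `CONTAINS` operator, the `Entity` label, and the `name` and `pid` properties are assumptions made for illustration, as are the function names.

```python
from typing import Dict, Iterable, List, Tuple


def build_fuzzy_query(terms: List[str]) -> Tuple[str, Dict[str, List[str]]]:
    """Build a parameterized Cypher query matching entities that contain any term."""
    query = (
        "MATCH (h:Entity)-[r]->(t:Entity) "
        "WHERE any(term IN $terms "
        "WHERE h.name CONTAINS term OR t.name CONTAINS term) "
        "RETURN h.name AS head, r.name AS relation, t.name AS tail, r.pid AS pid"
    )
    return query, {"terms": terms}


def deduplicate(rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop repeated result rows while preserving first-seen order."""
    seen = set()
    unique = []
    for row in rows:
        key = (row["head"], row["relation"], row["tail"], row["pid"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```

The segmented outline terms would be passed as `terms`, and the rows returned by the database would go through `deduplicate` before being stored as the query result table.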

4. Sorting the query results;

The TF-IDF matrix transformation unit in the sorting module transforms the recall result of the fuzzy query, that is, the basic information table for user search, into a large matrix of TF-IDF values and establishes a query index. It also builds a TF-IDF search matrix from the outline data input by the user and passes both matrices to the similarity calculation unit. The similarity calculation unit calculates the Euclidean distance between the large TF-IDF matrix and the TF-IDF search matrix of the user's outline data, and returns the texts of the top-k paragraphs with the smallest distance values as the result.
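The ranking step can be sketched in plain Python as follows. This is an illustration under assumptions, not the patented implementation: paragraphs and the outline are taken to be already segmented into token lists, the smoothed IDF formula is one common variant chosen here for convenience, and all function names are hypothetical. `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter
from typing import Dict, List


def build_idf(docs: List[List[str]]) -> Dict[str, float]:
    """Compute a smoothed inverse document frequency per term (assumed variant)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def vectorize(tokens: List[str], vocab: List[str], idf: Dict[str, float]) -> List[float]:
    """Turn a token list into a TF-IDF vector over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty input
    return [counts[term] / total * idf[term] for term in vocab]


def top_k(query_tokens: List[str], docs: List[List[str]], k: int) -> List[int]:
    """Return indices of the k paragraphs closest to the query by Euclidean distance."""
    idf = build_idf(docs)
    vocab = sorted(idf)
    matrix = [vectorize(doc, vocab, idf) for doc in docs]   # the "large matrix"
    query_vec = vectorize(query_tokens, vocab, idf)         # the "search matrix"
    distances = [(math.dist(query_vec, row), i) for i, row in enumerate(matrix)]
    return [i for _, i in sorted(distances)[:k]]
```

Query terms that do not occur in any paragraph are ignored, since the vocabulary is built from the paragraphs only. Note that unnormalized Euclidean distance is sensitive to paragraph length, which a production system might address by normalizing the vectors first.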

5. Pushing the segments for reference.

When the user creates or modifies the outline of an article, the recall unit of the pushing module calculates the similarity between all data in the ES non-standard database and the outline, pushes the similar content and stores it in the text table. When the user inputs a search term, the sorting unit selects the most similar text segment content through similarity calculation and pushes it.

The embodiment also provides an official document writing system based on a knowledge graph, which comprises a storage module, a transmission module, a graph database module, a recall query module, a sorting module and a pushing module.

The storage module is used for storing data and comprises a Neo4j database and an ES non-standard database. The Neo4j database stores data in the 'entity-relationship-entity' format. The ES non-standard database stores the ES table data.

The transmission module comprises a receiving unit, a sending unit and a transmission unit, and is used for transmitting data between different modules. The receiving unit receives the data sent by other modules, and the sending unit sends data to other units. The transmission unit transports data between the receiving unit and the sending unit.

The graph database module comprises a document preprocessing unit, a hash calculation unit, a triple extraction unit and a Neo4j database storage unit, and is used for extracting the knowledge graph. The document preprocessing unit divides the acquired article content into paragraphs using the outline as the delimiter. The hash calculation unit calculates a hash value for each paragraph to generate a Pid that serves as the index of the paragraph. The triple extraction unit divides the paragraph data in the ES table into sentence-level granularity, performs word segmentation, extracts knowledge triples at the noun and verb level according to the syntactic dependency relationships, and submits the triples to the Neo4j database storage unit.

The recall query module comprises a user retrieval unit, a Neo4j database fuzzy query unit and a recall result storage unit, and is used for generating a basic information table. The user retrieval unit acquires the outline data input by the user, calls a word segmentation tool to segment the data, and transmits the segmented data to the ES database to generate and store a basic information table of the outline data. The Neo4j database fuzzy query unit reads the basic information table of the outline data, generates a query result table by fuzzy query of the Neo4j database, and removes duplicate items from the query result table. The recall result storage unit receives the query result table, marks the articles according to the query result table to create article IDs as supplementary information, and generates the basic information table for subsequent use.

The sorting module comprises a TF-IDF matrix transformation unit and a similarity calculation unit, and is used for calculating similarity. The TF-IDF matrix transformation unit establishes a large matrix and a query index according to the basic information table, and establishes a search matrix according to the user's outline data. The similarity calculation unit calculates the Euclidean distance between the large matrix and the search matrix and returns the texts of the top-k paragraphs according to the calculation result.

The pushing module comprises a recall unit and a sorting unit, and is used for pushing the document segment content. The recall unit calculates the similarity between all the data in the ES non-standard database and the outline, pushes the similar content and stores it in the text table. When the user inputs a search term, the sorting unit selects the most similar content through similarity calculation and pushes it.

It should be apparent that the described embodiments are only some of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by one of ordinary skill in the art and related arts based on the embodiments of the present invention without any creative effort, shall fall within the protection scope of the present invention. Structures, devices, and methods of operation not specifically described or illustrated herein are generally practiced in the art without specific recitation or limitation.

Claims

1. An official document writing method based on a knowledge graph, characterized by comprising:

step 1: importing materials for reference into a system;

step 2: extracting a knowledge graph from the material content;

step 3: generating a basic information table according to the article outline;

step 4: sorting the basic information table;

step 5: pushing the text segments.

2. An official document writing system based on a knowledge graph, characterized by comprising a storage module, a transmission module, a graph database module, a recall query module, a sorting module and a pushing module.

3. The knowledge-graph-based official document writing system of claim 2, wherein said storage module is adapted to store data;

the storage module comprises a Neo4j database and an ES non-standard database, wherein the Neo4j database is used for storing data in an 'entity-relationship-entity' format, and the ES non-standard database is used for storing ES table data.

4. The knowledge-graph-based official document writing system of claim 2, wherein said transmission module is adapted to transmit data between different modules;

5. The knowledge-graph-based official document writing system of claim 2, wherein said graph database module is configured to extract a knowledge graph;

6. The knowledge-graph-based official document writing system of claim 2, wherein said recall query module is configured to generate a basic information table;

the recall query module comprises a user retrieval unit, a Neo4j database fuzzy query unit and a recall result storage unit, wherein the user retrieval unit acquires the outline data input by a user, calls a word segmentation tool to segment the data, and transmits the segmented data to an ES database to generate and store a basic information table of the outline data; the Neo4j database fuzzy query unit reads the basic information table of the outline data, generates a query result table by fuzzy query of the Neo4j database, and removes duplicate items from the query result table; and the recall result storage unit receives the query result table, marks articles according to the query result table to create article IDs as supplementary information, and generates the basic information table for subsequent use.

7. The knowledge-graph-based official document writing system of claim 2, wherein said sorting module is configured to calculate similarity;

8. The knowledge-graph-based official document writing system of claim 2, wherein said pushing module is configured to push the document contents;