CN115146030A - Official document writing method and system based on knowledge graph - Google Patents

Official document writing method and system based on knowledge graph

Info

Publication number
CN115146030A
Authority
CN
China
Prior art keywords
unit
module
data
graph
database
Prior art date
2022-07-05
Legal status
Pending
Application number
CN202210794241.2A
Other languages
Chinese (zh)
Inventor
陈刚 (Chen Gang)
Current Assignee
Shanghai Yanshu Computer Technology Co ltd
Original Assignee
Shanghai Yanshu Computer Technology Co ltd
Priority date
2022-07-05
Filing date
2022-07-05
Publication date
2022-10-04
Application filed by Shanghai Yanshu Computer Technology Co ltd
Priority to CN202210794241.2A
Publication of CN115146030A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/383 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content

Abstract

The invention discloses a method and a system for writing official documents based on a knowledge graph. The method comprises the following steps: importing reference materials into a system, extracting a knowledge graph from the material content, generating a basic information table according to the article outline, sorting the basic information table, and pushing text segments. The system comprises a storage module, a transmission module, a graph database module, a recall query module, a sorting module and a pushing module. The invention adopts NLP, knowledge graph and big data technologies to segment the content of reference materials and extract a knowledge graph from it. When a user writes an official document, the generated knowledge graph is used to push matching text segments for reference, which reduces the time and effort staff spend on writing and improves their writing efficiency.

Description

Official document writing method and system based on knowledge graph
Technical Field
The invention relates to the field of official document writing, and in particular to an official document writing method and system based on a knowledge graph.
Background
Official documents are written materials that government agencies and organizations produce and use in the course of their public activities; they follow a prescribed format and pass through a defined processing procedure before being issued. Writing them efficiently helps work proceed correctly and efficiently.
The main difficulties in writing official documents are that a very large amount of reference material is needed and that distilling it demands high precision. Because of these two difficulties, staff writing a document must spend a great deal of time collecting materials, reading them repeatedly, and picking out the parts they need. This tedious, repetitive preparation consumes much of the staff's time and energy and makes document writing inefficient. The present invention therefore provides a method and system for writing official documents based on a knowledge graph to solve the above problems.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a method for writing official documents based on a knowledge graph, comprising the following steps:
Step 1: importing reference materials into a system;
Step 2: extracting a knowledge graph from the material content;
Step 3: generating a basic information table according to the article outline;
Step 4: sorting the basic information table;
Step 5: pushing text segments.
The invention further provides an official document writing system based on a knowledge graph, comprising a storage module, a transmission module, a graph database module, a recall query module, a sorting module and a pushing module.
Preferably, the storage module is used for storing data;
the storage module comprises a Neo4j database and an ES non-standard database, wherein the Neo4j database stores data in an 'entity-relationship-entity' format and the ES non-standard database stores ES table data.
Preferably, the transmission module is used for transmitting data between different modules;
the transmission module comprises a receiving unit, a sending unit and a transmission unit, wherein the receiving unit receives data sent by other modules, the sending unit sends data to other units, and the transmission unit moves data between the receiving unit and the sending unit.
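The patent does not say how the three units of the transmission module are built. As a minimal in-process sketch, assuming a single-threaded pipeline and using hypothetical names (`TransmissionModule`, `receive`, `transport`, `send`), the flow from receiving unit to transmission unit to sending unit could look like this:

```python
import queue
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, DefaultDict, List


@dataclass
class Message:
    source: str   # module that produced the data
    target: str   # module that should receive it
    payload: Any  # the data itself


class TransmissionModule:
    """Hypothetical receive -> transport -> send pipeline between modules."""

    def __init__(self) -> None:
        self._received: queue.Queue = queue.Queue()  # filled by the receiving unit
        self._to_send: queue.Queue = queue.Queue()   # drained by the sending unit
        self.delivered: DefaultDict[str, List[Any]] = defaultdict(list)

    def receive(self, source: str, target: str, payload: Any) -> None:
        """Receiving unit: accept data sent by another module."""
        self._received.put(Message(source, target, payload))

    def transport(self) -> int:
        """Transmission unit: move all received data to the sending side."""
        moved = 0
        while not self._received.empty():
            self._to_send.put(self._received.get())
            moved += 1
        return moved

    def send(self) -> int:
        """Sending unit: deliver queued data to its target module."""
        sent = 0
        while not self._to_send.empty():
            msg = self._to_send.get()
            self.delivered[msg.target].append(msg.payload)
            sent += 1
        return sent
```

In a deployed system the queues would more likely be a message broker or an RPC layer; the patent leaves this choice open.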
Preferably, the graph database module is used for extracting a knowledge graph;
the graph database module comprises a document preprocessing unit, a hash calculation unit, a triple extraction unit and a Neo4j database storage unit, wherein the document preprocessing unit divides the acquired article content into paragraphs, using the outline to delimit them; the hash calculation unit calculates a hash value for each paragraph to generate the Pid, an index that identifies the paragraph; and the triple extraction unit divides the paragraph data in the ES table into sentence-level units, performs word segmentation, extracts knowledge triples at the noun and verb level according to syntactic dependency relations, and submits the triples to the Neo4j database storage unit.
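The patent does not specify how outline headings are recognized or which hash function produces the Pid. The following is a minimal sketch of the preprocessing and hash calculation units, assuming that outline headings appear as standalone lines in the article and that the Pid is the SHA-256 hex digest of the paragraph text; `split_by_outline`, `make_pid` and `build_es_rows` are hypothetical names, and the returned dictionaries only stand in for ES table rows:

```python
import hashlib
from typing import Dict, List, Optional


def split_by_outline(article: str, outline: List[str]) -> Dict[str, str]:
    """Split an article into paragraphs, using outline headings as boundaries.

    Returns heading -> body text. Text before the first heading is dropped.
    """
    headings = {h.strip() for h in outline}
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in article.splitlines():
        stripped = line.strip()
        if stripped in headings:
            current = stripped
            sections.setdefault(current, [])
        elif current is not None and stripped:
            sections[current].append(stripped)
    return {h: "\n".join(lines) for h, lines in sections.items()}


def make_pid(paragraph: str) -> str:
    """Derive a paragraph ID (Pid) from a hash of the paragraph text."""
    return hashlib.sha256(paragraph.encode("utf-8")).hexdigest()


def build_es_rows(article: str, outline: List[str]) -> List[dict]:
    """One ES-table-style record per paragraph, keyed by its Pid."""
    return [
        {"pid": make_pid(body), "heading": heading, "text": body}
        for heading, body in split_by_outline(article, outline).items()
    ]
```

Hashing the text means that an identical paragraph imported twice receives the same Pid, so the Pid can also serve as a cheap deduplication key.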
Preferably, the recall query module is used for generating a basic information table;
the recall query module comprises a user retrieval unit, a Neo4j database fuzzy query unit and a recall result storage unit, wherein the user retrieval unit acquires the outline data input by the user, calls a word segmentation tool to segment it, and passes the result to the ES database, where a basic information table of the outline data is generated and stored; the Neo4j database fuzzy query unit reads the basic information table of the outline data, fuzzy-queries the Neo4j database to generate a query result table, and removes duplicate entries from it; and the recall result storage unit receives the query result table, marks articles according to it by creating article IDs as supplementary information, and generates a subsequent basic information table.
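The deduplication and article-ID marking steps can be sketched as follows. The row layout (head, relation, tail, pid) and the use of a random UUID as the default article ID are assumptions; the patent only states that duplicates are removed and that an article ID is created as supplementary information:

```python
import uuid
from typing import Iterable, List, NamedTuple, Optional


class ResultRow(NamedTuple):
    head: str      # head entity of the matched triple
    relation: str  # relation between the entities
    tail: str      # tail entity of the matched triple
    pid: str       # paragraph the triple came from


def deduplicate(rows: Iterable[ResultRow]) -> List[ResultRow]:
    """Remove repeated rows while keeping first-seen order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


def tag_with_article_id(rows: Iterable[ResultRow],
                        article_id: Optional[str] = None) -> List[dict]:
    """Mark every cleaned row with the ID of the article currently being written."""
    article_id = article_id or str(uuid.uuid4())  # hypothetical ID scheme
    return [dict(row._asdict(), article_id=article_id) for row in deduplicate(rows)]
```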
Preferably, the sorting module is used for calculating similarity;
the sorting module comprises a TF-IDF matrix transformation unit and a similarity calculation unit, wherein the TF-IDF matrix transformation unit builds a large matrix and a query index from the basic information table and builds a search matrix from the user's outline data, and the similarity calculation unit computes the Euclidean distance between the large matrix and the search matrix and returns the texts of the top-k paragraphs according to the result.
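A minimal pure-Python sketch of the sorting module, assuming the paragraphs and the outline have already been word-segmented into token lists. The smoothed IDF formula, the L2 normalisation of each row and the function names are choices made for this illustration, not details taken from the patent:

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def fit_idf(paragraphs: Sequence[List[str]]) -> Dict[str, float]:
    """Smoothed IDF, log((1 + N) / (1 + df)) + 1, so no term gets zero weight."""
    n = len(paragraphs)
    df = Counter(term for tokens in paragraphs for term in set(tokens))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(tokens: List[str], idf: Dict[str, float], vocab: List[str]) -> List[float]:
    """L2-normalised TF-IDF row over a fixed vocabulary; unknown terms are ignored."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    vec = [counts[t] / total * idf[t] for t in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def top_k_paragraphs(paragraphs: Sequence[List[str]], query: List[str],
                     k: int = 3) -> List[Tuple[int, float]]:
    """Return (paragraph index, Euclidean distance) for the k nearest paragraphs."""
    idf = fit_idf(paragraphs)
    vocab = sorted(idf)
    big_matrix = [tfidf_vector(p, idf, vocab) for p in paragraphs]  # one row per paragraph
    search_vec = tfidf_vector(query, idf, vocab)                   # the search matrix (one row)
    dists = [(i, math.dist(row, search_vec)) for i, row in enumerate(big_matrix)]
    return sorted(dists, key=lambda pair: pair[1])[:k]
```

Normalising the rows keeps long paragraphs from being penalised merely for having larger raw TF-IDF values; for unit vectors, ranking by Euclidean distance gives the same order as ranking by cosine similarity.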
Preferably, the pushing module is used for pushing document segment content;
the pushing module comprises a recall unit and a sorting unit, wherein the recall unit calculates the similarity between all data in the ES non-standard database and the outline, pushes similar content and stores it in a text table, and the sorting unit, when the user inputs a search word, uses similarity calculation to select the most similar content for pushing.
The technical effects and advantages of the invention are as follows:
the invention adopts NLP, knowledge graph and big data technologies to segment the content of reference materials and extract a knowledge graph from it. When a user writes an official document, the generated knowledge graph is used to push matching text segments for reference, which reduces the time and effort staff spend on writing and improves their writing efficiency.
Drawings
FIG. 1 is a diagram of the method steps provided in the present application;
FIG. 2 is a flow chart of the system provided in the present application.
Detailed Description
The invention is described in further detail below with reference to the drawings and the detailed description. The embodiments of the present invention have been presented for purposes of illustration and description, and are not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to practitioners skilled in this art. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
Referring to FIGS. 1-2, this embodiment provides a method for writing an official document based on a knowledge graph, the method comprising:
Step 1: importing reference materials into a system;
Step 2: extracting a knowledge graph from the material content;
Step 3: generating a basic information table according to the article outline;
Step 4: sorting the basic information table;
Step 5: pushing text segments.
1. Importing reference materials;
The materials to be used as references for the official document are imported into the system in preparation for writing.
2. Extracting a knowledge graph;
The graph database module acquires the materials imported into the system. The document preprocessing unit uses the outline to divide each acquired article into paragraphs, so that each article yields several paragraphs, which are then passed to the hash calculation unit. The hash calculation unit calculates a hash value for each paragraph to generate the Pid, an index that identifies the paragraph; each paragraph, carrying its own Pid, is stored in an ES table of the ES non-standard database as input for the triple extraction unit. The triple extraction unit divides the paragraph data in the ES table into sentence-level units, performs word segmentation, extracts knowledge triples at the noun and verb level according to syntactic dependency relations, and submits the extracted triples to the Neo4j database storage unit. The Neo4j database storage unit stores the triples in the Neo4j database in an entity-relationship-entity format and makes them available to the recall query module;
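The patent does not name the dependency parser or the extraction rules. The sketch below assumes each sentence has already been segmented and parsed into tokens carrying a Universal POS tag, a head index and a dependency label (a format that output from a parser such as HanLP or spaCy could be adapted to), and extracts subject-verb-object triples from `nsubj` and `obj`/`dobj` arcs. The `Entity` label and `RELATION` type in the Cypher template are hypothetical schema names:

```python
from typing import Iterable, List, NamedTuple, Tuple


class Token(NamedTuple):
    idx: int     # 1-based position in the sentence
    text: str
    pos: str     # Universal POS tag, e.g. "NOUN", "VERB"
    head: int    # idx of the governing token, 0 for the root
    deprel: str  # dependency label, e.g. "nsubj", "obj"


NOUN_TAGS = {"NOUN", "PROPN"}
SUBJ_RELS = {"nsubj"}
OBJ_RELS = {"obj", "dobj"}  # UD v2 uses "obj"; older schemes use "dobj"

# Entity-relationship-entity record; relationship types cannot be Cypher
# parameters, so the verb is stored as a property of a generic relationship.
MERGE_TRIPLE = (
    "MERGE (h:Entity {name: $head}) "
    "MERGE (t:Entity {name: $tail}) "
    "MERGE (h)-[:RELATION {name: $relation, pid: $pid}]->(t)"
)


def extract_triples(sentence: List[Token]) -> List[Tuple[str, str, str]]:
    """Return (subject noun, verb, object noun) triples from one parsed sentence."""
    triples = []
    for verb in (t for t in sentence if t.pos == "VERB"):
        children = [t for t in sentence if t.head == verb.idx and t.pos in NOUN_TAGS]
        subjects = [c.text for c in children if c.deprel in SUBJ_RELS]
        objects = [c.text for c in children if c.deprel in OBJ_RELS]
        triples.extend((s, verb.text, o) for s in subjects for o in objects)
    return triples


def to_cypher_params(triples: Iterable[Tuple[str, str, str]], pid: str) -> List[dict]:
    """Parameter maps for MERGE_TRIPLE, tagged with the source paragraph's Pid."""
    return [{"head": h, "relation": r, "tail": t, "pid": pid} for h, r, t in triples]
```

Each parameter map could then be passed as `session.run(MERGE_TRIPLE, params)` with the official Neo4j Python driver.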
3. Generating a basic information table;
The user retrieval unit in the recall query module calls a crawler framework to acquire the outline data input by the user, calls the server's word segmentation tool to segment the outline data, and, once segmentation is finished, passes the segmented outline data to the ES database, where a basic information table of the outline data is generated and stored. The Neo4j database fuzzy query unit reads this basic information table, fuzzy-queries the Neo4j database using the Cypher query language to generate a query result table, cleans the query result table to remove duplicate entries, and passes the cleaned table to the recall result storage unit. The recall result storage unit receives the cleaned query result table, marks the article currently being processed according to it by creating an article ID as supplementary information, and generates a basic information table for subsequent user searches, which it provides to the sorting module.
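A sketch of the fuzzy query, assuming entities are stored as `Entity` nodes with a `name` property and relations as `RELATION` relationships carrying `name` and `pid` properties (a hypothetical schema; the patent does not give one), and using `CONTAINS` as the fuzzy-match operator. `RETURN DISTINCT` removes duplicate rows on the server side; `fuzzy_match_local` mirrors the same filter in memory so the logic can be checked without a database:

```python
from typing import FrozenSet, Iterable, List, Sequence, Tuple

# Hypothetical schema: (:Entity {name})-[:RELATION {name, pid}]->(:Entity {name})
FUZZY_QUERY = (
    "MATCH (h:Entity)-[r:RELATION]->(t:Entity) "
    "WHERE any(k IN $keywords WHERE h.name CONTAINS k OR t.name CONTAINS k) "
    "RETURN DISTINCT h.name AS head, r.name AS relation, t.name AS tail, r.pid AS pid"
)


def build_fuzzy_query(outline_tokens: Iterable[str],
                      stopwords: FrozenSet[str] = frozenset()) -> Tuple[str, dict]:
    """Turn segmented outline tokens into a parameterised Cypher fuzzy query."""
    keywords = sorted({tok.strip() for tok in outline_tokens} - {""} - set(stopwords))
    return FUZZY_QUERY, {"keywords": keywords}


def fuzzy_match_local(triples: Sequence[Tuple[str, str, str]],
                      keywords: List[str]) -> List[Tuple[str, str, str]]:
    """In-memory equivalent of the CONTAINS filter plus DISTINCT, for testing."""
    seen = set()
    matches = []
    for head, rel, tail in triples:
        hit = any(k in head or k in tail for k in keywords)
        if hit and (head, rel, tail) not in seen:
            seen.add((head, rel, tail))
            matches.append((head, rel, tail))
    return matches
```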
4. Sorting the query results;
The TF-IDF matrix transformation unit in the sorting module transforms the recall result of the fuzzy query, that is, the basic information table searched by the user, into a large matrix of TF-IDF values and builds a query index. It also builds a TF-IDF search matrix from the outline data input by the user and passes it to the similarity calculation unit. The similarity calculation unit computes the Euclidean distance between the large TF-IDF matrix and the search matrix and returns the texts of the top-k paragraphs with the smallest distances as the result.
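The patent names TF-IDF and Euclidean distance without giving formulas. One common formulation, stated here as an assumption, for a term t, a paragraph p among N paragraphs, the query (outline) q, and the vocabulary V:

```latex
\mathrm{tf}(t,p) = \frac{n_{t,p}}{\sum_{t' \in V} n_{t',p}}, \qquad
\mathrm{idf}(t) = \log \frac{N}{\mathrm{df}(t)}, \qquad
w_{t,p} = \mathrm{tf}(t,p)\,\mathrm{idf}(t)

d(q,p) = \sqrt{\sum_{t \in V} \left( w_{t,q} - w_{t,p} \right)^{2}}
```

Here n_{t,p} is the number of occurrences of t in p and df(t) is the number of paragraphs containing t; the query weights w_{t,q} use the outline's own term frequencies with the same corpus idf. The top-k paragraphs are the k with the smallest d(q, p). For example, with N = 4 paragraphs and a term occurring in 2 of them, idf = ln 2 ≈ 0.693.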
5. Pushing segments for reference.
When the user creates or modifies an article outline, the pushing module, through its recall unit, calculates the similarity between all data in the ES non-standard database and the outline, pushes similar content and stores it in the text table. When the user inputs a search word, the sorting unit selects the most similar text segment content for pushing through similarity calculation.
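The patent does not specify the similarity measure used by the pushing module. The sketch below substitutes Jaccard overlap of token sets for simplicity, assumes the sorting unit ranks only the candidates already stored in the text table, and uses a hypothetical `PushModule` class with an assumed similarity threshold:

```python
from typing import Dict, List, Optional, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Token-set overlap in [0, 1]; a stand-in for the unspecified similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class PushModule:
    def __init__(self, es_rows: Dict[str, List[str]], threshold: float = 0.1):
        self.es_rows = es_rows      # pid -> tokenised paragraph text
        self.threshold = threshold  # minimum similarity to be pushed (assumed)
        self.text_table: List[Tuple[str, float]] = []

    def on_outline_changed(self, outline_tokens: List[str]) -> List[Tuple[str, float]]:
        """Recall unit: score every stored paragraph against the outline."""
        outline = set(outline_tokens)
        scored = [(pid, jaccard(outline, set(toks))) for pid, toks in self.es_rows.items()]
        self.text_table = sorted(
            (s for s in scored if s[1] >= self.threshold),
            key=lambda s: s[1],
            reverse=True,
        )
        return self.text_table

    def on_search(self, search_tokens: List[str]) -> Optional[str]:
        """Sorting unit: return the pid in the text table most similar to the search words."""
        query = set(search_tokens)
        best = max(
            self.text_table,
            key=lambda s: jaccard(query, set(self.es_rows[s[0]])),
            default=None,
        )
        return best[0] if best else None
```

Splitting the work this way keeps the expensive full-database comparison in the outline-change event, while each search only re-ranks the short text table.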
This embodiment also provides an official document writing system based on a knowledge graph, comprising a storage module, a transmission module, a graph database module, a recall query module, a sorting module and a pushing module.
The storage module is used for storing data and comprises a Neo4j database and an ES non-standard database. The Neo4j database stores data in an entity-relationship-entity format, and the ES non-standard database stores the ES table data.
The transmission module comprises a receiving unit, a sending unit and a transmission unit and is used for transmitting data between different modules. The receiving unit receives data sent by other modules, the sending unit sends data to other units, and the transmission unit moves data between the receiving unit and the sending unit.
The graph database module comprises a document preprocessing unit, a hash calculation unit, a triple extraction unit and a Neo4j database storage unit and is used for extracting the knowledge graph. The document preprocessing unit divides the acquired article content into paragraphs, using the outline to delimit them. The hash calculation unit calculates a hash value for each paragraph to generate the Pid, an index that identifies the paragraph. The triple extraction unit divides the paragraph data in the ES table into sentence-level units, performs word segmentation, extracts knowledge triples at the noun and verb level according to syntactic dependency relations, and submits the triples to the Neo4j database storage unit.
The recall query module comprises a user retrieval unit, a Neo4j database fuzzy query unit and a recall result storage unit and is used for generating a basic information table. The user retrieval unit acquires the outline data input by the user, calls a word segmentation tool to segment it, and passes the result to the ES database, where a basic information table of the outline data is generated and stored. The Neo4j database fuzzy query unit reads the basic information table of the outline data, fuzzy-queries the Neo4j database to generate a query result table, and removes duplicate entries from it. The recall result storage unit receives the query result table, marks articles according to it by creating article IDs as supplementary information, and generates a subsequent basic information table.
The sorting module comprises a TF-IDF matrix transformation unit and a similarity calculation unit and is used for calculating similarity. The TF-IDF matrix transformation unit builds a large matrix and a query index from the basic information table and builds a search matrix from the user's outline data. The similarity calculation unit computes the Euclidean distance between the large matrix and the search matrix and returns the texts of the top-k paragraphs according to the result.
The pushing module comprises a recall unit and a sorting unit and is used for pushing document segment content. The recall unit calculates the similarity between all data in the ES non-standard database and the outline, pushes similar content and stores it in the text table. When the user inputs search terms, the sorting unit selects the most similar content for pushing through similarity calculation.
It should be apparent that the described embodiments are only some of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by one of ordinary skill in the art and related arts based on the embodiments of the present invention without any creative effort, shall fall within the protection scope of the present invention. Structures, devices, and methods of operation not specifically described or illustrated herein are generally practiced in the art without specific recitation or limitation.

Claims (8)

1. A method for writing official documents based on a knowledge graph, characterized by comprising:
Step 1: importing reference materials into a system;
Step 2: extracting a knowledge graph from the material content;
Step 3: generating a basic information table according to the article outline;
Step 4: sorting the basic information table;
Step 5: pushing text segments.
2. An official document writing system based on a knowledge graph, characterized by comprising a storage module, a transmission module, a graph database module, a recall query module, a sorting module and a pushing module.
3. The knowledge-graph-based official document writing system of claim 2, wherein the storage module is used for storing data;
the storage module comprises a Neo4j database and an ES non-standard database, wherein the Neo4j database stores data in an 'entity-relationship-entity' format, and the ES non-standard database stores ES table data.
4. The knowledge-graph-based official document writing system of claim 2, wherein the transmission module is used for transmitting data between different modules;
the transmission module comprises a receiving unit, a sending unit and a transmission unit, wherein the receiving unit receives data sent by other modules, the sending unit sends data to other units, and the transmission unit moves data between the receiving unit and the sending unit.
5. The knowledge-graph-based official document writing system of claim 2, wherein the graph database module is used for extracting a knowledge graph;
the graph database module comprises a document preprocessing unit, a hash calculation unit, a triple extraction unit and a Neo4j database storage unit, wherein the document preprocessing unit divides the acquired article content into paragraphs, using the outline to delimit them; the hash calculation unit calculates a hash value for each paragraph to generate the Pid, an index that identifies the paragraph; and the triple extraction unit divides the paragraph data in the ES table into sentence-level units, performs word segmentation, extracts knowledge triples at the noun and verb level according to syntactic dependency relations, and submits the triples to the Neo4j database storage unit.
6. The knowledge-graph-based official document writing system of claim 2, wherein the recall query module is used for generating a basic information table;
the recall query module comprises a user retrieval unit, a Neo4j database fuzzy query unit and a recall result storage unit, wherein the user retrieval unit acquires the outline data input by the user, calls a word segmentation tool to segment it, and passes the result to the ES database, where a basic information table of the outline data is generated and stored; the Neo4j database fuzzy query unit reads the basic information table of the outline data, fuzzy-queries the Neo4j database to generate a query result table, and removes duplicate entries from it; and the recall result storage unit receives the query result table, marks articles according to it by creating article IDs as supplementary information, and generates a subsequent basic information table.
7. The knowledge-graph-based official document writing system of claim 2, wherein the sorting module is used for calculating similarity;
the sorting module comprises a TF-IDF matrix transformation unit and a similarity calculation unit, wherein the TF-IDF matrix transformation unit builds a large matrix and a query index from the basic information table and builds a search matrix from the user's outline data, and the similarity calculation unit computes the Euclidean distance between the large matrix and the search matrix and returns the texts of the top-k paragraphs according to the result.
8. The knowledge-graph-based official document writing system of claim 2, wherein the pushing module is used for pushing document segment content;
the pushing module comprises a recall unit and a sorting unit, wherein the recall unit calculates the similarity between all data in the ES non-standard database and the outline, pushes similar content and stores it in a text table, and the sorting unit, when the user inputs a search word, uses similarity calculation to select the most similar content for pushing.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210794241.2A CN115146030A (en) 2022-07-05 2022-07-05 Official document writing method and system based on knowledge graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210794241.2A CN115146030A (en) 2022-07-05 2022-07-05 Official document writing method and system based on knowledge graph

Publications (1)

Publication Number Publication Date
CN115146030A (en) 2022-10-04

Family

ID=83412801

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210794241.2A Pending CN115146030A (en) 2022-07-05 2022-07-05 Official document writing method and system based on knowledge graph

Country Status (1)

Country Link
CN (1) CN115146030A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116090473A (en) * 2023-04-06 2023-05-09 北京大学深圳研究生院 (Peking University Shenzhen Graduate School) Intelligent writing assisting method, device and system

Similar Documents

Publication Publication Date Title
CN104199965B (en) Semantic information retrieval method
CN103425687A (en) Retrieval method and system based on queries
CN107085583B (en) Electronic document management method and device based on content
CN106156111B (en) Patent document retrieval method, device and system
CN107844493B (en) File association method and system
CN113190687B (en) Knowledge graph determining method and device, computer equipment and storage medium
CN110866102A (en) Search processing method
CN112699232A (en) Text label extraction method, device, equipment and storage medium
US20040122660A1 (en) Creating taxonomies and training data in multiple languages
CN110674365A (en) Searching method, device, equipment and storage medium
CN105404677A (en) Tree structure based retrieval method
CN115146030A (en) Official document writing method and system based on knowledge graph
CN108388556B (en) Method and system for mining homogeneous entity
CN109885641A (en) A kind of method and system of database Chinese Full Text Retrieval
CN110866086A (en) Article matching system
CN111522938B (en) Method, device and equipment for screening talent performance documents
CN106372123B (en) Tag-based related content recommendation method and system
CN112015907A (en) Method and device for quickly constructing discipline knowledge graph and storage medium
CN105426490A (en) Tree structure based indexing method
CN115203445A (en) Multimedia resource searching method, device, equipment and medium
CN113449063B (en) Method and device for constructing document structure information retrieval library
US20220083736A1 (en) Information processing apparatus and non-transitory computer readable medium
CN111680122B (en) Space data active recommendation method and device, storage medium and computer equipment
CN114706938A (en) Document tag determination method and device, electronic equipment and storage medium
CN110245215B (en) Text retrieval method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination