WO2021114810A1

WO2021114810A1 - Graph structure-based official document recommendation method, apparatus, computer device, and medium

Info

Publication number: WO2021114810A1
Application number: PCT/CN2020/116744
Authority: WO
Inventors: 谢静文; 阮晓雯; 徐亮
Original assignee: 平安科技（深圳）有限公司
Priority date: 2020-05-29
Filing date: 2020-09-22
Publication date: 2021-06-17
Also published as: CN111666401B; CN111666401A

Abstract

The present application relates to the field of big data and discloses a graph structure-based official document recommendation method, apparatus, computer device, and medium. The method comprises: obtaining official documents of multiple official document types, filtering feature words according to TF-IDF, and recording the selected feature words as keyword tags of the corresponding official document; selecting, by means of a text topic-keyword distribution probability matrix of the official document, the text topics whose selection probability is greater than or equal to a preset probability, and recording the selected text topics as topic tags of the corresponding official document; generating official document attributes according to the keyword tags and the topic tags; obtaining record data of the official documents, and establishing a graph structure-based official document recommendation library from the record data and the official document attributes by means of the Neo4j framework; and receiving search content entered by a user in the official document recommendation library, and outputting target official documents in descending order of the similarity calculated by SimRank. The present application can recommend to the user the target official documents most relevant to the search content entered by the user.

Description

Method, device, computer equipment and medium for recommending official documents based on graph structure

This application claims priority to Chinese patent application No. 202010475897.9, filed with the Chinese Patent Office on May 29, 2020 and entitled "The method, device, computer equipment and medium for recommending official documents based on graph structure", the entire content of which is incorporated in this application by reference.

Technical field

This application relates to the field of data analysis in the field of big data, and in particular to a method, device, computer equipment, and medium for recommending official documents based on a graph structure.

Background technique

At present, most commonly used official document recommendation methods are based on traditional search engines. When a traditional search engine recommends official documents, it usually does so according to the similarity between official documents, so that official documents more relevant to the user can be recommended. However, the inventor realized that, in the prior art, the similarity between official documents is often determined by a single manually set standard. Such a standard may be inaccurate, so the recommendation made by a traditional search engine may not take all relevant factors into account. As a result, the official documents most relevant to the content entered by the user cannot be recommended, which degrades the user experience. Those skilled in the art therefore urgently need a technical solution to the aforementioned problems.

Summary of the invention

Based on this, it is necessary to address the above technical problems and provide a method, device, computer equipment, and medium for recommending official documents based on a graph structure, which can recommend to users the official documents that are most relevant to the content input by the user, thereby improving user experience.

An official document recommendation method based on graph structure, including:

Acquiring official documents of multiple different official document types, determining feature words in the acquired official documents according to preset word statistical characteristics, filtering, according to TF-IDF, the feature words whose frequency of occurrence is greater than or equal to a preset frequency, and recording the selected feature words as the keyword tags of the corresponding official document;

Inputting the official document into a preset LDA topic model, calculating the text topic-keyword distribution probability matrix of the official document through the LDA topic model, selecting, according to the text topic-keyword distribution probability matrix, the text topics whose selection probability is greater than or equal to a preset probability, and recording the selected text topics as the topic tags of the corresponding official document; the text topic-keyword distribution probability matrix includes a plurality of the selection probabilities, and a selection probability refers to the probability that a keyword in the official document belongs to a text topic of the official document;

Generating official document attributes according to the keyword tags and the topic tags;

Obtaining record data of the official documents according to each official document type, and establishing, through the Neo4j framework, a graph structure-based official document recommendation library according to the record data and the official document attributes; the official document recommendation library contains multiple graph structures, one graph structure corresponds to the official documents of at least one official document type, and one graph structure contains multiple interconnected nodes; each node represents one of the record data, a keyword tag, and a topic tag;

Receiving search content entered by the user in the official document recommendation library, and outputting target official documents in descending order of the similarity calculated by SimRank; the similarity refers to the similarity between the search content and a node.

An official document recommendation device based on graph structure, including:

The first recording module is configured to acquire official documents of multiple different official document types, determine feature words in the acquired official documents according to preset word statistical characteristics, filter, according to TF-IDF, the feature words whose frequency of occurrence is greater than or equal to a preset frequency, and record the selected feature words as the keyword tags of the corresponding official document;

The second recording module is configured to input the official document into a preset LDA topic model, calculate the text topic-keyword distribution probability matrix of the official document through the LDA topic model, select, according to the text topic-keyword distribution probability matrix, the text topics whose selection probability is greater than or equal to a preset probability, and record the selected text topics as the topic tags of the corresponding official document; the text topic-keyword distribution probability matrix includes a plurality of the selection probabilities, and a selection probability refers to the probability that a keyword in the official document belongs to a text topic of the official document;

The first generation module is configured to generate official document attributes according to the keyword tags and the topic tags;

The establishment module is configured to obtain record data of the official documents according to each official document type, and establish, through the Neo4j framework, a graph structure-based official document recommendation library according to the record data and the official document attributes; the official document recommendation library contains multiple graph structures, one graph structure corresponds to the official documents of at least one official document type, and one graph structure contains multiple interconnected nodes; each node represents one of the record data, a keyword tag, and a topic tag;

The calculation module is configured to receive search content entered by the user in the official document recommendation library, and output target official documents in descending order of the similarity calculated by SimRank; the similarity refers to the similarity between the search content and a node.

A computer device includes a memory, a processor, and computer-readable instructions that are stored in the memory and can run on the processor, and the processor implements the following steps when the processor executes the computer-readable instructions:

One or more readable storage media storing computer readable instructions, when the computer readable instructions are executed by one or more processors, the one or more processors execute the following steps:

In the above graph structure-based official document recommendation method, device, computer equipment, and medium, the keyword tags given by TF-IDF are relatively objective. Because they are obtained by a statistical method, the keyword tags take the whole text into account and have a low error rate, and because their number is controllable, the keyword tags can be kept relatively rich. The topic tags given by the LDA topic model are likewise relatively objective: the text topic corresponding to each keyword is obtained by model calculation, so the topic tags also take the whole text into account and have a low error rate. SimRank calculates the similarity between the search content entered by the user and the nodes. Because SimRank combines features from the text of multiple official documents, more relevant target official documents can be recommended, which improves the accuracy and efficiency of recommendation. The similarity between objects measured by SimRank is also closer to human intuitive judgment, and ordering the output target official documents by similarity improves the user experience.

The details of one or more embodiments of the present application are presented in the following drawings and description, and other features and advantages of the present application will become apparent from the description, drawings and claims.

Description of the drawings

In order to explain the technical solutions of the embodiments of the present application more clearly, the following will briefly introduce the drawings that need to be used in the description of the embodiments of the present application. Obviously, the drawings in the following description are only some embodiments of the present application. For those of ordinary skill in the art, other drawings can be obtained based on these drawings without creative labor.

FIG. 1 is a schematic diagram of an application environment of an official document recommendation method based on a graph structure in an embodiment of the present application;

FIG. 2 is a flowchart of a method for recommending official documents based on a graph structure in an embodiment of the present application;

FIG. 3 is a schematic diagram of the structure of an official document recommendation device based on a graph structure in an embodiment of the present application;

FIG. 4 is a schematic diagram of a computer device in an embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application will be described clearly and completely in conjunction with the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are part of the embodiments of the present application, rather than all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of this application.

The method for recommending official documents based on the graph structure provided in this application can be applied to the application environment shown in FIG. 1, in which the client communicates with the server through the network. The client can be, but is not limited to, a personal computer, notebook computer, smart phone, tablet computer, or portable wearable device. The server can be implemented as an independent server or as a server cluster composed of multiple servers.

In an embodiment, as shown in FIG. 2, a method for recommending an official document based on a graph structure is provided. Taking the method applied to the server in FIG. 1 as an example, the method includes the following steps:

S10. Acquire official documents of multiple different official document types, determine feature words in the acquired official documents according to preset word statistical characteristics, filter, according to TF-IDF, the feature words whose frequency of occurrence is greater than or equal to a preset frequency, and record the selected feature words as the keyword tags of the corresponding official document;

Understandably, there are currently at least 15 official document types, including but not limited to commands, decisions, announcements, and notifications. TF-IDF (Term Frequency-Inverse Document Frequency) is a weighting technique commonly used in information retrieval and data mining, and it can serve as a means of keyword extraction. TF refers to the frequency with which a feature word occurs in an official document, and IDF refers to the inverse document frequency, which falls as the feature word occurs in more of the other official documents. Feature words are generally the words that can represent the content of the official document: personal pronouns, modal auxiliary words, and conjunctions are generally not included, whereas specific executive verbs in the official document can be. Specifically, the word statistical characteristics can be set as needed to determine the feature words obtained from the official document; in this embodiment they cover the characteristics of several kinds of feature words, such as the verb characteristics corresponding to executive verbs. The more frequently a feature word occurs, the more representative and important it is in the official document. Optionally, the preset frequency can be set according to the application field, because some feature words are biased toward a particular application field. In the official document field of this embodiment, the preset frequency can be set so that about 10 feature words are selected; the specific number can be determined according to requirements. In this embodiment, the keyword tags given by TF-IDF are relatively objective. Because they are obtained by a statistical method, the keyword tags take the whole text into account and have a low error rate, and because their number is controllable, the keyword tags can be kept relatively rich.
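Step S10 can be illustrated with a minimal sketch, given here only as one possible reading of the step. It assumes the official documents have already been segmented into tokens; the function name `tfidf_keyword_tags`, the smoothed IDF formula, the `min_score` threshold, and the `top_k` cap of 10 are illustrative assumptions and are not specified by the application.

```python
import math
from collections import Counter


def tfidf_keyword_tags(docs, stop_words=frozenset(), min_score=0.0, top_k=10):
    """Return up to top_k keyword tags per document, ranked by TF-IDF.

    docs: dict mapping a document id to its list of tokens (already segmented).
    stop_words: words that never count as feature words (pronouns, conjunctions, ...).
    min_score: hypothetical stand-in for the "preset frequency" threshold.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each word occurs.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens) - stop_words)

    tags = {}
    for doc_id, tokens in docs.items():
        kept = [t for t in tokens if t not in stop_words]
        tf = Counter(kept)
        scores = {
            # Smoothed IDF (an assumption) keeps the weight positive
            # even for a word that occurs in every document.
            word: (count / len(kept)) * (math.log((1 + n_docs) / (1 + df[word])) + 1)
            for word, count in tf.items()
        }
        # Highest score first; ties are broken alphabetically for determinism.
        ranked = sorted(
            (w for w in scores if scores[w] >= min_score),
            key=lambda w: (-scores[w], w),
        )
        tags[doc_id] = ranked[:top_k]
    return tags
```

Calling `tfidf_keyword_tags({"A": ["transfer", "personnel", "transfer"]})` would rank `transfer` ahead of `personnel`, since it occurs twice in the same document. Capping the list with `top_k` is one simple way to keep the number of keyword tags controllable, as the embodiment describes.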

S20. Input the official document into a preset LDA topic model, calculate the text topic-keyword distribution probability matrix of the official document through the LDA topic model, select, according to the text topic-keyword distribution probability matrix, the text topics whose selection probability is greater than or equal to a preset probability, and record the selected text topics as the topic tags of the corresponding official document; the text topic-keyword distribution probability matrix includes a plurality of the selection probabilities, and a selection probability refers to the probability that a keyword in the official document belongs to a text topic of the official document;

Understandably, the LDA topic model is a document topic generation model and a three-layer Bayesian probability model. The model can extract the text topic-keyword distribution probability matrix from the official document. (The topic-keyword distribution matrix is calculated from the between-class scatter matrix S_B and the within-class scatter matrix S_W, and the topic-keyword distribution matrix is used as the feature matrix. When the official document is input into the preset LDA topic model, the word embedding feature of the official document is multiplied by the matrix W in the model to obtain the text topic-keyword distribution probability matrix of the official document. The word embedding feature is the text feature obtained by applying word embedding to the official document, and word embedding is a method of converting the keywords in the official document into numeric vectors.) Specifically, the LDA topic model calculates the distribution probability that each keyword belongs to each of the topics, and this distribution probability is used as the selection probability; a selection probability thus represents the probability that a keyword is associated with a text topic of the official document. The LDA topic model then compares each selection probability with the preset probability and outputs the text topics whose selection probability is greater than or equal to the preset probability. In the official document field of this embodiment, the preset probability can be set so that about 3 text topics are selected; the specific number is determined according to requirements. In this embodiment, the topic tags given by the LDA topic model are relatively objective, and the text topic corresponding to each keyword is obtained by model calculation, so the resulting topic tags take the whole text into account and have a low error rate.
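The threshold comparison in step S20 can be sketched as follows. The sketch does not train an LDA model; it assumes the text topic-keyword distribution probability matrix is already available as a nested dictionary. The function name `select_topic_tags`, the rule that a topic is scored by the highest selection probability among the document's keywords, and the default values are all assumptions made for illustration.

```python
def select_topic_tags(topic_keyword_prob, keywords, preset_probability=0.3, max_topics=3):
    """Pick topic tags whose selection probability reaches the preset probability.

    topic_keyword_prob: {topic: {keyword: probability}}, assumed to come from an
        already-trained LDA topic model (training is not shown here).
    keywords: keyword tags of the official document.
    """
    selected = {}
    for topic, row in topic_keyword_prob.items():
        # Assumption: a topic's score is the highest probability that any of
        # the document's keywords belongs to that topic.
        score = max((row.get(k, 0.0) for k in keywords), default=0.0)
        if score >= preset_probability:
            selected[topic] = score
    # Most probable topics first; ties are broken alphabetically.
    ranked = sorted(selected, key=lambda t: (-selected[t], t))
    return ranked[:max_topics]
```

Raising `preset_probability` shrinks the set of topic tags, which is how the embodiment keeps the number of selected text topics at about 3.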

S30. Generate official document attributes according to the keyword tags and the topic tags;

Understandably, the official document attributes represent the key attributes of the official document, and include keyword tags, topic tags, digital entities, the time of the official document communication, the communication unit, and so on.

S40. Obtain record data of the official documents according to each official document type, and establish, through the Neo4j framework, a graph structure-based official document recommendation library according to the record data and the official document attributes; the official document recommendation library contains multiple graph structures, one graph structure corresponds to the official documents of at least one official document type, and one graph structure contains multiple interconnected nodes; each node represents one of the record data, a keyword tag, and a topic tag;

Understandably, the record data is the data formed for the database for each official document. One piece of record data can correspond to one official document type, and the record data may include the overall content data of the official document. The Neo4j framework is a high-performance NoSQL graph database that stores structured data in a network (graph) rather than in tables, and it can also be regarded as a high-performance graph engine. This embodiment can therefore use the Neo4j framework to establish a graph structure-based official document recommendation library from the record data and the official document attributes. The library includes multiple graph structures, each graph structure can contain multiple nodes, and each graph structure can cover the official documents of at least one official document type. For example, with official document A as a node, that node is associated with the node corresponding to the keyword tag "personnel transfer to a certain department" and with the node corresponding to the topic tag "personnel change", which forms the graph structure personnel transfer to a certain department - official document A - personnel change. In this embodiment, an official document recommendation library containing multiple graph structures works more efficiently than a traditional database or a traditional search engine, and its recommendation efficiency is not affected when a large number of official documents are stored in it.

S50. Receive the search content entered by the user in the official document recommendation library, and output the target official documents in descending order of the similarity calculated by SimRank; the similarity refers to the similarity between the search content and a node.

Understandably, SimRank is a model that measures the degree of similarity between any two objects based on the topological structure information of a graph; it can also be understood as a calculation method. The model or method can be embedded in the official document recommendation library. The specific calculation process is as follows: the search content entered by the user is obtained from the input interface of the official document recommendation library. The search content can be multiple target keywords, an official document name, and so on, and these are used as search nodes. SimRank then calculates the similarity between the search node and each node in the graph structure. For example, if official document A and the search node have 6 nodes in total, of which 2 are shared, and 4 of the nodes are similar, the similarity is 4/6 ≈ 0.67. In this embodiment, the similarity between the search content and the nodes is calculated by SimRank. Because SimRank combines features from the text of multiple official documents (the record data and official document attributes described above), more relevant target official documents can be recommended, which improves the accuracy and efficiency of recommendation. The similarity between objects measured by SimRank is also closer to human intuitive judgment, and ordering the output target official documents by similarity improves the user experience.
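For orientation, the commonly cited iterative formulation of SimRank can be sketched on a small undirected graph. This is not necessarily the exact variant used in this embodiment, and it differs from the simple node-ratio example above. The decay factor of 0.8, the iteration count, and the helper `rank_documents` are illustrative assumptions.

```python
from itertools import product


def simrank(neighbors, decay=0.8, iterations=10):
    """Iteratively compute SimRank scores for every pair of nodes.

    neighbors: {node: set of adjacent nodes} for an undirected graph; every
        node that appears in a neighbor set must also be a key.
    """
    nodes = list(neighbors)
    # Start from the identity: a node is fully similar only to itself.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new_sim[(a, b)] = 1.0
            elif not neighbors[a] or not neighbors[b]:
                new_sim[(a, b)] = 0.0
            else:
                # Average similarity of the two nodes' neighbors, damped by decay.
                total = sum(sim[(i, j)] for i in neighbors[a] for j in neighbors[b])
                new_sim[(a, b)] = decay * total / (len(neighbors[a]) * len(neighbors[b]))
        sim = new_sim
    return sim


def rank_documents(neighbors, query, documents, **kwargs):
    """Order document nodes by descending SimRank similarity to the query node."""
    sim = simrank(neighbors, **kwargs)
    return sorted(documents, key=lambda d: (-sim[(query, d)], d))
```

In this sketch the search content is attached to the graph as a temporary node linked to the tags it mentions, so that documents sharing those tags receive a non-zero similarity and are output first.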

Further, before acquiring the official documents of multiple different official document types, the method further includes:

Analyzing the overall chapter structure of the official document through a successfully trained BERT model to obtain an analysis result of the overall chapter structure; the overall chapter structure refers to the constituent structures of the official document, and the analysis result is the result of judging the completeness and rationality of each constituent structure of the official document;

When the analysis result is that one of the constituent structures of the official document lacks completeness and/or rationality, extracting the constituent structure that is missing from the official document and/or the unreasonable constituent structure, marking the missing and/or unreasonable constituent structure in the official document in highlighted form, and prompting a preset data recipient to modify the official document.

Understandably, the BERT model is a language representation model that can be used to analyze the overall chapter structure and length of the official document. The BERT model is trained as follows. First, each sentence of the training text is labeled with the constituent structure it belongs to. For example, the sentences of the corresponding paragraphs in the overall chapter structure of the training text are labeled 1-B, 2-B, 3-B, 1-I, 2-I, and 3-I, where 1 represents the beginning, 2 the discussion, and 3 the end; the beginning, the discussion, and the end are the constituent structures of the official document. The BERT model is then built. Before the BERT model is trained, the existing word vectors in the BERT model can be given enhancement training on the successfully labeled sentences of the official documents, so that the distribution of the word vector representations better suits official documents (if there are too few labeled sentences, this enhancement training can be skipped). During training, the BERT model is continually fine-tuned on the basis of bert-base so that the word vector distribution becomes more reasonable. (The pre-trained word vectors currently provided with the BERT model are trained on the whole Chinese corpus, so their distribution differs from the word vector distribution in the official document field, and the BERT model therefore needs to be fine-tuned to suit that field.) Finally, after all the word vectors are trained, so that the output of the BERT model can capture the essence of the language, the output at the [CLS] position of the BERT model (which contains a high-level feature vector carrying the semantic information of the entire sentence) is used as the classification result for the constituent structure of the official document, where one category represents one constituent structure.

This embodiment also corrects the classification result output by the BERT model in order to resolve jumping constituent structures. An example of a jumping sequence is 1-B, 1-I, 3-B, 2-B, 2-I, 3-B, which stands for beginning-beginning-end-discussion-discussion-end; the correction mainly consists of adjusting the position of each constituent structure. The classification result is output as the probabilities corresponding to the different constituent structure categories of the official document. After each probability in the classification result is compared with its preset threshold (mainly to detect constituent structures missing from the official document), it can be determined whether the sentences corresponding to the constituent structure of that category are complete and/or reasonable. In this embodiment, the BERT model can identify the constituent structures of the official document on its own. The BERT model is convenient to use, is not affected by the length of the official document, and can disassemble the official document structurally. It also generalizes well and can handle different types of official documents. The analysis result output by the BERT model is obtained by analyzing the constituent structures of the entire official document in multiple dimensions, and it can also be used to further analyze the number of idioms used in each constituent structure and the distribution of length.

Further, the official document attribute further includes a digital entity; before the establishment of the official document recommendation database based on the graph structure according to the record data of the official document and the official document attribute through the Neo4j framework, the method further includes:

Performing a digital entity search on the official document through the target entity expression in a preset rule template to locate the target location of the digital entity, and then capturing the digital entity at the target location through the capture regular expression in the preset rule template.

Understandably, the target entity expression in the preset rule template is used to locate the target location of the digital entity. The target entity expression generally has an association with the digital entity, for example "the estimated total investment amount of the project is". The capture regular expression in the preset rule template is used to capture the digital entity itself, for example 10,000 yuan. In this embodiment, extracting digital entities with a preset rule template improves the efficiency and quality of extraction. The same method is also applicable to capturing the time of the official document communication and the communication unit.
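The two-stage extraction described above, locating the target location first and then capturing the value, can be sketched with Python's standard `re` module. The template layout, the key names `target_entity_expression` and `capture_pattern`, and the pattern itself are hypothetical; only the example phrase and the value "10,000 yuan" come from the description.

```python
import re

# Hypothetical rule template: a locating phrase plus a capture pattern.
RULE_TEMPLATE = {
    "total_investment": {
        "target_entity_expression": "the estimated total investment amount of the project is",
        # Captures a number with optional thousands separators and decimals,
        # followed by its unit.
        "capture_pattern": r"\s*([\d,]+(?:\.\d+)?)\s*(yuan)",
    }
}


def extract_digital_entities(text, template=RULE_TEMPLATE):
    """Locate each target entity expression, then capture the value that follows it."""
    results = {}
    for name, rule in template.items():
        position = text.find(rule["target_entity_expression"])
        if position == -1:
            continue  # the locating phrase does not occur in this document
        start = position + len(rule["target_entity_expression"])
        # re.match anchors at the target location, right after the phrase.
        match = re.match(rule["capture_pattern"], text[start:])
        if match:
            results[name] = (match.group(1), match.group(2))
    return results
```

Anchoring the capture pattern at the located position avoids picking up unrelated numbers elsewhere in the document. Additional entries in the template could cover the time of the official document communication and the communication unit.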

Further, the official document attributes further include the time of the official document communication and the communication unit; before the establishment of the graph structure-based official document recommendation library through the Neo4j framework according to the record data of the official document and the official document attributes, the method further includes:

Obtaining the official document content of the official document, and identifying, from the official document content through an NLP model, the time of the official document communication corresponding to the time component and the communication unit corresponding to the unit component;

The generating of official document attributes according to the keyword tags and the topic tags includes:

The official document attribute is generated according to the time of the official document communication, the communication unit, the keyword tag, and the topic tag.

Understandably, the NLP model is a natural language processing algorithm engine. In this embodiment, the NLP model first identifies the required components and then identifies the content corresponding to each component. One component corresponds to one piece of content; for example, the time component mentioned above corresponds to the time of the official document communication, and the unit component corresponds to the communication unit.

Further, the establishment of an official document recommendation database based on a graph structure through the Neo4j framework according to the record data of the official document and the attribute of the official document includes:

Constructing each node corresponding to the official document according to node attributes through the create-node statement in the Neo4j framework; the node attributes correspond to the record data and the official document attributes respectively;

Constructing a connection relationship between the nodes according to preset relationships through the create-relationship statement in the Neo4j framework; the preset relationships correspond to the record data and the official document attributes respectively;

Determining the paths of all the nodes according to the connection relationships through the path statement in the Neo4j framework, and thereby establishing the graph structure-based official document recommendation library.

Understandably, the create-node statement is used to build a node, such as node A; the create-relationship statement is used to build a connection relationship between nodes, such as node A - node B; and the path statement is used to determine all paths or the shortest path between two nodes, such as node A - node B and node A - node C. This embodiment establishes the graph structure-based official document recommendation library mainly by executing these statements in the Neo4j framework.
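The three kinds of statements can be illustrated by a small sketch that builds parameterized Cypher text in Python without connecting to a database. The node labels (`Document`, `Keyword`, `Topic`), the relationship types (`HAS_KEYWORD`, `HAS_TOPIC`), the `name` property, and the helper function names are hypothetical choices and are not taken from the application.

```python
def create_node(label, name):
    """Cypher create-node statement with its parameters."""
    return f"CREATE (n:{label} {{name: $name}})", {"name": name}


def create_relationship(doc_name, label, tag_name, rel_type):
    """Cypher create-relationship statement linking a document node to a tag node."""
    query = (
        f"MATCH (d:Document {{name: $doc}}), (t:{label} {{name: $tag}}) "
        f"CREATE (d)-[:{rel_type}]->(t)"
    )
    return query, {"doc": doc_name, "tag": tag_name}


def shortest_path(start_name, end_name):
    """Cypher path statement returning the shortest path between two named nodes."""
    query = (
        "MATCH p = shortestPath((a {name: $start})-[*]-(b {name: $end})) "
        "RETURN p"
    )
    return query, {"start": start_name, "end": end_name}


def build_statements(doc_name, keyword_tags, topic_tags):
    """Collect the statements that would build one document's graph structure."""
    statements = [create_node("Document", doc_name)]
    for label, rel_type, tag_list in (
        ("Keyword", "HAS_KEYWORD", keyword_tags),
        ("Topic", "HAS_TOPIC", topic_tags),
    ):
        for tag in tag_list:
            statements.append(create_node(label, tag))
            statements.append(create_relationship(doc_name, label, tag, rel_type))
    return statements
```

Each returned pair of query text and parameters could then be executed against a Neo4j instance, for example through a database driver session. Passing values as parameters rather than formatting them into the query text keeps tag names containing quotes from breaking the statement.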

Further, after outputting the target official documents in order of the similarity, from high to low, calculated by SimRank, the method further includes:

Compressing the target official documents, output in order of similarity, into link points; when the user selects at least one of the link points, the complete official document content of the target official document corresponding to the selected link point is presented in a preset view; one link point corresponds to one target official document.

Understandably, this embodiment compresses each target official document into a link point, where one link point can store the entire official document content of one target official document. This saves the display resources needed for the target official documents and avoids degrading the user experience by showing the user target official documents with too much content at once.
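A minimal sketch of the link-point idea follows, assuming that each ranked result is a `(doc_id, title, content)` tuple. The `LinkPoint` class and the `compress` and `expand` helpers are hypothetical names introduced for illustration; the application does not specify the data layout or the view used for presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkPoint:
    """Compact stand-in for one target official document in the result list."""
    doc_id: str
    title: str


def compress(ranked_docs):
    """Turn ranked (doc_id, title, content) tuples into link points.

    The input is assumed to be sorted by similarity already; the full content
    is kept in a separate store keyed by doc_id and is not displayed up front.
    """
    links = [LinkPoint(doc_id, title) for doc_id, title, _ in ranked_docs]
    store = {doc_id: content for doc_id, _, content in ranked_docs}
    return links, store


def expand(selected_links, store):
    """Return the full content for each link point the user selected."""
    return [store[link.doc_id] for link in selected_links]


ranked = [
    ("d1", "Budget notice", "Full text of the budget notice ..."),
    ("d2", "Meeting notice", "Full text of the meeting notice ..."),
]
links, store = compress(ranked)
print([link.title for link in links])
print(expand([links[0]], store))
```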

In summary, the above provides a method for recommending official documents based on a graph structure. The keyword tags given by TF-IDF are relatively objective: they are obtained by statistical methods, which ensures that the keyword tags are comprehensively considered and have a low error rate, and the number of keyword tags is controllable, which ensures that the keyword tags are relatively rich. The topic tags given by the LDA topic model are likewise relatively objective: the text topic corresponding to each keyword is obtained by model calculation, which ensures that the resulting topic tags are comprehensively considered and have a low error rate. SimRank calculates the similarity between the search content entered by the user and the nodes; because SimRank combines features from the text of multiple official documents, more relevant target official documents can be recommended, improving the accuracy and efficiency of the recommendation. In addition, the similarity between objects measured by SimRank is closer to human intuitive judgment, and the similarity determines the order in which the target official documents are output, which can improve the user experience.
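For readers unfamiliar with SimRank, the following is a minimal sketch of the standard iterative formulation on a small directed graph, in which two nodes are similar when they are referenced by similar nodes. The decay factor of 0.8, the iteration count, the toy graph, and the function names are assumptions chosen for illustration; how the application maps search content onto graph nodes is not specified in this section and is not modeled here.

```python
from itertools import product


def simrank(in_neighbors, decay=0.8, iterations=10):
    """Compute SimRank scores for every pair of nodes.

    in_neighbors maps each node to the list of nodes pointing at it; every
    node that appears in a list must also appear as a key.
    """
    nodes = list(in_neighbors)
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new_sim[(a, b)] = 1.0  # a node is fully similar to itself
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                new_sim[(a, b)] = 0.0  # no in-neighbors, no evidence
                continue
            total = sum(sim[(i, j)] for i, j in product(ia, ib))
            new_sim[(a, b)] = decay * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


def rank_by_similarity(sim, query, candidates):
    """Order candidate nodes by similarity to the query node, highest first."""
    return sorted(candidates, key=lambda node: sim[(query, node)], reverse=True)


# Toy graph: doc1 and doc2 share keyword k1; doc3 only has keyword k2.
graph = {"k1": [], "k2": [], "doc1": ["k1"], "doc2": ["k1"], "doc3": ["k2"]}
scores = simrank(graph)
print(rank_by_similarity(scores, "doc1", ["doc3", "doc2"]))
```

In this toy graph, doc1 and doc2 score higher with each other than with doc3 because they share an in-neighbor, which is the intuition behind ordering results by SimRank similarity.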

It should be understood that the sequence numbers of the steps in the foregoing embodiment do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.

In one embodiment, a device for recommending official documents based on a graph structure is provided, which corresponds to the method for recommending official documents based on a graph structure in the foregoing embodiment. As shown in FIG. 3, the device includes a first recording module 11, a second recording module 12, a first generating module 13, an establishment module 14, and a calculation module 15. The detailed description of each functional module is as follows:

The first recording module 11 is configured to obtain multiple official documents of different official document types, determine characteristic words in the obtained official documents according to TF-IDF based on preset word statistical characteristics, filter out, according to TF-IDF, the characteristic words whose occurrence frequency is greater than or equal to a preset frequency, and record the selected characteristic words as the keyword tags of the corresponding official document;

The second recording module 12 is configured to input the official document into a preset LDA topic model, calculate the text topic-keyword distribution probability matrix of the official document through the LDA topic model, select, according to the text topic-keyword distribution probability matrix obtained from the LDA topic model, the text topics whose selection probability is greater than or equal to a preset probability, and record the selected text topics as the topic tags of the corresponding official document; the text topic-keyword distribution probability matrix contains a plurality of the selection probabilities, and the selection probability refers to the probability that a keyword in the official document belongs to a text topic of the official document;

The first generating module 13 is configured to generate official document attributes according to the keyword tags and the topic tags;

The establishment module 14 is configured to obtain the record data of the official document according to each official document type, and establish an official document recommendation library based on a graph structure through the Neo4j framework according to the record data of the official document and the official document attributes; the official document recommendation library contains multiple graph structures, one graph structure corresponds to the official documents of at least one official document type, and one graph structure contains multiple nodes connected to each other; one node represents one of the record data, the keyword tag, and the topic tag;

The calculation module 15 is configured to receive the search content input by the user from the official document recommendation library, and output the target official documents in order of the similarity calculated by SimRank; the similarity refers to the similarity between the search content and the node.
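The keyword-tag step performed by the first recording module can be sketched as follows. This is a plain-Python illustration that uses the common tf × log(N/df) weighting and treats the "preset frequency" as a threshold on the TF-IDF score; both choices, together with the function name and the toy documents, are assumptions rather than details taken from the application. Documents are assumed to be non-empty lists of already segmented words.

```python
import math
from collections import Counter


def tfidf_keyword_tags(docs, threshold):
    """Return, for each document, the words whose TF-IDF score meets the threshold.

    docs is a list of non-empty word lists (one list per official document).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each word occurs.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    tags = []
    for doc in docs:
        counts = Counter(doc)
        scores = {
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
        tags.append(sorted(w for w, score in scores.items() if score >= threshold))
    return tags


documents = [
    ["budget", "notice", "budget"],
    ["meeting", "notice"],
]
print(tfidf_keyword_tags(documents, threshold=0.3))
```

Words that occur in every document, such as "notice" here, receive a score of zero and are therefore never selected as keyword tags.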

Further, the apparatus for recommending official documents based on the graph structure further includes:

The analysis module is used to analyze the overall chapter structure of the official document through a successfully trained BERT model to obtain an analysis result of the overall chapter structure of the official document; the overall chapter structure refers to the constituent structures of the official document, and the analysis result is the result of judging the completeness and rationality of each of the constituent structures of the official document;

The marking module is used to, when the analysis result is that one of the constituent structures of the official document lacks the completeness and/or the rationality, extract the missing constituent structure and/or the unreasonable constituent structure from the official document, mark the missing constituent structure and/or the unreasonable constituent structure in the official document by highlighting, and request a preset data recipient to modify the official document accordingly.

The grabbing module is used to search the official document for digital entities through the target entity expression in a preset rule template, locate the target position of each digital entity, and capture the digital entity from the target position through the capture regular expression in the preset rule template.

The identification module is used to obtain the official document content of the official document, and identify the official document communication time corresponding to the time component and the communication unit corresponding to the unit component from the official document content through the NLP model;

The second generating module is configured to generate the attributes of the official document according to the time of the official document communication, the unit of the communication, the keyword tag, and the topic tag.
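The two-stage behavior of the grabbing module described above, first locating a digital entity and then capturing it with a regular expression, can be sketched as follows. The contents of `RULE_TEMPLATE`, including both patterns and the units they recognize, are hypothetical; the application does not disclose the actual rule template.

```python
import re

# Hypothetical rule template: one expression to locate candidate entities,
# one capture expression to extract the value and unit at that location.
RULE_TEMPLATE = {
    "target_entity": re.compile(r"\d[\d,]*(?:\.\d+)?\s*(?:%|yuan|days)"),
    "capture": re.compile(r"(?P<value>\d[\d,]*(?:\.\d+)?)\s*(?P<unit>%|yuan|days)"),
}


def grab_digital_entities(text, template=RULE_TEMPLATE):
    """Locate digital entities in text and capture them at their positions."""
    entities = []
    for hit in template["target_entity"].finditer(text):
        # Apply the capture expression starting exactly at the located position.
        captured = template["capture"].match(text, hit.start())
        if captured:
            entities.append({
                "position": hit.start(),
                "value": captured.group("value"),
                "unit": captured.group("unit"),
            })
    return entities


sample = "The budget is 1,200 yuan, due in 30 days."
for entity in grab_digital_entities(sample):
    print(entity)
```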

Further, the establishment module includes:

The first construction sub-module is used to construct each node corresponding to the official document according to node attributes through the create-node statement in the Neo4j framework; the node attributes correspond to the record data and the official document attributes respectively;

The second construction sub-module is used to construct a connection relationship between the nodes according to a preset relationship through the create-relationship statement in the Neo4j framework; the preset relationship corresponds to the record data and the official document attributes respectively;

The establishment sub-module is used to determine the paths of all the nodes according to the connection relationships through the path statement in the Neo4j framework, thereby completing the official document recommendation library based on the graph structure.

The selection module is configured to compress the target official documents, output in order of similarity, into link points, and, when the user selects at least one of the link points, present in a preset view the complete official document content of the target official document corresponding to the selected link point; one link point corresponds to one target official document.

For the specific definition of the device for recommending official documents based on a graph structure, refer to the above definition of the method for recommending official documents based on a graph structure, which is not repeated here. The modules in the above device can be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in or independent of the processor of the computer device in the form of hardware, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the modules.

In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 4. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a readable storage medium and an internal memory. The readable storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions stored in the readable storage medium. The database of the computer device stores the data involved in the method for recommending official documents based on a graph structure. The network interface of the computer device communicates with an external terminal through a network connection. When the computer-readable instructions are executed by the processor, the method for recommending official documents based on a graph structure is implemented. The readable storage medium provided in this embodiment includes a non-volatile readable storage medium and a volatile readable storage medium.

In one embodiment, one or more readable storage media storing computer-readable instructions are provided. The readable storage media provided in this embodiment include non-volatile readable storage media and volatile readable storage media. The readable storage media store computer-readable instructions which, when executed by one or more processors, cause the one or more processors to implement the steps of the method for recommending official documents based on a graph structure in the above embodiment, for example, steps S10 to S50 shown in FIG. 2. Alternatively, when the processor executes the computer-readable instructions, the functions of the modules/units of the device for recommending official documents based on a graph structure in the foregoing embodiment are implemented, for example, the functions of modules 11 to 15 shown in FIG. 3. To avoid repetition, they are not described again here.

In one embodiment, a computer-readable storage medium is provided, on which computer-readable instructions are stored. When the computer-readable instructions are executed by a processor, the steps of the method for recommending official documents based on a graph structure in the foregoing embodiment are implemented, for example, steps S10 to S50 shown in FIG. 2. Alternatively, when the computer-readable instructions are executed by the processor, the functions of the modules/units of the device for recommending official documents based on a graph structure in the foregoing embodiment are implemented, for example, the functions of modules 11 to 15 shown in FIG. 3. To avoid repetition, they are not described again here.

A person of ordinary skill in the art can understand that all or part of the processes in the methods of the foregoing embodiments can be implemented by computer-readable instructions instructing the relevant hardware. The computer-readable instructions can be stored in a non-volatile readable storage medium or a volatile readable storage medium, and, when executed, may include the processes of the foregoing method embodiments. Any reference to memory, storage, database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Those skilled in the art can clearly understand that, for convenience and conciseness of description, only the above division of functional units and modules is used as an example. In practical applications, the above functions can be allocated to different functional units and modules as needed; that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above.

The above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions recorded in the foregoing embodiments or equivalently replace some of the technical features; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and should be included within the scope of protection of this application.

Claims

An official document recommendation method based on a graph structure, comprising:

Acquiring multiple official documents of different official document types, determining characteristic words in the acquired official documents according to TF-IDF based on preset word statistical characteristics, filtering out, according to TF-IDF, the characteristic words whose occurrence frequency is greater than or equal to a preset frequency, and recording the selected characteristic words as the keyword tags of the corresponding official document;

Inputting the official document into a preset LDA topic model, calculating the text topic-keyword distribution probability matrix of the official document through the LDA topic model, selecting, according to the text topic-keyword distribution probability matrix obtained from the LDA topic model, the text topics whose selection probability is greater than or equal to a preset probability, and recording the selected text topics as the topic tags of the corresponding official document; the text topic-keyword distribution probability matrix includes a plurality of the selection probabilities, and the selection probability refers to the probability that a keyword in the official document belongs to a text topic of the official document;

Generating official document attributes according to the keyword tags and the topic tags;

Obtaining the record data of the official document according to each official document type, and establishing an official document recommendation library based on a graph structure through the Neo4j framework according to the record data of the official document and the official document attributes; the official document recommendation library contains multiple graph structures, one graph structure corresponds to the official documents of at least one official document type, and one graph structure contains multiple nodes connected to each other; one node represents one of the record data, the keyword tag, and the topic tag;

Receiving the search content input by the user from the official document recommendation library, and outputting the target official documents in order of the similarity calculated by SimRank; the similarity refers to the similarity between the search content and the node.
The method for recommending official documents based on a graph structure according to claim 1, wherein before the acquiring of multiple official documents of different official document types, the method further comprises:

Analyzing the overall chapter structure of the official document through a successfully trained BERT model to obtain an analysis result of the overall chapter structure of the official document; the overall chapter structure refers to the constituent structures of the official document, and the analysis result is the result of judging the completeness and rationality of each of the constituent structures of the official document;

When the analysis result is that one of the constituent structures of the official document lacks the completeness and/or the rationality, extracting the missing constituent structure and/or the unreasonable constituent structure from the official document, marking the missing constituent structure and/or the unreasonable constituent structure in the official document by highlighting, and requesting a preset data recipient to modify the official document.
The method for recommending official documents based on a graph structure according to claim 1, wherein the official document attributes further include digital entities; before the establishing of the official document recommendation library based on a graph structure through the Neo4j framework according to the record data of the official document and the official document attributes, the method further comprises:

Searching the official document for digital entities through the target entity expression in a preset rule template, locating the target position of each digital entity, and capturing the digital entity from the target position through the capture regular expression in the preset rule template.
The method for recommending official documents based on a graph structure according to claim 1, wherein the official document attributes further include the official document communication time and the communication unit; before the establishing of the official document recommendation library based on a graph structure through the Neo4j framework according to the record data of the official document and the official document attributes, the method further comprises:

Obtaining the official document content of the official document, and identifying, from the official document content through the NLP model, the official document communication time corresponding to the time component and the communication unit corresponding to the unit component;

The generating of official document attributes according to the keyword tags and the topic tags includes:

Generating the official document attributes according to the official document communication time, the communication unit, the keyword tags, and the topic tags.
The method for recommending official documents based on a graph structure according to claim 1, wherein the establishing of the official document recommendation library based on a graph structure through the Neo4j framework according to the record data of the official document and the official document attributes comprises:

Constructing each node corresponding to the official document according to node attributes through the create-node statement in the Neo4j framework; the node attributes correspond to the record data and the official document attributes respectively;

Constructing a connection relationship between the nodes according to a preset relationship through the create-relationship statement in the Neo4j framework; the preset relationship corresponds to the record data and the official document attributes respectively;

Determining the paths of all the nodes according to the connection relationships through the path statement in the Neo4j framework, thereby completing the official document recommendation library based on the graph structure.
The method for recommending official documents based on a graph structure according to claim 1, wherein after the outputting of the target official documents in order of the similarity calculated by SimRank, the method further comprises:

Compressing the target official documents, output in order of similarity, into link points, and, when the user selects at least one of the link points, presenting in a preset view the complete official document content of the target official document corresponding to the selected link point; one link point corresponds to one target official document.
An official document recommendation device based on a graph structure, comprising:

A first recording module, used to obtain multiple official documents of different official document types, determine characteristic words in the obtained official documents according to TF-IDF based on preset word statistical characteristics, filter out, according to TF-IDF, the characteristic words whose occurrence frequency is greater than or equal to a preset frequency, and record the selected characteristic words as the keyword tags of the corresponding official document;

A second recording module, used to input the official document into a preset LDA topic model, calculate the text topic-keyword distribution probability matrix of the official document through the LDA topic model, select, according to the text topic-keyword distribution probability matrix obtained from the LDA topic model, the text topics whose selection probability is greater than or equal to a preset probability, and record the selected text topics as the topic tags of the corresponding official document; the text topic-keyword distribution probability matrix includes a plurality of the selection probabilities, and the selection probability refers to the probability that a keyword in the official document belongs to a text topic of the official document;

A first generating module, configured to generate official document attributes according to the keyword tags and the topic tags;

An establishment module, used to obtain the record data of the official document according to each official document type, and establish an official document recommendation library based on a graph structure through the Neo4j framework according to the record data of the official document and the official document attributes; the official document recommendation library contains multiple graph structures, one graph structure corresponds to the official documents of at least one official document type, and one graph structure contains multiple nodes connected to each other; one node represents one of the record data, the keyword tag, and the topic tag;

A calculation module, configured to receive the search content input by the user from the official document recommendation library, and output the target official documents in order of the similarity calculated by SimRank; the similarity refers to the similarity between the search content and the node.
The apparatus for recommending official documents based on a graph structure according to claim 7, wherein the apparatus further comprises:

An analysis module, used to analyze the overall chapter structure of the official document through a successfully trained BERT model to obtain an analysis result of the overall chapter structure of the official document; the overall chapter structure refers to the constituent structures of the official document, and the analysis result is the result of judging the completeness and rationality of each of the constituent structures of the official document;

A marking module, used to, when the analysis result is that one of the constituent structures of the official document lacks the completeness and/or the rationality, extract the missing constituent structure and/or the unreasonable constituent structure from the official document, mark the missing constituent structure and/or the unreasonable constituent structure in the official document by highlighting, and request a preset data recipient to modify the official document accordingly.
A computer device, comprising a memory, a processor, and computer-readable instructions that are stored in the memory and can run on the processor, wherein the processor implements the following steps when executing the computer-readable instructions:

Acquiring multiple official documents of different official document types, determining characteristic words in the acquired official documents according to TF-IDF based on preset word statistical characteristics, filtering out, according to TF-IDF, the characteristic words whose occurrence frequency is greater than or equal to a preset frequency, and recording the selected characteristic words as the keyword tags of the corresponding official document;

Inputting the official document into a preset LDA topic model, calculating the text topic-keyword distribution probability matrix of the official document through the LDA topic model, selecting, according to the text topic-keyword distribution probability matrix obtained from the LDA topic model, the text topics whose selection probability is greater than or equal to a preset probability, and recording the selected text topics as the topic tags of the corresponding official document; the text topic-keyword distribution probability matrix includes a plurality of the selection probabilities, and the selection probability refers to the probability that a keyword in the official document belongs to a text topic of the official document;

Generating official document attributes according to the keyword tags and the topic tags;

Obtaining the record data of the official document according to each official document type, and establishing an official document recommendation library based on a graph structure through the Neo4j framework according to the record data of the official document and the official document attributes; the official document recommendation library contains multiple graph structures, one graph structure corresponds to the official documents of at least one official document type, and one graph structure contains multiple nodes connected to each other; one node represents one of the record data, the keyword tag, and the topic tag;

Receiving the search content input by the user from the official document recommendation library, and outputting the target official documents in order of the similarity calculated by SimRank; the similarity refers to the similarity between the search content and the node.
The computer device according to claim 9, wherein before the acquiring of multiple official documents of different official document types, the processor further implements the following steps when executing the computer-readable instructions:

Analyzing the overall chapter structure of the official document through a successfully trained BERT model to obtain an analysis result of the overall chapter structure of the official document; the overall chapter structure refers to the constituent structures of the official document, and the analysis result is the result of judging the completeness and rationality of each of the constituent structures of the official document;

When the analysis result is that one of the constituent structures of the official document lacks the completeness and/or the rationality, extracting the missing constituent structure and/or the unreasonable constituent structure from the official document, marking the missing constituent structure and/or the unreasonable constituent structure in the official document by highlighting, and requesting a preset data recipient to modify the official document.
The computer device according to claim 9, wherein the official document attributes further include digital entities; before the establishing of the official document recommendation library based on a graph structure through the Neo4j framework according to the record data of the official document and the official document attributes, the processor further implements the following steps when executing the computer-readable instructions:

Searching the official document for digital entities through the target entity expression in a preset rule template, locating the target position of each digital entity, and capturing the digital entity from the target position through the capture regular expression in the preset rule template.
The computer device according to claim 9, wherein the official document attributes further include the official document communication time and the communication unit; before the establishing of the official document recommendation library based on a graph structure through the Neo4j framework according to the record data of the official document and the official document attributes, the processor further implements the following steps when executing the computer-readable instructions:

Obtaining the official document content of the official document, and identifying, from the official document content through the NLP model, the official document communication time corresponding to the time component and the communication unit corresponding to the unit component;

The generating of official document attributes according to the keyword tags and the topic tags includes:

Generating the official document attributes according to the official document communication time, the communication unit, the keyword tags, and the topic tags.
9. The computer device according to claim 9, wherein said establishing an official document recommendation database based on a graph structure according to said record data of said official document and said official document attribute through the Neo4j framework comprises:

Construct a node for each official document according to node attributes through the create-node statement in the Neo4j framework; the node attributes correspond to the record data and the official document attributes respectively;

Construct connection relationships between the nodes according to preset relationships through the create-relationship statement in the Neo4j framework; the preset relationships correspond to the record data and the official document attributes respectively;

Determine the paths of all the nodes according to the connection relationships through the path statement in the Neo4j framework, thereby completing the establishment of the official document recommendation library based on the graph structure.
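As a non-limiting illustration, the three kinds of statements named above might take the following form in Cypher, the query language of Neo4j. The sketch only builds parameterized statements as strings and does not connect to a database. The node labels (`Document`, `Keyword`, `Topic`, `Unit`), the property names, the relationship types, and the hop limit are all hypothetical; the application does not specify a schema.

```python
# Hypothetical mapping from relationship type to the label of the node it
# points at. Cypher cannot parameterize labels or relationship types, so
# they are checked against this allow-list before being placed in the text.
ALLOWED_RELATIONS = {"HAS_KEYWORD": "Keyword", "HAS_TOPIC": "Topic", "ISSUED_BY": "Unit"}

def create_node_statement(doc_id, title):
    """Create-node statement: one node per official document."""
    query = "CREATE (d:Document {doc_id: $doc_id, title: $title})"
    return query, {"doc_id": doc_id, "title": title}

def create_relationship_statement(doc_id, relation, value):
    """Create-relationship statement: link a document to one attribute node."""
    if relation not in ALLOWED_RELATIONS:
        raise ValueError(f"unknown relationship type: {relation}")
    label = ALLOWED_RELATIONS[relation]
    query = (
        "MATCH (d:Document {doc_id: $doc_id}) "
        f"MERGE (t:{label} {{name: $value}}) "
        f"MERGE (d)-[:{relation}]->(t)"
    )
    return query, {"doc_id": doc_id, "value": value}

def path_statement(src_id, dst_id, max_hops=4):
    """Path statement: paths between two documents through shared nodes."""
    query = (
        "MATCH p = (a:Document {doc_id: $src})"
        f"-[*..{int(max_hops)}]-"
        "(b:Document {doc_id: $dst}) RETURN p"
    )
    return query, {"src": src_id, "dst": dst_id}

statements = [
    create_node_statement("D1", "Notice on the annual budget"),
    create_relationship_statement("D1", "HAS_KEYWORD", "budget"),
    path_statement("D1", "D2"),
]
for query, params in statements:
    print(query, params)
```

Using `MERGE` for attribute nodes is a design choice of this sketch: documents that share a keyword, topic, or unit end up connected to the same node, which is what makes paths between documents possible.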
14. The computer device according to claim 9, wherein after the target official documents are output in descending order of the similarity calculated by SimRank, the processor further implements the following steps when executing the computer-readable instructions:

Compress the target official documents, output in order of similarity, into link points, and when the user selects at least one of the link points, present in full, in a preset view, the content of the target official document corresponding to each selected link point; each link point corresponds to one target official document.
15. One or more readable storage media storing computer-readable instructions, wherein when the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps:

Acquire multiple official documents of different official document types, determine the feature words in the acquired official documents according to TF-IDF based on preset word statistical features, filter out the feature words whose frequency is greater than or equal to a preset frequency according to TF-IDF, and record the filtered feature words as the keyword tags of the corresponding official documents;

Input the official documents into a preset LDA topic model, calculate the text topic-keyword distribution probability matrix of each official document through the LDA topic model, select from that matrix the text topics whose selection probability is greater than or equal to a preset probability, and record the selected text topics as the topic tags of the corresponding official documents; the text topic-keyword distribution probability matrix contains a plurality of selection probabilities, and a selection probability refers to the probability that a keyword in the official document belongs to a text topic of the official document;

Generating official document attributes according to the keyword tags and the topic tags;

Obtain the record data of the official documents for each official document type, and establish, through the Neo4j framework, an official document recommendation library based on a graph structure according to the record data of the official documents and the official document attributes; the official document recommendation library contains multiple graph structures, each graph structure corresponds to the official documents of at least one official document type, and each graph structure contains multiple interconnected nodes; each node represents one of the record data, the keyword tags, and the topic tags;

Receive the retrieval content that a user inputs to search the official document recommendation library, and output the target official documents in descending order of the similarity calculated by SimRank; the similarity refers to the similarity between the retrieval content and a node.
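As a non-limiting illustration of the final step, the following sketch computes SimRank on a small undirected graph and ranks documents by their similarity to a query. It uses the standard SimRank recurrence, in which two distinct nodes are similar to the extent that their neighbors are similar. Representing the retrieval content as an extra `query` node linked to the tags it matches is an assumption of this example, as are the node names, the decay factor, and the iteration count; the application does not state how the retrieval content is mapped onto the graph.

```python
from collections import defaultdict
from itertools import product

def build_neighbors(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return dict(neighbors)

def simrank(neighbors, decay=0.8, iterations=10):
    """Iterative SimRank: s(a, a) = 1, otherwise the decayed average of the
    similarities between all neighbor pairs of a and b."""
    nodes = sorted(neighbors)
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iterations):
        updated = {}
        for a, b in product(nodes, nodes):
            if a == b:
                updated[(a, b)] = 1.0
            elif not neighbors[a] or not neighbors[b]:
                updated[(a, b)] = 0.0
            else:
                total = sum(sim[(i, j)] for i in neighbors[a] for j in neighbors[b])
                updated[(a, b)] = decay * total / (len(neighbors[a]) * len(neighbors[b]))
        sim = updated
    return sim

def rank_documents(sim, query, documents):
    """Order documents by descending similarity to the query node."""
    return sorted(documents, key=lambda doc: sim[(query, doc)], reverse=True)

# Hypothetical graph: documents and the query connect to keyword/topic nodes.
edges = [
    ("query", "kw:budget"), ("query", "topic:finance"),
    ("doc1", "kw:budget"), ("doc1", "topic:finance"),
    ("doc2", "kw:budget"), ("doc2", "topic:personnel"),
    ("doc3", "kw:appointment"), ("doc3", "topic:personnel"),
]
sim = simrank(build_neighbors(edges))
ranking = rank_documents(sim, "query", ["doc1", "doc2", "doc3"])
print(ranking)
```

Here `doc1` shares both tags with the query, `doc2` shares one, and `doc3` is connected to the query only indirectly through `doc2`, so the ranking reflects indirect as well as direct overlap.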
16. The readable storage medium according to claim 15, wherein before the acquiring of multiple official documents of different official document types, when the computer-readable instructions are executed by one or more processors, the one or more processors further perform the following steps:

Analyze the overall chapter structure of the official document through a successfully trained BERT model to obtain an analysis result for the overall chapter structure; the overall chapter structure refers to the constituent structures of the official document, and the analysis result is a judgment of the completeness and rationality of each constituent structure of the official document;

When the analysis result indicates that a constituent structure of the official document lacks completeness and/or rationality, extract the missing and/or unreasonable constituent structure from the official document, mark the missing and/or unreasonable constituent structure in the official document in highlighted form, and request a preset data recipient to modify the official document.
17. The readable storage medium according to claim 15, wherein the official document attributes further comprise a digital entity; before the official document recommendation library based on the graph structure is established through the Neo4j framework according to the record data of the official documents and the official document attributes, when the computer-readable instructions are executed by one or more processors, the one or more processors further perform the following steps:

Perform a digital entity search on the official document using the target entity expression in a preset rule template to locate the target position of the digital entity, and then capture the digital entity at that target position using the capture regular expression in the preset rule template.
18. The readable storage medium according to claim 15, wherein the official document attributes further comprise the time and unit of the official document; before the official document recommendation library based on the graph structure is established through the Neo4j framework according to the record data of the official documents and the official document attributes, when the computer-readable instructions are executed by one or more processors, the one or more processors further perform the following steps:

Obtain the content of the official document, and identify from that content, through an NLP model, the official document communication time corresponding to the time component and the communication unit corresponding to the unit component;

The generating of official document attributes according to the keyword tags and the topic tags includes:

Generate the official document attributes according to the official document communication time, the communication unit, the keyword tags, and the topic tags.
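As a non-limiting illustration, the keyword tags and topic tags that feed this attribute generation (the TF-IDF and LDA filtering steps of claim 15) might be produced and combined as sketched below. Several points are assumptions of this example: the preset frequency is read as a threshold on the TF-IDF score, a topic is selected when any keyword tag of the document has a probability at or above the preset probability, the topic-keyword matrix is written by hand rather than computed by an LDA model, and the communication time and unit are passed in directly rather than identified by an NLP model. All names and values are hypothetical.

```python
import math
from collections import Counter

def keyword_tags(docs, min_score):
    """Per document, keep the words whose TF-IDF score is >= min_score."""
    n_docs = len(docs)
    doc_freq = Counter(word for words in docs.values() for word in set(words))
    tags = {}
    for doc_id, words in docs.items():
        counts = Counter(words)
        scores = {
            word: (count / len(words)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
        tags[doc_id] = sorted(w for w, s in scores.items() if s >= min_score)
    return tags

def topic_tags(keywords, topic_keyword_prob, min_prob):
    """Keep topics for which some keyword tag has probability >= min_prob."""
    return sorted(
        topic
        for topic, probs in topic_keyword_prob.items()
        if any(probs.get(keyword, 0.0) >= min_prob for keyword in keywords)
    )

def document_attributes(comm_time, comm_unit, keywords, topics):
    """Combine time, unit, keyword tags and topic tags into one record."""
    return {"time": comm_time, "unit": comm_unit, "keywords": keywords, "topics": topics}

# Hypothetical tokenized documents.
docs = {
    "doc1": ["budget", "notice", "budget", "finance"],
    "doc2": ["appointment", "notice", "personnel", "notice"],
    "doc3": ["notice", "meeting", "schedule", "meeting"],
}
# Hypothetical stand-in for the topic-keyword matrix an LDA model would yield.
topic_keyword_prob = {
    "fiscal affairs": {"budget": 0.6, "finance": 0.3},
    "staffing": {"appointment": 0.5, "personnel": 0.4, "budget": 0.05},
}

tags = keyword_tags(docs, min_score=0.25)
topics = topic_tags(tags["doc1"], topic_keyword_prob, min_prob=0.5)
attributes = document_attributes("2020-05-29", "Finance Bureau", tags["doc1"], topics)
print(attributes)
```

The word "notice" appears in every sample document, so its inverse document frequency is zero and it is never selected as a keyword tag, which is the usual reason for preferring TF-IDF over raw counts.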
19. The readable storage medium according to claim 15, wherein establishing the official document recommendation library based on the graph structure through the Neo4j framework according to the record data of the official documents and the official document attributes comprises:

Construct a node for each official document according to node attributes through the create-node statement in the Neo4j framework; the node attributes correspond to the record data and the official document attributes respectively;

Construct connection relationships between the nodes according to preset relationships through the create-relationship statement in the Neo4j framework; the preset relationships correspond to the record data and the official document attributes respectively;

Determine the paths of all the nodes according to the connection relationships through the path statement in the Neo4j framework, thereby completing the establishment of the official document recommendation library based on the graph structure.
20. The readable storage medium according to claim 15, wherein after the target official documents are output in descending order of the similarity calculated by SimRank, when the computer-readable instructions are executed by one or more processors, the one or more processors further perform the following steps:

Compress the target official documents, output in order of similarity, into link points, and when the user selects at least one of the link points, present in full, in a preset view, the content of the target official document corresponding to each selected link point; each link point corresponds to one target official document.