CN113553825A - Method and system for analyzing context relationship of electronic official document

Publication number
CN113553825A
Authority
CN
China
Prior art keywords
text
similar
official document
document
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110837789.6A
Other languages
Chinese (zh)
Other versions
CN113553825B (en)
Inventor
许建兵
朱彦欣
冯伟
刘伟康
李强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Suncn Pap Information Technology Co ltd
Original Assignee
Anhui Suncn Pap Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Suncn Pap Information Technology Co ltd
Priority to CN202110837789.6A
Publication of CN113553825A
Application granted
Publication of CN113553825B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/322 Trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G06F16/345 Summarisation for human users
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Abstract

The invention relates to a method and system for analyzing the context relationship of electronic official documents. The method comprises the following steps. Data storage: predicting the topic words of known texts with a topic model, and storing the topic words to form a database. Feature extraction: extracting the feature topics of the target official document. Text retrieval: searching the database for the feature topics of the target official document, and selecting a plurality of similar texts to form a candidate set. Text vectorization: vectorizing the target official document and the similar texts to obtain their document feature vectors. Text calculation: calculating the cosine distance between the target official document and each similar text. Text screening: comparing the cosine distances between the target official document and the similar texts, and discarding the similar texts whose cosine distance is smaller than a threshold. Relation tree generation: generating a relation tree with the target official document as the root node and the remaining similar texts as parent nodes.

Description

Method and system for analyzing context relationship of electronic official document
Technical Field
The invention belongs to the technical field of text analysis, and particularly relates to an electronic official document context relationship analysis method and system.
Background
A government office system covers functions such as document handling, affairs handling and meetings, and users need to understand how documents, meetings, events and news influence and connect to one another. An effective method is therefore needed to associate these items and form a usable document context.
In the related art, the context relationship of electronic official documents is discovered mainly by rule-based methods, implemented as follows:
1. Identifying key entities in the electronic official document to be analyzed, such as the names of policies and regulations and the names of documents, by using rules or algorithms;
2. Searching a database according to the identified entities to find official documents, news, meetings and the like that contain them; then calculating the similarity between each retrieved item and the electronic official document to be analyzed, filtering out items with low similarity, sorting the remaining items by time, and displaying them.
The related art has the following problems. For government affairs information disclosed on the internet, cross-system government affairs information and the like, the information is disordered and unorganized, so accurate association is difficult to achieve by simple means. There is also no efficient method for analyzing the social impact of official documents. A method for conveniently analyzing the context of electronic official documents is therefore needed, so that follow-up work can be guided by the official document information.
Disclosure of Invention
In order to solve the problems, the invention discloses an electronic official document context relationship analysis method which is used for analyzing and processing an electronic official document and providing data support for subsequent work.
In a first aspect, the invention discloses a method for analyzing the context relationship of electronic official documents, comprising the following steps:
data storage: predicting the topic words of known texts with a topic model, and storing the topic words to form a database;
feature extraction: extracting the feature topics of the target official document;
text retrieval: searching the database for the feature topics of the target official document, and selecting a plurality of similar texts;
text vectorization: vectorizing the target official document and the similar texts to obtain their document feature vectors;
text calculation: calculating the cosine distance between the target official document and each similar text from their document feature vectors;
text screening: comparing the cosine distances between the target official document and the similar texts, and selecting the similar texts whose cosine distance is greater than or equal to a threshold;
relation tree generation: generating a relation tree with the target official document as the root node and the remaining similar texts as parent nodes.
Further, vectorizing the target official document and the similar texts to obtain their document feature vectors specifically includes:
predicting the word vectors of the titles of the target official document and the similar texts and taking a weighted average to obtain their title feature vectors;
predicting the word vectors of the body text of the target official document and the similar texts and taking a weighted average to obtain their body feature vectors;
taking a weighted average of the title feature vector and the body feature vector to obtain the document feature vector of the target official document and of each similar text.
Furthermore, when the weighted average of the title feature vector and the body feature vector is calculated, the weight of the title feature vector is greater than that of the body feature vector.
Furthermore, the analysis method also comprises secondary screening of the texts:
before the cosine distances between the target official document and the similar texts are compared and the similar texts whose cosine distance is smaller than the threshold are discarded, time screening is performed, in which the similar texts whose release time falls outside a specified time interval are discarded.
Furthermore, the analysis method further comprises adding child nodes to the relation tree:
after the relation tree is generated, the similar text corresponding to each parent node is taken as input, text retrieval, text vectorization, text calculation and text screening are repeated, and the similar texts obtained for the parent node are added to the relation tree as child nodes.
Furthermore, the depth of the relation tree is a preset value.
In another aspect, the invention also discloses an electronic official document context relationship analysis system, with the following technical scheme:
an electronic official document context relationship analysis system, comprising:
a data storage module, used for predicting the topic words of known texts with a topic model and storing the topic words to form a database;
a feature extraction module, used for extracting the feature topics of the target official document;
a text retrieval module, used for searching the database for the feature topics of the target official document and selecting a plurality of similar texts;
a text vectorization module, used for vectorizing the target official document and the similar texts to obtain their document feature vectors;
a text calculation module, used for calculating the cosine distance between the target official document and each similar text from their document feature vectors;
a text screening module, used for comparing the cosine distances between the target official document and the similar texts and selecting the similar texts whose cosine distance is greater than or equal to a threshold;
a relation tree generation module, used for generating a relation tree with the target official document as the root node and the remaining similar texts as parent nodes.
Further, the text vectorization module includes a title vectorization unit, a body vectorization unit and a feature vector calculation unit:
the title vectorization unit is used for predicting the word vectors of the titles of the target official document and the similar texts and taking a weighted average to obtain their title feature vectors;
the body vectorization unit is used for predicting the word vectors of the body text of the target official document and the similar texts and taking a weighted average to obtain their body feature vectors;
the feature vector calculation unit is used for taking a weighted average of the title feature vector and the body feature vector to obtain the document feature vector of the target official document and of each similar text.
Furthermore, the analysis system further comprises a text secondary screening module, which performs time screening before the cosine distances between the target official document and the similar texts are compared and the similar texts whose cosine distance is smaller than the threshold are discarded; the time screening discards the similar texts whose release time is not within a specified time interval.
Furthermore, the analysis system further comprises a child node adding module, which, after the relation tree is generated, repeats text retrieval, text vectorization, text calculation and text screening with the similar text of each parent node as input, and adds the similar texts obtained for the parent node to the relation tree as child nodes.
The present invention has at least the following advantages:
A database of known texts is established; features are extracted from the target text input by the user and searched in the database to obtain similar texts; the similar texts are screened step by step to obtain the final similar texts; and the context relationship between the target text and the similar texts is analyzed and displayed as a relation tree, which makes it convenient to analyze the sentiment trend of the texts and to guide follow-up work.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic flow diagram of the analytical method of the present application;
FIG. 2 is a schematic diagram of a relation tree structure in the embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, the embodiment of the present application further discloses an electronic official document context relationship analysis method, which includes the steps of, after a user inputs an official document, performing subject word extraction, similar text retrieval, text vectorization, text calculation and screening, and generating and storing a relationship tree.
The embodiment of the present application further discloses an electronic official document context relationship analysis system, which includes: the system comprises a data storage module, a feature extraction module, a text retrieval module, a text vectorization module, a text calculation module, a text screening module, a relation tree generation module and a child node adding module.
The following describes the steps of the method with reference to the above analysis system, and the analysis method includes the following steps:
S1, data storage: the data storage module predicts the topic words of known texts with the topic model and stores the topic words to form a database.
The topic model is a BTM (Biterm Topic Model). Known texts are obtained from open sources, their topic words are predicted by a trained BTM, and all topic words are collected and stored to form a database for subsequent retrieval. For example, meetings, news and the like are crawled from a number of currently popular news portal sites as known texts.
After the known texts have been crawled, they can be cleaned, including word segmentation and stop-word removal, so that the contents of the database are more accurate. When the topic words of the known texts are predicted, n topic words are extracted from each text and stored in the database. The BTM extracts topic words well from known texts such as official documents, meetings and news, which may be short texts.
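The cleaning step and the biterm idea behind BTM can be illustrated with a minimal sketch. It assumes whitespace tokenization and a tiny hypothetical stop-word list purely for illustration (a Chinese corpus would need a proper word segmenter), and it only shows how biterms, the unordered word pairs that a BTM is trained on, are formed from one short text; it does not train or query a topic model.

```python
from itertools import combinations

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "of", "and", "a", "on", "for"}

def clean(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    return [t for t in tokens if t and t not in STOP_WORDS]

def biterms(tokens):
    """Return the unordered word pairs (biterms) co-occurring in one short text."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)
            if pair[0] != pair[1]]

tokens = clean("Notice on the rural water supply plan")
pairs = biterms(tokens)
```

Because every pair of words in a short text becomes a training unit, even a title-length text yields several biterms, which is one reason this family of models is often chosen for short texts.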
S2, feature extraction: the target official document is input, and the feature extraction module extracts its feature topics.
The feature topics correspond to the topic words. When the feature topics of the target official document are extracted, the BTM is again used for prediction; in addition to each feature topic, the probability of occurrence of each feature topic is also predicted.
S3, text retrieval: the text retrieval module searches the database for the feature topics of the target official document and selects a plurality of similar texts to form a candidate set.
During retrieval, the feature topics of the target official document are used as search terms, and the probability of occurrence of each feature topic is used as its weight. The retrieved similar texts need to be ranked, and Elasticsearch itself ranks retrieved data by similarity. The method uses Elasticsearch for both retrieval and ranking; the results are ranked by term frequency in the similar texts, in descending order, and the top n similar texts in the search results are used as the candidate set for the subsequent analysis.
Retrieval with Elasticsearch selects the texts that are likely to be similar, which reduces the running complexity of the subsequent algorithm and improves the overall efficiency. Increasing the number of similar texts in the candidate set raises the recall rate but also raises the complexity of the algorithm, so a reasonable value should be chosen. In this embodiment, n is set to 10, and the top 10 similar texts are selected.
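One way such a weighted retrieval could be expressed is as an Elasticsearch query body in which each feature topic becomes a `match` clause inside a `bool`/`should`, with the predicted probability used as the clause's `boost`. The sketch below only builds the request body as a Python dictionary; the field name `topic_words`, the function name and the sample topics are assumptions for illustration, and the source does not state how its query is actually constructed.

```python
def build_query(topics, n=10):
    """Build a search body from feature topics.

    topics: mapping of feature topic -> predicted probability (used as boost).
    n: number of top-ranked similar texts to return (10 in the embodiment).
    """
    should = [
        # "topic_words" is a hypothetical field holding the stored topic words.
        {"match": {"topic_words": {"query": word, "boost": prob}}}
        for word, prob in topics.items()
    ]
    return {"size": n, "query": {"bool": {"should": should}}}

query = build_query({"water": 0.6, "rural": 0.4})
```

The resulting dictionary could be sent to a search endpoint by whatever client the deployment uses; the `size` value plays the role of n in the text.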
S4, text vectorization: the text vectorization module vectorizes the target official document and the similar texts to obtain their document feature vectors.
The text vectorization module comprises a title vectorization unit, a body vectorization unit and a feature vector calculation unit.
In this embodiment, the word vectors of each piece of data are predicted by an ELMo model (deep contextualized word representations). During vectorization, the title vectorization unit predicts the title word vectors of each piece of data (the target official document and the similar texts in the candidate set) and obtains the title feature vectors by weighted averaging. The body vectorization unit predicts the body word vectors of each piece of data and obtains the body feature vectors by weighted averaging. The feature vector calculation unit then takes a weighted average of the title feature vector and the body feature vector to obtain the document feature vector of the target official document and of each similar text.
The title feature vector is denoted title_embedding, the body feature vector content_embedding, and the document feature vector news_embedding. The weighted average is calculated as
news_embedding=0.6*title_embedding+0.4*content_embedding
It can be seen that the title feature vector is given the higher weight, on the consideration that readers generally pay more attention to the title than to the body when viewing and citing an official document.
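The two levels of averaging can be sketched in plain Python. The word vectors and per-word weights below are made-up values standing in for ELMo output, and the function names are hypothetical; only the 0.6 / 0.4 split between title and body comes from the formula above.

```python
def weighted_average(vectors, weights):
    """Weighted average of equal-length word vectors."""
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total
            for i in range(dim)]

def document_vector(title_vec, content_vec, title_weight=0.6):
    """news_embedding = 0.6 * title_embedding + 0.4 * content_embedding."""
    body_weight = 1.0 - title_weight
    return [title_weight * t + body_weight * c
            for t, c in zip(title_vec, content_vec)]

# Made-up 2-dimensional word vectors, for illustration only.
title_embedding = weighted_average([[1.0, 0.0], [1.0, 0.0]], [1.0, 1.0])
content_embedding = weighted_average([[0.0, 2.0], [0.0, 0.0]], [1.0, 1.0])
news_embedding = document_vector(title_embedding, content_embedding)
```

With these inputs the title contributes 0.6 to the first component and the body 0.4 to the second, which makes the title's larger influence on the document vector easy to see.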
S5, text calculation: the text calculation module calculates the cosine distance between the target official document and each similar text from their document feature vectors.
And respectively calculating to obtain cosine distances between the target official document and 10 similar texts in the candidate set, and storing the cosine distances between each similar text and the target official document for a subsequent screening process.
Commonly used text similarity measures include Jaccard similarity, edit distance and cosine similarity.
Jaccard similarity measures how alike two sets are by the proportion of their shared elements among all elements of the two sets; its drawback is that it is only suitable for sets of binary data.
Edit distance is the minimum number of editing operations required to convert one character string into another; its drawbacks are high algorithmic complexity and low efficiency.
Both Jaccard similarity and edit distance analyze the words of a document as discrete symbols and cannot take the semantic level into account.
Cosine similarity measures the difference between two individuals by the cosine of the angle between two vectors in a vector space, and is commonly used between two text vectors.
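The contrast between a set-based measure and a vector-based one can be made concrete with a short sketch. It computes standard cosine similarity and Jaccard similarity; since the method keeps texts whose value is greater than or equal to the threshold, the "cosine distance" of the text is read here as cosine similarity, which is an assumption rather than something the source states.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)

def jaccard_similarity(words_a, words_b):
    """Shared elements divided by all distinct elements of the two sets."""
    set_a, set_b = set(words_a), set(words_b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)

same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
overlap = jaccard_similarity(["water", "plan"], ["plan", "budget"])
```

Jaccard only sees whether the same tokens occur, whereas the cosine value depends on the embedding vectors, so two texts with no shared words can still score highly if their vectors point in similar directions.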
S6, text screening: the text screening module compares the cosine distance between the target official document and each similar text, discards the similar texts whose cosine distance is smaller than the threshold, and keeps those whose cosine distance is greater than or equal to the threshold.
In addition, the text secondary screening module can also screen the similar texts. Before the cosine distances are compared and the similar texts below the threshold are discarded, time screening is performed: similar texts whose release time falls outside a specified time interval are discarded, and only those released between half a year before and half a year after the release time of the target official document are retained, so that the similar texts are more timely. After screening, the remaining similar texts are those that satisfy the condition (namely, a cosine distance greater than or equal to the threshold).
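The two screening stages can be sketched as a time-window filter followed by a threshold filter. The threshold value of 0.8, the 183-day approximation of half a year, the dictionary layout and the sample candidates are all assumptions for illustration; the source gives no threshold value.

```python
from datetime import date, timedelta

def screen(target_date, candidates, threshold=0.8, window_days=183):
    """Keep candidates released within the window whose score meets the threshold.

    candidates: list of dicts with "title", "published" (date) and "score".
    threshold and window_days are hypothetical values, not from the source.
    """
    window = timedelta(days=window_days)
    kept = []
    for cand in candidates:
        # Time screening first: drop texts released outside the interval.
        if abs(cand["published"] - target_date) > window:
            continue
        # Then keep only texts whose score is at or above the threshold.
        if cand["score"] >= threshold:
            kept.append(cand)
    return kept

target_date = date(2021, 7, 1)
candidates = [
    {"title": "A", "published": date(2021, 5, 1), "score": 0.91},
    {"title": "B", "published": date(2019, 1, 1), "score": 0.95},
    {"title": "C", "published": date(2021, 9, 1), "score": 0.42},
]
kept = screen(target_date, candidates)
```

Running the time filter first means a highly similar but stale text such as "B" never reaches the threshold comparison.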
S7, relation tree generation: the relation tree generation module takes the target official document as the root node and the similar texts remaining after screening as parent nodes, and generates a relation tree with a depth of 2.
FIG. 2 is a schematic diagram of a relation tree, used to illustrate the generated relation tree. After the screening and analysis, the target official document input by the user is taken as the root node and the remaining similar texts saved in the previous step are taken as parent nodes, forming a relation tree that is convenient to store and view. The parent nodes in the relation tree are sorted by the release time of the similar texts.
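A depth-2 relation tree of this kind can be sketched with a small node type. The `Node` class, the `build_tree` function and the sample titles and dates are hypothetical; the sketch only shows the root-plus-parents structure and the ordering of parent nodes by release time.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Node:
    title: str
    published: date
    children: list = field(default_factory=list)

def build_tree(target, similar):
    """target and each item of similar are (title, published) tuples."""
    root = Node(*target)
    # Parent nodes hang directly under the root, ordered by release time.
    root.children = sorted((Node(t, d) for t, d in similar),
                           key=lambda node: node.published)
    return root

root = build_tree(
    ("Target notice", date(2021, 7, 1)),
    [("News B", date(2021, 8, 1)), ("Meeting A", date(2021, 6, 1))],
)
```

Keeping `children` as a list on every node means the same type can later hold the child layer when the tree is grown further.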
After the above steps, the relation tree needs to be grown. Once the relation tree has been generated, the child node adding module repeats steps S3 to S6 with each single similar text as the target text (that is, the meeting or news item corresponding to a parent node is taken as input), and the similar texts obtained are taken as child nodes. During the screening of child nodes in this step, meetings and news that already appear among the parent nodes are discarded. After the child nodes of all parent nodes have been obtained, only the child nodes that belong to half or more of the parent nodes at the same time are kept, and all other child nodes are filtered out; for the kept child nodes, only the child-parent pairing with the highest similarity is used to add the child node to the relation tree. The same steps are then continued for the child nodes until no new child nodes are generated or the specified depth of the relation tree is reached. In this embodiment, the specified depth of the relation tree is 3.
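The child-selection rule can be sketched as follows. This is one reading of the passage, stated as an assumption: a candidate is kept when it was retrieved under at least half of the parent nodes, and it is then attached only to the parent with which it has the highest similarity. The function name, the input layout and the sample scores are hypothetical.

```python
from collections import defaultdict

def select_children(candidates, parents):
    """Pick child nodes for a list of parent nodes.

    candidates: {parent: {child: similarity}} from re-running S3 to S6 per parent.
    Returns {parent: [children]} with each kept child under its best parent.
    """
    hits = defaultdict(dict)  # child -> {parent: similarity}
    for parent, found in candidates.items():
        for child, score in found.items():
            if child not in parents:  # drop texts already used as parent nodes
                hits[child][parent] = score
    tree = defaultdict(list)
    for child, by_parent in hits.items():
        # Keep only children retrieved under at least half of the parents.
        if 2 * len(by_parent) >= len(parents):
            best_parent = max(by_parent, key=by_parent.get)
            tree[best_parent].append(child)
    return dict(tree)

parents = ["P1", "P2", "P3", "P4"]
candidates = {
    "P1": {"A": 0.90, "P2": 0.95, "B": 0.85},
    "P2": {"A": 0.92},
    "P3": {"B": 0.80, "C": 0.99},
    "P4": {},
}
children = select_children(candidates, parents)
```

In this example "C" has the highest single score but appears under only one of four parents, so it is filtered out, while "A" and "B" each appear under two parents and are attached to the parent they resemble most.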
In FIG. 2, for convenience of illustration, only 6 similar texts are selected in each retrieval, that is, n is 6, and the three layers of texts from top to bottom represent the root node, the parent nodes and the child nodes of the relation tree, respectively. The texts framed by wide dashed lines in the parent and child layers are texts that were filtered out during screening; they are not added to the relation tree and are shown in FIG. 2 only to make the relation tree easier to present. The texts in the dotted boxes are retrieved texts that have not yet been screened, from which the child nodes are selected.
Through the above steps, the target official document and the similar texts are connected and displayed as a relation tree, which facilitates viewing and subsequent analysis; the nodes are also sorted along the time dimension, which facilitates analysis of the sentiment trend.
Although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for analyzing the context relationship of electronic official documents, characterized by comprising the following steps:
data storage: predicting the topic words of known texts with a topic model, and storing the topic words to form a database;
feature extraction: extracting the feature topics of the target official document;
text retrieval: searching the database for the feature topics of the target official document, and selecting a plurality of similar texts;
text vectorization: vectorizing the target official document and the similar texts to obtain their document feature vectors;
text calculation: calculating the cosine distance between the target official document and each similar text from their document feature vectors;
text screening: comparing the cosine distances between the target official document and the similar texts, and selecting the similar texts whose cosine distance is greater than or equal to a threshold;
relation tree generation: generating a relation tree with the target official document as the root node and the similar texts whose cosine distance is greater than or equal to the threshold as parent nodes.
2. The method according to claim 1, wherein vectorizing the target official document and the similar texts to obtain their document feature vectors specifically comprises:
predicting the word vectors of the titles of the target official document and the similar texts and taking a weighted average to obtain their title feature vectors;
predicting the word vectors of the body text of the target official document and the similar texts and taking a weighted average to obtain their body feature vectors;
taking a weighted average of the title feature vector and the body feature vector to obtain the document feature vector of the target official document and of each similar text.
3. The method according to claim 2, wherein, when the weighted average of the title feature vector and the body feature vector is calculated, the weight of the title feature vector is greater than that of the body feature vector.
4. The method according to claim 1, further comprising secondary screening of the texts:
before the cosine distances between the target official document and the similar texts are compared and the similar texts whose cosine distance is smaller than the threshold are discarded, time screening is performed, in which the similar texts whose release time falls outside a specified time interval are discarded.
5. The method according to claim 1, further comprising adding child nodes to the relation tree:
after the relation tree is generated, the similar text corresponding to each parent node is taken as input, text retrieval, text vectorization, text calculation and text screening are repeated, and the similar texts obtained for the parent node are added to the relation tree as child nodes.
6. The method according to claim 1 or 5, wherein the depth of the relation tree is a preset value.
7. An electronic official document context relationship analysis system, characterized in that the analysis system comprises:
a data saving module, used for predicting the topic words of known texts with a topic model and storing the topic words to form a database;
a feature extraction module, used for extracting the feature topics of the target official document;
a text retrieval module, used for searching the database for the feature topics of the target official document and selecting a plurality of similar texts;
a text vectorization module, used for performing text vectorization on the target official document and the similar texts to obtain the document feature vectors of the target official document and the similar texts;
a text calculation module, used for calculating the cosine distance between the target official document and each similar text from the document feature vectors of the target official document and the similar texts;
a text screening module, used for comparing the cosine distances between the target official document and the similar texts and selecting the similar texts whose cosine distance is greater than or equal to a threshold value;
and a relationship tree generation module, used for generating a relationship tree with the target official document as the root node and the similar texts whose cosine distance is greater than or equal to the threshold value as parent nodes.
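The calculation, screening and tree generation modules of claim 7 can be sketched together as below. The sketch reads the claim's "cosine distance" as the cosine of the angle between two document feature vectors, so that larger values mean more similar texts and a "greater than or equal to" threshold keeps the closest ones; that reading, the function names and the dictionary used to represent the first tree level are all assumptions.

```python
import math
from typing import Dict, List, Sequence

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def screen_similar(target_vec: Sequence[float],
                   candidate_vecs: Dict[str, Sequence[float]],
                   threshold: float) -> List[str]:
    """Return the ids of candidates whose cosine value is >= threshold."""
    return [doc_id for doc_id, vec in candidate_vecs.items()
            if cosine(target_vec, vec) >= threshold]

def first_level_tree(target_id: str,
                     target_vec: Sequence[float],
                     candidate_vecs: Dict[str, Sequence[float]],
                     threshold: float) -> Dict[str, List[str]]:
    """Root node mapped to the selected similar texts (the parent nodes)."""
    return {target_id: screen_similar(target_vec, candidate_vecs, threshold)}

# Hypothetical two-dimensional feature vectors.
vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
tree_level_1 = first_level_tree("target", [1.0, 0.0], vectors, threshold=0.7)
```

In this toy example candidate "b" is orthogonal to the target and is dropped, while "a" and "c" become the parent nodes under the root.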
8. The electronic official document context relationship analysis system according to claim 7, wherein the text vectorization module comprises a title vectorization unit, a body vectorization unit and a feature vector calculation unit,
the title vectorization unit is used for predicting word vectors for the titles of the target official document and the similar texts and taking their weighted average to obtain the title feature vectors of the target official document and the similar texts;
the body vectorization unit is used for predicting word vectors for the body text of the target official document and the similar texts and taking their weighted average to obtain the body feature vectors of the target official document and the similar texts;
the feature vector calculation unit is used for performing a weighted average calculation on the title feature vector and the body feature vector to obtain the document feature vectors of the target official document and the similar texts.
9. The electronic official document context relationship analysis system according to claim 7, further comprising a secondary text screening module,
the secondary text screening module is used for performing a time screening, before the cosine distances between the target official document and the similar texts are compared and the similar texts whose cosine distance is smaller than the threshold value are filtered out, to filter out the similar texts whose release time is not within a specified time interval.
10. The electronic official document context relationship analysis system according to claim 7, further comprising a child node adding module,
the child node adding module is used for repeating the text retrieval, text vectorization, text calculation and text screening after the relationship tree is generated, and adding the similar texts obtained for a parent node to the relationship tree as its child nodes.
CN202110837789.6A 2021-07-23 2021-07-23 Method and system for analyzing context relationship of electronic official document Active CN113553825B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110837789.6A CN113553825B (en) 2021-07-23 2021-07-23 Method and system for analyzing context relationship of electronic official document

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110837789.6A CN113553825B (en) 2021-07-23 2021-07-23 Method and system for analyzing context relationship of electronic official document

Publications (2)

Publication Number Publication Date
CN113553825A (en) 2021-10-26
CN113553825B (en) 2023-03-21

Family

ID=78104286

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110837789.6A Active CN113553825B (en) 2021-07-23 2021-07-23 Method and system for analyzing context relationship of electronic official document

Country Status (1)

Country Link
CN (1) CN113553825B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040243537A1 (en) * 2001-09-07 2004-12-02 Jiang Wang Contents filter based on the comparison between similarity of content character and correlation of subject matter
CN111104794A (en) * 2019-12-25 2020-05-05 同方知网(北京)技术有限公司 Text similarity matching method based on subject words
CN111382276A (en) * 2018-12-29 2020-07-07 中国科学院信息工程研究所 Event development venation map generation method
CN111666401A (en) * 2020-05-29 2020-09-15 平安科技(深圳)有限公司 Official document recommendation method and device based on graph structure, computer equipment and medium
CN112084298A (en) * 2020-07-31 2020-12-15 北京明略昭辉科技有限公司 Public opinion theme processing method and device based on rapid BTM
CN112463952A (en) * 2020-12-22 2021-03-09 安徽商信政通信息技术股份有限公司 News text aggregation method and system based on neighbor search
CN113032575A (en) * 2021-05-28 2021-06-25 北京明略昭辉科技有限公司 Document blood relationship mining method and device based on topic model

Also Published As

Publication number Publication date
CN113553825B (en) 2023-03-21

Similar Documents

Publication Publication Date Title
CN109189942B (en) Construction method and device of patent data knowledge graph
JP5092165B2 (en) Data construction method and system
US7792786B2 (en) Methodologies and analytics tools for locating experts with specific sets of expertise
US20080243889A1 (en) Information mining using domain specific conceptual structures
CN107463548B (en) Phrase mining method and device
CN112256939B (en) Text entity relation extraction method for chemical field
CN106156372B (en) A kind of classification method and device of internet site
WO2002025479A1 (en) A document categorisation system
CN104392006B (en) A kind of event query processing method and processing device
CN112699246A (en) Domain knowledge pushing method based on knowledge graph
CN110287292B (en) Judgment criminal measuring deviation degree prediction method and device
CN110569496A (en) Entity linking method, device and storage medium
US20160170993A1 (en) System and method for ranking news feeds
CN107577672B (en) Public opinion-based script setting method and device
CN112036659B (en) Social network media information popularity prediction method based on combination strategy
CN111723256A (en) Government affair user portrait construction method and system based on information resource library
CN107526721A (en) A kind of disambiguation method and device to electric business product review vocabulary
CN110413998B (en) Self-adaptive Chinese word segmentation method oriented to power industry, system and medium thereof
Jaman et al. Sentiment analysis of customers on utilizing online motorcycle taxi service at twitter with the support vector machine
CN113553825B (en) Method and system for analyzing context relationship of electronic official document
CN114282119A (en) Scientific and technological information resource retrieval method and system based on heterogeneous information network
Hanjalic et al. Dancers: Delft advanced news retrieval system
CN113468339A (en) Label extraction method, system, electronic device and medium based on knowledge graph
Rahmayudha et al. Recommendation System for the Improvement of E-Government Services in the Tourism Sector of Pontianak City
CN117648444B (en) Patent clustering method and system based on graph convolution attribute aggregation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant