CN112699246B - Domain knowledge pushing method based on knowledge graph - Google Patents

Domain knowledge pushing method based on knowledge graph

Info

Publication number
CN112699246B
Authority
CN
China
Prior art keywords
text
knowledge
graph
task
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011522006.7A
Other languages
Chinese (zh)
Other versions
CN112699246A (en)
Inventor
李蔚清
颜于升
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Science and Technology filed Critical Nanjing University of Science and Technology
Priority to CN202011522006.7A priority Critical patent/CN112699246B/en
Publication of CN112699246A publication Critical patent/CN112699246A/en
Application granted granted Critical
Publication of CN112699246B publication Critical patent/CN112699246B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 Querying
    • G06F16/435 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a domain knowledge pushing method based on a knowledge graph, comprising the following steps: collecting domain knowledge texts to construct a text knowledge base; performing semantic analysis and topic modeling on the knowledge base texts; applying graph embedding to a domain knowledge graph to obtain semantic distribution vectors of its nodes; establishing task context features from the user's task description and task topic; aligning the domain entities in the task description with the knowledge graph, expanding features based on graph node paths and the semantic distribution features of graph nodes, and recalling task-associated knowledge; computing the text similarity between each recalled text and the user task text to obtain candidate text scores; and pushing the ranked texts to the user according to their scores. By combining the node-associated knowledge of the knowledge graph with graph embedding, the invention improves text matching and the user experience of domain knowledge pushing.

Description

Domain knowledge pushing method based on knowledge graph
Technical Field
The invention belongs to the field of computer application technology, and in particular relates to a domain knowledge pushing method based on a knowledge graph.
Background
As production scales up and competition among services intensifies, large and complex business systems keep emerging, and enterprises carry out extensive business knowledge management and accumulate a large body of system management knowledge. Large complex systems inevitably develop defects, so standardized system inspection and defect repair are required frequently. At present, however, on-site maintenance generally relies on the accumulated skill and experience of workers to troubleshoot systems; there is no effective, practical intelligent support that helps workers operate in a standardized way, quickly obtain knowledge relevant to a system fault, and quickly update the related data.
As business systems continue to develop, their coverage expands, their number grows, their network architectures are upgraded, and the complexity of system maintenance keeps rising. Operation and maintenance personnel are therefore required to follow operating standards during maintenance and to use handling methods that meet the requirements. It is thus necessary to use the domain knowledge accumulated by the enterprise to build a systematic, actionable operation and maintenance workflow and a knowledge pushing system that guides business operations, so as to improve the quality and efficiency of maintenance work as a whole.
Knowledge pushing is a technology that, according to a certain protocol, automatically selects from a server specific information that is relevant to or of interest to a user and periodically delivers it to the user in some form, so as to reduce the user's learning cost. Knowledge pushing has three main stages: user data acquisition, data processing, and pushing. The main idea is that the server actively pushes information of interest to the user based on the user's acquired state and intent, which shortens the time the user spends retrieving information; at the same time, information is screened according to the user's purpose and interests, which helps the user discover valuable information and improves the accuracy and efficiency of information acquisition. Many industries are currently researching and experimenting with knowledge pushing in systems for their own fields. Most of them, however, still use approaches similar to open-domain recommendation, such as content-based recommendation, collaborative filtering, and model-based methods. These classical methods generally build user profiles from user behavior collected by the system and make recommendations through item feature modeling and user-based collaborative filtering strategies. Their shortcomings lead to problems that remain to be solved, such as cold start and the narrowing and stagnation of pushed content caused by the Matthew effect.
Classical recommendation algorithms are typically used to serve consumer products and to recommend information in many forms, including pictures, audio, text, video, and commodities; they are not well suited to pushing professional knowledge within a particular industry or domain.
Disclosure of Invention
The invention provides a domain knowledge pushing method based on a knowledge graph.
The technical solution that achieves the purpose of the invention is a domain knowledge pushing method based on a knowledge graph, comprising the following specific steps:
Step 1, constructing a text knowledge base composed of domain knowledge texts;
Step 2, performing semantic analysis and topic modeling on the knowledge base texts;
Step 3, obtaining semantic distribution vectors of the knowledge points by applying graph embedding to the domain knowledge graph;
Step 4, establishing a task context feature vector from the user's task description and task topic;
Step 5, aligning the domain entities in the user's task description text with the domain knowledge graph of step 3, expanding features based on graph node paths and the semantic distribution features of graph nodes, and recalling task-associated knowledge;
Step 6, computing the text similarity between the recalled texts obtained in step 5 and the user task to obtain recalled text scores;
Step 7, pushing the ranked texts to the user according to the scores;
Step 8, terminating the pushing if the user task is finished, and repeating steps 4 to 7 whenever the user's scene or state changes.
Preferably, the text knowledge base is constructed as follows: determining the knowledge scope according to the requirements of the domain task and screening the content; splitting the texts into sentences and filtering stop words; and assembling the resulting text set into the text knowledge base.
Preferably, the specific method for performing semantic analysis on the knowledge base texts is as follows:
segmenting the knowledge texts into words, and training on the texts with the unsupervised Word2Vec word embedding algorithm to obtain semantic distribution vectors of the words;
and computing the semantic vector of each text sentence as a weighted sum of its word vectors.
Preferably, the specific method for text topic modeling is as follows:
segmenting the texts in the knowledge base into words, counting word frequencies over the text sentences in the knowledge base according to the segmentation results, and filtering out words whose frequency is below a preset threshold;
performing character processing on the sentences to obtain a bigram dictionary of the knowledge base texts, which is used to construct a mapping table from each text to its bag-of-words vector;
and obtaining the bag-of-words vectors of the knowledge base texts through the mapping table, and using them as input to train an LDA model, yielding the topic distribution vectors of the knowledge base texts.
Preferably, the specific method for obtaining the semantic distribution vectors of the knowledge graph nodes is as follows:
Step 3.1, constructing the domain knowledge graph, which involves two tasks, named entity recognition and relation extraction; the domain knowledge entities and the relations between them are obtained by supervised learning with a BERT-based pre-trained model;
Step 3.2, obtaining the semantic distribution vectors of the graph nodes by learning the node topology of the domain knowledge graph with a graph convolutional neural network.
Preferably, the specific method for establishing the task context features is as follows:
Step 4.1, segmenting the user's task description text into words, and vectorizing the task description with the word vectors trained in step 2 to serve as the semantic feature of the user task;
Step 4.2, extracting the entities in the user's task topic, and using the knowledge graph node semantic distribution vectors trained in step 3 to obtain the entity representation vectors associated with the operation and inspection task, which serve as a classification feature of the user task.
Preferably, the specific steps of aligning the domain entities in the user's task description text with the domain knowledge graph of step 3, expanding features based on graph node paths and the semantic distribution features of graph nodes, and recalling task-associated knowledge are:
Step 5.1, obtaining the task description and the task-associated system components from the user task entity, and performing entity alignment on the knowledge graph to obtain the subgraph corresponding to the task entity;
Step 5.2, computing the embedding vectors of the entities of the subgraph from step 5.1, and obtaining the word embedding vectors of the entity nodes on every path within three hops of the subgraph;
Step 5.3, performing key path expansion on the entity nodes of each path of the graph;
Step 5.4, filtering the knowledge base texts using the user task context features from step 4, the graph embedding vector of the task entity, and the embedding vectors of the combined subgraph nodes as the preliminary recall condition, to obtain a coarse-precision recall of texts on the knowledge of the task-associated nodes.
Preferably, the specific method for computing the text similarity between the recalled texts obtained in step 5 and the user task to obtain the recalled text scores is as follows:
Step 6.1, computing the topic distribution vectors of the recalled texts and of the user task according to the topic model of the text knowledge base obtained in step 2;
Step 6.2, computing word-level similarity between each recalled text and the task description with the Word Mover's Distance (WMD) algorithm to obtain a WMD similarity score for the recalled text;
Step 6.3, computing similarity with the vector-space cosine formula to obtain a topic similarity score for the recalled text;
Step 6.4, computing the score with a weighted voting strategy, adjusting the WMD weight and the topic similarity weight according to the task.
Compared with the prior art, the invention has the following notable advantages:
(1) the method is based on a domain knowledge graph; it counters the Matthew effect of recommendation systems through rich knowledge associated with domain entities, and uses that associated knowledge to broaden the diversity of the pushed knowledge;
(2) the method models the scene and the user task, captures the attributes and features of the task more effectively, strengthens the ability to discriminate knowledge texts associated with a specific task, and improves the accuracy of text knowledge pushing;
(3) the method is based on semantic feature computation, is highly interpretable, and can be adapted flexibly to diverse scenes and tasks by swapping the feature model and the similarity calculation method;
(4) the invention adopts an unsupervised method and can achieve good performance and accuracy of knowledge recommendation even on large-scale domain knowledge;
(5) the method is portable and can be extended to any field with similar scene and task requirements to provide a knowledge pushing service.
The present invention is described in further detail below with reference to the attached drawings.
Drawings
FIG. 1 is a flowchart of the domain knowledge pushing method based on a knowledge graph.
FIG. 2 is a flowchart of named entity recognition.
FIG. 3 is a flowchart of entity relation extraction.
FIG. 4 is a schematic diagram of the knowledge graph structure.
FIG. 5 is a flowchart of the text similarity calculation.
Detailed Description
A domain knowledge pushing method based on a knowledge graph includes the following steps:
Step 1, constructing a text knowledge base composed of domain knowledge texts;
Specifically, the text knowledge base is constructed as follows: the knowledge scope is determined according to the requirements of the domain task, and the content is screened. The texts are split into sentences, stop words are filtered, and so on; the stop words are mainly provided by experts in the field. The resulting text set is assembled into the text knowledge base.
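The preprocessing in step 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes whitespace-tokenized English text (Chinese text would need a word segmenter in place of `split()`), and the names `STOP_WORDS` and `build_knowledge_base` as well as the stop-word list itself are hypothetical.

```python
import re

# Hypothetical stop-word list; the patent says domain experts supply it.
STOP_WORDS = {"the", "a", "of", "is", "and", "to"}

def split_sentences(text):
    """Split raw text into sentences on common end-of-sentence punctuation."""
    parts = re.split(r"[.!?。！？]+", text)
    return [p.strip() for p in parts if p.strip()]

def filter_stop_words(sentence, stop_words=STOP_WORDS):
    """Lower-case, tokenize on whitespace, and drop stop words."""
    return [tok for tok in sentence.lower().split() if tok not in stop_words]

def build_knowledge_base(documents):
    """Return a list of token lists, one per non-empty sentence."""
    kb = []
    for doc in documents:
        for sent in split_sentences(doc):
            tokens = filter_stop_words(sent)
            if tokens:
                kb.append(tokens)
    return kb

docs = ["The bushing is a component of the transformer. Check the oil level!"]
knowledge_base = build_knowledge_base(docs)
```

Each entry of `knowledge_base` is one filtered sentence, which is the unit the later steps embed and score.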
Step 2, performing semantic analysis and text topic modeling on the knowledge base texts;
In one embodiment, the specific method for performing semantic analysis on the knowledge base texts is as follows:
The knowledge texts are segmented into words, and the unsupervised Word2Vec word embedding algorithm is trained on the texts to obtain semantic distribution vectors of the words, that is, word vectors. The semantic vector of a text sentence is computed as a weighted sum of its word vectors. Specifically, words that match the task description text closely are given a higher weight, and irrelevant words a lower weight. The degree of matching is measured by the number of string matches.
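A small sketch of the weighted sentence vector described above. The patent only states that the weight grows with the number of string matches against the task description; the exact weighting formula (`base_weight` plus the match count) and the normalization by total weight are assumptions, and the toy word vectors stand in for trained Word2Vec vectors.

```python
def match_count(word, task_text):
    """Number of non-overlapping occurrences of `word` in the task description."""
    return task_text.count(word)

def sentence_vector(tokens, word_vectors, task_text, base_weight=1.0):
    """Weighted sum of word vectors, normalized by the total weight.

    The weight of a word is base_weight + its match count (an assumed formula).
    """
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        if tok not in word_vectors:
            continue  # skip out-of-vocabulary tokens
        w = base_weight + match_count(tok, task_text)
        weight_sum += w
        for i, x in enumerate(word_vectors[tok]):
            total[i] += w * x
    if weight_sum == 0.0:
        return total
    return [x / weight_sum for x in total]

# Toy 2-dimensional vectors standing in for trained word vectors.
word_vectors = {"gas": [1.0, 0.0], "relay": [0.0, 1.0], "oil": [1.0, 1.0]}
vec = sentence_vector(["gas", "oil"], word_vectors, task_text="gas relay gas alarm")
```

Here "gas" appears twice in the task text and gets weight 3, while "oil" does not appear and keeps the base weight 1, so the sentence vector is pulled toward "gas".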
In one embodiment, the specific method for text topic modeling is as follows:
The texts in the knowledge base are segmented into words, word frequencies are counted over the text sentences in the knowledge base according to the segmentation results, and words whose frequency is below a preset threshold are filtered out.
Character processing is performed on the sentences to obtain a bigram dictionary of the knowledge base texts, which is used to construct a mapping table from each text to its bag-of-words vector. Finally, the bag-of-words vectors of the knowledge base texts are obtained through the mapping table and used as input to train an LDA model, yielding the topic distribution vectors of the knowledge base texts.
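The frequency filter, bigram dictionary, and bag-of-words mapping can be sketched as below. The patent does not say whether bigrams are formed over characters or over segmented words; this sketch assumes word bigrams. The sparse `(id, count)` output mirrors the input format that common LDA libraries accept, but the LDA training itself is omitted, and all function names are illustrative.

```python
from collections import Counter

def filter_rare(texts, min_freq):
    """Drop tokens whose corpus frequency is below min_freq."""
    freq = Counter(tok for text in texts for tok in text)
    return [[tok for tok in text if freq[tok] >= min_freq] for text in texts]

def bigrams(tokens):
    """Adjacent token pairs, joined with an underscore."""
    return [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]

def build_dictionary(texts):
    """Map every bigram to an integer id, in order of first appearance."""
    dictionary = {}
    for text in texts:
        for bg in bigrams(text):
            dictionary.setdefault(bg, len(dictionary))
    return dictionary

def to_bow(tokens, dictionary):
    """Sparse bag-of-words vector: sorted (bigram_id, count) pairs."""
    counts = Counter(dictionary[bg] for bg in bigrams(tokens) if bg in dictionary)
    return sorted(counts.items())

texts = [
    ["gas", "relay", "alarm", "rare"],
    ["gas", "relay", "oil", "gas", "relay"],
    ["alarm", "oil"],
]
filtered = filter_rare(texts, min_freq=2)
dictionary = build_dictionary(filtered)
bow_vectors = [to_bow(t, dictionary) for t in filtered]
```

The resulting `bow_vectors` would then be passed to an LDA trainer to obtain one topic distribution per text.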
Step 3, obtaining semantic distribution vectors of the knowledge points by applying graph embedding to the domain knowledge graph;
In one embodiment, the specific method for obtaining the semantic distribution vectors of the knowledge graph nodes is as follows:
Step 3.1, constructing the domain knowledge graph, which mainly involves two tasks, named entity recognition and relation extraction; supervised learning with a BERT pre-trained model is used to obtain the domain knowledge entities and the relations between them. The constructed power knowledge graph is stored mainly as triples, such as <transformer, component, bushing>, and the construction process is shown in FIGS. 2, 3, and 4.
Step 3.2, obtaining the semantic distribution vectors of the graph nodes. Graph embedding is a technique for representing knowledge graph nodes as semantic distribution vectors, which can be obtained with algorithms such as random walks. This embodiment uses a graph neural network based on GCN to learn the embedded representation of graph nodes. Specifically, the node topology of the domain knowledge graph is learned by a graph convolutional neural network; that is, the attributes and connection semantics of the graph nodes are mapped into a low-dimensional space by the neural network to obtain the semantic distribution vectors of the nodes. Adding node attribute information during training can effectively improve learning on the node classification task.
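To make the mapping from topology and attributes to a low-dimensional vector concrete, here is one forward pass of a single graph convolution layer in plain Python. The patent does not specify the layer; this sketch assumes the widely used propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W). The three-node graph, the one-hot features, and the weight values are made up for illustration, and training (loss, backpropagation) is not shown.

```python
import math

def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a dense adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Dense matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def gcn_layer(adj_norm, features, weights):
    """One propagation step: ReLU(A_norm @ H @ W)."""
    z = matmul(matmul(adj_norm, features), weights)
    return [[max(0.0, x) for x in row] for row in z]

# Nodes: 0 = transformer, 1 = bushing, 2 = gas relay (transformer linked to both).
adj = [[0.0, 1.0, 1.0],
       [1.0, 0.0, 0.0],
       [1.0, 0.0, 0.0]]
features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # one-hot node attributes
weights = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6]]               # untrained, illustrative

adj_norm = normalize_adjacency(adj)
embeddings = gcn_layer(adj_norm, features, weights)  # one 2-d vector per node
```

In a trained model the weight matrix would be learned, for example on the node classification task the text mentions, and the layer output (or a stack of such layers) would serve as the node embedding.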
Step 4, establishing a task context feature vector from the user's task description and task topic;
In one embodiment, the specific method for establishing the task context features is as follows:
Step 4.1, segmenting the user's task description text into words, and vectorizing the task description with the word vectors trained in step 2 to serve as the semantic feature of the user task;
Step 4.2, extracting the entities in the user's task topic, and using the knowledge graph node semantic distribution vectors trained in step 3 to obtain the entity representation vectors associated with the operation and inspection task, which serve as a classification feature of the user task.
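A minimal sketch of how the two task context features might be assembled. The patent does not state how word vectors or entity vectors are pooled; plain averaging is assumed here, and the function names, dictionary keys, and toy vectors are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors ([] if the list is empty)."""
    if not vectors:
        return []
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def task_context(task_tokens, topic_entities, word_vectors, node_vectors):
    """Build the semantic feature (step 4.1) and the entity feature (step 4.2)."""
    semantic = mean_vector([word_vectors[t] for t in task_tokens if t in word_vectors])
    entity = mean_vector([node_vectors[e] for e in topic_entities if e in node_vectors])
    return {"semantic": semantic, "entity": entity}

# Toy vectors standing in for the outputs of steps 2 and 3.
word_vectors = {"gas": [1.0, 0.0], "alarm": [0.0, 1.0]}
node_vectors = {"gas relay": [0.2, 0.4, 0.6]}

context = task_context(["light", "gas", "alarm"], ["gas relay"], word_vectors, node_vectors)
```

The two features are kept separate because they live in different vector spaces: one comes from the word embedding model, the other from the graph embedding.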
Step 5, aligning the domain entities in the user's task description text with the domain knowledge graph of step 3, expanding features based on graph node paths and the semantic distribution features of graph nodes, and recalling task-associated knowledge;
In a further embodiment, the specific steps are:
Step 5.1, obtaining the task description and the task-associated system components from the user task entity, and performing entity alignment on the knowledge graph to obtain the subgraph corresponding to the task entity;
Step 5.2, computing the embedding vectors of the entities of the subgraph from step 5.1, and obtaining the word embedding vectors of the entity nodes on every path within three hops of the subgraph;
Step 5.3, performing key path expansion on the entity nodes of each path of the subgraph, that is, combining the nodes on the paths within three hops to obtain sentence embedding vectors with combined features, where the combination uses the sum-average method;
Step 5.4, filtering the knowledge base texts using the user task context features from step 4, the graph embedding vector of the task entity, and the embedding vectors of the combined subgraph nodes as the preliminary recall condition, to obtain a coarse-precision recall of texts on the knowledge of the task-associated nodes.
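Steps 5.2 to 5.4 can be sketched as a path enumeration followed by sum-average pooling and a similarity filter. Several details are assumptions rather than the patent's method: paths are enumerated as simple paths by depth-first search, the recall condition is a cosine threshold against any path vector, and all vectors are assumed to share one space. The relation name "triggers", the toy vectors, and the threshold are illustrative.

```python
import math

def build_adjacency(triples):
    """Undirected adjacency map from (head, relation, tail) triples."""
    adj = {}
    for head, _rel, tail in triples:
        adj.setdefault(head, set()).add(tail)
        adj.setdefault(tail, set()).add(head)
    return adj

def paths_within_hops(adj, start, max_hops=3):
    """All simple paths from `start` with 1..max_hops edges (depth-first)."""
    paths = []
    def dfs(path):
        if len(path) > 1:
            paths.append(list(path))
        if len(path) - 1 == max_hops:
            return
        for nxt in sorted(adj.get(path[-1], ())):
            if nxt not in path:
                path.append(nxt)
                dfs(path)
                path.pop()
    dfs([start])
    return paths

def sum_average(vectors):
    """Combine node vectors on a path by summing and dividing by the count."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def recall(query_vectors, kb, threshold):
    """Keep texts whose vector is close to at least one query vector."""
    return [text for text, vec in kb
            if any(cosine(q, vec) >= threshold for q in query_vectors)]

triples = [("transformer", "component", "bushing"),
           ("transformer", "component", "gas relay"),
           ("gas relay", "triggers", "light gas alarm")]  # "triggers" is a made-up relation
node_vectors = {"gas relay": [1.0, 0.0], "light gas alarm": [1.0, 1.0],
                "transformer": [0.0, 1.0], "bushing": [0.0, -1.0]}

paths = paths_within_hops(build_adjacency(triples), "gas relay", max_hops=3)
path_vectors = [sum_average([node_vectors[n] for n in p]) for p in paths]
kb = [("Light gas alarm handling", [1.0, 0.4]), ("Bushing cleaning", [-1.0, 0.2])]
recalled = recall(path_vectors, kb, threshold=0.8)
```

Because this stage is only a coarse filter, a permissive threshold is reasonable; the finer ranking happens in step 6.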
Step 6, computing the text similarity between the recalled texts obtained in step 5 and the user task to obtain recalled text scores;
In a further embodiment, the specific steps are:
Step 6.1, computing the topic distribution vectors of the recalled texts and of the user task according to the topic model of the text knowledge base obtained in step 2;
Step 6.2, computing word-level similarity between each recalled text and the task description with the Word Mover's Distance (WMD) algorithm to obtain a WMD similarity score for the recalled text;
Step 6.3, obtaining the topic similarity between the recalled text and the user task, that is, computing similarity with the vector-space cosine formula to obtain a topic similarity score for the recalled text;
Step 6.4, computing the final score with a weighted voting strategy, adjusting the WMD weight and the topic similarity weight according to the task. The voting result is the candidate document score.
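The scoring in steps 6.3 and 6.4 can be sketched as a weighted combination of two similarity terms. The patent does not say how a WMD distance is turned into a similarity or what the weights are; the mapping 1 / (1 + d) and the default weights of 0.5 are assumptions, and the WMD values are supplied as precomputed inputs here.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def wmd_to_similarity(distance):
    """Map a non-negative distance into (0, 1]; this mapping is an assumption."""
    return 1.0 / (1.0 + distance)

def combined_score(wmd_distance, task_topics, text_topics, w_wmd=0.5, w_topic=0.5):
    """Weighted vote of WMD similarity and topic cosine similarity."""
    return (w_wmd * wmd_to_similarity(wmd_distance)
            + w_topic * cosine(task_topics, text_topics))

def rank(candidates, task_topics, w_wmd=0.5, w_topic=0.5):
    """Return (score, text) pairs sorted from best to worst."""
    scored = [(combined_score(c["wmd"], task_topics, c["topics"], w_wmd, w_topic), c["text"])
              for c in candidates]
    return sorted(scored, reverse=True)

task_topics = [1.0, 0.0]
candidates = [
    {"text": "B", "wmd": 1.0, "topics": [0.0, 1.0]},
    {"text": "A", "wmd": 0.0, "topics": [1.0, 0.0]},
]
ranked = rank(candidates, task_topics)
```

Raising `w_wmd` favors word-level overlap with the task description, while raising `w_topic` favors texts on the same topic even when the wording differs, which is the per-task adjustment the text describes.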
Step 7, pushing the sorted texts to the user according to the scores;
Step 8, terminating the pushing if the user task is finished, and repeating steps 4 to 7 whenever the user's scene or state changes.
The invention accomplishes domain knowledge pushing mainly through the following aspects:
1) Domain knowledge mining: in business system maintenance, the user's work content and scene often need to be recorded and analyzed, but this information is usually scattered, weakly related, and sparse in features. Solving the user's problems therefore requires a rich reserve of domain knowledge. A knowledge graph is a structured, richly connected form of knowledge representation built by mining entities and relations from unstructured text in a vertical domain, so it meets the requirements for knowledge storage and mining.
2) Task feature modeling: the user's operation and maintenance tasks must be carried out according to certain specifications and procedures. Unlike a traditional pushing system, knowledge pushing starts from the user's specific task and task scene rather than inferring from user preferences and historical operations; it mines knowledge association features between massive texts and the task, and pushes knowledge texts whose semantic content matches the task.
3) Text matching calculation: recalling a large number of texts is the main content of knowledge pushing, and recall precision affects the final result of the subsequent text similarity calculation. In addition, the result of knowledge pushing takes the form of short, highly accurate knowledge texts, which involves natural language processing techniques.
Examples
A domain knowledge pushing method based on a knowledge graph is shown in FIG. 1; its key steps and implementation are as follows:
Step 1, collecting knowledge texts on power equipment to construct a text knowledge base.
The power domain text knowledge base is a text collection covering the knowledge required by system tasks, and is the source of the auxiliary knowledge pushed for substation operation and inspection tasks. Its sources mainly include authoritative books on power operation and inspection, related journal literature, internal documents of electric power research institutes, and online encyclopedia and question-answering knowledge bases on power operation and inspection topics.
After documents are obtained from these sources, the knowledge scope is determined by the requirements of the power operation and inspection task, mainly covering documents on transformers, circuit breakers, secondary non-electrical-quantity devices, protection devices, and the like; the content is then screened. The texts are split into sentences, stop words are filtered, and so on; the stop words are mainly provided by experts in power operation and inspection. The resulting text set is assembled into the power domain text corpus.
Step 2, performing semantic analysis and topic modeling on the power equipment knowledge base texts. This step is implemented as follows:
Step 2.1, segmenting the equipment-related texts into words, including equipment descriptions, equipment operation and inspection task descriptions, and equipment defect descriptions, and training on the texts with the unsupervised Word2Vec word embedding algorithm to obtain semantic distribution vectors of the words. The semantic vector of a text sentence is computed as a weighted sum of its word vectors. Specifically, words that match the power equipment operation and inspection task description text closely are given a high weight, and irrelevant words a low weight. The degree of matching is measured by the number of string matches.
Step 2.2, modeling the topics of the knowledge base texts. The texts in the power equipment text base are segmented into words, word frequencies are counted over the text sentences according to the segmentation results, and words whose frequency is below a preset threshold are filtered out. Character processing is then performed on the sentences to obtain a bigram dictionary of the power equipment knowledge base texts, which is used to construct a mapping table from each text to its bag-of-words vector. Finally, the bag-of-words vectors of the knowledge base texts are obtained through the mapping table and used as input to train an LDA model, yielding the topic distribution vectors of the equipment knowledge base texts.
Semantic analysis here covers both word embedding and sentence embedding. It preserves the semantic information of sentences and measures similarity between texts at the semantic level, as opposed to similarity based on surface word forms alone.
Step 3, applying graph embedding to the power domain knowledge graph to obtain the semantic distribution vectors of its nodes. This step is implemented as follows:
Step 3.1, constructing the domain knowledge graph, which mainly involves two tasks, named entity recognition and relation extraction; supervised learning with a BERT pre-trained model is used to obtain the domain knowledge entities and the relations between them. The constructed power knowledge graph is stored mainly as triples, such as <transformer, component, bushing>, and the construction process is shown in FIGS. 2, 3, and 4.
Step 3.2, obtaining the semantic distribution vectors of the graph nodes. Graph embedding is a technique for representing knowledge graph nodes as semantic distribution vectors, which can be obtained with algorithms such as random walks. This implementation uses a GCN-based graph neural network to learn the embedded representation of graph nodes. Specifically, the node topology of the domain knowledge graph is learned by a graph convolutional neural network; that is, the attributes and connection semantics of the graph nodes are mapped into a low-dimensional space by the neural network to obtain the semantic distribution vectors of the nodes. Adding node attribute information during training can effectively improve learning on the node classification task.
Step 4, taking a light gas alarm task as an example, establishing the task context features from the task description of the user's power equipment operation and inspection task. This step is implemented as follows:
Step 4.1, segmenting the light gas alarm task description text into words, and vectorizing the task description with the word vectors trained in step 2 to serve as the semantic feature of the light gas alarm task;
Step 4.2, extracting the entities in the topic of the light gas alarm task, and using the knowledge graph node distribution vectors trained in step 3 to obtain the entity representation vectors associated with the operation and inspection task, which serve as a classification feature of the light gas alarm task.
Step 5, aligning the electric power equipment entities in the equipment operation and inspection task text with the electric power domain knowledge graph, and performing feature expansion and task-associated knowledge recall based on graph node paths and graph node semantic distribution features. The step is implemented as follows:
step 5.1, obtaining the task description and the system component associated with the task, namely the gas relay, from the gas alarm task entity, and performing entity alignment against the power domain knowledge graph to obtain the subgraph corresponding to the task-associated entity on the graph;
step 5.2, obtaining the embedding vectors of the subgraph entities from step 5.1 and the word embedding vectors of the entity nodes on the adjacent relation paths within three hops of the subgraph nodes;
step 5.3, performing key path expansion on the subgraph nodes, that is, combining adjacent nodes and obtaining a sentence embedding of the combined features, where the combination uses sum-averaging;
step 5.4, filtering the text using the gas alarm task context feature vector from step 4, the graph embedding vector of the task entity, and the embedding vector of the subgraph node combination as the preliminary recall conditions, to obtain a coarse-precision recall of the knowledge associated with the task nodes.
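The three-hop neighbourhood of step 5.2 and the sum-average combination of step 5.3 can be sketched as below. The breadth-first search, the adjacency-list representation, and the example graph and vectors are assumptions made for illustration; the text does not specify how the paths are traversed or which nodes count as a key path.

```python
from collections import deque

def nodes_within_hops(graph, start, max_hops=3):
    """Breadth-first search returning {node: hop distance} up to max_hops."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def sum_average(vectors):
    """Combine node vectors by summing and dividing by their count."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

# Hypothetical fragment of a power-domain graph as an adjacency list.
graph = {
    "gas relay": ["transformer"],
    "transformer": ["gas relay", "insulating oil"],
    "insulating oil": ["transformer", "oil chromatography"],
    "oil chromatography": ["insulating oil", "laboratory"],
    "laboratory": ["oil chromatography"],
}
vectors = {
    "gas relay": [1.0, 0.0],
    "transformer": [0.0, 1.0],
    "insulating oil": [1.0, 1.0],
    "oil chromatography": [0.0, 0.0],
    "laboratory": [9.0, 9.0],
}

hops = nodes_within_hops(graph, "gas relay", max_hops=3)
combined = sum_average([vectors[n] for n in hops])
```

Here "laboratory" lies four hops from the aligned entity and is therefore left out of the combined vector.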
Step 6, calculating the text similarity between the recalled text obtained in step 5 and the user task to obtain a recalled-text score; the process is shown in fig. 5. The step is implemented as follows:
step 6.1, calculating the topic distribution vectors of the recalled text and of the gas operation and inspection task, respectively, using the topic model of the text corpus obtained in step 2;
step 6.2, performing word-level similarity calculation between the candidate text and the task description using the Word Mover's Distance (WMD) algorithm, to obtain the WMD similarity score of the recalled text;
the Word Mover's Distance is a method for measuring the distance between two text documents and is used to judge their similarity: the larger the WMD, the lower the similarity, and the smaller the WMD, the higher the similarity.
step 6.3, obtaining the topic similarity between the recalled text and the gas alarm task, that is, calculating the similarity with the vector-space cosine formula to obtain the topic similarity score of the recalled text;
step 6.4, calculating the final score with a weighted voting strategy, where the WMD weight and the topic similarity weight are adjusted according to the task. The voting result is the candidate document score.
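The scoring in steps 6.2 to 6.4 can be sketched as follows. Several points are assumptions rather than details from the text: the exact WMD requires solving an optimal-transport problem, so this sketch substitutes the relaxed word mover's distance (a known lower bound in which every word moves to its nearest counterpart); the distance is turned into a similarity with 1 / (1 + d); and the default weights of 0.5 each, the word vectors, and the topic vectors are hypothetical values.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def relaxed_wmd(doc_a, doc_b, word_vectors):
    """Relaxed word mover's distance (lower bound of WMD): each token moves
    to its nearest token in the other text; the larger direction is kept."""
    def one_way(src, dst):
        return sum(
            min(euclidean(word_vectors[s], word_vectors[d]) for d in dst)
            for s in src
        ) / len(src)
    return max(one_way(doc_a, doc_b), one_way(doc_b, doc_a))

def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recall_score(distance, topic_sim, w_wmd=0.5, w_topic=0.5):
    """Weighted combination of the two similarity scores (step 6.4)."""
    # Assumed mapping: distance 0 gives similarity 1, large distances tend to 0.
    wmd_sim = 1.0 / (1.0 + distance)
    return w_wmd * wmd_sim + w_topic * topic_sim

# Hypothetical toy word vectors and topic distribution vectors.
word_vectors = {"gas": [0.0, 0.0], "alarm": [1.0, 0.0], "relay": [0.0, 1.0]}
task_tokens, text_tokens = ["gas", "alarm"], ["gas", "relay"]
task_topics, text_topics = [0.6, 0.4], [0.4, 0.6]

distance = relaxed_wmd(task_tokens, text_tokens, word_vectors)
topic_sim = cosine(task_topics, text_topics)
score = recall_score(distance, topic_sim)
```

Raising `w_wmd` favours word-level overlap with the task description, while raising `w_topic` favours texts that share the task's overall topic; the text leaves this choice to be tuned per task.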
Step 7, pushing the ranked texts to the user according to the scores. The step is implemented as follows:
the recalled documents are sorted in descending order of the scores obtained in step 6, and a certain number of documents are selected and pushed as required.
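The ranking in step 7 amounts to a descending sort followed by a cut-off. A minimal sketch, in which the function name, the document identifiers, the scores, and the limit are all hypothetical:

```python
def select_for_push(scored_docs, limit):
    """Return the ids of the `limit` highest-scoring documents, best first."""
    ranked = sorted(scored_docs.items(), key=lambda item: item[1], reverse=True)
    return [doc_id for doc_id, _ in ranked[:limit]]

# Hypothetical recalled documents with their step 6 scores.
scores = {"doc-a": 0.41, "doc-b": 0.87, "doc-c": 0.63}
to_push = select_for_push(scores, limit=2)
```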
Step 8, if the user task is finished, the pushing is terminated; steps 4 to 7 are repeated when the user changes scene and state.

Claims (6)

1. A domain knowledge pushing method based on a knowledge graph is characterized by comprising the following specific steps:
step 1, constructing a text knowledge base, wherein the text knowledge base is composed of a field knowledge text;
step 2, performing semantic analysis and topic modeling on the knowledge base text;
step 3, obtaining semantic distribution vectors of knowledge points by carrying out graph embedding processing on the domain knowledge graph;
step 4, establishing a task context feature vector according to the user task description and the task theme;
step 5, carrying out entity alignment on the domain entities in the user task description text and the domain knowledge graph in the step 3, carrying out feature expansion and task associated knowledge recall based on graph node paths and graph node semantic distribution characteristics, and specifically comprising the following steps:
step 5.1, acquiring the task description and the task-associated system components according to the user task entity, and performing an entity alignment operation on the knowledge graph to acquire the sub-graph corresponding to the task entity on the graph;
step 5.2, calculating the embedded vectors of the entity of the sub-graph in the step 5.1, and obtaining the word embedded vectors of the entity nodes on each path in the three hops of the sub-graph;
step 5.3, performing key path expansion on the entity nodes in each path of the graph;
step 5.4, filtering the text of the knowledge base by taking the user task context features from step 4, the graph embedding vector of the task entity and the embedding vector of the sub-graph node combination as a preliminary recall condition, to obtain a coarse-precision recalled text of the knowledge associated with the task nodes;
step 6, performing text similarity calculation on the recalled text obtained in step 5 and the user task to obtain a recalled-text score, wherein the specific method comprises the following steps:
step 6.1, respectively calculating the topic distribution vectors of the recalled text and the user task according to the topic model of the text knowledge base obtained in step 2;
step 6.2, according to the Word Mover's Distance (WMD) algorithm, performing word-level similarity calculation on the recalled text and the task description to obtain a WMD similarity score of the recalled text;
step 6.3, calculating the similarity according to the cosine formula of the vector space to obtain a topic similarity score of the recalled text;
step 6.4, calculating scores based on a weighted voting strategy, and adjusting the WMD weight and the topic similarity weight according to the task;
step 7, pushing the ranked result texts to the user according to the scores;
step 8, if the user task is finished, the pushing is terminated; steps 4 to 7 are repeated when the user's context and status changes.
2. The knowledge-graph-based domain knowledge pushing method according to claim 1, wherein the text knowledge base is constructed by the following method: determining a knowledge range according to the field task requirements, and screening the content; sentence division is carried out on the text, and stop words are filtered; and constructing the final text set into a text knowledge base.
3. The knowledge-graph-based domain knowledge push method according to claim 1, wherein the specific method for semantic analysis of knowledge base text is as follows:
segmenting the knowledge text into words, and training on the text with the unsupervised Word2Vec word embedding algorithm to obtain the semantic distribution vectors of the words;
calculating the semantic vector of each text sentence by a method based on the weighted sum of word vectors.
4. The knowledge-graph-based domain knowledge push method according to claim 1, wherein the text topic modeling is performed by a specific method comprising:
performing word segmentation on the texts in the knowledge base, performing word frequency statistics on the text sentences in the knowledge base according to the word segmentation results, and filtering out words whose frequency is lower than a preset threshold;
performing character processing on the sentences to obtain a bigram dictionary of the knowledge base text and to construct a mapping table from each text to its corresponding bag-of-words vector;
acquiring the bag-of-words vectors of the knowledge base text through the mapping table, and using them as the input of the LDA algorithm for training to obtain the topic distribution vectors of the knowledge base text.
5. The domain knowledge push method based on the knowledge-graph according to claim 1, wherein the specific method for obtaining the semantic distribution vector of the nodes of the knowledge-graph is as follows:
step 3.1, constructing a domain knowledge graph, including two tasks of named entity identification and relation extraction, and obtaining a domain knowledge entity and a relation between the entities by adopting a BERT-based pre-training model to perform supervised learning;
step 3.2, obtaining the graph node semantic distribution vectors, wherein the node topology in the domain knowledge graph is learned through a graph convolutional neural network to obtain the semantic distribution vector of each node.
6. The domain knowledge push method based on the knowledge graph according to claim 1, wherein the specific method for establishing the task context features is as follows:
step 4.1, performing word segmentation processing on the user task description text, and performing vectorization representation of task description by using the word vector trained in the step 2 to serve as a semantic feature of the user task;
step 4.2, extracting the entities in the user task topic, and using the knowledge graph node semantic distribution vectors trained in step 3 to obtain the entity representation vectors associated with the operation and inspection task, as a classification feature of the user task.
CN202011522006.7A 2020-12-21 2020-12-21 Domain knowledge pushing method based on knowledge graph Active CN112699246B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011522006.7A CN112699246B (en) 2020-12-21 2020-12-21 Domain knowledge pushing method based on knowledge graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011522006.7A CN112699246B (en) 2020-12-21 2020-12-21 Domain knowledge pushing method based on knowledge graph

Publications (2)

Publication Number Publication Date
CN112699246A CN112699246A (en) 2021-04-23
CN112699246B true CN112699246B (en) 2022-09-27

Family

ID=75510145

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011522006.7A Active CN112699246B (en) 2020-12-21 2020-12-21 Domain knowledge pushing method based on knowledge graph

Country Status (1)

Country Link
CN (1) CN112699246B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113434659B (en) * 2021-06-17 2023-03-17 天津大学 Implicit conflict sensing method in collaborative design process
CN113254620B (en) * 2021-06-21 2022-08-30 中国平安人寿保险股份有限公司 Response method, device and equipment based on graph neural network and storage medium
CN113254550B (en) * 2021-06-29 2022-04-19 浙江大华技术股份有限公司 Knowledge graph-based recommendation method, electronic device and computer storage medium
CN113779387A (en) * 2021-08-25 2021-12-10 上海大智慧信息科技有限公司 Industry recommendation method and system based on knowledge graph
CN113918729B (en) * 2021-10-08 2024-04-16 肇庆学院 Task collaboration method and system based on knowledge tree
CN113886605A (en) * 2021-10-25 2022-01-04 支付宝(杭州)信息技术有限公司 Knowledge graph processing method and system
CN115964459B (en) * 2021-12-28 2023-09-12 北方工业大学 Multi-hop reasoning question-answering method and system based on food safety cognition spectrum
CN114374604B (en) * 2022-01-04 2023-06-02 厦门立林科技有限公司 Offline mode intelligent home configuration method, storage medium and system
CN114331789B (en) * 2022-03-07 2022-06-24 联通高新大数据人工智能科技(成都)有限公司 Intelligent cheap and clean knowledge recommendation method, device, equipment and storage medium
CN114745427A (en) * 2022-03-14 2022-07-12 北京科东电力控制系统有限责任公司 Monitoring service information situation pushing method and device based on knowledge graph
CN115168600B (en) * 2022-06-23 2023-07-11 广州大学 Value chain knowledge discovery method under personalized customization

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110955780A (en) * 2019-10-12 2020-04-03 中国人民解放军国防科技大学 Entity alignment method for knowledge graph
CN112100322A (en) * 2020-08-06 2020-12-18 复旦大学 API element comparison result automatic generation method based on knowledge graph

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
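A multi-dimensional subject knowledge network fusion method based on a graph convolutional autoencoder model; Li Hui et al.; Library and Information Service (图书情报工作); 2020-09-30; Vol. 64 (No. 18); pp. 114-125 *
Relation extraction for Chinese STEM course knowledge based on feature enhancement; Han Meng et al.; Application Research of Computers (计算机应用研究); 2020-06-30; Vol. 37 (No. S1); pp. 40-42 *
Construction of a knowledge discovery platform for scientific and technological big data based on knowledge graphs; Hu Jiying et al.; Data Analysis and Knowledge Discovery (数据分析与知识发现); 2019-01-31 (No. 1); pp. 55-62 *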

Also Published As

Publication number Publication date
CN112699246A (en) 2021-04-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant