CN113836289B - Entity evolution rule recommendation method and device - Google Patents

Entity evolution rule recommendation method and device

Info

Publication number
CN113836289B
Authority
CN
China
Prior art keywords
entity
semantic
search content
node
double
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110938544.2A
Other languages
Chinese (zh)
Other versions
CN113836289A (en
Inventor
杜军平
黄恩一
薛哲
徐欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN202110938544.2A
Publication of CN113836289A
Application granted
Publication of CN113836289B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/335 - Filtering based on additional data, e.g. user or group profiles
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/3331 - Query processing
    • G06F16/3332 - Query translation
    • G06F16/3338 - Query expansion
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/3331 - Query processing
    • G06F16/334 - Query execution
    • G06F16/3344 - Query execution using natural language analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 - Details of database functions independent of the retrieved data types
    • G06F16/901 - Indexing; Data structures therefor; Storage structures
    • G06F16/9024 - Graphs; Linked lists
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

One or more embodiments of the present disclosure provide a method and an apparatus for recommending entity evolution rules. The method includes: searching according to an input entity keyword to obtain search content related to the entity keyword; performing semantic expansion on the search content to obtain semantically expanded search content; extracting semantic features from the semantically expanded search content; determining semantic association relationships among the semantic features; constructing a graph structure according to the semantic association relationships and clustering based on the graph structure to obtain at least one entity group; and calculating the heat value of the entity group in which the entity keyword is located and outputting a recommendation result according to the heat value calculation result. The method can recommend to the user the research heat and the evolution rules of entities related to the entity keyword, with higher computational efficiency and accurate recommendation results.

Description

Entity evolution rule recommendation method and device
Technical Field
One or more embodiments of the present disclosure relate to the field of artificial intelligence technologies, and in particular, to a method and an apparatus for recommending entity evolution rules.
Background
At present, some personalized recommendation scenarios require recommending, for an input entity keyword, the related research hotspots and the way their evolution rules change. To provide such recommendations, the entity keyword is analyzed with an evolution rule analysis algorithm. Existing evolution rule analysis algorithms have high complexity, cannot screen out entities whose association relationships are not close, and produce analysis results that contain a certain error.
Disclosure of Invention
In view of this, an object of one or more embodiments of the present disclosure is to provide an entity evolution rule recommendation method, which can accurately recommend an evolution rule condition of an entity.
Based on the above objects, one or more embodiments of the present disclosure provide a method for recommending entity evolution rules, including:
searching and obtaining search content related to the entity keywords according to the input entity keywords;
performing semantic expansion on the search content to obtain search content after semantic expansion;
extracting semantic features from the semantically expanded search content;
determining semantic association relationships among the semantic features;
constructing a graph structure according to the semantic association relationship, and clustering based on the graph structure to obtain at least one entity group;
and calculating the heat value of the entity group where the entity keyword is located, and outputting a recommendation result according to the heat value calculation result.
Optionally, the search content includes an entity title and entity content;
performing semantic expansion on each piece of search content to obtain search content after semantic expansion, wherein the method comprises the following steps:
calculating the similarity of the entity title and the entity content;
and, in response to the similarity meeting a preset expansion condition, expanding the entity title according to the entity content to obtain the semantically expanded search content.
Optionally, calculating the similarity between the entity title and the entity content comprises: calculating the Jaccard distance between the entity title and the entity content;
and the similarity meets the preset expansion condition when the Jaccard distance is greater than a preset first distance threshold and less than a preset second distance threshold.
Optionally, after extracting the semantic features from the semantically expanded search content, the method further includes:
and carrying out feature dimension reduction processing on the semantic features to obtain dimension-reduced semantic features.
Optionally, determining the semantic association relationship between the semantic features is:
and taking the semantic features as input, and outputting semantic association relations among the semantic features by utilizing a multi-channel attention model.
Optionally, constructing a graph structure according to the semantic association relationship, clustering based on the graph structure to obtain at least one entity group, including:
constructing an entity bipartite graph according to the semantic association relationship;
for each entity node in the entity bipartite graph:
calculating the similarity between the entity node and other entity nodes, and storing other entity nodes with the similarity larger than a preset threshold value in a similarity matrix of the entity nodes;
determining all biconnected components of the graph structure formed by the entity nodes in the similarity matrix;
and, in response to a biconnected component meeting a preset connectivity condition, classifying each entity node of that biconnected component into the entity group of the entity node.
Optionally, in response to the biconnected component meeting the preset connectivity condition, classifying each entity node of the biconnected component into the entity group of the entity node includes:
in response to the number of nodes in the biconnected component being greater than a preset upper limit threshold, performing a minimum cut on the biconnected component until the number of nodes in each resulting biconnected component is less than or equal to the upper limit threshold;
for each preprocessed biconnected component whose number of nodes is less than or equal to the upper limit threshold, in response to the number of nodes being less than a preset reasonable threshold, performing a minimum cut on the preprocessed biconnected component to obtain a reasonable biconnected component;
and classifying the entity nodes in the reasonable biconnected components into entity groups.
Optionally, calculating a heat value of the entity group where the entity keyword is located, and outputting a recommendation result according to the heat value calculation result, including:
determining each entity node in the entity group where the entity keyword is located;
according to the search content of each entity node, respectively calculating the heat value of each entity node;
and outputting the recommended results according to the order of the heat value of each entity node from large to small.
Optionally, the search content includes time information;
clustering based on the graph structure to obtain at least one entity group comprises:
clustering based on the graph structure and the time information to obtain at least one entity group;
and outputting the recommendation results in descending order of the heat value of each entity node comprises:
outputting the recommendation results within a specific time period in descending order of the heat value of each entity node.
The embodiment of the specification also provides a device for recommending entity evolution rules, which comprises:
the searching module is used for searching and obtaining searching contents related to the entity keywords according to the input entity keywords;
the semantic expansion module is used for carrying out semantic expansion on the search content to obtain search content after semantic expansion;
the feature extraction module is used for extracting semantic features from the search content after semantic expansion;
the semantic association module is used for determining semantic association relations among the semantic features;
the clustering module is used for constructing a graph structure according to the semantic association relationship, and clustering based on the graph structure to obtain at least one entity group;
and the output module is used for calculating the heat value of the entity group where the entity keyword is located and outputting a recommendation result according to the heat value calculation result.
From the above, it can be seen that, according to the entity evolution rule recommendation method and device provided in one or more embodiments of the present disclosure, search content related to an entity keyword is obtained by searching according to the input entity keyword, semantic expansion is performed on the search content to obtain the search content after semantic expansion, semantic features are extracted from the search content after semantic expansion, semantic association relationships between the semantic features are determined, a graph structure is constructed according to the semantic association relationships, clustering is performed based on the graph structure, at least one entity group is obtained, a heat value of the entity group where the entity keyword is located is calculated, and a recommendation result is output according to the heat value calculation result. The method of the embodiment can recommend the research heat and the evolution rule of the related entity keywords to the user, and has the advantages of higher calculation efficiency and accurate recommendation result.
Drawings
To describe one or more embodiments of this specification or the prior-art solutions more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only one or more embodiments of this specification; a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic flow diagram of a method of one or more embodiments of the present disclosure;
FIG. 2 is a schematic diagram of an autoencoder model of one or more embodiments of the present disclosure;
FIG. 3 is a schematic diagram of an entity bipartite graph in accordance with one or more embodiments of the present disclosure;
FIG. 4 is a schematic diagram of an apparatus according to one or more embodiments of the present disclosure;
FIG. 5 is a schematic structural diagram of an electronic device according to one or more embodiments of the present disclosure.
Detailed Description
For the purposes of promoting an understanding of the principles and advantages of the disclosure, reference will now be made to the embodiments illustrated in the drawings and specific language will be used to describe the same.
It is noted that, unless otherwise defined, technical or scientific terms used in one or more embodiments of the present disclosure have the ordinary meaning understood by a person of ordinary skill in the art to which the present disclosure pertains. The terms "first," "second," and the like used in one or more embodiments of this specification do not denote any order, quantity, or importance; they are used only to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. The terms "connected" or "coupled," and the like, are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Upper", "lower", "left", "right", etc. are used merely to indicate relative positional relationships, which may change accordingly when the absolute position of the described object changes.
As described in the background section, existing evolution rule analysis algorithms include those based on K-means clustering, those based on artificial neural networks, and those based on agglomerative hierarchical clustering. These algorithms require a large amount of computation and have high time complexity, and they do not screen out entities whose association relationships are not close. They store the association relationships between such entities in a hash table and obtain the result by traversing that storage structure, which wastes resources, reduces operating efficiency, and introduces errors into the result.
In view of this, the embodiments of the present disclosure provide a method for recommending entity evolution rules that constructs a graph structure according to the semantic association relationships of entities and analyzes the evolution rules of the entities through the graph structure. This reduces time complexity, screens out entities whose association relationships are not close, improves operating efficiency, and improves recommendation accuracy.
The technical scheme of the present disclosure is further described in detail below through specific examples.
As shown in fig. 1, one or more embodiments of the present disclosure provide a method for recommending entity evolution rules, including:
S101: searching according to the input entity keyword to obtain search content related to the entity keyword;
In this embodiment, a user inputs an entity keyword to be queried, and a plurality of pieces of search content related to the entity keyword are obtained through searching. For example, inputting the entity keyword "machine learning" may return several pieces of search content; each piece is related to machine learning, while any two pieces may or may not be associated with each other.
S102: performing semantic expansion on the search content to obtain semantically expanded search content;
S103: extracting semantic features from the semantically expanded search content;
S104: determining semantic association relationships among the semantic features;
In this embodiment, after the search content is obtained, semantic expansion is performed on each piece of search content in order to obtain a more comprehensive and accurate recommendation result; semantic features are then extracted from the semantically expanded search content; and after the semantic features of each piece of search content are extracted, the semantic association relationships among the semantic features of all pieces of search content are mined.
S105: constructing a graph structure according to the semantic association relationships, and clustering based on the graph structure to obtain at least one entity group;
In this embodiment, after the semantic association relationships of the search content are determined, a graph structure is constructed according to them, and clustering is performed on the basis of the graph structure to obtain the entity group corresponding to each entity node in the graph structure.
S106: calculating the heat value of the entity group in which the entity keyword is located, and outputting a recommendation result according to the heat value calculation result.
In this embodiment, after all entity groups are determined, the heat value of the entity group in which the entity keyword is located is calculated to obtain the heat value of each entity node in that entity group; a recommendation result is then determined according to the heat value calculation result and output to the user.
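The ranking step in S106 can be illustrated with a minimal sketch. The text above does not define how the heat value is computed, so the sketch assumes, purely for illustration, that the heat of an entity is the number of pieces of search content that mention it; the function name `rank_by_heat` and the sample data are hypothetical.

```python
from collections import Counter


def rank_by_heat(entity_group, search_contents):
    """Return (entity, heat) pairs sorted by heat, highest first.

    ASSUMPTION: heat = number of search-content items mentioning the entity.
    The source text does not specify the heat formula.
    """
    heat = Counter({entity: 0 for entity in entity_group})
    for text in search_contents:
        lowered = text.lower()
        for entity in entity_group:
            if entity.lower() in lowered:
                heat[entity] += 1
    # Descending heat; ties are broken alphabetically for determinism.
    return sorted(heat.items(), key=lambda pair: (-pair[1], pair[0]))


group = ["deep learning", "svm", "clustering"]
contents = ["Deep learning for images", "SVM vs deep learning", "A survey"]
ranking = rank_by_heat(group, contents)
```

Any other heat definition (for example, one weighted by publication time) could be substituted without changing the descending-order output step.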
The entity evolution rule recommending method provided by the embodiment comprises the steps of searching for search content related to entity keywords according to the input entity keywords, performing semantic expansion on the search content to obtain search content after semantic expansion, extracting semantic features from the search content after semantic expansion, determining semantic association relations among the semantic features, constructing a graph structure according to the semantic association relations, clustering based on the graph structure to obtain at least one entity group, calculating the heat value of the entity group where the entity keywords are located, and outputting recommending results according to the heat value calculation result. The method of the embodiment utilizes the graph structure to cluster, so that the operation efficiency can be improved, the constructed graph structure screens out entities with non-compact association relations, and the recommendation accuracy can be improved.
In some embodiments, the search content includes an entity title and entity content;
performing semantic expansion on each piece of search content to obtain semantically expanded search content includes:
calculating the similarity between the entity title and the entity content;
and, in response to the similarity meeting a preset expansion condition, expanding the entity title according to the entity content to obtain the semantically expanded search content.
In this embodiment, the search content obtained by searching according to the entity keyword includes two parts: an entity title and entity content. In general, the entity title is a short sentence, and the entity content is detailed content related to the entity title, such as complete sentences, paragraphs, and pictures; the detailed content includes additional information about the entity, such as time and entity items. Each piece of search content is semantically expanded only when expansion is needed. Specifically, for each piece of search content, the similarity between the entity title and the entity content is calculated. If the similarity meets the expansion condition, the piece of search content needs semantic expansion, and the entity title is semantically expanded according to the entity content; if the expansion condition is not met, no semantic expansion is performed.
In some implementations, the similarity between the entity title and the entity content is calculated as the Jaccard distance between them. Whether the similarity meets the expansion condition is determined by checking whether the Jaccard distance is greater than a preset first distance threshold and less than a preset second distance threshold; if so, the search content needs semantic expansion.
The Jaccard distance between the entity title and the entity content is calculated by the following formula:
$$J(T_i, C_{i,j}) = \frac{|T_i \cap C_{i,j}|}{|T_i \cup C_{i,j}|} \tag{1}$$
where $T_i$ is the feature of the i-th word in the word set of the entity title (the set of all words contained in the entity title), $C_{i,j}$ is the joint feature of the i-th word and the j-th word in the word set of the entity content (the set of all words contained in the entity content), and $|\cdot|$ is the number of words in a set.
A first distance threshold α and a second distance threshold β are set, with α less than β. When $J(T_i, C_{i,j}) < \alpha$, the semantic features of the entity title and the entity content differ greatly, and semantically expanding the entity title according to the entity content would strongly affect the final recommendation result. For example, the search content is a piece of technology requirement information issued by a company, whose entity title is "turbocharger design" and whose entity content includes "release time: 2020-01-23; release company: Company A; project funds: X ten thousand yuan", etc. In this search content, the semantic features of the entity title and the entity content are only weakly correlated, and the meaning changes considerably after semantic expansion, which may cause semantic deviation.
When $J(T_i, C_{i,j}) > \beta$, the semantic features of the entity title and the entity content are very close, and semantically expanding the entity title according to the entity content would not noticeably affect the recommendation result.
In view of this, to avoid adverse effects of semantic expansion on the final result, this embodiment sets the expansion condition to $\alpha < J(T_i, C_{i,j}) < \beta$; that is, when the calculated Jaccard distance is greater than the first distance threshold and less than the second distance threshold, the entity title is semantically expanded according to the entity content so as to obtain an accurate and comprehensive recommendation result. Optionally, the values of the first distance threshold α and the second distance threshold β may be determined through repeated testing to obtain optimal values, which this embodiment does not limit.
For example, in a published piece of technology requirement information, the entity title is "rdma technology-based distributed communication engine development", and the entity content includes "rdma is an abbreviation of Remote Direct Memory Access, which means remote direct data access, and was created to address the delay of server-side data processing in network transmission. The company is developing a new communication engine based on rdma technology, implemented in a distributed manner with zookeeper, and welcomes interested parties to join." Calculating the similarity between the entity title and the entity content shows that it meets the expansion condition, so the entity content is appended to the entity title to obtain the semantically expanded search content. In some implementations, to prevent the semantically expanded search content from becoming too long and reducing computational efficiency, a length threshold may be set so that the length of the expanded search content does not exceed it.
In this embodiment, whether semantic expansion is needed is determined for each piece of search content. Search content with little correlation between the entity title and the entity content is not semantically expanded; such search content is screened out so that it does not affect subsequent processing. Search content with a strong correlation between the entity title and the entity content is also not expanded, which avoids wasting resources. Search content that meets the expansion condition is expanded reasonably, so that a comprehensive and accurate recommendation result is obtained.
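The thresholded expansion rule can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: J is taken to be the word-set overlap ratio |T ∩ C| / |T ∪ C| (consistent with the text, where a small value means the title and content differ greatly), words are obtained by whitespace splitting (Chinese text would need a word segmenter), and the names `jaccard`, `expand_title` and the default values of `alpha`, `beta`, and `max_len` are hypothetical.

```python
def jaccard(a, b):
    """Overlap ratio |A ∩ B| / |A ∪ B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def expand_title(title, content, alpha=0.1, beta=0.8, max_len=200):
    """Append the content to the title only when alpha < J < beta.

    ASSUMPTIONS: whitespace tokenization, illustrative thresholds, and a
    character-based length cap standing in for the length threshold.
    """
    title_words = set(title.lower().split())
    content_words = set(content.lower().split())
    score = jaccard(title_words, content_words)
    if alpha < score < beta:
        return (title + " " + content)[:max_len]
    return title  # too dissimilar or too similar: leave the title as-is


unrelated = expand_title("turbocharger design", "release time 2020-01-23 company A")
related = expand_title("rdma communication engine",
                       "rdma engine development with zookeeper")
identical = expand_title("graph clustering", "graph clustering")
```

In the second call the two word sets share 2 of 6 distinct words, so the score falls between the thresholds and the title is expanded; the first and third calls return the title unchanged.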
In some embodiments, after extracting the semantic features from the semantically expanded search content, the method further comprises:
and carrying out feature dimension reduction processing on the semantic features to obtain dimension-reduced semantic features.
In this embodiment, after the search content is semantically expanded, semantic features are extracted from it using a bag-of-words model. The extracted semantic features are high-dimensional, so to reduce redundant information and processing complexity, feature dimension reduction is performed on them to obtain dimension-reduced semantic features.
In some implementations, the semantic features of each dimension extracted by the bag-of-words model are normalized to obtain normalized semantic features, expressed as follows:
[Formula (2): normalization of $x_j$ to $v_j$; formula image not reproduced]
where $x_j$ denotes the semantic feature of the j-th word, $x_n$ denotes the semantic feature of the n-th dimension, $v_j$ denotes the semantic feature of the j-th word after normalization, $n$ denotes the n-th dimension, and $k$ is the number of dimensions.
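A small sketch may help make the feature extraction and normalization concrete. Because formula (2) is not available in this text, the sketch assumes sum normalization (each count divided by the sum over all k dimensions); this choice, the whitespace tokenization, and the names `bag_of_words` and `normalize` are assumptions for illustration only.

```python
from collections import Counter


def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def normalize(x):
    """ASSUMED form of the normalization: v_j = x_j / sum_n x_n."""
    total = sum(x)
    if total == 0:
        return [0.0] * len(x)  # avoid division by zero for empty texts
    return [x_j / total for x_j in x]


vocab = ["rdma", "engine", "design"]
x = bag_of_words("rdma engine rdma", vocab)
v = normalize(x)
```

Other normalizations (for example, dividing by the Euclidean norm) would fit the same variable definitions; only the body of `normalize` would change.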
The normalized semantic features form feature vectors, which are input into an autoencoder model for feature dimension reduction. The autoencoder model comprises an encoder and a decoder: the encoder maps the input feature vectors to encoded vectors, and the decoder maps the encoded vectors to reconstructed feature vectors.
As shown in FIG. 2, in this embodiment the autoencoder model includes a first codec layer, a second codec layer, and a third codec layer. The first codec layer performs the first feature dimension reduction to obtain feature vectors after the first dimension reduction, the second codec layer extracts feature vectors from the feature vectors after the first dimension reduction, and the third codec layer performs the second feature dimension reduction on the extracted feature vectors to obtain the dimension-reduced feature vectors.
In some implementations, the excitation function of the encoder of the first codec layer is:
$$f(z_j) = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}} \tag{3}$$
where $z_j$ denotes the feature vector of the j-th word, $z_k$ denotes the feature vector of the k-th dimension, $K$ is the number of dimensions, $e^{z_j}$ denotes the feature vector $z_j$ after exponentiation, and $e^{z_k}$ denotes the feature vector $z_k$ after exponentiation.
The loss function of the encoder of the first codec layer is:
$$J = -\sum_i v_i \log o_i \tag{4}$$
where $v_i$ denotes the feature vector of the i-th word, and $o_i$ denotes the degree of loss of the i-th word.
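The first codec layer's functions can be illustrated numerically. Formula (4) has the form of a cross-entropy loss, and the description of the excitation function (exponentiated features normalized over K dimensions) matches a softmax; treating it as softmax is an assumption of this sketch, as are the function names and the small `eps` guard.

```python
import math


def softmax(z):
    """Exponentiate each entry and divide by the sum over all K entries."""
    m = max(z)  # subtracting the maximum avoids overflow, result unchanged
    exps = [math.exp(value - m) for value in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(v, o, eps=1e-12):
    """J = -sum_i v_i * log(o_i), as in formula (4); eps guards log(0)."""
    return -sum(v_i * math.log(o_i + eps) for v_i, o_i in zip(v, o))


o = softmax([0.0, 0.0])
loss = cross_entropy([1.0, 0.0], o)
```

With two equal inputs the softmax output is uniform, so the loss for a one-hot target equals log 2.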
The excitation function of the encoder of the second codec layer is:
$$f(t) = \frac{1}{1 + e^{-t}} \tag{5}$$
where $t$ denotes the weight of the excitation function, and $e^{-t}$ denotes the weight after exponentiation.
The loss function of the encoder of the second codec layer is:
[Formula (6): loss function of the encoder of the second codec layer; formula image not reproduced]
where $w_l$ denotes the excitation function value of the l-th word calculated by the first codec layer, and $p_l$ denotes the loss function value of the l-th word calculated by the first codec layer.
The features of the hidden layer are decoded, and the autoencoder model is trained without supervision using the loss function shown in formula (6), by minimizing the distance between the output feature vector and the input feature vector. After feature dimension reduction is performed with the trained autoencoder model, the output of the model's hidden layer is the dimension-reduced semantic feature. Note that the autoencoder model above is only an example whose specific structure and principle are not detailed here; in other implementations, other models may be used for feature dimension reduction, and the specific model is not limited.
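The idea of training an autoencoder by minimizing the distance between output and input, then reading the hidden layer as the reduced feature, can be sketched in plain Python. This is a deliberately simplified stand-in, not the three-layer model of FIG. 2: it assumes a single sigmoid hidden layer, a linear decoder, a squared-error reconstruction loss, and per-sample gradient descent. The class name `TinyAutoencoder`, the learning rate, and the sample data are hypothetical.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


class TinyAutoencoder:
    """One sigmoid hidden layer and a linear decoder (illustrative only)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
        self.b2 = [0.0] * n_in

    def encode(self, x):
        # Hidden-layer output: this is the dimension-reduced feature.
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def decode(self, h):
        return [sum(w * hj for w, hj in zip(row, h)) + b
                for row, b in zip(self.w2, self.b2)]

    def loss(self, x):
        y = self.decode(self.encode(x))
        return 0.5 * sum((yi - xi) ** 2 for yi, xi in zip(y, x))

    def train_step(self, x, lr=0.1):
        h = self.encode(x)
        y = self.decode(h)
        dy = [yi - xi for yi, xi in zip(y, x)]  # gradient w.r.t. the output
        # Back-propagate through the decoder before its weights are changed.
        dh = [sum(self.w2[i][j] * dy[i] for i in range(len(x))) for j in range(len(h))]
        dpre = [dh[j] * h[j] * (1.0 - h[j]) for j in range(len(h))]
        for i in range(len(x)):
            for j in range(len(h)):
                self.w2[i][j] -= lr * dy[i] * h[j]
            self.b2[i] -= lr * dy[i]
        for j in range(len(h)):
            for k in range(len(x)):
                self.w1[j][k] -= lr * dpre[j] * x[k]
            self.b1[j] -= lr * dpre[j]


data = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.5, 0.5, 0.0], [0.0, 0.0, 0.5, 0.5]]
model = TinyAutoencoder(n_in=4, n_hidden=2)
loss_before = sum(model.loss(x) for x in data)
for _ in range(300):
    for x in data:
        model.train_step(x)
loss_after = sum(model.loss(x) for x in data)
reduced = [model.encode(x) for x in data]  # 4-dimensional inputs -> 2 dimensions
```

The excitation and loss functions of formulas (3) to (6) could be substituted for the sigmoid and squared error used here; the encode, decode, and minimize-reconstruction-distance structure stays the same.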
In some embodiments, determining the semantic association relationships among the semantic features comprises:
taking the semantic features as input, and outputting the semantic association relationships among the semantic features by using a multi-channel attention model.
In this embodiment, after the semantic features of all the search content are determined, the semantic features are used as the input of the multi-channel attention model, and the multi-channel attention model outputs the semantic association relationships among the semantic features. The specific structure and principle of the multi-channel attention model are not described in detail in this embodiment.
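Since the structure of the multi-channel attention model is not specified, the following Python sketch is only one possible reading. It assumes that each channel applies scaled dot-product attention to a slice of the feature dimensions and that the per-channel weights are averaged into an association matrix; there are no learned projections, and all names and sample values are hypothetical.

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def channel_attention(features, start, end):
    """Scaled dot-product attention weights over feature dimensions [start, end)."""
    width = end - start
    weights = []
    for query in features:
        scores = [
            sum(query[i] * key[i] for i in range(start, end)) / math.sqrt(width)
            for key in features
        ]
        weights.append(softmax(scores))
    return weights

def association_matrix(features, num_channels):
    """Average the attention weights of all channels into one n x n matrix."""
    dim = len(features[0])
    if dim % num_channels != 0:
        raise ValueError("feature dimension must be divisible by num_channels")
    width = dim // num_channels
    n = len(features)
    total = [[0.0] * n for _ in range(n)]
    for c in range(num_channels):
        weights = channel_attention(features, c * width, (c + 1) * width)
        for i in range(n):
            for j in range(n):
                total[i][j] += weights[i][j] / num_channels
    return total

# Hypothetical semantic features of three entity words.
features = [
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 0.9, 0.0],
    [0.0, 1.0, 0.0, 1.0],
]
A = association_matrix(features, num_channels=2)
```

Here `A[i][j]` can be read as the strength of the association from entity word i to entity word j; each row sums to 1.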
In some embodiments, constructing a graph structure according to the semantic association relationships and clustering based on the graph structure to obtain at least one entity group comprises:
constructing an entity bipartite graph according to the semantic association relationships;
for each entity node in the entity bipartite graph:
calculating the similarity between the entity node and each other entity node, and storing the other entity nodes whose similarity is greater than a preset threshold in a similarity matrix of the entity node;
determining all biconnected components of the graph structure formed by the entity nodes in the similarity matrix;
and in response to a biconnected component satisfying a preset connectivity condition, classifying each entity node of the biconnected component into the entity group of the entity node.
In this embodiment, after the semantic association relationships among the semantic features of the search content are determined, an entity bipartite graph is constructed from them. Each entity node of the entity bipartite graph corresponds to an entity word in the search content, and an edge between two entity nodes indicates that the two entity words have a semantic association relationship. Based on the entity bipartite graph, for each entity node, the similarity between that entity node and every other entity node is calculated, and the other entity nodes whose similarity is greater than the threshold are stored in the similarity matrix corresponding to that entity node. After all other entity nodes have been traversed, the similarity matrix holds the entity nodes that have a certain degree of similarity to that entity node. The entity nodes in the similarity matrix then form a graph structure, and the biconnected components of this graph structure are determined. Each biconnected component is checked in turn against the connectivity condition, and every entity node on a biconnected component that satisfies the condition is stored in the entity group of that entity node. This completes the clustering of the entity node with its related entity nodes, and the semantic features of the entity nodes within an entity group have a certain degree of similarity.
In some implementations, the similarity matrix is a hash table data structure, which can also store nodes whose association is not close. Because the same node may appear in the entity groups of different nodes, the biconnected components need to be computed and analyzed.
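The thresholding and biconnected-component steps described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: it uses the standard Hopcroft-Tarjan edge-stack algorithm, represents the hash-table "similarity matrix" as an adjacency dictionary, and takes the threshold 0.5 as an assumption. The entity words "deep learning" and "statistics" and their scores are hypothetical additions to the example of Fig. 3.

```python
from collections import defaultdict

def build_similarity_graph(similarity, threshold):
    """Keep an undirected edge for every pair whose similarity exceeds the threshold."""
    graph = defaultdict(set)
    for (a, b), score in similarity.items():
        if score > threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def biconnected_components(graph):
    """Return the node set of every biconnected component (Hopcroft-Tarjan)."""
    index, low = {}, {}
    edge_stack, components = [], []
    counter = 0

    def dfs(u, parent):
        nonlocal counter
        index[u] = low[u] = counter
        counter += 1
        for v in graph[u]:
            if v == parent:
                continue
            if v not in index:
                edge_stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= index[u]:
                    # u separates the subtree of v: pop one complete component
                    component = set()
                    while True:
                        a, b = edge_stack.pop()
                        component.update((a, b))
                        if (a, b) == (u, v):
                            break
                    components.append(component)
            elif index[v] < index[u]:
                # back edge to an ancestor
                edge_stack.append((u, v))
                low[u] = min(low[u], index[v])

    for node in list(graph):
        if node not in index:
            dfs(node, None)
    return components

similarity = {
    ("machine learning", "data mining"): 0.98,
    ("machine learning", "deep learning"): 0.90,
    ("data mining", "deep learning"): 0.85,
    ("deep learning", "statistics"): 0.70,
    ("machine learning", "machine manufacturing"): 0.32,
}
graph = build_similarity_graph(similarity, threshold=0.5)
components = biconnected_components(graph)
```

In this example "deep learning" belongs to two components, which illustrates why the same node may appear in more than one candidate group. The recursive search is adequate for small graphs; a large graph would call for an iterative version.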
In some embodiments, in response to the biconnected component satisfying a preset connectivity condition, classifying each entity node of the biconnected component into the entity group of the entity node comprises:
in response to the number of nodes on the biconnected component being greater than a preset upper threshold, performing minimum cut calculation on the biconnected component until the number of nodes on each resulting biconnected component is less than or equal to the upper threshold;
for each preprocessed biconnected component whose number of nodes is less than or equal to the upper threshold, in response to the number of nodes being less than a preset reasonable threshold, performing minimum cut calculation on the preprocessed biconnected component to obtain a reasonable biconnected component;
classifying the entity nodes on the reasonable biconnected component into the entity group.
In this embodiment, each biconnected component determined from the graph structure is checked against the connectivity condition. If the number of nodes on a biconnected component is greater than the upper threshold, minimum cut calculation is performed on it until the number of nodes on each resulting biconnected component is less than or equal to the upper threshold. Then, for every biconnected component whose number of nodes is less than or equal to the upper threshold, it is determined whether the number of nodes is less than the reasonable threshold; if so, minimum cut calculation is performed again to obtain reasonable biconnected components. Screening with the reasonable threshold removes unreasonable entity nodes and improves accuracy. Each entity node on a reasonable biconnected component, obtained through this preliminary processing by node count and the filtering of unreasonable entity nodes, is added to the entity group, which completes the clustering of the entity nodes. Entity nodes belonging to the same group have similar semantic features, and filtering out unreasonable entity nodes reduces errors.
Optionally, the number of nodes on a reasonable biconnected component is greater than 2 and less than the upper threshold.
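The size-based splitting can be sketched as follows, under several assumptions. The sketch covers only the upper-threshold stage and the optional size filter; the second cut driven by the reasonable threshold is not reproduced. The minimum cut is found by brute force over all two-way splits, a stand-in for a real algorithm such as Stoer-Wagner that is practical only for small components. Edge weights are taken to be the similarities, and all names and values are hypothetical.

```python
from itertools import combinations

def cut_weight(side_a, side_b, weights):
    """Total weight of the edges that cross between the two node sets."""
    return sum(weights.get(frozenset((a, b)), 0.0) for a in side_a for b in side_b)

def minimum_cut(nodes, weights):
    """Brute-force minimum cut over every two-way split (exponential cost)."""
    ordered = sorted(nodes)
    first, rest = ordered[0], ordered[1:]
    best = None
    for size in range(len(rest)):  # side A always holds `first`; side B is never empty
        for extra in combinations(rest, size):
            side_a = {first, *extra}
            side_b = set(ordered) - side_a
            weight = cut_weight(side_a, side_b, weights)
            if best is None or weight < best[0]:
                best = (weight, side_a, side_b)
    return best[1], best[2]

def split_until_within_limit(component, weights, upper_limit):
    """Cut repeatedly until every part has at most `upper_limit` (>= 1) nodes."""
    if len(component) <= upper_limit:
        return [set(component)]
    side_a, side_b = minimum_cut(component, weights)
    return (split_until_within_limit(side_a, weights, upper_limit)
            + split_until_within_limit(side_b, weights, upper_limit))

def entity_groups(components, weights, upper_limit, min_size=2):
    """Keep the parts whose node count is greater than `min_size`."""
    groups = []
    for component in components:
        for part in split_until_within_limit(component, weights, upper_limit):
            if len(part) > min_size:
                groups.append(part)
    return groups

# Two tightly linked triangles joined by two weaker edges.
edges = {
    ("A", "B"): 0.9, ("B", "C"): 0.9, ("A", "C"): 0.9,
    ("D", "E"): 0.9, ("E", "F"): 0.9, ("D", "F"): 0.9,
    ("C", "D"): 0.6, ("A", "F"): 0.55,
}
weights = {frozenset(pair): value for pair, value in edges.items()}
groups = entity_groups([{"A", "B", "C", "D", "E", "F"}], weights, upper_limit=4)
```

With an upper limit of 4, the six-node component is cut along its two weakest edges into the two triangles. The parts produced by a cut are treated here simply as node sets; the sketch does not re-check that each part is itself biconnected.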
For example, as shown in Fig. 3, the input entity keyword is "machine learning", and the obtained search content includes entity words such as "data mining", "machine manufacturing", and "medicine". An entity bipartite graph is constructed according to the semantic association relationships among the entity words, and the similarity between entity words in the entity bipartite graph is calculated: for example, the similarity between "machine learning" and "data mining" is 0.98, the similarity between "machine learning" and "machine manufacturing" is 0.32, and the similarity between "machine learning" and "medicine" is 0.47. All biconnected components are determined from the graph structure built from the entity words whose similarity is greater than the threshold, and checking the connectivity condition yields entity group 1 and entity group 2, where entity group 1 includes the entity words "machine learning", "data mining", and "medicine", and entity group 2 includes the entity word "machine manufacturing".
In some embodiments, calculating the heat value of the entity group in which the entity keyword is located, and outputting a recommendation result according to the heat value calculation result, comprises:
determining each entity node in the entity group in which the entity keyword is located;
calculating the heat value of each entity node according to the search content of that entity node;
and outputting the recommendation results in descending order of the heat value of each entity node.
In this embodiment, after multiple entity groups are obtained through clustering, the entity group in which the entity keyword is located is determined; each entity word in this entity group has a certain degree of similarity to the entity keyword. The heat value of each entity word in the entity group is then calculated, and once the heat values of all entity words have been calculated, the recommendation results are output in order of heat value.
In some implementations, the heat value of an entity word may be calculated from the search content in which the entity word appears. For example, suppose the search content containing the entity word "data mining" is "X data mining project; release time: 2020-01-23; release company: Company A; project funds: X ten thousand yuan". The heat value of the entity word "data mining" is then: (current time - release time) / time base + project funds / reward base + total number of technology requirement information items released by the release company. It should be noted that the heat value calculation may be configured according to the specific application field and requirements, and the calculation method is not specifically limited.
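As a worked illustration of the example heat value formula, the following Python sketch evaluates it for one search result. The units (days for time, ten thousand yuan for funds), the default values of the time base and reward base, and the sample figures are all assumptions; the text leaves them open.

```python
from datetime import date

def heat_value(release_date, project_funds, company_release_count, today,
               time_base=365.0, reward_base=10.0):
    """(current time - release time) / time base
    + project funds / reward base
    + number of items released by the release company."""
    elapsed_days = (today - release_date).days
    return (elapsed_days / time_base
            + project_funds / reward_base
            + company_release_count)

# Hypothetical project: released 2020-01-23, funds of 50 (ten thousand yuan),
# released by a company with 3 published requirement items.
h = heat_value(
    release_date=date(2020, 1, 23),
    project_funds=50.0,
    company_release_count=3,
    today=date(2021, 1, 22),
)
```

With these figures the three terms are 365 / 365, 50 / 10, and 3. Note that under the formula as written, the time term grows as the release gets older, so the choice of time base determines how strongly age influences the result.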
In some embodiments, after the heat value of each entity node is calculated, the method further includes:
determining a comprehensive heat value of each entity node according to the similarity between the entity node and the entity keyword and the heat value of the entity node;
outputting the recommendation results in descending order of the heat value of each entity node then comprises: outputting the recommendation results in descending order of the comprehensive heat value of each entity node.
In this embodiment, the recommendation order of the entity nodes within the same entity group takes into account not only the heat value of each entity node but also the similarity between the entity node and the entity keyword, yielding a comprehensive recommendation result. For example, weights are assigned to the entity nodes in descending order of similarity, so that a more similar node receives a larger weight; the comprehensive heat value is calculated from the weight and the heat value of each entity node; and the comprehensive recommendation result is obtained by sorting the comprehensive heat values in descending order.
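The rank-based weighting described above might look as follows in Python. The text does not fix how weights are assigned or combined with the heat value, so the linear weights (n - rank) / n and the product of weight and heat value are assumptions, and the sample data are hypothetical.

```python
def comprehensive_ranking(nodes):
    """nodes: list of (name, similarity_to_keyword, heat_value) tuples.
    Returns (name, comprehensive_heat) pairs in descending order."""
    n = len(nodes)
    by_similarity = sorted(nodes, key=lambda item: item[1], reverse=True)
    scored = []
    for rank, (name, _similarity, heat) in enumerate(by_similarity):
        weight = (n - rank) / n  # 1.0 for the most similar node, 1/n for the least
        scored.append((name, weight * heat))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored

nodes = [
    ("data mining", 0.98, 6.0),
    ("medicine", 0.47, 9.0),
    ("deep learning", 0.90, 7.5),
]
ranking = comprehensive_ranking(nodes)
names = [name for name, _ in ranking]
```

In this example "medicine" has the highest raw heat value but the lowest similarity, so it drops to last place in the comprehensive ranking.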
In some embodiments, the search content includes time information;
clustering based on the graph structure to obtain at least one entity group comprises: clustering based on the graph structure and the time information to obtain at least one entity group;
outputting the recommendation results in descending order of the heat value of each entity node comprises: outputting the recommendation results within a specific time period in descending order of the heat value of each entity node.
In this embodiment, a recommendation result for a specific time period can be obtained. Specifically, the search content includes time information (for example, release time), entity groups are obtained by clustering that takes the time information into account, and the heat value of the entity group in which the entity keyword is located is calculated. The recommendation results for the specific time period are then output in order of the heat value of each entity node in the entity group, or comprehensive recommendation results for the specific time period are output according to the heat value of each entity node and its similarity to the entity keyword. For example, based on the graph structure, entity nodes within 2019-2010 are classified into one entity group, and a recommendation result within the year is obtained.
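One simple way to obtain a time-restricted recommendation is sketched below. It assumes that the time information is a release date and that restricting to a period means keeping only the entity nodes whose release date falls inside it; how the time information enters the clustering itself is not specified in the text. The field names and sample data are hypothetical.

```python
from datetime import date

def recommend_in_period(nodes, start, end):
    """nodes: list of dicts with 'name', 'release_date', and 'heat' keys.
    Returns the names released within [start, end], by descending heat value."""
    in_period = [n for n in nodes if start <= n["release_date"] <= end]
    in_period.sort(key=lambda n: n["heat"], reverse=True)
    return [n["name"] for n in in_period]

nodes = [
    {"name": "data mining", "release_date": date(2020, 1, 23), "heat": 9.0},
    {"name": "deep learning", "release_date": date(2019, 6, 1), "heat": 7.5},
    {"name": "medicine", "release_date": date(2020, 8, 15), "heat": 9.6},
]
result = recommend_in_period(nodes, date(2020, 1, 1), date(2020, 12, 31))
```

The same filter can be applied before a comprehensive ranking when similarity is also to be taken into account.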
It should be noted that the methods of one or more embodiments of the present description may be performed by a single device, such as a computer or a server. The method of this embodiment may also be applied in a distributed scenario, where it is performed by multiple devices cooperating with one another. In such a distributed scenario, one of the devices may perform only one or more steps of the methods of one or more embodiments of the present description, and the devices interact with one another to complete the method.
It should be noted that the foregoing describes specific embodiments of the present invention. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
As shown in Fig. 4, the embodiment of the present disclosure further provides an entity evolution rule recommendation device, including:
a search module, configured to search, according to an input entity keyword, to obtain search content related to the entity keyword;
a semantic expansion module, configured to perform semantic expansion on the search content to obtain semantically expanded search content;
a feature extraction module, configured to extract semantic features from the semantically expanded search content;
a semantic association module, configured to determine semantic association relationships among the semantic features;
a clustering module, configured to construct a graph structure according to the semantic association relationships and to cluster based on the graph structure to obtain at least one entity group;
and an output module, configured to calculate the heat value of the entity group in which the entity keyword is located and to output a recommendation result according to the heat value calculation result.
For convenience of description, the above devices are described as being functionally divided into various modules, respectively. Of course, the functions of each module may be implemented in one or more pieces of software and/or hardware when implementing one or more embodiments of the present description.
The device of the foregoing embodiment is configured to implement the corresponding method in the foregoing embodiment and has the beneficial effects of the corresponding method embodiment, which are not repeated here.
Fig. 5 shows a more specific hardware architecture of an electronic device according to this embodiment. The device may include a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. The processor 1010, the memory 1020, the input/output interface 1030, and the communication interface 1040 are communicatively connected to one another within the device via the bus 1050.
The processor 1010 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute relevant programs to implement the technical solutions provided in the embodiments of the present disclosure.
The memory 1020 may be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs; when the embodiments of the present description are implemented in software or firmware, the associated program code is stored in the memory 1020 and executed by the processor 1010.
The input/output interface 1030 is used to connect an input/output module for inputting and outputting information. The input/output module may be configured as a component in the device (not shown) or may be external to the device to provide the corresponding functionality. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various types of sensors, and the like, and the output devices may include a display, a speaker, a vibrator, indicator lights, and the like.
The communication interface 1040 is used to connect a communication module (not shown) to enable communication between the present device and other devices. The communication module may communicate in a wired manner (such as USB or network cable) or in a wireless manner (such as a mobile network, Wi-Fi, or Bluetooth).
Bus 1050 includes a path for transferring information between components of the device (e.g., processor 1010, memory 1020, input/output interface 1030, and communication interface 1040).
It should be noted that although the above-described device only shows processor 1010, memory 1020, input/output interface 1030, communication interface 1040, and bus 1050, in an implementation, the device may include other components necessary to achieve proper operation. Furthermore, it will be understood by those skilled in the art that the above-described apparatus may include only the components necessary to implement the embodiments of the present description, and not all the components shown in the drawings.
The electronic device of the foregoing embodiment is configured to implement the corresponding method in the foregoing embodiment and has the beneficial effects of the corresponding method embodiment, which are not repeated here.
The computer readable media of the present embodiments include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
Those of ordinary skill in the art will appreciate that: the discussion of any of the embodiments above is merely exemplary and is not intended to suggest that the scope of the disclosure, including the claims, is limited to these examples; combinations of features of the above embodiments or in different embodiments are also possible within the spirit of the present disclosure, steps may be implemented in any order, and there are many other variations of the different aspects of one or more embodiments described above which are not provided in detail for the sake of brevity.
Additionally, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures, in order to simplify the illustration and discussion, and so as not to obscure one or more embodiments of the present description. Furthermore, the apparatus may be shown in block diagram form in order to avoid obscuring the one or more embodiments of the present description, and also in view of the fact that specifics with respect to implementation of such block diagram apparatus are highly dependent upon the platform within which the one or more embodiments of the present description are to be implemented (i.e., such specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the disclosure, it should be apparent to one skilled in the art that one or more embodiments of the disclosure can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative in nature and not as restrictive.
While the present disclosure has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of those embodiments will be apparent to those skilled in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the embodiments discussed.
The present disclosure is intended to embrace all such alternatives, modifications and variances which fall within the broad scope of the appended claims. Any omissions, modifications, equivalents, improvements, and the like, which are within the spirit and principles of the one or more embodiments of the disclosure, are therefore intended to be included within the scope of the disclosure.

Claims (7)

1. An entity evolution rule recommendation method, characterized by comprising:
searching, according to an input entity keyword, to obtain search content related to the entity keyword, wherein the search content includes an entity title and entity content;
performing semantic expansion on the search content to obtain semantically expanded search content, comprising: calculating the Jaccard distance between the entity title and the entity content; and in response to the Jaccard distance being greater than a first preset distance threshold and less than a second preset distance threshold, expanding the entity title according to the entity content to obtain the semantically expanded search content;
extracting semantic features from the semantically expanded search content;
determining semantic association relationships among the semantic features;
constructing a graph structure according to the semantic association relationships, and clustering based on the graph structure to obtain at least one entity group, comprising:
constructing an entity bipartite graph according to the semantic association relationships;
for each entity node in the entity bipartite graph:
calculating the similarity between the entity node and each other entity node, and storing the other entity nodes whose similarity is greater than a preset threshold in a similarity matrix of the entity node;
determining all biconnected components of the graph structure formed by the entity nodes in the similarity matrix; and in response to a biconnected component satisfying a preset connectivity condition, classifying each entity node of the biconnected component into the entity group of the entity node;
and calculating the heat value of the entity group in which the entity keyword is located, and outputting a recommendation result according to the heat value calculation result.
2. The method of claim 1, further comprising, after extracting semantic features from the semantically expanded search content:
performing feature dimension reduction on the semantic features to obtain dimension-reduced semantic features.
3. The method of claim 1, wherein determining the semantic association relationships among the semantic features comprises:
taking the semantic features as input, and outputting the semantic association relationships among the semantic features by using a multi-channel attention model.
4. The method of claim 1, wherein, in response to the biconnected component satisfying a preset connectivity condition, classifying each entity node of the biconnected component into the entity group of the entity node comprises:
in response to the number of nodes on the biconnected component being greater than a preset upper threshold, performing minimum cut calculation on the biconnected component until the number of nodes on each resulting biconnected component is less than or equal to the upper threshold;
for each preprocessed biconnected component whose number of nodes is less than or equal to the upper threshold, in response to the number of nodes being less than a preset reasonable threshold, performing minimum cut calculation on the preprocessed biconnected component to obtain a reasonable biconnected component;
and classifying the entity nodes on the reasonable biconnected component into the entity group.
5. The method of claim 1, wherein calculating the heat value of the entity group in which the entity keyword is located, and outputting the recommendation result according to the heat value calculation result, comprises:
determining each entity node in the entity group in which the entity keyword is located;
calculating the heat value of each entity node according to the search content of that entity node;
and outputting the recommendation results in descending order of the heat value of each entity node.
6. The method of claim 5, wherein the search content includes time information;
clustering based on the graph structure to obtain at least one entity group comprises:
clustering based on the graph structure and the time information to obtain at least one entity group;
outputting the recommendation results in descending order of the heat value of each entity node comprises:
outputting the recommendation results within a specific time period in descending order of the heat value of each entity node.
7. An entity evolution rule recommendation device, characterized by comprising:
a search module, configured to search, according to an input entity keyword, to obtain search content related to the entity keyword, wherein the search content includes an entity title and entity content;
a semantic expansion module, configured to perform semantic expansion on the search content to obtain semantically expanded search content, comprising: calculating the Jaccard distance between the entity title and the entity content; and in response to the Jaccard distance being greater than a first preset distance threshold and less than a second preset distance threshold, expanding the entity title according to the entity content to obtain the semantically expanded search content;
a feature extraction module, configured to extract semantic features from the semantically expanded search content;
a semantic association module, configured to determine semantic association relationships among the semantic features;
a clustering module, configured to construct a graph structure according to the semantic association relationships and to cluster based on the graph structure to obtain at least one entity group, comprising: constructing an entity bipartite graph according to the semantic association relationships; for each entity node in the entity bipartite graph: calculating the similarity between the entity node and each other entity node, and storing the other entity nodes whose similarity is greater than a preset threshold in a similarity matrix of the entity node; determining all biconnected components of the graph structure formed by the entity nodes in the similarity matrix; and in response to a biconnected component satisfying a preset connectivity condition, classifying each entity node of the biconnected component into the entity group of the entity node;
and an output module, configured to calculate the heat value of the entity group in which the entity keyword is located and to output a recommendation result according to the heat value calculation result.
CN202110938544.2A 2021-08-16 2021-08-16 Entity evolution rule recommendation method and device Active CN113836289B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110938544.2A CN113836289B (en) 2021-08-16 2021-08-16 Entity evolution rule recommendation method and device

Publications (2)

Publication Number Publication Date
CN113836289A CN113836289A (en) 2021-12-24
CN113836289B true CN113836289B (en) 2023-06-09

Family

ID=78960677

Country Status (1)

Country Link
CN (1) CN113836289B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10268703B1 (en) * 2012-01-17 2019-04-23 Google Llc System and method for associating images with semantic entities

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663122A (en) * 2012-04-20 2012-09-12 北京邮电大学 Semantic query expansion algorithm based on emergency ontology
KR101485940B1 (en) * 2013-08-23 2015-01-27 네이버 주식회사 Presenting System of Keyword Using depth of semantic Method Thereof
CN106528633B (en) * 2016-10-11 2019-07-02 杭州电子科技大学 A kind of video society attention rate improvement method recommended based on keyword
US10789298B2 (en) * 2016-11-16 2020-09-29 International Business Machines Corporation Specialist keywords recommendations in semantic space
CN108241699B (en) * 2016-12-26 2022-03-11 百度在线网络技术(北京)有限公司 Method and device for pushing information
CN107169010A (en) * 2017-03-31 2017-09-15 北京奇艺世纪科技有限公司 A kind of determination method and device of recommendation search keyword
CN108197098B (en) * 2017-11-22 2021-12-24 创新先进技术有限公司 Method, device and equipment for generating keyword combination strategy and expanding keywords
US11625426B2 (en) * 2019-02-05 2023-04-11 Microstrategy Incorporated Incorporating opinion information with semantic graph data
CN110704743B (en) * 2019-09-30 2022-02-18 北京科技大学 Semantic search method and device based on knowledge graph
CN112148701A (en) * 2020-09-23 2020-12-29 平安直通咨询有限公司上海分公司 File retrieval method and equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Prediction of Financial Big Data Stock Trends Based on Attention Mechanism;Junping Du,Zhe Xue;《IEEE》;全文 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant