CN113836289A - Entity evolution law recommendation method and device - Google Patents


Info

Publication number
CN113836289A
Authority
CN
China
Prior art keywords
entity
semantic
search content
expansion
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110938544.2A
Other languages
Chinese (zh)
Other versions
CN113836289B (en)
Inventor
杜军平
黄恩一
薛哲
徐欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN202110938544.2A priority Critical patent/CN113836289B/en
Publication of CN113836289A publication Critical patent/CN113836289A/en
Application granted granted Critical
Publication of CN113836289B publication Critical patent/CN113836289B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/335Filtering based on additional data, e.g. user or group profiles
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/3332Query translation
    • G06F16/3338Query expansion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9024Graphs; Linked lists
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

One or more embodiments of the present disclosure provide a method and an apparatus for recommending an entity evolution rule, including: searching to obtain search content related to the entity keywords according to the input entity keywords, performing semantic expansion on the search content to obtain the search content after the semantic expansion, extracting semantic features from the search content after the semantic expansion, determining semantic association relations among the semantic features, constructing a graph structure according to the semantic association relations, clustering based on the graph structure to obtain at least one entity group, calculating the heat value of the entity group where the entity keywords are located, and outputting a recommendation result according to the calculation result of the heat value. The method of the embodiment can recommend the research popularity and the evolution rule of the related entity keywords to the user, and has high calculation efficiency and accurate recommendation result.

Description

Entity evolution law recommendation method and device
Technical Field
One or more embodiments of the present disclosure relate to the technical field of artificial intelligence, and in particular, to a method and an apparatus for recommending an entity evolution law.
Background
At present, some personalized recommendation scenarios require recommending research hotspots and changes in evolution rules related to an entity keyword according to the input entity keyword. To realize such personalized recommendation, the entity keywords are analyzed by an evolution rule analysis algorithm. However, existing evolution rule analysis algorithms have high complexity, cannot screen out loosely associated entities, and produce analysis results with certain errors.
Disclosure of Invention
In view of this, one or more embodiments of the present disclosure are directed to a method for recommending an entity evolution rule, which can accurately recommend an evolution rule situation of an entity.
In view of the foregoing, one or more embodiments of the present specification provide an entity evolution rule recommendation method, including:
searching to obtain search content related to the entity keywords according to the input entity keywords;
performing semantic expansion on the search content to obtain search content after the semantic expansion;
extracting semantic features from the search content after semantic expansion;
determining semantic association relations among the semantic features;
constructing a graph structure according to the semantic association relation, and clustering based on the graph structure to obtain at least one entity group;
and calculating the heat value of the entity group where the entity key words are located, and outputting a recommendation result according to the calculation result of the heat value.
Optionally, the search content includes an entity title and entity content;
performing semantic expansion on each piece of search content to obtain the search content after the semantic expansion, wherein the semantic expansion comprises the following steps:
calculating the similarity between the entity title and the entity content;
and responding to the similarity meeting a preset expansion condition, and expanding the entity title according to the entity content to obtain the search content after semantic expansion.
Optionally, calculating the similarity between the entity title and the entity content is: calculating the Jaccard distance between the entity title and the entity content;
the similarity satisfies the preset expansion conditions as follows: the Jaccard distance is larger than a preset first distance threshold value and smaller than a preset second distance threshold value.
Optionally, after extracting semantic features from the search content after semantic expansion, the method further includes:
and performing feature dimension reduction processing on the semantic features to obtain the semantic features after dimension reduction.
Optionally, determining the semantic association relationship between the semantic features is:
and outputting semantic association relations among the semantic features by using the multichannel attention model by taking the semantic features as input.
Optionally, constructing a graph structure according to the semantic association relationship, and performing clustering based on the graph structure to obtain at least one entity group, including:
constructing an entity bipartite graph according to the semantic association relation;
for each entity node in the entity bipartite graph:
calculating the similarity between the entity node and other entity nodes, and storing other entity nodes with the similarity larger than a preset threshold in a similarity matrix of the entity node;
determining all biconnected components according to the graph structure formed by the entity nodes in the similarity matrix;
and in response to a biconnected component meeting a preset connectivity condition, classifying each entity node of the biconnected component into the entity group of the entity node.
Optionally, in response to the biconnected component meeting the preset connectivity condition, classifying each entity node of the biconnected component into the entity group of the entity node includes:
in response to the number of nodes in the biconnected component being greater than a preset upper-limit threshold, performing a minimum-cut computation on the biconnected component until the number of nodes in each resulting biconnected component is less than or equal to the upper-limit threshold;
for each preprocessed biconnected component whose number of nodes is less than or equal to the upper-limit threshold, in response to the number of nodes being smaller than a preset reasonable threshold, performing a minimum-cut computation on the preprocessed biconnected component to obtain a reasonable biconnected component;
and classifying the entity nodes in the reasonable biconnected components into entity groups.
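The thresholded similarity graph and the component search described in the steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: cosine similarity, the function names `build_similarity_graph` and `biconnected_components`, and the dictionary-of-sets graph representation are all assumptions, and the sketch reads the connected components named above as biconnected components found with Tarjan's algorithm. The splitting of oversized components is not implemented here.

```python
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_similarity_graph(vectors, threshold):
    """Keep an edge only between nodes whose similarity exceeds the threshold."""
    graph = defaultdict(set)
    names = list(vectors)
    for i, a in enumerate(names):
        graph[a]  # touch the key so isolated nodes still appear in the graph
        for b in names[i + 1:]:
            if cosine(vectors[a], vectors[b]) > threshold:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def biconnected_components(graph):
    """Return the node sets of all biconnected components (Tarjan's algorithm)."""
    index, low = {}, {}
    edge_stack, components = [], []
    counter = [0]

    def dfs(u, parent):
        index[u] = low[u] = counter[0]
        counter[0] += 1
        for v in sorted(graph[u]):
            if v == parent:
                continue
            if v not in index:
                edge_stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= index[u]:
                    # u separates the subtree of v: pop one whole component
                    component = set()
                    while True:
                        edge = edge_stack.pop()
                        component.update(edge)
                        if edge == (u, v):
                            break
                    components.append(component)
            elif index[v] < index[u]:
                # back edge to an ancestor
                edge_stack.append((u, v))
                low[u] = min(low[u], index[v])

    for node in sorted(graph):
        if node not in index:
            dfs(node, None)
    return components


# Hypothetical 2-D feature vectors for five entity nodes.
vectors = {"a": (1, 0), "b": (1, 0.1), "c": (0.9, 0.2), "d": (0, 1), "e": (0.1, 1)}
groups = biconnected_components(build_similarity_graph(vectors, threshold=0.9))
print(sorted(sorted(g) for g in groups))
```

Under the standard definition used here, a single edge whose removal disconnects the graph counts as its own two-node component, and isolated nodes belong to no component. The recursive search is adequate for small graphs; a large graph would need an iterative version to stay within Python's recursion limit.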
Optionally, calculating a heat value of the entity group where the entity keyword is located, and outputting a recommendation result according to a calculation result of the heat value, where the calculating step includes:
determining each entity node in the entity group where the entity key word is located;
respectively calculating the heat value of each entity node according to the search content of each entity node;
and outputting recommendation results according to the sequence of the heat value of each entity node from large to small.
Optionally, the search content includes time information;
and clustering based on the graph structure to obtain at least one entity group as follows:
clustering based on the graph structure and the time information to obtain at least one entity group;
the output of the recommendation results according to the sequence of the heat value of each entity node from big to small is as follows:
and outputting the recommendation results in a specific time period according to the sequence of the heat value of each entity node from large to small.
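The ranking step above can be sketched as follows. The text does not state how a heat value is computed from the search content, so this sketch assumes, purely for illustration, that the heat of an entity node is the number of search-content items attached to it within the chosen time period. The function name `rank_by_heat` and the `(node, date)` record shape are hypothetical.

```python
from collections import Counter
from datetime import date


def rank_by_heat(records, group, start=None, end=None):
    """Rank the entity nodes of one group by an assumed count-based heat value.

    records: iterable of (entity_node, publication_date) pairs.
    group:   set of entity nodes in the group containing the entity keyword.
    start, end: optional inclusive bounds of the time period of interest.
    """
    heat = Counter()
    for node, published in records:
        if node not in group:
            continue
        if start is not None and published < start:
            continue
        if end is not None and published > end:
            continue
        heat[node] += 1
    # Largest heat first; ties are broken by name so the order is deterministic.
    return sorted(heat.items(), key=lambda item: (-item[1], item[0]))


records = [
    ("machine learning", date(2020, 1, 1)),
    ("machine learning", date(2021, 1, 1)),
    ("deep learning", date(2021, 6, 1)),
    ("turbocharger", date(2021, 1, 1)),
]
group = {"machine learning", "deep learning"}
print(rank_by_heat(records, group))
print(rank_by_heat(records, group, start=date(2021, 1, 1)))
```

Running the ranking over successive time periods gives one ordered list per period, which is one way to read the changing popularity of the nodes in a group over time.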
An embodiment of the present specification further provides an entity evolution law recommendation device, including:
the searching module is used for searching and obtaining searching contents related to the entity keywords according to the input entity keywords;
the semantic expansion module is used for performing semantic expansion on the search content to obtain the search content after the semantic expansion;
the feature extraction module is used for extracting semantic features from the search content after the semantic expansion;
the semantic association module is used for determining semantic association relation among the semantic features;
the clustering module is used for constructing a graph structure according to the semantic association relation and clustering based on the graph structure to obtain at least one entity group;
and the output module is used for calculating the heat value of the entity group where the entity key words are located and outputting the recommendation result according to the calculation result of the heat value.
As can be seen from the above, in the entity evolution law recommendation method and apparatus provided in one or more embodiments of the present specification, search content related to an entity keyword is obtained by searching according to the input entity keyword, the search content is subjected to semantic expansion to obtain search content after the semantic expansion, semantic features are extracted from the search content after the semantic expansion, a semantic association relationship between the semantic features is determined, a graph structure is constructed according to the semantic association relationship, clustering is performed based on the graph structure to obtain at least one entity group, a heat value of the entity group where the entity keyword is located is calculated, and a recommendation result is output according to a calculation result of the heat value. The method of the embodiment can recommend the research popularity and the evolution rule of the related entity keywords to the user, and has high calculation efficiency and accurate recommendation result.
Drawings
In order to more clearly illustrate one or more embodiments or prior art solutions of the present specification, the drawings that are needed in the description of the embodiments or prior art will be briefly described below, and it is obvious that the drawings in the following description are only one or more embodiments of the present specification, and that other drawings may be obtained by those skilled in the art without inventive effort from these drawings.
FIG. 1 is a schematic flow chart of a method according to one or more embodiments of the present disclosure;
FIG. 2 is a schematic diagram of an autoencoder model in accordance with one or more embodiments of the present disclosure;
FIG. 3 is a schematic diagram of an entity bipartite graph in accordance with one or more embodiments of the present disclosure;
FIG. 4 is a schematic diagram of an apparatus according to one or more embodiments of the present disclosure;
FIG. 5 is a schematic structural diagram of an electronic device according to one or more embodiments of the present disclosure.
Detailed Description
For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.
It is to be noted that unless otherwise defined, technical or scientific terms used in one or more embodiments of the present specification should have the ordinary meaning as understood by those of ordinary skill in the art to which this disclosure belongs. The use of "first," "second," and similar terms in one or more embodiments of the specification is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", and the like are used merely to indicate relative positional relationships, and when the absolute position of the object being described is changed, the relative positional relationships may also be changed accordingly.
As described in the background section, existing evolution law analysis algorithms include algorithms based on K-means clustering, on artificial neural networks, and on agglomerative hierarchical clustering. These algorithms require a large amount of computation and have high time complexity. Furthermore, they do not screen out loosely associated entities; instead, they store the association relationships between such entities in a hash table and obtain the result by traversing that storage structure, which wastes resources, reduces operating efficiency, and introduces errors into the result.
In view of this, embodiments of the present disclosure provide a method for recommending an entity evolution rule, in which a graph structure is constructed according to semantic association relationships of entities, and an entity evolution rule is analyzed through the graph structure, so that time complexity can be reduced, entities with untight association relationships can be screened, operation efficiency is improved, and recommendation accuracy is improved.
Hereinafter, the technical means of the present disclosure will be described in further detail with reference to specific examples.
As shown in fig. 1, one or more embodiments of the present specification provide an entity evolution law recommendation method, including:
S101: searching to obtain search content related to the entity keywords according to the input entity keywords;
In this embodiment, a user inputs an entity keyword to be queried, and obtains a plurality of pieces of search content related to the entity keyword through search. For example, when the entity keyword "machine learning" is input, several pieces of search content related to machine learning can be obtained; each piece of search content is related to machine learning, and the pieces of search content may or may not be associated with one another.
S102: performing semantic expansion on the search content to obtain the search content after the semantic expansion;
S103: extracting semantic features from the search content after semantic expansion;
S104: determining semantic association relations among the semantic features;
In this embodiment, after the search content is obtained through searching, in order to obtain a more comprehensive and accurate recommendation result, semantic expansion is performed on each piece of search content to obtain the search content after semantic expansion; semantic features are extracted from the semantically expanded search content; and after the semantic features of each piece of search content are extracted, the semantic association relations among the semantic features of all the search content are mined.
S105: constructing a graph structure according to the semantic association relation, and clustering based on the graph structure to obtain at least one entity group;
In this embodiment, after determining the semantic association relationship of the search content, a graph structure is constructed according to the semantic association relationship, and clustering is performed based on the graph structure to obtain an entity group corresponding to each entity node in the graph structure.
S106: and calculating the heat value of the entity group where the entity key words are located, and outputting a recommendation result according to the calculation result of the heat value.
In this embodiment, after all the entity groups are determined, the popularity value of the entity group in which the entity keyword is located is calculated to obtain the popularity value calculation result of each entity node in the corresponding entity group, and then the recommendation result is determined according to the popularity value calculation result, and the recommendation result is output to the user.
The entity evolution law recommendation method provided by the embodiment comprises the steps of searching to obtain search content related to entity keywords according to input entity keywords, performing semantic expansion on the search content to obtain search content after the semantic expansion, extracting semantic features from the search content after the semantic expansion, determining semantic association relations among the semantic features, constructing a graph structure according to the semantic association relations, clustering based on the graph structure to obtain at least one entity group, calculating the heat value of the entity group where the entity keywords are located, and outputting a recommendation result according to the calculation result of the heat value. The method of the embodiment utilizes the graph structure to perform clustering, so that the operation efficiency can be improved, and the constructed graph structure screens out entities with untight association relation, so that the recommendation accuracy can be improved.
In some embodiments, the search content includes an entity title and entity content;
performing semantic expansion on each piece of search content to obtain the search content after the semantic expansion, wherein the semantic expansion comprises the following steps:
calculating the similarity between the entity title and the entity content;
and responding to the similarity meeting the preset expansion condition, and expanding the entity title according to the entity content to obtain the search content after semantic expansion.
In this embodiment, the search content obtained by searching on the entity keyword includes an entity title and entity content. Generally, the entity title is a concise sentence, and the entity content is detailed content related to the entity title, such as complete sentences, paragraphs, and pictures; the detailed content includes, for example, time and additional information related to the entity, such as entity items. For each piece of search content, whether semantic expansion is needed is judged; semantic expansion is performed if it is needed and skipped otherwise. Specifically, for each piece of search content, the similarity between the entity title and the entity content is calculated. If the similarity meets the expansion condition, the piece of search content needs semantic expansion, and the entity title is semantically expanded according to the entity content; if the similarity does not meet the expansion condition, semantic expansion is not needed.
In some implementations, calculating the similarity between the entity title and the entity content and judging whether the similarity meets the expansion condition includes calculating the Jaccard distance between the entity title and the entity content and judging whether the Jaccard distance is greater than a preset first distance threshold and smaller than a preset second distance threshold; if so, the search content needs semantic expansion.
Wherein, the Jaccard distance between the entity title and the entity content is calculated by the following formula:
$$J(T_i, C_{i,j}) = \frac{|T_i \cap C_{i,j}|}{|T_i \cup C_{i,j}|} \quad (1)$$
In the formula, T_i is the feature of the i-th word in the entity title word set (the set of all words contained in the entity title), C_{i,j} is the joint feature of the i-th word and the j-th word in the entity content word set (the set of all words contained in the entity content), and |·| is the number of words in a set.
A first distance threshold α and a second distance threshold β are set, with the first distance threshold smaller than the second. When J(T_i, C_{i,j}) < α, the semantic features of the entity title and the entity content differ greatly, and performing semantic expansion according to the entity content on the basis of the entity title would strongly affect the final recommendation result. For example, the search content is science and technology requirement information published by a certain company, the entity title of the search content is "turbocharger design", and the entity content includes "publication time: 2020-01-23; issuing company: Company A; project funding: X ten thousand yuan", and so on. In this piece of search content, the semantic features of the entity title and the entity content are only weakly correlated, and semantic expansion would substantially change the intent, which may cause semantic deviation.
When J(T_i, C_{i,j}) > β, the semantic features of the entity title and the entity content are very close, and performing semantic expansion according to the entity content on the basis of the entity title has no obvious effect on the recommendation result.
In view of this, in order to avoid the adverse effect of semantic expansion on the final result, in this embodiment the expansion condition is set to α < J(T_i, C_{i,j}) < β. That is, when the calculated Jaccard distance is larger than the first distance threshold and smaller than the second distance threshold, semantic expansion is performed according to the entity content on the basis of the entity title so as to obtain an accurate and comprehensive recommendation result. Optionally, the first distance threshold α and the second distance threshold β may be tuned over multiple test runs to obtain optimal values; the specific values are not limited in this embodiment.
For example, in a piece of technical requirement information, the entity title is "rdma technology-based distributed communication engine development", and the entity content includes "rdma is an abbreviation of Remote Direct Memory Access, meaning remote direct data access, which was created to reduce the delay of server-side data processing in network transmission. The company is developing a new communication engine using rdma technology, implemented in a zookeeper distributed mode, and welcomes interested parties to join." The similarity between the entity title and the entity content is calculated and found to meet the expansion condition, so entity content is added on the basis of the entity title to obtain the search content after semantic expansion, namely "rdma technology-based distributed communication engine development_network transmission_zookeeper". The search content after semantic expansion is thus more complete, and so is the recommendation result obtained from it. In some implementations, to prevent overly long expanded search content from reducing computational efficiency, a length threshold can be set so that the length of the search content after semantic expansion does not exceed the length threshold.
In this embodiment, whether semantic expansion is needed is judged for each piece of search content. Search content whose entity title and entity content are weakly correlated is not expanded, which screens out divergent search content and avoids affecting subsequent processing; search content whose entity title and entity content are strongly correlated is not expanded either, which avoids wasting resources; and search content meeting the expansion condition is reasonably expanded to obtain a comprehensive and accurate recommendation result.
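The expansion decision described above can be sketched as follows. This is an illustration under stated assumptions rather than the patented procedure: the measure J is taken to be the word-set overlap ratio |T ∩ C| / |T ∪ C|, which matches the text's statement that a small J means the title and content differ greatly, and the expansion simply appends content words not already in the title, whereas the example in the text adds only selected terms. The names `jaccard` and `expand_title` and the parameter `max_len` are hypothetical.

```python
def jaccard(title_words, content_words):
    """Overlap ratio of two word collections (assumed form of the measure J)."""
    title_set, content_set = set(title_words), set(content_words)
    union = title_set | content_set
    return len(title_set & content_set) / len(union) if union else 0.0


def expand_title(title_words, content_words, alpha, beta, max_len=None):
    """Expand the title with content words only when alpha < J < beta."""
    j = jaccard(title_words, content_words)
    if not (alpha < j < beta):
        # Too dissimilar (risk of semantic drift) or too similar (no gain).
        return list(title_words)
    expanded = list(title_words)
    seen = set(title_words)
    for word in content_words:
        if word not in seen:
            expanded.append(word)
            seen.add(word)
    if max_len is not None:
        # Optional length threshold on the expanded search content.
        expanded = expanded[:max_len]
    return expanded


title = ["rdma", "distributed", "communication", "engine"]
content = ["rdma", "network", "transmission", "zookeeper", "distributed"]
print(expand_title(title, content, alpha=0.1, beta=0.8))
```

The thresholds 0.1 and 0.8 are placeholders; the text leaves α and β to be tuned experimentally.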
In some embodiments, after extracting semantic features from the search content after semantic expansion, the method further includes:
and performing feature dimension reduction processing on the semantic features to obtain the semantic features after dimension reduction.
In the embodiment, after the search content is subjected to semantic expansion, semantic features are extracted from the search content by using a bag-of-words model, the extracted semantic features are high-dimensional data, and feature dimension reduction processing is performed on the extracted semantic features to reduce redundant information and processing complexity, so that the semantic features after dimension reduction are obtained.
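A bag-of-words extraction of the kind mentioned above can be sketched as follows. The text names the model but not its details, so the tokenized input, the alphabetically sorted vocabulary, and the raw term counts used here are assumptions, and `bag_of_words` is a hypothetical helper name.

```python
from collections import Counter


def bag_of_words(documents):
    """Map tokenized documents to count vectors over a shared vocabulary.

    documents: list of word lists, one per piece of expanded search content.
    Returns (vocabulary, vectors); vectors[i][j] is how often vocabulary[j]
    occurs in documents[i].
    """
    vocabulary = sorted({word for doc in documents for word in doc})
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


docs = [
    ["rdma", "communication", "engine", "rdma"],
    ["machine", "learning", "engine"],
]
vocabulary, vectors = bag_of_words(docs)
print(vocabulary)
print(vectors)
```

Each vector has one dimension per vocabulary word, so the dimensionality grows with the corpus, which is why the text follows extraction with a dimension-reduction step.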
In some modes, the semantic features of each dimension extracted by the bag-of-words model are normalized to obtain normalized semantic features, which are expressed as:
$$v_j = \frac{x_j}{\sum_{n=1}^{k} x_n} \quad (2)$$
where x_j represents the semantic feature of the j-th word, x_n represents the semantic feature of the n-th dimension, v_j represents the normalized semantic feature of the j-th word, n indexes the dimension, and k is the number of dimensions.
The normalized semantic features form feature vectors, the feature vectors are input into an automatic encoder model for feature dimension reduction, the automatic encoder model comprises an encoder and a decoder, the encoder maps the input feature vectors to obtain encoded vectors, and the decoder maps the encoded vectors to obtain reconstructed feature vectors.
As shown in fig. 2, in this embodiment, the automatic encoder model includes a first encoding and decoding layer, a second encoding and decoding layer, and a third encoding and decoding layer, where the first encoding and decoding layer is configured to perform a first feature dimension reduction process to obtain a feature vector after the first dimension reduction, the second encoding and decoding layer is configured to extract a feature vector from the feature vector after the first dimension reduction, and the third encoding and decoding layer is configured to perform a second feature dimension reduction process based on the extracted feature vector to obtain a feature vector after the dimension reduction.
In some embodiments, the excitation function of the encoder of the first codec layer is:
$$o_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}} \quad (3)$$
where z_j represents the feature vector of the j-th word, z_k represents the feature vector of the k-th dimension, K is the number of dimensions, e^{z_j} represents the exponentiated feature vector z_j, and e^{z_k} represents the exponentiated feature vector z_k.
The loss function of the encoder of the first codec layer is:
$$J = -\sum_{i} v_i \log o_i \quad (4)$$
where v_i represents the feature vector of the i-th word, and o_i indicates the degree of loss of the i-th word.
The excitation function of the encoder of the second codec layer is:
$$f(t) = \frac{1}{1 + e^{-t}} \quad (5)$$
where t represents the weight of the excitation function, and e^{-t} represents the exponentiated weight.
The loss function of the encoder of the second codec layer is:
Figure BDA0003214111460000092
where $w_l$ represents the excitation function value of the l-th word calculated by the first codec layer, and $p_l$ represents the loss function value of the l-th word calculated by the first codec layer.
Using the loss function shown in formula (6), the hidden-layer features are decoded by minimizing the distance between the output feature vector and the input feature vector, and the autoencoder model is trained in an unsupervised manner. After the trained autoencoder model is used for feature dimension reduction, the output of the model's hidden layer is the dimension-reduced semantic feature. It should be noted that the above autoencoder model is only an example, and its specific structure and principle are not described in detail here; in other implementations, other models may also be used for feature dimension reduction, and the specific model is not limited.
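The encode, decode and reconstruct cycle described above can be sketched as follows. This is an illustration under stated assumptions rather than the model of this embodiment: the sigmoid 1 / (1 + e^(-t)) is assumed for the excitation function of the second codec layer (inferred from the e^(-t) term), mean squared error stands in for formula (6), which is not reproduced in the text, and the layer sizes, random weights and helper names are hypothetical. No training loop is shown.

```python
import math
import random

def sigmoid(t):
    """Assumed excitation function: 1 / (1 + e^(-t))."""
    return 1.0 / (1.0 + math.exp(-t))

def affine(W, x, b):
    """Compute W @ x + b with plain lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def encode(x, W_enc, b_enc):
    """Map the input feature vector to the lower-dimensional hidden vector."""
    return [sigmoid(t) for t in affine(W_enc, x, b_enc)]

def decode(h, W_dec, b_dec):
    """Map the hidden vector back to a reconstructed feature vector."""
    return [sigmoid(t) for t in affine(W_dec, h, b_dec)]

def reconstruction_loss(x, x_hat):
    """Mean squared distance between input and reconstruction (stand-in for formula (6))."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

random.seed(0)
n_in, n_hidden = 6, 2  # hypothetical sizes: 6 input features reduced to 2
W_enc = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
b_enc = [0.0] * n_hidden
W_dec = [[random.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
b_dec = [0.0] * n_in

x = [0.9, 0.1, 0.4, 0.7, 0.2, 0.5]   # normalized semantic features (made-up values)
h = encode(x, W_enc, b_enc)          # dimension-reduced features (hidden layer output)
x_hat = decode(h, W_dec, b_dec)      # reconstruction
loss = reconstruction_loss(x, x_hat)
```

Unsupervised training would adjust the weights to reduce `loss`; after training, `h` plays the role of the dimension-reduced semantic feature.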
In some embodiments, the semantic association relationship between semantic features is determined as:
and outputting semantic association relations among the semantic features by using the multi-channel attention model by taking the semantic features as input.
In this embodiment, after determining semantic features of all search contents, the semantic features are used as input of a multi-channel attention model, and semantic association relations among the semantic features are output by the multi-channel attention model. The present embodiment does not specifically explain the specific structure and principle of the multi-channel attention model.
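Because the structure of the multi-channel attention model is not specified, the sketch below only illustrates the general idea of deriving pairwise association weights from semantic features. It assumes a single channel of scaled dot-product self-attention without learned projections; a multi-channel model would typically repeat this with several different projections and combine the results. All names and values are hypothetical.

```python
import math

def softmax(scores):
    """Normalize a list of scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(features):
    """Return a matrix whose entry [i][j] is the attention of feature i on feature j."""
    d = len(features[0])
    weights = []
    for query in features:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(d) for key in features]
        weights.append(softmax(scores))
    return weights

# Three made-up 2-dimensional semantic feature vectors.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
A = attention_weights(features)
```

In this toy input the first two vectors point in nearly the same direction, so `A[0][1]` is larger than `A[0][2]`; such weights could serve as the semantic association strengths used to build the graph.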
In some embodiments, constructing a graph structure according to the semantic association relationship, and clustering based on the graph structure to obtain at least one entity group includes:
constructing an entity bipartite graph according to the semantic association relationship;
for each entity node in the entity bipartite graph:
calculating the similarity between the entity node and other entity nodes, and storing other entity nodes with the similarity larger than a preset threshold in a similarity matrix of the entity node;
determining all biconnected components of the graph structure formed by the entity nodes in the similarity matrix;
and in response to a biconnected component meeting a preset connectivity condition, classifying each entity node of the biconnected component into the entity group of the entity node.
In this embodiment, after the semantic association relationships among the semantic features of the search content are determined, an entity bipartite graph is constructed from them. Each entity node of the entity bipartite graph corresponds to an entity word in the search content, and an edge between two entity nodes indicates that the two entity words are semantically associated. Based on the entity bipartite graph, for each entity node, the similarity between that entity node and every other entity node is calculated, and the other entity nodes whose similarity exceeds the threshold are stored in the similarity matrix corresponding to that entity node. After all other entity nodes have been traversed, the similarity matrix holds the entity nodes that have a certain degree of similarity to that entity node. A graph structure is then constructed from the entity nodes in the similarity matrix, and its biconnected components are determined. Each biconnected component is checked in turn against the connectivity condition, and every entity node on a biconnected component that meets the condition is stored in the entity group of that entity node. This completes the clustering of the entity node and its related entity nodes; the semantic features of the entity nodes within an entity group have a certain degree of similarity.
In some implementations, the similarity matrix is a hash table data structure, which can also store loosely associated nodes. Because the same node may appear in the entity groups of different nodes, the biconnected components need to be calculated and analyzed.
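The thresholding and component-finding steps can be sketched as follows. This is a minimal illustration, assuming the similarity matrix is held in a dictionary keyed by node pairs (in line with the hash table mentioned above) and using the classical depth-first-search algorithm for biconnected (dual connected) components; the source does not say which algorithm is used. Node names, similarity values and the threshold are hypothetical.

```python
def build_graph(similarity, threshold):
    """Keep an undirected edge for every pair whose similarity exceeds the threshold."""
    graph = {}
    for (u, v), score in similarity.items():
        graph.setdefault(u, set())
        graph.setdefault(v, set())
        if score > threshold:
            graph[u].add(v)
            graph[v].add(u)
    return graph

def biconnected_components(graph):
    """Return the node sets of all biconnected components (DFS with an edge stack).
    Isolated nodes have no edges and therefore belong to no component."""
    index, low = {}, {}
    edge_stack, components = [], []
    counter = [0]

    def dfs(u, parent):
        index[u] = low[u] = counter[0]
        counter[0] += 1
        for w in graph[u]:
            if w not in index:
                edge_stack.append((u, w))
                dfs(w, u)
                low[u] = min(low[u], low[w])
                if low[w] >= index[u]:  # u separates the subtree of w from the rest
                    component = set()
                    while True:
                        a, b = edge_stack.pop()
                        component.update((a, b))
                        if (a, b) == (u, w):
                            break
                    components.append(component)
            elif w != parent and index[w] < index[u]:
                edge_stack.append((u, w))  # back edge to an ancestor
                low[u] = min(low[u], index[w])

    for node in graph:
        if node not in index:
            dfs(node, None)
    return components

# Made-up similarities: a, b, c are mutually similar; d is similar only to c.
similarity = {
    ("a", "b"): 0.90, ("b", "c"): 0.80, ("a", "c"): 0.85,
    ("c", "d"): 0.70, ("a", "d"): 0.20,
}
graph = build_graph(similarity, threshold=0.5)
components = biconnected_components(graph)
```

Here node `c` appears in both components, which mirrors the remark that the same node may exist in the entity groups of different nodes. The recursive traversal is adequate for small graphs; a large graph would call for an iterative version.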
In some embodiments, in response to the biconnected component satisfying a preset connectivity condition, classifying each entity node of the biconnected component into the entity group of the entity node includes:
in response to the number of nodes on the biconnected component being greater than a preset upper limit threshold, performing minimum cut calculation on the biconnected component until the number of nodes on each resulting biconnected component is less than or equal to the upper limit threshold;
for the preprocessed biconnected components whose number of nodes is less than or equal to the upper limit threshold, in response to the number of nodes being less than a preset reasonable threshold, performing minimum cut calculation on the preprocessed biconnected components to obtain reasonable biconnected components;
and classifying the entity nodes on the reasonable biconnected components into the entity group.
In this embodiment, each biconnected component determined from the graph structure is checked against the connectivity condition. If the number of nodes on a biconnected component is greater than the upper limit threshold, minimum cut calculation is performed on that component repeatedly until the number of nodes on each resulting biconnected component is less than or equal to the upper limit threshold. Then, for every biconnected component whose number of nodes is less than or equal to the upper limit threshold, it is determined whether the number of nodes is less than the reasonable threshold; if so, a further minimum cut calculation is performed to obtain reasonable biconnected components. Screening against the reasonable threshold removes unreasonable entity nodes and improves accuracy. The entity nodes on the reasonable biconnected components, obtained after this preliminary processing of node counts and the filtering of unreasonable entity nodes, are added to the entity group, which completes the clustering of the entity nodes. Entity nodes in the same class are therefore similar in their semantic features, and because unreasonable entity nodes have been filtered out, errors are reduced.
Optionally, the number of nodes on a reasonable biconnected component is greater than 2 and less than the upper limit threshold.
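The first stage above, splitting an oversized component until every part fits under the upper limit threshold, might look like the sketch below. It assumes that "minimum segmentation" means a minimum cut whose cost is the total similarity of the edges removed, and it finds that cut by brute force over all two-way splits, which is only practical for very small graphs. The screening against the reasonable threshold is not shown. The function names, edge weights and the threshold value are hypothetical.

```python
from itertools import combinations

def min_cut(nodes, edges):
    """Brute-force minimum cut: try every split of the nodes into two non-empty parts.
    Exponential in the number of nodes; for illustration only."""
    ordered = sorted(nodes)
    first, rest = ordered[0], ordered[1:]
    best = None
    for size in range(len(rest)):  # the part holding `first` never takes every node
        for extra in combinations(rest, size):
            part = {first, *extra}
            cost = sum(w for (u, v), w in edges.items() if (u in part) != (v in part))
            if best is None or cost < best[0]:
                best = (cost, part)
    return best[1], set(ordered) - best[1]

def split_until_small(nodes, edges, upper):
    """Recursively cut a component until each part has at most `upper` nodes."""
    nodes = set(nodes)
    if len(nodes) <= upper:
        return [nodes]
    inner = {(u, v): w for (u, v), w in edges.items() if u in nodes and v in nodes}
    left, right = min_cut(nodes, inner)
    return split_until_small(left, inner, upper) + split_until_small(right, inner, upper)

# Two tightly connected triangles joined by one weak edge (made-up weights).
edges = {
    ("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 1.0,
    ("d", "e"): 1.0, ("e", "f"): 1.0, ("d", "f"): 1.0,
    ("c", "d"): 0.1,
}
groups = split_until_small({"a", "b", "c", "d", "e", "f"}, edges, upper=3)
```

With these weights the cheapest cut removes only the weak edge between `c` and `d`, so the six nodes split into the two triangles. A production system would replace the brute-force search with a polynomial-time minimum cut algorithm.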
For example, as shown in fig. 3, the input entity keyword is "machine learning", and the obtained search content includes the entity words "data mining", "mechanical manufacturing", "medicine", and so on. An entity bipartite graph is constructed according to the semantic associations between the entity words, and the similarity between each pair of entity words in the entity bipartite graph is calculated: for example, the similarity between "machine learning" and "data mining" is 0.98, the similarity between "machine learning" and "mechanical manufacturing" is 0.32, and the similarity between "machine learning" and "medicine" is 0.47. All biconnected components are determined from the graph structure constructed from the entity words whose similarity is greater than the threshold, and entity group 1 and entity group 2 are obtained by checking the connectivity condition. Entity group 1 includes the entity words "machine learning", "data mining", and "medicine", and entity group 2 includes the entity word "mechanical manufacturing".
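The grouping in this example can be reproduced with a simple threshold filter. The sketch below is a simplification that skips the biconnected-component step and uses an assumed threshold of 0.4; the source does not state the threshold, and any value between 0.32 and 0.47 gives the same two groups.

```python
# Similarities to the entity keyword "machine learning", taken from the example above.
similarity_to_keyword = {
    "data mining": 0.98,
    "mechanical manufacturing": 0.32,
    "medicine": 0.47,
}
threshold = 0.4  # assumed value; not given in the source

# Entity words above the threshold join the keyword's group; the rest fall outside it.
group_1 = ["machine learning"] + [
    word for word, score in similarity_to_keyword.items() if score > threshold
]
group_2 = [word for word, score in similarity_to_keyword.items() if score <= threshold]
```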
In some embodiments, calculating a heat value of an entity group in which the entity keyword is located, and outputting a recommendation result according to the calculation result of the heat value includes:
determining each entity node in the entity group where the entity key word is located;
respectively calculating the heat value of each entity node according to the search content of each entity node;
and outputting recommendation results according to the sequence of the heat value of each entity node from large to small.
In this embodiment, after a plurality of entity groups are obtained through clustering, the entity group where the entity keyword is located is determined, and each entity word in the entity group has certain similarity with the entity keyword. And then, respectively calculating the heat value of each entity word in the entity group, and outputting recommendation results according to the heat value sequence of each entity word after the heat values of all the entity words are calculated.
In some approaches, the heat value of an entity word may be calculated from the search content in which the entity word appears. For example, suppose the search content containing the entity word "data mining" is "X data mining project; release time: 2020-01-23; releasing company: Company A; project funding: X ten thousand yuan". The heat value of the entity word "data mining" is then: (current time - release time) / time base + project funding / funding base + total number of items of scientific and technological requirement information released by the releasing company. It should be noted that the heat value calculation may be configured according to the specific application field and requirements, and the calculation method is not specifically limited.
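The example formula can be written out directly. The sketch below follows the three terms as stated; the units (days for time, ten thousand yuan for funding), the default bases and the sample inputs are assumptions chosen for illustration, since the source leaves them open.

```python
from datetime import date

def heat_value(release_date, funding, company_post_count, today,
               time_base=365.0, funding_base=100.0):
    """(current time - release time) / time base
       + project funding / funding base
       + number of requirement items released by the company."""
    elapsed_days = (today - release_date).days
    return elapsed_days / time_base + funding / funding_base + company_post_count

# Hypothetical inputs: funding of 50 (ten thousand yuan), 3 items released by the company.
heat = heat_value(
    release_date=date(2020, 1, 23),
    funding=50.0,
    company_post_count=3,
    today=date(2021, 1, 22),
)
```

With these inputs the elapsed time is 365 days, so the three terms are 1.0, 0.5 and 3, giving a heat value of 4.5. Changing `time_base` and `funding_base` adjusts how strongly each factor contributes.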
In some embodiments, after calculating the heat value of each entity node, the method further includes:
determining a comprehensive heat value of each entity node according to the similarity between the entity node and the entity key word and the heat value of each entity node;
the recommendation results are output according to the sequence of the heat value of each entity node from big to small: and outputting recommendation results according to the sequence of the comprehensive heat value of each entity node from large to small.
In this embodiment, for the recommendation sequence of each entity node in the same entity group, not only the heat value of the entity node is considered, but also the similarity between the entity node and the entity keyword is combined to obtain a comprehensive recommendation result. For example, the entity nodes are assigned with weights from large to small according to the similarity, the comprehensive heat value is calculated according to the weights and the heat values of the entity nodes, and finally, the comprehensive recommendation result is obtained according to the sequence of the comprehensive heat values from large to small.
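One possible reading of this weighting scheme is sketched below. It assumes rank-based weights (the most similar node gets weight 1.0, decreasing linearly) and a comprehensive heat value equal to weight times heat value; the source fixes neither choice. The entity word "deep learning" and all heat values are made up for the illustration.

```python
def rank_by_comprehensive_heat(nodes):
    """nodes: list of (name, similarity_to_keyword, heat_value) tuples.
    Returns (name, comprehensive_heat) pairs sorted from largest to smallest."""
    by_similarity = sorted(nodes, key=lambda node: node[1], reverse=True)
    count = len(by_similarity)
    scored = []
    for rank, (name, _similarity, heat) in enumerate(by_similarity):
        weight = (count - rank) / count  # assumed rank-based weight
        scored.append((name, weight * heat))
    return sorted(scored, key=lambda item: item[1], reverse=True)

nodes = [
    ("data mining", 0.98, 4.0),
    ("medicine", 0.47, 6.0),
    ("deep learning", 0.90, 5.0),
]
ranking = rank_by_comprehensive_heat(nodes)
```

In this toy input "medicine" has the highest raw heat value but the lowest similarity, so it drops to the end of the comprehensive ranking.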
In some embodiments, the search content includes time information;
clustering is carried out based on the graph structure, and at least one entity group is obtained as follows: clustering based on the graph structure and the time information to obtain at least one entity group;
the recommendation results are output according to the sequence of the heat value of each entity node from big to small: and outputting the recommendation results in a specific time period according to the sequence of the heat value of each entity node from large to small.
In this embodiment, a recommendation result in a specific time period may be obtained. Specifically, the search content includes time information (e.g., release time), an entity group is obtained according to clustering processing of the time information, a heat value of the entity group where the entity keyword is located is calculated, and then a recommendation result of a specific time period is output according to a sequence of the heat values of the entity nodes in the entity group, or a comprehensive recommendation result of the specific time period is output according to the heat values of the entity nodes and the similarity between the entity nodes and the entity keyword. For example, based on the graph structure, the entity nodes within the year 2019-2010 are classified into an entity group, and the recommendation result within the year is obtained.
It should be noted that the method of one or more embodiments of the present disclosure may be performed by a single device, such as a computer or server. The method of the embodiment can also be applied to a distributed scene and completed by the mutual cooperation of a plurality of devices. In such a distributed scenario, one of the devices may perform only one or more steps of the method of one or more embodiments of the present disclosure, and the devices may interact with each other to complete the method.
It should be noted that the above description describes certain embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
As shown in fig. 4, an embodiment of the present specification further provides an entity evolution law recommendation device, including:
the searching module is used for searching and obtaining searching contents related to the entity keywords according to the input entity keywords;
the semantic expansion module is used for performing semantic expansion on the search content to obtain the search content after the semantic expansion;
the feature extraction module is used for extracting semantic features from the search content after semantic expansion;
the semantic association module is used for determining semantic association relations among the semantic features;
the clustering module is used for constructing a graph structure according to the semantic association relation and clustering based on the graph structure to obtain at least one entity group;
and the output module is used for calculating the heat value of the entity group where the entity key words are located and outputting the recommendation result according to the calculation result of the heat value.
For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. Of course, the functionality of the modules may be implemented in the same one or more software and/or hardware implementations in implementing one or more embodiments of the present description.
The apparatus of the foregoing embodiment is used to implement the corresponding method in the foregoing embodiment, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Fig. 5 is a schematic diagram illustrating a more specific hardware structure of an electronic device according to this embodiment, where the electronic device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively coupled to each other within the device via bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present disclosure.
The Memory 1020 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs, and when the technical solution provided by the embodiments of the present specification is implemented by software or firmware, the relevant program codes are stored in the memory 1020 and called to be executed by the processor 1010.
The input/output interface 1030 is used for connecting an input/output module to input and output information. The i/o module may be configured as a component in a device (not shown) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.
The communication interface 1040 is used for connecting a communication module (not shown in the drawings) to implement communication interaction between the present apparatus and other apparatuses. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, Bluetooth and the like).
Bus 1050 includes a path that transfers information between various components of the device, such as processor 1010, memory 1020, input/output interface 1030, and communication interface 1040.
It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.
The electronic device of the foregoing embodiment is used to implement the corresponding method in the foregoing embodiment, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Computer-readable media of the present embodiments include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples. Within the spirit of the present disclosure, features from the above embodiments or from different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of different aspects of one or more embodiments of the present description as described above, which are not provided in detail for the sake of brevity.
In addition, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown in the provided figures, for simplicity of illustration and discussion, and so as not to obscure one or more embodiments of the disclosure. Furthermore, devices may be shown in block diagram form in order to avoid obscuring the understanding of one or more embodiments of the present description, and this also takes into account the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the one or more embodiments of the present description are to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the disclosure, it should be apparent to one skilled in the art that one or more embodiments of the disclosure can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.
While the present disclosure has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the discussed embodiments.
It is intended that the one or more embodiments of the present specification embrace all such alternatives, modifications and variations as fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of one or more embodiments of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (10)

1. A method for recommending entity evolution rules is characterized by comprising the following steps:
searching to obtain search content related to the entity keywords according to the input entity keywords;
performing semantic expansion on the search content to obtain search content after the semantic expansion;
extracting semantic features from the search content after semantic expansion;
determining semantic association relations among the semantic features;
constructing a graph structure according to the semantic association relation, and clustering based on the graph structure to obtain at least one entity group;
and calculating the heat value of the entity group where the entity key words are located, and outputting a recommendation result according to the calculation result of the heat value.
2. The method of claim 1, wherein the search content comprises an entity title and entity content;
performing semantic expansion on each piece of search content to obtain the search content after the semantic expansion, wherein the semantic expansion comprises the following steps:
calculating the similarity between the entity title and the entity content;
and responding to the similarity meeting a preset expansion condition, and expanding the entity title according to the entity content to obtain the search content after semantic expansion.
3. The method of claim 2, wherein the similarity between the entity title and the entity content is calculated as: calculating the Jaccard distance between the entity title and the entity content;
the similarity satisfies the preset expansion conditions as follows: the Jaccard distance is larger than a preset first distance threshold value and smaller than a preset second distance threshold value.
4. The method of claim 1, wherein after extracting semantic features from the semantically expanded search content, further comprising:
and performing feature dimension reduction processing on the semantic features to obtain the semantic features after dimension reduction.
5. The method of claim 1, wherein determining the semantic association relationship between the semantic features is:
and outputting semantic association relations among the semantic features by using the multichannel attention model by taking the semantic features as input.
6. The method of claim 1, wherein constructing a graph structure according to the semantic association relationship, and clustering based on the graph structure to obtain at least one entity group comprises:
constructing an entity bipartite graph according to the semantic association relation;
for each entity node in the entity bipartite graph:
calculating the similarity between the entity node and other entity nodes, and storing other entity nodes with the similarity larger than a preset threshold in a similarity matrix of the entity node;
determining all biconnected components of the graph structure formed by the entity nodes in the similarity matrix;
and in response to a biconnected component meeting a preset connectivity condition, classifying each entity node of the biconnected component into the entity group of the entity node.
7. The method of claim 6, wherein classifying each entity node of the biconnected component into the entity group of the entity node in response to the biconnected component satisfying a preset connectivity condition comprises:
in response to the number of nodes on the biconnected component being greater than a preset upper limit threshold, performing minimum cut calculation on the biconnected component until the number of nodes on each resulting biconnected component is less than or equal to the upper limit threshold;
for the preprocessed biconnected components whose number of nodes is less than or equal to the upper limit threshold, in response to the number of nodes being less than a preset reasonable threshold, performing minimum cut calculation on the preprocessed biconnected components to obtain reasonable biconnected components;
and classifying the entity nodes on the reasonable biconnected components into the entity group.
8. The method of claim 1, wherein calculating the heat value of the entity group in which the entity keyword is located, and outputting the recommendation result according to the calculation result of the heat value comprises:
determining each entity node in the entity group where the entity key word is located;
respectively calculating the heat value of each entity node according to the search content of each entity node;
and outputting recommendation results according to the sequence of the heat value of each entity node from large to small.
9. The method of claim 8, wherein the search content includes time information;
and clustering based on the graph structure to obtain at least one entity group as follows:
clustering based on the graph structure and the time information to obtain at least one entity group;
the output of the recommendation results according to the sequence of the heat value of each entity node from big to small is as follows:
and outputting the recommendation results in a specific time period according to the sequence of the heat value of each entity node from large to small.
10. An entity evolution law recommendation device, comprising:
the searching module is used for searching and obtaining searching contents related to the entity keywords according to the input entity keywords;
the semantic expansion module is used for performing semantic expansion on the search content to obtain the search content after the semantic expansion;
the feature extraction module is used for extracting semantic features from the search content after the semantic expansion;
the semantic association module is used for determining semantic association relation among the semantic features;
the clustering module is used for constructing a graph structure according to the semantic association relation and clustering based on the graph structure to obtain at least one entity group;
and the output module is used for calculating the heat value of the entity group where the entity key words are located and outputting the recommendation result according to the calculation result of the heat value.
CN202110938544.2A 2021-08-16 2021-08-16 Entity evolution rule recommendation method and device Active CN113836289B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110938544.2A CN113836289B (en) 2021-08-16 2021-08-16 Entity evolution rule recommendation method and device

Publications (2)

Publication Number Publication Date
CN113836289A true CN113836289A (en) 2021-12-24
CN113836289B CN113836289B (en) 2023-06-09

Family

ID=78960677

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110938544.2A Active CN113836289B (en) 2021-08-16 2021-08-16 Entity evolution rule recommendation method and device

Country Status (1)

Country Link
CN (1) CN113836289B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663122A (en) * 2012-04-20 2012-09-12 北京邮电大学 Semantic query expansion algorithm based on emergency ontology
US20150058309A1 (en) * 2013-08-23 2015-02-26 Naver Corporation Keyword presenting system and method based on semantic depth structure
CN106528633A (en) * 2016-10-11 2017-03-22 杭州电子科技大学 Method for improving social attention of video based on keyword recommendation
CN107169010A (en) * 2017-03-31 2017-09-15 北京奇艺世纪科技有限公司 A kind of determination method and device of recommendation search keyword
US20180137137A1 (en) * 2016-11-16 2018-05-17 International Business Machines Corporation Specialist keywords recommendations in semantic space
CN108197098A (en) * 2017-11-22 2018-06-22 阿里巴巴集团控股有限公司 A kind of generation of keyword combined strategy and keyword expansion method, apparatus and equipment
CN108241699A (en) * 2016-12-26 2018-07-03 百度在线网络技术(北京)有限公司 For the method and apparatus of pushed information
US10268703B1 (en) * 2012-01-17 2019-04-23 Google Llc System and method for associating images with semantic entities
CN110704743A (en) * 2019-09-30 2020-01-17 北京科技大学 Semantic search method and device based on knowledge graph
US20200250245A1 (en) * 2019-02-05 2020-08-06 Microstrategy Incorporated Incorporating opinion information with semantic graph data
CN112148701A (en) * 2020-09-23 2020-12-29 平安直通咨询有限公司上海分公司 File retrieval method and equipment

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10268703B1 (en) * 2012-01-17 2019-04-23 Google Llc System and method for associating images with semantic entities
CN102663122A (en) * 2012-04-20 2012-09-12 北京邮电大学 Semantic query expansion algorithm based on emergency ontology
US20150058309A1 (en) * 2013-08-23 2015-02-26 Naver Corporation Keyword presenting system and method based on semantic depth structure
CN106528633A (en) * 2016-10-11 2017-03-22 杭州电子科技大学 Method for improving social attention of video based on keyword recommendation
US20180137137A1 (en) * 2016-11-16 2018-05-17 International Business Machines Corporation Specialist keywords recommendations in semantic space
CN108241699A (en) * 2016-12-26 2018-07-03 百度在线网络技术(北京)有限公司 For the method and apparatus of pushed information
CN107169010A (en) * 2017-03-31 2017-09-15 北京奇艺世纪科技有限公司 A kind of determination method and device of recommendation search keyword
CN108197098A (en) * 2017-11-22 2018-06-22 阿里巴巴集团控股有限公司 A kind of generation of keyword combined strategy and keyword expansion method, apparatus and equipment
US20200250245A1 (en) * 2019-02-05 2020-08-06 Microstrategy Incorporated Incorporating opinion information with semantic graph data
CN110704743A (en) * 2019-09-30 2020-01-17 北京科技大学 Semantic search method and device based on knowledge graph
CN112148701A (en) * 2020-09-23 2020-12-29 平安直通咨询有限公司上海分公司 File retrieval method and equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JUNPING DU, ZHE XUE: "Prediction of Financial Big Data Stock Trends Based on Attention Mechanism", IEEE *
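WANG Xiaoyang, ZHENG Xiaoqing, XIAO Yanghua: "Modeling and mining of entities and association relationships in intelligent search", Journal on Communications (通信学报), vol. 36, no. 2 *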

Also Published As

Publication number Publication date
CN113836289B (en) 2023-06-09

Similar Documents

Publication Publication Date Title
KR20180069877A (en) Method and apparatus for building a machine learning based network model
CN111461004B (en) Event detection method and device based on graph attention neural network and electronic equipment
CN111353303B (en) Word vector construction method and device, electronic equipment and storage medium
WO2023045605A1 (en) Data processing method and apparatus, computer device, and storage medium
CN116720004B (en) Recommendation reason generation method, device, equipment and storage medium
CN110020427B (en) Policy determination method and device
WO2022140900A1 (en) Method and apparatus for constructing personal knowledge graph, and related device
CN115080742B (en) Text information extraction method, apparatus, device, storage medium, and program product
CN110569428A (en) Recommendation model construction method, device and equipment
CN115129949A (en) Vector range retrieval method, device, equipment, medium and program product
CN116227467A (en) Model training method, text processing method and device
CN113626608B (en) Semantic-enhancement relationship extraction method and device, computer equipment and storage medium
CN113110843B (en) Contract generation model training method, contract generation method and electronic equipment
CN113535912A (en) Text association method based on graph convolution network and attention mechanism and related equipment
WO2023246849A1 (en) Feedback data graph generation method and refrigerator
CN105357583A (en) Method and device for discovering interest and preferences of intelligent television user
CN113836289B (en) Entity evolution rule recommendation method and device
CN116644180A (en) Training method and training system for text matching model and text label determining method
CN116127316A (en) Model training method, text abstract generating method and related equipment
CN114490946A (en) XLNet model-based similar case retrieval method, system and equipment
CN109325127B (en) Risk identification method and device
CN117973544B (en) Text unit reasoning method and device based on semantic distance, storage medium and terminal
CN116226541B (en) Knowledge graph-based network hotspot information recommendation method, system and equipment
CN117909505B (en) Event argument extraction method and related equipment
CN111723567B (en) Text selection data processing method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant