CN113742464A - News event discovery algorithm and device based on heterogeneous information network - Google Patents


Info

Publication number
CN113742464A
Authority
CN
China
Prior art keywords
news
matrix
keyword
keywords
event
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110867857.3A
Other languages
Chinese (zh)
Inventor
仇瑜
刘德兵
黄朝园
于凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhipu Huazhang Technology Co Ltd
Original Assignee
Beijing Zhipu Huazhang Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhipu Huazhang Technology Co Ltd
Priority to CN202110867857.3A
Publication of CN113742464A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a news event discovery algorithm and device based on a heterogeneous information network. The method comprises the following steps: extracting news on multiple topics and preprocessing it, selecting a plurality of keywords for each article according to the importance of each keyword, and generating a keyword set from the keywords; fusing emotion information into the keyword set, and obtaining an event group through prediction by a prediction model; constructing a meta-path or meta-graph for the event group to obtain a construction matrix, and generating a distance matrix from the construction matrix; extracting features from the distance matrix and the event group through a graph attention network to obtain a feature matrix; constructing a recommendation cluster from the feature matrix; and selecting, for recommendation, the news in the recommendation cluster whose similarity to the original article exceeds a preset threshold. The method provided by the application incorporates the emotional information of articles, which improves the accuracy of news topic recommendation to a certain extent; and because the distance matrix is constructed through the HIN, the time complexity of model training is reduced.

Description

News event discovery algorithm and device based on heterogeneous information network
Technical Field
The invention relates to the technical field of information networks, in particular to a news event discovery algorithm based on a heterogeneous information network.
Background
Quickly retrieving target information from massive text data and tracking the development of current hot topics in real time have gradually become practical user needs. Topic Detection and Tracking (TDT) technology, which targets real-time detection and tracking, has diversified considerably, and topic detection and tracking algorithms for news events have become a key research direction, for example in technology enterprises and government departments that follow public opinion in real time. However, current topic detection and event discovery algorithms do not consider the emotional information of each keyword, so articles with the same emotional tone cannot be recommended. In addition, traditional text similarity algorithms compute term frequency-inverse document frequency (TF-IDF) values for the text, but keywords with identical values can occur, which affects the documents containing those keywords to varying degrees. Performing event discovery through keyword similarity alone loses a large amount of hidden information about the user's emotions and about the articles, so the event discovery task cannot be completed accurately.
Current related approaches:
1. news recommendation that compares relevance using only text word frequencies computed by TF-IDF;
2. recommendation directly through a Heterogeneous Information Network (HIN) after the keywords are extracted;
3. article recommendation based on a graph attention network (GAT).
However, the current methods have disadvantages such as the following:
1) The emotional tone of articles is not adequately considered
For example, suppose an article reports the death of an NBA star, and its key information is the league (NBA), the player, and his passing. The user's emotion is sadness: what the user wants to follow is why the player died and further reports on that event, not who won the MVP or what other stars did during the same period.
2) Complexity of heterogeneous graph neural networks
Although a heterogeneous graph neural network is a framework that can identify node features and semantic features by processing multiple meta-paths, it requires the number of meta-paths to be specified in advance, and the adjacency matrix of each meta-path must be trained through a graph attention network together with the same feature matrix, which greatly increases the time complexity of model training.
The heterogeneous graph is generated mainly by manually specifying the pattern of a path, such as N → K ← N, where N represents news and K represents keywords; this meta-path indicates that news reports share a keyword and are connected through it.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, a first object of the present invention is to provide a news event discovery algorithm based on a heterogeneous information network, so as to implement more accurate recommendation to a user.
The second purpose of the present invention is to provide a news event discovery device based on heterogeneous information network.
In order to achieve the above object, an embodiment of a first aspect of the present invention provides a news event discovery algorithm based on a heterogeneous information network, including the following steps:
step S1, extracting news on multiple topics, preprocessing the extracted news, selecting a plurality of keywords for each article according to the importance of each keyword, and generating a keyword set from the keywords;
step S2, fusing emotion information into the keyword set, and obtaining an event group through prediction by a prediction model;
step S3, constructing a meta-path or meta-graph for the event group to obtain a construction matrix, and generating a distance matrix from the construction matrix;
step S4, extracting features from the distance matrix and the event group through a graph attention network to obtain a feature matrix;
step S5, constructing a recommendation cluster from the feature matrix;
and step S6, selecting, for recommendation, the news in the recommendation cluster whose similarity to the original article exceeds a preset threshold.
Optionally, in an embodiment of the present application,
the preprocessing comprises performing word segmentation on the article using Jieba word segmentation, and the importance of each keyword is:
A-TFIDF = TFIDF + W
wherein W is defined as:
W = n * o
wherein n is the number of preceding keywords that have the same TF-IDF value as the current keyword, and o is a uniform small constant, 1.0e-16.
Optionally, in an embodiment of the present application, the S2 includes:
training the prediction model;
performing topic prediction with the trained prediction model to obtain a prediction result;
and mapping the prediction result to the multiple topics to obtain a corresponding news event cluster set.
Optionally, in an embodiment of the present application, the training of the prediction model includes:
performing word embedding on the keyword set to obtain a word vector of the keyword set;
performing word embedding on the emotion information of the keyword to obtain a keyword emotion information word vector;
concatenating the keyword set word vector and the keyword emotion information word vector, and reducing the dimension through a fully connected layer;
and feeding the dimension-reduced keyword set word vector and keyword emotion information word vector into the prediction model to predict the topic.
Optionally, in an embodiment of the present application, the S3 includes:
performing meta-path construction on the event group by selecting an NKN path, an NUN path and an NLN path to obtain a meta-path construction matrix;
performing meta-graph construction on the event group by selecting NK(L\U)KN to obtain a meta-graph construction matrix;
where N denotes a news instance, U denotes a person name, K denotes a keyword, and L denotes a place.
Optionally, in an embodiment of the present application, the S3 further includes:
performing PathSim calculation on the meta-path construction matrix and the meta-graph construction matrix to obtain the distance matrix, wherein a calculation formula of the distance matrix is as follows:
Figure BDA0003184975310000031
Optionally, in an embodiment of the present application, S4 includes:
when feature extraction is performed through the graph attention network, preserving the relevance that exists among the nodes of the graph attention network;
performing a normalization operation using Softmax so that the attention coefficients that influence the graph attention network nodes can be compared, wherein the attention coefficients are expressed by the following formula:
Figure BDA0003184975310000032
Optionally, in an embodiment of the present application, S5 includes:
adjusting the parameters of the clustering algorithm so that the recommendation clusters reach a preset accuracy threshold.
The news event discovery method based on a heterogeneous information network extracts news on multiple topics and preprocesses it, selects a plurality of keywords for each article according to the importance of each keyword, and generates a keyword set from the keywords; fuses emotion information into the keyword set and obtains an event group through prediction by a prediction model; constructs a meta-path or meta-graph for the event group to obtain a construction matrix, and generates a distance matrix from the construction matrix; extracts features from the distance matrix and the event group through a graph attention network to obtain a feature matrix; constructs a recommendation cluster from the feature matrix; and selects, for recommendation, the news in the recommendation cluster whose similarity to the original article exceeds a preset threshold. The method provided by the application incorporates the emotional information of articles, which improves the accuracy of news topic recommendation to a certain extent; and because the distance matrix is constructed through the HIN, the time complexity of model training is reduced.
In order to achieve the above object, a second embodiment of the present application provides a news event discovery apparatus based on a heterogeneous information network, including the following modules:
a data preprocessing module, configured to extract news on multiple topics, preprocess the extracted news, select a plurality of keywords for each article according to the importance of each keyword, and generate a keyword set from the keywords;
a prediction module, configured to fuse emotion information into the keyword set and obtain an event group through prediction by a prediction model;
a construction module, configured to construct a meta-path or meta-graph for the event group to obtain a construction matrix, and to compute a distance matrix from the construction matrix;
a feature extraction module, configured to extract features from the distance matrix and the event group through a graph attention network to obtain a feature matrix;
a feature clustering module, configured to construct a recommendation cluster from the feature matrix;
and a recommendation module, configured to select, for recommendation, the news in the recommendation cluster whose similarity to the original article exceeds a preset threshold.
Optionally, in an embodiment of the present application, the data preprocessing module includes:
performing word segmentation on the article using Jieba word segmentation, wherein the importance of each keyword is:
A-TFIDF = TFIDF + W
wherein W is defined as:
W = n * o
wherein n is the number of preceding keywords that have the same TF-IDF value as the current keyword, and o is a uniform small constant, 1.0e-16.
The news event discovery device based on a heterogeneous information network extracts news on multiple topics and preprocesses it, selects a plurality of keywords for each article according to the importance of each keyword, and generates a keyword set from the keywords; fuses emotion information into the keyword set and obtains an event group through prediction by a prediction model; constructs a meta-path or meta-graph for the event group to obtain a construction matrix, and generates a distance matrix from the construction matrix; extracts features from the distance matrix and the event group through a graph attention network to obtain a feature matrix; constructs a recommendation cluster from the feature matrix; and selects, for recommendation, the news in the recommendation cluster whose similarity to the original article exceeds a preset threshold. The device thereby incorporates the emotional information of articles, which improves the accuracy of news topic recommendation to a certain extent; and because the distance matrix is constructed through the HIN, the time complexity of model training is reduced.
The technical effects of this application are as follows. First, the emotional information of articles is incorporated, which improves the accuracy of news topic recommendation to a certain extent, so that news reports are recommended better. Second, the distance matrix is constructed through the HIN alone and emotional tone is added while the feature matrix is formed, which reduces the time complexity of model training. Together, these two points enable more accurate recommendations to the user.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a schematic flowchart of a news event discovery algorithm based on a heterogeneous information network according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a news event discovery apparatus based on a heterogeneous information network according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer throughout to the same or similar elements or to elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the invention and are not to be construed as limiting it.
A news event discovery algorithm based on a heterogeneous information network according to an embodiment of the present invention is described below with reference to the accompanying drawings.
As shown in fig. 1, to achieve the above object, an embodiment of a first aspect of the present invention provides a news event discovery algorithm based on a heterogeneous information network, including the following steps:
step S1, extracting news on multiple topics, preprocessing the extracted news, selecting a plurality of keywords for each article according to the importance of each keyword, and generating a keyword set from the keywords;
step S2, fusing emotion information into the keyword set, and obtaining an event group through prediction by a prediction model;
step S3, constructing a meta-path or meta-graph for the event group to obtain a construction matrix, and generating a distance matrix from the construction matrix;
step S4, extracting features from the distance matrix and the event group through a graph attention network to obtain a feature matrix;
step S5, constructing a recommendation cluster from the feature matrix;
and step S6, selecting, for recommendation, the news in the recommendation cluster whose similarity to the original article exceeds a preset threshold.
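As an illustration of the selection in step S6, the following is a minimal sketch, assuming that each news item has a feature vector and a cluster label and that similarity is measured by cosine similarity. The text does not name the similarity measure, so cosine similarity, the function names, the label -1 for noise, and the sample values are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(features, labels, source, threshold):
    """Return indices of news in the source article's cluster whose similarity
    to the source article exceeds the threshold."""
    if labels[source] == -1:  # assumption: -1 marks an article outside every cluster
        return []
    picks = []
    for i, (vec, label) in enumerate(zip(features, labels)):
        if i == source or label != labels[source]:
            continue
        if cosine(features[source], vec) > threshold:
            picks.append(i)
    return picks


# Hypothetical feature vectors and cluster labels for four news items.
features = [[1.0, 0.0], [0.9, 0.1], [0.5, 0.5], [0.0, 1.0]]
labels = [0, 0, 0, 1]
picked = recommend(features, labels, source=0, threshold=0.8)
```

In this toy input only item 1 is both in the same cluster as item 0 and above the threshold; item 2 shares the cluster but falls below it, and item 3 belongs to another cluster.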
In one embodiment of the present application, the preprocessing includes performing word segmentation on the article using Jieba word segmentation, and the importance of each keyword is:
A-TFIDF = TFIDF + W
wherein W is defined as:
W = n * o
wherein n is the number of preceding keywords that have the same TF-IDF value as the current keyword, and o is a uniform small constant, 1.0e-16.
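The tie-breaking rule A-TFIDF = TFIDF + n * o can be sketched as follows. This is a minimal illustration, assuming the keywords arrive as an ordered list of (keyword, TF-IDF value) pairs; the function name and the sample scores are hypothetical.

```python
from collections import defaultdict

EPSILON = 1.0e-16  # the uniform small constant "o" from the text


def adjusted_tfidf(scores):
    """Return (keyword, A-TFIDF) pairs, where A-TFIDF = TFIDF + n * o.

    n is the number of earlier keywords in the list that had exactly the
    same TF-IDF value as the current keyword.
    """
    seen = defaultdict(int)  # TF-IDF value -> how many times it has occurred so far
    adjusted = []
    for keyword, tfidf in scores:
        n = seen[tfidf]
        adjusted.append((keyword, tfidf + n * EPSILON))
        seen[tfidf] += 1
    return adjusted


# Hypothetical scores: three keywords tie at 0.25.
scores = [("a", 0.25), ("b", 0.25), ("c", 0.25), ("d", 0.1)]
result = adjusted_tfidf(scores)
```

After the adjustment the three tied keywords receive strictly increasing values, so they can be ranked unambiguously. One caveat of this sketch: in double-precision arithmetic a single offset of 1.0e-16 can be lost to rounding when the TF-IDF value is around 1 or larger, so the rule as written is only effective for smaller values.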
In an embodiment of the present application, further, S2 includes:
training the improved model;
S211, performing word embedding on the keyword set;
S212, performing word embedding on the emotion information of the keywords;
S213, concatenating the two word vectors and reducing the dimension through a fully connected layer;
S214, feeding the dimension-reduced vector into the model to predict the topic;
S215, repeating S211 to S214 until the accuracy of topic prediction no longer improves;
S22, performing topic prediction with the trained model;
and S23, mapping the prediction results to the topics in the database to obtain the corresponding news event cluster set.
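Step S213 (concatenation followed by a fully connected layer) can be sketched as below. This is only an illustration of the data flow: the vector sizes, the weight values and the function names are assumptions, and in a real model the weights would be learned during the training loop of S211 to S215 rather than fixed by hand.

```python
def concatenate(word_vec, emotion_vec):
    """Join the keyword word vector and the emotion word vector end to end."""
    return list(word_vec) + list(emotion_vec)


def fully_connected(x, weights, bias):
    """Apply y = W x + b, where W has one row per output dimension."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


# Hypothetical 4-dimensional keyword vector and 2-dimensional emotion vector.
word_vec = [1.0, 2.0, 3.0, 4.0]
emotion_vec = [10.0, 20.0]

fused = concatenate(word_vec, emotion_vec)  # 6 dimensions

# Hand-picked weights that reduce 6 dimensions to 3 (placeholders for learned values).
weights = [
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.5, 0.5, 0.0, 1.0],
]
bias = [0.0, 0.0, 0.0]

reduced = fully_connected(fused, weights, bias)  # 3 dimensions
```

The reduced vector is what would be passed to the topic prediction model in S214.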
In an embodiment of the present application, further, S3 includes:
S31, selecting the NKN path, the NUN path and the NLN path, and performing meta-path construction on the event group, where N represents a news instance, U represents a person name, K represents a keyword, and L represents a place.
S32, selecting NK(L\U)KN for meta-graph construction; this meta-graph indicates that one news report can be related to another in several ways through places and persons, so the relevance between the documents is stronger.
S33, performing PathSim calculation on the construction matrix to generate the distance matrix. The calculation formula is as follows:
Figure BDA0003184975310000061
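The formula image is not reproduced in this text. As a hedged illustration, the sketch below uses the commonly cited definition of PathSim for a symmetric meta-path, s(x, y) = 2 * M[x][y] / (M[x][x] + M[y][y]), where M is the matrix counting meta-path instances between news items; whether the patent's figure matches this exact form is an assumption. Converting similarity to distance as 1 - s is likewise an assumption, and the function names and sample data are hypothetical.

```python
def commuting_matrix(incidence):
    """Count N-K-N meta-path instances: M = A * A^T.

    incidence[i][j] is 1 if news i contains keyword j, else 0.
    """
    n = len(incidence)
    return [
        [sum(a * b for a, b in zip(incidence[i], incidence[j])) for j in range(n)]
        for i in range(n)
    ]


def pathsim(m):
    """PathSim similarity: s(i, j) = 2 * M[i][j] / (M[i][i] + M[j][j])."""
    n = len(m)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            denom = m[i][i] + m[j][j]
            sim[i][j] = 2.0 * m[i][j] / denom if denom else 0.0
    return sim


def to_distance(sim):
    """Assumption: distance is taken as 1 - similarity."""
    return [[1.0 - s for s in row] for row in sim]


news_keyword = [
    [1, 1, 0],  # news 0 contains keywords 0 and 1
    [1, 1, 0],  # news 1 contains the same keywords
    [0, 0, 1],  # news 2 shares no keyword with the others
]
m = commuting_matrix(news_keyword)
distance = to_distance(pathsim(m))
```

News 0 and news 1 share all their keywords and end up at distance 0, while news 2 is at distance 1 from both. The same two functions could be applied to a news-person or news-place incidence matrix for the NUN and NLN paths.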
In an embodiment of the present application, further, S4 includes:
S41, performing stronger feature extraction on the distance matrix through a graph attention network, so that a certain relevance is preserved between nodes;
S42, in order to compare the attention coefficients that influence the nodes, performing a normalization operation using Softmax, with the following formula:
Figure BDA0003184975310000062
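The formula image is not reproduced in this text. The sketch below follows the usual graph attention network formulation, in which a raw score e_ij for each neighbour j of node i is passed through LeakyReLU and then normalized with Softmax over the neighbourhood of i: alpha_ij = exp(e_ij) / sum over k in N(i) of exp(e_ik). That this matches the patent's figure is an assumption, as are the function names, the slope 0.2 and the sample scores; the raw scores are taken as given here instead of being computed from learned weights.

```python
import math


def leaky_relu(x, slope=0.2):
    """LeakyReLU activation; the slope value is an assumption."""
    return x if x > 0 else slope * x


def attention_coefficients(raw_scores, neighbors):
    """Softmax-normalize raw scores over each node's neighbourhood.

    raw_scores[i][j] is the unnormalized score of neighbour j for node i.
    neighbors maps each node to a non-empty list of its neighbours.
    """
    alpha = {}
    for i, nbrs in neighbors.items():
        activated = {j: leaky_relu(raw_scores[i][j]) for j in nbrs}
        peak = max(activated.values())  # subtract the maximum for numerical stability
        exps = {j: math.exp(v - peak) for j, v in activated.items()}
        total = sum(exps.values())
        alpha[i] = {j: e / total for j, e in exps.items()}
    return alpha


# Hypothetical 3-node graph with self-loops.
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}
raw_scores = [
    [1.0, 1.0, 0.0],
    [0.5, 2.0, -1.0],
    [0.0, 0.3, 0.3],
]
alpha = attention_coefficients(raw_scores, neighbors)
```

Because the coefficients of each node sum to 1, they are directly comparable across neighbours, which is the purpose of the normalization described in S42.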
In an embodiment of the present application, further, S5 includes:
ensuring the accuracy of recommendation cluster formation by continually adjusting the eps and min_samples parameters of the DBSCAN algorithm, so that highly similar articles are not treated as noise points.
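To show how eps and min_samples affect which articles become noise points, the following is a minimal pure-Python DBSCAN over a precomputed distance matrix. It is a sketch for illustration, not the implementation used in the application; a library version such as scikit-learn's DBSCAN exposes parameters with the same names. The sample points and parameter values are hypothetical.

```python
NOISE = -1


def dbscan(distance, eps, min_samples):
    """Minimal DBSCAN on a precomputed distance matrix; returns one label per point."""
    n = len(distance)
    labels = [None] * n  # None means "not yet visited"

    def region(i):
        # The neighbourhood includes the point itself.
        return [j for j in range(n) if distance[i][j] <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region(j)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is a core point, keep expanding
        cluster += 1
    return labels


# Hypothetical 1-D positions standing in for articles; distance is |a - b|.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 9.0]
distance = [[abs(a - b) for b in points] for a in points]

loose = dbscan(distance, eps=0.15, min_samples=2)
strict = dbscan(distance, eps=0.05, min_samples=2)
```

With eps=0.15 the first three points and the next two form two clusters and only the isolated point is noise; with eps=0.05 every point is labelled noise. This is the effect the text warns about: an eps that is too small turns highly similar articles into noise points.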
The news event discovery algorithm based on a heterogeneous information network extracts news on multiple topics and preprocesses it, selects a plurality of keywords for each article according to the importance of each keyword, and generates a keyword set from the keywords; fuses emotion information into the keyword set and obtains an event group through prediction by a prediction model; constructs a meta-path or meta-graph for the event group to obtain a construction matrix, and generates a distance matrix from the construction matrix; extracts features from the distance matrix and the event group through a graph attention network to obtain a feature matrix; constructs a recommendation cluster from the feature matrix; and selects, for recommendation, the news in the recommendation cluster whose similarity to the original article exceeds a preset threshold. The method provided by the application incorporates the emotional information of articles, which improves the accuracy of news topic recommendation to a certain extent; and because the distance matrix is constructed through the HIN, the time complexity of model training is reduced.
As shown in fig. 2, to achieve the above object, a second aspect of the present application provides a news event discovery apparatus 10 based on heterogeneous information network, including the following modules:
the data preprocessing module 100 is configured to extract news of multiple topics, preprocess the extracted news, select multiple keywords of an article according to importance of each keyword, and generate a keyword set according to the multiple keywords;
the prediction module 200 is used for fusing emotion information of the keyword set and obtaining an event group through prediction of a prediction model;
a constructing module 300, configured to construct a meta path or a meta graph for an event group to obtain a construction matrix, and obtain a distance matrix by calculating the construction matrix;
the feature extraction module 400 is configured to perform feature extraction on the distance matrix and the event group through a graph attention network to obtain a feature matrix;
the feature clustering module 500 is used for constructing a recommendation cluster according to the feature matrix;
and the recommendation module 600 is configured to select, for recommendation, the news in the recommendation cluster whose similarity to the original article exceeds a preset threshold.
Optionally, in an embodiment of the present application, the data preprocessing module includes:
performing word segmentation on the articles using Jieba word segmentation, wherein the importance of each keyword is:
A-TFIDF = TFIDF + W
wherein W is defined as:
W = n * o
wherein n is the number of preceding keywords that have the same TF-IDF value as the current keyword, and o is a uniform small constant, 1.0e-16.
The news event discovery device based on a heterogeneous information network extracts news on multiple topics and preprocesses it, selects a plurality of keywords for each article according to the importance of each keyword, and generates a keyword set from the keywords; fuses emotion information into the keyword set and obtains an event group through prediction by a prediction model; constructs a meta-path or meta-graph for the event group to obtain a construction matrix, and generates a distance matrix from the construction matrix; extracts features from the distance matrix and the event group through a graph attention network to obtain a feature matrix; constructs a recommendation cluster from the feature matrix; and selects, for recommendation, the news in the recommendation cluster whose similarity to the original article exceeds a preset threshold. The device thereby incorporates the emotional information of articles, which improves the accuracy of news topic recommendation to a certain extent; and because the distance matrix is constructed through the HIN, the time complexity of model training is reduced.
In the description herein, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, such uses of these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the various embodiments or examples described in this specification, and the features of different embodiments or examples, provided they do not contradict one another.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.

Claims (10)

1. A news event discovery algorithm based on a heterogeneous information network is characterized by comprising the following steps:
S1, extracting news on multiple topics, preprocessing the extracted news, selecting a plurality of keywords for each article according to the importance of each keyword, and generating a keyword set from the keywords;
S2, fusing emotion information into the keyword set, and obtaining an event group through prediction by a prediction model;
S3, constructing a meta-path or meta-graph for the event group to obtain a construction matrix, and generating a distance matrix from the construction matrix;
S4, extracting features from the distance matrix and the event group through a graph attention network to obtain a feature matrix;
S5, constructing a recommendation cluster from the feature matrix;
and S6, selecting, for recommendation, the news in the recommendation cluster whose similarity to the original article exceeds a preset threshold.
2. The news event discovery algorithm based on heterogeneous information network according to claim 1,
the preprocessing comprises performing word segmentation on the article using Jieba word segmentation, and the importance of each keyword is:
A-TFIDF = TFIDF + W
wherein W is defined as:
W = n * o
wherein n is the number of preceding keywords that have the same TF-IDF value as the current keyword, and o is a uniform small constant, 1.0e-16.
3. The heterogeneous information network based news event discovery algorithm of claim 1, wherein the S2 comprises:
training the prediction model;
performing topic prediction with the trained prediction model to obtain a prediction result;
and mapping the prediction result to the multiple topics to obtain a corresponding news event cluster set.
4. A news event discovery algorithm based on heterogeneous information network according to claim 3, wherein the training of the predictive model comprises:
performing word embedding on the keyword set to obtain a word vector of the keyword set;
performing word embedding on the emotion information of the keyword to obtain a keyword emotion information word vector;
concatenating the keyword set word vector and the keyword emotion information word vector, and reducing the dimension through a fully connected layer;
and feeding the dimension-reduced keyword set word vector and keyword emotion information word vector into the prediction model to predict the topic.
5. The heterogeneous information network based news event discovery algorithm of claim 1, wherein the S3 comprises:
performing meta-path construction on the event group by selecting an NKN path, an NUN path and an NLN path to obtain a meta-path construction matrix;
performing meta-graph construction on the event group by selecting NK(L\U)KN to obtain a meta-graph construction matrix;
where N denotes a news instance, U denotes a person name, K denotes a keyword, and L denotes a place.
6. The heterogeneous information network based news event discovery algorithm of claim 1, wherein the S3 further comprises:
performing PathSim calculation on the meta-path construction matrix and the meta-graph construction matrix to obtain the distance matrix, wherein a calculation formula of the distance matrix is as follows:
Figure FDA0003184975300000021
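The formula image is not reproduced in this text. For orientation, the standard PathSim measure for a symmetric meta-path is s(x, y) = 2 * M[x][y] / (M[x][x] + M[y][y]), where M[x][y] is the number of path instances between x and y. The sketch below applies that standard definition; it is an assumption that the patent's formula matches it, and the claim's "distance matrix" may be derived from these similarities in a way not shown here.

```python
def pathsim_matrix(m):
    """Compute standard PathSim scores from a square path-count matrix `m`.

    s(x, y) = 2 * m[x][y] / (m[x][x] + m[y][y]); pairs whose denominator
    is zero (items with no path instances at all) get a score of 0.0.
    """
    n = len(m)
    sim = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            denom = m[x][x] + m[y][y]
            sim[x][y] = 2.0 * m[x][y] / denom if denom else 0.0
    return sim


# Hypothetical path counts for three news items along one meta-path.
counts = [[2, 1, 0], [1, 2, 0], [0, 0, 1]]
for row in pathsim_matrix(counts):
    print(row)
```

Scores lie between 0 and 1, and every item with at least one path instance has a self-similarity of 1.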
7. The heterogeneous information network based news event discovery algorithm of claim 1, wherein S4 comprises:
preserving the correlations among the nodes of the graph attention network when the feature extraction is performed through the graph attention network;
performing a normalization operation using Softmax so that the attention coefficients of the graph attention network nodes can be compared, wherein the attention coefficient is given by the following formula:
Figure FDA0003184975300000022
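The formula image is not reproduced in this text. In the commonly used graph attention network formulation, the raw score of each neighbor is passed through LeakyReLU and then normalized with Softmax over the neighborhood. The sketch below shows that normalization step only; the raw scores, the node names and the slope of 0.2 are assumptions, and the patent's exact formula may differ.

```python
import math


def leaky_relu(x, slope=0.2):
    """LeakyReLU activation; the slope of 0.2 is an assumed default."""
    return x if x > 0 else slope * x


def attention_coefficients(raw_scores, slope=0.2):
    """Softmax-normalize the raw attention scores of one node's neighbors.

    `raw_scores` maps neighbor id -> unnormalized score and must be non-empty.
    The returned coefficients are positive and sum to 1, so they are
    directly comparable across neighbors.
    """
    activated = {j: leaky_relu(e, slope) for j, e in raw_scores.items()}
    peak = max(activated.values())  # subtract the maximum for numerical stability
    exps = {j: math.exp(v - peak) for j, v in activated.items()}
    total = sum(exps.values())
    return {j: v / total for j, v in exps.items()}


# Hypothetical raw scores for three neighboring news nodes.
print(attention_coefficients({"news_a": 1.5, "news_b": -0.5, "news_c": 1.5}))
```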
8. The heterogeneous information network based news event discovery algorithm of claim 1, wherein S5 comprises:
adjusting parameters of the clustering algorithm so that the recommendation clusters reach a preset accuracy threshold.
9. A news event discovery apparatus based on a heterogeneous information network, comprising:
the data preprocessing module is used for extracting news of various topics, preprocessing the extracted news, selecting a plurality of keywords of an article according to the importance degree of each keyword, and generating a keyword set according to the keywords;
the prediction module is used for fusing emotion information of the keyword set and obtaining an event group through prediction of a prediction model;
the construction module is used for constructing the event group by the meta path or the meta graph to obtain a construction matrix and calculating the construction matrix to obtain a distance matrix;
the feature extraction module is used for performing feature extraction on the distance matrix and the event group through a graph attention network to obtain a feature matrix;
the feature clustering module is used for constructing a recommendation cluster according to the feature matrix;
and the recommendation module is used for selecting, from the recommendation cluster, news whose similarity to the original article is greater than a preset threshold, and recommending the selected news.
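The selection performed by the recommendation module can be sketched as a similarity filter over one cluster. The claim does not name the similarity measure; cosine similarity over feature vectors is used here purely as an assumption, and the function names, identifiers and vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(original_vec, cluster, threshold):
    """Return ids of cluster members whose similarity exceeds `threshold`.

    `cluster` maps news id -> feature vector. Results are ordered from
    most to least similar to the original article.
    """
    scored = [(news_id, cosine(original_vec, vec)) for news_id, vec in cluster.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [news_id for news_id, score in scored if score > threshold]


# Hypothetical 2-d feature vectors for one recommendation cluster.
cluster = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(recommend([1.0, 0.0], cluster, threshold=0.5))
```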
10. The news event discovery device based on heterogeneous information network according to claim 9, wherein the data preprocessing module is used for:
performing word segmentation on the article using the Jieba word segmenter, wherein the importance degree of each keyword is:
A-TFIDF=TFIDF+W
wherein W is defined as:
W=n*o
wherein, among keywords that share the same TFIDF value, n is the number of preceding keywords with that value, and o is a uniform minimum value of 1.0e-16.
CN202110867857.3A 2021-07-28 2021-07-28 News event discovery algorithm and device based on heterogeneous information network Pending CN113742464A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110867857.3A CN113742464A (en) 2021-07-28 2021-07-28 News event discovery algorithm and device based on heterogeneous information network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110867857.3A CN113742464A (en) 2021-07-28 2021-07-28 News event discovery algorithm and device based on heterogeneous information network

Publications (1)

Publication Number Publication Date
CN113742464A true CN113742464A (en) 2021-12-03

Family

ID=78729504

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110867857.3A Pending CN113742464A (en) 2021-07-28 2021-07-28 News event discovery algorithm and device based on heterogeneous information network

Country Status (1)

Country Link
CN (1) CN113742464A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114078277A (en) * 2022-01-19 2022-02-22 深圳前海中电慧安科技有限公司 One-person-one-file face clustering method and device, computer equipment and storage medium
CN117391071A (en) * 2023-12-04 2024-01-12 中电科大数据研究院有限公司 News topic data mining method, device and storage medium
CN117391071B (en) * 2023-12-04 2024-02-27 中电科大数据研究院有限公司 News topic data mining method, device and storage medium
CN117910479A (en) * 2024-03-19 2024-04-19 湖南蚁坊软件股份有限公司 Method, device, equipment and medium for judging aggregated news
CN117910479B (en) * 2024-03-19 2024-06-04 湖南蚁坊软件股份有限公司 Method, device, equipment and medium for judging aggregated news

Similar Documents

Publication Publication Date Title
Kumar et al. Movie recommendation system using sentiment analysis from microblogging data
Chen et al. Dynamic explainable recommendation based on neural attentive models
Xia et al. Scientific article recommendation: Exploiting common author relations and historical preferences
Bu et al. Improving collaborative recommendation via user-item subgroups
Wang et al. New approaches to mood-based hybrid collaborative filtering
Chen et al. Trend prediction of internet public opinion based on collaborative filtering
CN113742464A (en) News event discovery algorithm and device based on heterogeneous information network
Li et al. A novel time-aware hybrid recommendation scheme combining user feedback and collaborative filtering
Wang et al. VRConvMF: Visual recurrent convolutional matrix factorization for movie recommendation
Li et al. Hybrid deep framework for group event recommendation
Wang et al. An enhanced multi-modal recommendation based on alternate training with knowledge graph representation
Ulian et al. Exploring the effects of different Clustering Methods on a News Recommender System
Abinaya et al. Enhancing context-aware recommendation using hesitant fuzzy item clustering by stacked autoencoder based smoothing technique
Feng et al. Recommendations based on comprehensively exploiting the latent factors hidden in items’ ratings and content
Zhang et al. An interpretable and scalable recommendation method based on network embedding
Huang et al. Neural explicit factor model based on item features for recommendation systems
Idrissi et al. A new hybrid-enhanced recommender system for mitigating cold start issues
Su et al. A personalized music recommender system using user contents, music contents and preference ratings
CN116701861A (en) Post-fusion personalized recommendation model and method based on explicit and implicit feedback characteristics
Xu et al. Exploiting interactions of review text, hidden user communities and item groups, and time for collaborative filtering
Wang et al. Joint knowledge graph and user preference for explainable recommendation
Bi et al. A recommendations model with multiaspect awareness and hierarchical user-product attention mechanisms
Ji et al. Using category and keyword for personalized recommendation: A scalable collaborative filtering algorithm
Palomares et al. Multi-view data approaches in recommender systems: an overview
Ceylan et al. Combining feature weighting and semantic similarity measure for a hybrid movie recommender system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination