CN112883716B - Twitter abstract generation method based on topic correlation - Google Patents
Twitter abstract generation method based on topic correlation
- Publication number: CN112883716B (application CN202110151630.9A)
- Authority: CN (China)
- Prior art keywords: topic, word, tweet, correlation
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F40/216: Parsing using statistical methods
- G06F40/279: Recognition of textual entities
- G06F40/284: Lexical analysis, e.g. tokenisation or collocates
- G06Q50/01: Social networking
Abstract
The invention discloses a tweet summary generation method based on topic relevance. The method builds a dedicated lexicon for each topic from the distribution of nouns across topics; computes the relevance between a tweet and a given topic using the topic's lexicon and a trained word-vector model; computes public acceptance from network interaction information; combines public acceptance and topic relevance into a final saliency score for each tweet; and removes redundancy with a maximum marginal relevance algorithm before outputting the summary. By selecting tweets for the summary according to both topic relevance and tweet saliency, and by controlling the redundancy of the final summary, the generated summary jointly accounts for topic, diversity and social recognition, yielding a summary with higher topic relevance, novelty and generalization.
Description
Technical Field
The technical field relates to text summarization in natural language processing, specifically the automatic generation of a topic-oriented summary of tweets: given a particular topic and several tweets, the method produces a summary relevant to that topic.
Background
The rapid development of social network media and self-media has driven research on summarizing and condensing massive amounts of data. Because no large-scale public dataset exists for social network data, most summarization research on such data still relies on traditional unsupervised methods. Statistical-feature methods rank sentences by features such as relative position and word frequency; they are easy to implement, but the features obtained are often simplistic. Graph-model methods treat the sentences of a text as nodes and inter-sentence similarity scores as weighted edges, compute the saliency of each node from the nodes and edge weights, and select high-saliency sentences as the summary. Data-reconstruction methods convert the text into a two-dimensional matrix and find the n sentences that best reconstruct the source text to serve as the summary. Most recent research on tweet summarization combines static and dynamic social network data, but still builds on these traditional methods as base algorithms.
Existing tweet summarization research mostly summarizes the discussion of a single subject or event; summarization for a given topic has rarely been studied. Moreover, existing automatic summarization methods do not exploit features common to large-scale social network data.
Disclosure of Invention
To address the fact that existing summary generation methods introduce neither a specific topic nor social network data, the invention statistically builds large-scale topic-specific social network data for different topics and, on that basis, designs a summary generation method based on topic lexicons.
To achieve the above object, the technical solution adopted by the present invention is a tweet summary generation method based on topic relevance, comprising the following steps:
1) Preprocess and clean the raw data to obtain a tweet set, and extract the network interaction information of each tweet.
2) Count the frequencies of the nouns, verbs and adjectives appearing in each tweet set, take the words ranked in the top 1% by frequency as candidate topic words, and filter out candidates whose frequency in any other topic exceeds k, yielding the final topic word set.
3) Select the topic closest to the source text as the given topic, and compute the relevance of each tweet to the given topic from the topic word set.
4) Compute public acceptance from the network interaction information.
5) Combine public acceptance and topic relevance into the final saliency of each tweet, expressed as RankScore = ω·SS_T + (1 − ω)·R, where SS_T is the sentence-to-topic relevance measure for topic T, R is public acceptance, and ω is a hyperparameter.
6) Remove redundancy with a maximum marginal relevance algorithm and output the summary.
This technical scheme yields the following beneficial effects:
The invention provides a new topic-relevance measure tailored to the topical and sparse nature of Twitter data. Using each topic's dedicated lexicon and a trained word-vector model, the relevance between a tweet and a topic can be computed, so that a summary closer to the target topic can be screened out.
By building a lexicon for each topic, the invention better accounts for the distinctiveness of discussion across topics and the distribution of the whole dataset.
The invention adopts a new maximum marginal relevance algorithm to reduce redundant information while accounting for the coverage and diversity of the summary, yielding a summary that generalizes better and contains more novel content.
The method effectively incorporates social network data, condensing it into public acceptance as one selection criterion for the summary. For a tweet published by a user, the amount of public interaction reflects people's attention to, and recognition of, its content. In general, high attention and recognition indicate that the text is more fluent and the information richer, and the purpose of text summarization is precisely to select sentences with high information coverage, novelty and generalization. Incorporating interaction information into the algorithm therefore makes the summary richer in information and more fluent in content.
In conclusion, the method selects tweets for the summary according to both topic relevance and tweet saliency, and controls the redundancy of the final summary, so that the generated summary jointly accounts for topic, diversity and social recognition, yielding a summary with higher topic relevance, novelty and generalization.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
Given the topical and sparse nature of social network data, most research first screens tweets by topic and then summarizes the screened tweets. Extracting a summary from the tweets of a given topic yields better topic relevance; prior research usually targets the generalization of the summary and its coverage of the source text, and few studies consider the summary's topic relevance. In social network data, a statement someone publishes is usually tied to some topic, and the topics under discussion differ across users and time periods. When summarizing a body of text with a specified topic, we naturally want a summary that is more relevant to that topic. The invention therefore designs a topic-aware summarization method that predefines a number of prior topics and topic lexicons from large-scale social network training data: given a prior topic and several tweets on it, the method produces a summary relevant to that topic.
The term frequency-inverse document frequency algorithm (TF-IDF) characterizes, to some extent, the importance of a word within a piece of text. Its main idea is that the more frequently a word appears in the text and the less frequently it appears across the whole document collection, the more important it is. Since overall social network data usually covers multiple topics, the text of each topic can be regarded as its own class, and the frequency distribution of words necessarily differs across classes; if a word appears often in one class but rarely in the others, it is a "characteristic word" of that topic class. A lexicon dedicated to each topic can thus be established. Meanwhile, tweets carry rich social interaction information (each tweet has a retweet count, like count, comment count and so on), and this social recognition reflects, to some extent, the fluency, completeness and generality of a tweet's expression, so it is used as another criterion for selecting the summary. Based on the above discussion, the invention designs a tweet summarization method based on social interaction information and topic relevance; referring to FIG. 1, the specific steps are as follows:
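The TF-IDF intuition above can be illustrated with a minimal sketch. This is a generic implementation of the standard weighting, not the patent's exact formula:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF weights; docs is a list of token lists."""
    n = len(docs)
    df = Counter()                      # document frequency per word
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        weights.append({w: (tf[w] / total) * math.log(n / df[w])
                        for w in tf})
    return weights

docs = [["flood", "rescue", "flood"], ["match", "goal"], ["flood", "goal"]]
w = tf_idf(docs)
# "rescue" occurs only in doc 0, so it outweighs the widespread "flood" there
```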
1. preparing data: because of the lack of published tweet linguistic data and summary linguistic data, we collected several tweets with published Twitter API during the year 2019 and 2020. After the original data are obtained, firstly carrying out sparsification removal treatment: and counting the word frequency of the nouns in all the tweets, and screening the top n topic nouns as hot topic words. And then, filtering the text by the prior subject words, and if the speech in all the linguistic data relates to the n topics or the topic tags of the speech relate to the topics, classifying the speech into the categories of the topics related to the speech. Finally we have n tweets, each relating to a topic.
2. Data cleaning. First remove noisy information such as hashtags, @mentions, URLs and numbers at the end of the tweet, then remove tweets containing fewer than m words. Extract each tweet's like, retweet and comment counts with regular expressions, setting a count to 0 when it cannot be extracted. This yields n cleaned tweet sets. For the later computation of word-to-word similarity, a word-vector model is trained on the cleaned dataset with the skip-gram model.
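A minimal sketch of the cleaning step, assuming simple regular-expression patterns for hashtags, @mentions, URLs and trailing numbers (the patent does not specify the exact patterns, and `min_words` stands in for the unspecified threshold m):

```python
import re

def clean_tweet(text, min_words=3):
    """Strip URLs, hashtags, @mentions and trailing digits; drop short tweets.
    min_words is an assumed value for the patent's threshold m."""
    text = re.sub(r"https?://\S+", " ", text)   # URLs
    text = re.sub(r"[@#]\w+", " ", text)        # @mentions and #hashtags
    text = re.sub(r"\d+\s*$", "", text)         # numbers at the end of the tweet
    text = re.sub(r"\s+", " ", text).strip()    # collapse whitespace
    return text if len(text.split()) >= min_words else None

cleaned = clean_tweet("Heavy rain hits the city again #storm @news http://t.co/x 123")
```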
3. Building the topic word sets. Words of different parts of speech in the dataset are first identified with the stanza toolkit. The frequencies of the nouns, verbs and adjectives appearing in each tweet set are counted, and the words ranked in the top 1% by frequency are taken as candidate topic words. Since some words may be common nouns or strongly related to multiple topics, candidates whose frequency in any other topic exceeds k are filtered out, yielding the final topic word set.
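The lexicon construction in this step can be sketched as follows. `top_frac` and `k` correspond to the patent's top-1% cut and cross-topic threshold k, but the concrete values and the helper name `topic_lexicons` are illustrative:

```python
from collections import Counter

def topic_lexicons(topic_tokens, top_frac=0.01, k=5):
    """topic_tokens: dict topic -> list of content-word tokens (nouns/verbs/adjs).
    Keep each topic's top `top_frac` words by frequency, then drop candidates
    whose frequency in any other topic exceeds k."""
    counts = {t: Counter(toks) for t, toks in topic_tokens.items()}
    lexicons = {}
    for t, cnt in counts.items():
        n_keep = max(1, int(len(cnt) * top_frac))
        candidates = [w for w, _ in cnt.most_common(n_keep)]
        lexicons[t] = [w for w in candidates
                       if all(counts[o][w] <= k for o in counts if o != t)]
    return lexicons

tokens = {
    "flood": ["flood"] * 10 + ["rescue"] * 8 + ["city"] * 6,
    "sport": ["goal"] * 9 + ["city"] * 7,
}
# toy vocabulary, so keep all words as candidates before cross-topic filtering
lex = topic_lexicons(tokens, top_frac=1.0, k=5)
# "city" is frequent in both topics, so it is filtered from both lexicons
```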
4. Topic relevance. After the topic word set is obtained, the relevance of a tweet to a topic is computed as follows:
sim(a, b) = (a·bᵀ) / (|a|·|b|)
s(w, t_i) = sim(emb[t_i], emb[w]), t_i ∈ T_words
F(w, T) = max{s(w, t_1), s(w, t_2), ..., s(w, t_n)}
where the sim function computes the cosine similarity between two word vectors a and b; Sr is a sentence-length regularization term that offsets the model's tendency to prefer shorter sentences; L is the set of nouns, verbs and adjectives in a sentence, and L[i] denotes the ith word in L; L_i denotes the ith sentence; m denotes the maximum number of tweets in the tweet set; s(w, t_i) computes the similarity between word w and topic word t_i; F(w, T) is the membership degree of word w to topic T; emb is the word-embedding model that maps a word id to its vector, obtained by skip-gram training; T_words is the topic's word set; SS_T is the sentence-to-topic relevance measure for topic T; σ is an adjustable hyperparameter; and n denotes the number of tweets in the source text.
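The sim and F(w, T) formulas above can be sketched directly. The sentence-level aggregation shown is only one plausible reading, since the exact SS_T formula (with the length term Sr and hyperparameter σ) is not fully reproduced in the text, and the toy 2-d embeddings are illustrative:

```python
import math

def cos_sim(a, b):
    """sim(a, b): cosine similarity between two word vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def membership(word_vec, topic_vecs):
    """F(w, T): maximum similarity between w and any word in the topic set."""
    return max(cos_sim(word_vec, t) for t in topic_vecs)

def sentence_topic_score(word_vecs, topic_vecs):
    """A plausible sentence-to-topic aggregation: mean membership over the
    sentence's content words (an assumption, not the patent's exact SS_T)."""
    return sum(membership(w, topic_vecs) for w in word_vecs) / len(word_vecs)

emb = {"flood": [1.0, 0.1], "rain": [0.9, 0.2], "goal": [0.0, 1.0]}
topic_vecs = [emb["flood"]]
```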
5. Public acceptance. If a tweet receives more retweets, likes and comments than other tweets, its relative acceptance within the document is considered higher than theirs. It is computed as:
R_i = α·c_i + β·re_i + γ·l_i
where c_i, re_i and l_i are the dispersion-standardized like, retweet and comment counts of the ith tweet, and α, β and γ are adjustable hyperparameters satisfying α + β + γ = 1. R_i denotes the public acceptance of the ith tweet.
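A sketch of the public-acceptance computation. Interpreting "dispersion standardized" as min-max scaling and using the weight values below are assumptions:

```python
def dispersion_standardize(xs):
    """Min-max standardization to [0, 1] (assumed reading of
    "dispersion standardized values")."""
    lo, hi = min(xs), max(xs)
    return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in xs]

def public_acceptance(likes, retweets, comments, alpha=0.4, beta=0.4, gamma=0.2):
    """R_i = alpha*c_i + beta*re_i + gamma*l_i with alpha+beta+gamma = 1.
    The weight values here are placeholders."""
    c = dispersion_standardize(likes)
    re_ = dispersion_standardize(retweets)
    l = dispersion_standardize(comments)
    return [alpha * ci + beta * ri + gamma * li
            for ci, ri, li in zip(c, re_, l)]

# tweet 0 is maximal on every count, tweet 1 minimal on every count
R = public_acceptance([10, 0, 5], [4, 0, 2], [3, 0, 0])
```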
6. Combining public acceptance and topic relevance, the final saliency of a tweet is:
RankScore = ω·SS_T + (1 − ω)·R
where ω is an adjustable hyperparameter that balances the two kinds of information.
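The combination step is a one-line weighted sum; ω = 0.6 below is a placeholder for the tunable hyperparameter:

```python
def rank_score(topic_rel, acceptance, omega=0.6):
    """RankScore = omega * SS_T + (1 - omega) * R, elementwise over tweets."""
    return [omega * s + (1 - omega) * r
            for s, r in zip(topic_rel, acceptance)]

# tweet 0 is topic-relevant but little-liked; tweet 1 the reverse
scores = rank_score([0.9, 0.2], [0.1, 0.8])
```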
7. Redundancy penalty strategy. To keep the redundancy of the screened summary as small as possible, the invention adopts the following improved maximum marginal relevance (MMR) strategy:
1) Initialize the sets: A stores the summary; B is the set of tweets sorted by saliency score, where the saliency score of each tweet is S_i = RankScore(i); x_i denotes the ith tweet and n the total number of tweets.
2) Take the ith element x_i from set B. If x_i satisfies
len(set(x_i) ∩ set(s*)) < k for every s* ∈ A,
move x_i from set B to set A; otherwise delete x_i from set B. Here set(x_i) is the word set obtained by deduplicating the words of x_i (the set function deduplicates elements), k is the word-overlap threshold serving as the similarity bound, and len() computes the size of the intersection between the word sets of x_i and s*.
Claims (6)
1. A tweet summary generation method based on topic relevance, characterized by comprising the following steps:
1) preprocessing and cleaning raw data to obtain a tweet set, and extracting network interaction information of each tweet;
2) counting the frequencies of the nouns, verbs and adjectives appearing in each tweet set, taking the words ranked in the top 1% by frequency as candidate topic words, and filtering out candidates whose frequency in any other topic exceeds k to obtain the final topic word set;
3) selecting the topic closest to the source text as the given topic and computing the relevance of each tweet to the given topic from the topic word set, the relevance of a tweet to a topic being computed as follows:
sim(a, b) = (a·bᵀ) / (|a|·|b|)
s(w, t_i) = sim(emb[t_i], emb[w]), t_i ∈ T_words
F(w, T) = max{s(w, t_1), s(w, t_2), ..., s(w, t_n)}
wherein the sim function computes the cosine similarity between two word vectors a and b; Sr is the sentence-length regularization term; L is the set of nouns, verbs and adjectives in the current sentence, and L[i] denotes the ith word in L; L_i denotes the ith sentence; m denotes the maximum number of tweets in the tweet set; s(w, t_i) computes the similarity between word w and topic word t_i; F(w, T) is the membership degree of word w to topic T; T_words is the topic's word set; emb is the word-embedding model that maps a word id to its vector; SS_T is the sentence-to-topic relevance measure; σ is an adjustable hyperparameter; and n denotes the number of tweets in the source text;
4) computing public acceptance from the network interaction information according to the formula R_i = α·c_i + β·re_i + γ·l_i, wherein c_i, re_i and l_i are the dispersion-standardized like, retweet and comment counts of the ith tweet, α, β and γ are adjustable hyperparameters satisfying α + β + γ = 1, and R_i denotes the public acceptance of the ith tweet;
5) combining public acceptance and topic relevance into the final saliency of each tweet, expressed as RankScore = ω·SS_T + (1 − ω)·R, wherein SS_T is the sentence-to-topic relevance measure for topic T, R is public acceptance, and ω is a hyperparameter;
6) removing redundancy with a maximum marginal relevance algorithm and outputting the summary.
2. The tweet summary generation method based on topic relevance according to claim 1, wherein the preprocessing in step 1) comprises: first de-sparsifying the raw data by counting the noun frequencies of all tweets and screening out the top n topic nouns as hot topic words; then filtering the tweets by these prior topic words, assigning a tweet to the category of a topic if its text or its hashtags mention one of the n topics, finally obtaining n tweet sets, each concerning one topic.
3. The tweet summary generation method based on topic relevance according to claim 2, wherein the data cleaning in step 1) comprises removing hashtags, @mentions, URLs and numbers at the end of the tweet, and then removing tweets containing fewer than m words.
4. The tweet summary generation method based on topic relevance according to claim 1 or 3, wherein extracting the network interaction information of a tweet comprises extracting the like, retweet and comment counts with regular expressions.
5. The tweet summary generation method based on topic relevance according to claim 1, wherein the word embedding model is obtained by training a skip-gram model on the cleaned dataset.
6. The tweet summary generation method based on topic relevance according to claim 1, wherein the maximum marginal relevance algorithm performs redundancy removal by the following specific steps:
1) initializing the sets: A stores the summary; B is the set of tweets sorted by saliency score; x_i denotes the ith tweet and n the total number of tweets;
2) taking the ith element x_i from set B; if x_i satisfies
len(set(x_i) ∩ set(s*)) < k for every s* ∈ A,
moving x_i from set B to set A, otherwise deleting x_i from set B; wherein the len function computes the size of the intersection between the word sets of x_i and s*, the set function deduplicates elements, set(x_i) is the word set obtained by deduplicating the words of x_i, and k is the word-overlap threshold.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110151630.9A CN112883716B (en) | 2021-02-03 | 2021-02-03 | Twitter abstract generation method based on topic correlation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112883716A CN112883716A (en) | 2021-06-01 |
CN112883716B true CN112883716B (en) | 2022-05-03 |
Family
ID=76057037
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110151630.9A Active CN112883716B (en) | 2021-02-03 | 2021-02-03 | Twitter abstract generation method based on topic correlation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112883716B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114254624B (en) * | 2021-12-01 | 2023-01-31 | 马上消费金融股份有限公司 | Method and system for determining website type |
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1916904A (en) * | 2006-09-01 | 2007-02-21 | 北大方正集团有限公司 | Method of abstracting single file based on expansion of file |
US8078450B2 (en) * | 2006-10-10 | 2011-12-13 | Abbyy Software Ltd. | Method and system for analyzing various languages and constructing language-independent semantic structures |
US8930376B2 (en) * | 2008-02-15 | 2015-01-06 | Yahoo! Inc. | Search result abstract quality using community metadata |
CN102622411A (en) * | 2012-02-17 | 2012-08-01 | 清华大学 | Structured abstract generating method |
CN103150333B (en) * | 2013-01-26 | 2016-01-13 | 安徽博约信息科技有限责任公司 | Opinion leader identification method in microblog media |
CN105740448B (en) * | 2016-02-03 | 2019-06-25 | 天津大学 | More microblogging timing abstract methods towards topic |
CN106126620A (en) * | 2016-06-22 | 2016-11-16 | 北京鼎泰智源科技有限公司 | Method of Chinese Text Automatic Abstraction based on machine learning |
CN106886567B (en) * | 2017-01-12 | 2019-11-08 | 北京航空航天大学 | Microblogging incident detection method and device based on semantic extension |
CN108920611B (en) * | 2018-06-28 | 2019-10-01 | 北京百度网讯科技有限公司 | Article generation method, device, equipment and storage medium |
CN110362674B (en) * | 2019-07-18 | 2020-08-04 | 中国搜索信息科技股份有限公司 | Microblog news abstract extraction type generation method based on convolutional neural network |
CN110990676A (en) * | 2019-11-28 | 2020-04-10 | 福建亿榕信息技术有限公司 | Social media hotspot topic extraction method and system |
CN111125349A (en) * | 2019-12-17 | 2020-05-08 | 辽宁大学 | Graph model text abstract generation method based on word frequency and semantics |
CN112100317B (en) * | 2020-09-24 | 2022-10-14 | 南京邮电大学 | Feature keyword extraction method based on theme semantic perception |
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination
- GR01: Patent grant