CN112527964B - Microblog abstract generation method based on multi-mode manifold learning and social network characteristics - Google Patents

Info

Publication number
CN112527964B
Authority
CN
China
Prior art keywords
microblog
significance
information
social network
user
Prior art date
2020-12-18
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011503521.0A
Other languages
Chinese (zh)
Other versions
CN112527964A (en)
Inventor
夏书银 (Xia Shuyin)
曹洋洋 (Cao Yangyang)
陈子忠 (Chen Zizhong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Posts and Telecommunications
Original Assignee
Chongqing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-12-18
Filing date
2020-12-18
Publication date
2022-07-01
2020-12-18 Application filed by Chongqing University of Posts and Telecommunications
2020-12-18 Priority to CN202011503521.0A
2021-03-19 Publication of CN112527964A
Application granted
2022-07-01 Publication of CN112527964B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3335 Syntactic pre-processing, e.g. stopword elimination, stemming
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/34 Browsing; Visualisation therefor
    • G06F16/345 Summarisation for human users
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Abstract

The invention discloses a microblog summary generation method based on multi-modal manifold learning and social network characteristics, comprising the following steps: acquiring a user's microblog set on specific topics together with user interaction information; constructing a within-document text relation matrix and a cross-document text relation matrix; calculating microblog significance from these matrices; calculating social recognition from the user interaction information; and combining microblog significance with social recognition to obtain the final microblog significance, then selecting the sentences with the highest significance as the summary. The invention improves the manifold learning method commonly used in multi-document summarization and integrates social network information, making better use of the relations between sentences in different topic documents and between sentences within the same document. It also adopts maximal marginal relevance (MMR) to reduce redundant information, balancing the coverage and diversity of the summary.

Description

Microblog abstract generation method based on multi-mode manifold learning and social network characteristics
Technical Field
The invention relates to automatic text summarization in natural language processing, and in particular to the automatic generation of summaries of users' microblog posts based on multi-modal manifold learning and social network characteristics.
Background
The rapid development of social media such as Twitter and Weibo provides people with a large amount of information but also raises the cost of finding useful information. Research on microblog summarization, which compresses and summarizes massive volumes of microblog posts, has therefore become necessary. The main existing approaches to microblog summarization are: (1) traditional extractive summarization methods such as SumBasic, TextRank, LexRank, Centroid and Data Reconstruction; (2) methods that use static and dynamic social network data, such as the number of likes, the number of reposts and user influence, to summarize people's discussions on a topic. Most recent methods combine the two. Some compute microblog significance from static social network information; for example, (Li et al, 2012) combine the repost count of a microblog with the user's follower and followee counts, which represent the user's influence. Others rely on dynamic social network information; for example, the method of He Yu et al. considers people's social network relationships and yields a social network summarization algorithm with lower redundancy. (Duan Y et al, 2012) combine static and dynamic information: they order microblogs by publication time, weight each post by the author's influence and the post's content quality, and compute sentence significance from these. Other work builds on the time series of microblogs; for example, to summarize an event, (Nichols et al, 2012) use microblog timestamps as features for detecting when the event occurs, since the curve of post counts peaks at that moment.
Disclosure of Invention
Existing research usually summarizes a hot topic within a certain period or a specific event; applied to summarizing an individual user's posts, it gives unsatisfactory results, and some algorithms, such as Data Reconstruction, have high computational complexity. The present method improves the manifold learning method commonly used in multi-document summarization and integrates social network information, making better use of the relations between sentences across different topic documents and within the same document. It also reduces redundant information with MMR, balancing the coverage and diversity of the summary.
The technical scheme adopted by the invention is as follows: a microblog abstract generation method based on multi-modal manifold learning and social network characteristics comprises the following steps:
step one, acquiring a microblog set of a user on specific topics together with user interaction information;
step two, constructing a text relation matrix within a single document and a text relation matrix across documents;
step three, calculating microblog significance from the matrices of step two;
step four, calculating social recognition from the user interaction information;
step five, combining microblog significance and social recognition to obtain the final microblog significance and, taking redundancy into account, selecting the sentences with the highest final significance under the maximal marginal relevance (MMR) strategy as the summary.
Specifically, acquiring the user's microblog set on specific topics in step one comprises: counting the frequencies of nouns across all acquired microblog texts and taking the top n nouns as hot topic words; screening users with these prior topic words, retaining a user's posts if more than k of them relate to the n topics; and merging the user's posts in each topic category into one sample.
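A minimal Python sketch of the topic-word selection and user screening described above, under stated assumptions: each microblog has already been segmented and POS-tagged so that its nouns are available as a list, and a post that mentions several hot topic words is assigned to each of those topics. The function names `select_hot_topics` and `build_user_samples` are hypothetical and not taken from the patent.

```python
from collections import Counter


def select_hot_topics(noun_lists, n):
    """Return the n most frequent nouns over all microblogs."""
    counts = Counter(noun for nouns in noun_lists for noun in nouns)
    return [word for word, _ in counts.most_common(n)]


def build_user_samples(user_posts, hot_topics, k):
    """Keep users with more than k on-topic posts and group their posts by topic.

    user_posts maps a user id to a list of (post_text, nouns) pairs.
    Returns {user_id: {topic: [post_text, ...]}}; each topic list is one sample.
    """
    topics = set(hot_topics)
    samples = {}
    for user, posts in user_posts.items():
        by_topic = {}
        on_topic = 0
        for text, nouns in posts:
            hits = sorted(topics.intersection(nouns))
            if hits:
                on_topic += 1
            for topic in hits:  # assumption: a post joins every topic it mentions
                by_topic.setdefault(topic, []).append(text)
        if on_topic > k:
            samples[user] = by_topic
    return samples
```

Keeping each sample as a list of posts, rather than one concatenated string, preserves the individual microblogs that later serve as the sentences to be ranked.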
The method further comprises cleaning the topic-specific microblog set: removing hashtags, @-mentions, URLs and the numbers at the end of each microblog, and then removing microblogs with fewer than m words.
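The cleaning rules can be sketched with regular expressions. The patterns are assumptions rather than the patent's own: hashtags are taken to be in Weibo's `#topic#` form, @-mentions are assumed to end at whitespace or punctuation, and word counting defaults to whitespace splitting, whereas Chinese text would first need a word segmenter. `clean_microblog` and `filter_short` are hypothetical helper names.

```python
import re

URL = re.compile(r"https?://[!-~]+")      # URL: run of printable ASCII characters
HASHTAG = re.compile(r"#[^#]*#")          # Weibo-style hashtag: #topic#
MENTION = re.compile(r"@[\w-]+")          # @-mention up to whitespace/punctuation
TAIL_NUMBER = re.compile(r"\d+\s*$")      # digits at the very end of the post


def clean_microblog(text):
    """Remove URLs, hashtags, @-mentions and trailing numbers, then tidy spaces."""
    for pattern in (URL, HASHTAG, MENTION):  # URLs first: they may contain '#' or '@'
        text = pattern.sub(" ", text)
    text = TAIL_NUMBER.sub("", text)
    return " ".join(text.split())


def filter_short(texts, m, tokenize=str.split):
    """Drop microblogs with fewer than m words (the tokenizer is an assumption)."""
    return [t for t in texts if len(tokenize(t)) >= m]
```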
The user interaction information comprises the numbers of likes, reposts and comments of each of the user's microblogs; these are extracted with regular expressions and set to 0 when they cannot be extracted.
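The patent does not give the regular expressions it uses. The sketch below assumes, hypothetically, that the raw page text shows the counts as `赞[12] 转发[3] 评论[5]` (likes, reposts, comments); any count that is not found defaults to 0, as the method specifies. `extract_interactions` is a hypothetical name.

```python
import re

# Hypothetical raw format (assumption): "赞[12] 转发[3] 评论[5]",
# i.e. likes, reposts and comments shown as label[count].
COUNT_PATTERNS = {
    "likes": re.compile(r"赞\[(\d+)\]"),
    "reposts": re.compile(r"转发\[(\d+)\]"),
    "comments": re.compile(r"评论\[(\d+)\]"),
}


def extract_interactions(raw):
    """Return like/repost/comment counts; anything not found is set to 0."""
    counts = {}
    for name, pattern in COUNT_PATTERNS.items():
        match = pattern.search(raw)
        counts[name] = int(match.group(1)) if match else 0
    return counts
```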
Specifically, in step two the text relation matrix within the same document is $W^a = [w^a_{ij}]$. If microblogs $i$ and $j$ belong to the same topic, $w^a_{ij}$ is the cosine similarity of $x_i$ and $x_j$; otherwise $w^a_{ij} = 0$. Here $x_i$ denotes the TF-IDF encoding of a single microblog.
The text relation matrix across documents is $W^b = [w^b_{ij}]$. If $i$ and $j$ belong to different documents, or if one of $i$, $j$ is 0, $w^b_{ij}$ is the cosine similarity of $x_i$ and $x_j$; otherwise $w^b_{ij} = 0$.
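A NumPy sketch of the two relation matrices, assuming the TF-IDF vectors are stacked row-wise with row 0 being the point $x_0$, and that every row carries a document (topic) label. Zeroing the diagonal (no self-loops) is an assumption borrowed from common manifold-ranking practice; the patent does not state it. `relation_matrices` is a hypothetical name.

```python
import numpy as np


def cosine_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # all-zero rows get similarity 0 instead of NaN
    Xn = X / norms
    return Xn @ Xn.T


def relation_matrices(X, doc_ids):
    """Build W^a (same document) and W^b (cross document, or involving point 0).

    X: (n+1) x V TF-IDF matrix, row 0 is x_0; doc_ids: document label per row.
    """
    C = cosine_matrix(X)
    doc = np.asarray(doc_ids)
    same_doc = doc[:, None] == doc[None, :]
    touches_zero = np.zeros_like(same_doc)
    touches_zero[0, :] = True
    touches_zero[:, 0] = True
    Wa = np.where(same_doc, C, 0.0)
    Wb = np.where(~same_doc | touches_zero, C, 0.0)
    np.fill_diagonal(Wa, 0.0)  # assumption: no self-loops
    np.fill_diagonal(Wb, 0.0)
    return Wa, Wb
```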
Specifically, the microblog significance in step three is calculated as follows:
$Q_a(f) = \mu \, f^T (I - S_a) f + (1-\mu)(f-y)^T(f-y)$
$f_a^* = \arg\min_f Q_a(f)$
$Q_b(f) = \eta \, f^T (I - S_b) f + (1-\eta)(f-y)^T(f-y)$
$f_b^* = \arg\min_f Q_b(f)$
$f^* = \lambda f_a^* + (1-\lambda) f_b^*$
where $Q_a(f)$ and $Q_b(f)$ are loss functions, $I$ is the identity matrix, and $f$ is the vector of sentence significance scores. When $f$ minimizes $Q_a(f)$, i.e. $f = f_a^*$, $f$ accounts well for the relations between sentences within the same document; when $f$ minimizes $Q_b(f)$, i.e. $f = f_b^*$, it accounts well for the relations between sentences in different documents. $y = [y_0, y_1, y_2, \dots, y_n]^T$ is the prior vector over a given set of data points: the first point represents the topic description sentence and the remaining n points represent all sentences in the documents (the data points to be ranked). $\mu$ and $\eta$ are smoothness constraints that balance topic information against text information, and $\lambda$ is the smoothness constraint that balances the two modalities, within-document and cross-document information.
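Because $S_a$ and $S_b$ are symmetric, each loss has a closed-form minimizer, which is convenient when implementing step three. Setting the gradient of $Q_a$ to zero gives the following; the same steps apply to $Q_b$ with $\eta$ and $S_b$:

```latex
\nabla_f Q_a(f) = 2\mu (I - S_a) f + 2(1-\mu)(f - y) = 0
\;\Longrightarrow\; (I - \mu S_a) f = (1-\mu)\, y
\;\Longrightarrow\; f_a^* = (1-\mu)(I - \mu S_a)^{-1} y,
\qquad f_b^* = (1-\eta)(I - \eta S_b)^{-1} y
```

The inverse exists for $0 < \mu < 1$ because the eigenvalues of the symmetrically normalized matrix lie in $[-1, 1]$.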
Specifically, the social recognition in step four is calculated as
$R_i = \alpha \cdot c_i + \beta \cdot re_i + \gamma \cdot l_i$
where $c_i$, $re_i$ and $l_i$ are the min-max normalized numbers of likes, reposts and comments of the i-th microblog, respectively, and $\alpha$, $\beta$, $\gamma$ are hyper-parameters satisfying $\alpha + \beta + \gamma = 1$.
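Step four can be sketched as follows. Min-max normalization maps each count to [0, 1]; returning zeros when all counts are equal is an assumption of this sketch. Following the definitions above, `c` corresponds to likes, `re` to reposts and `l` to comments. The helper names are hypothetical.

```python
import numpy as np


def min_max(values):
    """Min-max (dispersion) normalization to [0, 1]."""
    v = np.asarray(values, dtype=float)
    span = v.max() - v.min()
    if span == 0:
        return np.zeros_like(v)  # assumption: identical counts carry no signal
    return (v - v.min()) / span


def social_recognition(likes, reposts, comments, alpha, beta, gamma):
    """R_i = alpha*c_i + beta*re_i + gamma*l_i with c=likes, re=reposts, l=comments."""
    if not np.isclose(alpha + beta + gamma, 1.0):
        raise ValueError("alpha + beta + gamma must equal 1")
    return (alpha * min_max(likes)
            + beta * min_max(reposts)
            + gamma * min_max(comments))
```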
Further, the final microblog significance is
$RankScore = \omega \cdot f^* + (1-\omega) \cdot R$
where $\omega$ is an adjustable hyper-parameter with $0 < \omega < 1$, and $R$ is the social recognition vector; RankScore is the final microblog significance.
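The fusion in step five is a single weighted sum; a minimal sketch with a hypothetical helper name:

```python
import numpy as np


def rank_score(f_star, R, omega):
    """RankScore = omega * f* + (1 - omega) * R, with 0 < omega < 1."""
    if not 0 < omega < 1:
        raise ValueError("omega must lie strictly between 0 and 1")
    return omega * np.asarray(f_star, dtype=float) + (1 - omega) * np.asarray(R, dtype=float)
```

Because $f^*$ is not bounded to [0, 1] while $R$ is, rescaling $f^*$ (for example with min-max normalization) before fusion may make $\omega$ easier to tune; this section does not say whether such rescaling is done.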
To keep the redundancy of the selected summary as low as possible, the scheme further comprises a redundancy removal step, specifically:
1) initialize the sets $A = \emptyset$ and $B = \{x_i \mid i = 1, 2, \dots, n\}$, where A stores the summary microblogs, B is the set of candidate microblogs sorted by significance score, $x_i$ is the i-th microblog and n is the total number of microblogs; the significance score of each microblog is $S_i = RankScore(i)$;
2) sort the microblogs in B by significance score;
3) take the first element $x_i$ of B; if $\mathrm{sim}(x_i, s) < \varepsilon$ for every microblog $s$ in A, move $x_i$ from B to A, where $\mathrm{sim}$ is the similarity between two microblogs and $\varepsilon$ is a hyper-parameter giving the similarity threshold; otherwise delete $x_i$;
4) repeat step 3) until $B = \emptyset$ or the number of microblogs in A reaches the expected summary length.
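The redundancy-removal loop can be sketched as below. As described, it is a greedy threshold filter rather than the weighted relevance-minus-redundancy score of the classic MMR formulation. The similarity function is not specified here; cosine similarity on TF-IDF vectors is an assumption of this sketch, and `select_summary` is a hypothetical name.

```python
import numpy as np


def select_summary(X, scores, epsilon, max_len):
    """Greedy threshold-based redundancy removal following steps 1) to 4).

    X: rows are TF-IDF vectors of the candidate microblogs.
    scores: RankScore of each microblog.
    Returns indices of the selected microblogs in selection order.
    """
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1)
    norms[norms == 0] = 1.0
    Xn = X / norms[:, None]
    B = list(np.argsort(-np.asarray(scores, dtype=float), kind="stable"))  # step 2
    A = []
    while B and len(A) < max_len:
        i = B.pop(0)  # step 3: highest-scoring remaining candidate
        if not A or max(Xn[i] @ Xn[s] for s in A) < epsilon:
            A.append(i)  # similar to nothing already chosen: move into A
        # otherwise x_i is simply discarded
    return [int(i) for i in A]
```

With an empty A the first candidate is always accepted, which matches the condition holding trivially for every $s$ in an empty set.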
The invention has the following beneficial technical effects:
1. The multi-modal manifold learning algorithm is improved so that it can be applied to multi-topic microblog summarization, where no topic sentence exists. Specifically, the topic sentence is given the same status as the other microblog sentences: in the original algorithm, $y = [y_0, y_1, y_2, \dots, y_n]^T$ with $y_0 = 1$ and $y_i = 0$ for $i = 1, \dots, n$; this is modified to $y_0 = y_i = 1$. The first sentence then has exactly the same status as the other sentences, and the topic sentence disappears. The algorithm can therefore be applied to microblog data sets: summary generation is not disturbed by "topic sentence" information, while the consistency and complementarity of information across documents are still well accounted for.
2. Social network interaction information of microblogs, such as the numbers of likes, comments and reposts, is integrated into the summarization algorithm to obtain summaries with high information coverage, novelty and generality. When a user publishes a microblog, the amount of interaction from the user's friends and other readers reflects how much attention the post receives and how far its content is recognized. Attention to and recognition of a piece of information generally indicate, to some degree, its coverage, novelty and generality, and text summarization aims precisely to select sentences with these properties. Integrating interaction information into the algorithm therefore yields summaries with better information coverage, novelty and generality.
In conclusion, addressing the particularity of users' microblog posts, which cover several topics, the method provides a microblog summarization algorithm that considers social recognition as well as the consistency and complementarity of information across different documents, and thereby obtains summaries with better information coverage, novelty and generality. This is the advantage of the present invention.
Drawings
FIG. 1 is a schematic process flow diagram of the present invention.
Detailed Description
Referring to FIG. 1, when the method is used to generate a multi-topic text summary, microblogs are selected for the summary according to both social recognition and microblog significance, and the similarity within the final summary is controlled, so that the coverage, diversity and social recognition of the generated microblog summary are considered together.
Because a user's posts span several topics, the task can be treated as multi-document summarization. Multi-modal manifold learning is a widely used multi-document summarization method that jointly considers the overall topic, intra-document importance and inter-document importance. Since user summarization has no predefined topic sentence, the method improves the multi-modal manifold learning algorithm so that it applies to microblog summarization without topic sentences. Meanwhile, microblogs carry social network interaction information, such as the numbers of reposts, likes and comments of each microblog, and posts with high social recognition are more likely to belong in a summary. The invention therefore designs a microblog summarization method that combines social network interaction information with multi-modal manifold learning, with the following specific steps:
1. Data preparation: since public microblog corpora and summary corpora are lacking, the raw data are user microblogs obtained through the public microblog API; 500 users were finally collected, each with no more than 1000 microblogs. To protect user privacy, all user IDs are replaced with numbers. The frequencies of nouns across all microblogs are counted and the top n nouns are taken as hot topic words. Users are then screened with these prior topic words: if more than k of a user's posts relate to the n topics, the user's posts are retained, and the user's posts in each category are merged into one sample. After the user's topic-specific microblog set is obtained, the data are cleaned further: noise such as hashtags, @-mentions, URLs and the numbers at the end of each microblog is removed first, and microblogs with fewer than m words are then discarded. The numbers of likes, reposts and comments of each microblog are extracted with regular expressions and set to 0 when they cannot be extracted.
2. The main idea of multi-modal manifold learning in text summarization is as follows: in multi-document summarization, sentence relations can be divided into relations within the same document and relations between sentences in different documents, which reflect a sentence's coverage of its own document and of the whole corpus, respectively. Because the two kinds of relation differ, they are represented by two separate matrices. The two kinds of information are then combined to obtain the final microblog significance, and the sentences with the highest significance are selected as the summary.
Encoding the microblog relation matrices: $W^a = [w^a_{ij}]$ denotes the relation matrix between sentences within the same document, and $x_1, \dots, x_n$ denote all sentences in the documents (the data points to be ranked), where $x_i$ is the TF-IDF encoding of a single microblog. If microblogs $i$ and $j$ belong to the same topic, $w^a_{ij}$ is the cosine similarity of $x_i$ and $x_j$; otherwise $w^a_{ij} = 0$. Similarly, $W^b = [w^b_{ij}]$ denotes the text relation matrix across documents: if $i$ and $j$ belong to different documents, or if one of $i$, $j$ is 0, $w^b_{ij}$ is the cosine similarity of $x_i$ and $x_j$; otherwise $w^b_{ij} = 0$. $W^a$ and $W^b$ are then normalized to $S_a$ and $S_b$ as $S_x = D_x^{-1/2} W_x D_x^{-1/2}$, where $D_x$ is the diagonal matrix whose i-th diagonal entry is the sum of the i-th row of $W_x$.
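A sketch of the symmetric normalization $S_x = D_x^{-1/2} W_x D_x^{-1/2}$ in NumPy. A row whose sum is zero (a microblog similar to nothing) would otherwise cause a division by zero; mapping such rows to 0 is an assumption of this sketch, not something the patent specifies. `symmetric_normalize` is a hypothetical name.

```python
import numpy as np


def symmetric_normalize(W):
    """Return D^{-1/2} W D^{-1/2}, where D holds the row sums of W."""
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)                     # diagonal of D_x: row sums of W_x
    inv_sqrt = np.zeros_like(d)
    nz = d > 0
    inv_sqrt[nz] = 1.0 / np.sqrt(d[nz])   # isolated nodes get 0 instead of inf
    return inv_sqrt[:, None] * W * inv_sqrt[None, :]
```

Scaling rows and columns by the same vector keeps the result symmetric, which the closed-form minimizer of the manifold loss relies on.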
Microblog significance: the sentence significance scores are first computed within each modality, and the information of the two modalities is then combined:
$Q_a(f) = \mu \, f^T (I - S_a) f + (1-\mu)(f-y)^T(f-y)$
$f_a^* = \arg\min_f Q_a(f)$
$Q_b(f) = \eta \, f^T (I - S_b) f + (1-\eta)(f-y)^T(f-y)$
$f_b^* = \arg\min_f Q_b(f)$
$f^* = \lambda f_a^* + (1-\lambda) f_b^*$
$Q_a(f)$ and $Q_b(f)$ are loss functions, $I$ is the identity matrix, and $f$ is the vector of sentence significance scores. When $Q_a(f)$ reaches its minimum, i.e. $f = f_a^*$, $f$ accounts well for the relations between sentences within the same document; when $Q_b(f)$ reaches its minimum, i.e. $f = f_b^*$, $f$ accounts well for the relations between sentences in different documents. $y = [y_0, y_1, y_2, \dots, y_n]^T$ is the prior vector over a given set of data points: the first point represents the topic description sentence and the remaining n points represent all sentences in the documents (the data points to be ranked). $\mu$ and $\eta$ are smoothness constraints that balance topic information against text information, and $\lambda$ is the smoothness constraint that balances the two modalities, same-document and different-document information.
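Given normalized matrices $S_a$ and $S_b$, step three reduces to two linear solves, using the closed form $f^* = (1-\mu)(I-\mu S)^{-1}y$ obtained by setting the gradient of the loss to zero. The sketch uses the modified prior $y_0 = y_i = 1$ described among the beneficial effects and assumes the linear fusion $f^* = \lambda f_a^* + (1-\lambda) f_b^*$; the function names are hypothetical.

```python
import numpy as np


def manifold_scores(S, y, weight):
    """Closed-form minimiser of weight*f^T(I-S)f + (1-weight)*||f-y||^2."""
    n = S.shape[0]
    return (1 - weight) * np.linalg.solve(np.eye(n) - weight * S, y)


def multimodal_scores(Sa, Sb, mu, eta, lam):
    """f* for both modalities with the modified prior y_0 = y_i = 1."""
    y = np.ones(Sa.shape[0])            # every microblog gets the same prior
    fa = manifold_scores(Sa, y, mu)     # within-document modality
    fb = manifold_scores(Sb, y, eta)    # cross-document modality
    return lam * fa + (1 - lam) * fb    # assumed linear fusion with lambda
```

Solving the linear system with `np.linalg.solve` avoids forming the explicit inverse, which is cheaper and numerically more stable.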
Social recognition: if a microblog has more reposts, likes and comments than other microblogs, its relative recognition within the document is considered higher than theirs. It is calculated as:
$R_i = \alpha \cdot c_i + \beta \cdot re_i + \gamma \cdot l_i$
where $c_i$, $re_i$ and $l_i$ are the min-max normalized numbers of likes, reposts and comments of the i-th microblog, respectively, and $\alpha$, $\beta$, $\gamma$ are hyper-parameters satisfying $\alpha + \beta + \gamma = 1$.
Combining social recognition with microblog significance, the final microblog significance is:
$RankScore = \omega \cdot f^* + (1-\omega) \cdot R$
where $\omega$ is a hyper-parameter that balances the two kinds of information, social recognition and microblog significance; $R$ is the social recognition vector, and RankScore is the final microblog significance.
Redundancy penalty strategy: to keep the redundancy of the selected summary as low as possible, our MMR (maximal marginal relevance) strategy is as follows:
1) initialize the sets $A = \emptyset$ and $B = \{x_i \mid i = 1, 2, \dots, n\}$, where A stores the summary microblogs, B is the set of candidate microblogs sorted by significance score, $x_i$ is the i-th microblog and n is the total number of microblogs. The significance score of each microblog is $S_i = RankScore(i)$;
2) sort the microblogs in B by significance score;
3) take the first element $x_i$ of B; if $\mathrm{sim}(x_i, s) < \varepsilon$ for every microblog $s$ in A, move $x_i$ from B to A, where $\mathrm{sim}$ is the similarity between two microblogs and $\varepsilon$ is a hyper-parameter giving the similarity threshold; otherwise delete $x_i$;
4) repeat step 3) until $B = \emptyset$ or the number of microblogs in A reaches the expected summary length.

Claims (7)

1. The microblog abstract generation method based on the multi-mode manifold learning and social network characteristics is characterized by comprising the following steps of:
step one, acquiring a microblog set of a specific topic of a user and user interaction information;
step two, constructing a text relation matrix within a single document and a text relation matrix across documents;
the relation matrix within the same document is $W^a = [w^a_{ij}]$: if microblogs $i$ and $j$ belong to the same topic, $w^a_{ij}$ is the cosine similarity of $x_i$ and $x_j$, otherwise $w^a_{ij} = 0$, where $x_i$ is the TF-IDF encoding of a single microblog;
the text relation matrix across documents is $W^b = [w^b_{ij}]$: if $i$ and $j$ belong to different documents, or if one of $i$, $j$ is 0, $w^b_{ij}$ is the cosine similarity of $x_i$ and $x_j$, otherwise $w^b_{ij} = 0$;
step three, calculating the microblog significance from the matrices of step two by the following formulas:
$Q_a(f) = \mu \, f^T (I - S_a) f + (1-\mu)(f-y)^T(f-y)$
$f_a^* = \arg\min_f Q_a(f)$
$Q_b(f) = \eta \, f^T (I - S_b) f + (1-\eta)(f-y)^T(f-y)$
$f_b^* = \arg\min_f Q_b(f)$
$f^* = \lambda f_a^* + (1-\lambda) f_b^*$
where $Q_a(f)$ and $Q_b(f)$ are loss functions, $I$ is the identity matrix, and $f$ is the vector of sentence significance scores; when $f$ minimizes $Q_a(f)$, i.e. $f = f_a^*$, $f$ accounts well for the relations between sentences within the same document; when $f$ minimizes $Q_b(f)$, i.e. $f = f_b^*$, $f$ accounts well for the relations between sentences in different documents; $y = [y_0, y_1, y_2, \dots, y_n]^T$ is the prior over a given set of data points, the first point representing the topic description sentence and the remaining n points representing all sentences in the documents; $\mu$ and $\eta$ are smoothness constraints balancing topic information against text information, and $\lambda$ is the smoothness constraint balancing the two modalities, same-document and different-document information;
step four, calculating social recognition from the user interaction information;
step five, combining the microblog significance and social recognition to obtain the final microblog significance and, taking redundancy into account, selecting the sentences with the highest final significance under the MMR strategy as the summary.
2. The microblog abstract generation method based on multi-modal manifold learning and social network features according to claim 1, wherein acquiring the user's microblog set on specific topics comprises: counting the frequencies of nouns in all acquired microblog texts and taking the top n nouns as hot topic words; screening users with these prior topic words, retaining a user's posts if more than k of them relate to the n topics; and merging the user's posts in each topic category into one sample.
3. The microblog abstract generation method based on multi-modal manifold learning and social network features according to claim 2, further comprising cleaning the topic-specific microblog set, specifically removing hashtags, @-mentions, URLs and the numbers at the end of each microblog, and removing microblogs with fewer than m words.
4. The microblog abstract generation method based on multi-modal manifold learning and social network features according to claim 1, wherein the user interaction information comprises the numbers of likes, reposts and comments of the user's microblogs, which are extracted with regular expressions and set to 0 when they cannot be extracted.
5. The microblog abstract generation method based on multi-modal manifold learning and social network features according to claim 1, wherein the social recognition in step four is calculated as
$R_i = \alpha \cdot c_i + \beta \cdot re_i + \gamma \cdot l_i$
where $c_i$, $re_i$ and $l_i$ are the min-max normalized numbers of likes, reposts and comments of the i-th microblog, respectively, and $\alpha$, $\beta$, $\gamma$ are hyper-parameters satisfying $\alpha + \beta + \gamma = 1$.
6. The microblog abstract generation method based on multi-modal manifold learning and social network features according to any one of claims 1 to 5, wherein the final microblog significance is
$RankScore = \omega \cdot f^* + (1-\omega) \cdot R$
where $\omega$ is an adjustable hyper-parameter with $0 < \omega < 1$, and $R$ is the social recognition vector.
7. The microblog abstract generation method based on multi-modal manifold learning and social network features according to claim 6, further comprising a redundancy removal step, specifically:
1) initializing the sets $A = \emptyset$ and $B = \{x_i \mid i = 1, 2, \dots, n\}$, where A stores the summary microblogs, B is the set of candidate microblogs sorted by significance score, $x_i$ is the i-th microblog and n is the total number of microblogs, the significance score of each microblog being $S_i = RankScore(i)$;
2) sorting the microblogs in B by significance score;
3) taking the first element $x_i$ of B and, if $\mathrm{sim}(x_i, s) < \varepsilon$ for every microblog $s$ in A, moving $x_i$ from B to A, where $\mathrm{sim}$ is the similarity between two microblogs and $\varepsilon$ is a hyper-parameter giving the similarity threshold; otherwise deleting $x_i$;
4) repeating step 3) until $B = \emptyset$ or the number of microblogs in A reaches the expected summary length.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011503521.0A CN112527964B (en) 2020-12-18 2020-12-18 Microblog abstract generation method based on multi-mode manifold learning and social network characteristics

Publications (2)

Publication Number Publication Date
CN112527964A CN112527964A (en) 2021-03-19
CN112527964B true CN112527964B (en) 2022-07-01

Family

ID=75001431

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011503521.0A Active CN112527964B (en) 2020-12-18 2020-12-18 Microblog abstract generation method based on multi-mode manifold learning and social network characteristics

Country Status (1)

Country Link
CN (1) CN112527964B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015210741A (en) * 2014-04-30 2015-11-24 日本電信電話株式会社 Topic modeling device, topic modeling method, and topic modeling program
CN105740448A (en) * 2016-02-03 2016-07-06 天津大学 Topic-oriented multi-microblog time-series summarization method
CN107329954A (en) * 2017-06-29 2017-11-07 浙江工业大学 Topic detection method based on document content and correlation
CN108681557A (en) * 2018-04-08 2018-10-19 中国科学院信息工程研究所 Short text topic discovery method and system based on self-expanding representation and similarity bidirectional constraints
CN109063010A (en) * 2018-07-11 2018-12-21 成都爱为贝思科技有限公司 Opinion leader mining method based on PageRank
CN110489548A (en) * 2019-07-12 2019-11-22 北京邮电大学 Chinese microblog topic detection method and system based on semantics, time and social networks

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9946703B2 (en) * 2016-08-18 2018-04-17 Microsoft Technology Licensing, Llc Title extraction using natural language processing

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
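Disease-Disease Relationships for Rheumatic Diseases: Web-Based Biomedical Textmining and Knowledge Discovery to Assist Medical Decision Making; Andreas Holzinger et al.; 《2012 IEEE 36th Annual Computer Software and Applications Conference》; 20121112; 1-5 *
A multi-document automatic summarization method based on geodesic distance; 安玲; 《林区教学》; 20140930 (No. 210); 31-35 *
Statistics and analysis of online public opinion based on Web information extraction; 黎康; 《中国优秀博硕士学位论文全文数据库(硕士)信息科技辑》; 20170515 (No. 5); I138-1232 *
Improved k-nearest neighbor classification algorithm based on reference points; 梁聪 et al.; 《计算机工程》; 20190228; Vol. 45 (No. 2); 167-172 *
Construction of an evaluation index system for the crisis communication effectiveness of government social media; 陈然; 《统计与决策》; 20190924 (No. 534); 91-93 *
Research on key technologies of fine-grained relation extraction for free text; 朱倩; 《中国优秀博硕士学位论文全文数据库(博士)信息科技辑》; 20120615 (No. 6); I138-93 *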

Similar Documents

Publication Publication Date Title
CN110390103B (en) Automatic short text summarization method and system based on double encoders
RU2628431C1 (en) Selection of text classifier parameter based on semantic characteristics
RU2628436C1 (en) Classification of texts on natural language based on semantic signs
CN110232149B (en) Hot event detection method and system
Yang et al. Hierarchical summarization of large documents
Tiwari et al. Ensemble approach for twitter sentiment analysis
Chang et al. A METHOD OF FINE-GRAINED SHORT TEXT SENTIMENT ANALYSIS BASED ON MACHINE LEARNING.
Liang et al. Profiling users for question answering communities via flow-based constrained co-embedding model
WO2011117594A1 (en) System
Yenkikar et al. AirBERT: A fine-tuned language representation model for airlines tweet sentiment analysis
Sheeba et al. Low frequency keyword extraction with sentiment classification and cyberbully detection using fuzzy logic technique
CN111400483B (en) Time-weighting-based three-part graph news recommendation method
CN112527964B (en) Microblog abstract generation method based on multi-mode manifold learning and social network characteristics
Silessi et al. Identifying gender from SMS text messages
CN112883716B (en) Twitter abstract generation method based on topic correlation
Uchida et al. Evaluation of retweet clustering method classification method using retweets on Twitter without text data
Trad et al. A framework for authorial clustering of shorter texts in latent semantic spaces
Cindo et al. Sentiment Analysis on Twitter By Using Maximum Entropy And Support Vector Machine Method
CN115269846A (en) Text processing method and device, electronic equipment and storage medium
Anthonio Robust document representations for hyperpartisan and fake news detection
Kaba et al. Afaan Oromo language fake news detection in social media using convolutional neural network and long short term memory
Jaya et al. Analysis of convolution neural network for transfer learning of sentiment analysis in Indonesian tweets
Sánchez et al. Joint sentiment topic model for objective text clustering
Bagus Satria Wiguna et al. Sarcasm detection engine for twitter sentiment analysis using textual and emoji feature
Hernández et al. Evaluation of deep learning models for sentiment analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant