CN112527964B - Microblog abstract generation method based on multi-mode manifold learning and social network characteristics - Google Patents
- Publication number
- CN112527964B
- Authority
- CN
- China
- Prior art keywords
- microblog
- significance
- information
- social network
- user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
- G06F16/3335—Syntactic pre-processing, e.g. stopword elimination, stemming
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/34—Browsing; Visualisation therefor
- G06F16/345—Summarisation for human users
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Computational Linguistics (AREA)
- Human Resources & Organizations (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Economics (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Marketing (AREA)
- Primary Health Care (AREA)
- Strategic Management (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a microblog summary generation method based on multi-modal manifold learning and social network features, comprising the following steps: acquiring a user's set of microblogs on specific topics together with the user interaction information; constructing an intra-document text relation matrix and a cross-document text relation matrix; computing the microblog saliency from these matrices; computing the social recognition from the user interaction information; and combining the microblog saliency with the social recognition to obtain the final microblog saliency, then selecting the several sentences with the highest saliency as the summary. The invention improves the manifold learning method commonly used in multi-document summarization and integrates social network information, making better use of the sentence relations across documents on different topics and the sentence relations within the same document. It also applies the maximal marginal relevance (MMR) algorithm to reduce redundant information, balancing the coverage and diversity of the summary.
Description
Technical Field
The invention relates to automatic text summarization in natural language processing, and in particular to the automatic generation of summaries of microblog posts based on multi-modal manifold learning and social network features.
Background
The rapid development of social networking media such as Twitter and microblogs provides people with a large amount of information, but it also raises the cost of finding useful information. Research on microblog summarization, which compresses and summarizes massive amounts of microblog information, has therefore become necessary. At present, the main approaches to microblog summarization are: (1) traditional summarization methods: SumBasic, TextRank, LexRank, Centroid, and Data Reconstruction; (2) methods that use static and dynamic social network data, such as the number of likes, the number of reposts, and user influence, to summarize what people discuss under a given topic. Most recent work combines the two kinds of information. Some methods compute microblog saliency from static social network information; for example, (Li et al, 2012) combines the repost count of a microblog with the follower-following relationship that represents the user's influence. Others rely on dynamic social network information; for example, Heyu et al. consider people's social network relationships and propose a social network summarization algorithm with lower redundancy. (Duan Y et al, 2012) combines static and dynamic information: it ranks microblogs by publication time, weights each post by the influence of the user and the content quality of the post, and computes sentence saliency. There is also work based on the time series of microblogs; for example, (Nichols et al, 2012) summarizes an event by using microblog timestamps as a feature for detecting when the event occurs, since the curve of the number of posts peaks when an event happens.
Disclosure of Invention
Existing research usually summarizes a hot topic or a particular event within a certain time period; when it is applied to summarizing a user's posts, the results are not ideal. In addition, some algorithms, such as Data Reconstruction, suffer from high complexity. The invention improves the manifold learning method commonly used in multi-document summarization and integrates social network information, making better use of the sentence relations across documents on different topics and the sentence relations within a document. It also uses MMR to reduce redundant information, balancing the coverage and diversity of the summary.
The technical scheme adopted by the invention is as follows: a microblog summary generation method based on multi-modal manifold learning and social network features, comprising the following steps:
step one, acquiring a user's set of microblogs on specific topics and the user interaction information;
step two, constructing a text relation matrix within a single document and a text relation matrix across documents;
step three, calculating the microblog saliency from the matrices of step two;
step four, calculating the social recognition from the user interaction information;
step five, combining the microblog saliency and the social recognition to obtain the final microblog saliency, and, taking redundancy into account, selecting the several sentences with the highest microblog saliency as the summary under the maximal marginal relevance (MMR) strategy.
Specifically, acquiring the user's set of microblogs on specific topics in step one comprises: counting the word frequencies of the nouns in all acquired microblog texts and selecting the top n nouns as hot topic words; then screening users by these topic words, retaining a user's posts if the posts that relate to the n topics number more than k; and then merging the user's posts on each topic into one sample.
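The screening in step one can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes each post already comes with its list of nouns (the part-of-speech tagger is not specified in the text), it reads "exceed k" as "more than k topic-related posts per user", and the function names `top_topic_words` and `build_user_samples` are hypothetical.

```python
from collections import Counter, defaultdict


def top_topic_words(noun_lists, n):
    """Return the n most frequent nouns over all microblogs."""
    counts = Counter(noun for nouns in noun_lists for noun in nouns)
    return [word for word, _ in counts.most_common(n)]


def build_user_samples(posts, topic_words, k):
    """Group each retained user's posts by topic.

    posts: iterable of (user_id, text, nouns) tuples.
    A user is retained only if more than k of their posts mention a topic word
    (an assumed reading of the threshold k).
    """
    by_user = defaultdict(lambda: defaultdict(list))
    related_count = Counter()
    for user_id, text, nouns in posts:
        matched = [topic for topic in topic_words if topic in nouns]
        if matched:
            related_count[user_id] += 1
        for topic in matched:
            by_user[user_id][topic].append(text)
    return {
        user_id: dict(samples)
        for user_id, samples in by_user.items()
        if related_count[user_id] > k
    }
```

Each value in the returned dictionary maps a topic word to that user's posts on the topic, which corresponds to one sample (one "document") per topic.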
The technical scheme further comprises cleaning the topic-specific microblog set, specifically removing hashtags, @ mentions, URLs, and the numbers at the end of a microblog, and removing microblogs with fewer than m words.
The user interaction information comprises the numbers of likes, reposts, and comments of the user's microblogs. Each count is extracted with a regular expression and set to 0 if it cannot be extracted.
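A minimal sketch of the cleaning and extraction steps is given below. The text does not state the raw data format, so the patterns are assumptions: hashtags are taken to look like `#topic#` or `#topic`, words are counted by whitespace (Chinese text would need a word segmenter instead), and the interaction counts are assumed to appear as `likes[12] reposts[3] comments[5]`, a purely hypothetical layout.

```python
import re

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#[^#\s]+#?")   # "#topic#" or "#topic" (assumed forms)
MENTION_RE = re.compile(r"@\S+")
TAIL_NUMBER_RE = re.compile(r"\d+\s*$")  # number at the very end of the post

# Hypothetical raw layout for the interaction counts.
COUNT_PATTERNS = {
    "likes": re.compile(r"likes\[(\d+)\]"),
    "reposts": re.compile(r"reposts\[(\d+)\]"),
    "comments": re.compile(r"comments\[(\d+)\]"),
}


def clean_text(text):
    """Remove URLs, hashtags, @ mentions, and a trailing number."""
    for pattern in (URL_RE, HASHTAG_RE, MENTION_RE):
        text = pattern.sub(" ", text)
    text = TAIL_NUMBER_RE.sub("", text.strip())
    return " ".join(text.split())


def keep_post(text, m):
    """Keep a post only if its cleaned text has at least m words."""
    return len(clean_text(text).split()) >= m


def extract_counts(raw):
    """Extract like/repost/comment counts; a missing count becomes 0."""
    counts = {}
    for name, pattern in COUNT_PATTERNS.items():
        match = pattern.search(raw)
        counts[name] = int(match.group(1)) if match else 0
    return counts
```

URLs are removed before hashtags and mentions so that `#` or `@` characters inside a link are not treated as tags.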
Specifically, in step two, the text relation matrix within the same document is $W^a$: if microblogs $i$ and $j$ belong to the same topic, then $W^a_{ij}$ is the cosine similarity of $x_i$ and $x_j$; otherwise $W^a_{ij}$ is 0. Here $x_i$ denotes the TF-IDF encoding of a single microblog.
The text relation matrix across documents is $W^b$: if $i$ and $j$ belong to different documents, or if one of $i$, $j$ is 0, then $W^b_{ij}$ is the cosine similarity of $x_i$ and $x_j$; otherwise $W^b_{ij}$ is 0.
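The construction of the two matrices can be illustrated with the sketch below, which takes the similarity measure to be cosine similarity. It makes several assumptions that the text does not state: the TF-IDF vectors are already computed and passed in as plain lists, the diagonal is left at zero (a common convention in manifold ranking), and the special case for index 0 (the topic sentence) is omitted. The name `relation_matrices` is hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def relation_matrices(vectors, doc_ids):
    """Build the intra-document matrix and the cross-document matrix.

    vectors: TF-IDF vector of each microblog.
    doc_ids: document (topic) label of each microblog.
    """
    n = len(vectors)
    w_intra = [[0.0] * n for _ in range(n)]
    w_cross = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # assumed: no self-similarity on the diagonal
            sim = cosine(vectors[i], vectors[j])
            if doc_ids[i] == doc_ids[j]:
                w_intra[i][j] = sim
            else:
                w_cross[i][j] = sim
    return w_intra, w_cross
```

Every off-diagonal pair contributes to exactly one of the two matrices, so the two modalities split the sentence graph without overlap.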
Specifically, the microblog saliency of step three is calculated with the following formulas:
$$Q_a(f)=\mu\cdot f^T(I-S^a)f+(1-\mu)\cdot(f-y)^T(f-y)$$
$$Q_b(f)=\eta\cdot f^T(I-S^b)f+(1-\eta)\cdot(f-y)^T(f-y)$$
where $Q_a(f)$ and $Q_b(f)$ are loss functions, $I$ is the identity matrix, and $f$ is a vector holding the saliency score of each sentence. The minimizer of $Q_a(f)$, denoted $f_a^*$, accounts for the sentence relations within the same document. The minimizer of $Q_b(f)$, denoted $f_b^*$, accounts for the sentence relations across different documents. $y=[y_0,y_1,y_2,\ldots,y_n]^T$ is defined over the given set of data points: the first point represents the topic description sentence, and the remaining $n$ points represent all sentences in the documents (the data points to be ranked). $\mu$ and $\eta$ are smoothness constraints that balance the topic information against the text information. $\lambda$ is a smoothness constraint between the two modalities, namely the same-document information and the cross-document information.
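One way to minimize a loss of this form numerically is a fixed-point iteration, sketched below. It assumes that the normalized matrix $S$ is symmetric and that the term written "1-S" means the identity matrix minus $S$; under those assumptions, setting the gradient to zero gives $f=\mu S f+(1-\mu)y$, which the loop iterates until the scores stop changing. The function name `minimize_saliency` and its stopping parameters are hypothetical, and the text does not say which solver the invention actually uses.

```python
def minimize_saliency(s, y, mu, max_iters=200, tol=1e-10):
    """Iterate f <- mu * S f + (1 - mu) * y until convergence.

    s: normalized relation matrix as a list of rows (assumed symmetric).
    y: prior vector; mu: trade-off weight with 0 < mu < 1.
    """
    f = list(y)
    for _ in range(max_iters):
        new_f = [
            mu * sum(s_ij * f_j for s_ij, f_j in zip(row, f)) + (1 - mu) * y_i
            for row, y_i in zip(s, y)
        ]
        if max(abs(a - b) for a, b in zip(new_f, f)) < tol:
            return new_f
        f = new_f
    return f
```

The same routine serves both modalities: call it once with the intra-document matrix and $\mu$, and once with the cross-document matrix and $\eta$.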
Specifically, the social recognition in step four is calculated as
$$R_i=\alpha\cdot c_i+\beta\cdot re_i+\gamma\cdot l_i$$
where $c_i$, $re_i$, and $l_i$ are the min-max normalized values of the like count, the repost count, and the comment count of the $i$-th microblog, respectively, and $\alpha$, $\beta$, $\gamma$ are hyperparameters satisfying $\alpha+\beta+\gamma=1$.
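A short sketch of this calculation follows. It interprets "dispersion standardized" as min-max normalization over the microblogs of one document, and it maps a constant column to zeros to avoid division by zero; that edge-case choice and the function names are assumptions, not part of the text.

```python
def min_max(values):
    """Scale values to [0, 1]; a constant list maps to all zeros (assumed)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def social_recognition(likes, reposts, comments, alpha, beta, gamma):
    """R_i = alpha * c_i + beta * re_i + gamma * l_i with alpha + beta + gamma = 1."""
    if abs(alpha + beta + gamma - 1.0) > 1e-9:
        raise ValueError("alpha + beta + gamma must equal 1")
    c = min_max(likes)
    re_ = min_max(reposts)
    l = min_max(comments)
    return [alpha * ci + beta * ri + gamma * li for ci, ri, li in zip(c, re_, l)]
```

Because each normalized term lies in [0, 1] and the weights sum to 1, every $R_i$ also lies in [0, 1].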
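Further, the final microblog saliency is
$$RankScore=\omega\cdot f^*+(1-\omega)\cdot R$$
where $\omega$ is an adjustable hyperparameter with $0<\omega<1$, $f^*$ is the microblog saliency from step three, $R$ denotes the social recognition from step four, and $RankScore$ denotes the final microblog saliency.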
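The fusion is a per-microblog weighted average, as the sketch below shows. The function name `rank_scores` is hypothetical, and the sketch assumes the saliency vector and the social recognition vector are aligned by microblog index.

```python
def rank_scores(f_star, r, omega):
    """RankScore_i = omega * f*_i + (1 - omega) * R_i, with 0 < omega < 1."""
    if not 0.0 < omega < 1.0:
        raise ValueError("omega must lie strictly between 0 and 1")
    if len(f_star) != len(r):
        raise ValueError("f_star and r must have the same length")
    return [omega * f_i + (1 - omega) * r_i for f_i, r_i in zip(f_star, r)]
```

If the two inputs are on different scales, the larger one dominates regardless of $\omega$, so rescaling the saliency scores to [0, 1] before fusion may be worth considering; the text does not say whether this is done.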
To keep the redundancy of the selected summary as low as possible, the scheme further comprises a redundancy removal step, specifically:
1) initialize set A and set B: $A=\emptyset$, $B=\{x_i \mid i=1,2,\ldots,n\}$, where A stores the summary microblogs, B is the set of candidate microblogs ranked by saliency score, $x_i$ denotes the $i$-th microblog, and $n$ denotes the total number of microblogs. The saliency score of each microblog is computed as $S_i=RankScore(i)$;
2) sort the microblogs in set B by saliency score;
3) take the first element $x_i$ from set B. If $x_i$ satisfies the following condition:
where $s$ denotes a microblog in set A,
then move $x_i$ from set B to set A, where $\varepsilon$ is a hyperparameter representing the similarity threshold; otherwise delete $x_i$ from set B;
4) repeat step 3) until $B=\emptyset$ or the number of microblogs in set A reaches the expected summary length.
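The four steps above can be sketched as a greedy selection loop. The acceptance condition is not reproduced in this text, so the sketch assumes it requires the cosine similarity between the candidate and every microblog already in set A to be below $\varepsilon$; that condition, the use of cosine similarity, and the name `select_summary` are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_summary(vectors, scores, epsilon, max_len):
    """Greedy redundancy removal; returns indices of the selected microblogs.

    vectors: TF-IDF vector of each microblog; scores: RankScore of each.
    """
    # Set B: all candidates, sorted by saliency score, highest first.
    candidates = sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)
    summary = []  # Set A, initially empty.
    for i in candidates:
        if len(summary) >= max_len:
            break
        # Assumed condition: similarity to every selected microblog is below epsilon.
        if all(cosine(vectors[i], vectors[s]) < epsilon for s in summary):
            summary.append(i)
        # Otherwise the candidate is simply dropped from B.
    return summary
```

The loop ends when the candidates are exhausted or the summary reaches `max_len`, matching the two stopping conditions in step 4).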
The invention has the following beneficial technical effects:
1. The multi-modal manifold learning algorithm is improved so that it can be applied to multi-topic microblog summarization without topic sentences. Specifically, when there is no topic sentence, the first sentence is given the same status as the other microblog sentences: in the original algorithm $y=[y_0,y_1,y_2,\ldots,y_n]^T$ with $y_0=1$ and $y_i=0$, which is modified to $y_0=y_i=1$. The status of the first sentence is then completely parallel to that of the other sentences, and the topic sentence disappears. The algorithm can therefore be applied to the microblog data set, so that summary generation is not disturbed by "topic sentence" information while the consistency and complementarity of information across multiple documents are still taken into account.
2. Social network interaction information of the microblog, such as the numbers of likes, comments, and reposts, is integrated into the summarization algorithm, so as to obtain summaries with high information coverage, novelty, and generality. When a user publishes a microblog, the amount of interaction from the user's friends and from people who browse the microblog reflects how much attention it receives and how much its information is recognized. In general, the attention and recognition that a piece of information receives indicate, to some degree, its information coverage, novelty, and generality, and text summarization is precisely the selection of sentences with high information coverage, novelty, and generality. Integrating the interaction information into the algorithm therefore yields summaries with better information coverage, novelty, and generality.
In conclusion, targeting the particularity that a user's microblog posts cover multiple topics, the invention develops a microblog summary generation algorithm that considers social recognition together with the consistency and complementarity of information across different documents, and thereby obtains summaries with better information coverage, novelty, and generality. This is the advantage of the invention.
Drawings
FIG. 1 is a schematic process flow diagram of the present invention.
Detailed Description
Referring to FIG. 1, when the method is used to generate a multi-topic text summary, microblogs are selected for the summary according to both social recognition and microblog saliency, and the similarity within the final summary is controlled, so that the coverage, diversity, and social recognition of the generated microblog summary are all taken into account.
Given the diversity of topics in a user's posts, the task can be regarded as a multi-document summarization task. Multi-modal manifold learning is a widely used multi-document summarization method that jointly considers the overall topic, intra-document importance, and inter-document importance. Because user summarization has no predefined topic sentence, the invention improves the multi-modal manifold learning algorithm so that it can be applied to microblog summarization without topic sentences. Moreover, because microblogs carry social network interaction information, such as the numbers of reposts, likes, and comments of each microblog, and because a post with high social recognition is more likely to belong in the summary, the invention designs a microblog summarization method that combines social network interaction information with multi-modal manifold learning. The specific steps are as follows:
1. Preparing data: because public microblog corpora and summary corpora are lacking, the raw data are user microblog data obtained through a public microblog API. In the end, 500 users were collected, with no more than 1000 microblogs per user. To protect user privacy, all user ids are replaced with numbers. The word frequencies of the nouns in all microblogs are counted, and the top n nouns are selected as hot topic words. Users are then screened by these topic words: if the user's posts that relate to the n topics number more than k, the posts are retained, and the user's posts on each topic are merged into one sample. After the user's topic-specific microblog set is obtained, the data are cleaned further: noisy information such as hashtags, @ mentions, URLs, and the numbers at the end of a microblog is removed first, and then microblogs with fewer than m words are removed. The numbers of likes, reposts, and comments of the user's microblogs are extracted with regular expressions, and a count is set to 0 if it cannot be extracted.
2. The main idea of multi-modal manifold learning in text summarization is as follows: sentence relations in multi-document summarization can be divided into relations within the same document and relations between texts in different documents, which reflect a sentence's coverage of the information in its own text and its coverage of the information in the full collection, respectively. Because the two relations differ, the relation between two sentences can be represented by two matrices. The two kinds of information are combined to obtain the final microblog saliency, and the several sentences with the highest saliency are selected as the summary.
Encoding the microblog relation matrices: let $W^a$ denote the relation matrix between sentences within the same document, and let $x_1,\ldots,x_n$ denote all sentences in the documents (the data points to be ranked), where $x_i$ is the TF-IDF encoding of a single microblog. If microblogs $i$ and $j$ belong to the same topic, then $W^a_{ij}$ is the cosine similarity of $x_i$ and $x_j$; otherwise $W^a_{ij}$ is 0. Similarly, let $W^b$ denote the text relation matrix across documents: if $i$ and $j$ belong to different texts, or if one of $i$, $j$ is 0, then $W^b_{ij}$ is the cosine similarity of $x_i$ and $x_j$; otherwise $W^b_{ij}$ is 0. $W^a$ and $W^b$ are then normalized to $S^a$ and $S^b$ by $S^x=(D^x)^{-1/2}W^x(D^x)^{-1/2}$, where $D^x$ is the diagonal matrix whose $i$-th diagonal entry is the sum of the elements in the $i$-th row of $W^x$.
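The normalization at the end of this paragraph can be sketched as below. The handling of a sentence with no neighbours (row sum zero) is not covered in the text; the sketch leaves such rows and columns at zero, which is an assumption, as is the function name `normalize`.

```python
import math


def normalize(w):
    """Return S = D^(-1/2) W D^(-1/2), where D holds the row sums of W."""
    n = len(w)
    degrees = [sum(row) for row in w]
    # Assumed: an isolated sentence (zero row sum) gets a zero scaling factor.
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in degrees]
    return [
        [inv_sqrt[i] * w[i][j] * inv_sqrt[j] for j in range(n)]
        for i in range(n)
    ]
```

When the input matrix is symmetric, as a cosine-similarity matrix is, the result is symmetric as well.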
Microblog saliency: the sentence saliency score is first calculated in each modality, and the information from the two modalities is then combined. The calculation formulas are as follows:
$$Q_a(f)=\mu\cdot f^T(I-S^a)f+(1-\mu)\cdot(f-y)^T(f-y)$$
$$Q_b(f)=\eta\cdot f^T(I-S^b)f+(1-\eta)\cdot(f-y)^T(f-y)$$
$Q_a(f)$ and $Q_b(f)$ are loss functions, $I$ is the identity matrix, and $f$ is a vector holding the saliency score of each sentence. The minimizer of $Q_a(f)$, denoted $f_a^*$, accounts for the sentence relations within the same document. The minimizer of $Q_b(f)$, denoted $f_b^*$, accounts for the sentence relations across different documents. $y=[y_0,y_1,y_2,\ldots,y_n]^T$ is defined over the given set of data points: the first point represents the topic description sentence, and the remaining $n$ points represent all sentences in the documents (the data points to be ranked). $\mu$ and $\eta$ are smoothness constraints that balance the topic information against the text information. $\lambda$ is a smoothness constraint between the two modalities, namely the same-document information and the different-document information.
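For reference, the minimizer of a loss of this form has a closed-form expression. The short derivation below assumes that the normalized matrix $S^a$ is symmetric, that the term written "1-S" is the identity matrix $I$ minus $S^a$, and that $0<\mu<1$; these are assumptions about the intended formula rather than statements taken from the text.

```latex
\begin{align*}
\nabla_f Q_a(f) &= 2\mu\,(I - S^a)\,f + 2(1-\mu)\,(f - y) = 0 \\
\bigl(\mu (I - S^a) + (1-\mu) I\bigr) f &= (1-\mu)\,y \\
(I - \mu S^a)\,f &= (1-\mu)\,y \\
f_a^* &= (1-\mu)\,(I - \mu S^a)^{-1}\,y
\end{align*}
```

The eigenvalues of a symmetrically normalized similarity matrix lie in $[-1,1]$, so $I-\mu S^a$ is invertible for $0<\mu<1$. The same steps with $\eta$ and $S^b$ give $f_b^*=(1-\eta)(I-\eta S^b)^{-1}y$.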
Social recognition: if a microblog has more reposts, likes, and comments than other microblogs, its relative recognition within the document is considered higher than that of the others. The calculation formula is:
$$R_i=\alpha\cdot c_i+\beta\cdot re_i+\gamma\cdot l_i$$
where $c_i$, $re_i$, and $l_i$ are the min-max normalized values of the like count, the repost count, and the comment count of the $i$-th microblog, respectively, and $\alpha$, $\beta$, $\gamma$ are hyperparameters satisfying $\alpha+\beta+\gamma=1$.
Combining the social recognition with the microblog saliency information, the final microblog saliency is:
$$RankScore=\omega\cdot f^*+(1-\omega)\cdot R$$
$\omega$ is a hyperparameter that balances the two kinds of information, social recognition and microblog saliency. $R$ denotes the social recognition, and $RankScore$ denotes the final microblog saliency.
Redundancy penalty strategy: to keep the redundancy of the selected summary as low as possible, our MMR (maximal marginal relevance) strategy is as follows:
1) Initialize the sets $A=\emptyset$, $B=\{x_i \mid i=1,2,\ldots,n\}$, where A stores the summary microblogs, B is the set of candidate microblogs ranked by saliency score, $x_i$ denotes the $i$-th microblog, and $n$ denotes the total number of microblogs. The saliency score of each microblog is computed as $S_i=RankScore(i)$.
2) Sort the microblogs in set B by saliency score.
3) Take the first element $x_i$ from set B. If $x_i$ satisfies the following condition:
where $s$ denotes a microblog in set A,
then move $x_i$ from set B to set A, where $\varepsilon$ is a hyperparameter representing the similarity threshold. Otherwise, delete $x_i$ from set B.
Reference to the literature
[1] Hoechamine, wubo, penhao, zhangyan chong, li jiangxin microblog emergency detection method and device based on semantic expansion [ P ]. beijing city: CN106886567B,2019-11-08.
[2] Tenghui, Liu Shimeng, Longfei A convolutional neural network-based microblog news abstract extraction type generation method [ P ]. Beijing City: CN110362674B,2020-08-04.
[3] Herzufang, guangchua, dangjianwu, huqinghua, topic-oriented multi-microblog time sequence summarization method [ P ]. tianjin city: CN105740448B,2019-06-25.
[4] Congress, duxuefe, zhangxiefe, lie sanfei summary method [ P ] based on social media microblog specific topics: CN107992634A,2018-05-04.
[5] Hoechamine, wubo, penhao, zhangyang, li jiangxin microblog emergency detection method and device based on semantic expansion [ P ]. beijing: CN106886567A,2017-06-23.
[6] A method for generating an abstract of a self-adaptive microblog topic [ P ]. beijing: CN106503064A,2017-03-15.
Claims (7)
1. A microblog summary generation method based on multi-modal manifold learning and social network features, characterized by comprising the following steps:
step one, acquiring a user's set of microblogs on specific topics and the user interaction information;
step two, constructing a text relation matrix within a single document and a text relation matrix across documents;
the relation matrix within the same document is $W^a$: if microblogs $i$ and $j$ belong to the same topic, then $W^a_{ij}$ is the cosine similarity of $x_i$ and $x_j$; otherwise $W^a_{ij}$ is 0, where $x_i$ denotes the TF-IDF encoding of a single microblog;
the text relation matrix across documents is $W^b$: if $i$ and $j$ belong to different documents, or if one of $i$, $j$ is 0, then $W^b_{ij}$ is the cosine similarity of $x_i$ and $x_j$; otherwise $W^b_{ij}$ is 0;
step three, calculating the microblog saliency from the matrices of step two with the following formulas:
$$Q_a(f)=\mu\cdot f^T(I-S^a)f+(1-\mu)\cdot(f-y)^T(f-y)$$
$$Q_b(f)=\eta\cdot f^T(I-S^b)f+(1-\eta)\cdot(f-y)^T(f-y)$$
where $Q_a(f)$ and $Q_b(f)$ are loss functions, $I$ is the identity matrix, and $f$ is a vector holding the saliency score of each sentence; the minimizer $f_a^*$ of $Q_a(f)$ accounts for the sentence relations within the same document, and the minimizer $f_b^*$ of $Q_b(f)$ accounts for the sentence relations across different documents; $y=[y_0,y_1,y_2,\ldots,y_n]^T$ is defined over the given set of data points, the first point representing the topic description sentence and the remaining $n$ points representing all sentences in the documents; $\mu$ and $\eta$ are smoothness constraints that balance the topic information against the text information; and $\lambda$ is a smoothness constraint between the two modalities, namely the same-document information and the different-document information;
step four, calculating the social recognition from the user interaction information;
step five, combining the microblog saliency and the social recognition to obtain the final microblog saliency, and, taking redundancy into account, selecting the several sentences with the highest microblog saliency as the summary under the MMR strategy.
2. The microblog summary generation method based on multi-modal manifold learning and social network features according to claim 1, characterized in that: acquiring the user's set of microblogs on specific topics comprises counting the word frequencies of the nouns in all acquired microblog texts, selecting the top n nouns as hot topic words, screening users by these topic words, retaining a user's posts if the posts that relate to the n topics number more than k, and merging the user's posts on each topic into one sample.
3. The microblog summary generation method based on multi-modal manifold learning and social network features according to claim 2, characterized in that: the method further comprises cleaning the topic-specific microblog set, specifically removing hashtags, @ mentions, URLs, and the numbers at the end of a microblog, and removing microblogs with fewer than m words.
4. The microblog summary generation method based on multi-modal manifold learning and social network features according to claim 1, characterized in that: the user interaction information comprises the numbers of likes, reposts, and comments of the user's microblogs; each count is extracted with a regular expression and set to 0 if it cannot be extracted.
5. The microblog summary generation method based on multi-modal manifold learning and social network features according to claim 1, characterized in that: the social recognition in step four is calculated as
$$R_i=\alpha\cdot c_i+\beta\cdot re_i+\gamma\cdot l_i$$
where $c_i$, $re_i$, and $l_i$ are the min-max normalized values of the like count, the repost count, and the comment count of the $i$-th microblog, respectively, and $\alpha$, $\beta$, $\gamma$ are hyperparameters satisfying $\alpha+\beta+\gamma=1$.
6. The microblog summary generation method based on multi-modal manifold learning and social network features according to any one of claims 1 to 5, characterized in that: the final microblog saliency is
$$RankScore=\omega\cdot f^*+(1-\omega)\cdot R$$
where $\omega$ is an adjustable hyperparameter with $0<\omega<1$, $R$ denotes the social recognition, and $RankScore$ denotes the final microblog saliency.
7. The microblog summary generation method based on multi-modal manifold learning and social network features according to claim 6, characterized in that: the method further comprises a redundancy removal step, specifically:
1) initialize set A and set B: $A=\emptyset$, $B=\{x_i \mid i=1,2,\ldots,n\}$, where A stores the summary microblogs, B is the set of candidate microblogs ranked by saliency score, $x_i$ denotes the $i$-th microblog, and $n$ denotes the total number of microblogs, the saliency score of each microblog being computed as $S_i=RankScore(i)$;
2) sort the microblogs in set B by saliency score;
3) take the first element $x_i$ from set B; if $x_i$ satisfies the following condition:
where $s$ denotes a microblog in set A,
then move $x_i$ from set B to set A, where $\varepsilon$ is a hyperparameter representing the similarity threshold; otherwise delete $x_i$ from set B;
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011503521.0A CN112527964B (en) | 2020-12-18 | 2020-12-18 | Microblog abstract generation method based on multi-mode manifold learning and social network characteristics |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011503521.0A CN112527964B (en) | 2020-12-18 | 2020-12-18 | Microblog abstract generation method based on multi-mode manifold learning and social network characteristics |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112527964A CN112527964A (en) | 2021-03-19 |
CN112527964B true CN112527964B (en) | 2022-07-01 |
Family
ID=75001431
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011503521.0A Active CN112527964B (en) | 2020-12-18 | 2020-12-18 | Microblog abstract generation method based on multi-mode manifold learning and social network characteristics |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112527964B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2015210741A (en) * | 2014-04-30 | 2015-11-24 | 日本電信電話株式会社 | Topic modeling device, topic modeling method, and topic modeling program |
CN105740448A (en) * | 2016-02-03 | 2016-07-06 | 天津大学 | Topic-oriented multi-microblog time sequence abstracting method |
CN107329954A (en) * | 2017-06-29 | 2017-11-07 | 浙江工业大学 | A kind of topic detection method based on document content and correlation |
CN108681557A (en) * | 2018-04-08 | 2018-10-19 | 中国科学院信息工程研究所 | Based on the short text motif discovery method and system indicated from expansion with similar two-way constraint |
CN109063010A (en) * | 2018-07-11 | 2018-12-21 | 成都爱为贝思科技有限公司 | A kind of leader of opinion's method for digging based on PageRank |
CN110489548A (en) * | 2019-07-12 | 2019-11-22 | 北京邮电大学 | A kind of Chinese microblog topic detecting method and system based on semanteme, time and social networks |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9946703B2 (en) * | 2016-08-18 | 2018-04-17 | Microsoft Technology Licensing, Llc | Title extraction using natural language processing |
Non-Patent Citations (6)
Title |
---|
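Disease-Disease Relationships for Rheumatic Diseases: Web-Based Biomedical Textmining an Knowledge Discovery to Assist Medical Decision Making;Andreas Holzinger et al.;《2012 IEEE 36th Annual Computer Software and Applications Conference》;20121112;1-5 * |
A multi-document automatic summarization method based on geodesic distance (一种基于测地距离的多文档自动摘要方法);An Ling (安玲);《林区教学》;20140930(No. 210);31-35 * |
Statistics and analysis of online public opinion based on Web information extraction (基于Web信息抽取的网络舆情统计与分析);Li Kang (黎康);《中国优秀博硕士学位论文全文数据库(硕士)信息科技辑》;20170515(No. 5);I138-1232 * |
An improved k-nearest-neighbor classification algorithm based on reference points (基于参考点的改进k近邻分类算法);Liang Cong et al. (梁聪等);《计算机工程》;20190228;Vol. 45 (No. 2);167-172 * |
Construction of an evaluation index system for the crisis communication effect of government social media (政务社交媒体危机传播效果评价指标体系的构建);Chen Ran (陈然);《统计与决策》;20190924(No. 534);91-93 * |
Research on key technologies of fine-grained relation extraction for free text (面向自由文本的细粒度关系抽取的关键技术研究);Zhu Qian (朱倩);《中国优秀博硕士学位论文全文数据库(博士)信息科技辑》;20120615(No. 6);I138-93 * |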
Also Published As
Publication number | Publication date |
---|---|
CN112527964A (en) | 2021-03-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Tangherlini et al. | An automated pipeline for the discovery of conspiracy and conspiracy theory narrative frameworks: Bridgegate, Pizzagate and storytelling on the web | |
CN110390103B (en) | Automatic short text summarization method and system based on double encoders | |
Saif et al. | Alleviating data sparsity for twitter sentiment analysis | |
CN110232149B (en) | Hot event detection method and system | |
CN111090731A (en) | Electric power public opinion abstract extraction optimization method and system based on topic clustering | |
Yang et al. | Hierarchical summarization of large documents | |
Chang et al. | A METHOD OF FINE-GRAINED SHORT TEXT SENTIMENT ANALYSIS BASED ON MACHINE LEARNING. | |
Liang et al. | Profiling users for question answering communities via flow-based constrained co-embedding model | |
EP2553612A1 (en) | System | |
Hallac et al. | user2vec: Social media user representation based on distributed document embeddings | |
Yenkikar et al. | AirBERT: A fine-tuned language representation model for airlines tweet sentiment analysis | |
Sheeba et al. | A fuzzy logic based on sentiment classification | |
Sheeba et al. | Low frequency keyword extraction with sentiment classification and cyberbully detection using fuzzy logic technique | |
CN111400483B (en) | Time-weighting-based three-part graph news recommendation method | |
Silessi et al. | Identifying gender from SMS text messages | |
CN112527964B (en) | Microblog abstract generation method based on multi-mode manifold learning and social network characteristics | |
Alibadi et al. | Using pre-trained embeddings to detect the intent of an email | |
CN108427769B (en) | Character interest tag extraction method based on social network | |
Sakiyama et al. | Twitter breaking news detector in the 2018 Brazilian presidential election using word embeddings and convolutional neural networks | |
Kaba et al. | Afaan Oromo language fake news detection in social media using convolutional neural network and long short term memory | |
Uchida et al. | Evaluation of retweet clustering method classification method using retweets on Twitter without text data | |
Trad et al. | A framework for authorial clustering of shorter texts in latent semantic spaces | |
Cindo et al. | Sentiment Analysis on Twitter By Using Maximum Entropy And Support Vector Machine Method | |
CN112883716A (en) | Twitter abstract generation method based on topic correlation | |
CN115455975A (en) | Method and device for extracting topic keywords based on multi-model fusion decision |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||