CN109255123A - A tweet event summary generation method based on a hybrid scoring model - Google Patents

A tweet event summary generation method based on a hybrid scoring model

Info

Publication number
CN109255123A
Authority
CN
China
Prior art keywords
tweet
text
user
scoring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810919909.5A
Other languages
Chinese (zh)
Inventor
于富财
蒋珊
汪辉
胡光岷
费高雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201810919909.5A priority Critical patent/CN109255123A/en
Publication of CN109255123A publication Critical patent/CN109255123A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/258 Heading extraction; Automatic titling; Numbering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Abstract

The present invention provides a tweet event summary generation method based on a hybrid scoring model, and belongs to the field of summary generation. The invention extracts the highest-scoring tweet from a set of tweets as the summary. Each tweet is scored by a hybrid scoring model with three parts: a text-feature score from a logistic regression classifier, a user influence score based on the user's social circle, and a tweet generality score based on an undirected graph model. The weighted sum of the three scores is the tweet's final score, which measures its importance; the most important tweet is chosen as the summary of the set.

Description

A tweet event summary generation method based on a hybrid scoring model
Technical field
The invention belongs to the field of summary generation, and in particular relates to a tweet event summary generation method based on a hybrid scoring model.
Background
In recent years, the rapid development of social networks and mobile Internet technology has greatly shortened the distance between users and linked them more closely together. Users can not only easily obtain all kinds of information online, but can also take part in producing it and become information producers themselves, which has had an unprecedented impact on traditional ways of generating and exchanging information. The sharp growth in the number of network users has made today's information large in volume, varied in type, fast-moving and high in potential value, and the trend of explosive information growth is increasingly evident. The American social networking and microblogging site Twitter has become one of the most popular social platforms in the world. It lets people share information in real time using messages of no more than 140 characters, called "tweets". By 2012, Twitter had 140 million active users, was widely called "the SMS of the Internet", and saw 330 million tweets published per day. The rapid growth in the number of messages shared in real time on Twitter makes it harder for users to obtain information. Twitter provides keyword search so that users can find relevant tweets more conveniently, but for a hot event the number of tweets published by users usually runs into the thousands, and it is nearly impossible for a user to obtain an overview of how a hot event developed by reading so many tweets.
To let users understand the development of an entire tweet event clearly and quickly, each stage of the event's development needs to be mined, a summary generated for each major stage, and these summaries finally organized in chronological order into the event's development timeline. Summary generation and evolution analysis for tweet events has therefore become an important research topic.
Summary of the invention
To solve the problems in the prior art, the invention proposes a tweet event summary generation method based on a hybrid scoring model. The method generates one summary for each major development stage of a tweet event and displays the summaries of the different stages on a timeline in the chronological order of those stages. This timeline is the development timeline of the event, and it helps users understand how the whole event developed.
A tweet event summary generation method based on a hybrid scoring model comprises the following steps:
Step 1: obtain tweets;
Step 2: score the tweets with a hybrid scoring model, wherein the scoring model comprises a tweet text quality scoring model based on a logistic regression classifier, a tweet content generality scoring model based on an undirected graph model, and a user influence scoring model based on social circles;
Step 3: obtain the summary of the event according to the scoring results of the scoring model.
Further, step 2 comprises the following process:
Step 21: score tweet text quality with a logistic regression classifier;
Step 22: score tweet content generality with an undirected graph model;
Step 23: score user influence based on social circles.
Further, step 21 comprises the following process:
Extract tweet features to obtain a tweet feature set, wherein the tweet features include tweet length, number of ellipses, number of "#" symbols, number of "@" symbols, proportion of stop words, Alexa ranking of the URL in the tweet, proportion of capital letters, and proportion of special characters; and score the quality of the tweet according to the tweet features.
Further, step 22 comprises the following process:
Convert the tweets s_i = {tweet_1, tweet_2, ..., tweet_m} of a tweet event into an undirected graph of tweets, in which the text similarity text_sim(tweet_i, tweet_j) of tweets tweet_i and tweet_j is
[formula omitted]
wherein V_i and V_j are the word vectors of tweets tweet_i and tweet_j, respectively.
Further, V_i and V_j are the word vectors obtained after tweets tweet_i and tweet_j are tokenized with the NLTK tool, stripped of stop words and punctuation, and stemmed.
Further, step 23 comprises the following process:
The sum of the follower counts of the other users in the user's social circle, sum_follow_num, is
[formula omitted]
wherein k is the total number of people in the user's social circle. User influence is scored for N tweets; the sum of social-circle follower counts for the i-th user is sum_follow_num_i, and the list of social-circle follower sums for these N users is
sum_follow_num_list = {sum_follow_num_1, sum_follow_num_2, ..., sum_follow_num_N}
The numbers in sum_follow_num_list are sorted in ascending order; the rank of the sum_follow_num of the user of tweet tweet_j in this list is rank_sum_follow_num, and the score of the social circle of the user of tweet_j is
[formula omitted]
The follower count of the user is follow_num, and the follower counts of the other users in the social circle, friend_list = {user_1, user_2, ..., user_k}, are
[formula omitted]
wherein rank_follow_num is the user's follower-count rank. M historical tweets are collected for the user and for each user in friend_list, giving the ascending ranks rank_sum_like_num and rank_sum_retweet_num of the user's total like count and total retweet count among the total like counts and total retweet counts of the other users in the social circle. The overall score for the likes and retweets of the user's historical tweets is
[formula omitted]
wherein α is a normalized balancing coefficient. The user's score within the social circle is
[formula omitted]
The user influence score based on the social circle is
[formula omitted]
The normalized score is used as the user influence score based on the social circle:
[formula omitted]
wherein Max_importance_score is the maximum influence score among all tweet users in the event development stage.
Beneficial effects of the invention: the invention provides a tweet event summary generation method based on a hybrid scoring model that extracts the highest-scoring tweet from a set of tweets as the summary. Each tweet is scored by a hybrid scoring model with three parts: a text-feature score from a logistic regression classifier, a user influence score based on the user's social circle, and a tweet generality score based on an undirected graph model. The weighted sum of the three scores is the tweet's final score, which measures its importance; the most important tweet is chosen as the summary of the set.
Brief description of the drawings
Fig. 1 is a flow chart of an embodiment of the present invention.
Fig. 2 is a diagram of the undirected graph model of tweets in an embodiment of the present invention.
Detailed description
Embodiments of the present invention are further described below with reference to the accompanying drawings.
Referring to Fig. 1, the present invention provides a tweet event summary generation method based on a hybrid scoring model, implemented through the following steps:
Step 1: obtain tweets.
Step 2: score the tweets with a hybrid scoring model, wherein the scoring model comprises a tweet text quality scoring model based on a logistic regression classifier, a tweet content generality scoring model based on an undirected graph model, and a user influence scoring model based on social circles.
In this embodiment, each tweet is scored by the hybrid scoring model, and the highest-scoring tweet is selected as the summary.
Referring to Fig. 2, step 2 is implemented through the following process:
Step 21: score tweet text quality with a logistic regression classifier.
Because tweets are written informally and often contain abbreviations, misspellings and fragmented information, their text quality is uneven. Tweets with many abbreviations, misspellings and fragments are generally regarded as low-quality, and the opposite as high-quality; selecting a high-quality tweet as the summary is clearly more reasonable than selecting a low-quality one. However, directly identifying whether a tweet is of high or low quality is difficult and can usually only be done manually. The text features of a tweet are very useful for identifying its quality. For example, a tweet that contains a shared URL is usually considered higher in quality; a tweet with many ellipses is considered lower in quality; and the higher the proportion of stop words in a tweet, the lower its quality is considered to be.
The invention scores tweet quality by extracting a tweet feature set, which is used by the logistic regression classification model. The extracted features are shown in Table 1:
Feature | Description
Length | Total number of characters in the tweet
Ellipsis | Number of ellipses, counting both Chinese and English forms
"#" count | Number of "#" symbols
"@" count | Number of "@" symbols
Stop words | Proportion of stop words
url | Whether the tweet contains a URL
url rank | Alexa ranking of the URL contained in the tweet
Capital letters | Proportion of capital letters
Other special characters | Proportion of other special characters
Word length | Mean word length
Table 1: Feature set affecting tweet text quality
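The features listed above can be computed with simple string operations. The following Python sketch is illustrative only and is not the patent's implementation: the function name `extract_features`, the dictionary keys, the tiny stop-word list and the regular expressions are all assumptions made for this example, and the Alexa ranking is left out because it requires an external data source.

```python
import re

# ASSUMPTION: a tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on", "it"}
URL_PATTERN = re.compile(r"https?://\S+")
# One run of three or more dots (English) or one run of U+2026 characters (Chinese).
ELLIPSIS_PATTERN = re.compile(r"\.{3,}|…+")


def ratio(part: int, whole: int) -> float:
    """Safe division that returns 0.0 for an empty denominator."""
    return part / whole if whole else 0.0


def extract_features(tweet: str) -> dict:
    """Compute text features for one tweet (the Alexa rank feature is not computed)."""
    urls = URL_PATTERN.findall(tweet)
    # Words are taken from the tweet with URLs blanked out.
    words = re.findall(r"[A-Za-z']+", URL_PATTERN.sub(" ", tweet))
    letters = [c for c in tweet if c.isalpha()]
    # Special characters: not alphanumeric, not whitespace, and not '#' or '@'.
    special = [c for c in tweet
               if not c.isalnum() and not c.isspace() and c not in "#@"]
    return {
        "length": len(tweet),
        "ellipsis": len(ELLIPSIS_PATTERN.findall(tweet)),
        "hash_count": tweet.count("#"),
        "at_count": tweet.count("@"),
        "stop_word_ratio": ratio(sum(w.lower() in STOP_WORDS for w in words), len(words)),
        "has_url": bool(urls),
        "upper_ratio": ratio(sum(c.isupper() for c in letters), len(letters)),
        "special_ratio": ratio(len(special), len(tweet)),
        "mean_word_length": ratio(sum(len(w) for w in words), len(words)),
    }
```

A feature dictionary like this could be turned into a numeric vector and passed to any logistic regression implementation; which implementation the inventors used is not stated in the text.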
Before the classifier is trained on the data, the tweet data set needs to be labeled to form a usable training set. The scoring criteria for labeling are shown in Table 2:
Table 2: Scoring criteria
Table 2 contains 9 scoring items with a total possible score of 0.95. Before a tweet is labeled, a program computes these 9 scores and adds them up. If the total is greater than or equal to 0.8, the tweet is automatically labeled H. If the total is below 0.8 but above 0.65, the tweet is reviewed manually, and tweets with rich content and clear structure are hand-labeled H. All remaining tweets are assigned the other, non-H label. This semi-automatic labeling method improves labeling efficiency to some extent.
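The semi-automatic labeling rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the thresholds 0.8 and 0.65 come from the text, but the function name `label_tweet`, the `reviewer` callback that stands in for the human check, and the placeholder label `"OTHER"` (the source does not give the name of the non-H label) are hypothetical.

```python
from typing import Callable, Sequence

AUTO_HIGH = 0.8    # at or above this total, the tweet is labeled H automatically
MANUAL_LOW = 0.65  # strictly above this total (and below AUTO_HIGH), a human decides


def label_tweet(item_scores: Sequence[float], reviewer: Callable[[], bool]) -> str:
    """Label one tweet from its per-item scores.

    `reviewer` is a hypothetical callback that returns True when a human judges
    the tweet to be rich in content and clearly structured.
    "OTHER" is a placeholder for the non-H label, which the source does not name.
    """
    # Rounding guards against floating-point noise right at the thresholds.
    total = round(sum(item_scores), 6)
    if total >= AUTO_HIGH:
        return "H"
    if total > MANUAL_LOW:
        return "H" if reviewer() else "OTHER"
    return "OTHER"
```

Only tweets in the middle band reach the reviewer, which is where the claimed saving in labeling effort would come from.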
Step 22: score tweet content generality with an undirected graph model.
The generality of a tweet is defined as follows: within a group of tweets, it is the sum of the text similarities between that tweet and the other tweets. Similarity between tweets varies from high to low. A tweet that is similar to many other tweets is generally considered more general and better able to represent the others. If two tweets share only a few words, the computed similarity is usually low, which indicates that they probably express different meanings and cannot stand in for each other. If a tweet is fairly similar to all the other tweets, it can stand in for them in expressing the topic; that is, its generality is high. Because a tweet with high generality better expresses the overall meaning of the whole group, generating the summary from a high-generality tweet is more reasonable than generating it from a low-generality one.
In this embodiment, the tweets s_i = {tweet_1, tweet_2, ..., tweet_m} in one major development stage of a tweet event are converted into an undirected graph of tweets, as shown in Fig. 2.
In the undirected graph model built from the m tweets in the tweet set s_i, each vertex represents one tweet (for example, vertex j represents the j-th tweet), and each edge between vertices carries the text similarity between the two tweets, a value between 0 and 1. Tweets tweet_i and tweet_j are tokenized with the NLTK tool, stripped of stop words and punctuation, and stemmed to obtain their word vectors V_i and V_j. The text similarity text_sim(tweet_i, tweet_j) of tweets tweet_i and tweet_j is
[formula omitted]
The undirected graph model shows the similarity relations between tweets. The more high-weight edges a vertex has, the more tweets it resembles, the higher its generality, the better it can represent the other tweets in expressing the topic, and the more suitable it is to be chosen as the summary of the whole group.
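The generality score described above (the sum of a tweet's edge weights in the undirected graph) can be sketched as follows. The similarity formula itself is not reproduced in this text, so the sketch assumes cosine similarity over bag-of-words counts, which stays between 0 and 1 as the description requires; this choice is an assumption, not the patent's stated formula. The tokenizer is also a simplified stand-in for the NLTK pipeline (no stemming), and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# ASSUMPTION: a tiny illustrative stop-word list standing in for NLTK's.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on", "it"}


def word_vector(tweet: str) -> Counter:
    """Lowercase, drop punctuation and stop words, and count the remaining words."""
    tokens = re.findall(r"[a-z0-9]+", tweet.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def cosine_similarity(v1: Counter, v2: Counter) -> float:
    """ASSUMED similarity measure: cosine of the angle between two count vectors."""
    dot = sum(v1[w] * v2[w] for w in v1.keys() & v2.keys())
    norm = (math.sqrt(sum(c * c for c in v1.values()))
            * math.sqrt(sum(c * c for c in v2.values())))
    return dot / norm if norm else 0.0


def generality_scores(tweets: list) -> list:
    """Sum of edge weights for each vertex of the undirected similarity graph."""
    vectors = [word_vector(t) for t in tweets]
    m = len(vectors)
    sim = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            # The graph is undirected, so each weight is stored symmetrically.
            sim[i][j] = sim[j][i] = cosine_similarity(vectors[i], vectors[j])
    return [sum(row) for row in sim]
```

The adjacency matrix costs O(m²) similarity computations, which is acceptable for the tweets of a single development stage but would need pruning for very large sets.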
Step 23: score user influence based on social circles.
This embodiment does not consider the like count and retweet count of the tweet just published, but instead considers those of the user's earlier tweets, because the like and retweet counts of historical tweets are unlikely to change much. To better identify a user's social influence, this patent starts from the user's social circle: it first identifies the user's social circle and evaluates the influence of that circle, then measures the user's influence within the circle from the user's performance in it, and takes the product of the circle's influence and the user's influence within the circle as the user's influence score.
The influence score of the user's social circle is computed from the follower counts and the like and retweet counts of historical tweets of the user and the friends in the user's social circle, as follows:
The sum of the follower counts of the other users in the social circle of user user, sum_follow_num, is
[formula omitted]
wherein k is the total number of people in the user's social circle.
Suppose user influence is to be scored for the N tweets in a major development stage of an event. The sum of social-circle follower counts for the i-th user is sum_follow_num_i, and the list of social-circle follower sums for these N users is
sum_follow_num_list = {sum_follow_num_1, sum_follow_num_2, ..., sum_follow_num_N}
wherein N is the total number of tweets in the event development stage. The numbers in sum_follow_num_list are sorted in ascending order; the rank of the sum_follow_num of the user of tweet tweet_j in this list is rank_sum_follow_num, and the score of the social circle of the user of tweet_j is
[formula omitted]
wherein rank_sum_follow_num is the rank of the total follower count of the current tweet user's social circle among the social circles of all tweet users in the event development stage.
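The ranking step above can be sketched in Python. The scoring formula is not reproduced in this text, so the sketch assumes the simplest rank-based form, score = rank / N, where a larger follower sum gives a higher ascending rank and therefore a higher score. That formula, the tie-handling rule and the function names are assumptions made for illustration.

```python
def ascending_ranks(values: list) -> list:
    """1-based rank of each value in ascending order; tied values share the lowest rank."""
    sorted_vals = sorted(values)
    return [sorted_vals.index(v) + 1 for v in values]


def social_circle_scores(circle_follower_counts: list) -> list:
    """Score the social circle of each tweet's user.

    circle_follower_counts[i] holds the follower counts of the other users in the
    social circle of the i-th tweet's user.
    ASSUMPTION: score = rank / N; the source's own formula is not available.
    """
    sum_follow_num_list = [sum(counts) for counts in circle_follower_counts]
    ranks = ascending_ranks(sum_follow_num_list)
    n = len(sum_follow_num_list)
    return [rank / n for rank in ranks]
```

For example, three users whose circles have follower sums of 30, 1000 and 15 would receive ranks 2, 3 and 1 under this sketch.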
After the score of the user's social circle is obtained, the user's score within the circle needs to be found. For users in an ordinary social circle, only a few people are expected to have relatively high influence, that is, a relatively high score within the circle. The score within the social circle is based on the user's performance: the ranks of the like count and retweet count of the user's historical tweets within the whole social circle, and the rank of the user's follower count within the circle. The method is as follows:
The follower count of user user is follow_num, and the follower counts of the other users in the social circle, friend_list = {user_1, user_2, ..., user_k}, are
[formula omitted]
wherein rank_follow_num is the rank of the user's follower count within the user's own social circle.
M historical tweets are collected for user user and for each user in friend_list. If a user has published fewer than M tweets in total, the calculation still uses M. M should be neither too small nor too large: too little data is not representative enough, and too much costs excessive collection time and expense.
For user user, the like counts of the M collected historical tweets are summed to obtain the total like count, and the retweet counts of the M historical tweets are summed to obtain the total retweet count.
Likewise, for the other users friend_list in the social circle of user user, the total like count and total retweet count of their M historical tweets are collected. This gives the ascending ranks of user user's total like count and total retweet count within the social circle, rank_sum_like_num and rank_sum_retweet_num respectively. The overall score for the likes and retweets of the user's historical tweets is
[formula omitted]
wherein α is a normalized balancing coefficient, usually set to 0.5.
The user's score within the social circle is
[formula omitted]
The user influence score based on the social circle is
[formula omitted]
The normalized score is used as the user influence score based on the social circle:
[formula omitted]
wherein Max_importance_score is the maximum influence score among all tweet users in the event development stage.
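The final two steps, taking the product of the circle score and the in-circle score and then dividing by the maximum, follow directly from the description and can be sketched in Python. The history-score formula is not reproduced in this text; the sketch assumes an α-weighted combination of the two ranks scaled by the number of ranked users, which is only one plausible reading of "balancing coefficient". All function and parameter names are hypothetical.

```python
def history_score(rank_like: int, rank_retweet: int, n_ranked: int,
                  alpha: float = 0.5) -> float:
    """ASSUMED form: alpha-weighted mix of the like rank and retweet rank, scaled to (0, 1].

    n_ranked is the number of users being ranked in the social circle.
    """
    return (alpha * rank_like + (1 - alpha) * rank_retweet) / n_ranked


def normalized_influence(circle_scores: list, in_circle_scores: list) -> list:
    """Product of circle score and in-circle score, divided by the maximum product.

    The maximum product plays the role of Max_importance_score in the text.
    """
    raw = [c * u for c, u in zip(circle_scores, in_circle_scores)]
    if not raw:
        return []
    max_importance_score = max(raw)
    if max_importance_score == 0:
        return [0.0 for _ in raw]
    return [r / max_importance_score for r in raw]
```

After normalization the most influential user in the stage has a score of exactly 1, which puts the influence score on the same scale as scores that already lie between 0 and 1.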
Step 3: obtain the summary of the event according to the scoring results of the scoring model.
In this embodiment, for each tweet, the three scoring methods of step 2 compute the corresponding scores, and the hybrid score is
score_j = a_1 · text_quality_score(tweet_j) + a_2 · user_importance_score(tweet_j) + a_3 · summar_score(tweet_j)
wherein a_1, a_2 and a_3 are weighting coefficients.
The tweet set of the i-th event stage is s_i = {tweet_1, tweet_2, ..., tweet_N}, so the set of hybrid scores is score_list = {score_1, score_2, ..., score_N}. The maximum hybrid score in score_list is found; if the maximum score is max_score = score_j, the j-th tweet, which has the maximum score, is taken as the summary of the i-th major development stage of the event.
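The weighted sum and the selection of the maximum can be sketched directly in Python. The structure follows the description above; the function name `pick_summary` and the default equal weights are assumptions, since the source does not state values for the weighting coefficients.

```python
def pick_summary(tweets: list, quality: list, influence: list, generality: list,
                 weights: tuple = (1 / 3, 1 / 3, 1 / 3)):
    """Return the highest-scoring tweet and the list of hybrid scores.

    quality, influence and generality are the three per-tweet scores from step 2.
    ASSUMPTION: equal default weights; the source leaves a_1, a_2, a_3 unspecified.
    """
    if not tweets:
        raise ValueError("at least one tweet is required")
    a1, a2, a3 = weights
    score_list = [a1 * q + a2 * u + a3 * g
                  for q, u, g in zip(quality, influence, generality)]
    # Index of the maximum hybrid score; the first maximum wins on ties.
    best = max(range(len(score_list)), key=score_list.__getitem__)
    return tweets[best], score_list
```

Because the three component scores may sit on different scales (the generality score is a sum of similarities), bringing them to a common range before weighting is likely to make the coefficients easier to interpret.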
Those of ordinary skill in the art will understand that the embodiments described here are intended to help the reader understand the principles of the invention, and that the scope of protection of the invention is not limited to these specific statements and embodiments. Based on the technical teachings disclosed by the invention, those of ordinary skill in the art can make various other specific variations and combinations that do not depart from the essence of the invention, and these variations and combinations remain within the scope of protection of the invention.

Claims (6)

1. A tweet event summary generation method based on a hybrid scoring model, characterized by comprising the following steps:
Step 1: obtain tweets;
Step 2: score the tweets with a hybrid scoring model, wherein the scoring model comprises a tweet text quality scoring model based on a logistic regression classifier, a tweet content generality scoring model based on an undirected graph model, and a user influence scoring model based on social circles;
Step 3: obtain the summary of the event according to the scoring results of the scoring model.
2. The tweet event summary generation method based on a hybrid scoring model according to claim 1, characterized in that step 2 comprises the following process:
Step 21: score tweet text quality with a logistic regression classifier;
Step 22: score tweet content generality with an undirected graph model;
Step 23: score user influence based on social circles.
3. The tweet event summary generation method based on a hybrid scoring model according to claim 2, characterized in that step 21 comprises the following process:
Extract tweet features to obtain a tweet feature set, wherein the tweet features include tweet length, number of ellipses, number of "#" symbols, number of "@" symbols, proportion of stop words, Alexa ranking of the URL in the tweet, proportion of capital letters, and proportion of special characters; and score the quality of the tweet according to the tweet features.
4. The tweet event summary generation method based on a hybrid scoring model according to claim 2, characterized in that step 22 comprises the following process:
Convert the tweets s_i = {tweet_1, tweet_2, ..., tweet_m} of a tweet event into an undirected graph of tweets, in which the text similarity text_sim(tweet_i, tweet_j) of tweets tweet_i and tweet_j is
[formula omitted]
wherein V_i and V_j are the word vectors of tweets tweet_i and tweet_j, respectively.
5. The tweet event summary generation method based on a hybrid scoring model according to claim 4, characterized in that V_i and V_j are the word vectors obtained after tweets tweet_i and tweet_j are tokenized with the NLTK tool, stripped of stop words and punctuation, and stemmed.
6. The tweet event summary generation method based on a hybrid scoring model according to claim 2, characterized in that step 23 comprises the following process:
The sum of the follower counts of the other users in the user's social circle, sum_follow_num, is
[formula omitted]
wherein k is the total number of people in the user's social circle. User influence is scored for N tweets; the sum of social-circle follower counts for the i-th user is sum_follow_num_i, and the list of social-circle follower sums for these N users is
sum_follow_num_list = {sum_follow_num_1, sum_follow_num_2, ..., sum_follow_num_N}
The numbers in sum_follow_num_list are sorted in ascending order; the rank of the sum_follow_num of the user of tweet tweet_j in this list is rank_sum_follow_num, and the score of the social circle of the user of tweet_j is
[formula omitted]
The follower count of the user is follow_num, and the follower counts of the other users in the social circle, friend_list = {user_1, user_2, ..., user_k}, are
[formula omitted]
wherein rank_follow_num is the user's follower-count rank. M historical tweets are collected for the user and for each user in friend_list, giving the ascending ranks rank_sum_like_num and rank_sum_retweet_num of the user's total like count and total retweet count among the total like counts and total retweet counts of the other users in the social circle. The overall score for the likes and retweets of the user's historical tweets is
[formula omitted]
wherein α is a normalized balancing coefficient. The user's score within the social circle is
[formula omitted]
The user influence score based on the social circle is
[formula omitted]
The normalized score is used as the user influence score based on the social circle:
[formula omitted]
wherein Max_importance_score is the maximum influence score among all tweet users in the event development stage.
CN201810919909.5A 2018-08-14 2018-08-14 A tweet event summary generation method based on a hybrid scoring model Pending CN109255123A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810919909.5A CN109255123A (en) 2018-08-14 2018-08-14 A tweet event summary generation method based on a hybrid scoring model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810919909.5A CN109255123A (en) 2018-08-14 2018-08-14 A tweet event summary generation method based on a hybrid scoring model

Publications (1)

Publication Number Publication Date
CN109255123A (en) 2019-01-22

Family

ID=65050157

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810919909.5A Pending CN109255123A (en) 2018-08-14 2018-08-14 A tweet event summary generation method based on a hybrid scoring model

Country Status (1)

Country Link
CN (1) CN109255123A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060206806A1 (en) * 2004-11-04 2006-09-14 Motorola, Inc. Text summarization
CN104216875A (en) * 2014-09-26 2014-12-17 中国科学院自动化研究所 Automatic microblog text abstracting method based on unsupervised key bigram extraction
CN106844341A (en) * 2017-01-10 2017-06-13 北京百度网讯科技有限公司 News in brief extracting method and device based on artificial intelligence

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
IPHONEX: ""微博事件摘要生成及演化分析技术研究与应用"", 《HTTPS://WWW.DOC88.COM/P-6837808913651.HTML》 *

Similar Documents

Publication Publication Date Title
CN106980692B (en) Influence calculation method based on microblog specific events
Hu et al. Twitter100k: A real-world dataset for weakly supervised cross-media retrieval
CN108628833B (en) Method and device for determining summary of original content and method and device for recommending original content
Singh et al. Sentiment analysis of textual reviews; Evaluating machine learning, unsupervised and SentiWordNet approaches
US8380697B2 (en) Search and retrieval methods and systems of short messages utilizing messaging context and keyword frequency
CN105718579A (en) Information push method based on internet-surfing log mining and user activity recognition
CN104503960B (en) A text data processing method for English translation
CN103123653A (en) Search engine retrieving ordering method based on Bayesian classification learning
Bora Summarizing public opinions in tweets
CN103246670A (en) Microblog sorting, searching, display method and system
Litvinova et al. Overview of the RUSProfiling PAN at FIRE Track on Cross-genre Gender Identification in Russian.
CN108897784A (en) An emergency event dimensional analysis system based on social media
Diao et al. A unified model for topics, events and users on twitter
CN109753602A (en) A cross-social-network user identity recognition method and system based on machine learning
CN110134792A (en) Text recognition method, device, electronic equipment and storage medium
Qin et al. Automatic article commenting: the task and dataset
WO2017107010A1 (en) Information analysis system and method based on event regression test
Yuan et al. A hybrid method for multi-class sentiment analysis of micro-blogs
CN114881041A (en) Multi-dimensional intelligent extraction system for microblog big data hot topics
CN112000804B (en) Microblog hot topic user group emotion tendentiousness analysis method
CN110609950B (en) Public opinion system search word recommendation method and system
CN110188352B (en) Text theme determining method and device, computing equipment and storage medium
CN109255123A (en) A tweet event summary generation method based on a hybrid scoring model
Wei et al. Dietlens-eout: large scale restaurant food photo recognition
Scheffler et al. Mapping German tweets to geographic regions

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190122