WO2013117147A1 - Microblog sorting, searching, and display method and system - Google Patents

Info

Publication number
WO2013117147A1
WO2013117147A1 PCT/CN2013/071325 CN2013071325W WO2013117147A1 WO 2013117147 A1 WO2013117147 A1 WO 2013117147A1 CN 2013071325 W CN2013071325 W CN 2013071325W WO 2013117147 A1 WO2013117147 A1 WO 2013117147A1
Authority
WO
WIPO (PCT)
Prior art keywords
microblog
information
user
content
category
Prior art date
Application number
PCT/CN2013/071325
Other languages
English (en)
French (fr)
Inventor
马尧
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Priority to JP2014515063A (patent/JP2014522540A/ja)
Priority to AP2014007382A (patent/AP2014007382A0/xx)
Priority to KR1020137031978A (patent/KR20140012750A/ko)
Priority to EP13746647.0A (patent/EP2704040A4/en)
Publication of WO2013117147A1 (patent/WO2013117147A1/zh)
Priority to US14/109,949 (patent/US9785677B2/en)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24578 Query processing with adaptation to user needs using ranking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/14 Digital output to display device; Cooperation and interconnection of the display device with other functional units

Definitions

  • The present invention relates to network technologies, and in particular to a microblog sorting, searching, and displaying method and system.
  • Microblogs have become an important platform on which users communicate with each other and present themselves. Users can search microblogs to find the information they are interested in.
  • The traditional microblog sorting method generally sorts microblogs in chronological order, placing newer microblogs first.
  • Because the traditional microblog sorting method mixes all users' microblogs together and arranges them in chronological order, users must spend a lot of time and effort to find the interesting and relevant microblogs among the many available.
  • a microblog sorting method includes the following steps:
  • the microblog information is displayed according to the sorted results.
  • a microblog sorting system comprising:
  • the microblog information obtaining module is configured to obtain the microblog information requested by the user
  • the scoring module includes a user information scoring module and a content information scoring module, wherein the user information scoring module is configured to extract microblog publishing user information from the microblog information and score the microblog information according to the microblog publishing user information; the content information scoring module is configured to extract content information from the microblog information and score the microblog information according to the content information;
  • a sorting module configured to sort the microblog information according to the score
  • a display module configured to display the microblog information according to the sorted result.
  • A microblog search method that makes viewing convenient for the user is also provided.
  • A microblog search method sorts the microblog search results according to the above microblog sorting method, wherein the step of acquiring the microblog information requested by the user comprises: performing a search according to a keyword input by the user to obtain the microblog information requested by the user.
  • A microblog search system that is convenient for the user is also provided.
  • A microblog search system comprises the above microblog sorting system, wherein the microblog information obtaining module is configured to perform a search according to a keyword input by the user and obtain the microblog information requested by the user.
  • A microblog display method that makes viewing convenient for the user is also provided.
  • A microblog display method sorts the microblog request results according to the above microblog sorting method, wherein the step of obtaining the microblog information requested by the user includes: obtaining the microblog information requested by the user according to the microblog request information corresponding to the user identifier.
  • A microblog display system that is convenient for the user is also provided.
  • A microblog display system comprises the above microblog sorting system, wherein the microblog information obtaining module is configured to obtain the microblog information requested by the user according to the microblog request information corresponding to the user identifier.
  • The above microblog sorting, searching, and displaying methods and systems extract the microblog publishing user information and the content information from the microblog information, score the microblog information, and sort it according to the score, so that microblog information relevant to the user can be ranked first, which makes it convenient for users to view microblog information.
  • FIG. 1 is a schematic flow chart of a microblog sorting method in an embodiment
  • FIG. 2 is a schematic flowchart of scoring microblog information by extracting the microblog publishing user information from the microblog information in an embodiment
  • FIG. 3 is a schematic flowchart of scoring microblog information by extracting the microblog publishing user information from the microblog information in another embodiment
  • FIG. 4 is a schematic flowchart of scoring microblog information by extracting the content information from the microblog information in an embodiment
  • FIG. 5 is a schematic flowchart of training the features of microblog topic categories in an embodiment
  • FIG. 6 is a schematic flowchart of acquiring a training subset of a microblog topic category in an embodiment
  • FIG. 7 is a schematic diagram of acquiring a training subset for the technology network category in an embodiment
  • FIG. 8 is a schematic diagram showing the principle of a microblog sorting method in an embodiment
  • FIG. 9 is a schematic structural diagram of a microblog sorting system in an embodiment
  • FIG. 10 is a schematic structural diagram of a scoring module in an embodiment
  • FIG. 11 is a schematic structural diagram of a user information scoring module in an embodiment
  • FIG. 12 is a schematic structural diagram of a user information scoring module in another embodiment
  • FIG. 13 is a schematic structural diagram of a content information scoring module in an embodiment
  • FIG. 14 is a schematic structural diagram of a classification model training module in an embodiment.
  • a microblog ordering method includes the following steps:
  • Step S101 Acquire microblog information requested by the user.
  • Step S102 extracting microblog publishing user information and content information in the microblog information, and scoring the microblog information.
  • the microblog information is also scored high.
  • Step S103 sorting the microblog information according to the score.
  • the microblog information is sorted according to the level of the score, that is, the higher the score of the microblog information, the higher the ranking.
  • step S104 the microblog information is displayed according to the sorted result.
  • the microblog sorting method extracts the microblog publishing user information and the content information from the microblog information, scores the microblog information, and sorts it according to the score, so that microblog information relevant to the user can be listed first. This makes it easy for users to view microblog information.
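  • Steps S101 to S104 can be sketched as follows. This is a minimal sketch, not the patent's implementation: the `Microblog` type, the `rank_microblogs` function, and the choice to add the two scores are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Microblog:
    """Hypothetical container for one piece of microblog information."""
    post_id: str
    author_id: str
    text: str


def rank_microblogs(
    posts: Iterable[Microblog],
    user_score: Callable[[Microblog], float],
    content_score: Callable[[Microblog], float],
) -> List[Microblog]:
    """Score each post (S102), then sort so higher scores come first (S103)."""
    # Assumption: the two partial scores are simply added together.
    scored = [(user_score(p) + content_score(p), p) for p in posts]
    # list.sort is stable, so posts with equal scores keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored]
```

  • The returned list is what step S104 would display; the two scoring callables stand in for the publishing-user scoring and content scoring described in the following paragraphs.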
  • the step of extracting the microblog publishing user information in the microblog information in step S102 to score the microblog information includes:
  • Step S112 obtaining a microblog operation record of the microblog publishing user, and calculating the activity level of the microblog publishing user according to the microblog operation record.
  • the microblog operation record corresponding to the ID of the user may be found in the database in which the microblog operation record of the user has been stored according to the ID of the microblog publishing user.
  • the microblog operation record may include: whether the user is a VIP user, the microblog update frequency, the microblog repost rate, the rate of original microblogs, the number of forwards and comments the microblogs receive, the average word count of the microblogs, a funny score, and the like.
  • the funny score can be obtained from the funny scores that other users give to the microblogs of the microblog publishing user.
  • the microblog operation record of the microblog publishing user reflects that user's activity level. Specifically, if the microblog publishing user is a VIP user, or the update frequency is high, the repost rate is high, the original rate is high, the number of forwards and comments is large, the average word count is large, or the funny score is high, the activity level of the microblog publishing user can be set correspondingly high.
  • step S122 the microblog information is scored according to the activity level.
  • if the microblog publishing user has a high activity level, the score of the microblog can be correspondingly increased, because microblogs published by highly active users are more likely to interest the user.
  • in this way, the microblog information of highly active publishing users is scored high, and highly scored microblog information is ranked first, so that the microblog information more likely to interest the user is placed in front, making it convenient for users to view the microblog information they are interested in.
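  • One way to turn the operation record into an activity level (step S112) is a weighted sum. The patent does not give a formula, so the sketch below is an assumption: the field names, the equal default weights, and the expectation that each numeric feature is already scaled to a comparable range are all hypothetical.

```python
from dataclasses import asdict, dataclass, fields
from typing import Dict, Optional


@dataclass
class OperationRecord:
    """Hypothetical operation record; numeric fields assumed pre-normalized."""
    is_vip: bool = False
    update_frequency: float = 0.0
    repost_rate: float = 0.0
    original_rate: float = 0.0
    forward_comment_count: float = 0.0
    average_word_count: float = 0.0
    funny_score: float = 0.0


# Assumption: every feature counts equally unless the caller says otherwise.
DEFAULT_WEIGHTS: Dict[str, float] = {f.name: 1.0 for f in fields(OperationRecord)}


def activity_level(
    record: OperationRecord, weights: Optional[Dict[str, float]] = None
) -> float:
    """Weighted sum of the record's features; a larger value means more active."""
    weights = DEFAULT_WEIGHTS if weights is None else weights
    # float(True) is 1.0, so VIP status contributes its full weight.
    return sum(
        weights.get(name, 0.0) * float(value)
        for name, value in asdict(record).items()
    )
```

  • Any monotonic function of these features would fit the description equally well; the weighted sum is only the simplest choice.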
  • the step of extracting the microblog publishing user information in the microblog information in step S102 to score the microblog information includes:
  • Step S132 Acquire the personal information of the microblog publishing user and the personal information of the microblog requesting user, and calculate the similarity between the personal information of the microblog publishing user and the personal information of the microblog requesting user.
  • the personal information corresponding to the user's ID may be found in a database in which the user's personal information has been stored according to the ID of the user.
  • the personal information may include: hobbies, education, occupation, region, personalized signature, favorited microblog information, common friends, user type information, and the like.
  • the user types can be classified into: technology type, entertainment type, sports type, art type, political type, and the like.
  • the user type information includes a user type vector, and the component of the user type vector represents a score of the user biased toward a certain user type.
  • the first component of the user type vector may be defined to represent the technology type score and the second component representation.
  • the user type vector can be expressed as (3, 4, ).
  • the user type corresponding to the component with the highest score among the components of the user type vector may be selected as the user type of the user.
  • the user type vector may be set manually by the user, or may be obtained by counting the user types of the microblog users that the user follows and of the user's friends. For example, if 5 of the microblog users followed by the user and the user's friends belong to the technology type, the component corresponding to the technology type in the user type vector may be set to 5.
  • the value of the similarity between the microblog publishing user and the microblog requesting user may be increased.
  • the category to which the user's hobby belongs may be found in a database storing the category to which the hobby belongs.
  • the value of similarity between users can also be increased.
  • the similarity of the user type information may also be obtained by calculating the distance between the user type vectors. The smaller the distance between two user type vectors, the higher the similarity of the user type information, and the higher the similarity between the corresponding users.
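  • The user type vector and its distance-based similarity can be sketched as below. The counting rule follows the example in the text; the Euclidean distance and the `1 / (1 + distance)` mapping are assumptions, since the patent only states that a smaller distance means a higher similarity.

```python
import math
from collections import Counter
from typing import Iterable, List, Sequence

# User types named in the text; the order of components is an assumption.
USER_TYPES = ("technology", "entertainment", "sports", "art", "political")


def user_type_vector(contact_types: Iterable[str]) -> List[float]:
    """Count how many followed users and friends fall into each user type."""
    counts = Counter(contact_types)
    return [float(counts[t]) for t in USER_TYPES]


def dominant_type(vector: Sequence[float]) -> str:
    """Pick the user type whose component has the highest score."""
    best = max(range(len(vector)), key=vector.__getitem__)
    return USER_TYPES[best]


def type_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Map Euclidean distance to (0, 1]; a smaller distance gives a higher value."""
    return 1.0 / (1.0 + math.dist(a, b))
```

  • For a user with five technology contacts and one sports contact, `user_type_vector` gives a technology component of 5, matching the example above.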
  • Step S142 Acquire an interaction record between the microblog publishing user and the microblog requesting user, and calculate the degree of association between the microblog publishing user and the microblog requesting user according to the interaction record.
  • the interaction record includes records of references, visits, comments, forwards, and the like between users. Specifically, if the number of references, visits, comments, and forwards between users is high, the degree of association between the users can be set correspondingly high.
  • Step S152 the microblog information is scored according to the similarity and the degree of association.
  • the scoring of the microblog information may be increased if the similarity between the microblog publishing user and the microblog requesting user is high or the degree of association is high.
  • the score of the microblog information is also high, and microblog information with a high score is ranked first.
  • Such microblog information is also more likely to interest the microblog requesting user, which makes it convenient for the user to view microblog information of interest.
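  • Steps S142 and S152 can be sketched as follows. The patent only says that more interactions mean a higher degree of association, so the saturating formula, the `saturation` constant, and the equal weights in `relation_score` are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class InteractionRecord:
    """Hypothetical counts of interactions between two users."""
    references: int = 0
    visits: int = 0
    comments: int = 0
    forwards: int = 0


def association_degree(record: InteractionRecord, saturation: int = 20) -> float:
    """Grow from 0 toward 1 as the total number of interactions increases."""
    total = record.references + record.visits + record.comments + record.forwards
    return total / (total + saturation)


def relation_score(
    similarity: float, association: float, w_sim: float = 0.5, w_assoc: float = 0.5
) -> float:
    """Score the microblog from personal similarity and degree of association."""
    return w_sim * similarity + w_assoc * association
```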
  • the step of extracting the microblog posting user information in the microblog information in step S102 to score the microblog information includes steps S112 to S152.
  • the scoring of the microblog information in step S152 may be performed on the basis of the scoring in step S122. That is, the score obtained from the activity level of the microblog publishing user and the score obtained from the personal information similarity and degree of association between the microblog publishing user and the microblog requesting user are combined into a comprehensive score of the microblog information, and the proportion of each of the two scores in the comprehensive score can be set.
  • the step of extracting the content information in the microblog information in step S102 to score the microblog information includes:
  • Step S162 Acquire microblog content in the microblog information, and obtain a topic category vector of the microblog content in the microblog information according to the microblog content and the feature of the microblog topic category.
  • the microblog content includes the text content of the microblog, that is, the content published by the microblog user, and the microblog content may further include the comment content of the microblog.
  • microblog content published by the publishing user of the microblog at the publishing time of the microblog may also be obtained and put together with the microblog content.
  • the microblog theme categories include: political and military, culture and art, financial stocks, emotional life, social legal system, entertainment gossip, technology network, healthy food, sports, automobile real estate, education job hunting, fashion tourism, and the like.
  • each component of the topic category vector represents a score for how strongly the microblog content leans toward a certain microblog topic category; for example, the first component represents the political and military score, the second component represents the culture and art score, and so on.
  • for example, the topic category vector (5, 10, ...) indicates that the microblog content scores 5 for the political and military category and 10 for the culture and art category.
  • the microblog topic category corresponding to the component with the highest score is the microblog topic category to which the microblog content belongs.
  • the features of the microblog topic categories may be pre-trained. Further, the existing naive Bayes text classification algorithm may be used to classify the microblog content and obtain its topic category vector; details are not described herein.
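  • Since the text names naive Bayes but gives no details, the following is only an illustrative multinomial naive Bayes sketch with add-one smoothing. The function names, the whitespace tokenization (real microblog text would need proper word segmentation), and the normalization of the vector to sum to 1 are all assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

Model = Tuple[Dict[str, Counter], Set[str], Dict[str, float]]


def train(samples: Dict[str, List[str]]) -> Model:
    """Learn per-category word counts and priors from labeled posts."""
    word_counts = {
        cat: Counter(w for doc in docs for w in doc.split())
        for cat, docs in samples.items()
    }
    vocab: Set[str] = set().union(*word_counts.values())
    total_docs = sum(len(docs) for docs in samples.values())
    priors = {cat: len(docs) / total_docs for cat, docs in samples.items()}
    return word_counts, vocab, priors


def topic_vector(text: str, model: Model, categories: Sequence[str]) -> List[float]:
    """Return one normalized posterior score per category, in category order."""
    word_counts, vocab, priors = model
    log_scores = []
    for cat in categories:
        total = sum(word_counts[cat].values())
        score = math.log(priors[cat])
        for word in text.split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[cat][word] + 1) / (total + len(vocab)))
        log_scores.append(score)
    peak = max(log_scores)  # subtract the peak before exp() to avoid underflow
    weights = [math.exp(s - peak) for s in log_scores]
    norm = sum(weights)
    return [w / norm for w in weights]
```

  • The category with the largest component is then taken as the topic category of the content, as the preceding paragraphs describe.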
  • Step S172 Acquire the historical microblog content of the microblog requesting user, and obtain a topic category vector of that historical microblog content according to the historical microblog content and the features of the microblog topic categories.
  • the microblog content of the microblog requesting user within a recent time period (which can be preset) can be obtained.
  • the average of the plurality of vectors may be taken as the topic category vector of the historical microblog content of the microblog requesting user.
  • Step S182 Calculate the similarity between the microblog content in the microblog information and the historical microblog content of the microblog requesting user according to the topic category vector of the microblog content in the microblog information and the topic category vector of the historical microblog content of the microblog requesting user.
  • the similarity between the microblog content in the microblog information and the historical microblog content of the microblog requesting user may be calculated by calculating the distance between the two subject category vectors.
  • the smaller the distance the higher the similarity is set.
  • Step S192 the microblog information is scored according to the similarity.
  • the score of the microblog information is also high, and the microblog information with high score is ranked in front.
  • such top-ranked microblog content is more likely to attract the user's interest, making it convenient for users to view the microblogs they are interested in.
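  • Steps S172 to S192 can be sketched as below: the user's historical topic vectors are averaged, and the content score rises as the distance to that average falls. The Euclidean distance and the `1 / (1 + distance)` mapping are assumptions; the patent only requires that a smaller distance give a higher similarity.

```python
import math
from typing import List, Sequence


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise average of the topic vectors of the user's past posts."""
    if not vectors:
        raise ValueError("at least one historical vector is required")
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def content_score(
    post_vector: Sequence[float], history_vectors: Sequence[Sequence[float]]
) -> float:
    """Score a post by how close its topic vector is to the user's profile."""
    profile = mean_vector(history_vectors)
    return 1.0 / (1.0 + math.dist(post_vector, profile))
```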
  • the features of the microblog topic categories need to be trained in advance, and the microblog sorting method further includes:
  • Step S501 Acquire a preset microblog topic category.
  • the microblog theme categories include: political and military, culture and art, financial stocks, emotional life, social legal system, entertainment gossip, technology network, healthy food, sports, automobile real estate, education job hunting, fashion tourism, and the like.
  • Step S502 acquiring a training subset of the microblog topic category.
  • step S502 includes: step S512, searching for microblogs according to keywords of the microblog topic category to acquire an initial training subset of the microblog topic category; and step S522, repeating the following steps S532 and S542 a preset number of times: step S532, counting the high-frequency words in the initial training subset; and step S542, searching for microblogs according to the high-frequency words and adding the search results to the initial training subset.
  • the name of a microblog topic category and the words it splits into can be used as keywords of that category. For the political and military category, for example, "political", "military", and "political military" can be used as keywords, and microblogs are searched according to these keywords.
  • for example, for the technology network category, "technology", "network", and "technology network" may be added to the query set QS1; the words in QS1 are used as keywords to search microblogs, yielding a training subset RS1; the high-frequency words in RS1 are then counted.
  • the method for obtaining the training subset of the microblog topic category in this embodiment can obtain a large number of microblog training samples for each topic category, and provides a basis for extracting the features of each microblog topic category from the training subset.
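  • The iterative expansion in steps S512 to S542 can be sketched as follows. The `search` callable stands in for the microblog search service, and `top_k` (how many high-frequency words to take per round) and the whitespace tokenization are assumptions not stated in the text.

```python
from collections import Counter
from typing import Callable, Iterable, List, Set


def build_training_subset(
    seed_keywords: Iterable[str],
    search: Callable[[str], Iterable[str]],
    rounds: int,
    top_k: int = 5,
) -> List[str]:
    """Grow a training subset by repeatedly searching with high-frequency words."""
    queried: Set[str] = set()
    subset: List[str] = []

    def search_into_subset(keywords: Iterable[str]) -> None:
        for keyword in keywords:
            if keyword in queried:
                continue  # never issue the same query twice
            queried.add(keyword)
            for post in search(keyword):
                if post not in subset:
                    subset.append(post)

    search_into_subset(seed_keywords)  # S512: initial training subset
    for _ in range(rounds):  # S522: repeat a preset number of times
        # S532: count word frequencies over the current subset.
        counts = Counter(word for post in subset for word in post.split())
        frequent = [word for word, _ in counts.most_common(top_k)]
        search_into_subset(frequent)  # S542: search and add the results
    return subset
```

  • Each round widens the query set with words that are common in the posts already collected, which is how the subset for a category such as technology network can grow beyond the posts matching the category name itself.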
  • Step S503 extracting features of the microblog topic category from the training subset.
  • an existing classification training method can be used to train on the microblog content in the training subset of each topic category and extract the features of each topic category; details are not repeated here.
  • before step S104, the method further includes:
  • classifying the microblog content in the microblog information to obtain the display category to which the microblog content belongs.
  • the display categories may include the microblog topic categories described above, such as the political and military category, the culture and art category, the financial stocks category, and the like.
  • the microblog topic category to which the microblog content belongs may be obtained from the topic category vector of the microblog content acquired in step S162: the topic category corresponding to the highest-scoring component of the topic category vector may be taken as the microblog topic category to which the microblog content belongs.
  • in addition to the microblog topic categories, other display categories may be added, such as a friend category, a location category, a funny category, a help forwarding category, an advertisement activity category, and the like.
  • Whether the microblog information belongs to the friend category can be judged according to whether the microblog publishing user and the microblog requesting user are friends.
  • whether the microblog publishing user and the microblog requesting user are friends can be looked up, using the IDs of the two users, in a database in which friend relationships have been stored.
  • Whether the microblog information belongs to the location category can be judged according to whether the addresses of the microblog publishing user and the microblog requesting user belong to the same region (which can be set as a county, a district, etc.).
  • Whether the microblog information belongs to the funny category can be judged according to whether the funny score, looked up by the ID of the microblog publishing user in a database in which users' funny scores have been stored, is greater than a preset threshold.
  • the user's funny score can be obtained from the funny scores that other users give to the user.
  • Whether the microblog information belongs to the help forwarding category or the advertisement activity category can be judged according to whether high-frequency words such as "help" appear in the content of the microblog.
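  • The display-category rules above can be sketched as a single function. The `PostContext` fields are assumed to be filled in from the database lookups the text describes; the threshold value, the trigger word list, and the category labels are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PostContext:
    """Hypothetical facts about one post, already looked up elsewhere."""
    topic_vector: Sequence[float]
    is_friend: bool
    same_region: bool
    author_funny_score: float
    text: str


def display_categories(
    ctx: PostContext,
    topic_names: Sequence[str],
    funny_threshold: float = 0.8,
    help_words: Sequence[str] = ("help",),
) -> List[str]:
    """Return every display category the post belongs to."""
    # Topic category: the component with the highest score wins.
    best = max(range(len(ctx.topic_vector)), key=ctx.topic_vector.__getitem__)
    categories = [topic_names[best]]
    if ctx.is_friend:
        categories.append("friend")
    if ctx.same_region:
        categories.append("location")
    if ctx.author_funny_score > funny_threshold:
        categories.append("funny")
    if any(word in ctx.text for word in help_words):
        categories.append("help forwarding")
    return categories
```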
  • the microblog display category may also include a hot topic class.
  • high-frequency records can be obtained by parsing the content of web pages; the high-frequency records are scored according to the historical microblog content of the microblog requesting user; and microblogs in the search results are classified into the hot topic category according to the scores of the high-frequency records.
  • the content of the web pages can be parsed with the existing open source tool Html-parser to obtain the phrases whose number of occurrences exceeds a preset threshold, that is, the high-frequency records.
  • a high-frequency record can be scored according to its similarity to the historical microblog content of the microblog requesting user. Specifically, the number of times the high-frequency record appears in the microblog content that the microblog requesting user has posted, forwarded, and commented on can be counted, and the high-frequency record is scored according to that number.
  • the high-frequency records ranked within a preset number of top positions may be selected, the microblog information whose microblog content contains those high-frequency records is selected, and that microblog information is classified into the hot topic category.
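  • The hot topic selection can be sketched as below. The sketch assumes the web pages have already been parsed into a list of phrases (the HTML parsing itself is not shown), and the function names and substring matching are assumptions for illustration.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def high_frequency_records(page_phrases: Iterable[str], min_count: int) -> List[str]:
    """Keep phrases whose number of occurrences exceeds the preset threshold."""
    counts = Counter(page_phrases)
    return [phrase for phrase, n in counts.items() if n > min_count]


def record_score(phrase: str, user_history: Iterable[str]) -> int:
    """Count how often the phrase appears in the user's own microblog content."""
    return sum(text.count(phrase) for text in user_history)


def hot_topic_posts(
    posts: Sequence[str],
    page_phrases: Iterable[str],
    user_history: Sequence[str],
    min_count: int,
    top_n: int,
) -> List[str]:
    """Select posts that contain one of the user's top-scored records."""
    records = high_frequency_records(page_phrases, min_count)
    ranked = sorted(records, key=lambda r: record_score(r, user_history), reverse=True)
    top = ranked[:top_n]
    return [post for post in posts if any(record in post for record in top)]
```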
  • the specific process of step S104 is: displaying the microblog information according to the display category to which the microblog content belongs and the result of the above sorting.
  • the microblog information may be grouped by display category, with highly scored microblog information arranged first within each display category.
  • the microblog information is divided into multiple display categories for display, which makes it convenient for the user to select and view the microblog category of interest.
  • within each display category, the microblogs are displayed in order of their scores, and the top-ranked microblogs are those whose publishing users are more active, or whose publishing users' personal information is highly similar to that of the microblog requesting user, or whose publishing users have a high degree of association with the microblog requesting user, which makes it convenient for the user to view microblogs of interest.
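  • Grouping by display category and ordering by score within each category can be sketched as follows. The `(post_id, score, categories)` tuple layout is an assumption chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

ScoredPost = Tuple[str, float, Sequence[str]]  # (post_id, score, categories)


def group_for_display(scored_posts: Iterable[ScoredPost]) -> Dict[str, List[str]]:
    """Map each display category to its post ids, highest score first."""
    groups: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for post_id, score, categories in scored_posts:
        for category in categories:
            groups[category].append((score, post_id))
    return {
        category: [
            post_id
            for _, post_id in sorted(items, key=lambda item: item[0], reverse=True)
        ]
        for category, items in groups.items()
    }
```

  • A post that belongs to several display categories appears in each of them, so the same microblog can be listed under both its topic category and, for example, the friend category.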
  • FIG. 8 is a schematic diagram of the principle of the microblog ranking method in one embodiment:
  • the microblog sorting method can score microblog information according to the microblog publishing user information and the content information.
  • the microblog publishing user information score is denoted U, and the content information score is denoted C.
  • the microblog publishing user information score U can be calculated from the activity score A of the microblog publishing user, the personal information similarity score P between the microblog publishing user and the microblog requesting user, and the association degree score R between the microblog publishing user and the microblog requesting user.
  • the activity score A of the microblog publishing user can be obtained from the following information about that user: whether the user is a VIP user, the microblog update frequency, the microblog repost rate, the rate of original microblogs, the number of forwards and comments the microblogs receive, the average word count of the microblogs, the funny score, and the like;
  • the personal information similarity score P between the microblog publishing user and the microblog requesting user can be obtained from the following information: hobbies, education, occupation, region, personalized signature, favorited microblog information, common friends, user type information, and the like;
  • the association degree score R between the microblog publishing user and the microblog requesting user may be obtained from the interaction record between the two users, which includes records of references, visits, comments, forwards, and the like.
  • the microblog content information score C may be calculated from the similarity between the microblog content and the historical microblog content of the microblog requesting user, where the similarity may be calculated from the distance between the topic category vector of the microblog content and the topic category vector of the historical microblog content of the microblog requesting user. Finally, the above scores can be combined to obtain a comprehensive score of the microblog information.
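  • The combination of A, P, R, and C can be sketched as weighted sums. The patent says only that the scores are combined and that their proportions can be set, so the linear form and the default weights below are assumptions.

```python
from typing import Tuple


def user_info_score(
    a: float,
    p: float,
    r: float,
    weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),
) -> float:
    """U: combine activity A, personal similarity P, and association R."""
    w_a, w_p, w_r = weights
    return w_a * a + w_p * p + w_r * r


def comprehensive_score(u: float, c: float, user_weight: float = 0.5) -> float:
    """Blend the user information score U with the content score C."""
    return user_weight * u + (1.0 - user_weight) * c
```

  • Setting `user_weight` to 1.0 ranks purely on the publishing user, and 0.0 ranks purely on content similarity, which illustrates how the proportions could be tuned.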
  • a microblog ranking system includes a microblog information obtaining module 10, a scoring module 20, a sorting module 30, and a display module 40, wherein:
  • the microblog information obtaining module 10 is configured to obtain microblog information requested by the user.
  • the scoring module 20 includes a user information scoring module 201 and a content information scoring module 202, as shown in FIG. 10, wherein the user information scoring module 201 is configured to extract the microblog publishing user information from the microblog information and score the microblog information according to the microblog publishing user information; the content information scoring module 202 is configured to extract the content information from the microblog information and score the microblog information according to the content information.
  • the user information scoring module 201 and the content information scoring module 202 score the microblog information in a comprehensive manner.
  • the comprehensive score of the microblog information is also high.
  • the sorting module 30 is configured to sort the microblog information according to the above score.
  • the ranking module 30 sorts the microblog information according to the level of the above comprehensive score, that is, the higher the microblog information score, the higher the ranking.
  • the display module 40 is configured to display the microblog information according to the sorted result.
  • the microblog sorting system extracts the microblog publishing user information and the content information from the microblog information, scores the microblog information, and sorts it according to the score, so that microblog information relevant to the user can be arranged first. This makes it easy for users to view microblog information.
  • the user information scoring module 201 includes an activity calculation unit 211 and a first scoring unit 221, wherein:
  • the activity calculation unit 211 is configured to acquire a microblog operation record of the microblog publishing user, and calculate the activity level of the microblog publishing user according to the microblog operation record.
  • the activity calculation unit 211 may find the microblog operation record corresponding to the ID of the user in the database in which the microblog operation record of the user has been stored according to the ID of the microblog publishing user.
  • the microblog operation record may include: whether the user is a VIP user, the microblog update frequency, the microblog repost rate, the rate of original microblogs, the number of forwards and comments the microblogs receive, the average word count of the microblogs, a funny score, and the like.
  • the funny score can be obtained from the funny scores that other users give to the microblogs of the microblog publishing user.
  • the microblog operation record of the microblog publishing user reflects that user's activity level. Specifically, if the microblog publishing user is a VIP user, or the update frequency is high, the repost rate is high, the original rate is high, the number of forwards and comments is large, the average word count is large, or the funny score is high, the activity level of the microblog publishing user can be set correspondingly high.
  • the first scoring unit 221 is configured to score the microblog information according to the activity level.
  • the microblog publishing user has high activity
  • the first scoring unit 221 can correspondingly increase the scoring of the microblog, because the microblog that publishes the user with high activity is more likely to be interested in the user.
  • the microblog information of the microblog publishing user with high activity is also scored high, and the microblog information with high score is ranked first, and the microblog information that is more likely to cause user interest is ranked in front. It is convenient for users to view the Weibo information they are interested in.
  • the user information scoring module 201 includes a personal information similarity calculation unit 231, an association degree calculation unit 241, and a second scoring unit 251, wherein:
  • the personal information similarity calculation unit 231 is configured to acquire the personal information of the microblog publishing user and the personal information of the microblog requesting user, and calculate the similarity between the personal information of the microblog publishing user and the personal information of the microblog requesting user.
  • In one embodiment, the personal information similarity calculation unit 231 can use a user's ID to look up the corresponding personal information in a database that stores users' personal information.
  • Specifically, the personal information may include: hobbies, education level, major, region, personal signature, favorited microblog information, number of common friends, user type information, and the like.
  • In one embodiment, user types can be classified into: technology, entertainment, sports, art, politics, and the like.
  • Preferably, the user type information includes a user type vector, each component of which represents a score for how strongly the user leans toward a particular user type.
  • For example, the first component of the user type vector may be defined to represent the technology-type score, the second component the entertainment-type score, and so on; if a user's technology-type score is 3 and entertainment-type score is 4, the user type vector can be expressed as (3, 4, …).
  • Preferably, the user type corresponding to the highest-scoring component of the user type vector may be selected as the user's user type.
  • In one embodiment, the user type vector may be set manually by the user, or may be obtained by counting the user types of the microblog users that the user follows and of the user's friends. For example, if 5 of the microblog users followed by the user and the user's friends belong to the technology type, the component corresponding to the technology type in the user type vector may be set to 5.
  • In one embodiment, if the users have the same hobbies, or their hobbies belong to the same category (for example, both belong to the art category), the personal information similarity calculation unit 231 can increase the similarity value between the publishing user and the requesting user.
  • In one embodiment, the personal information similarity calculation unit 231 may look up the category to which a user's hobby belongs in a database that stores hobby categories.
  • Correspondingly, if the users have the same education level, for example both hold a bachelor's degree or both hold a doctoral degree or higher, the similarity value between the users can also be increased. Likewise, the similarity value can be increased if the users have the same major or their majors belong to the same discipline, if they are in the same region, if their personal signatures share keywords, if they have favorited the same microblog information, if their user type information is similar, or if the number of their common microblog friends exceeds a set threshold.
  • In one embodiment, the similarity of the user type information may also be obtained by calculating the distance between the user type vectors: the smaller the distance between two user type vectors, the higher the similarity of the user type information, and correspondingly the higher the similarity between the users.
  • the association degree calculation unit 241 is configured to acquire an interaction record between the microblog publishing user and the microblog requesting user, and calculate the degree of association between the microblog publishing user and the microblog requesting user according to the interaction record.
  • In one embodiment, the interaction record includes records of references, visits, comments, and forwards between the users. Specifically, if the users reference, visit, comment on, or forward each other many times, the association degree calculation unit 241 can set the degree of association between the users correspondingly high.
  • The second scoring unit 251 is configured to score the microblog information according to the similarity and the degree of association described above.
  • Preferably, if the similarity or the degree of association between the publishing user and the requesting user is high, the second scoring unit 251 may increase the score of the microblog information.
  • In this embodiment, if the personal information similarity between the publishing user and the requesting user is high, or the degree of association between them is high, the score of the microblog information is also high, and microblog information with a high score is ranked first. Such microblog information is also more likely to interest the requesting user, so the user can conveniently view the microblog information of interest.
  • the user information scoring module 201 includes an activity calculation unit 211, a first scoring unit 221, a personal information similarity calculation unit 231, an association degree calculation unit 241, and a second scoring unit 251.
  • The scoring of the microblog information by the second scoring unit 251 can build on the scoring by the first scoring unit 221. That is, the score obtained from the publishing user's activity level and the score obtained from the personal information similarity and degree of association between the publishing user and the requesting user are combined into the comprehensive score of the microblog information, and the weight of each of the two scores in the comprehensive score can be set.
  • the content information scoring module 202 includes a category vector extracting unit 212, a content similarity calculating unit 222, and a third scoring unit 232, wherein:
  • The category vector extracting unit 212 is configured to acquire the microblog content in the microblog information and to obtain the topic category vector of that microblog content according to the microblog content and the features of the microblog topic categories.
  • Preferably, the microblog content includes the body text of the microblog, that is, the content published by the publishing user; the microblog content may further include the comments on the microblog.
  • In one embodiment, if the microblog content contains few words, the category vector extracting unit 212 may obtain the microblog content published by the same publishing user within a (presettable) period close to the microblog's publishing time, and splice the multiple pieces of microblog content together.
  • Preferably, the microblog topic categories include: politics and military, culture and art, finance and stocks, emotion and life, society and law, entertainment gossip, technology and network, health and food, sports, automobiles and real estate, education and job hunting, fashion and travel, and the like.
  • Preferably, each component of the topic category vector represents a score for how strongly the microblog content leans toward a particular microblog topic category; for example, the first component represents the politics and military score, the second component the culture and art score, and so on.
  • The topic category vector (5, 10, …) then indicates that the content's politics and military score is 5 and its culture and art score is 10.
  • Preferably, the microblog topic category corresponding to the highest-scoring component is taken as the topic category to which the microblog content belongs.
  • Specifically, the features of the microblog topic categories may be trained in advance.
  • Further, the category vector extracting unit 212 may use the existing naive Bayes text classification algorithm to classify the microblog content and obtain its topic category vector; details are not repeated here.
  • The category vector extracting unit 212 is further configured to acquire the historical microblog content of the requesting user and to obtain the topic category vector of that historical content according to the historical microblog content and the features of the microblog topic categories.
  • Specifically, the category vector extracting unit 212 may acquire the microblog content published by the requesting user within a recent (presettable) time period.
  • Preferably, after the topic category vectors of multiple pieces of historical microblog content are obtained, the category vector extracting unit 212 may take the average of those vectors as the topic category vector of the requesting user's historical microblog content.
  • The content similarity calculation unit 222 is configured to calculate the similarity between the microblog content in the microblog information and the historical microblog content of the requesting user, according to the topic category vector of the microblog content and the topic category vector of the requesting user's historical microblog content.
  • Specifically, the content similarity calculation unit 222 can calculate this similarity by calculating the distance between the two topic category vectors. Preferably, the smaller the distance, the higher the similarity is set.
  • the third scoring unit 232 is configured to score the microblog information according to the similarity.
  • Preferably, the higher the similarity, the higher the score the third scoring unit 232 gives the microblog information.
  • In this embodiment, if the microblog content in the microblog information is highly similar to the historical microblog content of the requesting user, the score of the microblog information is also high, and microblog information with a high score is ranked first.
  • Microblog content ranked near the top is more likely to attract the user's interest, so the user can conveniently view the microblogs of interest.
  • In this embodiment, the features of the microblog topic categories need to be trained in advance. The microblog sorting system further includes a classification model training module 50, which is used to train on samples of each microblog topic category and to extract the features of each category.
  • As shown in FIG. 14, the classification model training module 50 includes a topic category acquisition module 501, a training set acquisition module 502, and a feature extraction module 503:
  • The topic category acquisition module 501 is configured to acquire the preset microblog topic categories.
  • Preferably, the microblog topic categories include: politics and military, culture and art, finance and stocks, emotion and life, society and law, entertainment gossip, technology and network, health and food, sports, automobiles and real estate, education and job hunting, fashion and travel, and the like.
  • The training set acquisition module 502 is configured to acquire a training subset for each microblog topic category.
  • Preferably, to extract better features of a topic category from its training subset, as many microblog training samples as possible may be acquired within a certain range. In one embodiment, the training set acquisition module 502 may search microblogs using the keywords of the microblog topic category to obtain an initial training subset of the category, and then repeat the following steps a preset number of times: count the high-frequency words in the initial training subset; search microblogs using the high-frequency words, and add the search results to the initial training subset.
  • Specifically, the training set acquisition module 502 can use the name of the microblog topic category and the words obtained by splitting that name as the keywords of the category. For the politics and military category, for example, "politics", "military", and "politics military" can be used as keywords, and a search on these keywords yields the initial training subset of the category. Further, after the initial training subset is preprocessed, it can be segmented into words and filtered for stop words, and the high-frequency words in the initial training subset can be counted. Further, the high-frequency words and combinations of high-frequency words can in turn be used as search keywords to obtain more microblog training samples. The steps of counting the high-frequency words in the initial training subset, searching microblogs using the high-frequency words, and adding the search results to the initial training subset are repeated a preset number of times.
  • The method for acquiring the training subset of a microblog topic category in this embodiment can obtain a large number of microblog training samples for each topic category, which provides a basis for extracting the features of each microblog topic category from its training subset.
  • The feature extraction module 503 is configured to extract the features of the microblog topic category from the training subset.
  • Specifically, the feature extraction module 503 can use an existing classification training method to train on the microblog content in the training subset of each topic category and extract the features of each topic category; details are not repeated here.
  • In one embodiment, the microblog sorting system further includes a display category classification module (not shown) for classifying the microblog content in the microblog information according to preset microblog display categories, obtaining the display category to which the microblog content belongs.
  • Specifically, the display categories may include the microblog topic categories described above, such as politics and military, culture and art, finance and stocks, and the like.
  • The microblog topic category to which the microblog content belongs may be obtained from the topic category vector acquired by the category vector extracting unit 212: the topic category corresponding to the highest-scoring component of the vector may be taken as the topic category to which the microblog content belongs.
  • In one embodiment, display categories other than the microblog topic categories may also be added, such as a friend category, a location category, a funny category, a help-forwarding category, and an advertisement-activity category. Whether the microblog information belongs to the friend category can be judged by whether the publishing user and the requesting user are friends; the display category classification module may use the IDs of the publishing user and the requesting user to look up, in a database that stores friend relationships, whether the two are friends.
  • Whether the microblog information belongs to the location category can be judged by whether the addresses of the publishing user and the requesting user belong to the same region (which can be set to a county, a district, etc.).
  • Whether the microblog belongs to the funny category can be judged by whether the funny score found for the publishing user's ID, in a database that stores users' funny scores, is greater than a preset threshold.
  • In one embodiment, a user's funny score can be obtained from the funny ratings that other users give to that user.
  • Whether the microblog information belongs to the help-forwarding category or the advertisement-activity category can be judged by whether high-frequency words associated with help requests or advertisements appear in the microblog content.
  • In one embodiment, the microblog display categories may also include a hot topic category.
  • Specifically, the display category classification module can parse web page content to obtain high-frequency records, score the high-frequency records according to the requesting user's historical microblog content, and, according to those scores, select microblogs in the search results and classify them into the hot topic category.
  • Preferably, the display category classification module can parse the web page content with the existing open-source tool Html-parser to obtain phrases whose number of occurrences exceeds a preset threshold, that is, the high-frequency records.
  • Further, a high-frequency record can be scored according to its similarity to the requesting user's historical microblog content. Specifically, the number of times the high-frequency record appears in the microblog content that the requesting user has published, forwarded, or commented on can be counted, and the high-frequency record is scored according to that number.
  • Finally, the high-frequency records whose scores rank within a preset number of top positions may be selected, the microblog information whose content contains those high-frequency records is selected, and that microblog information is classified into the hot topic category.
  • the display module 40 is configured to display the microblog information according to the display category to which the microblog content belongs and the result of the above sorting.
  • Specifically, the display module 40 can display the microblog information grouped by display category and, within each display category, place the microblog information with high scores first.
  • In this embodiment, the microblog information is divided into multiple display categories for display, so the user can conveniently select and view the microblog categories of interest.
  • In addition, each display category is displayed in descending order of microblog score. For the microblogs ranked near the top, the publishing user is more active, or the personal information of the publishing user is highly similar to that of the requesting user, or the degree of association between the publishing user and the requesting user is high, so the user can conveniently view microblogs that are relevant and of interest.
  • A microblog search method sorts microblog search results according to the microblog sorting method described above, wherein the step of obtaining the microblog information requested by the user includes: searching according to the keyword input by the user to obtain the microblog information requested by the user.
  • Specifically, a conventional search engine may be used to search on the keyword input by the user and find the microblog information matching the keyword, thereby obtaining the microblog information requested by the user.
  • A microblog search system includes the microblog sorting system described above, wherein the microblog information obtaining module 10 is configured to search according to a keyword input by the user and obtain the microblog information requested by the user.
  • Specifically, the microblog information obtaining module 10 may use a conventional search engine to search on the keyword input by the user and find the microblog information matching the keyword, thereby obtaining the microblog information requested by the user.
  • A microblog display method sorts microblog request results according to the microblog sorting method described above, wherein the step of obtaining the microblog information requested by the user includes: obtaining the microblog information requested by the user according to the microblog request information corresponding to the user identifier.
  • Specifically, the microblog request information corresponding to the user identifier may be preset as: obtain the microblog information of the group of people corresponding to the user identifier.
  • Specifically, the user identifier (such as the user ID) can be used to find the people the user follows or listens to and the user's friends, and to obtain the microblog information that those people and friends published in a recent period, thereby obtaining the microblog information requested by the user.
  • A microblog display system includes the microblog sorting system described above, wherein the microblog information obtaining module 10 is configured to obtain the microblog information requested by the user according to the microblog request information corresponding to the user identifier.
  • Specifically, the microblog request information corresponding to the user identifier may be preset as: obtain the microblog information of the group of people corresponding to the user identifier.
  • Specifically, the microblog information obtaining module 10 may use the user identifier (such as the user ID) to find the people the user follows or listens to and the user's friends, and obtain the microblog information that those people and friends published in a recent period, thereby obtaining the microblog information requested by the user.
  • The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Tourism & Hospitality (AREA)
  • Primary Health Care (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

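A microblog search method includes the following steps: obtaining microblog information requested by a user; extracting publishing user information and content information from the microblog information, and scoring the microblog information; sorting the microblog information according to the score; and displaying the microblog information according to the result of the sorting. The microblog sorting method described above extracts the publishing user information and the content information from the microblog information to score the microblogs, and sorts the microblog information by score, placing microblog information relevant to the user first, which makes it easier for the user to view microblog information. A microblog sorting system and microblog search and display methods and systems are also provided.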

Description

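Microblog sorting, search, and display methods and systems
【Technical Field】
The present invention relates to network technology, and in particular to microblog sorting, search, and display methods and systems.
【Background】
With the development of network technology, microblogs have become an important platform for users to communicate with one another and to present themselves. Users can search microblogs to obtain information they are interested in.
Conventional microblog sorting methods generally sort microblogs chronologically, placing newer microblogs first.
Because conventional microblog sorting methods mix the microblogs of all users together and arrange them only in chronological order, users must spend a great deal of time and effort to find, among a mass of microblogs, those that interest them and are relevant to them.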
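【Summary of the Invention】
In view of this, it is necessary to provide a microblog sorting method that makes microblogs convenient for users to view.
A microblog sorting method includes the following steps:
obtaining microblog information requested by a user;
extracting publishing user information and content information from the microblog information, and scoring the microblog information;
sorting the microblog information according to the score;
displaying the microblog information according to the result of the sorting.
In view of this, it is also necessary to provide a microblog sorting system that makes microblogs convenient for users to view.
A microblog sorting system includes:
a microblog information obtaining module, configured to obtain microblog information requested by a user;
a scoring module, including a user information scoring module and a content information scoring module, wherein the user information scoring module is configured to extract the publishing user information from the microblog information and score the microblog information according to the publishing user information, and the content information scoring module is configured to extract the content information from the microblog information and score the microblog information according to the content information;
a sorting module, configured to sort the microblog information according to the score;
a display module, configured to display the microblog information according to the result of the sorting.
In addition, a microblog search method that makes microblogs convenient for users to view is provided.
A microblog search method sorts microblog search results according to the microblog sorting method described above, wherein the step of obtaining the microblog information requested by the user includes: searching according to a keyword input by the user to obtain the microblog information requested by the user.
In addition, a microblog search system that makes microblogs convenient for users to view is provided.
A microblog search system includes the microblog sorting system described above, wherein the microblog information obtaining module is configured to search according to a keyword input by the user to obtain the microblog information requested by the user.
In addition, a microblog display method that makes microblogs convenient for users to view is provided.
A microblog display method sorts microblog request results according to the microblog sorting method described above, wherein the step of obtaining the microblog information requested by the user includes: obtaining the microblog information requested by the user according to the microblog request information corresponding to a user identifier.
In addition, a microblog display system that makes microblogs convenient for users to view is provided.
A microblog display system includes the microblog sorting system described above, wherein the microblog information obtaining module is configured to obtain the microblog information requested by the user according to the microblog request information corresponding to a user identifier.
The microblog sorting, search, and display methods and systems described above extract the publishing user information and the content information from the microblog information, score the microblogs, and sort the microblog information by score, so that microblog information relevant to the user can be placed first, which makes it easier for the user to view microblog information.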
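【Brief Description of the Drawings】
FIG. 1 is a schematic flowchart of a microblog sorting method in one embodiment;
FIG. 2 is a schematic flowchart of extracting the publishing user information from the microblog information to score the microblog information in one embodiment;
FIG. 3 is a schematic flowchart of extracting the publishing user information from the microblog information to score the microblog information in another embodiment;
FIG. 4 is a schematic flowchart of extracting the content information from the microblog information to score the microblog information in one embodiment;
FIG. 5 is a schematic flowchart of training the features of the microblog topic categories in one embodiment;
FIG. 6 is a schematic flowchart of acquiring the training subset of a microblog topic category in one embodiment;
FIG. 7 is a schematic diagram of acquiring the training subset of the technology and network category in one embodiment;
FIG. 8 is a schematic diagram of the principle of the microblog sorting method in one embodiment;
FIG. 9 is a schematic structural diagram of a microblog sorting system in one embodiment;
FIG. 10 is a schematic structural diagram of the scoring module in one embodiment;
FIG. 11 is a schematic structural diagram of the user information scoring module in one embodiment;
FIG. 12 is a schematic structural diagram of the user information scoring module in another embodiment;
FIG. 13 is a schematic structural diagram of the content information scoring module in one embodiment;
FIG. 14 is a schematic structural diagram of the classification model training module in one embodiment.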
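【Detailed Description】
As shown in FIG. 1, in one embodiment, a microblog sorting method includes the following steps:
Step S101: obtain the microblog information requested by the user.
Step S102: extract the publishing user information and the content information from the microblog information, and score the microblog information.
Preferably, if the publishing user information and the content information in the microblog information are highly relevant to the requesting user, the microblog information is also given a high score.
Step S103: sort the microblog information according to the score.
Preferably, the microblog information is sorted by score: the higher the score of a piece of microblog information, the higher it is ranked.
Step S104: display the microblog information according to the result of the sorting.
The microblog sorting method described above extracts the publishing user information and the content information from the microblog information, scores the microblogs, and sorts the microblog information by score, so that microblog information relevant to the user can be placed first, which makes it easier for the user to view microblog information.
As shown in FIG. 2, in one embodiment, the step in S102 of extracting the publishing user information from the microblog information to score the microblog information includes:
Step S112: acquire the microblog operation record of the publishing user, and calculate the activity level of the publishing user according to the microblog operation record.
In one embodiment, the ID of the publishing user may be used to look up the corresponding microblog operation record in a database that stores users' microblog operation records. Preferably, the microblog operation record may include: whether the user is a VIP user, microblog update frequency, microblog repost rate, microblog original-content rate, number of times the user's microblogs have been forwarded or commented on, average microblog word count, funny score, and the like. In one embodiment, the funny score can be obtained from the funny ratings that other users give to the publishing user's microblogs.
The microblog operation record of the publishing user reflects that user's activity level. Specifically, if the publishing user is a VIP user, or has a high update frequency, a high repost rate, a high original-content rate, many forwards and comments, a high average word count, or a high funny score, the activity level of the publishing user can be set correspondingly high.
Step S122: score the microblog information according to the activity level.
Preferably, if the publishing user has a high activity level, the score of the microblog can be correspondingly increased, because microblogs published by highly active users are more likely to interest the user.
In this embodiment, microblog information from highly active publishing users is scored higher, and microblog information with a high score is ranked first; that is, microblog information more likely to interest the user is placed in front, which makes it convenient for the user to view the microblog information of interest.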
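The activity computation of steps S112 and S122 can be sketched as follows. This is a minimal illustration rather than the patent's formula: the source lists the inputs but gives no weights, so the `OperationRecord` field names, the normalization caps (10 posts per day, 1000 forwards and comments, 140 words), and the weights are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class OperationRecord:
    """Operation record of a publishing user (field names are hypothetical)."""
    is_vip: bool
    update_frequency: float        # posts per day (assumed unit)
    repost_rate: float             # share of posts that are reposts, 0..1
    original_rate: float           # share of posts that are original, 0..1
    forwarded_comment_count: int   # times the user's posts were forwarded or commented on
    average_word_count: float
    funny_score: float             # assumed to lie in 0..1


def clamp01(value: float) -> float:
    """Clip a value into the range [0, 1]."""
    return max(0.0, min(1.0, value))


def activity_level(record: OperationRecord) -> float:
    """Return an activity level in [0, 1]; every weight and cap is an assumption."""
    features = [
        (1.0 if record.is_vip else 0.0, 0.10),
        (clamp01(record.update_frequency / 10.0), 0.25),
        (clamp01(record.repost_rate), 0.10),
        (clamp01(record.original_rate), 0.20),
        (clamp01(record.forwarded_comment_count / 1000.0), 0.20),
        (clamp01(record.average_word_count / 140.0), 0.05),
        (clamp01(record.funny_score), 0.10),
    ]
    # Weighted sum: each higher input raises the activity level, as the text describes.
    return sum(value * weight for value, weight in features)
```

A post's score for step S122 could then simply be increased in proportion to `activity_level(record)` of its publishing user.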
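As shown in FIG. 3, in one embodiment, the step in S102 of extracting the publishing user information from the microblog information to score the microblog information includes:
Step S132: acquire the personal information of the publishing user and the personal information of the requesting user, and calculate the similarity between the two.
In one embodiment, a user's ID may be used to look up the corresponding personal information in a database that stores users' personal information. Specifically, the personal information may include: hobbies, education level, major, region, personal signature, favorited microblog information, number of common friends, user type information, and the like. In one embodiment, user types can be classified into: technology, entertainment, sports, art, politics, and the like. Preferably, the user type information includes a user type vector, each component of which represents a score for how strongly the user leans toward a particular user type. For example, the first component of the user type vector may be defined to represent the technology-type score, the second component the entertainment-type score, and so on; if a user's technology-type score is 3 and entertainment-type score is 4, the user type vector can be expressed as (3, 4, …). Preferably, the user type corresponding to the highest-scoring component of the user type vector may be selected as the user's user type.
In one embodiment, the user type vector may be set manually by the user, or may be obtained by counting the user types of the microblog users that the user follows and of the user's friends. For example, if 5 of the microblog users followed by the user and the user's friends belong to the technology type, the component corresponding to the technology type in the user type vector may be set to 5.
In one embodiment, if the users have the same hobbies, or their hobbies belong to the same category (for example, both belong to the art category), the similarity value between the publishing user and the requesting user can be increased. In one embodiment, the category to which a user's hobby belongs may be looked up in a database that stores hobby categories. Correspondingly, if the users have the same education level, for example both hold a bachelor's degree or both hold a doctoral degree or higher, the similarity value between the users can also be increased. Likewise, the similarity value can be increased if the users have the same major or their majors belong to the same discipline, if they are in the same region, if their personal signatures share keywords, if they have favorited the same microblog information, if their user type information is similar, or if the number of their common microblog friends exceeds a set threshold. In one embodiment, the similarity of the user type information may also be obtained by calculating the distance between the user type vectors: the smaller the distance between two user type vectors, the higher the similarity of the user type information, and correspondingly the higher the similarity between the users.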
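The user type vector described above, built by counting the types of followed users and friends and compared by distance, can be sketched as follows. The component order in `USER_TYPES`, the use of Euclidean distance, and the mapping `1 / (1 + distance)` are assumptions; the source only requires that a smaller distance give a higher similarity.

```python
import math
from collections import Counter

# Order of the components in the user type vector (the order is an assumption).
USER_TYPES = ("technology", "entertainment", "sports", "art", "politics")


def user_type_vector(contact_types):
    """Count the user types of followed users and friends, one component per type."""
    counts = Counter(contact_types)
    return tuple(counts.get(user_type, 0) for user_type in USER_TYPES)


def dominant_type(vector):
    """Return the user type whose component has the highest score."""
    best_index = max(range(len(vector)), key=lambda i: vector[i])
    return USER_TYPES[best_index]


def type_similarity(vector_a, vector_b):
    """Map Euclidean distance into (0, 1]; identical vectors give 1.0."""
    return 1.0 / (1.0 + math.dist(vector_a, vector_b))


requester = user_type_vector(["technology"] * 5 + ["sports"] * 2)
publisher = user_type_vector(["technology"] * 4 + ["sports"] * 2)
print(requester, dominant_type(requester), type_similarity(requester, publisher))
```

The other personal information fields (hobbies, education level, region, and so on) would each add to the overall similarity in a real system; they are left out here to keep the sketch short.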
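Step S142: acquire the interaction record between the publishing user and the requesting user, and calculate the degree of association between the two according to the interaction record.
In one embodiment, the interaction record includes records of references, visits, comments, and forwards between the users. Specifically, if the users reference, visit, comment on, or forward each other many times, the degree of association between the users can be set correspondingly high.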
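One way to turn interaction counts into a degree of association for step S142 is sketched below. The per-kind weights, the `scale` parameter, and the saturating exponential are assumptions chosen only so that more interactions give a higher value bounded below 1; the source does not prescribe a formula.

```python
import math

# Hypothetical weight per interaction kind; the source does not specify any.
INTERACTION_WEIGHTS = {"reference": 1.0, "visit": 0.5, "comment": 2.0, "forward": 2.0}


def association_degree(interaction_counts, scale=20.0):
    """Return a value in [0, 1) that grows with the weighted interaction count."""
    weighted = sum(
        weight * interaction_counts.get(kind, 0)
        for kind, weight in INTERACTION_WEIGHTS.items()
    )
    # Saturating curve: 0 with no interactions, approaching 1 as they accumulate.
    return 1.0 - math.exp(-weighted / scale)
```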
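Step S152: score the microblog information according to the similarity and the degree of association described above.
Preferably, if the similarity or the degree of association between the publishing user and the requesting user is high, the score of the microblog information can be increased.
In this embodiment, if the personal information similarity between the publishing user and the requesting user is high, or the degree of association between them is high, the score of the microblog information is also high, and microblog information with a high score is ranked first. Such microblog information is also more likely to interest the requesting user, so the user can conveniently view the microblog information of interest.
In one embodiment, the step in S102 of extracting the publishing user information from the microblog information to score the microblog information includes steps S112 to S152. The scoring of the microblog information in step S152 can build on the scoring in step S122. That is, the score obtained from the publishing user's activity level and the score obtained from the personal information similarity and degree of association between the publishing user and the requesting user are combined into the comprehensive score of the microblog information, and the weight of each of the two scores in the comprehensive score can be set.
As shown in FIG. 4, in one embodiment, the step in S102 of extracting the content information from the microblog information to score the microblog information includes:
Step S162: acquire the microblog content in the microblog information, and obtain the topic category vector of that microblog content according to the microblog content and the features of the microblog topic categories.
Preferably, the microblog content includes the body text of the microblog, that is, the content published by the publishing user; the microblog content may further include the comments on the microblog. In one embodiment, if the microblog content contains few words, the microblog content published by the same publishing user within a (presettable) period close to the microblog's publishing time may be obtained, and the multiple pieces of microblog content spliced together.
Preferably, the microblog topic categories include: politics and military, culture and art, finance and stocks, emotion and life, society and law, entertainment gossip, technology and network, health and food, sports, automobiles and real estate, education and job hunting, fashion and travel, and the like. Preferably, each component of the topic category vector represents a score for how strongly the microblog content leans toward a particular microblog topic category; for example, the first component represents the politics and military score, the second component the culture and art score, and so on. The topic category vector (5, 10, …) then indicates that the content's politics and military score is 5 and its culture and art score is 10. Preferably, the microblog topic category corresponding to the highest-scoring component is taken as the topic category to which the microblog content belongs.
Specifically, the features of the microblog topic categories may be trained in advance. Further, the existing naive Bayes text classification algorithm may be used to classify the microblog content and obtain its topic category vector; details are not repeated here.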
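A small multinomial naive Bayes classifier shows how a topic category vector could be produced in step S162. The sketch makes several assumptions that the source does not state: posts are already tokenized (Chinese text would first need a word segmenter), Laplace smoothing is used, and the normalized posterior probability of each category serves as that category's component score.

```python
import math
from collections import Counter


def train(samples):
    """Train from `samples`, a dict mapping category -> list of tokenized posts."""
    vocabulary = {word for posts in samples.values() for post in posts for word in post}
    total_posts = sum(len(posts) for posts in samples.values())
    model = {}
    for category, posts in samples.items():
        counts = Counter(word for post in posts for word in post)
        model[category] = {
            "log_prior": math.log(len(posts) / total_posts),
            "counts": counts,
            "total": sum(counts.values()),
        }
    return model, vocabulary


def topic_vector(model, vocabulary, tokens, categories):
    """Return one posterior probability per category, in the order of `categories`."""
    log_scores = []
    for category in categories:
        entry = model[category]
        denominator = entry["total"] + len(vocabulary)
        score = entry["log_prior"]
        for word in tokens:
            if word in vocabulary:  # words never seen in training are ignored
                score += math.log((entry["counts"][word] + 1) / denominator)
        log_scores.append(score)
    # Normalize in log space to avoid underflow on long posts.
    top = max(log_scores)
    weights = [math.exp(score - top) for score in log_scores]
    norm = sum(weights)
    return [weight / norm for weight in weights]
```

Taking the index of the largest component then gives the topic category to which the content belongs, as the text above describes.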
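Step S172: acquire the historical microblog content of the requesting user, and obtain the topic category vector of that historical content according to the historical microblog content and the features of the microblog topic categories.
Specifically, the microblog content published by the requesting user within a recent (presettable) time period may be acquired. Preferably, after the topic category vectors of multiple pieces of historical microblog content are obtained, the average of those vectors may be taken as the topic category vector of the requesting user's historical microblog content.
Step S182: according to the topic category vector of the microblog content in the microblog information and the topic category vector of the requesting user's historical microblog content, calculate the similarity between the microblog content in the microblog information and the requesting user's historical microblog content.
Specifically, this similarity can be calculated by calculating the distance between the two topic category vectors. Preferably, the smaller the distance, the higher the similarity is set.
Step S192: score the microblog information according to the similarity.
Preferably, the higher the similarity, the higher the score of the microblog information.
In this embodiment, if the microblog content in the microblog information is highly similar to the historical microblog content of the requesting user, the score of the microblog information is also high, and microblog information with a high score is ranked first. Content ranked near the top is more likely to attract the user's interest, so the user can conveniently view the microblogs of interest.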
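Steps S172 to S192 can be sketched as averaging the historical topic vectors and scoring a post by its distance to that average. The Euclidean metric and the `1 / (1 + distance)` mapping are assumptions; the source only states that a smaller distance should give a higher similarity.

```python
import math


def mean_vector(vectors):
    """Component-wise average of equally long topic category vectors."""
    if not vectors:
        raise ValueError("at least one historical vector is required")
    return [sum(components) / len(vectors) for components in zip(*vectors)]


def content_similarity(post_vector, history_vectors):
    """Similarity in (0, 1] between a post and the requester's averaged history."""
    profile = mean_vector(history_vectors)
    return 1.0 / (1.0 + math.dist(post_vector, profile))


history = [[5, 10], [7, 0]]                      # two historical posts, two categories
print(mean_vector(history))                      # averaged profile of the requesting user
print(content_similarity([6, 5], history))       # a post that matches the profile exactly
```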
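As shown in FIG. 5, in one embodiment, before the content information in the microblog information is extracted to score the microblog information in step S102, the features of the microblog topic categories need to be trained in advance, and the microblog sorting method described above further includes:
Step S501: acquire the preset microblog topic categories.
Preferably, the microblog topic categories include: politics and military, culture and art, finance and stocks, emotion and life, society and law, entertainment gossip, technology and network, health and food, sports, automobiles and real estate, education and job hunting, fashion and travel, and the like.
Step S502: acquire the training subset of each microblog topic category.
Preferably, to extract better features of a topic category from its training subset, as many microblog training samples as possible may be acquired within a certain range. As shown in FIG. 6, in one embodiment, step S502 specifically includes: step S512, search microblogs using the keywords of the microblog topic category to obtain the initial training subset of the category; step S522, repeat the following steps S532 and S542 a preset number of times: step S532, count the high-frequency words in the initial training subset; step S542, search microblogs using the high-frequency words and add the search results to the initial training subset.
Specifically, the name of the microblog topic category and the words obtained by splitting that name can be used as the keywords of the category. For the politics and military category, for example, "politics", "military", and "politics military" can be used as keywords, and a search on these keywords yields the initial training subset of the category. Further, after the initial training subset is preprocessed, it can be segmented into words and filtered for stop words, and the high-frequency words in the initial training subset can be counted. Further, the high-frequency words and combinations of high-frequency words can in turn be used as search keywords to obtain more microblog training samples. The steps of counting the high-frequency words in the initial training subset, searching microblogs with the high-frequency words as keywords, and adding the search results to the initial training subset are repeated a preset number of times.
For example, as shown in FIG. 7, "technology", "network", and "technology network" can be added to query set QS1, and the words in QS1 are used as keywords to search microblogs, yielding training subset RS1. The high-frequency words in RS1 are counted, giving for example "science", "IT", "mobile phone", "data", "Internet", and so on, and these high-frequency words are added to QS1 to obtain QS2. The words in QS2 and combinations of those words are used as keywords to search microblogs, and the search results are added to RS1 to obtain RS2. The high-frequency words in RS2 are counted and added to QS2 to obtain QS3; the words in QS3 and their combinations are used as keywords to search microblogs, and the search results are added to RS2 to obtain RS3. QS4 and RS4 are obtained in the same way. Repeating the counting and searching steps expands the number of samples in the training subset.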
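The QS/RS iteration above can be sketched as a loop that alternates between counting high-frequency words and searching on them. The `search` and `tokenize` callables are hypothetical stand-ins for a microblog search service and a word segmenter, and `top_n`, the use of word pairs as "combinations", and the sorted iteration (for reproducible results) are assumptions.

```python
from collections import Counter
from itertools import combinations


def expand_training_set(seed_keywords, search, tokenize, stop_words, rounds=2, top_n=5):
    """Grow a training subset by repeatedly searching on its high-frequency words.

    `search(keywords)` is a hypothetical callable returning the posts that match
    all of the given keywords; `tokenize(post)` returns the words of one post.
    """
    query_set = set(seed_keywords)                      # QS1
    training_set = set()
    for keyword in sorted(query_set):                   # RS1
        training_set.update(search((keyword,)))

    for _ in range(rounds):
        # Count the high-frequency words of the current training subset.
        counts = Counter(
            token
            for post in sorted(training_set)
            for token in tokenize(post)
            if token not in stop_words
        )
        query_set.update(word for word, _count in counts.most_common(top_n))

        # Search on single words and on pairs of words, then merge the results.
        ordered = sorted(query_set)
        for keyword in ordered:
            training_set.update(search((keyword,)))
        for pair in combinations(ordered, 2):
            training_set.update(search(pair))
    return query_set, training_set
```

With an in-memory list of posts, `search` can be as simple as returning every post that contains all of the keywords, which is enough to try the loop without a real search backend.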
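The method for acquiring the training subset of a microblog topic category in this embodiment can obtain a large number of microblog training samples for each topic category, which provides a basis for extracting the features of each microblog topic category from its training subset.
Step S503: extract the features of the microblog topic category from the training subset.
Specifically, an existing classification training method can be used to train on the microblog content in the training subset of each topic category and extract the features of each topic category. Details are not repeated here.
In one embodiment, before step S104, the method further includes:
classifying the microblog content in the microblog information according to preset microblog display categories, to obtain the display category to which the microblog content belongs.
Specifically, the display categories may include the microblog topic categories described above, such as politics and military, culture and art, finance and stocks, and the like. The microblog topic category to which the microblog content belongs may be obtained from the topic category vector of the microblog content acquired in step S162: the topic category corresponding to the highest-scoring component of the vector may be taken as the topic category to which the microblog content belongs.
In one embodiment, display categories other than the microblog topic categories may also be added, such as a friend category, a location category, a funny category, a help-forwarding category, and an advertisement-activity category. Whether the microblog information belongs to the friend category can be judged by whether the publishing user and the requesting user are friends. In one embodiment, the IDs of the publishing user and the requesting user may be used to look up, in a database that stores friend relationships, whether the two are friends. Whether the microblog information belongs to the location category can be judged by whether the addresses of the publishing user and the requesting user belong to the same region (which can be set to a county, a district, etc.). Whether the microblog belongs to the funny category can be judged by whether the funny score found for the publishing user's ID, in a database that stores users' funny scores, is greater than a preset threshold. In one embodiment, a user's funny score can be obtained from the funny ratings that other users give to that user. Whether the microblog information belongs to the help-forwarding category or the advertisement-activity category can be judged by whether high-frequency words associated with help requests or advertisements appear in the microblog content.
In one embodiment, the microblog display categories may also include a hot topic category. Specifically, web page content can be parsed to obtain high-frequency records; the high-frequency records are scored according to the requesting user's historical microblog content; and, according to those scores, microblogs in the search results are selected and classified into the hot topic category.
Preferably, the web page content can be parsed with the existing open-source tool Html-parser to obtain phrases whose number of occurrences exceeds a preset threshold, that is, the high-frequency records. Further, a high-frequency record can be scored according to its similarity to the requesting user's historical microblog content. Specifically, the number of times the high-frequency record appears in the microblog content that the requesting user has published, forwarded, or commented on can be counted, and the high-frequency record is scored according to that number. Finally, the high-frequency records whose scores rank within a preset number of top positions may be selected, the microblog information whose content contains those high-frequency records is selected, and that microblog information is classified into the hot topic category.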
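The hot topic selection can be sketched as below. The sketch skips the web page parsing itself (the source uses Html-parser for that) and assumes the phrases have already been extracted into `page_phrases`; the parameter names `min_count` and `top_k` and the substring matching are likewise assumptions.

```python
from collections import Counter


def hot_topic_posts(page_phrases, user_history, candidate_posts, min_count=3, top_k=2):
    """Return the candidate posts that mention one of the top-scoring hot phrases."""
    # High-frequency records: phrases whose occurrence count exceeds the threshold.
    phrase_counts = Counter(page_phrases)
    frequent = [phrase for phrase, count in phrase_counts.items() if count > min_count]

    # Score each record by how often it appears in the requester's own posts.
    scores = {
        phrase: sum(text.count(phrase) for text in user_history)
        for phrase in frequent
    }
    top = sorted(frequent, key=lambda phrase: scores[phrase], reverse=True)[:top_k]

    # Posts containing any selected record are classified into the hot topic category.
    return [post for post in candidate_posts if any(phrase in post for phrase in top)]
```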
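In this embodiment, step S104 specifically is: display the microblog information according to the display category to which the microblog content belongs and the result of the sorting described above.
Specifically, the microblog information can be displayed grouped by display category, and within each display category the microblog information with high scores is placed first.
In this embodiment, the microblog information is divided into multiple display categories for display, so the user can conveniently select and view the microblog categories of interest. In addition, each display category is displayed in descending order of microblog score. For the microblogs ranked near the top, the publishing user is more active, or the personal information of the publishing user is highly similar to that of the requesting user, or the degree of association between the publishing user and the requesting user is high, so the user can conveniently view microblogs that are relevant and of interest.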
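The category-wise display of step S104 amounts to grouping posts by display category and sorting each group by score. A minimal sketch, assuming each post is represented as a hypothetical `(text, display_category, score)` tuple:

```python
from collections import defaultdict


def group_for_display(posts):
    """Group (text, display_category, score) tuples by category, best score first."""
    grouped = defaultdict(list)
    for text, category, score in posts:
        grouped[category].append((score, text))
    return {
        category: [
            text
            for _score, text in sorted(items, key=lambda item: item[0], reverse=True)
        ]
        for category, items in grouped.items()
    }
```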
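FIG. 8 is a schematic diagram of the principle of the microblog sorting method in one embodiment:
In a microblog sorting method, the microblog information can be scored according to the publishing user information and the content information; the publishing user information score is denoted U and the content information score is denoted C. The publishing user information score U can be calculated from the publishing user activity score A, the personal information similarity score P between the publishing user and the requesting user, and the association degree score R between the publishing user and the requesting user. The publishing user activity score A can be obtained from the following information about the publishing user: whether the user is a VIP user, microblog update frequency, microblog repost rate, microblog original-content rate, number of times the user's microblogs have been forwarded or commented on, average microblog word count, funny score, and so on. The personal information similarity score P can be obtained from the following information about the two users: hobbies, education level, major, region, personal signature, favorited microblog information, number of common friends, user type information, and so on. The association degree score R can be obtained from the interaction record between the publishing user and the requesting user, which includes records of references, visits, comments, forwards, and so on. The microblog content information score C can be calculated from the similarity between the microblog content and the requesting user's historical microblog content, where that similarity can be calculated from the distance between the microblog's topic category vector and the topic category vector of the requesting user's historical microblogs. Finally, the scores above can be combined to obtain the comprehensive score of the microblog information. In one embodiment, comprehensive score = a1*U + a2*C = b1*A + b2*P + b3*R + a2*C, where a1, a2, b1, b2, and b3 are preset coefficients.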
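The comprehensive score and the sorting of step S103 can be sketched as follows. The two forms of the formula agree when U is itself a weighted sum of A, P, and R, so that b_i = a1*c_i; the coefficients c1, c2, c3 and all numeric values below are assumptions, since the source only says the coefficients are preset.

```python
def user_info_score(activity, similarity, association, c1=0.4, c2=0.3, c3=0.3):
    """U: user information score built from A, P and R (weights are assumptions)."""
    return c1 * activity + c2 * similarity + c3 * association


def composite_score(activity, similarity, association, content, a1=0.6, a2=0.4):
    """a1*U + a2*C, which equals b1*A + b2*P + b3*R + a2*C with b_i = a1*c_i."""
    u = user_info_score(activity, similarity, association)
    return a1 * u + a2 * content


def rank(posts):
    """Sort (post_id, A, P, R, C) tuples by comprehensive score, highest first."""
    return sorted(posts, key=lambda post: composite_score(*post[1:]), reverse=True)
```

Calling `rank` on a list of scored posts returns them in display order, with the post most relevant to the requesting user first.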
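As shown in FIG. 9, in one embodiment, a microblog sorting system includes a microblog information obtaining module 10, a scoring module 20, a sorting module 30, and a display module 40, wherein:
The microblog information obtaining module 10 is configured to obtain the microblog information requested by the user.
The scoring module 20 includes a user information scoring module 201 and a content information scoring module 202, as shown in FIG. 10. The user information scoring module 201 is configured to extract the publishing user information from the microblog information and to score the microblog information according to that publishing user information; the content information scoring module 202 is configured to extract the content information from the microblog information and to score the microblog information according to that content information.
The scores given by the user information scoring module 201 and the content information scoring module 202 are combined into a comprehensive score for the microblog information. Preferably, if the publishing user information and the content information in the microblog information are highly relevant to the requesting user, the comprehensive score of the microblog information is also high.
The sorting module 30 is configured to sort the microblog information according to the score described above.
Preferably, the sorting module 30 sorts the microblog information by the comprehensive score: the higher the score of a piece of microblog information, the higher it is ranked.
The display module 40 is configured to display the microblog information according to the result of the sorting.
The microblog sorting system described above extracts the publishing user information and the content information from the microblog information, scores the microblogs, and sorts the microblog information by score, so that microblog information relevant to the user can be placed first, which makes it easier for the user to view microblog information.
As shown in FIG. 11, in one embodiment, the user information scoring module 201 includes an activity calculation unit 211 and a first scoring unit 221, wherein:
The activity calculation unit 211 is configured to acquire the microblog operation record of the publishing user and to calculate the activity level of the publishing user according to that record.
In one embodiment, the activity calculation unit 211 may use the ID of the publishing user to look up the corresponding microblog operation record in a database that stores users' microblog operation records. Preferably, the microblog operation record may include: whether the user is a VIP user, microblog update frequency, microblog repost rate, microblog original-content rate, number of times the user's microblogs have been forwarded or commented on, average microblog word count, funny score, and the like. In one embodiment, the funny score can be obtained from the funny ratings that other users give to the publishing user's microblogs.
The microblog operation record of the publishing user reflects that user's activity level. Specifically, if the publishing user is a VIP user, or has a high update frequency, a high repost rate, a high original-content rate, many forwards and comments, a high average word count, or a high funny score, the activity level of the publishing user can be set correspondingly high.
The first scoring unit 221 is configured to score the microblog information according to the activity level.
Preferably, if the publishing user has a high activity level, the first scoring unit 221 can correspondingly increase the score of the microblog, because microblogs published by highly active users are more likely to interest the user.
In this embodiment, microblog information from highly active publishing users is scored higher, and microblog information with a high score is ranked first; that is, microblog information more likely to interest the user is placed in front, which makes it convenient for the user to view the microblog information of interest.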
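As shown in FIG. 12, in one embodiment, the user information scoring module 201 includes a personal information similarity calculation unit 231, an association degree calculation unit 241, and a second scoring unit 251, wherein:
The personal information similarity calculation unit 231 is configured to acquire the personal information of the publishing user and the personal information of the requesting user, and to calculate the similarity between the two.
In one embodiment, the personal information similarity calculation unit 231 can use a user's ID to look up the corresponding personal information in a database that stores users' personal information. Specifically, the personal information may include: hobbies, education level, major, region, personal signature, favorited microblog information, number of common friends, user type information, and the like. In one embodiment, user types can be classified into: technology, entertainment, sports, art, politics, and the like. Preferably, the user type information includes a user type vector, each component of which represents a score for how strongly the user leans toward a particular user type. For example, the first component of the user type vector may be defined to represent the technology-type score, the second component the entertainment-type score, and so on; if a user's technology-type score is 3 and entertainment-type score is 4, the user type vector can be expressed as (3, 4, …). Preferably, the user type corresponding to the highest-scoring component of the user type vector may be selected as the user's user type.
In one embodiment, the user type vector may be set manually by the user, or may be obtained by counting the user types of the microblog users that the user follows and of the user's friends. For example, if 5 of the microblog users followed by the user and the user's friends belong to the technology type, the component corresponding to the technology type in the user type vector may be set to 5.
In one embodiment, if the users have the same hobbies, or their hobbies belong to the same category (for example, both belong to the art category), the personal information similarity calculation unit 231 can increase the similarity value between the publishing user and the requesting user. In one embodiment, the personal information similarity calculation unit 231 may look up the category to which a user's hobby belongs in a database that stores hobby categories. Correspondingly, if the users have the same education level, for example both hold a bachelor's degree or both hold a doctoral degree or higher, the similarity value between the users can also be increased. Likewise, the similarity value can be increased if the users have the same major or their majors belong to the same discipline, if they are in the same region, if their personal signatures share keywords, if they have favorited the same microblog information, if their user type information is similar, or if the number of their common microblog friends exceeds a set threshold. In one embodiment, the similarity of the user type information may also be obtained by calculating the distance between the user type vectors: the smaller the distance between two user type vectors, the higher the similarity of the user type information, and correspondingly the higher the similarity between the users.
The association degree calculation unit 241 is configured to acquire the interaction record between the publishing user and the requesting user, and to calculate the degree of association between the two according to the interaction record.
In one embodiment, the interaction record includes records of references, visits, comments, and forwards between the users. Specifically, if the users reference, visit, comment on, or forward each other many times, the association degree calculation unit 241 can set the degree of association between the users correspondingly high.
The second scoring unit 251 is configured to score the microblog information according to the similarity and the degree of association described above.
Preferably, if the similarity or the degree of association between the publishing user and the requesting user is high, the second scoring unit 251 may increase the score of the microblog information.
In this embodiment, if the personal information similarity between the publishing user and the requesting user is high, or the degree of association between them is high, the score of the microblog information is also high, and microblog information with a high score is ranked first. Such microblog information is also more likely to interest the requesting user, so the user can conveniently view the microblog information of interest.
In one embodiment, the user information scoring module 201 includes the activity calculation unit 211, the first scoring unit 221, the personal information similarity calculation unit 231, the association degree calculation unit 241, and the second scoring unit 251. The scoring of the microblog information by the second scoring unit 251 can build on the scoring by the first scoring unit 221. That is, the score obtained from the publishing user's activity level and the score obtained from the personal information similarity and degree of association between the publishing user and the requesting user are combined into the comprehensive score of the microblog information, and the weight of each of the two scores in the comprehensive score can be set.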
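As shown in FIG. 13, in one embodiment the content information scoring module 202 includes a category vector extraction unit 212, a content similarity calculation unit 222, and a third scoring unit 232, wherein:
The category vector extraction unit 212 is configured to obtain the microblog content in the microblog information, and to obtain the topic category vector of that content according to the microblog content and the features of the microblog topic categories.
Preferably, the microblog content includes the body text of the microblog, that is, the content published by the publishing user; it may also include the comments on the microblog. In one embodiment, if the microblog content is short, the category vector extraction unit 212 may obtain the microblog content that the same publishing user posted within a presettable time window around the publication time of the microblog, and concatenate these pieces of content.
Preferably, the microblog topic categories include: politics and military, culture and arts, finance and stocks, emotion and life, society and law, entertainment and gossip, technology and Internet, health and food, sports, automobiles and real estate, education and job hunting, fashion and travel, and so on. Preferably, each component of the topic category vector is a score indicating how strongly the microblog content leans toward one microblog topic category. For example, the first component of the topic category vector represents the politics and military score, the second component represents the culture and arts score, and so on. The topic category vector (5, 10, …) then means that the content has a score of 5 for politics and military and a score of 10 for culture and arts. Preferably, the microblog topic category corresponding to the highest-scoring component may be taken as the topic category of the microblog content.
Specifically, the features of the microblog topic categories may be trained in advance. The category vector extraction unit 212 may then classify the microblog content using the existing naive Bayes text classification algorithm to obtain the topic category vector of the microblog content; the details are not repeated here.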
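Since the text defers to the standard naive Bayes text classifier without detail, here is a minimal multinomial naive Bayes sketch with Laplace smoothing. It assumes the content has already been segmented into tokens and that the training data is non-empty. Using normalised posterior probabilities as the vector components is an assumption; the example vector (5, 10, …) above suggests the scores need not be probabilities.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """Collect per-category word counts and document counts.

    `samples` is an iterable of (category, tokens) pairs.
    """
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for category, tokens in samples:
        word_counts[category].update(tokens)
        doc_counts[category] += 1
    return word_counts, doc_counts


def topic_vector(tokens, word_counts, doc_counts, categories):
    """Return one posterior probability per category, in the order of `categories`."""
    vocabulary = {word for counts in word_counts.values() for word in counts}
    total_docs = sum(doc_counts.values())
    log_posteriors = []
    for category in categories:
        counts = word_counts[category]
        denominator = sum(counts.values()) + len(vocabulary)
        # Smoothed log prior, then one smoothed log likelihood per token.
        log_p = math.log((doc_counts[category] + 1) / (total_docs + len(categories)))
        for word in tokens:
            log_p += math.log((counts[word] + 1) / denominator)
        log_posteriors.append(log_p)
    # Normalise in a numerically stable way.
    peak = max(log_posteriors)
    exps = [math.exp(lp - peak) for lp in log_posteriors]
    norm = sum(exps)
    return [e / norm for e in exps]
```

The category of the content is then the one at `vector.index(max(vector))`, matching the "highest-scoring component" rule above.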
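The category vector extraction unit 212 is further configured to obtain the historical microblog content of the microblog requesting user, and to obtain the topic category vector of that historical content according to the historical microblog content and the features of the microblog topic categories.
Specifically, the category vector extraction unit 212 may obtain the microblog content published by the requesting user within a recent, presettable time period. Preferably, after obtaining the topic category vectors of multiple pieces of historical microblog content, the category vector extraction unit 212 may take the average of these vectors as the topic category vector of the requesting user's historical microblog content.
The content similarity calculation unit 222 is configured to calculate the similarity between the microblog content in the microblog information and the historical microblog content of the requesting user, according to the topic category vector of the microblog content and the topic category vector of the requesting user's historical microblog content.
Specifically, the content similarity calculation unit 222 may compute this similarity by calculating the distance between the two topic category vectors. Preferably, the smaller the distance, the higher the similarity is set.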
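The averaging and distance steps can be sketched as follows. Euclidean distance and the mapping 1 / (1 + d) are assumptions; the text only requires that a smaller distance give a higher similarity.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of equally long topic category vectors."""
    if not vectors:
        raise ValueError("at least one vector is required")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def content_similarity(post_vector, history_vectors):
    """Compare one post's topic vector with the requester's averaged history vector."""
    profile = mean_vector(history_vectors)
    return 1.0 / (1.0 + math.dist(post_vector, profile))
```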
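The third scoring unit 232 is configured to score the microblog information according to the similarity.
Preferably, the higher the similarity, the higher the score the third scoring unit 232 gives to the microblog information.
In this embodiment, if the similarity between the microblog content in the microblog information and the requesting user's historical microblog content is high, the microblog information receives a high score and is ranked first. The microblog content ranked near the top is more likely to interest the user, so the user can conveniently view the microblogs of interest.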
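In this embodiment, the features of the microblog topic categories need to be trained in advance. The above microblog sorting system further includes a classification model training module 50, configured to train on samples of each microblog topic category and to extract the features of each category. As shown in FIG. 14, the classification model training module 50 includes a topic category obtaining module 501, a training set obtaining module 502, and a feature extraction module 503:
The topic category obtaining module 501 is configured to obtain the preset microblog topic categories.
Preferably, the microblog topic categories include: politics and military, culture and arts, finance and stocks, emotion and life, society and law, entertainment and gossip, technology and Internet, health and food, sports, automobiles and real estate, education and job hunting, fashion and travel, and so on.
The training set obtaining module 502 is configured to obtain the training subset of each microblog topic category.
Preferably, in order to extract better features for the topic categories from the training subsets, as many microblog training samples as possible may be obtained within a certain range. In one embodiment, the training set obtaining module 502 may search microblogs by the keywords of a microblog topic category to obtain an initial training subset for that category, and then repeat the following steps a preset number of times: count the high-frequency words in the initial training subset; search microblogs by the high-frequency words and add the search results to the initial training subset.
Specifically, the training set obtaining module 502 may use the name of the microblog topic category and the words obtained by splitting that name as the keywords of the category. For the politics and military category, for example, "politics", "military", and "politics and military" may be used as the keywords, and a search on these keywords yields the initial training subset of the category. Further, after the initial training subset has been preprocessed, it may be word-segmented and filtered for stop words, and the high-frequency words in the initial subset may be counted. The high-frequency words and combinations of high-frequency words may then be used as keywords for further searches to obtain more microblog training samples. The steps of counting the high-frequency words in the initial training subset, searching microblogs by the high-frequency words, and adding the search results to the initial training subset are repeated a preset number of times.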
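The expansion loop described above might be sketched like this. The `search` and `tokenize` callables are hypothetical stand-ins for the microblog search backend and the word segmenter, neither of which the text specifies; `rounds` and `top_n` are illustrative parameters, and combinations of high-frequency words are omitted for brevity.

```python
from collections import Counter


def build_training_subset(keywords, search, tokenize, stop_words, rounds=2, top_n=3):
    """Grow a training subset by repeatedly searching on its high-frequency words.

    `search(word)` returns an iterable of posts; `tokenize(post)` returns tokens.
    """
    # A dict keeps insertion order and removes duplicate posts.
    subset = {}
    for keyword in keywords:
        subset.update(dict.fromkeys(search(keyword)))
    for _ in range(rounds):
        counts = Counter(
            token
            for post in subset
            for token in tokenize(post)
            if token not in stop_words
        )
        for word, _count in counts.most_common(top_n):
            subset.update(dict.fromkeys(search(word)))
    return list(subset)
```

With a toy corpus, starting from the keyword "army" pulls in posts that share "election" and then "vote", while unrelated posts are never reached.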
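The method of obtaining the training subsets of the microblog topic categories in this embodiment can obtain a large number of microblog training samples for each topic category, which provides the basis for extracting the features of each microblog topic category from the training subsets.
The feature extraction module 503 is configured to extract the features of the microblog topic categories from the training subsets.
Specifically, the feature extraction module 503 may train on the microblog content in the training subset of each topic category using an existing classification training method, and extract the features of each topic category. The details are not repeated here.
In one embodiment, the above microblog sorting system further includes a display category classification module (not shown in the figures), configured to classify the microblog content in the microblog information according to preset microblog display categories, and to obtain the display category to which the microblog content belongs.
Specifically, the display categories may include the microblog topic categories described above, such as politics and military, culture and arts, and finance and stocks. The topic category of the microblog content may be derived from the topic category vector obtained by the category vector extraction unit 212: the topic category corresponding to the highest-scoring component of the topic category vector may be taken as the topic category of the microblog content.
In one embodiment, other display categories may be added in addition to the microblog topic categories, such as friends, location, funny, help and forwarding requests, and advertising and campaigns. Whether microblog information belongs to the friends category may be determined by whether the publishing user and the requesting user are friends. In one embodiment, the display category classification module may use the ID of the publishing user and the ID of the requesting user to check, in a database that stores friend relationships, whether the two users are friends. Whether microblog information belongs to the location category may be determined by whether the addresses of the publishing user and the requesting user are in the same area (which may be set to a county, a district, and so on). Whether a microblog belongs to the funny category may be determined by using the publishing user's ID to look up the user's humor score in a database that stores users' humor scores, and checking whether that score exceeds a preset threshold. In one embodiment, a user's humor score may be obtained from the humor ratings that other users give to that user. Whether microblog information belongs to the help and forwarding category or the advertising and campaigns category may be determined by whether high-frequency help-request or advertising words appear in the microblog content.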
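The rules above can be sketched as a single function that returns every display category a post falls under. The dictionary keys, the in-memory `friend_pairs` and `humor_scores` lookups (stand-ins for the databases mentioned above), the threshold, and the word lists are all hypothetical.

```python
def display_categories(post, requester, topic_names, friend_pairs, humor_scores,
                       humor_threshold=8.0,
                       help_words=("help", "please forward"),
                       ad_words=("discount", "promotion")):
    """Return the display categories of one post for one requesting user."""
    # Topic category: the highest-scoring component of the topic category vector.
    vector = post["topic_vector"]
    categories = [topic_names[vector.index(max(vector))]]

    author = post["author_id"]
    if frozenset((author, requester["id"])) in friend_pairs:
        categories.append("friends")
    if post["author_region"] == requester["region"]:
        categories.append("location")
    if humor_scores.get(author, 0.0) > humor_threshold:
        categories.append("funny")

    text = post["text"].lower()
    if any(word in text for word in help_words):
        categories.append("help and forwarding")
    if any(word in text for word in ad_words):
        categories.append("advertising and campaigns")
    return categories
```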
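In one embodiment, the microblog display categories may further include a hot topics category. Specifically, the display category classification module may parse web page content to obtain high-frequency records; score the high-frequency records according to the requesting user's historical microblog content; and, according to those scores, select microblogs from the search results and classify them under the hot topics category.
Preferably, the display category classification module may parse the web page content with the existing open-source tool Html-parser to obtain the phrases whose number of occurrences exceeds a preset threshold, that is, the high-frequency records. The high-frequency records may then be scored according to their similarity to the requesting user's historical microblog content. Specifically, the number of times a high-frequency record appears in the microblog content that the requesting user has published, forwarded, or commented on may be counted, and the record may be scored according to that count. Finally, the high-frequency records whose scores rank within a preset number of top positions may be selected, the microblog information whose content contains one of these records may be picked out, and that microblog information may be classified under the hot topics category.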
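A minimal sketch of the hot topic selection follows. It deliberately starts after the HTML parsing step: `page_phrases` is assumed to be the list of phrases already extracted from the web pages, and the function names and the `top_k` parameter are hypothetical.

```python
from collections import Counter


def high_frequency_records(page_phrases, min_count):
    """Keep the phrases that occur more than `min_count` times."""
    counts = Counter(page_phrases)
    return [phrase for phrase, count in counts.items() if count > min_count]


def hot_topic_posts(records, user_texts, candidate_posts, top_k=1):
    """Pick the posts that mention one of the requester's top-scoring records.

    A record's score is the number of times it appears in the texts the
    requesting user has published, forwarded, or commented on.
    """
    ranked = sorted(
        records,
        key=lambda record: sum(text.count(record) for text in user_texts),
        reverse=True,
    )
    top_records = ranked[:top_k]
    return [post for post in candidate_posts
            if any(record in post for record in top_records)]
```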
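In this embodiment, the display module 40 is configured to display the microblog information according to the display category to which the microblog content belongs and the above sorting result.
Specifically, the display module 40 may display the microblog information grouped by display category and, within each display category, place the high-scoring microblog information near the top.
In this embodiment, the microblog information is divided into multiple display categories for display, so the user can choose a microblog category of interest to view, which simplifies the user's operation. In addition, each display category is displayed in descending order of microblog score. For the microblogs near the top, the publishing user has a relatively high activity level, or the publishing user's personal information is relatively similar to that of the requesting user, or the degree of association between the publishing user and the requesting user is relatively high, so the user can conveniently view microblogs that are relevant and of interest.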
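Grouping by display category and ordering each group by score could look like the sketch below. The post dictionaries with `id`, `score`, and `categories` keys are a hypothetical representation; a post that belongs to several display categories appears in each of them.

```python
from collections import defaultdict


def arrange_for_display(posts):
    """Group posts by display category; sort each group by descending score."""
    grouped = defaultdict(list)
    for post in posts:
        for category in post["categories"]:
            grouped[category].append(post)
    return {
        category: sorted(items, key=lambda p: p["score"], reverse=True)
        for category, items in grouped.items()
    }
```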
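A microblog search method sorts microblog search results according to the above microblog sorting method, wherein the step of obtaining the microblog information requested by the user includes: searching according to the keywords entered by the user to obtain the microblog information requested by the user.
Specifically, a conventional search engine may be used to search on the keywords entered by the user and find the microblog information matching the keywords, thereby obtaining the microblog information requested by the user.
A microblog search system includes the above microblog sorting system, wherein the microblog information obtaining module 10 is configured to search according to the keywords entered by the user to obtain the microblog information requested by the user.
Specifically, the microblog information obtaining module 10 may use a conventional search engine to search on the keywords entered by the user and find the microblog information matching the keywords, thereby obtaining the microblog information requested by the user.
A microblog display method sorts microblog request results according to the above microblog sorting method, wherein the step of obtaining the microblog information requested by the user includes: obtaining the microblog information requested by the user according to the microblog request information corresponding to the user identifier.
In one embodiment, the microblog request information corresponding to a user identifier may be preset as: obtain the microblog information of the group of people corresponding to the user identifier. For example, when a user logs in to a microblog account, the user identifier (such as the user ID) may be used to look up the people the user follows or listens to as well as the user's friends, and the recent microblog information of those people and friends may be obtained, thereby obtaining the microblog information requested by the user.
A microblog display system includes the above microblog sorting system, wherein the microblog information obtaining module 10 is configured to obtain the microblog information requested by the user according to the microblog request information corresponding to the user identifier.
In one embodiment, the microblog request information corresponding to a user identifier may be preset as: obtain the microblog information of the group of people corresponding to the user identifier. For example, when a user logs in to a microblog account, the microblog information obtaining module 10 may use the user identifier (such as the user ID) to look up the people the user follows or listens to as well as the user's friends, and obtain the recent microblog information of those people and friends, thereby obtaining the microblog information requested by the user.
A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program controlling the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The embodiments described above express only several implementations of the present invention. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art may make a number of variations and improvements without departing from the concept of the present invention, and all of these fall within the scope of protection of the present invention. The scope of protection of this patent shall therefore be determined by the appended claims.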

Claims (18)

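  1. A microblog sorting method, comprising the following steps:
    obtaining microblog information requested by a user;
    extracting microblog publishing user information and content information from the microblog information, and scoring the microblog information;
    sorting the microblog information according to the score;
    displaying the microblog information according to the result of the sorting.
  2. The microblog sorting method according to claim 1, characterized in that the step of extracting the microblog publishing user information from the microblog information and scoring the microblog information comprises:
    obtaining microblog operation records of the microblog publishing user, and calculating an activity level of the microblog publishing user according to the microblog operation records;
    scoring the microblog information according to the activity level.
  3. The microblog sorting method according to claim 1 or 2, characterized in that the step of extracting the microblog publishing user information from the microblog information and scoring the microblog information comprises:
    obtaining personal information of the microblog publishing user and personal information of a microblog requesting user, and calculating a similarity between the personal information of the microblog publishing user and the personal information of the microblog requesting user;
    obtaining interaction records between the microblog publishing user and the microblog requesting user, and calculating a degree of association between the microblog publishing user and the microblog requesting user according to the interaction records;
    scoring the microblog information according to the similarity and the degree of association.
  4. The microblog sorting method according to claim 1, characterized in that the step of extracting the content information from the microblog information and scoring the microblog information comprises:
    obtaining microblog content in the microblog information, and obtaining a topic category vector of the microblog content in the microblog information according to the microblog content and features of microblog topic categories;
    obtaining historical microblog content of a microblog requesting user, and obtaining a topic category vector of the historical microblog content of the microblog requesting user according to the historical microblog content and the features of the microblog topic categories;
    calculating a similarity between the microblog content in the microblog information and the historical microblog content of the microblog requesting user according to the topic category vector of the microblog content in the microblog information and the topic category vector of the historical microblog content of the microblog requesting user;
    scoring the microblog information according to the similarity.
  5. The microblog sorting method according to claim 4, characterized in that, before the step of extracting the content information from the microblog information and scoring the microblog information, the method further comprises:
    obtaining preset microblog topic categories;
    obtaining training subsets of the microblog topic categories;
    extracting the features of the microblog topic categories from the training subsets.
  6. The microblog sorting method according to claim 5, characterized in that the step of obtaining the training subsets of the microblog topic categories comprises:
    searching microblogs according to keywords of the microblog topic category to obtain an initial training subset of the microblog topic category;
    repeating the following steps a preset number of times:
    counting high-frequency words in the initial training subset;
    searching microblogs according to the high-frequency words, and adding the search results to the initial training subset.
  7. The microblog sorting method according to claim 1, characterized in that, before the step of displaying the microblog information according to the result of the sorting, the method further comprises:
    classifying the microblog content in the microblog information according to preset microblog display categories to obtain the display category to which the microblog content belongs;
    the step of displaying the microblog information according to the result of the sorting being:
    displaying the microblog information according to the display category to which the microblog content belongs and the result of the sorting.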
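  8. A microblog sorting system, characterized by comprising:
    a microblog information obtaining module, configured to obtain microblog information requested by a user;
    a scoring module, the scoring module comprising a user information scoring module and a content information scoring module, the user information scoring module being configured to extract microblog publishing user information from the microblog information and score the microblog information according to the microblog publishing user information; the content information scoring module being configured to extract content information from the microblog information and score the microblog information according to the content information;
    a sorting module, configured to sort the microblog information according to the score;
    a display module, configured to display the microblog information according to the result of the sorting.
  9. The microblog sorting system according to claim 8, characterized in that the user information scoring module comprises:
    an activity calculation unit, configured to obtain microblog operation records of the microblog publishing user and calculate an activity level of the microblog publishing user according to the microblog operation records;
    a first scoring unit, configured to score the microblog information according to the activity level.
  10. The microblog sorting system according to claim 8 or 9, characterized in that the user information scoring module comprises:
    a personal information similarity calculation unit, configured to obtain personal information of the microblog publishing user and personal information of a microblog requesting user, and calculate a similarity between the personal information of the microblog publishing user and the personal information of the microblog requesting user;
    an association degree calculation unit, configured to obtain interaction records between the microblog publishing user and the microblog requesting user, and calculate a degree of association between the microblog publishing user and the microblog requesting user according to the interaction records;
    a second scoring unit, configured to score the microblog information according to the similarity and the degree of association.
  11. The microblog sorting system according to claim 8, characterized in that the content information scoring module comprises:
    a category vector extraction unit, configured to obtain microblog content in the microblog information, and obtain a topic category vector of the microblog content in the microblog information according to the microblog content and features of microblog topic categories;
    the category vector extraction unit being further configured to obtain historical microblog content of a microblog requesting user, and obtain a topic category vector of the historical microblog content of the microblog requesting user according to the historical microblog content and the features of the microblog topic categories;
    a content similarity calculation unit, configured to calculate a similarity between the microblog content in the microblog information and the historical microblog content of the microblog requesting user according to the topic category vector of the microblog content in the microblog information and the topic category vector of the historical microblog content of the microblog requesting user;
    a third scoring unit, configured to score the microblog information according to the similarity.
  12. The microblog sorting system according to claim 11, characterized in that the system further comprises a classification model training module, the classification model training module comprising:
    a topic category obtaining module, configured to obtain preset microblog topic categories;
    a training set obtaining module, configured to obtain training subsets of the microblog topic categories;
    a feature extraction module, configured to extract the features of the microblog topic categories from the training subsets.
  13. The microblog sorting system according to claim 12, characterized in that the training set obtaining module is configured to search microblogs according to keywords of the microblog topic category to obtain an initial training subset of the microblog topic category, and to repeat the following steps a preset number of times: counting high-frequency words in the initial training subset, searching microblogs according to the high-frequency words, and adding the search results to the initial training subset.
  14. The microblog sorting system according to claim 8, characterized in that the system further comprises:
    a display category classification module, configured to classify the microblog content in the microblog information according to preset microblog display categories to obtain the display category to which the microblog content belongs;
    the display module being further configured to display the microblog information according to the display category to which the microblog content belongs and the result of the sorting.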
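  15. A microblog search method, characterized in that microblog search results are sorted according to the microblog sorting method of any one of claims 1 to 7, wherein the step of obtaining the microblog information requested by the user comprises: searching according to keywords entered by the user to obtain the microblog information requested by the user.
  16. A microblog search system, characterized by comprising the microblog sorting system of any one of claims 8 to 14, wherein the microblog information obtaining module is configured to search according to keywords entered by the user to obtain the microblog information requested by the user.
  17. A microblog display method, characterized in that microblog request results are sorted according to the microblog sorting method of any one of claims 1 to 7, wherein the step of obtaining the microblog information requested by the user comprises: obtaining the microblog information requested by the user according to microblog request information corresponding to a user identifier.
  18. A microblog display system, characterized by comprising the microblog sorting system of any one of claims 8 to 14, wherein the microblog information obtaining module is configured to obtain the microblog information requested by the user according to microblog request information corresponding to a user identifier.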
PCT/CN2013/071325 2012-02-09 2013-02-04 Method and system for sorting, searching and presenting micro-blogs WO2013117147A1 (zh)

Priority Applications (5)

Application Number Priority Date Filing Date Title
JP2014515063A JP2014522540A (ja) 2012-02-09 2013-02-04 マイクロブログのシーケンシング、検索、表示方法及びシステム
AP2014007382A AP2014007382A0 (en) 2012-02-09 2013-02-04 Method and system for sequencing, seeking, and displaying micro-blog
KR1020137031978A KR20140012750A (ko) 2012-02-09 2013-02-04 마이크로 블로그 배열, 검색 및 표시 방법과 시스템
EP13746647.0A EP2704040A4 (en) 2012-02-09 2013-02-04 METHOD AND SYSTEM FOR SEQUENCING, SEARCHING, AND DISPLAYING MICROBLOGS
US14/109,949 US9785677B2 (en) 2012-02-09 2013-12-17 Method and system for sorting, searching and presenting micro-blogs

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201210028740.7A CN103246670B (zh) 2012-02-09 2012-02-09 Method and system for sorting, searching and presenting micro-blogs
CN201210028740.7 2012-02-09

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US14/109,949 Continuation US9785677B2 (en) 2012-02-09 2013-12-17 Method and system for sorting, searching and presenting micro-blogs

Publications (1)

Publication Number Publication Date
WO2013117147A1 (zh) 2013-08-15

Family

ID=48926194

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/071325 WO2013117147A1 (zh) 2012-02-09 2013-02-04 Method and system for sorting, searching and presenting micro-blogs

Country Status (7)

Country Link
US (1) US9785677B2 (zh)
EP (1) EP2704040A4 (zh)
JP (1) JP2014522540A (zh)
KR (1) KR20140012750A (zh)
CN (1) CN103246670B (zh)
AP (1) AP2014007382A0 (zh)
WO (1) WO2013117147A1 (zh)

Also Published As

Publication number Publication date
US9785677B2 (en) 2017-10-10
CN103246670B (zh) 2016-02-17
EP2704040A4 (en) 2015-08-05
EP2704040A1 (en) 2014-03-05
AP2014007382A0 (en) 2014-01-31
CN103246670A (zh) 2013-08-14
KR20140012750A (ko) 2014-02-03
JP2014522540A (ja) 2014-09-04
US20140108388A1 (en) 2014-04-17

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13746647

Country of ref document: EP

Kind code of ref document: A1

REEP Request for entry into the european phase

Ref document number: 2013746647

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2013746647

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 20137031978

Country of ref document: KR

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2014515063

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE