CN109271518B

CN109271518B - Method and equipment for classified display of microblog information

Info

Publication number: CN109271518B
Application number: CN201811157427.7A
Authority: CN
Inventors: 康学雷; 杨智
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2012-04-28
Filing date: 2012-04-28
Publication date: 2021-12-07
Anticipated expiration: 2032-04-28
Also published as: CN109271518A; CN103377258B; CN103377258A

Abstract

The invention discloses a method and equipment for classified display of microblog information. The method comprises the following steps: extracting a headword of the microblog information; obtaining the degree of relevance between the microblog information and a predefined classification by calculating the relevance of the headword to the predefined classification; classifying the microblog information into the predefined classification if that degree of relevance is higher than a first threshold; and displaying the classified microblog information. The method and equipment can therefore automatically classify the massive amount of microblog information published by a microblog publisher, so that a user can read, classification by classification, only the type of microblog information he is interested in. This provides a new experience for reading massive amounts of microblog information.

Description

Method and equipment for classified display of microblog information

This application is a divisional application of the invention patent application No. 201210132513.9, filed on April 28, 2012, and entitled "Method and equipment for classifying and displaying microblog information".

Technical Field

The invention relates to the technical field of computers, in particular to a method and equipment for classifying and displaying microblog information.

Background

Microblogs are broadcast-style social network platforms for sharing short, real-time information through a follow mechanism, on which information can be shared, spread, and acquired based on user relationships. On a microblog platform, a user can build a personal community through a microblog server, the network, and various clients, post text and/or images of about 140 characters, and share the information instantly.

Microblog technologies have developed rapidly since their introduction. Taking the Sina Weibo website as an example, in only 20 months, from August 2009 to April 2011, the number of registered Sina Weibo users reached nearly 150 million, and its users posted more than 50 million microblog messages per day on average.

However, as use of microblog services grows rapidly, the lack of automatic classification when browsing massive amounts of microblog information has become increasingly prominent. Specifically, existing microblog applications only let users filter and sort the microblog information of the publishers they follow by account and by microblog type (such as comments), so when a publisher has posted a great deal of microblog information, users do not know where to start browsing.

For example, a frequent situation is that a user is interested in a certain microblog publisher, but the publisher has posted hundreds of microblogs, and the user has no way of knowing which types of content the publisher is mainly interested in.

For another example, as microblogging becomes more and more popular, a user may want to review his own account or the accounts of other users he follows, but current microblog applications provide no way to automatically classify and summarize an account other than viewing it item by item, so the user cannot quickly find a particular piece of microblog information that he needs to review.

As a result, when browsing the microblog information of a microblog publisher, the user has to go through it manually one item at a time and judge by hand whether the publisher's interests match his own, which consumes a great deal of the user's time and energy.

Disclosure of Invention

To solve the above technical problem, according to one aspect of the present invention, there is provided a method for classifying and displaying microblog information, the method including: extracting a headword of the microblog information; obtaining the degree of relevance between the microblog information and a predefined classification by calculating the relevance of the headword to the predefined classification; classifying the microblog information into the predefined classification if that degree of relevance is higher than a first threshold; and displaying the classified microblog information.

In addition, according to another aspect of the present invention, there is provided an apparatus for classifying and displaying microblog information, the apparatus comprising: a headword extracting unit, configured to extract a headword of the microblog information; a relevancy obtaining unit, configured to obtain the relevancy of the microblog information to a predefined classification by calculating the relevancy of the headword extracted by the headword extracting unit to the predefined classification; a classifying unit, configured to classify the microblog information into the predefined classification if the relevancy obtained by the relevancy obtaining unit is higher than a first threshold; and a display processing unit, configured to display the microblog information classified by the classifying unit.

To solve the above technical problem, according to another aspect of the present invention, there is provided a system for classifying and displaying microblog information, including: an engine server, located at the network side and connected to a microblog server that provides the microblog service, which can download from the microblog server the microblog information published by a microblog publisher within a time range and automatically classify it; and a microblog client, located at the user side and connected to the engine server, which comprises an input information receiving unit for receiving the microblog information automatically classified on the engine server, and a display processing unit for displaying it to the user.

In addition, according to another aspect of the present invention, there is provided an engine server for classifying and displaying microblog information, including a headword extracting unit, a relevancy obtaining unit, a classifying unit, and a microblog information acquiring unit. While downloading, the microblog information acquiring unit acquires not only all the microblog information published by the microblog publisher within the time range but also the replies made by other microblog publishers to that microblog information.

In addition, according to another aspect of the present invention, there is provided a method for classifying and displaying microblog information, including: extracting microblog information; classifying the microblog information by calculating its relevance to predefined classifications; and determining the attention heat of the microblog publisher for each predefined classification, that is, how much attention the publisher pays to it, according to the quantity and richness of the microblog information in that classification.

Compared with the prior art, the method and device for classifying and displaying microblog information can analyze the relevance of microblog information, classify the microblog information that is highly relevant to a predefined classification into that classification, and finally display the classified microblog information to the user. They can therefore automatically classify the massive amount of microblog information published by a microblog publisher, so that the user can read only the type of microblog information he is interested in, providing a new experience for reading massive amounts of microblog information.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:

Fig. 1 illustrates a method for classifying and displaying microblog information according to the invention.

Fig. 2 illustrates a device for classifying and displaying microblog information according to the invention.

Fig. 3 illustrates a method for classifying and displaying microblog information according to an embodiment of the invention.

Fig. 4 illustrates a classification system and a microblog server for classifying and displaying microblog information according to an embodiment of the invention.

FIG. 5 illustrates a flow diagram of an offline data training phase according to an embodiment of the present invention.

Fig. 6A to 6C illustrate examples of the calculated degree of relevance of a headword to a predefined classification according to an embodiment of the present invention.

FIG. 7 illustrates a preview display interface displayed in a microblog client according to an embodiment of the invention.

FIG. 8 illustrates a classification display interface displayed in a microblog client according to an embodiment of the invention.

Detailed Description

Various embodiments according to the present invention will be described in detail with reference to the accompanying drawings. Here, it is to be noted that, in the drawings, the same reference numerals are given to constituent parts having substantially the same or similar structures and functions, and repeated description thereof will be omitted.

Hereinafter, a method and an apparatus for classifying and displaying microblog information according to the present invention will be described with reference to fig. 1 and 2.

Fig. 1 illustrates a method for classifying and displaying microblog information according to the invention. The method comprises the following steps:

in step S110, a headword of the microblog information is extracted;

in step S120, a correlation degree between the microblog information and a predefined classification is obtained by calculating the correlation degree between the headword and the predefined classification;

in step S130, if the correlation between the microblog information and the predefined classification is higher than a first threshold, classifying the microblog information into the predefined classification; and

in step S140, the classified microblog information is displayed.
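Steps S110 to S140 can be pictured as a small pipeline. The Python below is a minimal sketch under stated assumptions, not the patented implementation: `extract_headwords`, the relevance table, and `first_threshold` are hypothetical inputs supplied by the caller, and the message-level relevance is taken as the sum of headword relevances, which is the aggregation used in the embodiment described later.

```python
from typing import Callable, Dict, List

# Hypothetical relevance table: headword -> {classification: relevance in [0, 1]}
RelevanceTable = Dict[str, Dict[str, float]]


def classify_messages(
    messages: List[str],
    extract_headwords: Callable[[str], List[str]],
    relevance: RelevanceTable,
    classifications: List[str],
    first_threshold: float,
) -> Dict[str, List[str]]:
    """Run steps S110-S130 and return classification -> messages."""
    classified: Dict[str, List[str]] = {c: [] for c in classifications}
    for message in messages:
        headwords = extract_headwords(message)  # S110
        for c in classifications:
            # S120: sum of headword relevances (assumed aggregation, as in the embodiment)
            score = sum(relevance.get(w, {}).get(c, 0.0) for w in headwords)
            if score > first_threshold:  # S130
                classified[c].append(message)
    return classified


def display(classified: Dict[str, List[str]]) -> None:
    """S140: print each classification followed by its messages."""
    for c, msgs in classified.items():
        print(f"{c} ({len(msgs)})")
        for m in msgs:
            print("  -", m)
```

Because step S130 is tested separately for each classification, this sketch lets one message fall into several classifications if it clears the threshold for each.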

Fig. 2 illustrates an apparatus 200 for classifying and displaying microblog information according to the invention. The apparatus 200 comprises:

a headword extraction unit 210, configured to extract headwords of microblog information;

a relevancy obtaining unit 220, configured to obtain relevancy of the microblog information to a predefined category by calculating relevancy of the headword extracted by the headword extracting unit 210 to the predefined category;

a classifying unit 230, configured to classify the microblog information into the predefined classification if the degree of correlation between the microblog information and the predefined classification, as obtained by the relevancy obtaining unit 220, is higher than a first threshold; and

a display processing unit 240, configured to display the microblog information classified by the classification unit 230.

Thus, with the method and device for classifying and displaying microblog information, the relevance of the microblog information can be analyzed and the microblog information highly relevant to a predefined classification can be classified into that classification, so that the classified microblog information is finally displayed to the user. The massive amount of microblog information published by a microblog publisher can therefore be classified automatically, letting the user read only the type of microblog information he is interested in and providing a new experience for reading massive amounts of microblog information.

Hereinafter, a method and a device for classifying and displaying microblog information according to an embodiment of the invention will be described with reference to fig. 3 and 4. In the embodiment of the present invention, a classification system including an engine server and a microblog client is described as an example of a device for classifying and displaying microblog information.

It should be noted that, although the present invention is described herein by applying the method and apparatus for classifying and displaying microblog information to a classification system, those skilled in the art will understand that the present invention is not limited thereto. Instead, the invention can also be applied in stand-alone devices. For example, the various components of the classification system may be implemented in a stand-alone device, such as a personal computer, a notebook computer, a tablet computer, a multimedia player, or a personal digital assistant.

Fig. 3 illustrates a method for classifying and displaying microblog information according to an embodiment of the invention, and fig. 4 illustrates a classification system and a microblog server for classifying and displaying microblog information according to an embodiment of the invention.

The method for classifying and displaying microblog information according to the embodiment of the invention illustrated in fig. 3 can be applied to the classification system 400 illustrated in fig. 4. As illustrated in fig. 4, the classification system 400 for classifying and displaying microblog information includes: engine server 410 and microblog client 450.

The engine server 410 is located at the network side (in the cloud), is connected to the microblog server 300 that provides microblog services, can download from the microblog server 300 the microblog information published by any microblog publisher within any time range, and automatically classifies that microblog information. The microblog client 450 is located at the user side (locally), is connected to the engine server 410, and is configured to receive the microblog information automatically classified on the engine server 410 and display it to the user.

The advantage of performing the automatic classification of microblog information on the engine server 410 in the cloud is that it removes a large amount of computation from the local user equipment (the microblog client 450). This lowers the computing-capacity requirement on the user equipment and lets the user browse massive amounts of microblog information by classification using simple, low-cost equipment.

The engine server 410 includes: a headword extracting unit 210, a relevancy obtaining unit 220, a classifying unit 230, and a microblog information acquiring unit 250.

The microblog client 450 includes: an input information receiving unit 260 and a display processing unit 240.

As illustrated in fig. 3, the method for classifying and displaying microblog information according to the embodiment of the invention includes:

in step S300, all the microblog information published by a microblog publisher within a time range is acquired.

Specifically, when a user wishes to browse, in automatically classified form, the microblog information that a certain microblog publisher published within a certain time period, the user activates the microblog client 450 (e.g., a mobile phone). The display processing unit 240 in the microblog client 450 then prompts the user, on a display screen (not shown) of the microblog client 450 (for example, the display of the mobile phone), to input the account of the microblog publisher to be browsed by classification and the time range. The microblog client 450 receives this input through the input information receiving unit 260 (e.g., a touch screen or keypad of the mobile phone) and then transmits it to the engine server 410 in a wired or wireless manner.

In the engine server 410, the microblog information acquiring unit 250 downloads the corresponding microblog information from the microblog server 300 to the engine server 410 according to the publisher account and the time range received from the microblog client 450, for subsequent automatic classification processing.

For example, when the user wants to classify all microblogs posted by Yao Chen from January 1, 2012 to March 31, 2012, the microblog information acquiring unit 250 may acquire all the microblogs she posted according to Yao Chen's microblog account and the time period, and store them in the engine server 410.
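The selection performed by the microblog information acquiring unit 250 can be sketched as keeping one account's posts within an inclusive date range. This is an illustration only: the `Microblog` record and its field names are hypothetical, and an in-memory list stands in for the download interface of the microblog server 300, which the text does not specify.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Microblog:
    account: str       # publisher account (hypothetical field names)
    posted_on: date
    text: str


def acquire(posts: List[Microblog], account: str, start: date, end: date) -> List[Microblog]:
    """Return the posts by `account` dated within [start, end], inclusive."""
    return [p for p in posts if p.account == account and start <= p.posted_on <= end]
```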

Preferably, to classify the downloaded microblog information more accurately, the microblog information acquiring unit 250 acquires during downloading not only all the microblog information published by the microblog publisher within the time range but also the replies made by other microblog publishers to that information. Because microblog information is short text (a piece of microblog information generally does not exceed 140 characters), this converts each short text into a longer one and enriches the content of each piece of microblog information.
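A minimal sketch of this short-to-long text conversion, assuming posts and replies are already keyed by a hypothetical post id; joining the texts with a space is an arbitrary choice made for the illustration.

```python
from typing import Dict, List


def enrich_with_replies(posts: Dict[str, str], replies: Dict[str, List[str]]) -> Dict[str, str]:
    """Concatenate each post with the replies other publishers made to it.

    posts:   post id -> post text
    replies: post id -> texts of replies to that post (hypothetical layout)
    """
    return {pid: " ".join([text] + replies.get(pid, [])) for pid, text in posts.items()}
```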

Further, optionally, if the engine server 410 finds while interacting with the microblog server 300 that the user does not follow the microblog publisher, the engine server 410 may prompt the user, through the display processing unit 240 in the microblog client 450, to follow the publisher, and continues the download once the user has done so.

In step S310, a headword of the microblog information is extracted.

Specifically, in the engine server 410, the headword extracting unit 210 receives all the microblog information, acquired from the microblog server 300 by the microblog information acquiring unit 250, that the user wishes to browse by classification, and analyzes the microblog texts in real time using a bigram model generated in an offline data training phase.

In the following, an offline data training phase according to an embodiment of the present invention is described with reference to fig. 5.

FIG. 5 illustrates a flow diagram of an offline data training phase according to an embodiment of the present invention. Before the engine server 410 is used for real-time data analysis, it must first be trained offline on the characteristics of microblog texts.

Specifically, compared with ordinary texts, microblog texts contain more short sentences and a larger proportion of commentary relative to narrative content, so a Conditional Random Field (CRF) can be used to train the engine server 410 offline, with the CRF model trained on a microblog-specific corpus.

As illustrated in fig. 5, the offline data training phase includes:

in step S510, some real microblog information is randomly extracted through an open network Application Program Interface (API) (e.g., that of Sina Weibo) and transmitted in bulk to the engine server 410, which may be, for example, a Sony natural language engine server.

In step S520, the engine server 410 performs automatic parsing using an initial microblog-specific corpus. The initial corpus may, for example, be created manually, and includes at least: the segmented words, the part of speech of each word (e.g., noun, verb, pronoun, preposition), and the classifications to which each word may belong.

Specifically, the automatic parsing includes the following steps: segmenting each of the randomly extracted pieces of microblog information into at least one natural sentence; subdividing each segmented natural sentence into a plurality of words; tagging the part of speech of each word; syntactically parsing the segmented natural sentence on the basis of the part-of-speech tags; and finding candidate headwords according to the result of the syntactic analysis and a microblog headword dictionary.
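The first three parsing steps can be sketched as follows. This is a toy illustration, not the engine's parser: the delimiter set is an assumption, whitespace splitting stands in for real Chinese word segmentation (which in this embodiment relies on the trained CRF model), the lexicon-lookup tagger and its tag names are hypothetical, and the syntactic-analysis and candidate-headword steps are omitted.

```python
import re
from typing import Dict, List, Tuple

# Assumed set of sentence/clause delimiters, ASCII and full-width Chinese.
SENTENCE_BREAK = re.compile(r"[。！？；，!?;,]+")


def split_sentences(text: str) -> List[str]:
    """Step 1: cut a microblog message into natural sentences."""
    return [s.strip() for s in SENTENCE_BREAK.split(text) if s.strip()]


def tag_sentence(sentence: str, lexicon: Dict[str, str]) -> List[Tuple[str, str]]:
    """Steps 2-3: subdivide into words and tag each word's part of speech.

    Whitespace splitting is a placeholder for real word segmentation;
    words missing from the hypothetical lexicon get the tag "X".
    """
    return [(w, lexicon.get(w, "X")) for w in sentence.split()]


def parse_message(text: str, lexicon: Dict[str, str]) -> List[List[Tuple[str, str]]]:
    """Apply steps 1-3 to a whole message."""
    return [tag_sentence(s, lexicon) for s in split_sentences(text)]
```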

In step S530, it is determined whether the deviation of the microblog-specific corpus is smaller than a predetermined threshold.

For example, after the engine server 410 has automatically analyzed the randomly extracted microblog information with the initial microblog-specific corpus, an operator of the engine server 410 checks whether the headwords obtained agree with the headwords that the operator identifies manually in the same microblog information.

If the deviation of the microblog-specific corpus is greater than the predetermined threshold, for example if the headwords obtained by the engine server 410 with the initial corpus disagree substantially (e.g., in 50% of cases) with those judged manually by the operator, the operator corrects the corpus according to the manual judgment and, for new words, adds the classifications to which they may belong, thereby obtaining an updated microblog-specific corpus.

Then, the updated microblog-specific corpus is used to replace the initial microblog-specific corpus, and the step S510 is executed again to further correct and update the microblog-specific corpus by using other real microblog information. Steps S510 to S530 are repeatedly performed until the deviation of the microblog-specific corpus is smaller than a predetermined threshold.
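The S510 to S530 loop can be sketched as below, assuming the deviation is measured as the fraction of messages on which the automatic and manual headwords disagree (the text gives 50% disagreement only as an example of a large deviation). All callables are hypothetical stand-ins for the engine and the operator, and `max_rounds` is an added safeguard that the text does not describe.

```python
from typing import Callable, List, Set, TypeVar

Corpus = TypeVar("Corpus")


def disagreement_rate(auto: List[Set[str]], manual: List[Set[str]]) -> float:
    """Fraction of messages whose automatic headwords differ from the operator's."""
    if not auto or len(auto) != len(manual):
        raise ValueError("need equally long, non-empty result lists")
    return sum(a != m for a, m in zip(auto, manual)) / len(auto)


def refine_corpus(
    corpus: Corpus,
    sample_batch: Callable[[], List[str]],                # S510: random real microblogs
    analyze: Callable[[Corpus, str], Set[str]],           # S520: automatic parsing
    manual_label: Callable[[str], Set[str]],              # operator's manual judgement
    correct: Callable[[Corpus, List[str], List[Set[str]]], Corpus],  # operator correction
    threshold: float,
    max_rounds: int = 10,                                 # hypothetical safeguard
) -> Corpus:
    """Repeat S510-S530 until the deviation falls below `threshold`."""
    for _ in range(max_rounds):
        batch = sample_batch()
        auto = [analyze(corpus, msg) for msg in batch]
        manual = [manual_label(msg) for msg in batch]
        if disagreement_rate(auto, manual) < threshold:   # S530
            return corpus
        corpus = correct(corpus, batch, manual)
    return corpus
```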

In step S540, the engine server 410 builds a bigram model from the finally generated microblog-specific corpus.

For example, the engine server 410 builds a bigram model for real-time analysis of microblog data from the segmentation and tagging results obtained with the CRF model, so as to improve the accuracy of automatic classification.
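For readers unfamiliar with bigram models, the sketch below estimates the maximum-likelihood probability P(w2 | w1) = count(w1, w2) / count(w1) from already segmented sentences, with sentence-boundary markers. It is a generic, unsmoothed textbook estimator offered as an assumption-laden illustration, not the engine server's actual model.

```python
from collections import Counter
from typing import Callable, Iterable, List

BOS, EOS = "<s>", "</s>"  # sentence-boundary markers


def train_bigram(sentences: Iterable[List[str]]) -> Callable[[str, str], float]:
    """Return an unsmoothed maximum-likelihood estimate of P(word | prev)."""
    history: Counter = Counter()  # how often each token appears as the left element
    pairs: Counter = Counter()    # how often each (prev, word) pair appears
    for words in sentences:
        seq = [BOS] + list(words) + [EOS]
        history.update(seq[:-1])
        pairs.update(zip(seq[:-1], seq[1:]))

    def prob(prev: str, word: str) -> float:
        return pairs[(prev, word)] / history[prev] if history[prev] else 0.0

    return prob
```

A production model would normally add smoothing so that unseen pairs do not receive zero probability.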

It should be noted that although the offline data training phase according to the embodiment of the present invention is described herein by a Conditional Random Field (CRF), those skilled in the art will appreciate that the present invention is not limited thereto. Rather, other random fields, such as Markov Random Fields (MRF), Gibbs Random Fields (GRF), or Gaussian random fields, may also be used to implement the offline data training phase described above.

Returning to step S310 of Fig. 3: the microblog information acquiring unit 250 has acquired from the microblog server 300 all the microblog information (preferably including replies) that a specific publisher posted within a specific time period and that the user wishes to browse by classification. Using the bigram model generated in the offline data training phase, the headword extracting unit 210 segments each piece of this microblog information into at least one natural sentence; subdivides each natural sentence into a plurality of words; tags the part of speech of each word; builds a syntax tree for the natural sentence from the words and their parts of speech; and extracts the words in a dominant relation in the syntax tree as the headwords of that natural sentence.

For example, the words in a dominant relation in the syntax tree of a natural sentence may form a subject-predicate phrase, a verb-object phrase, and/or a noun phrase. Such words may also be extracted according to other rules (e.g., selecting the subject, predicate, or object).

Next, step S310 is explained in detail by an example.

For example, suppose the microblog information acquiring unit 250 acquires five pieces of microblog information in step S300, and the first reads: "Today I had a meal of authentic free-range chicken at the Huangshan Tourism Restaurant, it was really delicious, awesome!"

The headword extracting unit 210 first segments the first piece of microblog information into natural sentences with reference to the bigram model generated in the offline data training phase. The microblog information is divided into three natural sentences: a first, "Today I had a meal of authentic free-range chicken at the Huangshan Tourism Restaurant"; a second, "it was really delicious"; and a third, "awesome".

Next, taking the first natural sentence as an example, the headword extracting unit 210 uses a context-free grammar to subdivide it into eight words: "today", "at", "Huangshan Tourism Restaurant", "eat", a past-tense particle, "a meal of", "authentic", and "free-range chicken". The headword extracting unit 210 then tags the part of speech of these eight words, for example "eat" as a verb and "free-range chicken" as a noun, and builds a syntax tree for the first natural sentence from the subdivided words and their parts of speech. Analysis of the syntax tree shows that the words in a dominant relation form a verb-object structure composed of a verb and a noun; that is, the dominant phrase of the first natural sentence is the verb-object phrase formed by "eat" and "free-range chicken". The headword extracting unit 210 therefore extracts this verb phrase as the headword of the first natural sentence of the microblog information.
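The verb-object extraction in this example can be sketched on a hand-built syntax tree. The `Node` class is hypothetical, the tag names loosely follow the Penn Chinese Treebank convention (VV verb, NN common noun, NT temporal noun, NR proper noun, P preposition), and the tree is written by hand rather than produced by a parser. The rule "a VP with both a verb child and an NP child" is one simple reading of the dominant relation described above, not necessarily the engine's rule.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    label: str                 # phrase label (e.g. "VP") or POS tag (e.g. "VV")
    word: str = ""             # filled in on leaf nodes only
    children: List["Node"] = field(default_factory=list)


def head_noun(np: Node) -> str:
    """Take the last common noun (NN) child of a noun phrase as its head."""
    nouns = [c.word for c in np.children if c.label == "NN"]
    return nouns[-1] if nouns else ""


def verb_object_headwords(node: Node) -> List[Tuple[str, str]]:
    """Collect (verb, object head noun) pairs from every VP that has both."""
    found: List[Tuple[str, str]] = []
    verbs = [c.word for c in node.children if c.label == "VV"]
    objects = [c for c in node.children if c.label == "NP"]
    if node.label == "VP" and verbs and objects:
        found.append((verbs[0], head_noun(objects[0])))
    for child in node.children:
        found.extend(verb_object_headwords(child))
    return found


# Hand-built tree for "today / at Huangshan Tourism Restaurant / eat / authentic
# free-range chicken"; function words such as the particle are omitted for brevity.
EXAMPLE = Node("IP", children=[
    Node("NP", children=[Node("NT", "today")]),
    Node("VP", children=[
        Node("PP", children=[Node("P", "at"),
                             Node("NP", children=[Node("NR", "Huangshan Tourism Restaurant")])]),
        Node("VP", children=[
            Node("VV", "eat"),
            Node("NP", children=[Node("JJ", "authentic"), Node("NN", "free-range chicken")]),
        ]),
    ]),
])
```

Calling `verb_object_headwords(EXAMPLE)` returns the single pair for "eat" and "free-range chicken"; the outer VP is skipped because it has no verb child of its own.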

The headword extracting unit 210 processes the second and third natural sentences of the first piece of microblog information in the same way. Because the third natural sentence consists of only one word, its syntax tree is clearly incomplete, so it is preferably filtered out of the first piece of microblog information. In this way, filler posts ("water-pouring" posts such as "bump" or "yawn") that are too short to contain any headword can be filtered out at this stage, reducing the data-analysis load on the engine server 410 during the subsequent relevance matching.
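The short-sentence filter can be sketched as a length cutoff on segmented sentences. The two-word minimum is a hypothetical choice; the text only says that a one-word sentence yields an incomplete syntax tree.

```python
from typing import List


def drop_short_sentences(sentences: List[List[str]], min_words: int = 2) -> List[List[str]]:
    """Remove segmented sentences with fewer than `min_words` words.

    The default cutoff (hypothetical) drops one-word sentences such as "awesome",
    which cannot form a complete syntax tree.
    """
    return [s for s in sentences if len(s) >= min_words]
```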

After headwords have been extracted from all the natural sentences of the first piece of microblog information, the headword extracting unit 210 performs the same extraction on the second to fifth pieces, obtaining the headwords of all five pieces of microblog information.

It should be noted that although step S310 is described as building the syntax tree with a context-free grammar and taking the dominant phrase in the syntax tree as the headword, those skilled in the art will understand that the present invention is not limited thereto. Step S310 may also be implemented by determining whether headwords predefined by an operator occur in the microblog information, for example by comparing the subdivided words with a headword dictionary containing a plurality of predefined headwords.
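The dictionary-based alternative amounts to a set lookup over the subdivided words. A minimal sketch, assuming the headword dictionary is held as a Python set; keeping first-seen order and dropping repeats are illustrative choices, not requirements from the text.

```python
from typing import Iterable, List, Set


def dictionary_headwords(words: Iterable[str], headword_dict: Set[str]) -> List[str]:
    """Return the words found in the predefined headword dictionary,
    in first-seen order and without duplicates."""
    found: List[str] = []
    for w in words:
        if w in headword_dict and w not in found:
            found.append(w)
    return found
```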

In step S320, the relevance of the microblog information to the predefined classification is obtained by calculating the relevance of the headword to the predefined classification.

Specifically, in the engine server 410, the relevancy obtaining unit 220 receives from the headword extracting unit 210 the one or more headwords extracted in step S310 from each piece of filtered microblog information. The relevancy obtaining unit 220 also reads from a memory (not shown) of the engine server 410 a plurality of predefined classifications, which are specified manually and according to which the pieces of microblog information are automatically classified.

Next, the relevancy obtaining unit 220 uses a pre-trained headword relevancy probability library to build a space vector for each predefined classification, each element of which indicates the relevancy of one of the headwords to that classification. The headword relevancy probability library is trained in advance by an operator and contains the relevancy of each operator-preset headword to each predefined classification. The relevancy is a probability value expressing how likely the headword is to be relevant to the classification; it ranges from 0 to 1, where 0 means completely irrelevant and 1 means completely relevant.
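Building the space vectors from the probability library can be sketched as below, assuming the library is a nested dict (headword to classification to probability). Treating a headword/classification pair that is missing from the library as relevance 0, and rejecting values outside [0, 1], are assumptions made for this illustration.

```python
from typing import Dict, List

# Assumed layout: headword -> classification -> relevance probability in [0, 1]
RelevanceLibrary = Dict[str, Dict[str, float]]


def build_space_vectors(
    headwords: List[str], classifications: List[str], library: RelevanceLibrary
) -> Dict[str, List[float]]:
    """One vector per classification; element i is headword i's relevance to it."""
    vectors: Dict[str, List[float]] = {}
    for c in classifications:
        vec: List[float] = []
        for w in headwords:
            p = library.get(w, {}).get(c, 0.0)  # assumption: missing pair -> 0
            if not 0.0 <= p <= 1.0:
                raise ValueError(f"relevance of {w!r} to {c!r} outside [0, 1]: {p}")
            vec.append(p)
        vectors[c] = vec
    return vectors
```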

Next, the correlation calculation step S320 according to the embodiment of the present invention is described with reference to fig. 6A to 6C.

Fig. 6A to 6C illustrate examples of the calculated degree of relevance of a headword to a predefined classification according to an embodiment of the present invention. It is assumed that the headwords extracted by the headword extracting unit 210 in step S310 from a certain piece of microblog information are the four headwords "eat", "free-range chicken", "take a picture", and "dongyou", and that the predefined classifications in the engine server 410 are the three classifications "photography", "food", and "travel".

At this time, the relevancy obtaining unit 220 calculates the relevancy of each of the four headwords to each of the three categories by using the headword relevancy probability library.

For the first predefined classification, "photography", the calculation gives the following: the relevance of the first headword "eat" to this classification is low, only 0.1; the relevance of the second headword "free-range chicken" is also very low, only 0.1; the relevance of the third headword "take a picture" is very high, 1, indicating that the two are completely relevant; and the relevance of the fourth headword "dongyou" is 0.3.

For the second predefined classification, "food", the calculated relevancy of the first headword "eat" is high, at 0.9; that of the second headword "free-range chicken" is 0.8; the third headword "take a picture" has a very low relevancy of only 0.1; and the fourth headword "dongyou" has a relevancy of 0.3.

For the third predefined classification, "travel", the calculated relevancy of the first headword "eat" is 0.3; that of the second headword "free-range chicken" is very low, at 0.1; the third headword "take a picture" has a relevancy of 0.6; and the fourth headword "dongyou" is highly relevant, at 0.9.

Through the above steps, the relevancy obtaining unit 220 uses the headword relevancy probability library to obtain the space vectors of the microblog information for the three predefined classifications: t1 = {0.1, 0.1, 1, 0.3} for "photography", t2 = {0.9, 0.8, 0.1, 0.3} for "food", and t3 = {0.3, 0.1, 0.6, 0.9} for "travel". Together these vectors establish the relevance probability distribution space of the headwords over the predefined classifications.

Finally, the relevancy obtaining unit 220 takes the sum of the relevancies of all headwords to a predefined classification as the relevancy of the microblog information to that classification.

For example, the relevancy of the microblog information to the first predefined classification is 0.1 + 0.1 + 1 + 0.3 = 1.5; to the second, 0.9 + 0.8 + 0.1 + 0.3 = 2.1; and to the third, 0.3 + 0.1 + 0.6 + 0.9 = 1.9.
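The arithmetic of this example can be reproduced with a short Python sketch that sums each space vector. The vectors are copied from figs. 6A to 6C; the names `SPACE_VECTORS` and `microblog_relevance`, and the rounding used to hide floating-point noise, are illustrative assumptions.

```python
from typing import Dict, List

HEADWORDS: List[str] = ["eat", "free-range chicken", "take a picture", "dongyou"]

# Relevancy values from the fig. 6A to 6C example, in HEADWORDS order.
SPACE_VECTORS: Dict[str, List[float]] = {
    "photography": [0.1, 0.1, 1.0, 0.3],  # t1
    "food": [0.9, 0.8, 0.1, 0.3],         # t2
    "travel": [0.3, 0.1, 0.6, 0.9],       # t3
}


def microblog_relevance(space_vector: List[float]) -> float:
    """Relevancy of a microblog to one classification: the sum of its headword relevancies."""
    # Rounded so that results such as 1.4999999999999998 compare equal to 1.5.
    return round(sum(space_vector), 6)


relevances = {name: microblog_relevance(vec) for name, vec in SPACE_VECTORS.items()}
print(relevances)  # {'photography': 1.5, 'food': 2.1, 'travel': 1.9}
```

With a single headword the vector has one element, so the sum is simply that headword's relevancy, which matches the special case noted next.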

Clearly, when a piece of microblog information contains only one headword, the relevancy of that headword to a predefined classification is the relevancy of the microblog information to that classification.

Referring back to fig. 3, in step S330, the microblog information is classified into the predefined classification.

In the engine server 410, the classification unit 230 receives from the relevancy obtaining unit 220 the calculated relevancy of each piece of microblog information to each predefined classification and compares each relevancy with a first threshold. If one or more relevancies of a piece of microblog information are higher than the first threshold, the classification unit 230 selects the maximum among them and classifies the microblog information into the corresponding predefined classification. The microblog information is not classified into any predefined classification whose relevancy is lower than the first threshold.

Specifically, the classification unit 230 compares each relevancy calculated by the relevancy obtaining unit 220, that is, the relevancy of each piece of microblog information to each predefined classification, with the first threshold. For convenience of explanation, the first threshold is assumed here to be 1.8.

Continuing the example of figs. 6A to 6C, for the piece of microblog information containing the four headwords "eat", "free-range chicken", "take a picture", and "dongyou", the classification unit 230 receives a relevancy of 1.5 to the first predefined classification "photography", 2.1 to the second predefined classification "food", and 1.9 to the third predefined classification "travel".

The classification unit 230 then compares the three relevancies with the first threshold of 1.8. The relevancies to the second and third predefined classifications exceed the threshold, and the relevancy of 2.1 to the second is greater than the relevancy of 1.9 to the third, so the classification unit 230 classifies the microblog information into the second predefined classification, "food", which corresponds to the maximum relevancy of 2.1.
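The threshold-then-maximum rule of step S330 can be sketched in Python as follows. The function name `classify` is hypothetical. The description only says "higher than" and "lower than" the first threshold, so treating a relevancy exactly equal to the threshold as not qualifying is an assumption made here.

```python
from typing import Dict, Optional

FIRST_THRESHOLD = 1.8  # example value used in the text


def classify(relevances: Dict[str, float], threshold: float = FIRST_THRESHOLD) -> Optional[str]:
    """Return the predefined classification with the highest relevancy above the threshold.

    Returns None when no relevancy exceeds the threshold (the microblog stays unclassified).
    Assumption: a relevancy equal to the threshold does not qualify.
    """
    above = {name: score for name, score in relevances.items() if score > threshold}
    if not above:
        return None
    return max(above, key=above.get)


print(classify({"photography": 1.5, "food": 2.1, "travel": 1.9}))  # food
```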

In another example, if the relevancies of a piece of microblog information to the first to third predefined classifications are all less than the first threshold of 1.8, the classification unit 230 does not classify it into any predefined classification. At the end of step S330, that is, after every piece of microblog information has been compared against all predefined classifications, the classification unit 230 classifies all microblog information not placed in a predefined classification into one or more newly created classifications.

For example, the classification unit 230 may place all microblog information whose relevancy to every predefined classification is below the first threshold into a classification named "other" or "miscellaneous". This ensures that, when later viewing the classified microblog information, the user can still see microblog information that is only weakly related, or unrelated, to the classifications predefined by the operator.
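A minimal Python sketch of this catch-all grouping, assuming each microblog has already been mapped to a classification name or to None when nothing passed the first threshold. The microblog IDs, the mapping format and the function name `group_with_fallback` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional

OTHER = "other"  # name of the catch-all classification, as in the example


def group_with_fallback(assignments: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Group microblog IDs by classification; unclassified ones (None) go to OTHER."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for microblog_id, classification in assignments.items():
        groups[classification if classification is not None else OTHER].append(microblog_id)
    return dict(groups)
```

Because every microblog lands in some group, nothing disappears from the user's view even when the operator's predefined classifications do not fit it.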

Alternatively, and preferably, the classification unit 230 may group pieces of microblog information whose headwords are closely related (e.g., the headwords "piano", "electronic organ", and "accordion") into a newly created classification, and use the headword closest to the geometric center of the headwords of all those pieces of microblog information (e.g., "piano") as the name of the new classification.
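Choosing the name of a new classification can be sketched in Python, assuming each headword is represented by a numeric vector. The description does not say how headwords are placed in space, so the vector representation, the Euclidean distance and the function name `name_new_classification` are all assumptions.

```python
import math
from typing import Dict, Tuple

Vector = Tuple[float, ...]


def name_new_classification(headword_vectors: Dict[str, Vector]) -> str:
    """Pick the headword closest (Euclidean) to the geometric center of all headword vectors."""
    if not headword_vectors:
        raise ValueError("need at least one headword")
    vectors = list(headword_vectors.values())
    dims = len(vectors[0])
    # Geometric center: component-wise mean of all headword vectors.
    center = tuple(sum(v[i] for v in vectors) / len(vectors) for i in range(dims))
    return min(headword_vectors, key=lambda word: math.dist(headword_vectors[word], center))
```

With hypothetical coordinates such as piano (0, 0), electronic organ (1, 0) and accordion (-1, 0.2), the center lies near (0, 0.07), so "piano" is chosen, matching the text's example.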

In step S340, iterative clustering is performed.

In the engine server 410, after classifying the microblog information into the respective predefined classifications, the classification unit 230 compares the total number of classifications with a second threshold. If the number of classifications is greater than the second threshold, it performs iterative clustering with a spatial clustering method until the number is less than or equal to the second threshold. Once the number of classifications is less than or equal to the second threshold, the following step S350 is performed.

Specifically, after every piece of microblog information has been classified into a predefined and/or newly created classification, the classification unit 230 compares the number of currently existing classifications with the second threshold. The second threshold may, for example, be set by the user to indicate how many classifications may be displayed simultaneously on the display interface of the user's microblog client.

For example, if 8 classifications exist after the first classification operation and the second threshold is 5, the classification unit 230 determines that iterative clustering is required to reduce the number of classifications step by step to 5.

In one example, the classification unit 230 may delete the predefined classification that contains the fewest pieces of microblog information (referred to here as the first predefined classification). For each piece of microblog information in it, the relevancy of its headwords to each of the remaining predefined classifications is recalculated to obtain the relevancy of the microblog information to those classifications. If a relevancy is higher than the first threshold, the microblog information is reclassified into one of the remaining predefined classifications.

For example, suppose the first predefined classification currently contains 1 piece of microblog information, the second contains 2, …, and the eighth contains 8. The classification unit 230 may then delete the first predefined classification, re-read the headwords of the microblog information it contained, and return to steps S320 and S330. To support this, the headwords of all microblog information are preferably not deleted from the engine server 410 at the end of step S330, but are kept in a temporary memory (not shown) until the classification operation is complete.

That is, the relevancy obtaining unit 220 recalculates the relevancy of each headword of that microblog information to the second through eighth predefined classifications, obtaining the relevancy of the microblog information to the remaining 7 predefined classifications. As described above, the classification unit 230 then determines whether any of these relevancies is higher than the first threshold. If so, it selects the maximum among the relevancies above the threshold and classifies the microblog information into the corresponding predefined classification. If none exceeds the first threshold, the classification unit 230 classifies the microblog information into, for example, a classification named "other".

The classification unit 230 then compares the total number of classifications with the second threshold again. Since the current number, 7, is still greater than the second threshold of 5, the classification unit 230 repeats step S340, deleting the classification that contains 2 pieces of microblog information, and so on. In this way, the classification unit 230 uses a spatial clustering method to gather similar microblog information into the same classification until the number of classifications is less than or equal to the preset number.
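The delete-smallest-and-reclassify loop of step S340 can be sketched in Python. The data layout (classification to microblog IDs, and a precomputed relevancy table per microblog) and the function name `reduce_classifications` are hypothetical. The sketch also assumes that a microblog's relevancy to a classification does not change when other classifications are deleted, so the recalculation of step S320 is replaced by a lookup, and that the "other" catch-all is never the one deleted.

```python
from typing import Dict, List

OTHER = "other"


def reduce_classifications(
    groups: Dict[str, List[str]],
    relevances: Dict[str, Dict[str, float]],
    first_threshold: float,
    second_threshold: int,
) -> Dict[str, List[str]]:
    """Delete the smallest predefined classification and reclassify its microblogs
    until at most second_threshold classifications remain.

    groups: classification -> microblog IDs; relevances: microblog ID -> {classification: relevancy}.
    """
    groups = {name: list(ids) for name, ids in groups.items()}  # do not mutate the input
    while len(groups) > second_threshold:
        predefined = [name for name in groups if name != OTHER]
        if not predefined:
            break  # only the catch-all remains; nothing left to delete
        smallest = min(predefined, key=lambda name: len(groups[name]))
        orphans = groups.pop(smallest)
        for microblog_id in orphans:
            # Remaining predefined classifications whose relevancy exceeds the first threshold.
            candidates = {
                name: score
                for name, score in relevances.get(microblog_id, {}).items()
                if name in groups and name != OTHER and score > first_threshold
            }
            target = max(candidates, key=candidates.get) if candidates else OTHER
            groups.setdefault(target, []).append(microblog_id)
    return groups
```

The loop always terminates, because each pass removes one predefined classification even when the orphans fall into a newly created "other" group.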

In another example, the classification unit 230 may instead calculate the distance between every pair of classifications according to a preset criterion and merge the two classifications with the smallest distance into one, repeating this to reduce the 8 classifications obtained directly to the preset number.
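The pairwise-merge alternative can be sketched in Python as a simple agglomerative loop. The description leaves the "preset criterion" open, so representing each classification by a centroid vector plus a microblog count, using Euclidean distance between centroids, taking a size-weighted mean as the merged centroid, and naming merged classes "a+b" are all assumptions of this sketch. `merge_closest` is a hypothetical name.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Cluster = Tuple[List[float], int]  # (centroid, number of microblogs)


def merge_closest(clusters: Dict[str, Cluster], target: int) -> Dict[str, Cluster]:
    """Repeatedly merge the two classes whose centroids are closest until target remain."""
    clusters = dict(clusters)
    while len(clusters) > max(target, 1):
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: math.dist(clusters[pair[0]][0], clusters[pair[1]][0]),
        )
        (ca, na), (cb, nb) = clusters.pop(a), clusters.pop(b)
        # Size-weighted mean keeps the merged centroid at the center of all its microblogs.
        centroid = [(x * na + y * nb) / (na + nb) for x, y in zip(ca, cb)]
        clusters[f"{a}+{b}"] = (centroid, na + nb)
    return clusters
```

Each pass reduces the count by one, so going from 8 classifications to 5 takes three merges.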

In step S350, the digest and attention heat of the classification are determined.

Specifically, in the engine server 410, after the classification unit 230 has sorted all the microblog information into the preset number of classifications, it may preferably select, for each predefined classification, the microblog information with the greatest relevancy to that classification, and use a picture and/or the headwords from the selected microblog information as a thumbnail and/or digest representing the classification. Besides the classification name, the user can then understand the subject of the microblog information in the classification more clearly from its thumbnail and/or digest.
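Selecting the representative microblog can be sketched in Python. The `Microblog` record, its field names and the choice to join headwords with spaces to form the digest are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Microblog:
    text: str
    headwords: List[str]
    relevance: float                  # relevancy to the classification being summarized
    picture_url: Optional[str] = None  # hypothetical field; None if the post has no picture


def classification_digest(members: List[Microblog]) -> dict:
    """Use the most relevant microblog's picture as thumbnail and its headwords as digest."""
    best = max(members, key=lambda m: m.relevance)
    return {"thumbnail": best.picture_url, "digest": " ".join(best.headwords)}
```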

In addition, the classification unit 230 may preferably determine the attention heat of the microblog publisher for a predefined classification according to the number and richness of the microblog information in that classification.

For example, the classification unit 230 may determine the attention heat of a classification by multiplying the number of sentences that have a complete syntax tree by the number of words they contain, and summing the products. Suppose a predefined classification contains 2 pieces of microblog information: the first contains 1 sentence with a complete syntax tree of 30 words, and the second contains 2 sentences with complete syntax trees of 10 and 20 words, respectively. The classification unit 230 then calculates the attention heat of the classification as 1 × 30 + 1 × 10 + 1 × 20 = 60.
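The attention-heat arithmetic can be checked with a small Python sketch. It assumes syntax parsing has already been done and that each microblog is given as the list of word counts of its sentences that have a complete syntax tree; `attention_heat` is a hypothetical name.

```python
from typing import List


def attention_heat(microblogs: List[List[int]]) -> int:
    """Sum of 1 x word count over every complete-syntax-tree sentence in the classification.

    Each inner list holds the word counts of one microblog's complete-syntax-tree sentences.
    """
    return sum(1 * words for sentence_word_counts in microblogs for words in sentence_word_counts)


print(attention_heat([[30], [10, 20]]))  # 60, as in 1 × 30 + 1 × 10 + 1 × 20
```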

From the attention heat of a classification, the user can thus gauge how interested the microblog publisher is in the classification's topic, and so better understand the publisher's interests and hobbies.

In step S360, the classified microblog information is displayed.

Specifically, in the classification system 400, after the engine server 410 completes the classification of the microblog information, it can push each piece of microblog information, together with the classification to which it belongs, to the microblog client 450.

For example, the display processing unit 240 in the microblog client 450 receives the classified microblog information from the classification unit 230 in the engine server 410, automatically arranges it according to layout information preset by the user or set by system default, and displays it to the user by classification and time period.

Next, referring to fig. 7 and 8, the microblog information displaying step S360 according to the embodiment of the invention is described.

Fig. 7 illustrates a preview display interface displayed on the microblog client 450 according to an embodiment of the present invention, and fig. 8 illustrates a classified display interface displayed on the microblog client 450 according to an embodiment of the present invention.

As illustrated in fig. 7, a user, Edwin, may wish to view the microblog information of people he follows, such as Yao Chen, Tianqi, and Xiao S, over different time periods. The user therefore uses the classification system 400 according to the embodiment of the present invention in advance to capture and classify the microblog information of each of these microblog publishers through steps S300 to S350.

The user then switches from the traditional microblog browsing interface to the classified microblog interface. The preview display interface of the classified microblog interface lists, one by one, the accounts of the people the user follows, and accounts updated since the user last opened the interface may be highlighted. For example, the display processing unit 240 in the microblog client 450 (e.g., a mobile phone) displays the preview display interface on the display screen of the microblog client 450, as illustrated in fig. 7. It includes the user name Edwin and the user's avatar, as well as the user names and avatars of the microblog publishers the user follows, such as Yao Chen, Tianqi, and Xiao S.

Thereafter, the user selects the account of a microblog publisher whose microblogs the user wants to browse by classification, and the display processing unit 240 in the microblog client 450 displays the classified display interface on the display screen. As illustrated in fig. 8, the classification system 400 has automatically classified all microblog information in that publisher's account by relevance, labeled each classification with its popularity, and arranged the results by year. In fig. 8, the account's 2010 microblog information is divided into 5 classifications, "photograph", "animal", "life", "travel", and "food", of which "photograph" is the most popular, with a popularity of 345.
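The year-by-year, popularity-ordered arrangement of fig. 8 can be sketched in Python. The input format of (year, classification, popularity) tuples and the function name `layout_by_year` are assumptions. Only the popularity of 345 for "photograph" comes from the text; any other numbers used with this sketch are placeholders.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical input: (year, classification, popularity of that classification in that year).
Entry = Tuple[int, str, int]


def layout_by_year(entries: List[Entry]) -> List[Tuple[int, List[Tuple[str, int]]]]:
    """Return years newest first, each with its classifications ordered by descending popularity."""
    by_year: Dict[int, List[Tuple[str, int]]] = defaultdict(list)
    for year, classification, popularity in entries:
        by_year[year].append((classification, popularity))
    return [
        (year, sorted(by_year[year], key=lambda item: item[1], reverse=True))
        for year in sorted(by_year, reverse=True)
    ]
```

Showing the newest year first matches fig. 8, where 2010 appears above 2009.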

The user can then enter a specific classification (e.g., "photograph") for detailed reading. Alternatively, the user can select the arrow below "2010" in the upper right corner, or drag it to the left, to display a finer, month-by-month level of classification within 2010; likewise, selecting the arrow below "2009" on the middle right, or dragging it to the left, displays a finer month-by-month level of classification within 2009.

In this way, by analyzing the relevance of each microblog posted by a specified publisher in a specified time period, highly relevant microblogs are automatically classified into the predefined classifications (or automatically generated classifications) and sorted by time or relevance; representative pictures and microblog text are then extracted, and the classified information is automatically laid out and displayed. Faced with a large amount of microblog information, the user can preview the central topics as a list and read further into topics of interest, which provides a new experience for reading massive numbers of microblogs.

With the method and system, the user can thus directly select the people and time period of interest and go straight to a type of microblog of interest; see a representative picture and digest for each classification on the initial preview page; conveniently jump back to review earlier microblog information; and, because the popularity of each classification is calculated, quickly find the most popular classification topic.

In summary, the invention changes the current microblog browsing mode, replacing item-by-item reading with a new experience in which the user quickly browses topics and then reads only the content of interest. The invention can readily be applied to various consumer electronic products in software or hardware form to improve the user's microblog browsing experience.

Various embodiments of the present invention are described in detail above. However, those skilled in the art will appreciate that various modifications, combinations, or sub-combinations of the embodiments may be made without departing from the spirit and principle of the invention, and such modifications are intended to be within the scope of the invention.

Claims

1. A system for classified display of microblog information, comprising:

an engine server and a microblog client,

wherein the engine server is located at the network side and connected to a microblog server that provides a microblog service, downloads from the microblog server the microblog information published by a microblog publisher within a time range, and automatically classifies the microblog information according to predefined classifications;

the microblog client is located at the user side and connected to the engine server,

the microblog client comprises: an input information receiving unit for receiving the microblog information automatically classified on the engine server; and a display processing unit for displaying it to a user,

the engine server comprises a microblog information acquisition unit which, while downloading the microblog information, acquires not only all the microblog information published by the microblog publisher within the time range but also the replies made by other microblog publishers to that microblog information.

2. The system of claim 1, wherein, if the user is found not to be following a microblog publisher, the display processing unit prompts the user to follow that microblog publisher.

3. The system of claim 2, wherein the engine server continues the download after the user has followed the microblog publisher.

4. The system of claim 1,

the engine server further comprises a relevancy obtaining unit, which uses a headword relevancy probability library to obtain the relevancy of each headword to a predefined classification: for the microblog information, it establishes, for each of the different predefined classifications, a space vector of the relevancy of each headword to that predefined classification.

5. The system of claim 4,

the relevancy obtaining unit obtains the headword relevancy probability library by training on preset relevancies of each headword to each predefined classification.

6. The system of claim 1, wherein the engine server further comprises a classification unit, which classifies microblog information by calculating its relevance to a predefined classification and determines the attention heat of a microblog publisher for the predefined classification according to the number and richness of the microblog information in that classification.

7. A method for classified display of microblog information, comprising the following steps:

downloading microblog information published by a microblog publisher within a time range;

automatically classifying the microblog information;

providing the classified microblog information to a microblog client,

wherein, in the process of downloading the microblog information, replies made by other microblog publishers to the microblog information published by the microblog publisher are acquired in addition to all the microblog information published by the microblog publisher within the time range.