CN103593399A - Method and equipment for collecting microblog content according to microblog user library - Google Patents

Publication number
CN103593399A
Authority
CN
China
Prior art keywords
microblog users
microblog
users
previously selected
microblogging
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310476149.2A
Other languages
Chinese (zh)
Inventor
冯青松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Application filed by Beijing Qihoo Technology Co Ltd and Qizhi Software Beijing Co Ltd
Priority to CN201310476149.2A
Publication of CN103593399A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Abstract

The invention discloses a method and equipment for collecting microblog content according to a microblog user library. The method includes judging whether pre-selected microblog users in the microblog user library meet pre-defined conditions or not; if not, marking selected states of the pre-selected microblog users in the microblog user library as non-selected, and if yes, keeping the selected states of the pre-selected microblog users in the microblog user library unchanged; collecting the microblog content correspondingly released or forwarded by microblog users marked with the selected states in the microblog user library. By the method and the equipment, accuracy in collecting the microblog content of microblog users in the microblog user library can be effectively improved.

Description

Method and equipment for collecting microblog content according to a microblog user library
Technical field
The invention belongs to the field of computer technology and relates in particular to a method and equipment for collecting microblog content according to a microblog user library.
Background
With the development of the times, the microblog has emerged as a social service platform and has gradually penetrated every aspect of society. With its short, efficient format, the microblog quickly attracted a large number of users and set off a wave of microblog discussion. Compared with other modes of information dissemination, the microblog has distinctive features. First, it can publish and spread information more quickly; because of the limit on the number of characters, the content published in a microblog post is short and concise. Second, a microblog post can be read, replied to and forwarded by any individual or group, which achieves one-to-many and many-to-many dissemination. Third, the dissemination effect of a microblog is more pronounced: a short, concise message does not take the audience much time to understand, and the simplicity of the content makes it easier to accept. Fourth, the microblog supports real-time, interactive dissemination.
Unlike conventional Internet information crawling, microblog content crawling requires higher timeliness. The crawling approach commonly used at present is to register a batch of zombie users, have these zombie users follow a batch of relatively high-quality microblog users, and then continuously crawl the microblog content of that batch of microblog users. At the same time, the zombie users themselves may also publish or forward some microblog content, much of which is useless content or advertising. As a result, a great deal of content circulates in microblogs, and finding or distinguishing high-quality microblog content within it is very difficult.
Summary of the invention
In view of the above problems, the present invention is proposed to provide a method and equipment for collecting microblog content according to a microblog user library that overcome the above problems or at least partially solve them.
According to one aspect of the present invention, a method for collecting microblog content according to a microblog user library is provided, comprising: judging whether a pre-selected microblog user in the microblog user library meets a predefined condition; if the pre-selected microblog user does not meet the predefined condition, marking the selected state of the pre-selected microblog user in the microblog user library as non-selected; if the pre-selected microblog user meets the predefined condition, keeping the selected state of the pre-selected microblog user in the microblog user library unchanged; and collecting the microblog content published or forwarded by the microblog users whose selected state in the microblog user library is marked as selected.
Optionally, the step of judging whether the pre-selected microblog user in the microblog user library meets the predefined condition comprises: judging whether the pre-selected microblog user in the microblog user library is a maliciously registered user; if the pre-selected microblog user is a maliciously registered user, the judgment result is that the pre-selected microblog user does not meet the predefined condition; if the pre-selected microblog user is not a maliciously registered user, the judgment result is that the pre-selected microblog user meets the predefined condition; and/or judging whether the activity level of the pre-selected microblog user in the microblog user library is lower than a predefined activity threshold; if the activity level is lower than the predefined activity threshold, the judgment result is that the pre-selected microblog user does not meet the predefined condition; if the activity level is not lower than the predefined activity threshold, the judgment result is that the pre-selected microblog user meets the predefined condition, wherein the activity level comprises any one or a combination of: the frequency with which the microblog user publishes or forwards microblog posts, the continuous login time of the microblog user, and the online duration of the microblog user on the current day.
Optionally, the step of judging whether a microblog user in the microblog user library is a maliciously registered user comprises: judging whether the user score of the microblog user is lower than a predefined malicious-registration score; if the user score is lower than the predefined malicious-registration score, the judgment result is that the microblog user is a maliciously registered user; if the user score is not lower than the predefined malicious-registration score, the judgment result is that the microblog user is not a maliciously registered user.
Optionally, the user score is calculated based on the number of users the microblog user follows, the number of followers of the microblog user, and the number of microblog posts the microblog user has published.
Optionally, the method further comprises: collecting the microblog users who publish and/or forward microblog content related to a hot keyword or hot keyword group; and updating the collected microblog users into the microblog user library and marking the selected state of the collected microblog users as selected.
According to another aspect of the present invention, equipment for collecting microblog content according to a microblog user library is also provided, comprising: a judgment module for judging whether a pre-selected microblog user in the microblog user library meets a predefined condition; a selected state update module for marking the selected state of the pre-selected microblog user in the microblog user library as non-selected when the judgment module judges that the pre-selected microblog user does not meet the predefined condition, and for keeping the selected state of the pre-selected microblog user in the microblog user library unchanged when the judgment module judges that the pre-selected microblog user meets the predefined condition; and a first collection module for collecting the microblog content published or forwarded by the microblog users whose selected state in the microblog user library is marked as selected.
Optionally, the judgment module comprises: a maliciously registered user judgment unit for judging whether the pre-selected microblog user in the microblog user library is a maliciously registered user; if the pre-selected microblog user is a maliciously registered user, the judgment result is that the pre-selected microblog user does not meet the predefined condition; if the pre-selected microblog user is not a maliciously registered user, the judgment result is that the pre-selected microblog user meets the predefined condition; and/or an activity judgment unit for judging whether the activity level of the pre-selected microblog user in the microblog user library is lower than a predefined activity threshold; if the activity level is lower than the predefined activity threshold, the judgment result is that the pre-selected microblog user does not meet the predefined condition; if the activity level is not lower than the predefined activity threshold, the judgment result is that the pre-selected microblog user meets the predefined condition, wherein the activity level comprises any one or a combination of: the frequency with which the microblog user publishes or forwards microblog posts, the continuous login time of the microblog user, and the online duration of the microblog user on the current day.
Optionally, the maliciously registered user judgment unit is further used for judging whether the user score of the microblog user is lower than a predefined malicious-registration score; if the user score is lower than the predefined malicious-registration score, the judgment result is that the microblog user is a maliciously registered user; if the user score is not lower than the predefined malicious-registration score, the judgment result is that the microblog user is not a maliciously registered user.
Optionally, the user score is calculated based on the number of users the microblog user follows, the number of followers of the microblog user, and the number of microblog posts the microblog user has published.
Optionally, the equipment further comprises: a collection module for collecting the microblog users who publish and/or forward microblog content related to a hot keyword or hot keyword group; and a selected state update module for updating the collected microblog users into the microblog user library and marking the selected state of the collected microblog users as selected.
As can be seen from the above technical solutions, embodiments of the present invention have the following beneficial effects: the microblog user library is maintained by judging whether the microblog users in it meet the predefined condition, which on the one hand reduces the processing time needed to collect the microblog content of the microblog users in the library, and on the other hand improves the accuracy of collecting that content.
The above description is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features and advantages of the invention become more apparent, specific embodiments of the invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art from reading the detailed description of the preferred embodiments below. The drawings serve only to illustrate the preferred embodiments and are not to be regarded as limiting the invention. Throughout the drawings, the same reference symbols denote the same parts. In the drawings:
Fig. 1 shows a flowchart of the method 100 for collecting microblog content according to a microblog user library in an embodiment of the present invention;
Fig. 2 shows a flowchart of steps S111 to S117 of the method 100 for collecting microblog content according to a microblog user library in an embodiment of the present invention; and
Fig. 3 shows a structural block diagram of the equipment 300 for collecting microblog content according to a microblog user library in an embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure can be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and so that its scope can be conveyed completely to those skilled in the art.
A flowchart of the method 100 for collecting microblog content according to a microblog user library, according to an embodiment of the present invention and suitable for solving the above technical problems, is described below with reference to Fig. 1. As shown in Fig. 1, the method 100 of the embodiment of the present invention comprises:
In step S101, the microblog content and microblog parameters of the pre-selected microblog users in the microblog user library are collected.
In an embodiment of the present invention, the microblog user library records the relevant information of a plurality of microblog users, and the selected state of each microblog user is either selected or non-selected. "Selected" indicates that the microblog content and microblog parameters of the microblog user are to be collected; "non-selected" indicates that the microblog content and microblog parameters of the microblog user do not need to be collected. It will be understood that the selected state of a microblog user can be adjusted in embodiments of the present invention: it can be changed from selected to non-selected, or from non-selected to selected.
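As a non-limiting illustration, the microblog user library and its adjustable selected state could be sketched as follows. The class names `UserRecord` and `UserLibrary`, the method names, and the user identifiers are assumptions introduced only for this sketch; the description does not prescribe any particular data structure.

```python
from dataclasses import dataclass, field


@dataclass
class UserRecord:
    """One entry of the user library (hypothetical layout)."""
    user_id: str
    selected: bool = True  # True = "selected", False = "non-selected"
    info: dict = field(default_factory=dict)  # other relevant user information


class UserLibrary:
    """Minimal in-memory stand-in for the microblog user library."""

    def __init__(self):
        self._records = {}

    def add(self, user_id, selected=True, **info):
        self._records[user_id] = UserRecord(user_id, selected, dict(info))

    def set_selected(self, user_id, selected):
        # The selected state can be switched in either direction.
        self._records[user_id].selected = selected

    def selected_ids(self):
        # Only users marked "selected" are to be collected.
        return [uid for uid, rec in self._records.items() if rec.selected]


library = UserLibrary()
library.add("u1")
library.add("u2")
library.set_selected("u2", False)  # "u2" is no longer collected
```

Calling `library.set_selected("u2", True)` later would restore collection for that user.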
Optionally, in an embodiment of the present invention, the microblog content and microblog parameters of the pre-selected microblog users in the microblog user library can be collected through the application programming interface (API) of a microblog website (for example Sina Weibo or Tencent Weibo). It will be understood that embodiments of the present invention do not limit the specific way in which microblog content and microblog parameters are collected.
Optionally, in an embodiment of the present invention, the microblog parameters comprise any one or a combination of: the attribute information of the microblog user, the total number of times a microblog post has been forwarded, the total number of comments on a microblog post, the number of times a microblog post has been forwarded by verified microblog users, and the number of comments on a microblog post by verified microblog users. The attribute information of a microblog user includes the tag information of the microblog user; for example, the tag information can include the hobbies, occupation and personality of the microblog user.
Subsequently, in step S103, hot keywords or hot keyword groups related to microblog hot topics are extracted according to the collected microblog content and microblog parameters.
Optionally, in step S103, the microblog content is first classified into predefined microblog categories according to the collected microblog content and microblog parameters; hot-topic processing is then performed on the microblog content under each microblog category to obtain the microblog content related to microblog hot topics under that category; finally, word segmentation is performed on the hot-topic-related microblog content under each microblog category, and the hot keywords or hot keyword groups related to microblog hot topics under that category are extracted.
Optionally, in an embodiment of the present invention, each predefined microblog category corresponds to a plurality of keywords. These keywords are matched against the collected microblog content and/or microblog parameters; if a keyword matches, the microblog content is classified into the microblog category corresponding to that keyword. The predefined microblog categories include several types such as real estate, entertainment, economy, politics and the Internet.
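The keyword matching described above can be illustrated with a minimal sketch. The category names follow the description, but the keyword lists in `CATEGORY_KEYWORDS` and the function name `classify` are assumptions made up for the example, and plain substring matching is only one possible way of matching.

```python
# Hypothetical keyword lists; the description only names the categories.
CATEGORY_KEYWORDS = {
    "real estate": ["mortgage", "house price"],
    "entertainment": ["film", "concert"],
    "internet": ["browser", "search engine"],
}


def classify(content, category_keywords=CATEGORY_KEYWORDS):
    """Return every category for which at least one keyword occurs in the content."""
    text = content.lower()
    return [
        category
        for category, keywords in category_keywords.items()
        if any(keyword in text for keyword in keywords)
    ]
```

A post may match more than one category, in which case the sketch returns all of them; a post that matches no keyword returns an empty list.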
Optionally, in an embodiment of the present invention, whether a piece of microblog content is related to a microblog hot topic can be judged by whether the microblog content under each microblog category carries a hot-topic marker; if the microblog content contains a hot-topic marker, the microblog content is judged to be related to a microblog hot topic. For example, the hot-topic marker can be "#", and a microblog hot topic can take the form "#holiday aftereffects#", meaning that the microblog hot topic is "holiday aftereffects". It will be understood that embodiments of the present invention do not limit the specific form of the hot-topic marker.
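By way of example only, detecting the "#" marker could be done with a regular expression. The function names `extract_topics` and `is_hot_topic_related` are assumptions for this sketch, which handles only the paired "#topic#" form given in the example above.

```python
import re

# A topic is any non-empty run of characters enclosed by a pair of "#" markers.
TOPIC_PATTERN = re.compile(r"#([^#]+)#")


def extract_topics(content):
    """Return the topics marked as #topic# in a post, stripped of surrounding spaces."""
    return [topic.strip() for topic in TOPIC_PATTERN.findall(content)]


def is_hot_topic_related(content):
    """A post is treated as hot-topic related if it carries at least one marker pair."""
    return bool(extract_topics(content))
```

A single unpaired "#" is not treated as a topic in this sketch.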
Optionally, in an embodiment of the present invention, after the microblog content related to microblog hot topics under each microblog category has been obtained, an existing word segmentation technique can be applied to the microblog content, and the hot keywords or hot keyword groups related to microblog hot topics under each microblog category can then be extracted based on how frequently the words occur.
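The frequency-based extraction can be sketched as follows. Whitespace splitting is used here purely as a stand-in for the "existing word segmentation technique", which the description does not name; Chinese text would need a real segmenter. The function name `hot_keywords` and its parameters are likewise assumptions.

```python
from collections import Counter


def hot_keywords(posts, top_n=3, stop_words=frozenset()):
    """Return the top_n most frequent tokens across the given posts."""
    counts = Counter()
    for post in posts:
        # Stand-in segmentation: lower-case and split on whitespace.
        for token in post.lower().split():
            if token not in stop_words:
                counts[token] += 1
    return [word for word, _ in counts.most_common(top_n)]
```

For instance, `hot_keywords(["holiday traffic holiday", "traffic jam holiday"], top_n=2)` ranks "holiday" (three occurrences) ahead of "traffic" (two occurrences).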
Subsequently, in step S105, the microblog content related to the extracted hot keywords or hot keyword groups is collected at a predefined collection frequency.
Optionally, in an embodiment of the present invention, the collection frequency can be reduced for collections in which every run reaches a saturated state. For hot keywords or hot keyword groups that yield relatively few collected items, different intervals are defined according to the collected quantity and the time interval, and the collection frequency is multiplied by the weight of the corresponding interval. It will be understood that embodiments of the present invention do not limit the specific value of the collection frequency.
Optionally, in an embodiment of the present invention, Request-rate can be used to specify the collection frequency. The syntax is Request-rate: 1/5 0600-0845, which specifies how many pages the same web crawler collects per number of seconds and the time period during which collection takes place; for example, 1/5 0600-0845 means one page every 5 seconds between 06:00 and 08:45. It will be understood that embodiments of the present invention do not limit the specific value of the collection frequency.
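As an illustration of how such a value might be interpreted, the sketch below parses a string of the form "pages/seconds HHMM-HHMM". The function name `parse_request_rate` and the returned tuple layout are assumptions for this example; the description does not specify how the value is processed.

```python
import re
from datetime import time

_RATE_PATTERN = re.compile(r"(\d+)/(\d+)(?:\s+(\d{2})(\d{2})-(\d{2})(\d{2}))?")


def parse_request_rate(value):
    """Parse 'pages/seconds [HHMM-HHMM]' into (pages, seconds, window).

    window is a (start, end) pair of datetime.time objects, or None when
    no time period is given.
    """
    match = _RATE_PATTERN.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unrecognised Request-rate value: {value!r}")
    pages, seconds = int(match.group(1)), int(match.group(2))
    if pages == 0 or seconds == 0:
        raise ValueError("pages and seconds must both be positive")
    window = None
    if match.group(3) is not None:
        start = time(int(match.group(3)), int(match.group(4)))
        end = time(int(match.group(5)), int(match.group(6)))
        window = (start, end)
    return pages, seconds, window
```

A collector could then wait `seconds / pages` seconds between requests and run only while the current time lies inside the window.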
In the prior art, some microblog content may relate to a microblog hot topic but carry no hot-topic marker, with the result that this content may not be collected. In embodiments of the present invention, by contrast, the microblog content related to the previously extracted hot keywords or hot keyword groups is collected from the microblog platform, which makes the collected microblog content more comprehensive.
Optionally, in an embodiment of the present invention, after step S105 the method 100 further comprises step S107 and step S109. In step S107, the microblog users who publish and/or forward microblog content related to the hot keywords or hot keyword groups are collected.
Through step S105, the microblog users behind the microblog content related to the hot keywords or hot keyword groups can be collected. Some of these microblog users, however, may have no record in the microblog user library (that is, they are new microblog users). Considering that such microblog users are more likely to publish or forward microblog content related to the hot keywords or hot keyword groups, it is necessary to record their relevant information in the microblog user library.
Subsequently, in step S109, the collected microblog users are updated into the microblog user library, and the selected state of the collected microblog users is marked as selected.
Optionally, in an embodiment of the present invention, after the microblog users have been updated in step S109, the microblog user library can be checked for duplicate microblog users; if there are duplicates, the relevant information of the duplicated microblog users is deleted.
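The update and duplicate check of step S109 might look like the following minimal sketch. Representing each user as a dictionary with a `user_id` key, and keeping the first record of each duplicated user, are assumptions of this example rather than requirements of the description.

```python
def deduplicate_users(records):
    """Keep the first record for each user_id and drop later duplicates."""
    seen = set()
    unique = []
    for record in records:
        user_id = record["user_id"]
        if user_id in seen:
            continue  # duplicate entry: its information is discarded
        seen.add(user_id)
        unique.append(record)
    return unique


def add_collected_users(library, collected_ids):
    """Append newly collected users as 'selected', then remove duplicates."""
    merged = library + [{"user_id": uid, "selected": True} for uid in collected_ids]
    return deduplicate_users(merged)
```

Because existing records come first in the merged list, a user already in the library keeps its existing record in this sketch.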
To improve the efficiency of collecting microblog content, maliciously registered user identification and/or activity identification can be performed on the microblog users in the microblog user library. If a microblog user is a maliciously registered user, or the activity level of the microblog user is low, the selected state of the microblog user in the microblog user library can be changed to non-selected.
It should be noted that the method shown in Fig. 1 is not limited to being performed in the order of the steps shown; the order of the steps can be adjusted as required. In addition, the steps are not limited to the above division: they can be further split into more steps or merged into fewer steps.
As shown in Fig. 2, the method 100 further comprises step S111, step S113, step S115 and step S117. In step S111, it is judged whether a pre-selected microblog user in the microblog user library meets a predefined condition.
Optionally, in an embodiment of the present invention, in step S111, whether a pre-selected microblog user in the microblog user library meets the predefined condition can be judged by either of the following two modes or by a combination of both:
Mode one: judging whether the pre-selected microblog user in the microblog user library is a maliciously registered user. If the pre-selected microblog user is a maliciously registered user, the judgment result is that the pre-selected microblog user does not meet the predefined condition; if the pre-selected microblog user is not a maliciously registered user, the judgment result is that the pre-selected microblog user meets the predefined condition.
Generally, a maliciously registered user refers to a fake registered user, a zombie follower, a machine-registered user or the like.
Mode two: judging whether the activity level of the pre-selected microblog user in the microblog user library is lower than a predefined activity threshold. If the activity level is lower than the predefined activity threshold, the judgment result is that the pre-selected microblog user does not meet the predefined condition; if the activity level is not lower than the predefined activity threshold, the judgment result is that the pre-selected microblog user meets the predefined condition. The activity level comprises any one or a combination of: the frequency with which the microblog user publishes or forwards microblog posts, the continuous login time of the microblog user, and the online duration of the microblog user on the current day.
For example: 5 or more microblog posts published or forwarded per day, activity weight = 0.2; at least 3 but fewer than 5 microblog posts published or forwarded per day, activity weight = 0.1; fewer than 3 microblog posts published or forwarded per day, activity weight = 0.
Consecutive-login reward rule: 3 or more consecutive login days, activity weight = 0.5; 5 or more consecutive login days, activity weight = 1; 10 or more consecutive login days, activity weight = 2.5; 20 or more consecutive login days, activity weight = 5; 30 or more consecutive login days, activity weight = 7.5.
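The example weights above can be expressed as a short sketch. The tier values are taken from the example; combining the two weights by simple addition, the threshold value, and all function names are assumptions made for illustration, since the description leaves the combination open.

```python
def posting_weight(posts_per_day):
    """Activity weight for the number of posts published or forwarded per day."""
    if posts_per_day >= 5:
        return 0.2
    if posts_per_day >= 3:
        return 0.1
    return 0.0


# (minimum consecutive login days, weight), highest tier first.
LOGIN_TIERS = [(30, 7.5), (20, 5.0), (10, 2.5), (5, 1.0), (3, 0.5)]


def login_weight(consecutive_days):
    """Activity weight under the consecutive-login reward rule."""
    for min_days, weight in LOGIN_TIERS:
        if consecutive_days >= min_days:
            return weight
    return 0.0


def meets_activity_condition(posts_per_day, consecutive_days, threshold):
    """Assumed combination: the sum of both weights must not be below the threshold."""
    activity = posting_weight(posts_per_day) + login_weight(consecutive_days)
    return activity >= threshold
```

Under these assumptions, a user who posts five times a day and has logged in for ten consecutive days passes a hypothetical threshold of 1.0, while a user with one post and no login streak fails a threshold of 0.1.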
In mode one above, whether a microblog user in the microblog user library is a maliciously registered user can be judged in the following specific way:
Judging whether the user score of the microblog user is lower than a predefined malicious-registration score; if the user score is lower than the predefined malicious-registration score, the judgment result is that the microblog user is a maliciously registered user; if the user score is not lower than the predefined malicious-registration score, the judgment result is that the microblog user is not a maliciously registered user.
Optionally, in an embodiment of the present invention, the user score can be calculated by an existing calculation method from parameters such as the avatar information of the microblog user, the number of followers, the quality of the microblog content, and the frequency of publishing or forwarding microblog posts. For example: the user score has a maximum of 100 points, with a maximum of 5 points for the avatar of the microblog user, 10 points for the number of followers, 10 points for the quality of the microblog content, ..., and the user score of the selected microblog user is then calculated based on these scoring criteria. It will be understood that embodiments of the present invention do not limit the way in which maliciously registered users are identified.
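A minimal sketch of such a score check is given below. Only the three component maxima stated in the example (avatar 5, followers 10, content quality 10) are included, because the remaining components are elided in the description; how points are awarded within each component, the names used, and the threshold passed by the caller are all assumptions.

```python
# Component maxima stated in the example; the other components are not
# listed in the description and are therefore left out of this sketch.
MAX_POINTS = {"avatar": 5, "followers": 10, "content_quality": 10}


def user_score(component_points):
    """Sum the awarded points, clamping each component to the range 0..maximum."""
    total = 0
    for name, points in component_points.items():
        if name not in MAX_POINTS:
            raise KeyError(f"unknown score component: {name}")
        total += min(max(points, 0), MAX_POINTS[name])
    return total


def is_maliciously_registered(score, malicious_registration_score):
    """A user whose score is lower than the predefined value is treated as malicious."""
    return score < malicious_registration_score
```

A score exactly equal to the predefined malicious-registration score is not treated as malicious, matching the "not lower than" wording of the judgment.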
If the pre-selected microblog user does not meet the predefined condition, the method proceeds to step S113, in which the selected state of the pre-selected microblog user in the microblog user library is marked as non-selected. That is, collection of the microblog content published or forwarded by microblog users that are maliciously registered or have a low activity level is cancelled.
If the pre-selected microblog user meets the predefined condition, the method proceeds to step S115, in which the selected state of the pre-selected microblog user in the microblog user library is kept unchanged. That is, if the pre-selected microblog user is not a maliciously registered user or has a high activity level, the microblog content published or forwarded by this microblog user still needs to be collected the next time microblog content is collected.
In step S117, the microblog content published or forwarded by the microblog users whose selected state in the microblog user library is marked as selected is collected. For example, the microblog content published or forwarded by users that are not maliciously registered or that have a high activity level is collected. Collection is thus directed at valuable microblog content and a large amount of useless and junk content is rejected, so that microblog hot topics can be discovered as early as possible.
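Steps S111 to S117 can be summarised in the following non-limiting sketch. Users are represented as dictionaries, and `meets_condition` and `fetch_posts` are hypothetical callables standing in for the predefined condition and for whatever collection mechanism (for example a microblog website API) is actually used.

```python
def refresh_selected_states(library, meets_condition):
    """S111-S115: re-evaluate each pre-selected user against the predefined condition."""
    for user in library:
        if user["selected"] and not meets_condition(user):
            user["selected"] = False  # S113: mark as non-selected
        # S115: otherwise the selected state is left unchanged


def collect_selected(library, fetch_posts):
    """S117: collect content only for users still marked as selected."""
    return {
        user["user_id"]: fetch_posts(user["user_id"])
        for user in library
        if user["selected"]
    }
```

Users that were already non-selected are not re-evaluated in this sketch, since step S111 applies to pre-selected users only.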
In an embodiment of the present invention, steps S111 to S117 can be performed at the same time as any of steps S101 to S109 in Fig. 1, or after or before any of those steps. Optionally, steps S111 to S117 can be performed before step S101 in Fig. 1, that is, before the microblog content and microblog parameters of the pre-selected microblog users in the microblog user library are collected: it is judged whether a pre-selected microblog user in the microblog user library is a maliciously registered user and/or whether its activity level is low, and if so, the selection of this microblog user is cancelled. This reduces the number of microblog users to be collected and improves the accuracy of collecting microblog content.
It should be noted that the method shown in Fig. 2 is not limited to being performed in the order of the steps shown; the order of the steps can be adjusted as required. In addition, the steps are not limited to the above division: they can be further split into more steps or merged into fewer steps.
Equipment 300 for collecting microblog content according to a microblog user library, according to an embodiment of the present invention and suitable for solving the above technical problems, is described below with reference to Fig. 3.
As shown in Fig. 3, the equipment 300 for collecting microblog content according to a microblog user library according to an embodiment of the present invention may mainly comprise: a first collection module 301, an extraction module 303 and a second collection module 305. It should be understood that the connections between the modules shown in Fig. 3 are only an example; those skilled in the art may adopt other connections, provided that the modules can still realise the functions of the present invention under such connections.
In this specification, the functions of the modules can be realised by dedicated hardware or by hardware capable of executing processing in combination with suitable software. Such hardware or dedicated hardware can include application-specific integrated circuits (ASICs), various other circuits, various processors and so on. When realised by a processor, the function can be provided by a single dedicated processor, a single shared processor, or a plurality of independent processors (some of which may be shared). In addition, the term processor should not be understood to refer exclusively to hardware capable of executing software; it may implicitly include, without limitation, digital signal processor (DSP) hardware, read-only memory (ROM) for storing software, random-access memory (RAM) and non-volatile storage.
In an embodiment of the present invention, the first acquisition module 301 is configured to collect the microblog content and microblog parameters of previously selected microblog users.
In an embodiment of the present invention, the extraction module 303 is configured to extract, from the collected microblog content and microblog parameters, hot keywords or hot key phrases related to hot microblog topics.
In an embodiment of the present invention, the second acquisition module 305 is configured to collect, at a predefined collection frequency, microblog content related to the extracted hot keywords or hot key phrases.
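By way of non-limiting illustration, collection at a predefined collection frequency might be sketched as a simple polling loop. The function name `collect_by_keywords`, the `search` callback (standing in for whatever query interface the microblog website offers), and the `rounds` parameter are assumptions introduced for this sketch and are not part of the disclosure.

```python
import time
from typing import Callable, Dict, Iterable, List


def collect_by_keywords(
    search: Callable[[str], List[str]],
    keywords: Iterable[str],
    interval_seconds: float,
    rounds: int,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, List[str]]:
    """Query `search` for every hot keyword once per round, `rounds` times."""
    keywords = list(keywords)
    collected: Dict[str, List[str]] = {kw: [] for kw in keywords}
    for round_index in range(rounds):
        for kw in keywords:
            for post in search(kw):
                if post not in collected[kw]:  # skip posts seen in earlier rounds
                    collected[kw].append(post)
        if round_index < rounds - 1:
            sleep(interval_seconds)  # wait until the next scheduled collection
    return collected
```

The `sleep` argument is injectable so that the interval can be replaced by a scheduler or by a stub when the loop is exercised without waiting.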
Optionally, in an embodiment of the present invention, the equipment 300 further comprises:
A third acquisition module 307, configured to collect the microblog users who post and/or forward microblog content related to the hot keywords or hot key phrases;
A microblog user update module 313, configured to update the collected microblog users into the microblog user library and to mark the selected state of the collected microblog users as selected.
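By way of non-limiting illustration, the two optional modules might be sketched as follows. The dictionary-based library layout, the `user_id` and `text` field names, and the substring match used to decide that a post is related to a hot keyword are all assumptions made for this sketch.

```python
from typing import Dict, Iterable, Set


def users_posting_about(posts: Iterable[dict], hot_keywords: Iterable[str]) -> Set[str]:
    """Return the authors of posts whose text mentions any hot keyword."""
    keywords = list(hot_keywords)
    return {
        post["user_id"]
        for post in posts
        if any(kw in post["text"] for kw in keywords)
    }


def update_user_library(library: Dict[str, dict], user_ids: Iterable[str]) -> None:
    """Add each user to the library (if absent) and mark the user as selected."""
    for user_id in user_ids:
        library.setdefault(user_id, {})["selected"] = True
```

A user already present in the library is not duplicated; only the selected state of that user is set.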
Optionally, in an embodiment of the present invention, the first acquisition module 301 is further configured to collect, through an application programming interface (API) of the microblog website, the microblog content and microblog parameters of the previously selected microblog users in the microblog user library.
Optionally, in an embodiment of the present invention, the extraction module 303 comprises:
A classification unit, configured to classify the collected microblog content into predefined microblog categories according to the collected microblog content and microblog parameters;
A hot topic processing unit, configured to perform hot topic processing on the microblog content under each microblog category, to obtain the microblog content related to hot microblog topics under each microblog category;
An extraction unit, configured to perform word segmentation on the microblog content related to hot microblog topics under each microblog category, and to extract the hot keywords or hot key phrases related to hot microblog topics under each microblog category.
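By way of non-limiting illustration, the three units might be chained as in the following sketch. The disclosure does not specify the classification rule, the hot topic criterion, or the segmentation algorithm; here a term lookup, a forward-count threshold, and whitespace splitting are used purely as stand-ins (real Chinese text would need a proper word segmenter), and every name in the code is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def extract_hot_keywords(
    posts: List[dict],
    category_terms: Dict[str, List[str]],
    min_forwards: int,
    top_n: int,
) -> Dict[str, List[str]]:
    """Classify posts, keep the hot ones, and return top tokens per category."""
    # Step 1: classify each post into the first category whose terms it mentions.
    by_category: Dict[str, List[dict]] = defaultdict(list)
    for post in posts:
        for category, terms in category_terms.items():
            if any(term in post["text"] for term in terms):
                by_category[category].append(post)
                break

    result: Dict[str, List[str]] = {}
    for category, items in by_category.items():
        # Step 2: keep posts regarded as hot (assumed here: forwarded often enough).
        hot = [p for p in items if p["forwards"] >= min_forwards]
        # Step 3: segment the hot posts (whitespace split as a stand-in) and count tokens.
        counts = Counter(token for p in hot for token in p["text"].lower().split())
        result[category] = [word for word, _ in counts.most_common(top_n)]
    return result
```

In practice the token counts would normally be filtered against a stop-word list before the top entries are taken; that step is omitted here for brevity.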
Referring again to Fig. 3, the equipment 300 for collecting microblog content according to a microblog user library further comprises a judging module 309 and a selected state update module 311.
In an embodiment of the present invention, the judging module 309 is configured to judge whether a previously selected microblog user in the microblog user library meets a predefined condition.
Optionally, in an embodiment of the present invention, the judging module 309 further comprises a malicious registration judging unit 3091 and/or an activity judging unit 3093, wherein:
The malicious registration judging unit 3091 is configured to judge whether a microblog user in the microblog user library is a maliciously registered user. If the previously selected microblog user is a maliciously registered user, the judgment result is that the previously selected microblog user does not meet the predefined condition; if the previously selected microblog user is not a maliciously registered user, the judgment result is that the previously selected microblog user meets the predefined condition;
The activity judging unit 3093 is configured to judge whether the activity level of a previously selected microblog user in the microblog user library is lower than a predefined activity threshold. If the activity level of the microblog user is lower than the predefined activity threshold, the judgment result is that the previously selected microblog user does not meet the predefined condition; if the activity level of the microblog user is not lower than the predefined activity threshold, the judgment result is that the previously selected microblog user meets the predefined condition. The activity level comprises any one or a combination of: the frequency with which the microblog user posts or forwards microblogs, the continuous login time of the microblog user, and the online duration of the microblog user on the current day.
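By way of non-limiting illustration, the activity judgment might be sketched as follows. The disclosure leaves open how the three metrics are combined; this sketch assumes that each configured metric has its own threshold and that a user fails the condition as soon as any configured metric is lower than its threshold. The class names, field names, and units are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Activity:
    posts_per_day: float          # frequency of posting or forwarding microblogs
    continuous_login_days: int    # continuous login time
    online_minutes_today: float   # online duration on the current day


@dataclass
class ActivityThreshold:
    # A value of None means the metric is not used in the judgment.
    posts_per_day: Optional[float] = None
    continuous_login_days: Optional[int] = None
    online_minutes_today: Optional[float] = None


def meets_activity_condition(activity: Activity, threshold: ActivityThreshold) -> bool:
    """Return False if any configured metric is lower than its threshold."""
    for name in ("posts_per_day", "continuous_login_days", "online_minutes_today"):
        limit = getattr(threshold, name)
        if limit is not None and getattr(activity, name) < limit:
            return False
    return True
```

A value exactly equal to its threshold counts as "not lower than" the threshold, so such a user still meets the condition.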
Optionally, in an embodiment of the present invention, the malicious registration judging unit 3091 is further configured to judge whether the user score of the microblog user is lower than a predefined malicious registration score. If the user score of the microblog user is lower than the predefined malicious registration score, the judgment result is that the microblog user is a maliciously registered user; if the user score of the microblog user is not lower than the predefined malicious registration score, the judgment result is that the microblog user is not a maliciously registered user. The user score may be calculated based on the number of users the microblog user follows, the number of followers of the microblog user, and the number of microblogs the microblog user has posted.
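By way of non-limiting illustration, the score comparison might be sketched as follows. The disclosure names the three inputs but gives no formula; the ratio used below is an assumption chosen only so that an account that follows many users while having few followers and few posts receives a low score.

```python
def user_score(following: int, followers: int, posts: int) -> float:
    """Hypothetical score: followers and posts raise it, mass-following lowers it."""
    return (followers + posts) / (following + 1)


def is_malicious_registration(
    following: int, followers: int, posts: int, malicious_score: float
) -> bool:
    """A user whose score is lower than the predefined score is judged malicious."""
    return user_score(following, followers, posts) < malicious_score
```

For example, with a predefined score of 0.1, an account following 999 users with 5 followers and 5 posts scores 0.01 and is judged to be maliciously registered, while an account following 100 users with 250 followers and 50 posts is not.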
In an embodiment of the present invention, the selected state update module 311 is configured to mark the selected state of a previously selected microblog user in the microblog user library as unselected if the previously selected microblog user does not meet the predefined condition, and to keep the selected state of the previously selected microblog user in the microblog user library unchanged if the previously selected microblog user meets the predefined condition.
That is, in an embodiment of the present invention, the first acquisition module 301 may collect the microblog content posted or forwarded by the microblog users marked as selected in the microblog user library, for example by users who are not maliciously registered or whose activity level is relatively high, so that hot topics can be discovered promptly.
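By way of non-limiting illustration, the update of the selected state followed by collection from the users that remain selected might be sketched as follows. The dictionary-based library, the `meets_condition` predicate, and the `fetch_posts` callback are assumptions made for this sketch.

```python
from typing import Callable, Dict, List


def refresh_selected_state(
    library: Dict[str, dict], meets_condition: Callable[[str], bool]
) -> None:
    """Mark previously selected users as unselected if they fail the condition."""
    for user_id, record in library.items():
        if record.get("selected") and not meets_condition(user_id):
            record["selected"] = False
        # Users that meet the condition keep their selected state unchanged.


def collect_from_selected(
    library: Dict[str, dict], fetch_posts: Callable[[str], List[str]]
) -> Dict[str, List[str]]:
    """Collect posted or forwarded content only from users marked as selected."""
    return {
        user_id: fetch_posts(user_id)
        for user_id, record in library.items()
        if record.get("selected")
    }
```

Users that were already unselected are never passed to `meets_condition`, which matches the judgment being made only for previously selected users.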
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other equipment. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct such systems is apparent. Moreover, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein may be implemented in a variety of programming languages, and that the above description of a specific language is given to disclose the best mode of the invention.
Numerous specific details are set forth in the specification provided herein. It will be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, in the above description of exemplary embodiments of the invention, features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof. This method of disclosure, however, should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. The claims following the detailed description are therefore expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will appreciate that the modules in the equipment of an embodiment may be adaptively changed and arranged in one or more pieces of equipment different from that embodiment. The modules, units, or components of an embodiment may be combined into a single module, unit, or component, and may furthermore be divided into a plurality of sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings), and all processes or units of any method or equipment so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
Furthermore, those skilled in the art will understand that, although some embodiments described herein include some features included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in the equipment for collecting microblog content according to embodiments of the invention. The invention may also be implemented as equipment or device programs (for example, computer programs and computer program products) for performing part or all of the methods described herein. Such programs implementing the invention may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any order; these words may be interpreted as names.

Claims (10)

1. A method for collecting microblog content according to a microblog user library, comprising:
Judging whether a previously selected microblog user in the microblog user library meets a predefined condition;
If the previously selected microblog user does not meet the predefined condition, marking the selected state of the previously selected microblog user in the microblog user library as unselected;
If the previously selected microblog user meets the predefined condition, keeping the selected state of the previously selected microblog user in the microblog user library unchanged;
Collecting the microblog content posted or forwarded by the microblog users marked as selected in the microblog user library.
2. The method according to claim 1, wherein the step of judging whether a previously selected microblog user in the microblog user library meets a predefined condition comprises:
Judging whether a microblog user in the microblog user library is a maliciously registered user; if the previously selected microblog user is a maliciously registered user, the judgment result is that the previously selected microblog user does not meet the predefined condition; if the previously selected microblog user is not a maliciously registered user, the judgment result is that the previously selected microblog user meets the predefined condition; and/or
Judging whether the activity level of a previously selected microblog user in the microblog user library is lower than a predefined activity threshold; if the activity level of the microblog user is lower than the predefined activity threshold, the judgment result is that the previously selected microblog user does not meet the predefined condition; if the activity level of the microblog user is not lower than the predefined activity threshold, the judgment result is that the previously selected microblog user meets the predefined condition, wherein the activity level comprises any one or a combination of: the frequency with which the microblog user posts or forwards microblogs, the continuous login time of the microblog user, and the online duration of the microblog user on the current day.
3. method according to claim 1 and 2, wherein, describedly judges that whether microblog users in described microblog users storehouse is that malicious registration user's step comprises:
Judge that whether user's score value of described microblog users is lower than predefined malicious registration score value;
If user's score value of described microblog users lower than predefined malicious registration score value, judgment result is that described microblog users is malicious registration user;
If user's score value of described microblog users is not less than predefined malicious registration score value, judgment result is that described microblog users is not malicious registration user.
4. The method according to any one of claims 1 to 3, wherein the user score is calculated based on the number of users the microblog user follows, the number of followers of the microblog user, and the number of microblogs the microblog user has posted.
5. The method according to any one of claims 1 to 4, further comprising:
Collecting the microblog users who post and/or forward microblog content related to hot keywords or hot key phrases;
Updating the collected microblog users into the microblog user library, and marking the selected state of the collected microblog users as selected.
6. An equipment for collecting microblog content according to a microblog user library, comprising:
A judging module, configured to judge whether a previously selected microblog user in the microblog user library meets a predefined condition;
A selected state update module, configured to mark the selected state of the previously selected microblog user in the microblog user library as unselected in the case where the judging module judges that the previously selected microblog user does not meet the predefined condition, and to keep the selected state of the previously selected microblog user in the microblog user library unchanged in the case where the judging module judges that the previously selected microblog user meets the predefined condition;
A first acquisition module, configured to collect the microblog content posted or forwarded by the microblog users marked as selected in the microblog user library.
7. The equipment according to claim 6, wherein the judging module comprises:
A malicious registration judging unit, configured to judge whether a microblog user in the microblog user library is a maliciously registered user; if the previously selected microblog user is a maliciously registered user, the judgment result is that the previously selected microblog user does not meet the predefined condition; if the previously selected microblog user is not a maliciously registered user, the judgment result is that the previously selected microblog user meets the predefined condition; and/or
An activity judging unit, configured to judge whether the activity level of a previously selected microblog user in the microblog user library is lower than a predefined activity threshold; if the activity level of the microblog user is lower than the predefined activity threshold, the judgment result is that the previously selected microblog user does not meet the predefined condition; if the activity level of the microblog user is not lower than the predefined activity threshold, the judgment result is that the previously selected microblog user meets the predefined condition, wherein the activity level comprises any one or a combination of: the frequency with which the microblog user posts or forwards microblogs, the continuous login time of the microblog user, and the online duration of the microblog user on the current day.
8. The equipment according to claim 6 or 7, wherein the malicious registration judging unit is further configured to judge whether the user score of the microblog user is lower than a predefined malicious registration score; if the user score of the microblog user is lower than the predefined malicious registration score, the judgment result is that the microblog user is a maliciously registered user; if the user score of the microblog user is not lower than the predefined malicious registration score, the judgment result is that the microblog user is not a maliciously registered user.
9. The equipment according to any one of claims 6 to 8, wherein the user score is calculated based on the number of users the microblog user follows, the number of followers of the microblog user, and the number of microblogs the microblog user has posted.
10. The equipment according to any one of claims 6 to 9, further comprising:
An acquisition module, configured to collect the microblog users who post and/or forward microblog content related to hot keywords or hot key phrases;
A selected state update module, configured to update the collected microblog users into the microblog user library, and to mark the selected state of the collected microblog users as selected.
CN201310476149.2A 2013-10-12 2013-10-12 Method and equipment for collecting microblog content according to microblog user library Pending CN103593399A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310476149.2A CN103593399A (en) 2013-10-12 2013-10-12 Method and equipment for collecting microblog content according to microblog user library

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310476149.2A CN103593399A (en) 2013-10-12 2013-10-12 Method and equipment for collecting microblog content according to microblog user library

Publications (1)

Publication Number Publication Date
CN103593399A true CN103593399A (en) 2014-02-19

Family

ID=50083540

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310476149.2A Pending CN103593399A (en) 2013-10-12 2013-10-12 Method and equipment for collecting microblog content according to microblog user library

Country Status (1)

Country Link
CN (1) CN103593399A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102163225A (en) * 2011-04-11 2011-08-24 中国科学院地理科学与资源研究所 A fusion evaluation method of traffic information collected based on micro blogs
CN102346766A (en) * 2011-09-20 2012-02-08 北京邮电大学 Method and device for detecting network hot topics found based on maximal clique
CN102508843A (en) * 2011-09-23 2012-06-20 上海量明科技发展有限公司 Screen capture method and system with microblogging function
CN102708176A (en) * 2012-05-08 2012-10-03 山东大学 Microblog data mining method based on active users

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104010002A (en) * 2014-06-16 2014-08-27 南威软件股份有限公司 Internet key point login system and method
CN105468714A (en) * 2015-11-20 2016-04-06 北京邮电大学 Forum-based self-media information display method and system
CN105468714B (en) * 2015-11-20 2018-11-09 北京邮电大学 It is a kind of based on forum from media information methods of exhibiting and system

Similar Documents

Publication Publication Date Title
CN104239539B (en) A kind of micro-blog information filter method merged based on much information
CN102945290B (en) Hot microblog topic excavating gear and method
Wolfe et al. Island vs. countryside biogeography: an examination of how Amazonian birds respond to forest clearing and fragmentation
CN108563686B (en) Social network rumor identification method and system based on hybrid neural network
CN103902697B (en) Combinatorial search method, client and server
CN102982157A (en) Device and method used for mining microblog hot topics
CN104462301B (en) A kind for the treatment of method and apparatus of network data
CN106886518A (en) A kind of method of microblog account classification
Wang et al. Testing multiple assembly rule models in avian communities on islands of an inundated lake, Zhejiang Province, China
CN103617213B (en) Method and system for identifying newspage attributive characters
CN109726327A (en) A kind of information-pushing method and device
CN106294425A (en) The automatic image-text method of abstracting of commodity network of relation article and system
CN107437026B (en) Malicious webpage advertisement detection method based on advertisement network topology
Griffiths Body size distributions in North American freshwater fish: Large‐scale factors
Yang et al. A taste of tweets: Reverse engineering twitter spammers
CN103593398A (en) Method and equipment for updating microblog user library
CN108334508A (en) The extracting method and device of webpage information
CN103593397A (en) Method and device for acquiring microblog content
CN106909669A (en) The detection method and device of a kind of promotion message
CN107748898A (en) File classifying method, device, computing device and computer-readable storage medium
JPWO2012127968A1 (en) Event analysis apparatus, event analysis method, and program
Zhang et al. NEIGHBORWATCHER: A Content-Agnostic Comment Spam Inference System.
CN103593399A (en) Method and equipment for collecting microblog content according to microblog user library
CN104156458B (en) The extracting method and device of a kind of information
Ruhrberg et al. # ISIS—a comparative analysis of country-specific sentiment on Twitter

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20140219
