CN105243122A - Social software based data acquisition method and apparatus - Google Patents


Publication number
CN105243122A
Authority
CN
China
Prior art keywords
user
crawled
queue
buddy list
crawl
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510633010.3A
Other languages
Chinese (zh)
Inventor
李鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Electronic Information Industry Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Electronic Information Industry Co Ltd filed Critical Inspur Electronic Information Industry Co Ltd
Priority to CN201510633010.3A priority Critical patent/CN105243122A/en
Publication of CN105243122A publication Critical patent/CN105243122A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • Human Resources & Organizations (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a social software based data acquisition method and apparatus. The method comprises: S1: selecting at least one registered user in target social software, and adding a user identifier corresponding to the at least one registered user into a crawling queue; S2: according to the user identifiers in the crawling queue, crawling webpage data and friend lists of the users corresponding to the user identifiers one by one; and S3: adding each user identifier in the crawled friend lists into the crawling queue, returning to execute step S2, and ending when a preset condition is met. According to the present scheme, the number of users corresponding to the crawled webpage data can be larger, thereby improving the accuracy of an analysis result.

Description

Data acquisition method and apparatus based on social software
Technical field
The present invention relates to the technical field of computer networks, and in particular to a data acquisition method and apparatus based on social software.
Background
With the rapid development of computer network technology, the volume of data that users generate on computer networks keeps growing. By obtaining the data that a large number of users generate on the network, the interests of those users can be analyzed, and future network development can be planned according to those interests.
Existing data acquisition methods use a web crawler to capture the web page data that users access. The more target users the acquired web page data covers, the more accurate the analysis of that data becomes. How to obtain the web page data of more target users, and thereby improve the accuracy of the analysis result, is therefore a problem that urgently needs to be solved.
Summary of the invention
In view of this, the present invention provides a data acquisition method and apparatus based on social software, so as to obtain the web page data of more target users.
The present invention provides a data acquisition method based on social software, comprising:
S1: selecting at least one registered user in target social software, and adding the user identifier corresponding to each of the at least one registered user to a crawling queue;
S2: according to the user identifiers in the crawling queue, crawling the web page data and friend list of the user corresponding to each user identifier one by one;
S3: adding each user identifier in the crawled friend lists to the crawling queue, returning to step S2, and ending when a preset condition is met.
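Steps S1 to S3 can be sketched as a simple queue-driven loop. The sketch below is only an illustration under stated assumptions: `FAKE_NETWORK` and `fetch_user` are hypothetical stand-ins for the target social software, the preset condition is assumed to be a cap on the number of crawled users, and the check that skips an already crawled user is a safeguard added here so the loop terminates on cyclic friendships.

```python
from collections import deque

# Hypothetical stand-in for the target social software:
# user identifier -> (web page data, friend list).
FAKE_NETWORK = {
    "A": ("page of A", ["B", "C"]),
    "B": ("page of B", ["A", "D"]),
    "C": ("page of C", ["A"]),
    "D": ("page of D", ["B"]),
}


def fetch_user(user_id):
    """Hypothetical fetcher: return (web page data, friend list) for one user."""
    return FAKE_NETWORK[user_id]


def crawl(seed_ids, max_users):
    """Crawl outward from the seed users until the queue empties or max_users is reached."""
    queue = deque(seed_ids)                    # S1: seed the crawling queue
    pages = {}
    while queue and len(pages) < max_users:    # assumed preset condition: user cap
        user_id = queue.popleft()
        if user_id in pages:                   # safeguard: skip users already crawled
            continue
        page, friends = fetch_user(user_id)    # S2: crawl page data and friend list
        pages[user_id] = page
        queue.extend(friends)                  # S3: enqueue the friends, then loop
    return pages
```

Calling `crawl(["A"], max_users=10)` on this toy network visits all four users; a smaller `max_users` stops the loop early.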
Preferably, the method further comprises:
storing the crawled web page data of the user corresponding to each user identifier in a database;
and/or,
adding each user identifier that has been added to the crawling queue to the database.
Preferably, adding each user identifier in the crawled friend lists to the crawling queue comprises:
comparing each user identifier in the crawled friend lists one by one with the user identifiers already added to the database, and adding to the crawling queue only those user identifiers that are not stored in the database.
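The comparison against the database can be illustrated with a small filter. This is a minimal sketch, assuming an in-memory SQLite database with a single hypothetical table `users(user_id)`; the source does not name a database product or schema.

```python
import sqlite3


def new_ids_only(conn, friend_ids):
    """Return the friend identifiers not yet stored, recording each as stored."""
    fresh = []
    for uid in friend_ids:
        row = conn.execute(
            "SELECT 1 FROM users WHERE user_id = ?", (uid,)
        ).fetchone()
        if row is None:                        # not in the database: keep it
            conn.execute("INSERT INTO users (user_id) VALUES (?)", (uid,))
            fresh.append(uid)
    return fresh


# Hypothetical setup: user "A" has already been added to the database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id TEXT PRIMARY KEY)")
conn.execute("INSERT INTO users (user_id) VALUES ('A')")
```

Only the identifiers returned by `new_ids_only` would then be appended to the crawling queue, so a user reachable through several friends is crawled once.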
Preferably,
the method further comprises: marking the degree of separation of each user identifier added to the crawling queue, wherein the degree of separation of the user identifiers first added to the crawling queue is 1, and the degree of separation marked for each user identifier in the crawled friend list of a target user is greater by 1 than the degree of separation of that target user;
and meeting the preset condition comprises: the degree of separation of the user identifiers added to the crawling queue reaching a set value.
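The degree marking and the degree-based stop condition can be sketched as follows. The friend graph `FRIENDS` and the function name `label_degrees` are hypothetical; the sketch assumes that once a user's degree of separation reaches the set value, that user's friend list is no longer expanded.

```python
from collections import deque

# Hypothetical friend graph: user identifier -> friend list.
FRIENDS = {
    "A": ["A1", "A2"],
    "A1": ["A", "A11"],
    "A2": ["A"],
    "A11": ["A1", "A111"],
    "A111": ["A11"],
}


def label_degrees(seed_ids, friends, max_degree):
    """Mark seeds with degree 1 and each newly found friend with parent degree + 1."""
    degree = {uid: 1 for uid in seed_ids}
    queue = deque(seed_ids)
    while queue:
        uid = queue.popleft()
        if degree[uid] >= max_degree:
            continue                           # set value reached: do not expand further
        for friend in friends.get(uid, []):
            if friend not in degree:           # keep the first (smallest) degree seen
                degree[friend] = degree[uid] + 1
                queue.append(friend)
    return degree
```

With `max_degree=3`, user `A111` is never reached, because the friend list of `A11` (degree 3) is not expanded.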
Preferably, crawling the web page data and friend list of the user corresponding to each user identifier comprises:
dividing the user identifiers in the crawling queue into multiple Map tasks and assigning the Map tasks to at least two processors, the at least two servers crawling in parallel the Map tasks assigned to them; and, after processing ends, performing a Reduce merge on the data processed by each server.
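The Map/Reduce division can be approximated on a single machine. In the sketch below, worker threads stand in for the separate servers, which is an assumption made only to keep the example self-contained; `fetch_user`, `split_into_tasks`, `map_task`, and `reduce_results` are hypothetical names, and no particular MapReduce framework is implied.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_user(user_id):
    """Hypothetical fetcher: return (web page data, friend list) for one user."""
    return f"page of {user_id}", [f"{user_id}-friend"]


def split_into_tasks(user_ids, n_tasks):
    """Divide the queued user identifiers into n_tasks Map tasks (round-robin)."""
    return [user_ids[i::n_tasks] for i in range(n_tasks)]


def map_task(user_ids):
    """One Map task: crawl every user identifier assigned to this worker."""
    return {uid: fetch_user(uid) for uid in user_ids}


def reduce_results(partials):
    """Reduce step: merge the per-worker results into one dictionary."""
    merged = {}
    for partial in partials:
        merged.update(partial)
    return merged


def parallel_crawl(user_ids, n_workers=2):
    tasks = split_into_tasks(user_ids, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_task, tasks))   # Map tasks run in parallel
    return reduce_results(partials)
```

In a real deployment each Map task would run on its own server and the merge would happen after all servers report back; the partitioning and merging logic stays the same.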
The present invention also provides a data acquisition apparatus based on social software, comprising:
a selection unit, configured to select at least one registered user in target social software;
an adding unit, configured to add the user identifier corresponding to each of the at least one registered user to a crawling queue;
a crawling unit, configured to crawl, according to the user identifiers in the crawling queue, the web page data and friend list of the user corresponding to each user identifier one by one;
the adding unit being further configured to add each user identifier in the crawled friend lists to the crawling queue and to trigger the crawling unit to perform the corresponding operation, and to stop triggering the crawling unit when a preset condition is met.
Preferably, the apparatus further comprises:
a sending unit, configured to store the crawled web page data of the user corresponding to each user identifier in a database;
and/or,
the sending unit, configured to store each user identifier that has been added to the crawling queue in the database.
Preferably, the adding unit is specifically configured to compare each user identifier in the crawled friend lists one by one with the user identifiers already added to the database, and to add to the crawling queue only those user identifiers that are not stored in the database.
Preferably,
the apparatus further comprises: a marking unit, configured to mark the degree of separation of each user identifier added to the crawling queue, wherein the degree of separation of the user identifiers first added to the crawling queue is 1, and the degree of separation marked for each user identifier in the crawled friend list of a target user is greater by 1 than the degree of separation of that target user;
and meeting the preset condition comprises: the degree of separation of the user identifiers added to the crawling queue reaching a set value.
Preferably, the crawling unit is specifically configured to divide the user identifiers in the crawling queue into multiple Map tasks and assign the Map tasks to at least two processors, the at least two servers crawling in parallel the Map tasks assigned to them, and, after processing ends, to perform a Reduce merge on the data processed by each server.
Embodiments of the present invention provide a data acquisition method and apparatus based on social software that crawl users' web page data by exploiting the large number of registered users in social software and the friend relationships between them. Because social software has a large number of registered users, the crawled web page data covers a correspondingly large number of users, which can improve the accuracy of the analysis result.
Brief description of the drawings
Fig. 1 is a flowchart of the method provided by an embodiment of the present invention;
Fig. 2 is a flowchart of the method provided by another embodiment of the present invention;
Fig. 3 is a hardware architecture diagram of the device in which the data acquisition apparatus provided by an embodiment of the present invention is located;
Fig. 4 is a schematic structural diagram of the data acquisition apparatus provided by an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of the data acquisition apparatus provided by another embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
As shown in Fig. 1, an embodiment of the present invention provides a data acquisition method based on social software, which may comprise the following steps:
Step 101: select at least one registered user in the target social software, and add the user identifier corresponding to each of the at least one registered user to a crawling queue;
Step 102: according to the user identifiers in the crawling queue, crawl the web page data and friend list of the user corresponding to each user identifier one by one;
Step 103: add each user identifier in the crawled friend lists to the crawling queue, return to step 102, and end when a preset condition is met.
In this embodiment, users' web page data is crawled by exploiting the large number of registered users in social software and the friend relationships between them. Because social software has a large number of registered users, the crawled web page data covers a correspondingly large number of users, which can improve the accuracy of the analysis result.
In a preferred embodiment of the invention, to avoid crawling the web page data of the same user repeatedly, each user identifier in the crawled friend lists may be compared one by one with the user identifiers already added to the database, and only the user identifiers not stored in the database are added to the crawling queue. This reduces the amount of computation in the subsequent process.
In a preferred embodiment of the invention, the crawling of users' web page data is not endless. According to the six degrees of separation theory, no more than six people separate any two strangers; that is, any two strangers can get to know each other through at most five intermediaries. This is the six degrees of separation theory, also known as the small-world theory. The degree of separation of each user identifier added to the crawling queue is therefore marked, wherein the degree of separation of the user identifiers first added to the crawling queue is 1, and the degree of separation marked for each user identifier in the crawled friend list of a target user is greater by 1 than the degree of separation of that target user. Meeting the preset condition may comprise: the degree of separation of the user identifiers added to the crawling queue reaching a set value. This set value may be 6. When the degree of separation of the user identifiers added to the crawling queue is 6, this can indicate that the web page data of all registered users in the social software has been crawled.
In a preferred embodiment of the invention, because the volume of crawled web page data is large, the user identifiers in the crawling queue may be divided into multiple Map tasks, and the Map tasks assigned to at least two processors; the at least two servers crawl in parallel the Map tasks assigned to them, and after processing ends a Reduce merge is performed on the data processed by each server. This improves the efficiency of crawling web page data.
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and specific embodiments.
As shown in Fig. 2, an embodiment of the present invention provides a data acquisition method based on social software, which may comprise the following steps:
Step 201: set the preset condition that ends the crawl.
The crawling of users' web page data is not endless. According to the six degrees of separation theory, no more than six people separate any two strangers; that is, any two strangers can get to know each other through at most five intermediaries. This is the six degrees of separation theory, also known as the small-world theory.
In this embodiment, the preset condition that ends the crawl needs to be set. It may be set, for example, as follows:
1. The degree of separation of the user identifiers added to the crawling queue reaches a set value a.
According to the six degrees of separation theory, the set value a may be 6; it may also be another value not less than 6.
2. The number of user identifiers corresponding to the crawled web page data reaches a set value b.
The set value b may be 100 million. The larger the set value b, the more accurate the analysis of the web page data obtained for those b users.
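The two end conditions can be expressed as a single predicate. This is only a sketch: the function name `should_stop` is hypothetical, and checking both conditions together (stopping as soon as either holds) is an assumption, since the text presents them as alternative ways of setting the condition.

```python
def should_stop(current_degree, crawled_count, max_degree=6, max_users=100_000_000):
    """Return True when condition 1 (degree reaches a) or condition 2 (count reaches b) holds.

    max_degree plays the role of set value a, max_users the role of set value b.
    """
    return current_degree >= max_degree or crawled_count >= max_users
```

A crawl loop would evaluate this predicate before fetching the next friend list, passing the degree of separation of the user about to be expanded and the number of users crawled so far.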
Step 202: determine the target social software.
In this embodiment, the target social software may be any social software that runs on current networks, such as Facebook, Twitter, Instagram, Weibo, or QQ.
Step 203: select at least one registered user in the target social software, add the user identifier corresponding to each selected registered user to the crawling queue, and mark the degree of separation of each selected registered user as 1.
Where the preset condition that ends the crawl is condition 1 above, the degree of separation of the at least one registered user first selected from the target social software may be marked as 1.
In this embodiment, to make it easier to crawl users' web page data in the subsequent process, the user identifiers corresponding to the selected registered users are added to the crawling queue.
Step 204: according to the user identifiers in the crawling queue, crawl the web page data and friend list of the user corresponding to each user identifier one by one.
In this embodiment, a web crawler may be used to crawl each user's web page data and friend list information. A web crawler, also called a web spider, web robot, or web chaser, is a program or script that automatically captures web information according to certain rules. While the web crawler runs, all of the information a user produces on the network can be crawled, and the crawled information can then be analyzed to determine the user's interests.
In this embodiment, because the volume of crawled web page data is large, the user identifiers in the crawling queue may be divided into multiple Map tasks, and the Map tasks assigned to at least two processors; the at least two servers crawl in parallel the Map tasks assigned to them, and after processing ends a Reduce merge is performed on the data processed by each server. This improves the efficiency of crawling web page data.
Further, to store and record the crawled content, the crawled web page data of the user corresponding to each user identifier may be stored in a database; and/or each user identifier that has been added to the crawling queue may be added to the database.
Step 205: compare each user identifier in the crawled friend lists one by one with the user identifiers already added to the database, and add to the crawling queue only those user identifiers that are not stored in the database; then return to step 204, and end when the preset condition is met.
In this embodiment, the degree of separation is marked for every user identifier each time it is added to the crawling queue, wherein the degree of separation marked for each user identifier in the crawled friend list of a target user is greater by 1 than the degree of separation of that target user.
For example, registered user A selected in step 203 is added to the crawling queue, and the degree of separation of registered user A is 1. In this step, when the crawled friends of registered user A (user A1, user A2, user A3, ..., user An) are added to the crawling queue, the degree of separation of each of them is marked as 2. When the crawled friends of user A1 (user A11, user A12, user A13, ..., user A1n) are added to the crawling queue, the degree of separation of each of them is marked as 3, and so on.
When the preset condition is met, for example when the degree of separation of the user identifiers added to the crawling queue is 6, acquisition of the friend lists of the users whose degree of separation is 6 stops.
In this embodiment, three groups of looping threads may be used, each performing one of the following three functions: 1. processing the queue to be crawled; 2. generating the new queue to be crawled; 3. saving the crawled data.
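The three looping thread groups can be sketched with Python's standard `queue` and `threading` modules. The example is a simplification under stated assumptions: each group has a single thread, the friend graph `FRIENDS` is a hypothetical stand-in for real fetches, and shutdown uses `None` sentinels once every queued user has been crawled and expanded.

```python
import queue
import threading

# Hypothetical friend graph standing in for real network requests.
FRIENDS = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"]}


def run_pipeline(seed_ids):
    to_crawl, fetched, to_save = queue.Queue(), queue.Queue(), queue.Queue()
    stored = {}
    seen = set(seed_ids)

    def crawler():   # function 1: work through the queue to be crawled
        while (uid := to_crawl.get()) is not None:
            to_save.put((uid, f"page of {uid}"))
            fetched.put((uid, FRIENDS.get(uid, [])))

    def producer():  # function 2: build the new queue to be crawled
        while (item := fetched.get()) is not None:
            for friend in item[1]:
                if friend not in seen:
                    seen.add(friend)
                    to_crawl.put(friend)
            to_crawl.task_done()   # this user is fully expanded

    def saver():     # function 3: save the crawled data
        while (item := to_save.get()) is not None:
            stored[item[0]] = item[1]

    for uid in seed_ids:
        to_crawl.put(uid)
    threads = [threading.Thread(target=fn) for fn in (crawler, producer, saver)]
    for thread in threads:
        thread.start()
    to_crawl.join()                # wait until every queued user is expanded
    for q in (to_crawl, fetched, to_save):
        q.put(None)                # sentinels stop the three loops
    for thread in threads:
        thread.join()
    return stored
```

Marking a user as done only after its friends have been enqueued keeps `to_crawl.join()` from returning while work is still pending. A degree limit or user cap could be checked inside `producer` before new identifiers are enqueued.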
As shown in Fig. 3 and Fig. 4, an embodiment of the present invention provides a data acquisition apparatus based on social software. The apparatus embodiment may be implemented in software, in hardware, or in a combination of the two. At the hardware level, Fig. 3 shows a hardware architecture diagram of the device in which the data acquisition apparatus of this embodiment is located; in addition to the processor, memory, network interface, and non-volatile storage shown in Fig. 3, the device may also include other hardware, such as a forwarding chip responsible for processing messages. Taking a software implementation as an example, as shown in Fig. 4, the apparatus in the logical sense is formed when the CPU of the device in which it is located reads the corresponding computer program instructions from non-volatile storage into memory and runs them. The data acquisition apparatus based on social software provided by this embodiment comprises:
a selection unit 401, configured to select at least one registered user in target social software;
an adding unit 402, configured to add the user identifier corresponding to each of the at least one registered user to a crawling queue;
a crawling unit 403, configured to crawl, according to the user identifiers in the crawling queue, the web page data and friend list of the user corresponding to each user identifier one by one;
the adding unit 402 being further configured to add each user identifier in the crawled friend lists to the crawling queue and to trigger the crawling unit to perform the corresponding operation, and to stop triggering the crawling unit when a preset condition is met.
In a preferred embodiment of the invention, as shown in Fig. 5, the data acquisition apparatus may further comprise:
a sending unit 501, configured to store the crawled web page data of the user corresponding to each user identifier in a database;
and/or,
the sending unit 501, configured to store each user identifier that has been added to the crawling queue in the database.
Further, the adding unit 402 is specifically configured to compare each user identifier in the crawled friend lists one by one with the user identifiers already added to the database, and to add to the crawling queue only those user identifiers that are not stored in the database.
The apparatus further comprises: a marking unit 502, configured to mark the degree of separation of each user identifier added to the crawling queue, wherein the degree of separation of the user identifiers first added to the crawling queue is 1, and the degree of separation marked for each user identifier in the crawled friend list of a target user is greater by 1 than the degree of separation of that target user;
and meeting the preset condition comprises: the degree of separation of the user identifiers added to the crawling queue reaching a set value.
Further, the crawling unit 403 is specifically configured to divide the user identifiers in the crawling queue into multiple Map tasks and assign the Map tasks to at least two processors, the at least two servers crawling in parallel the Map tasks assigned to them, and, after processing ends, to perform a Reduce merge on the data processed by each server.
In summary, the embodiments of the present invention can achieve at least the following beneficial effects:
1. In the embodiments of the present invention, users' web page data is crawled by exploiting the large number of registered users in social software and the friend relationships between them. Because social software has a large number of registered users, the crawled web page data covers a correspondingly large number of users, which can improve the accuracy of the analysis result.
2. In the embodiments of the present invention, to avoid crawling the web page data of the same user repeatedly, each user identifier in the crawled friend lists may be compared one by one with the user identifiers already added to the database, and only the user identifiers not stored in the database are added to the crawling queue. This reduces the amount of computation in the subsequent process.
3. In the embodiments of the present invention, because the volume of crawled web page data is large, the user identifiers in the crawling queue may be divided into multiple Map tasks, and the Map tasks assigned to at least two processors; the at least two servers crawl in parallel the Map tasks assigned to them, and after processing ends a Reduce merge is performed on the data processed by each server. This improves the efficiency of crawling web page data.
Because the information exchange between the units of the above device and their execution process are based on the same concept as the method embodiments of the present invention, the details can be found in the description of the method embodiments and are not repeated here.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "comprise" and "include", and any variants of them, are intended to cover non-exclusive inclusion, so that a process, method, article, or device that comprises a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to that process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that comprises the element.
A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as ROM, RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the above are only preferred embodiments of the present invention, intended solely to illustrate its technical solutions and not to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention falls within the scope of protection of the present invention.

Claims (10)

1. A data acquisition method based on social software, characterized by comprising:
S1: selecting at least one registered user in target social software, and adding the user identifier corresponding to each of the at least one registered user to a crawling queue;
S2: according to the user identifiers in the crawling queue, crawling the web page data and friend list of the user corresponding to each user identifier one by one;
S3: adding each user identifier in the crawled friend lists to the crawling queue, returning to step S2, and ending when a preset condition is met.
2. The method according to claim 1, characterized by further comprising:
storing the crawled web page data of the user corresponding to each user identifier in a database;
and/or,
adding each user identifier that has been added to the crawling queue to the database.
3. The method according to claim 2, characterized in that adding each user identifier in the crawled friend lists to the crawling queue comprises:
comparing each user identifier in the crawled friend lists one by one with the user identifiers already added to the database, and adding to the crawling queue only those user identifiers that are not stored in the database.
4. The method according to claim 1, characterized in that
the method further comprises: marking the degree of separation of each user identifier added to the crawling queue, wherein the degree of separation of the user identifiers first added to the crawling queue is 1, and the degree of separation marked for each user identifier in the crawled friend list of a target user is greater by 1 than the degree of separation of that target user;
and meeting the preset condition comprises: the degree of separation of the user identifiers added to the crawling queue reaching a set value.
5. The method according to any one of claims 1-4, characterized in that crawling the web page data and friend list of the user corresponding to each user identifier comprises:
dividing the user identifiers in the crawling queue into multiple Map tasks and assigning the Map tasks to at least two processors, the at least two servers crawling in parallel the Map tasks assigned to them; and, after processing ends, performing a Reduce merge on the data processed by each server.
6. A data acquisition apparatus based on social software, characterized by comprising:
a selection unit, configured to select at least one registered user in target social software;
an adding unit, configured to add the user identifier corresponding to each of the at least one registered user to a crawling queue;
a crawling unit, configured to crawl, according to the user identifiers in the crawling queue, the web page data and friend list of the user corresponding to each user identifier one by one;
the adding unit being further configured to add each user identifier in the crawled friend lists to the crawling queue and to trigger the crawling unit to perform the corresponding operation, and to stop triggering the crawling unit when a preset condition is met.
7. The data acquisition apparatus according to claim 6, characterized by further comprising:
a sending unit, configured to store the crawled web page data of the user corresponding to each user identifier in a database;
and/or,
the sending unit, configured to store each user identifier that has been added to the crawling queue in the database.
8. The data acquisition apparatus according to claim 7, characterized in that the adding unit is specifically configured to compare each user identifier in the crawled friend lists one by one with the user identifiers already added to the database, and to add to the crawling queue only those user identifiers that are not stored in the database.
9. The data acquisition apparatus according to claim 6, characterized in that
the apparatus further comprises: a marking unit, configured to mark the degree of separation of each user identifier added to the crawling queue, wherein the degree of separation of the user identifiers first added to the crawling queue is 1, and the degree of separation marked for each user identifier in the crawled friend list of a target user is greater by 1 than the degree of separation of that target user;
and meeting the preset condition comprises: the degree of separation of the user identifiers added to the crawling queue reaching a set value.
10. The data acquisition apparatus according to any one of claims 6-9, characterized in that the crawling unit is specifically configured to divide the user identifiers in the crawling queue into multiple Map tasks and assign the Map tasks to at least two processors, the at least two servers crawling in parallel the Map tasks assigned to them, and, after processing ends, to perform a Reduce merge on the data processed by each server.
CN201510633010.3A 2015-09-29 2015-09-29 Social software based data acquisition method and apparatus Pending CN105243122A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510633010.3A CN105243122A (en) 2015-09-29 2015-09-29 Social software based data acquisition method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510633010.3A CN105243122A (en) 2015-09-29 2015-09-29 Social software based data acquisition method and apparatus

Publications (1)

Publication Number Publication Date
CN105243122A true CN105243122A (en) 2016-01-13

Family

ID=55040770

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510633010.3A Pending CN105243122A (en) 2015-09-29 2015-09-29 Social software based data acquisition method and apparatus

Country Status (1)

Country Link
CN (1) CN105243122A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103870510A (en) * 2012-12-17 2014-06-18 华中科技大学 Social network friend filtering method on basis of distributive parallel processing mode
CN103366018A (en) * 2013-08-02 2013-10-23 人民搜索网络股份公司 Microblog information capturing method and device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106126740A (en) * 2016-06-30 2016-11-16 杭州师范大学 A kind of usage mining method and apparatus during event propagation
CN110020046A (en) * 2017-10-20 2019-07-16 中移(苏州)软件技术有限公司 A kind of data grab method and device
CN110020046B (en) * 2017-10-20 2021-06-15 中移(苏州)软件技术有限公司 Data capturing method and device
CN107948052A (en) * 2017-11-14 2018-04-20 福建中金在线信息科技有限公司 Information crawler method, apparatus, electronic equipment and system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160113