CN110046319A - Social media information acquisition method, device, system, equipment and storage medium - Google Patents

Publication number
CN110046319A
Authority
CN
China
Prior art keywords
collection
scheduling
social media
account
account resource
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910255758.2A
Other languages
Chinese (zh)
Other versions
CN110046319B (en)
Inventor
李宇涵
曹六一
张丹
于晓明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New Founder Holdings Development Co ltd
Beijing Founder Electronics Co Ltd
Original Assignee
Peking University Founder Group Co Ltd
Beijing Founder Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University Founder Group Co Ltd, Beijing Founder Electronics Co Ltd
Priority to CN201910255758.2A
Publication of CN110046319A
Application granted
Publication of CN110046319B
Legal status: Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Abstract

An embodiment of the present application provides a social media information acquisition method, device, system, equipment and storage medium. The method comprises: sending a first request message to an account resource service device, the first request message requesting the account resource service device to allocate a corresponding account resource to the collection scheduling device according to characteristic information of the social media to be collected; generating and/or deriving scheduling acquisition tasks according to the account resource; sending a second request message to a capture program dynamic loading device, the second request message requesting a call to multiple common service components provided by the capture program dynamic loading device; and determining a collection result, according to the scheduling acquisition tasks, by means of the multiple common service components and the account resource, the collection result representing the acquisition data obtained by performing social media information acquisition on the social media to be collected. The method provided in this embodiment is applicable to information acquisition from social media in general.

Description

Social media information acquisition method, device, system, equipment and storage medium
Technical field
The embodiments of the present application relate to the technical field of social media information acquisition, and in particular to a social media information acquisition method, device, system, equipment and storage medium.
Background
With the rapid development of the Internet in recent decades, more and more people publish opinions and obtain information online, and all kinds of social activity have migrated to the Internet. Social media, as the most important component of online social activity, has attracted a large number of users who register accounts and express their views.
Because the user base of social media is huge and the Internet is fast, information spreads on social media far more quickly and widely than in traditional media, so information propagated on social media can quickly and seriously threaten public security. Analysing the website content of social media is therefore necessary, and collecting information from social media is a prerequisite step before that content can be analysed.
However, because social media websites have complex interactions and strict access control, traditional acquisition schemes generally cannot be applied to social media directly, and they suffer from problems such as long development cycles and low collection efficiency.
Summary of the invention
The embodiments of the present application provide a social media information acquisition method, device, system, equipment and storage medium, to overcome the problems that prior-art schemes cannot be applied directly to social media and suffer from long development cycles and low collection efficiency.
In a first aspect, an embodiment of the present application provides a social media information acquisition method, applied to a collection scheduling device, comprising:
sending a first request message to an account resource service device, the first request message requesting the account resource service device to allocate a corresponding account resource to the collection scheduling device according to characteristic information of the social media to be collected;
generating and/or deriving scheduling acquisition tasks according to the account resource;
sending a second request message to a capture program dynamic loading device, the second request message requesting a call to multiple common service components provided by the capture program dynamic loading device; and
determining a collection result, according to the scheduling acquisition tasks, by means of the multiple common service components and the account resource, the collection result representing the acquisition data obtained by performing social media information acquisition on the social media to be collected.
In one possible design, the multiple common service components include a first communication module;
and determining the collection result, according to the scheduling acquisition tasks, by means of the multiple common service components and the account resource comprises:
sending a network request instruction to a server through the first communication module, the network request instruction including the account resource and the scheduling acquisition task, and requesting the server to download, according to the scheduling acquisition task, a first task result corresponding to that task and, according to the account resource, to feed the first task result back to the collection scheduling device through the first communication module; and
receiving the first task result returned by the first communication module, and parsing the first task result to determine the collection result.
In one possible design, the multiple common service components include a second communication module and a deduplication module;
and determining the collection result, according to the scheduling acquisition tasks, by means of the multiple common service components and the account resource comprises:
sending a network request instruction to a server through the second communication module, the network request instruction requesting the server to download, according to the scheduling acquisition task, a second task result corresponding to that task, wherein the second communication module sends the second task result to the deduplication module, the deduplication module receives the second task result, deduplicates it and sends the deduplicated second task result back to the second communication module, and the second communication module feeds the deduplicated second task result back to the collection scheduling device; and
receiving the deduplicated second task result returned by the second communication module, and parsing it to determine the collection result.
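As an illustration of the deduplication step in this design, the following is a minimal Python sketch. The class name `DedupFilter`, the use of SHA-256 content fingerprints and the shape of the records are all assumptions made for illustration; the application does not specify how duplicates are detected.

```python
import hashlib
import json


class DedupFilter:
    """Hypothetical deduplication module: drops records already seen."""

    def __init__(self):
        self._seen = set()  # fingerprints of every record seen so far

    @staticmethod
    def _fingerprint(record: dict) -> str:
        # Assumption: two records are duplicates when their JSON content matches.
        payload = json.dumps(record, sort_keys=True, ensure_ascii=False)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def filter(self, records: list) -> list:
        """Return only the records not seen before, preserving their order."""
        fresh = []
        for record in records:
            key = self._fingerprint(record)
            if key not in self._seen:
                self._seen.add(key)
                fresh.append(record)
        return fresh
```

Because the filter keeps its fingerprints between calls, a record downloaded again by a later task is also dropped, which is one plausible reading of deduplicating task results before they are returned to the collection scheduling device.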
In one possible design, the multiple common service components further include a writing module;
and after the collection result is determined, the method further comprises:
writing the collection result into a preset directory through the writing module, and controlling the size of each single file that stores the collection result in the preset directory to be less than or equal to a preset size.
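One way to keep each file within a preset size is to roll over to a new file before a write would exceed the limit. The sketch below assumes line-oriented results; the class name `RollingWriter`, the file naming pattern and the byte-based limit are hypothetical choices, not details taken from the application.

```python
import os


class RollingWriter:
    """Hypothetical writing module: appends result lines to files in a preset
    directory and starts a new file before one would exceed max_bytes."""

    def __init__(self, directory: str, max_bytes: int, prefix: str = "result"):
        self.directory = directory
        self.max_bytes = max_bytes
        self.prefix = prefix
        self._index = 0
        os.makedirs(directory, exist_ok=True)

    def _path(self) -> str:
        return os.path.join(self.directory, f"{self.prefix}_{self._index:05d}.txt")

    def write(self, line: str) -> str:
        """Append one result line and return the path it was written to."""
        data = (line + "\n").encode("utf-8")
        path = self._path()
        current = os.path.getsize(path) if os.path.exists(path) else 0
        # Roll over when appending would push a non-empty file past the limit.
        if current > 0 and current + len(data) > self.max_bytes:
            self._index += 1
            path = self._path()
        with open(path, "ab") as handle:
            handle.write(data)
        return path
```

In this sketch a single line longer than `max_bytes` is still written whole into its own file, so a real implementation would need to split or reject such records to honour the limit strictly.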
In one possible design, the social media information acquisition method further comprises:
sending a third request message to the account resource service device, the third request message requesting a target account resource list of the account resources stored in the account resource service device, the target account resource list being the account resource list at the current time;
receiving the target account resource list, and monitoring the account resources in the target account resource list; and
sending any abnormal account resource in the target account resource list to the account resource service device, so that the account resource service device handles the abnormal account resource.
In a second aspect, an embodiment of the present application provides a social media information acquisition method, applied to a capture program dynamic loading device, comprising:
triggering an instruction to start collection scheduling devices, the instruction being used to start multiple collection scheduling devices, so that the multiple collection scheduling devices each send a first request message to an account resource service device, the first request message requesting the account resource service device to allocate a corresponding account resource to each of the multiple collection scheduling devices according to characteristic information of the social media to be collected, so that each collection scheduling device generates and/or derives corresponding scheduling acquisition tasks according to the account resource; and
receiving second request messages sent by the multiple collection scheduling devices respectively, each second request message requesting a call to multiple common service components provided by the capture program dynamic loading device, so that each collection scheduling device determines a corresponding collection result, according to its scheduling acquisition tasks, by means of the multiple common service components and the account resource, the collection result representing the acquisition data obtained by performing social media information acquisition on the social media to be collected.
In one possible design, triggering the instruction to start collection scheduling devices comprises:
querying a preset scheduling device deployment directory in the capture program dynamic loading device;
obtaining the identifiers of the multiple collection scheduling devices, comparing them with the identifiers of the collection scheduling devices stored in the scheduling device deployment directory, and determining, among the multiple collection scheduling devices, the identifiers of collection scheduling devices whose operation is to be terminated and the identifiers of collection scheduling devices that have not been started;
stopping, according to the identifiers of the collection scheduling devices whose operation is to be terminated, the corresponding collection scheduling devices among the multiple collection scheduling devices; and
triggering, according to the identifiers of the collection scheduling devices that have not been started, a start operation on the corresponding collection scheduling devices, the start operation including triggering the instruction to start collection scheduling devices.
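The comparison of identifiers in this design can be sketched as a set difference. The function name `reconcile` is hypothetical, and the rule used here (stop what is running but no longer in the deployment directory, start what is in the directory but not running) is an assumed reading of the comparison, since the application does not spell out the criteria.

```python
def reconcile(deployed_ids, running_ids):
    """Compare deployed scheduler identifiers with running ones.

    Returns (to_stop, to_start):
      to_stop  - running schedulers that are absent from the deployment directory
      to_start - deployed schedulers that are not running yet
    """
    deployed = set(deployed_ids)
    running = set(running_ids)
    to_stop = sorted(running - deployed)
    to_start = sorted(deployed - running)
    return to_stop, to_start
```

For example, with `site_a` and `site_b` deployed while `site_b` and `site_old` are running, the loader would stop `site_old` and start `site_a`.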
In a third aspect, an embodiment of the present application provides a social media information acquisition method, applied to an account resource service device, comprising:
receiving first request messages sent by multiple collection scheduling devices respectively; and
allocating, according to the first request messages and based on characteristic information of the social media to be collected, a corresponding account resource to each of the multiple collection scheduling devices, so that each collection scheduling device generates and/or derives scheduling acquisition tasks according to the account resource and, according to the scheduling acquisition tasks, determines a collection result by means of multiple common service components and the account resource, the collection result representing the acquisition data obtained by performing social media information acquisition on the social media to be collected, the multiple common service components being provided to each collection scheduling device by the capture program dynamic loading device.
In one possible design, the social media information acquisition method further comprises:
assigning, according to multiple account attributes preset in the account resource service device, periodic login, periodic check and account resource distribution tasks to the account corresponding to each of the multiple account attributes; and
executing the assigned periodic login, periodic check and account resource distribution tasks to generate the account resource, the account resource including account login information and an application program interface.
In a fourth aspect, an embodiment of the present application provides a social media information acquisition device, comprising:
a first request message sending module, configured to send a first request message to an account resource service device, the first request message requesting the account resource service device to allocate a corresponding account resource to the collection scheduling device according to characteristic information of the social media to be collected;
a scheduling acquisition task generation module, configured to generate and/or derive scheduling acquisition tasks according to the account resource;
a second request message sending module, configured to send a second request message to a capture program dynamic loading device, the second request message requesting a call to multiple common service components provided by the capture program dynamic loading device; and
a collection result determining module, configured to determine a collection result, according to the scheduling acquisition tasks, by means of the multiple common service components and the account resource, the collection result representing the acquisition data obtained by performing social media information acquisition on the social media to be collected.
In a fifth aspect, an embodiment of the present application provides a social media information acquisition device, comprising:
a starting module, configured to trigger an instruction to start collection scheduling devices, the instruction being used to start multiple collection scheduling devices, so that the multiple collection scheduling devices each send a first request message to an account resource service device, the first request message requesting the account resource service device to allocate a corresponding account resource to each of the multiple collection scheduling devices according to characteristic information of the social media to be collected, so that each collection scheduling device generates and/or derives corresponding scheduling acquisition tasks according to the account resource; and
a second request message receiving module, configured to receive second request messages sent by the multiple collection scheduling devices respectively, each second request message requesting a call to multiple common service components provided by the capture program dynamic loading device, so that each collection scheduling device determines a corresponding collection result, according to its scheduling acquisition tasks, by means of the multiple common service components and the account resource, the collection result representing the acquisition data obtained by performing social media information acquisition on the social media to be collected.
In a sixth aspect, an embodiment of the present application provides a social media information acquisition device, comprising:
a first request message receiving module, configured to receive first request messages sent by multiple collection scheduling devices respectively; and
an account resource management module, configured to allocate, according to the first request messages and based on characteristic information of the social media to be collected, a corresponding account resource to each of the multiple collection scheduling devices, so that each collection scheduling device generates and/or derives scheduling acquisition tasks according to the account resource and, according to the scheduling acquisition tasks, determines a collection result by means of multiple common service components and the account resource, the collection result representing the acquisition data obtained by performing social media information acquisition on the social media to be collected, the multiple common service components being provided to each collection scheduling device by the capture program dynamic loading device.
In a seventh aspect, an embodiment of the present application provides a social media information acquisition system, including at least one of the social media information acquisition device described in the fourth aspect, the social media information acquisition device described in the fifth aspect, and the social media information acquisition device described in the sixth aspect.
In an eighth aspect, an embodiment of the present application provides social media information acquisition equipment, comprising: at least one processor and a memory;
the memory stores computer-executable instructions; and
the at least one processor executes the computer-executable instructions stored in the memory, so that the at least one processor performs the social media information acquisition method described in the first aspect and its various possible designs, the second aspect and its various possible designs, and the third aspect and its various possible designs.
In a ninth aspect, an embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the social media information acquisition method described in the first aspect and its various possible designs, the second aspect and its various possible designs, and the third aspect and its various possible designs.
In the social media information acquisition method, device, system, equipment and storage medium provided in this embodiment, a first request message is first sent to the account resource service device to request it to allocate a corresponding account resource to the collection scheduling device according to the characteristic information of the social media to be collected; scheduling acquisition tasks are generated and/or derived according to the account resource; a second request message is sent to the capture program dynamic loading device to request a call to the multiple common service components it provides; and a collection result, that is, the acquisition data obtained by performing social media information acquisition on the social media to be collected, is then determined according to the scheduling acquisition tasks by means of the multiple common service components and the account resource. In this scheme, the collection scheduling device applies to the account resource service device for an account resource, generates or derives scheduling acquisition tasks directly in combination with the account resource, and then obtains the acquisition data according to those tasks by means of the common service components provided by the capture program dynamic loading device and the corresponding account resource. This achieves automated information acquisition with a method applicable to social media in general, and addresses the problems of long development cycles and low collection efficiency.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show some embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow diagram of the social media information acquisition method provided by an embodiment of the present application;
Fig. 2 is a flow diagram of the social media information acquisition method provided by another embodiment of the present application;
Fig. 3 is a flow diagram of the social media information acquisition method provided by yet another embodiment of the present application;
Fig. 4 is a flow diagram of the social media information acquisition method provided by still another embodiment of the present application;
Fig. 5 is a flow diagram of the social media information acquisition method provided by yet another embodiment of the present application;
Fig. 6 is a flow diagram of the social media information acquisition method provided by another embodiment of the present application;
Fig. 7 is a flow diagram of the social media information acquisition method provided by yet another embodiment of the present application;
Fig. 8 is a flow diagram of the social media information acquisition method provided by still another embodiment of the present application;
Fig. 9 is a structural schematic diagram of the social media information acquisition device provided by an embodiment of the present application;
Fig. 10 is a structural schematic diagram of the social media information acquisition device provided by yet another embodiment of the present application;
Fig. 11 is a structural schematic diagram of the social media information acquisition device provided by another embodiment of the present application;
Fig. 12 is a structural schematic diagram of the social media information acquisition system provided by an embodiment of the present application;
Fig. 13 is a structural schematic diagram of the social media information acquisition equipment provided by an embodiment of the present application.
Detailed description of the embodiments
To make the purposes, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings in the embodiments. Evidently, the described embodiments are some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without creative effort shall fall within the protection scope of the present application.
The terms "first", "second", "third", "fourth" and the like (if present) in the description, the claims and the above drawings of the present application are used to distinguish similar objects and are not used to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the application described herein can, for example, be implemented in an order other than those illustrated or described herein. In addition, the terms "include" and "have" and any variations of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product or equipment that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to such a process, method, product or equipment.
Fig. 1 is a flow diagram of the social media information acquisition method provided by an embodiment of the present application. The executing entity of this embodiment is a collection scheduling device. There may be multiple collection scheduling devices, each configured according to the characteristic information of a social media platform to be collected. When information is to be collected from one or more social media platforms, the corresponding collection scheduling devices can be started according to the characteristic information of those platforms to begin social media information acquisition.
Referring to Fig. 1, the social media information acquisition method comprises:
S101, sending a first request message to an account resource service device, the first request message requesting the account resource service device to allocate a corresponding account resource to the collection scheduling device according to characteristic information of the social media to be collected.
In this embodiment, when the collection scheduling device collects information from the social media to be collected, it first applies to the account resource service device for an account resource, requesting the account resource service device to allocate a corresponding account resource to it according to the characteristic information of the social media to be collected. The account resource may include account login information (including the account), a cookie (data stored on the user's local terminal), a token (an identity identifier used when logging in to the system) and management control information. The management control information may include the time at which the account ID was last used, the allowed usage interval, and the like.
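The fields listed above could be grouped into a small record type. The sketch below is only an illustration: the class name `AccountResource`, the field names, and the use of epoch seconds for the last-use time are assumptions, and the availability rule is one plausible way to apply the allowed usage interval.

```python
from dataclasses import dataclass


@dataclass
class AccountResource:
    """Hypothetical container for one account resource."""

    account_id: str
    login_info: dict             # account login information, e.g. user name
    cookie: str = ""             # cookie data for the session
    token: str = ""              # identity identifier used when logging in
    last_used_at: float = 0.0    # management control: last use time (epoch seconds)
    min_interval_s: float = 0.0  # management control: allowed usage interval

    def is_available(self, now: float) -> bool:
        """True when the allowed usage interval has elapsed since the last use."""
        return now - self.last_used_at >= self.min_interval_s

    def mark_used(self, now: float) -> None:
        """Record that the account was just used for a request."""
        self.last_used_at = now
```

A scheduler could call `is_available` before attaching an account to a task and `mark_used` afterwards, so that requests made with the same account are spaced out.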
The collection scheduling device may be a scheduler, and the scheduler may include a resource service communication module and a scheduling module. The resource service communication module communicates with the account management module to request the account resource list at the current time, that is, the latest list of the stored account resources, and sends that list to the scheduling module. The scheduling module receives the current account resource list sent by the resource service communication module and manages the account resources in the list.
Specifically, the resource service communication module periodically communicates with the account resource service device, requests the latest account resource list and passes it to the scheduling module so that the account resources remain usable. It also reports accounts that the scheduling module has detected as abnormal to the account resource service device, so that failed accounts are handled promptly. The current account resource list can be understood as the set of accounts available to the scheduler. It is provided by the account resource service device and can be updated automatically; each account resource obtained is the latest data held in memory by the account resource service device, which keeps the collected data up to date.
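One cycle of this periodic exchange might look like the following sketch. `InMemoryAccountService` is a stand-in written for this example, and its two methods are hypothetical; the application does not define the interface of the account resource service device.

```python
class InMemoryAccountService:
    """Stand-in for the account resource service device (hypothetical API)."""

    def __init__(self, accounts):
        self.accounts = dict(accounts)  # account id -> status string
        self.reported = []              # abnormal ids received from schedulers

    def latest_account_list(self):
        """Return the ids of the accounts that are currently usable."""
        return [aid for aid, status in self.accounts.items() if status == "ok"]

    def report_abnormal(self, account_ids):
        """Mark the reported accounts as abnormal so they can be handled."""
        for aid in account_ids:
            self.accounts[aid] = "abnormal"
            self.reported.append(aid)


def refresh_cycle(service, failed_ids):
    """One periodic cycle: report failed accounts, then fetch the current list."""
    if failed_ids:
        service.report_abnormal(sorted(failed_ids))
    return service.latest_account_list()
```

Reporting before fetching means the list returned in the same cycle already excludes the accounts that just failed, which is a design choice of this sketch rather than something the application states.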
S102, generating and/or deriving scheduling acquisition tasks according to the account resource.
In this embodiment of the application, the scheduling module in the scheduler is responsible for generating and/or deriving scheduling acquisition tasks. When doing so it combines the corresponding account resource, so that the generated and/or derived scheduling acquisition tasks carry the account resource and its attributes. A scheduling acquisition task here is a task that contains all the information needed for an HTTP (Hypertext Transfer Protocol) request, that is, a network request.
The scheduler can generate scheduling acquisition tasks directly, can derive other scheduling acquisition tasks while generating one, or can derive other scheduling acquisition tasks directly. For example, one task may derive n other tasks: a task that collects one person's profile may derive tasks that collect the profiles of other people (through friend relationships and the like). In addition, the scheduling module in the scheduler uses a single-threaded design and exchanges and manages data through a buffer queue, which reduces the likelihood of bugs during development of the social media information acquisition program and lowers subsequent maintenance costs.
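The single-threaded design with a buffer queue and task derivation can be sketched as a simple loop. The function name `run_scheduler`, the `handle` callback and the `max_tasks` cap are hypothetical; the application only states that a single thread and a buffer queue are used.

```python
from collections import deque


def run_scheduler(seed_tasks, handle, max_tasks=100):
    """Process tasks one at a time from a buffer queue on a single thread.

    handle(task) must return (result, derived_tasks); derived tasks that have
    not been queued before are appended to the queue.
    """
    queue = deque(seed_tasks)
    seen = set(seed_tasks)
    results = []
    while queue and len(results) < max_tasks:
        task = queue.popleft()
        result, derived = handle(task)
        results.append(result)
        for new_task in derived:
            if new_task not in seen:  # avoid queuing the same task twice
                seen.add(new_task)
                queue.append(new_task)
    return results
```

With a handler that returns a user's profile and that user's friends as derived tasks, a single seed task fans out to every reachable profile, which mirrors the friend-relationship example above. Because only one thread touches the queue, no locking is needed.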
S103, the second solicited message is sent to capture program dynamic loading device, second solicited message is for requesting The multiple Common Services Components for calling the capture program dynamic loading device to provide.
In this embodiment, when the collection scheduling device collects information from the social media to be collected, it also needs to request the capture program dynamic loading device to invoke the multiple common service components that device provides, in order to download, deduplicate, and persist the collected data. The capture program dynamic loading device can support multiple schedulers running at the same time, and offers the download, deduplication, and persistence functions to multiple independent schedulers simultaneously. The multiple common service components may include at least one of the following: a first communication module, a deduplication module, and a writing module.
S104, determine a collection result according to the scheduling acquisition tasks, through the multiple common service components and the account resource, where the collection result indicates the acquisition data obtained after social media information collection is performed on the social media to be collected.
In this embodiment, the collection scheduling device, i.e. the scheduler, allocates scheduling acquisition tasks sensibly according to the characteristic information of the social media to be collected, and collects data using the common modules provided by the capture program dynamic loading framework and the account resources provided by the account resource service device.
In practice, when developing the scheduler for a given social media platform, the developer starts from a base scheduler template, implements the required interfaces, sets up the scheduler's internal configuration file, and places the scheduler in the preset scheduler deployment directory, so that the capture program dynamic loading framework can query and monitor the running state of each scheduler. Multiple schedulers can run at the same time, each periodically communicating with the account resource service device, that is, requesting account resources from it.
In the social media information acquisition method provided by this embodiment, first request information is first sent to the account resource service device, to request that device to allocate the corresponding account resource to the collection scheduling device according to the characteristic information of the social media to be collected. Scheduling acquisition tasks are then generated and/or derived according to the account resource, and second request information is sent to the capture program dynamic loading device to request invocation of the multiple common service components it provides. Finally, according to the scheduling acquisition tasks, the collection result, i.e. the acquisition data obtained after social media information collection is performed on the social media to be collected, is determined through the multiple common service components and the account resource. In this scheme the collection scheduling device requests account resources from the account resource service device, combines them to generate or derive scheduling acquisition tasks, and then obtains the acquisition data by invoking the common service components provided by the capture program dynamic loading device together with the corresponding account resource. This automates information collection, applies to social media in general, and addresses the problems of long development cycles and low collection efficiency.
Fig. 2 is a schematic flowchart of the social media information acquisition method provided by another embodiment of the application. On the basis of the embodiment shown in Fig. 1, this embodiment describes step S104 in detail. The multiple common service components include a first communication module. As shown in Fig. 2, determining the collection result according to the scheduling acquisition tasks, through the multiple common service components and the account resource, includes:
S201, send a network request instruction to a server through the first communication module, where the network request instruction includes the account resource and the scheduling acquisition task, and is used to request the server to download, according to the scheduling acquisition task, the first task result corresponding to that task and, according to the account resource, to feed the first task result back to the collection scheduling device through the first communication module.
In this embodiment, the first communication module implements communication between the collection scheduling device (the scheduler) and the account resource service device or the server, i.e. communication between the scheduling module and the account resource service device or the server.
Specifically, when social media information is collected according to an acquisition task, a network request instruction is first sent to the server through the first communication module provided by the capture program dynamic loading device, to request the server to download, according to the scheduling acquisition task, the first task result corresponding to that task. For example, for a scheduling acquisition task that visits a given person's microblog, the microblog homepage content returned by the server can serve as the corresponding first task result. The server then feeds the first task result back to the corresponding collection scheduling device through the first communication module.
S202, receive the first task result returned by the first communication module, and parse the first task result to determine the collection result.
In this embodiment, after the collection scheduling device receives the first task result returned by the first communication module, it post-processes the first task result. Post-processing means extracting the valuable data from the response body (as defined in the HTTP protocol) returned by the server of the social media to be collected; the extracted data is the collection result, i.e. the acquisition data obtained after social media information collection is performed on the social media to be collected. The scheduling module is therefore also used to receive the first task result returned by the first communication module and parse it to obtain the acquisition data, where the first task result can be the result produced when the generated and/or derived scheduling acquisition task accesses the social media through the server. Because the multiple schedulers each run as an independent single thread, a failure in one scheduler does not affect the operation of the others. This gives loose coupling in information collection, and because the whole collection process is automated, collection efficiency improves.
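As an illustration of the post-processing step, the sketch below extracts a few fields from a JSON response body. The patent does not specify the response format; the `posts`, `id`, and `text` keys and the function name are assumptions chosen for the example.

```python
import json


def parse_task_result(response_body):
    """Keep only the fields of interest from a (hypothetical) JSON response body."""
    payload = json.loads(response_body)
    records = []
    for post in payload.get("posts", []):
        records.append({"id": post["id"], "text": post.get("text", "").strip()})
    return records


body = '{"posts": [{"id": 1, "text": " hello ", "ad": true}], "banner": "x"}'
print(parse_task_result(body))
```

Anything outside the selected fields (here the `ad` flag and the `banner`) is dropped, so only the data judged valuable becomes part of the collection result.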
Fig. 3 is a schematic flowchart of the social media information acquisition method provided by yet another embodiment of the application. On the basis of the above embodiments, for example the embodiment shown in Fig. 1, this embodiment describes step S104 in detail. In this embodiment the multiple common service components include a second communication module and a deduplication module. As shown in Fig. 3, determining the collection result according to the scheduling acquisition tasks, through the multiple common service components and the account resource, includes:
S301, send a network request instruction to the server through the second communication module, where the network request instruction is used to request the server to download, according to the scheduling acquisition task, the second task result corresponding to that task. The deduplication module is used to receive the second task result sent by the second communication module, deduplicate it, and send the deduplicated second task result back to the second communication module. The second communication module is used to send the second task result to the deduplication module and to feed the deduplicated second task result back to the collection scheduling device.
Compared with the common service components in the embodiment shown in Fig. 2, the common service components in this embodiment additionally include the deduplication module. The second communication module differs from the first communication module as follows: after receiving the second task result sent by the server (which can be the same as the first task result), the second communication module sends it directly to the deduplication module. The deduplication module receives the second task result, deduplicates it, and sends the deduplicated second task result back to the second communication module, which then feeds it back to the collection scheduling device.
S302, receive the deduplicated second task result returned by the second communication module, and parse it to determine the collection result.
In this embodiment, after the collection scheduling device receives the deduplicated second task result returned by the second communication module, it post-processes that result. The post-processing is the same as in S202 and is not repeated here. This collection process deduplicates the task results produced when the server accesses the corresponding social media website, which reduces the amount of parsing the collection scheduling device has to do, improves parsing efficiency, and in turn improves collection efficiency.
In one possible design, on the basis of the above embodiments, for example any of the embodiments shown in Figs. 1-3, this embodiment describes in detail what happens after the collection result is determined in step S104. The multiple common service components in this embodiment can also include a writing module. After the collection result is determined, the method further includes: writing the collection result into a predetermined directory through the writing module, and keeping the size of each single file that stores the collection result in the predetermined directory less than or equal to a preset size.
In this embodiment, the writing module is responsible for writing the persisted results (collection results) produced by each scheduler into the predetermined directory according to that scheduler's configuration file, controlling the size of each single file in the predetermined directory, and cutting files periodically. A single file is one text file that stores persisted results. Cutting a file means that, once a preset condition is reached (the preset condition can include file size, time interval, and so on), output no longer continues into the current file; a new file is created and output goes there instead.
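A minimal sketch of size-based file cutting is shown below, assuming one record per line. The class name, the `prefix.index.txt` naming scheme, and the byte threshold are illustrative assumptions; a time-interval condition could be added in the same place as the size check.

```python
import os


class RotatingWriter:
    """Append records to text files, starting a new file when a size cap is hit."""

    def __init__(self, directory, prefix, max_bytes):
        self.directory = directory
        self.prefix = prefix
        self.max_bytes = max_bytes
        self.index = 0
        os.makedirs(directory, exist_ok=True)

    def _path(self):
        return os.path.join(self.directory, f"{self.prefix}.{self.index}.txt")

    def write(self, record):
        data = (record + "\n").encode("utf-8")
        path = self._path()
        current = os.path.getsize(path) if os.path.exists(path) else 0
        # Cut the file: if this record would push a non-empty file past the
        # cap, stop writing to it and continue in a new file.
        # (A single record larger than max_bytes still gets its own file.)
        if current > 0 and current + len(data) > self.max_bytes:
            self.index += 1
            path = self._path()
        with open(path, "ab") as handle:
            handle.write(data)
```

With `max_bytes=10`, writing three 4-character records fills the first file with two lines (10 bytes) and sends the third record to a second file.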
Fig. 4 is a schematic flowchart of the social media information acquisition method provided by a further embodiment of the application. On the basis of the above embodiments, this embodiment describes the implementation of the social media information acquisition method in more detail. As shown in Fig. 4, the social media information acquisition method further includes:
S401, send third request information to the account resource service device, where the third request information is used to request the target account resource list of the account resources stored in the account resource service device, the target account resource list being the account resource list at the current time.
In this embodiment, during collection the collection scheduling device can also send third request information to the account resource service device, to request the current account resource list of the account resources stored in the account resource service device.
In practice, the resource service communication module communicates with the account management module through the communication module, requests the account resource list at the current time in which the account resources are stored, and sends that list to the scheduling module, so that the scheduling module can manage the account resources in the current account resource list.
S402, receive the target account resource list, and monitor the account resources in the target account resource list.
In this embodiment, the collection scheduling device can receive the target account resource list of the account resources in the account resource service device and monitor the account resources in that list. Specifically, code written by the developer judges whether an account in use is normal from the format of the data the social media returns.
S403, send the abnormal account resources in the target account resource list to the account resource service device, so that the account resource service device handles the abnormal account resources.
In this embodiment, the account resources are updated to the scheduling module, which ensures that they are usable, and the abnormal account resources in the target account resource list are sent to the account resource service device. In other words, accounts that the scheduling module considers possibly abnormal are reported to the account resource service device, so that failed accounts are handled promptly and maintenance is simple and fast.
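The monitoring in S402 and S403 can be sketched as a check on the format of the most recent response for each account. The rule used here (a normal response is a JSON object with a `data` key) and both function names are assumptions for illustration; a real scheduler would apply whatever format rule fits its platform.

```python
import json


def looks_normal(response_body):
    """Judge an account by the format of the data the platform returned."""
    try:
        payload = json.loads(response_body)
    except ValueError:
        return False  # e.g. an HTML login page came back instead of JSON
    return isinstance(payload, dict) and "data" in payload


def split_accounts(last_responses):
    """Split accounts into (usable, abnormal) from their latest responses."""
    usable, abnormal = [], []
    for account, body in last_responses.items():
        (usable if looks_normal(body) else abnormal).append(account)
    return usable, abnormal


responses = {
    "acct-a": '{"data": []}',
    "acct-b": "<html>please log in</html>",
    "acct-c": '{"error": "account suspended"}',
}
print(split_accounts(responses))
```

The `abnormal` list is what the scheduler would report to the account resource service device, while the `usable` list stays available to the scheduling module.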
Fig. 5 is a schematic flowchart of the social media information acquisition method provided by yet another embodiment of the application. The executing subject of this embodiment is the capture program dynamic loading device. As shown in Fig. 5, the social media information acquisition method includes:
S501, trigger an instruction to start collection scheduling devices, where the instruction is used to start multiple collection scheduling devices, so that the multiple collection scheduling devices each send first request information to the account resource service device. The first request information is used to request the account resource service device to allocate, according to the characteristic information of the social media to be collected, the corresponding account resource to each of the multiple collection scheduling devices, so that each collection scheduling device generates and/or derives the corresponding scheduling acquisition tasks according to the account resource.
In this embodiment, when collecting information the capture program dynamic loading device first needs to start the collection scheduling devices, i.e. trigger the instruction to start collection scheduling devices, so that multiple collection scheduling devices are started and each generates its scheduling acquisition tasks from the account resource allocated to it. One collection scheduling device can generate at least one scheduling acquisition task, and different collection scheduling devices can generate different scheduling acquisition tasks. Each collection scheduling device is configured according to the characteristic information of the social media to be collected.
S502, receive the second request information sent by each of the multiple collection scheduling devices, where the second request information is used to request invocation of the multiple common service components provided by the capture program dynamic loading device, so that each collection scheduling device determines its collection result according to the scheduling acquisition tasks, through the multiple common service components and the account resource. The collection result indicates the acquisition data obtained after social media information collection is performed on the social media to be collected.
In this embodiment, on receiving the second request information sent by each of the multiple collection scheduling devices, the capture program dynamic loading device provides the multiple common service components to each collection scheduling device, so that each collection scheduling device determines its collection result according to the scheduling acquisition tasks, through the multiple common service components and the account resource.
The capture program dynamic loading device includes a scan-and-start module, a capture program communication module, and a writing module. The capture program communication module can be the first communication module or the second communication module.
Specifically, the scan-and-start module scans or queries the scheduler deployment directory preset in the capture program dynamic loading device, in order to monitor the running state of each collection scheduling device (scheduler). The capture program communication module sends to the server the network request instruction that each collection scheduling device issues using the account resource obtained from the account resource service device, performs the download, and returns the resulting first task result to the corresponding collection scheduling device. The writing module writes the persisted results generated and/or derived by each collection scheduling device into the predetermined directory and keeps the size of each single file that stores persisted results less than or equal to the preset size. The capture program communication module is responsible for sending and managing the network requests of all schedulers in a unified way, which load-balances the network I/O of the schedulers; depending on requirements, it supports connecting to a distributed download cluster, using HTTP proxies, or accessing the network directly from the local machine. A scheduler sends the capture program communication module a task (a scheduling acquisition task) containing all the information an HTTP request needs, and the capture program communication module returns to the scheduler all the data obtained for that HTTP request (the request information and the returned information).
In the social media information acquisition method provided by this embodiment, the capture program dynamic loading device triggers the instruction to start collection scheduling devices, so that multiple collection scheduling devices each request account resources from the account resource service device, combine those account resources to generate or derive scheduling acquisition tasks, and then, according to the scheduling acquisition tasks, obtain the acquisition data through the common service components provided by the capture program dynamic loading device and the corresponding account resources. This automates information collection, applies to social media in general, and addresses the problems of long development cycles and low collection efficiency.
Referring to Fig. 6, Fig. 6 is a schematic flowchart of the social media information acquisition method provided by another embodiment of the application. On the basis of the embodiment shown in Fig. 5, this embodiment describes the implementation of step S501 in detail. As shown in Fig. 6, triggering the instruction to start collection scheduling devices includes:
S601, query the dispatching device deployment directory preset in the capture program dynamic loading device.
In this embodiment, a dispatching device deployment directory is preset in the capture program dynamic loading device. The capture program dynamic loading device can periodically scan or query this directory through the scan-and-start module and monitor the running state of each dispatching device. The running state of a scheduler can include not started, removed from the dispatching device deployment directory, updated, and so on.
S602, obtain the identifiers of the multiple collection scheduling devices, compare them with the identifiers of the collection scheduling devices stored in the dispatching device deployment directory, and determine, among the multiple collection scheduling devices, the identifiers of the collection scheduling devices to be terminated and the identifiers of the collection scheduling devices that have not been started.
In this embodiment, the multiple collection scheduling devices can be collection scheduling devices that exist in the background. Their identifiers are obtained from the background and compared with the identifiers of the collection scheduling devices stored in the dispatching device deployment directory, to find the collection scheduling devices that have been removed from the deployment directory, have been updated, or have not been started. This determines the identifiers of the collection scheduling devices to be terminated, of those not yet started, and of those that have been updated. Whether a collection scheduling device has been updated is judged from its modification time information. Specifically, any change to the code file, which typically happens when the code version is updated, counts as a modification; that is, the file is considered modified if its mtime (modification time) has changed. A collection scheduling device in the updated state is restarted.
S603, according to the identifiers of the collection scheduling devices to be terminated, stop the collection scheduling devices among the multiple collection scheduling devices that correspond to those identifiers.
In this embodiment, after the identifiers of the collection scheduling devices to be terminated are determined, the collection scheduling devices carrying those identifiers are looked up among the multiple collection scheduling devices that exist in the background, and their operation is terminated.
S604, according to the identifiers of the collection scheduling devices that have not been started, trigger a start operation on the collection scheduling devices corresponding to those identifiers, where the start operation includes triggering the instruction to start collection scheduling devices.
In this embodiment, after the identifiers of the collection scheduling devices that have not been started are determined, the collection scheduling devices carrying those identifiers are looked up among the multiple collection scheduling devices that exist in the background and are started. The scan-and-start module in the capture program dynamic loading device also supports periodically starting schedulers whose configuration file sets a periodic execution parameter, and restarting schedulers that have stopped unexpectedly.
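The comparison in S601 to S604 can be sketched as a set difference between the schedulers that are running and those found in the deployment directory, with mtime used to spot updates. This sketch assumes, purely for illustration, that each scheduler is one `.py` file whose file name is its identifier; `scan_deployed` and `plan_actions` are hypothetical names.

```python
import os


def scan_deployed(directory):
    """Map each scheduler file in the deployment directory to its mtime."""
    return {
        entry.name: entry.stat().st_mtime
        for entry in os.scandir(directory)
        if entry.is_file() and entry.name.endswith(".py")
    }


def plan_actions(running, deployed):
    """Compare running schedulers (id -> mtime at launch) with deployed ones."""
    running_ids, deployed_ids = set(running), set(deployed)
    return {
        # Removed from the deployment directory: terminate.
        "stop": sorted(running_ids - deployed_ids),
        # Present in the directory but not running: start.
        "start": sorted(deployed_ids - running_ids),
        # Still deployed, but the file's mtime changed since launch: restart.
        "restart": sorted(
            name for name in running_ids & deployed_ids
            if deployed[name] != running[name]
        ),
    }


running = {"weibo.py": 100.0, "old.py": 50.0, "forum.py": 10.0}
deployed = {"weibo.py": 120.0, "forum.py": 10.0, "new.py": 5.0}
print(plan_actions(running, deployed))
```

Calling `plan_actions(running, scan_deployed(path))` on a timer would give the periodic scan described above; how schedulers are actually started and stopped is left out of the sketch.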
Optionally, the capture program dynamic loading device further includes a deduplication module. The deduplication module receives the second task result sent by the second communication module, deduplicates it, and sends the deduplicated second task result back to the second communication module, so that the second communication module feeds the deduplicated second task result back to the collection scheduling device.
Specifically, the deduplication module provides deduplication of collection results and prevents duplicate data from being persisted. By default it uses a key-value storage structure implemented on BerkeleyDB. (Berkeley DB is an open-source file database that sits between relational databases and in-memory databases. It is used much like an in-memory database: it provides a set of functions that access the database directly, without the network communication, SQL parsing, and similar steps a relational database requires. Key-value storage offers fast queries, large storage capacity, and support for high concurrency.) A Bloom filter component can also be added to it to support deduplication at very large data volumes.
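The combination of a key-value store with a Bloom filter can be sketched as below. This is not the BerkeleyDB-backed module itself: a plain Python dict stands in for the key-value store, and the Bloom filter parameters (65,536 bits, 4 hash positions derived from SHA-256) are arbitrary assumptions for the example.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=1 << 16, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


class Deduplicator:
    def __init__(self):
        self.bloom = BloomFilter()
        self.store = {}  # stand-in for the on-disk key-value store

    def is_new(self, key):
        # The Bloom filter answers "definitely unseen" cheaply; the exact
        # store is consulted only when the filter says "possibly seen".
        if key in self.bloom and key in self.store:
            return False
        self.bloom.add(key)
        self.store[key] = True
        return True


dedup = Deduplicator()
print([key for key in ["post-1", "post-2", "post-1"] if dedup.is_new(key)])
```

The design point is that most brand-new keys are rejected by the in-memory filter without touching the store, which is what makes the approach attractive at large data volumes.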
In practice, the capture program dynamic loading device is responsible for starting the collection scheduling devices and providing them with common service components such as download, deduplication, and persistence. The capture program dynamic loading device supports starting multiple collection scheduling devices independently at the same time, and supports functions such as hot-plug updating, automatic restart, and timed start of collection scheduling devices.
Fig. 7 is a schematic flowchart of the social media information acquisition method provided by yet another embodiment of the application. The executing subject of this embodiment is the account resource service device. As shown in Fig. 7, the social media information acquisition method includes:
S701, receive the first request information sent by each of multiple collection scheduling devices.
In this embodiment, when information is collected the account resource service device receives the first request information sent by each of the multiple collection scheduling devices and allocates account resources to each of them. That is, the account resource service device is responsible for providing the account resources of each social media platform, and can perform management operations such as automatic login, periodic checking, and resource distribution for every account of every social media platform added to it.
S702, according to the first request information, allocate the corresponding account resource to each of the multiple collection scheduling devices based on the characteristic information of the social media to be collected, so that each collection scheduling device generates and/or derives scheduling acquisition tasks according to the account resource and, according to the scheduling acquisition tasks, determines the collection result through multiple common service components and the account resource. The collection result indicates the acquisition data obtained after social media information collection is performed on the social media to be collected, and the multiple common service components are provided to each collection scheduling device by the capture program dynamic loading device.
In this embodiment, after receiving the first request information, the account resource service device allocates the corresponding account resource to each of the multiple collection scheduling devices based on the characteristic information of the social media to be collected, so that each collection scheduling device generates and/or derives scheduling acquisition tasks according to the account resource and, according to the scheduling acquisition tasks, determines the collection result through multiple common service components and the account resource.
In the social media information acquisition method provided by this embodiment, the account resource service device receives the first request information and, according to the characteristic information of the social media to be collected, allocates the corresponding account resource to each of the multiple collection scheduling devices, so that each collection scheduling device generates and/or derives scheduling acquisition tasks according to the account resource and obtains the acquisition data according to those tasks through multiple common service components and the account resource. This automates information collection, applies to social media in general, and addresses the problems of long development cycles and low collection efficiency.
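How the account resource service device might hand out accounts per platform can be sketched as follows. The patent does not state an allocation policy; the round-robin rotation, the class name `AccountPool`, and its methods are all assumptions made for illustration, with the platform name standing in for the characteristic information of the social media to be collected.

```python
from collections import defaultdict


class AccountPool:
    """Hands out accounts per platform, rotating through the pool (assumed policy)."""

    def __init__(self):
        self.pools = defaultdict(list)   # platform -> list of accounts
        self.cursor = defaultdict(int)   # platform -> next position to hand out

    def add_account(self, platform, account):
        self.pools[platform].append(account)

    def allocate(self, platform, count):
        pool = self.pools[platform]
        if not pool:
            return []  # no accounts registered for this platform
        start = self.cursor[platform]
        picked = [pool[(start + i) % len(pool)] for i in range(min(count, len(pool)))]
        self.cursor[platform] = (start + len(picked)) % len(pool)
        return picked


service = AccountPool()
for name in ["a", "b", "c"]:
    service.add_account("weibo", name)
print(service.allocate("weibo", 2))
print(service.allocate("weibo", 2))
```

Rotating through the pool spreads requests from several schedulers across the available accounts; a scheduler asking for a platform with no registered accounts simply receives an empty list.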
Referring to Fig. 8, Fig. 8 is a schematic flowchart of the social media information acquisition method provided by a further embodiment of the application. On the basis of the embodiment shown in Fig. 7, this embodiment describes the implementation in detail. As shown in Fig. 8, the social media information acquisition method further includes:
S801, according to multiple account attributes preset in the account resource service device, assign to the account corresponding to each of the multiple account attributes the tasks of periodic login, periodic checking, and account resource distribution.
In this embodiment, the account resource service device includes an account management module, a network request module, and an account communication module. The account management module assigns, according to the multiple account attributes preset in the account resource service device, the tasks of periodic login, periodic checking, and account resource distribution to the account corresponding to each of those account attributes. The network request module executes the tasks assigned by the account management module and obtains resource tasks, where a resource task indicates that the managed account resources are to be allocated to the corresponding collection scheduling devices. The account communication module communicates with the user terminal and with each collection scheduling device, so that the user terminal can manage accounts and each collection scheduling device can request account resources.
S802, execute the assigned tasks of periodic login, periodic checking, and account resource distribution to generate the account resource, where the account resource includes account login information and an application program interface.
In this embodiment, the account communication module is responsible for exposing an HTTP interface to the outside. Through it, the outside world (the user terminal) can add, change, and delete accounts, and collection scheduling devices can request resources and report failed accounts. The interface can be replaced with an HTTPS interface if security requirements call for it. Once an account has been added, all account management and maintenance work is completed automatically without manual intervention, except when the account is banned by the social media platform. Request management and maintenance are supported for the various resources that collection needs, such as cookies and tokens. The account management module can rely on a relational database for data persistence, which ensures that accounts are not affected however the program stops. The network request module is responsible for executing the tasks assigned by the account management module through a distributed collection cluster.
In practical applications, which passes through http interface (network interface) and other programs (user Hold the program of background program and collection scheduling device) communication, realization is decoupling with collection scheduling device, ensure that social matchmaker The high reliability and wide usage of body information acquisition program (social media information acquisition method).
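The resource operations described above (adding accounts, applying for a resource, releasing it, and reporting a failed account) can be illustrated with a minimal in-memory sketch. It is not the patented implementation: the class `AccountPool`, its method names, and the data layout are hypothetical, and a real service would expose these operations over HTTP and persist state in a relational database as the text describes.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class AccountPool:
    """Hypothetical in-memory stand-in for the account resource service device."""

    # platform name -> account ids that are free to hand out
    free: Dict[str, Set[str]] = field(default_factory=dict)
    # account id -> (platform, scheduler id) for accounts currently in use
    leased: Dict[str, Tuple[str, str]] = field(default_factory=dict)
    # account ids reported as abnormal and withheld from allocation
    abnormal: Set[str] = field(default_factory=set)

    def add(self, platform: str, account_id: str) -> None:
        """Register a new account for a platform (user terminal operation)."""
        self.free.setdefault(platform, set()).add(account_id)

    def apply(self, platform: str, scheduler_id: str) -> Optional[str]:
        """Allocate one free account for the platform, or None if none is left."""
        candidates = self.free.get(platform)
        if not candidates:
            return None
        account_id = sorted(candidates)[0]  # deterministic choice for the sketch
        candidates.remove(account_id)
        self.leased[account_id] = (platform, scheduler_id)
        return account_id

    def release(self, account_id: str) -> None:
        """Return a leased account to the free set of its platform."""
        platform, _ = self.leased.pop(account_id)
        self.free.setdefault(platform, set()).add(account_id)

    def report_abnormal(self, account_id: str) -> None:
        """Withdraw an account that a scheduler reported as failed."""
        self.leased.pop(account_id, None)
        self.abnormal.add(account_id)
```

Because a scheduler only ever sees the `apply`, `release`, and `report_abnormal` operations, the pool can change its storage or allocation policy without touching scheduler code, which is the decoupling the paragraph above refers to.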
Fig. 9 is a schematic structural diagram of a social media information acquisition device provided by an embodiment of the present application. As shown in Fig. 9, the social media information acquisition device 90 includes a first solicited message sending module 901, a scheduling acquisition task generation module 902, a second solicited message sending module 903, and a collection result determining module 904. The first solicited message sending module 901 is configured to send a first solicited message to the account resource service device, where the first solicited message is used to request the account resource service device to allocate a corresponding account resource to the collection scheduling device according to characteristic information of the social media to be collected. The scheduling acquisition task generation module 902 is configured to generate and/or derive a scheduling acquisition task according to the account resource. The second solicited message sending module 903 is configured to send a second solicited message to the capture program dynamic loading device, where the second solicited message is used to request invocation of multiple common service components provided by the capture program dynamic loading device. The collection result determining module 904 is configured to determine a collection result according to the scheduling acquisition task, through the multiple common service components and the account resource, where the collection result represents the acquisition data obtained after social media information acquisition is performed on the social media to be collected.
The device provided in this embodiment can be used to execute the technical solutions of the social media information acquisition method embodiments shown in Figs. 1-4 above. Its implementation principle and technical effect are similar and are not repeated here.
In a possible design, the multiple common service components include a first communication module, and the collection result determining module 904 is specifically configured to: send a network request instruction to a server through the first communication module, where the network request instruction includes the account resource and the scheduling acquisition task and is used to request the server to download, according to the scheduling acquisition task, the first task result corresponding to the scheduling acquisition task and, according to the account resource, to feed the first task result back to the collection scheduling device through the first communication module; and receive the first task result returned by the first communication module, parse the first task result, and determine the collection result.
In a possible design, the multiple common service components include a second communication module and a deduplication module, and the collection result determining module 904 is specifically configured to: send a network request instruction to a server through the second communication module, where the network request instruction is used to request the server to download, according to the scheduling acquisition task, the second task result corresponding to the scheduling acquisition task; the deduplication module is configured to receive the second task result sent by the second communication module, deduplicate it to obtain a deduplicated second task result, and send the deduplicated second task result to the second communication module; the second communication module is configured to send the second task result to the deduplication module and to feed the deduplicated second task result back to the collection scheduling device; and receive the deduplicated second task result returned by the second communication module, parse it, and determine the collection result.
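The duplicate filtering applied to the second task result can be sketched as follows. The patent does not say how duplicates are detected, so this example assumes that two records are duplicates when their JSON-serialized content is identical; the class name `Deduplicator` and its `filter` method are hypothetical.

```python
import hashlib
import json
from typing import Iterable, List, Set


class Deduplicator:
    """Hypothetical duplicate filter keyed on a hash of each record's content."""

    def __init__(self) -> None:
        # Fingerprints of every record that has already been passed through.
        self._seen: Set[str] = set()

    @staticmethod
    def _fingerprint(record: dict) -> str:
        # Sorting keys makes the fingerprint independent of key order.
        canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def filter(self, records: Iterable[dict]) -> List[dict]:
        """Return only the records not seen in this or any earlier call."""
        unique: List[dict] = []
        for record in records:
            digest = self._fingerprint(record)
            if digest not in self._seen:
                self._seen.add(digest)
                unique.append(record)
        return unique
```

Keeping the set of fingerprints inside the component, rather than inside each scheduler, matches the idea of a shared service: every scheduler that routes results through the same instance benefits from the same history.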
In a possible design, the multiple common service components further include a writing module, and the device further includes a first processing module configured to, after the collection result is determined, write the collection result into a preset directory through the writing module and keep the size of each single file that stores the collection result in the preset directory less than or equal to a preset size.
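One way to keep each result file in the preset directory at or below a preset size is to roll over to a new file before an append would exceed the cap. The sketch below assumes line-oriented output; the class name `RollingWriter`, the file naming scheme, and the `.jsonl` extension are hypothetical and not taken from the patent.

```python
import os


class RollingWriter:
    """Hypothetical writer that caps the size of each output file."""

    def __init__(self, directory: str, max_bytes: int) -> None:
        self.directory = directory
        self.max_bytes = max_bytes
        self._index = 0
        os.makedirs(directory, exist_ok=True)

    def _path(self) -> str:
        return os.path.join(self.directory, f"result_{self._index:05d}.jsonl")

    def write(self, line: str) -> str:
        """Append one line and return the path of the file it went to."""
        data = (line + "\n").encode("utf-8")
        path = self._path()
        current = os.path.getsize(path) if os.path.exists(path) else 0
        # Start a new file if appending would push this one past the cap.
        # A single record larger than max_bytes still gets its own file,
        # so the cap is only guaranteed for records that fit within it.
        if current > 0 and current + len(data) > self.max_bytes:
            self._index += 1
            path = self._path()
        with open(path, "ab") as handle:
            handle.write(data)
        return path
```

Checking the size before the write, rather than after, is what keeps every file within the limit instead of merely close to it.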
In a possible design, the device further includes: a third solicited message sending module, configured to send a third solicited message to the account resource service device, where the third solicited message is used to request a target account resource list of the account resources stored in the account resource service device, the target account resource list being the account resource list at the current time; a monitoring module, configured to receive the target account resource list and monitor the account resources in it; and a second processing module, configured to send abnormal account resources in the target account resource list to the account resource service device, so that the account resource service device processes the abnormal account resources.
Fig. 10 is a schematic structural diagram of a social media information acquisition device provided by another embodiment of the present application. As shown in Fig. 10, the social media information acquisition device 100 includes a starting module 1001 and a second solicited message receiving module 1002. The starting module 1001 is configured to trigger an instruction to start collection scheduling devices, where the instruction is used to start multiple collection scheduling devices so that each of them sends a first solicited message to the account resource service device. The first solicited message is used to request the account resource service device to allocate, according to characteristic information of the social media to be collected, a corresponding account resource to each of the multiple collection scheduling devices, so that each collection scheduling device generates and/or derives a corresponding scheduling acquisition task according to the account resource. The second solicited message receiving module 1002 is configured to receive the second solicited messages sent by the multiple collection scheduling devices, where a second solicited message is used to request invocation of multiple common service components provided by the capture program dynamic loading device, so that each collection scheduling device determines a corresponding collection result according to the scheduling acquisition task, through the multiple common service components and the account resource. The collection result represents the acquisition data obtained after social media information acquisition is performed on the social media to be collected.
The device provided in this embodiment can be used to execute the technical solutions of the social media information acquisition method embodiments shown in Fig. 5 or Fig. 6 above. Its implementation principle and technical effect are similar and are not repeated here.
In a possible design, the starting module 1001 is specifically configured to: query the scheduling device deployment directory preset in the capture program dynamic loading device; obtain the identifiers of the multiple collection scheduling devices and compare them with the identifiers of the collection scheduling devices stored in the scheduling device deployment directory, to determine the identifiers of the collection scheduling devices whose operation is to be terminated and the identifiers of the collection scheduling devices that have not been started; according to the identifiers of the collection scheduling devices whose operation is to be terminated, stop the corresponding collection scheduling devices among the multiple collection scheduling devices; and according to the identifiers of the collection scheduling devices that have not been started, trigger a start operation for the corresponding collection scheduling devices, where the start operation includes triggering the instruction to start collection scheduling devices.
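The comparison performed by the starting module amounts to two set differences. The sketch below assumes that a collection scheduling device is to be terminated when it is running but no longer listed in the deployment directory, and is not yet started when it is listed but not running; the patent does not spell out these criteria, and the function name `reconcile` is hypothetical.

```python
from typing import Iterable, Set, Tuple


def reconcile(running: Iterable[str], deployed: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Compare running scheduler ids with the ids in the deployment directory.

    Returns (to_stop, to_start):
      to_stop  - ids that are running but absent from the deployment directory
      to_start - ids present in the deployment directory but not running
    """
    running_ids = set(running)
    deployed_ids = set(deployed)
    to_stop = running_ids - deployed_ids
    to_start = deployed_ids - running_ids
    return to_stop, to_start
```

Running this comparison on a timer would give the hot-plug behavior in which adding or removing a scheduler from the directory starts or stops it without restarting the loader.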
Fig. 11 is a schematic structural diagram of a social media information acquisition device provided by yet another embodiment of the present application. As shown in Fig. 11, the social media information acquisition device 110 includes a first solicited message receiving module 1101 and an account resource management module 1102. The first solicited message receiving module 1101 is configured to receive the first solicited messages sent by multiple collection scheduling devices. The account resource management module 1102 is configured to allocate, according to the first solicited messages and the characteristic information of the social media to be collected, a corresponding account resource to each of the multiple collection scheduling devices, so that each collection scheduling device generates and/or derives a scheduling acquisition task according to the account resource and determines a collection result according to the scheduling acquisition task, through multiple common service components and the account resource. The collection result represents the acquisition data obtained after social media information acquisition is performed on the social media to be collected, and the multiple common service components are provided to each collection scheduling device by the capture program dynamic loading device.
The device provided in this embodiment can be used to execute the technical solutions of the social media information acquisition method embodiments shown in Fig. 7 or Fig. 8 above. Its implementation principle and technical effect are similar and are not repeated here.
In a possible design, the device further includes: an account resource allocation module, configured to assign, according to multiple account attributes preset in the account resource service device, tasks of periodic login, periodic inspection, and account resource distribution to the account corresponding to each of the account attributes; and an execution module, configured to execute the assigned tasks of periodic login, periodic inspection, and account resource distribution to generate account resources, where the account resources include account login information and application programming interfaces.
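The assignment of periodic tasks by account attribute can be pictured with a small sketch. The patent does not define what an account attribute is or how task periods are chosen, so the attribute names, the interval table, and the identifiers `PeriodicTask` and `assign_tasks` below are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

# The three task kinds named in the text: login, inspection, distribution.
TASK_KINDS = ("login", "check", "distribute")


@dataclass(frozen=True)
class PeriodicTask:
    account_id: str
    kind: str
    interval_seconds: int


def assign_tasks(
    accounts: Dict[str, str],
    intervals: Dict[str, Dict[str, int]],
) -> List[PeriodicTask]:
    """Create one periodic task per task kind for every account.

    accounts:  account id -> account attribute (hypothetical labels)
    intervals: account attribute -> task kind -> period in seconds
    """
    tasks: List[PeriodicTask] = []
    for account_id, attribute in sorted(accounts.items()):
        for kind in TASK_KINDS:
            tasks.append(PeriodicTask(account_id, kind, intervals[attribute][kind]))
    return tasks
```

An execution module could then run each task on its own period, for example from a priority queue ordered by next due time.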
Fig. 12 is a schematic structural diagram of a social media information acquisition system provided by an embodiment of the present application. This embodiment describes the social media information acquisition system in detail on the basis of the above embodiments. The social media information acquisition system 120 includes at least one of: the social media information acquisition device 90 described in the embodiment corresponding to Fig. 9, the social media information acquisition device 100 described in the embodiment corresponding to Fig. 10, and the social media information acquisition device 110 described in the embodiment corresponding to Fig. 11.
Here, the social media information acquisition device 90 described in the embodiment corresponding to Fig. 9 is the first social media information acquisition device 1201, the social media information acquisition device 100 described in the embodiment corresponding to Fig. 10 is the second social media information acquisition device 1202, and the social media information acquisition device 110 described in the embodiment corresponding to Fig. 11 is the third social media information acquisition device 1203.
In this embodiment, the implementation of each social media information acquisition device in the social media information acquisition system is similar to that of the corresponding device in the embodiments corresponding to Figs. 9-11 above, and is not repeated here.
In practical applications, the capture program dynamic loading device is responsible for starting the collection scheduling devices and providing them with common service components such as downloading, deduplication, and persistence to disk. It supports starting multiple collection scheduling devices independently at the same time, and supports functions such as hot-plug updating, automatic restart, and scheduled start of collection scheduling devices. The capture program dynamic loading device refreshes the account resource status from the account resource service device in real time and distributes it to each independently running collection scheduling device under it. This decouples resources from the capture program and ensures that the collection scheduling devices do not affect one another, which reduces system complexity, improves development efficiency, and lowers maintenance difficulty. The scan-and-start module inside the capture program dynamic loading device starts each collection scheduling device and monitors its running status, and also provides multiple general-purpose modules for the collection scheduling devices to use in completing download and scheduling tasks. The account resource service device relies on a relational database to manage account resources, and completes account management operations with external parties through three kinds of HTTP interfaces: resource application, resource release, and exception reporting. In implementation, it is only necessary to write a scheduler instance in the specified interface format according to the actual conditions of the specific social media and add it to the capture program dynamic loading device to complete social media collection, which significantly reduces the workload during development.
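The division of labor described above, in which a platform-specific scheduler supplies only task generation and parsing while the loader supplies shared components, can be sketched as a small interface. The patent does not publish its interface format, so `CollectionScheduler`, `generate_tasks`, `parse`, and `run_once` are hypothetical names, and the download component is passed in as a plain function.

```python
from abc import ABC, abstractmethod
from typing import Callable, Iterable, List, Set


class CollectionScheduler(ABC):
    """Hypothetical interface that a platform-specific scheduler implements."""

    @abstractmethod
    def generate_tasks(self, account: str) -> Iterable[str]:
        """Return task identifiers (for example URLs) to collect."""

    @abstractmethod
    def parse(self, raw: str) -> List[str]:
        """Turn one downloaded payload into collected records."""


def run_once(
    scheduler: CollectionScheduler,
    account: str,
    download: Callable[[str, str], str],
) -> List[str]:
    """Run one collection pass using shared download and duplicate filtering."""
    seen: Set[str] = set()
    results: List[str] = []
    for task in scheduler.generate_tasks(account):
        raw = download(task, account)  # shared download component
        for record in scheduler.parse(raw):
            if record not in seen:  # shared duplicate filtering
                seen.add(record)
                results.append(record)
    return results
```

Supporting a new platform under this sketch means writing one subclass with two methods; the loop in `run_once` and the components it calls stay unchanged.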
Fig. 13 is a schematic structural diagram of social media information acquisition equipment provided by an embodiment of the present application. As shown in Fig. 13, the social media information acquisition equipment 130 of this embodiment includes a processor 1301 and a memory 1302, where the memory 1302 is configured to store computer-executable instructions, and the processor 1301 is configured to execute the computer-executable instructions stored in the memory to implement each step performed by the receiving device in the above embodiments. For details, refer to the related descriptions in the foregoing method embodiments.
An embodiment of the present application also provides a computer-readable storage medium in which computer-executable instructions are stored. When a processor executes the computer-executable instructions, the social media information acquisition method corresponding to any of the embodiments described above is implemented.
In the several embodiments provided in this application, it should be understood that the disclosed devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into modules is only a division by logical function, and other divisions are possible in actual implementation: multiple modules may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or modules, and may be electrical, mechanical, or of another form. Furthermore, the functional modules in the embodiments of this application may be integrated in one processing unit, may each exist physically on their own, or two or more of them may be integrated in one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
An integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute some of the steps of the methods of the embodiments of this application. It should be understood that the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the invention may be executed directly by a hardware processor, or by a combination of hardware and software modules in a processor.
The memory may include high-speed RAM and may also include non-volatile memory (NVM), for example at least one magnetic disk memory; it may also be a USB flash drive, a removable hard disk, a read-only memory, a magnetic disk, an optical disc, or the like. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. Buses can be divided into address buses, data buses, control buses, and so on. For ease of representation, the bus in the drawings of this application is not limited to a single bus or a single type of bus. The storage medium may be implemented by any type of volatile or non-volatile storage device or a combination of them, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc. The storage medium may be any available medium that a general-purpose or special-purpose computer can access.
An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium. The storage medium may also be an integral part of the processor. The processor and the storage medium may be located in an application-specific integrated circuit (ASIC). The processor and the storage medium may also exist as discrete components in an electronic device or a main control device.
Those of ordinary skill in the art will appreciate that all or some of the steps of the above method embodiments may be completed by hardware controlled by program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the above method embodiments. The storage medium includes various media that can store program code, such as ROM, RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, and some or all of their technical features can be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of this application.

Claims (15)

1. A social media information acquisition method, characterized in that it is applied to a collection scheduling device and comprises:
sending a first solicited message to an account resource service device, the first solicited message being used to request the account resource service device to allocate a corresponding account resource to the collection scheduling device according to characteristic information of the social media to be collected;
generating and/or deriving a scheduling acquisition task according to the account resource;
sending a second solicited message to a capture program dynamic loading device, the second solicited message being used to request invocation of multiple common service components provided by the capture program dynamic loading device; and
determining a collection result according to the scheduling acquisition task, through the multiple common service components and the account resource, the collection result representing the acquisition data obtained after social media information acquisition is performed on the social media to be collected.
2. The method according to claim 1, characterized in that the multiple common service components include a first communication module;
the determining a collection result according to the scheduling acquisition task, through the multiple common service components and the account resource, comprises:
sending a network request instruction to a server through the first communication module, the network request instruction including the account resource and the scheduling acquisition task and being used to request the server to download, according to the scheduling acquisition task, the first task result corresponding to the scheduling acquisition task and, according to the account resource, to feed the first task result back to the collection scheduling device through the first communication module; and
receiving the first task result returned by the first communication module, parsing the first task result, and determining the collection result.
3. The method according to claim 1, characterized in that the multiple common service components include a second communication module and a deduplication module;
the determining a collection result according to the scheduling acquisition task, through the multiple common service components and the account resource, comprises:
sending a network request instruction to a server through the second communication module, the network request instruction being used to request the server to download, according to the scheduling acquisition task, the second task result corresponding to the scheduling acquisition task, wherein the deduplication module is configured to receive the second task result sent by the second communication module, deduplicate it to obtain a deduplicated second task result, and send the deduplicated second task result to the second communication module, and the second communication module is configured to send the second task result to the deduplication module and to feed the deduplicated second task result back to the collection scheduling device; and
receiving the deduplicated second task result returned by the second communication module, parsing the deduplicated second task result, and determining the collection result.
4. The method according to any one of claims 1-3, characterized in that the multiple common service components further include a writing module;
after the determining a collection result, the method further comprises:
writing the collection result into a preset directory through the writing module, and keeping the size of each single file that stores the collection result in the preset directory less than or equal to a preset size.
5. The method according to claim 4, characterized in that the method further comprises:
sending a third solicited message to the account resource service device, the third solicited message being used to request a target account resource list of the account resources stored in the account resource service device, the target account resource list being the account resource list at the current time;
receiving the target account resource list, and monitoring the account resources in the target account resource list; and
sending abnormal account resources in the target account resource list to the account resource service device, so that the account resource service device processes the abnormal account resources.
6. A social media information acquisition method, characterized in that it is applied to a capture program dynamic loading device and comprises:
triggering an instruction to start collection scheduling devices, the instruction being used to start multiple collection scheduling devices so that each of the multiple collection scheduling devices sends a first solicited message to an account resource service device, the first solicited message being used to request the account resource service device to allocate, according to characteristic information of the social media to be collected, a corresponding account resource to each of the multiple collection scheduling devices, so that each collection scheduling device generates and/or derives a corresponding scheduling acquisition task according to the account resource; and
receiving the second solicited messages sent by the multiple collection scheduling devices, a second solicited message being used to request invocation of multiple common service components provided by the capture program dynamic loading device, so that each collection scheduling device determines a corresponding collection result according to the scheduling acquisition task, through the multiple common service components and the account resource, the collection result representing the acquisition data obtained after social media information acquisition is performed on the social media to be collected.
7. The method according to claim 6, characterized in that the triggering an instruction to start collection scheduling devices comprises:
querying the scheduling device deployment directory preset in the capture program dynamic loading device;
obtaining the identifiers of the multiple collection scheduling devices and comparing them with the identifiers of the collection scheduling devices stored in the scheduling device deployment directory, to determine the identifiers of the collection scheduling devices whose operation is to be terminated and the identifiers of the collection scheduling devices that have not been started;
according to the identifiers of the collection scheduling devices whose operation is to be terminated, stopping the corresponding collection scheduling devices among the multiple collection scheduling devices; and
according to the identifiers of the collection scheduling devices that have not been started, triggering a start operation for the corresponding collection scheduling devices, the start operation including triggering the instruction to start collection scheduling devices.
8. A social media information acquisition method, characterized in that it is applied to an account resource service device and comprises:
receiving the first solicited messages sent by multiple collection scheduling devices; and
allocating, according to the first solicited messages and the characteristic information of the social media to be collected, a corresponding account resource to each of the multiple collection scheduling devices, so that each collection scheduling device generates and/or derives a scheduling acquisition task according to the account resource and determines a collection result according to the scheduling acquisition task, through multiple common service components and the account resource, the collection result representing the acquisition data obtained after social media information acquisition is performed on the social media to be collected, and the multiple common service components being provided to each collection scheduling device by a capture program dynamic loading device.
9. The method according to claim 8, characterized in that the method further comprises:
assigning, according to multiple account attributes preset in the account resource service device, tasks of periodic login, periodic inspection, and account resource distribution to the account corresponding to each of the account attributes; and
executing the assigned tasks of periodic login, periodic inspection, and account resource distribution to generate account resources, the account resources including account login information and application programming interfaces.
10. A social media information acquisition device, characterized by comprising:
a first solicited message sending module, configured to send a first solicited message to an account resource service device, the first solicited message being used to request the account resource service device to allocate a corresponding account resource to the collection scheduling device according to characteristic information of the social media to be collected;
a scheduling acquisition task generation module, configured to generate and/or derive a scheduling acquisition task according to the account resource;
a second solicited message sending module, configured to send a second solicited message to a capture program dynamic loading device, the second solicited message being used to request invocation of multiple common service components provided by the capture program dynamic loading device; and
a collection result determining module, configured to determine a collection result according to the scheduling acquisition task, through the multiple common service components and the account resource, the collection result representing the acquisition data obtained after social media information acquisition is performed on the social media to be collected.
11. A social media information acquisition device, characterized by comprising:
a starting module, configured to trigger an instruction to start collection scheduling devices, the instruction being used to start multiple collection scheduling devices so that each of the multiple collection scheduling devices sends a first solicited message to an account resource service device, the first solicited message being used to request the account resource service device to allocate, according to characteristic information of the social media to be collected, a corresponding account resource to each of the multiple collection scheduling devices, so that each collection scheduling device generates and/or derives a corresponding scheduling acquisition task according to the account resource; and
a second solicited message receiving module, configured to receive the second solicited messages sent by the multiple collection scheduling devices, a second solicited message being used to request invocation of multiple common service components provided by the capture program dynamic loading device, so that each collection scheduling device determines a corresponding collection result according to the scheduling acquisition task, through the multiple common service components and the account resource, the collection result representing the acquisition data obtained after social media information acquisition is performed on the social media to be collected.
12. A social media information acquisition device, characterized by comprising:
a first request information receiving module, configured to receive first request information sent by each of a plurality of collection scheduling devices;
an account resource management module, configured to allocate, according to the first request information and based on characteristic information of the social media to be collected, a corresponding account resource to each of the plurality of collection scheduling devices, so that each collection scheduling device generates and/or derives scheduling acquisition tasks according to the account resource and, according to the scheduling acquisition tasks, determines a collection result through a plurality of common service components and the account resource, the collection result indicating the acquisition data obtained after social media information acquisition is performed on the social media to be collected, and the plurality of common service components being provided to each collection scheduling device by the capture program dynamic loading device.
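The account resource management of claim 12 could look roughly like the sketch below. It is only an illustration under assumptions: the class `AccountResourceManager`, its `allocate` and `release` methods, and the exclusive (non-overlapping) allocation policy are hypothetical. The claim itself only requires that accounts be allocated to each collection scheduling device according to the characteristic information of the social media to be collected, which is modelled here as a platform name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class AccountResourceManager:
    """Hypothetical manager that hands out accounts per platform."""

    def __init__(self, accounts_by_platform: Dict[str, List[str]]):
        # Free accounts, keyed by characteristic information (platform name).
        self._free: Dict[str, List[str]] = {
            platform: list(accounts)
            for platform, accounts in accounts_by_platform.items()
        }
        # Accounts currently held by each scheduler, as (platform, account).
        self._held: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def allocate(self, scheduler_id: str, platform: str, count: int = 1) -> List[str]:
        """Handle a first request: grant up to `count` free accounts."""
        free = self._free.get(platform, [])
        granted = free[:count]
        self._free[platform] = free[count:]
        self._held[scheduler_id].extend((platform, acct) for acct in granted)
        return granted

    def release(self, scheduler_id: str) -> None:
        """Return every account held by a scheduler to the free pool."""
        for platform, account in self._held.pop(scheduler_id, []):
            self._free[platform].append(account)


manager = AccountResourceManager({"microblog": ["a1", "a2", "a3"]})
first = manager.allocate("scheduler-1", "microblog", count=2)
second = manager.allocate("scheduler-2", "microblog", count=2)
manager.release("scheduler-1")
third = manager.allocate("scheduler-2", "microblog", count=1)
```

Granting each account to at most one scheduler at a time is a design choice of this sketch; a real deployment might instead share accounts under per-account rate limits.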
13. A social media information acquisition system, characterized by comprising at least one of: the social media information acquisition device according to claim 10, the social media information acquisition device according to claim 11, and the social media information acquisition device according to claim 12.
14. Social media information acquisition equipment, characterized by comprising: at least one processor and a memory;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored in the memory, so that the at least one processor performs the social media information acquisition method according to any one of claims 1 to 9.
15. A computer-readable storage medium, characterized in that computer-executable instructions are stored in the computer-readable storage medium, and when a processor executes the computer-executable instructions, the social media information acquisition method according to any one of claims 1 to 9 is implemented.
CN201910255758.2A 2019-04-01 2019-04-01 Social media information acquisition method, device, system, equipment and storage medium Expired - Fee Related CN110046319B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910255758.2A CN110046319B (en) 2019-04-01 2019-04-01 Social media information acquisition method, device, system, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910255758.2A CN110046319B (en) 2019-04-01 2019-04-01 Social media information acquisition method, device, system, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110046319A (en) 2019-07-23
CN110046319B (en) 2021-04-09

Family

ID=67275670

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910255758.2A Expired - Fee Related CN110046319B (en) 2019-04-01 2019-04-01 Social media information acquisition method, device, system, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110046319B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112769779A (en) * 2020-12-28 2021-05-07 深圳壹账通智能科技有限公司 Media resource transmission method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103516759A (en) * 2012-06-28 2014-01-15 中兴通讯股份有限公司 Cloud system resource management method, cloud call center seat management method and cloud system
CN105009105A (en) * 2011-10-10 2015-10-28 苹果公司 Systems and methods for prediction-based crawling of social media network
CN105608194A (en) * 2015-12-24 2016-05-25 成都陌云科技有限公司 Method for analyzing main characteristics in social media
CN105761150A (en) * 2016-01-29 2016-07-13 中国科学院遥感与数字地球研究所 Crop information and sample acquisition method and system
US20180260483A1 (en) * 2015-09-29 2018-09-13 Sony Corporation Information processing apparatus, information processing method, and program
CN108874810A (en) * 2017-05-10 2018-11-23 北京京东尚科信息技术有限公司 The method and apparatus of information collection

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105009105A (en) * 2011-10-10 2015-10-28 苹果公司 Systems and methods for prediction-based crawling of social media network
CN103516759A (en) * 2012-06-28 2014-01-15 中兴通讯股份有限公司 Cloud system resource management method, cloud call center seat management method and cloud system
US20180260483A1 (en) * 2015-09-29 2018-09-13 Sony Corporation Information processing apparatus, information processing method, and program
CN105608194A (en) * 2015-12-24 2016-05-25 成都陌云科技有限公司 Method for analyzing main characteristics in social media
CN105761150A (en) * 2016-01-29 2016-07-13 中国科学院遥感与数字地球研究所 Crop information and sample acquisition method and system
CN108874810A (en) * 2017-05-10 2018-11-23 北京京东尚科信息技术有限公司 The method and apparatus of information collection

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Chen Weidong et al.: "A Comparative Study of Social Media Information Collection Tools for Web Archive", Research on Library Science (《图书馆学研究》) *

Also Published As

Publication number Publication date
CN110046319B (en) 2021-04-09

Similar Documents

Publication Publication Date Title
US11238036B2 (en) System performance logging of complex remote query processor query operations
CN104520814B (en) System and method for configuring cloud computing systems
US9727405B2 (en) Problem determination in distributed enterprise applications
CN105224445B (en) Distributed tracking system
CN104541247B (en) System and method for adjusting cloud computing system
CN106844198B (en) Distributed dispatching automation test platform and method
CN101222349B (en) Method and system for collecting web user action and performance data
CN106067080B (en) Configurable workflow capabilities are provided
CN110309389A (en) Cloud computing system
CN109582303A (en) General purpose module call method, device, computer equipment and storage medium
CN109582466A (en) A kind of timed task executes method, distributed server cluster and electronic equipment
CN106980678A (en) Data analysing method and system based on zookeeper technologies
CN107317724A (en) Data collecting system and method based on cloud computing technology
CN109814992A (en) Distributed dynamic dispatching method and system for the acquisition of large scale network data
CN109951320A (en) A kind of expansible multi layer monitoing frame and its monitoring method of facing cloud platform
CN109840298A (en) The multi information source acquisition method and system of large scale network data
CN104077224A (en) Software function analyzing system and method
CN110011827A (en) Towards doctor conjuncted multi-user's big data analysis service system and method
CN110046319A (en) Social media information acquisition method, device, system, equipment and storage medium
CN110929130B (en) Public security level audit data query method based on distributed scheduling
CN112559525A (en) Data checking system, method, device and server
Dillenseger Clif, a framework based on fractal for flexible, distributed load testing
CN111143177B (en) Method, system, device and storage medium for collecting RMF III data of IBM host
CN114996081A (en) Batch job progress monitoring method and device, electronic equipment and storage medium
CN109995617A (en) Automated testing method, device, equipment and the storage medium of Host Administration characteristic

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230613

Address after: 3007, Hengqin international financial center building, No. 58, Huajin street, Hengqin new area, Zhuhai, Guangdong 519031

Patentee after: New founder holdings development Co.,Ltd.

Patentee after: BEIJING FOUNDER ELECTRONICS Co.,Ltd.

Address before: 100871, Beijing, Haidian District, Cheng Fu Road, No. 298, Zhongguancun Fangzheng building, 9 floor

Patentee before: PEKING UNIVERSITY FOUNDER GROUP Co.,Ltd.

Patentee before: BEIJING FOUNDER ELECTRONICS Co.,Ltd.

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210409
