CN107239563A - Public feelings information dynamic monitoring and controlling method - Google Patents

Info

Publication number
CN107239563A
Authority
CN
China
Prior art keywords
website
data
agent address
page
information
Legal status
Pending
Application number
CN201710441942.7A
Other languages
Chinese (zh)
Inventor
张鹏
Current Assignee
BEIJING BLTSFE INFORMATION TECHNOLOGY Co Ltd
Original Assignee
BEIJING BLTSFE INFORMATION TECHNOLOGY Co Ltd
Priority date
2017-06-13
Filing date
2017-06-13
Publication date
2017-10-10
Application filed by BEIJING BLTSFE INFORMATION TECHNOLOGY Co Ltd
Priority to CN201710441942.7A
Publication of CN107239563A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a dynamic monitoring method for public opinion information. The method includes: performing open-source data acquisition according to the source websites and subject information provided by a user, and storing the acquired web data locally; analyzing the web data obtained during acquisition, normalizing diverse Internet information, storing early-stage web page data as documents, and storing the analysis results in a database. The invention refines the search-engine-based data acquisition process and monitors public opinion information in real time using efficient data mining algorithms.

Description

Public opinion information dynamic monitoring method
Technical field
The present invention relates to search engines, and in particular to a dynamic monitoring method for public opinion information.
Background
The Internet has become a main channel through which people obtain information, and users can express their views on events, phenomena and policies through this platform. At the same time, the Internet has also been flooded with reactionary and pornographic content and with cybercrime. Existing technologies in web search, data mining, intelligent analysis and public opinion monitoring have improved Internet information monitoring to some degree, and many network topic systems have been designed and implemented. However, the overall solutions still need significant improvement in scientific explanation, detailed description, accurate prediction, and real-time, systematic control.
Summary of the invention
To solve the above problems of the prior art, the present invention proposes a dynamic monitoring method for public opinion information, comprising:
performing open-source data acquisition according to the source websites and subject information provided by a user, and storing the acquired web data locally;
analyzing the web data obtained during acquisition, normalizing diverse Internet information, storing early-stage web page data as documents, and storing the analysis results in a database.
Preferably, before the data acquisition, the method further includes:
The user provides keyword combination rules. On the one hand, searches are performed through search engines; on the other hand, data are collected from the sections of websites by crawling processes and then filtered. For search engine results, the URLs retrieved by keyword are collected in order. For specified websites of interest, the user is required to specify the section URLs one by one, or an interface is provided for adding each section URL of the website. Topic-oriented incremental crawling then proceeds through the sections one by one in order of section priority.
Preferably, after the data acquisition, the method further includes: performing deduplication and denoising, extracting relevant information, and building a full-text index.
Preferably, each website W corresponds to an independent crawling process w. When the data volume of website W is large, multiple crawling processes w1, w2, ..., wn are started to share the data acquisition. A website crawling process, according to the tasks distributed by the task manager, fetches the web pages of the specified task and extracts the core page content; extracted URLs are followed according to a specified flow, and the extracted core text is stored in a database.
Preferably, the task manager divides each website into several subtasks according to its data volume and access limits, and dynamically distributes the subtasks to the crawling processes according to the load of the machine on which each crawling process is deployed; the crawling processes are scheduled to start collection tasks at specified time intervals. If a website allows data collection only after login, and sharing a single ID among multiple crawling processes would cause that ID to trigger the website's access limits, the account manager centrally maintains a resource pool containing the available IDs together with the number of times each ID has currently been used and a timestamp. When a crawling process needs an ID to access pages, it first applies to the account manager for one; the account manager searches the resource pool for an ID that has not yet reached the limit threshold, returns it to the crawling process, increments that ID's usage count and updates its access timestamp;
When a website limits the number of accesses from each IP within a certain period, access is made through proxy addresses. The proxy address mapping unit first allocates proxy addresses and then measures their network QoS. When a crawling process applies for a proxy address, the proxy address mapping unit searches the resource pool for the proxy address that has not yet reached the frequency limit threshold and has the best network quality, returns it to the crawling process, increments that IP's usage count and updates its access timestamp. The unit periodically scans the connectivity of the proxy addresses in the resource pool, records the timeout of each proxy address, and removes invalid proxy addresses from the resource pool.
Compared with the prior art, the present invention has the following advantages:
The present invention proposes a dynamic monitoring method for public opinion information that refines the search-engine-based data acquisition process and monitors public opinion information in real time using efficient data mining algorithms.
Brief description of the drawings
Fig. 1 is a flow chart of the public opinion information dynamic monitoring method according to an embodiment of the present invention.
Embodiment
A detailed description of one or more embodiments of the invention is provided below, together with the accompanying drawing illustrating the principles of the invention. The invention is described in connection with such embodiments, but it is not limited to any embodiment. The scope of the invention is limited only by the claims, and the invention covers many alternatives, modifications and equivalents. Numerous specific details are set forth in the following description to provide a thorough understanding of the invention. These details are provided for exemplary purposes, and the invention may be practiced according to the claims without some or all of them.
One aspect of the present invention provides a dynamic monitoring method for public opinion information. Fig. 1 is a flow chart of the public opinion information dynamic monitoring method according to an embodiment of the present invention.
The invention comprises two modules: data acquisition and public opinion analysis. Data acquisition performs open-source data collection according to the source websites and subject information provided by the user and stores the acquired web data locally. Public opinion analysis analyzes the web data obtained during acquisition, normalizes diverse Internet information, stores early-stage web page data as documents, and stores the analysis results in a database. A web server provides the user with a browser-based interface for convenient querying and operation.
Before the system runs, the user provides keyword combination rules. On the one hand, searches are performed through search engines; on the other hand, data are collected from the sections of websites by crawling processes and then filtered. For search engine results, the URLs retrieved by keyword are collected in order. For specified websites of interest, the user is required to specify the section URLs one by one, or an interface is provided for adding each section URL of the website. Topic-oriented incremental crawling then proceeds through the sections one by one in order of section priority. After the relevant web pages are collected, deduplication and denoising are performed, relevant information is extracted, and a full-text index is built.
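The keyword rules, section ordering and post-collection cleanup described above can be sketched as follows. This is a minimal illustration under assumptions not stated in the text: the rule shape (`all_of`, `any_of`, `none_of`), duplicate detection by hashing whitespace-normalized text, and `\w+` tokenization for the full-text index are all hypothetical choices. Chinese pages would need a word segmenter before indexing.

```python
import hashlib
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class KeywordRule:
    """Hypothetical keyword combination rule: every all_of term, at least one
    any_of term (if any are given), and none of the none_of terms."""
    all_of: list = field(default_factory=list)
    any_of: list = field(default_factory=list)
    none_of: list = field(default_factory=list)

    def matches(self, text: str) -> bool:
        if not all(k in text for k in self.all_of):
            return False
        if self.any_of and not any(k in text for k in self.any_of):
            return False
        return not any(k in text for k in self.none_of)


def order_sections(sections):
    """Return section URLs from (priority, url) pairs, highest priority first."""
    return [url for _, url in sorted(sections, key=lambda s: -s[0])]


def normalize(text):
    # Collapse whitespace and case so trivially reformatted copies hash the same
    return re.sub(r"\s+", " ", text).strip().lower()


def dedup(pages):
    """Keep the first page for each distinct normalized body."""
    seen, unique = set(), []
    for url, text in pages:
        digest = hashlib.sha1(normalize(text).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append((url, text))
    return unique


def build_index(pages):
    """Inverted index: token -> set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages:
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(url)
    return index
```

For example, `KeywordRule(all_of=["flood"], any_of=["city", "river"]).matches("flood in the city")` returns `True`.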
The data acquisition module comprises website crawling processes, a task manager, a proxy address mapping unit and an account manager. Each website W corresponds to an independent crawling process w; when the data volume of website W is large, multiple crawling processes w1, w2, ..., wn are started to share the data acquisition. A website crawling process, according to the tasks distributed by the task manager, fetches the web pages of the specified task and extracts the core page content; extracted URLs are followed according to a specified flow, and the extracted core text is stored in a database.
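The text does not say how a large website's work is divided among w1, ..., wn. One common option, shown here purely as an assumption, is to hash each URL to a worker so that the same URL always lands on the same crawling process; `assign_worker` and `partition` are hypothetical helpers.

```python
import zlib


def assign_worker(url: str, n_workers: int) -> int:
    """Map a URL to one of the crawling processes w1..wn (returned as 1..n)."""
    # crc32 is stable across runs, unlike Python's per-process salted hash()
    return zlib.crc32(url.encode("utf-8")) % n_workers + 1


def partition(urls, n_workers):
    """Split a website's URLs into n shards, one per crawling process."""
    shards = {i: [] for i in range(1, n_workers + 1)}
    for url in urls:
        shards[assign_worker(url, n_workers)].append(url)
    return shards
```

A stable hash means a restarted process receives the same shard, which keeps incremental crawling state local to one worker.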
The task manager divides each website into several subtasks according to its data volume and access limits, and dynamically distributes the subtasks to the crawling processes according to the load of the machine on which each crawling process is deployed. It also schedules the crawling processes to start collection tasks at specified time intervals. If a website allows data collection only after login, and sharing a single ID among multiple crawling processes would cause that ID to trigger the website's access limits, the account manager centrally maintains a resource pool containing the available IDs together with the number of times each ID has currently been used and a timestamp. When a crawling process needs an ID to access pages, it first applies to the account manager for one; the account manager searches the resource pool for an ID that has not yet reached the limit threshold, returns it to the crawling process, increments that ID's usage count and updates its access timestamp.
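A minimal sketch of the account resource pool and the load-based subtask distribution. It assumes a per-ID usage limit that resets after a fixed time window, and it picks the least-used eligible ID (the text only requires an ID below the threshold). `AccountManager`, `limit`, `window_seconds` and `assign_subtasks` are hypothetical names, and greedy least-loaded assignment stands in for the unspecified load-balancing rule.

```python
import time
from dataclasses import dataclass


@dataclass
class Account:
    account_id: str
    uses: int = 0           # hand-outs in the current window
    last_used: float = 0.0  # timestamp of the latest hand-out


class AccountManager:
    """Shared pool of login IDs with a per-ID usage limit per time window."""

    def __init__(self, ids, limit, window_seconds=3600.0, clock=time.time):
        self.pool = [Account(i) for i in ids]
        self.limit = limit
        self.window = window_seconds
        self.clock = clock

    def acquire(self):
        now = self.clock()
        for acc in self.pool:
            if now - acc.last_used >= self.window:
                acc.uses = 0  # fixed-window reset once the window has passed
        eligible = [a for a in self.pool if a.uses < self.limit]
        if not eligible:
            return None  # every ID is at its threshold; the caller must wait
        acc = min(eligible, key=lambda a: a.uses)
        acc.uses += 1
        acc.last_used = now
        return acc.account_id


def assign_subtasks(subtasks, machine_load):
    """Greedily give each subtask to the currently least-loaded machine."""
    load = dict(machine_load)
    plan = {machine: [] for machine in load}
    for task in subtasks:
        target = min(load, key=load.get)
        plan[target].append(task)
        load[target] += 1
    return plan
```

Injecting `clock` keeps the pool testable; in production the default `time.time` is used.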
When a website limits the number of accesses from each IP within a certain period, access is made through proxy addresses. The proxy address mapping unit first allocates proxy addresses and then measures their network QoS. When a crawling process applies for a proxy address, the proxy address mapping unit searches the resource pool for the proxy address that has not yet reached the frequency limit threshold and has the best network quality, returns it to the crawling process, increments that IP's usage count and updates its access timestamp. The unit periodically scans the connectivity of the proxy addresses in the resource pool, records the timeout of each proxy address, and removes invalid proxy addresses from the resource pool.
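A sketch of the proxy address mapping unit, assuming network QoS is measured as latency in milliseconds (lower is better) and that the periodic scan is driven by a caller-supplied `probe` function returning a latency, or `None` when the connection fails. `Proxy`, `ProxyMapper`, `limit` and `timeout_ms` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Proxy:
    address: str
    latency_ms: float       # assumed QoS metric: lower is better
    uses: int = 0
    last_used: float = 0.0


class ProxyMapper:
    """Pool of proxy addresses with a per-address usage limit."""

    def __init__(self, proxies, limit):
        self.pool = {p.address: p for p in proxies}
        self.limit = limit

    def acquire(self, now):
        eligible = [p for p in self.pool.values() if p.uses < self.limit]
        if not eligible:
            return None
        best = min(eligible, key=lambda p: p.latency_ms)  # best network quality
        best.uses += 1
        best.last_used = now
        return best.address

    def sweep(self, probe, timeout_ms):
        """Periodic scan: re-measure each proxy and evict failed or slow ones."""
        for address in list(self.pool):
            latency = probe(address)  # None means the connection failed
            if latency is None or latency > timeout_ms:
                del self.pool[address]
            else:
                self.pool[address].latency_ms = latency
```

Iterating over `list(self.pool)` lets `sweep` delete entries safely while scanning.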
While the acquisition module is running, the task manager stays resident in memory. When it finds that the time elapsed since the last run has reached the specified interval, it distributes the pre-partitioned tasks one by one to the website crawling processes. The distribution policy is as follows: when a website crawling process finishes its task and becomes idle, the task manager distributes a task to that idle process; when all website crawling processes are busy, the task manager blocks until a crawling process becomes idle again.
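The blocking distribution policy maps naturally onto a queue of idle worker IDs: taking from an empty queue blocks the dispatcher until some worker finishes and returns its ID. In this sketch threads stand in for the crawling processes, and `due`, `run_dispatch` and `handle` are hypothetical names.

```python
import queue
import threading


def due(last_run, now, interval):
    """True when the configured interval has elapsed since the last run."""
    return now - last_run >= interval


def run_dispatch(tasks, n_workers, handle):
    """Give each task to an idle worker; block while every worker is busy."""
    idle = queue.Queue()
    for worker_id in range(n_workers):
        idle.put(worker_id)
    results, lock, threads = [], threading.Lock(), []

    def work(worker_id, task):
        try:
            out = handle(task)
            with lock:
                results.append((worker_id, task, out))
        finally:
            idle.put(worker_id)  # the worker is idle again

    for task in tasks:
        worker_id = idle.get()  # blocks until some worker is idle
        t = threading.Thread(target=work, args=(worker_id, task))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return results
```

Because a worker ID is only re-queued after its task completes, at most `n_workers` tasks run at once.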
Crawling processes use different crawl policies according to the application scenario. Because the rules for extracting core page content differ from website to website, the download module, text processing module and crawl policy module are packaged into a website crawling task, which is the basic unit of distributed deployment; the basic unit of task scheduling is a set of sections, columns and microblog IDs. When a website needs to be added, a website crawling process is programmed for that website; when the crawl speed needs to be increased, the number of deployed processes for the website crawling task is increased and the download tasks are further subdivided. The crawling process is logically divided into four independent classes: the crawl policy class, which executes the crawling strategy; the page download class, which obtains the page content at a specified URL; the content processing class, which extracts URL addresses or core topic text according to the page type; and the core control flow class, which schedules the interaction among the other three classes. The control flow obtains a URL to download from the crawl policy class and passes it to the page download class, which downloads the content; the download result is then passed to the web page content processing module, which extracts URLs or core topic text from it; finally, the processing result and the extracted URLs are fed back to the crawl policy class.
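The four-class decomposition of a crawling process can be sketched as below. To stay self-contained, the downloader reads from an in-memory dict instead of making HTTP requests, the policy is a simple breadth-first frontier with a page budget, and link and text extraction use regular expressions; these are illustrative assumptions, not the per-website extraction rules the text refers to. All class names are hypothetical.

```python
import re
from collections import deque


class CrawlPolicy:
    """Decides which URL to fetch next (breadth-first, with a page budget)."""

    def __init__(self, seeds, max_pages):
        self.frontier = deque(seeds)
        self.seen = set(seeds)
        self.max_pages = max_pages
        self.fetched = 0
        self.texts = {}

    def next_url(self):
        if self.fetched >= self.max_pages or not self.frontier:
            return None
        self.fetched += 1
        return self.frontier.popleft()

    def feedback(self, url, links, text):
        self.texts[url] = text
        for link in links:
            if link not in self.seen:
                self.seen.add(link)
                self.frontier.append(link)


class PageDownloader:
    """Fetches page content; backed by a dict here instead of HTTP."""

    def __init__(self, site):
        self.site = site

    def download(self, url):
        return self.site.get(url, "")


class ContentProcessor:
    """Extracts links and core text with deliberately naive regexes."""

    LINK = re.compile(r'href="([^"]+)"')

    def process(self, html):
        links = self.LINK.findall(html)
        text = re.sub(r"<[^>]+>", " ", html)
        return links, " ".join(text.split())


class ControlFlow:
    """Core control flow: policy -> download -> process -> feedback."""

    def __init__(self, policy, downloader, processor):
        self.policy, self.downloader, self.processor = policy, downloader, processor

    def run(self):
        while (url := self.policy.next_url()) is not None:
            html = self.downloader.download(url)
            links, text = self.processor.process(html)
            self.policy.feedback(url, links, text)
        return self.policy.texts
```

Swapping in a different `ContentProcessor` per website is how the per-site extraction rules would plug in under this design.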
Cooperation between class objects is implemented with user-space threads. While a function is executing, it actively yields, saves its current breakpoint state, and hands over to another section of code; when the CPU returns to the earlier function, execution resumes from the last breakpoint. Subject information is first obtained through the site search of search engines, after which a meta search engine is used for the topic; for specified websites, whole-site crawling combined with keyword-expression filtering retrieval is used. The two together form the data source. For website information updates, a page is updated if the byte counts of two successive downloads of the same URL differ, or if the stored copy has timed out. A priority is set for each website, and one thread searches each website. When new thread resources become available, if the search depth indicates that a site search has finished, or the number of threads has not reached the total, the next website is searched according to priority and time. Two searching threads are always retained: one discovers newly added websites or topics and starts searching them, and the other performs periodic updates.
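Python generators are one way to obtain the user-space cooperative scheduling described above: each `yield` is a breakpoint at which a function keeps its state and hands control to another, resuming later from the same point. This is an illustration under that assumption (the text names no language or coroutine library); `site_search`, `run_round_robin` and `needs_update` are hypothetical names, and `needs_update` encodes the stated update rule with `max_age` as an assumed timeout parameter.

```python
from collections import deque


def site_search(name, pages):
    """A cooperative 'user-space thread' that pauses after each page."""
    for page in pages:
        # Control returns to the scheduler here; the loop position is preserved
        yield f"{name}:{page}"


def run_round_robin(coroutines):
    """Resume each search in turn until all of them have finished."""
    ready = deque(coroutines)
    trace = []
    while ready:
        co = ready.popleft()
        try:
            trace.append(next(co))  # resume from the last breakpoint
            ready.append(co)
        except StopIteration:
            pass  # this site's search has finished
    return trace


def needs_update(prev_size, new_size, last_fetch, now, max_age):
    """Update rule from the text: byte counts differ, or the copy has timed out."""
    return prev_size != new_size or now - last_fetch >= max_age
```

Sorting the sites by priority before building the generators would give the priority-ordered search the text describes.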
On the basis of the extracted basic network public opinion data, the public opinion analysis module of the invention analyzes users' posts to obtain their operations on topics, including replying, forwarding and quoting, and on this basis applies relational network methods to build a user relationship model. Relevant relational network parameters are extracted from this model and used to describe the characteristics of influential users in network public opinion, and the composition and presentation of public opinion indices are designed for identifying influential users and for the distribution of network public opinion trends.
The relational network is represented by the relational matrix $A_{n\times n}=(a_{ij})$, whose elements are non-negative integers $a_{ij}\in\{0,1,2,\dots\}$ $(i,j=1,\dots,n)$: $a_{ij}=0$ means there is no connection between $\text{User}_i$ and $\text{User}_j$, and $a_{ij}=k>0$ means $\text{User}_j$ has followed (connected to) $\text{User}_i$ $k$ times. Projecting $A$ on a given characteristic parameter yields the matrix $B_{n\times n}$, which reflects a particular attribute relationship between users: $b_{ij}=1$ if $a_{ij}>0$, and $b_{ij}=0$ otherwise, for $i,j=1,\dots,n$. The in-degree of node $i$, $in_i=\sum_{j=1}^{n} b_{ij}$, expresses how much attention the node receives, i.e., its influence. The out-degree of node $i$, $out_i=\sum_{j=1}^{n} b_{ji}$, reflects how actively the node interacts with its neighbours. The point degree of node $i$, $nd(i)=in_i+out_i$, reflects the total interaction between the node and its neighbours.
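A plain-Python sketch of building $A$ and $B$ and computing the degrees. It assumes interaction records of the form `(actor, target, action)`, where `action` is a reply, forward or quote; the record format and function names are hypothetical. It follows the convention that $a_{ij}$ counts interactions directed from User_j to User_i, so the in-degree is a row sum of $B$ and the out-degree is a column sum.

```python
def relation_matrix(users, interactions):
    """a[i][j] = number of replies/forwards/quotes by users[j] aimed at users[i]."""
    idx = {u: k for k, u in enumerate(users)}
    n = len(users)
    a = [[0] * n for _ in range(n)]
    for actor, target, _action in interactions:
        a[idx[target]][idx[actor]] += 1
    return a


def binarize(a):
    """B: 1 where any connection exists, else 0."""
    return [[1 if x > 0 else 0 for x in row] for row in a]


def degrees(b):
    """Return (in-degree, out-degree, point degree nd) for every node."""
    n = len(b)
    indeg = [sum(b[i][j] for j in range(n)) for i in range(n)]   # who engages with i
    outdeg = [sum(b[j][i] for j in range(n)) for i in range(n)]  # whom i engages with
    nd = [indeg[i] + outdeg[i] for i in range(n)]
    return indeg, outdeg, nd
```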
The relational network graph is then processed as follows. Compute the point degree of each node and sort the nodes in descending order to obtain node sequence 1. Compute the closeness centrality of each node and sort in ascending order to obtain node sequence 2. Take the first m nodes of each sequence, giving two node sequences of length m; ordering the nodes of these two sequences (2m entries in total) by how often each appears yields the top m important nodes. Here m is a threshold that is either predetermined or determined by an algorithm, chosen so that the propagation local networks centered on these m nodes just cover the whole network.
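The ranking procedure above can be sketched for an undirected adjacency map. Two assumptions are made: the point degree is taken as the number of neighbours, and "closeness centrality sorted ascending" is read as sorting by farness (the sum of shortest-path distances), where a smaller value means a more central node. A fixed `m` is passed in rather than derived from the coverage condition; `farness` and `top_m_nodes` are hypothetical names.

```python
from collections import Counter, deque


def farness(adj, src):
    """Sum of BFS hop distances from src; smaller means more central."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    unreachable = len(adj) - len(dist)
    # Charge unreachable nodes a distance of len(adj) so disconnected graphs still rank
    return sum(dist.values()) + unreachable * len(adj)


def top_m_nodes(adj, m):
    """Merge the degree ranking and the closeness ranking by frequency."""
    seq1 = sorted(adj, key=lambda u: -len(adj[u]))[:m]       # degree, descending
    seq2 = sorted(adj, key=lambda u: farness(adj, u))[:m]    # farness, ascending
    counts = Counter(seq1 + seq2)                            # up to 2m entries
    merged = sorted(counts, key=lambda u: -counts[u])        # most frequent first
    return merged[:m]
```

Nodes that rank highly on both measures appear twice in the combined list and therefore rise to the top.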
Build the ego network and the propagation local network of each of the m important nodes obtained in the previous step. Compute the parameters of each ego network and propagation network. Assign different weights to the parameters of the user, its ego network and its propagation local network, and combine them to compute the overall index value of the influential user.
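A sketch of the ego-network and composite-index step. The specific parameters (share of other nodes that are neighbours, edge density of the 1-hop ego network, and share of nodes reachable within `hops` steps as a stand-in for the propagation local network) and the default `weights` are hypothetical; the text only states that user, ego-network and propagation-network parameters are combined with different weights.

```python
def ego_network(adj, center):
    """Center, its direct neighbours, and all edges among those nodes."""
    nodes = {center} | adj[center]
    edges = {frozenset((u, v)) for u in nodes for v in adj[u] if v in nodes}
    return nodes, edges


def reach(adj, center, hops):
    """Nodes within `hops` steps of center (stand-in for the propagation network)."""
    frontier, seen = {center}, {center}
    for _ in range(hops):
        frontier = {v for u in frontier for v in adj[u]} - seen
        seen |= frontier
    return seen


def influence_index(adj, center, weights=(0.5, 0.3, 0.2), hops=2):
    """Weighted combination of user, ego-network and propagation parameters."""
    others = max(len(adj) - 1, 1)
    nodes, edges = ego_network(adj, center)
    k = len(nodes) - 1
    possible = k * (k + 1) / 2          # edges possible among the k + 1 ego nodes
    density = len(edges) / possible if possible else 0.0
    degree_share = len(adj[center]) / others
    reach_share = (len(reach(adj, center, hops)) - 1) / others
    w_user, w_ego, w_prop = weights
    return w_user * degree_share + w_ego * density + w_prop * reach_share
```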
To improve the accuracy of public opinion prediction, events are divided into categories and a model is built for each category separately. At initialization, the prediction model establishes identification vectors for the information that needs attention. Next, during searching, it determines whether certain specific public opinion events have occurred; based on the collected mass data, after deduplication, denoising and network analysis, prediction yields a classification of all public opinion items that have been marked as analyzed. After the numerical-analysis-based prediction has finished, the public opinion items identified by the analysis are analyzed again. To avoid clustering the data too frequently, the invention clusters on statistical feature values of the analysis data. A bottom-up hierarchy is then produced that divides the data set into different levels, and a corresponding curve is constructed for each level going upward. Each cluster obtained is treated as a group; for each group, the least squares method is used to obtain the class model that minimizes the mean squared error with respect to all pattern curves contained in the group. In this way, the class model library for this type of event is fully established.
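The clustering-and-fitting step can be illustrated under strong simplifying assumptions: each event is a numeric time series of equal length, clusters are formed bottom-up with single-linkage Euclidean distance and a hypothetical `threshold`, and each group's class model is a straight line fitted by least squares to every point of every curve in the group. The text does not specify the distance measure, the curve family or the stopping rule, so these choices are illustrative only.

```python
import math


def curve_distance(c1, c2):
    """Euclidean distance between two equal-length curves."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))


def agglomerate(curves, threshold):
    """Bottom-up single-linkage clustering; stop when the closest pair of
    clusters is farther apart than `threshold`."""
    clusters = [[i] for i in range(len(curves))]
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = min(curve_distance(curves[i], curves[j])
                        for i in clusters[x] for j in clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break
        merged = clusters.pop(y)  # y > x, so index x is unaffected
        clusters[x].extend(merged)
    return clusters


def fit_class_model(curves, members):
    """Least-squares line y = a*t + b over every point of every member curve;
    returns (a, b, mean squared error). Assumes curves have at least 2 points."""
    pts = [(t, y) for i in members for t, y in enumerate(curves[i])]
    n = len(pts)
    st = sum(t for t, _ in pts)
    sy = sum(y for _, y in pts)
    stt = sum(t * t for t, _ in pts)
    sty = sum(t * y for t, y in pts)
    a = (n * sty - st * sy) / (n * stt - st * st)
    b = (sy - a * st) / n
    mse = sum((a * t + b - y) ** 2 for t, y in pts) / n
    return a, b, mse
```

Fitting one line to the pooled points of a group is exactly the least-squares model that minimizes the mean squared error over all curves in that group, within the straight-line family assumed here.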
In summary, the present invention proposes a dynamic monitoring method for public opinion information that refines the search-engine-based data acquisition process and monitors public opinion information in real time using efficient data mining algorithms.
Those skilled in the art should understand that the modules or steps of the invention described above can be implemented by general-purpose computing systems. They may be concentrated in a single computing system or distributed over a network formed by multiple computing systems; optionally, they may be implemented as program code executable by a computing system, so that they can be stored in a storage system and executed by that computing system. Thus, the invention is not limited to any specific combination of hardware and software.
It should be understood that the above embodiments of the invention are used only to exemplify or explain the principles of the invention and do not limit it. Therefore, any modification, equivalent substitution, improvement, etc. made without departing from the spirit and scope of the invention shall fall within its scope of protection. Furthermore, the appended claims are intended to cover all changes and modifications that fall within the scope and boundaries of the claims, or within the equivalents of such scope and boundaries.

Claims (5)

1. A dynamic monitoring method for public opinion information, characterized by comprising:
performing open-source data acquisition according to the source websites and subject information provided by a user, and storing the acquired web data locally;
analyzing the web data obtained during acquisition, normalizing diverse Internet information, storing early-stage web page data as documents, and storing the analysis results in a database.
2. The method according to claim 1, characterized in that, before the data acquisition, the method further comprises:
the user provides keyword combination rules; on the one hand, searches are performed through search engines, and on the other hand, data are collected from the sections of websites by crawling processes and then filtered; for search engine results, the URLs retrieved by keyword are collected in order; for specified websites of interest, the user is required to specify the section URLs one by one, or an interface is provided for adding each section URL of the website; topic-oriented incremental crawling then proceeds through the sections one by one in order of section priority.
3. The method according to claim 2, characterized in that, after the data acquisition, the method further comprises: performing deduplication and denoising, extracting relevant information, and building a full-text index.
4. The method according to claim 1, characterized in that each website W corresponds to an independent crawling process w; when the data volume of website W is large, multiple crawling processes w1, w2, ..., wn are started to share the data acquisition; a website crawling process, according to the tasks distributed by the task manager, fetches the web pages of the specified task and extracts the core page content, follows the extracted URLs according to a specified flow, and stores the extracted core text in a database.
5. The method according to claim 4, characterized in that the task manager divides each website into several subtasks according to its data volume and access limits, and dynamically distributes the subtasks to the crawling processes according to the load of the machine on which each crawling process is deployed; the crawling processes are scheduled to start collection tasks at specified time intervals; if a website allows data collection only after login, and sharing a single ID among multiple crawling processes would cause that ID to trigger the website's access limits, the account manager centrally maintains a resource pool containing the available IDs together with the number of times each ID has currently been used and a timestamp; when a crawling process needs an ID to access pages, it first applies to the account manager for one, and the account manager searches the resource pool for an ID that has not yet reached the limit threshold, returns it to the crawling process, increments that ID's usage count and updates its access timestamp;
when a website limits the number of accesses from each IP within a certain period, access is made through proxy addresses; the proxy address mapping unit first allocates proxy addresses and then measures their network QoS; when a crawling process applies for a proxy address, the proxy address mapping unit searches the resource pool for the proxy address that has not yet reached the frequency limit threshold and has the best network quality, returns it to the crawling process, increments that IP's usage count and updates its access timestamp; the unit periodically scans the connectivity of the proxy addresses in the resource pool, records the timeout of each proxy address, and removes invalid proxy addresses from the resource pool.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710441942.7A CN107239563A (en) 2017-06-13 2017-06-13 Public feelings information dynamic monitoring and controlling method

Publications (1)

Publication Number Publication Date
CN107239563A (en) 2017-10-10

Family

ID=59987550

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710441942.7A Pending CN107239563A (en) 2017-06-13 2017-06-13 Public feelings information dynamic monitoring and controlling method

Country Status (1)

Country Link
CN (1) CN107239563A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101299219A (en) * 2008-06-27 2008-11-05 北京邮电大学 Multithread breakpoint continued transmission customizable internal net reptile system
CN103177076A (en) * 2012-12-28 2013-06-26 中联竞成(北京)科技有限公司 Public sentiment monitoring system and method based on fixed point websites
US20140280009A1 (en) * 2013-03-15 2014-09-18 Chad Hage Methods and apparatus to supplement web crawling with cached data from distributed devices
CN104951512A (en) * 2015-05-27 2015-09-30 中国科学院信息工程研究所 Public sentiment data collection method and system based on Internet

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
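叶昭晖 等 (Ye Zhaohui et al.): "基于搜索引擎的网络舆情监控系统设计与实现" [Design and implementation of a search-engine-based network public opinion monitoring system], 《广西大学学报》 [Journal of Guangxi University] *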

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108334409A (en) * 2018-01-15 2018-07-27 北京大学 A kind of fine-grained high-performance cloud resource management dispatching method
CN109190010A (en) * 2018-09-20 2019-01-11 河南智慧云大数据有限公司 Internet data acquisition system is carried out based on customized keyword acquisition mode
CN109190010B (en) * 2018-09-20 2021-05-11 河南智慧云大数据有限公司 Internet data acquisition system based on user-defined keyword acquisition mode
CN109815382A (en) * 2018-12-29 2019-05-28 中国科学院计算技术研究所 The perception and acquisition methods and system of large scale network data
CN109815382B (en) * 2018-12-29 2022-07-12 中国科学院计算技术研究所 Method and system for sensing and acquiring large-scale network data
CN109871475A (en) * 2019-02-28 2019-06-11 上海浪潮云计算服务有限公司 A kind of method and system of in a preferential order piecemeal acquisition internet data
CN111753169A (en) * 2020-06-29 2020-10-09 金电联行(北京)信息技术有限公司 Data acquisition system based on internet
CN112612944A (en) * 2020-12-07 2021-04-06 深圳价值在线信息科技股份有限公司 Case information management method, terminal equipment and system
CN112612944B (en) * 2020-12-07 2024-05-31 深圳价值在线信息科技股份有限公司 Case information management method, terminal equipment and system
CN112749314A (en) * 2020-12-23 2021-05-04 民生科技有限责任公司 Accurate and efficient target public opinion intelligent monitoring system and method

Similar Documents

Publication Publication Date Title
CN107239563A (en) Public feelings information dynamic monitoring and controlling method
CN105956175B (en) The method and apparatus that web page contents are crawled
CN105243159B (en) A kind of distributed network crawler system based on visualization script editing machine
CN100490388C (en) Invading detection method and system based on procedure action
CN106778253A (en) Threat context aware information security Initiative Defense model based on big data
CN108039959A (en) Situation Awareness method, system and the relevant apparatus of a kind of data
US20090216868A1 (en) Anti-spam tool for browser
CN107590188A (en) A kind of reptile crawling method and its management system for automating vertical subdivision field
CN107317724A (en) Data collecting system and method based on cloud computing technology
US20120188249A1 (en) Distributed graph system and method
CN109074454A (en) Malware is grouped automatically based on artefact
CN112632135A (en) Big data platform
US20150161555A1 (en) Scheduling tasks to operators
CN105468664A (en) Information acquisition method and apparatus
Gabrel et al. Optimal and automatic transactional web service composition with dependency graph and 0-1 linear programming
CN107784113A (en) Html web page collecting method, device and computer-readable recording medium
CN116185754A (en) Data monitoring method, device, equipment, computer storage medium and program product
CN111597422A (en) Buried point mapping method and device, computer equipment and storage medium
CN111683107A (en) Internet-oriented security audit method and system
US20100161671A1 (en) System and method for generating hierarchical categories from collection of related terms
CN109446441A (en) A kind of credible distributed capture storage system of general Web Community
CN109063216A (en) A kind of distributed vertical service search crawler frame
CN105069004A (en) Patent information automatic collection method
CN113918534A (en) Policy processing system and method
CN104391956B (en) The detection method and device of network upgrade content

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20171010)