CN109063144A - Visual network crawler method and device - Google Patents

Visual network crawler method and device

Info

Publication number
CN109063144A
Authority
CN
China
Prior art keywords
crawler
selection operation
interface
tool box
box column
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810889341.7A
Other languages
Chinese (zh)
Inventor
刘振华
徐玉立
刘幸明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Golden Cat Information Technology Service Co Ltd
Original Assignee
Guangzhou Golden Cat Information Technology Service Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Golden Cat Information Technology Service Co Ltd
Priority to CN201810889341.7A
Publication of CN109063144A
Legal status: Pending

Landscapes

  • User Interface Of Digital Computer (AREA)

Abstract

Embodiments of the present invention provide a visual network crawler method and device. The visual network crawler method comprises: opening a target website in a specified browser in which a web crawler plug-in is installed; after receiving a selection operation, displaying a web crawler control interface in which a console is shown; after detecting an add operation in the console, displaying a newly added web crawler interface, wherein the web crawler interface includes a toolbox bar and the toolbox bar includes multiple network process steps; after receiving a selection operation in the toolbox bar, forming a crawler flow according to the selection operation; and storing the crawler flow.

Description

Visual network crawler method and device
Technical field
The present invention relates to the field of computer technology, and in particular to a visual network crawler method and device.
Background art
Traditional crawler systems and methods rely mainly on crawler programs written by technical staff. Some use visual methods, but these only support the visual creation of collection rules and do not represent the crawl as a step-by-step flow. Communication costs during the development stage are also relatively high.
Summary of the invention
In view of this, the purpose of the embodiments of the present invention is to provide a visual network crawler method and device.
In a first aspect, an embodiment of the present invention provides a visual network crawler method, comprising:
Opening a target website in a specified browser;
After receiving a selection operation, displaying a web crawler control interface, wherein a console is shown in the control interface and a web crawler plug-in is installed in the specified browser;
After detecting an add operation in the console, displaying a newly added web crawler interface, wherein the web crawler interface includes a toolbox bar and the toolbox bar includes multiple network process steps;
After receiving a selection operation in the toolbox bar, forming a crawler flow according to the selection operation;
Storing the crawler flow.
Optionally, the method further comprises:
After receiving a crawl request, obtaining the stored crawler flow and sending the crawl request to a target browser to perform the crawl operation.
Optionally, the step of obtaining the stored crawler flow after receiving a crawl request and sending the crawl request to a target browser to perform the crawl operation comprises:
After receiving a crawl request, obtaining the stored crawler flow and sending the crawl request to a headless browser cluster to perform the crawl operation.
Optionally, the method further comprises:
Sending a crawl data acquisition request to a server, wherein the crawl data includes the data generated while the crawler flow is executed;
Receiving the crawl data returned by the server.
Optionally, the step of forming a crawler flow according to the selection operation after receiving the selection operation in the toolbox bar comprises:
Receiving a selection operation in the toolbox bar;
Displaying the parameter setting interface of the process step corresponding to the selection operation;
Receiving the parameters set in the parameter setting interface;
Repeating the above steps until every process step in the crawler flow has been configured.
Optionally, the step of receiving the parameters set in the parameter setting interface comprises:
Starting the element capture function of the parameter setting interface;
After detecting a mouse click, setting the data corresponding to the click as the parameter.
Optionally, after the step of starting the element capture function of the parameter setting interface, the method further comprises:
Monitoring the position of the mouse cursor and highlighting that position.
Optionally, the toolbox bar includes any several of the following: open web page, click element, fill in input box, operate keyboard, set parameter, web page screenshot, get web page element, mouse hover, go back, flow loop, IF judgment, item loading, verification code, numeric operation, send message, exit loop, and exit program.
In a second aspect, an embodiment of the present invention further provides a visual network crawler device, comprising:
A starting module, used to open a target website in a specified browser;
A first display module, used to display a web crawler control interface after receiving a selection operation, wherein a console is shown in the control interface and a web crawler plug-in is installed in the specified browser;
A second display module, used to display a newly added web crawler interface after detecting an add operation in the console, wherein the web crawler interface includes a toolbox bar and the toolbox bar includes multiple network process steps;
A forming module, used to form a crawler flow according to a selection operation after receiving the selection operation in the toolbox bar;
A storage module, used to store the crawler flow.
Optionally, the device further comprises:
A crawl module, used to obtain the stored crawler flow after receiving a crawl request and to send the crawl request to a target browser to perform the crawl operation.
Compared with the prior art, the visual network crawler method and device of the embodiments of the present invention let the user set up the crawler flow through operations in a visual interface. A visual display interface is provided at every stage, which makes configuration easier and improves the user experience.
To make the above objects, features, and advantages of the present invention clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only certain embodiments of the present invention and should therefore not be regarded as limiting its scope. Those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
Fig. 1 is a block diagram of the electronic device provided by an embodiment of the present invention.
Fig. 2 is a flow chart of the visual network crawler method provided by an embodiment of the present invention.
Fig. 3 is a schematic diagram of the web crawler interface used in the visual network crawler method provided by an embodiment of the present invention.
Fig. 4 is a schematic diagram of the parameter setting interface used in the visual network crawler method provided by an embodiment of the present invention.
Fig. 5 is a flow chart of the visual network crawler method provided by another embodiment of the present invention.
Fig. 6 is a schematic diagram of the functional modules of the visual network crawler device provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the drawings herein may be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings. In the description of the present invention, the terms "first", "second", and so on are used only to distinguish descriptions and are not to be understood as indicating or implying relative importance.
Fig. 1 is a block diagram of the electronic device. The electronic device 100 includes a visual network crawler device 110, a memory 111, a storage controller 112, a processor 113, a peripheral interface 114, an input/output unit 115, and a display unit 116. Those of ordinary skill in the art will understand that the structure shown in Fig. 1 is only illustrative and does not limit the structure of the electronic device 100. For example, the electronic device 100 may include more or fewer components than shown in Fig. 1, or have a configuration different from that shown in Fig. 1.
The memory 111, storage controller 112, processor 113, peripheral interface 114, and input/output unit 115 are electrically connected to one another, directly or indirectly, to transmit or exchange data. For example, these components may be electrically connected through one or more communication buses or signal lines. The visual network crawler device 110 includes at least one software function module that can be stored in the memory 111 in the form of software or firmware, or built into the operating system (OS) of the electronic device 100. The processor 113 executes the executable modules stored in the memory, such as the software function modules or computer programs included in the visual network crawler device 110.
The memory 111 may be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and the like. The memory 111 stores programs, and the processor 113 executes a program after receiving an execution instruction. The method performed by the electronic device 100, as defined by the flow disclosed in any embodiment of the present invention, may be applied to the processor 113 or implemented by the processor 113.
The processor 113 may be an integrated circuit chip with signal processing capability. The processor 113 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The peripheral interface 114 couples various input/output devices to the processor 113 and the memory 111. In some embodiments, the peripheral interface 114, the processor 113, and the storage controller 112 may be implemented in a single chip. In other examples, they may each be implemented on separate chips.
The input/output unit 115 is used by the user to input data. The input/output unit 115 may be, but is not limited to, a mouse, a keyboard, and the like.
The display unit 116 provides an interactive interface (for example, a user operation interface) between the electronic device 100 and the user, or displays image data for the user's reference. In this embodiment, the display unit may be a liquid crystal display or a touch display. A touch display may be a capacitive or resistive touch screen that supports single-point and multi-point touch operations. Supporting single-point and multi-point touch operations means that the touch display can sense touch operations generated simultaneously at one or more positions on it and pass the sensed touch operations to the processor for calculation and processing.
The inventors studied existing crawler technology. Traditional crawler systems and methods rely mainly on crawler programs written by technical staff. Some use visual methods, but these only support the visual creation of collection rules and do not represent the crawl as a flow. Compared with a flow-based visual crawler system, their main disadvantages are as follows:
1. Development is difficult. Most traditional focused crawler systems use conventional web crawler technology, which requires professional software developers to write programs or scripts that automatically fetch web information according to certain page rules. Because every website has a different page structure, technical staff must write a different crawler program for each website. When the page structure of a target website changes, the crawler must be developed again. The skill requirements for developers are also high: they need to understand HTML structure, network communication, packet capture, and other professional skills. As a result, developing and maintaining a crawler system is expensive.
2. Communication costs during development are high. The cost of direct communication between users and developers is very high, because users find it hard to describe what they want and a large number of rule documents must be written. Developers, in turn, can only learn from those documents which fields the user wants to collect.
3. Visual crawler systems do exist in the prior art. However, as front-end technology has developed, many websites have adopted client-side rendering: the client sends Ajax requests to the server to obtain data and renders the final HTML on the client. A conventional crawling method therefore cannot see the complete rendered source. As a result, most visual crawlers can only crawl websites rendered on the server side; they extract fields from static web pages by rule and cannot specify concrete operations such as clicking or typing. Existing systems merely obtain the data parsing rules for a web page in a visual way and have a processing module apply those rules to obtain the data. This approach can handle some static web pages, but it cannot handle dynamic pages such as those built with Ajax, and it does not record concrete operations. For example, if the first screen of a web page is covered by a mask (overlay) advertisement, the crawl rules cannot be created visually unless the close button that removes the advertisement is clicked. These shortcomings prevent the approach from adapting to the crawling of more websites.
Based on the technical problems described above, the present application can effectively solve them through the following embodiments, which are described in detail below.
Please refer to Fig. 2, which is a flow chart of the visual network crawler method provided by an embodiment of the present invention and applied to the electronic device shown in Fig. 1. The specific flow shown in Fig. 2 is described in detail below.
Step S201: open a target website in a specified browser.
In this embodiment, the specified browser may be the Chrome browser. Of course, it may also be another browser. The description below uses the Chrome browser as an example.
In this embodiment, the visual network crawler method may use the following components: a Chrome plug-in, a flow execution engine, a headless browser cluster, a distributed message queue, a task scheduling system, a URL deduplication filter module, a crawler monitoring module, and a data storage module.
Before step S201, the Chrome plug-in is first installed in the Chrome browser.
The target website is the website to be crawled.
Step S202: after receiving a selection operation, display the web crawler control interface.
In this embodiment, a console is shown in the control interface, and a web crawler plug-in is installed in the specified browser.
In this embodiment, the browser's debugging interface can be opened with the F12 key, and the ParseRobot panel can then be clicked to enter the Chrome plug-in.
The console of the Chrome plug-in can be displayed on the electronic device. Clicking the add-crawler button on the left adds a new crawler. In one example, a name can be set for the new crawler. For example, if the website ABC is to be crawled, the new crawler may be named "ABC crawler".
In one example, the Anjuke website at https://guangzhou.anjuke.com/sale/?kw=&from=zjsr can be opened in the Chrome browser, and the Chrome plug-in opened. A new crawler is added in the web crawler control interface of the Chrome plug-in and may be named "Anjuke".
Step S203: after detecting an add operation in the console, display the newly added web crawler interface.
In this embodiment, the web crawler interface includes a toolbox bar, and the toolbox bar includes multiple network process steps.
In this embodiment, the toolbox bar includes any several of the following: open web page, click element, fill in input box, operate keyboard, set parameter, web page screenshot, get web page element, mouse hover, go back, flow loop, IF judgment, item loading, verification code, numeric operation, send message, exit loop, and exit program.
In one example, the toolbox bar may be placed on the left side of the web crawler interface.
As shown in Fig. 3, the left side of the web crawler interface is the toolbox bar, the middle is the crawler's flow workspace, and the right side is the parameter setting interface. The toolbox bar provides many process steps, such as open web page, click element, fill in input box, get web page data, and web page screenshot; these steps can simulate a person operating a browser. The add operation may be the operation of dragging a process step from the toolbox bar into the crawler's flow workspace.
Step S204: after receiving a selection operation in the toolbox bar, form a crawler flow according to the selection operation.
In this embodiment, in the example shown in Fig. 3, the process steps can be combined from top to bottom as needed to form the process chain of the current crawler.
In this embodiment, the step of forming a crawler flow according to the selection operation after receiving the selection operation in the toolbox bar comprises: receiving a selection operation in the toolbox bar; displaying the parameter setting interface of the process step corresponding to the selection operation; receiving the parameters set in the parameter setting interface; and repeating the above steps until every process step in the crawler flow has been configured. In this embodiment, if the crawler flow includes multiple process steps, the above steps are performed multiple times.
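The select-and-configure loop of step S204 can be sketched as a small data model. This is a minimal illustration under stated assumptions, not the patent's implementation: the names `FlowStep`, `CrawlerFlow`, `add_step`, and `build_example_flow`, the step identifiers, the parameter keys, and the example URL and locators are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FlowStep:
    """One process step chosen from the toolbox bar (hypothetical model)."""
    kind: str                                   # e.g. "open_webpage"
    params: Dict[str, Any] = field(default_factory=dict)


@dataclass
class CrawlerFlow:
    """An ordered chain of steps, combined from top to bottom."""
    name: str
    steps: List[FlowStep] = field(default_factory=list)

    def add_step(self, kind: str, **params: Any) -> "CrawlerFlow":
        # Selecting a step and confirming its parameter panel appends it.
        self.steps.append(FlowStep(kind, dict(params)))
        return self


def build_example_flow() -> CrawlerFlow:
    # Repeat "select a step, set its parameters" until the flow is complete.
    flow = CrawlerFlow(name="ABC crawler")
    flow.add_step("open_webpage", url="https://example.com/sale/")  # assumed URL
    flow.add_step("click_element", locator="css=.close-button")     # assumed locator
    flow.add_step("get_web_page_element", locator="css=.title", store_as="title")
    return flow
```

Keeping the steps in a plain ordered list mirrors the top-to-bottom process chain described for Fig. 3; loops and IF judgments would need a nested structure that this sketch leaves out.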
In this embodiment, the step of receiving the parameters set in the parameter setting interface comprises: starting the element capture function of the parameter setting interface; monitoring the position of the mouse cursor and highlighting that position; and, after detecting a mouse click, setting the data corresponding to the click as the parameter.
As shown in Fig. 4, the parameter setting interface includes items such as element name, element description, element locator, value type, wait for element to load, maximum wait time, storage, and regular expression. The parameter setting interface also includes an "element capture" button.
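The items on the parameter setting interface might map onto a record like the one below. This is a sketch under assumptions: the source only names the items, so the field names, default values, the unit of the wait time, and the way the regular expression is applied to a captured value are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ElementParams:
    """Hypothetical record for the items on the parameter setting interface."""
    element_name: str
    element_description: str = ""
    locator: str = ""                # CSS or XPath rule from element capture
    value_type: str = "text"         # assumed values: "text", "html", "attribute"
    wait_for_load: bool = True       # wait for the element to load
    max_wait_seconds: float = 10.0   # maximum wait time (assumed unit)
    store: bool = True               # whether the value is stored
    regex: Optional[str] = None      # optional pattern applied to the raw value

    def clean(self, raw: str) -> Optional[str]:
        """Apply the optional regular expression to a captured value."""
        if self.regex is None:
            return raw.strip()
        match = re.search(self.regex, raw)
        if match is None:
            return None
        # Prefer the first capture group when the pattern defines one.
        return match.group(1) if match.groups() else match.group(0)
```

For instance, `ElementParams("price", regex=r"(\d+(?:\.\d+)?)").clean(" 350.5 (total) ")` extracts the numeric part of a captured price string.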
For example, add a "get web page element" step and click the element capture button on its parameter setting panel. The page in the Chrome browser then highlights elements automatically as the mouse cursor moves. Clicking the listing title element on the web page yields the CSS/XPath rule of the target element.
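How a clicked element could be turned into an XPath rule is not specified in the source. One common approach, shown here only as an assumption, is to walk from the element up to the root and record each tag with its position among same-tag siblings. The plug-in would do this on the live DOM in the browser; the sketch below uses Python's `xml.etree.ElementTree` on a well-formed fragment as a stand-in, and the function name `positional_xpath` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def positional_xpath(root: ET.Element, target: ET.Element) -> str:
    """Build an absolute, index-based XPath for `target` inside `root`."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parents: Dict[ET.Element, ET.Element] = {
        child: parent for parent in root.iter() for child in parent
    }
    parts: List[str] = []
    node = target
    while node is not root:
        parent = parents[node]
        same_tag = [c for c in parent if c.tag == node.tag]
        # XPath positions are 1-based and count only siblings with the same tag.
        parts.append(f"{node.tag}[{same_tag.index(node) + 1}]")
        node = parent
    parts.append(root.tag)
    return "/" + "/".join(reversed(parts))
```

Index-based paths are easy to derive but break when the page layout shifts, which is one reason a capture tool may also offer a CSS rule as an alternative.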
Step S205: store the crawler flow.
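The source does not say how or where the crawler flow is stored. As one possible sketch, the flow could be serialized to JSON and kept in one file per crawler name; the function names `save_flow` and `load_flow`, the dictionary layout, and the file layout are assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Any, Dict

Flow = Dict[str, Any]  # assumed shape: {"name": str, "steps": [ {...}, ... ]}


def save_flow(flow: Flow, directory: Path) -> Path:
    """Write the flow as JSON, one file per crawler name (assumed layout)."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{flow['name']}.json"
    # ensure_ascii=False keeps non-ASCII crawler names readable on disk.
    path.write_text(json.dumps(flow, ensure_ascii=False, indent=2), encoding="utf-8")
    return path


def load_flow(name: str, directory: Path) -> Flow:
    """Read a previously stored flow back by crawler name."""
    return json.loads((directory / f"{name}.json").read_text(encoding="utf-8"))
```

A text format such as JSON keeps the stored flow inspectable and lets a separate execution component read it back later without sharing code with the plug-in.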
The visual network crawler method of the embodiments of the present invention lets the user set up the crawler flow through operations in a visual interface. A visual display interface is provided at every stage, which makes configuration easier and improves the user experience.
In this embodiment, as shown in Fig. 5, the visual network crawler method further comprises step S206: after receiving a crawl request, obtain the stored crawler flow and send the crawl request to a target browser to perform the crawl operation.
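Executing a stored flow against a browser can be pictured as a small interpreter that walks the steps in order. The sketch below is illustrative only: the `Browser` interface, the `FakeBrowser` stand-in, the step identifiers, and the function `run_flow` are hypothetical, and a real flow execution engine would drive an actual browser and support more step types.

```python
from typing import Any, Dict, List, Protocol


class Browser(Protocol):
    """Hypothetical driver interface; a real one would control a browser."""
    def open(self, url: str) -> None: ...
    def click(self, locator: str) -> None: ...
    def read(self, locator: str) -> str: ...


class FakeBrowser:
    """In-memory stand-in so the sketch runs without a real browser."""
    def __init__(self, texts: Dict[str, str]) -> None:
        self.texts = texts
        self.log: List[str] = []

    def open(self, url: str) -> None:
        self.log.append(f"open {url}")

    def click(self, locator: str) -> None:
        self.log.append(f"click {locator}")

    def read(self, locator: str) -> str:
        self.log.append(f"read {locator}")
        return self.texts[locator]


def run_flow(steps: List[Dict[str, Any]], browser: Browser) -> Dict[str, str]:
    """Execute the stored steps in order and collect the captured values."""
    data: Dict[str, str] = {}
    for step in steps:
        kind = step["kind"]
        if kind == "open_webpage":
            browser.open(step["url"])
        elif kind == "click_element":
            browser.click(step["locator"])
        elif kind == "get_web_page_element":
            data[step["store_as"]] = browser.read(step["locator"])
        elif kind == "exit_program":
            break  # stop before any remaining steps
        else:
            raise ValueError(f"unsupported step kind: {kind}")
    return data
```

Because clicks are replayed as recorded steps, a flow can first close an overlay advertisement and only then read the elements behind it, which is the case the background section says rule-only tools cannot handle.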
In one embodiment, step S206 comprises: after receiving a crawl request, obtaining the stored crawler flow and sending the crawl request to a headless browser cluster to perform the crawl operation.
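The source does not describe how requests are distributed across the headless browser cluster. Purely as an assumption, the sketch below queues crawl requests first-in, first-out and assigns them to nodes in round-robin order; the names `CrawlRequest`, `HeadlessCluster`, `submit`, and `dispatch` are hypothetical, and the in-process queue only stands in for the distributed message queue listed among the components.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List


@dataclass
class CrawlRequest:
    crawler_name: str  # identifies which stored crawler flow to run


@dataclass
class HeadlessCluster:
    """Hypothetical pool of headless browser nodes fed from a FIFO queue."""
    nodes: List[str]
    pending: Deque[CrawlRequest] = field(default_factory=deque)

    def submit(self, request: CrawlRequest) -> None:
        # Stands in for publishing the request to a message queue.
        self.pending.append(request)

    def dispatch(self) -> Dict[str, List[str]]:
        """Drain the queue, assigning requests to nodes in round-robin order."""
        assigned: Dict[str, List[str]] = {node: [] for node in self.nodes}
        index = 0
        while self.pending:
            request = self.pending.popleft()
            node = self.nodes[index % len(self.nodes)]
            assigned[node].append(request.crawler_name)
            index += 1
        return assigned
```

Round-robin is chosen here only because it is simple and deterministic; a real task scheduling system would more likely weigh node load and failures.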
In this embodiment, the visual network crawler method further comprises:
Sending a crawl data acquisition request to a server, wherein the crawl data includes the data generated while the crawler flow is executed;
Receiving the crawl data returned by the server.
Please refer to Fig. 6, which is a schematic diagram of the functional modules of the visual network crawler device 110 shown in Fig. 1, provided by an embodiment of the present invention. The modules of the visual network crawler device 110 in this embodiment are used to perform the steps of the method embodiment above. The visual network crawler device 110 includes a starting module 1101, a first display module 1102, a second display module 1103, a forming module 1104, and a storage module 1105.
The starting module 1101 is used to open a target website in a specified browser.
The first display module 1102 is used to display a web crawler control interface after receiving a selection operation, wherein a console is shown in the control interface and a web crawler plug-in is installed in the specified browser.
The second display module 1103 is used to display a newly added web crawler interface after detecting an add operation in the console, wherein the web crawler interface includes a toolbox bar and the toolbox bar includes multiple network process steps.
The forming module 1104 is used to form a crawler flow according to a selection operation after receiving the selection operation in the toolbox bar.
The storage module 1105 is used to store the crawler flow.
In this embodiment, the visual network crawler device 110 further includes a crawl module 1106, used to obtain the stored crawler flow after receiving a crawl request and to send the crawl request to a target browser to perform the crawl operation.
In this embodiment, the crawl module 1106 is also used to:
After receiving a crawl request, obtain the stored crawler flow and send the crawl request to a headless browser cluster to perform the crawl operation.
In this embodiment, the visual network crawler device further includes:
A sending module, used to send a crawl data acquisition request to a server, wherein the crawl data includes the data generated while the crawler flow is executed;
A receiving module, used to receive the crawl data returned by the server.
In this embodiment, the forming module 1104 is also used to:
Receive a selection operation in the toolbox bar;
Display the parameter setting interface of the process step corresponding to the selection operation;
Receive the parameters set in the parameter setting interface;
Repeat the above steps until every process step in the crawler flow has been configured.
In this embodiment, the forming module 1104 is also used to:
Start the element capture function of the parameter setting interface;
After detecting a mouse click, set the data corresponding to the click as the parameter.
In this embodiment, the forming module 1104 of the visual network crawler device is also used to:
Monitor the position of the mouse cursor and highlight that position.
In this embodiment, the toolbox bar includes any several of the following: open web page, click element, fill in input box, operate keyboard, set parameter, web page screenshot, get web page element, mouse hover, go back, flow loop, IF judgment, item loading, verification code, numeric operation, send message, exit loop, and exit program.
The visual network crawler device of the embodiments of the present invention lets the user set up the crawler flow through operations in a visual interface. A visual display interface is provided at every stage, which makes configuration easier and improves the user experience.
In several embodiments provided herein, it should be understood that disclosed device and method can also pass through Other modes are realized.The apparatus embodiments described above are merely exemplary, for example, flow chart and block diagram in attached drawing Show the device of multiple embodiments according to the present invention, the architectural framework in the cards of method and computer program product, Function and operation.In this regard, each box in flowchart or block diagram can represent the one of a module, section or code Part, a part of the module, section or code, which includes that one or more is for implementing the specified logical function, to be held Row instruction.It should also be noted that function marked in the box can also be to be different from some implementations as replacement The sequence marked in attached drawing occurs.For example, two continuous boxes can actually be basically executed in parallel, they are sometimes It can execute in the opposite order, this depends on the function involved.It is also noted that every in block diagram and or flow chart The combination of box in a box and block diagram and or flow chart can use the dedicated base for executing defined function or movement It realizes, or can realize using a combination of dedicated hardware and computer instructions in the system of hardware.
In addition, each functional module in each embodiment of the present invention can integrate one independent portion of formation together Point, it is also possible to modules individualism, an independent part can also be integrated to form with two or more modules.
It, can be with if the function is realized and when sold or used as an independent product in the form of software function module It is stored in a computer readable storage medium.Based on this understanding, technical solution of the present invention is substantially in other words The part of the part that contributes to existing technology or the technical solution can be embodied in the form of software products, the meter Calculation machine software product is stored in a storage medium, including some instructions are used so that a computer equipment (can be a People's computer, server or network equipment etc.) it performs all or part of the steps of the method described in the various embodiments of the present invention. And storage medium above-mentioned includes: that USB flash disk, mobile hard disk, read-only memory (ROM, Read-Only Memory), arbitrary access are deposited The various media that can store program code such as reservoir (RAM, Random Access Memory), magnetic or disk.It needs Illustrate, herein, relational terms such as first and second and the like be used merely to by an entity or operation with Another entity or operation distinguish, and without necessarily requiring or implying between these entities or operation, there are any this realities The relationship or sequence on border.Moreover, the terms "include", "comprise" or its any other variant are intended to the packet of nonexcludability Contain, so that the process, method, article or equipment for including a series of elements not only includes those elements, but also including Other elements that are not explicitly listed, or further include for elements inherent to such a process, method, article, or device. In the absence of more restrictions, the element limited by sentence "including a ...", it is not excluded that including the element Process, method, article or equipment in there is also other identical elements.
The foregoing is only a preferred embodiment of the present invention, is not intended to restrict the invention, for the skill of this field For art personnel, the invention may be variously modified and varied.All within the spirits and principles of the present invention, made any to repair Change, equivalent replacement, improvement etc., should all be included in the protection scope of the present invention.It should also be noted that similar label and letter exist Similar terms are indicated in following attached drawing, therefore, once being defined in a certain Xiang Yi attached drawing, are then not required in subsequent attached drawing It is further defined and explained.
The above is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.

Claims (10)

1. A visual network crawler method, characterized by comprising:
opening a target website in a specified browser;
after receiving a selection operation, displaying a web crawler control interface, wherein a station is displayed in the control interface and a web crawler plug-in is installed in the specified browser;
after detecting an add operation in the station, displaying a newly added web crawler interface, wherein the web crawler interface includes a toolbox column and the toolbox column includes a plurality of network processes;
after receiving a selection operation in the toolbox column, forming a crawler process according to the selection operation; and
storing the crawler process.
2. The visual network crawler method according to claim 1, characterized in that the method further comprises:
after receiving a crawler request, obtaining the stored crawler process, and sending the crawler request to a target browser to perform a crawling operation.
3. The visual network crawler method according to claim 2, characterized in that the step of, after receiving a crawler request, obtaining the stored crawler process and sending the crawler request to a target browser to perform a crawling operation comprises:
after receiving the crawler request, obtaining the stored crawler process, and sending the crawler request to a headless browser cluster to perform the crawling operation.
4. The visual network crawler method according to claim 2, characterized in that the method further comprises:
sending a crawled-data acquisition request to a server, wherein the crawled data includes related data generated while the crawler process is executed; and
receiving the crawled data returned by the server.
5. The visual network crawler method according to claim 1, characterized in that the step of, after receiving a selection operation in the toolbox column, forming a crawler process according to the selection operation comprises:
receiving the selection operation in the toolbox column;
displaying a parameter setting interface for the process corresponding to the selection operation;
receiving the parameters set in the parameter setting interface; and
repeating the above steps until every process in the crawler process has been set.
6. The visual network crawler method according to claim 5, characterized in that the step of receiving the parameters set in the parameter setting interface comprises:
starting an element capture function of the parameter setting interface; and
after detecting a click-selection operation of the mouse, setting the data corresponding to the click-selection operation as the parameter.
7. The visual network crawler method according to claim 6, characterized in that, after the step of starting the element capture function of the parameter setting interface, the method further comprises:
monitoring the position of the mouse cursor, and displaying the position of the mouse cursor in a distinguishing manner.
8. The visual network crawler method according to any one of claims 1 to 7, characterized in that the toolbox column includes any plurality of: open webpage, click element, fill in input box, operate keyboard, set parameter, loading complete, webpage capture, get webpage element, mouse hover, rollback, process loop, IF judgment, item, verification code, numeric operation, send message, exit loop, and exit program.
9. A visual network crawler device, characterized by comprising:
a starting module, configured to open a target website in a specified browser;
a first display module, configured to display a web crawler control interface after receiving a selection operation, wherein a station is displayed in the control interface and a web crawler plug-in is installed in the specified browser;
a second display module, configured to display a newly added web crawler interface after detecting an add operation in the station, wherein the web crawler interface includes a toolbox column and the toolbox column includes a plurality of network processes;
a forming module, configured to form a crawler process according to a selection operation after receiving the selection operation in the toolbox column; and
a storage module, configured to store the crawler process.
10. The visual network crawler device according to claim 9, characterized in that the device further comprises:
a crawling module, configured to, after receiving a crawler request, obtain the stored crawler process and send the crawler request to a target browser to perform a crawling operation.
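
By way of non-limiting illustration only, the forming, storing, and request-handling steps recited in claims 1, 2, and 5 might be sketched in Python as follows. Every name in the sketch (`TOOLBOX`, `Step`, `CrawlerProcess`, `ProcessStore`, `handle_crawl_request`) is hypothetical and does not appear in the application; the JSON serialization, the in-memory store, and the `dispatch` callable standing in for the target browser are assumptions made for the example, not the applicant's implementation.

```python
import json
from dataclasses import dataclass, field, asdict

# Hypothetical subset of the toolbox entries listed in claim 8.
TOOLBOX = {"open_webpage", "click_element", "fill_input", "get_element", "exit_program"}


@dataclass
class Step:
    tool: str                                   # toolbox entry selected by the user
    params: dict = field(default_factory=dict)  # values entered in the parameter setting interface


@dataclass
class CrawlerProcess:
    name: str
    steps: list = field(default_factory=list)

    def add_step(self, tool, **params):
        """One pass of the claim 5 loop: a selection plus the parameters set for it."""
        if tool not in TOOLBOX:
            raise ValueError(f"unknown toolbox entry: {tool}")
        self.steps.append(Step(tool, params))
        return self


class ProcessStore:
    """In-memory stand-in for the storing step (assumption: a real system would persist this)."""

    def __init__(self):
        self._saved = {}

    def save(self, process):
        # Serialize to JSON so the stored form is independent of the live objects.
        self._saved[process.name] = json.dumps(asdict(process))

    def load(self, name):
        raw = json.loads(self._saved[name])
        return CrawlerProcess(raw["name"], [Step(**s) for s in raw["steps"]])


def handle_crawl_request(store, name, dispatch):
    """Claim 2 sketch: obtain the stored process and hand it to a target browser.

    `dispatch` is a hypothetical callable representing the target browser
    (or a headless browser cluster, as in claim 3).
    """
    return dispatch(store.load(name))


# Usage: build a two-step process, store it, then serve a crawler request.
store = ProcessStore()
flow = (CrawlerProcess("demo")
        .add_step("open_webpage", url="https://example.com")
        .add_step("get_element", selector="h1"))
store.save(flow)
result = handle_crawl_request(store, "demo", lambda p: [s.tool for s in p.steps])
```

In this sketch the dispatcher merely lists the tools it was given; an actual embodiment would drive a browser with each step's parameters. Serializing before storage is one possible design choice that keeps the stored crawler process separate from the interface objects used to build it.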
CN201810889341.7A 2018-08-07 2018-08-07 Visual network crawler method and device Pending CN109063144A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810889341.7A CN109063144A (en) 2018-08-07 2018-08-07 Visual network crawler method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810889341.7A CN109063144A (en) 2018-08-07 2018-08-07 Visual network crawler method and device

Publications (1)

Publication Number Publication Date
CN109063144A true CN109063144A (en) 2018-12-21

Family

ID=64832139

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810889341.7A Pending CN109063144A (en) 2018-08-07 2018-08-07 Visual network crawler method and device

Country Status (1)

Country Link
CN (1) CN109063144A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109710831A (en) * 2018-12-28 2019-05-03 四川新网银行股份有限公司 A kind of network crawler system based on browser plug-in
CN110209909A (en) * 2019-04-19 2019-09-06 平安科技(深圳)有限公司 Data crawling method, device, computer equipment and storage medium
CN110516135A (en) * 2019-08-29 2019-11-29 杭州时趣信息技术有限公司 A kind of crawler system and method
CN112100061A (en) * 2020-08-28 2020-12-18 广州探迹科技有限公司 Visual crawler code compiling and debugging method
CN112231536A (en) * 2020-10-26 2021-01-15 中国信息安全测评中心 Data crawling method and device based on self-learning
CN112579850A (en) * 2019-09-29 2021-03-30 北京国双科技有限公司 Breakpoint recovery method and device
CN115328812A (en) * 2022-10-11 2022-11-11 深圳华锐分布式技术股份有限公司 UI (user interface) testing method, device, equipment and medium based on web crawler

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080114739A1 (en) * 2006-11-14 2008-05-15 Hayes Paul V System and Method for Searching for Internet-Accessible Content
US20120078875A1 (en) * 2010-09-27 2012-03-29 Michael Price Web browser contacts plug-in
CN103279507A (en) * 2013-05-16 2013-09-04 北京尚友通达信息技术有限公司 Webpage spider operational method and system
CN104408204A (en) * 2014-12-18 2015-03-11 北京国双科技有限公司 Method and device for obtaining webpage page link address
CN106156370A (en) * 2016-08-29 2016-11-23 携程计算机技术(上海)有限公司 Reptile implementation method based on the built-in crawler system of browser
CN106446020A (en) * 2016-08-29 2017-02-22 携程计算机技术(上海)有限公司 Browser built-in crawler system-based fingerprint identification realization method
CN106547749A (en) * 2015-09-16 2017-03-29 北京国双科技有限公司 The method and apparatus of collecting webpage data
CN107273499A (en) * 2017-06-16 2017-10-20 成都布林特信息技术有限公司 Data grab method based on vertical search engine
CN108304498A (en) * 2018-01-12 2018-07-20 深圳壹账通智能科技有限公司 Webpage data acquiring method, device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CINDY: "Web Scraper", 《WEB SCRAPER》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200710

Address after: 115000 China (Liaoning) free trade Experimental Zone, Yingkou City, Liaoning Province

Applicant after: Liaoning Xinzhen commercial factoring Co., Ltd

Address before: 510000 Room 602, 153 Sports West Road, Tianhe District, Guangzhou City, Guangdong Province (office only)

Applicant before: GUANGZHOU JINMAO INFORMATION TECHNOLOGY SERVICE Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20181221