CN109460522A - Method and device for acquiring site information - Google Patents

Method and device for acquiring site information

Info

Publication number
CN109460522A
Authority
CN
China
Prior art keywords
information
input frame
website
site information
log
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811279690.3A
Other languages
Chinese (zh)
Inventor
赵丙峰
陶志明
金红豆
常春倩
张爽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Net Co Creation Technology Co Ltd
Original Assignee
Beijing Net Co Creation Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Net Co Creation Technology Co Ltd
Priority to CN201811279690.3A
Publication of CN109460522A
Legal status: Pending

Abstract

This application provides a method and device for acquiring site information. The method comprises: identifying the login interface of a website according to preset rules, entering the account, password, and the like at the corresponding positions, and logging in to the website; obtaining, in the website, first site information related to the first information indicated in a preset template; and extracting and outputting the first site information. This scheme solves the problem in the related art that a crawler cannot be applied to most web pages, which leads to high maintenance cost. The preset rules are applicable to logging in to most websites, and the preset template specifies the features of the information to be obtained. This widens the crawler's scope of application, reduces its sensitivity to differences between web pages, removes the need to substantially modify program code for each web page, and reduces maintenance cost.

Description

Method and device for acquiring site information
Technical field
This application relates to, but is not limited to, the Internet field, and in particular to a method and device for acquiring site information.
Background art
In the related art, a crawler has to deal with user login logic, picture verification codes, and the URLs to crawl. Crawling each website often requires developing a separate set of program code that depends heavily on how the target website is implemented, so robustness is poor. Parsing the crawled results is strict and requires a deep understanding of the target website's implementation details. As a result, maintenance cost is relatively high and development is slow, which cannot meet the business need for more and faster crawling. Such crawlers are also sensitive to the target pages and cannot adapt automatically to most page-level revisions.
For the problem in the related art that a crawler cannot be applied to most web pages, which leads to high maintenance cost, no effective solution has yet been proposed.
Summary of the invention
The embodiments of the application provide a method and device for acquiring site information, so as to at least solve the problem in the related art that a crawler cannot be applied to most web pages, which leads to high maintenance cost.
According to one embodiment of the application, a method for acquiring site information is provided, comprising: identifying the login interface of a website according to preset rules, and logging in to the website; and obtaining, from the site information of the website, first site information corresponding to the first information indicated in a preset template.
According to another embodiment of the application, a device for acquiring site information is also provided, comprising: an identification module, configured to identify the login interface of a website according to preset rules and log in to the website; and an obtaining module, configured to obtain, from the site information of the website, first site information corresponding to the first information indicated in a preset template.
According to another embodiment of the application, a storage medium is also provided. A computer program is stored in the storage medium, and the computer program is configured to perform, when run, the steps in any of the above method embodiments.
According to another embodiment of the application, an electronic device is also provided, comprising a memory and a processor. A computer program is stored in the memory, and the processor is configured to run the computer program so as to perform the steps in any of the above method embodiments.
With this application, the login interface of a website is identified according to preset rules, the account, password, and the like are entered at the corresponding positions, and the website is logged in to; first site information related to the first information indicated in a preset template is obtained in the website, extracted, and output. This scheme solves the problem in the related art that a crawler cannot be applied to most web pages, which leads to high maintenance cost. The preset rules are applicable to logging in to most websites, and the preset template specifies the features of the information to be obtained. This widens the crawler's scope of application, reduces its sensitivity to differences between web pages, removes the need to substantially modify program code for each web page, and reduces maintenance cost.
Brief description of the drawings
The drawings described here are provided for a further understanding of the application and form part of it. The illustrative embodiments of the application and their descriptions are used to explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a hardware block diagram of a mobile terminal for a method for acquiring site information according to an embodiment of the application;
Fig. 2 is a flowchart of the method for acquiring site information according to an embodiment of the application;
Fig. 3 is a schematic diagram of the interface of a traditional crawler in the related art;
Fig. 4 is a detailed schematic diagram of a traditional crawler in the related art collecting site information;
Fig. 5 is a schematic diagram of the architecture of the intelligent crawler according to another embodiment of the application;
Fig. 6 is a flowchart of the method by which the intelligent crawler obtains web page information according to another embodiment of the application;
Fig. 7 is a schematic diagram of the intelligent crawler identifying a login interface according to another embodiment of the application;
Fig. 8 is a schematic diagram of the intelligent crawler identifying web page code according to another embodiment of the application;
Fig. 9 is a schematic diagram of the JSON structure according to another embodiment of the application.
Detailed description of the embodiments
The application is described in detail below with reference to the drawings and in conjunction with the embodiments. It should be noted that, where no conflict arises, the embodiments of the application and the features in them may be combined with each other.
It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings of the application are used to distinguish similar objects and not to describe a particular order or sequence.
The technical solution in this application can be applied to a terminal and is a crawler scheme; it can be implemented by a program or a script.
A crawler is a program or script that automatically fetches web information according to certain rules.
Embodiment one
The method embodiment provided by Embodiment one of the application can be executed in a mobile terminal, a computer terminal, or a similar computing device. Taking execution on a mobile terminal as an example, Fig. 1 is a hardware block diagram of a mobile terminal for a method for acquiring site information according to an embodiment of the application. As shown in Fig. 1, the mobile terminal may include one or more processors 102 (only one is shown in Fig. 1; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 for storing data. Optionally, the mobile terminal may also include a transmission device 106 for communication functions and an input/output device 108. Those of ordinary skill in the art will understand that the structure shown in Fig. 1 is only illustrative and does not limit the structure of the mobile terminal. For example, the mobile terminal may include more or fewer components than shown in Fig. 1, or have a configuration different from that shown in Fig. 1.
The memory 104 can be used to store software programs and modules of application software, such as the program instructions/modules corresponding to the method for acquiring site information in the embodiments of the application. By running the software programs and modules stored in the memory 104, the processor 102 executes various functional applications and data processing, that is, implements the above method. The memory 104 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, and such remote memory can be connected to the mobile terminal through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or send data via a network. A specific example of the network may include a wireless network provided by the communication provider of the mobile terminal. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices through a base station so as to communicate with the Internet. In one example, the transmission device 106 may be a radio frequency (RF) module, which is used to communicate with the Internet wirelessly.
This embodiment provides a method for acquiring site information that runs on the above terminal. Fig. 2 is a flowchart of the method for acquiring site information according to an embodiment of the application. As shown in Fig. 2, the process includes the following steps:
Step S202: identify the login interface of a website according to preset rules, and log in to the website;
The above scheme can be implemented by a program or script, also called a crawler. Identifying the login interface of the website may include identifying the position of the account box, the position of the password box, and so on; that is, the program automatically works out the role of each input box in the login interface.
Step S204: obtain, from the site information of the website, first site information corresponding to the first information indicated in a preset template.
After passing the website's security verification and logging in, the crawler clicks through links step by step according to the operations or information specified in the preset template, and can obtain the page information behind the specified links.
Through the above steps, the login interface of a website is identified according to preset rules, the account, password, and the like are entered at the corresponding positions, and the website is logged in to; first site information related to the first information indicated in the preset template is obtained in the website, extracted, and output. This scheme solves the problem in the related art that a crawler cannot be applied to most web pages, which leads to high maintenance cost. The preset rules are applicable to logging in to most websites, and the preset template specifies the features of the information to be obtained. This widens the crawler's scope of application, reduces its sensitivity to differences between web pages, removes the need to substantially modify program code for each web page, and reduces maintenance cost.
Optionally, identifying the login interface of the website according to preset rules includes identifying the password box in the login interface in at least one of the following ways: obtaining the page code of the login interface, querying the code for an element with type=password, and identifying that element as the password box; or querying for a first input box that converts the input information into masked form, and identifying that first input box as the password box. With this scheme, the position of the password box can be identified accurately.
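The first of these two ways can be illustrated with a minimal sketch that uses only Python's standard `html.parser` module. The class name `PasswordFieldFinder`, the function `find_password_fields`, and the sample login page are assumptions made for illustration; the application does not prescribe a language or a parser.

```python
from html.parser import HTMLParser


class PasswordFieldFinder(HTMLParser):
    """Collects the attributes of every <input type="password"> element."""

    def __init__(self):
        super().__init__()
        self.password_fields = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "input":
            return
        attributes = dict(attrs)
        # An attribute without a value is reported as None, hence the "or".
        if (attributes.get("type") or "").lower() == "password":
            self.password_fields.append(attributes)


def find_password_fields(page_source):
    """Return the attribute dicts of all password boxes in the page code."""
    finder = PasswordFieldFinder()
    finder.feed(page_source)
    return finder.password_fields


# Hypothetical login page used only for this illustration.
LOGIN_PAGE = """
<form action="/login" method="post">
  <input type="hidden" name="token" value="abc">
  <input type="text" name="userid" placeholder="User name">
  <input type="password" name="pwd">
  <input type="submit" value="Login">
</form>
"""

fields = find_password_fields(LOGIN_PAGE)
print(fields)
```

The rule looks only at the `type` attribute, so it keeps working whatever `id` or `name` the site gives the password box.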
Optionally, after the password box in the login interface is identified, second input boxes are identified within a preset range around the password box, and a third input box among the second input boxes that meets at least one of the following conditions is identified as the account box: a second input box whose input type is not hidden is identified as the third input box; a second input box whose value, name, or placeholder matches a preset field is identified as the third input box.
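A rough sketch of this account-box rule follows. Several points are assumptions rather than part of the application: the "preset range" is modelled as the few inputs that precede the password box in document order, the keyword list `ACCOUNT_KEYWORDS` and the constant `SEARCH_RANGE` are hypothetical, and the sketch applies both conditions together, which is stricter than the "at least one" wording above.

```python
from html.parser import HTMLParser

# Hypothetical preset fields and search range, chosen only for illustration.
ACCOUNT_KEYWORDS = ("user", "account", "login", "email", "phone")
SEARCH_RANGE = 3  # number of inputs before the password box to inspect


class InputCollector(HTMLParser):
    """Collects the attributes of every <input> element in document order."""

    def __init__(self):
        super().__init__()
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            self.inputs.append({name: (value or "") for name, value in attrs})


def find_account_field(page_source):
    """Return the attributes of the likely account box, or None."""
    collector = InputCollector()
    collector.feed(page_source)
    inputs = collector.inputs
    password_index = next(
        (i for i, field in enumerate(inputs)
         if field.get("type", "").lower() == "password"),
        None,
    )
    if password_index is None:
        return None
    start = max(0, password_index - SEARCH_RANGE)
    # Walk backwards so that the input closest above the password box wins.
    for field in reversed(inputs[start:password_index]):
        if field.get("type", "text").lower() == "hidden":
            continue  # condition 1: the input type must not be hidden
        text = " ".join(
            field.get(key, "") for key in ("value", "name", "placeholder")
        ).lower()
        if any(keyword in text for keyword in ACCOUNT_KEYWORDS):
            return field  # condition 2: value/name/placeholder matches
    return None


# Hypothetical login page used only for this illustration.
PAGE = """
<form>
  <input type="hidden" name="user_token" value="x">
  <input type="text" name="userid" placeholder="User name">
  <input type="password" name="pwd">
</form>
"""

account = find_account_field(PAGE)
print(account)
```

The hidden `user_token` input matches a keyword but is skipped because of its type, which is why the two conditions are useful in combination.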
Optionally, logging in to the website comprises: entering account information in the password box and the account box; and identifying the picture verification code using a first model and completing the verification, wherein the first model is trained by machine learning on multiple groups of data, each group comprising a picture verification code and the characters in it. With machine learning, the content of the picture verification code can be recognized accurately, which raises the success rate of logging in to the website. For the recognition step, the crawler may download the picture verification code and hand it to a background processor, which recognizes it with the machine learning model.
Optionally, obtaining the first site information corresponding to the first information indicated in the preset template includes at least one of the following:
Obtaining the site information behind a hyperlink indicated in the first information; the crawler can automatically click the corresponding link in the website interface;
Obtaining the site information at a key field indicated in the first information; the key field is identified in the page of the website;
Obtaining the site information in a table indicated in the first information; a specified hyperlink may contain a table, and part of the information in that table can be obtained;
Obtaining the site information at a page position indicated in the first information; for example, the website page may be divided into three columns, and the content of the first column is obtained;
Obtaining the site information within a time period indicated in the first information; for example, the website stores a person's online shopping records for the past year, and this scheme can obtain only the records for the past month according to the time period.
Another embodiment of the application is described below.
In the following, the scheme provided by another embodiment of the application is referred to as the intelligent crawler, and the crawler scheme in the related art is referred to as the traditional crawler.
Fig. 3 is a schematic diagram of the interface of a traditional crawler in the related art. As shown in Fig. 3, the traditional crawler software lets the user select the website to collect from, such as Taobao or No. 1 Shop.
Fig. 4 is a detailed schematic diagram of a traditional crawler in the related art collecting site information. As shown in Fig. 4, some Taobao data is collected with traditional crawler technology.
The intelligent crawler in another embodiment of the application focuses mainly on observing and simulating user behavior; it relies on the full behavior of the browser and cares only about the finally rendered page.
The intelligent crawler implements an automatic login service for target websites, an automatic picture verification code recognition service, a template-based page crawling service, and a template-based data extraction service.
The automatic login of the intelligent crawler simulates human behavior: it searches the login page for the input boxes that need a user name, password, verification code, and so on, fills them in automatically, and then searches for a clickable "login" button. A traditional crawler, by contrast, needs to be told the positions of the user name input box, password input box, verification code input box, and login button; the intelligent crawler searches the page for these elements automatically according to predetermined rules, then identifies and fills them automatically.
Fig. 5 is a schematic diagram of the architecture of the intelligent crawler according to another embodiment of the application. As shown in Fig. 5, templates can be added, modified, deleted, and displayed through the terminal's template configuration module. The crawl interface module can then validate and parse a template, and a crawl request is initiated through the terminal. The architecture also includes a persistence platform module, an operating system, and computer network infrastructure; the persistence platform can store the above content in a page parsing storage module.
Fig. 6 is a flowchart of the method by which the intelligent crawler obtains web page information according to another embodiment of the application. As shown in Fig. 6, the method comprises the following steps:
Step 1: the user initiates a crawl request to the request distribution system through the business system;
Step 2: the request distribution system returns response information to the business system;
Step 3: the request distribution system calls the parsing and mapping service module of the intelligent crawler system;
Step 4: the parsing and mapping service module sends a crawl request to the crawl server;
Step 5: the crawl server executes the crawl service and returns the crawl result;
Step 6: the parsing and mapping service module processes the crawl result;
Step 7: the parsing and mapping service returns the crawl result to the request distribution system;
Step 8: the request distribution system calls back the business system with the crawl result.
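The eight steps can be sketched as a chain of plain function calls. Everything here is a simplification made under stated assumptions: the function and class names are hypothetical, the crawl server is a stub that touches no network, and the flow is synchronous, whereas the response in step 2 would presumably be returned before the crawl finishes in a real deployment.

```python
def crawl_server(request):
    """Step 5: execute the crawl and return a raw result (stubbed, no network)."""
    return {"site": request["site"], "rows": [["Type of insurance"], ["Pension"]]}


def parse_mapping_service(request):
    """Steps 4 to 7: forward the request, process the result, hand it back."""
    raw_result = crawl_server(request)                 # steps 4 and 5
    header, *body = raw_result["rows"]                 # step 6: process the result
    records = [dict(zip(header, row)) for row in body]
    return {"site": raw_result["site"], "records": records}  # step 7


class RequestDispatcher:
    """Stands in for the request distribution system."""

    def submit(self, request, callback):
        acknowledgement = {"accepted": True, "site": request["site"]}  # step 2
        result = parse_mapping_service(request)                          # step 3
        callback(result)                                                 # step 8
        return acknowledgement


# Step 1: the business system submits a crawl request with a callback.
received = []
dispatcher = RequestDispatcher()
ack = dispatcher.submit(
    {"site": "example.com", "template": "personal-info"}, received.append
)
print(ack, received)
```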
Fig. 7 is a schematic diagram of the intelligent crawler identifying a login interface according to another embodiment of the application. As shown in Fig. 7, the intelligent crawler detects the user ID input box, the password box for entering the password, the login button, and so on in the login interface.
Fig. 8 is a schematic diagram of the intelligent crawler identifying web page code according to another embodiment of the application. As shown in Fig. 8, a traditional crawler can only specify id=userid or name=userid for the user name input box, id=pwd or name=pwd for the password box, and id=Submit or name=Submit for the login button. If the website is revised even slightly, for example if the id of the password box is changed to password or its name to name=password, the whole program stops working. The intelligent crawler, by contrast, finds the element with type=password on the page according to a rule. Under the HTML specification, an input element whose type is password is generally a password box. According to investigation, the accuracy of this method is above 99%, and whatever the id or name is changed to has no effect on the crawling behavior. Rule for the user name input box: statistics show that in most cases the user name input box is near the password box and above it. Based on this characteristic, plus the fact that the input type of the user name input box is not hidden, plus a check of the value, name, or placeholder, the accuracy of discovering the user name input box is about 96%. Discovery of the login button and of the picture verification code input box is similar to discovery of the user name input box.
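The contrast between the two approaches can be made concrete with a small sketch built on Python's standard `html.parser`. The helper names and the two page snippets are hypothetical; the ids `userid`, `pwd`, and `password` follow the example in the description.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Collects the attributes of every <input> element in document order."""

    def __init__(self):
        super().__init__()
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            self.inputs.append({name: (value or "") for name, value in attrs})


def collect_inputs(page_source):
    collector = InputCollector()
    collector.feed(page_source)
    return collector.inputs


def find_by_id(page_source, element_id):
    """Traditional approach: depends on a hard-coded id."""
    return next(
        (f for f in collect_inputs(page_source) if f.get("id") == element_id),
        None,
    )


def find_password_box(page_source):
    """Rule-based approach: depends only on type=password."""
    return next(
        (f for f in collect_inputs(page_source)
         if f.get("type", "").lower() == "password"),
        None,
    )


# The same login form before and after a small revision of the password id.
OLD_PAGE = '<input id="userid" type="text"><input id="pwd" type="password">'
NEW_PAGE = '<input id="userid" type="text"><input id="password" type="password">'

print(find_by_id(OLD_PAGE, "pwd") is not None)   # True
print(find_by_id(NEW_PAGE, "pwd") is not None)   # False: the id lookup breaks
print(find_password_box(NEW_PAGE)["id"])         # password
```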
Picture verification code recognition in the intelligent crawler: for simple picture verification codes, traditional optical character recognition (OCR) based on Tesseract is used; for medium or complex verification codes, a recognition model trained with convolution on TensorFlow is used. The intelligent crawler has been trained on a large number of picture verification codes. Tesseract and TensorFlow serve as primary and backup for each other and jointly provide the verification code recognition service, so that logging in does not require entering the verification code manually. This reduces the number of interactions between the crawling program and the user and improves the efficiency of the whole crawl.
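The primary/backup arrangement can be sketched as a fallback chain over recognizer functions. This is only an illustration under assumptions: the two recognizers below are stand-in stubs rather than real Tesseract or TensorFlow calls, and the plausibility check (a fixed length of alphanumeric characters) and all names are hypothetical.

```python
def recognize_with_fallback(image_bytes, recognizers, expected_length=4):
    """Try each recognizer in order and return the first plausible answer.

    `recognizers` is a list of (name, function) pairs; each function takes
    the image bytes and returns the recognized text.
    """
    for name, recognize in recognizers:
        try:
            text = recognize(image_bytes).strip()
        except Exception:
            # A failing engine should not abort the login; try the next one.
            continue
        if len(text) == expected_length and text.isalnum():
            return name, text
    return None, None


def fake_ocr(image_bytes):
    """Stand-in for an OCR engine that misreads a noisy code."""
    return "7K?"


def fake_cnn(image_bytes):
    """Stand-in for a trained convolutional model."""
    return "7K2P"


engine, code = recognize_with_fallback(
    b"...", [("ocr", fake_ocr), ("cnn", fake_cnn)]
)
print(engine, code)
```

Because the order of the list decides which engine is tried first, either engine can act as the primary and the other as the backup.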
Page crawling in the intelligent crawler is template-driven: the template determines which pages need to be downloaded and how to reach them. For example, a template may be defined to find the personal information or details page, and the page obtained is the target page. A template may also be defined to find the detail records page, for example: search the loaded page for "start time" and "end time", then find the "query" link, and on the target page go from the first through the N-th; the pages obtained are the target pages.
Intelligent page crawling is based entirely on the browser and the template. When a template is made for a page, it specifies which hyperlinks, buttons, and menus the program should click, and whether the required data is in a table or in some other non-table place; for a table, some distinctive features of the table must be specified. The program clicks the hyperlinks or buttons specified in the template in turn, finds the table by its specified features, and organizes the selected data into a JSON structure keyed by the table header for downstream processing. Fig. 9 is a schematic diagram of the JSON structure according to another embodiment of the application. As shown in Fig. 9, for a template path such as "account management/personal information/", the program first clicks "account management", then clicks "personal information" once the page has finished loading, and then extracts the data of that page.
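Organizing table data into a JSON structure keyed by the table header can be sketched as follows. The sample rows and the function name `table_to_records` are hypothetical, and the rows are assumed to have already been extracted from the rendered page.

```python
import json


def table_to_records(rows):
    """Turn [header, row1, row2, ...] into a list of dicts keyed by header."""
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical table content; the first row is the table header.
rows = [
    ["Type of insurance", "Account type"],
    ["Pension", "Personal"],
    ["Medical", "Personal"],
]

records = table_to_records(rows)
print(json.dumps(records, ensure_ascii=False, indent=2))
```

Keying each value by its header rather than by its column index is what later allows the data to be mapped without regard to column order.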
A traditional crawler takes the first column of the table and puts it into the insurance type field insuranceType; if the insurance type is moved to the second column, the program produces wrong data. The figure above shows the table data extracted from the page according to the template; the table headers are, in order, type of insurance, account type, traffic flag, and so on. The mapping template can specify that the "type of insurance" field is mapped to the insuranceType field, so the intelligent crawler is completely insensitive to column reordering and always obtains correct data.
The page mapping of the intelligent crawler is similar to converting JSON data into objects. Unlike general-purpose serialization tools, the data mapping module can map the extracted data to the object instances the business needs according to the definitions in the mapping template, for subsequent processing by the business logic.
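A minimal sketch of header-based mapping shows why column order stops mattering. The field name `insuranceType` comes from the description; the class `InsuranceRecord`, the field `accountType`, the dictionary `MAPPING_TEMPLATE`, and the function `map_row` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class InsuranceRecord:
    insuranceType: str  # field name taken from the description
    accountType: str    # hypothetical field name


# Hypothetical mapping template: table header -> target object field.
MAPPING_TEMPLATE = {
    "Type of insurance": "insuranceType",
    "Account type": "accountType",
}


def map_row(header, row, mapping=MAPPING_TEMPLATE):
    """Build a business object from one table row using header names."""
    by_header = dict(zip(header, row))
    values = {field: by_header[column] for column, field in mapping.items()}
    return InsuranceRecord(**values)


# The same data with the two columns in either order maps to equal objects.
original = map_row(["Type of insurance", "Account type"], ["Pension", "Personal"])
reordered = map_row(["Account type", "Type of insurance"], ["Personal", "Pension"])
print(original == reordered)  # True
```

A crawler that reads "the first column" would put "Personal" into `insuranceType` for the reordered table; looking values up by header avoids that.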
Generally speaking, the intelligent crawler frees developers from heavy development work: they no longer need to attend to the HTML details of the pages to be crawled and only need to design rule templates according to the content seen on the page. This improves development efficiency, lowers the probability of errors, and copes with minor website revisions, thereby reducing the impact on the business.
With the above scheme, the following technical effects are achieved: the elements needed for login are discovered and filled in automatically; page data is extracted based on rules, and the definition of the rules determines the robustness of the extraction; crawl results are mapped directly to any target data structure; login is recognized intelligently. The recognition methods currently supported rely on descriptive text labels, input box background text, HTML element styles, and the like, and can be extended to a wider range of functional recognition. A template can be written by hand or generated by a dedicated tool, and no restriction is placed on the template syntax.
From the description of the embodiments above, those skilled in the art can clearly understand that the method of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or of course by hardware, though in many cases the former is the better implementation. On this understanding, the technical solution of the application, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes instructions that cause a terminal device (which may be a mobile phone, computer, server, network device, or the like) to execute the methods described in the embodiments of the application.
Embodiment two
This embodiment also provides a device for acquiring site information. The device is used to implement the above embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the device described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
According to another embodiment of the application, a device for acquiring site information is also provided, comprising:
An identification module, configured to identify the login interface of a website according to preset rules and log in to the website;
An obtaining module, configured to obtain, from the site information of the website, first site information corresponding to the first information indicated in a preset template.
Through the above steps, the login interface of a website is identified according to preset rules, the account, password, and the like are entered at the corresponding positions, and the website is logged in to; first site information related to the first information indicated in the preset template is obtained in the website, extracted, and output. This scheme solves the problem in the related art that a crawler cannot be applied to most web pages, which leads to high maintenance cost. The preset rules are applicable to logging in to most websites, and the preset template specifies the features of the information to be obtained. This widens the crawler's scope of application, reduces its sensitivity to differences between web pages, removes the need to substantially modify program code for each web page, and reduces maintenance cost.
Optionally, the identification module identifying the login interface of the website according to preset rules includes identifying the password box in the login interface in at least one of the following ways: obtaining the page code of the login interface, querying the code for an element with type=password, and identifying that element as the password box; or querying for a first input box that converts the input information into masked form, and identifying that first input box as the password box.
Optionally, the identification module is further configured to, after identifying the password box in the login interface, identify second input boxes within a preset range around the password box, and identify a third input box among the second input boxes that meets at least one of the following conditions as the account box: a second input box whose input type is not hidden is identified as the third input box; a second input box whose value, name, or placeholder matches a preset field is identified as the third input box.
It should be noted that each of the above modules can be implemented by software or hardware. For the latter, this can be done in the following ways, among others: the above modules are all located in the same processor; or the above modules are located in different processors in any combination.
Embodiment three
The embodiments of the application also provide a storage medium. Optionally, in this embodiment, the storage medium may be configured to store program code for executing the following steps:
S1: identify the login interface of a website according to preset rules, and log in to the website;
S2: obtain, from the site information of the website, first site information corresponding to the first information indicated in a preset template.
Optionally, in this embodiment, the storage medium may include, but is not limited to, various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
The embodiments of the application also provide an electronic device, comprising a memory and a processor. A computer program is stored in the memory, and the processor is configured to run the computer program so as to perform the steps in any of the above method embodiments.
Optionally, the electronic device may also include a transmission device and an input/output device, wherein the transmission device is connected to the processor and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by means of the computer program:
S1: identify the login interface of a website according to preset rules, and log in to the website;
S2: obtain, from the site information of the website, first site information corresponding to the first information indicated in a preset template.
Optionally, for specific examples in this embodiment, refer to the examples described in the above embodiments and optional implementations; they are not repeated here.
Obviously, those skilled in the art should understand that the modules or steps of this application described above can be implemented with a general-purpose computing device. They may be concentrated on a single computing device or distributed across a network formed by multiple computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from the one given here. Alternatively, they may each be fabricated as a separate integrated circuit module, or several of the modules or steps may be fabricated as a single integrated circuit module. Thus, this application is not limited to any specific combination of hardware and software.
The above are merely preferred embodiments of this application and are not intended to limit it. For those skilled in the art, various modifications and variations of this application are possible. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of this application shall fall within its scope of protection.

Claims (10)

1. A method for acquiring site information, characterized by comprising:
identifying the login interface of a website according to preset rules, and logging in to the website;
obtaining, from the site information of the website, first site information corresponding to first information indicated in a preset template.
2. The method according to claim 1, characterized in that identifying the login interface of the website according to preset rules comprises identifying a password box in the login interface in at least one of the following ways:
obtaining the page code of the login interface, querying the code for an element with type=password, and identifying that element as the password box;
querying for a first input box that converts input information into masked form, and identifying the first input box as the password box.
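The first identification route in claim 2 can be sketched as follows. This is a minimal illustration, assuming the page code is available as an HTML string and using Python's standard `html.parser`; the names `PasswordBoxFinder` and `find_password_boxes` and the sample page are hypothetical and do not come from the application. The second route (detecting an input that masks what is typed) would require a rendered page and is not shown.

```python
from html.parser import HTMLParser


class PasswordBoxFinder(HTMLParser):
    """Collects the attributes of every <input type="password"> element."""

    def __init__(self):
        super().__init__()
        self.password_boxes = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if tag == "input" and attr_map.get("type", "").lower() == "password":
            self.password_boxes.append(attr_map)


def find_password_boxes(page_code):
    """Return the attribute dicts of all password inputs in the page code."""
    finder = PasswordBoxFinder()
    finder.feed(page_code)
    return finder.password_boxes


# Hypothetical login page, used only for illustration.
SAMPLE_LOGIN_PAGE = """
<form action="/login" method="post">
  <input type="hidden" name="csrf" value="abc">
  <input type="text" name="username" placeholder="Account">
  <input type="password" name="pwd">
  <input type="submit" value="Log in">
</form>
"""

boxes = find_password_boxes(SAMPLE_LOGIN_PAGE)
print([box.get("name") for box in boxes])
```

On the sample page the only element that qualifies is the input named `pwd`; the hidden, text, and submit inputs are ignored because their `type` is not `password`.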
3. The method according to claim 2, characterized in that, after the password box in the login interface is identified, second input boxes within a preset range around the password box are identified, and a third input box among the second input boxes that satisfies at least one of the following conditions is identified as the account box:
a second input box whose input type is not hidden is identified as the third input box;
a second input box whose value, name, or placeholder matches a preset field is identified as the third input box.
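The account-box rule of claim 3 can be sketched in the same way. The application does not define the "preset range" or list the preset fields, so both are assumptions here: the range is taken to mean a maximum number of input elements away from the password box in document order, and the preset fields are a small hypothetical keyword list. A candidate must satisfy at least one of the two conditions; as a further assumption, candidates that satisfy both are preferred, then nearer ones.

```python
from html.parser import HTMLParser

# Assumed preset fields; the source does not list concrete values.
PRESET_FIELDS = ("user", "account", "login", "email")
# Assumed meaning of "preset range": at most this many input elements away
# from the password box, counted in document order.
PRESET_RANGE = 2


class InputCollector(HTMLParser):
    """Collects the attributes of every <input> element in document order."""

    def __init__(self):
        super().__init__()
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            self.inputs.append({name: (value or "") for name, value in attrs})


def find_account_box(page_code):
    """Return the attributes of the likely account box, or None."""
    collector = InputCollector()
    collector.feed(page_code)
    inputs = collector.inputs
    pwd_positions = [i for i, box in enumerate(inputs)
                     if box.get("type", "").lower() == "password"]
    if not pwd_positions:
        return None
    pwd_index = pwd_positions[0]

    best, best_key = None, None
    for i, box in enumerate(inputs):
        distance = abs(i - pwd_index)
        if distance == 0 or distance > PRESET_RANGE:
            continue
        # Condition 1: the input type is not "hidden".
        not_hidden = box.get("type", "text").lower() != "hidden"
        # Condition 2: value, name, or placeholder matches a preset field.
        text = " ".join(box.get(k, "")
                        for k in ("value", "name", "placeholder")).lower()
        matches_field = any(field in text for field in PRESET_FIELDS)
        score = int(not_hidden) + int(matches_field)
        if score == 0:
            continue
        key = (-score, distance)  # more conditions met first, then nearer
        if best_key is None or key < best_key:
            best, best_key = box, key
    return best


# Hypothetical login page, used only for illustration.
SAMPLE_LOGIN_PAGE = """
<form action="/login" method="post">
  <input type="hidden" name="csrf" value="abc">
  <input type="text" name="username" placeholder="Account">
  <input type="password" name="pwd">
  <input type="submit" value="Log in">
</form>
"""

account_box = find_account_box(SAMPLE_LOGIN_PAGE)
```

In this sample the submit button also satisfies the non-hidden condition, which is why the sketch ranks candidates instead of taking the first match; a real rule set would likely exclude button-like types outright.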
4. The method according to claim 3, characterized in that logging in to the website comprises:
entering account information in the password box and the account box;
and identifying a picture verification code using a first model and completing the verification, wherein the first model is trained by machine learning on multiple groups of data, each group of data comprising a picture verification code and the characters in that picture verification code.
5. The method according to claim 1, characterized in that obtaining the first site information corresponding to the first information indicated in the preset template comprises at least one of the following:
obtaining the site information in a hyperlink indicated in the first information;
obtaining the site information at a key field indicated in the first information;
obtaining the site information in a table indicated in the first information;
obtaining the site information at a page location indicated in the first information;
obtaining the site information within a time period indicated in the first information.
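Three of the options in claim 5 (key field, table, and time period) can be illustrated together in a short sketch. The template format is not specified in this section, so the `PresetTemplate` class, its fields, the function name, and the sample table below are all hypothetical; the sketch also assumes that dates appear in one table column in ISO form (YYYY-MM-DD). The claim only requires at least one option, so combining three is an illustrative choice.

```python
from dataclasses import dataclass
from datetime import date
from html.parser import HTMLParser


@dataclass
class PresetTemplate:
    """Hypothetical template describing which table rows to keep."""
    keyword: str          # key field that a row must contain
    start: date           # start of the time period (inclusive)
    end: date             # end of the time period (inclusive)
    date_column: int = 0  # assumed column holding an ISO date


class TableRowCollector(HTMLParser):
    """Collects the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row, self._cell = [], None
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows made only of <th> stay empty
                self.rows.append(self._row)
            self._row, self._cell = None, None


def extract_first_site_information(page_code, template):
    """Return table rows that contain the keyword and fall in the period."""
    collector = TableRowCollector()
    collector.feed(page_code)
    selected = []
    for row in collector.rows:
        if len(row) <= template.date_column:
            continue
        try:
            row_date = date.fromisoformat(row[template.date_column])
        except ValueError:
            continue  # skip rows without a parseable date
        in_period = template.start <= row_date <= template.end
        has_keyword = any(template.keyword in cell for cell in row)
        if in_period and has_keyword:
            selected.append(row)
    return selected


# Hypothetical page content, used only for illustration.
SAMPLE_PAGE = """
<table>
  <tr><th>Date</th><th>Item</th><th>Amount</th></tr>
  <tr><td>2018-09-30</td><td>Invoice A</td><td>100</td></tr>
  <tr><td>2018-10-15</td><td>Invoice B</td><td>250</td></tr>
  <tr><td>2018-10-20</td><td>Receipt C</td><td>75</td></tr>
</table>
"""

template = PresetTemplate("Invoice", date(2018, 10, 1), date(2018, 10, 31))
result = extract_first_site_information(SAMPLE_PAGE, template)
```

Keeping the selection criteria in a data object rather than in the parsing code mirrors the stated aim of the application: adapting to a different page would mean editing the template, not the program.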
6. A device for acquiring site information, characterized by comprising:
an identification module, configured to identify the login interface of a website according to preset rules and to log in to the website;
an acquisition module, configured to obtain, from the site information of the website, first site information corresponding to first information indicated in a preset template.
7. The device according to claim 6, characterized in that the identification module identifying the login interface of the website according to preset rules comprises identifying a password box in the login interface in at least one of the following ways:
obtaining the page code of the login interface, querying the code for an element with type=password, and identifying that element as the password box;
querying for a first input box that converts input information into masked form, and identifying the first input box as the password box.
8. The device according to claim 7, characterized in that, after identifying the password box in the login interface, the identification module is further configured to identify second input boxes within a preset range around the password box, and to identify a third input box among the second input boxes that satisfies at least one of the following conditions as the account box:
a second input box whose input type is not hidden is identified as the third input box;
a second input box whose value, name, or placeholder matches a preset field is identified as the third input box.
9. A storage medium, characterized in that a computer program is stored in the storage medium, wherein the computer program is configured to perform, when run, the method according to any one of claims 1 to 5.
10. An electronic device comprising a memory and a processor, characterized in that a computer program is stored in the memory, and the processor is configured to run the computer program to perform the method according to any one of claims 1 to 5.
CN201811279690.3A 2018-10-30 2018-10-30 The acquisition methods and device of site information Pending CN109460522A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811279690.3A CN109460522A (en) 2018-10-30 2018-10-30 The acquisition methods and device of site information

Publications (1)

Publication Number Publication Date
CN109460522A true CN109460522A (en) 2019-03-12

Family

ID=65608879

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811279690.3A Pending CN109460522A (en) 2018-10-30 2018-10-30 The acquisition methods and device of site information

Country Status (1)

Country Link
CN (1) CN109460522A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112541104A (en) * 2019-09-20 2021-03-23 浙江大搜车软件技术有限公司 Data capturing method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070250514A1 (en) * 2006-04-25 2007-10-25 Saeed Rajput Browsing and monitoring the web through learning and ingemination
CN103034711A (en) * 2012-12-10 2013-04-10 北京金山安全软件有限公司 Form recognition method and device
CN107229669A (en) * 2016-03-23 2017-10-03 塔塔咨询服务公司 Method and system for selecting the sample set on assessing website Barrien-free
CN108268635A (en) * 2018-01-17 2018-07-10 百度在线网络技术(北京)有限公司 For obtaining the method and apparatus of data
CN108334585A (en) * 2018-01-29 2018-07-27 湖北省楚天云有限公司 A kind of spiders method, apparatus and electronic equipment

Similar Documents

Publication Publication Date Title
CN104766014B (en) For detecting the method and system of malice network address
CN103605738B (en) Web page access data statistical method and device
US20160140626A1 (en) Web page advertisement configuration and optimization with visual editor and automatic website and webpage analysis
CN103888490B (en) A kind of man-machine knowledge method for distinguishing of full automatic WEB client side
CN109376291B (en) Website fingerprint information scanning method and device based on web crawler
CN107958016A (en) Function pages method for customizing and application server
CN111311136A (en) Wind control decision method, computer equipment and storage medium
US20210125131A1 (en) Electronic device, method for constructing scoring model of retail outlets, system, and computer readable medium
CN107609150A (en) A kind of interactive network reptile creation method chosen based on page elements and system
CN106503111B (en) Webpage code-transferring method, device and client terminal
CN110083752A (en) Information of real estate recommended method, device, equipment and storage medium
CN108763274A (en) Recognition methods, device, electronic equipment and the storage medium of access request
CN106878108A (en) Network flow playback method of testing and device
CN105718533A (en) Information pushing method and device
CN107340954A (en) A kind of information extracting method and device
CN109729044A (en) A kind of general internet data acquisition is counter to climb system and method
CN110083755A (en) A kind of high emulation parsing web-page approach, device and electronic equipment
CN111784301A (en) User portrait construction method and device, storage medium and electronic equipment
CN110134844A (en) Subdivision field public sentiment monitoring method, device, computer equipment and storage medium
CN104023025A (en) Website security vulnerability detection method and device based on service rules
CN104462242B (en) Webpage capacity of returns statistical method and device
CN113392306B (en) Information interaction method, information interaction device, terminal and storage medium
CN103336693B (en) The creation method of refer chain, device and security detection equipment
CN106294406A (en) A kind of method and apparatus accessing data for processing application
CN114398138A (en) Interface generation method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 2019-03-12)