CN109460522A - Method and device for acquiring website information
Publication number: CN109460522A
Application number: CN201811279690.3A
Authority: CN (China)
Legal status: Pending (as listed by Google Patents; the status is an assumption and not a legal conclusion)
Abstract
This application provides a method and device for acquiring website information. The method comprises: identifying the login page of a website according to preset rules; entering the account, password, and so on at the corresponding positions and logging in to the website; obtaining, from the website, first website information related to the first information indicated in a preset template; and extracting and outputting the first website information. This scheme addresses the problem in the related art that a crawler is not applicable to most web pages, which leads to high maintenance costs. The preset rules are applicable to logging in to most websites, and the preset template specifies the features of the information to be obtained. This widens the crawler's scope of application, reduces its sensitivity to differences between web pages, removes the need to substantially modify program code for each web page, and reduces maintenance costs.
Description
Technical field
This application relates to, but is not limited to, the Internet field, and in particular to a method and device for acquiring website information.
Background
In the related art, a crawler must handle the user login logic, the image verification code (CAPTCHA), and the URLs to be crawled. Crawling each website usually requires developing a dedicated set of program code. Such code depends heavily on how the target website is implemented and is not robust: parsing the crawled results imposes strict requirements and demands a deep understanding of the target website's implementation details. Maintenance costs are therefore relatively high and development is slow, so the business need for more and faster crawling cannot be met. The crawler is also sensitive to the target pages and cannot adapt automatically to most page-level revisions.
No effective solution has yet been proposed for the problem in the related art that a crawler is not applicable to most web pages, which leads to high maintenance costs.
Summary of the invention
The embodiments of this application provide a method and device for acquiring website information, so as to at least solve the problem in the related art that a crawler is not applicable to most web pages, which leads to high maintenance costs.
According to one embodiment of this application, a method for acquiring website information is provided, comprising: identifying the login page of a website according to preset rules, and logging in to the website; and obtaining, from the website information of the website, first website information corresponding to the first information indicated in a preset template.
According to another embodiment of this application, a device for acquiring website information is also provided, comprising: an identification module, configured to identify the login page of a website according to preset rules and to log in to the website; and an acquisition module, configured to obtain, from the website information of the website, first website information corresponding to the first information indicated in a preset template.
According to another embodiment of this application, a storage medium is also provided. A computer program is stored in the storage medium, and the computer program is configured to perform, when run, the steps of any of the above method embodiments.
According to another embodiment of this application, an electronic device is also provided, comprising a memory and a processor. A computer program is stored in the memory, and the processor is configured to run the computer program to perform the steps of any of the above method embodiments.
With this application, the login page of a website is identified according to preset rules, the account, password, and so on are entered at the corresponding positions, and the website is logged in to; first website information related to the first information indicated in a preset template is obtained from the website, extracted, and output. This scheme addresses the problem in the related art that a crawler is not applicable to most web pages, which leads to high maintenance costs. The preset rules are applicable to logging in to most websites, and the preset template specifies the features of the information to be obtained. This widens the crawler's scope of application, reduces its sensitivity to differences between web pages, removes the need to substantially modify program code for each web page, and reduces maintenance costs.
Brief description of the drawings
The drawings described here provide a further understanding of this application and form part of it. The illustrative embodiments of this application and their descriptions are used to explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a hardware block diagram of a mobile terminal for a method of acquiring website information according to an embodiment of this application;
Fig. 2 is a flowchart of the method for acquiring website information according to an embodiment of this application;
Fig. 3 is a schematic diagram of the interface of a traditional crawler in the related art;
Fig. 4 is a detailed schematic diagram of a traditional crawler in the related art collecting website information;
Fig. 5 is an architecture diagram of the intelligent crawler according to another embodiment of this application;
Fig. 6 is a flowchart of the method by which the intelligent crawler obtains web page information according to another embodiment of this application;
Fig. 7 is a schematic diagram of the intelligent crawler identifying a login page according to another embodiment of this application;
Fig. 8 is a schematic diagram of the intelligent crawler identifying web page code according to another embodiment of this application;
Fig. 9 is a schematic diagram of the JSON structure according to another embodiment of this application.
Detailed description of the embodiments
The application is described in detail below with reference to the drawings and in conjunction with the embodiments. It should be noted that, where no conflict arises, the embodiments of this application and the features in the embodiments may be combined with one another.
It should be noted that the terms "first", "second", and so on in the description, the claims, and the above drawings of this application are used to distinguish similar objects and are not used to describe a particular order or sequence.
The technical solution in this application document can be applied to a terminal. It is a crawler scheme, and can be implemented by a program or a script.
A crawler is a program or script that automatically fetches web information according to certain rules.
Embodiment one
The method embodiment provided in Embodiment one of this application may be executed in a mobile terminal, a computer terminal, or a similar computing device. Taking execution on a mobile terminal as an example, Fig. 1 is a hardware block diagram of a mobile terminal for a method of acquiring website information according to an embodiment of this application. As shown in Fig. 1, the mobile terminal may include one or more processors 102 (only one is shown in Fig. 1; the processor 102 may include, but is not limited to, a processing device such as a microcontroller unit (MCU) or a field-programmable gate array (FPGA)) and a memory 104 for storing data. Optionally, the mobile terminal may also include a transmission device 106 for communication functions and an input/output device 108. A person of ordinary skill in the art will understand that the structure shown in Fig. 1 is only illustrative and does not limit the structure of the mobile terminal. For example, the mobile terminal may include more or fewer components than shown in Fig. 1, or have a configuration different from that shown in Fig. 1.
The memory 104 can be used to store the software programs and modules of application software, such as the program instructions/modules corresponding to the method for acquiring website information in the embodiments of this application. By running the software programs and modules stored in the memory 104, the processor 102 performs various functional applications and data processing, that is, implements the above method. The memory 104 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, and such remote memory may be connected to the mobile terminal through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The transmission device 106 is used to receive or send data via a network. A specific example of such a network is the wireless network provided by the mobile terminal's communication provider. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC), which can connect to other network devices through a base station and thereby communicate with the Internet. In another example, the transmission device 106 may be a radio frequency (RF) module, which communicates with the Internet wirelessly.
This embodiment provides a method for acquiring website information that runs on the above terminal. Fig. 2 is a flowchart of the method for acquiring website information according to an embodiment of this application. As shown in Fig. 2, the flow includes the following steps:
Step S202: identify the login page of a website according to preset rules, and log in to the website.
The above scheme can be implemented by a program or script, also called a crawler. Identifying the login page of the website may include identifying the position of the account box, the position of the password box, and so on; that is, the program automatically works out the role of each input box in the login page.
Step S204: obtain, from the website information of the website, first website information corresponding to the first information indicated in a preset template.
After passing the website's security verification and logging in, the crawler clicks links step by step according to the operations or information specified in the preset template, and can obtain the page information behind the specified links.
Through the above steps, the login page of a website is identified according to preset rules, the account, password, and so on are entered at the corresponding positions, and the website is logged in to; first website information related to the first information indicated in the preset template is obtained from the website, extracted, and output. This scheme addresses the problem in the related art that a crawler is not applicable to most web pages, which leads to high maintenance costs. The preset rules are applicable to logging in to most websites, and the preset template specifies the features of the information to be obtained. This widens the crawler's scope of application, reduces its sensitivity to differences between web pages, removes the need to substantially modify program code for each web page, and reduces maintenance costs.
Optionally, identifying the login page of the website according to preset rules includes identifying the password box in the login page in at least one of the following ways: obtaining the page code of the login page, querying the code for an element with type=password, and identifying that element as the password box; or querying for a first input box that converts input information into masked form, and identifying the first input box as the password box. With this scheme, the position of the password box can be identified accurately.
Optionally, after the password box in the login page is identified, second input boxes are identified within a preset range around the password box, and a third input box among them that satisfies at least one of the following conditions is identified as the account box: the input type of the second input box is not hidden; or the value, name, or placeholder of the second input box matches a preset field.
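The two identification rules above can be illustrated with a minimal Python sketch built on the standard library's html.parser. It is only an illustration under stated assumptions, not the applicant's implementation: the keyword list ACCOUNT_HINTS stands in for the "preset field", the window of three preceding inputs stands in for the "preset range", and the sketch applies both account box conditions together, which is stricter than "at least one".

```python
from html.parser import HTMLParser

# Hypothetical keyword list standing in for the patent's "preset field".
ACCOUNT_HINTS = ("user", "account", "login", "email", "phone")


class InputCollector(HTMLParser):
    """Collects the attributes of every <input> element in document order."""

    def __init__(self):
        super().__init__()
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            # Attributes without a value arrive as None; normalise to "".
            self.inputs.append({k: (v or "") for k, v in attrs})


def find_login_boxes(html, window=3):
    """Return (account_box, password_box) attribute dicts; None where not found."""
    collector = InputCollector()
    collector.feed(html)
    inputs = collector.inputs

    # Rule 1: the password box is the first input with type=password.
    pwd_index = next(
        (i for i, el in enumerate(inputs)
         if el.get("type", "").lower() == "password"),
        None,
    )
    if pwd_index is None:
        return None, None

    # Rule 2: scan the inputs just before the password box, nearest first
    # (an assumed reading of "a preset range around the password box").
    account = None
    for el in reversed(inputs[max(0, pwd_index - window):pwd_index]):
        if el.get("type", "text").lower() == "hidden":
            continue  # hidden inputs are not treated as the account box
        text = " ".join(
            el.get(k, "") for k in ("value", "name", "placeholder")
        ).lower()
        if any(hint in text for hint in ACCOUNT_HINTS):
            account = el
            break
    return account, inputs[pwd_index]
```

Because the lookup keys on type=password and on nearby hints rather than on a fixed id or name, renaming the password box's id does not change the result.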
Optionally, logging in to the website comprises: entering the account information in the password box and the account box; and identifying the image verification code using a first model and completing the verification. The first model is trained by machine learning on multiple groups of data, and each group of data includes an image verification code and the characters in that image verification code. With machine learning, the content of an image verification code can be identified accurately, which raises the success rate of logging in to the website. In the identification step, the crawler may download the image verification code and hand it to a background processor, which recognizes it with the machine learning model.
Optionally, obtaining the first website information corresponding to the first information indicated in the preset template includes at least one of the following:
Obtaining the website information behind a hyperlink indicated in the first information; the crawler can automatically click the corresponding link in the website interface.
Obtaining the website information at a key field indicated in the first information; the key field is identified in the page of the website.
Obtaining the website information in a table indicated in the first information; for example, the specified hyperlink may contain a table, and part of the information in that table can be obtained.
Obtaining the website information at a page position indicated in the first information; for example, if the website page is divided into three columns, the content of the first column is obtained.
Obtaining the website information within a time period indicated in the first information; for example, if the website stores the online shopping records of the residents of a certain county over the past year, this scheme can obtain only the records for the past month according to the time period.
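The five kinds of indication listed above could be carried in a single template object. The sketch below is a hypothetical shape for such a template, with the time period case worked through; the class name CrawlTemplate, its field names, and the record layout are all assumptions made for illustration, since the application does not fix a template syntax.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class CrawlTemplate:
    """Hypothetical container for the 'first information' of a preset template."""
    link_path: list = field(default_factory=list)      # hyperlinks to click, in order
    key_fields: list = field(default_factory=list)     # key fields to look for on the page
    table_headers: list = field(default_factory=list)  # features that identify the table
    page_region: Optional[str] = None                  # e.g. the first of three columns
    period: Optional[tuple] = None                     # (start_date, end_date), inclusive


def filter_by_period(records, template, date_key="date"):
    """Keep only the records whose date falls inside the template's period."""
    if template.period is None:
        return list(records)
    start, end = template.period
    return [r for r in records if start <= r[date_key] <= end]
```

A template that sets only `period` narrows a year of records down to the requested month, while an empty template leaves the records untouched.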
Another embodiment of this application is described below.
In what follows, "intelligent crawler" refers to the scheme provided by this other embodiment of the application, and "traditional crawler" refers to the crawler scheme in the related art.
Fig. 3 is a schematic diagram of the interface of a traditional crawler in the related art. As shown in Fig. 3, the traditional crawler software lets the user select the website to be collected, such as Taobao or No. 1 Shop.
Fig. 4 is a detailed schematic diagram of a traditional crawler in the related art collecting website information. As shown in Fig. 4, traditional crawler technology is used here to collect some data from Taobao.
The intelligent crawler in this other embodiment focuses mainly on observing and simulating user behavior. It relies on the full behavior of the browser and is concerned only with the finally rendered page.
The intelligent crawler implements an automatic login service for target websites, an automatic image verification code recognition service, a template-based page crawling service, and a template-based data extraction service.
Automatic login in the intelligent crawler simulates human behavior: it searches the login page for the input boxes that require a user name, password, verification code, and so on, fills them in automatically, and then, again simulating a person, searches for a clickable "login" button. A traditional crawler, by comparison, must be told the positions of the user name input box, the password input box, the verification code input box, and the login button. The intelligent crawler searches the page for these elements automatically according to predetermined rules, then identifies and fills them in automatically.
Fig. 5 is an architecture diagram of the intelligent crawler according to another embodiment of this application. As shown in Fig. 5, the architecture includes a template operation module, configured through a terminal, which can add, modify, delete, and display templates. A crawl interface module can then validate and parse templates, and crawl requests are initiated through the terminal. The architecture also includes a persistence platform module, an operating system, and computer network infrastructure; the persistence platform can store the above content in a page parsing storage module.
Fig. 6 is a flowchart of the method by which the intelligent crawler obtains web page information according to another embodiment of this application. As shown in Fig. 6, the method comprises the following steps:
Step 1: the user initiates a crawl request to the request distribution system through the business system.
Step 2: the request distribution system returns response information to the business system.
Step 3: the request distribution system calls the parsing and mapping service module of the intelligent crawler system.
Step 4: the parsing and mapping service module sends the crawl request to the crawl server.
Step 5: the crawl server executes the crawl service and returns the crawl result.
Step 6: the parsing and mapping service module processes the crawl result.
Step 7: the parsing and mapping service returns the crawl result to the request distribution system.
Step 8: the request distribution system calls back the business system with the crawl result.
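The eight steps describe an acknowledge-first, call-back-later interaction. The following Python sketch models that ordering in a single process, as a rough illustration only: the class RequestDispatcher, the functions crawl_server and parse_mapping_service, and the stub HTML are all hypothetical names and data, and a real deployment would place these roles on separate services.

```python
from collections import deque


def crawl_server(request):
    """Step 5: stand-in crawl server; a real one would drive a browser."""
    return {"url": request["url"], "html": "<html>stub</html>"}


def parse_mapping_service(request):
    """Steps 4 to 7: forward the request, then process the crawl result."""
    raw = crawl_server(request)                           # steps 4 and 5
    return {"url": raw["url"], "size": len(raw["html"])}  # step 6, returned in step 7


class RequestDispatcher:
    """Request distribution system: responds at once, calls back later."""

    def __init__(self):
        self._pending = deque()
        self._next_id = 1

    def submit(self, request, callback):
        """Steps 1 and 2: accept a crawl request and return a response."""
        request_id = self._next_id
        self._next_id += 1
        self._pending.append((request_id, request, callback))
        return {"request_id": request_id, "accepted": True}

    def run_pending(self):
        """Steps 3 to 8: process queued requests and call back with results."""
        while self._pending:
            request_id, request, callback = self._pending.popleft()
            result = parse_mapping_service(request)  # step 3 (steps 4 to 7 inside)
            callback(request_id, result)             # step 8
```

Queuing the request before doing any work is what lets the response of step 2 reach the business system before the crawl result of step 8.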
Fig. 7 is a schematic diagram of the intelligent crawler identifying a login page according to another embodiment of this application. As shown in Fig. 7, the intelligent crawler detects the user ID input box, the password box for entering the password, the login button, and so on in the login page.
Fig. 8 is a schematic diagram of the intelligent crawler identifying web page code according to another embodiment of this application. As shown in Fig. 8, a traditional crawler can only hard-code the input box as id=userid or name=userid, the password box as id=pwd or name=pwd, and the login button as id=Submit or name=Submit. If the website is revised even slightly, for example if the password box becomes id=password or name=password, the whole program stops working. By contrast, the intelligent crawler finds the element with type=password on the page according to a rule. Under the HTML specification, an input element whose type is password is generally a password box; surveys indicate that this method is more than 99% accurate, and whatever the id or name is changed to, the crawling behavior is unaffected. The rule for the user name input box is as follows. Statistics show that in most cases the user name input box is near the password box and above it. Combining this characteristic with the requirement that the input's type is not hidden, and with a check of the value, name, or placeholder, the user name input box is found with an accuracy of about 96%. The login button and the image verification code input box are found in a similar way.
Image verification code recognition in the intelligent crawler works as follows. Simple image verification codes are recognized with traditional optical character recognition (OCR) based on Tesseract. Medium or complex verification codes are recognized with a convolution-based recognition model trained with TensorFlow. The intelligent crawler has been trained on a large number of image verification codes. Tesseract and TensorFlow serve as primary and backup for each other and together provide the verification code recognition service, so that logging in does not require entering the verification code manually. This reduces the number of interactions between the crawling program and the user and improves the efficiency of the whole crawl.
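The primary and backup arrangement can be sketched as a simple fallback chain. In the Python sketch below the recognizers are passed in as plain callables, so no Tesseract or TensorFlow API is assumed or shown; the function name recognize_captcha and the (name, callable) pairing are illustrative choices, not part of the application.

```python
def recognize_captcha(image_bytes, recognizers):
    """Try each (name, recognizer) pair in order; return the first usable answer.

    Each recognizer is a hypothetical callable taking the image bytes and
    returning the recognized text (or an empty value when it gives up).
    """
    for name, recognize in recognizers:
        try:
            text = (recognize(image_bytes) or "").strip()
        except Exception:
            continue  # this engine failed; fall through to the backup
        if text:
            return name, text
    return None, None
```

Listing the OCR engine first and the trained model second makes the model the backup; swapping the order makes it the primary, which matches the "primary and backup for each other" wording.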
Page crawling in the intelligent crawler is template-driven: the template defines which pages need to be downloaded and how to reach them. For example, a template can be defined to find the personal information/details page, and the page obtained is the target page. A template can also be defined to find a detail records page, for example: search the loaded page for "start time" and "end time", then search for the "query" link, and then go through the target pages from the first to the Nth; the pages obtained are the target pages.
Intelligent page crawling is based entirely on the browser and the template. When the template is made, it specifies, according to the page, which hyperlink, button, or menu the program should click, and whether the required data are in a table or elsewhere; if they are in a table, some distinctive features of the table must be specified. The program clicks the hyperlinks or buttons specified in the template in turn, then finds the table by its specified features and organizes the selected data into a JSON structure keyed by the table headers for downstream processing. Fig. 9 is a schematic diagram of the JSON structure according to another embodiment of this application. As shown in Fig. 9, for a template path such as "account management/personal information/", the program first clicks Account Management, clicks Personal Information once the page has finished loading, and then extracts the data from that page.
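As a rough illustration of organizing table data into a JSON structure keyed by the headers, the Python sketch below parses a table with the standard library's html.parser. It assumes the page fragment holds a single table whose first row is the header row, and it treats "the table contains the required headers" as the distinguishing feature; the class and function names are hypothetical.

```python
import json
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the text of table cells, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_records(html, required_headers):
    """Return the body rows as dicts keyed by header, or [] if headers are missing."""
    parser = TableParser()
    parser.feed(html)
    if not parser.rows:
        return []
    header, *body = parser.rows
    if not set(required_headers) <= set(header):
        return []  # not the table the template describes
    return [dict(zip(header, row)) for row in body]
```

The resulting list of dictionaries can be passed straight to `json.dumps` for downstream processing.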
A traditional crawler takes the first column of the table and puts it into the insurance type field insuranceType; if the insurance type is moved to the second column, the program produces wrong data. The figure above shows the table data extracted from the page according to the template; the headers are, in order, insurance type, account type, traffic flag, and so on. The mapping template can specify that the "insurance type" field is mapped to the insuranceType field, so the intelligent crawler is completely insensitive to changes in column order and always obtains the correct data.
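The difference between positional and header-based mapping can be shown in a few lines of Python. This is a sketch under assumptions: the header strings and the target name accountType are illustrative, and only insuranceType comes from the text above.

```python
# Mapping template: table header -> target field name.
# "insuranceType" appears in the text; "accountType" is a hypothetical name.
MAPPING_TEMPLATE = {
    "Insurance type": "insuranceType",
    "Account type": "accountType",
}


def map_by_position(row):
    """Traditional approach: assume the insurance type is always column 0."""
    return {"insuranceType": row[0]}


def map_by_header(header, row, mapping=MAPPING_TEMPLATE):
    """Template approach: look each value up by its header, not its position."""
    by_header = dict(zip(header, row))
    return {
        target: by_header[source]
        for source, target in mapping.items()
        if source in by_header
    }
```

With the columns swapped, `map_by_position` silently returns the account type as the insurance type, while `map_by_header` returns the same result for either column order.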
Page mapping in the intelligent crawler is similar to deserializing JSON data into objects. Unlike general-purpose serialization tools, the data mapping module maps the extracted data to the object instances the business requires, according to the definitions in a mapping template, for subsequent processing by the business logic.
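Mapping extracted data to a business object instance might look like the following Python sketch. The dataclass InsuranceAccount, the helper to_object, and the header strings are hypothetical; the application does not specify the object model or the mapping template format.

```python
from dataclasses import dataclass, fields


@dataclass
class InsuranceAccount:
    """Hypothetical business object that the extracted data is mapped onto."""
    insuranceType: str
    accountType: str


def to_object(cls, record, mapping):
    """Build a `cls` instance from an extracted record using a mapping template.

    `mapping` maps keys of the extracted record to field names of `cls`;
    record keys that the template does not mention are ignored.
    """
    wanted = {f.name for f in fields(cls)}
    kwargs = {
        target: record[source]
        for source, target in mapping.items()
        if target in wanted and source in record
    }
    return cls(**kwargs)
```

Unlike a generic deserializer, which would expect the JSON keys to match the field names, the mapping template decides which extracted key feeds which field.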
In general, the intelligent crawler frees developers from heavy development work. They no longer need to attend to the HTML details of the pages to be crawled; they only need to design rule templates according to the content visible on the page. This improves development efficiency, lowers the probability of errors, and copes with small revisions of a website, thereby reducing the impact on the business.
The above scheme achieves the following technical effects: the elements required for login are discovered and filled in automatically; page data are extracted according to rules, and the definition of the rules determines the robustness of the extraction; crawl results are mapped directly to any target data structure; and login is recognized intelligently. The recognition cues currently supported include descriptive text, input box background (placeholder) text, HTML element styles, and so on, and the range can be extended to a wider field of functional recognition. Templates can be written by hand or generated by a dedicated tool, and no restriction is placed on the template syntax.
From the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of this application.
Embodiment two
This embodiment also provides a device for acquiring website information. The device is used to implement the above embodiments and preferred implementations, and what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the device described in the following embodiment is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
According to another embodiment of this application, a device for acquiring website information is also provided, comprising:
an identification module, configured to identify the login page of a website according to preset rules and to log in to the website;
an acquisition module, configured to obtain, from the website information of the website, first website information corresponding to the first information indicated in a preset template.
With the above scheme, the login page of a website is identified according to preset rules, the account, password, and so on are entered at the corresponding positions, and the website is logged in to; first website information related to the first information indicated in the preset template is obtained from the website, extracted, and output. This scheme addresses the problem in the related art that a crawler is not applicable to most web pages, which leads to high maintenance costs. The preset rules are applicable to logging in to most websites, and the preset template specifies the features of the information to be obtained. This widens the crawler's scope of application, reduces its sensitivity to differences between web pages, removes the need to substantially modify program code for each web page, and reduces maintenance costs.
Optionally, the identification module identifying the login page of the website according to preset rules includes identifying the password box in the login page in at least one of the following ways: obtaining the page code of the login page, querying the code for an element with type=password, and identifying that element as the password box; or querying for a first input box that converts input information into masked form, and identifying the first input box as the password box.
Optionally, the identification module is also configured to, after identifying the password box in the login page, identify second input boxes within a preset range around the password box, and to identify as the account box a third input box among them that satisfies at least one of the following conditions: the input type of the second input box is not hidden; or the value, name, or placeholder of the second input box matches a preset field.
It should be noted that each of the above modules can be implemented by software or hardware. In the latter case, this can be done in the following ways, among others: the above modules are all located in the same processor; or the above modules are located in different processors in any combination.
Embodiment three
An embodiment of this application also provides a storage medium. Optionally, in this embodiment, the storage medium may be configured to store program code for performing the following steps:
S1: identify the login page of a website according to preset rules, and log in to the website;
S2: obtain, from the website information of the website, first website information corresponding to the first information indicated in a preset template.
Optionally, in this embodiment, the storage medium may include, but is not limited to, various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
An embodiment of this application also provides an electronic device, comprising a memory and a processor. A computer program is stored in the memory, and the processor is configured to run the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic device may also include a transmission device and an input/output device, where the transmission device is connected to the processor and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to perform the following steps through the computer program:
S1: identify the login page of a website according to preset rules, and log in to the website;
S2: obtain, from the website information of the website, first website information corresponding to the first information indicated in a preset template.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments and optional implementations, which are not repeated here.
Obviously, those skilled in the art should understand that the modules or steps of this application described above can be implemented with general-purpose computing devices. They may be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they may be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases, the steps shown or described may be performed in an order different from the one given here, or the modules or steps may each be made into a separate integrated circuit module, or several of them may be made into a single integrated circuit module. The application is therefore not limited to any specific combination of hardware and software.
The foregoing describes only preferred embodiments of this application and is not intended to limit it. For those skilled in the art, various modifications and changes to this application are possible. Any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall fall within its scope of protection.
Claims (10)
1. A method for acquiring site information, characterized by comprising:
identifying the login interface of a website according to preset rules, and logging in to the website;
obtaining, from the site information of the website, first site information corresponding to first information indicated in a preset template.
2. The method according to claim 1, characterized in that identifying the login interface of the website according to preset rules comprises identifying the password box in the login interface in at least one of the following ways:
obtaining the page code of the login interface, querying the code for an element with type=password, and identifying that element as the password box;
querying for a first input box that converts input information into masked form, and identifying the first input box as the password box.
3. The method according to claim 2, characterized in that, after the password box in the login interface is identified, second input boxes within a preset range around the password box are identified, and a third input box among the second input boxes that meets at least one of the following conditions is identified as the account box:
a second input box whose input type is not hidden is identified as the third input box;
a second input box whose value, name, or placeholder matches a preset field is identified as the third input box.
4. The method according to claim 3, characterized in that logging in to the website comprises:
entering account information in the password box and the account box;
and recognizing a picture verification code using a first model and completing the verification, wherein the first model is trained by machine learning using multiple groups of data, each group of data comprising a picture verification code and the characters in that picture verification code.
5. The method according to claim 1, characterized in that obtaining the first site information corresponding to the first information indicated in the preset template comprises at least one of the following:
obtaining the site information in a hyperlink indicated in the first information;
obtaining the site information at a key field indicated in the first information;
obtaining the site information in a table indicated in the first information;
obtaining the site information at a page position indicated in the first information;
obtaining the site information within a time period indicated in the first information.
6. An apparatus for acquiring site information, characterized by comprising:
an identification module, configured to identify the login interface of a website according to preset rules and to log in to the website;
an obtaining module, configured to obtain, from the site information of the website, first site information corresponding to first information indicated in a preset template.
7. The apparatus according to claim 6, characterized in that the identification module identifying the login interface of the website according to preset rules comprises identifying the password box in the login interface in at least one of the following ways:
obtaining the page code of the login interface, querying the code for an element with type=password, and identifying that element as the password box;
querying for a first input box that converts input information into masked form, and identifying the first input box as the password box.
8. The apparatus according to claim 7, characterized in that, after identifying the password box in the login interface, the identification module is further configured to identify second input boxes within a preset range around the password box, and to identify as the account box a third input box among the second input boxes that meets at least one of the following conditions:
a second input box whose input type is not hidden is identified as the third input box;
a second input box whose value, name, or placeholder matches a preset field is identified as the third input box.
9. A storage medium, characterized in that a computer program is stored in the storage medium, wherein the computer program is configured to perform, when run, the method according to any one of claims 1 to 5.
10. An electronic device comprising a memory and a processor, characterized in that a computer program is stored in the memory, and the processor is configured to run the computer program to perform the method according to any one of claims 1 to 5.
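As an illustration of the template-driven acquisition recited in claim 5, the sketch below covers two of the listed options: site information reached through a hyperlink, and site information in a table located by a key field. The template is modeled as a plain dictionary with the hypothetical keys `link_text` and `row_key`; neither the dictionary format nor the names `PageIndex` and `extract` come from this application, and the page-position and time-period options are not covered.

```python
from html.parser import HTMLParser


class PageIndex(HTMLParser):
    """Indexes the hyperlinks and table rows of a page for template lookups."""

    def __init__(self):
        super().__init__()
        self.links = []    # (link text, href) pairs
        self.rows = []     # one list of cell texts per <tr>
        self._href = None  # href of the <a> currently open, if any
        self._text = []
        self._row = None   # cells of the <tr> currently open, if any
        self._cell = None  # text pieces of the <td>/<th> currently open

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href, self._text = dict(attrs).get("href") or "", []
        elif tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None
        elif tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row, self._cell = None, None


def extract(template, page_html):
    """Return the site information selected by a (hypothetical) template dict."""
    index = PageIndex()
    index.feed(page_html)
    result = {}
    if "link_text" in template:
        # Hyperlinks whose visible text contains the phrase given in the template.
        result["links"] = [href for text, href in index.links
                           if template["link_text"] in text]
    if "row_key" in template:
        # Table rows whose first cell equals the key field given in the template;
        # the remaining cells of each such row are returned.
        result["rows"] = [row[1:] for row in index.rows
                          if row and row[0] == template["row_key"]]
    return result
```

For example, `extract({"link_text": "bill", "row_key": "Balance"}, page_html)` returns the targets of links whose text mentions "bill" together with the cells that follow a "Balance" label in any table. Because the page-specific details live in the template rather than in the code, supporting a different website means editing the template only.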
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811279690.3A CN109460522A (en) | 2018-10-30 | 2018-10-30 | The acquisition methods and device of site information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811279690.3A CN109460522A (en) | 2018-10-30 | 2018-10-30 | The acquisition methods and device of site information |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109460522A true CN109460522A (en) | 2019-03-12 |
Family
ID=65608879
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811279690.3A Pending CN109460522A (en) | 2018-10-30 | 2018-10-30 | The acquisition methods and device of site information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109460522A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112541104A (en) * | 2019-09-20 | 2021-03-23 | 浙江大搜车软件技术有限公司 | Data capturing method and device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070250514A1 (en) * | 2006-04-25 | 2007-10-25 | Saeed Rajput | Browsing and monitoring the web through learning and ingemination |
CN103034711A (en) * | 2012-12-10 | 2013-04-10 | 北京金山安全软件有限公司 | Form recognition method and device |
CN107229669A (en) * | 2016-03-23 | 2017-10-03 | 塔塔咨询服务公司 | Method and system for selecting the sample set on assessing website Barrien-free |
CN108268635A (en) * | 2018-01-17 | 2018-07-10 | 百度在线网络技术(北京)有限公司 | For obtaining the method and apparatus of data |
CN108334585A (en) * | 2018-01-29 | 2018-07-27 | 湖北省楚天云有限公司 | A kind of spiders method, apparatus and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104766014B (en) | For detecting the method and system of malice network address | |
CN103605738B (en) | Web page access data statistical method and device | |
US20160140626A1 (en) | Web page advertisement configuration and optimization with visual editor and automatic website and webpage analysis | |
CN103888490B (en) | A kind of man-machine knowledge method for distinguishing of full automatic WEB client side | |
CN109376291B (en) | Website fingerprint information scanning method and device based on web crawler | |
CN107958016A (en) | Function pages method for customizing and application server | |
CN111311136A (en) | Wind control decision method, computer equipment and storage medium | |
US20210125131A1 (en) | Electronic device, method for constructing scoring model of retail outlets, system, and computer readable medium | |
CN107609150A (en) | A kind of interactive network reptile creation method chosen based on page elements and system | |
CN106503111B (en) | Webpage code-transferring method, device and client terminal | |
CN110083752A (en) | Information of real estate recommended method, device, equipment and storage medium | |
CN108763274A (en) | Recognition methods, device, electronic equipment and the storage medium of access request | |
CN106878108A (en) | Network flow playback method of testing and device | |
CN105718533A (en) | Information pushing method and device | |
CN107340954A (en) | A kind of information extracting method and device | |
CN109729044A (en) | A kind of general internet data acquisition is counter to climb system and method | |
CN110083755A (en) | A kind of high emulation parsing web-page approach, device and electronic equipment | |
CN111784301A (en) | User portrait construction method and device, storage medium and electronic equipment | |
CN110134844A (en) | Subdivision field public sentiment monitoring method, device, computer equipment and storage medium | |
CN104023025A (en) | Website security vulnerability detection method and device based on service rules | |
CN104462242B (en) | Webpage capacity of returns statistical method and device | |
CN113392306B (en) | Information interaction method, information interaction device, terminal and storage medium | |
CN103336693B (en) | The creation method of refer chain, device and security detection equipment | |
CN106294406A (en) | A kind of method and apparatus accessing data for processing application | |
CN114398138A (en) | Interface generation method and device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190312 |