CN109815380A - Information crawling method, apparatus, device and computer-readable storage medium - Google Patents

Information crawling method, apparatus, device and computer-readable storage medium

Info

Publication number
CN109815380A
Authority
CN
China
Prior art keywords
information
target webpage
CAPTCHA (verification code)
crawler
server backend
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811564176.4A
Other languages
Chinese (zh)
Inventor
卢祎明
温尚卓
姜卓
张青
刘占魁
田冬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
KUNSHAN CVIC SE Co Ltd
CVIC Software Engineering Co Ltd
Original Assignee
KUNSHAN CVIC SE Co Ltd
CVIC Software Engineering Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-12-20
Filing date
2018-12-20
Publication date
2019-05-28
Application filed by KUNSHAN CVIC SE Co Ltd and CVIC Software Engineering Co Ltd
Priority to CN201811564176.4A
Publication of CN109815380A
Current legal status: Pending


Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses an information crawling method, apparatus, device and computer-readable storage medium. The method comprises: obtaining, from a server backend, the information element corresponding to the information to be queried; filling the information element into the search box of a target webpage and triggering the target webpage to pop up a CAPTCHA (verification code); and obtaining, from the browser corresponding to the target webpage, the verification information produced when a human operator completes the CAPTCHA verification. If the verification information indicates that verification has passed, it is returned to the server backend to instruct the crawler program running on the server backend to crawl the information to be queried. Thus, for CAPTCHA-protected websites, the application performs information search and crawling automatically and needs a human only for the CAPTCHA verification itself. Manual intervention is minimized, crawling becomes semi-automatic, and information acquisition is substantially faster.

Description

Information crawling method, apparatus, device and computer-readable storage medium
Technical field
The present invention relates to the technical field of information crawling, and more specifically to an information crawling method, apparatus, device and computer-readable storage medium.
Background
The amount of information on the Internet is enormous, and almost every kind of public information can be obtained online. A crawler is a computer program that fetches the required information from the Internet; crawler programs are convenient, efficient and fully automatic. However, to stop crawlers from harvesting information, some websites adopt a series of "anti-crawling" measures, the most representative being the CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart, a public, fully automatic program that determines whether a user is a computer or a human). CAPTCHA technology can effectively distinguish people from computer programs: access continues only after the CAPTCHA has been passed. In the prior art this situation usually requires manual intervention. For websites where the CAPTCHA prevents the crawler program from working normally, staff query and collect the required information on the website by hand, which is undoubtedly slower than collecting it with a crawler program.
In summary, the prior art suffers from slow information acquisition when obtaining information from CAPTCHA-protected websites.
Summary of the invention
The object of the present invention is to provide an information crawling method, apparatus, device and computer-readable storage medium that solve the prior-art problem of slow information acquisition from CAPTCHA-protected websites.
To achieve the above object, the present invention provides the following technical solutions:
An information crawling method, comprising:
obtaining, from a server backend, an information element corresponding to the information to be queried;
filling the information element into the search box of a target webpage, and triggering the target webpage to pop up a CAPTCHA;
obtaining, from the browser corresponding to the target webpage, the verification information produced when a human operator completes the CAPTCHA verification; and, if the verification information indicates that verification has passed, returning the verification information to the server backend to instruct the crawler program running on the server backend to crawl the information to be queried.
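The three steps above can be sketched as a single control function. This is a minimal Python illustration, not the applicant's implementation: the four callables (`fetch_element`, `fill_and_trigger`, `read_verification`, `report_to_backend`) are hypothetical hooks standing in for the backend and browser operations the patent describes, and the verification information is assumed to be an opaque string.

```python
from typing import Callable, Tuple


def crawl_once(
    fetch_element: Callable[[], str],
    fill_and_trigger: Callable[[str], None],
    read_verification: Callable[[], Tuple[bool, str]],
    report_to_backend: Callable[[str], None],
) -> bool:
    """Run the three method steps once; return True if the backend was notified.

    All four callables are hypothetical hooks: the first and last stand for
    the server backend, the middle two for the browser side.
    """
    element = fetch_element()              # step 1: information element from the backend
    fill_and_trigger(element)              # step 2: fill the search box, trigger the CAPTCHA
    passed, info = read_verification()     # step 3: result of the operator's verification
    if not passed:
        return False                       # failed: nothing is sent, nothing is crawled
    report_to_backend(info)                # passed: backend can now start its crawler
    return True
```

Injecting the steps as callables keeps the control flow (report only on a pass) separate from any particular browser or backend technology.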
Preferably, after triggering the target webpage to pop up the CAPTCHA, the method further comprises:
outputting a prompt tone that prompts the human operator to complete the CAPTCHA verification.
Preferably, triggering the target webpage to pop up the CAPTCHA comprises:
automatically clicking the query button corresponding to the search box, so as to trigger the target webpage to pop up the CAPTCHA.
Preferably, filling the information element into the search box of the target webpage comprises:
filling the information element into the search box of the target webpage using the HTML page manipulation interface provided by the browser.
An information crawling apparatus, comprising:
an acquisition module, configured to obtain, from a server backend, an information element corresponding to the information to be queried;
a trigger module, configured to fill the information element into the search box of a target webpage and trigger the target webpage to pop up a CAPTCHA;
a return module, configured to obtain, from the browser corresponding to the target webpage, the verification information produced when a human operator completes the CAPTCHA verification, and, if the verification information indicates that verification has passed, return the verification information to the server backend to instruct the crawler program running on the server backend to crawl the information to be queried.
Preferably, the apparatus further comprises:
an output module, configured to output, after the target webpage is triggered to pop up the CAPTCHA, a prompt tone that prompts the human operator to complete the CAPTCHA verification.
Preferably, the trigger module comprises:
a trigger unit, configured to automatically click the query button corresponding to the search box, so as to trigger the target webpage to pop up the CAPTCHA.
Preferably, the trigger module comprises:
a filling unit, configured to fill the information element into the search box of the target webpage using the HTML page manipulation interface provided by the browser.
An information crawling device, comprising:
a memory, configured to store a computer program;
a processor, configured to implement the steps of any one of the above information crawling methods when executing the computer program.
A computer-readable storage medium, storing a computer program that, when executed by a processor, implements the steps of any one of the above information crawling methods.
The present invention provides an information crawling method, apparatus, device and computer-readable storage medium. The method comprises: obtaining, from a server backend, an information element corresponding to the information to be queried; filling the information element into the search box of a target webpage and triggering the target webpage to pop up a CAPTCHA; and obtaining, from the browser corresponding to the target webpage, the verification information produced when a human operator completes the CAPTCHA verification, and, if it indicates that verification has passed, returning it to the server backend to instruct the crawler program running on the server backend to crawl the information to be queried. For a CAPTCHA-protected website, the disclosed solution automatically obtains the information element from the server backend, fills it into the search box of the target webpage and triggers the CAPTCHA pop-up; it then collects the verification information produced by the operator and, when that information indicates a pass, returns it to the server backend so that the backend can instruct the crawler program to crawl the required information. The application therefore automates information search and crawling for CAPTCHA-protected websites and needs a human only for the CAPTCHA verification. Manual intervention is minimized, crawling becomes semi-automatic, and information acquisition is substantially faster.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of an information crawling method according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of an information crawling apparatus according to an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Referring to Fig. 1, which shows a flowchart of an information crawling method according to an embodiment of the present invention, the method may comprise:
S11: obtaining, from a server backend, an information element corresponding to the information to be queried.
It should be noted that the method provided by the embodiments of the present invention may be executed by a corresponding information crawling apparatus, which may be implemented as a browser plug-in. A browser plug-in is a program, written against the application programming interface of a given browser specification, that provides a specific function. To run the method as a browser plug-in, the browser is first opened and the plug-in installed in it; the user then navigates to the target webpage corresponding to the information to be queried and opens the plug-in, for example by clicking its icon, after which the plug-in carries out the crawling steps. The target webpage may be the home page of a target website, and the target website may be any website chosen according to actual needs.
The information element corresponding to the information to be queried is obtained from the server backend. In the present application the information element may be a keyword, a key phrase, a picture or any other element set according to actual needs, and the corresponding information is queried on the basis of that element.
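The patent does not specify how the server backend hands an information element to the plug-in. A minimal sketch follows, assuming a hypothetical JSON task format with a `kind` field (defaulting to a keyword) and a `value` field; the field names and the allowed kinds are illustrative only.

```python
import json
from dataclasses import dataclass

# Hypothetical set of element kinds; the patent mentions keywords and pictures.
ALLOWED_KINDS = {"keyword", "image"}


@dataclass(frozen=True)
class InfoElement:
    kind: str   # what the element is, e.g. a keyword
    value: str  # what gets typed into the target page's search box


def parse_task(payload: str) -> InfoElement:
    """Turn a (hypothetical) JSON task from the server backend into an InfoElement."""
    data = json.loads(payload)
    if not isinstance(data, dict):
        raise ValueError("task payload must be a JSON object")
    kind = data.get("kind", "keyword")
    value = str(data.get("value", "")).strip()
    if kind not in ALLOWED_KINDS:
        raise ValueError(f"unsupported element kind: {kind!r}")
    if not value:
        raise ValueError("empty information element")
    return InfoElement(kind, value)
```

For example, `parse_task('{"value": " ACME Co "}')` yields a keyword element with the surrounding whitespace stripped.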
S12: filling the information element into the search box of the target webpage, and triggering the target webpage to pop up a CAPTCHA.
After the information element is obtained from the server backend, it is automatically filled into the search box of the target webpage; this is the same operation that staff perform by hand in the prior art. Once filling is complete, the target webpage is automatically triggered to pop up the CAPTCHA, which a human operator then completes. The CAPTCHA here has the same meaning as in the background section, and the operator's CAPTCHA verification is the same process as the manual verification performed by staff there.
S13: obtaining, from the browser corresponding to the target webpage, the verification information produced when the human operator completes the CAPTCHA verification; if the verification information indicates that verification has passed, returning it to the server backend to instruct the crawler program running on the server backend to crawl the information to be queried.
After the operator completes the CAPTCHA verification, the browser stores the verification result in a cookie, so the browser plug-in can read the verification information from the browser's cookies and determine whether it indicates a pass. If it does, the plug-in returns the verification information to the server backend. On receiving verification information that indicates a pass, the server backend knows that the target webpage has navigated to the page containing the information to be queried, and instructs the crawler program to crawl that page. If it does not, the plug-in does not return the verification information; this also means the target webpage has not navigated to the page containing the information to be queried, so the required information cannot be retrieved. In addition, the browser is associated both with the server backend and with the target webpage, and may specifically be Chrome.
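A minimal sketch of the cookie check and the report to the backend, using Python's standard `http.cookies.SimpleCookie` to parse a cookie string. The cookie name `captcha_status`, its value `passed`, and the JSON report layout are all assumptions: real sites record CAPTCHA results differently, and the patent does not define the message format. Forwarding the full cookie string, so that the crawler can reuse the verified session, is likewise an assumption.

```python
import json
from http.cookies import SimpleCookie
from typing import Optional

VERIFY_COOKIE = "captcha_status"  # hypothetical cookie name; real sites differ


def read_verification(cookie_header: str) -> Optional[str]:
    """Return the value of the (assumed) verification cookie, or None if absent."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    morsel = jar.get(VERIFY_COOKIE)
    return morsel.value if morsel is not None else None


def build_report(cookie_header: str, task_id: str) -> Optional[str]:
    """Build the message for the server backend, or None if verification failed."""
    if read_verification(cookie_header) != "passed":
        return None  # nothing is sent: the page never reached the results
    # Assumption: the crawler reuses the verified session, so cookies are forwarded.
    return json.dumps({"task_id": task_id, "verified": True, "cookies": cookie_header})
```

Returning `None` on failure mirrors the text: when verification has not passed, nothing is sent to the server backend.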
For CAPTCHA-protected websites, the disclosed solution automatically obtains the information element from the server backend, fills it into the search box of the target webpage, triggers the CAPTCHA pop-up, collects the verification information produced by the operator, and, when that information indicates a pass, returns it to the server backend so that the backend can instruct the crawler program to crawl the required information. Only the CAPTCHA verification requires a human; everything else is automatic. Manual intervention is thus minimized, crawling becomes semi-automatic, and information acquisition is substantially faster.
It should further be noted that, because the disclosed solution interacts closely with the crawler program on the server backend, batch crawling tasks run more efficiently, which provides a counter-strategy against "anti-crawling" measures.
In the information crawling method provided by an embodiment of the present invention, after the target webpage is triggered to pop up the CAPTCHA, the method may further comprise:
outputting a prompt tone that prompts the human operator to complete the CAPTCHA verification.
It should be noted that the prompt tone can be chosen according to actual needs. It is output after the CAPTCHA pops up so that a CAPTCHA does not go unverified simply because staff are away from the computer: the tone tells staff promptly that a CAPTCHA is waiting, they perform the required operation in time, and fast crawling is further ensured.
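One way to pair the prompt tone with waiting for the operator is sketched below. It is an illustration under stated assumptions: the "tone" is the ASCII BEL character written to a terminal (many terminals play a sound for it, some do not), and `is_verified` is a hypothetical probe that returns `None` while the CAPTCHA is still open and `True`/`False` once a result exists. The timeout and polling interval are arbitrary defaults.

```python
import sys
import time
from typing import Callable, Optional


def terminal_bell() -> None:
    """Emit the ASCII BEL character; many terminals play a sound for it."""
    sys.stdout.write("\a")
    sys.stdout.flush()


def prompt_and_wait(
    is_verified: Callable[[], Optional[bool]],
    timeout_s: float = 120.0,
    poll_s: float = 0.5,
    beep: Callable[[], None] = terminal_bell,
) -> Optional[bool]:
    """Alert the operator once, then poll until a result appears or time runs out.

    Returns the verification result, or None if the operator never responded.
    """
    beep()
    deadline = time.monotonic() + timeout_s
    while True:
        result = is_verified()          # None means "still waiting for the operator"
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_s)
```

The sound function is a parameter so that a desktop notification or another audio backend can replace the terminal bell without touching the waiting logic.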
In the information crawling method provided by an embodiment of the present invention, triggering the target webpage to pop up the CAPTCHA may comprise:
automatically clicking the query button corresponding to the search box, so as to trigger the target webpage to pop up the CAPTCHA.
It should be noted that the way the CAPTCHA pop-up is triggered can be chosen according to actual needs; for example, an instruction to pop up the CAPTCHA may be sent to the target webpage. This embodiment instead imitates the staff's click: it automatically clicks the query button belonging to the search box that holds the information element, which triggers the CAPTCHA pop-up. Because the trigger uses the query button the webpage already has, no additional program needs to be written; the browser's existing interfaces suffice, which reduces development effort.
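The click-based trigger can be sketched against a minimal page abstraction. `PageLike`, `FakeSearchPage`, and the selectors `#query-btn` and `#captcha` are hypothetical stand-ins, not any real browser or automation API; a browser plug-in would click the page's own button through the DOM instead. The fake page only models the behaviour the text relies on: pressing the query button makes the site show its CAPTCHA.

```python
from typing import Protocol


class PageLike(Protocol):
    """Hypothetical minimal page interface used for illustration only."""

    def click(self, selector: str) -> None: ...

    def is_visible(self, selector: str) -> bool: ...


def trigger_captcha(page: PageLike, query_button: str, captcha_box: str) -> bool:
    """Click the page's own query button, as a user would, and report
    whether the site responded by showing its CAPTCHA."""
    page.click(query_button)
    return page.is_visible(captcha_box)


class FakeSearchPage:
    """Stand-in for a target page whose query button pops up a CAPTCHA."""

    def __init__(self) -> None:
        self.captcha_open = False

    def click(self, selector: str) -> None:
        if selector == "#query-btn":    # hypothetical selector of the query button
            self.captcha_open = True

    def is_visible(self, selector: str) -> bool:
        return selector == "#captcha" and self.captcha_open
```

Clicking the existing button, rather than forging a pop-up request, is what lets the method rely on the site's own behaviour instead of extra code.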
Information element, is filled into the search box of target webpage by a kind of information crawler method provided in an embodiment of the present invention It is interior, may include:
Information element is filled into the search box of target webpage by the operation html page interface technology provided using browser It is interior.
It should be noted that the operation html that can be provided using browser when information element to be filled into search box Page interface technology is realized, therefore without additionally writing program, and only needs to be based on browser to can be realized with interfacing, from And reduce workload needed for development.
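In a browser plug-in, filling a search box through the page's HTML interface typically means setting the input element's `value` and then dispatching an `input` event (often also `change`) so that scripts listening on the page notice the new text. The Python below only models that sequence: `FakeInputElement` is a hypothetical stand-in for an HTML `<input>`, not a real DOM binding.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeInputElement:
    """Hypothetical stand-in for an HTML <input>; records dispatched events."""
    value: str = ""
    events: List[str] = field(default_factory=list)

    def dispatch_event(self, name: str) -> None:
        self.events.append(name)


def fill_search_box(box: FakeInputElement, text: str) -> None:
    """Overwrite the box's text, then announce the change the way typing would."""
    box.value = text
    for name in ("input", "change"):
        box.dispatch_event(name)
```

Dispatching the events matters because some pages read the search text from their own event handlers rather than from the element's `value`.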
An embodiment of the present invention further provides an information crawling apparatus which, as shown in Fig. 2, may comprise:
an acquisition module 11, configured to obtain, from a server backend, an information element corresponding to the information to be queried;
a trigger module 12, configured to fill the information element into the search box of a target webpage and automatically click the query button corresponding to the search box, so as to trigger the target webpage to pop up a CAPTCHA;
a return module 13, configured to obtain, from the browser corresponding to the target webpage, the verification information produced when a human operator completes the CAPTCHA verification, and, if the verification information indicates that verification has passed, return it to the server backend to instruct the crawler program running on the server backend to crawl the information to be queried.
The information crawling apparatus provided by an embodiment of the present invention may further comprise:
an output module, configured to output, after the target webpage is triggered to pop up the CAPTCHA, a prompt tone that prompts the human operator to complete the CAPTCHA verification.
In the information crawling apparatus provided by an embodiment of the present invention, the trigger module may comprise:
a trigger unit, configured to automatically click the query button corresponding to the search box, so as to trigger the target webpage to pop up the CAPTCHA.
In the information crawling apparatus provided by an embodiment of the present invention, the trigger module may comprise:
a filling unit, configured to fill the information element into the search box of the target webpage using the HTML page manipulation interface provided by the browser.
An embodiment of the present invention further provides an information crawling device, which may comprise:
a memory, configured to store a computer program;
a processor, configured to implement the steps of any one of the above information crawling methods when executing the computer program.
An embodiment of the present invention further provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of any one of the above information crawling methods.
It should be noted that, for the relevant parts of the information crawling apparatus, device and computer-readable storage medium provided by the embodiments of the present invention, reference may be made to the detailed description of the corresponding parts of the information crawling method, which is not repeated here. In addition, parts of the above technical solutions whose implementation principle matches the corresponding prior-art solutions are not described in detail, to avoid excessive repetition.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. An information crawling method, characterized by comprising:
obtaining, from a server backend, an information element corresponding to the information to be queried;
filling the information element into the search box of a target webpage, and triggering the target webpage to pop up a CAPTCHA;
obtaining, from the browser corresponding to the target webpage, the verification information produced when a human operator completes the CAPTCHA verification; and, if the verification information indicates that verification has passed, returning the verification information to the server backend to instruct the crawler program running on the server backend to crawl the information to be queried.
2. The method according to claim 1, characterized in that, after triggering the target webpage to pop up the CAPTCHA, the method further comprises:
outputting a prompt tone that prompts the human operator to complete the CAPTCHA verification.
3. The method according to claim 2, characterized in that triggering the target webpage to pop up the CAPTCHA comprises:
automatically clicking the query button corresponding to the search box, so as to trigger the target webpage to pop up the CAPTCHA.
4. The method according to claim 3, characterized in that filling the information element into the search box of the target webpage comprises:
filling the information element into the search box of the target webpage using the HTML page manipulation interface provided by the browser.
5. An information crawling apparatus, characterized by comprising:
an acquisition module, configured to obtain, from a server backend, an information element corresponding to the information to be queried;
a trigger module, configured to fill the information element into the search box of a target webpage and trigger the target webpage to pop up a CAPTCHA;
a return module, configured to obtain, from the browser corresponding to the target webpage, the verification information produced when a human operator completes the CAPTCHA verification, and, if the verification information indicates that verification has passed, return the verification information to the server backend to instruct the crawler program running on the server backend to crawl the information to be queried.
6. The apparatus according to claim 5, characterized by further comprising:
an output module, configured to output, after the target webpage is triggered to pop up the CAPTCHA, a prompt tone that prompts the human operator to complete the CAPTCHA verification.
7. The apparatus according to claim 6, characterized in that the trigger module comprises:
a trigger unit, configured to automatically click the query button corresponding to the search box, so as to trigger the target webpage to pop up the CAPTCHA.
8. The apparatus according to claim 7, characterized in that the trigger module comprises:
a filling unit, configured to fill the information element into the search box of the target webpage using the HTML page manipulation interface provided by the browser.
9. An information crawling device, characterized by comprising:
a memory, configured to store a computer program;
a processor, configured to implement the steps of the information crawling method according to any one of claims 1 to 4 when executing the computer program.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the information crawling method according to any one of claims 1 to 4.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811564176.4A CN109815380A (en) 2018-12-20 2018-12-20 Information crawling method, apparatus, device and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN109815380A (en) 2019-05-28

Family

ID=66601679

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811564176.4A Pending CN109815380A (en) 2018-12-20 2018-12-20 Information crawling method, apparatus, device and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN109815380A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110413859A (en) * 2019-06-27 2019-11-05 平安科技(深圳)有限公司 Webpage information search method, apparatus, computer equipment and storage medium
CN110489629A (en) * 2019-08-28 2019-11-22 云汉芯城(上海)互联网科技股份有限公司 Data crawling method, data crawl device, data crawl equipment and storage medium
CN111460256A (en) * 2020-03-26 2020-07-28 深圳壹账通智能科技有限公司 Webpage data crawling method and device, computer equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104298716A (en) * 2014-06-19 2015-01-21 中国科学院信息工程研究所 Web crawler system and web crawler implementation method capable of supporting artificial session grafting
CN106649362A (en) * 2015-10-30 2017-05-10 北京国双科技有限公司 Webpage crawling method and apparatus
CN107426148A (en) * 2017-03-30 2017-12-01 成都优易数据有限公司 A kind of anti-reptile method and system based on running environment feature recognition
CN108076067A (en) * 2017-12-27 2018-05-25 北京中关村科金技术有限公司 A kind of method and system that the simulation of reptile configurationization is authorized to log in
CN108345641A (en) * 2018-01-12 2018-07-31 深圳壹账通智能科技有限公司 A kind of method crawling website data, storage medium and server

Similar Documents

Publication Publication Date Title
CN109815380A (en) Information crawling method, apparatus, device and computer-readable storage medium
CN106844522B (en) A kind of network data crawling method and device
CN108647141A (en) Automatic test approach, device, computer-readable medium and electronic equipment
CN106294101B (en) The page gets test method and device ready
CN102831218B (en) Method and device for determining data in thermodynamic chart
CN108293081A (en) Pass through the program playback deep linking of user interface event to mobile application state
CN103649907B (en) The record of highly concurrent processing task and execution
CN108363602A (en) Intelligent UI quick interface arrangement methods, device, terminal device and storage medium
CN107423048A (en) Method, apparatus, medium and the computing device of Data Collection
CN109783751A (en) Form validation method and terminal device
CN104717095B (en) A kind of visualization SDN management method of integrated multi-controller
KR20060079080A (en) Methods and apparatus for evaluating aspects of a web page
CN104731582B (en) A kind of social networking system modeling and privacy policy Property Verification method based on MSVL
CN103777980A (en) Website commenting information loading method and browser
CN109684210A (en) A kind of website automation test method, device, equipment and readable storage medium storing program for executing
CN108614762A (en) A kind of browser testing method and device
CN109657125A (en) Data processing method, device, equipment and storage medium based on web crawlers
CN106775611B (en) Method for realizing self-adaptive dynamic web page crawler system based on machine learning
CN109522494A (en) A kind of dark chain detection method, device, equipment and computer readable storage medium
CN103685237B (en) Improve the method and device of website vulnerability scanning speed
WO2016048294A1 (en) Infrastructure rule generation
CN111737114A (en) Login function testing method and device
CN106484741B (en) A kind of method and device of single page application access data collection and transmission
Elyasaf et al. Combinatorial sequence testing using behavioral programming and generalized coverage criteria
CN110362294A (en) Development task executes method, apparatus, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190528