CN110390043A - Crawling method, device, terminal and storage medium for webpage mailbox data - Google Patents

Info

Publication number
CN110390043A
Authority
CN
China
Prior art keywords
data
file
script
mailbox
crawling
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910522340.3A
Other languages
Chinese (zh)
Inventor
卢俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Smart Technology Co Ltd
Original Assignee
OneConnect Smart Technology Co Ltd
Application filed by OneConnect Smart Technology Co Ltd
Priority to CN201910522340.3A
Publication of CN110390043A
Priority to PCT/CN2020/086228 (WO2020253366A1)
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/60 Software deployment
    • G06F8/65 Updates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/71 Version control; Configuration management
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/2866 Architectures; Arrangements
    • H04L67/30 Profiles
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/34 Network arrangements or protocols for supporting network services or applications involving the movement of software or configuration parameters
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/32 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
    • H04L9/3236 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using cryptographic hash functions
    • H04L9/3239 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using cryptographic hash functions involving non-keyed hash functions, e.g. modification detection codes [MDCs], MD5, SHA or RIPEMD
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/42 Mailbox-related aspects, e.g. synchronisation of mailboxes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The present invention relates to the technical field of deep web crawlers, and more particularly to a crawling method, device, terminal and storage medium for webpage mailbox data. The method comprises: after the mailbox home page loads successfully, calling the callback function of the browser, wherein the callback function includes an injected script file; obtaining specified search information, and crawling the mail data of the mailbox home page with the crawl script of the script file to obtain crawled data corresponding to the search information; after the crawling operation is completed, uploading the crawled data to a server for parsing; and receiving the parsing result returned by the server and displaying it, wherein the parsing result includes target data matching the search information. This scheme prevents the server from being blocked for crawling data repeatedly, and also saves the resources the server would consume while crawling data.

Description

Crawling method, device, terminal and storage medium for webpage mailbox data
Technical field
The present invention relates to the technical field of deep web crawlers, and in particular to a crawling method, device, terminal and storage medium for webpage mailbox data.
Background art
With the continuous development of computers and the Internet, the information users receive is becoming ever more varied and complex, and finding further information from the information received is correspondingly inconvenient. At present, when a user queries for needed information, a server crawls the webpage information, parses it, and loads the result onto the relevant page for display.
For example, when a user logs in to a mailbox to obtain relevant information from mail, the user's account and password are crawled first; the server logs in with that account and password and then crawls and analyzes the mail content in the user's mailbox (see Fig. 1 for details). In this process, however, the server often logs in to the mailbox home page repeatedly, so the user's mailbox is accessed too frequently. Besides occupying a large amount of computing resources, the server's ID is also liable to be blocked by the website, so that subsequent operations cannot be carried out.
Summary of the invention
The purpose of the present invention is to solve at least one of the above technical defects, in particular the defect that, in the prior art, the server crawls webpage information too frequently, which not only makes it liable to be blocked by the website but also occupies a large amount of computing resources.
The present invention provides a crawling method for webpage mailbox data, comprising the following steps:
After the mailbox home page loads successfully, calling the callback function of the browser, wherein the callback function includes an injected script file;
Obtaining specified search information, and crawling the mail data of the mailbox home page with the crawl script of the script file to obtain crawled data corresponding to the search information;
After the crawling operation is completed, uploading the crawled data to a server for parsing;
Receiving the parsing result returned by the server and displaying the parsing result, wherein the parsing result includes target data matching the search information.
In one of the embodiments, before the step of calling the callback function of the browser after the mailbox home page loads successfully, the method further comprises:
Obtaining the script file from a specified directory according to the version information file of the client;
Injecting the script file into the callback function of the browser.
In one of the embodiments, before the step of obtaining the script file from the specified directory according to the version information file of the client, the method further comprises:
Requesting a communication interface, and obtaining an updated version information file through the communication interface;
Comparing the updated version information file with the original version information file;
Determining, according to the comparison result, whether the original version information file needs to be updated.
In one of the embodiments, after the step of determining, according to the comparison result, whether the original version information file needs to be updated, the method further comprises:
When the original version information file needs to be updated, downloading and saving the updated version information file;
Comparing the check value of the updated version information file with the check value provided by the communication interface;
Determining, according to the comparison result, whether the updated version information file was downloaded correctly.
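The download check in the steps above can be illustrated with a minimal Python sketch. The text only says "check value"; treating it as an MD5 digest is an assumption, chosen because the version information file example later in the description carries an md5 field. The function names `md5_hex` and `is_download_intact` are hypothetical.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the lowercase hexadecimal MD5 digest of the given bytes."""
    return hashlib.md5(data).hexdigest()


def is_download_intact(downloaded: bytes, expected_check_value: str) -> bool:
    """Compare the digest of the downloaded file with the value reported by the interface."""
    return md5_hex(downloaded) == expected_check_value.strip().lower()


# Hypothetical usage: the interface reports the digest alongside the file.
payload = b'{"version": "1.1.3"}'
reported = md5_hex(payload)
print(is_download_intact(payload, reported))         # digest matches
print(is_download_intact(payload + b" ", reported))  # one extra byte changes the digest
```

If the two values differ, a client would presumably discard the file and download it again; the description does not specify the recovery behaviour.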
In one of the embodiments, the step of obtaining the script file from the specified directory according to the version information file of the client comprises:
Looking up a configuration file in the specified directory, wherein the configuration file includes multiple script files and configuration data corresponding to the script files;
When the script file exists, obtaining the configuration data corresponding to the script file, and comparing the check value in the configuration data with the check value of the script file;
Obtaining the script file according to the comparison result.
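A minimal Python sketch of the lookup and check described above follows. The layout of the configuration data (a mapping from script name to an entry with an `md5` key), the use of MD5 as the check value, and the name `load_verified_script` are all assumptions made for illustration; the description does not fix any of them.

```python
import hashlib
from typing import Dict, Optional


def load_verified_script(
    name: str,
    config: Dict[str, Dict[str, str]],
    files: Dict[str, bytes],
) -> Optional[bytes]:
    """Return the script body only if it exists and matches its configured check value."""
    entry = config.get(name)
    body = files.get(name)
    if entry is None or body is None:
        return None  # script is not listed or not present in the directory
    if hashlib.md5(body).hexdigest() != entry.get("md5"):
        return None  # check values differ: treat the script as unusable
    return body


# Hypothetical directory contents and configuration data.
files = {"crawl.js": b"console.log('crawl');"}
config = {"crawl.js": {"md5": hashlib.md5(files["crawl.js"]).hexdigest()}}
print(load_verified_script("crawl.js", config, files) is not None)
```

Returning `None` on a mismatch keeps the decision about re-downloading or falling back in the caller, which is one possible design among several.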
In one of the embodiments, the search information includes credit card bill information;
The step of obtaining specified search information, and crawling the mail data of the mailbox home page with the crawl script of the script file to obtain crawled data corresponding to the search information, comprises:
Executing the script file according to the specified credit card bill information, wherein the script file includes the crawl script;
Using the crawl script to crawl mail data related to the credit card bill information in the mailbox home page;
Compiling statistics on the mail data crawled by the crawl script over multiple passes, and obtaining, according to the statistical result, the crawled data corresponding to the credit card bill information.
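The statistics step is not specified in detail. One possible reading, sketched below in Python, is that the results of several crawl passes are merged so that each matching mail is kept once and the number of times it was seen is recorded. The mail structure (`id`, `subject`), the keyword match on the subject, and the name `merge_crawl_passes` are assumptions for illustration only.

```python
from collections import Counter
from typing import Dict, List, Tuple

Mail = Dict[str, str]


def merge_crawl_passes(passes: List[List[Mail]], keyword: str) -> Tuple[List[Mail], Counter]:
    """Merge several crawl passes, keeping each matching mail once and counting its sightings."""
    unique: Dict[str, Mail] = {}
    sightings: Counter = Counter()
    needle = keyword.lower()
    for mails in passes:
        for mail in mails:
            if needle in mail["subject"].lower():
                unique.setdefault(mail["id"], mail)  # keep the first copy only
                sightings[mail["id"]] += 1
    return list(unique.values()), sightings


# Hypothetical data: two passes over the same inbox.
passes = [
    [{"id": "a", "subject": "Credit Card Statement"}, {"id": "b", "subject": "Hello"}],
    [{"id": "a", "subject": "Credit Card Statement"}, {"id": "c", "subject": "credit card statement June"}],
]
merged, sightings = merge_crawl_passes(passes, "credit card statement")
print([mail["id"] for mail in merged], dict(sightings))
```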
In one of the embodiments, the step of receiving the parsing result returned by the server and displaying the parsing result, wherein the parsing result includes target data matching the search information, comprises:
Obtaining the parsing result returned by the server, wherein the parsing result includes bill data matching the credit card bill information;
Displaying the bill data in the mailbox home page.
The present invention also provides a crawling device for webpage mailbox data, comprising:
A calling module, configured to call the callback function of the browser after the mailbox home page loads successfully, wherein the callback function includes an injected script file;
A crawling module, configured to obtain specified search information and to crawl the mail data of the mailbox home page with the crawl script of the script file to obtain crawled data corresponding to the search information;
A data transmission module, configured to upload the crawled data to a server for parsing after the crawling operation is completed;
An information display module, configured to receive the parsing result returned by the server and to display the parsing result, wherein the parsing result includes target data matching the search information.
The present invention also provides a terminal, characterized by comprising a memory and a processor, wherein computer-readable instructions are stored in the memory, and when the computer-readable instructions are executed by the processor, the processor performs the steps of the crawling method for webpage mailbox data described in any one of the above embodiments.
The present invention also provides a storage medium in which computer-readable instructions are stored; when the computer-readable instructions are executed by one or more processors, the one or more processors perform the steps of the crawling method for webpage mailbox data described in any one of the above embodiments.
With the above crawling method, device, terminal and storage medium for webpage mailbox data, after the mailbox home page loads successfully, the callback function of the browser is called, wherein the callback function includes an injected script file; specified search information is obtained, and the mail data of the mailbox home page is crawled with the crawl script of the script file to obtain crawled data corresponding to the search information; after the crawling operation is completed, the crawled data is uploaded to a server for parsing; the parsing result returned by the server is received and displayed, wherein the parsing result includes target data matching the search information.
This scheme injects the script file into the callback function in advance. After the mailbox home page loads successfully, the callback function in the browser is called and the injected script file starts running; it crawls the mail data in the mailbox home page to obtain the crawled data corresponding to the specified search information. After the crawling operation ends, all crawled data is uploaded to the server together for parsing, and the parsing result returned by the server is the target data that matches the specified search information. Because the script file extracts user-related information directly on the client, the related content no longer has to be crawled entirely by the server, which prevents the server from being blocked for crawling data repeatedly. At the same time, because the script file injected into the browser's callback function performs the crawling in place of the server, the resources the server would consume while crawling data are saved.
Additional aspects and advantages of the present invention will be set forth in part in the following description; they will become apparent from that description or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments in conjunction with the accompanying drawings, in which:
Fig. 1 is a schematic structural diagram of the background scenario of the invention;
Fig. 2 is a diagram of the application environment of an embodiment of the present invention;
Fig. 3 is a flowchart of the crawling method for webpage mailbox data according to one embodiment;
Fig. 4 is a schematic diagram of the webpage mailbox data interaction of the present invention;
Fig. 5 is a schematic structural diagram of the crawling device for webpage mailbox data according to one embodiment;
Fig. 6 is a block diagram of part of the structure of a mobile phone related to the terminal provided by one embodiment.
Detailed description of the embodiments
The embodiments of the present invention are described in detail below. Examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements, or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the present invention and are not to be construed as limiting the claims.
Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said" and "the" used herein may also include the plural forms. It should be further understood that the word "comprising" used in the specification of the present invention refers to the presence of the stated features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or combinations thereof.
Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by those of ordinary skill in the art to which the present invention belongs. It should also be understood that terms such as those defined in general dictionaries should be understood to have a meaning consistent with their meaning in the context of the prior art and, unless specifically defined as here, will not be interpreted in an idealized or overly formal sense.
Referring to Fig. 2, Fig. 2 is a diagram of the application environment of an embodiment of the present invention. In this embodiment, the technical solution of the present invention can be implemented on a terminal device 110. In Fig. 2, the user installs a client on the terminal device 110 in a form such as an app and browses the relevant pages by logging in to the client; after the client crawls user-related information with the script file, it transmits the information to the server 120 over a communication network to realize the relevant functions. In the embodiment of the present invention, the user logs in to the client through the terminal device 110; the script file crawls the user's behavior information in the client, obtains matching target data according to the behavior information, and transmits the target data to the server 120; the server 120 parses the crawled target data and returns specific data information. The terminal device 110 here may be, but is not limited to, a smartphone, a tablet computer, a PC and the like; the server 120 here refers to a server device that implements various back-end functions.
In one embodiment, as shown in Fig. 3, which is a flowchart of the crawling method for webpage mailbox data according to one embodiment, a crawling method for webpage mailbox data is proposed, which may specifically include the following steps:
S110: After the mailbox home page loads successfully, call the callback function of the browser, wherein the callback function includes an injected script file.
In this step, after the user logs in successfully, a page mask animation is opened on the mailbox home page, displaying a "crawling, loading" effect. Its main purpose is to cover the mailbox home page and prevent user operations from interfering with the automatic crawling.
The callback function is used here to access the mailbox login page because the callback belongs to a native control in the client system code, which can be understood as a simple browser within the system. When the mailbox login page is accessed, the callback is called once the page loads successfully. By that point the script file has been injected into the mailbox login page, so when the callback function is called, the injected JS code is running and detects the user's operations in real time.
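The relationship between page loading, the callback and the injected script can be modelled with a toy Python class. This is only a sketch of the control flow: `MiniWebView`, `inject`, `on_page_finished` and `load` are hypothetical names and do not correspond to any real WebView API, and the URL is a placeholder.

```python
from typing import Callable, List


class MiniWebView:
    """Toy stand-in for a native browser control; not a real WebView API."""

    def __init__(self) -> None:
        self._scripts: List[Callable[[str], None]] = []

    def inject(self, script: Callable[[str], None]) -> None:
        """Register a script to run when a page finishes loading."""
        self._scripts.append(script)

    def on_page_finished(self, url: str) -> None:
        """Callback invoked after a successful load; runs injected scripts in order."""
        for script in self._scripts:
            script(url)

    def load(self, url: str, success: bool = True) -> None:
        """Simulate loading a page; the callback fires only on success."""
        if success:
            self.on_page_finished(url)


log: List[str] = []
view = MiniWebView()
view.inject(lambda url: log.append(f"crawl script running on {url}"))
view.load("https://mail.example.com/home")
print(log)
```

The point of the model is that the injected script never runs on its own: it runs only because the load-success callback invokes it.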
S120: Obtain specified search information, and crawl the mail data of the mailbox home page with the crawl script of the script file to obtain crawled data corresponding to the search information.
In this step, once step S110 has called the callback function of the browser (which includes the injected script file) after the mailbox home page loaded successfully, the specified search information is obtained, and the mail data of the mailbox home page is crawled with the crawl script of the script file to obtain crawled data corresponding to the search information.
In the above process, the crawl script in the injected script file crawls, from the mailbox home page, the data that matches the specified search information. For example, if the search information entered by the user is credit card bill data, then after the script file detects the information entered by the user, it crawls the mail data in the mailbox and looks for crawled data related to the credit card bill data.
It should be noted that the specified search information here may refer to specific search information entered by the user after logging in to the mailbox home page, or to saved search information that is queried periodically through back-end operation; the content of this saved search information can be modified by business personnel after they log in to the back-end operation page.
S130: After the crawling operation is completed, upload the crawled data to a server for parsing.
In this step, after the client completes the crawling operation, it uploads the crawled data to the server, and the server can perform in-depth text parsing on the crawled data.
For example, in the scenario of crawling mail to obtain credit card bill data, the server performs text parsing on the crawled data to obtain the credit card's bill data and returns the bill data to the mailbox home page, so that the user obtains the crawl result in time.
S140: Receive the parsing result returned by the server and display the parsing result, wherein the parsing result includes target data matching the search information.
In this step, after the server completes the parsing, the parsing result returned by the server is received and displayed in the mailbox home page; the displayed information includes the target data that matches the specified search information.
For example, in the scenario of crawling mail to obtain credit card bill data, the server performs text parsing on the crawled data to obtain the credit card's bill data and returns the bill data to the mailbox home page. The content loaded on the mailbox home page then includes the obtained credit card bill data, which may be arranged in chronological order or may be bill data obtained after statistics are compiled by payment type.
The bill data may cover bill information for daily living, housing payments, transportation and travel, food and drink, fashion and beauty, sports and health, education and recreation, communications and logistics, and other consumption.
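The two presentations mentioned above, chronological ordering and totals per payment type, can be sketched in a few lines of Python. The bill fields (`date`, `type`, `amount_cents`) and the sample values are hypothetical; the description does not define the structure of the bill data.

```python
from collections import defaultdict
from typing import Any, Dict, List

Bill = Dict[str, Any]


def sort_chronologically(bills: List[Bill]) -> List[Bill]:
    """Order bills by ISO-8601 date string, which sorts correctly as plain text."""
    return sorted(bills, key=lambda bill: bill["date"])


def totals_by_type(bills: List[Bill]) -> Dict[str, int]:
    """Sum amounts (in cents, to avoid floating-point rounding) per payment type."""
    totals: Dict[str, int] = defaultdict(int)
    for bill in bills:
        totals[bill["type"]] += bill["amount_cents"]
    return dict(totals)


# Hypothetical parsed bill data.
bills = [
    {"date": "2019-06-12", "type": "diet", "amount_cents": 4500},
    {"date": "2019-05-03", "type": "traffic", "amount_cents": 1200},
    {"date": "2019-06-01", "type": "diet", "amount_cents": 3000},
]
print([bill["date"] for bill in sort_chronologically(bills)])
print(totals_by_type(bills))
```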
With the above crawling method for webpage mailbox data, after the mailbox home page loads successfully, the callback function of the browser is called, wherein the callback function includes an injected script file; specified search information is obtained, and the mail data of the mailbox home page is crawled with the crawl script of the script file to obtain crawled data corresponding to the search information; after the crawling operation is completed, the crawled data is uploaded to a server for parsing; the parsing result returned by the server is received and displayed, wherein the parsing result includes target data matching the search information.
This scheme injects the script file into the callback function in advance. After the mailbox home page loads successfully, the callback function in the browser is called and the injected script file starts running; it crawls the mail data in the mailbox home page to obtain the crawled data corresponding to the specified search information. After the crawling operation ends, all crawled data is uploaded to the server together for parsing, and the parsing result returned by the server is the target data that matches the specified search information. Because the script file extracts user-related information directly on the client, the related content no longer has to be crawled entirely by the server, which prevents the server from being blocked for crawling data repeatedly. At the same time, because the script file injected into the browser's callback function performs the crawling in place of the server, the resources the server would consume while crawling data are saved.
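The division of labour between client and server in steps S110 to S140 can be summarized in a short Python sketch. Everything here is hypothetical scaffolding: mails are plain strings, matching is a case-insensitive substring test, and `fake_server` merely stands in for the server-side parser.

```python
from typing import Callable, Dict, List


def run_crawl_flow(
    mails: List[str],
    search_info: str,
    upload: Callable[[List[str]], Dict[str, object]],
    display: Callable[[Dict[str, object]], None],
) -> Dict[str, object]:
    """Crawl on the client, upload once, then display what the server returns."""
    needle = search_info.lower()
    crawled = [mail for mail in mails if needle in mail.lower()]  # client-side crawl
    result = upload(crawled)  # server-side parsing of the uploaded data
    display(result)           # show the parsing result to the user
    return result


def fake_server(crawled: List[str]) -> Dict[str, object]:
    """Hypothetical server stub: reports how many mails it was asked to parse."""
    return {"matched": len(crawled), "items": crawled}


shown: List[Dict[str, object]] = []
result = run_crawl_flow(
    ["Credit card statement for May", "Weekly newsletter"],
    "credit card statement",
    fake_server,
    shown.append,
)
print(result)
```

In this arrangement the server receives a single upload per crawl and never logs in to the mailbox itself, which is the property the scheme relies on to avoid being blocked.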
In one embodiment, before step S110 (after the mailbox home page loads successfully, calling the callback function of the browser, wherein the callback function includes an injected script file), the method may further include:
(1) Obtaining the script file from a specified directory according to the version information file of the client;
(2) Injecting the script file into the callback function of the browser.
In this step, the script file is obtained from the specified directory according to the client's version information file. It should be noted that the version information file here refers to the information file corresponding to the version number of the released software program; the software program here may be the mailbox office software installed on the client; and the specified directory here refers to the directory under the specified path determined from the storage path of the version information file.
As the above description shows, the script file can be obtained from the specified directory of the version information file according to the version information file of the mailbox installed on the client. The script file here may include multiple file types, such as the public script of the public function library, the Any Door public script, the crawl script and the login script.
1. Public script of the public function library: jquery-2.2.4.min.js; it can be called as the basic JS function library;
2. Public script of the Any Door script: native_RYM.js; it can be used for code interaction between the webpage and the client's iOS or Android system;
The Any Door script determines which system the current environment is and calls the WebView interface corresponding to that system, so iOS and Android can use the same crawl or login script;
3. Crawl script: crawl.js; it can be used to crawl webpage data; in the mail crawling scenario, this crawl script mainly crawls the mail data in the mailbox;
4. Login script: autologin.js; it can be used to log in automatically when the user's account and password have been saved.
The crawl script and login script above depend on the public scripts to execute, so when script files are injected, the public scripts are injected first.
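The ordering rule (public scripts before the scripts that depend on them) can be expressed as a small Python sketch. The file names are those listed in the description; the relative order of the two public scripts and the function name `injection_order` are assumptions made for illustration.

```python
from typing import List

# File names come from the description; the relative order of the two public
# scripts is an assumption made for this sketch.
PUBLIC_SCRIPTS = ("jquery-2.2.4.min.js", "native_RYM.js")


def injection_order(scripts: List[str]) -> List[str]:
    """Place public scripts first; keep the given order for all other scripts."""

    def rank(name: str) -> int:
        if name in PUBLIC_SCRIPTS:
            return PUBLIC_SCRIPTS.index(name)
        return len(PUBLIC_SCRIPTS)

    return sorted(scripts, key=rank)  # sorted() is stable, so ties keep input order


print(injection_order(["crawl.js", "native_RYM.js", "autologin.js", "jquery-2.2.4.min.js"]))
```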
After the script file is obtained, it is injected into the callback function of the browser. Further, the mailbox login page can be accessed through the callback function, and the user login information of the mailbox login page can be crawled by the login script of the script file.
The callback function is used here to access the mailbox login page because WebView is a native control in the client system code, which can be understood as a simple browser within the system. When the mailbox login page is accessed, the callback is called once the page loads successfully; if the user has previously saved the mailbox login account and password, the JS can find them and perform the crawl operation.
Preferably, the user login information is displayed, and when the user's confirmation of the user login information is received, the mailbox login page jumps to the mailbox home page.
In this step, the mailbox login page is accessed through the callback function; after the injected script file crawls the user login information from the mailbox login page, the user login information is displayed, and when the user's confirmation of it is received, the mailbox login page jumps to the mailbox home page.
In the above process, after the script file crawls the user login information saved by the user, the information is displayed; the user browses and confirms the displayed information, and after the user's confirmation is received, the page jumps to the mailbox home page.
Further, when the user has not saved user login information on the mailbox login page, there is no need to continue crawling relevant information with the script file; after the user manually enters the account and password and logs in successfully, the page jumps to the mailbox home page, and the back end may also ask the user whether to save the account and password just entered.
In one embodiment, before the step of obtaining the script file from the specified directory according to the version information file of the client, the method may further include:
(1) Requesting a communication interface, and obtaining an updated version information file through the communication interface;
(2) Comparing the updated version information file with the original version information file;
(3) When the comparison finds a difference, updating the original version information file;
(4) When the comparison finds no difference, leaving the original version information file unchanged.
In the above steps, the latest version information file of the script is obtained by requesting the communication interface and compared with the locally configured version information file to judge whether an update is needed.
The communication interface here may be the Gp interface, which is the interface used between GSNs of different PLMNs. It adds a border gateway (BG) and a firewall, and the BG provides a border gateway routing protocol to complete communication between GPRS support nodes belonging to different PLMNs.
Further, version information file here can be saved with the document form of a json format, be saved Format is as follows: { " downloadUrl ": " ", " jsName ": " qq ", " md5 ": " 279f1dd1759c2c270ea2837f1121 ebfb","needUpdateVersion":false,"updatedJsPath":"","version":"1.1.3"}。
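A minimal sketch of how such a JSON version information file might be read, written in Python purely for illustration. The field names are taken from the example above; the `VersionInfo` type and the `parse_version_info` function are hypothetical names introduced here, not part of the described scheme.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionInfo:
    """Hypothetical in-memory form of the version information file."""
    download_url: str
    js_name: str
    md5: str
    need_update_version: bool
    updated_js_path: str
    version: str


def parse_version_info(text: str) -> VersionInfo:
    """Parse the JSON text of a version information file."""
    raw = json.loads(text)
    return VersionInfo(
        download_url=raw["downloadUrl"],
        js_name=raw["jsName"],
        md5=raw["md5"],
        need_update_version=raw["needUpdateVersion"],
        updated_js_path=raw["updatedJsPath"],
        version=raw["version"],
    )


# The sample mirrors the saved format shown in the text.
SAMPLE = (
    '{"downloadUrl":"","jsName":"qq",'
    '"md5":"279f1dd1759c2c270ea2837f1121ebfb",'
    '"needUpdateVersion":false,"updatedJsPath":"","version":"1.1.3"}'
)
info = parse_version_info(SAMPLE)
```

Keeping the parsed values in one immutable object makes the later version and md5 comparisons operate on named fields rather than on raw dictionary keys.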
In this embodiment, before the script file is obtained from the specified directory according to the client's version information file, the GP interface is requested to check for and obtain the most recently updated version information file. In this way an up-to-date script file can be obtained, which reduces the information mismatch caused by an outdated local script file at run time.
In one embodiment, the step of comparing the updated version information file with the original version information file may include:
(1) obtaining the version number of the updated version information file and the version number of the original version information file;
(2) comparing the version number of the updated version information file with the version number of the original version information file.
In the above steps, comparing the version numbers of the different version information files reveals whether the locally configured version information file needs to be updated.
It should be noted that different version information files have different version numbers. The version number serves as the identifier that distinguishes one version information file from another, and it is used to compare the new and old version information files in order to check whether the local version information file needs to be updated.
In this embodiment, the version number of the obtained updated version information file is compared with that of the local original version information file to determine whether the pulled file is actually a newer version. This avoids pulling an updated version information file to the local device when it does not differ from the original one, which would consume resources for nothing.
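The version number comparison can be sketched as follows. This is an illustration under assumptions: the text only states that version numbers are compared, so treating them as dot-separated integers (as in "1.1.3") and updating only when the remote number is strictly greater are choices made here. The function names are hypothetical.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '1.1.3' into (1, 1, 3)."""
    return tuple(int(part) for part in version.split("."))


def needs_update(local_version: str, remote_version: str) -> bool:
    """Return True when the remote version is newer than the local one.

    Comparing integer tuples avoids the string-comparison pitfall
    where '1.1.10' would sort before '1.1.3'.
    """
    return parse_version(remote_version) > parse_version(local_version)
```

With this sketch, a local file at 1.1.3 is updated when the interface reports 1.1.10, and left alone when the interface reports 1.1.3 again.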
In one embodiment, after the step of determining, according to the comparison result, whether the original version information file needs to be updated, the method may further include:
(1) when the original version information file needs to be updated, downloading and saving the updated version information file;
(2) comparing the check value of the updated version information file with the check value provided by the communication interface;
(3) determining, according to the comparison result, whether the updated version information file was downloaded correctly.
In this step, when the comparison result of the foregoing embodiment shows that the original version information file needs to be updated, the version information file is downloaded and saved, and its check value is then compared with the check value of the communication interface to ensure that the downloaded version information file is the correct one.
The check value may be an md5 value and the communication interface may be the GP interface. When an update is needed, the latest version information file is downloaded, and its md5 value is compared with the md5 value from the GP interface to judge whether the download is correct. When the download is determined to be correct, the version information is stored as configuration data.
Further, if the md5 value of the latest version information file does not match the md5 value from the GP interface, the downloaded version information file is incorrect. The file that was downloaded and saved is deleted and then downloaded again, until the download is correct. This prevents the terminal device 110 from being infected by a Trojan horse or similar malware contained in an incorrectly downloaded version information file.
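The download-and-verify loop described above might look like the following sketch. The `fetch` callable stands in for whatever actually performs the download and is hypothetical, as are the function names. The text describes retrying until the download is correct; the `max_attempts` cap is an assumption added here so that the loop always terminates.

```python
import hashlib
from typing import Callable, Optional


def md5_hex(data: bytes) -> str:
    """Return the hexadecimal md5 digest of the given bytes."""
    return hashlib.md5(data).hexdigest()


def download_verified(
    fetch: Callable[[], bytes],
    expected_md5: str,
    max_attempts: int = 3,
) -> Optional[bytes]:
    """Download via fetch() until the md5 matches, or give up.

    A payload whose digest does not match expected_md5 is discarded
    and the download is attempted again. Returns None if no attempt
    produced a matching payload.
    """
    for _ in range(max_attempts):
        data = fetch()
        if md5_hex(data) == expected_md5.lower():
            return data
    return None
```

Only a payload that passes the check is returned to the caller, so a corrupted or tampered file never reaches the step that stores configuration data.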
In one embodiment, the step of obtaining the script file from the specified directory according to the client's version information file may include:
(1) searching the specified directory for a configuration file, wherein the configuration file includes multiple script files and the configuration data corresponding to each script file;
(2) when the script file exists, obtaining the configuration data corresponding to the script file, and comparing the check value in the configuration data with the check value of the script file;
(3) obtaining the script file according to the comparison result.
In this embodiment, obtaining the script file from the specified directory according to the client's version information file may specifically include the following steps:
The configuration file is searched for under the specified data directory. If the specified directory is /data/ .../app_emailcrawl/, the xx.js file is searched for under that directory. If the script file exists, its configuration data is obtained, and the md5 value in the configuration data is compared with the md5 value of the script file under /data/. If the md5 value in the configuration data is correct, the script file under /data/ is obtained.
Further, if the md5 value of the configuration data is wrong, the script file does not exist under the specified data directory, or the script file under the specified data directory has no configuration data, the related data under the specified data directory is deleted, the script under Asset is copied, the configuration data is stored, and an external party is notified so that the situation can be handled through external intervention.
In the above embodiment, the check value of the script file obtained from the specified directory is compared with the check value in the configuration data. This prevents a wrong script file from being obtained and avoids wasting memory.
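The lookup and fallback logic can be sketched as below. This is an assumption-laden illustration: `load_script`, its parameters, and the idea of passing the bundled (Asset) script in as bytes are hypothetical, and the external notification step is left out. The sketch only shows the three failure cases named in the text (missing script, missing configuration data, md5 mismatch) leading to the bundled copy being restored.

```python
import hashlib
from pathlib import Path
from typing import Optional, Tuple


def md5_hex(data: bytes) -> str:
    """Return the hexadecimal md5 digest of the given bytes."""
    return hashlib.md5(data).hexdigest()


def load_script(
    data_dir: Path,
    js_name: str,
    configured_md5: Optional[str],
    bundled_script: bytes,
) -> Tuple[bytes, str]:
    """Return (script content, md5 to store as configuration data).

    The script in data_dir is used only when it exists, configuration
    data is present, and the md5 values match. Otherwise the script in
    data_dir is replaced by the bundled copy.
    """
    script_path = data_dir / f"{js_name}.js"
    if configured_md5 and script_path.is_file():
        content = script_path.read_bytes()
        if md5_hex(content) == configured_md5.lower():
            return content, configured_md5.lower()
    # Missing script, missing configuration data, or md5 mismatch:
    # overwrite with the bundled script and report its md5.
    data_dir.mkdir(parents=True, exist_ok=True)
    script_path.write_bytes(bundled_script)
    return bundled_script, md5_hex(bundled_script)
```

Returning the md5 alongside the content lets the caller store fresh configuration data in the same step, so the next lookup finds a consistent pair.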
In one embodiment, the search information includes credit card statement information;
In step S120, the step of obtaining specified search information, crawling the mail data of the mailbox home page through the crawl script of the script file, and obtaining crawl data corresponding to the search information may include:
(1) executing the script file according to the specified credit card statement information, wherein the script file includes a crawl script;
(2) using the crawl script to crawl mail data related to the credit card statement information from the mailbox home page;
(3) collecting statistics on the mail data obtained over multiple crawls by the crawl script, and obtaining, according to the statistical result, the crawl data corresponding to the credit card statement information.
In this embodiment, the specified search information is obtained and the crawl script in the script file is run according to that search information. The crawl script crawls mail data related to the search information from the mailbox home page, and the data content of the crawled mail data is then parsed further to obtain the target data that matches the search information entered by the user.
Further, while the script file performs the crawl, the crawling process may return a crawl status to the client. The crawl status may include the crawl progress, the percentage crawled, the remaining crawl time, and so on.
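As a rough illustration of crawling by search information while reporting status, the sketch below filters an in-memory list of mails by keyword and calls a progress callback after each mail. The `Mail` type, the `crawl_matching` function and the substring matching rule are all assumptions made for this example; the actual crawl script runs inside the page and is not specified at this level of detail.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Mail:
    """Hypothetical minimal view of one mail in the mailbox."""
    subject: str
    body: str


def crawl_matching(
    mails: List[Mail],
    keyword: str,
    report: Callable[[int, int], None],
) -> List[Mail]:
    """Collect mails mentioning keyword, reporting (done, total) each step."""
    needle = keyword.lower()
    total = len(mails)
    matches: List[Mail] = []
    for done, mail in enumerate(mails, start=1):
        if needle in mail.subject.lower() or needle in mail.body.lower():
            matches.append(mail)
        # The client can turn (done, total) into a percentage display.
        report(done, total)
    return matches
```

A caller could pass a `report` function that computes `100 * done / total` and forwards it to the loading animation shown over the mailbox home page.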
In one embodiment, as shown in Fig. 4, which is a schematic diagram of webpage mailbox data interaction in one embodiment, a method for crawling webpage mailbox data is provided, which may include:
The terminal device 110 injects the script file into the callback function of the browser. When the client opens the mailbox home page, the callback function is called and the script content of the script file is executed. The crawl script crawls the mail data in the mailbox home page, and the crawl data finally obtained is uploaded to the server 120 in a single batch. The server 120 performs deep text parsing on the crawl data and returns the parsing result to the mailbox home page for display.
In the above process, the server 120 only needs to receive the crawl data uploaded by the terminal device 110 and does not need to access the client repeatedly. This prevents the server from being blocked for accessing too frequently and further reduces the resource consumption of the server 120.
In one embodiment, as shown in Fig. 5, which is a schematic structural diagram of a device for crawling webpage mailbox data in one embodiment, a device for crawling webpage mailbox data is provided, comprising a calling module 210, a crawling module 220, a data transmission module 230 and an information display module 240, wherein:
The calling module 210 is configured to call the callback function of the browser after the mailbox home page loads successfully, wherein the callback function includes the injected script file.
In this module, after the user logs in successfully, a page mask animation is started on the mailbox home page, showing a loading effect such as "crawling, loading". Its main purpose is to cover the mailbox home page and prevent user operations from interfering with the automatic crawl.
The callback function is used here to access the mailbox login page. The callback function belongs to a native control in the client system code, which can be understood as a simple browser within the system. When the mailbox login page is accessed and the page loads successfully, the callback function is called. The script file has already been injected into the mailbox login page at that point, so when the callback function is called, the injected JS code is running and detects user operations in real time.
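The relationship between the page-load callback and the injected script can be illustrated with a toy model. Everything here is hypothetical: `MiniBrowser` is a stand-in for the native browser control, not a real API, and the URL prefix check is an assumed way of recognizing the mailbox home page. The sketch only shows that the script is run from inside the callback that fires when a page finishes loading.

```python
from typing import Callable, List


class MiniBrowser:
    """Hypothetical stand-in for the native browser control."""

    def __init__(self) -> None:
        self._load_callbacks: List[Callable[[str], None]] = []
        self.executed: List[str] = []  # scripts run so far

    def add_load_callback(self, callback: Callable[[str], None]) -> None:
        self._load_callbacks.append(callback)

    def evaluate_script(self, source: str) -> None:
        # A real control would run the JS in the page; here it is recorded.
        self.executed.append(source)

    def load(self, url: str) -> None:
        # Pretend the page loaded successfully, then fire the callbacks.
        for callback in self._load_callbacks:
            callback(url)


def make_injecting_callback(
    browser: MiniBrowser, script_source: str, home_url_prefix: str
) -> Callable[[str], None]:
    """Build a load callback that runs the script on the home page only."""

    def on_page_loaded(url: str) -> None:
        if url.startswith(home_url_prefix):
            browser.evaluate_script(script_source)

    return on_page_loaded
```

In this model, loading a login URL leaves `executed` empty, while loading a URL under the home page prefix records one execution of the injected script.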
The crawling module 220 is configured to obtain specified search information, crawl the mail data of the mailbox home page through the crawl script of the script file, and obtain crawl data corresponding to the search information.
In this module, after the calling module 210 has called the callback function of the browser following a successful load of the mailbox home page, wherein the callback function includes the injected script file, the specified search information is obtained, and the mail data of the mailbox home page is crawled through the crawl script of the script file to obtain crawl data corresponding to the search information.
In the above process, the crawl script in the injected script file crawls, from the mailbox home page, the crawl data that matches the specified search information. For example, if the search information entered by the user is credit card statement data, then after detecting the user's input the script file crawls the mail data in the mailbox and looks for crawl data related to the credit card statement data.
It should be noted that the specified search information here may be specific search information entered by the user after logging in to the mailbox home page, or saved search information that is queried on a schedule through a backend operation. The content of the saved search information can be modified by business personnel after they log in to the backend operation page.
The data transmission module 230 is configured to upload the crawl data to the server for parsing after the crawl operation is completed.
In this module, after the client completes the crawl operation, it uploads the crawl data to the server, and the server can perform deep text parsing on the crawl data.
For example, in a scenario where mail containing credit card statement data is crawled, the server performs text parsing on the crawled data to obtain the billing data of the credit card and returns the billing data to the mailbox home page, so that the user obtains the crawl result promptly.
The information display module 240 is configured to receive the parsing result returned by the server and display the parsing result, wherein the parsing result includes the target data that matches the search information.
In this module, after the server completes parsing, the parsing result returned by the server is received and displayed on the mailbox home page. The displayed information includes the target data that matches the specified search information.
For example, in the same credit card statement scenario, the server performs text parsing on the crawled data to obtain the billing data of the credit card and returns it to the mailbox home page. The content loaded on the mailbox home page then includes the obtained credit card billing data, which may be arranged in chronological order or may be the billing data obtained after aggregating by payment type.
For example, the billing data may cover daily living, housing payments, transportation and travel, food and dining, fashion and beauty, sports and health, education and entertainment, communication and logistics, and other consumption.
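The two presentation options mentioned above, chronological order and aggregation by payment type, can be sketched as follows. The `BillItem` type, the category labels and the use of integer cents for amounts are assumptions made for this illustration; the actual structure of the parsing result is not specified.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class BillItem:
    """Hypothetical single entry of parsed credit card billing data."""
    day: date
    category: str
    amount_cents: int  # integer cents avoid floating-point rounding


def sort_chronologically(items: List[BillItem]) -> List[BillItem]:
    """Return the items ordered from earliest to latest."""
    return sorted(items, key=lambda item: item.day)


def total_by_category(items: List[BillItem]) -> Dict[str, int]:
    """Sum the amounts of all items per payment category."""
    totals: Dict[str, int] = defaultdict(int)
    for item in items:
        totals[item.category] += item.amount_cents
    return dict(totals)
```

Either result could then be rendered on the mailbox home page, depending on which view the user prefers.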
The above device for crawling webpage mailbox data calls the callback function of the browser after the mailbox home page loads successfully, wherein the callback function includes the injected script file; obtains specified search information, and crawls the mail data of the mailbox home page through the crawl script of the script file to obtain crawl data corresponding to the search information; after the crawl operation is completed, uploads the crawl data to the server for parsing; and receives the parsing result returned by the server and displays it, wherein the parsing result includes the target data that matches the search information.
In this scheme, the script file is injected into the callback function in advance. After the mailbox home page loads successfully, the callback function in the browser is called and the injected script file starts running, crawling the mail data in the mailbox home page to obtain the crawl data corresponding to the specified search information. After the crawl operation finishes, all crawl data is uploaded to the server together for parsing, and the parsing result returned by the server is the target data that matches the specified search information. The scheme can extract user-related information directly on the client through the script file instead of relying entirely on the server to crawl the content, which avoids the server being blocked for crawling data repeatedly. At the same time, because the script file injected into the browser's callback function performs the crawl operation in place of the server, the resources the server would consume while crawling are saved.
For specific limitations on the device for crawling webpage mailbox data, refer to the limitations on the method for crawling webpage mailbox data above; they are not repeated here. The modules of the above device may be implemented in whole or in part by software, hardware or a combination thereof. Each module may be embedded in or independent of the processor of the terminal device in hardware form, or stored in the memory of the terminal device in software form, so that the processor can call and execute the operations corresponding to each module.
In one embodiment, a terminal is provided, comprising a memory 320 and a processor 380. The memory 320 stores computer-readable instructions which, when executed by the processor 380, cause the processor 380 to perform the steps of the method for crawling webpage mailbox data described in any one of the above embodiments.
Fig. 6 shows a block diagram of part of the structure of a mobile phone related to the terminal provided in an embodiment of the present invention. Referring to Fig. 6, the mobile phone includes components such as a radio frequency (Radio Frequency, RF) circuit 310, a memory 320, an input unit 330, a display unit 340, a sensor 350, an audio circuit 360, a wireless fidelity (wireless fidelity, WiFi) module 370, a processor 380 and a power supply 390. Those skilled in the art will understand that the mobile phone structure shown in Fig. 6 does not limit the mobile phone, which may include more or fewer components than illustrated, combine certain components, or use a different arrangement of components.
The components of the mobile phone are described in detail below with reference to Fig. 6:
The RF circuit 310 may be used to receive and send signals while sending and receiving information or during a call. In particular, after receiving downlink information from a base station, it passes the information to the processor 380 for processing; in addition, it sends uplink data to the base station. The memory 320 may be used to store computer-readable storage instructions and modules; the processor 380 runs the computer-readable storage instructions and modules stored in the memory 320 to execute the various functional applications and data processing of the mobile phone. The input unit 330 may be used to receive input digit or character information and to generate key signal input related to user settings and function control of the mobile phone. The display unit 340 may be used to display information entered by the user or provided to the user, as well as the various menus of the mobile phone.
The mobile phone may also include at least one sensor 350, such as an optical sensor, a motion sensor and other sensors. The audio circuit 360, a loudspeaker 361 and a microphone 362 can provide an audio interface between the user and the mobile phone. WiFi is a short-range wireless transmission technology; through the WiFi module 370 the mobile phone can help the user send and receive e-mail, browse web pages and access streaming media, providing the user with wireless broadband Internet access. The processor 380 is the control center of the mobile phone. It connects the various parts of the whole mobile phone through various interfaces and lines, and performs the various functions of the mobile phone and processes data by running or executing the computer-readable storage instructions and/or modules stored in the memory 320 and calling the data stored in the memory 320, thereby monitoring the mobile phone as a whole.
The mobile phone further includes a power supply 390 (such as a battery) that powers the components. Preferably, the power supply is logically connected to the processor 380 through a power management system, so that functions such as charging management, discharging management and power consumption management are implemented through the power management system.
In one embodiment, a storage medium is provided. When the computer-readable storage instructions and modules stored in the memory 320 are executed by the processor 380, they cause the processor 380 to implement the above method for crawling webpage mailbox data and the functions of the corresponding modules in the device for crawling webpage mailbox data of the embodiment shown in Fig. 5.
It should be understood that although the steps in the flowcharts of the drawings are shown in sequence as indicated by the arrows, these steps are not necessarily executed in the order the arrows indicate. Unless expressly stated otherwise herein, there is no strict order restriction on the execution of these steps, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment but may be executed at different times, and they are not necessarily executed sequentially but may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
The above are only some embodiments of the present invention. It should be noted that those of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A method for crawling webpage mailbox data, characterized by comprising the following steps:
after a mailbox home page loads successfully, calling a callback function of a browser, wherein the callback function includes an injected script file;
obtaining specified search information, and crawling the mail data of the mailbox home page through a crawl script of the script file to obtain crawl data corresponding to the search information;
after the crawl operation is completed, uploading the crawl data to a server for parsing;
receiving the parsing result returned by the server, and displaying the parsing result, wherein the parsing result includes target data that matches the search information.
2. The method according to claim 1, characterized in that, before the step of calling the callback function of the browser after the mailbox home page loads successfully, the method further comprises:
obtaining the script file from a specified directory according to a version information file of the client;
injecting the script file into the callback function of the browser.
3. The method according to claim 2, characterized in that, before the step of obtaining the script file from the specified directory according to the version information file of the client, the method further comprises:
requesting a communication interface, and obtaining an updated version information file through the communication interface;
comparing the updated version information file with the original version information file;
determining, according to the comparison result, whether the original version information file needs to be updated.
4. The method according to claim 3, characterized in that, after the step of determining, according to the comparison result, whether the original version information file needs to be updated, the method further comprises:
when the original version information file needs to be updated, downloading and saving the updated version information file;
comparing the check value of the updated version information file with the check value of the communication interface;
determining, according to the comparison result, whether the updated version information file was downloaded correctly.
5. The method according to claim 2, characterized in that the step of obtaining the script file from the specified directory according to the version information file of the client comprises:
searching the specified directory for a configuration file, wherein the configuration file includes multiple script files and configuration data corresponding to the script files;
when the script file exists, obtaining the configuration data corresponding to the script file, and comparing the check value of the configuration data with the check value of the script file;
obtaining the script file according to the comparison result.
6. The method according to claim 1, characterized in that the search information includes credit card statement information;
the step of obtaining specified search information, crawling the mail data of the mailbox home page through the crawl script of the script file, and obtaining crawl data corresponding to the search information comprises:
executing the script file according to the specified credit card statement information, wherein the script file includes the crawl script;
crawling, by using the crawl script, mail data related to the credit card statement information from the mailbox home page;
collecting statistics on the mail data obtained over multiple crawls by the crawl script, and obtaining, according to the statistical result, the crawl data corresponding to the credit card statement information.
7. The method according to claim 6, characterized in that the step of receiving the parsing result returned by the server and displaying the parsing result, wherein the parsing result includes target data that matches the search information, comprises:
obtaining the parsing result returned by the server, wherein the parsing result includes billing data that matches the credit card statement information;
displaying the billing data on the mailbox home page.
8. A device for crawling webpage mailbox data, characterized by comprising:
a calling module, configured to call a callback function of a browser after a mailbox home page loads successfully, wherein the callback function includes an injected script file;
a crawling module, configured to obtain specified search information, and crawl the mail data of the mailbox home page through a crawl script of the script file to obtain crawl data corresponding to the search information;
a data transmission module, configured to upload the crawl data to a server for parsing after the crawl operation is completed;
an information display module, configured to receive the parsing result returned by the server and display the parsing result, wherein the parsing result includes target data that matches the search information.
9. A terminal, characterized by comprising a memory and a processor, wherein the memory stores computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the method for crawling webpage mailbox data according to any one of claims 1 to 7.
10. A storage medium, characterized in that the storage medium stores computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the method for crawling webpage mailbox data according to any one of claims 1 to 7.
CN201910522340.3A 2019-06-17 2019-06-17 Crawling method, device, terminal and the storage medium of webpage mailbox data Pending CN110390043A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910522340.3A CN110390043A (en) 2019-06-17 2019-06-17 Crawling method, device, terminal and the storage medium of webpage mailbox data
PCT/CN2020/086228 WO2020253366A1 (en) 2019-06-17 2020-04-22 Webpage mailbox data crawling method and apparatus, terminal, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910522340.3A CN110390043A (en) 2019-06-17 2019-06-17 Crawling method, device, terminal and the storage medium of webpage mailbox data

Publications (1)

Publication Number Publication Date
CN110390043A true CN110390043A (en) 2019-10-29

Family

ID=68285418

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910522340.3A Pending CN110390043A (en) 2019-06-17 2019-06-17 Crawling method, device, terminal and the storage medium of webpage mailbox data

Country Status (2)

Country Link
CN (1) CN110390043A (en)
WO (1) WO2020253366A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112073258A (en) * 2020-08-06 2020-12-11 深信服科技股份有限公司 Method for identifying user, electronic equipment and storage medium
WO2020253366A1 (en) * 2019-06-17 2020-12-24 深圳壹账通智能科技有限公司 Webpage mailbox data crawling method and apparatus, terminal, and storage medium
CN112965933A (en) * 2021-03-16 2021-06-15 支付宝(杭州)信息技术有限公司 Business rule loading method, device and equipment
CN113742550A (en) * 2021-08-20 2021-12-03 广州市易工品科技有限公司 Data acquisition method, device and system based on browser

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115292642B (en) * 2022-07-29 2023-10-13 深圳市六度人和科技有限公司 Page display method and device, storage medium and computer equipment
CN115604216B (en) * 2022-09-30 2024-06-14 北京仁科互动网络技术有限公司 Service mail processing method and device based on SaaS mode
CN117113932A (en) * 2023-08-28 2023-11-24 北京规格委外技术有限公司 Multi-source estimation table data analysis method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104615771A (en) * 2015-02-13 2015-05-13 广州华多网络科技有限公司 Webpage data acquiring method and device
CN105335474A (en) * 2015-09-29 2016-02-17 广州酷狗计算机科技有限公司 Method and device for obtaining content
CN107689951A (en) * 2017-07-26 2018-02-13 上海壹账通金融科技有限公司 Web data crawling method, device, user terminal and readable storage medium storing program for executing
CN108509211A (en) * 2018-02-07 2018-09-07 深圳壹账通智能科技有限公司 Application program updating method, apparatus, mobile terminal and storage medium
CN109684192A (en) * 2018-08-21 2019-04-26 平安普惠企业管理有限公司 Local test method, equipment, storage medium and device based on data processing

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105956173A (en) * 2016-05-24 2016-09-21 百度在线网络技术(北京)有限公司 Page content acquisition method and apparatus
CN106886547A (en) * 2016-07-13 2017-06-23 阿里巴巴集团控股有限公司 A kind of scenario generation method and device
CN106649567A (en) * 2016-11-15 2017-05-10 杭州安恒信息技术有限公司 Web crawler system based on browser kernel
CN110390043A (en) * 2019-06-17 2019-10-29 深圳壹账通智能科技有限公司 Crawling method, device, terminal and the storage medium of webpage mailbox data

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020253366A1 (en) * 2019-06-17 2020-12-24 深圳壹账通智能科技有限公司 Webpage mailbox data crawling method and apparatus, terminal, and storage medium
CN112073258A (en) * 2020-08-06 2020-12-11 深信服科技股份有限公司 Method for identifying user, electronic equipment and storage medium
CN112073258B (en) * 2020-08-06 2022-09-30 深信服科技股份有限公司 Method for identifying user, electronic equipment and storage medium
CN112965933A (en) * 2021-03-16 2021-06-15 支付宝(杭州)信息技术有限公司 Business rule loading method, device and equipment
CN112965933B (en) * 2021-03-16 2023-07-25 支付宝(杭州)信息技术有限公司 Business rule loading method, device and equipment
CN113742550A (en) * 2021-08-20 2021-12-03 广州市易工品科技有限公司 Data acquisition method, device and system based on browser
CN113742550B (en) * 2021-08-20 2024-04-19 广州市易工品科技有限公司 Browser-based data acquisition method, device and system

Also Published As

Publication number Publication date
WO2020253366A1 (en) 2020-12-24

Similar Documents

Publication Publication Date Title
CN110390043A (en) Crawling method, device, terminal and the storage medium of webpage mailbox data
CN108319483B (en) Webpage processing method, device, terminal and storage medium
US8725794B2 (en) Enhanced website tracking system and method
CN106897215A (en) A kind of method gathered based on WebView webpages loading performance and user behavior flow data
CN111104635B (en) Method and device for generating form webpage
CN109672580A (en) Full link monitoring method, apparatus, terminal device and storage medium
CN102955694B (en) The client realization method of sing on web Kit browser and client
US20230308504A9 (en) Method and system of application development for multiple device client platforms
US9830139B2 (en) Application experience sharing system
KR101869133B1 (en) Method and apparatus for providing web pages
CN103645951A (en) Cross-platform mobile data management system and method
CN109948077A (en) User behavior data acquisition method, device, equipment and computer storage medium
CN103810176A (en) Pre-fetching accessing method and device of webpage information
CN108494762A (en) Web access method, device and computer readable storage medium, terminal
CN104504060A (en) File downloading method in browser, browser client side and device
CN112473131A (en) Method and device for realizing game running and computer readable storage medium
CN103607454B (en) The method that android system browser arranges privately owned proxy server
CN114745146A (en) Skip interception method and device, readable storage medium and equipment
CN107770377A (en) A kind of method of the establishment interactive voice mobile phone news client based on HTML5
CN110928547A (en) Public file extraction method, device, terminal and storage medium
WO2015188535A1 (en) Method and apparatus for inserting toolbar
CN113656091B (en) Method for realizing independent open program process, related device and equipment
CN108572985B (en) Page display method and device
CN113220296B (en) Android system interaction method and device
CN111625746B (en) Application page display method, system, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination