CN105721519A - Webpage data acquisition method, device and system - Google Patents

Webpage data acquisition method, device and system

Publication number
CN105721519A
Authority
CN
China
Prior art keywords
website information
acquisition
load
acquisition strategies
test
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410721389.9A
Other languages
Chinese (zh)
Other versions
CN105721519B (en)
Inventor
刘庆
黄华
殷贤君
张美德
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201410721389.9A (CN105721519B)
Priority to PCT/CN2015/095584 (WO2016086784A1)
Publication of CN105721519A
Application granted
Publication of CN105721519B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a webpage data acquisition method which may comprise, for example, the following steps: receiving a request for batch acquisition of data, wherein the request carries target website information; determining an acquisition strategy which corresponds to the target website information and with which target data can be acquired successfully, wherein the acquisition strategy is obtained by performing, on the target website information, a target data acquisition test that includes at least a synchronous loading test, and the acquisition strategy includes a synchronous loading mode or an asynchronous loading mode; and acquiring the target data in the webpage indicated by the target website information using the synchronous loading mode or the asynchronous loading mode set in the corresponding acquisition strategy. In addition, the invention discloses a webpage data acquisition device and a webpage data acquisition system.

Description

A webpage data acquisition method, device and system
Technical field
The present application relates to the Internet field, and in particular to a webpage data acquisition method, device and system.
Background
During the search engine optimization (SEO) of a website, accurately understanding the site's current overall optimization status gives rise to a need to collect data from third-party websites or platforms. The collected information is analyzed to formulate the next website optimization strategy.
At present, data from a third-party website or platform is mainly collected by loading its webpage data over the Internet. Webpage data is loaded in one of two modes: synchronous or asynchronous. In the synchronous loading mode, the request directly returns an HTML page. In the asynchronous loading mode, after the page is returned, JavaScript (JS, an interpreted scripting language) is loaded and changes the original page structure, thereby loading additional data. After the returned HTML page is obtained, it can be parsed to extract and separate the useful data, for example the title of a news item in the news channel of the Sina website.
Formulating a website optimization strategy requires a large amount of data, so the webpage data of third-party websites or platforms needs to be collected in batches. However, because different webpages may load their data in different modes, the only way to guarantee accurate collection results has been to use asynchronous loading uniformly. Because executing JS takes extra time, data that could have been loaded synchronously then consumes a large amount of additional hardware resources and time, and data acquisition efficiency is low.
Summary of the invention
In view of this, the purpose of the present application is to provide a webpage data acquisition method, device and system that improve data acquisition efficiency.
In a first aspect, an embodiment of the present application provides a webpage data acquisition method. For example, the method may include: receiving a request for batch acquisition of data, wherein the request carries target website information; determining an acquisition strategy that corresponds to the target website information and with which the target data can be acquired successfully, wherein the acquisition strategy is obtained by performing, on the target website information, a target data acquisition test that includes at least a synchronous loading test, and the acquisition strategy includes a synchronous loading mode or an asynchronous loading mode; and acquiring the target data in the webpage to which the target website information points, using the synchronous or asynchronous loading mode set in the corresponding acquisition strategy.
In a second aspect, an embodiment of the present application provides a webpage data acquisition device. For example, the device may include the following units. A request receiving unit may be used to receive a request for batch acquisition of data, wherein the request carries target website information. A strategy determining unit may be used to determine an acquisition strategy that corresponds to the target website information and with which the target data can be acquired successfully, wherein the acquisition strategy is obtained by performing, on the target website information, a target data acquisition test that includes at least a synchronous loading test, and the acquisition strategy includes a synchronous loading mode or an asynchronous loading mode. An acquisition unit may be used to acquire the target data in the webpage to which the target website information points, using the synchronous or asynchronous loading mode set in the corresponding acquisition strategy.
In a third aspect, an embodiment of the present application provides a webpage data acquisition system. For example, the system may include the following components. A client may be used to send a request for batch acquisition of data, wherein the request carries target website information. An acquisition strategy configuration server may be used to receive the request sent by the client and to determine an acquisition strategy that corresponds to the target website information carried in the request and with which the target data can be acquired successfully, wherein the acquisition strategy is obtained by performing, on the target website information, a target data acquisition test that includes at least a synchronous loading test, and the acquisition strategy includes a synchronous loading mode or an asynchronous loading mode. The server may further generate an acquisition task for acquiring the target data in the webpage to which the target website information points, using the synchronous or asynchronous loading mode set in the corresponding acquisition strategy, and distribute the acquisition task to an acquisition server in an acquisition server cluster. The acquisition server cluster may be used to receive the acquisition tasks distributed by the acquisition strategy configuration server, execute them, and feed back the collected target data.
It can be seen that the present application has the following beneficial effects:
After a request for batch acquisition of data is received, the embodiments of the present application determine, from the target website information carried in the request, a corresponding acquisition strategy with which the target data can be acquired successfully, and this strategy is obtained by performing on the target website information a target data acquisition test that includes at least a synchronous loading test. Therefore, if the target data can be collected from the webpage corresponding to the target website information in the synchronous loading mode, the loading mode contained in the strategy obtained by the test can be the synchronous loading mode. Data is then collected in the synchronous loading mode set in the strategy, so data that can be loaded synchronously is not loaded asynchronously, and the extra consumption of resources and time is avoided. The embodiments of the present application can therefore effectively improve data acquisition efficiency while ensuring that the target data is acquired successfully.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a webpage data acquisition method disclosed in an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a webpage data acquisition device disclosed in an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a webpage data acquisition system disclosed in an embodiment of the present application.
Detailed description
To help those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without creative effort shall fall within the protection scope of the present application.
Generally, because executing JS takes extra time, loading a page of the same structure without executing JS improves efficiency to some extent. Based on this principle, if the loading mode of the page data can be effectively analyzed and tested before batch acquisition, with the test including at least a synchronous loading test, then website information whose target data can be loaded synchronously can be distinguished from website information whose target data must be loaded asynchronously, and a corresponding acquisition strategy with which the target data can be acquired successfully can be set. During batch acquisition, data can then be collected in the synchronous or asynchronous loading mode set in the acquisition strategy corresponding to the target website information. Data that can be loaded synchronously is not loaded asynchronously, which avoids the extra consumption of resources and time and effectively improves data acquisition efficiency.
For example, Fig. 1 is a schematic flowchart of a webpage data acquisition method provided by an embodiment of the present application. As shown in Fig. 1, the method may include:
S110: Receive a request for batch acquisition of data, wherein the request carries target website information.
For example, the received batch acquisition request may carry the batch acquisition configuration information that the user entered on a front-end page. Suppose the goal is to batch-acquire the search result data returned by the 1688 website's search page for different keywords. The batch acquisition configuration information may then include the target website information "http://s.1688.com/selloffer/offer_search.htm?Keywords=${keyword}&button_click=top&n=y", where ${keyword} can be replaced with different keywords. The HTML tag of the target data can be configured as "id:breadCrumbText|class[0]:sm-navigatebar-count|text", which means: take the plain text under the first element of class sm-navigatebar-count below the HTML tag whose id is breadCrumbText. The batch acquisition configuration information can also be expressed as an XPath description; the present application does not limit this. It should be understood that the user may also optionally configure other parameters in the batch acquisition configuration information as needed; the present application does not limit this either.
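The keyword substitution described above can be sketched as follows. This is a minimal Python illustration, not part of the application: the helper name `expand_urls` and the sample keywords are hypothetical, and URL-encoding each keyword with `quote_plus` is an assumption that the source does not state.

```python
from urllib.parse import quote_plus

# URL template from the example; ${keyword} marks where each keyword goes.
URL_TEMPLATE = (
    "http://s.1688.com/selloffer/offer_search.htm"
    "?Keywords=${keyword}&button_click=top&n=y"
)


def expand_urls(template, keywords):
    """Return one concrete URL per keyword (hypothetical helper)."""
    # Assumption: each keyword is URL-encoded before substitution.
    return [template.replace("${keyword}", quote_plus(kw)) for kw in keywords]


urls = expand_urls(URL_TEMPLATE, ["led lamp", "usb cable"])
for url in urls:
    print(url)
```

Each resulting URL is one page to load during testing or batch acquisition.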
In addition, if parameters need to be read from other files besides the batch acquisition configuration information submitted by the user, a mapping between each file that stores such parameters and its storage address also needs to be configured, so that the parameters can be read from the file according to the mapping when the data acquisition test is performed. For example, in the scenario of batch-acquiring the 1688 search results for different keywords, the keyword file submitted by the user can be downloaded from a specified address to the machine that performs the data acquisition test, and the mapping between the keyword file and its storage address is saved, for example "taskKeywordsFile": "/home/admin/1/test.txt". The keywords in the keyword file can then be read according to the mapping when the data acquisition test is performed.
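As a rough illustration of reading parameters through such a mapping, the sketch below loads keywords from the file that the "taskKeywordsFile" entry points to. The file layout (one keyword per line, UTF-8) and the helper name `load_keywords` are assumptions; the source does not describe the file format.

```python
import os
import tempfile


def load_keywords(mapping, key="taskKeywordsFile"):
    """Read keywords from the file that the mapping points to."""
    # Assumption: one keyword per line, UTF-8 encoded; blank lines are skipped.
    with open(mapping[key], encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]


# Demo: a temporary file stands in for the downloaded keyword file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("led lamp\n\nusb cable\n")
mapping = {"taskKeywordsFile": tmp.name}
keywords = load_keywords(mapping)
os.remove(tmp.name)
```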
S120: Determine an acquisition strategy that corresponds to the target website information and with which the target data can be acquired successfully, wherein the acquisition strategy is obtained by performing, on the target website information, a target data acquisition test that includes at least a synchronous loading test, and the acquisition strategy includes a synchronous loading mode or an asynchronous loading mode.
It should be noted that this acquisition strategy may be obtained in any of three ways. It may be obtained in advance, before the batch acquisition request is received, by performing on various different website information a target data acquisition test that includes at least a synchronous loading test. It may be obtained in real time, when the batch acquisition request for the target website information is received, by performing such a test on that target website information. Or it may be obtained by performing such a test again after a strategy obtained by an earlier test has been determined to be invalid.
For example, in an embodiment in which the test is performed in advance on various different website information, test configuration information entered by a user on a front-end page may be received in advance. It mainly includes the different types of URLs to be tested, the HTML tags used to identify the target data, and so on. After the website information to be tested and the corresponding HTML tags for identifying the target data are determined, a target data acquisition test that gives priority to the synchronous loading mode can be performed to obtain the acquisition strategy corresponding to each type of URL.
In some possible embodiments, an acquisition strategy obtained by testing in advance can be stored in a database as a historical acquisition strategy, so that when a batch acquisition request is received, the corresponding historical acquisition strategy can be extracted from the database and used for data acquisition.
Of course, before the historical acquisition strategy corresponding to the target website information is extracted, it may first be determined whether a historical acquisition strategy exists for the target website information carried in the request. If none exists, a target data acquisition test that gives priority to the synchronous loading mode can be performed on the target website information to obtain a corresponding acquisition strategy with which the target data can be acquired successfully. This strategy includes a synchronous loading mode or an asynchronous loading mode and is saved as the historical acquisition strategy corresponding to the target website information.
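The lookup-then-test behavior described above can be sketched as follows. This is only an illustration under stated assumptions: an in-memory dict stands in for the database, and `get_strategy` and `stub_test` are hypothetical names.

```python
def get_strategy(site_info, history, run_acquisition_test):
    """Return the strategy for site_info, testing and saving it if missing."""
    strategy = history.get(site_info)
    if strategy is None:
        # No historical strategy yet: run the sync-first acquisition test.
        strategy = run_acquisition_test(site_info)
        history[site_info] = strategy  # save as the historical strategy
    return strategy


# Demo with a dict as the "database" and a stub in place of the real test.
test_calls = []


def stub_test(site_info):
    test_calls.append(site_info)
    return {"crawlType": "sync"}


history = {}
first = get_strategy("http://example.com/search?q=${keyword}", history, stub_test)
second = get_strategy("http://example.com/search?q=${keyword}", history, stub_test)
```

The second call is served from the stored strategy, so the test runs only once.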
In some possible embodiments, after the historical acquisition strategy corresponding to the target website information carried in the request is extracted, the historical acquisition strategy can be used directly as the acquisition strategy with which the target data corresponding to the target website information can be acquired successfully.
In other possible embodiments, it is taken into account that the loading mode of the page data of a third-party website or platform may change: a URL whose target data could originally be acquired by synchronous loading may become a URL that can only be loaded asynchronously. Therefore, after the historical acquisition strategy corresponding to the target website information is extracted, a small-scale test can also be performed to verify whether the existing historical acquisition strategy can still be used.
For example, the small-scale test may include: determining, according to a preset small-scale test rule, the HTML tag used to identify the small-scale test data and the website information to be tested among the target website information; and attempting to collect the small-scale test data in the webpages to which the website information to be tested points, according to the historical acquisition strategy corresponding to the target website information and the HTML tag used to identify the small-scale test data. If collection succeeds, the historical acquisition strategy can be determined to be the acquisition strategy with which the target data corresponding to the target website information can be acquired successfully, and formal batch acquisition is carried out. If collection does not succeed, a target data acquisition test that includes at least a synchronous loading test can be performed on the target website information to obtain a corresponding acquisition strategy with which the target data can be acquired successfully, and the historical acquisition strategy corresponding to the target website information is updated with the strategy obtained.
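A minimal sketch of this verification step is shown below. The function and parameter names are hypothetical, and treating the check as passed only when every sampled URL succeeds is an assumption; the source does not say how partial success is handled.

```python
def verify_or_retest(history_strategy, sample_urls, try_collect, full_test):
    """Return (strategy, was_updated) after a small-scale check.

    try_collect(url, strategy) -> bool reports whether the small-scale test
    data could be collected; full_test() -> dict reruns the acquisition test.
    """
    # Assumption: the check passes only if every sampled URL succeeds.
    if all(try_collect(url, history_strategy) for url in sample_urls):
        return history_strategy, False
    return full_test(), True


# Demo: the stored "sync" strategy no longer works, so it is replaced.
stored = {"crawlType": "sync"}
strategy, was_updated = verify_or_retest(
    stored,
    ["http://example.com/a", "http://example.com/b"],
    try_collect=lambda url, s: s["crawlType"] == "async",
    full_test=lambda: {"crawlType": "async"},
)
```

A caller would write the returned strategy back to the store when `was_updated` is true.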
It should be noted that the embodiments of the present application do not limit how the preset small-scale test rule is implemented. For example, a small number of website information items to be tested can be selected from the target website information according to a fixed preset quantity or a certain reduction ratio. Take the scenario above of batch-acquiring the 1688 search results for different keywords. When the small-scale test is performed, the first 10 keywords can be extracted from the large number of keywords submitted by the user (if the user submitted fewer than 10 keywords, the actual number is used). Each keyword is substituted into the search keyword parameter of the user-configured website information, yielding 10 website information items to be tested. The test is then run on these 10 items using the HTML tag for identifying the target data and the historical acquisition strategy extracted from the database. The historical acquisition strategy may include parameters such as the loading mode (synchronous or asynchronous), the connection timeout, and the page acquisition timeout. In this scenario, the extracted historical acquisition strategy may take the form: "[{"url":"http://s.1688.com/selloffer/offer_search.htm?Keywords=${keyword}&button_click=top&n=y","keywordsPath":"/usr/group/seo/test.txt","conto":"5000","readto":"6000","crawlType":"sync"}]". If the small-scale test shows that collection is unsuccessful, the target data acquisition test that includes at least a synchronous loading test can be performed again on the user-configured target website information, the historical acquisition strategy corresponding to the target website information is updated with the strategy obtained, and formal batch acquisition of the target data is carried out based on the updated strategy.
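To make the example concrete, the sketch below parses a strategy in the format shown above and builds the small-scale test URLs from at most the first 10 keywords. The helper name `small_scale_urls` and the sample keywords are hypothetical, and the unit of "conto" and "readto" is not stated in the source.

```python
import json

# Historical strategy in the format shown in the example above.
HISTORY_JSON = (
    '[{"url":"http://s.1688.com/selloffer/offer_search.htm'
    '?Keywords=${keyword}&button_click=top&n=y",'
    '"keywordsPath":"/usr/group/seo/test.txt",'
    '"conto":"5000","readto":"6000","crawlType":"sync"}]'
)


def small_scale_urls(strategy, keywords, limit=10):
    """Build test URLs from at most the first `limit` keywords."""
    return [strategy["url"].replace("${keyword}", kw) for kw in keywords[:limit]]


strategy = json.loads(HISTORY_JSON)[0]
connect_timeout = int(strategy["conto"])  # unit not stated in the source
read_timeout = int(strategy["readto"])    # unit not stated in the source
sample = small_scale_urls(strategy, [f"kw{i}" for i in range(25)])
few = small_scale_urls(strategy, ["a", "b", "c"])  # fewer than 10 keywords
```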
It should be noted that the embodiments of the present application do not limit how the target data acquisition test that includes at least a synchronous loading test is performed on the target website information.
For example, in some possible embodiments, the test may include: loading the webpage to which the target website information points in the synchronous loading mode, and attempting to read the target data from the synchronously loaded webpage. For website information whose target data can be read from the synchronously loaded webpage, the loading mode in the acquisition strategy corresponding to that type of website information is set to the synchronous loading mode. For website information whose target data cannot be read from the synchronously loaded webpage, the loading mode in the acquisition strategy corresponding to that type of website information is set to the asynchronous loading mode.
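The synchronous-first decision can be sketched as follows. The loader and extractor are injected so that the example runs without network access; `extract_count`, the `<b id="count">` markup, and the stub pages are hypothetical stand-ins for a real fetch and a real HTML tag lookup.

```python
def choose_load_mode(url, load_sync, extract):
    """Return "sync" if target data is readable from the sync-loaded page."""
    page = load_sync(url)
    return "sync" if extract(page) is not None else "async"


def extract_count(page):
    """Toy extractor: text inside a hypothetical <b id="count"> tag, or None."""
    start, end = '<b id="count">', "</b>"
    i = page.find(start)
    if i < 0:
        return None
    j = page.find(end, i)
    return page[i + len(start):j] if j >= 0 else None


# Stub pages: one carries the data in its HTML, one would fill it in via JS.
PAGES = {
    "http://example.com/static": '<p><b id="count">42</b></p>',
    "http://example.com/dynamic": '<p><span id="slot"></span></p>',
}
mode_static = choose_load_mode("http://example.com/static", PAGES.get, extract_count)
mode_dynamic = choose_load_mode("http://example.com/dynamic", PAGES.get, extract_count)
```

As in the embodiment above, a failed synchronous read is taken to mean asynchronous loading is needed; the asynchronous path is not itself verified in this variant.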
As another example, in other possible embodiments, the webpage to which the target website information points can first be loaded in the asynchronous loading mode, and an attempt is made to read the target data from the asynchronously loaded webpage. The webpage is then loaded in the synchronous loading mode, and an attempt is made to read the target data from the synchronously loaded webpage. If the target data can be read from the synchronously loaded webpage, the loading mode in the acquisition strategy corresponding to that type of website information can be set to the synchronous loading mode. If the target data cannot be read from the synchronously loaded webpage but can be read from the asynchronously loaded webpage, the loading mode in the acquisition strategy corresponding to that type of website information can be set to the asynchronous loading mode.
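This two-pass variant can be sketched in the same style. The names are hypothetical, and returning `None` when neither mode yields the target data is an assumption, since the source does not cover that case.

```python
def choose_mode_two_pass(url, load_async, load_sync, extract):
    """Try asynchronous loading, then synchronous loading, and pick a mode."""
    async_ok = extract(load_async(url)) is not None
    sync_ok = extract(load_sync(url)) is not None
    if sync_ok:
        return "sync"   # synchronous loading is preferred when it works
    if async_ok:
        return "async"
    return None         # assumption: neither mode yields the target data


# Stub extractor: an empty page means the target data is missing.
def extract(page):
    return page or None


result_sync = choose_mode_two_pass("u", lambda u: "data", lambda u: "data", extract)
result_async = choose_mode_two_pass("u", lambda u: "data", lambda u: "", extract)
result_none = choose_mode_two_pass("u", lambda u: "", lambda u: "", extract)
```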
In some possible embodiments, it is taken into account that whether a webpage loads successfully is also affected by network stability, so the connection may need to be retried when it times out and the page may need to be read again when reading times out. Therefore, during the target data acquisition test that includes at least a synchronous loading test, the step of loading the webpage to which the target website information points in the synchronous loading mode can be performed multiple times. Each time it is performed, the time taken to establish the connection with the URL and the time taken to obtain the webpage after the connection is established are recorded. When the loading mode in the acquisition strategy corresponding to that type of website information is set to the synchronous loading mode, the connection timeout and the page acquisition timeout for the synchronous loading mode are set in the corresponding acquisition strategy according to the connection times and page acquisition times recorded over the multiple runs. Likewise, for website information whose target data cannot be read from the synchronously loaded webpage, the webpage can be loaded multiple times in the asynchronous loading mode, with the connection time and the page acquisition time recorded on each run. When the loading mode in the acquisition strategy corresponding to that type of website information is set to the asynchronous loading mode, the connection timeout and the page acquisition timeout can be set in the corresponding acquisition strategy according to the times recorded over the multiple asynchronous loads.
The present application does not limit how the connection timeout and the page acquisition timeout in the corresponding acquisition strategy are set from the connection times and page acquisition times recorded over the multiple runs. For example, the average of the recorded connection times can be taken as the connection timeout to be set, and the average of the recorded page acquisition times can be taken as the page acquisition timeout to be set. Other ways of calculating the connection timeout and the page acquisition timeout are of course also possible; the present application does not limit this.
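The averaging option mentioned above can be sketched as follows. The field names "conto" and "readto" are borrowed from the strategy example earlier in the description; the helper name, the sample timings, and rounding to whole numbers stored as strings are assumptions.

```python
from statistics import mean


def timeouts_from_runs(runs):
    """Derive timeouts from (connect_time, read_time) pairs, one per run."""
    connect_times = [connect for connect, _ in runs]
    read_times = [read for _, read in runs]
    # The description suggests taking the average of the recorded times.
    return {
        "conto": str(round(mean(connect_times))),
        "readto": str(round(mean(read_times))),
    }


# Three hypothetical test runs; the time unit is whatever was recorded.
runs = [(100, 1000), (200, 2000), (300, 3000)]
timeouts = timeouts_from_runs(runs)
```

A deployment might prefer a margin above the average so that typical requests do not time out; the source leaves the calculation open.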
In the above embodiments, because the connection timeout and the page acquisition timeout are set in the acquisition strategy, during subsequent batch acquisition a connection request can be issued again when the connection timeout set in the strategy is exceeded, and a page read request can be issued again when the page acquisition timeout set in the strategy is exceeded. In addition, an upper limit on the number of connection retries and an upper limit on the number of page read retries can also be set in the acquisition strategy, so that when the number of retries exceeds the limit, acquisition of the page data corresponding to that website information is abandoned.
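The retry behavior with upper limits can be sketched as follows. This is an illustration only: `connect` and `read` are injected callables that raise `TimeoutError` on timeout, and `flaky` is a hypothetical stub used to simulate an unstable network.

```python
def fetch_with_retries(connect, read, max_connect_retries, max_read_retries):
    """Return the page, or None if either retry limit is exceeded."""
    for _ in range(max_connect_retries + 1):
        try:
            conn = connect()
        except TimeoutError:
            continue  # connection timed out: issue the request again
        break
    else:
        return None  # connection retry limit exceeded: abandon this page
    for _ in range(max_read_retries + 1):
        try:
            return read(conn)
        except TimeoutError:
            continue  # page read timed out: read again
    return None  # read retry limit exceeded: abandon this page


def flaky(failures, result):
    """Stub that times out `failures` times, then returns `result`."""
    state = {"calls": 0}

    def call(*_args):
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TimeoutError("simulated timeout")
        return result

    return call


page_ok = fetch_with_retries(flaky(2, "conn"), flaky(1, "<html>ok</html>"), 2, 1)
page_abandoned = fetch_with_retries(flaky(3, "conn"), flaky(0, "<html>ok</html>"), 2, 1)
```

In the first call the connection succeeds on the third attempt and the read on the second; in the second call the connection fails three times with a limit of two retries, so the page is abandoned.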
S130: Acquire the target data in the webpage to which the target website information points, using the synchronous or asynchronous loading mode set in the acquisition strategy corresponding to the target website information.
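Step S130 amounts to dispatching on the loading mode stored in the strategy. The sketch below shows one way to do that; the loader functions are stubs standing in for a plain HTTP fetch and a JS-executing fetch, and all names other than the strategy fields from the earlier example are hypothetical.

```python
def acquire(url, strategy, loaders, extract):
    """Load url with the mode set in its strategy and extract target data."""
    load = loaders[strategy["crawlType"]]  # "sync" or "async"
    page = load(url, int(strategy["conto"]), int(strategy["readto"]))
    return extract(page)


# Stub loaders; real ones would fetch HTML, with or without executing JS.
used = []


def load_sync(url, connect_timeout, read_timeout):
    used.append("sync")
    return "sync page for " + url


def load_async(url, connect_timeout, read_timeout):
    used.append("async")
    return "async page for " + url


loaders = {"sync": load_sync, "async": load_async}
strategy = {"conto": "5000", "readto": "6000", "crawlType": "sync"}
data = acquire("http://example.com/", strategy, loaders, str.upper)
```

Only the loader named in the strategy is invoked, which is where the saving over always loading asynchronously comes from.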
It should be noted that the request may carry one or more items of target website information. The embodiments of the present invention can perform, on each type of website information, a target data acquisition test that includes at least a synchronous loading test, distinguish website information whose target data can be loaded synchronously from website information whose target data must be loaded asynchronously, and set the corresponding acquisition strategies with which the target data can be acquired successfully. For multiple different types of target website information, the target data in each webpage can be collected with the corresponding acquisition strategy. The test performed on each type of website information can be implemented with reference to the embodiments described above for testing the target website information, and is not repeated here.
It can be seen that, with the method provided by the embodiments of the present application, the acquisition strategy corresponding to the target website information is obtained by performing on the target website information a target data acquisition test that includes at least a synchronous loading test. During batch acquisition, data can therefore be collected in the synchronous or asynchronous loading mode set in the corresponding acquisition strategy, so data that can be loaded synchronously is not loaded asynchronously. This avoids the extra consumption of resources and time and effectively improves data acquisition efficiency. In addition, the present application records and analyzes the page connection and page read times and sets the connection timeout and page acquisition timeout in the acquisition strategy accordingly. During formal batch acquisition, the synchronous and asynchronous loading modes can thus be invoked appropriately according to the acquisition strategy, which maximizes acquisition efficiency while ensuring accurate data collection and avoids additional hardware resource and time consumption.
Corresponding with above-mentioned webpage data acquiring method, present invention also provides a kind of collecting webpage data Device.
For example, with reference to Fig. 2, a kind of collecting webpage data apparatus structure passed through for the embodiment of the present application shows It is intended to.As in figure 2 it is shown, this device may include that
Request reception unit 210, may be used for receive batch capture data request, wherein, described please Ask and carry target website information.Policy determining unit 220, is determined for described target network address letter Breath corresponding can the acquisition strategies of successful acquisition target data, wherein, described target website information is corresponding Acquisition strategies is especially by least including this target website information synchronizing to load the target data tested Collecting test obtains, and described acquisition strategies includes synchronizing load mode or Asynchronous loading mode.Collecting unit 230, may be used for according to the synchronization load mode arranged in acquisition strategies corresponding to described target website information Or Asynchronous loading mode, take corresponding load mode to gather in the webpage that described target website information is pointed to Target data.
In some possible embodiments, after the history acquisition strategy corresponding to the target website information carried in the request is extracted, the history acquisition strategy may be directly determined to be the acquisition strategy, corresponding to the target website information, with which target data can be successfully acquired. Accordingly, the policy determining unit 220 may be configured to extract the history acquisition strategy corresponding to the target website information, wherein the history acquisition strategy is obtained in advance by performing on that target website information a target data acquisition test that includes at least a synchronous loading test, and includes a synchronous loading mode or an asynchronous loading mode; and to determine that the history acquisition strategy is the acquisition strategy, corresponding to the target website information, with which target data can be successfully acquired.
In other possible embodiments, after the history acquisition strategy corresponding to the target website information is extracted, a small-scale test may also be carried out. For example, the policy determining unit 220 may include:
An extraction subunit 221, which may be configured to extract the history acquisition strategy corresponding to the target website information, wherein the history acquisition strategy is obtained in advance by performing on that target website information a target data acquisition test that includes at least a synchronous loading test, and includes a synchronous loading mode or an asynchronous loading mode.
A small-scale test determining subunit 222, which may be configured to determine, according to a preset small-scale test instruction, an HTML tag for identifying small-scale test data and the website information to be tested within the target website information.
A strategy test subunit 223, which may be configured to attempt, according to the history acquisition strategy corresponding to the target website information and the HTML tag for identifying small-scale test data, to acquire the small-scale test data in the webpage to which the website information to be tested points.
A strategy determining subunit 224, which may be configured to determine, if the acquisition succeeds, that the history acquisition strategy is the acquisition strategy, corresponding to the target website information, with which target data can be successfully acquired.
A test subunit 225, which may be configured to perform, if the acquisition fails, a target data acquisition test that includes at least a synchronous loading test on the target website information, so as to obtain a corresponding acquisition strategy with which target data can be successfully acquired.
An update subunit 226, which may be configured to update the history acquisition strategy corresponding to the target website information according to the acquisition strategy obtained by the test subunit.
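The small-scale test flow described above (reuse the history strategy if a small trial acquisition succeeds, otherwise re-test and update the history) might be sketched as follows. This is a minimal sketch under stated assumptions: `resolve_strategy`, the `history` dictionary, and the two callables are hypothetical stand-ins for the subunits, and a strategy is reduced to its loading mode for brevity.

```python
from typing import Callable, Dict


def resolve_strategy(
    url_key: str,
    history: Dict[str, str],
    small_scale_test: Callable[[str, str], bool],
    full_test: Callable[[str], str],
) -> str:
    """Return the loading mode to use for url_key, updating history if needed.

    small_scale_test(url_key, mode) reports whether a small trial acquisition
    with the history mode succeeded; full_test(url_key) re-runs the complete
    target data acquisition test and returns the mode it found.
    """
    mode = history.get(url_key)
    # Handling a missing history entry is an addition of this sketch.
    if mode is not None and small_scale_test(url_key, mode):
        return mode                # the history strategy still works
    mode = full_test(url_key)      # acquisition failed: test again
    history[url_key] = mode        # update the stored history strategy
    return mode
```

Keeping the trial and the full test as injected callables keeps the decision logic independent of how pages are actually loaded.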
It should be noted that the embodiments of the present application do not limit the specific manner in which the test subunit 225 obtains, through the target data acquisition test, the corresponding acquisition strategy with which target data can be successfully acquired. For example, in some possible embodiments, the test subunit 225 may include:
A synchronous loading subunit 2251, which may be configured to load, in the synchronous loading mode, the webpage to which the target website information points.
A target data reading subunit 2252, which may be configured to attempt to read the target data from the webpage obtained by synchronous loading.
A synchronous strategy setting subunit 2253, which may be configured to set, for website information whose target data can be read from the webpage obtained by synchronous loading, the loading mode in the acquisition strategy corresponding to website information of that type to the synchronous loading mode.
An asynchronous strategy setting subunit 2254, which may be configured to set, for website information whose target data cannot be read from the webpage obtained by synchronous loading, the loading mode in the acquisition strategy corresponding to website information of that type to the asynchronous loading mode.
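The decision made by subunits 2251 to 2254 can be illustrated with a short Python sketch that uses only the standard library. It assumes, purely for illustration, that the target data is the text of an element identified by an `id` attribute; `TargetReader`, `read_target` and `choose_load_mode` are hypothetical names, and `sync_load` stands in for whatever synchronous fetch is actually used.

```python
from html.parser import HTMLParser
from typing import Callable, List, Optional


class TargetReader(HTMLParser):
    """Collects the text inside the first element whose id equals target_id."""

    def __init__(self, target_id: str) -> None:
        super().__init__()
        self.target_id = target_id
        self.open_tag: Optional[str] = None  # tag name of the target while open
        self.depth = 0                       # nesting of same-named tags inside it
        self.done = False
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if self.open_tag is None and not self.done and dict(attrs).get("id") == self.target_id:
            self.open_tag, self.depth = tag, 1
        elif tag == self.open_tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.open_tag:
            self.depth -= 1
            if self.depth == 0:
                self.open_tag, self.done = None, True

    def handle_data(self, data):
        if self.open_tag is not None:
            self.chunks.append(data)


def read_target(html: str, target_id: str) -> Optional[str]:
    """Return the target text, or None if the static HTML does not contain it."""
    reader = TargetReader(target_id)
    reader.feed(html)
    reader.close()
    text = "".join(reader.chunks).strip()
    return text or None


def choose_load_mode(url: str, sync_load: Callable[[str], str], target_id: str) -> str:
    """Pick "sync" if the synchronously loaded page already holds the data."""
    html = sync_load(url)
    return "sync" if read_target(html, target_id) else "async"
```

A page whose target element is filled in later by a script yields an empty element in the synchronously loaded HTML, so the sketch falls back to the asynchronous loading mode for it.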
In some possible embodiments, considering that whether a webpage loads successfully is also affected by network stability, it may be necessary to retry the connection when the connection times out and to retry reading the page when reading the page times out. Therefore, the synchronous loading subunit 2251 may be configured to perform multiple times the step of loading, in the synchronous loading mode, the webpage to which the target website information points. In addition, the test subunit may further include:
A synchronous recording subunit 2255, which may be configured to record, each time the synchronous loading subunit performs loading, the time taken to establish a connection with the website and the time taken to obtain the webpage after the connection is established.
A synchronous timeout setting subunit 2256, which may be configured to set, when the loading mode in the acquisition strategy corresponding to website information of that type is set to the synchronous loading mode, the connection timeout and the page acquisition timeout corresponding to the synchronous loading mode in the corresponding acquisition strategy, according to the connection establishment times and the post-connection page acquisition times recorded while the synchronous loading subunit performed loading multiple times.
An asynchronous recording subunit 2257, which may be configured to load multiple times, in the asynchronous loading mode, the webpage pointed to by website information whose target data cannot be read from the webpage obtained by synchronous loading, and to record, each time, the time taken to establish a connection with the website and the time taken to obtain the webpage after the connection is established.
An asynchronous timeout setting subunit 2258, which may be configured to set, when the loading mode in the acquisition strategy corresponding to website information of that type is set to the asynchronous loading mode, the connection timeout and the page acquisition timeout in the corresponding acquisition strategy, according to the connection establishment times and the post-connection page acquisition times recorded while the webpage was loaded multiple times in the asynchronous loading mode.
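The embodiments state that the two timeouts are set according to the recorded times but do not give a formula. One plausible rule, shown here only as an assumption, is to take the slowest recorded time and multiply it by a safety margin; `derive_timeouts` and its `margin` parameter are hypothetical.

```python
from typing import Dict, Sequence


def derive_timeouts(
    connect_times: Sequence[float],
    page_times: Sequence[float],
    margin: float = 1.5,
) -> Dict[str, float]:
    """Derive timeouts (in seconds) from times recorded over repeated loads.

    Assumed rule: slowest observed time multiplied by a safety margin.
    """
    if not connect_times or not page_times:
        raise ValueError("at least one recorded load is required")
    return {
        "connect_timeout": max(connect_times) * margin,
        "page_timeout": max(page_times) * margin,
    }
```

Other rules, such as a high percentile of the recorded times, would fit the same description; the choice trades off wasted waiting against spurious retries.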
It should be noted that the extraction subunit 221, small-scale test determining subunit 222, strategy test subunit 223, strategy determining subunit 224, test subunit 225, update subunit 226, synchronous loading subunit 2251, target data reading subunit 2252, synchronous strategy setting subunit 2253, asynchronous strategy setting subunit 2254, synchronous recording subunit 2255, synchronous timeout setting subunit 2256, asynchronous recording subunit 2257 and asynchronous timeout setting subunit 2258 described in the embodiments of the present application are all drawn with dashed lines, to indicate that these units are not necessary units of the webpage data acquisition device provided by the present application.
Corresponding to the above webpage data acquisition method, the present application further provides a webpage data acquisition system for implementing the method.
For example, refer to Fig. 3, which is a schematic structural diagram of a webpage data acquisition system provided by an embodiment of the present application. As shown in Fig. 3, the system may include:
A client 310, which may be configured to send a request for batch data acquisition, wherein the request carries target website information.
An acquisition strategy configuration server 320, which may be configured to: receive the request for batch data acquisition, wherein the request carries target website information; determine an acquisition strategy, corresponding to the target website information, with which target data can be successfully acquired, wherein the acquisition strategy corresponding to the target website information is obtained by performing on that target website information a target data acquisition test that includes at least a synchronous loading test, and the acquisition strategy includes a synchronous loading mode or an asynchronous loading mode; generate an acquisition task for acquiring, using the synchronous loading mode or asynchronous loading mode set in the acquisition strategy corresponding to the target website information, the target data in the webpage to which the target website information points; and distribute the acquisition task to an acquisition server in the acquisition server cluster 330.
An acquisition server cluster 330, which may be configured to receive the acquisition task distributed by the acquisition strategy configuration server, execute the acquisition task, and feed back the acquired target data.
As can be seen, in the webpage data acquisition system provided by the embodiments of the present application, the acquisition strategy configuration server 320 can generate acquisition tasks in batches and distribute them, according to a preset distribution policy, to idle acquisition servers in the acquisition server cluster 330, so that the acquisition tasks can be executed concurrently, which further improves the efficiency of webpage data acquisition.
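The preset distribution policy is not specified in the embodiments. As one possible illustration, a round-robin assignment over the servers that are currently idle could look like the following sketch; `distribute` and the `server_busy` mapping are hypothetical.

```python
from itertools import cycle
from typing import Dict, List, Sequence


def distribute(tasks: Sequence[str], server_busy: Dict[str, bool]) -> Dict[str, List[str]]:
    """Assign tasks round-robin to the servers that are currently idle.

    Round-robin is an assumed policy; the embodiments only require that
    tasks go to idle acquisition servers according to a preset policy.
    """
    idle = [name for name, busy in server_busy.items() if not busy]
    if not idle:
        raise RuntimeError("no idle acquisition server available")
    assignment: Dict[str, List[str]] = {name: [] for name in idle}
    for task, name in zip(tasks, cycle(idle)):
        assignment[name].append(task)
    return assignment
```

Because each idle server receives its own task list, the lists can be executed concurrently, which is the effect the paragraph above describes.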
In some possible embodiments, the user may set batch acquisition configuration information on the client 310 and send, through the client 310, a request carrying the batch acquisition configuration information. The batch acquisition configuration information may include parameters such as the target website information. In the application scenario mentioned above, in which search result data for different search keywords on the 1688 website is acquired in batches, the acquisition strategy configuration server 320, in addition to obtaining the batch acquisition configuration information, also needs to download the keyword file submitted by the user to a specified address on the acquisition server in the acquisition server cluster 330 that is used to perform the data acquisition test. At the same time, it sets and saves the mapping between the keyword file and its storage address, for example, "taskKeywordsFile":"/home/admin/1/test.txt". This mapping is encapsulated in the test task and sent to the acquisition server together with the test task. When the acquisition server performs the data acquisition test, it can therefore read the keywords in the keyword file according to the mapping and expand them into the target website information used to search for the relevant page data.
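The keyword expansion step might be sketched as follows. The sketch assumes that the test task is a JSON object containing the `taskKeywordsFile` entry shown above, that the keyword file holds one keyword per line, and that a URL template with a `{keyword}` placeholder is available; the function names and the template are hypothetical and are not taken from the 1688 website.

```python
import json
from typing import Iterable, List
from urllib.parse import quote


def expand_keywords(lines: Iterable[str], url_template: str) -> List[str]:
    """Turn keyword lines into target URLs, skipping blank lines."""
    urls = []
    for line in lines:
        keyword = line.strip()
        if keyword:
            # Percent-encode the keyword so it is safe inside a URL.
            urls.append(url_template.format(keyword=quote(keyword)))
    return urls


def urls_for_task(task_json: str, url_template: str) -> List[str]:
    """Read the keyword file named in the task mapping and expand it."""
    task = json.loads(task_json)
    with open(task["taskKeywordsFile"], encoding="utf-8") as fh:
        return expand_keywords(fh, url_template)
```

Separating the pure expansion from the file access keeps the expansion logic easy to check without touching the file system.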
In other possible embodiments, the acquisition strategy configuration server 320 may include a strategy generation server 321, a test server 322 and a database server 323.
The strategy generation server 321 may be configured to generate, in advance, pre-test tasks for different types of website; submit the pre-test tasks to the test server 322; obtain from the database server 323 the loading mode, connection time, page acquisition time and the like recorded during the test; generate acquisition strategies corresponding to the different types of website according to the obtained loading mode, connection time and page acquisition time; and send the acquisition strategies corresponding to the different types of website to the database server 323, to be stored as history acquisition strategies.
The strategy generation server 321 may further be configured to receive the request sent by the client 310; obtain from the database server the history acquisition strategy corresponding to the target website information; generate a small-scale test task for testing the target website information on a small scale using the history acquisition strategy; and submit the small-scale test task to the test server 322. If the test acquisition succeeds, an acquisition task for acquiring the target data in the webpage to which the target website information points may be generated according to the history acquisition strategy. If the acquisition fails, a retry task for performing target data acquisition on the target website information is generated and submitted to the test server 322; the loading mode, connection time, page acquisition time and the like recorded during the retest are obtained from the database server 323; an updated acquisition strategy corresponding to the target website information is generated according to the obtained loading mode, connection time and page acquisition time; the updated acquisition strategy corresponding to the target website information is sent to the database server 323 so that the history acquisition strategy stored in the database is updated; and an acquisition task for acquiring the target data in the webpage to which the target website information points is generated according to the updated acquisition strategy. The generated acquisition task is distributed to an acquisition server in the acquisition server cluster 330 for execution.
The test server 322 may be configured to obtain pre-test tasks, small-scale test tasks and/or retry tasks from the strategy generation server 321; distribute the obtained pre-test tasks, small-scale test tasks and/or retry tasks to acquisition servers in the acquisition server cluster 330 for execution; collect the loading mode, connection time, page acquisition time and the like during execution of the test tasks; and save the collected loading mode, connection time, page acquisition time and the like to the database for use by the strategy generation server 321. The test server 322 may support both the synchronous loading mode and the asynchronous loading mode, wherein the synchronous loading mode may use httpclient + htmlparser for loading and page parsing, and the asynchronous loading mode may use webkit for loading and page parsing.
The database server 323 may be configured to store the loading mode, connection time, page acquisition time and the like collected by the test server 322, as well as the acquisition strategies generated by the strategy generation server 321.
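The role of the database server as a shared store between the test server and the strategy generation server can be illustrated with the following sketch. It uses Python's built-in `sqlite3` module with an in-memory database purely as a stand-in; the table name, column names and helper functions are hypothetical and do not reflect any schema given in the embodiments.

```python
import sqlite3
from typing import Optional, Tuple


def open_store() -> sqlite3.Connection:
    """Create an in-memory store for test measurements (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE test_records ("
        " url_type TEXT, load_mode TEXT,"
        " connect_seconds REAL, page_seconds REAL)"
    )
    return conn


def save_record(
    conn: sqlite3.Connection,
    url_type: str,
    load_mode: str,
    connect_seconds: float,
    page_seconds: float,
) -> None:
    """Test server side: save one measurement collected during a test task."""
    conn.execute(
        "INSERT INTO test_records VALUES (?, ?, ?, ?)",
        (url_type, load_mode, connect_seconds, page_seconds),
    )


def slowest_times(
    conn: sqlite3.Connection, url_type: str, load_mode: str
) -> Tuple[Optional[float], Optional[float]]:
    """Strategy generation side: read back the slowest recorded times."""
    return conn.execute(
        "SELECT MAX(connect_seconds), MAX(page_seconds) FROM test_records"
        " WHERE url_type = ? AND load_mode = ?",
        (url_type, load_mode),
    ).fetchone()
```

The values read back in this way are the kind of input from which connection and page acquisition timeouts could then be derived.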
In the above embodiments, the acquisition strategy configuration server 320 and the acquisition server cluster 330 may be deployed in different network systems. The database server 323 may be built on a MySQL database cluster. In addition, considering the volume of data, the database server 323 may be deployed in a distributed manner to provide good read performance.
It should be noted that the strategy generation server 321, the test server 322 and the database server 323 described in the embodiments of the present application are drawn with dashed lines in Fig. 2, to indicate that these units are not necessary servers of the acquisition strategy configuration server.
For convenience of description, the above device is described in terms of various units divided by function. Of course, when the present invention is implemented, the functions of the units may be realized in one or more pieces of software and/or hardware.
As can be seen from the above description of the embodiments, those skilled in the art can clearly understand that the present invention can be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes over the prior art, may be embodied in the form of a software product. The computer software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to execute the methods described in the embodiments of the present invention or in certain parts of the embodiments.
The embodiments in this specification are described in a progressive manner; for identical or similar parts among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiment is substantially similar to the method embodiment, it is described relatively briefly; for relevant parts, refer to the description of the method embodiment.
The present invention can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.
The present invention may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures and the like that perform particular tasks or implement particular abstract data types. The present invention may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or device that includes the element.
The above are only preferred embodiments of the present invention and are not intended to limit the scope of protection of the present invention. Any modification, equivalent replacement, improvement or the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (11)

1. A webpage data acquisition method, characterized by comprising:
receiving a request for batch data acquisition, wherein the request carries target website information;
determining an acquisition strategy, corresponding to the target website information, with which target data can be successfully acquired, wherein the acquisition strategy corresponding to the target website information is obtained by performing on the target website information a target data acquisition test that includes at least a synchronous loading test, and the acquisition strategy includes a synchronous loading mode or an asynchronous loading mode;
acquiring, using the synchronous loading mode or asynchronous loading mode set in the acquisition strategy corresponding to the target website information, the target data in the webpage to which the target website information points.
2. The method according to claim 1, characterized in that determining the acquisition strategy, corresponding to the target website information, with which target data can be successfully acquired comprises:
extracting a history acquisition strategy corresponding to the target website information, wherein the history acquisition strategy is obtained in advance by performing on the target website information a target data acquisition test that includes at least a synchronous loading test, and the history acquisition strategy includes a synchronous loading mode or an asynchronous loading mode;
determining that the history acquisition strategy is the acquisition strategy, corresponding to the target website information, with which target data can be successfully acquired.
3. The method according to claim 1, characterized in that determining the acquisition strategy, corresponding to the target website information, with which target data can be successfully acquired comprises:
extracting a history acquisition strategy corresponding to the target website information, wherein the history acquisition strategy is obtained in advance by performing on the target website information a target data acquisition test that includes at least a synchronous loading test, and the history acquisition strategy includes a synchronous loading mode or an asynchronous loading mode;
determining, according to a preset small-scale test instruction, an HTML tag for identifying small-scale test data and the website information to be tested within the target website information;
attempting, according to the history acquisition strategy corresponding to the target website information and the HTML tag for identifying small-scale test data, to acquire the small-scale test data in the webpage to which the website information to be tested points;
if the acquisition succeeds, determining that the history acquisition strategy is the acquisition strategy, corresponding to the target website information, with which target data can be successfully acquired;
if the acquisition fails, performing on the target website information a target data acquisition test that includes at least a synchronous loading test, obtaining a corresponding acquisition strategy with which target data can be successfully acquired, and updating the history acquisition strategy corresponding to the target website information according to the obtained acquisition strategy.
4. The method according to any one of claims 1 to 3, characterized in that performing on the target website information the target data acquisition test that includes at least a synchronous loading test comprises:
loading, in the synchronous loading mode, the webpage to which the target website information points; attempting to read the target data from the webpage obtained by synchronous loading; for website information whose target data can be read from the webpage obtained by synchronous loading, setting the loading mode in the acquisition strategy corresponding to website information of that type to the synchronous loading mode; and for website information whose target data cannot be read from the webpage obtained by synchronous loading, setting the loading mode in the acquisition strategy corresponding to website information of that type to the asynchronous loading mode.
5. The method according to claim 4, characterized in that the step of loading, in the synchronous loading mode, the webpage to which the target website information points is performed multiple times, and the method further comprises:
recording, each time the step is performed, the time taken to establish a connection with the website and the time taken to obtain the webpage after the connection is established; and, when the loading mode in the acquisition strategy corresponding to website information of that type is set to the synchronous loading mode, setting the connection timeout and the page acquisition timeout corresponding to the synchronous loading mode in the corresponding acquisition strategy according to the connection establishment times and the post-connection page acquisition times recorded during the multiple executions;
for website information whose target data cannot be read from the webpage obtained by synchronous loading, loading the webpage it points to multiple times in the asynchronous loading mode, and recording, each time, the time taken to establish a connection with the website and the time taken to obtain the webpage after the connection is established; and, when the loading mode in the acquisition strategy corresponding to website information of that type is set to the asynchronous loading mode, setting the connection timeout and the page acquisition timeout corresponding to the asynchronous loading mode in the corresponding acquisition strategy according to the connection establishment times and the post-connection page acquisition times recorded while the webpage was loaded multiple times in the asynchronous loading mode.
6. A webpage data acquisition device, characterized by comprising:
a request receiving unit, configured to receive a request for batch data acquisition, wherein the request carries target website information;
a policy determining unit, configured to determine an acquisition strategy, corresponding to the target website information, with which target data can be successfully acquired, wherein the acquisition strategy corresponding to the target website information is obtained by performing on the target website information a target data acquisition test that includes at least a synchronous loading test, and the acquisition strategy includes a synchronous loading mode or an asynchronous loading mode;
an acquisition unit, configured to acquire, using the synchronous loading mode or asynchronous loading mode set in the acquisition strategy corresponding to the target website information, the target data in the webpage to which the target website information points.
7. The device according to claim 6, characterized in that the policy determining unit is configured to extract a history acquisition strategy corresponding to the target website information, wherein the history acquisition strategy is obtained in advance by performing on the target website information a target data acquisition test that includes at least a synchronous loading test, and the history acquisition strategy includes a synchronous loading mode or an asynchronous loading mode; and to determine that the history acquisition strategy is the acquisition strategy, corresponding to the target website information, with which target data can be successfully acquired.
8. The device according to claim 6, characterized in that the policy determining unit comprises:
an extraction subunit, configured to extract a history acquisition strategy corresponding to the target website information, wherein the history acquisition strategy is obtained in advance by performing on the target website information a target data acquisition test that includes at least a synchronous loading test, and the history acquisition strategy includes a synchronous loading mode or an asynchronous loading mode;
a small-scale test determining subunit, configured to determine, according to a preset small-scale test instruction, an HTML tag for identifying small-scale test data and the website information to be tested within the target website information;
a strategy test subunit, configured to attempt, according to the history acquisition strategy corresponding to the target website information and the HTML tag for identifying small-scale test data, to acquire the small-scale test data in the webpage to which the website information to be tested points;
a strategy determining subunit, configured to determine, if the acquisition succeeds, that the history acquisition strategy is the acquisition strategy, corresponding to the target website information, with which target data can be successfully acquired;
a test subunit, configured to perform, if the acquisition fails, on the target website information a target data acquisition test that includes at least a synchronous loading test, so as to obtain a corresponding acquisition strategy with which target data can be successfully acquired;
an update subunit, configured to update the history acquisition strategy corresponding to the target website information according to the acquisition strategy obtained by the test subunit.
9. The device according to claim 8, characterized in that the test subunit comprises:
a synchronous loading subunit, configured to load, in the synchronous loading mode, the webpage to which the target website information points; a target data reading subunit, configured to attempt to read the target data from the webpage obtained by synchronous loading; a synchronous strategy setting subunit, configured to set, for website information whose target data can be read from the webpage obtained by synchronous loading, the loading mode in the acquisition strategy corresponding to website information of that type to the synchronous loading mode; and an asynchronous strategy setting subunit, configured to set, for website information whose target data cannot be read from the webpage obtained by synchronous loading, the loading mode in the acquisition strategy corresponding to website information of that type to the asynchronous loading mode.
10. The device according to claim 9, characterized in that the synchronous loading subunit is configured to perform multiple times the step of loading, in the synchronous loading mode, the webpage to which the target website information points;
and the test subunit further comprises:
a synchronous recording subunit, configured to record, each time the synchronous loading subunit performs loading, the time taken to establish a connection with the website and the time taken to obtain the webpage after the connection is established;
a synchronous timeout setting subunit, configured to set, when the loading mode in the acquisition strategy corresponding to website information of that type is set to the synchronous loading mode, the connection timeout and the page acquisition timeout corresponding to the synchronous loading mode in the corresponding acquisition strategy, according to the connection establishment times and the post-connection page acquisition times recorded while the synchronous loading subunit performed loading multiple times;
an asynchronous recording subunit, configured to load multiple times, in the asynchronous loading mode, the webpage pointed to by website information whose target data cannot be read from the webpage obtained by synchronous loading, and to record, each time, the time taken to establish a connection with the website and the time taken to obtain the webpage after the connection is established;
an asynchronous timeout setting subunit, configured to set, when the loading mode in the acquisition strategy corresponding to website information of that type is set to the asynchronous loading mode, the connection timeout and the page acquisition timeout in the corresponding acquisition strategy, according to the connection establishment times and the post-connection page acquisition times recorded while the webpage was loaded multiple times in the asynchronous loading mode.
11. A webpage data acquisition system, characterized by comprising:
a client, configured to send a request for batch data acquisition, wherein the request carries target website information;
an acquisition strategy configuration server, configured to: receive the batch data acquisition request sent by the client; determine the acquisition strategy, corresponding to the target website information carried in the request, with which the target data can be successfully acquired, wherein the acquisition strategy corresponding to the target website information is obtained by performing on that target website information a target data acquisition test that includes at least a synchronous loading test, and the acquisition strategy includes a synchronous loading mode or an asynchronous loading mode; generate an acquisition task for acquiring, using the synchronous or asynchronous loading mode set in the acquisition strategy corresponding to the target website information, the target data in the webpage pointed to by the target website information; and distribute the acquisition task to an acquisition server in an acquisition server cluster;
an acquisition server cluster, configured to receive the acquisition task distributed by the acquisition strategy configuration server, execute the acquisition task, and feed back the acquired target data.
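The flow in claim 11 (look up a pre-tested acquisition strategy for each target address, build an acquisition task, hand it to a server in the cluster) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: strategies are keyed by hostname, tasks are distributed round-robin, and all names (`Strategy`, `Task`, `StrategyConfigServer`, `handle_request`) are hypothetical.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Dict, List
from urllib.parse import urlparse


@dataclass(frozen=True)
class Strategy:
    load_mode: str          # "sync" or "async", as determined by the earlier test
    connect_timeout: float  # seconds
    page_timeout: float     # seconds


@dataclass(frozen=True)
class Task:
    url: str
    strategy: Strategy
    server: str  # acquisition server chosen to execute the task


class StrategyConfigServer:
    """Hypothetical stand-in for the acquisition strategy configuration server."""

    def __init__(self, strategies: Dict[str, Strategy], servers: List[str]):
        if not servers:
            raise ValueError("the acquisition server cluster must not be empty")
        self._strategies = strategies
        self._servers = cycle(servers)  # assumption: round-robin distribution

    def handle_request(self, urls: List[str]) -> List[Task]:
        """Turn a batch request into one task per target address."""
        tasks = []
        for url in urls:
            host = urlparse(url).hostname
            # Assumption: strategies are keyed by hostname; a missing key
            # (an address that was never tested) raises KeyError.
            strategy = self._strategies[host]
            tasks.append(Task(url, strategy, next(self._servers)))
        return tasks
```

An acquisition server receiving a `Task` would then load `task.url` in the mode named by `task.strategy.load_mode`, applying the two timeouts, and return the extracted target data.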
CN201410721389.9A 2014-12-02 2014-12-02 A kind of webpage data acquiring method, apparatus and system Active CN105721519B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201410721389.9A CN105721519B (en) 2014-12-02 2014-12-02 A kind of webpage data acquiring method, apparatus and system
PCT/CN2015/095584 WO2016086784A1 (en) 2014-12-02 2015-11-26 Method, apparatus and system for collecting webpage data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410721389.9A CN105721519B (en) 2014-12-02 2014-12-02 A kind of webpage data acquiring method, apparatus and system

Publications (2)

Publication Number Publication Date
CN105721519A true CN105721519A (en) 2016-06-29
CN105721519B CN105721519B (en) 2019-02-05

Family

ID=56090993

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410721389.9A Active CN105721519B (en) 2014-12-02 2014-12-02 A kind of webpage data acquiring method, apparatus and system

Country Status (2)

Country Link
CN (1) CN105721519B (en)
WO (1) WO2016086784A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106502802A (en) * 2016-10-12 2017-03-15 山东浪潮云服务信息科技有限公司 A kind of concurrent acquisition method in distributed high in the clouds transmitted based on Avro RPC
CN109658689A (en) * 2018-12-04 2019-04-19 沈阳世纪高通科技有限公司 A kind of information processing method and device
CN110134841A (en) * 2018-02-09 2019-08-16 鼎复数据科技(北京)有限公司 The customized real-time method for obtaining website data
CN113114505A (en) * 2021-04-13 2021-07-13 广州海鹚网络科技有限公司 httpClient-based access request processing method and system

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115630217A (en) * 2022-12-21 2023-01-20 广州市千钧网络科技有限公司 Method, device and equipment for loading information and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101136026A (en) * 2007-05-15 2008-03-05 北京聚生科技有限公司 Web page content capturing method based on XMLHTTP component technology
CN103049542A (en) * 2012-12-27 2013-04-17 北京信息科技大学 Domain-oriented network information search method
CN103092817A (en) * 2013-01-18 2013-05-08 五八同城信息技术有限公司 Data collection method and data collection device based on script engine
US20140280014A1 (en) * 2013-03-14 2014-09-18 Glenbrook Networks Apparatus and method for automatic assignment of industry classification codes

Also Published As

Publication number Publication date
WO2016086784A1 (en) 2016-06-09
CN105721519B (en) 2019-02-05

Similar Documents

Publication Publication Date Title
CN106503134B (en) Browser jumps to the method for data synchronization and device of application program
US10055762B2 (en) Deep application crawling
US11989247B2 (en) Indexing access limited native applications
US9251157B2 (en) Enterprise node rank engine
CN105721519A (en) Webpage data acquisition method, device and system
CN104765592B (en) A kind of plug-in management method and its device of object web page acquisition tasks
CN108304410A (en) A kind of detection method, device and the data analysing method of the abnormal access page
US20130191376A1 (en) Identifying related entities
CN107688568A (en) Acquisition method and device based on web page access behavior record
US20140317489A1 (en) Device-independent validation of website elements
CN110555146A (en) method and system for generating network crawler camouflage data
US20170277622A1 (en) Web Page Automated Testing Method and Apparatus
KR101556743B1 (en) Apparatus and method for generating poi information based on web collection
CN103577426B (en) For providing the method, apparatus and system of the additional application information that search is suggested
KR100987330B1 (en) A system and method generating multi-concept networks based on user's web usage data
US20170116336A1 (en) Synchronizing http requests with respective html context
Klerkx et al. How to share and reuse learning resources: the ARIADNE experience
CN117370203B (en) Automatic test method, system, electronic equipment and storage medium
CN105808623B (en) A kind of page access event correlation methodology and device based on search
CN104133762B (en) Method for testing software and test device
JP2019101889A (en) Test execution device and program
CN111966725A (en) Data acquisition method and device applied between internal network and external network and electronic equipment
Jia et al. Using the 5W+ 1H model in reporting systematic literature review: A case study on software testing for cloud computing
Kumar et al. A brief investigation on web usage mining tools (WUM)
US9218418B2 (en) Search expression generation system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant