CN105721519A - Webpage data acquisition method, device and system - Google Patents
- Publication number
- CN105721519A
- Authority
- CN
- China
- Prior art keywords
- website information
- acquisition
- load
- acquisition strategies
- test
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention discloses a webpage data acquisition method, which may comprise, for example, the following steps: receiving a request for batch data acquisition, the request carrying target website information; determining an acquisition strategy, corresponding to the target website information, with which the target data can be successfully acquired, wherein the acquisition strategy is obtained by performing on the target website information a target data acquisition test that includes at least a synchronous loading test, and the acquisition strategy specifies either a synchronous loading mode or an asynchronous loading mode; and acquiring the target data in the webpage indicated by the target website information using the synchronous or asynchronous loading mode set in the corresponding acquisition strategy. The invention further discloses a webpage data acquisition device and a webpage data acquisition system.
Description
Technical field
The present application relates to the field of Internet technology, and in particular to a webpage data acquisition method, apparatus and system.
Background
During the search engine optimization (SEO) of a website, accurately understanding the site's current overall optimization status often requires collecting data from third-party websites or platforms. The collected information is then analyzed to formulate the website's next-step strategy.
At present, data from third-party websites or platforms is mainly collected by loading their web pages over the Internet. Web data is loaded in one of two ways: synchronously or asynchronously. With synchronous loading, the HTML page is returned directly in response to the request. With asynchronous loading, after the page is returned, JS (JavaScript, an interpreted scripting language) is loaded and executed, which changes the page's original structure and adds further data to it. Once the returned HTML page has been obtained, it can be parsed to extract and separate out the useful data; for example, the title of a news item in the news channel of the Sina website can be extracted.
Because formulating the website strategy requires a large volume of data, web data from third-party websites or platforms must be acquired in batches. However, different web pages may load their data in different ways, so to guarantee accurate acquisition results, asynchronous loading has to be used uniformly for all pages. Since JS execution takes extra time, this consumes a large amount of additional hardware resources and time even for data that synchronous loading alone could have obtained, resulting in low data acquisition efficiency.
Summary of the invention
In view of this, the purpose of the present application is to provide a webpage data acquisition method, apparatus and system that improve data acquisition efficiency.
According to a first aspect of the embodiments of the present application, a webpage data acquisition method is provided. For example, the method may include: receiving a request for batch data acquisition, wherein the request carries target website information; determining an acquisition strategy, corresponding to the target website information, with which the target data can be successfully acquired, wherein the acquisition strategy is obtained by performing on the target website information a target data acquisition test that includes at least a synchronous loading test, and the acquisition strategy includes a synchronous loading mode or an asynchronous loading mode; and, according to the synchronous or asynchronous loading mode set in the acquisition strategy corresponding to the target website information, acquiring the target data in the webpage pointed to by the target website information using the corresponding loading mode.
According to a second aspect of the embodiments of the present application, a webpage data acquisition device is provided. For example, the device may include: a request receiving unit, which may be configured to receive a request for batch data acquisition, wherein the request carries target website information; a strategy determining unit, which may be configured to determine an acquisition strategy, corresponding to the target website information, with which the target data can be successfully acquired, wherein the acquisition strategy is obtained by performing on the target website information a target data acquisition test that includes at least a synchronous loading test, and the acquisition strategy includes a synchronous loading mode or an asynchronous loading mode; and an acquisition unit, which may be configured to acquire the target data in the webpage pointed to by the target website information using the synchronous or asynchronous loading mode set in the acquisition strategy corresponding to the target website information.
According to a third aspect of the embodiments of the present application, a webpage data acquisition system is provided. For example, the system may include: a client, which may be configured to send a request for batch data acquisition, wherein the request carries target website information; an acquisition strategy configuration server, which may be configured to receive the batch data acquisition request sent by the client, determine an acquisition strategy, corresponding to the target website information carried in the request, with which the target data can be successfully acquired (the acquisition strategy being obtained by performing on the target website information a target data acquisition test that includes at least a synchronous loading test, and including a synchronous loading mode or an asynchronous loading mode), generate an acquisition task for acquiring the target data in the webpage pointed to by the target website information using the synchronous or asynchronous loading mode set in that acquisition strategy, and distribute the acquisition task to an acquisition server in an acquisition server cluster; and the acquisition server cluster, which may be configured to receive the acquisition tasks distributed by the acquisition strategy configuration server, execute them, and return the collected target data.
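The cooperation among the three components of the system can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the system described in the application: the `Strategy` and `AcquisitionTask` records, the function names, and the round-robin assignment of tasks to acquisition servers are all hypothetical, since the text does not say how tasks are distributed.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Strategy:
    """Hypothetical acquisition strategy record."""
    crawl_type: str          # "sync" or "async"
    connect_timeout_ms: int
    read_timeout_ms: int


@dataclass(frozen=True)
class AcquisitionTask:
    url: str
    selector: str            # HTML tag expression identifying the target data
    strategy: Strategy


def build_tasks(urls: List[str], selector: str,
                strategy_for: Callable[[str], Strategy]) -> List[AcquisitionTask]:
    """Configuration server: one task per target URL, carrying its strategy."""
    return [AcquisitionTask(url, selector, strategy_for(url)) for url in urls]


def distribute(tasks: List[AcquisitionTask],
               servers: List[str]) -> Dict[str, List[AcquisitionTask]]:
    """Assign tasks to acquisition servers round-robin (assumed policy)."""
    if not servers:
        raise ValueError("the acquisition server cluster is empty")
    assignment: Dict[str, List[AcquisitionTask]] = {s: [] for s in servers}
    for task, server in zip(tasks, cycle(servers)):
        assignment[server].append(task)
    return assignment
```

Round-robin is only one plausible policy; a configuration server could equally weight servers by their current load.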
The present application thus has the following advantages:
After receiving a batch data acquisition request, the embodiments of the present application determine, according to the target website information carried in the request, the corresponding acquisition strategy with which the target data can be successfully acquired, and this strategy is obtained by a target data acquisition test, including at least a synchronous loading test, on that target website information. Therefore, if the target data of the webpage corresponding to the target website information can be collected in synchronous loading mode, the loading mode contained in the tested acquisition strategy can be synchronous, and the data is then collected in the synchronous loading mode set in the strategy. Data that synchronous loading alone can obtain is thus not loaded asynchronously, which avoids the extra consumption of resources and time. The embodiments of the present application therefore effectively improve data acquisition efficiency while ensuring that the target data is successfully acquired.
Brief description of the drawings
To illustrate the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a webpage data acquisition method disclosed in an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a webpage data acquisition device disclosed in an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a webpage data acquisition system disclosed in an embodiment of the present application.
Detailed description of the invention
To enable those skilled in the art to better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
Generally, because executing JS takes extra time, skipping JS execution for the same page structure improves execution efficiency to some extent. Based on this principle, if the loading mode of the page data is effectively analyzed through a test that includes at least a synchronous loading test before web data is acquired in batches, then website information whose target data can be loaded synchronously can be distinguished from website information whose target data must be loaded asynchronously, and a corresponding acquisition strategy that can successfully acquire the target data can be set for each. During batch acquisition, data can then be collected in the synchronous or asynchronous loading mode set in the acquisition strategy corresponding to the target website information, so that data obtainable by synchronous loading alone is not loaded asynchronously. This avoids the extra consumption of resources and time and effectively improves data acquisition efficiency.
Refer, for example, to Fig. 1, a schematic flowchart of a webpage data acquisition method provided by an embodiment of the present application. As shown in Fig. 1, the method may include the following steps.
S110: receive a request for batch data acquisition, wherein the request carries target website information.
For example, the received batch acquisition request may carry batch acquisition configuration information entered by the user on a front-end page. Suppose the goal is to collect, in batches, the search results returned by the 1688 site search page for different keywords. The batch acquisition configuration information may then include the target website information "http://s.1688.com/selloffer/offer_search.htm?keywords=${keyword}&button_click=top&n=y", in which ${keyword} is replaced with different keywords. The HTML tag identifying the target data may be configured as "id:breadCrumbText|class[0]:sm-navigatebar-count|text", meaning: under the HTML tag whose id is breadCrumbText, take the plain text of the first element with class sm-navigatebar-count. The batch acquisition configuration information may also be expressed in XPath; the present application does not limit this. It will be understood that users may also configure other parameters selectively according to their own needs, which the present application likewise does not limit.
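The tag expression in this example can be read as a small path language. The sketch below evaluates it against a page using Python's standard `xml.etree.ElementTree`. The step grammar (`id:`, `class[i]:`, `text`) is inferred from this single example and is an assumption, as is the requirement that the page be well-formed XHTML; real-world HTML would need a lenient parser. `extract_target` is a hypothetical name.

```python
import re
import xml.etree.ElementTree as ET

_CLASS_STEP = re.compile(r"class\s*\[(\d+)\]")


def extract_target(markup: str, selector: str) -> str:
    """Evaluate e.g. 'id:breadCrumbText|class[0]:sm-navigatebar-count|text'.

    Assumed grammar (inferred from one example):
      id:NAME      -> descend to the element whose id attribute is NAME
      class[i]:CLS -> descend to the i-th descendant carrying class CLS
      text         -> return the element's plain text
    """
    current = ET.fromstring(markup)  # raises ParseError unless well-formed
    for raw_step in selector.split("|"):
        kind, _, arg = raw_step.partition(":")
        kind, arg = kind.strip(), arg.strip()
        if kind == "id":
            found = next((el for el in current.iter() if el.get("id") == arg), None)
            if found is None:
                raise LookupError(f"no element with id={arg!r}")
            current = found
        elif (m := _CLASS_STEP.fullmatch(kind)):
            matches = [el for el in current.iter()
                       if el is not current and arg in el.get("class", "").split()]
            index = int(m.group(1))
            if index >= len(matches):
                raise LookupError(f"fewer than {index + 1} elements with class={arg!r}")
            current = matches[index]
        elif kind == "text":
            return "".join(current.itertext()).strip()
        else:
            raise ValueError(f"unsupported selector step: {raw_step!r}")
    # No 'text' step: return the selected element's markup instead.
    return ET.tostring(current, encoding="unicode")
```

For the XPath variant, ElementTree's limited XPath support covers the first step, e.g. `root.find(".//*[@id='breadCrumbText']")`.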
In addition, if, besides the batch acquisition configuration information submitted by the user, related parameters must be read from other files, a mapping between each such file and its storage address also needs to be configured, so that the parameters in the file can be read according to the mapping during the data acquisition test. For example, in the scenario of collecting search results from the 1688 site search page for different keywords, the keyword file submitted by the user can be downloaded from a specified address to the machine that performs the data acquisition test, and a mapping between the keyword file and its storage address is set and saved, e.g. "taskKeywordsFile": "/home/admin/1/test.txt", so that the keywords in the file can be read according to this mapping during the test.
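Reading the keyword file through the configured mapping and expanding the target URL template might look like the sketch below. Assumptions: the keyword file holds one UTF-8 keyword per line, keywords are percent-encoded as UTF-8 before substitution, and `load_keywords`/`expand_urls` are hypothetical names. Python's `string.Template` uses the same `${name}` placeholder syntax as the configured URL, so it can perform the substitution directly.

```python
from pathlib import Path
from string import Template
from urllib.parse import quote


def load_keywords(file_mapping: dict, key: str = "taskKeywordsFile") -> list:
    """Read one keyword per line from the file the mapping points at."""
    path = Path(file_mapping[key])
    lines = path.read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]  # skip blank lines


def expand_urls(url_template: str, keywords: list) -> list:
    """Substitute each percent-encoded keyword into the ${keyword} slot."""
    template = Template(url_template)
    return [template.substitute(keyword=quote(kw)) for kw in keywords]
```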
S120: determine the acquisition strategy, corresponding to the target website information, with which the target data can be successfully acquired, wherein the acquisition strategy corresponding to the target website information is obtained by performing on that target website information a target data acquisition test that includes at least a synchronous loading test, and the acquisition strategy includes a synchronous loading mode or an asynchronous loading mode.
It should be noted that this acquisition strategy may be obtained in advance, before the batch acquisition request is received, by performing target data acquisition tests (including at least a synchronous loading test) on various kinds of website information; it may be obtained in real time, when a batch acquisition request for the target website information is received, by performing such a test on that target website information; or it may be obtained by performing such a test again after the strategy obtained by advance testing has been determined to be invalid.
For example, in an embodiment in which target data acquisition tests including at least a synchronous loading test are performed in advance on various kinds of website information, test configuration information entered by the user on a front-end page may be received beforehand, mainly comprising the different types of URLs to be tested and the HTML tags identifying the target data. After the website information to be tested and the corresponding HTML tags identifying the target data have been determined, a synchronous-loading-first target data acquisition test can be performed to obtain an acquisition strategy for each type of URL.
In some possible embodiments, the acquisition strategies obtained by advance testing may be stored in a database as history acquisition strategies, so that when a batch acquisition request is received, the corresponding history acquisition strategy can be retrieved from the database and used for data acquisition.
Of course, before the history acquisition strategy corresponding to the target website information is retrieved, it may further be determined whether a history acquisition strategy exists for the target website information carried in the request. If none exists, a synchronous-loading-first target data acquisition test can be performed on the target website information to obtain a corresponding acquisition strategy that can successfully acquire the target data (the strategy including a synchronous loading mode or an asynchronous loading mode), and this strategy is saved as the history acquisition strategy for that target website information.
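The check-then-fallback logic for history strategies amounts to a cache lookup. In this minimal sketch a dict stands in for the strategy database and `run_acquisition_test` for the synchronous-loading-first test; both, and the function name, are hypothetical placeholders.

```python
from typing import Callable, MutableMapping


def get_or_test_strategy(url_type: str,
                         history: MutableMapping[str, dict],
                         run_acquisition_test: Callable[[str], dict]) -> dict:
    """Return the stored history strategy, or test and store one if missing."""
    strategy = history.get(url_type)
    if strategy is None:
        strategy = run_acquisition_test(url_type)  # sync-first acquisition test
        history[url_type] = strategy               # save as history strategy
    return strategy
```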
In some possible embodiments, after the history acquisition strategy corresponding to the target website information carried in the request has been retrieved, it is directly taken as the acquisition strategy, corresponding to the target website information, with which the target data can be successfully acquired.
In other possible embodiments, the loading mode of a third-party website's or platform's page data may change, so a URL whose target data could previously be acquired through synchronous loading may come to require asynchronous loading. Therefore, after the history acquisition strategy corresponding to the target website information has been retrieved, a small-scale test can also be performed to verify whether the existing history acquisition strategy can still be used.
For example, the small-scale test may include: determining, according to a preset small-scale test rule, the HTML tag identifying the small-scale test data and the website information within the target website information that needs to be tested; and, according to the acquisition strategy corresponding to the target website information and the HTML tag identifying the small-scale test data, attempting to collect the small-scale test data in the webpages pointed to by the website information to be tested. If the collection succeeds, the history acquisition strategy can be determined to be the acquisition strategy with which the target data can be successfully acquired, and formal batch acquisition proceeds. If the collection fails, a target data acquisition test including at least a synchronous loading test can be performed on the target website information to obtain a corresponding acquisition strategy that can successfully acquire the target data, and the history acquisition strategy corresponding to the target website information is updated with the newly obtained strategy.
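Adding the small-scale validation turns the lookup into a two-way decision: reuse the history strategy, or re-test and update it. A minimal sketch, assuming a hypothetical `small_scale_ok` callback that attempts to collect the small-scale test data with a given strategy and reports success; the other names are likewise illustrative.

```python
from typing import Callable, MutableMapping


def resolve_strategy(url_type: str,
                     history: MutableMapping[str, dict],
                     run_acquisition_test: Callable[[str], dict],
                     small_scale_ok: Callable[[dict], bool]) -> dict:
    """Reuse the history strategy only if it passes the small-scale test."""
    strategy = history.get(url_type)
    if strategy is not None and small_scale_ok(strategy):
        return strategy                          # still valid: batch-acquire with it
    strategy = run_acquisition_test(url_type)    # missing or stale: re-test
    history[url_type] = strategy                 # update the history strategy
    return strategy
```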
It should be noted that the embodiments of the present application do not limit the specific form of the preset small-scale test rule. For example, a small number of URLs to be tested may be selected from the target website information according to a fixed preset quantity or a certain reduction ratio. Taking again the scenario of collecting search results from the 1688 site search page for different keywords: during the small-scale test, the first 10 keywords can be extracted from the keywords submitted by the user (if the user submitted fewer than 10, all of them are used), and each is substituted into the search keyword parameter of the user-configured website information, yielding 10 URLs to be tested. The history acquisition strategy retrieved from the database is then applied to these 10 URLs, together with information such as the HTML tag identifying the target data, to run the test. A history acquisition strategy may include parameters such as the loading mode (synchronous or asynchronous), the connection timeout, and the page acquisition timeout. In this scenario, the retrieved history acquisition strategy may take the following form:
[{"url":"http://s.1688.com/selloffer/offer_search.htm?keywords=${keyword}&button_click=top&n=y","keywordsPath":"/usr/group/seo/test.txt","conto":"5000","readto":"6000","crawlType":"sync"}]
If the small-scale test shows that collection fails, a target data acquisition test including at least a synchronous loading test can be performed again on the user-configured target website information, the history acquisition strategy corresponding to the target website information is updated with the newly obtained strategy, and formal batch acquisition of the target data is then carried out based on the updated strategy.
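The stored strategy shown above is plain JSON, so it can be parsed and combined with the first-10-keywords rule as sketched below. Reading `conto` and `readto` as connection and read timeouts in milliseconds is an interpretation of the field names, not something the text states; the function names are hypothetical.

```python
import json
from string import Template
from urllib.parse import quote

STORED = ('[{"url":"http://s.1688.com/selloffer/offer_search.htm?keywords=${keyword}'
          '&button_click=top&n=y","keywordsPath":"/usr/group/seo/test.txt",'
          '"conto":"5000","readto":"6000","crawlType":"sync"}]')


def parse_history_strategy(raw: str) -> dict:
    entry = json.loads(raw)[0]
    return {
        "url_template": entry["url"],
        "keywords_path": entry["keywordsPath"],
        # Timeouts are stored as strings; milliseconds is an assumption.
        "connect_timeout_s": int(entry["conto"]) / 1000,
        "read_timeout_s": int(entry["readto"]) / 1000,
        "crawl_type": entry["crawlType"],  # "sync" or "async"
    }


def small_scale_urls(url_template: str, keywords: list, limit: int = 10) -> list:
    """First `limit` keywords (all of them if fewer) substituted into the template."""
    template = Template(url_template)
    return [template.substitute(keyword=quote(kw)) for kw in keywords[:limit]]
```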
It should be noted that the embodiments of the present application do not limit the specific implementation of the target data acquisition test, including at least a synchronous loading test, performed on the target website information.
For example, in some possible embodiments, the target data acquisition test including at least a synchronous loading test may include: loading the webpage pointed to by the target website information in synchronous loading mode and attempting to read the target data from the synchronously loaded page. For website information whose target data can be read from the synchronously loaded page, the loading mode in the acquisition strategy corresponding to that type of website information is set to synchronous; for website information whose target data cannot be read from the synchronously loaded page, the loading mode in the corresponding acquisition strategy is set to asynchronous.
As another example, in other possible embodiments, the webpage pointed to by the target website information may first be loaded in asynchronous loading mode and an attempt made to read the target data from the asynchronously loaded page; the webpage is then loaded in synchronous loading mode and an attempt made to read the target data from the synchronously loaded page. For website information whose target data can be read from the synchronously loaded page, the loading mode in the corresponding acquisition strategy can be set to synchronous. For website information whose target data cannot be read from the synchronously loaded page but can be read from the asynchronously loaded page, the loading mode in the corresponding acquisition strategy can be set to asynchronous.
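Both embodiments reduce to the same decision rule, sketched below with injected fetch and extract callables so that no network access is needed. `fetch_sync` (a plain HTTP fetch that executes no JS), `fetch_async` (a fetch that runs the page's JS, for instance in a headless browser) and `extract` are hypothetical. As a design choice, the sketch always tries synchronous loading first and loads asynchronously only when needed; this yields the same classification as the second embodiment's async-then-sync order while skipping the costly load for pages that do not need it.

```python
from typing import Callable, Optional

Fetch = Callable[[str], str]               # url -> page HTML
Extract = Callable[[str], Optional[str]]   # page HTML -> target data, or None


def classify_load_mode(url: str, fetch_sync: Fetch, fetch_async: Fetch,
                       extract: Extract, verify_async: bool = True) -> Optional[str]:
    """Synchronous-loading-first acquisition test for one URL type."""
    if extract(fetch_sync(url)) is not None:
        return "sync"        # target data present without running JS
    if not verify_async:
        return "async"       # first embodiment: sync failed, so use async
    if extract(fetch_async(url)) is not None:
        return "async"       # second embodiment: async confirmed to yield data
    return None              # neither mode yielded the target data
```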
In some possible embodiments, because whether a webpage loads successfully is also affected by network stability, it may be necessary to retry the connection when it times out and to retry reading the page when the read times out. Therefore, during the target data acquisition test including at least a synchronous loading test, the step of loading the webpage pointed to by the target website information in synchronous loading mode may be executed multiple times, and the method may further include: on each execution, recording the time taken to establish the connection to the URL and the time taken to obtain the page data after connecting; and, when the loading mode in the acquisition strategy for that type of website information is set to synchronous, setting the connection timeout and page acquisition timeout corresponding to the synchronous loading mode in that strategy according to the connection-establishment times and page-data acquisition times recorded over the multiple executions. Likewise, for website information whose target data cannot be read from the synchronously loaded page, its webpage can be loaded in asynchronous loading mode multiple times, recording on each execution the time taken to establish the connection to the URL and the time taken to obtain the page data after connecting. Then, when the loading mode in the acquisition strategy for that type of website information is set to asynchronous, the connection timeout and page acquisition timeout in that strategy can be set according to the times recorded over the multiple asynchronous loads.
The embodiments do not limit how the connection timeout and page acquisition timeout in the corresponding acquisition strategy are derived from the connection-establishment times and page-data acquisition times recorded over multiple executions. For example, the connection timeout can be set to the mean of the recorded connection-establishment times, and the page acquisition timeout to the mean of the recorded page-data acquisition times. Other ways of calculating the connection timeout and page acquisition timeout are of course possible, and the present application does not limit this.
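Recording the two durations and deriving the timeouts from their means could be sketched as follows. `time.perf_counter` supplies the timestamps; the `connect`/`fetch` callables, the function names, and the optional `margin` factor are assumptions, with `margin=1.0` reproducing the plain mean described in the text.

```python
import time
from statistics import mean
from typing import Callable, List, Tuple


def time_attempts(connect: Callable[[], object], fetch: Callable[[object], str],
                  runs: int) -> Tuple[List[float], List[float]]:
    """Load a page `runs` times, recording connect and fetch durations in ms."""
    connect_ms, fetch_ms = [], []
    for _ in range(runs):
        start = time.perf_counter()
        conn = connect()                  # establish the connection
        connected = time.perf_counter()
        fetch(conn)                       # obtain the page data
        done = time.perf_counter()
        connect_ms.append((connected - start) * 1000)
        fetch_ms.append((done - connected) * 1000)
    return connect_ms, fetch_ms


def timeouts_from_samples(connect_ms: List[float], fetch_ms: List[float],
                          margin: float = 1.0) -> dict:
    """Mean of the recorded times, scaled by a hypothetical safety margin."""
    return {"conto": round(mean(connect_ms) * margin),
            "readto": round(mean(fetch_ms) * margin)}
```

Because a timeout equal to the mean can be exceeded by a sizeable share of ordinary requests, a margin above 1 may be worth considering; the text leaves the calculation open.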
In the above embodiments, because the connection timeout and page acquisition timeout are set in the acquisition strategy, during subsequent batch acquisition a new connection request can be issued when the connection time exceeds the connection timeout set in the strategy, and a new page-read request can be issued when the read time exceeds the page acquisition timeout set in the strategy. In addition, upper limits on the number of connection retries and page-read retries can be set in the acquisition strategy, so that acquisition of the page data for a given URL is abandoned once the number of retries exceeds the limit.
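Retrying on timeouts with an upper bound might look like this. The `connect` and `read` callables are hypothetical and are assumed to raise Python's built-in `TimeoutError`; the limit keys `max_connect_retries` and `max_read_retries` are illustrative names, and each limit counts retries after the first attempt.

```python
from typing import Callable, Optional


def fetch_with_retries(url: str,
                       connect: Callable[[str, float], object],
                       read: Callable[[object, float], str],
                       strategy: dict) -> Optional[str]:
    """Retry connect/read on timeout up to the limits stored in `strategy`.

    Returns None when retries are exhausted (the page is abandoned).
    """
    for _ in range(1 + strategy["max_connect_retries"]):
        try:
            conn = connect(url, strategy["conto"] / 1000)  # ms -> seconds
        except TimeoutError:
            continue                  # re-issue the connection request
        break
    else:
        return None                   # connection retries exhausted
    for _ in range(1 + strategy["max_read_retries"]):
        try:
            return read(conn, strategy["readto"] / 1000)
        except TimeoutError:
            continue                  # re-issue the page-read request
    return None                       # read retries exhausted
```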
S130: according to the synchronous or asynchronous loading mode set in the acquisition strategy corresponding to the target website information, acquire the target data in the webpage pointed to by the target website information using the corresponding loading mode.
It should be noted that the request may carry one or more pieces of target website information. The embodiments of the present invention can perform target data acquisition tests, including at least a synchronous loading test, on different types of website information in advance, distinguishing website information whose target data can be loaded synchronously from website information whose target data must be loaded asynchronously, and setting for each a corresponding acquisition strategy that can successfully acquire the target data. For multiple pieces of target website information of different types, the target data in the webpages can be collected using the respective acquisition strategies. The target data acquisition test on the various types of website information can be implemented as described above for the target website information and is not repeated here.
It can be seen that, with the method provided by the embodiments of the present application, because the acquisition strategy corresponding to the target website information is obtained by a target data acquisition test, including at least a synchronous loading test, on that target website information, data can be collected during batch acquisition in the synchronous or asynchronous loading mode set in that strategy. Data obtainable by synchronous loading alone is then not loaded asynchronously, which avoids the extra consumption of resources and time and effectively improves data acquisition efficiency. In addition, the present application records and analyzes page connection and page read times and sets the connection timeout and page acquisition timeout in the acquisition strategy accordingly, so that formal batch acquisition can invoke the synchronous or asynchronous loading mode appropriately according to the strategy, maximizing acquisition efficiency while ensuring accurate data acquisition and avoiding additional hardware resource and time consumption.
Corresponding to the above webpage data acquisition method, the present application also provides a webpage data acquisition device.
Refer, for example, to Fig. 2, a schematic structural diagram of a webpage data acquisition device provided by an embodiment of the present application. As shown in Fig. 2, the device may include:
Request receiving unit 210, which may be configured to receive a request for batch data acquisition, wherein the request carries target website information. Strategy determining unit 220, which may be configured to determine the acquisition strategy, corresponding to the target website information, with which the target data can be successfully acquired, wherein the acquisition strategy is obtained by performing on that target website information a target data acquisition test that includes at least a synchronous loading test, and the acquisition strategy includes a synchronous loading mode or an asynchronous loading mode. Acquisition unit 230, which may be configured to acquire the target data in the webpage pointed to by the target website information using the synchronous or asynchronous loading mode set in the acquisition strategy corresponding to the target website information.
In some possible embodiments, once the history acquisition strategy corresponding to the target website information carried in the request has been extracted, that history acquisition strategy may be used directly as the acquisition strategy, corresponding to the target website information, with which the target data can be collected successfully. Accordingly, the policy determining unit 220 may be configured to extract the history acquisition strategy corresponding to the target website information, and to determine it as the acquisition strategy with which the target data can be collected successfully. The history acquisition strategy was obtained in advance by performing a target data collection test, including at least a synchronous loading test, on that target website information, and it specifies a synchronous or an asynchronous loading mode.
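Reusing a history strategy presupposes a rule for deciding which stored strategy belongs to a given URL. The patent speaks of strategies per "type" of website information without defining the type, so the sketch below assumes, purely for illustration, that URLs with the same scheme, host and path share one strategy, and that history is held in an in-memory dict. Both `url_type_key` and `extract_history_strategy` are hypothetical names.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit


def url_type_key(url: str) -> str:
    """Hypothetical grouping rule: URLs with the same scheme, host and path are
    one 'type' and share a strategy; the query string (e.g. a keyword) is ignored."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc.lower()}{parts.path or '/'}"


def extract_history_strategy(url: str, history: Dict[str, dict]) -> Optional[dict]:
    """Policy determining unit 220 (history variant): return the stored strategy
    for this URL's type, or None if this type has never been tested."""
    return history.get(url_type_key(url))
```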
In other possible embodiments, after the history acquisition strategy corresponding to the target website information is extracted, a small-scale test may also be performed. For example, the policy determining unit 220 may include:
An extraction subunit 221, which may be configured to extract the history acquisition strategy corresponding to the target website information. The history acquisition strategy was obtained in advance by performing a target data collection test, including at least a synchronous loading test, on that target website information, and specifies a synchronous or an asynchronous loading mode.
A small-scale test determining subunit 222, which may be configured to determine, from a preset small-scale test command, the HTML tag that identifies the small-scale test data and the website information, among the target website information, that needs to be tested.
A strategy testing subunit 223, which may be configured to attempt, using the history acquisition strategy corresponding to the target website information and the HTML tag that identifies the small-scale test data, to collect the small-scale test data in the webpage pointed to by the website information to be tested.
A strategy determining subunit 224, which may be configured, if the collection succeeds, to determine the history acquisition strategy as the acquisition strategy, corresponding to the target website information, with which the target data can be collected successfully.
A test subunit 225, which may be configured, if the collection fails, to perform a target data collection test, including at least a synchronous loading test, on the target website information, thereby obtaining a corresponding acquisition strategy with which the target data can be collected successfully.
An update subunit 226, which may be configured to update the history acquisition strategy corresponding to the target website information with the acquisition strategy obtained by the test subunit.
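The decision flow of subunits 221 to 226 fits in one function. As a hedged sketch: the key function, the small-scale test and the full collection test are passed in as hypothetical callables, and the case where no history strategy exists at all (which this embodiment does not discuss) is assumed to fall through to the full test.

```python
from typing import Callable, Dict, Optional


def determine_strategy(
    url: str,
    history: Dict[str, dict],
    key_of: Callable[[str], str],                  # maps a URL to its "type"
    small_scale_test: Callable[[str, dict], bool],  # subunits 222-223
    full_test: Callable[[str], dict],               # test subunit 225
) -> dict:
    """Reuse the history strategy if a small-scale test passes; otherwise
    re-run the full collection test and store its result as the new history."""
    key = key_of(url)
    strategy: Optional[dict] = history.get(key)     # extraction subunit 221
    if strategy is not None and small_scale_test(url, strategy):
        return strategy                             # strategy determining subunit 224
    strategy = full_test(url)                       # test subunit 225
    history[key] = strategy                         # update subunit 226
    return strategy
```

The small-scale test is what keeps this cheap: a full synchronous/asynchronous test only runs for URL types whose stored strategy has stopped working.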
It should be noted that the embodiments of the present application do not limit the specific way in which the test subunit 225 obtains, through a target data collection test, the corresponding acquisition strategy with which the target data can be collected successfully. For example, in some possible embodiments, the test subunit 225 may include:
A synchronous loading subunit 2251, which may be configured to load the webpage pointed to by the target website information in the synchronous loading mode.
A target data reading subunit 2252, which may be configured to attempt to read the target data from the synchronously loaded webpage.
A synchronous strategy setting subunit 2253, which may be configured, for website information whose target data can be read from the synchronously loaded webpage, to set the loading mode in the acquisition strategy corresponding to website information of that type to the synchronous loading mode.
An asynchronous strategy setting subunit 2254, which may be configured, for website information whose target data cannot be read from the synchronously loaded webpage, to set the loading mode in the acquisition strategy corresponding to website information of that type to the asynchronous loading mode.
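One way to realize subunits 2252 to 2254 is to look for the target data in the raw, synchronously loaded HTML. If it is already present, synchronous loading suffices. If the page is only a shell that JavaScript fills in later, the asynchronous mode is needed. The sketch below uses Python's standard `html.parser`. Identifying target data by a tag name plus a CSS class, and the names `TargetExtractor`, `read_target_data` and `choose_load_mode`, are assumptions made for illustration.

```python
from html.parser import HTMLParser
from typing import List


class TargetExtractor(HTMLParser):
    """Collects the text of every <tag class="css_class"> element (a hypothetical
    way of identifying target data by an HTML tag)."""

    def __init__(self, tag: str, css_class: str) -> None:
        super().__init__()
        self.tag, self.css_class = tag, css_class
        self.depth = 0                 # > 0 while inside a matching element
        self._buf: List[str] = []
        self.results: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        if self.depth:                 # same tag nested inside a match
            self.depth += 1
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self.depth, self._buf = 1, []

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth:
            self.depth -= 1
            if self.depth == 0:        # closed the matching element
                text = "".join(self._buf).strip()
                if text:
                    self.results.append(text)

    def handle_data(self, data):
        if self.depth:
            self._buf.append(data)


def read_target_data(html: str, tag: str, css_class: str) -> List[str]:
    """Target data reading subunit 2252."""
    parser = TargetExtractor(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.results


def choose_load_mode(sync_html: str, tag: str, css_class: str) -> str:
    """Subunits 2253/2254: 'sync' if the target data is already in the
    synchronously loaded HTML, otherwise 'async' (content added by scripts)."""
    return "sync" if read_target_data(sync_html, tag, css_class) else "async"
```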
In some possible embodiments, since whether a webpage loads successfully is also affected by network stability, it may be necessary to retry the connection when connecting times out and to retry reading the page when page reading times out. Therefore, the synchronous loading subunit 2251 may be configured to perform multiple times the step of loading, in the synchronous loading mode, the webpage pointed to by the target website information. In addition, the test subunit may further include:
A synchronous recording subunit 2255, which may be configured to record, each time the synchronous loading subunit performs a load, the time taken to establish a connection with the URL and the time taken, once connected, to obtain the webpage.
A synchronous timeout setting subunit 2256, which may be configured, when the loading mode in the acquisition strategy corresponding to website information of that type is set to the synchronous loading mode, to set the connection timeout and page-acquisition timeout for the synchronous loading mode in the corresponding acquisition strategy, according to the connection times and page-acquisition times recorded over the multiple loads performed by the synchronous loading subunit.
An asynchronous recording subunit 2257, which may be configured, for website information whose target data cannot be read from the synchronously loaded webpage, to load the webpage it points to multiple times in the asynchronous loading mode, recording on each load the time taken to establish a connection with the URL and the time taken, once connected, to obtain the webpage.
An asynchronous timeout setting subunit 2258, which may be configured, when the loading mode in the acquisition strategy corresponding to website information of that type is set to the asynchronous loading mode, to set the connection timeout and page-acquisition timeout in the corresponding acquisition strategy, according to the connection times and page-acquisition times recorded over the multiple asynchronous loads.
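The patent leaves open how the recorded times become timeouts. A minimal sketch, assuming one simple rule that is not from the source (the slowest observed time multiplied by a safety factor of 1.5, with a 500 ms floor, both values hypothetical), could look like this. The same function serves the synchronous (2256) and asynchronous (2258) cases, since only the input samples differ.

```python
import math
from typing import Dict, Sequence


def timeout_from_samples(samples_ms: Sequence[float], factor: float = 1.5,
                         floor_ms: int = 500) -> int:
    """Hypothetical rule: slowest observed time times a safety factor,
    rounded up to whole milliseconds and never below floor_ms."""
    if not samples_ms:
        raise ValueError("need at least one recorded load")
    return max(floor_ms, math.ceil(max(samples_ms) * factor))


def set_timeouts(load_mode: str, connect_ms: Sequence[float],
                 page_ms: Sequence[float]) -> Dict[str, object]:
    """Subunits 2256/2258: derive both timeouts from repeated-load records."""
    return {
        "load_mode": load_mode,
        "connect_timeout_ms": timeout_from_samples(connect_ms),
        "page_timeout_ms": timeout_from_samples(page_ms),
    }
```

For example, `set_timeouts("sync", [120, 300, 180], [900, 1400, 1100])` yields a 500 ms connection timeout (450 ms raised to the floor) and a 2100 ms page timeout. Taking the maximum rather than the mean errs toward fewer spurious timeouts on a flaky network, at the cost of waiting longer on pages that really are unreachable; a high percentile would be a middle ground.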
It should be noted that the extraction subunit 221, small-scale test determining subunit 222, strategy testing subunit 223, strategy determining subunit 224, test subunit 225, update subunit 226, synchronous loading subunit 2251, target data reading subunit 2252, synchronous strategy setting subunit 2253, asynchronous strategy setting subunit 2254, synchronous recording subunit 2255, synchronous timeout setting subunit 2256, asynchronous recording subunit 2257 and asynchronous timeout setting subunit 2258 described in the embodiments of the present application are all drawn with dashed lines, indicating that they are not essential units of the webpage data acquisition device provided by the present application.
Corresponding to the above webpage data acquisition method, the present invention also provides a webpage data acquisition system for implementing the method.
For example, Fig. 3 is a schematic structural diagram of a webpage data acquisition system provided by an embodiment of the present application. As shown in Fig. 3, the system may include:
A client 310, which may be configured to send a request for batch data collection, the request carrying target website information.
An acquisition strategy configuration server 320, which may be configured to: receive the request for batch data collection, the request carrying target website information; determine the acquisition strategy, corresponding to the target website information, with which the target data can be collected successfully, where that acquisition strategy is obtained by performing a target data collection test, including at least a synchronous loading test, on the target website information, and specifies a synchronous or an asynchronous loading mode; generate acquisition tasks that collect the target data in the webpage pointed to by the target website information using whichever loading mode, synchronous or asynchronous, is set in that acquisition strategy; and distribute the acquisition tasks to acquisition servers in the acquisition server cluster 330.
An acquisition server cluster 330, which may be configured to receive the acquisition tasks distributed by the acquisition strategy configuration server, execute them, and return the collected target data.
It can be seen that, with the webpage data acquisition system provided by the embodiments of the present application, the acquisition strategy configuration server 320 can generate acquisition tasks in batches and distribute them, according to a preset distribution policy, to idle acquisition servers in the acquisition server cluster 330, so that the acquisition tasks can be executed concurrently, further improving the efficiency of webpage data collection.
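The distribution policy is described only as "preset". The following sketch assumes the simplest such policy, handing each task to whichever acquisition server is currently idle. Servers are simulated as plain names and task execution as a hypothetical `execute` callable, so it shows the scheduling logic rather than any real RPC between servers.

```python
import queue
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Tuple


def distribute(tasks: Iterable[str], servers: List[str],
               execute: Callable[[str, str], str]) -> List[Tuple[str, str, str]]:
    """Give each task to an idle acquisition server; at most len(servers) tasks
    run concurrently. Returns (task, server, result) tuples in task order."""
    idle: "queue.Queue[str]" = queue.Queue()
    for server in servers:
        idle.put(server)

    def run(task: str) -> Tuple[str, str, str]:
        server = idle.get()            # blocks until some server is idle
        try:
            return task, server, execute(server, task)
        finally:
            idle.put(server)           # the server becomes idle again

    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        return list(pool.map(run, tasks))
```

Sizing the thread pool to the number of servers guarantees every worker can always obtain a server, so `idle.get()` never waits forever.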
In some possible embodiments, the user may set batch collection configuration information on the client 310 and send, through the client 310, a request carrying this configuration information. The batch collection configuration information may include parameters such as the target website information. In the application scenario mentioned above, in which search result data for different search keywords on the 1688 website is collected in batches, the acquisition strategy configuration server 320, in addition to obtaining the batch collection configuration information, also needs to download the keyword file submitted by the user to a specified address on the acquisition server in the acquisition server cluster 330 that performs the data collection test. At the same time, it sets and saves the mapping between the keyword file and its storage address, for example "taskKeywordsFile":"/home/admin/1/test.txt". This mapping is encapsulated into the test task and sent to the acquisition server together with it. Then, while performing the data collection test, the acquisition server can read the keywords in the keyword file according to the mapping and expand them into target website information for retrieving the relevant page data.
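A sketch of the keyword-file mapping follows. Only the `taskKeywordsFile` key comes from the example above. The JSON task layout, the `taskId` field, the function names and the search URL template are hypothetical, since the patent does not give the actual 1688 search URL format.

```python
import json
from pathlib import Path
from typing import List
from urllib.parse import quote

# Hypothetical search URL template; not the real site's format.
SEARCH_URL_TEMPLATE = "https://search.example.com/search?keywords={kw}"


def build_test_task(task_id: str, keywords_path: str) -> str:
    """Configuration server side: encapsulate the keyword-file mapping
    into a JSON test task."""
    return json.dumps({"taskId": task_id, "taskKeywordsFile": keywords_path})


def expand_task_urls(task_json: str) -> List[str]:
    """Acquisition server side: read the keywords via the mapping and expand
    each non-empty line into target website information (a search URL)."""
    task = json.loads(task_json)
    text = Path(task["taskKeywordsFile"]).read_text(encoding="utf-8")
    keywords = [line.strip() for line in text.splitlines() if line.strip()]
    return [SEARCH_URL_TEMPLATE.format(kw=quote(kw)) for kw in keywords]
```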
In other possible embodiments, the acquisition strategy configuration server 320 may include a strategy generation server 321, a testing server 322 and a database server 323.
The strategy generation server 321 may be configured to generate pre-test tasks in advance for different types of URL and submit them to the testing server 322; obtain from the database server 323 the loading mode, connection time, page acquisition time and so on recorded during the tests; generate the acquisition strategies corresponding to the different types of URL from the obtained loading mode, connection time and page acquisition time; and send those strategies to the database server 323 to be stored as history acquisition strategies. It may further be configured to receive the request sent by the client 310, obtain from the database server the history acquisition strategy corresponding to the target website information, generate a small-scale test task that tests the target website information on a small scale using the history acquisition strategy, and submit the small-scale test task to the testing server 322. If the test collection succeeds, it generates, according to the history acquisition strategy, acquisition tasks for collecting the target data in the webpage pointed to by the target website information. If the collection fails, it generates a retry task for target data collection on the target website information and submits it to the testing server 322; obtains from the database server 323 the loading mode, connection time, page acquisition time and so on recorded during the retest; generates from them an updated acquisition strategy for the target website information; sends the updated strategy to the database server 323 so that the history acquisition strategy stored in the database is updated; and generates, according to the updated strategy, acquisition tasks for collecting the target data in the webpage pointed to by the target website information. The generated acquisition tasks are distributed to acquisition servers in the acquisition server cluster 330 for execution.
The testing server 322 may be configured to receive pre-test tasks, small-scale test tasks and/or retry tasks from the strategy generation server 321; distribute them to acquisition servers in the acquisition server cluster 330 for execution; collect the loading mode, connection time, page acquisition time and so on observed while the test tasks execute; and save the collected data to the database for use by the strategy generation server 321. The testing server 322 may support both the synchronous and the asynchronous loading mode: the synchronous mode may use httpclient + htmlparser for loading and page parsing, and the asynchronous mode may use webkit for loading and page parsing.
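For the synchronous path, the key measurement is splitting one load into connection time and page-retrieval time, as the synchronous recording subunit requires. The sketch below uses Python's standard `http.client` as a stand-in for httpclient (a swapped-in library, not the patent's stack) and assumes plain HTTP. Applying a separate read timeout by calling `settimeout` on the already-connected socket is one way to mirror the two distinct timeouts in the acquisition strategy. The function name `timed_sync_fetch` is hypothetical, and the asynchronous (WebKit) path is not sketched.

```python
import http.client
import time
from typing import Tuple


def timed_sync_fetch(host: str, port: int, path: str, connect_timeout_s: float,
                     read_timeout_s: float) -> Tuple[int, str, float, float]:
    """Synchronously load one page and return
    (status, body, connect_seconds, page_seconds)."""
    conn = http.client.HTTPConnection(host, port, timeout=connect_timeout_s)
    try:
        t0 = time.monotonic()
        conn.connect()                          # establish the TCP connection
        t1 = time.monotonic()
        conn.sock.settimeout(read_timeout_s)    # separate timeout for page retrieval
        conn.request("GET", path)
        resp = conn.getresponse()
        charset = resp.headers.get_content_charset() or "utf-8"
        body = resp.read().decode(charset, "replace")
        t2 = time.monotonic()
        return resp.status, body, t1 - t0, t2 - t1
    finally:
        conn.close()
```

Running this several times per URL and keeping the two durations gives exactly the samples that the timeout-setting step consumes.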
The database server 323 may be configured to save the loading mode, connection time, page acquisition time and so on collected by the testing server 322, and to save the acquisition strategies generated by the strategy generation server 321.
In the above embodiments, the acquisition strategy configuration server 320 and the acquisition server cluster 330 may be deployed in different network systems. The database server 323 may be built on a MySQL database cluster. In addition, given the volume of data, the database server 323 may be deployed in a distributed manner to provide good read performance.
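A possible table for the stored strategies is sketched below. It uses Python's built-in `sqlite3` in memory as a self-contained stand-in for the MySQL cluster, and the table, column and function names are hypothetical. `INSERT OR REPLACE` is SQLite syntax; on MySQL the analogous statement would be `INSERT ... ON DUPLICATE KEY UPDATE`.

```python
import sqlite3
from typing import Optional, Tuple

# Hypothetical schema: one history acquisition strategy per URL type.
SCHEMA = """
CREATE TABLE IF NOT EXISTS acquisition_strategy (
    url_type           TEXT PRIMARY KEY,
    load_mode          TEXT NOT NULL CHECK (load_mode IN ('sync', 'async')),
    connect_timeout_ms INTEGER NOT NULL,
    page_timeout_ms    INTEGER NOT NULL
)
"""


def save_strategy(db: sqlite3.Connection, url_type: str, load_mode: str,
                  connect_timeout_ms: int, page_timeout_ms: int) -> None:
    """Store a new strategy or overwrite the stored history one."""
    with db:  # commits the transaction on success, rolls back on error
        db.execute(
            "INSERT OR REPLACE INTO acquisition_strategy VALUES (?, ?, ?, ?)",
            (url_type, load_mode, connect_timeout_ms, page_timeout_ms),
        )


def load_strategy(db: sqlite3.Connection, url_type: str) -> Optional[Tuple[str, int, int]]:
    """Fetch the history strategy for a URL type, or None if none is stored."""
    return db.execute(
        "SELECT load_mode, connect_timeout_ms, page_timeout_ms "
        "FROM acquisition_strategy WHERE url_type = ?",
        (url_type,),
    ).fetchone()
```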
It should be noted that the strategy generation server 321, testing server 322 and database server 323 described in the embodiments of the present application are drawn with dashed lines in Fig. 3, indicating that they are not essential servers of the acquisition strategy configuration server.
For convenience of description, the above device is described in terms of units divided by function. Of course, when the present invention is implemented, the functions of the units may be realized in one or more pieces of software and/or hardware.
From the above description of the embodiments, those skilled in the art can clearly understand that the present invention may be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product. The computer software product may be stored in a storage medium such as ROM/RAM, a magnetic disk or an optical disc, and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform the methods described in the embodiments of the present invention or in certain parts of those embodiments.
The embodiments in this specification are described progressively: for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, since the system embodiment is substantially similar to the method embodiment, it is described relatively briefly; for the relevant details, refer to the description of the method embodiment.
The present invention can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments including any of the above systems or devices.
The present invention may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures and the like that perform particular tasks or implement particular abstract data types. The present invention may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
It should be noted that, herein, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the statement "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes that element.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (11)
1. A webpage data acquisition method, characterized by comprising:
receiving a request for batch data collection, wherein the request carries target website information;
determining an acquisition strategy, corresponding to the target website information, with which target data can be collected successfully, wherein the acquisition strategy corresponding to the target website information is obtained by performing, on the target website information, a target data collection test including at least a synchronous loading test, and the acquisition strategy comprises a synchronous loading mode or an asynchronous loading mode;
collecting, according to the synchronous loading mode or asynchronous loading mode set in the acquisition strategy corresponding to the target website information, the target data in the webpage pointed to by the target website information using the corresponding loading mode.
2. The method according to claim 1, characterized in that determining the acquisition strategy, corresponding to the target website information, with which the target data can be collected successfully comprises:
extracting a history acquisition strategy corresponding to the target website information, wherein the history acquisition strategy was obtained in advance by performing, on the target website information, a target data collection test including at least a synchronous loading test, and the history acquisition strategy comprises a synchronous loading mode or an asynchronous loading mode;
determining the history acquisition strategy as the acquisition strategy, corresponding to the target website information, with which the target data can be collected successfully.
3. The method according to claim 1, characterized in that determining the acquisition strategy, corresponding to the target website information, with which the target data can be collected successfully comprises:
extracting a history acquisition strategy corresponding to the target website information, wherein the history acquisition strategy was obtained in advance by performing, on the target website information, a target data collection test including at least a synchronous loading test, and the history acquisition strategy comprises a synchronous loading mode or an asynchronous loading mode;
determining, according to a preset small-scale test command, an HTML tag for identifying small-scale test data and the website information, among the target website information, that needs to be tested;
attempting, according to the history acquisition strategy corresponding to the target website information and the HTML tag for identifying the small-scale test data, to collect the small-scale test data in the webpage pointed to by the website information that needs to be tested;
if the collection succeeds, determining the history acquisition strategy as the acquisition strategy, corresponding to the target website information, with which the target data can be collected successfully;
if the collection fails, performing, on the target website information, a target data collection test including at least a synchronous loading test to obtain a corresponding acquisition strategy with which the target data can be collected successfully, and updating the history acquisition strategy corresponding to the target website information according to the obtained acquisition strategy.
4. The method according to any one of claims 1 to 3, characterized in that performing, on the target website information, the target data collection test including at least the synchronous loading test comprises:
loading, in the synchronous loading mode, the webpage pointed to by the target website information; attempting to read the target data from the synchronously loaded webpage; for website information whose target data can be read from the synchronously loaded webpage, setting the loading mode in the acquisition strategy corresponding to website information of that type to the synchronous loading mode; and for website information whose target data cannot be read from the synchronously loaded webpage, setting the loading mode in the acquisition strategy corresponding to website information of that type to the asynchronous loading mode.
5. The method according to claim 4, characterized in that the step of loading, in the synchronous loading mode, the webpage pointed to by the target website information is performed multiple times, and the method further comprises:
recording, on each execution, the time taken to establish a connection with the URL and the time taken, once connected, to obtain the webpage; and, when setting the loading mode in the acquisition strategy corresponding to website information of that type to the synchronous loading mode, setting the connection timeout and page-acquisition timeout corresponding to the synchronous loading mode in the corresponding acquisition strategy according to the connection times and page-acquisition times recorded over the multiple executions;
for website information whose target data cannot be read from the synchronously loaded webpage, loading the webpage it points to multiple times in the asynchronous loading mode, and recording, on each execution, the time taken to establish a connection with the URL and the time taken, once connected, to obtain the webpage; and, when setting the loading mode in the acquisition strategy corresponding to website information of that type to the asynchronous loading mode, setting the connection timeout and page-acquisition timeout corresponding to the asynchronous loading mode in the corresponding acquisition strategy according to the connection times and page-acquisition times recorded over the multiple asynchronous loads.
6. A webpage data acquisition device, characterized by comprising:
a request receiving unit, configured to receive a request for batch data collection, wherein the request carries target website information;
a policy determining unit, configured to determine an acquisition strategy, corresponding to the target website information, with which target data can be collected successfully, wherein the acquisition strategy corresponding to the target website information is obtained by performing, on the target website information, a target data collection test including at least a synchronous loading test, and the acquisition strategy comprises a synchronous loading mode or an asynchronous loading mode;
a collecting unit, configured to collect, according to the synchronous loading mode or asynchronous loading mode set in the acquisition strategy corresponding to the target website information, the target data in the webpage pointed to by the target website information using the corresponding loading mode.
7. The device according to claim 6, characterized in that the policy determining unit is configured to extract a history acquisition strategy corresponding to the target website information, wherein the history acquisition strategy was obtained in advance by performing, on the target website information, a target data collection test including at least a synchronous loading test, and the history acquisition strategy comprises a synchronous loading mode or an asynchronous loading mode; and to determine the history acquisition strategy as the acquisition strategy, corresponding to the target website information, with which the target data can be collected successfully.
8. The device according to claim 6, characterized in that the policy determining unit comprises:
an extraction subunit, configured to extract a history acquisition strategy corresponding to the target website information, wherein the history acquisition strategy was obtained in advance by performing, on the target website information, a target data collection test including at least a synchronous loading test, and the history acquisition strategy comprises a synchronous loading mode or an asynchronous loading mode;
a small-scale test determining subunit, configured to determine, according to a preset small-scale test command, an HTML tag for identifying small-scale test data and the website information, among the target website information, that needs to be tested;
a strategy testing subunit, configured to attempt, according to the history acquisition strategy corresponding to the target website information and the HTML tag for identifying the small-scale test data, to collect the small-scale test data in the webpage pointed to by the website information that needs to be tested;
a strategy determining subunit, configured to determine, if the collection succeeds, the history acquisition strategy as the acquisition strategy, corresponding to the target website information, with which the target data can be collected successfully;
a test subunit, configured, if the collection fails, to perform on the target website information a target data collection test including at least a synchronous loading test, to obtain a corresponding acquisition strategy with which the target data can be collected successfully;
an update subunit, configured to update the history acquisition strategy corresponding to the target website information according to the acquisition strategy obtained by the test subunit.
9. The device according to claim 8, characterized in that the test subunit comprises:
a synchronous loading subunit, configured to load, in the synchronous loading mode, the webpage pointed to by the target website information; a target data reading subunit, configured to attempt to read the target data from the synchronously loaded webpage; a synchronous strategy setting subunit, configured, for website information whose target data can be read from the synchronously loaded webpage, to set the loading mode in the acquisition strategy corresponding to website information of that type to the synchronous loading mode; and an asynchronous strategy setting subunit, configured, for website information whose target data cannot be read from the synchronously loaded webpage, to set the loading mode in the acquisition strategy corresponding to website information of that type to the asynchronous loading mode.
10. The device according to claim 9, characterized in that the synchronous loading subunit is configured to perform multiple times the step of loading, in the synchronous loading mode, the webpage pointed to by the target website information;
and the test subunit further comprises:
a synchronous recording subunit, configured to record, each time the synchronous loading subunit performs a load, the time taken to establish a connection with the URL and the time taken, once connected, to obtain the webpage;
a synchronous timeout setting subunit, configured, when the loading mode in the acquisition strategy corresponding to website information of that type is set to the synchronous loading mode, to set the connection timeout and page-acquisition timeout corresponding to the synchronous loading mode in the corresponding acquisition strategy, according to the connection times and page-acquisition times recorded over the multiple loads performed by the synchronous loading subunit;
an asynchronous recording subunit, configured, for website information whose target data cannot be read from the synchronously loaded webpage, to load the webpage it points to multiple times in the asynchronous loading mode, and to record, on each load, the time taken to establish a connection with the URL and the time taken, once connected, to obtain the webpage;
an asynchronous timeout setting subunit, configured, when the loading mode in the acquisition strategy corresponding to website information of that type is set to the asynchronous loading mode, to set the connection timeout and page-acquisition timeout in the corresponding acquisition strategy, according to the connection times and page-acquisition times recorded over the multiple asynchronous loads.
11. A webpage data acquisition system, characterized by comprising:
a client, configured to send a batch data-acquisition request, wherein the request carries target website information;
an acquisition strategy configuration server, configured to: receive the batch data-acquisition request sent by the client; determine the acquisition strategy, corresponding to the target website information carried in the request, with which the target data can be successfully acquired, wherein the acquisition strategy corresponding to the target website information is obtained by subjecting that target website information to a target data acquisition test that includes at least a synchronous loading test, and the acquisition strategy specifies either a synchronous loading mode or an asynchronous loading mode; generate, according to the synchronous or asynchronous loading mode set in that acquisition strategy, an acquisition task that uses the corresponding loading mode to acquire the target data in the webpage pointed to by the target website information; and distribute the acquisition task to an acquisition server in an acquisition server cluster; and
the acquisition server cluster, configured to receive the acquisition task distributed by the acquisition strategy configuration server, execute the acquisition task, and feed back the acquired target data.
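The following is a minimal in-process sketch of the division of labour in claim 11. It makes several assumptions that the claim does not specify:

- The acquisition test has already run and its results are stored as a URL-to-loading-mode map.
- The client is represented simply by the call to `handle_batch_request`.
- Tasks are distributed round-robin.
- Real page loading is replaced by an injected `fetch` callable.

The class names and the distribution policy are illustrative only.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Callable, Dict, List


@dataclass(frozen=True)
class AcquisitionTask:
    url: str
    loading_mode: str  # "sync" or "async", copied from the tested strategy


class AcquisitionServer:
    """One node of the acquisition server cluster."""

    def __init__(self, name: str, fetch: Callable[[str, str], str]) -> None:
        self.name = name
        self.fetch = fetch  # loads a URL in the given mode, returns the target data
        self.queue: List[AcquisitionTask] = []

    def receive(self, task: AcquisitionTask) -> None:
        self.queue.append(task)

    def run(self) -> Dict[str, str]:
        """Execute queued tasks and feed back the collected data keyed by URL."""
        results = {t.url: self.fetch(t.url, t.loading_mode) for t in self.queue}
        self.queue.clear()
        return results


class StrategyConfigServer:
    def __init__(self, strategies: Dict[str, str], cluster: List[AcquisitionServer]) -> None:
        self.strategies = strategies        # URL -> loading mode found by the acquisition test
        self._next_server = cycle(cluster)  # round-robin distribution (an assumption)

    def handle_batch_request(self, urls: List[str]) -> List[AcquisitionTask]:
        """Build one task per URL using its tested loading mode and dispatch it."""
        tasks = []
        for url in urls:
            mode = self.strategies.get(url)
            if mode is None:
                raise KeyError(f"no tested acquisition strategy for {url}")
            task = AcquisitionTask(url, mode)
            next(self._next_server).receive(task)
            tasks.append(task)
        return tasks
```

Injecting `fetch` keeps the sketch testable without network access. A real deployment would plug in whatever loader each mode requires and run the acquisition servers as separate processes or hosts.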
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410721389.9A CN105721519B (en) | 2014-12-02 | 2014-12-02 | A kind of webpage data acquiring method, apparatus and system |
PCT/CN2015/095584 WO2016086784A1 (en) | 2014-12-02 | 2015-11-26 | Method, apparatus and system for collecting webpage data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410721389.9A CN105721519B (en) | 2014-12-02 | 2014-12-02 | A kind of webpage data acquiring method, apparatus and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105721519A | 2016-06-29 |
CN105721519B | 2019-02-05 |
Family ID: 56090993
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410721389.9A Active CN105721519B (en) | 2014-12-02 | 2014-12-02 | A kind of webpage data acquiring method, apparatus and system |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN105721519B (en) |
WO (1) | WO2016086784A1 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106502802A (en) * | 2016-10-12 | 2017-03-15 | 山东浪潮云服务信息科技有限公司 | A kind of concurrent acquisition method in distributed high in the clouds transmitted based on Avro RPC |
CN109658689A (en) * | 2018-12-04 | 2019-04-19 | 沈阳世纪高通科技有限公司 | A kind of information processing method and device |
CN110134841A (en) * | 2018-02-09 | 2019-08-16 | 鼎复数据科技(北京)有限公司 | The customized real-time method for obtaining website data |
CN113114505A (en) * | 2021-04-13 | 2021-07-13 | 广州海鹚网络科技有限公司 | httpClient-based access request processing method and system |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115630217A (en) * | 2022-12-21 | 2023-01-20 | 广州市千钧网络科技有限公司 | Method, device and equipment for loading information and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101136026A (en) * | 2007-05-15 | 2008-03-05 | 北京聚生科技有限公司 | Web page content capturing method based on XMLHTTP component technology |
CN103049542A (en) * | 2012-12-27 | 2013-04-17 | 北京信息科技大学 | Domain-oriented network information search method |
CN103092817A (en) * | 2013-01-18 | 2013-05-08 | 五八同城信息技术有限公司 | Data collection method and data collection device based on script engine |
US20140280014A1 (en) * | 2013-03-14 | 2014-09-18 | Glenbrook Networks | Apparatus and method for automatic assignment of industry classification codes |
- 2014-12-02: CN application CN201410721389.9A, granted as CN105721519B (status: Active)
- 2015-11-26: PCT application PCT/CN2015/095584, published as WO2016086784A1 (application filing)
Also Published As
Publication number | Publication date |
---|---|
WO2016086784A1 (en) | 2016-06-09 |
CN105721519B (en) | 2019-02-05 |
Similar Documents
Publication | Title |
---|---|
CN106503134B (en) | Browser jumps to the method for data synchronization and device of application program |
US10055762B2 (en) | Deep application crawling |
US11989247B2 (en) | Indexing access limited native applications |
US9251157B2 (en) | Enterprise node rank engine |
CN105721519A (en) | Webpage data acquisition method, device and system |
CN104765592B (en) | A kind of plug-in management method and its device of object web page acquisition tasks |
CN108304410A (en) | A kind of detection method, device and the data analysing method of the abnormal access page |
US20130191376A1 (en) | Identifying related entities |
CN107688568A (en) | Acquisition method and device based on web page access behavior record |
US20140317489A1 (en) | Device-independent validation of website elements |
CN110555146A (en) | method and system for generating network crawler camouflage data |
US20170277622A1 (en) | Web Page Automated Testing Method and Apparatus |
KR101556743B1 (en) | Apparatus and method for generating poi information based on web collection |
CN103577426B (en) | For providing the method, apparatus and system of the additional application information that search is suggested |
KR100987330B1 (en) | A system and method generating multi-concept networks based on user's web usage data |
US20170116336A1 (en) | Synchronizing http requests with respective html context |
Klerkx et al. | How to share and reuse learning resources: the ARIADNE experience |
CN117370203B (en) | Automatic test method, system, electronic equipment and storage medium |
CN105808623B (en) | A kind of page access event correlation methodology and device based on search |
CN104133762B (en) | Method for testing software and test device |
JP2019101889A (en) | Test execution device and program |
CN111966725A (en) | Data acquisition method and device applied between internal network and external network and electronic equipment |
Jia et al. | Using the 5W+ 1H model in reporting systematic literature review: A case study on software testing for cloud computing |
Kumar et al. | A brief investigation on web usage mining tools (WUM) |
US9218418B2 (en) | Search expression generation system |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
GR01 | Patent grant | |