CN108280094A - Application up-line and down-line data statistical method and device
- Publication number
- CN108280094A (application CN201710010785.4A)
- Authority
- CN
- China
- Prior art keywords
- application
- address
- inquiry
- offline
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/954—Navigation, e.g. using categorised browsing
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Information Transfer Between Computers (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention belongs to the technical field of application data statistics, and in particular relates to a method and device for application online/offline data statistics, which make it possible to determine whether an application is online and to track applications going online and offline. The application online/offline data statistics method provided by the invention comprises: accessing, by means of crawler technology, the application addresses in an existing address data table; obtaining the query state returned by the server, counting the online applications and offline applications in the current time period according to the query state, and deleting the offline applications from the address data table. The method and device repeatedly crawl the application addresses in the address data table by means of crawler technology and count the applications in an application store that are online, newly online, or offline within a period of time (for example a given day, week, or month).
Description
Technical field
The present invention relates to the technical field of application data statistics, and in particular to a method and device for application online/offline data statistics.
Background
Mobile application monitoring mainly uses crawler technology to crawl information from application stores, such as the details and download count of each application, and compiles statistics on the applications in the application market, providing reliable information for industry support and decision-making. Applications are updated very frequently: new applications go online every day, many applications go offline, and application versions are continually updated. Existing application statistics methods only accumulate data, so they cannot account for applications that have gone offline or been updated to a new version, cannot tell how many applications are currently online, and cannot tell how many applications went online or offline within a given period.
Summary of the invention
To address these defects in the prior art, the invention provides a method and device for application online/offline data statistics that use crawler technology to repeatedly crawl the application addresses in an address data table and count the applications in an application store that are online, newly online, or offline within a period of time.
In a first aspect, the invention provides an application online/offline data statistics method, comprising: accessing, by means of crawler technology, the application addresses in an existing address data table; obtaining the query state returned by the server, counting the online applications and offline applications in the current time period according to the query state, and deleting the offline applications from the address data table.
Preferably, counting the online applications and offline applications in the current time period according to the query state comprises: if the query state is access failure, putting the application address whose access failed into a newly created error data table; if the query state is redirect, putting the redirected web page address into the newly created error data table; after the address data table has been traversed, traversing the newly created error data table and, while traversing an error data table, creating a further new error data table to store the application addresses whose query state is again access failure or redirect, until a preset condition is reached; if application addresses whose access fails still remain, the corresponding applications are considered offline, and the offline application addresses are moved into an offline data table.
Preferably, the preset condition is that the number of traversals reaches a count threshold or the traversal time reaches a time threshold.
Preferably, the method further comprises: if the query state is redirect and the application's information can be crawled from the redirected web page address, adding the redirect target address to the address data table.
Preferably, the method further comprises: if the query state is success, parsing the message returned by the server, determining from the message content whether the application's version has been updated, and counting the version updates of applications in the current time period.
In a second aspect, the invention provides an application online/offline data statistics device, comprising: a data crawling module, configured to access, by means of crawler technology, the application addresses in an existing address data table; and an application statistics module, configured to obtain the query state returned by the server, count the online applications and offline applications in the current time period according to the query state, and delete the offline applications from the address data table.
Preferably, the application statistics module is specifically configured to: if the query state is access failure, put the application address whose access failed into a newly created error data table; if the query state is redirect, put the redirected web page address into the newly created error data table; after the address data table has been traversed, traverse the newly created error data table and, while traversing an error data table, create a further new error data table to store the application addresses whose query state is again access failure or redirect, until a preset condition is reached; if application addresses whose access fails still remain, the corresponding applications are considered offline, and the offline application addresses are moved into an offline data table.
Preferably, the preset condition is that the number of traversals reaches a count threshold or the traversal time reaches a time threshold.
Preferably, the application statistics module is further configured to: if the query state is redirect and the application's information can be crawled from the redirected web page address, add the redirect target address to the address data table.
Preferably, the application statistics module is further configured to: if the query state is success, parse the message returned by the server, determine from the message content whether the application's version has been updated, and count the version updates of applications in the current time period.
Brief description of the drawings
Fig. 1 is a flowchart of the application online/offline data statistics method provided by an embodiment of the invention;
Fig. 2 is a structural block diagram of the application online/offline data statistics device provided by an embodiment of the invention.
Detailed description
Embodiments of the technical solution of the invention are described in detail below with reference to the drawings. The following embodiments serve only to illustrate the technical solution more clearly; they are examples and do not limit the scope of protection of the invention.
It should be noted that, unless otherwise stated, technical or scientific terms used in this application have the ordinary meaning understood by those of ordinary skill in the art to which the invention belongs.
As shown in Fig. 1, this embodiment provides an application online/offline data statistics method, comprising:
Step S1: access, by means of crawler technology, the application addresses in the existing address data table.
The address data table stores the application addresses of online applications. An application address is the address of the web page where the application is located, that is, the application's URL.
Step S2: obtain the query state returned by the server, count the online applications and offline applications in the current time period according to the query state, and delete the offline applications from the address data table.
The query state is derived from the return code sent by the server. Table 1 lists some of the return codes, which this embodiment groups by meaning: a return code of "200" gives the query state success, meaning the application is online; return codes such as "302" and "303" give the query state redirect, meaning the application may still be online but its address has changed; return codes such as "400" and "401" give the query state access failure; a return code of "304" means nothing has changed and requires no processing; any other return code may result from network problems such as a network timeout, server timeout, or packet loss, or may mean the application is offline, and also gives the query state access failure.
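The grouping of return codes described above can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the names `QueryState` and `classify_return_code` are hypothetical, and treating every 3## code other than 304 as a redirect, and a missing response (`None`) as an access failure, are assumptions drawn from the examples in the text.

```python
from enum import Enum


class QueryState(Enum):
    SUCCESS = "success"            # application is online
    REDIRECT = "redirect"          # application address has changed
    NOT_MODIFIED = "not modified"  # 304: nothing to do
    FAILED = "access failed"       # 400, 401, ... and every other outcome


def classify_return_code(code):
    """Map a server return code (or None for no response) to a query state."""
    if code == 200:
        return QueryState.SUCCESS
    if code == 304:
        return QueryState.NOT_MODIFIED
    if code is not None and 300 <= code < 400:
        return QueryState.REDIRECT
    # Client errors, server errors, timeouts, and unknown codes all
    # count as access failure, as the text prescribes for "other" codes.
    return QueryState.FAILED
```

A failed state here does not yet mean the application is offline; it only marks the address for another attempt.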
Table 1
The application online/offline data statistics method provided by this embodiment repeatedly crawls the application addresses in the address data table by means of crawler technology and counts the applications in an application store that are online, newly online, or offline within a period of time (for example a given day, week, or month).
While application data is being crawled, problems such as network timeouts, congestion, and failures can all prevent an application's information from being crawled. The usual remedy for erroneous data during crawling is to re-crawl the erroneous entries right away. But network problems are hard to resolve in a short time, and because the interval before the repeat crawl is very short, the re-crawled data is often still erroneous. This approach lowers crawling efficiency and may also worsen network and server congestion.
To improve the efficiency of crawling application information and the accuracy of the statistics, a preferred implementation of step S2 comprises:
Step S21: if the query state is access failure, put the application address whose access failed into a newly created error data table; if the query state is redirect, put the redirected web page address into the newly created error data table.
For example, if the return code sent by the server is 400, 401, 404, 410, or 5## (see Table 1), the corresponding query state is access failure, and the application address is put into the newly created error data table so that it can be crawled again, which avoids losing data because of network problems. If the return code is 3## (see Table 1), the redirected web page address (the application's new URL) is put into the newly created error data table so that the application whose address has changed can be crawled. If the return code is 408, which indicates that the server timed out waiting for the request, the address is crawled again immediately.
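Step S21 can be sketched as a single pass over a table of addresses. This is an illustration under stated assumptions: `crawl_pass` and the caller-supplied `fetch` callable are hypothetical names, `fetch` is assumed to return a pair of return code and redirect target, and bounding the immediate 408 retry with `max_immediate_retries` is an added safeguard that the text does not specify.

```python
def crawl_pass(addresses, fetch, max_immediate_retries=1):
    """Visit each address once and return the newly created error table.

    fetch(url) is a hypothetical callable returning (return_code,
    redirect_target); redirect_target is None unless the server
    answered with a 3## code.
    """
    error_table = []
    for url in addresses:
        code, target = fetch(url)
        retries = 0
        # 408: the server timed out waiting for the request, so retry at once.
        while code == 408 and retries < max_immediate_retries:
            code, target = fetch(url)
            retries += 1
        if code in (200, 304):
            continue                      # online or unchanged
        if 300 <= code < 400 and target:
            error_table.append(target)    # crawl the new URL in a later pass
        else:
            error_table.append(url)       # access failed: try again later
    return error_table
```

Deferring failed addresses to a separate table, instead of retrying them on the spot, is what spaces the repeat attempts apart in time.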
Step S22: after the address data table has been traversed, traverse the newly created error data table. While traversing an error data table, create a further new error data table to store the application addresses whose query state is again access failure or redirect. Repeat this until a preset condition is reached; if application addresses whose access fails still remain, the corresponding applications are considered offline, and the offline application addresses are moved into the offline data table.
The preset condition is that the number of traversals reaches a count threshold or the traversal time reaches a time threshold; it prevents the traversal from never terminating.
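The repeated traversal of step S22 and its preset condition can be sketched as a bounded loop. The names `collect_offline`, `_traverse`, and `visit`, and the default thresholds, are hypothetical; `visit` is assumed to return `"ok"`, `"failed"`, or a `("redirect", new_url)` tuple. As a simplification, every address left in the last error table is treated as offline.

```python
import time


def _traverse(table, visit):
    """Visit every address once; return the new error table for this pass."""
    new_error_table = []
    for url in table:
        result = visit(url)
        if result == "failed":
            new_error_table.append(url)          # try again in a later pass
        elif isinstance(result, tuple) and result[0] == "redirect":
            new_error_table.append(result[1])    # follow the new URL next pass
    return new_error_table


def collect_offline(address_table, visit, max_rounds=3, max_seconds=3600.0):
    """Return the addresses still failing once the preset condition is met."""
    deadline = time.monotonic() + max_seconds
    error_table = _traverse(address_table, visit)
    rounds = 0
    # Preset condition: stop on the count threshold or the time threshold.
    while error_table and rounds < max_rounds and time.monotonic() < deadline:
        error_table = _traverse(error_table, visit)
        rounds += 1
    return error_table
```

Each pass only revisits the addresses that failed in the previous one, so the work shrinks as transient network errors clear.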
Step S23: delete the offline applications from the address data table, obtain the applications that went offline in the current time period from the statistics of the offline data table, and obtain the applications online in the current time period from the address data table.
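Step S23 amounts to a set difference followed by two counts. A minimal sketch, assuming both tables are plain lists of URLs and using the hypothetical name `summarize_period`:

```python
def summarize_period(address_table, offline_table):
    """Delete offline addresses from the address table and count both groups."""
    offline = set(offline_table)
    still_online = [url for url in address_table if url not in offline]
    counts = {"online": len(still_online), "offline": len(offline)}
    return still_online, counts
```

For example, `summarize_period(["a", "b", "c"], ["b"])` keeps `"a"` and `"c"` in the address table and reports two online applications and one offline application.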
In this preferred implementation of step S2, each time an error data table has been traversed, a new error data table is created to hold the application addresses whose access failed, and the traversal is then run on the new table. On the one hand, this avoids statistical errors caused by application addresses that cannot be crawled because of network or server congestion; on the other hand, it avoids crawling the same address repeatedly within a short time and getting erroneous data every time, which improves crawling efficiency. When an application's address changes, the server returns a redirect page and the query state is redirect; if the application's information can be crawled from the redirected web page address, the redirect target address is added to the address data table to update the application's address for later statistics.
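The redirect handling described above can be sketched as follows. The function name `apply_redirect` and the `can_crawl` callable are hypothetical, the address table is assumed to be a Python set, and removing the old address when the new one is added is an assumption about what updating the application address means.

```python
def apply_redirect(address_table, old_url, new_url, can_crawl):
    """Record a redirect target in the address table if it can be crawled.

    can_crawl(url) is a hypothetical callable that returns True when the
    application's information can be crawled from that address.
    """
    if not can_crawl(new_url):
        return False
    address_table.discard(old_url)   # assumption: the old address is retired
    address_table.add(new_url)       # later crawls use the new address
    return True
```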
Application versions are updated often. To count version updates, when the query state is success, the message returned by the server is parsed, whether the application's version has been updated is determined from the message content, and the version updates of applications in the current time period are counted.
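The version check can be sketched as a comparison against the version seen on the previous crawl. The text does not say what format the returned message has, so this sketch assumes, purely for illustration, a JSON body with a `"version"` field; `version_changed` and `known_versions` are hypothetical names.

```python
import json


def version_changed(url, body, known_versions):
    """Return True if the version in `body` differs from the one last seen.

    `body` is assumed to be a JSON document with a "version" field; a
    real application store page would need its own parser.
    """
    version = json.loads(body).get("version")
    if version is None:
        return False                     # nothing to compare
    previous = known_versions.get(url)
    known_versions[url] = version        # remember for the next crawl
    return previous is not None and previous != version
```

Counting the `True` results over one time period gives the number of version updates in that period.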
Based on the same inventive concept as the method above, this embodiment also provides an application online/offline data statistics device which, as shown in Fig. 2, comprises: a data crawling module, configured to access, by means of crawler technology, the application addresses in the existing address data table; and an application statistics module, configured to obtain the query state returned by the server, count the online applications and offline applications in the current time period according to the query state, and delete the offline applications from the address data table.
The device provided by this embodiment repeatedly crawls the application addresses in the address data table by means of crawler technology and counts the applications in an application store that are online, newly online, or offline within a period of time (for example a given day, week, or month).
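The two-module structure can be sketched as two small classes. This shows only the basic form of the device, without the error-table retries of the preferred implementation, so a single failed access is counted as offline here. The class names, the `fetch` callable, and the use of raw return codes in place of query states are all assumptions made for illustration.

```python
class DataCrawlModule:
    """Accesses every address in the address data table."""

    def __init__(self, fetch):
        self.fetch = fetch  # hypothetical callable: url -> return code

    def crawl(self, address_table):
        return {url: self.fetch(url) for url in address_table}


class ApplicationStatisticsModule:
    """Counts online and offline applications and prunes the address table."""

    def count(self, address_table, return_codes):
        online = [u for u in address_table if return_codes[u] == 200]
        offline = [u for u in address_table if return_codes[u] >= 400]
        for url in offline:
            address_table.remove(url)   # delete offline applications
        return {"online": len(online), "offline": len(offline)}
```

Keeping crawling and counting in separate modules lets the statistics logic be exercised with recorded return codes, without network access.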
Further, the application statistics module is specifically configured to: if the query state is access failure, put the application address whose access failed into a newly created error data table; if the query state is redirect, put the redirected web page address into the newly created error data table; after the address data table has been traversed, traverse the newly created error data table and, while traversing an error data table, create a further new error data table to store the application addresses whose query state is again access failure or redirect, until the preset condition is reached; if application addresses whose access fails still remain, the corresponding applications are considered offline, and the offline application addresses are moved into the offline data table.
The preset condition is that the number of traversals reaches a count threshold or the traversal time reaches a time threshold.
The application statistics module is further configured to: if the query state is redirect and the application's information can be crawled from the redirected web page address, add the redirect target address to the address data table.
The application statistics module is further configured to: if the query state is success, parse the message returned by the server, determine from the message content whether the application's version has been updated, and count the version updates of applications in the current time period.
Finally, it should be noted that the above embodiments serve only to illustrate the technical solution of the invention and do not limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in those embodiments may still be modified, and some or all of their technical features may be replaced by equivalents; such modifications or replacements do not take the essence of the corresponding technical solutions outside the scope of the technical solutions of the embodiments of the invention, and all of them fall within the scope of the claims and the description of the invention.
Claims (10)
1. An application online/offline data statistics method, characterized by comprising:
accessing, by means of crawler technology, the application addresses in an existing address data table;
obtaining the query state returned by the server, counting the online applications and offline applications in the current time period according to the query state, and deleting the offline applications from the address data table.
2. The method according to claim 1, characterized in that counting the online applications and offline applications in the current time period according to the query state comprises:
if the query state is access failure, putting the application address whose access failed into a newly created error data table; if the query state is redirect, putting the redirected web page address into the newly created error data table;
after the address data table has been traversed, traversing the newly created error data table and, while traversing an error data table, creating a further new error data table to store the application addresses whose query state is again access failure or redirect, until a preset condition is reached; if application addresses whose access fails still remain, the corresponding applications are considered offline, and the offline application addresses are moved into an offline data table.
3. The method according to claim 2, characterized in that the preset condition is that the number of traversals reaches a count threshold or the traversal time reaches a time threshold.
4. The method according to claim 1, characterized by further comprising: if the query state is redirect and the application's information can be crawled from the redirected web page address, adding the redirect target address to the address data table.
5. The method according to claim 1, characterized by further comprising: if the query state is success, parsing the message returned by the server, determining from the message content whether the application's version has been updated, and counting the version updates of applications in the current time period.
6. An application online/offline data statistics device, characterized by comprising:
a data crawling module, configured to access, by means of crawler technology, the application addresses in an existing address data table;
an application statistics module, configured to obtain the query state returned by the server, count the online applications and offline applications in the current time period according to the query state, and delete the offline applications from the address data table.
7. The device according to claim 6, characterized in that the application statistics module is specifically configured to:
if the query state is access failure, put the application address whose access failed into a newly created error data table; if the query state is redirect, put the redirected web page address into the newly created error data table;
after the address data table has been traversed, traverse the newly created error data table and, while traversing an error data table, create a further new error data table to store the application addresses whose query state is again access failure or redirect, until a preset condition is reached; if application addresses whose access fails still remain, the corresponding applications are considered offline, and the offline application addresses are moved into an offline data table.
8. The device according to claim 7, characterized in that the preset condition is that the number of traversals reaches a count threshold or the traversal time reaches a time threshold.
9. The device according to claim 6, characterized in that the application statistics module is further configured to: if the query state is redirect and the application's information can be crawled from the redirected web page address, add the redirect target address to the address data table.
10. The device according to claim 6, characterized in that the application statistics module is further configured to: if the query state is success, parse the message returned by the server, determine from the message content whether the application's version has been updated, and count the version updates of applications in the current time period.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710010785.4A CN108280094B (en) | 2017-01-06 | 2017-01-06 | Application up-line and down-line data statistical method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710010785.4A CN108280094B (en) | 2017-01-06 | 2017-01-06 | Application up-line and down-line data statistical method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108280094A (en) | 2018-07-13 |
CN108280094B (en) | 2022-06-17 |
Family
ID=62800985
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710010785.4A (CN108280094B, Expired - Fee Related) | Application up-line and down-line data statistical method and device | 2017-01-06 | 2017-01-06 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108280094B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110153631A1 (en) * | 2009-12-23 | 2011-06-23 | Kondasani Thakur B | Methods and systems for detecting broken links within a file |
CN105528416A (en) * | 2015-12-07 | 2016-04-27 | 中南大学 | Method and system for monitoring update contents of website |
CN105719162A (en) * | 2016-01-20 | 2016-06-29 | 北京京东尚科信息技术有限公司 | Method and device of monitoring validity of promotion links |
CN106230809A (en) * | 2016-07-27 | 2016-12-14 | 南京快页数码科技有限公司 | A kind of mobile Internet public sentiment monitoring method based on URL and system |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109325050A (en) * | 2018-08-01 | 2019-02-12 | 吉林盘古网络科技股份有限公司 | Data query method, apparatus and terminal device |
CN111046316A (en) * | 2019-12-16 | 2020-04-21 | 北京智游网安科技有限公司 | Application on-shelf state monitoring method, intelligent terminal and storage medium |
CN111046316B (en) * | 2019-12-16 | 2023-03-21 | 北京智游网安科技有限公司 | Application on-shelf state monitoring method, intelligent terminal and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN108280094B (en) | 2022-06-17 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20220617 |