CN105824941A - WAQM-based optimal sampling method for website accessibility detection

Info

Publication number: CN105824941A
Authority: CN (China)
Prior art keywords: sampling, webpage, layer, website, error
Legal status: Active (granted)
Application number: CN201610159027.4A
Other languages: Chinese (zh)
Other versions: CN105824941B (en)
Inventors: 王灿, 卜佳俊, 张梦妮, 于智, 王炜
Current assignee: Zhejiang University (ZJU)
Original assignee: Zhejiang University (ZJU)
Priority date: 2016-03-21
Filing date: 2016-03-21
Publication date: 2016-08-03
Granted publication: CN105824941B, published 2019-02-05

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/951: Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a WAQM-based optimal sampling method for website accessibility detection. The method carries out the following steps on a computer system: grouping all web pages of the website to be tested by depth, with pages of the same depth placed in one group; constructing an expectation model of the website's sampling error; given a sampling ratio r, computing the number of pages to sample in each layer by minimizing the expected sampling error; randomly selecting the specified number of pages from each layer to form the sample; performing machine detection and manual detection on each page in the sample to obtain its accessibility score; and estimating the accessibility score of the whole website from the scores of the sampled pages according to the accessibility metric. The method greatly reduces the sampling error, so that the pages selected by the sampling algorithm better reflect the accessibility of the whole website.

Description

A WAQM-based optimal sampling method for website accessibility detection
Technical field
The present invention relates to the technical field of sampling methods for website accessibility detection, and in particular to a WAQM-based sampling method for website accessibility detection.
Background
According to the Second China National Sample Survey on Disability, China has 82.96 million people with disabilities of various kinds, in households with a combined population of 260 million. A growing number of people with disabilities use the Internet to obtain information, for entertainment, and to make friends, and the Internet has become an important part of their daily lives. Because of their own impairments, and because most websites present barriers to them, people with disabilities face great difficulty in obtaining, using, and interacting with Internet information services. How to effectively find web pages that present barriers, and how to quickly assess the accessibility of a website, has therefore become an important research topic in the field of information accessibility.
In practice, website accessibility detection cannot be fully automated, and part of the detection requires manual intervention. At the same time, a website usually has a very large number of pages, so to reduce the manual detection overhead it is necessary to sample the website and test only the sample.
The sampling algorithms currently used in website accessibility detection are all generic, i.e. they are not optimized for a specific accessibility metric. Existing research shows, however, that the sampling error caused by a sampling algorithm depends to a large extent on the chosen accessibility metric. Some data show that even when the sampling ratio is large, some sampling algorithms still produce a sampling error of 20%. This indicates that the sampling algorithm and the accessibility metric are mismatched: even with a large sample, the sampling error remains large.
Summary of the invention
To overcome the above shortcomings of the prior art, the present invention proposes a WAQM-based optimal sampling method for website accessibility detection.
To reduce the sampling error in website accessibility detection, we propose a sampling algorithm tailored to the accessibility metric. Because WAQM is a metric commonly used in the accessibility field, we propose an optimized sampling algorithm for WAQM. The algorithm greatly reduces the sampling error, improves sampling quality, and makes the selected pages better represent the accessibility of the whole website.
The WAQM-based optimal sampling method for website accessibility detection of the present invention carries out the following steps on a computer system:
1) Divide all web pages of the website to be tested into (d+1) groups by depth, with pages of the same depth placed in the same group, where d is the maximum depth of the website and the depth of the homepage is 0;
2) Construct an expectation model of the website's sampling error;
3) Given a sampling ratio r, compute the number of pages to sample in each layer by minimizing the expected sampling error;
4) According to the per-layer sample counts, randomly select the specified number of pages from each layer to form the sample;
5) Obtain the accessibility score of each page in the sample by machine detection and manual detection;
6) According to the relevant metric, use the page accessibility scores to compute the accessibility score of the whole website.
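Steps 1), 4) and 6) can be sketched in Python as follows. This is a minimal sketch, not the patented implementation: the link graph `links` (a dict mapping each page URL to its outgoing URLs), the function names, and the `score` lookup are hypothetical. The depth-weighted mean in `estimate_site_score` is an assumed aggregation built from the layer size n_i and the layer weight w_i = e^{-i} given in step 24); the actual WAQM aggregation may differ.

```python
import math
import random
from collections import deque


def group_by_depth(links, home):
    """Step 1: breadth-first search from the homepage.

    The depth of a page is the minimum number of hops from `home`;
    layers[i] lists the pages of depth i.
    """
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:          # first visit gives the minimum depth
                depth[target] = depth[page] + 1
                queue.append(target)
    layers = [[] for _ in range(max(depth.values()) + 1)]
    for page, d in depth.items():
        layers[d].append(page)
    return layers


def draw_sample(layers, counts, rng):
    """Step 4: pick counts[i] pages uniformly at random from layer i."""
    return [rng.sample(layer, k) for layer, k in zip(layers, counts)]


def estimate_site_score(layers, sample, score):
    """Step 6 (assumed aggregation): depth-weighted mean of per-layer sample means.

    Every layer must contribute at least one sampled page (n_i' > 0).
    """
    numerator = denominator = 0.0
    for i, (layer, picked) in enumerate(zip(layers, sample)):
        weight = len(layer) * math.exp(-i)   # n_i * w_i with w_i = e^{-i}
        layer_mean = sum(score[p] for p in picked) / len(picked)
        numerator += weight * layer_mean
        denominator += weight
    return numerator / denominator
```

The per-layer counts passed to `draw_sample` are the output of step 3); the page scores in `score` come from the machine and manual detection of step 5).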
The construction of the sampling error model in step 2) comprises the following steps:
21) First define the depth of a web page as the minimum number of hops needed to reach it from the homepage of the website; the depth of the homepage is 0;
22) Assume the maximum depth of the website is d (d ≥ 0). All n pages of the website can then be divided by depth into (d+1) layers, with pages of the same depth forming one layer. The numbers of pages in the layers are $n_0, n_1, n_2, n_3, \dots, n_d$, and $\sum_{i=0}^{d} n_i = n$;
23) Compute the expected sampling error of each layer. Let the numbers of pages sampled from the layers be $n_0', n_1', n_2', \dots, n_d'$. For layer i (0 ≤ i ≤ d), let $u_i$ be the mean score of all pages in the layer and $u_i'$ the mean score of the sampled pages in the layer. The expected sampling error of layer i is then $E[\mathrm{error}_i] = E[|u_i - u_i'|]$ (formula 1);
24) According to the WAQM standard, compute the expected sampling error of the whole website over all (d+1) layers (formula 2, not reproduced in this text), where $n_i$ denotes the total number of pages in layer i and $w_i$ denotes the weight of layer i, usually taken as $w_i = e^{-i}$;
25) Minimize the expected sampling error of the website (formula 3, not reproduced in this text).
The minimization of the expected sampling error in step 3), which yields the number of pages to sample in each layer, proceeds as follows:
31) Minimizing the expected sampling error of the website is equivalent to minimizing formula (4) (not reproduced in this text), which is taken as the final optimization function;
32) Let the score of each page in layer i be $p_i$. Since the pages are independent and identically distributed (IID), assume the variance within layer i is $\sigma_i = D(p_i)$ (0 ≤ i ≤ d); the transformation of formula (5) (not reproduced in this text) can then be applied;
33) Substituting formula (5) into formula (4) gives the transformed optimization function, formula (6);
34) Formula (6) can be viewed as the following combinatorial optimization problem, in which the per-layer sample counts $n_0', n_1', n_2', \dots, n_d'$ are the parameters to be solved for. Defining $N' = \{n_0', n_1', n_2', \dots, n_d'\}$, we obtain:
$$\operatorname*{argmin}_{N'} \sum_{i=0}^{d} \frac{\sigma_i}{n_i'} \, n_i \, w_i$$
$$\text{s.t.} \quad \forall\, 0 \le i \le d,\; n_i' > 0$$
$$\sum_{i=0}^{d} n_i' = r \sum_{i=0}^{d} n_i$$
35) A greedy algorithm is proposed for this combinatorial optimization problem, with the following steps:
A. Initialize the sample count of every layer, $n_0', n_1', n_2', \dots, n_d'$, to 1;
B. For each layer in turn, compute the expected error that would result from increasing that layer's sample count by 1;
C. Increase by 1 the sample count of the layer that gives the smallest expected error, leaving the other layers unchanged;
D. Repeat steps B and C until the total sample size is reached.
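The link between the IID assumption of step 32) and the objective of step 34) is a standard identity for the variance of a sample mean. The derivation below is a sketch under two assumptions that the text does not confirm: that the surrogate objective of step 31) replaces the absolute error with the squared error, and that $u_i$ is treated as the expected page score of layer i. The symbol $p_{i,k}$ for the k-th sampled page of layer i is introduced here for illustration.

```latex
% Variance of the mean of n_i' IID page scores with variance \sigma_i = D(p_i)
E\!\left[(u_i - u_i')^2\right]
  = D(u_i')
  = D\!\left(\frac{1}{n_i'} \sum_{k=1}^{n_i'} p_{i,k}\right)
  = \frac{1}{n_i'^{\,2}} \sum_{k=1}^{n_i'} D(p_{i,k})
  = \frac{\sigma_i}{n_i'}

% Weighting each layer by n_i w_i then gives the objective of step 34)
\sum_{i=0}^{d} n_i \, w_i \, E\!\left[(u_i - u_i')^2\right]
  = \sum_{i=0}^{d} \frac{\sigma_i}{n_i'} \, n_i \, w_i
```

Each extra page sampled from layer i lowers that layer's term from $\sigma_i n_i w_i / n_i'$ to $\sigma_i n_i w_i / (n_i' + 1)$, so the gain shrinks as the layer's sample grows. This is why an incremental procedure such as steps A to D is a natural fit.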
The advantages of the WAQM-based optimal sampling method for website accessibility detection proposed by the present invention are that it determines the number of pages to sample in each layer, greatly reduces the sampling error, improves sampling quality, and reduces manual cost. It can be used in website accessibility detection to select pages that better represent the accessibility of the whole website.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the present invention.
Detailed description
The present invention is further described below with reference to the accompanying drawing.
To reduce the sampling error in accessibility detection, the embodiment carries out, on a computer system, steps 1) to 6) exactly as set out in the Summary of the invention, including the error model of steps 21) to 25) and the allocation procedure of steps 31) to 35).
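The greedy procedure of steps A to D can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the objective is taken to be the sum over layers of σ_i / n_i' · n_i · w_i with w_i = e^{-i}, the per-layer variances σ_i are assumed to be known or estimated beforehand, the target sample size is taken as round(r · Σ n_i), and a layer is never asked for more pages than it holds. The function and parameter names are hypothetical.

```python
import math


def greedy_allocation(layer_sizes, variances, ratio):
    """Return per-layer sample counts n_i' chosen by greedy steps A to D.

    layer_sizes[i] is n_i, variances[i] is sigma_i, ratio is r.
    """
    layers = len(layer_sizes)
    population = sum(layer_sizes)
    # Assumed rounding rule; at least one page per layer, at most every page.
    total = min(population, max(layers, round(ratio * population)))

    # Constant part of each layer's term: sigma_i * n_i * w_i, with w_i = e^{-i}.
    coef = [variances[i] * layer_sizes[i] * math.exp(-i) for i in range(layers)]

    def expected_error(counts):
        return sum(c / k for c, k in zip(coef, counts))

    counts = [1] * layers                      # step A
    while sum(counts) < total:                 # step D
        best_layer, best_value = None, math.inf
        for i in range(layers):                # step B
            if counts[i] >= layer_sizes[i]:
                continue                       # assumption: layer already exhausted
            counts[i] += 1
            value = expected_error(counts)
            counts[i] -= 1
            if value < best_value:
                best_layer, best_value = i, value
        if best_layer is None:
            break
        counts[best_layer] += 1                # step C
    return counts
```

For example, `greedy_allocation([1, 10, 100], [1.0, 1.0, 1.0], 0.1)` allocates 11 pages in total, keeps the single homepage, and gives the largest share to the deepest and largest layer. Recomputing the full objective for every candidate layer keeps the code close to steps B and C; comparing only the per-layer reductions would give the same choices at lower cost.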
The content described in the embodiments of this specification merely enumerates forms in which the inventive concept may be realized. The scope of protection of the present invention should not be regarded as limited to the specific forms stated in the embodiments; it also extends to equivalent technical means that those skilled in the art can conceive of on the basis of the inventive concept.

Claims (3)

1. A WAQM-based optimal sampling method for website accessibility detection, carrying out the following steps on a computer system:
1) Divide all web pages of the website to be tested into d+1 groups by depth, with pages of the same depth placed in the same group, where d is the maximum depth of the website and the depth of the homepage is 0;
2) Construct an expectation model of the website's sampling error;
3) Given a sampling ratio r, compute the number of pages to sample in each layer by minimizing the expected sampling error;
4) According to the per-layer sample counts, randomly select the specified number of pages from each layer to form the sample;
5) For each page in the sample, obtain its accessibility score by machine detection and manual detection;
6) According to the accessibility metric, use the accessibility scores of the pages in the sample to estimate the accessibility score of the whole website.
2. The website accessibility detection method according to claim 1, characterized in that the construction of the expectation model of the sampling error in step 2) comprises the following steps:
21) First define the depth of a web page as the minimum number of hops needed to reach it from the homepage of the website; the depth of the homepage is 0;
22) Assume the maximum depth of the website is d (d ≥ 0). All n pages of the website can then be divided by depth into (d+1) layers, with pages of the same depth forming one layer. The numbers of pages in the layers are $n_0, n_1, n_2, n_3, \dots, n_d$, and $\sum_{i=0}^{d} n_i = n$;
23) Compute the expected sampling error of each layer. Let the numbers of pages sampled from the layers be $n_0', n_1', n_2', \dots, n_d'$. For layer i (0 ≤ i ≤ d), let $u_i$ be the mean score of all pages in the layer and $u_i'$ the mean score of the sampled pages in the layer. The expected sampling error of layer i is then $E[\mathrm{error}_i] = E[|u_i - u_i'|]$ (formula 1);
24) According to the WAQM standard, compute the expected sampling error of the whole website over all (d+1) layers (formula 2, not reproduced in this text), where $n_i$ denotes the total number of pages in layer i and $w_i$ denotes the weight of layer i, usually taken as $w_i = e^{-i}$;
25) Minimize the expected sampling error of the website (formula 3, not reproduced in this text).
3. The website accessibility detection method according to claim 1, characterized in that the computation in step 3) of the number of pages to sample in each layer, by minimizing the expected sampling error of the website, comprises the following steps:
31) Minimizing the expected sampling error of the website is equivalent to minimizing formula (4) (not reproduced in this text), which is taken as the final optimization function;
32) Let the score of each page in layer i be $p_i$. Since the pages are independent and identically distributed (IID), assume the variance within layer i is $\sigma_i = D(p_i)$ (0 ≤ i ≤ d); the transformation of formula (5) (not reproduced in this text) can then be applied;
33) Substituting formula (5) into formula (4) gives the transformed optimization function, formula (6);
34) Formula (6) can be viewed as the following combinatorial optimization problem, in which the per-layer sample counts $n_0', n_1', n_2', \dots, n_d'$ are the parameters to be solved for. Defining $N' = \{n_0', n_1', n_2', \dots, n_d'\}$, we obtain:
$$\operatorname*{argmin}_{N'} \sum_{i=0}^{d} \frac{\sigma_i}{n_i'} \, n_i \, w_i$$
$$\text{s.t.} \quad \forall\, 0 \le i \le d,\; n_i' > 0$$
$$\sum_{i=0}^{d} n_i' = r \sum_{i=0}^{d} n_i$$
35) A greedy algorithm is proposed for this combinatorial optimization problem, with the following steps:
A. Initialize the sample count of every layer, $n_0', n_1', n_2', \dots, n_d'$, to 1;
B. For each layer in turn, compute the expected error that would result from increasing that layer's sample count by 1;
C. Increase by 1 the sample count of the layer that gives the smallest expected error, leaving the other layers unchanged;
D. Repeat steps B and C until the total sample size is reached.
Priority Applications (1)

Application number: CN201610159027.4A
Priority date: 2016-03-21
Filing date: 2016-03-21
Title: WAQM-based optimal sampling method for website accessibility detection

Publications (2)

CN105824941A, published 2016-08-03
CN105824941B, published 2019-02-05

Family

Family ID: 56524762

Country Status (1)

CN: CN105824941B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant