CN105824941A - WAQM-based optimal sampling method for network barrier-free detection - Google Patents
WAQM-based optimal sampling method for network barrier-free detection
- Publication number
- CN105824941A (application CN201610159027.4A)
- Authority
- CN
- China
- Prior art keywords
- sampling
- webpage
- layer
- website
- error
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention discloses a WAQM-based optimal sampling method for network barrier-free detection. The method is implemented by means of performing the following steps on a computer system: grouping all webpages in a to-be-detected website according to different depths, and collecting the webpages with the same depth in a group; constructing an expectation model for website sampling errors; giving a sampling ratio r, and calculating a sampling webpage number of each layer by means of minimizing expectation of the sampling errors; randomly selecting a specified number of webpages from each layer according to the webpage sampling number of each layer so as to form a sampling specimen; performing machine detection and manual detection on each webpage in the specimen so as to obtain a barrier-free score of each webpage; estimating the barrier-free score of the whole website by utilization of the barrier-free scores of the sampled webpages according to a barrier-free measurement standard. The method disclosed by the invention has the advantages that the sampling error can be greatly lowered, and a barrier-free situation of the whole website can be better reflected by specimen webpages selected through a sampling algorithm.
Description
Technical field
The present invention relates to sampling methods for website accessibility (barrier-free) detection, and in particular to a WAQM-based sampling method for website accessibility detection.
Background technology
According to the Second China National Sample Survey on Disability, China has 82.96 million people with disabilities of all kinds, involving a family population of 260 million. A growing number of people with disabilities use the Internet to obtain information, find entertainment, and make friends, and the Internet has become an important part of their daily lives. Because of their own impairments, and because most websites present barriers to them, people with disabilities face great difficulty in obtaining, using, and interacting with Internet information services. How to efficiently find webpages that present barriers and quickly assess the accessibility of a website has therefore become an important research topic in the field of information accessibility.
In practice, website accessibility detection cannot be fully automated by machine, and part of the detection requires manual intervention. At the same time, a website typically contains a very large number of webpages, so to reduce the cost of manual detection it is necessary to sample the website first and then run detection on the sample.
The sampling algorithms currently used in website accessibility detection are all general-purpose; that is, they are not optimized for a specific accessibility metric. Existing research shows, however, that the sampling error caused by a sampling algorithm depends to a large extent on the accessibility metric selected. Some data show that, even when the sampling ratio is large, some sampling algorithms still produce a sampling error of 20%, which indicates that the sampling algorithm and the accessibility metric are mismatched: even with a large sample, the resulting sampling error remains large.
Summary of the invention
To overcome the above disadvantages of the prior art, the present invention proposes a WAQM-based optimal sampling method for website accessibility detection.
To reduce the sampling error in website accessibility detection, a sampling algorithm tailored to the accessibility metric is proposed. Because WAQM is a metric commonly used in the accessibility field, the invention provides a sampling algorithm optimized for WAQM. The algorithm greatly reduces sampling error and improves sampling quality, so that the sampled webpages better represent the accessibility of the whole website.
The WAQM-based optimal sampling method for website accessibility detection of the present invention performs the following steps on a computer system:
1) Divide all webpages of the website to be detected into (d+1) groups by depth, with webpages of the same depth placed in the same group, where d is the maximum depth of the website and the depth of the homepage is 0;
2) Construct an expectation model of the website's sampling error;
3) Given a sampling ratio r, calculate the number of webpages to sample in each layer by minimizing the expected sampling error;
4) According to the per-layer sampling numbers, randomly select the specified number of webpages from each layer to form the sample;
5) Obtain the accessibility score of each webpage in the sample by machine and manual detection;
6) Calculate the accessibility score of the whole website from the webpage accessibility scores according to the relevant metric.
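As an illustration of step 1, the grouping by depth can be sketched with a breadth-first search over the site's link graph. The `links` dictionary, the function name, and the toy site below are hypothetical; the patent does not prescribe a crawler or a data structure.

```python
from collections import deque


def group_pages_by_depth(links, homepage):
    """Group pages into layers by minimum hop count from the homepage (BFS).

    `links` maps each page URL to the list of URLs it links to (hypothetical input).
    Returns a list whose index i holds the pages at depth i.
    """
    depth = {homepage: 0}
    queue = deque([homepage])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:  # the first visit in BFS gives the minimum hop count
                depth[target] = depth[page] + 1
                queue.append(target)
    layers = [[] for _ in range(max(depth.values()) + 1)]
    for page, d in depth.items():
        layers[d].append(page)
    return layers


# Hypothetical toy site: the homepage links to /a and /b, which both link to /c.
toy_links = {"/": ["/a", "/b"], "/a": ["/c", "/"], "/b": ["/c"], "/c": []}
layers = group_pages_by_depth(toy_links, "/")
print(layers)  # [['/'], ['/a', '/b'], ['/c']]
```

Here d = 2, so the toy site has d + 1 = 3 layers, with the homepage alone in layer 0.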
The construction of the sampling error model in step 2 proceeds as follows:
21) First, define the depth of a webpage as the minimum number of hops needed to reach it from the homepage of the website, with the depth of the homepage set to 0;
22) Assume the maximum depth of the website is d (d ≥ 0). By depth, all n webpages of the website can then be divided into (d+1) layers, with webpages of the same depth forming one layer. The numbers of webpages in the layers are $n_0, n_1, n_2, n_3, \dots, n_d$, and $\sum_{i=0}^{d} n_i = n$;
23) Calculate the expected sampling error of each layer. Assume the numbers of webpages that the sampling algorithm draws from the layers are $n_0', n_1', n_2', \dots, n_d'$. In layer i (0 ≤ i ≤ d), let $u_i$ be the mean score of all webpages in the layer and $u_i'$ the mean score of the sampled webpages in the layer; the expected sampling error of layer i is then $E[error_i] = E[|u_i - u_i'|]$ (formula 1);
24) According to the WAQM standard, calculate the expected sampling error of the whole website over all (d+1) layers (formula 2), where $n_i$ is the total number of webpages in layer i and $w_i$ is the weight of layer i, generally taken as $w_i = e^{-i}$;
25) Minimize the expected sampling error of the website (formula 3).
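Formula 1 and the layer weights of step 24 can be illustrated numerically. The sketch below estimates $E[|u_i - u_i'|]$ for a single layer by repeated random sampling and computes the weights $w_i = e^{-i}$. The score list, the trial count, and the function names are assumptions made for illustration, and the sketch does not attempt to reproduce formula 2.

```python
import math
import random


def layer_weights(d):
    """Layer weights w_i = e^{-i} for depths 0..d, as stated in step 24."""
    return [math.exp(-i) for i in range(d + 1)]


def expected_layer_error(scores, sample_size, trials=2000, seed=0):
    """Monte Carlo estimate of E[|u_i - u_i'|] for one layer (formula 1).

    `scores` are the accessibility scores of all pages in the layer (hypothetical data).
    """
    rng = random.Random(seed)
    true_mean = sum(scores) / len(scores)
    total = 0.0
    for _ in range(trials):
        sample = rng.sample(scores, sample_size)  # simple random sample without replacement
        total += abs(true_mean - sum(sample) / sample_size)
    return total / trials


scores = list(range(0, 100, 5))  # 20 hypothetical page scores in one layer
print(layer_weights(3))
print(expected_layer_error(scores, 2), expected_layer_error(scores, 10))
```

The estimated error shrinks as more pages are drawn from the layer, which is the effect the per-layer sampling numbers are chosen to exploit.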
Minimizing the expected sampling error of the website in step 3 to obtain the per-layer sampling numbers proceeds as follows:
31) Minimizing the expected sampling error of the website is equivalent to minimizing formula (4), which is taken as the final optimization function;
32) Let the score of each webpage be $p_i$. Since all webpages are independent and identically distributed (i.i.d.), assume the variance of the scores in layer i is $\sigma_i$ (0 ≤ i ≤ d); the transformation of formula (5) then follows;
33) Substituting formula (5) into formula (4) gives the transformed optimization function, formula (6);
34) Formula (6) can be viewed as a combinatorial optimization problem in which the per-layer sampling numbers $n_0', n_1', n_2', \dots, n_d'$ are the parameters to be solved. Defining $N' = \{n_0', n_1', n_2', \dots, n_d'\}$, the problem is subject to the constraint:
$\sum_{i=0}^{d} n_i' = r \cdot \sum_{i=0}^{d} n_i$
35) A greedy algorithm is proposed for this combinatorial optimization problem, with the following steps:
a. Initialize the sampling number of every layer, $n_0', n_1', n_2', \dots, n_d'$, to 1;
b. For each layer in turn, calculate the expected error that results from adding 1 to that layer's sampling number;
c. Add 1 to the sampling number of the layer that yields the smallest expected error, leaving the other layers unchanged;
d. Repeat steps b and c until the total sample size is reached.
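The greedy procedure of step 35 might be sketched as follows. The objective used here, $\sum_i w_i n_i \sqrt{\sigma_i / n_i'}$ with $w_i = e^{-i}$, is an assumed stand-in for formula (6), which is not reproduced in this text. The guard that keeps $n_i'$ from exceeding $n_i$ and the rounding of the budget are likewise additions for the sketch, and all function and variable names are hypothetical.

```python
import math


def expected_error(sizes, samples, variances):
    """Assumed objective: sum_i w_i * n_i * sqrt(sigma_i / n_i'), with w_i = e^{-i}."""
    return sum(
        math.exp(-i) * n * math.sqrt(var / m)
        for i, (n, m, var) in enumerate(zip(sizes, samples, variances))
    )


def greedy_allocation(sizes, variances, ratio):
    """Greedy per-layer allocation following steps a-d of step 35."""
    # Total sample size r * sum(n_i); at least one page per layer (assumption).
    budget = max(len(sizes), round(ratio * sum(sizes)))
    samples = [1] * len(sizes)  # step a: one page per layer
    while sum(samples) < budget:  # step d: repeat until the budget is reached
        best_layer, best_error = None, math.inf
        for i in range(len(sizes)):  # step b: try adding one page to each layer
            if samples[i] >= sizes[i]:
                continue  # added guard: the layer has no unsampled pages left
            trial = samples.copy()
            trial[i] += 1
            err = expected_error(sizes, trial, variances)
            if err < best_error:
                best_layer, best_error = i, err
        if best_layer is None:
            break  # every layer is fully sampled
        samples[best_layer] += 1  # step c: keep the best single increment
    return samples


# Hypothetical site: 1000 pages over four layers, with assumed score variances.
allocation = greedy_allocation([1, 20, 200, 779], [0.0, 4.0, 9.0, 9.0], ratio=0.05)
print(allocation, sum(allocation))
```

Each iteration evaluates d + 1 candidate increments, so the sketch runs in time proportional to the budget times the square of the number of layers, which is small for typical site depths.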
The advantages of the WAQM-based optimal sampling method for website accessibility detection proposed by the present invention are as follows: the method determines the sampling number for each layer, greatly reduces sampling error, improves sampling quality, and lowers manual cost. It can be used in website accessibility detection to select webpages that better represent the accessibility of the whole website.
Brief description of the drawings
Fig. 1 is the method flow diagram of the present invention.
Detailed description of the invention
Referring to the drawings, the present invention is further illustrated:
To reduce the sampling error in accessibility detection, the WAQM-based optimal sampling method for website accessibility detection of the present invention performs the following steps on a computer system:
1) Divide all webpages of the website to be detected into (d+1) groups by depth, with webpages of the same depth placed in the same group, where d is the maximum depth of the website and the depth of the homepage is 0;
2) Construct an expectation model of the website's sampling error;
3) Given a sampling ratio r, calculate the number of webpages to sample in each layer by minimizing the expected sampling error;
4) According to the per-layer sampling numbers, randomly select the specified number of webpages from each layer to form the sample;
5) Obtain the accessibility score of each webpage in the sample by machine and manual detection;
6) Calculate the accessibility score of the whole website from the webpage accessibility scores according to the accessibility metric.
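Steps 4 and 6 can be sketched as below. The aggregation in `estimate_site_score`, a mean of the per-layer sample means weighted by $w_i n_i$ with $w_i = e^{-i}$, is an assumption chosen for illustration; the exact WAQM aggregation is not spelled out in this text. The function names and the example data are hypothetical.

```python
import math
import random


def draw_sample(layers, counts, seed=None):
    """Step 4: randomly pick counts[i] pages from layer i, without replacement."""
    rng = random.Random(seed)
    return [rng.sample(pages, k) for pages, k in zip(layers, counts)]


def estimate_site_score(sampled_scores, layer_sizes):
    """Step 6 (assumed form): weighted mean of per-layer sample means.

    Layer i contributes with weight w_i * n_i, where w_i = e^{-i}.
    """
    numerator = denominator = 0.0
    for i, (scores, n) in enumerate(zip(sampled_scores, layer_sizes)):
        weight = math.exp(-i) * n
        numerator += weight * (sum(scores) / len(scores))
        denominator += weight
    return numerator / denominator


layers = [["/"], ["/a", "/b", "/c"]]          # hypothetical pages grouped by depth
sample = draw_sample(layers, [1, 2], seed=1)  # per-layer sampling numbers from step 3
site_score = estimate_site_score([[80.0], [60.0, 60.0]], [1, 3])
print(sample, site_score)
```

The scores passed to `estimate_site_score` stand in for the results of the machine and manual detection of step 5.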
The construction of the sampling error model in step 2 proceeds as follows:
21) First, define the depth of a webpage as the minimum number of hops needed to reach it from the homepage of the website, with the depth of the homepage set to 0;
22) Assume the maximum depth of the website is d (d ≥ 0). By depth, all n webpages of the website can then be divided into (d+1) layers, with webpages of the same depth forming one layer. The numbers of webpages in the layers are $n_0, n_1, n_2, n_3, \dots, n_d$, and $\sum_{i=0}^{d} n_i = n$;
23) Calculate the expected sampling error of each layer. Assume the numbers of webpages that the sampling algorithm draws from the layers are $n_0', n_1', n_2', \dots, n_d'$. In layer i (0 ≤ i ≤ d), let $u_i$ be the mean score of all webpages in the layer and $u_i'$ the mean score of the sampled webpages in the layer; the expected sampling error of layer i is then $E[error_i] = E[|u_i - u_i'|]$ (formula 1);
24) According to the WAQM standard, calculate the expected sampling error of the whole website over all (d+1) layers (formula 2), where $n_i$ is the total number of webpages in layer i and $w_i$ is the weight of layer i, generally taken as $w_i = e^{-i}$;
25) Minimize the expected sampling error of the website (formula 3).
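One standard way to relate the per-layer error of formula 1 to the variance of the page scores is sketched below. It assumes the sampled scores in layer i are i.i.d. with mean $u_i$ and variance $\sigma_i$, and it ignores any finite-population correction; it is offered as background and is not necessarily the transformation used in formulas (4) and (5).

```latex
\begin{aligned}
E\big[\,|u_i - u_i'|\,\big]
  &\le \sqrt{E\big[(u_i - u_i')^2\big]}
      && \text{(Jensen's inequality)} \\
  &= \sqrt{D(u_i')}
      && \text{(since } E[u_i'] = u_i \text{)} \\
  &= \sqrt{\sigma_i / n_i'}
      && \text{(mean of } n_i' \text{ i.i.d. scores)}
\end{aligned}
```

Under these assumptions the bound on a layer's error falls as $1/\sqrt{n_i'}$, so each additional page sampled from a layer helps less than the previous one.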
Minimizing the expected sampling error of the website in step 3 to obtain the per-layer sampling numbers proceeds as follows:
31) Minimizing the expected sampling error of the website is equivalent to minimizing formula (4), which is taken as the final optimization function;
32) Let the score of each webpage be $p_i$. Since all webpages are independent and identically distributed (i.i.d.), assume the variance of the scores in layer i is $\sigma_i$ (0 ≤ i ≤ d); the transformation of formula (5) then follows;
33) Substituting formula (5) into formula (4) gives the transformed optimization function, formula (6);
34) Formula (6) can be viewed as a combinatorial optimization problem in which the per-layer sampling numbers $n_0', n_1', n_2', \dots, n_d'$ are the parameters to be solved. Defining $N' = \{n_0', n_1', n_2', \dots, n_d'\}$, the problem is subject to the constraint:
$\sum_{i=0}^{d} n_i' = r \cdot \sum_{i=0}^{d} n_i$
35) A greedy algorithm is proposed for this combinatorial optimization problem, with the following steps:
a. Initialize the sampling number of every layer, $n_0', n_1', n_2', \dots, n_d'$, to 1;
b. For each layer in turn, calculate the expected error that results from adding 1 to that layer's sampling number;
c. Add 1 to the sampling number of the layer that yields the smallest expected error, leaving the other layers unchanged;
d. Repeat steps b and c until the total sample size is reached.
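Whether a greedy pass of this kind reaches the best allocation depends on the objective. As a hedged check, the sketch below compares steps a to d with an exhaustive search on a tiny hypothetical site, using an assumed separable objective $\sum_i w_i n_i \sqrt{\sigma_i / n_i'}$ in place of formula (6). For an objective of this shape, each layer's term is convex in $n_i'$, so single-step increments are expected to reach the optimum. All names and numbers are illustrative.

```python
import itertools
import math


def objective(sizes, samples, variances):
    """Assumed objective: sum_i e^{-i} * n_i * sqrt(sigma_i / n_i')."""
    return sum(
        math.exp(-i) * n * math.sqrt(v / m)
        for i, (n, m, v) in enumerate(zip(sizes, samples, variances))
    )


def greedy(sizes, variances, budget):
    """Steps a-d: start at one page per layer, then add pages one at a time."""
    samples = [1] * len(sizes)
    while sum(samples) < budget:
        # Only layers with unsampled pages left are candidates (added guard).
        candidates = [i for i in range(len(sizes)) if samples[i] < sizes[i]]
        best = min(
            candidates,
            key=lambda i: objective(
                sizes, samples[:i] + [samples[i] + 1] + samples[i + 1:], variances
            ),
        )
        samples[best] += 1
    return samples


def brute_force(sizes, variances, budget):
    """Enumerate every feasible allocation and return the one with the lowest objective."""
    ranges = [range(1, n + 1) for n in sizes]
    feasible = (list(c) for c in itertools.product(*ranges) if sum(c) == budget)
    return min(feasible, key=lambda c: objective(sizes, c, variances))


sizes, variances, budget = [1, 6, 12], [0.0, 4.0, 9.0], 8  # hypothetical tiny site
g = greedy(sizes, variances, budget)
b = brute_force(sizes, variances, budget)
print(g, b)
```

The exhaustive search is only feasible for very small sites; it serves here as a reference against which the greedy result can be compared.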
The content described in the embodiments of this specification merely enumerates forms in which the inventive concept may be realized. The scope of protection of the present invention should not be construed as limited to the specific forms set out in the embodiments; it also extends to equivalent technical means that those skilled in the art can conceive of according to the inventive concept.
Claims (3)
1. A WAQM-based optimal sampling method for website accessibility detection, performing the following steps on a computer system:
1) Divide all webpages of the website to be detected into d+1 groups by depth, with webpages of the same depth placed in the same group, where d is the maximum depth of the website and the depth of the homepage is 0;
2) Construct an expectation model of the website's sampling error;
3) Given a sampling ratio r, calculate the number of webpages to sample in each layer by minimizing the expected sampling error;
4) According to the per-layer sampling numbers, randomly select the specified number of webpages from each layer to form the sample;
5) For each webpage in the sample, obtain its accessibility score by machine and manual detection;
6) According to the accessibility metric, estimate the accessibility score of the whole website from the accessibility scores of the webpages in the sample.
2. The website accessibility detection method as claimed in claim 1, characterized in that the expectation model of the sampling error in step 2 is constructed as follows:
21) First, define the depth of a webpage as the minimum number of hops needed to reach it from the homepage of the website, with the depth of the homepage set to 0;
22) Assume the maximum depth of the website is d (d ≥ 0). By depth, all n webpages of the website can then be divided into (d+1) layers, with webpages of the same depth forming one layer. The numbers of webpages in the layers are $n_0, n_1, n_2, n_3, \dots, n_d$, and $\sum_{i=0}^{d} n_i = n$;
23) Calculate the expected sampling error of each layer. Assume the numbers of webpages that the sampling algorithm draws from the layers are $n_0', n_1', n_2', \dots, n_d'$. In layer i (0 ≤ i ≤ d), let $u_i$ be the mean score of all webpages in the layer and $u_i'$ the mean score of the sampled webpages in the layer; the expected sampling error of layer i is then $E[error_i] = E[|u_i - u_i'|]$ (formula 1);
24) According to the WAQM standard, calculate the expected sampling error of the whole website over all (d+1) layers (formula 2), where $n_i$ is the total number of webpages in layer i and $w_i$ is the weight of layer i, generally taken as $w_i = e^{-i}$;
25) Minimize the expected sampling error of the website (formula 3).
3. The website accessibility detection method as claimed in claim 1, characterized in that the per-layer sampling numbers are obtained by minimizing the expected sampling error of the website in step 3, with the following specific steps:
31) Minimizing the expected sampling error of the website is equivalent to minimizing formula (4), which is taken as the final optimization function;
32) Let the score of each webpage be $p_i$. Since all webpages are independent and identically distributed (i.i.d.), assume the variance of the scores in layer i is $\sigma_i = D(p_i)$ (0 ≤ i ≤ d); the transformation of formula (5) then follows;
33) Substituting formula (5) into formula (4) gives the transformed optimization function, formula (6);
34) Formula (6) can be viewed as a combinatorial optimization problem in which the per-layer sampling numbers $n_0', n_1', n_2', \dots, n_d'$ are the parameters to be solved. Defining $N' = \{n_0', n_1', n_2', \dots, n_d'\}$, the problem is subject to the constraint:
$\sum_{i=0}^{d} n_i' = r \cdot \sum_{i=0}^{d} n_i$
35) A greedy algorithm is proposed for this combinatorial optimization problem, with the following steps:
a. Initialize the sampling number of every layer, $n_0', n_1', n_2', \dots, n_d'$, to 1;
b. For each layer in turn, calculate the expected error that results from adding 1 to that layer's sampling number;
c. Add 1 to the sampling number of the layer that yields the smallest expected error, leaving the other layers unchanged;
d. Repeat steps b and c until the total sample size is reached.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610159027.4A CN105824941B (en) | 2016-03-21 | 2016-03-21 | WAQM-based optimal sampling method for website accessibility detection |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610159027.4A CN105824941B (en) | 2016-03-21 | 2016-03-21 | WAQM-based optimal sampling method for website accessibility detection |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105824941A (en) | 2016-08-03 |
CN105824941B (en) | 2019-02-05 |
Family
ID=56524762
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610159027.4A Active CN105824941B (en) | 2016-03-21 | 2016-03-21 | WAQM-based optimal sampling method for website accessibility detection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105824941B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108874883A (en) * | 2018-05-07 | 2018-11-23 | 浙江大学 | A kind of accessible detection method of webpage information based on user experience partial ordering relation |
CN108923951A (en) * | 2018-05-07 | 2018-11-30 | 浙江大学 | A kind of method for allocating tasks of the accessible detection system in website based on crowdsourcing |
CN108960274A (en) * | 2018-05-07 | 2018-12-07 | 浙江大学 | A kind of Active Learning Method for the accessible inspection assessment of webpage information |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030005042A1 (en) * | 2001-07-02 | 2003-01-02 | Magnus Karlsson | Method and system for detecting aborted connections and modified documents from web server logs |
CN103279548A (en) * | 2013-06-06 | 2013-09-04 | 浙江大学 | Method for performing barrier-free detection on websites |
CN103823753A (en) * | 2014-01-22 | 2014-05-28 | 浙江大学 | Webpage sampling method oriented at barrier-free webpage content detection |
Non-Patent Citations (4)
Title |
---|
GIORGIO BRAJNIK et al.: "Effects of sampling methods on web accessibility evaluations", 《HUMAN-COMPUTER INTERACTION》 * |
MENG-NI ZHANG et al.: "A sampling method based on URL clustering for fast web accessibility evaluation", 《FRONT INFORM TECHNOL ELECTRON ENG》 * |
周宇: "基于抽样和模板的网站无障碍检测方法", 《万方数据》 * |
高岩: "基于多元分析的多变量事后分层抽样方案设计", 《统计与决策》 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108874883A (en) * | 2018-05-07 | 2018-11-23 | 浙江大学 | A kind of accessible detection method of webpage information based on user experience partial ordering relation |
CN108923951A (en) * | 2018-05-07 | 2018-11-30 | 浙江大学 | A kind of method for allocating tasks of the accessible detection system in website based on crowdsourcing |
CN108960274A (en) * | 2018-05-07 | 2018-12-07 | 浙江大学 | A kind of Active Learning Method for the accessible inspection assessment of webpage information |
CN108923951B (en) * | 2018-05-07 | 2020-08-04 | 浙江大学 | Crowdsourcing-based task allocation method for website barrier-free detection system |
CN108874883B (en) * | 2018-05-07 | 2021-08-17 | 浙江大学 | User experience partial order relation-based webpage information barrier-free detection method |
Also Published As
Publication number | Publication date |
---|---|
CN105824941B (en) | 2019-02-05 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN102521592B (en) | Multi-feature fusion salient region extracting method based on non-clear region inhibition | |
CN106326923B (en) | A kind of position data clustering method of registering taking position into account and repeating with density peaks point | |
CN109359137B (en) | User growth portrait construction method based on feature screening and semi-supervised learning | |
CN105824941A (en) | WAQM-based optimal sampling method for network barrier-free detection | |
CN111128398A (en) | Epidemic disease infected person number estimation method based on population migration big data | |
Yin et al. | Improved nonparametric estimation of the optimal diagnostic cut‐off point associated with the Youden index under different sampling schemes | |
CN105678590A (en) | topN recommendation method for social network based on cloud model | |
CN113889252B (en) | Remote internet big data intelligent medical system based on vital sign big data clustering core algorithm and block chain | |
WO2022170933A1 (en) | Error correction method and apparatus for laser ranging, electronic device, and storage medium | |
CN105574265B (en) | Entire assembly model quantitative description towards model index | |
CN109239553A (en) | A kind of clustering method based on local density of partial discharge pulse | |
CN112149922A (en) | Method for predicting severity of accident in exit and entrance area of down-link of highway tunnel | |
KR20090131014A (en) | Method for evaluating technology and service and forming service-oriented technology roadmap on the basis of patent information | |
CN116226103A (en) | Method for detecting government data quality based on FPGrow algorithm | |
CN111340058A (en) | Multi-source data fusion-based traffic distribution model parameter rapid checking method | |
CN104794896B (en) | Overpass congestion spacial hot spots extraction method based on lift height-limiting frame | |
CN102930532A (en) | Markov random field (MRF) iteration-based synthetic aperture radar (SAR) unsupervised change detection method and device | |
CN113139337B (en) | Partition interpolation processing method and device for lake topography simulation | |
CN109543236A (en) | Method is determined based on the rock structural plane roughness statistical sample number of variation lines several levels score analysis | |
CN111460796B (en) | Accidental sensitive word discovery method based on word network | |
CN104715160A (en) | Soft measurement modeling data outlier detecting method based on KMDB | |
CN106611339B (en) | Seed user screening method, and product user influence evaluation method and device | |
CN115508615B (en) | Load transient characteristic extraction method based on induction motor | |
Pérez-Hornero et al. | An annual JCR impact factor calculation based on Bayesian credibility formulas | |
CN112330225B (en) | Method, server and medium for obtaining influence degree of line loss influence factor through server |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |