CN105824941B - A WAQM-based optimal sampling method for website accessibility detection - Google Patents
A WAQM-based optimal sampling method for website accessibility detection
- Publication number
- CN105824941B (application CN201610159027.4A)
- Authority
- CN
- China
- Prior art keywords
- sampling
- webpage
- layer
- website
- accessible
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Between Computers (AREA)
Abstract
A WAQM-based optimal sampling method for website accessibility detection, carried out on a computer system with the following steps: all webpages of the website to be detected are grouped by depth, with pages of the same depth forming one group; an expectation model of the website's sampling error is constructed; given a sampling ratio r, the number of pages to sample in each layer is calculated by minimizing the expected sampling error; according to these per-layer sample sizes, the specified number of pages is selected at random from each layer to form the sample; for each page in the sample, its accessibility score is obtained by machine and manual detection; and, according to the accessibility metric, the accessibility score of the whole website is estimated from the scores of the sampled pages. The advantage of the method is that it greatly reduces the sampling error, so that the pages chosen by the sampling algorithm better reflect the accessibility of the entire website.
Description
Technical field
The present invention relates to the technical field of sampling methods for website accessibility detection, and in particular to a WAQM-based sampling method for website accessibility detection.
Background art
According to the Second China National Sample Survey on Disability, China has 82.96 million people with disabilities of all kinds, and about 260 million people live in their households. More and more disabled people use the Internet to obtain information, for entertainment and to make friends, and the Internet has become an important part of their daily lives. Because of their impairments, and because most websites present obstacles to them, disabled people face great difficulties in acquiring, using and interacting with Internet information services. How to efficiently find the webpages that present obstacles to disabled people, and how to rapidly evaluate the accessibility of a website, have therefore become important research topics in the field of information accessibility.
In practice, website accessibility detection cannot be fully automated: part of the checking requires manual intervention. At the same time, websites generally contain a very large number of pages, so to reduce the cost of manual checking it is necessary to sample the website's pages and check only the sample.
The sampling algorithms currently used in website accessibility detection are all general-purpose, i.e., they are not optimized for any specific accessibility metric. Existing research, however, shows that the sampling error a sampling algorithm causes in accessibility detection depends largely on the accessibility metric chosen. Some data show that, even with a very large sampling ratio, certain sampling algorithms still produce a sampling error of 20%. This indicates a mismatch between the sampling algorithm and the accessibility metric: even when a very large sample is selected, the resulting sampling error remains large.
Summary of the invention
The present invention overcomes the above shortcomings of the prior art by proposing a WAQM-based optimal sampling method for website accessibility detection.
To reduce the sampling error in website accessibility detection, we propose a sampling algorithm tailored to the accessibility metric. Since WAQM (Web Accessibility Quantitative Metric) is currently a commonly used accessibility metric in the field, we propose an optimized sampling algorithm for WAQM. The algorithm greatly reduces sampling error and improves sampling quality, so that the pages selected by the sampling algorithm better represent the accessibility of the entire website.
The WAQM-based optimal sampling method for website accessibility detection of the present invention performs the following steps on a computer system:
1) divide all webpages of the website to be detected into (d+1) groups by depth, gathering pages of the same depth into one group, where d is the maximum depth of the website and the homepage has depth 0;
2) construct the expectation model of the website's sampling error;
3) given a sampling ratio r, calculate the number of pages to sample in each layer by minimizing the expected sampling error;
4) according to the per-layer sample sizes, randomly select the specified number of pages in each layer to form the sample;
5) obtain the accessibility score of each page in the sample by machine and manual detection;
6) according to the relevant metric, calculate the accessibility score of the entire website from the page scores.
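Steps 1, 4 and 6 can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the patented implementation: the site is assumed to be available as an in-memory link graph (a dict mapping each URL to the URLs it links to), page depth is computed by breadth-first search as the minimum number of hops from the homepage (the definition given in step 21), and the per-layer sample sizes are taken as already decided in step 3. The site-level estimate for step 6 assumes that each layer's sample mean is weighted by $w_i n_i$ with $w_i = e^{-i}$, mirroring the weights of step 24; the exact WAQM aggregation may differ. The function names `group_by_depth`, `sample_layers` and `estimate_site_score` are hypothetical.

```python
import math
import random
from collections import deque


def group_by_depth(links, homepage):
    """Step 1: group pages into layers by depth (minimum hops from the homepage).

    `links` is a hypothetical in-memory link graph {url: [linked urls]}.
    Pages not reachable from the homepage are ignored.
    """
    depth = {homepage: 0}
    queue = deque([homepage])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:  # first BFS visit = shortest hop count
                depth[target] = depth[page] + 1
                queue.append(target)
    d = max(depth.values())
    layers = [[] for _ in range(d + 1)]  # d + 1 layers, homepage in layer 0
    for page, i in depth.items():
        layers[i].append(page)
    return layers


def sample_layers(layers, sizes, rng=random):
    """Step 4: draw sizes[i] pages uniformly at random, without replacement, from layer i."""
    return [rng.sample(layer, k) for layer, k in zip(layers, sizes)]


def estimate_site_score(sampled_scores, layer_sizes):
    """Step 6 (assumed aggregation): weight each layer's sample mean by w_i * n_i, w_i = e^{-i}."""
    num = den = 0.0
    for i, (scores, n_i) in enumerate(zip(sampled_scores, layer_sizes)):
        w_i = math.exp(-i)
        num += w_i * n_i * (sum(scores) / len(scores))
        den += w_i * n_i
    return num / den
```

For example, with `links = {"/": ["/a", "/b"], "/a": ["/a/1", "/b"], "/b": ["/b/1"]}`, `group_by_depth(links, "/")` returns `[["/"], ["/a", "/b"], ["/a/1", "/b/1"]]`, i.e. d = 2 and three layers. Pages that cannot be reached from the homepage are not assigned a depth in this sketch.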
The sampling error model of step 2 is constructed as follows:
21) first define the depth of a webpage as the minimum number of hops needed to reach it from the website's homepage; the homepage has depth 0;
22) assume the maximum depth of the website is d (d ≥ 0). All n pages of the website can then be divided by depth into (d+1) layers, pages of the same depth forming one layer. The numbers of pages in the layers are $n_0, n_1, n_2, \ldots, n_d$, and $\sum_{i=0}^{d} n_i = n$;
23) compute the expected sampling error of each layer. Suppose the sampling algorithm selects $n_0', n_1', n_2', \ldots, n_d'$ pages from the respective layers. In layer i (0 ≤ i ≤ d), let $u_i$ be the mean score of all pages in the layer and $u_i'$ the mean score of the sampled pages. The expected sampling error of layer i is then
$E[\mathrm{error}_i] = E\left[\,\lvert u_i - u_i' \rvert\,\right]$ (formula 1);
24) according to the WAQM standard, compute the expected sampling error of the whole website over all (d+1) layers, where $n_i$ denotes the total number of pages in layer i and $w_i$ the weight of layer i, usually $w_i = e^{-i}$;
25) minimize the expected sampling error of the website.
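Formula 1 can be checked empirically when the score of every page in a layer is known, for example on a fully evaluated reference site. The Python sketch below estimates $E[\lvert u_i - u_i' \rvert]$ for one layer by Monte Carlo simulation, repeatedly drawing $n_i'$ pages without replacement, and also computes the depth weights $w_i = e^{-i}$ of step 24. The function names, the default of 10,000 trials and the fixed seed are hypothetical choices made for illustration.

```python
import math
import random


def layer_weights(d):
    """Depth weights w_i = e^{-i} for layers i = 0..d (step 24)."""
    return [math.exp(-i) for i in range(d + 1)]


def expected_layer_error(scores, k, trials=10_000, seed=0):
    """Monte Carlo estimate of E[|u_i - u_i'|] (formula 1) for one layer.

    `scores` holds the accessibility score of every page in the layer and
    `k` is the layer's sample size n_i'. Pages are drawn without replacement.
    """
    rng = random.Random(seed)
    u = sum(scores) / len(scores)  # true layer mean u_i
    total = 0.0
    for _ in range(trials):
        sample = rng.sample(scores, k)
        total += abs(u - sum(sample) / k)  # |u_i - u_i'| for this draw
    return total / trials
```

Drawing the whole layer ($n_i' = n_i$) gives zero error, and larger samples generally give smaller errors; the allocation in step 3 trades this reduction off across layers.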
The minimization of the website's expected sampling error in step 3, which yields the number of pages to sample in each layer, is carried out as follows:
31) minimizing the expected sampling error of the website is equivalent to minimizing the expression of formula (4), which is therefore used as the final optimization function;
32) let the score of each page be $p_i$. Since all pages are independent and identically distributed (IID), assume that the pages of layer i share the variance $\sigma_i$ (0 ≤ i ≤ d); the following transformation, formula (5), can then be made;
33) substituting formula (5) into formula (4) gives the transformed optimization function, formula (6);
34) formula (6) can be treated as the following combinatorial optimization problem, in which the per-layer sample sizes $n_0', n_1', n_2', \ldots, n_d'$ are the parameters to be solved. Defining $N' = \{n_0', n_1', n_2', \ldots, n_d'\}$, the objective of formula (6) is minimized over $N'$ subject to
$\sum_{i=0}^{d} n_i' = r \sum_{i=0}^{d} n_i$
35) the following greedy algorithm is proposed for this combinatorial optimization problem:
A. initially set the sample size of every layer, $n_0', n_1', n_2', \ldots, n_d'$, to 1;
B. for each layer, compute the expected error that results if that layer's sample size is increased by 1;
C. increase by 1 the sample size of the layer whose increment yields the smallest expected error, leaving the other layers unchanged;
D. repeat steps B and C until the total sample size is reached.
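The greedy procedure of step 35 can be sketched in Python as follows. The per-layer error term used here is an assumption, not formula (6) itself: it takes $w_i n_i$ times the standard error of a simple-random-sample mean drawn without replacement, $w_i n_i \sqrt{\frac{\sigma_i}{n_i'} \cdot \frac{n_i - n_i'}{n_i - 1}}$, with $\sigma_i$ the layer variance as in step 32, and any other per-layer error function can be passed in instead. Steps B and C are interpreted as choosing the increment that gives the smallest total expected error, which is the same as choosing the largest reduction. The total budget is taken as $\mathrm{round}(r \sum n_i)$, and no layer is sampled beyond its size. The names `layer_error` and `greedy_allocation` are hypothetical.

```python
import math


def layer_error(w, n, var, k):
    """Assumed per-layer term: w_i * n_i * SE(u_i') for k = n_i' pages drawn without replacement."""
    if n <= 1 or k >= n:
        return 0.0  # whole layer sampled: no sampling error
    return w * n * math.sqrt(var / k * (n - k) / (n - 1))


def greedy_allocation(sizes, variances, r, error=layer_error):
    """Step 35: start with one page per layer, then repeatedly add one page to the
    layer whose increment lowers the total expected error the most."""
    weights = [math.exp(-i) for i in range(len(sizes))]  # w_i = e^{-i}
    budget = round(r * sum(sizes))  # sum n_i' = r * sum n_i
    if budget < len(sizes):
        raise ValueError("budget must allow at least one page per layer")
    alloc = [1] * len(sizes)  # step A: n_i' = 1 for every layer
    while sum(alloc) < budget:
        best, best_gain = None, -1.0
        for i, (n, var) in enumerate(zip(sizes, variances)):
            if alloc[i] >= n:
                continue  # layer already fully sampled
            # Step B: error reduction if layer i gets one more page.
            gain = (error(weights[i], n, var, alloc[i])
                    - error(weights[i], n, var, alloc[i] + 1))
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:
            break  # every layer is fully sampled
        alloc[best] += 1  # step C: other layers unchanged
    return alloc
```

For instance, `greedy_allocation([1, 10, 100], [0.04, 0.04, 0.04], 0.1)` spends a budget of 11 pages: the single-page homepage layer keeps its one page, and the rest is spread over the two deeper layers according to how much each extra page reduces the weighted error.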
The invention proposes a WAQM-based optimal sampling method for website accessibility detection. Its advantages are that the method determines the sample size of each layer, greatly reduces sampling error, improves sampling quality and lowers the cost of manual checking; it can be used in website accessibility detection to select pages that better represent the accessibility of the entire website.
Description of the drawings
Fig. 1 is a flow chart of the method of the present invention.
Specific embodiment
The present invention is further described with reference to the accompanying drawing:
To reduce the sampling error in accessibility detection, the WAQM-based optimal sampling method for website accessibility detection of the present invention performs the following steps on a computer system:
1) divide all webpages of the website to be detected into (d+1) groups by depth, gathering pages of the same depth into one group, where d is the maximum depth of the website and the homepage has depth 0;
2) construct the expectation model of the website's sampling error;
3) given a sampling ratio r, calculate the number of pages to sample in each layer by minimizing the expected sampling error;
4) according to the per-layer sample sizes, randomly select the specified number of pages in each layer to form the sample;
5) obtain the accessibility score of each page in the sample by machine and manual detection;
6) according to the accessibility metric, calculate the accessibility score of the entire website from the page scores.
The sampling error model of step 2 is constructed as follows:
21) first define the depth of a webpage as the minimum number of hops needed to reach it from the website's homepage; the homepage has depth 0;
22) assume the maximum depth of the website is d (d ≥ 0). All n pages of the website can then be divided by depth into (d+1) layers, pages of the same depth forming one layer. The numbers of pages in the layers are $n_0, n_1, n_2, \ldots, n_d$, and $\sum_{i=0}^{d} n_i = n$;
23) compute the expected sampling error of each layer. Suppose the sampling algorithm selects $n_0', n_1', n_2', \ldots, n_d'$ pages from the respective layers. In layer i (0 ≤ i ≤ d), let $u_i$ be the mean score of all pages in the layer and $u_i'$ the mean score of the sampled pages. The expected sampling error of layer i is then
$E[\mathrm{error}_i] = E\left[\,\lvert u_i - u_i' \rvert\,\right]$ (formula 1);
24) according to the WAQM standard, compute the expected sampling error of the whole website over all (d+1) layers, where $n_i$ denotes the total number of pages in layer i and $w_i$ the weight of layer i, usually $w_i = e^{-i}$;
25) minimize the expected sampling error of the website.
The minimization of the website's sampling error in step 3, which yields the number of pages to sample in each layer, is carried out as follows:
31) minimizing the expected sampling error of the website is equivalent to minimizing the expression of formula (4), which is therefore used as the final optimization function;
32) let the score of each page be $p_i$. Since all pages are independent and identically distributed (IID), assume that the pages of layer i share the variance $\sigma_i$ (0 ≤ i ≤ d); the following transformation, formula (5), can then be made;
33) substituting formula (5) into formula (4) gives the transformed optimization function, formula (6);
34) formula (6) can be treated as the following combinatorial optimization problem, in which the per-layer sample sizes $n_0', n_1', n_2', \ldots, n_d'$ are to be solved. Defining $N' = \{n_0', n_1', n_2', \ldots, n_d'\}$, the objective of formula (6) is minimized over $N'$ subject to
$\sum_{i=0}^{d} n_i' = r \sum_{i=0}^{d} n_i$
35) the following greedy algorithm is proposed for this combinatorial optimization problem:
A. initially set the sample size of every layer, $n_0', n_1', n_2', \ldots, n_d'$, to 1;
B. for each layer, compute the expected error that results if that layer's sample size is increased by 1;
C. increase by 1 the sample size of the layer whose increment yields the smallest expected error, leaving the other layers unchanged;
D. repeat steps B and C until the total sample size is reached.
The content described in this embodiment merely enumerates ways of implementing the inventive concept. The scope of protection of the invention should not be construed as limited to the specific forms stated in the embodiment; it also extends to equivalent technical means that a person skilled in the art could conceive of on the basis of the inventive concept.
Claims (1)
1. A WAQM-based optimal sampling method for website accessibility detection, comprising the following steps performed on a computer system:
1) dividing all webpages of the website to be detected into d+1 groups by depth, pages of the same depth being gathered into one group, where d is the maximum depth of the website and the homepage has depth 0;
2) constructing the expectation model of the website's sampling error;
3) given a sampling ratio r, calculating the number of pages to sample in each layer by minimizing the expected sampling error;
4) according to the per-layer sample sizes, randomly selecting the specified number of pages in each layer to form the sample;
5) for each page in the sample, obtaining its accessibility score by machine and manual detection;
6) according to the accessibility metric, estimating the accessibility score of the entire website from the accessibility scores of the sampled pages;
the expectation model of the sampling error in step 2 being constructed as follows:
21) first defining the depth of a webpage as the minimum number of hops needed to reach it from the website's homepage, the homepage having depth 0;
22) assuming the maximum depth of the website is d (d ≥ 0), dividing all n pages of the website by depth into (d+1) layers, pages of the same depth forming one layer, the numbers of pages in the layers being $n_0, n_1, n_2, n_3, \ldots, n_d$, and $\sum_{i=0}^{d} n_i = n$;
23) computing the expected sampling error of each layer: supposing the sampling algorithm selects $n_0', n_1', n_2', \ldots, n_d'$ pages from the respective layers, and, in layer i (0 ≤ i ≤ d), letting $u_i$ be the mean score of all pages in the layer and $u_i'$ the mean score of the sampled pages, the expected sampling error of layer i is
$E[\mathrm{error}_i] = E\left[\,\lvert u_i - u_i' \rvert\,\right]$ (formula 1);
24) according to the WAQM standard, computing the expected sampling error of the whole website over all (d+1) layers, where $n_i$ denotes the total number of pages in layer i and $w_i$ the weight of layer i, with $w_i = e^{-i}$;
25) minimizing the expected sampling error of the website;
the number of pages to sample in each layer being obtained by minimizing the website's expected sampling error in step 3, as follows:
31) minimizing the expected sampling error of the website is equivalent to minimizing the expression of formula (4), which is used as the final optimization function;
32) letting the score of each page be $p_i$, since all pages are independent and identically distributed (IID), assuming the variance is $\sigma_i = D(p_i)$ (0 ≤ i ≤ d), the following transformation, formula (5), is obtained;
33) substituting formula (5) into formula (4) gives the transformed optimization function, formula (6);
34) formula (6) is treated as the following combinatorial optimization problem, in which the per-layer sample sizes $n_0', n_1', n_2', \ldots, n_d'$ are the parameters to be solved; defining $N' = \{n_0', n_1', n_2', \ldots, n_d'\}$, the objective of formula (6) is minimized over $N'$ subject to
$\sum_{i=0}^{d} n_i' = r \sum_{i=0}^{d} n_i$
35) solving this combinatorial optimization problem with the following greedy algorithm:
A. initially setting the sample size of every layer, $n_0', n_1', n_2', \ldots, n_d'$, to 1;
B. for each layer, computing the expected error that results if that layer's sample size is increased by 1;
C. increasing by 1 the sample size of the layer whose increment yields the smallest expected error, leaving the other layers unchanged;
D. repeating steps B and C until the total sample size is reached.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610159027.4A CN105824941B (en) | 2016-03-21 | 2016-03-21 | A WAQM-based optimal sampling method for website accessibility detection |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105824941A (en) | 2016-08-03 |
CN105824941B (en) | 2019-02-05 |
Family
ID=56524762
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610159027.4A Active CN105824941B (en) | A WAQM-based optimal sampling method for website accessibility detection | 2016-03-21 | 2016-03-21 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105824941B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108960274A (en) * | 2018-05-07 | 2018-12-07 | Zhejiang University | An active learning method for accessibility inspection and assessment of webpage information |
CN108874883B (en) * | 2018-05-07 | 2021-08-17 | Zhejiang University | User experience partial order relation-based webpage information barrier-free detection method |
CN108923951B (en) * | 2018-05-07 | 2020-08-04 | Zhejiang University | Crowdsourcing-based task allocation method for website barrier-free detection system |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103279548A (en) * | 2013-06-06 | 2013-09-04 | Zhejiang University | Method for performing barrier-free detection on websites |
CN103823753A (en) * | 2014-01-22 | 2014-05-28 | Zhejiang University | Webpage sampling method oriented at barrier-free webpage content detection |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030005042A1 (en) * | 2001-07-02 | 2003-01-02 | Magnus Karlsson | Method and system for detecting aborted connections and modified documents from web server logs |
Non-Patent Citations (4)
Title |
---|
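A sampling method based on URL clustering for fast web accessibility evaluation; Meng-ni ZHANG et al.; Front Inform Technol Electron Eng; 2015-06-03; vol. 16, no. 6; pp. 449-456 |
Effects of sampling methods on web accessibility evaluations; Giorgio Brajnik et al.; Human-Computer Interaction; 2011-12-31; vol. 26, no. 3; pp. 1-8 |
Design of a multivariate post-stratified sampling scheme based on multivariate analysis; GAO Yan; Statistics and Decision; 2010-11-30; no. 22; pp. 8-10 |
A website accessibility detection method based on sampling and templates; ZHOU Yu; Wanfang Data; 2014-11-03; main text pp. 19-28 (Chapter 3) and abstract |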
Also Published As
Publication number | Publication date |
---|---|
CN105824941A (en) | 2016-08-03 |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |