CN105824941B - Optimal sampling method for website accessibility evaluation based on WAQM - Google Patents

Info

Publication number
CN105824941B
Authority
CN
China
Prior art keywords
sampling
webpage
layer
website
accessible
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610159027.4A
Other languages
Chinese (zh)
Other versions
CN105824941A (en)
Inventor
王灿
卜佳俊
张梦妮
于智
王炜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

An optimal sampling method for website accessibility evaluation based on WAQM, carried out on a computer system in the following steps: all pages of the website under evaluation are grouped by depth, with pages of the same depth forming one group; an expectation model of the website's sampling error is constructed; given a sampling ratio r, the number of pages to sample in each layer is computed by minimizing the expected sampling error; according to each layer's sample count, the specified number of pages is selected at random from each layer to form the sample; an accessibility score is obtained for each page in the sample by automated and manual evaluation; and, according to the accessibility metric, the accessibility score of the entire website is estimated from the scores of the sampled pages. The advantage of the method is that it substantially reduces sampling error, so that the pages chosen by the sampling algorithm better reflect the accessibility of the entire website.

Description

Optimal sampling method for website accessibility evaluation based on WAQM
Technical field
The present invention relates to the technical field of sampling methods for website accessibility evaluation, and in particular to a WAQM-based sampling method for website accessibility evaluation.
Background art
According to the Second China National Sample Survey on Disability, China has 82.96 million people with disabilities of all kinds, and the households concerned comprise 260 million people. More and more people with disabilities use the Internet to obtain information, for entertainment, and to make friends, and the Internet has become an important part of their daily lives. Because of their impairments, and because most websites present barriers to them, people with disabilities face great difficulty in acquiring, using, and interacting with Internet information services. How to effectively find pages that present barriers to people with disabilities, and how to quickly assess the degree of accessibility of a website, have therefore become important research topics in the field of information accessibility.
In practice, website accessibility evaluation cannot be fully automated by machine; some checks require human intervention. Because a website generally contains a very large number of pages, sampling the pages before evaluation is necessary to reduce the cost of manual evaluation.
The sampling algorithms currently used in website accessibility evaluation are all generic, i.e., they are not optimized for a specific accessibility metric. Existing research shows, however, that the sampling error an algorithm causes in accessibility evaluation depends to a large extent on the accessibility metric chosen. Some data show that even at a very large sampling ratio, some sampling algorithms still produce a sampling error of 20%. This indicates that the sampling algorithm and the accessibility metric are mismatched: even with a very large sample, the resulting sampling error remains large.
Summary of the invention
The present invention overcomes the above shortcomings of the prior art by proposing an optimal sampling method for website accessibility evaluation based on WAQM.
To reduce the sampling error in website accessibility evaluation, we propose a sampling algorithm tailored to a specific accessibility metric. Because WAQM is a commonly used metric in the accessibility field, we propose a sampling algorithm optimized for WAQM. The algorithm greatly reduces sampling error and improves sampling quality, so that the pages it selects better represent the accessibility of the entire website.
The optimal sampling method for website accessibility evaluation based on WAQM according to the present invention carries out the following steps on a computer system:
1) Divide all pages of the website under evaluation into (d+1) groups by depth, with pages of the same depth forming one group, where d is the maximum depth of the website and the depth of the home page is 0;
2) Construct an expectation model of the website's sampling error;
3) Given a sampling ratio r, compute the number of pages to sample in each layer by minimizing the expected sampling error;
4) According to each layer's sample count, select the specified number of pages at random from each layer to form the sample;
5) Obtain the accessibility score of each page in the sample by automated and manual evaluation;
6) According to the relevant standard, compute the accessibility score of the entire website from the accessibility scores of the pages.
The construction of the sampling error model in step 2 proceeds as follows:
21) First define the depth of a page as the minimum number of hops needed to reach it from the website's home page; the depth of the home page is set to 0;
22) Assume the maximum depth of the website is d (d ≥ 0). All n pages of the website can then be divided by depth into (d+1) layers, with pages of the same depth forming one layer. The numbers of pages in the layers are $n_0, n_1, n_2, n_3, \ldots, n_d$, and $\sum_{i=0}^{d} n_i = n$;
23) Compute the expected sampling error of each layer. Assume the numbers of pages that the sampling algorithm draws from the layers are $n_0', n_1', n_2', \ldots, n_d'$. For layer i (0 ≤ i ≤ d), let $u_i$ be the average score of all pages in the layer and $u_i'$ the average score of the sampled pages in the layer; the expected sampling error of layer i is then $E[\mathrm{error}_i] = E[|u_i - u_i'|]$ (formula 1);
24) According to the WAQM standard, compute the expected sampling error of the entire website over all (d+1) layers (formula 2), where $n_i$ denotes the total number of pages in layer i and $w_i$ denotes the weight of the pages in layer i, generally taken as $w_i = e^{-i}$;
25) Minimize the expected sampling error of the website (formula 3).
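Steps 21) and 22) amount to a breadth-first traversal of the site's link graph starting from the home page. The following is a minimal Python sketch under stated assumptions: the link graph is already available as a dictionary `links` mapping each page URL to the URLs it links to, and the function name `group_by_depth` is hypothetical. Neither the input format nor the name comes from the patent.

```python
from collections import defaultdict, deque


def group_by_depth(links, home):
    """Group pages into layers by minimum hop count from the home page.

    links: dict mapping a page to an iterable of pages it links to
           (assumed input format, for illustration only).
    home:  the home page, which is assigned depth 0.
    Returns a list of layers; layers[i] holds the pages at depth i.
    """
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            # The first visit in breadth-first order gives the minimum hop count.
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)

    layers = defaultdict(list)
    for page, dep in depth.items():
        layers[dep].append(page)
    return [layers[i] for i in range(max(layers) + 1)]


if __name__ == "__main__":
    site = {"/": ["/a", "/b"], "/a": ["/c", "/"], "/b": ["/c"], "/c": []}
    for i, layer in enumerate(group_by_depth(site, "/")):
        print(i, layer)
```

Pages that cannot be reached from the home page are simply left out of every layer in this sketch; how such pages should be treated is not addressed in the text.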
The minimization of the expected sampling error of the website in step 3, which yields the number of pages to sample in each layer, proceeds as follows:
31) Minimizing the expected sampling error of the website is equivalent to minimizing the expression in formula (4), which is used as the final optimization function;
32) Let the score of each page be $p_i$. Since all pages are independent and identically distributed (IID), and assuming the variance within layer i is $\sigma_i$ (0 ≤ i ≤ d), the transformation in formula (5) can be made;
33) Substituting formula (5) into formula (4) yields the transformed optimization function, formula (6);
34) Formula (6) can be regarded as the following combinatorial optimization problem, in which the per-layer sample counts $n_0', n_1', n_2', \ldots, n_d'$ are the parameters to be solved. Defining $N' = \{n_0', n_1', n_2', \ldots, n_d'\}$, the following is obtained:
$\sum n_i' = r \cdot \sum n_i$
35) A greedy algorithm is proposed for this combinatorial optimization problem, with the following steps:
a. Initialize the sample count of every layer, $n_0', n_1', n_2', \ldots, n_d'$, to 1;
b. For each layer in turn, compute the expected error that results from adding 1 to that layer's sample count;
c. Add 1 to the sample count of the layer whose resulting expected error is smallest, leaving the other layers unchanged;
d. Repeat steps b and c until the total sample size is reached.
The invention proposes a WAQM-based optimal sampling method for website accessibility evaluation. Its advantages are that the method determines the number of pages to sample in each layer, greatly reduces sampling error, improves sampling quality, and lowers the cost of manual work; it can be used in website accessibility evaluation to select pages that are more representative of the accessibility of the entire website.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the present invention.
Specific embodiment
The present invention is further described with reference to the accompanying drawing:
To reduce the sampling error in accessibility evaluation, the optimal sampling method for website accessibility evaluation based on WAQM according to the present invention carries out the following steps on a computer system:
1) Divide all pages of the website under evaluation into (d+1) groups by depth, with pages of the same depth forming one group, where d is the maximum depth of the website and the depth of the home page is 0;
2) Construct an expectation model of the website's sampling error;
3) Given a sampling ratio r, compute the number of pages to sample in each layer by minimizing the expected sampling error;
4) According to each layer's sample count, select the specified number of pages at random from each layer to form the sample;
5) Obtain the accessibility score of each page in the sample by automated and manual evaluation;
6) According to the accessibility metric, compute the accessibility score of the entire website from the accessibility scores of the pages.
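Steps 4) and 6) can be illustrated with a short Python sketch. It is written under explicit assumptions: the site score is taken to be the mean of the sampled per-layer averages weighted by w_i · n_i, with w_i = e^{-i} as the default layer weight, which is one plausible aggregation and not necessarily the exact WAQM computation. The function names `sample_layers` and `estimate_site_score` are hypothetical.

```python
import math
import random


def sample_layers(layers, alloc, rng=None):
    """Step 4: draw alloc[i] pages uniformly at random, without replacement,
    from layer i. `layers` is a list of page lists, one per depth."""
    rng = rng or random.Random()
    return [rng.sample(layer, k) for layer, k in zip(layers, alloc)]


def estimate_site_score(sampled_scores, sizes, weights=None):
    """Step 6 (assumed aggregation): weighted mean of the sampled layer
    averages, using weight w_i * n_i for layer i.

    sampled_scores: one non-empty list of page scores per layer
    sizes:          total pages per layer, n_0 .. n_d
    weights:        layer weights; defaults to w_i = e^{-i}
    """
    if weights is None:
        weights = [math.exp(-i) for i in range(len(sizes))]
    numerator = sum(
        w * n * (sum(scores) / len(scores))
        for scores, n, w in zip(sampled_scores, sizes, weights)
    )
    denominator = sum(w * n for n, w in zip(sizes, weights))
    return numerator / denominator


if __name__ == "__main__":
    layers = [["/"], ["/a", "/b", "/c"]]
    picked = sample_layers(layers, [1, 2], rng=random.Random(0))
    print(picked)
    # The page scores below are made-up placeholders standing in for step 5.
    print(estimate_site_score([[1.0], [0.5, 0.5]], sizes=[1, 3]))
```

Passing a seeded `random.Random` instance makes a sampling run repeatable, which helps when the manual scoring in step 5 has to be audited later.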
The construction of the sampling error model in step 2 proceeds as follows:
21) First define the depth of a page as the minimum number of hops needed to reach it from the website's home page; the depth of the home page is set to 0;
22) Assume the maximum depth of the website is d (d ≥ 0). All n pages of the website can then be divided by depth into (d+1) layers, with pages of the same depth forming one layer. The numbers of pages in the layers are $n_0, n_1, n_2, n_3, \ldots, n_d$, and $\sum_{i=0}^{d} n_i = n$;
23) Compute the expected sampling error of each layer. Assume the numbers of pages that the sampling algorithm draws from the layers are $n_0', n_1', n_2', \ldots, n_d'$. For layer i (0 ≤ i ≤ d), let $u_i$ be the average score of all pages in the layer and $u_i'$ the average score of the sampled pages in the layer; the expected sampling error of layer i is then $E[\mathrm{error}_i] = E[|u_i - u_i'|]$ (formula 1);
24) According to the WAQM standard, compute the expected sampling error of the entire website over all (d+1) layers (formula 2), where $n_i$ denotes the total number of pages in layer i and $w_i$ denotes the weight of the pages in layer i, generally taken as $w_i = e^{-i}$;
25) Minimize the expected sampling error of the website (formula 3).
The minimization of the expected sampling error of the website in step 3, which yields the number of pages to sample in each layer, proceeds as follows:
31) Minimizing the expected sampling error of the website is equivalent to minimizing the expression in formula (4), which is used as the final optimization function;
32) Let the score of each page be $p_i$. Since all pages are independent and identically distributed (IID), and assuming the variance within layer i is $\sigma_i$ (0 ≤ i ≤ d), the transformation in formula (5) can be made;
33) Substituting formula (5) into formula (4) yields the transformed optimization function, formula (6);
34) Formula (6) can be regarded as the following combinatorial optimization problem, in which the per-layer sample counts $n_0', n_1', n_2', \ldots, n_d'$ are the parameters to be solved. Defining $N' = \{n_0', n_1', n_2', \ldots, n_d'\}$, the following is obtained:
$\sum n_i' = r \cdot \sum n_i$
35) A greedy algorithm is proposed for this combinatorial optimization problem, with the following steps:
a. Initialize the sample count of every layer, $n_0', n_1', n_2', \ldots, n_d'$, to 1;
b. For each layer in turn, compute the expected error that results from adding 1 to that layer's sample count;
c. Add 1 to the sample count of the layer whose resulting expected error is smallest, leaving the other layers unchanged;
d. Repeat steps b and c until the total sample size is reached.
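The formulas referred to in steps 24) to 34) are not reproduced in this text. The chain below is one reading that is consistent with the surrounding definitions; it is offered as an assumption, not as the patent's exact formulas. It takes the site-level error to be the $w_i n_i$-weighted mean of the per-layer errors, treats $u_i$ as the expected score of a page in layer i, and bounds each per-layer term with Jensen's inequality using the layer variance $\sigma_i$.

```latex
\begin{align*}
E[\mathrm{error}]
  &= \frac{\sum_{i=0}^{d} w_i\, n_i\, E\bigl[\,|u_i - u_i'|\,\bigr]}
          {\sum_{i=0}^{d} w_i\, n_i},
  \qquad w_i = e^{-i}, \\[4pt]
E\bigl[\,|u_i - u_i'|\,\bigr]
  &\le \sqrt{E\bigl[(u_i - u_i')^2\bigr]}
   = \sqrt{\frac{\sigma_i}{n_i'}}, \\[4pt]
\min_{N'} \;\sum_{i=0}^{d} w_i\, n_i \sqrt{\frac{\sigma_i}{n_i'}}
  &\quad \text{subject to} \quad
  \sum_{i=0}^{d} n_i' = r \sum_{i=0}^{d} n_i .
\end{align*}
```

Under this reading the denominator $\sum_i w_i n_i$ does not depend on the sample counts $n_i'$, which would explain why minimizing the site-level expectation and minimizing the weighted sum alone are described as equivalent in step 31).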
The content described in the embodiments of this specification merely enumerates forms in which the inventive concept may be realized. The scope of protection of the present invention should not be construed as limited to the specific forms set out in the embodiments; it also extends to equivalent technical means that those skilled in the art can conceive of on the basis of the inventive concept.

Claims (1)

1. An optimal sampling method for website accessibility evaluation based on WAQM, carrying out the following steps on a computer system:
1) dividing all pages of the website under evaluation into d+1 groups by depth, with pages of the same depth forming one group, where d is the maximum depth of the website and the depth of the home page is 0;
2) constructing an expectation model of the website's sampling error;
3) given a sampling ratio r, computing the number of pages to sample in each layer by minimizing the expected sampling error;
4) according to each layer's sample count, selecting the specified number of pages at random from each layer to form the sample;
5) for each page in the sample, obtaining the accessibility score of the page by automated and manual evaluation;
6) according to the accessibility metric, estimating the accessibility score of the entire website from the accessibility scores of the pages in the sample;
wherein the construction of the expectation model of the sampling error in step 2 comprises the following steps:
21) first defining the depth of a page as the minimum number of hops needed to reach it from the website's home page, the depth of the home page being set to 0;
22) assuming the maximum depth of the website is d (d ≥ 0), dividing all n pages of the website by depth into (d+1) layers, with pages of the same depth forming one layer, the numbers of pages in the layers being $n_0, n_1, n_2, n_3, \ldots, n_d$, and $\sum_{i=0}^{d} n_i = n$;
23) computing the expected sampling error of each layer; assuming the numbers of pages that the sampling algorithm draws from the layers are $n_0', n_1', n_2', \ldots, n_d'$, then for layer i (0 ≤ i ≤ d), with $u_i$ the average score of all pages in the layer and $u_i'$ the average score of the sampled pages in the layer, the expected sampling error of layer i is $E[\mathrm{error}_i] = E[|u_i - u_i'|]$ (formula 1);
24) according to the WAQM standard, computing the expected sampling error of the entire website over all (d+1) layers (formula 2), where $n_i$ denotes the total number of pages in layer i, $w_i$ denotes the weight of the pages in layer i, and $w_i$ takes the value $w_i = e^{-i}$;
25) minimizing the expected sampling error of the website (formula 3);
and wherein obtaining the number of pages to sample in each layer by minimizing the expected sampling error of the website in step 3 comprises the following steps:
31) minimizing the expected sampling error of the website is equivalent to minimizing the expression in formula (4), which is used as the final optimization function;
32) letting the score of each page be $p_i$; since all pages are independent and identically distributed (IID), and assuming the variance within layer i is $\sigma_i = D(p_i)$ (0 ≤ i ≤ d), the transformation in formula (5) is obtained;
33) substituting formula (5) into formula (4) yields the transformed optimization function, formula (6);
34) formula (6) is regarded as the following combinatorial optimization problem, in which the per-layer sample counts $n_0', n_1', n_2', \ldots, n_d'$ are the parameters to be solved; defining $N' = \{n_0', n_1', n_2', \ldots, n_d'\}$, the following is obtained:
$\sum n_i' = r \cdot \sum n_i$
35) a greedy algorithm is proposed for this combinatorial optimization problem, with the following steps:
a. initializing the sample count of every layer, $n_0', n_1', n_2', \ldots, n_d'$, to 1;
b. for each layer in turn, computing the expected error that results from adding 1 to that layer's sample count;
c. adding 1 to the sample count of the layer whose resulting expected error is smallest, leaving the other layers unchanged;
d. repeating steps b and c until the total sample size is reached.
CN201610159027.4A 2016-03-21 2016-03-21 Optimal sampling method for website accessibility evaluation based on WAQM Active CN105824941B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610159027.4A CN105824941B (en) 2016-03-21 2016-03-21 Optimal sampling method for website accessibility evaluation based on WAQM

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610159027.4A CN105824941B (en) 2016-03-21 2016-03-21 Optimal sampling method for website accessibility evaluation based on WAQM

Publications (2)

Publication Number Publication Date
CN105824941A CN105824941A (en) 2016-08-03
CN105824941B true CN105824941B (en) 2019-02-05

Family

ID=56524762

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610159027.4A Active CN105824941B (en) 2016-03-21 2016-03-21 Optimal sampling method for website accessibility evaluation based on WAQM

Country Status (1)

Country Link
CN (1) CN105824941B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108960274A (en) * 2018-05-07 2018-12-07 浙江大学 A kind of Active Learning Method for the accessible inspection assessment of webpage information
CN108874883B (en) * 2018-05-07 2021-08-17 浙江大学 User experience partial order relation-based webpage information barrier-free detection method
CN108923951B (en) * 2018-05-07 2020-08-04 浙江大学 Crowdsourcing-based task allocation method for website barrier-free detection system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103279548A (en) * 2013-06-06 2013-09-04 浙江大学 Method for performing barrier-free detection on websites
CN103823753A (en) * 2014-01-22 2014-05-28 浙江大学 Webpage sampling method oriented at barrier-free webpage content detection

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030005042A1 (en) * 2001-07-02 2003-01-02 Magnus Karlsson Method and system for detecting aborted connections and modified documents from web server logs

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
A sampling method based on URL clustering for fast web accessibility evaluation;Meng-ni ZHANG 等;《Front Inform Technol Electron Eng》;20150603;第16卷(第6期);第449-456页
Effects of sampling methods on web accessibility evaluations;Giorgio Brajnik 等;《Humanacomputer interaction》;20111231;第26卷(第3期);第1-8页
基于多元分析的多变量事后分层抽样方案设计;高岩;《统计与决策》;20101130(第22期);第8-10页
基于抽样和模板的网站无障碍检测方法;周宇;《万方数据》;20141103;正文第19-28页第3章以及摘要

Also Published As

Publication number Publication date
CN105824941A (en) 2016-08-03

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant