CN107357716A - Apparatus and method for choosing webpage - Google Patents
- Publication number
- CN107357716A
- CN201610305142.8A
- Authority
- CN
- China
- Prior art keywords
- webpage
- webpages
- similarity
- node
- unit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3604—Software analysis for verifying properties of programs
- G06F11/3608—Software analysis for verifying properties of programs using formal methods, e.g. model checking, abstract interpretation
Abstract
The present invention relates to an apparatus and method for choosing a webpage from an application program that includes multiple webpages. The apparatus for choosing a webpage according to the present invention includes: a webpage acquiring unit for obtaining the multiple webpages of the application program; a characteristic element set determining unit for determining the characteristic element set of each of the multiple webpages; a similarity determining unit for determining the similarity of every two webpages among the multiple webpages according to the characteristic element set of each webpage; a division unit for dividing the multiple webpages into one or more webpage combinations according to the similarity of every two webpages; and a selecting unit for choosing, from each of the one or more webpage combinations, the one webpage with the highest access frequency. With the apparatus and method according to the present invention, webpages can be classified reliably, and the webpage with the highest access frequency can be chosen from each class.
Description
Technical field
The present invention relates to the technical field of application program testing, and more particularly to an apparatus and method for choosing a webpage from an application program that includes multiple webpages.
Background
This section provides background information related to the present invention, which is not necessarily prior art.
Nowadays, with the rapid development of smartphones and 3G/4G networks, the mobile Internet is flourishing and more and more people are using smartphones. As smartphones develop, the application programs (apps) installed on smartphones and other mobile terminals are also emerging endlessly and developing rapidly. Specifically, current application programs can be divided into three classes. The first is the native application program (native app), which generally relies on the operating system, is highly interactive, is a complete application program, is highly extensible, and must be downloaded and installed by the user before use. The second is the webpage application program (web app), which is written in HTML5 (Hypertext Markup Language 5, the fifth revision of HTML), does not need to be downloaded or installed, lives in a browser, and is similar to what is now called a light application; it can also be described as the touch-screen version of a website, for example a website such as Sohu opened on a smartphone. The third is the hybrid application program (hybrid app), which is half native and half webpage: it must be downloaded and installed and looks like a native application program, but the content it accesses is webpages.
At present, many resources have been invested in developing these application programs, but an excellent application program also cannot do without sufficient testing. For testing native application programs, a great deal of work has already been done, and many testing tools and frameworks exist. For testing webpage application programs, however, no reliable testing scheme has yet been proposed.
A webpage application program includes many webpages, each developed in HTML5. Ideally, therefore, testing a webpage application program would mean testing every one of its webpages. Although such testing is thorough, it obviously takes a long time and wastes test resources.
In view of the above technical problem, and considering that not every webpage in a webpage application program is equally important and that some webpages have substantially similar structures, the present invention aims to propose a scheme that classifies the webpages in a webpage application program so that the most representative webpage can be chosen from each class of webpages. In this way, testing fewer webpages can cover as much of the webpage application program as possible, which reduces testing time and saves test resources.
Summary of the invention
This section provides a general summary of the present invention, and is not a comprehensive disclosure of its full scope or all of its features.
An object of the present invention is to provide an apparatus and method for choosing a webpage from an application program that includes multiple webpages, which can classify the webpages reliably and choose the most representative webpage from each class.
According to one aspect of the present invention, an apparatus for choosing a webpage from an application program including multiple webpages is provided, including: a webpage acquiring unit for obtaining the multiple webpages of the application program; a characteristic element set determining unit for determining the characteristic element set of each of the multiple webpages; a similarity determining unit for determining the similarity of every two webpages among the multiple webpages according to the characteristic element set of each webpage; a division unit for dividing the multiple webpages into one or more webpage combinations according to the similarity of every two webpages; and a selecting unit for choosing, from each of the one or more webpage combinations, the one webpage with the highest access frequency.
According to another aspect of the present invention, a method of choosing a webpage from an application program including multiple webpages is provided, including: obtaining the multiple webpages of the application program; determining the characteristic element set of each of the multiple webpages; determining the similarity of every two webpages among the multiple webpages according to the characteristic element set of each webpage; dividing the multiple webpages into one or more webpage combinations according to the similarity of every two webpages; and choosing, from each of the one or more webpage combinations, the one webpage with the highest access frequency.
According to another aspect of the present invention, a program product is provided. The program product includes machine-readable instruction code stored therein, wherein the instruction code, when read and executed by a computer, enables the computer to perform the method of choosing a webpage from an application program including multiple webpages according to the present invention.
According to another aspect of the present invention, a machine-readable storage medium is provided, which carries the program product according to the present invention.
With the apparatus and method for choosing a webpage from an application program including multiple webpages according to the present invention, the similarity of every two webpages can be determined according to the characteristic element sets of the webpages, the webpages can be divided into one or more combinations according to the similarity of every two webpages, and the one webpage with the highest access frequency can then be chosen from each combination. In this way, the webpages can be classified reliably, and the most representative webpage in each combination can be chosen on the basis of access frequency. When the chosen webpages are used for application program testing, testing time can be reduced while the testing effect is preserved, which saves test resources.
The description and specific examples in this summary are intended for purposes of illustration only and are not intended to limit the scope of the present invention.
Brief description of the drawings
The drawings described here illustrate selected embodiments only, not all possible implementations, and are not intended to limit the scope of the present invention. In the drawings:
Fig. 1 is a structural block diagram of an apparatus for choosing a webpage from an application program including multiple webpages according to an embodiment of the present invention;
Fig. 2 is a structural block diagram of the characteristic element set determining unit of the apparatus for choosing a webpage from an application program including multiple webpages according to an embodiment of the present invention;
Fig. 3 shows an example of the tree structure between the webpages of an application program including multiple webpages according to an embodiment of the present invention;
Fig. 4 shows an example of one webpage of an application program including multiple webpages according to an embodiment of the present invention;
Fig. 5 shows an example of part of the source code of the webpage shown in Fig. 4;
Fig. 6 is a structural block diagram of the similarity determining unit of the apparatus for choosing a webpage from an application program including multiple webpages according to an embodiment of the present invention;
Fig. 7 is a structural block diagram of the division unit of the apparatus for choosing a webpage from an application program including multiple webpages according to an embodiment of the present invention;
Fig. 8 is a flowchart of a method of choosing a webpage from an application program including multiple webpages according to an embodiment of the present invention;
Fig. 9 is a schematic diagram of the process of the method of choosing a webpage from an application program including multiple webpages according to an embodiment of the present invention; and
Fig. 10 is a block diagram of an example structure of a general-purpose personal computer in which the apparatus and method for choosing a webpage from an application program including multiple webpages according to an embodiment of the present invention can be implemented.
Although the present invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and are described in detail here. It should be understood, however, that the description of specific embodiments here is not intended to limit the present invention to the particular forms disclosed; on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present invention. Note that corresponding reference numerals indicate corresponding parts throughout the several drawings.
Detailed description of the embodiments
Examples of the present invention are now described more fully with reference to the drawings. The following description is merely exemplary in nature and is not intended to limit the present invention, its application, or its uses.
Example embodiments are provided below so that this disclosure will be thorough and will fully convey its scope to those skilled in the art. Numerous specific details, such as examples of particular units, apparatuses and methods, are set forth to provide a thorough understanding of the embodiments of the present invention. It will be apparent to those skilled in the art that the specific details need not be employed, that the example embodiments may be implemented in many different forms, and that neither should be construed as limiting the scope of the present invention. In some example embodiments, well-known processes, well-known structures and well-known technologies are not described in detail.
Fig. 1 is a structural block diagram of an apparatus for choosing a webpage from an application program including multiple webpages according to an embodiment of the present invention. As shown in Fig. 1, the apparatus 100 for choosing a webpage according to an embodiment of the present invention can include a webpage acquiring unit 110, a characteristic element set determining unit 120, a similarity determining unit 130, a division unit 140 and a selecting unit 150.
According to an embodiment of the present invention, the webpage acquiring unit 110 can obtain the multiple webpages of the application program. The application program can be the webpage application program mentioned above, which includes multiple webpages. The webpage acquiring unit 110 can obtain the multiple webpages of the application program by any method known in the art. Further, the webpage acquiring unit 110 can send the obtained webpages to the characteristic element set determining unit 120.
According to an embodiment of the present invention, the characteristic element set determining unit 120 can determine the characteristic element set of each of the multiple webpages. Here, the characteristic element set determining unit 120 can obtain the multiple webpages of the application program from the webpage acquiring unit 110. Each webpage can include multiple elements, which can be, for example, text, pictures, or a combination of both. The characteristic element set determining unit 120 can determine the characteristic element set of each webpage. According to an embodiment of the present invention, the characteristic element set is the set of those elements of a webpage that best represent the structural features of that webpage. To a certain extent, the characteristic element set stands for the webpage itself. Further, the characteristic element set determining unit 120 can send the determined characteristic element set of each webpage to the similarity determining unit 130.
According to an embodiment of the present invention, the similarity determining unit 130 can determine the similarity of every two webpages among the multiple webpages according to the characteristic element set of each webpage. Here, the similarity determining unit 130 can obtain the characteristic element set of each webpage from the characteristic element set determining unit 120. As mentioned above, a characteristic element set represents its corresponding webpage to a certain extent, so the similarity determining unit 130 can determine the similarity of two webpages from the characteristic element sets of those two webpages. The similarity determining unit 130 can perform this operation for every two webpages among the multiple webpages, thereby determining the similarity of every pair. Further, the similarity determining unit 130 can send the determined similarities to the division unit 140.
According to an embodiment of the present invention, the division unit 140 can divide the multiple webpages into one or more webpage combinations according to the similarity of every two webpages. Here, the division unit 140 can obtain the similarity of every two webpages from the similarity determining unit 130 and divide the webpages into one or more webpage combinations accordingly, where each webpage combination can be regarded as one class of webpages. Further, the division unit 140 can send the resulting webpage combinations to the selecting unit 150.
The selecting unit 150 can choose, from each of the one or more webpage combinations, the one webpage with the highest access frequency. According to an embodiment of the present invention, the goal is to choose the most representative webpage from each class of webpages. A webpage with a high access frequency is usually an important or relatively popular webpage, so the present invention uses access frequency to represent the importance of a webpage. The selecting unit 150 chooses the one webpage with the highest access frequency from each webpage combination, which means that the number of webpages chosen by the selecting unit 150 equals the total number of webpage combinations. Further, the selecting unit 150 can output the chosen webpages for purposes including, but not limited to, testing the application program.
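The selection step described above can be sketched in Python. This is only an illustrative sketch, not the patent's implementation: the function name `select_representatives`, the page identifiers, and the frequency values are hypothetical, and treating a page with no recorded frequency as having frequency 0 is an assumption.

```python
from typing import Dict, List


def select_representatives(combinations: List[List[str]],
                           access_frequency: Dict[str, int]) -> List[str]:
    """Pick, from each webpage combination, the page with the highest access frequency."""
    # Assumption: a page with no recorded access frequency counts as 0.
    # Ties are resolved in favour of the page listed first in the combination.
    return [max(group, key=lambda page: access_frequency.get(page, 0))
            for group in combinations if group]


# Hypothetical data: two webpage combinations and made-up access counts.
combinations = [["home", "about"], ["article_1", "article_2", "article_3"]]
access_frequency = {"home": 120, "about": 8,
                    "article_1": 30, "article_2": 75, "article_3": 12}

chosen = select_representatives(combinations, access_frequency)
print(chosen)  # one page per combination
```

The result contains exactly one page per non-empty combination, which matches the statement that the number of chosen webpages equals the number of webpage combinations.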
As can be seen, with the apparatus and method for choosing a webpage from an application program including multiple webpages according to the present invention, the similarity of every two webpages can be determined according to the characteristic element sets of the webpages, and the webpages can then be divided into one or more webpage combinations. Next, the one webpage with the highest access frequency is chosen from each webpage combination. In this way, the webpages can be classified reliably according to their similarity. Moreover, because the webpage with the highest access frequency is chosen from each webpage combination, the most important and most representative of all the webpages are chosen, and the number of webpages finally chosen is effectively reduced. When the chosen webpages are used for application program testing, testing time is reduced and test resources are saved; and because the most representative webpages are the ones tested, test resources are used more fully while the testing effect is preserved.
Fig. 2 is a structural block diagram of the characteristic element set determining unit 120 of the apparatus for choosing a webpage from an application program including multiple webpages according to an embodiment of the present invention.
As shown in Fig. 2, the characteristic element set determining unit 120 can include an element acquiring unit 121 and a determining unit 122.
According to an embodiment of the present invention, the element acquiring unit 121 can obtain the multiple elements of each webpage. Each webpage can include multiple elements, which can be pictures, text, or a combination of both. In the present invention, the element acquiring unit 121 can obtain all the elements on a webpage by any method known in the art. For example, the element acquiring unit 121 can include a traversal unit (not shown) for traversing the application program, so that the element acquiring unit 121 can obtain all the elements of each of the multiple webpages of the application program from the traversal result of the traversal unit. Many methods of traversing an application program are known in the art, such as depth-first or breadth-first algorithms, and the present invention places no limitation on this.
According to an embodiment of the present invention, the traversal unit of the element acquiring unit 121 can traverse the application program to obtain both the tree structure between the webpages of the application program and the DOM (Document Object Model) tree structure of each of the multiple webpages of the application program.
Fig. 3 shows an example of the tree structure between the webpages of an application program including multiple webpages according to an embodiment of the present invention.
As shown in Fig. 3, each node in the tree represents one webpage of the application program, and the arrows between nodes represent the jump relations between webpages. The root of the tree represents the first webpage shown on entering the application program. In the example shown in Fig. 3, the application program includes 7 webpages. The first webpage shown on entering the application program is webpage 1, and from webpage 1 it is possible to jump to webpages 2, 3 and 4. Further, from webpage 2 it is possible to jump to webpages 5 and 6, and of course also to return to webpage 1, and so on. A tree structure can thus represent the structure and hierarchy of an application program simply and clearly.
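As a rough illustration of the tree between webpages and of the depth-first and breadth-first traversals mentioned earlier, the Fig. 3 example can be written as an adjacency map in Python. The jumps from webpage 1 to webpages 2, 3 and 4 and from webpage 2 to webpages 5 and 6 come from the description; the text does not say where webpage 7 attaches, so placing it under webpage 3 is purely an assumption, as are the names `PAGE_TREE`, `breadth_first` and `depth_first`.

```python
from collections import deque
from typing import Dict, List

# Forward jump relations of the 7-page example.
# Assumption: webpage 7 hangs under webpage 3 (the text only says "and so on").
PAGE_TREE: Dict[int, List[int]] = {
    1: [2, 3, 4],
    2: [5, 6],
    3: [7],
    4: [], 5: [], 6: [], 7: [],
}


def breadth_first(tree: Dict[int, List[int]], root: int) -> List[int]:
    """Visit pages level by level, starting from the entry page."""
    order, queue, seen = [], deque([root]), {root}
    while queue:
        page = queue.popleft()
        order.append(page)
        for child in tree.get(page, []):
            if child not in seen:      # skip pages already reached (e.g. back links)
                seen.add(child)
                queue.append(child)
    return order


def depth_first(tree: Dict[int, List[int]], root: int) -> List[int]:
    """Follow each jump as deep as possible before backtracking."""
    order, stack, seen = [], [root], set()
    while stack:
        page = stack.pop()
        if page in seen:
            continue
        seen.add(page)
        order.append(page)
        # Push children in reverse so the leftmost child is visited first.
        stack.extend(reversed(tree.get(page, [])))
    return order


print(breadth_first(PAGE_TREE, 1))
print(depth_first(PAGE_TREE, 1))
```

Both traversals reach every webpage exactly once; they differ only in visiting order, which is why the description leaves the choice of traversal method open.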
After the traversal unit of the element acquiring unit 121 traverses the application program and obtains the tree structure between its webpages, it can also obtain the source code of each webpage and determine the DOM tree structure of each webpage from that source code. The DOM tree structure of a webpage is similar to the tree structure between the webpages of the application program: each node represents one element of the webpage, and the arrows between nodes represent the jump relations between elements. The root of the DOM tree represents the first-layer element reached after entering the webpage. Such a DOM tree can be determined for every webpage, and it represents the structure and hierarchy of the webpage simply and clearly. Further, the traversal unit can obtain the parent element and the child elements of every element on a webpage from the DOM tree structure of that webpage. Here, the parent element of an element is the element on the same page from which this element can be reached by a jump, and the child elements of an element are the elements on the same page to which this element can jump. This information is easily obtained from the DOM tree structure of the webpage.
Here, by traversing the application program, the traversal unit can also obtain other relevant information about all the elements on each webpage, including the position, attributes and type of each element.
Further, the element acquiring unit 121 can send the obtained element information, including the position, attributes, type and DOM tree information of each element, to the determining unit 122.
According to an embodiment of the present invention, the determining unit 122 can determine the characteristic elements of each webpage according to the similarity of every two elements among the multiple elements, and take the set of characteristic elements as the characteristic element set of that webpage.
According to an embodiment of the present invention, the determining unit 122 can obtain the multiple elements of each webpage from the element acquiring unit 121, determine the similarity of any two elements among the multiple elements of each webpage, and determine the characteristic element set of each webpage according to the similarities between its elements.
According to an embodiment of the present invention, the determining unit 122 can divide the multiple elements into one or more element groups according to the similarity of every two elements among the multiple elements. Next, the determining unit 122 can choose one element from each of the one or more element groups and take the chosen elements as the characteristic elements of the webpage.
According to an embodiment of the present invention, the determining unit 122 can divide the multiple elements into one or more element groups such that every element is similar to at least one other element in the element group in which it is placed.
In this embodiment, the determining unit 122 can divide the elements into one or more element groups in the following way:
a. Set all element groups to empty.
b. Put the first element into an element group.
c. Compare the next element with all the elements in all existing element groups. If the next element is similar to an existing element, put it into the element group of that existing element; if the next element is not similar to any existing element, put it into a new element group.
Step c is repeated until the last element on the webpage has been processed.
In this embodiment, the determining unit 122 can assign similar elements to the same element group. Note that the elements in one element group are not necessarily all similar to one another: a new element can be put into an element group as long as it is similar to at least one element already in that group. Next, only one element is chosen from each element group; the choice can be random or can follow a certain rule. As a result, the characteristic element set finally chosen includes every type of element on the webpage without repetition.
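Steps a to c above, together with the choice of one element per group, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the names `group_elements` and `characteristic_elements` are hypothetical, the similarity test is passed in as a function, an element that is similar to members of several groups simply joins the first such group (the description does not cover that case), and the first element of each group is taken as its representative, which is one possible rule alongside the random choice the description also allows.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def group_elements(elements: List[T],
                   is_similar: Callable[[T, T], bool]) -> List[List[T]]:
    """Divide elements into groups as in steps a to c."""
    groups: List[List[T]] = []                      # step a: start with no groups
    for element in elements:                        # step b is the first iteration
        for group in groups:
            # Similar to at least one member is enough to join the group.
            if any(is_similar(element, member) for member in group):
                group.append(element)
                break
        else:
            groups.append([element])                # similar to nothing: new group
    return groups


def characteristic_elements(elements: List[T],
                            is_similar: Callable[[T, T], bool]) -> List[T]:
    """Take one element per group; here simply the first one (an assumed rule)."""
    return [group[0] for group in group_elements(elements, is_similar)]


# Toy stand-in for page elements: integers that count as similar when they differ by at most 1.
def close(a: int, b: int) -> bool:
    return abs(a - b) <= 1


groups = group_elements([1, 2, 3, 10, 11, 20], close)
print(groups)
print(characteristic_elements([1, 2, 3, 10, 11, 20], close))
```

In the toy run, 3 joins the group of 1 and 2 even though it is not similar to 1, which mirrors the remark that the members of a group need not all be similar to one another.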
According to an embodiment of the present invention, the determining unit 122 can determine the similarity of any two elements among the multiple elements of each webpage according to the position and type of the elements and the DOM tree structure of the webpage on which they are located.
According to an embodiment of the present invention, the position of an element can include its coordinates and size. The coordinates of an element can include its abscissa and ordinate, and the size of an element is the size of the bounding rectangle that encloses it. The type of an element can be, for example, picture, text, input box or button. The DOM tree structure of the webpage on which an element is located determines information such as the parent element and the child elements of that element. According to an embodiment of the present invention, two elements are considered similar when their information satisfies all of the following conditions at the same time: 1) the abscissas or the ordinates of the two elements are the same; 2) the types of the two elements are the same; 3) the sizes of the two elements are the same; 4) the two elements have the same parent element in the DOM tree structure of the webpage on which they are located. In an embodiment of the present invention, two elements having the same abscissa or ordinate means that they are in the same column or the same row of the webpage; two elements having the same type means that both are pictures, both are text, both are input boxes, or both are buttons; and two elements having the same parent element means that both are reached by a jump from the same parent element. In other words, if all four conditions are satisfied, the two elements are very similar and belong to the same class of element.
Note that although one embodiment in which the determining unit 122 determines the similarity of two elements is shown above, the present invention is not limited to it; the determining unit 122 can determine the similarity between elements by any method known in the art.
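The four similarity conditions of the embodiment above can be written as a small Python predicate. This is only a sketch: the `Element` record, its field names, and the coordinates used in the example are hypothetical, and exact equality is assumed for coordinates and sizes because the description states the conditions as "the same" without giving a tolerance.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Element:
    """Hypothetical record of the element information gathered by the traversal."""
    x: int                      # abscissa of the bounding rectangle
    y: int                      # ordinate of the bounding rectangle
    width: int
    height: int
    kind: str                   # e.g. "picture", "text", "input", "button"
    parent_id: Optional[str]    # identifier of the parent element in the DOM tree


def is_similar(a: Element, b: Element) -> bool:
    """True only when all four conditions hold at the same time."""
    same_column_or_row = a.x == b.x or a.y == b.y                 # condition 1
    same_type = a.kind == b.kind                                  # condition 2
    same_size = (a.width, a.height) == (b.width, b.height)        # condition 3
    same_parent = a.parent_id is not None and a.parent_id == b.parent_id  # condition 4
    return same_column_or_row and same_type and same_size and same_parent


# Made-up layout: three text links under one parent, arranged in a grid.
news = Element(x=0, y=0, width=60, height=30, kind="text", parent_id="nav")
finance = Element(x=60, y=0, width=60, height=30, kind="text", parent_id="nav")
automobile = Element(x=60, y=30, width=60, height=30, kind="text", parent_id="nav")

print(is_similar(news, finance))        # same row
print(is_similar(finance, automobile))  # same column
print(is_similar(news, automobile))     # neither same row nor same column
```

With this made-up layout the predicate is not transitive: the first and third elements are each similar to the second but not to each other, which is why the grouping only requires similarity to at least one member of a group.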
An example of determining the characteristic element set of a webpage is described below with reference to Fig. 4 and Fig. 5. Fig. 4 shows an example of one webpage of an application program including multiple webpages according to an embodiment of the present invention. Fig. 5 shows an example of part of the source code of the webpage shown in Fig. 4.
As shown in Fig. 4, the webpage includes many elements, both elements outlined with boxes and elements not outlined with boxes, for example "finance and economics", "amusement" and the headline "First in 4 years! U.S. announces sale of frigates and missiles to Taiwan". These elements can be obtained by the traversal unit as it traverses each webpage of the application program. The traversal unit can also obtain the relevant information of these elements, including the type, position and attributes of each element. Further, the traversal unit can obtain the source code of each webpage and from it the DOM tree structure of each webpage. For example, part of the source code obtained by the traversal unit for the webpage shown in Fig. 4 is shown in Fig. 5. Note that, for ease of explanation, Fig. 5 shows only part of the source code of the webpage shown in Fig. 4, not all of it. As shown in Fig. 5, the first line of source code, "nav class=" site all " ", represents one element on the webpage; the second line of source code, "<div>", represents an element reached by a jump from the element represented by the first line; and the third through last lines of source code represent elements reached by a jump from the element represented by the second line. The traversal unit can thus obtain the DOM tree structure of the webpage shown in Fig. 4 from the source code. For example, the element represented by the first line of source code is at the root of the DOM tree structure, the element represented by the second line is a second-layer node, the elements represented by the third through last lines are third-layer nodes, and elements that have a jump relation are connected by arrows.
Next, the determining unit 122 can determine the similarity of any two elements among all the elements on each page. According to an embodiment of the present invention, the determining unit 122 can divide the multiple elements into one or more element groups such that every element is similar to at least one other element in the element group in which it is placed. For example, the determining unit 122 puts the element "news" into an element group and then compares the element "finance and economics" with the element "news". The two elements have the same ordinate, the same size, the same type and the same parent element, so they are determined to be similar, and "finance and economics" is put into the same element group as "news". Next, the element "automobile", although not similar to "news", is similar to "finance and economics", so "automobile" is also put into this group. In this way, the determining unit 122 places the elements "news", "finance and economics", "amusement", "physical culture", "military affairs", "picture library", "video", "automobile", "history", "health" and "culture" into one element group and randomly chooses the element "news" from it as a characteristic element. The other element groups can be determined in a similar way. As shown in Fig. 4, the characteristic element set that the determining unit 122 determines for this webpage is the set of elements outlined with boxes.
As described above, the determining unit 122 can determine the characteristic elements of each webpage according to the similarity of every two elements among the multiple elements, and take the set of characteristic elements as the characteristic element set of that webpage. Further, the determining unit 122 can send the determined characteristic element set of each webpage to the similarity determining unit 130, so that the similarity determining unit 130 can determine the similarity of every two webpages among the multiple webpages.
Fig. 6 is a structural block diagram of the similarity determining unit of the apparatus for choosing a webpage from an application program including multiple webpages according to an embodiment of the present invention.
As shown in Fig. 6, the similarity determining unit 130 according to the present invention can include a characteristic element pair determining unit 131, a computing unit 132 and a summing unit 133.
According to an embodiment of the invention, characteristic element pair determining unit 131 can determine the characteristic element pairs of two webpages. A characteristic element pair consists of one characteristic element of one of the two webpages and one characteristic element of the other webpage. Further, characteristic element pair determining unit 131 can send the determined characteristic element pairs to computing unit 132.
According to an embodiment of the invention, the characteristic element set of a webpage represents that webpage, so comparing the similarity of two webpages is converted into comparing the similarity of their characteristic element sets. When determining the similarity of any two webpages among the multiple webpages, characteristic element pair determining unit 131 first determines the characteristic element pairs of the two webpages. Each pair is formed by any one characteristic element of one webpage and any one characteristic element of the other webpage. That is, for two webpages whose characteristic element sets contain N and M elements respectively, the number of characteristic element pairs is N × M.
According to an embodiment of the invention, computing unit 132 can calculate the similarity of the two elements in a characteristic element pair and take it as the similarity of that characteristic element pair.
As noted above, the traversal unit in characteristic element set determining unit 120 can traverse the application program to obtain the tree structure between the webpages of the application program, the DOM tree structure of each webpage and the element information of each webpage. Here, computing unit 132 can obtain these three kinds of information from the traversal unit and obtain the characteristic element pair information from characteristic element pair determining unit 131, and can then calculate the similarity of the two elements in a characteristic element pair according to the tree structure between the webpages of the application program, the DOM tree structure of each webpage and the element information of each webpage.
According to an embodiment of the invention, computing unit 132 can calculate the similarity of the two elements in a characteristic element pair according to the position, attributes and tree structure information of each of the two elements.
According to an embodiment of the invention, the position of an element can include the coordinates and size of the element; the attributes of an element can include its identifier, name, tag and hypertext reference (href) information; and the tree structure information of an element can include the DOM tree structure of the webpage where the element is located and the tree structure between the webpages of the application program. Specifically, the tree structure information of an element can include the parent element and child elements of the element, and the level that the webpage containing the element occupies in the tree structure between the webpages of the application program. According to an embodiment of the invention, computing unit 132 can calculate the similarity $S_{ab}(i)$ of the two elements in the i-th characteristic element pair of webpage a and webpage b according to the following formula:

$$S_{ab}(i) = \alpha_i L_i + \beta_i A_i + \gamma_i D_i \qquad (1)$$

Here, $L_i$ is the similarity in position of the two elements in the i-th characteristic element pair of webpage a and webpage b, $A_i$ is their similarity in attributes, and $D_i$ is their similarity in tree structure. $\alpha_i$, $\beta_i$ and $\gamma_i$ are the weight coefficients of $L_i$, $A_i$ and $D_i$ respectively, with $\alpha_i + \beta_i + \gamma_i = 1$.
It is worth noting that $\alpha_i$, $\beta_i$ and $\gamma_i$ are parameters set per element pair. That is, every characteristic element pair is configured with its own set of $\alpha_i$, $\beta_i$, $\gamma_i$, which can be set according to actual needs, for example according to the importance of position, attributes and tree structure when judging the similarity of elements. As a specific example, $\alpha_i$, $\beta_i$ and $\gamma_i$ are all 1/3.
As described above, the similarity of the two elements in a characteristic element pair calculated by computing unit 132 can be the weighted sum of the similarity in position, the similarity in attributes and the similarity in tree structure. Further, computing unit 132 can take the similarity $S_{ab}(i)$ of the two elements in the i-th characteristic element pair of webpage a and webpage b as the similarity of that i-th characteristic element pair.
How to calculate $L_i$, $A_i$ and $D_i$ for each characteristic element pair is explained in detail next.

The similarity in position $L_i$ of the two elements in the i-th characteristic element pair of webpage a and webpage b can be calculated by the following formula:

$$L_i = \frac{1}{l}\sum_{s=1}^{l} L_{is}$$

Here, the denominator $l$ is the number of position parameters. For example, if the position of an element includes two parameters, the coordinates and the size of the element, then $l = 2$. $L_{is}$ indicates whether the s-th position parameter of the two elements in the i-th characteristic element pair is similar: it is 1 when similar and 0 when dissimilar. For example, $L_{i1}$ indicates whether the first position parameter is similar, that is, whether the coordinates of the two elements are similar. It can be stipulated that the coordinates of two elements are similar when their abscissas are identical or their ordinates are identical. $L_{i2}$ indicates whether the second position parameter is similar, that is, whether the sizes of the two elements are similar. It can be stipulated that the sizes of two elements are similar when the sizes are identical. That is, when both the coordinates and the sizes of the two elements in the i-th characteristic element pair are similar, $L_i = 1$; when only one of the two is similar, $L_i = 1/2$; and when neither is similar, $L_i = 0$.
In a similar way, the similarity in attributes $A_i$ and the similarity in tree structure $D_i$ of the two elements in the i-th characteristic element pair of webpage a and webpage b can be calculated by the following formulas:

$$A_i = \frac{1}{a}\sum_{s=1}^{a} A_{is}, \qquad D_i = \frac{1}{d}\sum_{s=1}^{d} D_{is}$$

Here, $a$ is the number of attribute parameters and $d$ is the number of tree structure parameters. For example, if the attributes of an element include its identifier, name, tag and hypertext reference information, and the tree structure information includes the parent element of the element, the child elements of the element and the level that the webpage containing the element occupies in the tree structure between the webpages of the application program, then $a = 4$ and $d = 3$. $A_{is}$ indicates whether the s-th attribute parameter of the two elements in the i-th characteristic element pair is similar (for example, the first parameter is the identifier of the element, the second is its name, the third is its tag and the fourth is its hypertext reference information): it is 1 when similar and 0 when dissimilar. $D_{is}$ indicates whether the s-th tree structure parameter of the two elements is similar (for example, the first parameter is the parent element, the second is the child elements and the third is the level of the webpage containing the element): it is 1 when similar and 0 when dissimilar. What it means for any one attribute or tree structure parameter of two elements to be similar can be defined as needed. For example, a parameter may be considered similar only when its values for the two elements are identical, in which case the corresponding $A_{is}$ or $D_{is}$ is 1.
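The weighted combination in formula (1) can be illustrated with a short Python sketch. It assumes that each element is a dictionary holding its coordinates, size, four attribute values and three tree structure values, and that "similar" means "equal" for attributes and tree structure. The field names and sample values are hypothetical and are not taken from the source.

```python
def share_equal(values_a, values_b):
    """Fraction of corresponding parameters that are equal
    (each parameter contributes 1 when equal and 0 otherwise)."""
    matches = sum(1 for x, y in zip(values_a, values_b) if x == y)
    return matches / len(values_a)


def pair_similarity(a, b, alpha=1/3, beta=1/3, gamma=1/3):
    """Similarity of one characteristic element pair, following formula (1)."""
    # Position: coordinates are similar when x or y matches; sizes when equal.
    coord_similar = a["x"] == b["x"] or a["y"] == b["y"]
    size_similar = a["size"] == b["size"]
    L = (int(coord_similar) + int(size_similar)) / 2
    A = share_equal(a["attrs"], b["attrs"])  # identifier, name, tag, href
    D = share_equal(a["tree"], b["tree"])    # parent, children, page level
    return alpha * L + beta * A + gamma * D


news = {"x": 10, "y": 40, "size": (60, 20),
        "attrs": ("nav-news", "news", "a", "/news"),
        "tree": ("nav", (), 1)}
finance = {"x": 80, "y": 40, "size": (60, 20),
           "attrs": ("nav-fin", "finance", "a", "/finance"),
           "tree": ("nav", (), 1)}
score = pair_similarity(news, finance)
```

In this example the two elements share an ordinate and a size (L = 1), one of four attributes (A = 1/4) and all tree structure parameters (D = 1), so with equal weights the score is (1 + 1/4 + 1) / 3.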
It is worth noting that, although one way for computing unit 132 to calculate the similarity of the two elements in a characteristic element pair has been described above, the invention is not limited to it. Computing unit 132 can also calculate this similarity from other information or with other algorithms. Further, computing unit 132 can perform this operation on all characteristic element pairs of the two pages to calculate the similarity of every characteristic element pair. Computing unit 132 can then send the calculated similarities of all characteristic element pairs to summation unit 133.
According to an embodiment of the invention, summation unit 133 can take the sum of the similarities of all characteristic element pairs of two webpages as the similarity of the two webpages.

According to an embodiment of the invention, summation unit 133 can also weight the similarities of all characteristic element pairs of two webpages and take the weighted sum as the similarity of the two webpages. For example, summation unit 133 can calculate the similarity $S_{ab}$ of two webpages a and b using the following formula:

$$S_{ab} = \sum_{i=1}^{n} w_i\, S_{ab}(i)$$

Here, $n$ is the number of characteristic element pairs of webpage a and webpage b, $S_{ab}(i)$ is the similarity of the two elements in the i-th characteristic element pair of webpage a and webpage b, that is, the similarity of the i-th characteristic element pair, and $w_i$ is the weight coefficient of that similarity. It is worth noting that $w_i$ is a parameter set per element pair, that is, every characteristic element pair is configured with its own $w_i$. In a specific example, $w_i = 1$.
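As a rough illustration of the summation, the following Python sketch forms all N × M characteristic element pairs of two pages and adds up their weighted similarities. The `pair_similarity` callback, the `weight` callback and the toy data are assumptions. In particular, the toy similarity (1 when two tags match, 0 otherwise) merely stands in for the position, attribute and tree structure comparison described above.

```python
from itertools import product


def page_similarity(features_a, features_b, pair_similarity,
                    weight=lambda i: 1.0):
    """Weighted sum of the similarities of all N x M characteristic
    element pairs of two pages (w_i = 1 by default)."""
    total = 0.0
    for i, (elem_a, elem_b) in enumerate(product(features_a, features_b)):
        total += weight(i) * pair_similarity(elem_a, elem_b)
    return total


# Toy data: characteristic elements reduced to tag names.
page_a = ["nav", "img", "form"]   # N = 3
page_b = ["nav", "form"]          # M = 2


def toy_similarity(x, y):
    return 1.0 if x == y else 0.0


s_ab = page_similarity(page_a, page_b, toy_similarity)
```

Because the sum is not normalised, pages with larger characteristic element sets can reach larger values. Any threshold applied to this value has to be chosen with that in mind.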
As described above, characteristic element pair determining unit 131 can determine the characteristic element pairs of two pages, computing unit 132 can calculate the similarity of all characteristic element pairs of the two pages, and summation unit 133 can calculate the similarity of the two pages. Characteristic element pair determining unit 131, computing unit 132 and summation unit 133 can perform these operations for every two pages, so that similarity determining unit 130 can determine the similarity of any two pages among the multiple pages of the application program.
Fig. 6 is a structural block diagram of the division unit of the device for choosing webpages from an application program including multiple webpages, according to an embodiment of the invention.
As shown in fig. 6, division unit 140 can include judging unit 141 and processing unit 142.
According to an embodiment of the invention, judging unit 141 can determine whether two webpages are similar according to the similarity of the two webpages and a predetermined threshold. When the similarity of the two webpages is greater than the predetermined threshold, judging unit 141 can determine that the two webpages are similar. Judging unit 141 can obtain the similarity of two pages from similarity determining unit 130 and then judge whether the two pages are similar. A predetermined threshold S can be set according to actual needs or empirical values: when the similarity of two webpages a and b satisfies $S_{ab} > S$, webpages a and b are determined to be similar; otherwise they are considered dissimilar. Judging unit 141 can perform this operation on any two webpages among the multiple webpages, and thus judge whether any two webpages are similar.
According to an embodiment of the invention, processing unit 142 can divide the multiple webpages into one or more webpage combinations such that each webpage combination satisfies the following condition: when a webpage combination contains multiple webpages, every two of those webpages are similar.
According to an embodiment of the invention, processing unit 142 can obtain from judging unit 141 the results of whether each two webpages are similar, and divide the webpages according to these results so that similar webpages are placed in one webpage combination. That is, when a webpage combination contains multiple webpages, every two of those webpages are similar; when a webpage combination contains only one webpage, that webpage is dissimilar to the webpages in the other webpage combinations.
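The source does not say which algorithm processing unit 142 uses to build the combinations. One simple possibility, shown below purely as an assumption, is a greedy first-fit pass: a page joins the first existing combination in which it is similar to every member, and otherwise starts a new combination. The `similarity` callback and the sample similarity values are hypothetical.

```python
def partition_pages(pages, similarity, threshold):
    """Greedy first-fit grouping: within every group, each pair of pages
    has a similarity greater than the threshold."""
    groups = []
    for page in pages:
        for group in groups:
            # Join only if the page is similar to ALL current members.
            if all(similarity(page, other) > threshold for other in group):
                group.append(page)
                break
        else:
            groups.append([page])
    return groups


# Hypothetical pairwise similarities; unlisted pairs default to 0.1.
scores = {frozenset("ab"): 0.9, frozenset("ac"): 0.8, frozenset("bc"): 0.2}


def similarity(x, y):
    return scores.get(frozenset((x, y)), 0.1)


groups = partition_pages(["a", "b", "c", "d"], similarity, threshold=0.5)
```

This sketch guarantees the pairwise condition inside each combination. It does not guarantee that a single-page combination is dissimilar to every page elsewhere: page "c" above is similar to "a" but cannot join it because it is dissimilar to "b".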
According to an embodiment of the invention, the division unit can divide the webpages into one or more webpage combinations according to the similarity of every two webpages. This is equivalent to dividing the multiple webpages of the application program into one or more categories, where the webpages in each category are all similar to one another.
According to an embodiment of the invention, the setting of the predetermined threshold S affects the number of webpage combinations produced by processing unit 142. When S is larger, more webpage combinations are produced; when S is smaller, fewer are produced. Further, the weight coefficients $\alpha_i$, $\beta_i$, $\gamma_i$ and $w_i$ affect the similarity $S_{ab}$ of two pages calculated by similarity determining unit 130, and therefore also affect the number of webpage combinations produced by processing unit 142. Thus, in practice, when the number of webpage combinations is far too large or far too small to meet the actual need, the number can be adjusted by adjusting the values of S, $\alpha_i$, $\beta_i$, $\gamma_i$ and $w_i$.
According to an embodiment of the invention, when the number of webpage combinations is greater than a webpage combination count threshold, judging unit 141 is further configured to reduce the predetermined threshold and to re-determine whether two webpages are similar according to the similarity of the two webpages and the reduced predetermined threshold. As before, when the similarity of two webpages is greater than the reduced predetermined threshold, judging unit 141 can determine that the two webpages are similar. Processing unit 142 is then further configured to divide the multiple webpages into one or more webpage combinations again, such that each webpage combination satisfies the following condition: when a webpage combination contains multiple webpages, every two of those webpages are similar.
According to an embodiment of the invention, when the number of webpage combinations is less than the webpage combination count threshold, judging unit 141 is further configured to raise the predetermined threshold and to re-determine whether two webpages are similar according to the similarity of the two webpages and the raised predetermined threshold. As before, when the similarity of two webpages is greater than the raised predetermined threshold, judging unit 141 can determine that the two webpages are similar. Processing unit 142 is then further configured to divide the multiple webpages into one or more webpage combinations again, such that each webpage combination satisfies the following condition: when a webpage combination contains multiple webpages, every two of those webpages are similar.
According to an embodiment of the invention, when the number of webpage combinations is greater than the webpage combination count threshold, the predetermined threshold S can be reduced so that the criterion for webpage similarity is relaxed, which reduces the number of webpage combinations. Similarly, when the number of webpage combinations is less than the webpage combination count threshold, the predetermined threshold S can be raised so that the criterion for webpage similarity is tightened, which increases the number of webpage combinations.
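The adjustment of S can be pictured as a small feedback loop. The Python sketch below is only one way to realise it. The fixed step size, the cap on the number of rounds, the greedy grouping helper and the sample similarities are all assumptions, and `target_count` plays the role of the webpage combination count threshold.

```python
def partition(pages, similarity, threshold):
    """Greedy grouping in which all pairs inside a group exceed the threshold."""
    groups = []
    for page in pages:
        for group in groups:
            if all(similarity(page, other) > threshold for other in group):
                group.append(page)
                break
        else:
            groups.append([page])
    return groups


def adjust_threshold(pages, similarity, threshold, target_count,
                     step=0.1, max_rounds=20):
    """Lower S when there are too many groups, raise it when there are
    too few, and regroup after every change."""
    groups = partition(pages, similarity, threshold)
    for _ in range(max_rounds):  # the cap prevents endless oscillation
        if len(groups) > target_count:
            threshold -= step    # relax the similarity criterion
        elif len(groups) < target_count:
            threshold += step    # tighten the similarity criterion
        else:
            break
        groups = partition(pages, similarity, threshold)
    return threshold, groups


scores = {frozenset("ab"): 0.6, frozenset("ac"): 0.3, frozenset("bc"): 0.3}


def similarity(x, y):
    return scores[frozenset((x, y))]


final_s, groups = adjust_threshold(["a", "b", "c"], similarity,
                                   threshold=0.8, target_count=2)
```

Starting from S = 0.8, every page forms its own combination, so S is lowered step by step until "a" and "b" (similarity 0.6) fall into one combination and the target of two combinations is met.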
According to an embodiment of the invention, device 100 for choosing webpages from an application program including multiple webpages can also include an access frequency determining unit (not shown), configured to: obtain the tree structure between the webpages of the application program, where each node in the tree represents one webpage of the application program; calculate the access frequency of each node in the tree in top-to-bottom, left-to-right order; calculate the access frequency of each node in the tree in top-to-bottom, right-to-left order; iterate these two calculation steps until the calculated access frequency of each node converges; and take the calculated access frequency of each node in the tree as the access frequency of the webpage corresponding to that node.
As noted above, the traversal unit of element acquiring unit 121 can traverse the application program to obtain the tree structure between the webpages of the application program. Here, the traversal unit can send the obtained tree structure to the access frequency determining unit, which calculates the access frequency of each node as the access frequency of the webpage corresponding to that node.
After the traversal unit has traversed the application program and obtained the tree structure between its webpages, the information of each node in the tree structure can be obtained. For example, once the tree structure shown in Fig. 3 has been obtained, the information shown in the following table can be obtained.
Table 1
Page number | In-link count | Out-link count | Level |
1 | 0 | 3 | 1 |
2 | 2 | 2 | 2 |
3 | 1 | 2 | 2 |
4 | 1 | 0 | 2 |
5 | 1 | 0 | 3 |
6 | 1 | 0 | 3 |
7 | 1 | 0 | 3 |
Here, the page number is the number of the page, the in-link count is the number of pages that can jump to this page, the out-link count is the number of other pages that this page can jump to, and the level is the level of this page in the tree structure between the webpages of the application program. In the example shown in Fig. 3, the level of page 1 is 1, the level of pages 2, 3 and 4 is 2, and the level of pages 5, 6 and 7 is 3. Next, the access frequency determining unit can determine the access probability of each node from this information.
According to an embodiment of the invention, the probability that a page is accessed is taken to be approximately equal to the frequency with which the page is accessed. After a page has been accessed, the user may exit directly, may go back to the previous page, or may click a hyperlink on the current page and enter some page in the next layer. In the invention, these three cases are considered equally probable, that is, for a node, the advance probability, the return probability and the exit probability are each 1/3. Similarly, if the out-link count of a page is n, the probabilities of advancing to each of those pages are considered equal, that is, each is 1/n. Likewise, if the in-link count of a page is m, the probabilities of returning to each of those pages are considered equal, that is, each is 1/m.
According to an embodiment of the invention, the above assumption has some special cases. The top node is not reached by a jump from any other page. Therefore, for top node 1, the exit probability and the advance probability are each 1/2 and the return probability is 0. Similarly, for the bottom nodes 5, 6 and 7, the page cannot jump to any other page, so the exit probability and the return probability are each 1/2 and the advance probability is 0.
To sum up, according to an embodiment of the invention, for a node i that is neither the top node nor a bottom node, let V(i) be the probability that node i is accessed, n its out-link count and m its in-link count. The exit probability, return probability and advance probability shown in the following table can then be obtained, where V(i)/3m is the probability of returning to a particular node and V(i)/3n is the probability of advancing to a particular node.
Table 2
According to an embodiment of the invention, calculating the access frequency of each node in the tree structure between the webpages of the application program includes: calculating the access frequency of a node according to at least one of the advance probability, the return probability and the exit probability of each node.
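The probability split described above can be written down compactly. The following Python sketch assumes that a node divides its visit probability V(i) evenly among the options actually available to it (exit always, return only when it has in-links, advance only when it has out-links). This reproduces the 1/3 and 1/2 cases in the text. The treatment of a node with neither in-links nor out-links is an extrapolation, not something stated in the source.

```python
from fractions import Fraction


def node_probabilities(visit, in_links, out_links):
    """Split a node's visit probability into exit, per-page return and
    per-page advance probabilities."""
    options = 1 + (in_links > 0) + (out_links > 0)  # exit is always possible
    share = Fraction(visit) / options
    return {
        "exit": share,
        # V(i)/3m in the general case: return to one particular page.
        "return_each": share / in_links if in_links else Fraction(0),
        # V(i)/3n in the general case: advance to one particular page.
        "advance_each": share / out_links if out_links else Fraction(0),
    }


middle = node_probabilities(Fraction(1), in_links=2, out_links=2)
top = node_probabilities(Fraction(1), in_links=0, out_links=3)
bottom = node_probabilities(Fraction(1), in_links=1, out_links=0)
```

Exact fractions are used so that values such as 1/6 are not blurred by floating point rounding.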
According to an embodiment of the invention, the access frequency of each node in the tree is first calculated in top-to-bottom, left-to-right order, and then calculated in top-to-bottom, right-to-left order. This process can be called one round of calculation. After one round has been performed, another round is performed, and it is then judged whether the calculated access frequency of each node has converged. According to an embodiment of the invention, when the difference between the access frequency of each node calculated in the current round and that calculated in the previous round is less than a predetermined access frequency threshold, the access frequency of each node can be judged to have converged. The access frequency of each node calculated in the last round is then output as the access frequency of the corresponding page.
According to an embodiment of the invention, one round of calculating the access frequency of each node can proceed as follows:
A. Calculate the probability of each forward arrow leaving the top node, and update the probability of each of these arrows.
B. According to the result of step A, calculate the access frequency of each node in the next layer, and update the access frequency of each node. The calculation proceeds from left to right, and the access frequency of each node is the sum of the probabilities of all arrows that can reach the node.
C. Repeat step A and step B until the bottom layer is reached.
D. Starting from the bottom layer, calculate the probability of each arrow in the reverse direction, and update the probability of each arrow in the reverse direction.
E. According to the result of step D, calculate the access frequency of each node in the layer above, and update the access frequency of each node in that layer. The calculation proceeds from right to left, and the access frequency of each node is the sum of the probabilities of all arrows that can reach the node, including both the forward and the reverse direction of the arrows.
F. Repeat step D and step E until the top node is reached.
As an example, some of the steps for calculating the access frequency of each node in the tree structure shown in Fig. 3 are described next. In an embodiment of the invention, the probability of accessing top node 1 is assumed to be p.
First, according to step A, the probability of each forward arrow leaving the top node is calculated and updated. For example, when calculating the probability of the forward arrow from top node 1 to node 2, the probability that top node 1 advances is p/2, and top node 1 may advance to node 2, 3 or 4, so the probability that top node 1 advances to node 2 is p/2 multiplied by 1/3, that is, p/6. Similarly, the probability of the forward arrow from top node 1 to node 3 and that of the forward arrow from top node 1 to node 4 can be calculated. That is, (1,2)=p/6, (1,3)=p/6, (1,4)=p/6.
Next, according to step B, the access frequency of each node in the next layer is calculated from the result of step A, and the access frequency of each node is updated. The calculation proceeds from left to right, and the access frequency of each node is the sum of the probabilities of all arrows that can reach the node. For example, when calculating the access frequency of node 2, node 2 can at this point be reached only through the forward arrow between top node 1 and node 2, so the access frequency of node 2 is the probability of that arrow, p/6. Similarly, the access frequencies of node 3 and node 4 can be calculated and updated, that is, 2=p/6, 3=p/6, 4=p/6.
Next, according to step C, step A and step B are repeated, giving the following arrow probabilities and node frequencies: (2,5)=p/36, (2,6)=p/36, (3,2)=p/36, (3,7)=p/36, 5=p/36, 6=p/36, 7=p/36.
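The values obtained in steps A to C can be checked with a short Python sketch of the top-down pass. The link structure below is inferred from Table 1 and from the arrows named in the example, since Fig. 3 itself is not reproduced here. Deferring the same-layer arrow (3,2) to the later bottom-up pass is an interpretation of the text. The names `OUT_LINKS`, `LEVEL` and `forward_pass` are illustrative.

```python
from fractions import Fraction

# Assumed links: page -> pages it can jump to, plus each page's level.
OUT_LINKS = {1: [2, 3, 4], 2: [5, 6], 3: [2, 7], 4: [], 5: [], 6: [], 7: []}
LEVEL = {1: 1, 2: 2, 3: 2, 4: 2, 5: 3, 6: 3, 7: 3}


def forward_pass(p):
    """One top-down pass (steps A to C).
    Returns (arrow probabilities, node access frequencies)."""
    has_in_link = {dst for targets in OUT_LINKS.values() for dst in targets}
    freq = {1: p}  # the top node is accessed with probability p
    arrows = {}
    for lv in sorted(set(LEVEL.values())):
        current = sorted(n for n in LEVEL if LEVEL[n] == lv)
        # Step A: split each node's advance share evenly over its out-links.
        for src in current:
            targets = OUT_LINKS[src]
            if not targets:
                continue
            share = Fraction(1, 3) if src in has_in_link else Fraction(1, 2)
            for dst in targets:
                arrows[(src, dst)] = freq[src] * share / len(targets)
        # Step B: a next-layer node's frequency is the sum of arrows into it.
        for dst in sorted(n for n in LEVEL if LEVEL[n] == lv + 1):
            freq[dst] = sum((arrows[(s, dst)] for s in current
                             if (s, dst) in arrows), Fraction(0))
    return arrows, freq


arrows, freq = forward_pass(Fraction(1))  # p = 1 for readability
```

With p = 1 the arrows out of node 1 carry 1/6 each and the arrows out of nodes 2 and 3 carry 1/36 each, matching the p/6 and p/36 values above.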
Next, according to step D, the probability of each arrow in the reverse direction is calculated starting from the bottom layer, and the probability of each arrow in the reverse direction is updated. For example, when calculating the probability of the reverse arrow from node 7 to node 3, the access frequency of node 7 calculated in step C is p/36, the return probability of node 7 is 1/2, and node 7 can return only to node 3, so the probability that node 7 returns to node 3 is p/36 multiplied by 1/2, that is, p/72. Similarly, the probability of the reverse arrow from node 6 back to node 2 and that of the reverse arrow from node 5 back to node 2 can be calculated. That is, (7,3)=p/72, (6,2)=p/72, (5,2)=p/72.
Next, according to step E, the access frequency of each node in the layer above is calculated from the result of step D, and the access frequency of each node in that layer is updated. The calculation proceeds from right to left, and the access frequency of each node is the sum of the probabilities of all arrows that can reach the node. For example, when calculating the access frequency of node 4, the access frequency reaching node 4 along the forward arrow is p/6 and no reverse arrow reaches node 4, so the access frequency of node 4 is p/6. As another example, when calculating the access frequency of node 3, node 3 can be reached along the forward arrow from node 1 to node 3 and also along the reverse arrow from node 7 to node 3, so the access frequency of node 3 is the probability of (1,3) plus the probability of (7,3). Similarly, the access frequency of node 2 can be calculated, that is, 4=p/6, 3=(1,3)+(7,3), 2=(1,2)+(3,2)+(5,2)+(6,2).
Next, according to step F, steps D and E are repeated until top node 1 is reached. This calculation is similar to the process above and is not described in detail here.
As described above, the access frequency determining unit can determine the access frequency of each node in the tree and take it as the access frequency of the page corresponding to that node. Further, the access frequency determining unit can send the access frequency of each page to selecting unit 150, so that selecting unit 150 can choose the webpage with the highest access frequency from each webpage combination.
According to an embodiment of the invention, selecting unit 150 can choose from each webpage combination the one webpage with the highest access frequency, that is, the most representative webpage. Device 100 for choosing webpages can thus choose, from an application program including multiple webpages, a small number of webpages that are nevertheless the most important and most representative.
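The selection step itself is a per-combination maximum. A minimal Python sketch follows, assuming that the combinations are lists of page numbers and that the access frequencies are held in a dictionary. The numbers are made up for illustration and are not results from the source.

```python
def choose_representatives(groups, frequency):
    """Pick the page with the highest access frequency from each group."""
    return [max(group, key=lambda page: frequency[page]) for group in groups]


# Hypothetical combinations and access frequencies.
groups = [[2, 3], [4], [5, 6, 7]]
frequency = {2: 0.22, 3: 0.18, 4: 0.17, 5: 0.03, 6: 0.04, 7: 0.03}
chosen = choose_representatives(groups, frequency)
```

When several pages in a combination share the highest frequency, `max` returns the first of them. The source does not specify a tie-breaking rule.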
According to an embodiment of the invention, device 100 for choosing webpages can send the chosen webpages to an externally connected test device, so that the test device tests the chosen webpages. According to an embodiment of the invention, device 100 for choosing webpages can also include a test unit, and selecting unit 150 can send the chosen webpages to the test unit for testing. Whether the chosen webpages are tested by the test device or by the test unit, the test result of the application program can be determined from the test results of the chosen webpages.
According to an embodiment of the invention, the chosen webpages can be tested so as to save test resources while still ensuring test effectiveness, but the invention is not limited to this. After the final webpages have been chosen, device 100 for choosing webpages according to an embodiment of the invention can also be used for data mining, application program analysis and the like.
The method for choosing webpages from an application program including multiple webpages according to an embodiment of the invention is described below with reference to Fig. 8.
As shown in Fig. 8, in step S810, multiple webpages of the application program are obtained.
Next, in step S820, the characteristic element set of each webpage among the multiple webpages is determined.
Next, in step S830, the similarity of every two webpages among the multiple webpages is determined according to the characteristic element set of each webpage.
Next, in step S840, the multiple webpages are divided into one or more webpage combinations according to the similarity of every two webpages.
Next, in step S850, the one webpage with the highest access frequency is chosen from each of the one or more webpage combinations.
According to an embodiment of the invention, determining the characteristic element set of each webpage among the multiple webpages includes: obtaining multiple elements of each webpage; determining the characteristic elements of each webpage according to the similarity of every two elements among the multiple elements; and taking the set of characteristic elements as the characteristic element set of each webpage.
According to an embodiment of the invention, determining the characteristic elements of each webpage according to the similarity of every two elements among the multiple elements includes: dividing the multiple elements into one or more element groups according to the similarity of every two elements; choosing one element from each of the one or more element groups; and taking the chosen elements as the characteristic elements of the webpage.
According to an embodiment of the invention, the similarity of every two elements among the multiple elements is determined according to the position and type of the elements and the DOM tree structure of the webpage where the elements are located.
According to an embodiment of the invention, determining the similarity of every two webpages among the multiple webpages according to the characteristic element set of each webpage includes: determining the characteristic element pairs of two webpages, where a characteristic element pair consists of one characteristic element of one of the two webpages and one characteristic element of the other webpage; calculating the similarity of the two elements in each characteristic element pair and taking it as the similarity of that characteristic element pair; and taking the sum of the similarities of all characteristic element pairs of the two webpages as the similarity of the two webpages.
According to an embodiment of the invention, determining the similarity of the two elements in a characteristic element pair includes: determining the similarity of the two elements according to the position, attributes and tree structure information of each of the two elements in the characteristic element pair.
According to an embodiment of the invention, dividing the multiple webpages into one or more webpage combinations according to the similarity between every two webpages includes: determining whether two webpages are similar according to the similarity of the two webpages and a predetermined threshold, including determining that the two webpages are similar when their similarity is greater than the predetermined threshold; and dividing the multiple webpages into one or more webpage combinations such that each of the one or more webpage combinations satisfies the following condition: when a webpage combination contains multiple webpages, every two webpages in that combination are similar.
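One way to satisfy the pairwise condition above is a greedy pass in which a webpage joins a combination only if it is similar to every webpage already in it. The sketch below is an assumed strategy, not necessarily the one used by the invention; the page names and the `scores` table are invented for illustration.

```python
def divide_pages(pages, similarity, threshold):
    """Greedily split pages into groups whose members are pairwise similar."""
    combinations = []
    for page in pages:
        for group in combinations:
            # A page may join only if it is similar to every current member.
            if all(similarity(page, member) > threshold for member in group):
                group.append(page)
                break
        else:
            combinations.append([page])
    return combinations


# Toy pairwise similarities between three hypothetical pages.
scores = {
    frozenset(("home", "list")): 0.9,
    frozenset(("home", "detail")): 0.2,
    frozenset(("list", "detail")): 0.7,
}


def lookup(a, b):
    """Return the stored similarity of an unordered pair of pages."""
    return scores[frozenset((a, b))]
```

In this toy data "detail" is similar to "list" but not to "home", so with a threshold of 0.5 it cannot join their combination and forms its own.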
According to an embodiment of the invention, when the number of webpage combinations is greater than a webpage combination threshold, the method further includes: reducing the predetermined threshold; re-determining whether two webpages are similar according to the similarity of the two webpages and the reduced predetermined threshold, including determining that the two webpages are similar when their similarity is greater than the reduced predetermined threshold; and dividing the multiple webpages into one or more webpage combinations again such that each of the one or more webpage combinations satisfies the following condition: when a webpage combination contains multiple webpages, every two webpages in that combination are similar.
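The threshold reduction can be sketched as a loop around the division step. The fixed `step` size and the rule of stopping once the threshold reaches zero are assumptions added so that the loop always terminates; the source only states that the threshold is reduced and the webpages are divided again.

```python
def divide_with_limit(pages, similarity, threshold, max_groups, step=0.1):
    """Lower the threshold until the number of groups is within max_groups."""
    while True:
        groups = []
        for page in pages:
            for group in groups:
                # Pairwise condition: similar to every page already grouped.
                if all(similarity(page, other) > threshold for other in group):
                    group.append(page)
                    break
            else:
                groups.append([page])
        # Stop when few enough groups remain or the threshold cannot drop.
        if len(groups) <= max_groups or threshold <= 0.0:
            return groups, threshold
        threshold = max(0.0, threshold - step)
```

The function returns both the combinations and the threshold that produced them, which makes it easy to see how far the threshold had to be lowered.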
According to an embodiment of the invention, the method further includes determining the access frequency of each of the multiple webpages of the application program, wherein determining the access frequency of each webpage includes: obtaining the tree structure among the webpages of the application program, each node in the tree representing one webpage of the application program; calculating the access frequency of each node in the tree in top-to-bottom, left-to-right order; calculating the access frequency of each node in the tree in top-to-bottom, right-to-left order; iterating the step of calculating the access frequency of each node in top-to-bottom, left-to-right order and the step of calculating it in top-to-bottom, right-to-left order until the calculated access frequency of each node converges; and using the calculated access frequency of each node in the tree as the access frequency of the webpage corresponding to that node.
According to an embodiment of the invention, calculating the access frequency of each node in the tree includes: calculating the access frequency of the node according to at least one of the advance probability, the return probability and the exit probability of each node.
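The alternating sweeps can be illustrated with a small model. Everything about the update rule here is an assumption, since this passage does not state the formula: a user is taken to enter at the root, move to a uniformly chosen child with the node's advance probability, go back to the parent with the return probability, and leave otherwise (the exit probability is the implicit remainder). The access frequency is then the expected number of visits, updated in place during each sweep.

```python
def access_frequencies(children, advance, back, tol=1e-9, max_rounds=1000):
    """Estimate expected visits per node with alternating ordered sweeps."""
    parent = {c: p for p, kids in children.items() for c in kids}
    root = next(n for n in children if n not in parent)

    # Nodes level by level (top to bottom), each level listed left to right.
    levels = [[root]]
    while True:
        below = [c for n in levels[-1] for c in children.get(n, [])]
        if not below:
            break
        levels.append(below)
    freq = {n: 0.0 for level in levels for n in level}

    def update(node):
        if node == root:
            inflow = 1.0  # assumed: every session starts at the root page
        else:
            p = parent[node]
            inflow = advance[p] * freq[p] / len(children[p])
        # Visits arriving from children that return to this node.
        inflow += sum(back[c] * freq[c] for c in children.get(node, []))
        change = abs(inflow - freq[node])
        freq[node] = inflow
        return change

    for _ in range(max_rounds):
        delta = 0.0
        for right_to_left in (False, True):
            for level in levels:
                ordered = reversed(level) if right_to_left else level
                for node in ordered:
                    delta = max(delta, update(node))
        if delta < tol:  # converged: no node changed noticeably
            break
    return freq
```

In this sketch `children` maps a node to its ordered child list, `advance` is keyed by nodes that have children, and `back` is keyed by every non-root node. For a root with advance probability 0.5 and two leaves that each return with probability 0.5, the linear system gives 4/3 expected visits for the root and 1/3 for each leaf.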
According to an embodiment of the invention, the method further includes: testing the chosen webpages; and determining the test result of the application program according to the test results of the chosen webpages.
The various implementations of the above steps of the method for choosing webpages from an application program including multiple webpages according to an embodiment of the invention have been described in detail above and are not repeated here.
Fig. 9 is a schematic diagram of the process of the method for choosing webpages from an application program including multiple webpages according to an embodiment of the invention. As shown in Fig. 9, the webpage structure of the application program can be obtained by traversing the application program, including the tree structure among the webpages of the application program, the DOM tree structure of each page, and information about all elements on each page. Next, the characteristic element set of each webpage can be determined, so that the multiple webpages of the application program are divided into one or more combinations, such as combination A and combination B, according to the similarity of the elements in the characteristic element sets. Next, the access frequency of each webpage can be determined according to the structure of the webpages, so that the webpage with the highest access frequency in each webpage combination is chosen, finally yielding the webpages to be tested, such as webpage a and webpage b.
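The final selection step of the process reduces to taking, in each webpage combination, the webpage with the largest access frequency. A minimal sketch, with invented page names and frequencies in the usage note:

```python
def select_pages(combinations, access_frequency):
    """Return the most frequently accessed page of each webpage combination."""
    return [
        max(group, key=lambda page: access_frequency[page])
        for group in combinations
    ]
```

For example, with combinations `[["a1", "a2"], ["b1"]]` and frequencies `{"a1": 0.2, "a2": 0.5, "b1": 0.1}`, the function returns `["a2", "b1"]`. When two webpages tie, Python's `max` keeps the first one encountered, which is a choice of this sketch rather than of the source.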
It can thus be seen that, with the apparatus and method for choosing webpages according to the present invention, the similarity between every two webpages can be determined according to the characteristic element sets of the webpages, the webpages can be divided into one or more combinations according to that similarity, and the one webpage with the highest access frequency can then be chosen from each combination. In this way, the webpages can be classified reliably, and the most important and most representative webpage in each combination can be chosen on the basis of access frequency. When the chosen webpages are used for testing the application program, the testing time can be reduced while the effectiveness of the test is maintained, thereby saving testing resources.
Obviously, each operation of the method for choosing webpages from an application program including multiple webpages according to the present invention can be implemented as a computer-executable program stored in various machine-readable storage media.
Moreover, the object of the present invention can also be achieved in the following manner: a storage medium storing the above executable program code is supplied, directly or indirectly, to a system or device, and a computer or central processing unit (CPU) in the system or device reads and executes the program code. In this case, as long as the system or device is capable of executing programs, embodiments of the present invention are not limited to a particular kind of program, and the program may take any form, for example an object program, a program executed by an interpreter, or a script supplied to an operating system.
The above machine-readable storage media include but are not limited to: various memories and storage units, semiconductor devices, disk units such as optical, magnetic and magneto-optical disks, and other media suitable for storing information.
In addition, the technical solution of the present invention can also be realized by a computer connecting to a corresponding website on the Internet, downloading and installing the computer program code according to the present invention, and then executing the program.
Figure 10 is a block diagram of an example structure of a general-purpose personal computer in which the apparatus and method for detecting repeated crashes of an application program according to the present invention can be implemented.
As shown in Figure 10, CPU 1001 performs various processing according to a program stored in read-only memory (ROM) 1002 or a program loaded from storage section 1008 into random access memory (RAM) 1003. RAM 1003 also stores, as needed, the data required when CPU 1001 performs the various processing. CPU 1001, ROM 1002 and RAM 1003 are connected to one another via bus 1004. Input/output interface 1005 is also connected to bus 1004.
The following components are connected to input/output interface 1005: input section 1006 (including a keyboard, a mouse, etc.), output section 1007 (including a display such as a cathode-ray tube (CRT) or liquid crystal display (LCD), a speaker, etc.), storage section 1008 (including a hard disk, etc.), and communication section 1009 (including a network interface card such as a LAN card, a modem, etc.). Communication section 1009 performs communication processing via a network such as the Internet. Drive 1010 can also be connected to input/output interface 1005 as needed. Removable medium 1011, such as a magnetic disk, optical disc, magneto-optical disk or semiconductor memory, is mounted on drive 1010 as needed, so that a computer program read from it is installed into storage section 1008 as needed.
When the above series of processing is implemented by software, the program constituting the software is installed from a network such as the Internet or from a storage medium such as removable medium 1011.
Those skilled in the art will understand that this storage medium is not limited to the removable medium 1011 shown in Figure 10, which stores the program and is distributed separately from the device in order to provide the program to the user. Examples of removable medium 1011 include magnetic disks (including floppy disks (registered trademark)), optical discs (including compact disc read-only memory (CD-ROM) and digital versatile discs (DVD)), magneto-optical disks (including MiniDisc (MD) (registered trademark)) and semiconductor memory. Alternatively, the storage medium may be ROM 1002, a hard disk included in storage section 1008, or the like, in which the program is stored and which is distributed to the user together with the device containing it.
In the system and method of the present invention, the units or steps can obviously be decomposed and/or recombined. Such decompositions and/or recombinations should be regarded as equivalents of the present invention. Moreover, the steps of the above series of processing can naturally be performed chronologically in the order described, but need not be. Some steps can be performed in parallel or independently of one another.
Although embodiments of the present invention have been described in detail above with reference to the accompanying drawings, it should be understood that the embodiments described above are intended only to illustrate the present invention and do not limit it. Those skilled in the art can make various changes and modifications to the above embodiments without departing from the spirit and scope of the invention. The scope of the present invention is therefore defined only by the appended claims and their equivalents.
With regard to the implementations including the above embodiments, the following notes are also disclosed:
Note 1. A device for choosing webpages from an application program including multiple webpages, comprising:
a webpage acquiring unit for obtaining the multiple webpages of the application program;
a characteristic element set determining unit for determining the characteristic element set of each of the multiple webpages;
a similarity determining unit for determining the similarity between every two of the multiple webpages according to the characteristic element set of each webpage;
a division unit for dividing the multiple webpages into one or more webpage combinations according to the similarity between every two webpages; and
a selecting unit for choosing, from each of the one or more webpage combinations, the one webpage with the highest access frequency.
Note 2. The device according to Note 1, wherein the characteristic element set determining unit includes:
an element acquiring unit for obtaining multiple elements of each webpage; and
a determining unit for determining the characteristic elements of each webpage according to the similarity between every two of the multiple elements, and for using the set of characteristic elements as the characteristic element set of that webpage.
Note 3. The device according to Note 2, wherein the determining unit is configured to:
divide the multiple elements into one or more element groups according to the similarity between every two of the multiple elements;
choose one element from each of the one or more element groups; and
use the chosen elements as the characteristic elements of the webpage.
Note 4. The device according to Note 3, wherein the determining unit determines the similarity between every two of the multiple elements according to the position and type of each element and the DOM tree structure of the webpage where the element is located.
Note 5. The device according to Note 1, wherein the similarity determining unit includes:
a characteristic element pair determining unit for determining characteristic element pairs of two webpages, each characteristic element pair consisting of one characteristic element of one of the two webpages and one characteristic element of the other webpage;
a computing unit for calculating the similarity of the two elements in a characteristic element pair and using it as the similarity of that characteristic element pair; and
a summing unit for using the sum of the similarities of all characteristic element pairs of the two webpages as the similarity of the two webpages.
Note 6. The device according to Note 5, wherein the computing unit calculates the similarity of the two elements in a characteristic element pair according to the position, attributes and tree structure information of each of the two elements in the pair.
Note 7. The device according to Note 1, wherein the division unit includes:
a judging unit for determining whether two webpages are similar according to the similarity of the two webpages and a predetermined threshold, including determining that the two webpages are similar when their similarity is greater than the predetermined threshold; and
a processing unit for dividing the multiple webpages into one or more webpage combinations such that each of the one or more webpage combinations satisfies the following condition: when a webpage combination contains multiple webpages, every two webpages in that combination are similar.
Note 8. The device according to Note 7, wherein, when the number of webpage combinations is greater than a webpage combination threshold, the judging unit is further configured to reduce the predetermined threshold and to re-determine whether two webpages are similar according to the similarity of the two webpages and the reduced predetermined threshold, including determining that the two webpages are similar when their similarity is greater than the reduced predetermined threshold; and the processing unit is further configured to divide the multiple webpages into one or more webpage combinations again such that each of the one or more webpage combinations satisfies the following condition: when a webpage combination contains multiple webpages, every two webpages in that combination are similar.
Note 9. The device according to Note 1, wherein the device further includes an access frequency determining unit configured to:
obtain the tree structure among the webpages of the application program, each node in the tree representing one webpage of the application program;
calculate the access frequency of each node in the tree in top-to-bottom, left-to-right order;
calculate the access frequency of each node in the tree in top-to-bottom, right-to-left order;
iterate the step of calculating the access frequency of each node in top-to-bottom, left-to-right order and the step of calculating it in top-to-bottom, right-to-left order until the calculated access frequency of each node converges; and
use the calculated access frequency of each node in the tree as the access frequency of the webpage corresponding to that node.
Note 10. The device according to Note 9, wherein the access frequency determining unit calculates the access frequency of each node according to at least one of the advance probability, the return probability and the exit probability of that node.
Note 11. The device according to Note 1, wherein the device further includes a test unit for testing the chosen webpages and for determining the test result of the application program according to the test results of the chosen webpages.
Note 12. A method for choosing webpages from an application program including multiple webpages, comprising:
obtaining the multiple webpages of the application program;
determining the characteristic element set of each of the multiple webpages;
determining the similarity between every two of the multiple webpages according to the characteristic element set of each webpage;
dividing the multiple webpages into one or more webpage combinations according to the similarity between every two webpages; and
choosing, from each of the one or more webpage combinations, the one webpage with the highest access frequency.
Note 13. The method according to Note 12, wherein determining the characteristic element set of each of the multiple webpages includes:
obtaining multiple elements of each webpage;
determining the characteristic elements of each webpage according to the similarity between every two of the multiple elements; and
using the set of characteristic elements as the characteristic element set of that webpage.
Note 14. The method according to Note 13, wherein determining the characteristic elements of each webpage according to the similarity between every two of the multiple elements includes:
dividing the multiple elements into one or more element groups according to the similarity between every two of the multiple elements;
choosing one element from each of the one or more element groups; and
using the chosen elements as the characteristic elements of the webpage.
Note 15. The method according to Note 14, wherein the similarity between every two of the multiple elements is determined according to the position and type of each element and the DOM tree structure of the webpage where the element is located.
Note 16. The method according to Note 12, wherein determining the similarity between every two of the multiple webpages according to the characteristic element set of each webpage includes:
determining characteristic element pairs of two webpages, each characteristic element pair consisting of one characteristic element of one of the two webpages and one characteristic element of the other webpage;
calculating the similarity of the two elements in a characteristic element pair and using it as the similarity of that characteristic element pair; and
using the sum of the similarities of all characteristic element pairs of the two webpages as the similarity of the two webpages.
Note 17. The method according to Note 16, wherein determining the similarity of the two elements in a characteristic element pair includes: determining that similarity according to the position, attributes and tree structure information of each of the two elements in the pair.
Note 18. The method according to Note 12, wherein dividing the multiple webpages into one or more webpage combinations according to the similarity between every two webpages includes:
determining whether two webpages are similar according to the similarity of the two webpages and a predetermined threshold, including determining that the two webpages are similar when their similarity is greater than the predetermined threshold; and
dividing the multiple webpages into one or more webpage combinations such that each of the one or more webpage combinations satisfies the following condition: when a webpage combination contains multiple webpages, every two webpages in that combination are similar.
Note 19. The method according to Note 18, wherein, when the number of webpage combinations is greater than a webpage combination threshold, the method further includes:
reducing the predetermined threshold;
re-determining whether two webpages are similar according to the similarity of the two webpages and the reduced predetermined threshold, including determining that the two webpages are similar when their similarity is greater than the reduced predetermined threshold; and
dividing the multiple webpages into one or more webpage combinations again such that each of the one or more webpage combinations satisfies the following condition: when a webpage combination contains multiple webpages, every two webpages in that combination are similar.
Note 20. A machine-readable storage medium carrying a program product that includes machine-readable instruction code stored therein, wherein the instruction code, when read and executed by a computer, causes the computer to perform the method according to any one of Notes 12 to 19.
Claims (10)
1. A device for choosing webpages from an application program including multiple webpages, comprising:
a webpage acquiring unit for obtaining the multiple webpages of the application program;
a characteristic element set determining unit for determining the characteristic element set of each of the multiple webpages;
a similarity determining unit for determining the similarity between every two of the multiple webpages according to the characteristic element set of each webpage;
a division unit for dividing the multiple webpages into one or more webpage combinations according to the similarity between every two webpages; and
a selecting unit for choosing, from each of the one or more webpage combinations, the one webpage with the highest access frequency.
2. The device according to claim 1, wherein the characteristic element set determining unit includes:
an element acquiring unit for obtaining multiple elements of each webpage; and
a determining unit for determining the characteristic elements of each webpage according to the similarity between every two of the multiple elements, and for using the set of characteristic elements as the characteristic element set of that webpage.
3. The device according to claim 2, wherein the determining unit is configured to:
divide the multiple elements into one or more element groups according to the similarity between every two of the multiple elements;
choose one element from each of the one or more element groups; and
use the chosen elements as the characteristic elements of the webpage.
4. The device according to claim 3, wherein the determining unit determines the similarity between every two of the multiple elements according to the position and type of each element and the DOM tree structure of the webpage where the element is located.
5. The device according to claim 1, wherein the similarity determining unit includes:
a characteristic element pair determining unit for determining characteristic element pairs of two webpages, each characteristic element pair consisting of one characteristic element of one of the two webpages and one characteristic element of the other webpage;
a computing unit for calculating the similarity of the two elements in a characteristic element pair and using it as the similarity of that characteristic element pair; and
a summing unit for using the sum of the similarities of all characteristic element pairs of the two webpages as the similarity of the two webpages.
6. The device according to claim 5, wherein the computing unit calculates the similarity of the two elements in a characteristic element pair according to the position, attributes and tree structure information of each of the two elements in the pair.
7. The device according to claim 1, wherein the division unit includes:
a judging unit for determining whether two webpages are similar according to the similarity of the two webpages and a predetermined threshold, including determining that the two webpages are similar when their similarity is greater than the predetermined threshold; and
a processing unit for dividing the multiple webpages into one or more webpage combinations such that each of the one or more webpage combinations satisfies the following condition: when a webpage combination contains multiple webpages, every two webpages in that combination are similar.
8. The device according to claim 7, wherein, when the number of webpage combinations is greater than a webpage combination threshold, the judging unit is further configured to reduce the predetermined threshold and to re-determine whether two webpages are similar according to the similarity of the two webpages and the reduced predetermined threshold, including determining that the two webpages are similar when their similarity is greater than the reduced predetermined threshold; and the processing unit is further configured to divide the multiple webpages into one or more webpage combinations again such that each of the one or more webpage combinations satisfies the following condition: when a webpage combination contains multiple webpages, every two webpages in that combination are similar.
9. The device according to claim 1, wherein the device further includes an access frequency determining unit configured to:
obtain the tree structure among the webpages of the application program, each node in the tree representing one webpage of the application program;
calculate the access frequency of each node in the tree in top-to-bottom, left-to-right order;
calculate the access frequency of each node in the tree in top-to-bottom, right-to-left order;
iterate the step of calculating the access frequency of each node in top-to-bottom, left-to-right order and the step of calculating it in top-to-bottom, right-to-left order until the calculated access frequency of each node converges; and
use the calculated access frequency of each node in the tree as the access frequency of the webpage corresponding to that node.
10. A method for choosing webpages from an application program including multiple webpages, comprising:
obtaining the multiple webpages of the application program;
determining the characteristic element set of each of the multiple webpages;
determining the similarity between every two of the multiple webpages according to the characteristic element set of each webpage;
dividing the multiple webpages into one or more webpage combinations according to the similarity between every two webpages; and
choosing, from each of the one or more webpage combinations, the one webpage with the highest access frequency.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610305142.8A CN107357716A (en) | 2016-05-10 | 2016-05-10 | Apparatus and method for choosing webpage |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610305142.8A CN107357716A (en) | 2016-05-10 | 2016-05-10 | Apparatus and method for choosing webpage |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107357716A true CN107357716A (en) | 2017-11-17 |
Family
ID=60271719
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610305142.8A Pending CN107357716A (en) | 2016-05-10 | 2016-05-10 | Apparatus and method for choosing webpage |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107357716A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2011053912A (en) * | 2009-09-02 | 2011-03-17 | Nec Corp | Page similarity determination apparatus, page similarity determination method and page similarity determination program |
CN103049562A (en) * | 2012-12-31 | 2013-04-17 | 华为技术有限公司 | Method and device for recognizing similar webpages |
CN103853654A (en) * | 2012-11-30 | 2014-06-11 | 国际商业机器公司 | Method and device for selecting webpage testing paths |
CN104504086A (en) * | 2014-12-25 | 2015-04-08 | 北京国双科技有限公司 | Clustering method and device for webpage |
CN104657391A (en) * | 2013-11-21 | 2015-05-27 | 阿里巴巴集团控股有限公司 | Page processing method and device |
Non-Patent Citations (1)
Title |
---|
范意兴 等: ""一种基于网页块特征的多级网页聚类方法"", 《山东大学学报(理学版)》 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20171117 |