CN107357716A - Apparatus and method for choosing webpage - Google Patents


Info

Publication number
CN107357716A
Authority
CN
China
Prior art keywords
webpage
webpages
similarity
node
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610305142.8A
Other languages
Chinese (zh)
Inventor
马磊
皮冰锋
孙俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd
Priority to CN201610305142.8A
Publication of CN107357716A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/36: Preventing errors by testing or debugging software
    • G06F11/3604: Software analysis for verifying properties of programs
    • G06F11/3608: Software analysis for verifying properties of programs using formal methods, e.g. model checking, abstract interpretation

Abstract

The present invention relates to an apparatus and method for choosing webpages from an application program that includes multiple webpages. The apparatus for choosing webpages according to the present invention includes: a webpage acquiring unit for acquiring the multiple webpages of the application program; a characteristic element set determining unit for determining the characteristic element set of each of the multiple webpages; a similarity determining unit for determining the similarity of every two webpages among the multiple webpages according to the characteristic element set of each webpage; a division unit for dividing the multiple webpages into one or more webpage combinations according to the similarity of every two webpages; and a selecting unit for choosing, from each of the one or more webpage combinations, the one webpage with the highest access frequency. With the apparatus and method according to the present invention, webpages can be classified reliably and the webpage with the highest access frequency can be chosen from each class.

Description

Apparatus and method for choosing webpage
Technical field
The present invention relates to the technical field of application program testing, and more particularly to an apparatus and method for choosing webpages from an application program that includes multiple webpages.
Background
This section provides background information related to the present invention, which is not necessarily prior art.
Nowadays, with the rapid development of smartphones and 3G/4G networks, the mobile Internet is flourishing and more and more people are using smartphones. Along with this development, application programs (apps) installed on mobile terminals such as smartphones are emerging constantly and developing rapidly. Specifically, current application programs can be divided into three classes. The first is the native application program (native app), which typically depends on the operating system, is highly interactive, is a complete application program with strong extensibility, and must be downloaded and installed by the user. The second is the webpage application program (web app), which is written in HTML5 (Hypertext Markup Language, 5th revision), requires no download or installation, and lives in the browser, similar to what is now called a light application; it can also be regarded as the touch-screen version of a website, for example when a site such as Sohu is opened on a smartphone. The third is the hybrid application program (hybrid app), a half-native, half-web application that must be downloaded and installed and looks like a native application program, but whose content is webpages.
At present, considerable resources have been invested in developing these application programs, but an excellent application program also depends on sufficient testing. For testing native application programs, a great deal of work has already been done, and many testing tools and frameworks exist. For testing webpage application programs, however, no reliable testing scheme has yet been proposed.
A webpage application program includes many webpages, each developed in HTML5. Ideally, therefore, testing a webpage application program would mean testing every webpage. Such testing is thorough, but it obviously takes a long time and wastes test resources.
In view of the above technical problem, and considering that not every webpage in a webpage application program is equally important and that some pages have substantially similar structures, the present invention proposes a scheme that classifies the webpages in a webpage application program so that the most representative webpage can be chosen from each class. In this way, testing fewer pages can cover the webpage application program as widely as possible, which reduces testing time and saves test resources.
Summary of the invention
This section provides a general summary of the present invention, and is not a comprehensive disclosure of its full scope or all of its features.
An object of the present invention is to provide an apparatus and method for choosing webpages from an application program that includes multiple webpages, which can classify the webpages reliably and choose the most representative webpage from each class.
According to one aspect of the present invention, there is provided an apparatus for choosing webpages from an application program that includes multiple webpages, comprising: a webpage acquiring unit for acquiring the multiple webpages of the application program; a characteristic element set determining unit for determining the characteristic element set of each of the multiple webpages; a similarity determining unit for determining the similarity of every two webpages among the multiple webpages according to the characteristic element set of each webpage; a division unit for dividing the multiple webpages into one or more webpage combinations according to the similarity of every two webpages; and a selecting unit for choosing, from each of the one or more webpage combinations, the one webpage with the highest access frequency.
According to another aspect of the present invention, there is provided a method for choosing webpages from an application program that includes multiple webpages, comprising: acquiring the multiple webpages of the application program; determining the characteristic element set of each of the multiple webpages; determining the similarity of every two webpages among the multiple webpages according to the characteristic element set of each webpage; dividing the multiple webpages into one or more webpage combinations according to the similarity of every two webpages; and choosing, from each of the one or more webpage combinations, the one webpage with the highest access frequency.
According to another aspect of the present invention, there is provided a program product comprising machine-readable instruction code stored therein, wherein the instruction code, when read and executed by a computer, causes the computer to perform the method for choosing webpages from an application program that includes multiple webpages according to the present invention.
According to another aspect of the present invention, there is provided a machine-readable storage medium carrying the program product according to the present invention.
With the apparatus and method for choosing webpages from an application program that includes multiple webpages according to the present invention, the similarity of every two webpages can be determined according to the characteristic element sets of the webpages, the webpages can be divided into one or more combinations according to that similarity, and the one webpage with the highest access frequency can then be chosen from each combination. In this way, webpages can be classified reliably, and the most representative webpage in each combination can be chosen based on access frequency. When the chosen webpages are used for application program testing, testing time can be reduced while the test effect is preserved, thereby saving test resources.
The description and specific examples in this summary are for purposes of illustration only and are not intended to limit the scope of the present invention.
Brief description of the drawings
The drawings described here are for illustrative purposes only, showing selected embodiments rather than all possible implementations, and are not intended to limit the scope of the present invention. In the drawings:
Fig. 1 is a structural block diagram of an apparatus for choosing webpages from an application program that includes multiple webpages according to an embodiment of the present invention;
Fig. 2 is a structural block diagram of the characteristic element set determining unit of the apparatus for choosing webpages from an application program that includes multiple webpages according to an embodiment of the present invention;
Fig. 3 shows an example of the tree structure between the webpages of an application program that includes multiple webpages according to an embodiment of the present invention;
Fig. 4 shows an example of one webpage of an application program that includes multiple webpages according to an embodiment of the present invention;
Fig. 5 shows an example of part of the source code of the webpage shown in Fig. 4;
Fig. 6 is a structural block diagram of the similarity determining unit of the apparatus for choosing webpages from an application program that includes multiple webpages according to an embodiment of the present invention;
Fig. 7 is a structural block diagram of the division unit of the apparatus for choosing webpages from an application program that includes multiple webpages according to an embodiment of the present invention;
Fig. 8 is a flow chart of a method for choosing webpages from an application program that includes multiple webpages according to an embodiment of the present invention;
Fig. 9 is a schematic diagram of the process of the method for choosing webpages from an application program that includes multiple webpages according to an embodiment of the present invention; and
Fig. 10 is a block diagram of an example structure of a general-purpose personal computer in which the apparatus and method for choosing webpages from an application program that includes multiple webpages according to an embodiment of the present invention can be implemented.
Although the present invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and are described in detail here. It should be understood, however, that the description of specific embodiments is not intended to limit the present invention to the particular forms disclosed; on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the present invention. Note that corresponding reference numerals indicate corresponding parts throughout the several drawings.
Detailed description of the embodiments
Examples of the present invention are now described more fully with reference to the drawings. The following description is merely exemplary in nature and is not intended to limit the present invention, its application, or its uses.
Example embodiments are provided so that this disclosure will be thorough and will fully convey its scope to those skilled in the art. Numerous specific details, such as examples of specific units, apparatuses, and methods, are set forth to provide a thorough understanding of the embodiments of the present invention. It will be apparent to those skilled in the art that the specific details need not be employed, that the example embodiments can be implemented in many different forms, and that none of these should be construed as limiting the scope of the present invention. In some example embodiments, well-known processes, well-known structures, and well-known technologies are not described in detail.
Fig. 1 is a structural block diagram of an apparatus for choosing webpages from an application program that includes multiple webpages according to an embodiment of the present invention. As shown in Fig. 1, the apparatus 100 for choosing webpages according to an embodiment of the present invention can include a webpage acquiring unit 110, a characteristic element set determining unit 120, a similarity determining unit 130, a division unit 140, and a selecting unit 150.
According to an embodiment of the present invention, the webpage acquiring unit 110 can acquire the multiple webpages of an application program. The application program can be the webpage application program mentioned above and includes multiple webpages. The webpage acquiring unit 110 can acquire the multiple webpages of the application program by any method known in the art. Further, the webpage acquiring unit 110 can send the acquired webpages to the characteristic element set determining unit 120.
According to an embodiment of the present invention, the characteristic element set determining unit 120 can determine the characteristic element set of each of the multiple webpages. Here, the characteristic element set determining unit 120 can obtain the multiple webpages of the application program from the webpage acquiring unit 110. Each webpage can include multiple elements, which can be, for example, text, pictures, or a combination of both. The characteristic element set determining unit 120 can determine the characteristic element set of each webpage. According to an embodiment of the present invention, the characteristic element set is the set of those elements of a webpage that best represent the structural features of the webpage. To a certain extent, the characteristic element set represents the webpage. Further, the characteristic element set determining unit 120 can send the determined characteristic element set of each webpage to the similarity determining unit 130.
According to an embodiment of the present invention, the similarity determining unit 130 can determine the similarity of every two webpages among the multiple webpages according to the characteristic element set of each webpage. Here, the similarity determining unit 130 can obtain the characteristic element set of each webpage from the characteristic element set determining unit 120. As mentioned above, a characteristic element set represents its webpage to a certain extent, so the similarity determining unit 130 can determine the similarity of two webpages according to their characteristic element sets. The similarity determining unit 130 can perform this operation for every two webpages among the multiple webpages, thereby determining the similarity of every two webpages. Further, the similarity determining unit 130 can send the determined similarities to the division unit 140.
According to an embodiment of the present invention, the division unit 140 can divide the multiple webpages into one or more webpage combinations according to the similarity of every two webpages. Here, the division unit 140 can obtain the similarity of every two webpages from the similarity determining unit 130 and divide the webpages into one or more webpage combinations accordingly, where each webpage combination can be regarded as one class of webpages. Further, the division unit 140 can send the one or more webpage combinations to the selecting unit 150.
The selecting unit 150 can choose, from each of the one or more webpage combinations, the one webpage with the highest access frequency. According to an embodiment of the present invention, it is desirable to choose the most representative webpage from each class of webpages; a webpage with a high access frequency is usually an important or relatively popular webpage, so in the present invention the importance of a webpage is represented by its access frequency. The selecting unit 150 chooses the webpage with the highest access frequency in each webpage combination; that is, the number of webpages chosen by the selecting unit 150 equals the total number of webpage combinations. Further, the selecting unit 150 can output the chosen webpages, for purposes including but not limited to testing the application program.
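As a non-limiting illustration, the operation of the selecting unit 150 can be sketched in Python as follows. The function name `pick_representatives`, the page paths, and the access counts are hypothetical and do not come from the specification; treating a page with no recorded visits as having frequency zero is likewise an assumption.

```python
from typing import Dict, List


def pick_representatives(groups: List[List[str]],
                         access_freq: Dict[str, int]) -> List[str]:
    """Return one page per webpage combination: the most frequently accessed one."""
    # Pages missing from access_freq count as never visited (an assumption).
    # Every combination is assumed to contain at least one page.
    return [max(group, key=lambda page: access_freq.get(page, 0))
            for group in groups]


# Hypothetical webpage combinations and access counts.
groups = [["/home", "/news"], ["/article/1", "/article/2", "/article/3"]]
access_freq = {"/home": 120, "/news": 45,
               "/article/1": 10, "/article/2": 30, "/article/3": 7}

representatives = pick_representatives(groups, access_freq)
print(representatives)
```

The number of pages returned always equals the number of webpage combinations, which matches the statement above.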
As can be seen, with the apparatus and method for choosing webpages from an application program that includes multiple webpages according to the present invention, the similarity of every two webpages can be determined according to the characteristic element sets of the webpages, and the webpages can then be divided into one or more webpage combinations. Next, the one webpage with the highest access frequency is chosen from each webpage combination. In this way, webpages can be classified reliably according to their similarity. Moreover, choosing one webpage with the highest access frequency from each webpage combination picks out the most important and most representative webpages among all webpages and effectively reduces the number of webpages finally chosen. When the chosen webpages are used for application program testing, testing time can be reduced and test resources saved; and because the most representative webpages are the ones tested, test resources are used more fully while the test effect is preserved.
Fig. 2 is a structural block diagram of the characteristic element set determining unit 120 of the apparatus for choosing webpages from an application program that includes multiple webpages according to an embodiment of the present invention.
As shown in Fig. 2, the characteristic element set determining unit 120 can include an element acquiring unit 121 and a determining unit 122.
According to an embodiment of the present invention, the element acquiring unit 121 can acquire the multiple elements of each webpage.
According to an embodiment of the present invention, each webpage can include multiple elements, which can be pictures, text, or a combination of both. In the present invention, the element acquiring unit 121 can acquire all elements on a webpage by any method known in the art. For example, the element acquiring unit 121 can include a traversal unit (not shown) for traversing the application program, so that the element acquiring unit 121 can acquire all elements of each of the multiple webpages of the application program from the traversal result. Many methods of traversing an application program are known in the art, such as depth-first or breadth-first algorithms, and the present invention is not limited in this respect.
According to an embodiment of the present invention, the traversal unit of the element acquiring unit 121 can traverse the application program to obtain the tree structure between the webpages of the application program and the DOM (Document Object Model) tree structure of each of the multiple webpages.
Fig. 3 shows an example of the tree structure between the webpages of an application program that includes multiple webpages according to an embodiment of the present invention.
As shown in Fig. 3, each node in the tree represents one webpage of the application program, and the arrows between nodes represent the jump relations between webpages. The vertex of the tree represents the first webpage shown on entering the application program. In the example shown in Fig. 3, the application program includes 7 webpages; the first webpage on entering the application program is webpage 1, and from webpage 1 one can jump to webpages 2, 3 and 4. Further, after entering webpage 2, one can jump to webpages 5 and 6, or of course return to webpage 1, and so on. The tree structure represents the structure and hierarchy of the application program simply and clearly.
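As a non-limiting illustration, the tree between webpages and a breadth-first traversal of it can be sketched in Python as follows. The jump relations 1 to 2, 3, 4 and 2 to 5, 6 are taken from the description of Fig. 3; the text does not say which page leads to webpage 7, so attaching it to webpage 3 is purely an assumption, as are the names `page_tree` and `breadth_first_pages`.

```python
from collections import deque

# Jump relations between pages, keyed by page number.
# Page 7's parent is NOT given in the text; page 3 is an assumed placeholder.
page_tree = {1: [2, 3, 4], 2: [5, 6], 3: [7], 4: [], 5: [], 6: [], 7: []}


def breadth_first_pages(tree, root):
    """Visit pages level by level, starting from the entry page."""
    order, queue, seen = [], deque([root]), {root}
    while queue:
        page = queue.popleft()
        order.append(page)
        for child in tree.get(page, []):
            if child not in seen:  # guards against jump-back links
                seen.add(child)
                queue.append(child)
    return order


visit_order = breadth_first_pages(page_tree, 1)
print(visit_order)
```

The `seen` set keeps the traversal from looping when a page links back to a page already visited, such as the return from webpage 2 to webpage 1.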
After the traversal unit of the element acquiring unit 121 has traversed the application program and obtained the tree structure between its webpages, it can also obtain the source code of each webpage and determine the DOM tree structure of each webpage from that source code. The DOM tree structure of each webpage is similar to the tree structure between the webpages of the application program: each node represents an element in the webpage, and the arrows between nodes represent the jump relations between elements. The vertex of the DOM tree represents the first-layer element reached after entering the webpage. Such a DOM tree can be determined for every webpage, and the DOM tree structure represents the structure and hierarchy of the webpage simply and clearly. Further, the traversal unit can obtain the parent element and child elements of every element on a webpage from the DOM tree structure of that webpage. Here, the parent element of an element indicates from which element on the same page the element can be reached by a jump, and the child elements of an element indicate which elements on the same page can be reached by a jump from it. This information can be obtained easily from the DOM tree structure of the webpage.
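As a non-limiting illustration, parent and child relations can be recovered from page source with Python's standard `html.parser` module, as sketched below. The class `DomTreeBuilder` and the HTML snippet are hypothetical; the snippet only loosely mirrors the nav/div nesting described for Fig. 5 and is not the source code of that figure.

```python
from html.parser import HTMLParser


class DomTreeBuilder(HTMLParser):
    """Record each element's parent and children while parsing HTML source."""

    VOID = {"br", "img", "input", "meta", "link", "hr"}  # tags without an end tag

    def __init__(self):
        super().__init__()
        self.nodes = []    # each node: {"tag", "parent", "children"}
        self._stack = []   # indices of the elements that are currently open

    def handle_starttag(self, tag, attrs):
        parent = self._stack[-1] if self._stack else None
        index = len(self.nodes)
        self.nodes.append({"tag": tag, "parent": parent, "children": []})
        if parent is not None:
            self.nodes[parent]["children"].append(index)
        if tag not in self.VOID:
            self._stack.append(index)

    def handle_endtag(self, tag):
        # Close the nearest matching open element, tolerating unclosed tags.
        for i in range(len(self._stack) - 1, -1, -1):
            if self.nodes[self._stack[i]]["tag"] == tag:
                del self._stack[i:]
                break


# Hypothetical page source, used only for this sketch.
html = '<nav><div><a href="/news">News</a><a href="/finance">Finance</a></div></nav>'
builder = DomTreeBuilder()
builder.feed(html)
for index, node in enumerate(builder.nodes):
    print(index, node["tag"], node["parent"], node["children"])
```

In this sketch the two link elements share the same parent index, which is the kind of information the similarity test on elements later relies on.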
Here, by traversing the application program, the traversal unit can also obtain other relevant information about all elements on each webpage, including the position, attributes, and type of each element.
Further, the element acquiring unit 121 can send the acquired element information, including the position, attributes, and type of each element and the related DOM tree information, to the determining unit 122.
According to an embodiment of the present invention, the determining unit 122 can determine the characteristic elements of each webpage according to the similarity of every two elements among the multiple elements, and take the set of characteristic elements as the characteristic element set of the webpage.
According to an embodiment of the present invention, the determining unit 122 can obtain the multiple elements of each webpage from the element acquiring unit 121, determine the similarity of any two of the multiple elements of each webpage, and determine the characteristic element set of each webpage according to the similarity between elements.
According to an embodiment of the present invention, the determining unit 122 can divide the multiple elements into one or more element groups according to the similarity of every two elements. Next, the determining unit 122 can choose one element from each of the one or more element groups and take the chosen elements as the characteristic elements of the webpage.
According to an embodiment of the present invention, the determining unit 122 can divide the multiple elements into one or more element groups such that any element is similar to at least one other element in its own element group. In this embodiment, the determining unit 122 can divide the elements into element groups as follows: a. set all element groups to empty; b. put the first element into an element group; c. compare the next element with all elements in all existing element groups: if the next element is similar to an existing element, put it into the element group that contains that existing element; if it is not similar to any existing element, put it into a new element group; step c is repeated until the last element on the webpage has been processed.
In this embodiment, the determining unit 122 can place similar elements in the same element group. Note that the elements in one element group are not necessarily all similar to one another; that is, as long as a new element is similar to at least one element in some element group, it can be put into that element group. Next, only one element is chosen from each element group; the choice can be random or can follow a certain rule. In this way, the characteristic element set finally obtained covers every type of element on the webpage without repetition.
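As a non-limiting illustration, steps a to c can be sketched in Python as follows. The names `group_elements` and `feature_elements` are hypothetical, the numeric toy data and its similarity rule are stand-ins for real page elements, and placing an element into the first matching group when several groups match is an assumption, since the text does not address that case.

```python
def group_elements(elements, similar):
    """Steps a-c: put each element into a group that holds a similar element."""
    groups = []                                   # a. start with no element groups
    for element in elements:                      # b./c. take the elements in order
        for group in groups:
            if any(similar(element, member) for member in group):
                group.append(element)             # similar to an existing element
                break
        else:
            groups.append([element])              # similar to nothing seen so far
    return groups


def feature_elements(groups):
    """Choose one element per group; taking the first is one possible rule."""
    return [group[0] for group in groups]


# Toy stand-in: two numbers count as "similar" when they differ by at most 1.
toy_groups = group_elements([1, 2, 3, 10, 11, 20], lambda a, b: abs(a - b) <= 1)
print(toy_groups)
print(feature_elements(toy_groups))
```

In the toy data, 1 and 3 end up in the same group even though they are not similar to each other, because each is similar to 2; this is the chaining behaviour described above.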
According to an embodiment of the present invention, the determining unit 122 can determine the similarity of any two of the multiple elements of each webpage according to the position and type of the elements and the DOM tree structure of the webpage that contains them.
According to an embodiment of the present invention, the position of an element can include its coordinates and size; the coordinates can include the abscissa and ordinate of the element, and the size is the size of the bounding rectangle that contains the element. The type of an element can be, for example, picture, text, input box, or button. The DOM tree structure of the webpage that contains an element gives information such as the parent element and child elements of the element. According to an embodiment of the present invention, two elements are considered similar when all of the following conditions are met: 1) the two elements have the same abscissa or the same ordinate; 2) the two elements have the same type; 3) the two elements have the same size; 4) the two elements have the same parent element in the DOM tree structure of the webpage that contains them. In an embodiment of the present invention, the same abscissa or ordinate means that the two elements are in the same column or the same row of the webpage; the same type means that both elements are pictures, both are text, both are input boxes, or both are buttons; and the same parent element means that both elements are reached by a jump from the same parent element. That is, if the four conditions above are met, the two elements are highly similar and belong to the same class of element.
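As a non-limiting illustration, the four conditions can be written as a Python predicate as follows. The `Element` record, its field names, and the sample coordinates are hypothetical; exact equality of coordinates and sizes is used because the text says "identical", although a practical implementation might allow a tolerance.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Element:
    x: int                     # abscissa of the bounding rectangle
    y: int                     # ordinate of the bounding rectangle
    width: int
    height: int
    kind: str                  # e.g. "picture", "text", "input", "button"
    parent_id: Optional[int]   # identifier of the parent element in the DOM tree


def is_similar(a: Element, b: Element) -> bool:
    same_row_or_column = a.x == b.x or a.y == b.y            # condition 1
    same_type = a.kind == b.kind                             # condition 2
    same_size = (a.width, a.height) == (b.width, b.height)   # condition 3
    same_parent = a.parent_id == b.parent_id                 # condition 4
    return same_row_or_column and same_type and same_size and same_parent


# Hypothetical elements: two navigation labels in one row, and a logo picture.
news = Element(x=0, y=40, width=60, height=24, kind="text", parent_id=1)
finance = Element(x=60, y=40, width=60, height=24, kind="text", parent_id=1)
logo = Element(x=0, y=0, width=120, height=40, kind="picture", parent_id=0)

print(is_similar(news, finance), is_similar(news, logo))
```

Here the two labels satisfy all four conditions, while the logo shares only an abscissa with the first label and is therefore not similar to it.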
Note that although one embodiment in which the determining unit 122 determines the similarity of two elements has been shown above, the present invention is not limited to it; the determining unit 122 can determine the similarity between elements by any method known in the art.
An example of determining the characteristic element set of a webpage is described below with reference to Fig. 4 and Fig. 5. Fig. 4 shows an example of one webpage of an application program that includes multiple webpages according to an embodiment of the present invention. Fig. 5 shows an example of part of the source code of the webpage shown in Fig. 4.
As shown in Fig. 4, the webpage contains many elements, both elements outlined with a box and elements not outlined with a box, for example "Finance", "Entertainment", and the headline "First time in 4 years! US announces sale of frigates and missiles to Taiwan". These elements can be obtained by the traversal unit by traversing each webpage of the application program. The traversal unit can also obtain relevant information about these elements, including their type, position, and attributes. Further, the traversal unit can obtain the source code of each webpage and thereby the DOM tree structure of each webpage. For example, part of the source code obtained by the traversal unit for the webpage shown in Fig. 4 is shown in Fig. 5. Note that, for ease of explanation, Fig. 5 shows only part of the source code of the webpage shown in Fig. 4, not all of it. As shown in Fig. 5, the first line of source code, "nav class="site all"", represents an element on the webpage; the second line, "<div>", represents the element reached by a jump from the element represented by the first line; and the third through last lines represent elements reached by a jump from the element represented by the second line. The traversal unit can thus obtain the DOM tree structure of the webpage shown in Fig. 4 from the source code. For example, the element represented by the first line of source code is at the vertex of the DOM tree structure, the element represented by the second line is a second-layer node, the elements represented by the third through last lines are third-layer nodes, and elements with a jump relation are connected by arrows.
Next, the determining unit 122 can determine the similarity of any two of the elements on each page. According to an embodiment of the present invention, the determining unit 122 can divide the multiple elements into one or more element groups such that any element is similar to at least one other element in its own element group. For example, the determining unit 122 puts the element "News" into an element group and then compares the element "Finance" with the element "News": the two elements have the same ordinate, the same size, the same type, and the same parent element, so they are determined to be similar, and "Finance" is put into the same element group as "News". Next, although the element "Automobile" is not similar to the element "News", it is similar to the element "Finance", so "Automobile" is also put into this group. In this way, the determining unit 122 divides the elements "News", "Finance", "Entertainment", "Sports", "Military", "Gallery", "Video", "Automobile", "History", "Health", and "Culture" into one element group and randomly chooses the element "News" from it as a characteristic element. The other element groups can be handled in a similar way. As shown in Fig. 4, the characteristic element set determined by the determining unit 122 for this webpage is the set of elements outlined with a box.
As described above, the determining unit 122 can determine the characteristic elements of each webpage according to the similarity of every two elements among the multiple elements, and take the set of characteristic elements as the characteristic element set of the webpage. Further, the determining unit 122 can send the determined characteristic element set of each webpage to the similarity determining unit 130, so that the similarity determining unit 130 can determine the similarity of every two webpages among the multiple webpages.
Fig. 5 is a structural block diagram of the similarity determination unit of the device for selecting webpages from an application program including multiple webpages according to an embodiment of the present invention.
As shown in Fig. 5, the similarity determination unit 130 according to the present invention can include a characteristic element pair determination unit 131, a calculation unit 132, and a summation unit 133.
According to an embodiment of the present invention, the characteristic element pair determination unit 131 can determine the characteristic element pairs of two webpages. A characteristic element pair consists of one characteristic element of one of the two webpages and one characteristic element of the other webpage. Further, the characteristic element pair determination unit 131 can send the determined characteristic element pairs to the calculation unit 132.
According to an embodiment of the present invention, the characteristic element set of a webpage represents that webpage, so comparing the similarity of two webpages is converted into comparing the similarity of the characteristic element sets of the two webpages. When determining the similarity of any two webpages among the multiple webpages, the characteristic element pair determination unit 131 first determines the characteristic element pairs of the two webpages. Each characteristic element pair consists of any one characteristic element of one webpage and any one characteristic element of the other webpage. That is, for two webpages whose characteristic element sets contain N and M elements respectively, the number of characteristic element pairs is N × M.
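The pairing above is a Cartesian product of the two characteristic element sets. A minimal sketch, in which the function name and the sample element labels are illustrative assumptions:

```python
from itertools import product
from typing import List, Sequence, Tuple


def characteristic_element_pairs(
    set_a: Sequence[str], set_b: Sequence[str]
) -> List[Tuple[str, str]]:
    """Pair every characteristic element of page a with every one of page b."""
    return list(product(set_a, set_b))


# Hypothetical characteristic element sets with N = 3 and M = 2 elements.
pairs = characteristic_element_pairs(["News", "Search box", "Logo"], ["News", "Login"])
```

For these sample sets, `pairs` holds 3 × 2 = 6 tuples.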
According to an embodiment of the present invention, the calculation unit 132 can calculate the similarity of the two elements in a characteristic element pair and take it as the similarity of that characteristic element pair.
As mentioned above, the traversal unit in the characteristic element set determination unit 120 can traverse the application program to obtain the tree structure between the webpages of the application program, the DOM tree structure of each webpage, and the element information of each webpage. Here, the calculation unit 132 can obtain this information from the traversal unit and obtain the characteristic element pair information from the characteristic element pair determination unit 131, and can thereby calculate the similarity of the two elements in a characteristic element pair according to the tree structure between the webpages of the application program, the DOM tree structure of each webpage, and the element information of each webpage.
According to an embodiment of the present invention, the calculation unit 132 can calculate the similarity of the two elements in a characteristic element pair according to the position, attributes, and tree structure information of each of the two elements.
According to an embodiment of the present invention, the position of an element can include the coordinates and size information of the element; the attributes of an element can include the identifier, name, tag, and hypertext reference information of the element; and the tree structure information of an element can include the DOM tree structure of the webpage where the element is located and the tree structure between the webpages of the application program. Specifically, the tree structure information of an element can include information such as the parent element of the element, the child elements of the element, and the level that the webpage containing the element occupies in the tree structure between the webpages of the application program. According to an embodiment of the present invention, the calculation unit 132 can calculate the similarity $S_{ab}(i)$ of the two elements in the i-th characteristic element pair of webpage a and webpage b according to the following formula:
$$S_{ab}(i) = \alpha_i L_i + \beta_i A_i + \gamma_i D_i \qquad (1)$$
Here, $L_i$ denotes the similarity in position of the two elements in the i-th characteristic element pair of webpage a and webpage b, $A_i$ denotes their similarity in attributes, and $D_i$ denotes their similarity in tree structure. $\alpha_i$, $\beta_i$, and $\gamma_i$ are the weight coefficients of $L_i$, $A_i$, and $D_i$ respectively, with $\alpha_i + \beta_i + \gamma_i = 1$. Note that $\alpha_i$, $\beta_i$, and $\gamma_i$ are parameters set per element pair. That is, each characteristic element pair is configured with its own set of $\alpha_i$, $\beta_i$, and $\gamma_i$, which can be set according to actual needs, for example according to the importance of position, attributes, and tree structure in judging the similarity of elements. As a specific example, $\alpha_i$, $\beta_i$, and $\gamma_i$ are all 1/3.
As described above, the similarity of the two elements in a characteristic element pair calculated by the calculation unit 132 can be the sum of the weighted similarity in position, the weighted similarity in attributes, and the weighted similarity in tree structure. Further, the calculation unit 132 can take the similarity $S_{ab}(i)$ of the two elements in the i-th characteristic element pair of webpage a and webpage b as the similarity of the i-th characteristic element pair of webpage a and webpage b.
How $L_i$, $A_i$, and $D_i$ are calculated for each characteristic element pair is explained in detail below.
The similarity in position $L_i$ of the two elements in the i-th characteristic element pair of webpage a and webpage b can be calculated by the following formula:
$$L_i = \frac{1}{l} \sum_{s=1}^{l} L_{is}$$
Here, the denominator $l$ is the number of position parameters. For example, if the position of an element includes two parameters, the coordinates and the size of the element, then $l = 2$. $L_{is}$ indicates whether the two elements in the i-th characteristic element pair are similar in the s-th position parameter: it is 1 when they are similar and 0 when they are not. For example, $L_{i1}$ indicates whether the two elements are similar in the first position parameter, that is, whether their coordinates are similar. Here, it can be specified that the coordinates of two elements are similar when their abscissas are the same or their ordinates are the same. $L_{i2}$ indicates whether the two elements are similar in the second position parameter, that is, whether their sizes are similar. Here, it can be specified that the sizes of two elements are similar when their sizes are the same. That is, when both the coordinates and the sizes of the two elements in the i-th characteristic element pair are similar, $L_i = 1$; when only one of the coordinates and the sizes is similar, $L_i = 1/2$; and when neither is similar, $L_i = 0$.
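In a similar manner, the similarity in attributes $A_i$ and the similarity in tree structure $D_i$ of the two elements in the i-th characteristic element pair of webpage a and webpage b can be calculated by the following formulas:
$$A_i = \frac{1}{a} \sum_{s=1}^{a} A_{is}, \qquad D_i = \frac{1}{d} \sum_{s=1}^{d} D_{is}$$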
Here, $a$ is the number of attribute parameters and $d$ is the number of tree structure parameters. For example, if the attributes of an element include its identifier, name, tag, and hypertext reference information, and the tree structure includes the parent element of the element, the child elements of the element, and the level that the webpage containing the element occupies in the tree structure between the webpages of the application program, then $a = 4$ and $d = 3$. $A_{is}$ indicates whether the two elements in the i-th characteristic element pair are similar in the s-th attribute parameter (for example, the first parameter is the identifier of the element, the second is its name, the third is its tag, and the fourth is its hypertext reference information): it is 1 when they are similar and 0 when they are not. $D_{is}$ indicates whether the two elements are similar in the s-th tree structure parameter (for example, the first parameter is the parent element of the element, the second is its child elements, and the third is the level of the webpage where the element is located): it is 1 when they are similar and 0 when they are not. What it means for two elements to be similar in any one attribute or tree structure parameter can be defined as needed. For example, the two elements can be considered similar in a parameter only when their values for that parameter are identical, in which case the corresponding $A_{is}$ or $D_{is}$ is 1.
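The calculation of $L_i$, $A_i$, $D_i$, and $S_{ab}(i)$ described above can be sketched as follows. The `Element` field names are assumptions chosen for illustration, every parameter other than the coordinates is treated as similar only when identical, and the weights default to the 1/3 example given in the text.

```python
from dataclasses import dataclass, replace
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Element:
    # Field names are illustrative; the text only lists the kinds of information.
    x: int
    y: int
    size: Tuple[int, int]        # (width, height)
    ident: str                   # identifier of the element
    name: str
    tag: str
    href: str                    # hypertext reference
    parent: str
    children: Tuple[str, ...]
    page_level: int              # level of the containing page in the page tree


def fraction_similar(flags: Iterable[bool]) -> float:
    """Share of parameters judged similar (each flag counts as 1 or 0)."""
    flags = list(flags)
    return sum(flags) / len(flags)


def pair_similarity(e1: Element, e2: Element,
                    alpha: float = 1 / 3, beta: float = 1 / 3,
                    gamma: float = 1 / 3) -> float:
    """Weighted sum of position, attribute, and tree-structure similarity."""
    position = fraction_similar([
        e1.x == e2.x or e1.y == e2.y,   # coordinates: same abscissa or ordinate
        e1.size == e2.size,             # size: identical
    ])
    attribute = fraction_similar([
        e1.ident == e2.ident, e1.name == e2.name,
        e1.tag == e2.tag, e1.href == e2.href,
    ])
    tree = fraction_similar([
        e1.parent == e2.parent, e1.children == e2.children,
        e1.page_level == e2.page_level,
    ])
    return alpha * position + beta * attribute + gamma * tree
```

For two elements that differ only in both coordinates and in the hypertext reference, this gives $L_i = 1/2$, $A_i = 3/4$, and $D_i = 1$, so with equal weights $S_{ab}(i) = (1/2 + 3/4 + 1)/3 = 3/4$.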
Note that although one embodiment in which the calculation unit 132 calculates the similarity of the two elements in a characteristic element pair is described above, the present invention is not limited to it, and the calculation unit 132 can also calculate the similarity of the two elements in a characteristic element pair according to other information or using other algorithms. Further, the calculation unit 132 can perform this operation on all characteristic element pairs of the two pages to calculate the similarity of every characteristic element pair. The calculation unit 132 can then send the calculated similarities of all characteristic element pairs to the summation unit 133.
According to an embodiment of the present invention, the summation unit 133 can take the sum of the similarities of all characteristic element pairs of two webpages as the similarity of the two webpages.
According to an embodiment of the present invention, the summation unit 133 can also weight the similarities of all characteristic element pairs of two webpages and take the sum of the weighted similarities as the similarity of the two webpages. For example, the summation unit 133 can calculate the similarity $S_{ab}$ of two webpages a and b using the following formula:
$$S_{ab} = \sum_{i=1}^{n} w_i S_{ab}(i)$$
Here, $n$ is the number of characteristic element pairs of webpage a and webpage b, $S_{ab}(i)$ is the similarity of the two elements in the i-th characteristic element pair of webpage a and webpage b, that is, the similarity of the i-th characteristic element pair, and $w_i$ is the weight coefficient of that similarity. Note that $w_i$ is a parameter set per element pair. That is, each characteristic element pair is configured with its own $w_i$. In a specific example, $w_i = 1$.
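The weighted sum above can be sketched in a few lines. The function name `page_similarity` is an assumption, and the weights default to $w_i = 1$ as in the specific example.

```python
from typing import Optional, Sequence


def page_similarity(pair_similarities: Sequence[float],
                    weights: Optional[Sequence[float]] = None) -> float:
    """Sum of the (optionally weighted) similarities of all element pairs."""
    if weights is None:
        weights = [1.0] * len(pair_similarities)   # w_i = 1 for every pair
    if len(weights) != len(pair_similarities):
        raise ValueError("one weight is needed per characteristic element pair")
    return sum(w * s for w, s in zip(weights, pair_similarities))
```

Because the result is a plain sum, it grows with the number of characteristic element pairs, so a predetermined threshold compared against it has to be chosen with that scale in mind.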
As described above, the characteristic element pair determination unit 131 can determine the characteristic element pairs of two pages, the calculation unit 132 can calculate the similarity of all characteristic element pairs of the two pages, and the summation unit 133 can calculate the similarity of the two pages. The characteristic element pair determination unit 131, the calculation unit 132, and the summation unit 133 can perform these operations for every two pages, so that the similarity determination unit 130 can determine the similarity of any two pages among the multiple pages of the application program.
Fig. 6 is a structural block diagram of the division unit of the device for selecting webpages from an application program including multiple webpages according to an embodiment of the present invention.
As shown in Fig. 6, the division unit 140 can include a judging unit 141 and a processing unit 142.
According to an embodiment of the present invention, the judging unit 141 can determine whether two webpages are similar according to the similarity of the two webpages and a predetermined threshold. When the similarity of the two webpages is greater than the predetermined threshold, the judging unit 141 can determine that the two webpages are similar. The judging unit 141 can obtain the similarity of two pages from the similarity determination unit 130 and then judge whether the two pages are similar. A predetermined threshold $S$ can be set according to actual needs or empirical values: when the similarity of two webpages a and b satisfies $S_{ab} > S$, webpages a and b are determined to be similar; otherwise, they are considered dissimilar. The judging unit 141 can perform this operation on any two webpages among the multiple webpages and thus judge whether any two webpages are similar.
According to an embodiment of the present invention, the processing unit 142 can divide the multiple webpages into one or more webpage combinations such that each of the one or more webpage combinations satisfies the following condition: when a webpage combination contains multiple webpages, every two of those webpages are similar.
According to an embodiment of the present invention, the processing unit 142 can obtain from the judging unit 141 the results of whether two webpages are similar and divide the webpages according to these results, so that similar webpages are placed into one webpage combination. That is, when a webpage combination contains multiple webpages, every two of those webpages are similar; when a webpage combination contains only one webpage, that webpage is dissimilar to the webpages in all other webpage combinations.
According to an embodiment of the present invention, the division unit can divide the webpages into one or more webpage combinations according to the similarity of every two webpages. This is equivalent to dividing the multiple webpages of the application program into one or more classes, with the webpages in each class all similar to one another.
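The text states the condition that a division must satisfy but not a particular algorithm. As one possible sketch, a greedy first-fit pass can place each page into the first combination whose members are all similar to it. The function name, the dictionary keyed by page pairs, and the greedy strategy itself are assumptions. A greedy pass guarantees that pages within a combination are pairwise similar, but it does not guarantee the smallest number of combinations.

```python
from typing import Dict, FrozenSet, List


def partition_pages(pages: List[str],
                    similarity: Dict[FrozenSet[str], float],
                    threshold: float) -> List[List[str]]:
    """Greedily build combinations in which every two pages are similar."""

    def similar(a: str, b: str) -> bool:
        # Two pages are similar when their similarity exceeds the threshold.
        return similarity[frozenset((a, b))] > threshold

    combinations: List[List[str]] = []
    for page in pages:
        for combination in combinations:
            if all(similar(page, member) for member in combination):
                combination.append(page)
                break
        else:
            # No existing combination accepts the page, so it starts a new one.
            combinations.append([page])
    return combinations
```

Running the same similarities with a higher threshold yields more combinations, which matches the relationship between the predetermined threshold and the number of webpage combinations.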
According to an embodiment of the present invention, the setting of the predetermined threshold $S$ affects the number of webpage combinations produced by the processing unit 142. When the predetermined threshold $S$ is larger, more webpage combinations are produced; when it is smaller, fewer are produced. Further, the weight coefficients $\alpha_i$, $\beta_i$, $\gamma_i$, and $w_i$ affect the similarity $S_{ab}$ of two pages calculated by the similarity determination unit 130 and therefore also affect the number of webpage combinations produced by the processing unit 142. Thus, in practice, when the number of webpage combinations produced is so large or so small that it does not meet the number actually needed, the number of webpage combinations can be adjusted by adjusting the values of $S$, $\alpha_i$, $\beta_i$, $\gamma_i$, and $w_i$.
According to an embodiment of the present invention, when the number of webpage combinations is greater than a webpage combination threshold, the judging unit 141 is further configured to lower the predetermined threshold and to redetermine whether two webpages are similar according to the similarity of the two webpages and the lowered predetermined threshold. As before, when the similarity of two webpages is greater than the lowered predetermined threshold, the judging unit 141 can determine that the two webpages are similar. The processing unit 142 is then further configured to divide the multiple webpages into one or more webpage combinations again such that each of the one or more webpage combinations satisfies the following condition: when a webpage combination contains multiple webpages, every two of those webpages are similar.
According to an embodiment of the present invention, when the number of webpage combinations is less than the webpage combination threshold, the judging unit 141 is further configured to raise the predetermined threshold and to redetermine whether two webpages are similar according to the similarity of the two webpages and the raised predetermined threshold. As before, when the similarity of two webpages is greater than the raised predetermined threshold, the judging unit 141 can determine that the two webpages are similar. The processing unit 142 is then further configured to divide the multiple webpages into one or more webpage combinations again such that each of the one or more webpage combinations satisfies the following condition: when a webpage combination contains multiple webpages, every two of those webpages are similar.
According to an embodiment of the present invention, when the number of webpage combinations is greater than the webpage combination threshold, the predetermined threshold $S$ can be lowered so that the criterion for judging webpages similar is relaxed, which reduces the number of webpage combinations. Similarly, when the number of webpage combinations is less than the webpage combination threshold, the predetermined threshold $S$ can be raised so that the criterion for judging webpages similar is tightened, which increases the number of webpage combinations.
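The adjustment of the predetermined threshold can be sketched as a simple loop. The text does not say how far the threshold moves each time or when to stop, so the fixed `step`, the `max_rounds` bound, and the hypothetical callable `count_combinations` (which maps a threshold to the resulting number of webpage combinations) are all assumptions.

```python
from typing import Callable


def adjust_threshold(count_combinations: Callable[[float], int],
                     threshold: float,
                     combination_threshold: int,
                     step: float = 0.1,
                     max_rounds: int = 50) -> float:
    """Lower or raise the similarity threshold until the count matches."""
    for _ in range(max_rounds):
        count = count_combinations(threshold)
        if count > combination_threshold:
            threshold -= step      # too many combinations: relax the criterion
        elif count < combination_threshold:
            threshold += step      # too few combinations: tighten the criterion
        else:
            break
    return threshold
```

The bound on the number of rounds keeps the loop from oscillating forever when no threshold produces exactly the desired count.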
According to an embodiment of the present invention, the device 100 for selecting webpages from an application program including multiple webpages can further include an access frequency determination unit (not shown) configured to: obtain the tree structure between the webpages of the application program, each node in the tree representing one webpage of the application program; calculate the access frequency of each node in the tree in order from top to bottom and from left to right; calculate the access frequency of each node in the tree in order from top to bottom and from right to left; iterate these two calculation steps until the calculated access frequency of each node converges; and take the calculated access frequency of each node in the tree as the access frequency of the webpage corresponding to that node.
As mentioned above, the traversal unit of the element acquisition unit 121 can traverse the application program to obtain the tree structure between the webpages of the application program. Here, the traversal unit can send the obtained tree structure between the webpages of the application program to the access frequency determination unit, which uses it to calculate the access frequency of each node as the access frequency of the webpage corresponding to that node.
After the traversal unit has traversed the application program and obtained the tree structure between the webpages of the application program, the information of each node in the tree structure can be obtained. For example, after the tree structure shown in Fig. 3 is obtained, the information shown in the following table can be obtained.
Table 1
| Page number | In-link count | Out-link count | Level |
|---|---|---|---|
| 1 | 0 | 3 | 1 |
| 2 | 2 | 2 | 2 |
| 3 | 1 | 2 | 2 |
| 4 | 1 | 0 | 2 |
| 5 | 1 | 0 | 3 |
| 6 | 1 | 0 | 3 |
| 7 | 1 | 0 | 3 |
Here, the page number is the number of the page, the in-link count is the number of pages from which this page can be reached, the out-link count is the number of other pages to which this page can jump, and the level is the level that the page occupies in the tree structure between the webpages of the application program. For example, in the example shown in Fig. 3, page 1 is at level 1, pages 2, 3, and 4 are at level 2, and pages 5, 6, and 7 are at level 3. Next, the access frequency determination unit can determine the access probability of each node from this information.
According to an embodiment of the present invention, the probability that a page is accessed is approximately equal to the frequency with which the page is accessed. After a page is accessed, the user may exit directly, may go back to the previous page, or may click a hyperlink on the current page and enter some page in the next level. In the present invention, these three cases are considered equally probable. That is, for a node, the advance probability, the return probability, and the exit probability are each 1/3. Similarly, if the out-link count of a page is n, the probability of visiting any one of those pages is considered equal, namely 1/n. Likewise, if the in-link count of a page is m, the probability of returning to any one of those pages is also considered equal, namely 1/m.
According to an embodiment of the present invention, the above assumptions have certain special cases. The page at the top node is not reached by a jump from any other page. Therefore, for the top node 1, the exit probability and the advance probability are each 1/2, and the return probability is 0. Similarly, for the bottom nodes 5, 6, and 7, the page cannot jump to other pages, so the exit probability and the return probability are each 1/2, and the advance probability is 0.
In summary, according to an embodiment of the present invention, for a node i that is neither the top node nor a bottom node, let $V(i)$ be the probability that node i is accessed, n its out-link count, and m its in-link count. The exit probability, return probability, and advance probability shown in the following table are then obtained, where $V(i)/(3m)$ is the probability of returning to a particular node and $V(i)/(3n)$ is the probability of advancing to a particular node.
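Table 2
| Exit probability | Return probability (to each of the m nodes) | Advance probability (to each of the n nodes) |
|---|---|---|
| $V(i)/3$ | $V(i)/(3m)$ | $V(i)/(3n)$ |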
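The exit, return, and advance probabilities, including the special cases for the top node and the bottom nodes, can be sketched as follows. The function name and the dictionary keys are assumptions. A node with no in-links is treated as the top node, and a node with no out-links is treated as a bottom node.

```python
from fractions import Fraction
from typing import Dict


def transition_probabilities(v: Fraction, in_links: int,
                             out_links: int) -> Dict[str, Fraction]:
    """Split the access probability v of a node into exit/return/advance parts."""
    zero = Fraction(0)
    if in_links == 0:
        # Top node: nothing to return to, so exit and advance share v equally.
        exit_p, back, forward = v / 2, zero, v / 2
    elif out_links == 0:
        # Bottom node: nowhere to advance, so exit and return share v equally.
        exit_p, back, forward = v / 2, v / 2, zero
    else:
        exit_p = back = forward = v / 3
    return {
        "exit": exit_p,
        # Probability of returning to one particular in-linking page.
        "return_each": back / in_links if in_links else zero,
        # Probability of advancing to one particular out-linked page.
        "advance_each": forward / out_links if out_links else zero,
    }
```

Exact fractions are used so that the results can be compared directly with values such as p/6 in the worked example.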
According to an embodiment of the present invention, calculating the access frequency of each node in the tree structure between the webpages of the application program includes calculating the access frequency of a node according to at least one of the advance probability, the return probability, and the exit probability of each node.
According to an embodiment of the present invention, the access frequency of each node in the tree is first calculated in order from top to bottom and from left to right, and then calculated in order from top to bottom and from right to left. Such a process can be called one round of calculation. After one round of calculation has been performed, another round is performed, and it is then judged whether the calculated access frequency of each node has converged. According to an embodiment of the present invention, when the difference between the access frequency of each node calculated in the current round and that calculated in the previous round is less than a predetermined access frequency threshold, the access frequency of each node can be judged to have converged. The access frequency of each node calculated in the last round is then output as the access frequency of the corresponding page.
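The convergence check between rounds can be sketched as follows. The callable `run_round` is hypothetical: it stands for one complete round of calculation and maps the access frequencies from the previous round to new ones. How a round uses the previous values is not fixed here, and `max_rounds` is an added safety bound.

```python
from typing import Callable, Dict


def iterate_until_converged(
    run_round: Callable[[Dict[int, float]], Dict[int, float]],
    initial: Dict[int, float],
    tolerance: float = 1e-6,
    max_rounds: int = 100,
) -> Dict[int, float]:
    """Repeat rounds until every node changes by less than `tolerance`."""
    previous = run_round(initial)            # first round
    for _ in range(max_rounds):
        current = run_round(previous)        # next round
        if all(abs(current[n] - previous[n]) < tolerance for n in current):
            return current                   # converged: output the last round
        previous = current
    return previous
```

The function always performs at least two rounds before it can report convergence, matching the description of running one round and then another before comparing.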
According to an embodiment of the present invention, one round of calculating the access frequency of each node can proceed as follows:
A. Calculate the probability of each arrow advancing from the top node, and update the probability of each of these arrows.
B. Calculate the access frequency of each node in the next level according to the result of step A, and update the access frequency of each node. The calculation proceeds from left to right, and the access frequency of each node is the sum of the probabilities of all arrows that can reach the node.
C. Repeat steps A and B until the bottom level is reached.
D. Starting from the bottom level, calculate the probability of each arrow in the reverse direction, and update the probability of each arrow in the reverse direction.
E. Calculate the access frequency of each node in the level above according to the result of step D, and update the access frequency of each node in that level. The calculation proceeds from right to left, and the access frequency of each node is the sum of the probabilities of all arrows that can reach the node, including probabilities in both the forward and reverse directions of the arrows.
F. Repeat steps D and E until the top node is reached.
As an example, some of the steps for calculating the access frequency of each node in the tree structure shown in Fig. 3 are described below. In this embodiment of the present invention, it is assumed that the probability of accessing the top node 1 is p.
First, according to step A, the probability of each arrow advancing from the top node is calculated and updated. For example, consider the forward arrow from node 1 to node 2. The probability that node 1 advances is p/2, and node 1 may advance to node 2, 3, or 4, so the probability that node 1 advances to node 2 is p/2 multiplied by 1/3, which is p/6. Similarly, the probability of the forward arrow from node 1 to node 3 and that of the forward arrow from node 1 to node 4 can be calculated. That is, (1,2) = p/6, (1,3) = p/6, and (1,4) = p/6.
Next, according to step B, the access frequency of each node in the next level is calculated according to the result of step A, and the access frequency of each node is updated. The calculation proceeds from left to right, and the access frequency of each node is the sum of the probabilities of all arrows that can reach the node. For example, when calculating the access frequency of node 2, node 2 can be reached only through the forward arrow between node 1 and node 2, so the access probability of node 2 is the probability of that forward arrow, p/6. Similarly, the access frequencies of node 3 and node 4 can be calculated and updated. That is, 2 = p/6, 3 = p/6, and 4 = p/6.
Next, according to step C, steps A and B are repeated, which yields the following arrow probabilities and node frequencies: (2,5) = p/36, (2,6) = p/36, (3,2) = p/36, (3,7) = p/36, 5 = p/36, 6 = p/36, and 7 = p/36.
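Steps A to C of the worked example can be reproduced with the sketch below. The links and levels are read from Table 1 and from the arrows named in the text. The function name, the data layout, and the choice to update only the nodes of the next level in step B are assumptions made for illustration.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

# Out-links of each page and the pages of each level, left to right.
OUT_LINKS: Dict[int, List[int]] = {1: [2, 3, 4], 2: [5, 6], 3: [2, 7],
                                   4: [], 5: [], 6: [], 7: []}
LEVELS: List[List[int]] = [[1], [2, 3, 4], [5, 6, 7]]


def forward_pass(p: Fraction) -> Tuple[Dict[int, Fraction],
                                       Dict[Tuple[int, int], Fraction]]:
    """Top-down, left-to-right pass (steps A to C) starting from frequency p."""
    frequency: Dict[int, Fraction] = {1: p}
    arrows: Dict[Tuple[int, int], Fraction] = {}
    for depth, level in enumerate(LEVELS):
        # Step A: probability of each forward arrow leaving this level.
        for node in level:
            targets = OUT_LINKS[node]
            if not targets:
                continue
            # The top node advances with 1/2, other non-bottom nodes with 1/3.
            advance = Fraction(1, 2) if depth == 0 else Fraction(1, 3)
            for target in targets:
                arrows[(node, target)] = frequency[node] * advance / len(targets)
        # Step B: frequency of each node in the next level, left to right.
        if depth + 1 < len(LEVELS):
            for node in LEVELS[depth + 1]:
                incoming = [prob for (_, dst), prob in arrows.items() if dst == node]
                frequency[node] = sum(incoming, Fraction(0))
    return frequency, arrows
```

Calling `forward_pass(Fraction(1))` corresponds to p = 1, so each returned value is the coefficient of p in the text.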
Next, according to step D, the probability of each arrow in the reverse direction is calculated starting from the bottom level, and the probability of each arrow in the reverse direction is updated. For example, consider the reverse-direction arrow from node 7 to node 3. The access frequency of node 7 calculated in step C is p/36, the return probability of node 7 is 1/2, and node 7 can return only to node 3, so the probability that node 7 returns to node 3 is p/36 multiplied by 1/2, which is p/72. Similarly, the probability of the reverse-direction arrow from node 6 back to node 2 and that of the reverse-direction arrow from node 5 back to node 2 can be calculated. That is, (7,3) = p/72, (6,2) = p/72, and (5,2) = p/72.
Next, according to step E, the access frequency of each node in the level above is calculated according to the result of step D, and the access frequency of each node in that level is updated. The calculation proceeds from right to left, and the access frequency of each node is the sum of the probabilities of all arrows that can reach the node. For example, when calculating the access frequency of node 4, the access frequency reaching node 4 along the forward direction of an arrow is p/6, and no reverse-direction arrow reaches node 4, so the access frequency of node 4 is p/6. As another example, when calculating the access frequency of node 3, node 3 can be reached along the forward arrow from node 1 to node 3 and also along the reverse arrow from node 7 to node 3, so the access frequency of node 3 is the probability of (1,3) plus the probability of (7,3). Similarly, the access frequency of node 2 can be calculated. That is, 4 = p/6, 3 = (1,3) + (7,3), and 2 = (1,2) + (5,2) + (6,2) + (7,2).
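The first application of steps D and E can be checked with the short sketch below, which starts from the values already stated in the text for p = 1 and covers only nodes 3 and 4. The variable names and the `PARENT` mapping are assumptions made for illustration.

```python
from fractions import Fraction

p = Fraction(1)
# Forward arrow probabilities from step A and bottom-level frequencies from step C.
forward = {(1, 2): p / 6, (1, 3): p / 6, (1, 4): p / 6}
bottom_frequency = {5: p / 36, 6: p / 36, 7: p / 36}
# The single page that each bottom node can return to.
PARENT = {5: 2, 6: 2, 7: 3}

# Step D: a bottom node returns with probability 1/2, all of it to its one parent.
reverse = {(node, PARENT[node]): freq * Fraction(1, 2)
           for node, freq in bottom_frequency.items()}


def updated_frequency(node: int) -> Fraction:
    """Step E: sum the forward and reverse arrows that reach `node`."""
    reaching = {**forward, **reverse}
    return sum((prob for (_, dst), prob in reaching.items() if dst == node),
               Fraction(0))


node_3 = updated_frequency(3)   # (1,3) + (7,3)
node_4 = updated_frequency(4)   # (1,4) only
```

With p = 1, `node_3` is 1/6 + 1/72 = 13/72 and `node_4` stays at 1/6.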
Next, according to step F, steps D and E are repeated until the top node 1 is reached. This calculation is similar to the process described above and is not described in detail here.
As described above, the access frequency determination unit can determine the access frequency of each node in the tree and take it as the access frequency of the page corresponding to that node. Further, the access frequency determination unit can send the access frequency of each page to the selection unit 150, so that the selection unit 150 can select the webpage with the highest access frequency from each webpage combination.
According to an embodiment of the present invention, the selection unit 150 can select from each webpage combination the one webpage with the highest access frequency, that is, the most representative webpage. The device 100 for selecting webpages can thus select, from an application program including multiple webpages, a small number of webpages that are nevertheless the most important and most representative.
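The selection step can be sketched in one function. The function name and the sample page labels and frequencies are illustrative assumptions. When several pages in a combination share the highest access frequency, Python's `max` keeps the first of them.

```python
from typing import Dict, List


def select_representatives(combinations: List[List[str]],
                           access_frequency: Dict[str, float]) -> List[str]:
    """Pick the page with the highest access frequency from each combination."""
    return [max(combination, key=lambda page: access_frequency[page])
            for combination in combinations]


# Hypothetical combinations and access frequencies.
chosen = select_representatives(
    [["a", "b", "c"], ["d"]],
    {"a": 0.10, "b": 0.40, "c": 0.20, "d": 0.05},
)
```

Here `chosen` contains one page per combination, so the number of pages passed on for testing equals the number of webpage combinations.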
According to an embodiment of the present invention, the device 100 for selecting webpages can send the selected webpages to an external test device so that the test device can test the selected webpages. According to an embodiment of the present invention, the device 100 for selecting webpages can also include a test unit, and the selection unit 150 can send the selected webpages to the test unit for testing. Whether the selected webpages are tested by the test device or by the test unit, the test result of the application program can be determined according to the test results of the selected webpages.
According to an embodiment of the present invention, the selected webpages can be tested so as to save test resources while ensuring test effectiveness, but the present invention is not limited to this. After the final webpages have been selected, the device 100 for selecting webpages according to an embodiment of the present invention can also be used for data mining, application program analysis, and the like.
A method for selecting webpages from an application program including multiple webpages according to an embodiment of the present invention is described below with reference to Fig. 8.
As shown in Fig. 8, in step S810, multiple webpages of the application program are obtained.
Next, in step S820, the characteristic element set of each webpage among the multiple webpages is determined.
Next, in step S830, the similarity of every two webpages among the multiple webpages is determined according to the characteristic element set of each webpage.
Next, in step S840, the multiple webpages are divided into one or more webpage combinations according to the similarity of every two webpages.
Next, in step S850, the one webpage with the highest access frequency is selected from each of the one or more webpage combinations.
According to an embodiment of the present invention, determining the characteristic element set of each webpage among the multiple webpages includes: obtaining multiple elements of each webpage; determining the characteristic elements of each webpage according to the similarity of every two elements among the multiple elements; and taking the set of characteristic elements as the characteristic element set of each webpage.
According to an embodiment of the present invention, determining the characteristic elements of each webpage according to the similarity of every two elements among the multiple elements includes: dividing the multiple elements into one or more element groups according to the similarity of every two elements among the multiple elements; selecting one element from each of the one or more element groups; and taking the selected elements as the characteristic elements of the webpage.
According to an embodiment of the present invention, the similarity of every two elements among the multiple elements is determined according to the position and type of the elements and the DOM tree structure of the webpage where the elements are located.
According to an embodiment of the present invention, determining the similarity of every two webpages among the multiple webpages according to the characteristic element set of each webpage includes: determining the characteristic element pairs of two webpages, each characteristic element pair consisting of one characteristic element of one of the two webpages and one characteristic element of the other webpage; calculating the similarity of the two elements in each characteristic element pair and taking it as the similarity of that characteristic element pair; and taking the sum of the similarities of all characteristic element pairs of the two webpages as the similarity of the two webpages.
According to an embodiment of the present invention, determining the similarity of the two elements in a characteristic element pair includes determining that similarity according to the position, attributes, and tree structure information of each of the two elements in the characteristic element pair.
According to an embodiment of the present invention, dividing the multiple webpages into one or more webpage combinations according to the similarity of every two webpages includes: determining whether two webpages are similar according to the similarity of the two webpages and a predetermined threshold, including determining that the two webpages are similar when their similarity is greater than the predetermined threshold; and dividing the multiple webpages into one or more webpage combinations such that each of the one or more webpage combinations satisfies the following condition: when a webpage combination contains multiple webpages, every two webpages in that combination are similar.
According to an embodiment of the present invention, when the number of webpage combinations is greater than a webpage combination threshold, the method further includes: reducing the predetermined threshold; re-determining whether two webpages are similar according to the similarity of the two webpages and the reduced predetermined threshold, including determining that the two webpages are similar when their similarity is greater than the reduced predetermined threshold; and re-dividing the multiple webpages into one or more webpage combinations such that each of the one or more webpage combinations satisfies the following condition: when a webpage combination contains multiple webpages, every two webpages in that combination are similar.
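The division and threshold reduction described above can be sketched as below. The greedy placement strategy, the step size of 0.1, and the example similarity values are assumptions; the text only requires that every two webpages within a combination be similar.

```python
def partition(pages, similar):
    """Greedily build groups in which every two pages are similar."""
    groups = []
    for page in pages:
        for group in groups:
            if all(similar(page, other) for other in group):
                group.append(page)
                break
        else:
            groups.append([page])
    return groups


def partition_with_limit(pages, similarity, threshold, max_groups, step=0.1):
    """Lower the threshold until at most max_groups combinations remain."""
    while True:
        groups = partition(pages, lambda p, q: similarity(p, q) > threshold)
        if len(groups) <= max_groups or threshold <= 0.0:
            return groups, threshold
        threshold = max(0.0, threshold - step)


# Hypothetical pairwise similarities between four pages.
scores = {
    frozenset("ab"): 0.9, frozenset("cd"): 0.9,
    frozenset("ac"): 0.55, frozenset("ad"): 0.55,
    frozenset("bc"): 0.55, frozenset("bd"): 0.55,
}


def similarity(p, q):
    return scores[frozenset((p, q))]


pages = ["a", "b", "c", "d"]
two_groups, t2 = partition_with_limit(pages, similarity, 0.8, max_groups=2)
one_group, t1 = partition_with_limit(pages, similarity, 0.8, max_groups=1)
```

With a limit of two combinations the initial threshold already suffices; with a limit of one, the threshold is lowered until all four pages count as mutually similar. A greedy pass does not guarantee the smallest possible number of combinations for a given threshold.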
According to an embodiment of the present invention, the method further includes determining the access frequency of each webpage among the multiple webpages of the application program, which includes: obtaining a tree structure between the webpages of the application program, each node in the tree representing one webpage of the application program; calculating the access frequency of each node in the tree in top-to-bottom, left-to-right order; calculating the access frequency of each node in the tree in top-to-bottom, right-to-left order; iterating these two calculation steps until the calculated access frequency of each node converges; and using the calculated access frequency of each node in the tree as the access frequency of the webpage corresponding to that node.
According to an embodiment of the present invention, calculating the access frequency of each node in the tree includes: calculating the access frequency of the node according to at least one of the advance probability, the return probability, and the exit probability of each node.
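This passage gives no formula for the access frequency, so the following is only one possible reading, under loudly stated assumptions: a visitor enters at the root; at each node the visitor advances to a uniformly chosen child with the advance probability, returns to the parent with the return probability, or leaves with the exit probability; and the access frequency is taken to be the expected number of visits. The function name, the data layout, and the update rule are all hypothetical.

```python
from collections import deque


def access_frequencies(children, probs, root, tol=1e-9, max_iter=10_000):
    """children: node -> list of child nodes, ordered left to right.
    probs: node -> (advance, ret, exit) probabilities (assumed model)."""
    parent = {c: p for p, kids in children.items() for c in kids}

    def level_order(right_to_left):
        order, queue = [], deque([root])
        while queue:
            node = queue.popleft()
            order.append(node)
            kids = children.get(node, [])
            queue.extend(reversed(kids) if right_to_left else kids)
        return order

    # One sweep top-to-bottom/left-to-right, one top-to-bottom/right-to-left.
    sweeps = [level_order(False), level_order(True)]
    freq = {node: 0.0 for node in sweeps[0]}

    def updated(node):
        value = 1.0 if node == root else 0.0  # one entry at the root
        if node != root:
            p = parent[node]
            value += probs[p][0] / len(children[p]) * freq[p]
        # Visitors returning from the children of this node.
        value += sum(probs[c][1] * freq[c] for c in children.get(node, []))
        return value

    for _ in range(max_iter):
        change = 0.0
        for order in sweeps:
            for node in order:
                new = updated(node)
                change = max(change, abs(new - freq[node]))
                freq[node] = new
        if change < tol:  # converged
            break
    return freq


children = {"home": ["detail"]}
probs = {"home": (0.5, 0.0, 0.5), "detail": (0.0, 0.5, 0.5)}
freq = access_frequencies(children, probs, "home")
```

For this two-page example the fixed point can be checked by hand: f(home) = 1 + 0.5 f(detail) and f(detail) = 0.5 f(home), giving 4/3 and 2/3.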
According to an embodiment of the present invention, the method further includes: testing the chosen webpages; and determining the test result of the application program according to the test results of the chosen webpages.
The above steps of the method for choosing a webpage from an application program including multiple webpages according to an embodiment of the present invention have been described in detail in the preceding embodiments, and the description is not repeated here.
Fig. 9 is a schematic diagram of the process of the method for choosing a webpage from an application program including multiple webpages according to an embodiment of the present invention. As shown in Fig. 9, the webpage structure of the application program can be obtained by traversing the application program, including the tree structure between the webpages of the application program, the DOM tree structure of each page, and information about all elements on each page. Next, the characteristic element set of each webpage can be determined, so that the multiple webpages of the application program are divided into one or more combinations, such as combination A and combination B, according to the similarity of the elements in the characteristic element sets. Then the access frequency of each webpage can be determined from the structure of the webpages, so that the webpage with the highest access frequency in each combination is chosen, finally yielding the webpages to test, such as webpage a and webpage b.
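The final selection step of Fig. 9 can be sketched as follows, assuming the combinations and access frequencies have already been computed. The page names and frequency values are made up for illustration.

```python
def choose_test_pages(combinations, access_frequency):
    """For each combination, pick the page with the highest access frequency."""
    return {
        name: max(pages, key=lambda page: access_frequency[page])
        for name, pages in combinations.items()
    }


# Hypothetical combinations and access frequencies.
combinations = {"A": ["a", "a2", "a3"], "B": ["b", "b2"]}
access_frequency = {"a": 0.9, "a2": 0.4, "a3": 0.1, "b": 0.7, "b2": 0.2}
chosen = choose_test_pages(combinations, access_frequency)
```

If two pages in a combination tie, `max` returns the first one in list order; the text does not say how ties should be resolved.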
As can be seen, with the apparatus and method for choosing a webpage according to the present invention, the similarity of every two webpages can be determined from the characteristic element sets of the webpages, the webpages can be divided into one or more combinations according to that similarity, and the one webpage with the highest access frequency can then be chosen from each combination. In this way, the webpages can be classified reliably, and the most important and most representative webpage in each combination can be chosen on the basis of access frequency. When the chosen webpages are used for testing the application program, the testing time can be reduced while the effectiveness of the test is preserved, which saves testing resources.
Obviously, each operation of the method for choosing a webpage from an application program including multiple webpages according to the present invention can be implemented as a computer-executable program stored in various machine-readable storage media.
Moreover, the object of the present invention can also be achieved as follows: a storage medium storing the above executable program code is supplied, directly or indirectly, to a system or device, and a computer or central processing unit (CPU) in the system or device reads and executes the program code. In this case, as long as the system or device is capable of executing programs, embodiments of the present invention are not limited to a program, and the program may take any form, for example an object program, a program executed by an interpreter, or a script supplied to an operating system.
The above machine-readable storage media include, but are not limited to: various memories and storage units, semiconductor devices, disk units such as optical, magnetic, and magneto-optical disks, and other media suitable for storing information.
In addition, the technical solution of the present invention can also be realized when a computer connects to a corresponding website on the Internet, downloads and installs the computer program code according to the present invention, and then executes the program.
Fig. 10 is a block diagram of an example structure of a general-purpose personal computer in which the apparatus and method according to the present invention can be implemented.
As shown in Fig. 10, a CPU 1001 performs various processes according to a program stored in a read-only memory (ROM) 1002 or a program loaded from a storage section 1008 into a random access memory (RAM) 1003. The RAM 1003 also stores, as needed, data required when the CPU 1001 performs the various processes. The CPU 1001, the ROM 1002, and the RAM 1003 are connected to one another via a bus 1004. An input/output interface 1005 is also connected to the bus 1004.
The following components are connected to the input/output interface 1005: an input section 1006 (including a keyboard, a mouse, and the like), an output section 1007 (including a display, such as a cathode-ray tube (CRT) or a liquid crystal display (LCD), and a loudspeaker), the storage section 1008 (including a hard disk and the like), and a communication section 1009 (including a network interface card such as a LAN card, a modem, and the like). The communication section 1009 performs communication processing via a network such as the Internet. A drive 1010 can also be connected to the input/output interface 1005 as needed. A removable medium 1011, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1010 as needed, so that a computer program read from it is installed into the storage section 1008 as needed.
When the above series of processes is implemented by software, the program constituting the software is installed from a network such as the Internet or from a storage medium such as the removable medium 1011.
Those skilled in the art will understand that this storage medium is not limited to the removable medium 1011 shown in Fig. 10, which stores the program and is distributed separately from the device in order to provide the program to the user. Examples of the removable medium 1011 include magnetic disks (including floppy disks (registered trademark)), optical discs (including compact disc read-only memory (CD-ROM) and digital versatile discs (DVD)), magneto-optical disks (including MiniDisc (MD) (registered trademark)), and semiconductor memories. Alternatively, the storage medium may be the ROM 1002, a hard disk included in the storage section 1008, or the like, in which the program is stored and which is distributed to the user together with the device containing it.
In the system and method of the present invention, each unit or step can obviously be decomposed and/or recombined. Such decompositions and/or recombinations should be regarded as equivalents of the present invention. Moreover, the steps of the above series of processes can naturally be performed chronologically in the order described, but need not be. Some steps can be performed in parallel or independently of one another.
Although embodiments of the present invention have been described in detail above with reference to the accompanying drawings, it should be understood that the embodiments described above are intended only to illustrate the present invention and do not limit it. Those skilled in the art can make various changes and modifications to the above embodiments without departing from the spirit and scope of the present invention. The scope of the present invention is therefore limited only by the appended claims and their equivalents.
Regarding the embodiments including the above examples, the following notes are also disclosed:
Note 1. A device for choosing a webpage from an application program including multiple webpages, including:
a webpage acquiring unit for obtaining the multiple webpages of the application program;
a characteristic element set determining unit for determining the characteristic element set of each webpage among the multiple webpages;
a similarity determining unit for determining the similarity of every two webpages among the multiple webpages according to the characteristic element set of each webpage;
a division unit for dividing the multiple webpages into one or more webpage combinations according to the similarity of every two webpages; and
a selecting unit for choosing, from each of the one or more webpage combinations, the one webpage with the highest access frequency.
Note 2. The device according to Note 1, wherein the characteristic element set determining unit includes:
an element acquiring unit for obtaining multiple elements of each webpage; and
a determining unit for determining the characteristic elements of each webpage according to the similarity of every two elements among the multiple elements, and for using the set of characteristic elements as the characteristic element set of each webpage.
Note 3. The device according to Note 2, wherein the determining unit is configured to:
divide the multiple elements into one or more element groups according to the similarity of every two elements among the multiple elements;
choose one element from each of the one or more element groups; and
use the chosen elements as the characteristic elements of the webpage.
Note 4. The device according to Note 3, wherein the determining unit determines the similarity of every two elements among the multiple elements according to the position and type of each element and the DOM tree structure of the webpage in which the element is located.
Note 5. The device according to Note 1, wherein the similarity determining unit includes:
a characteristic element pair determining unit for determining the characteristic element pairs of the two webpages, each characteristic element pair consisting of one characteristic element of one of the two webpages and one characteristic element of the other webpage;
a computing unit for calculating the similarity of the two elements in each characteristic element pair as the similarity of that characteristic element pair; and
a summing unit for taking the sum of the similarities of all characteristic element pairs of the two webpages as the similarity of the two webpages.
Note 6. The device according to Note 5, wherein the computing unit calculates the similarity of the two elements in the characteristic element pair according to the position, attributes, and tree structure information of each of the two elements.
Note 7. The device according to Note 1, wherein the division unit includes:
a judging unit for determining whether two webpages are similar according to the similarity of the two webpages and a predetermined threshold, including determining that the two webpages are similar when their similarity is greater than the predetermined threshold; and
a processing unit for dividing the multiple webpages into one or more webpage combinations such that each of the one or more webpage combinations satisfies the following condition: when a webpage combination contains multiple webpages, every two webpages in that combination are similar.
Note 8. The device according to Note 7, wherein, when the number of webpage combinations is greater than a webpage combination threshold, the judging unit is further configured to reduce the predetermined threshold and to re-determine whether two webpages are similar according to the similarity of the two webpages and the reduced predetermined threshold, including determining that the two webpages are similar when their similarity is greater than the reduced predetermined threshold; and the processing unit is further configured to re-divide the multiple webpages into one or more webpage combinations such that each of the one or more webpage combinations satisfies the following condition: when a webpage combination contains multiple webpages, every two webpages in that combination are similar.
Note 9. The device according to Note 1, wherein the device further includes an access frequency determining unit configured to:
obtain a tree structure between the webpages of the application program, each node in the tree representing one webpage of the application program;
calculate the access frequency of each node in the tree in top-to-bottom, left-to-right order;
calculate the access frequency of each node in the tree in top-to-bottom, right-to-left order;
iterate the step of calculating the access frequency of each node in top-to-bottom, left-to-right order and the step of calculating the access frequency of each node in top-to-bottom, right-to-left order until the calculated access frequency of each node converges; and
use the calculated access frequency of each node in the tree as the access frequency of the webpage corresponding to that node.
Note 10. The device according to Note 9, wherein the access frequency determining unit calculates the access frequency of each node according to at least one of the advance probability, the return probability, and the exit probability of each node.
Note 11. The device according to Note 1, wherein the device further includes a test unit for testing the chosen webpages and for determining the test result of the application program according to the test results of the chosen webpages.
Note 12. A method for choosing a webpage from an application program including multiple webpages, including:
obtaining the multiple webpages of the application program;
determining the characteristic element set of each webpage among the multiple webpages;
determining the similarity of every two webpages among the multiple webpages according to the characteristic element set of each webpage;
dividing the multiple webpages into one or more webpage combinations according to the similarity of every two webpages; and
choosing, from each of the one or more webpage combinations, the one webpage with the highest access frequency.
Note 13. The method according to Note 12, wherein determining the characteristic element set of each webpage among the multiple webpages includes:
obtaining multiple elements of each webpage;
determining the characteristic elements of each webpage according to the similarity of every two elements among the multiple elements; and
using the set of characteristic elements as the characteristic element set of each webpage.
Note 14. The method according to Note 13, wherein determining the characteristic elements of each webpage according to the similarity of the multiple elements includes:
dividing the multiple elements into one or more element groups according to the similarity of every two elements among the multiple elements;
choosing one element from each of the one or more element groups; and
using the chosen elements as the characteristic elements of the webpage.
Note 15. The method according to Note 14, wherein the similarity of every two elements among the multiple elements is determined according to the position and type of each element and the DOM tree structure of the webpage in which the element is located.
Note 16. The method according to Note 12, wherein determining the similarity of every two webpages among the multiple webpages according to the characteristic element set of each webpage includes:
determining the characteristic element pairs of the two webpages, each characteristic element pair consisting of one characteristic element of one of the two webpages and one characteristic element of the other webpage;
calculating the similarity of the two elements in each characteristic element pair as the similarity of that characteristic element pair; and
taking the sum of the similarities of all characteristic element pairs of the two webpages as the similarity of the two webpages.
Note 17. The method according to Note 16, wherein determining the similarity of the two elements in the characteristic element pair includes: determining the similarity of the two elements according to the position, attributes, and tree structure information of each of the two elements in the characteristic element pair.
Note 18. The method according to Note 12, wherein dividing the multiple webpages into one or more webpage combinations according to the similarity of every two webpages includes:
determining whether two webpages are similar according to the similarity of the two webpages and a predetermined threshold, including determining that the two webpages are similar when their similarity is greater than the predetermined threshold; and
dividing the multiple webpages into one or more webpage combinations such that each of the one or more webpage combinations satisfies the following condition: when a webpage combination contains multiple webpages, every two webpages in that combination are similar.
Note 19. The method according to Note 18, wherein, when the number of webpage combinations is greater than a webpage combination threshold, the method further includes:
reducing the predetermined threshold;
re-determining whether two webpages are similar according to the similarity of the two webpages and the reduced predetermined threshold, including determining that the two webpages are similar when their similarity is greater than the reduced predetermined threshold; and
re-dividing the multiple webpages into one or more webpage combinations such that each of the one or more webpage combinations satisfies the following condition: when a webpage combination contains multiple webpages, every two webpages in that combination are similar.
Note 20. A machine-readable storage medium carrying a program product that includes machine-readable instruction code stored therein, wherein the instruction code, when read and executed by a computer, causes the computer to perform the method according to any one of Notes 12 to 19.

Claims (10)

1. A device for choosing a webpage from an application program including multiple webpages, including:
a webpage acquiring unit for obtaining the multiple webpages of the application program;
a characteristic element set determining unit for determining the characteristic element set of each webpage among the multiple webpages;
a similarity determining unit for determining the similarity of every two webpages among the multiple webpages according to the characteristic element set of each webpage;
a division unit for dividing the multiple webpages into one or more webpage combinations according to the similarity of every two webpages; and
a selecting unit for choosing, from each of the one or more webpage combinations, the one webpage with the highest access frequency.
2. The device according to claim 1, wherein the characteristic element set determining unit includes:
an element acquiring unit for obtaining multiple elements of each webpage; and
a determining unit for determining the characteristic elements of each webpage according to the similarity of every two elements among the multiple elements, and for using the set of characteristic elements as the characteristic element set of each webpage.
3. The device according to claim 2, wherein the determining unit is configured to:
divide the multiple elements into one or more element groups according to the similarity of every two elements among the multiple elements;
choose one element from each of the one or more element groups; and
use the chosen elements as the characteristic elements of the webpage.
4. The device according to claim 3, wherein the determining unit determines the similarity of every two elements among the multiple elements according to the position and type of each element and the DOM tree structure of the webpage in which the element is located.
5. The device according to claim 1, wherein the similarity determining unit includes:
a characteristic element pair determining unit for determining the characteristic element pairs of the two webpages, each characteristic element pair consisting of one characteristic element of one of the two webpages and one characteristic element of the other webpage;
a computing unit for calculating the similarity of the two elements in each characteristic element pair as the similarity of that characteristic element pair; and
a summing unit for taking the sum of the similarities of all characteristic element pairs of the two webpages as the similarity of the two webpages.
6. The device according to claim 5, wherein the computing unit calculates the similarity of the two elements in the characteristic element pair according to the position, attributes, and tree structure information of each of the two elements.
7. The device according to claim 1, wherein the division unit includes:
a judging unit for determining whether two webpages are similar according to the similarity of the two webpages and a predetermined threshold, including determining that the two webpages are similar when their similarity is greater than the predetermined threshold; and
a processing unit for dividing the multiple webpages into one or more webpage combinations such that each of the one or more webpage combinations satisfies the following condition: when a webpage combination contains multiple webpages, every two webpages in that combination are similar.
8. The device according to claim 7, wherein, when the number of webpage combinations is greater than a webpage combination threshold, the judging unit is further configured to reduce the predetermined threshold and to re-determine whether two webpages are similar according to the similarity of the two webpages and the reduced predetermined threshold, including determining that the two webpages are similar when their similarity is greater than the reduced predetermined threshold; and the processing unit is further configured to re-divide the multiple webpages into one or more webpage combinations such that each of the one or more webpage combinations satisfies the following condition: when a webpage combination contains multiple webpages, every two webpages in that combination are similar.
9. The device according to claim 1, wherein the device further includes an access frequency determining unit configured to:
obtain a tree structure between the webpages of the application program, each node in the tree representing one webpage of the application program;
calculate the access frequency of each node in the tree in top-to-bottom, left-to-right order;
calculate the access frequency of each node in the tree in top-to-bottom, right-to-left order;
iterate the step of calculating the access frequency of each node in top-to-bottom, left-to-right order and the step of calculating the access frequency of each node in top-to-bottom, right-to-left order until the calculated access frequency of each node converges; and
use the calculated access frequency of each node in the tree as the access frequency of the webpage corresponding to that node.
10. A method for choosing a webpage from an application program including multiple webpages, including:
obtaining the multiple webpages of the application program;
determining the characteristic element set of each webpage among the multiple webpages;
determining the similarity of every two webpages among the multiple webpages according to the characteristic element set of each webpage;
dividing the multiple webpages into one or more webpage combinations according to the similarity of every two webpages; and
choosing, from each of the one or more webpage combinations, the one webpage with the highest access frequency.
CN201610305142.8A 2016-05-10 2016-05-10 Apparatus and method for choosing webpage Pending CN107357716A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610305142.8A CN107357716A (en) 2016-05-10 2016-05-10 Apparatus and method for choosing webpage

Publications (1)

Publication Number Publication Date
CN107357716A true CN107357716A (en) 2017-11-17

Family

ID=60271719

Country Status (1)

Country Link
CN (1) CN107357716A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011053912A (en) * 2009-09-02 2011-03-17 Nec Corp Page similarity determination apparatus, page similarity determination method and page similarity determination program
CN103049562A (en) * 2012-12-31 2013-04-17 华为技术有限公司 Method and device for recognizing similar webpages
CN103853654A (en) * 2012-11-30 2014-06-11 国际商业机器公司 Method and device for selecting webpage testing paths
CN104504086A (en) * 2014-12-25 2015-04-08 北京国双科技有限公司 Clustering method and device for webpage
CN104657391A (en) * 2013-11-21 2015-05-27 阿里巴巴集团控股有限公司 Page processing method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
范意兴 等: ""一种基于网页块特征的多级网页聚类方法"", 《山东大学学报(理学版)》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20171117