CN105740475A - Web page transformation method and system - Google Patents

Info

Publication number
CN105740475A
Authority
CN
China
Prior art keywords
page
object model
unit
document object
subtree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610154451.XA
Other languages
Chinese (zh)
Other versions
CN105740475B (en)
Inventor
陈湘萍
赖少凡
陈榕涛
陈庆
程健
高逸斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University
Priority to CN201610154451.XA
Publication of CN105740475A
Application granted
Publication of CN105740475B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/958: Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Document Processing Apparatus (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a web page transformation method and system. The method comprises: processing an input HTML document with the HTML Document Object Model (DOM) to obtain a DOM tree; performing subtree processing on the DOM tree and obtaining, from the processing result, the page information units corresponding to the subtrees; performing similarity processing on the page information units and establishing a mapping relationship between them according to the similarity result; and injecting the information of the page to be transformed into the target page along the mapping relationship, thereby completing the page transformation. The method and system lower the UI design skills required of a web page designer: the designer only needs to make slight modifications and adjustments to the transformed page, can choose a page template visually, and the generated page can match the user's preferences as closely as possible.

Description

Web page conversion method and system
Technical field
The present invention relates to the field of Internet technology, and in particular to a web page conversion method and system.
Background
In recent years, with the spread of the Internet and the increasing security and maturity of electronic payment technology, online shopping has become an important way for people to shop. E-commerce companies such as Taobao, Tmall, Suning and Gome have emerged one after another and driven the development of the e-commerce industry. As demand keeps growing, more and more merchants have opened online stores, which in turn creates great demand for the design of online-store web pages. In fact, a large number of existing online stores are excellent design samples; if there were a method for quickly building one's own online store from these page templates, design costs could be reduced considerably. Unfortunately, there is not yet an effective means of building web pages quickly from templates. The existing fast template conversion methods are mainly of two kinds: page conversion based on peer templates, such as one-click skin changing in QQ Space and in blogs; and page reconstruction based on color transfer.
However, the page conversion method based on peer templates requires the templates of the original page and the target page to correspond strictly. This guarantees that information is injected accurately, but it also means that the overall layout, the widget layout and the overall framework are unchanged after conversion. Such a conversion stays at the surface level of color and style and does not generate a web page with a different design, so it cannot be regarded as template conversion in the true sense. The page reconstruction method based on color transfer converts only colors, not templates, and likewise cannot generate a new page that applies another design.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art. The invention provides a web page conversion method and system that lowers the UI design skills required of web page designers: the converted page needs only slight modification and adjustment by the designer, and the designer can choose a template page visually, so that the generated page can match the user's preferences as closely as possible.
To solve the above technical problem, an embodiment of the present invention provides a web page conversion method, the method comprising:
Processing an input HTML file with the HTML Document Object Model (DOM) to obtain a DOM tree;
Performing subtree processing on the DOM tree, and obtaining, according to the processing result, the page information units corresponding to the subtrees;
Performing similarity processing on the page information units, and establishing a mapping relationship between the page information units according to the similarity result;
Injecting, according to the mapping relationship, the information of the page to be converted into the target page along the mapping relationship, thereby completing the page conversion.
Preferably, the HTML file comprises an HTML file to be converted and a target HTML file.
Preferably, performing subtree processing on the DOM tree and obtaining, according to the processing result, the page information units corresponding to the subtrees comprises:
Traversing the DOM tree to obtain the leaf nodes of the DOM tree;
Determining that the current depth of a leaf node is d, and extending, according to the depth d, from the leaf node to the node at depth d-1;
Calculating the number of occurrences, in the DOM tree, of the subtree rooted at the node at depth d-1;
If the number of occurrences is not less than a threshold, continuing to extend upward from the leaf node; if the number of occurrences is less than the threshold, outputting the subtree as a DOM subtree;
Obtaining the page information unit corresponding to the DOM subtree.
Preferably, performing similarity processing on the page information units and establishing a mapping relationship between the page information units according to the similarity result comprises:
Performing similarity processing on the page information units to obtain the two page information units with the highest similarity;
Associating the two page information units with the highest similarity to obtain the mapping relationship between page information units.
Preferably, injecting the information of the page to be converted into the target page along the mapping relationship between the page information units comprises:
Processing the DOM subtree to obtain the minimum repeating unit of the DOM subtree;
Injecting the information of the page to be converted into the minimum repeating unit of the target page along the mapping relationship to obtain the converted minimum repeating unit;
Performing page conversion according to the converted minimum repeating unit, thereby completing the page conversion.
Correspondingly, an embodiment of the present invention further provides a web page conversion system, the system comprising:
Document processing module: configured to process an input HTML file with the HTML Document Object Model (DOM) to obtain a DOM tree;
Page information unit acquisition module: configured to perform subtree processing on the DOM tree and obtain, according to the processing result, the page information units corresponding to the subtrees;
Mapping establishment module: configured to perform similarity processing on the page information units and establish a mapping relationship between the page information units according to the similarity result;
Page conversion module: configured to inject, according to the mapping relationship, the information of the page to be converted into the target page along the mapping relationship, thereby completing the page conversion.
Preferably, the HTML file comprises an HTML file to be converted and a target HTML file.
Preferably, the page information unit acquisition module comprises:
Traversal processing unit: configured to traverse the DOM tree to obtain the leaf nodes of the DOM tree;
Depth determination and extension unit: configured to determine that the current depth of a leaf node is d and to extend, according to the depth d, from the leaf node to the node at depth d-1;
Occurrence counting unit: configured to calculate the number of occurrences, in the DOM tree, of the subtree rooted at the node at depth d-1;
Judging unit: configured to continue extending upward from the leaf node if the number of occurrences is not less than a threshold, and to output the subtree as a DOM subtree if the number of occurrences is less than the threshold;
Page information unit acquiring unit: configured to obtain the page information unit corresponding to the DOM subtree.
Preferably, the mapping establishment module comprises:
Similarity retrieval unit: configured to perform similarity processing on the page information units to obtain the two page information units with the highest similarity;
Mapping establishment unit: configured to associate the two page information units with the highest similarity to obtain the mapping relationship between page information units.
Preferably, the page conversion module comprises:
Subtree processing unit: configured to process the DOM subtree to obtain the minimum repeating unit of the DOM subtree;
Information injection unit: configured to inject, according to the mapping relationship, the information of the page to be converted into the minimum repeating unit of the target page to obtain the converted minimum repeating unit;
Page conversion unit: configured to perform page conversion according to the converted minimum repeating unit, thereby completing the page conversion.
In implementing the embodiments of the present invention, the UI design skills required of web page designers are lowered: the converted page needs only slight modification and adjustment by the designer, and the designer can choose a template page visually, so that the generated page can match the user's preferences as closely as possible. Throughout the process the user only needs to select the original page and the template page; no manual intervention is required, and the user simply waits for the new page file to be generated. Operation is simple and fast. In particular, because e-commerce pages tend to be structurally similar, the more similar the structures are, the better the conversion result.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of the web page conversion method according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of page information unit acquisition according to an embodiment of the present invention;
Fig. 3 is a schematic flowchart of injecting the information of the page to be converted into the target page according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of the web page conversion system according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of the page information unit acquisition module according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of the page conversion module according to an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
Fig. 1 is a schematic flowchart of the web page conversion method according to an embodiment of the present invention. As shown in Fig. 1, the method comprises:
S11: processing an input HTML file with the HTML Document Object Model (DOM) to obtain a DOM tree;
S12: performing subtree processing on the DOM tree, and obtaining, according to the processing result, the page information units corresponding to the subtrees;
S13: performing similarity processing on the page information units, and establishing a mapping relationship between the page information units according to the similarity result;
S14: injecting, according to the mapping relationship, the information of the page to be converted into the target page along the mapping relationship, thereby completing the page conversion.
S11 is described further:
The Document Object Model (DOM) is an interface that is independent of browser, platform and language. It gives web designers and software developers a unified standard through which they can access the data, scripts and top-level objects of their sites. The input HTML file is processed with the HTML DOM, which presents the elements, attributes and text of the HTML file as a structured tree; this tree is called the DOM tree. The input HTML files are the HTML file to be converted and the target HTML file.
In implementing this embodiment, if a malformed HTML page is encountered, an open-source library such as BeautifulSoup is used to convert the file into a well-formed HTML file, which is then processed with the HTML DOM.
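As a minimal sketch of S11, the following Python uses only the standard library's `html.parser` to turn an HTML string into a simplified tree of elements, attributes and text. The `Node` and `TreeBuilder` classes and the `parse_html` helper are illustrative names that do not appear in the patent, and the tree is far simpler than a full DOM. The BeautifulSoup repair step mentioned above is deliberately left out so that the example stays self-contained.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not stay on the open-element stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


@dataclass
class Node:
    tag: str                                   # element name, or "#text"
    attrs: dict = field(default_factory=dict)
    text: str = ""
    children: list = field(default_factory=list)


class TreeBuilder(HTMLParser):
    """Builds a simplified DOM-like tree (hypothetical helper, not the patent's code)."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, dict(attrs))
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open element; stray end tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].children.append(Node("#text", text=data.strip()))


def parse_html(html: str) -> Node:
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root
```

In this sketch both the page to be converted and the target page would be passed through `parse_html` separately, giving two trees for the later steps.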
S12 is described further:
DOM subtree processing is performed on the DOM tree to obtain DOM subtrees, and the page information unit corresponding to each DOM subtree is obtained from it. Here, a DOM subtree is a subtree that retains semantic information, is as deep as possible, and occurs frequently.
Fig. 2 is a schematic flowchart of page information unit acquisition according to an embodiment of the present invention. As shown in Fig. 2, the flow is as follows:
S121: traversing the DOM tree to obtain the leaf nodes of the DOM tree;
S122: determining that the current depth of a leaf node is d, and extending, according to the depth d, from the leaf node to the node at depth d-1;
S123: calculating the number of occurrences, in the DOM tree, of the subtree rooted at the node at depth d-1;
S124: if the number of occurrences is not less than a threshold, returning to S122; if the number of occurrences is less than the threshold, outputting the subtree as a DOM subtree;
S125: obtaining the page information unit corresponding to the DOM subtree.
S121 is described further:
In the embodiments of the present invention, the DOM tree may be traversed either level by level (breadth-first) or depth-first; traversing the DOM tree yields the information of each of its leaf nodes.
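The two traversal options in S121 can be sketched as follows. This is an illustration only: the `Node` class is a hypothetical stand-in for a DOM node, and each function returns the leaves together with their depth, since the depth d is what S122 needs next.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    children: list = field(default_factory=list)


def leaves_level_order(root):
    """Level-order (breadth-first) traversal; returns a list of (leaf, depth) pairs."""
    found, queue = [], deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if not node.children:
            found.append((node, depth))
        for child in node.children:
            queue.append((child, depth + 1))
    return found


def leaves_depth_first(root, depth=0):
    """Depth-first traversal; yields (leaf, depth) pairs in document order."""
    if not root.children:
        yield root, depth
    for child in root.children:
        yield from leaves_depth_first(child, depth + 1)
```

Both functions find the same set of leaves; they differ only in visiting order, which does not matter for the later frequency counting.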
S122 is described further:
According to the properties of a tree, the depth of the current leaf node is determined and denoted d; a "growth" process is then used to extend upward to the node at layer d-1.
S123 is described further:
When the "growth" process extends to the node at layer d-1, the subtrees rooted at layer d-1 are collected, and the number of occurrences of each of these subtrees in the whole DOM tree is calculated.
S124 is described further:
The number of occurrences obtained is compared with the threshold. If it is not less than the threshold, the flow returns to S122; if it is less than the threshold, the subtree is output as a DOM subtree. The threshold used in this embodiment is 2, but it should be chosen according to the user's circumstances and can be set differently for different cases.
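The growth loop of S122 to S124 might be sketched as below. Several points are assumptions rather than statements of the patent: two subtrees are treated as "the same" when their tag structure is identical (the `signature` function), a subtree is only output if at least one growth step found a frequent subtree beneath it, and all names (`Node`, `walk`, `frequent_regions`) are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    tag: str
    children: list = field(default_factory=list)
    parent: "Node" = field(default=None, repr=False)

    def add(self, *kids):
        for kid in kids:
            kid.parent = self
            self.children.append(kid)
        return self


def signature(node):
    """Structural fingerprint: the tag plus the fingerprints of all children."""
    return (node.tag, tuple(signature(c) for c in node.children))


def walk(node):
    yield node
    for child in node.children:
        yield from walk(child)


def frequent_regions(root, threshold=2):
    """Grow upward from every leaf and collect the subtrees output by S124."""
    counts = Counter(signature(n) for n in walk(root))
    regions = []
    for leaf in (n for n in walk(root) if not n.children):
        node, grew = leaf.parent, False
        # S122/S123: move from depth d to d-1 while the grown subtree is still frequent.
        while node is not None and counts[signature(node)] >= threshold:
            node, grew = node.parent, True
        # S124: the first subtree whose count drops below the threshold is output.
        if node is not None and grew and all(node is not r for r in regions):
            regions.append(node)
    return regions
```

For a product list such as `ul > li > (img, span)` repeated three times, growth passes through the repeated `li` subtrees and stops at the `ul`, which occurs once, so the `ul` is returned as the region. Recomputing `signature` on every step keeps the sketch short at the cost of efficiency.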
The following is an example of a DOM subtree:
S125 is described further:
The page information unit corresponding to the DOM subtree is obtained from the DOM subtree. A page information unit is the unit in which structure and information are converted between pages.
The following is the page information unit of the example DOM subtree from S124:
S13 is described further:
Similarity processing is performed on the page information units, and a mapping relationship between the page information units is established according to the similarity result.
Further, the page information units of the page to be converted and those of the target page are traversed to find, for each page information unit of the page to be converted, the most similar page information unit of the target page. Specifically, a two-level loop is used: the outer loop traverses every page information unit Ae of the page to be converted, and for each Ae the inner loop traverses every page information unit Be of the target page, compares the two, and obtains their similarity S. When the traversal is complete, the Be with the largest S is the page information unit most relevant to Ae.
The two page information units with the highest similarity are then associated, establishing the mapping relationship between page information units.
During the loop traversal, a natural language processing black-box function Fnlp is used: in the outer and inner loops, two page information units are input and the output is the similarity S of the two units. In associating page information units, the information of each page information unit is extracted and converted into a phrase vector V, and the Fnlp function is used to define the distance between elements of two vectors V. Because different vectors V may have different lengths, dynamic time warping (DTW), a technique for dynamically computing the distance between vectors, is used to calculate the distance between phrase vectors V. This yields a many-to-many mapping relationship between the page information units of the page to be converted and those of the target page.
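A rough sketch of the matching in S13 follows. The patent treats Fnlp as a black box, so `phrase_distance` below is a hypothetical stand-in built on the standard library's `difflib.SequenceMatcher`, not the function the inventors used. The sketch also works with distances (smaller is more similar) rather than a similarity S (larger is more similar), and it keeps only the single best target unit per source unit instead of a full many-to-many mapping.

```python
from difflib import SequenceMatcher


def phrase_distance(a: str, b: str) -> float:
    """Hypothetical stand-in for Fnlp: 0.0 for identical phrases, up to 1.0."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def dtw_distance(v1, v2, dist=phrase_distance):
    """Dynamic time warping distance between two phrase vectors of any length."""
    inf = float("inf")
    n, m = len(v1), len(v2)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = dist(v1[i - 1], v2[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # v1 element repeats
                                    cost[i][j - 1],      # v2 element repeats
                                    cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def build_mapping(source_units, target_units):
    """Two-level loop: for each source unit Ae, keep the closest target unit Be."""
    mapping = {}
    for a_name, a_vec in source_units.items():            # outer loop over Ae
        best_name, best_dist = None, float("inf")
        for b_name, b_vec in target_units.items():         # inner loop over Be
            d = dtw_distance(a_vec, b_vec)
            if d < best_dist:
                best_name, best_dist = b_name, d
        mapping[a_name] = best_name
    return mapping
```

Here each page information unit is represented as a named list of phrases, for example `{"nav": ["home", "cart"]}`; how phrases are actually extracted from a unit is not specified in the text above and is left open.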
S14 is described further:
According to the mapping relationship, the information of the page to be converted is injected into the target page along the mapping relationship, thereby completing the page conversion.
Fig. 3 is a schematic flowchart of injecting the information of the page to be converted into the target page according to an embodiment of the present invention. As shown in Fig. 3, the flow is as follows:
S141: processing the DOM subtree to obtain the minimum repeating unit of the DOM subtree;
S142: injecting the information of the page to be converted into the minimum repeating unit of the target page along the mapping relationship to obtain the converted minimum repeating unit;
S143: performing page conversion according to the converted minimum repeating unit, thereby completing the page conversion.
S141 is described further:
Minimum-repeating-unit processing is performed on the DOM subtree to obtain its minimum repeating unit. The minimum repeating unit is the smallest repeated subtree in the DOM subtree that has a complete information structure, with its semantic information removed so that only the structure remains.
Further, the processing grows upward from the leaf nodes of the DOM subtree. When extending from layer d to layer d-1, the number of occurrences of the currently grown subtree within the DOM subtree that contains it is determined. A threshold F is set to 2/3 of the number of nodes at layer d-1. If the number of occurrences is greater than F, the subtree from layer d-1 downward is judged to be the minimum repeating unit of the DOM subtree; if the number of occurrences is less than F, the DOM subtree continues to grow.
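One way to read the 2/3 rule in S141 is sketched below. It is an interpretation, not the patent's code: nodes of the DOM subtree are grouped by layer, the most common subtree structure in each layer is counted, and the first layer (starting from d-1 and moving upward) in which that count exceeds 2/3 of the layer's node count supplies the unit. The names `Node`, `signature`, `layers` and `minimum_repeating_unit` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    children: list = field(default_factory=list)


def signature(node):
    """Structure only: tag names and nesting, with text and attributes ignored."""
    return (node.tag, tuple(signature(c) for c in node.children))


def layers(node, depth=0, out=None):
    """Group the nodes of a region by depth (the region root is depth 0)."""
    out = {} if out is None else out
    out.setdefault(depth, []).append(node)
    for child in node.children:
        layers(child, depth + 1, out)
    return out


def minimum_repeating_unit(region):
    by_depth = layers(region)
    deepest = max(by_depth)
    # Grow upward: examine layer d-1 first, then d-2, and stop before the region root.
    for depth in range(deepest - 1, 0, -1):
        nodes = by_depth[depth]
        sig, freq = Counter(signature(n) for n in nodes).most_common(1)[0]
        if 3 * freq > 2 * len(nodes):        # freq > F, where F = 2/3 of the layer size
            return next(n for n in nodes if signature(n) == sig)
    return None                              # no layer is dominated by one structure
```

The comparison is written as `3 * freq > 2 * len(nodes)` to avoid floating-point rounding around the 2/3 boundary. The case where the count equals F exactly is not addressed in the text above; this sketch treats it as "continue growing".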
The following is an example of a minimum repeating unit:
S142 is described further:
The information of the page to be converted is injected into the minimum repeating unit of the target page along the mapping relationship to obtain the converted minimum repeating unit.
Further, once the mapping relationship between the page information units of the page to be converted and those of the target page has been obtained, the information of the page to be converted is injected into the minimum repeating unit of the target page along the mapping relationship, and the unit is then replicated and extended to obtain the converted minimum repeating units.
The process of injecting the information of the page to be converted into the minimum repeating unit of the target page is as follows:
S143 is described further:
Page conversion is performed according to the converted minimum repeating unit, thereby completing the page conversion.
Further, according to the converted minimum repeating units, the information originally in the target page is replaced with the information injected from the page to be converted, which completes the page conversion.
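The injection, replication and replacement described in S142 and S143 can be illustrated with the sketch below. It assumes, purely for illustration, that the mapping has already been reduced to a positional correspondence: the i-th field of each source record fills the i-th leaf of the target's minimum repeating unit. The names `Node`, `text_slots` and `inject` are hypothetical.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    text: str = ""
    children: list = field(default_factory=list)


def text_slots(node):
    """Return the leaf nodes of a unit, in document order, as fillable slots."""
    if not node.children:
        return [node]
    return [slot for child in node.children for slot in text_slots(child)]


def inject(target_region, unit_template, source_records):
    """Replace the region's children with one filled copy of the unit per record."""
    filled = []
    for record in source_records:
        unit = copy.deepcopy(unit_template)       # replicate the target's repeating unit
        for slot, value in zip(text_slots(unit), record):
            slot.text = value                     # overwrite the template's own content
        filled.append(unit)
    target_region.children = filled               # S143: original information is replaced
    return target_region
```

Because each unit is deep-copied before it is filled, the number of output units follows the number of source records rather than the number of items in the template, which is one plausible reading of "replicated and extended".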
Fig. 4 is a schematic structural diagram of the web page conversion system according to an embodiment of the present invention. As shown in Fig. 4, the system comprises:
Document processing module 11: configured to process an input HTML file with the HTML Document Object Model (DOM) to obtain a DOM tree;
Page information unit acquisition module 12: configured to perform subtree processing on the DOM tree and obtain, according to the processing result, the page information units corresponding to the subtrees;
Mapping establishment module 13: configured to perform similarity processing on the page information units and establish a mapping relationship between the page information units according to the similarity result;
Page conversion module 14: configured to inject, according to the mapping relationship, the information of the page to be converted into the target page along the mapping relationship, thereby completing the page conversion.
Preferably, the HTML file comprises an HTML file to be converted and a target HTML file.
Preferably, Fig. 5 is a schematic structural diagram of the page information unit acquisition module according to an embodiment of the present invention. As shown in Fig. 5, the page information unit acquisition module 12 comprises:
Traversal processing unit 121: configured to traverse the DOM tree to obtain the leaf nodes of the DOM tree;
Depth determination and extension unit 122: configured to determine that the current depth of a leaf node is d and to extend, according to the depth d, from the leaf node to the node at depth d-1;
Occurrence counting unit 123: configured to calculate the number of occurrences, in the DOM tree, of the subtree rooted at the node at depth d-1;
Judging unit 124: configured to continue extending upward from the leaf node if the number of occurrences is not less than a threshold, and to output the subtree as a DOM subtree if the number of occurrences is less than the threshold;
Page information unit acquiring unit 125: configured to obtain the page information unit corresponding to the DOM subtree.
Preferably, the mapping establishment module 13 comprises:
Similarity retrieval unit: configured to perform similarity processing on the page information units to obtain the two page information units with the highest similarity;
Mapping establishment unit: configured to associate the two page information units with the highest similarity to obtain the mapping relationship between page information units.
Preferably, Fig. 6 is a schematic structural diagram of the page conversion module according to an embodiment of the present invention. As shown in Fig. 6, the page conversion module 14 comprises:
Subtree processing unit 141: configured to process the DOM subtree to obtain the minimum repeating unit of the DOM subtree;
Information injection unit 142: configured to inject, according to the mapping relationship, the information of the page to be converted into the minimum repeating unit of the target page to obtain the converted minimum repeating unit;
Page conversion unit 143: configured to perform page conversion according to the converted minimum repeating unit, thereby completing the page conversion.
Specifically, for the working principles of the functional modules of the system, refer to the description of the method embodiment; they are not repeated here.
In implementing the embodiments of the present invention, the UI design skills required of web page designers are lowered: the converted page needs only slight modification and adjustment by the designer, and the designer can choose a template page visually, so that the generated page can match the user's preferences as closely as possible. Throughout the process the user only needs to select the original page and the template page; no manual intervention is required, and the user simply waits for the new page file to be generated. Operation is simple and fast. In particular, because e-commerce pages tend to be structurally similar, the more similar the structures are, the better the conversion result.
A person of ordinary skill in the art will understand that all or part of the steps of the methods in the above embodiments can be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, which may include read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and the like.
The web page conversion method and system provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the above embodiments is intended only to help in understanding the method of the invention and its core idea. Meanwhile, a person of ordinary skill in the art may, following the idea of the invention, make changes to the specific implementation and scope of application. In summary, the content of this specification should not be construed as limiting the invention.

Claims (10)

1. a web page conversion method, it is characterised in that described method includes:
Adopt html document object model that the html file of input is processed, obtain document object model tree;
Described document object model tree is carried out subtree process, obtains, according to result, the page info unit that described subtree is corresponding;
Described page info unit is carried out similarity process, sets up the mapping relations between described page info unit according to similarity result;
According to described mapping relations, page info to be converted is injected in target pages along described mapping relations, completes conversion of page.
2. web page conversion method according to claim 1, it is characterised in that described html file includes html file to be converted and target html file.
3. web page conversion method according to claim 1, it is characterised in that described described document object model tree is carried out subtree process, obtains, according to result, the page info unit that described subtree is corresponding, including:
Described document object model tree is carried out traversal processing, obtains the leaf node of described document object model tree;
Determining that described leaf node current depth is d, according to described degree of depth d, expansion depth is the leaf node of d-1;
Calculate the frequency of occurrence in described document object model tree of the subtree in the leaf node that the described degree of depth is d-1;
If judging when described frequency of occurrence is not less than threshold value, then continue to extend the described leaf node degree of depth, if described frequency of occurrence is less than threshold value, then export described subtree as DOM Document Object Model subtree;
Obtain the page info unit that described DOM Document Object Model subtree is corresponding.
4. web page conversion method according to claim 1, it is characterised in that described described page info unit is carried out similarity process, sets up the mapping relations between described page info unit according to similarity result, including:
Described page info unit is carried out similarity process, obtains two page info units that in described page info unit, similarity is the highest;
It is associated two the highest for described similarity page info units processing, obtains the mapping relations between page info unit.
5. The web page conversion method according to claim 1, characterized in that injecting, according to the mapping relationships between the page information units, the page information to be converted into the target page along the mapping relationships comprises:
Processing the document object model subtree to obtain the minimum repeating unit of the document object model subtree;
Injecting the page information to be converted, along the mapping relationships, into the minimum repeating unit of the target page to obtain a converted minimum repeating unit;
Performing the page conversion according to the converted minimum repeating unit, thereby completing the page conversion.
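The injection step of claim 5 might look roughly like the sketch below, which uses `xml.etree.ElementTree` on a well-formed fragment. The names `inject`, `records`, and `field_map` are hypothetical, and treating the first child of the target list as the minimum repeating unit is a simplifying assumption rather than the procedure described in the patent.

```python
import copy
import xml.etree.ElementTree as ET


def inject(target_list, records, field_map):
    """Rebuild target_list by filling one copy of its repeating unit per record."""
    # Assumption: the first child is the minimum repeating unit of the subtree.
    template = copy.deepcopy(target_list[0])
    for child in list(target_list):
        target_list.remove(child)
    for record in records:
        unit = copy.deepcopy(template)
        for field_name, (path, attr) in field_map.items():
            node = unit.find(path)
            if attr is None:
                node.text = record[field_name]      # inject as element text
            else:
                node.set(attr, record[field_name])  # inject as attribute value
        target_list.append(unit)
    return target_list


# Hypothetical target page fragment with two placeholder items.
target = ET.fromstring(
    '<ul><li><a href="#">placeholder</a></li>'
    '<li><a href="#">placeholder</a></li></ul>'
)
# Hypothetical information extracted from the page to be converted.
records = [
    {"title": "First", "url": "/1"},
    {"title": "Second", "url": "/2"},
    {"title": "Third", "url": "/3"},
]
# Mapping from source fields to (path inside the unit, attribute or None).
field_map = {"title": ("a", None), "url": ("a", "href")}

inject(target, records, field_map)
print(ET.tostring(target, encoding="unicode"))
```

Working on a single repeating unit means the target list can grow or shrink to match the number of source records while keeping the target page's markup for each item.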
6. A web page conversion system, characterized in that the system comprises:
A document processing module, configured to process an input HTML file using the HTML document object model to obtain a document object model tree;
A page information unit acquisition module, configured to perform subtree processing on the document object model tree and to obtain, according to the processing result, the page information units corresponding to the subtrees;
A mapping establishment module, configured to perform similarity processing on the page information units and to establish mapping relationships between the page information units according to the similarity results;
A page conversion module, configured to inject, according to the mapping relationships, the page information to be converted into the target page along the mapping relationships, thereby completing the page conversion.
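As an illustration of the first module in claim 6, the sketch below builds a simple tree from HTML text with Python's standard `html.parser`. The `DomNode` and `DomBuilder` classes and the `parse_html` helper are hypothetical names, and the handling of void elements and stray end tags is deliberately simplistic compared with a real browser DOM.

```python
from html.parser import HTMLParser

# Elements that never have children or end tags in HTML.
VOID_TAGS = frozenset(
    "area base br col embed hr img input link meta source track wbr".split()
)


class DomNode:
    """One element of the simplified document object model tree."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = attrs or {}
        self.parent = parent
        self.children = []
        self.text = ""


class DomBuilder(HTMLParser):
    """Builds a DomNode tree while the parser streams through the markup."""

    def __init__(self):
        super().__init__()
        self.root = DomNode("#document")
        self._current = self.root

    def handle_starttag(self, tag, attrs):
        node = DomNode(tag, dict(attrs), self._current)
        self._current.children.append(node)
        if tag not in VOID_TAGS:
            self._current = node  # descend into the new element

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self._current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self._current = node.parent

    def handle_data(self, data):
        if data.strip():
            self._current.text += data.strip()


def parse_html(html):
    """Return the root of the tree built from an HTML string."""
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


root = parse_html(
    "<html><body><ul><li><a href='/x'>X</a></li>"
    "<li><a href='/y'>Y</a></li></ul><br></body></html>"
)
body = root.children[0].children[0]
print([child.tag for child in body.children])
```

In the system of claim 6, both the HTML file to be converted and the target HTML file would pass through this stage, and the resulting trees would feed the page information unit acquisition module.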
7. The web page conversion system according to claim 6, characterized in that the HTML file comprises an HTML file to be converted and a target HTML file.
8. The web page conversion system according to claim 6, characterized in that the page information unit acquisition module comprises:
A traversal processing unit, configured to traverse the document object model tree to obtain the leaf nodes of the document object model tree;
A depth determination and extension unit, configured to determine that the current depth of a leaf node is d and, based on the depth d, to extend the leaf node to depth d-1;
An occurrence counting unit, configured to calculate the number of occurrences, in the document object model tree, of the subtree under the leaf node extended to depth d-1;
A judging unit, configured to continue extending the leaf node depth if the number of occurrences is not less than a threshold, and to output the subtree as a document object model subtree if the number of occurrences is less than the threshold;
A page information unit acquisition unit, configured to obtain the page information unit corresponding to the document object model subtree.
9. The web page conversion system according to claim 6, characterized in that the mapping establishment module comprises:
A similarity acquisition unit, configured to perform similarity processing on the page information units to obtain the two page information units with the highest similarity;
A mapping establishment unit, configured to associate the two page information units with the highest similarity to obtain the mapping relationship between the page information units.
10. The web page conversion system according to claim 6, characterized in that the page conversion module comprises:
A subtree processing unit, configured to process the document object model subtree to obtain the minimum repeating unit of the document object model subtree;
An information injection unit, configured to inject, according to the mapping relationships, the page information to be converted into the minimum repeating unit of the target page along the mapping relationships to obtain a converted minimum repeating unit;
A page conversion unit, configured to perform the page conversion according to the converted minimum repeating unit, thereby completing the page conversion.
CN201610154451.XA 2016-03-16 2016-03-16 Web page conversion method and system Active CN105740475B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610154451.XA CN105740475B (en) 2016-03-16 2016-03-16 Web page conversion method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610154451.XA CN105740475B (en) 2016-03-16 2016-03-16 Web page conversion method and system

Publications (2)

Publication Number Publication Date
CN105740475A true CN105740475A (en) 2016-07-06
CN105740475B CN105740475B (en) 2020-04-28

Family

ID=56251099

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610154451.XA Active CN105740475B (en) 2016-03-16 2016-03-16 Web page conversion method and system

Country Status (1)

Country Link
CN (1) CN105740475B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1295293A (en) * 1999-11-05 2001-05-16 国际商业机器公司 Method and system for selecting envelope from wide world web service device for users
CN101261632A (en) * 2008-04-08 2008-09-10 杭州电子科技大学 FrontPage operation paper evaluation method based on HTML grammar tree
CN102890681A (en) * 2011-07-20 2013-01-23 阿里巴巴集团控股有限公司 Method and system for generating webpage structure template
US20130091414A1 (en) * 2011-10-11 2013-04-11 Omer BARKOL Mining Web Applications
US20140236968A1 (en) * 2011-10-31 2014-08-21 Li-Mei Jiao Discrete Wavelet Transform Method for Document Structure Similarity
CN104866527A (en) * 2015-04-24 2015-08-26 美通云动(北京)科技有限公司 Dynamic webpage template matching method and device

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106991131A (en) * 2017-03-08 2017-07-28 陕西识代运筹信息科技股份有限公司 Data processing method and device
CN107862328A (en) * 2017-10-31 2018-03-30 平安科技(深圳)有限公司 Information element set generation method and rule execution method based on rule engine
WO2019085075A1 (en) * 2017-10-31 2019-05-09 平安科技(深圳)有限公司 Information element set generation method and rule execution method based on rule engine

Also Published As

Publication number Publication date
CN105740475B (en) 2020-04-28

Similar Documents

Publication Publication Date Title
CN106547739B (en) Text semantic similarity analysis method
US11995409B2 (en) Content generation using target content derived modeling and unsupervised language modeling
CN108681557B (en) Short text topic discovery method and system based on self-expansion representation and similar bidirectional constraint
WO2019041521A1 (en) Apparatus and method for extracting user keyword, and computer-readable storage medium
Zheng et al. Template-independent news extraction based on visual consistency
CN107992542A (en) Similar article recommendation method based on topic model
CN103810251A (en) Method and device for extracting text
WO2024078105A1 (en) Method for extracting technical problem in patent literature and related device
CN104281565A (en) Semantic dictionary constructing method and device
CN113867694A (en) Method and system for intelligently generating front-end code
CN114997288A (en) Design resource association method
CN105095285B (en) Method and apparatus for processing digital publication navigation catalogues
CN103092973B (en) Information extraction method and device
CN113434659B (en) Implicit conflict sensing method in collaborative design process
CN105740475A (en) Web page transformation method and system
CN113239256B (en) Method for generating website signature, method and device for identifying website
CN104794209A (en) Chinese microblog sentiment classification method and system based on Markov logic network
CN105320641B (en) Text verification method and user terminal
CN117111909A (en) Code automatic generation method, system, computer equipment and storage medium
CN115269923A (en) Method, system, equipment and medium for processing webpage text area and text information
Pu et al. A vision-based approach for deep web form extraction
YesuRaju et al. A language independent web data extraction using vision based page segmentation algorithm
CN112926318A (en) Method for extracting new sentiment words of online shopping comments based on syntactic analysis
CN106802914B (en) Heuristic multi-feature rule set web page segmentation method
Zhang et al. Automatic web news extraction based on DS theory considering content topics

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant