CN108021598A

CN108021598A - Page extraction template matching method, device and server

Info

Publication number: CN108021598A
Application number: CN201610977262.2A
Authority: CN
Inventors: 吴伟勇
Original assignee: Guangzhou Dongjing Computer Technology Co Ltd
Current assignee: Alibaba China Co Ltd
Priority date: 2016-11-04
Filing date: 2016-11-04
Publication date: 2018-05-11
Anticipated expiration: 2036-11-04
Also published as: CN108021598B

Abstract

The present invention provides a page extraction template matching method, device and server. The method includes: obtaining the page data of a target page and extracting a feature field from the page data; generating a matching tag for the target page according to the feature field; using the matching tag to look up, in a preset page extraction template library, the preset extraction template corresponding to the target page; and marking the target page with a template matching result according to the lookup result. Because the preset extraction template is looked up by a matching tag generated from the feature field, the range of candidate templates is greatly reduced, the target page is matched to a preset extraction template more quickly and accurately, and the efficiency of the page transcoding service is improved.

Description

Page extraction template matching method, device and server

Technical field

The present invention relates to the technical field of network information, and in particular to a page extraction template matching method, device and server.

Background art

With the rapid spread of smart mobile terminals, users commonly browse website pages on mobile devices. However, many of the pages that major websites provide are still web pages designed for PC (personal computer) browsers, and comparatively few are WAP (Wireless Application Protocol) pages suited to mobile browsers. Because of factors such as the screen size and data traffic of mobile terminals, browsing traditional web pages directly in a mobile browser gives a poor user experience. For this reason, when a user browses a website page, the search engine or browser often transcodes the page using page transcoding technology so that the page is suitable for display in a mobile browser.

In a transcoding service, the server generally uses an extraction template to extract the content of the page the user wants to browse, filters that content, and then lays it out again for display to the user. The resulting layout is better suited to display on a mobile terminal, saves the user data traffic, and improves page response speed.

The inventor found through research that, because the number of websites is now very large, a traditional page transcoding scheme must try multiple preset extraction templates one after another on the pages of a newly added website in order to judge whether each template suits those pages. This template matching process is time-consuming and occupies considerable server computing resources.

Summary of the invention

In view of this, in order to match website pages to extraction templates more quickly and accurately, an object of the present invention is to provide a page extraction template matching method applied to a server, the method including:

obtaining the page data of a target page and extracting a feature field from the page data;

generating a matching tag of the target page according to the feature field;

using the matching tag to look up, in a preset page extraction template library, the preset extraction template corresponding to the target page; and

marking the target page with a template matching result according to the lookup result.

Another object of the present invention is to provide a page extraction template matching device applied to a server, the device including:

a page feature acquisition module, configured to obtain the page data of a target page and extract a feature field from the page data;

a matching tag generation module, configured to generate a matching tag of the target page according to the feature field;

a lookup module, configured to use the matching tag to look up, in a preset page extraction template library, the preset extraction template corresponding to the target page; and

a matching result marking module, configured to mark the target page with a template matching result according to the lookup result.

Another object of the present invention is to provide a server, the server including:

a memory;

a processor; and

a page extraction template matching device, which is installed in the memory and includes one or more software function modules executed by the processor, the device including:

Compared with the prior art, the present invention has the following advantages:

The page extraction template matching method, device and server provided by the present invention extract the feature field from the page file of the target page, generate a matching tag according to the feature field, and use the matching tag to look up the preset extraction template corresponding to the target page. This greatly reduces the range of candidate preset extraction templates, matches the target page to a preset extraction template more quickly and accurately, and improves the efficiency of the page transcoding service.

Brief description of the drawings

To explain the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only certain embodiments of the present invention and should therefore not be regarded as limiting its scope. A person of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.

Fig. 1 is a schematic diagram of the server provided by an embodiment of the present invention;

Fig. 2 is a flow diagram of the page extraction template matching method provided by an embodiment of the present invention;

Fig. 3 is a flow diagram of the sub-steps of step S110 shown in Fig. 2;

Fig. 4 is a functional block diagram of the page extraction template matching device provided by an embodiment of the present invention;

Fig. 5 is a block diagram of the submodules included in the page feature acquisition module shown in Fig. 4;

Fig. 6 is a block diagram of the submodules included in the matching result marking module shown in Fig. 4.

Reference numerals: 100 - server; 110 - page extraction template matching device; 111 - page feature acquisition module; 1111 - head tag extraction submodule; 1112 - non-feature field rejection submodule; 112 - matching tag generation module; 1121 - first marking submodule; 1122 - second marking submodule; 113 - lookup module; 114 - matching result marking module; 120 - memory; 130 - processor; 140 - communication unit.

Detailed description of the embodiments

To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. The components of the embodiments, as generally described and illustrated in the drawings herein, may be arranged and designed in a variety of different configurations.

Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.

It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.

In the description of the present invention, it should be noted that the terms "first", "second", "third" and the like are used only to distinguish descriptions and are not to be understood as indicating or implying relative importance.

It should also be noted that, unless otherwise expressly specified and limited, the terms "arranged", "installed", "coupled" and "connected" are to be understood broadly. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; and it may be direct, indirect through an intermediary, or internal to two elements. A person of ordinary skill in the art can understand the specific meaning of these terms in the present invention according to the specific situation.

Please refer to Fig. 1, a schematic diagram of the server 100 provided by an embodiment of the present invention. The server 100 includes a page extraction template matching device 110, a memory 120, a processor 130 and a communication unit 140.

The memory 120, the processor 130 and the communication unit 140 are electrically connected to one another, directly or indirectly, to transmit or exchange data. For example, these elements may be electrically connected through one or more communication buses or signal lines. The page extraction template matching device 110 includes at least one software function module that can be stored in the memory 120 in the form of software or firmware, or built into the operating system (OS) of the server 100. The processor 130 executes the executable modules stored in the memory 120, such as the software function modules and computer programs included in the page extraction template matching device 110.

The memory 120 may be, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or the like. The memory 120 stores a program, and the processor 130 executes the program after receiving an execution instruction.

The communication unit 140 establishes a communication connection between the server 100 and other terminals for data transmission and interaction.

Please refer to Fig. 2, a flow diagram of the page extraction template matching method applied to the server 100 shown in Fig. 1. The method includes the following steps.

Step S110: obtain the page data of the target page and extract the feature field from the page data.

In general, the pages of a website are standard HTML files written in the HTML language. The inventor found through research that small and medium-sized websites of the same type with similar display structures often have the same or similar HTML data structures. For websites with the same or similar HTML data structures, the same extraction template can be used to extract page content.

In addition, HTML files with the same or similar data structures generally contain the same feature field. In this embodiment, the server 100 therefore extracts the feature field from the HTML file of the target page and uses it to look up the corresponding page extraction template.

Specifically, referring to Fig. 3, in this embodiment step S110 may include sub-step S111 and sub-step S112, which are described in detail below.

Sub-step S111: obtain the HTML page file of the target page and extract the head tag from the HTML page file.

Specifically, an HTML file is generally divided into a head section and a body section.

The head tag defines the scripts the HTML file references, tells the browser where to find style sheets, provides meta-information, and so on. It describes the attributes and information of the HTML file, including the title displayed for the page, the page's location on the network, and its relationship with other files. For websites with the same or similar data structures, the head tags of their HTML files generally have the same structure and similar content.

In this embodiment, the server 100 therefore extracts the head tag of the HTML file and obtains the feature field in the head tag through sub-step S112 below.
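As an illustration of sub-step S111, the following is a minimal Python sketch that pulls the head element out of an HTML string. The function name `extract_head` and the regular-expression approach are assumptions made for this example; the source does not say how the server 100 parses the page.

```python
import re

# Matches the first <head>...</head> element, case-insensitively, across lines.
HEAD_RE = re.compile(r"<head\b[^>]*>.*?</head\s*>", re.IGNORECASE | re.DOTALL)


def extract_head(html: str) -> str:
    """Return the head element of an HTML document, or '' if there is none."""
    match = HEAD_RE.search(html)
    return match.group(0) if match else ""


page = "<html><head><title>Chapter 1</title></head><body><p>text</p></body></html>"
print(extract_head(page))  # <head><title>Chapter 1</title></head>
```

A regular expression is enough for a sketch; a production system would more likely use a tolerant HTML parser, since real pages are often malformed.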

Sub-step S112: reject the non-feature fields from the head tag and use the remaining fields as the feature field of the target page.

Specifically, taking a novel website as an example, its head tag is as follows:

<head>

<title>【Page title】</title>

</head>

The head tag contains several different sub-tags, such as the title sub-tag and the metadata (meta) sub-tag.

The title sub-tag defines the title of the page; the text inside it is shown in the browser's title bar and on the taskbar (such as the Microsoft Windows taskbar). The content of the title sub-tag usually differs between websites.

The meta sub-tag provides meta-information about the page, such as the page description and keywords used by search engines, and the update frequency. In a meta sub-tag, the keywords parameter gives the keywords of the page, and the description parameter gives a brief summary of the page's main content. The content values of the keywords and description parameters generally differ between the head tags of different websites.

The inventor found through research that the head tags of HTML files with the same or similar structures are identical except for the text in the title sub-tag and the content of the keywords and description parameters in the meta sub-tags. In this embodiment, after extracting the content of the head tag, the server 100 therefore rejects the text in the title sub-tag and the content of the keywords and description parameters in the meta sub-tags, and uses the remainder of the head tag as the feature field.

In this way, through step S110 the server 100 can extract the feature field shared by pages with the same or similar HTML data structures, which serves as the basis for lookup and matching in the subsequent steps.
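The rejection described in sub-step S112 can be sketched as follows. This is only an illustration under stated assumptions: the class `FeatureFieldBuilder`, the function `feature_field`, the way attributes are re-serialized, and the two sample head tags are all hypothetical and not taken from the source. The sketch drops the title text and blanks the content of the keywords and description meta parameters, then returns what is left.

```python
from html.parser import HTMLParser

# meta names whose content value differs from site to site (per the description)
VARIABLE_META = {"keywords", "description"}


class FeatureFieldBuilder(HTMLParser):
    """Rebuilds head markup while dropping the site-specific parts."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        name = (dict(attrs).get("name") or "").lower()
        rendered = []
        for key, value in attrs:
            if tag == "meta" and key == "content" and name in VARIABLE_META:
                value = ""  # reject keywords/description content
            rendered.append(key if value is None else f'{key}="{value}"')
        self.parts.append("<" + " ".join([tag] + rendered) + ">")

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Title text is rejected; whitespace-only runs are treated as layout noise.
        if not self.in_title and data.strip():
            self.parts.append(data.strip())


def feature_field(head_html: str) -> str:
    """Return the head markup with the non-feature fields removed."""
    builder = FeatureFieldBuilder()
    builder.feed(head_html)
    builder.close()
    return "".join(builder.parts)


site_a = ('<head><title>Site A</title><meta name="keywords" content="a,b">'
          '<link rel="stylesheet" href="/s.css"></head>')
site_b = ('<head><title>Site B</title><meta name="keywords" content="x,y">'
          '<link rel="stylesheet" href="/s.css"></head>')
print(feature_field(site_a) == feature_field(site_b))  # True
```

The two sample pages differ only in title text and keywords, so they reduce to the same feature field, which is the property the later lookup relies on.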

Step S120: generate the matching tag of the target page according to the feature field.

Specifically, the server 100 converts the feature field into a corresponding hash value and uses the hash value as the matching tag of the target page. In this embodiment, an MD5 operation may be performed on the feature field and the result Base64-encoded; the encoded hash value is used as the matching tag.

In this way, through step S120 the server 100 converts the relatively large feature field into a relatively small matching tag, which makes it easier for the server 100 to use the matching tag in the subsequent steps to look up the preset extraction template that matches the target page.
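A minimal sketch of step S120 using Python's standard `hashlib` and `base64` modules is shown below. The source states only that the feature field is hashed with MD5 and the result is Base64-encoded; encoding the text as UTF-8 and Base64-encoding the raw 16-byte digest (rather than its hexadecimal form) are assumptions of this example, as is the function name `matching_tag`.

```python
import base64
import hashlib


def matching_tag(feature_field: str) -> str:
    """Hash a feature field with MD5 and Base64-encode the digest."""
    # Assumption: UTF-8 text and the raw 16-byte digest are used.
    digest = hashlib.md5(feature_field.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


tag = matching_tag('<head><title></title><link rel="stylesheet" href="/s.css"></head>')
print(len(tag))  # 24
```

Under these assumptions the tag is always 24 characters long regardless of how large the feature field is, which is what makes comparing tags cheaper than comparing feature fields directly.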

Step S130: use the matching tag to look up, in the preset page extraction template library, the preset extraction template corresponding to the target page.

For the specific HTML data structures of different types of websites, corresponding preset extraction templates are stored on the server 100. Each preset extraction template carries a template tag, which is obtained from the corresponding HTML data structure in the manner described in steps S110 and S120 of this embodiment; the details are not repeated here.

The server 100 searches the preset page extraction template library sequentially for a template tag identical to the matching tag of the target page, and takes the preset extraction template corresponding to that template tag as the preset extraction template matching the target page.
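The lookup in step S130 can be sketched as a sequential scan over the template library. The `ExtractionTemplate` class, its fields, and the sample tags below are hypothetical names introduced for illustration; the source does not describe how templates are stored.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExtractionTemplate:
    """Hypothetical record for one preset extraction template."""
    name: str
    template_tag: str  # produced the same way as a page's matching tag
    extraction_items: dict = field(default_factory=dict)


def find_template(library: List[ExtractionTemplate], tag: str) -> Optional[ExtractionTemplate]:
    """Scan the library in order and return the template whose tag equals `tag`."""
    for template in library:
        if template.template_tag == tag:
            return template
    return None  # no preset extraction template matches


library = [
    ExtractionTemplate("novel-reader", "tag-A"),
    ExtractionTemplate("news-article", "tag-B"),
]
print(find_template(library, "tag-B").name)  # news-article
print(find_template(library, "tag-Z"))       # None
```

The scan mirrors the sequential search in the description. Because the comparison is an exact match on a short string, an implementation could equally keep the library in a dictionary keyed by template tag; that is a design option, not something the source specifies.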

Step S140: mark the target page with a template matching result according to the lookup result.

In this embodiment, the server 100 marks the target page with a matching result according to the lookup result of step S130, as follows.

If no preset extraction template matching the target page is found in the preset page extraction template library, the target page is marked as a page without an adapted extraction template. This indicates that no preset extraction template matches the target page, and a new extraction template must be created according to the HTML data structure of the target page.

If a preset extraction template matching the target page is found in the preset page extraction template library, extraction verification is performed on the target page with the found template, and the target page is marked according to the verification result.

Specifically, the preset extraction template includes multiple extraction items for extracting different page content. The server 100 extracts page content from the target page using the found preset extraction template and verifies whether the target page contains page data corresponding to the multiple extraction items.

If the target page contains page data corresponding to the multiple extraction items, verification passes. The server 100 then marks the target page as a page with an adapted extraction template and records the correspondence between the domain name of the target page and the preset extraction template, so that the next time the target page needs to be displayed, the corresponding preset extraction template can be called directly to extract the page content, which is then rendered and displayed.

If the target page does not contain page data corresponding to the multiple extraction items, verification fails, and the server 100 marks the target page as a page without an adapted extraction template.
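The marking logic of step S140 can be sketched as below. Everything concrete here is an assumption for illustration: extraction items are modeled as regular expressions, the template is a plain dictionary, and `mark_page`, `domain_templates`, the result strings and the example URLs are hypothetical names. The source does not specify how extraction items are expressed.

```python
import re
from typing import Optional
from urllib.parse import urlparse

NO_ADAPTED_TEMPLATE = "no adapted extraction template"
HAS_ADAPTED_TEMPLATE = "adapted extraction template"

# Hypothetical record of domain name -> template name for verified pages.
domain_templates = {}


def mark_page(url: str, html: str, template: Optional[dict]) -> str:
    """Mark a page with its template matching result."""
    if template is None:
        # The lookup found nothing: a new template has to be created.
        return NO_ADAPTED_TEMPLATE
    # Extraction verification: every extraction item must yield page data.
    for pattern in template["extraction_items"].values():
        if re.search(pattern, html, re.DOTALL) is None:
            return NO_ADAPTED_TEMPLATE
    # Verification passed: remember which template serves this domain.
    domain_templates[urlparse(url).hostname] = template["name"]
    return HAS_ADAPTED_TEMPLATE


novel_template = {
    "name": "novel-reader",
    "extraction_items": {
        "chapter_title": r"<h1[^>]*>.+?</h1>",
        "chapter_text": r'<div id="content">.+?</div>',
    },
}
good = '<h1>Chapter 1</h1><div id="content">Once upon a time</div>'
bad = "<h1>Chapter 1</h1>"

print(mark_page("http://novels.example.com/1.html", good, novel_template))
print(mark_page("http://other.example.com/1.html", bad, novel_template))
print(domain_templates)
```

Recording the domain-to-template correspondence only after verification passes means a later request for the same domain can skip the lookup and go straight to extraction.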

Referring to Fig. 4, this embodiment also provides a page extraction template matching device 110 applied to the server 100. The page extraction template matching device 110 includes a page feature acquisition module 111, a matching tag generation module 112, a lookup module 113 and a matching result marking module 114.

The page feature acquisition module 111 is configured to obtain the page data of the target page and extract the feature field from the page data. In this embodiment, the page feature acquisition module 111 may be used to perform step S110 shown in Fig. 2; for details, refer to the description of step S110.

Specifically, referring to Fig. 5, the page feature acquisition module 111 includes a head tag extraction submodule 1111 and a non-feature field rejection submodule 1112.

The head tag extraction submodule 1111 is configured to obtain the HTML page file of the target page and extract the head tag from the HTML page file. In this embodiment, the head tag extraction submodule 1111 may be used to perform sub-step S111 shown in Fig. 3; for details, refer to the description of sub-step S111.

The non-feature field rejection submodule 1112 is configured to reject the non-feature fields from the head tag and use the remaining fields as the feature field of the target page.

In this embodiment, the non-feature fields rejected by the non-feature field rejection submodule 1112 include the text in the title sub-tag of the head tag and the content of the page keyword parameter and the page description parameter in the metadata sub-tags. The non-feature field rejection submodule 1112 may be used to perform sub-step S112 shown in Fig. 3; for details, refer to the description of sub-step S112.

The matching tag generation module 112 is configured to generate the matching tag of the target page according to the feature field.

Specifically, the matching tag generation module 112 generates the matching tag as follows:

The feature field is converted into a corresponding hash value, and the hash value is used as the matching tag of the target page. In this embodiment, the matching tag generation module 112 may be used to perform step S120 shown in Fig. 2; for details, refer to the description of step S120.

The lookup module 113 is configured to use the matching tag to look up, in the preset page extraction template library, the preset extraction template corresponding to the target page.

Specifically, in this embodiment, each preset extraction template has a template tag corresponding to that template. The lookup module 113 looks up the preset extraction template matching the target page as follows:

The preset page extraction template library is searched sequentially for a template tag identical to the matching tag of the target page, and the preset extraction template corresponding to that template tag is taken as the preset extraction template matching the target page. In this embodiment, the lookup module 113 may be used to perform step S130 shown in Fig. 2; for details, refer to the description of step S130.

The matching result marking module 114 is configured to mark the target page with a template matching result according to the lookup result. In this embodiment, the matching result marking module 114 may be used to perform step S140 shown in Fig. 2; for details, refer to the description of step S140.

Specifically, referring to Fig. 6, the matching result marking module 114 may include a first marking submodule 1121 and a second marking submodule 1122.

The first marking submodule 1121 is configured to mark the target page as a page without an adapted extraction template when no preset extraction template matching the target page is found in the preset page extraction template library.

The second marking submodule 1122 is configured to, when a preset extraction template matching the target page is found in the preset page extraction template library, perform extraction verification on the target page with the found template and mark the target page according to the verification result.

In this embodiment, the preset extraction template includes multiple extraction items for extracting different page content. The second marking submodule 1122 performs extraction verification on the target page with the found preset extraction template and marks the target page according to the verification result as follows:

Page content is extracted from the target page using the found preset extraction template, and it is verified whether the target page contains page data corresponding to the multiple extraction items;

If the target page contains page data corresponding to the multiple extraction items, the target page is marked as a page with an adapted extraction template, and the correspondence between the domain name of the target page and the preset extraction template is recorded;

If the target page does not contain page data corresponding to the multiple extraction items, the target page is marked as a page without an adapted extraction template.

In summary, the page extraction template matching method, device and server 100 provided by the present invention extract the feature field from the page file of the target page, generate a matching tag according to the feature field, and use the matching tag to look up the preset extraction template corresponding to the target page. This greatly reduces the range of candidate preset extraction templates, matches the target page to a preset extraction template more quickly and accurately, and improves the efficiency of the page transcoding service.

In the embodiments provided in this application, it should be understood that the disclosed device and method may also be implemented in other ways. The device embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the drawings show the possible architecture, functions and operations of devices, methods and computer program products according to multiple embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should further be noted that each block in the block diagrams and/or flowcharts, and combinations of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.

In addition, the function modules in the embodiments of the present invention may be integrated to form one independent part, each module may exist separately, or two or more modules may be integrated to form one independent part.

If the functions are implemented in the form of software function modules and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to perform all or some of the steps of the methods of the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to the process, method, article or device. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.

The foregoing is only a preferred embodiment of the present invention and is not intended to limit it. For a person skilled in the art, the present invention may be modified and varied in various ways. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

The above is merely a specific embodiment of the present invention, but the scope of protection is not limited to it. Any change or replacement that a person familiar with the technical field could readily conceive within the technical scope disclosed by the invention shall be covered by the scope of protection of the present invention. The scope of protection of the present invention shall therefore be determined by the scope of the claims.

Claims

1. A page extraction template matching method, applied to a server, characterized in that the method includes:

obtaining the page data of a target page and extracting a feature field from the page data;

generating a matching tag of the target page according to the feature field;

using the matching tag to look up, in a preset page extraction template library, the preset extraction template corresponding to the target page;

marking the target page with a template matching result according to the lookup result.
2. The method according to claim 1, characterized in that the step of obtaining the page data of the target page and extracting the feature field from the page data includes:

obtaining the HTML page file of the target page and extracting the head tag from the HTML page file;

rejecting non-feature fields from the head tag and using the remaining fields as the feature field of the target page.
3. The method according to claim 2, characterized in that the rejected non-feature fields include:

the text in the title sub-tag of the head tag, and the content of the page keyword parameter and the page description parameter in the metadata sub-tags.
4. The method according to claim 1, characterized in that the step of generating the matching tag of the target page according to the feature field includes:

converting the feature field into a corresponding hash value and using the hash value as the matching tag of the target page.
5. The method according to claim 1, characterized in that each preset extraction template has a template label corresponding to that preset extraction template; and the step of looking up, using the matching label, a preset extraction template corresponding to the target page in a preset page extraction template library comprises:

searching the preset page extraction template library sequentially for a template label consistent with the matching label of the target page, and taking the preset extraction template corresponding to that template label as the preset extraction template matching the target page.
6. The method according to claim 1, characterized in that the step of identifying the target page with a template matching result according to the lookup result comprises:

if no preset extraction template matching the target page is found in the preset page extraction template library, identifying the target page as a page without an adapted extraction template;

if a preset extraction template matching the target page is found in the preset page extraction template library, performing extraction verification on the target page with the preset extraction template found, and identifying the target page accordingly based on the verification result.
7. The method according to claim 6, characterized in that the preset extraction template comprises a plurality of extraction items for extracting different page content; and the step of performing extraction verification on the target page with the preset extraction template found comprises:

performing page content extraction on the target page using the preset extraction template found, and verifying whether the target page includes page data corresponding to the plurality of extraction items;

if the target page includes page data corresponding to the plurality of extraction items, identifying the target page as a page having an adapted extraction template, and recording the correspondence between the domain name of the target page and the preset extraction template;

if the target page does not include page data corresponding to the plurality of extraction items, identifying the target page as a page without an adapted extraction template.
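
The lookup and verification flow of claims 5 to 7 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation. Modelling each extraction item as a regular expression, the `ExtractionTemplate` class, the result strings, and the `domain_map` dictionary are all hypothetical choices made for this example.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional
from urllib.parse import urlparse


@dataclass
class ExtractionTemplate:
    label: str              # template label, compared with the matching label
    items: Dict[str, str]   # extraction item name -> regex (assumed format)


def find_template(
    library: List[ExtractionTemplate], matching_label: str
) -> Optional[ExtractionTemplate]:
    """Searches the library sequentially for a consistent template label."""
    for template in library:
        if template.label == matching_label:
            return template
    return None


def identify_page(
    url: str,
    html: str,
    matching_label: str,
    library: List[ExtractionTemplate],
    domain_map: Dict[str, ExtractionTemplate],
) -> str:
    """Returns the template matching result for one target page."""
    template = find_template(library, matching_label)
    if template is None:
        return "no_adapted_template"
    # Extraction verification: every extraction item must find page data.
    verified = all(
        re.search(pattern, html, re.S) for pattern in template.items.values()
    )
    if verified:
        # Record the correspondence between domain name and template.
        domain_map[urlparse(url).netloc] = template
        return "adapted_template"
    return "no_adapted_template"
```

In this sketch a label match alone does not count as a successful match: a template is accepted only when every extraction item finds data in the page, which corresponds to the verification step in claim 7.
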
8. A page extraction template matching device, applied to a server, characterized in that the device comprises:

a page feature acquisition module, configured to obtain page data of a target page and extract a feature field from the page data;

a matching label generation module, configured to generate a matching label of the target page according to the feature field;

a lookup module, configured to look up, using the matching label, a preset extraction template corresponding to the target page in a preset page extraction template library; and

a matching result identification module, configured to identify the target page with a template matching result according to the lookup result.
9. The device according to claim 8, characterized in that the page feature acquisition module comprises:

a head tag extraction submodule, configured to obtain an HTML page file of the target page and extract the head tag from the HTML page file;

a non-feature field removal submodule, configured to remove non-feature fields from the head tag and take the remaining fields as the feature field of the target page.
10. The device according to claim 9, characterized in that the non-feature fields removed by the non-feature field removal submodule comprise:

the text information in the title sub-tag of the head tag, and the content of the page keyword parameter and the page description parameter in the meta sub-tag of the head tag.
11. The device according to claim 8, characterized in that the manner in which the matching label generation module generates the matching label comprises:

converting the feature field into a corresponding hash value, and taking the hash value as the matching label of the target page.
12. The device according to claim 8, characterized in that each preset extraction template has a template label corresponding to that preset extraction template; and the manner in which the lookup module looks up the preset extraction template matching the target page comprises:

searching the preset page extraction template library sequentially for a template label consistent with the matching label of the target page, and taking the preset extraction template corresponding to that template label as the preset extraction template matching the target page.
13. The device according to claim 8, characterized in that the matching result identification module comprises:

a first identification submodule, configured to identify the target page as a page without an adapted extraction template when no preset extraction template matching the target page is found in the preset page extraction template library;

a second identification submodule, configured to, when a preset extraction template matching the target page is found in the preset page extraction template library, perform extraction verification on the target page with the preset extraction template found, and identify the target page accordingly based on the verification result.
14. The device according to claim 13, characterized in that the preset extraction template comprises a plurality of extraction items for extracting different page content; and the manner in which the second identification submodule performs extraction verification on the target page with the preset extraction template found, and identifies the target page accordingly based on the verification result, comprises:

performing page content extraction on the target page using the preset extraction template found, and verifying whether the target page includes page data corresponding to the plurality of extraction items;

if the target page includes page data corresponding to the plurality of extraction items, identifying the target page as a page having an adapted extraction template, and recording the correspondence between the domain name of the target page and the preset extraction template;

if the target page does not include page data corresponding to the plurality of extraction items, identifying the target page as a page without an adapted extraction template.
15. A server, characterized in that the server comprises:

a memory;

a processor; and

a page extraction template matching device, the device being installed in the memory and comprising one or more software function modules executed by the processor, the device comprising:

a page feature acquisition module, configured to obtain page data of a target page and extract a feature field from the page data;

a matching label generation module, configured to generate a matching label of the target page according to the feature field;

a lookup module, configured to look up, using the matching label, a preset extraction template corresponding to the target page in a preset page extraction template library; and

a matching result identification module, configured to identify the target page with a template matching result according to the lookup result.
16. The server according to claim 15, characterized in that the page feature acquisition module comprises:

a head tag extraction submodule, configured to obtain an HTML page file of the target page and extract the head tag from the HTML page file;

a non-feature field removal submodule, configured to remove non-feature fields from the head tag and take the remaining fields as the feature field of the target page.
17. The server according to claim 16, characterized in that the non-feature fields removed by the non-feature field removal submodule comprise:

the text information in the title sub-tag of the head tag, and the content of the page keyword parameter and the page description parameter in the meta sub-tag of the head tag.
18. The server according to claim 15, characterized in that the manner in which the matching label generation module generates the matching label comprises:

converting the feature field into a corresponding hash value, and taking the hash value as the matching label of the target page.
19. The server according to claim 15, characterized in that the matching result identification module comprises:

a first identification submodule, configured to identify the target page as a page without an adapted extraction template when no preset extraction template matching the target page is found in the preset page extraction template library;

a second identification submodule, configured to, when a preset extraction template matching the target page is found in the preset page extraction template library, perform extraction verification on the target page with the preset extraction template found, and identify the target page accordingly based on the verification result.
20. The server according to claim 19, characterized in that the preset extraction template comprises a plurality of extraction items for extracting different page content; and the manner in which the second identification submodule performs extraction verification on the target page with the preset extraction template found, and identifies the target page accordingly based on the verification result, comprises:

performing page content extraction on the target page using the preset extraction template found, and verifying whether the target page includes page data corresponding to the plurality of extraction items;

if the target page includes page data corresponding to the plurality of extraction items, identifying the target page as a page having an adapted extraction template, and recording the correspondence between the domain name of the target page and the preset extraction template;

if the target page does not include page data corresponding to the plurality of extraction items, identifying the target page as a page without an adapted extraction template.