CN108255891A - Method and device for determining webpage type - Google Patents

Method and device for determining webpage type

Info

Publication number
CN108255891A
Authority
CN
China
Prior art keywords
webpage
type
standard
ratio
web pages
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611270198.0A
Other languages
Chinese (zh)
Other versions
CN108255891B (en)
Inventor
郑立颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gridsum Technology Co Ltd
Priority to CN201611270198.0A
Publication of CN108255891A
Application granted
Publication of CN108255891B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/955: Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9562: Bookmark management
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/951: Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a method for determining the type of a webpage, comprising: obtaining page information of a webpage to be judged; extracting title information from the page information; judging whether the title information contains a preset keyword, the preset keyword being a keyword that contains a webpage type; and, if the title information does not contain the preset keyword, obtaining the webpage type of the webpage to be judged based on the page structure information corresponding to the page information and/or the title information. The invention addresses the low efficiency of the prior art, which relies on manual work to classify webpage types. The invention also discloses a device for determining the type of a webpage.

Description

Method and device for determining webpage type
Technical field
The present invention relates to the technical field of webpage classification, and in particular to a method and device for determining the type of a webpage.
Background
With the rapid development of Internet technology, search engines index more and more webpages, and determining webpage type has become increasingly important. Webpage type refers to the media attribute of a webpage and can be divided into news, forum, blog, mhkc, Q&A, and so on. Webpage type classification has many applications, for example: 1. Brand exposure analysis: by collecting statistics on the URLs (Uniform Resource Locators) where a brand is exposed and analyzing the category of each website, one can learn in which media types the brand is exposed most, which helps the brand owner choose exposure media in a more targeted way. 2. Brand public opinion analysis: by collecting statistics on public opinion about a brand, one can understand the positive and negative coverage of the brand across media types and then respond and publish information more effectively. 3. Web crawling: by identifying the webpage type, different page parsing logic can be determined in advance so that page information is extracted more sensibly. At present, webpage type classification still relies mainly on manual work, which is time-consuming and labor-intensive and clearly cannot keep up with the sharp growth in the number of webpages. How to improve the efficiency of webpage type classification is therefore an urgent problem.
Summary of the invention
In view of the above problems, the present invention provides a method and device for determining the type of a webpage, to solve the problem that the prior art relies on manual work to classify webpage types and is therefore inefficient.
The present invention provides a method for determining the type of a webpage, comprising:
obtaining page information of a webpage to be judged;
extracting title information from the page information;
judging whether the title information contains a preset keyword, the preset keyword being a keyword that contains a webpage type;
if the title information does not contain the preset keyword, obtaining the webpage type of the webpage to be judged based on the page structure information corresponding to the page information and/or the title information.
Preferably, the method further comprises:
if the title information contains the preset keyword, taking the webpage type corresponding to the preset keyword as the webpage type of the webpage to be judged.
Preferably, obtaining the page information of the webpage to be judged comprises:
parsing the webpage to be judged and extracting the domain name of the link corresponding to the webpage to be judged;
simulating access to the uniform resource locator (URL) corresponding to the domain name and crawling the page information of the webpage to be judged.
Preferably, obtaining the webpage type of the webpage to be judged based on the page structure information corresponding to the page information and/or the title information comprises:
obtaining page information of several webpages serving as reference standards under at least one known webpage type;
extracting tag information serving as the reference standard from the page structure information corresponding to the page information of the reference webpages, and counting the quantity of reference tag information under each known webpage type;
extracting at least one piece of tag information from the page structure information corresponding to the page information of the webpage to be judged;
matching each piece of tag information against the reference tag information, and counting the quantity of successfully matched tag information under each known webpage type;
obtaining, for each known webpage type, the ratio of the quantity of successfully matched tag information to the quantity of reference tag information under that known webpage type, and comparing the ratio with a preset ratio;
if the ratio is greater than or equal to the preset ratio, taking the known webpage type corresponding to the ratio as the webpage type of the webpage to be judged.
Preferably, obtaining the webpage type of the webpage to be judged based on the page structure information corresponding to the page information and/or the title information comprises:
obtaining title information of several webpages serving as reference standards under at least one known webpage type;
splitting phrases serving as the reference standard out of the title information of the reference webpages, and counting the quantity of reference phrases under each known webpage type;
splitting at least one phrase out of the title information of the webpage to be judged;
matching each phrase against the reference phrases, and counting the quantity of successfully matched phrases under each known webpage type;
obtaining, for each known webpage type, the ratio of the quantity of successfully matched phrases to the quantity of reference phrases under that known webpage type, and comparing the ratio with a preset ratio;
if the ratio is greater than or equal to the preset ratio, taking the known webpage type corresponding to the ratio as the webpage type of the webpage to be judged.
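The title-based variant above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the source does not say how titles are split into phrases, so the hypothetical `split_phrases` helper simply splits on whitespace and common separators (Chinese titles would need a word segmenter instead), and the ratio is computed over distinct phrases, which is one possible reading of "quantity".

```python
import re


def split_phrases(title):
    """Split a title into lowercase phrases (hypothetical splitting rule)."""
    return [part.lower() for part in re.split(r"[\s\-_|,:]+", title) if part]


def title_match_ratio(title, reference_titles):
    """Ratio of distinct matched phrases to distinct reference phrases."""
    reference = set()
    for reference_title in reference_titles:
        reference.update(split_phrases(reference_title))
    if not reference:
        return 0.0
    matched = set(split_phrases(title)) & reference
    return len(matched) / len(reference)


# Hypothetical reference titles for one known webpage type ("news").
news_titles = ["Daily News - Sports", "World News | Politics"]
ratio = title_match_ratio("Local news: sports roundup", news_titles)
```

The resulting ratio would then be compared with the preset ratio for that webpage type; the threshold value itself is left to the implementer.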
Preferably, obtaining the webpage type of the webpage to be judged based on the page structure information corresponding to the page information and/or the title information comprises:
obtaining page information of several webpages serving as reference standards under at least one known webpage type;
extracting tag information serving as the reference standard from the page structure information corresponding to the page information of the reference webpages, and counting the quantity of reference tag information under each known webpage type;
extracting at least one piece of tag information from the page structure information corresponding to the page information of the webpage to be judged;
matching each piece of tag information against the reference tag information, and counting the quantity of successfully matched tag information under each known webpage type;
obtaining, for each known webpage type, a first ratio of the quantity of successfully matched tag information to the quantity of reference tag information under that known webpage type, and comparing the first ratio with a first preset ratio;
if the first ratio is greater than or equal to the first preset ratio, obtaining title information of several webpages serving as reference standards under the known webpage type corresponding to the first ratio;
splitting phrases serving as the reference standard out of the title information of the reference webpages, and counting the quantity of reference phrases under each known webpage type;
splitting at least one phrase out of the title information of the webpage to be judged;
matching each phrase against the reference phrases, and counting the quantity of successfully matched phrases under each known webpage type;
obtaining, for each known webpage type, a second ratio of the quantity of successfully matched phrases to the quantity of reference phrases under that known webpage type, and comparing the second ratio with a second preset ratio;
if the second ratio is greater than or equal to the second preset ratio, taking the known webpage type corresponding to the second ratio as the webpage type of the webpage to be judged.
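The two-stage variant above (a structure check first, then a title check only for types that pass) can be sketched as a small cascade. The function name `classify_two_stage`, its parameters, and the numeric thresholds are hypothetical; the ratios themselves are assumed to have been computed elsewhere.

```python
from typing import Callable, Dict, Optional


def classify_two_stage(
    tag_ratios: Dict[str, float],
    title_ratio_for: Callable[[str], float],
    first_preset: float,
    second_preset: float,
) -> Optional[str]:
    """Return the first known type that passes both ratio checks, else None."""
    for page_type, first_ratio in tag_ratios.items():
        if first_ratio < first_preset:
            continue  # page structure does not match this known type
        # The title ratio is only computed for types that passed stage one.
        if title_ratio_for(page_type) >= second_preset:
            return page_type
    return None


# Hypothetical first and second ratios for two known webpage types.
tag_ratios = {"news": 0.9, "forum": 0.5}
title_ratios = {"news": 0.4, "forum": 0.9}
result = classify_two_stage(tag_ratios, title_ratios.__getitem__, 0.8, 0.3)
```

Passing the title ratio as a callable keeps the second stage lazy, which mirrors the claim: reference titles are only fetched for the known type whose first ratio met the first preset ratio.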
A device for determining the type of a webpage, comprising:
an acquisition module, configured to obtain page information of a webpage to be judged;
an extraction module, configured to extract title information from the page information;
a judgment module, configured to judge whether the title information contains a preset keyword, the preset keyword being a keyword that contains a webpage type;
a processing module, configured to obtain, when the title information does not contain the preset keyword, the webpage type of the webpage to be judged based on the page structure information corresponding to the page information and/or the title information.
Preferably, the processing module is further configured to take, when the title information contains the preset keyword, the webpage type corresponding to the preset keyword as the webpage type of the webpage to be judged.
Preferably, the acquisition module comprises:
a parsing unit, configured to parse the webpage to be judged and extract the domain name of the link corresponding to the webpage to be judged;
a simulated access unit, configured to simulate access to the uniform resource locator (URL) corresponding to the domain name and crawl the page information of the webpage to be judged.
Preferably, the processing module comprises:
a first acquisition unit, configured to obtain page information of several webpages serving as reference standards under at least one known webpage type;
a first counting unit, configured to extract tag information serving as the reference standard from the page structure information corresponding to the page information of the reference webpages, and to count the quantity of reference tag information under each known webpage type;
a first extraction unit, configured to extract at least one piece of tag information from the page structure information corresponding to the page information of the webpage to be judged;
a first matching unit, configured to match each piece of tag information against the reference tag information, and to count the quantity of successfully matched tag information under each known webpage type;
a first comparison unit, configured to obtain, for each known webpage type, the ratio of the quantity of successfully matched tag information to the quantity of reference tag information under that known webpage type, and to compare the ratio with a preset ratio;
a first output unit, configured to take, if the ratio is greater than or equal to the preset ratio, the known webpage type corresponding to the ratio as the webpage type of the webpage to be judged.
Preferably, the processing module comprises:
a second acquisition unit, configured to obtain title information of several webpages serving as reference standards under at least one known webpage type;
a second counting unit, configured to split phrases serving as the reference standard out of the title information of the reference webpages, and to count the quantity of reference phrases under each known webpage type;
a first splitting unit, configured to split at least one phrase out of the title information of the webpage to be judged;
a second matching unit, configured to match each phrase against the reference phrases, and to count the quantity of successfully matched phrases under each known webpage type;
a second comparison unit, configured to obtain, for each known webpage type, the ratio of the quantity of successfully matched phrases to the quantity of reference phrases under that known webpage type, and to compare the ratio with a preset ratio;
a second output unit, configured to take, if the ratio is greater than or equal to the preset ratio, the known webpage type corresponding to the ratio as the webpage type of the webpage to be judged.
Preferably, the processing module comprises:
a third acquisition unit, configured to obtain page information of several webpages serving as reference standards under at least one known webpage type;
a third counting unit, configured to extract tag information serving as the reference standard from the page structure information corresponding to the page information of the reference webpages, and to count the quantity of reference tag information under each known webpage type;
a second extraction unit, configured to extract at least one piece of tag information from the page structure information corresponding to the page information of the webpage to be judged;
a third matching unit, configured to match each piece of tag information against the reference tag information, and to count the quantity of successfully matched tag information under each known webpage type;
a third comparison unit, configured to obtain, for each known webpage type, a first ratio of the quantity of successfully matched tag information to the quantity of reference tag information under that known webpage type, and to compare the first ratio with a first preset ratio;
a fourth acquisition unit, configured to obtain, if the first ratio is greater than or equal to the first preset ratio, title information of several webpages serving as reference standards under the known webpage type corresponding to the first ratio;
a fourth counting unit, configured to split phrases serving as the reference standard out of the title information of the reference webpages, and to count the quantity of reference phrases under each known webpage type;
a second splitting unit, configured to split at least one phrase out of the title information of the webpage to be judged;
a fourth matching unit, configured to match each phrase against the reference phrases, and to count the quantity of successfully matched phrases under each known webpage type;
a fourth comparison unit, configured to obtain, for each known webpage type, a second ratio of the quantity of successfully matched phrases to the quantity of reference phrases under that known webpage type, and to compare the second ratio with a second preset ratio;
a third output unit, configured to take, if the second ratio is greater than or equal to the second preset ratio, the known webpage type corresponding to the second ratio as the webpage type of the webpage to be judged.
With the above technical solution, when a webpage type needs to be judged, the method provided by the present invention first obtains the page information of the webpage to be judged, then extracts the title information from the obtained page information, and then judges whether the title information contains a preset keyword from which the webpage type can be determined directly. When the title information does not contain a preset keyword, the webpage type of the webpage to be judged is obtained from the page structure information corresponding to the page information and/or the title information. Compared with the prior art, which relies on manual work to classify webpage types, the present invention can classify webpage types automatically and thereby improves the efficiency of webpage type classification.
Description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered a limitation of the present invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 shows a flowchart of method embodiment 1 of the method for determining webpage type disclosed by the present invention;
Fig. 2 shows a flowchart of method embodiment 2 of the method for determining webpage type disclosed by the present invention;
Fig. 3 shows a flowchart of method embodiment 3 of the method for determining webpage type disclosed by the present invention;
Fig. 4 shows a flowchart of method embodiment 4 of the method for determining webpage type disclosed by the present invention;
Fig. 5 shows a structural diagram of device embodiment 1 of the device for determining webpage type disclosed by the present invention;
Fig. 6 shows a structural diagram of device embodiment 2 of the device for determining webpage type disclosed by the present invention;
Fig. 7 shows a structural diagram of device embodiment 3 of the device for determining webpage type disclosed by the present invention;
Fig. 8 shows a structural diagram of device embodiment 4 of the device for determining webpage type disclosed by the present invention.
Detailed description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope can be fully conveyed to those skilled in the art.
Fig. 1 shows a flowchart of method embodiment 1 of the method for determining webpage type disclosed by the present invention. The method may comprise the following steps:
S101, obtaining the page information of the webpage to be judged;
When the webpage type of a webpage needs to be judged, for example whether the webpage is a news webpage or a forum webpage, the page information of the webpage to be judged is obtained first. The page information of the webpage to be judged includes title information and page structure information.
Specifically, one way to obtain the page information of the webpage to be judged is to parse the webpage to be judged, extract the domain name of the link corresponding to the webpage, then simulate access to the uniform resource locator (URL) corresponding to the domain name and crawl the page information of the webpage to be judged. The webpage to be judged can be parsed by parsing its original URL (Uniform Resource Locator). Parsing extracts the domain name from the link of the webpage to be judged, where the domain name can be defined as the string in the URL between the leading "http://" and the first ":" that follows it. For example, if the link of the webpage to be judged is http://example.com:1234/test.htm, the domain name extracted by parsing can be example.com. When simulating access to the URL corresponding to the domain name and crawling the page information of the webpage to be judged, a Python crawler library can be used for the simulated access, or another programming language can be used; the simulated access crawls the information in the page.
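The domain extraction and simulated access described above could look like the following Python sketch. It is an illustration under assumptions, not the patent's implementation: the source only says "a Python crawler library", so the standard library's `urllib` is used here, and the helper names `extract_domain` and `fetch_page` as well as the `User-Agent` value are hypothetical. Note that `urlsplit` is swapped in for the literal "between http:// and the first colon" rule, so URLs without an explicit port are also handled.

```python
from urllib.parse import urlsplit
from urllib.request import Request, urlopen


def extract_domain(url):
    """Return the host part of a URL, without scheme, port, or path."""
    return urlsplit(url).hostname or ""


def fetch_page(url, timeout=10.0):
    """Fetch the raw HTML of a page, sending a browser-like User-Agent."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


domain = extract_domain("http://example.com:1234/test.htm")
```

Here `domain` holds `example.com`, matching the example in the text; `fetch_page` is defined but not called, since it needs network access.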
S102, extracting title information from the page information;
After the page information of the webpage to be judged is obtained, the title information is extracted from the crawled HTML (HyperText Markup Language) page.
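One way to pull the title out of a crawled page is sketched below with Python's standard `html.parser` module. The source does not say which element holds the title, so reading the `<title>` element is an assumption, and the names `TitleParser` and `extract_title` are hypothetical.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text found inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        return "".join(self._parts).strip()


def extract_title(html):
    """Return the page title, or an empty string if none is present."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title
```

A dedicated HTML library could replace this class; the standard-library parser is used only to keep the sketch self-contained.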
S103, judging whether the title information contains a preset keyword, the preset keyword being a keyword that contains a webpage type;
The extracted title information is then checked for a preset keyword from which the webpage type can be determined directly, for example preset keywords such as "ends of the earth", "forum", "news", or "blog".
S104, if the title information does not contain the preset keyword, obtaining the webpage type of the webpage to be judged based on the page structure information corresponding to the page information and/or the title information.
When the extracted title information does not contain a preset keyword, that is, when the webpage type cannot be determined directly from the extracted title information, the webpage to be judged is classified further based on the page structure information of the extracted HTML (HyperText Markup Language) and/or the title information, to obtain its webpage type. In other words, when the title information does not contain a preset keyword, the webpage to be judged can be classified by the page structure information corresponding to the page information, by the title information in the page information, or by both the page structure information and the title information.
It should be noted that when the title information contains a preset keyword, the webpage type corresponding to the preset keyword is taken as the webpage type of the webpage to be judged. For example, if the title information is "As a lipstick addict, come try a shade - Entertainment Gossip - Forum", the title information contains the preset keyword "forum", so the type of the webpage to be judged can be determined as forum.
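The keyword check can be sketched as a simple lookup. The table `PRESET_KEYWORDS` below is hypothetical: the source lists example keywords but does not give the keyword-to-type mapping, and matching case-insensitively is also an assumption.

```python
from typing import Optional

# Hypothetical mapping from preset keyword to webpage type.
PRESET_KEYWORDS = {
    "forum": "forum",
    "news": "news",
    "blog": "blog",
}


def type_from_title(title: str) -> Optional[str]:
    """Return the webpage type named by a preset keyword, or None."""
    lowered = title.lower()
    for keyword, page_type in PRESET_KEYWORDS.items():
        if keyword in lowered:
            return page_type
    # No preset keyword: fall back to structure and/or title matching.
    return None
```

A `None` result corresponds to step S104, where classification continues with the page structure information and/or the title information.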
In summary, in the above embodiment, when a webpage type needs to be judged, the page information of the webpage to be judged is obtained first, the title information is then extracted from the obtained page information, and it is then judged whether the title information contains a preset keyword from which the webpage type can be determined directly. When the title information does not contain a preset keyword, the webpage type of the webpage to be judged is obtained from the page structure information corresponding to the page information and/or the title information. Compared with the prior art, which relies on manual work to classify webpage types, the present invention can classify webpage types automatically and thereby improves the efficiency of webpage type classification.
Fig. 2 shows a flowchart of method embodiment 2 of the method for determining webpage type disclosed by the present invention. The method may comprise the following steps:
S201, obtaining the page information of the webpage to be judged;
When the webpage type of a webpage needs to be judged, for example whether the webpage is a news webpage or a forum webpage, the page information of the webpage to be judged is obtained first. The page information of the webpage to be judged includes title information and page structure information.
Specifically, one way to obtain the page information of the webpage to be judged is to parse the webpage to be judged, extract the domain name of the link corresponding to the webpage, then simulate access to the uniform resource locator (URL) corresponding to the domain name and crawl the page information of the webpage to be judged. The webpage to be judged can be parsed by parsing its original URL (Uniform Resource Locator). Parsing extracts the domain name from the link of the webpage to be judged, where the domain name can be defined as the string in the URL between the leading "http://" and the first ":" that follows it. For example, if the link of the webpage to be judged is http://example.com:1234/test.htm, the domain name extracted by parsing can be example.com. When simulating access to the URL corresponding to the domain name and crawling the page information of the webpage to be judged, a Python crawler library can be used for the simulated access, or another programming language can be used; the simulated access crawls the information in the page.
S202, extracting title information from the page information;
After the page information of the webpage to be judged is obtained, the title information is extracted from the crawled HTML (HyperText Markup Language) page.
S203, judging whether the title information contains a preset keyword, the preset keyword being a keyword that contains a webpage type;
The extracted title information is then checked for a preset keyword from which the webpage type can be determined directly, for example preset keywords such as "ends of the earth", "forum", "news", or "blog".
S204, if the title information does not contain the preset keyword, obtaining page information of several webpages serving as reference standards under at least one known webpage type;
When the title information does not contain a preset keyword, that is, when the webpage type cannot be judged directly from the title information of the webpage to be judged, webpages of at least one known webpage type are obtained first, together with the page information of the webpages of each known webpage type, and the obtained page information is used as the reference standard.
S205, extracting tag information serving as the reference standard from the page structure information corresponding to the page information of the reference webpages, and counting the quantity of reference tag information under each known webpage type;
After the page information of several reference webpages under at least one known webpage type is obtained, the tag information serving as the reference standard is extracted from the page structure information corresponding to the page information of the reference webpages. Each piece of page structure information contains multiple pieces of tag information. Taking a webpage of one known webpage type as an example, the tag information it contains is "meta", "link", "span", "a", and "p". Counting the reference tag information gives 12 "meta" tags, 3 "link" tags, 5 "span" tags, 3 "a" tags, and 3 "p" tags.
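Counting tags in a reference page, as in the example above, can be sketched with the standard library. The names `TagCounter` and `count_tags` are hypothetical, and counting every opening tag by name is an assumption about what the source means by the quantity of tag information.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count how many times each tag name is opened in a page."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also arrive here.
        self.counts[tag] += 1


def count_tags(html):
    """Return a Counter mapping tag name to number of occurrences."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


# Small hypothetical page used only to exercise the counter.
sample = (
    '<html><head><meta charset="utf-8"><meta name="a" content="b">'
    '<link rel="x" href="y"></head>'
    '<body><p>hi <a href="#">x</a></p></body></html>'
)
counts = count_tags(sample)
```

Running the same counter over each reference webpage of a known type would yield per-type totals like the 12 "meta", 3 "link", 5 "span", 3 "a", and 3 "p" in the text.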
S206, extracting at least one piece of tag information from the page structure information corresponding to the page information of the webpage to be judged;
Meanwhile, at least one piece of tag information used to determine the page type is extracted from the page structure information corresponding to the page information of the webpage to be judged, for example "meta" and "div".
S207, matching each piece of tag information against the reference tag information, and counting the quantity of successfully matched tag information under each known webpage type;
Each piece of tag information of the webpage to be judged is matched against the reference tag information. In the example above, matching finds that the tag "meta" of the webpage to be judged matches the reference tag "meta". The quantity of successfully matched tag information under each known webpage type is then counted; in this example the count is 10 "meta" tags.
S208, obtaining, for each known webpage type, the ratio of the quantity of successfully matched tag information to the quantity of reference tag information under that known webpage type, and comparing the ratio with a preset ratio;
The ratio of the quantity of successfully matched tag information to the quantity of reference tag information is then obtained for each known webpage type. In the example above, the ratio for the tag "meta" is 10/12 = 5/6. The obtained ratio is then compared with the preset ratio, where the preset ratio is set flexibly according to actual requirements; the value chosen for the preset ratio affects how accurately the webpage type is determined from the obtained ratio.
S209, if the ratio is greater than or equal to the preset ratio, taking the known webpage type corresponding to the ratio as the webpage type of the webpage to be judged.
When the obtained ratio is greater than or equal to the preset ratio, the known webpage type corresponding to the ratio is taken as the webpage type of the webpage to be judged. In the example above, the webpage type corresponding to the reference webpage containing the tags "meta", "link", "span", "a", and "p" is taken as the webpage type of the webpage to be judged.
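The worked example (12 reference "meta" tags, 10 matched, ratio 5/6) can be reproduced with a short sketch. The source does not specify how ratios for different tags are combined, so this sketch computes one ratio per matched tag, as the example does; the function names, the "div" count of 4, and the preset ratio of 4/5 are all hypothetical.

```python
from fractions import Fraction


def tag_match_ratios(page_counts, reference_counts):
    """Per-tag ratio of matched count to reference count."""
    ratios = {}
    for tag, ref_count in reference_counts.items():
        page_count = page_counts.get(tag, 0)
        if ref_count > 0 and page_count > 0:
            # A page cannot match more tags than the reference contains.
            ratios[tag] = Fraction(min(page_count, ref_count), ref_count)
    return ratios


def meets_threshold(ratio, preset_ratio):
    """True when the ratio is greater than or equal to the preset ratio."""
    return ratio >= preset_ratio


# Reference counts from the text; the "div" count is a made-up value.
reference = {"meta": 12, "link": 3, "span": 5, "a": 3, "p": 3}
page = {"meta": 10, "div": 4}
ratios = tag_match_ratios(page, reference)
accepted = meets_threshold(ratios["meta"], Fraction(4, 5))
```

Using `Fraction` keeps 10/12 exact, so it compares equal to 5/6; the "div" tag produces no ratio because it does not appear in the reference.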
As shown in Fig. 3, which is a flowchart of method embodiment 3 of a method for differentiating webpage types disclosed by the present invention, the method may include the following steps:
S301, obtain the page information of the webpage to be judged;
When the webpage type of a webpage needs to be determined, for example, whether the webpage is a news webpage or a forum webpage, the page information of the webpage to be judged is obtained first. The page information of the webpage to be judged includes title information and page structure information.
Specifically, one way to obtain the page information of the webpage to be judged is to parse the webpage to be judged, extract the domain name from its link, and then simulate access to the uniform resource locator (URL) corresponding to the domain name to crawl the page information of the webpage to be judged. The webpage to be judged can be parsed by parsing its original URL (Uniform Resource Locator). Parsing extracts the domain name from the link of the webpage to be judged, where the domain name can be defined as the character string in the URL between the leading "http://" and the first ":" that follows it. For example, if the link of the webpage to be judged is http://example.com:1234/test.htm, the domain name extracted by parsing can be example.com. When simulating access to the URL corresponding to the domain name to crawl the page information of the webpage to be judged, a Python crawler library can be used, or another programming language can be used; the simulated access crawls the information on the page.
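The domain-name rule described above can be sketched as follows. This is only an illustration of the stated definition (the character string between the leading "http://" and the first ":" that follows); the function name `extract_domain` is hypothetical, and the handling of a URL without an explicit port is an assumption that the text does not cover.

```python
def extract_domain(url):
    """Return the text between the leading "http://" and the next ":"."""
    prefix = "http://"
    if not url.startswith(prefix):
        raise ValueError("URL does not start with http://")
    rest = url[len(prefix):]
    end = rest.find(":")
    if end == -1:
        # Assumption: without an explicit port, stop at the first "/".
        end = rest.find("/")
    return rest if end == -1 else rest[:end]


domain = extract_domain("http://example.com:1234/test.htm")
```

The simulated access itself is not shown; in Python it could, for instance, be done with the standard-library function `urllib.request.urlopen`.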
S302, extract the title information from the page information;
After the page information of the webpage to be judged is obtained, the title information is extracted from the crawled HTML (HyperText Markup Language) page.
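A minimal sketch of extracting the title information from a crawled HTML page is given below. It assumes that the title information is the text of the `<title>` element, which the disclosure does not state explicitly; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collects the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so collect all of them.
        if self._in_title:
            self.parts.append(data)


def extract_title(html):
    """Return the stripped title text, or an empty string if absent."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


page = "<html><head><title> Example news </title></head><body><p>text</p></body></html>"
title = extract_title(page)
```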
S303, judge whether the title information contains a preset keyword, the preset keyword being a keyword that contains a webpage type;
The extracted title information is then checked for a preset keyword that can directly determine the webpage type, for example, a preset keyword such as "Tianya", "forum", "news", or "blog".
S304, if the title information does not contain a preset keyword, obtain the title information of several webpages serving as reference standards under at least one known webpage type;
When the title information does not contain a preset keyword, that is, when the webpage type cannot be determined directly from the title information of the webpage to be judged, webpages of at least one known webpage type are obtained first, together with the title information in the page information of the webpages of each known webpage type, and the obtained title information is used as the reference standard.
S305, split out phrases serving as reference standards from the title information of the reference-standard webpages, and count the number of reference-standard phrases under each known webpage type;
After the title information of several reference-standard webpages under at least one known webpage type is obtained, the title information of the reference-standard webpages is split into phrases that serve as the reference standard. For example, take a webpage of one known webpage type whose title information is "As a lipstick fan, pick a lipstick brand and try a lipstick color". The title is segmented into the reference-standard phrases "lipstick", "brand", and "color", and the number of each reference-standard phrase is counted: "lipstick" occurs 3 times, "brand" once, and "color" once.
S306, split out at least one phrase from the title information of the webpage to be judged;
Meanwhile, at least one phrase is split out from the title information of the webpage to be judged. For example, if the title information of the webpage to be judged is "When choosing a lipstick, there are many types of lipstick", the phrases split out include "lipstick" and "type".
S307, match each phrase against the reference-standard phrases, and count the number of successfully matched phrases under each known webpage type;
Each phrase split out from the title information of the webpage to be judged is matched against the reference-standard phrases. In the above example, matching finds that "lipstick" in the title information of the webpage to be judged matches the reference-standard phrase "lipstick". The number of successfully matched phrases under each known webpage type is then counted; here, 2 occurrences of "lipstick" are matched.
S308, obtain, for each known webpage type, the ratio of the number of successfully matched phrases to the number of reference-standard phrases under that known webpage type, and compare the ratio with a preset ratio;
Next, for each known webpage type, the ratio of the number of successfully matched phrases to the number of reference-standard phrases under that known webpage type is obtained. In the above example, the ratio for the phrase "lipstick" is 2/3. The obtained ratio is then compared with the preset ratio, which can be set flexibly according to actual needs. Of course, the closer the preset ratio is set to the obtained ratio, the more accurate the determined webpage type.
S309, if the ratio is greater than or equal to the preset ratio, take the known webpage type corresponding to the ratio as the webpage type of the webpage to be judged.
When the obtained ratio is greater than or equal to the preset ratio, the known webpage type corresponding to the ratio is taken as the webpage type of the webpage to be judged. In the above example, the webpage type of the webpage whose title information is "As a lipstick fan, pick a lipstick brand and try a lipstick color" is taken as the webpage type of the webpage to be judged.
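The phrase counting and ratio of this embodiment can be sketched as below, using English renderings of the two example titles. The sketch makes several assumptions that are not part of the disclosure: titles are split into lowercase words by a regular expression as a stand-in for a real word segmenter, a phrase counts as matched at most as many times as it occurs in the reference standard, and the function names are hypothetical.

```python
import re
from collections import Counter
from fractions import Fraction


def split_phrases(title):
    """Split a title into lowercase word tokens and count them.

    Assumption: a plain regular expression replaces the word
    segmentation that the original titles would require.
    """
    return Counter(re.findall(r"[a-z]+", title.lower()))


def phrase_ratio(judged, reference, phrase):
    """Matched count of `phrase` divided by its reference-standard count."""
    if reference[phrase] == 0:
        return Fraction(0)
    matched = min(judged[phrase], reference[phrase])
    return Fraction(matched, reference[phrase])


reference = split_phrases(
    "As a lipstick fan, pick a lipstick brand and try a lipstick color"
)
judged = split_phrases(
    "When choosing a lipstick, there are many types of lipstick"
)

ratio = phrase_ratio(judged, reference, "lipstick")  # 2 matched out of 3
```

In practice the reference counts would be accumulated over several reference-standard webpages per known webpage type rather than over a single title.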
As shown in Fig. 4, which is a flowchart of method embodiment 4 of a method for differentiating webpage types disclosed by the present invention, the method may include the following steps:
S401, obtain the page information of the webpage to be judged;
When the webpage type of a webpage needs to be determined, for example, whether the webpage is a news webpage or a forum webpage, the page information of the webpage to be judged is obtained first. The page information of the webpage to be judged includes title information and page structure information.
Specifically, one way to obtain the page information of the webpage to be judged is to parse the webpage to be judged, extract the domain name from its link, and then simulate access to the uniform resource locator (URL) corresponding to the domain name to crawl the page information of the webpage to be judged. The webpage to be judged can be parsed by parsing its original URL (Uniform Resource Locator). Parsing extracts the domain name from the link of the webpage to be judged, where the domain name can be defined as the character string in the URL between the leading "http://" and the first ":" that follows it. For example, if the link of the webpage to be judged is http://example.com:1234/test.htm, the domain name extracted by parsing can be example.com. When simulating access to the URL corresponding to the domain name to crawl the page information of the webpage to be judged, a Python crawler library can be used, or another programming language can be used; the simulated access crawls the information on the page.
S402, extract the title information from the page information;
After the page information of the webpage to be judged is obtained, the title information is extracted from the crawled HTML (HyperText Markup Language) page.
S403, judge whether the title information contains a preset keyword, the preset keyword being a keyword that contains a webpage type;
The extracted title information is then checked for a preset keyword that can directly determine the webpage type, for example, a preset keyword such as "Tianya", "forum", "news", or "blog".
S404, if the title information does not contain a preset keyword, obtain the page information of several webpages serving as reference standards under at least one known webpage type;
When the title information does not contain a preset keyword, that is, when the webpage type cannot be determined directly from the title information of the webpage to be judged, webpages of at least one known webpage type are obtained first, together with the page information of the webpages of each known webpage type, and the obtained page information is used as the reference standard.
S405, extract label information serving as the reference standard from the page structure information corresponding to the page information of the reference-standard webpages, and count the number of pieces of reference-standard label information under each known webpage type;
After the page information of several reference-standard webpages under at least one known webpage type is obtained, label information serving as the reference standard is extracted from the page structure information corresponding to the page information of the reference-standard webpages; each page structure contains multiple pieces of label information. For example, a webpage of one known webpage type contains the label information "meta", "link", "span", "a", and "p". The number of each piece of reference-standard label information is counted: "meta" occurs 12 times, "link" 3 times, "span" 5 times, "a" 3 times, and "p" 3 times.
S406, extract at least one piece of label information from the page structure information corresponding to the page information of the webpage to be judged;
Meanwhile, at least one piece of label information used to determine the page type is extracted from the page structure information corresponding to the page information of the webpage to be judged. For example, "meta" and "div" are extracted.
S407, match each piece of label information against the reference-standard label information, and count the number of successfully matched pieces of label information under each known webpage type;
Each piece of label information of the webpage to be judged is matched against the reference-standard label information. In the above example, matching finds that the label information "meta" of the webpage to be judged matches the reference-standard label information "meta". The number of successfully matched pieces of label information under each known webpage type is then counted; here, 10 occurrences of "meta" are matched.
S408, obtain, for each known webpage type, a first ratio of the number of successfully matched pieces of label information to the number of pieces of reference-standard label information under that known webpage type, and compare the first ratio with a first preset ratio;
Next, for each known webpage type, the first ratio of the number of successfully matched pieces of label information to the number of pieces of reference-standard label information under that known webpage type is obtained. In the above example, the ratio for the label information "meta" is 5/6. The obtained first ratio is then compared with the first preset ratio, which can be set flexibly according to actual needs. Of course, the closer the first preset ratio is set to the obtained ratio, the more accurate the determined webpage type.
S409, if the first ratio is greater than or equal to the first preset ratio, obtain the title information of several webpages serving as reference standards under the known webpage type corresponding to the first ratio;
When the first ratio is greater than or equal to the first preset ratio, webpages of the known webpage type corresponding to the first ratio are further obtained, together with the title information in the page information of the webpages of each such known webpage type, and the obtained title information is used as the reference standard. It should be noted that there may be multiple webpages of the known webpage type corresponding to the first ratio.
S410, split out phrases serving as reference standards from the title information of the reference-standard webpages, and count the number of reference-standard phrases under each known webpage type;
After the title information of several reference-standard webpages under at least one known webpage type is obtained, the title information of the reference-standard webpages is split into phrases that serve as the reference standard. For example, take a webpage of one known webpage type whose title information is "As a lipstick fan, pick a lipstick brand and try a lipstick color". The title is segmented into the reference-standard phrases "lipstick", "brand", and "color", and the number of each reference-standard phrase is counted: "lipstick" occurs 3 times, "brand" once, and "color" once.
S411, split out at least one phrase from the title information of the webpage to be judged;
Meanwhile, at least one phrase is split out from the title information of the webpage to be judged. For example, if the title information of the webpage to be judged is "When choosing a lipstick, there are many types of lipstick", the phrases split out include "lipstick" and "type".
S412, match each phrase against the reference-standard phrases, and count the number of successfully matched phrases under each known webpage type;
Each phrase split out from the title information of the webpage to be judged is matched against the reference-standard phrases. In the above example, matching finds that "lipstick" in the title information of the webpage to be judged matches the reference-standard phrase "lipstick". The number of successfully matched phrases under each known webpage type is then counted; here, 2 occurrences of "lipstick" are matched.
S413, obtain, for each known webpage type, a second ratio of the number of successfully matched phrases to the number of reference-standard phrases under that known webpage type, and compare the second ratio with a second preset ratio;
Next, for each known webpage type, the second ratio of the number of successfully matched phrases to the number of reference-standard phrases under that known webpage type is obtained. In the above example, the ratio for the phrase "lipstick" is 2/3. The obtained second ratio is then compared with the second preset ratio, which can be set flexibly according to actual needs. Of course, the closer the second preset ratio is set to the obtained second ratio, the more accurate the determined webpage type.
S414, if the second ratio is greater than or equal to the second preset ratio, take the known webpage type corresponding to the second ratio as the webpage type of the webpage to be judged.
When the obtained second ratio is greater than or equal to the second preset ratio, the known webpage type corresponding to the second ratio is taken as the webpage type of the webpage to be judged. In the above example, the webpage type of the webpage whose title information is "As a lipstick fan, pick a lipstick brand and try a lipstick color" is taken as the webpage type of the webpage to be judged.
It should be noted that, in the above embodiment, if the second ratio is less than the second preset ratio, the known webpage type corresponding to the first ratio may be taken as the webpage type of the webpage to be judged.
As shown in Fig. 5, which is a structural diagram of device embodiment 1 of a device for differentiating webpage types disclosed by the present invention, the device may include:
Acquisition module 501, configured to obtain the page information of the webpage to be judged;
When the webpage type of a webpage needs to be determined, for example, whether the webpage is a news webpage or a forum webpage, the page information of the webpage to be judged is obtained first. The page information of the webpage to be judged includes title information and page structure information.
Specifically, one way to obtain the page information of the webpage to be judged is to parse the webpage to be judged, extract the domain name from its link, and then simulate access to the uniform resource locator (URL) corresponding to the domain name to crawl the page information of the webpage to be judged. The webpage to be judged can be parsed by parsing its original URL (Uniform Resource Locator). Parsing extracts the domain name from the link of the webpage to be judged, where the domain name can be defined as the character string in the URL between the leading "http://" and the first ":" that follows it. For example, if the link of the webpage to be judged is http://example.com:1234/test.htm, the domain name extracted by parsing can be example.com. When simulating access to the URL corresponding to the domain name to crawl the page information of the webpage to be judged, a Python crawler library can be used, or another programming language can be used; the simulated access crawls the information on the page.
Extraction module 502, configured to extract the title information from the page information;
After the page information of the webpage to be judged is obtained, the title information is extracted from the crawled HTML (HyperText Markup Language) page.
Judgment module 503, configured to judge whether the title information contains a preset keyword, the preset keyword being a keyword that contains a webpage type;
The extracted title information is then checked for a preset keyword that can directly determine the webpage type, for example, a preset keyword such as "Tianya", "forum", "news", or "blog".
Processing module 504, configured to obtain the webpage type of the webpage to be judged based on the page structure information corresponding to the page information and/or the title information if the title information does not contain a preset keyword.
When the extracted title information does not contain a preset keyword, that is, when the webpage type cannot be determined directly from the extracted title information, the webpage to be judged is further classified based on the page structure information of the extracted HTML (HyperText Markup Language) page and/or the title information, so as to obtain its webpage type. In other words, when the title information does not contain a preset keyword, the webpage to be judged can be classified by the page structure information corresponding to the page information, by the title information in the page information, or by both the page structure information and the title information in the page information.
It should be noted that when the title information contains a preset keyword, the webpage type corresponding to the preset keyword is taken as the webpage type of the webpage to be judged. For example, if the title information is "As a lipstick fan, come try a color - Entertainment Gossip - Forum", the title information contains the preset keyword "forum", so the type of the webpage to be judged can be determined as forum.
The device for differentiating webpage types includes a processor and a memory. The acquisition module, extraction module, judgment module, processing module, and so on are stored in the memory as program units, and the processor executes the program units stored in the memory to implement the corresponding functions.
The processor contains a kernel, which retrieves the corresponding program unit from the memory. One or more kernels can be provided, and the problem of low webpage classification efficiency is addressed by adjusting kernel parameters.
The memory may include computer-readable media in the form of volatile memory, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
In summary, in the above embodiment, when the webpage type needs to be determined, the page information of the webpage to be judged is obtained first, and the title information is extracted from the obtained page information. It is then judged whether the title information contains a preset keyword that can directly determine the webpage type. When the title information does not contain a preset keyword, the webpage type of the webpage to be judged is obtained from the page structure information corresponding to the page information and/or the title information. Compared with the prior art, which relies on manual classification of webpage types, the present invention classifies webpage types automatically and improves the efficiency of webpage type classification.
As shown in Fig. 6, which is a structural diagram of device embodiment 2 of a device for differentiating webpage types disclosed by the present invention, the device may include:
Acquisition module 601, configured to obtain the page information of the webpage to be judged;
When the webpage type of a webpage needs to be determined, for example, whether the webpage is a news webpage or a forum webpage, the page information of the webpage to be judged is obtained first. The page information of the webpage to be judged includes title information and page structure information.
Specifically, one way to obtain the page information of the webpage to be judged is to parse the webpage to be judged, extract the domain name from its link, and then simulate access to the uniform resource locator (URL) corresponding to the domain name to crawl the page information of the webpage to be judged. The webpage to be judged can be parsed by parsing its original URL (Uniform Resource Locator). Parsing extracts the domain name from the link of the webpage to be judged, where the domain name can be defined as the character string in the URL between the leading "http://" and the first ":" that follows it. For example, if the link of the webpage to be judged is http://example.com:1234/test.htm, the domain name extracted by parsing can be example.com. When simulating access to the URL corresponding to the domain name to crawl the page information of the webpage to be judged, a Python crawler library can be used, or another programming language can be used; the simulated access crawls the information on the page.
Extraction module 602, configured to extract the title information from the page information;
After the page information of the webpage to be judged is obtained, the title information is extracted from the crawled HTML (HyperText Markup Language) page.
Judgment module 603, configured to judge whether the title information contains a preset keyword, the preset keyword being a keyword that contains a webpage type;
The extracted title information is then checked for a preset keyword that can directly determine the webpage type, for example, a preset keyword such as "Tianya", "forum", "news", or "blog".
First acquisition unit 604, configured to obtain, if the title information does not contain a preset keyword, the page information of several webpages serving as reference standards under at least one known webpage type;
When the title information does not contain a preset keyword, that is, when the webpage type cannot be determined directly from the title information of the webpage to be judged, webpages of at least one known webpage type are obtained first, together with the page information of the webpages of each known webpage type, and the obtained page information is used as the reference standard.
First statistic unit 605, configured to extract label information serving as the reference standard from the page structure information corresponding to the page information of the reference-standard webpages, and count the number of pieces of reference-standard label information under each known webpage type;
After the page information of several reference-standard webpages under at least one known webpage type is obtained, label information serving as the reference standard is extracted from the page structure information corresponding to the page information of the reference-standard webpages; each page structure contains multiple pieces of label information. For example, a webpage of one known webpage type contains the label information "meta", "link", "span", "a", and "p". The number of each piece of reference-standard label information is counted: "meta" occurs 12 times, "link" 3 times, "span" 5 times, "a" 3 times, and "p" 3 times.
First extraction unit 606, configured to extract at least one piece of label information from the page structure information corresponding to the page information of the webpage to be judged;
Meanwhile, at least one piece of label information used to determine the page type is extracted from the page structure information corresponding to the page information of the webpage to be judged. For example, "meta" and "div" are extracted.
First matching unit 607, configured to match each piece of label information against the reference-standard label information, and count the number of successfully matched pieces of label information under each known webpage type;
Each piece of label information of the webpage to be judged is matched against the reference-standard label information. In the above example, matching finds that the label information "meta" of the webpage to be judged matches the reference-standard label information "meta". The number of successfully matched pieces of label information under each known webpage type is then counted; here, 10 occurrences of "meta" are matched.
First comparing unit 608, configured to obtain, for each known webpage type, the ratio of the number of successfully matched pieces of label information to the number of pieces of reference-standard label information under that known webpage type, and compare the ratio with a preset ratio;
Next, for each known webpage type, the ratio of the number of successfully matched pieces of label information to the number of pieces of reference-standard label information under that known webpage type is obtained. In the above example, the ratio for the label information "meta" is 5/6. The obtained ratio is then compared with the preset ratio, which can be set flexibly according to actual needs. Of course, the closer the preset ratio is set to the obtained ratio, the more accurate the determined webpage type.
First output unit 609, configured to take the known webpage type corresponding to the ratio as the webpage type of the webpage to be judged if the ratio is greater than or equal to the preset ratio.
When the obtained ratio is greater than or equal to the preset ratio, the known webpage type corresponding to the ratio is taken as the webpage type of the webpage to be judged. In the above example, the webpage type corresponding to the webpage containing the label information "meta", "link", "span", "a", and "p" is taken as the webpage type of the webpage to be judged.
As shown in Fig. 7, which is a structural diagram of device embodiment 3 of a device for differentiating webpage types disclosed by the present invention, the device may include:
Acquisition module 701, configured to obtain the page information of the webpage to be judged;
When the webpage type of a webpage needs to be determined, for example, whether the webpage is a news webpage or a forum webpage, the page information of the webpage to be judged is obtained first. The page information of the webpage to be judged includes title information and page structure information.
Specifically, one way to obtain the page information of the webpage to be judged is to parse the webpage to be judged, extract the domain name from its link, and then simulate access to the uniform resource locator (URL) corresponding to the domain name to crawl the page information of the webpage to be judged. The webpage to be judged can be parsed by parsing its original URL (Uniform Resource Locator). Parsing extracts the domain name from the link of the webpage to be judged, where the domain name can be defined as the character string in the URL between the leading "http://" and the first ":" that follows it. For example, if the link of the webpage to be judged is http://example.com:1234/test.htm, the domain name extracted by parsing can be example.com. When simulating access to the URL corresponding to the domain name to crawl the page information of the webpage to be judged, a Python crawler library can be used, or another programming language can be used; the simulated access crawls the information on the page.
Extraction module 702, configured to extract the title information from the page information;
After the page information of the webpage to be judged is obtained, the title information is extracted from the crawled HTML (HyperText Markup Language) page.
Judgment module 703, configured to judge whether the title information contains a preset keyword, the preset keyword being a keyword that contains a webpage type;
The extracted title information is then checked for a preset keyword that can directly determine the webpage type, for example, a preset keyword such as "Tianya", "forum", "news", or "blog".
Second acquisition unit 704, configured to obtain, if the title information does not contain a preset keyword, the title information of several webpages serving as reference standards under at least one known webpage type;
When the title information does not contain a preset keyword, that is, when the webpage type cannot be determined directly from the title information of the webpage to be judged, webpages of at least one known webpage type are obtained first, together with the title information in the page information of the webpages of each known webpage type, and the obtained title information is used as the reference standard.
Second statistic unit 705, configured to split out phrases serving as the reference standard from the title information of the reference-standard webpages, and to count the number of reference-standard phrases under each known webpage type;
After the title information of several reference-standard webpages under at least one known webpage type has been obtained, that title information is split into phrases, which serve as the reference standard. For example, take a webpage of one known webpage type whose title is "As a lipstick lover, pick a lipstick brand and try out lipstick colors". Segmenting the title yields the reference-standard phrases "lipstick", "brand", and "color". Counting each phrase gives 3 occurrences of "lipstick", 1 of "brand", and 1 of "color".
First splitting unit 706, configured to split out at least one phrase from the title information of the webpage to be judged;
Meanwhile, at least one phrase is split out from the title information of the webpage to be judged. For example, if the title of the webpage to be judged is "When choosing a lipstick, there are many types of lipstick", the phrases split out include "lipstick" and "type".
Second matching unit 707, configured to match each phrase against the reference-standard phrases, and to count the number of successfully matched phrases under each known webpage type;
Each phrase split out from the title information of the webpage to be judged is matched against the reference-standard phrases. In the example above, matching shows that "lipstick" in the title of the webpage to be judged matches the reference-standard phrase "lipstick". The number of successfully matched phrases under each known webpage type is then counted; here the count is 2 occurrences of "lipstick".
Second comparing unit 708, configured to obtain, for each known webpage type, the ratio of the number of successfully matched phrases to the number of reference-standard phrases under that known webpage type, and to compare the ratio with a preset ratio;
The ratio of the number of successfully matched phrases to the number of reference-standard phrases is then obtained for each known webpage type. In the example above, the ratio for the phrase "lipstick" is 2/3. The obtained ratio is compared with the preset ratio, which can be set flexibly according to actual needs. Of course, the closer the preset ratio is to the obtained ratio, the more accurate the determined webpage type.
Second output unit 709, configured to, if the ratio is greater than or equal to the preset ratio, take the known webpage type corresponding to the ratio as the webpage type of the webpage to be judged.
When the obtained ratio is greater than or equal to the preset ratio, the known webpage type corresponding to the ratio is taken as the webpage type of the webpage to be judged. In the example above, the webpage type of the reference page titled "As a lipstick lover, pick a lipstick brand and try out lipstick colors" is taken as the webpage type of the webpage to be judged.
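The title-phrase comparison of this embodiment can be sketched as follows. This is one possible reading of the worked example, not a definitive implementation: the ratio is computed per phrase (occurrences in the title to be judged over occurrences in the reference titles), which reproduces the 2/3 of the lipstick example. The function names, the type label "beauty", and the rule of accepting a type when its best phrase ratio reaches the preset ratio are assumptions. Word segmentation is not shown; the phrases are passed in already split.

```python
from collections import Counter
from fractions import Fraction


def phrase_ratios(reference_phrases, candidate_phrases):
    """Map each matched phrase to (count in candidate) / (count in reference)."""
    ref = Counter(reference_phrases)
    cand = Counter(candidate_phrases)
    return {p: Fraction(n, ref[p]) for p, n in cand.items() if p in ref}


def classify_by_title(references_by_type, candidate_phrases, preset_ratio):
    """Return the first known type whose best phrase ratio reaches preset_ratio."""
    for page_type, reference_phrases in references_by_type.items():
        ratios = phrase_ratios(reference_phrases, candidate_phrases)
        if ratios and max(ratios.values()) >= preset_ratio:
            return page_type
    return None


# Phrases from the example: "lipstick" x3, "brand" x1, "color" x1 in the
# reference title; "lipstick" x2 and "type" x1 in the title to be judged.
references = {"beauty": ["lipstick", "lipstick", "brand", "lipstick", "color"]}
candidate = ["lipstick", "lipstick", "type"]
print(phrase_ratios(references["beauty"], candidate))
print(classify_by_title(references, candidate, Fraction(1, 2)))
```

Using `Fraction` keeps the ratio exact, so a comparison such as 2/3 against a preset 2/3 is not affected by floating-point rounding.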
FIG. 8 is a schematic structural diagram of embodiment 4 of a device for determining a webpage type disclosed by the present invention. The device may include:
Acquiring unit 801, configured to obtain the page information of the webpage to be judged;
When the webpage type of a webpage needs to be judged, for example whether the webpage is a news webpage or a forum webpage, the page information of the webpage to be judged is obtained first. That page information includes title information and page structure information.
Specifically, one way to obtain the page information of the webpage to be judged is to parse that webpage, extract the domain name of its link, and then simulate access to the Uniform Resource Locator (URL) corresponding to the domain name in order to crawl the page information. The parsing can be performed on the original URL of the webpage to be judged. Parsing extracts the domain name from the link, where the domain name can be defined as the character string in the URL between the leading "http://" and the first ":" that follows it. For example, if the link of the webpage to be judged is http://example.com:1234/test.htm, the extracted domain name is example.com. The simulated access to the URL can be carried out with a Python crawler library or with another programming language, and the information in the page is crawled through that simulated access.
Extraction module 802, configured to extract title information from the page information;
After the page information of the webpage to be judged has been obtained, the title information is extracted from the crawled HTML (HyperText Markup Language) page.
Judgment module 803, configured to judge whether the title information contains a preset keyword, the preset keyword being a keyword that indicates a webpage type;
The extracted title information is then checked for a preset keyword that directly determines the webpage type, for example "Tianya", "forum", "news", or "blog".
Third acquiring unit 804, configured to, if the title information does not contain a preset keyword, obtain the page information of several webpages serving as the reference standard under at least one known webpage type;
When the title information contains no preset keyword, that is, when the webpage type cannot be judged directly from the title information of the webpage to be judged, webpages of at least one known webpage type are obtained first, the page information of each of these webpages is obtained, and the obtained page information is used as the reference standard.
Third statistic unit 805, configured to extract tag information serving as the reference standard from the page structure information corresponding to the page information of the reference-standard webpages, and to count the number of pieces of reference-standard tag information under each known webpage type;
After the page information of several reference-standard webpages under at least one known webpage type has been obtained, the tag information serving as the reference standard is extracted from the page structure information corresponding to that page information. Each page structure contains multiple pieces of tag information. For example, a webpage of one known webpage type contains the tags "meta", "link", "span", "a", and "p". Counting the reference-standard tags gives 12 "meta", 3 "link", 5 "span", 3 "a", and 3 "p".
Second extraction unit 806, configured to extract at least one piece of tag information from the page structure information corresponding to the page information of the webpage to be judged;
Meanwhile, at least one piece of tag information used to determine the page type is extracted from the page structure information corresponding to the page information of the webpage to be judged, for example "meta" and "div".
Third matching unit 807, configured to match each piece of tag information against the reference-standard tag information, and to count the number of pieces of successfully matched tag information under each known webpage type;
Each piece of tag information of the webpage to be judged is matched against the reference-standard tag information. In the example above, matching shows that the tag "meta" of the webpage to be judged matches the reference-standard tag "meta". The number of pieces of successfully matched tag information under each known webpage type is then counted; here the count is 10 "meta".
Third comparing unit 808, configured to obtain, for each known webpage type, a first ratio of the number of pieces of successfully matched tag information to the number of pieces of reference-standard tag information under that known webpage type, and to compare the first ratio with a first preset ratio;
The first ratio of the number of pieces of successfully matched tag information to the number of pieces of reference-standard tag information is then obtained for each known webpage type. In the example above, the ratio for the tag "meta" is 10/12, that is, 5/6. The obtained first ratio is compared with the first preset ratio, which can be set flexibly according to actual needs. Of course, the closer the first preset ratio is to the obtained ratio, the more accurate the determined webpage type.
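The tag statistics and the first ratio can be sketched as follows. This is a minimal sketch under stated assumptions: `TagCounter`, `count_tags`, and `tag_ratio` are hypothetical names, tags are counted with the standard-library `html.parser`, and the ratio is taken per tag as in the "meta" example (10 matched against 12 in the reference). The count of 1 for "div" is made up for illustration, since the source gives no number for it.

```python
from collections import Counter
from fractions import Fraction
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts every opening tag seen while parsing an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def count_tags(html: str) -> Counter:
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


def tag_ratio(reference: Counter, candidate: Counter, tag: str) -> Fraction:
    """Count of `tag` in the page to be judged over its reference count."""
    if reference[tag] == 0:
        raise ValueError(f"tag {tag!r} is not part of the reference standard")
    return Fraction(candidate[tag], reference[tag])


# Counts taken from the example; the "div" count is an assumed value.
reference = Counter({"meta": 12, "link": 3, "span": 5, "a": 3, "p": 3})
candidate = Counter({"meta": 10, "div": 1})
print(tag_ratio(reference, candidate, "meta"))  # 5/6
```

In practice `count_tags` would be applied to the crawled HTML of both the reference webpages and the webpage to be judged to produce the two counters.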
Fourth acquiring unit 809, configured to, if the first ratio is greater than or equal to the first preset ratio, obtain the title information of several webpages serving as the reference standard under the known webpage type corresponding to the first ratio;
When the first ratio is greater than or equal to the first preset ratio, the webpages of the known webpage type corresponding to the first ratio are further obtained, the title information in the page information of each of these webpages is obtained, and the obtained title information is used as the reference standard. It should be noted that there may be multiple webpages of the known webpage type corresponding to the first ratio.
Fourth statistic unit 810, configured to split out phrases serving as the reference standard from the title information of the reference-standard webpages, and to count the number of reference-standard phrases under each known webpage type;
After the title information of several reference-standard webpages under at least one known webpage type has been obtained, that title information is split into phrases, which serve as the reference standard. For example, take a webpage of one known webpage type whose title is "As a lipstick lover, pick a lipstick brand and try out lipstick colors". Segmenting the title yields the reference-standard phrases "lipstick", "brand", and "color". Counting each phrase gives 3 occurrences of "lipstick", 1 of "brand", and 1 of "color".
Second splitting unit 811, configured to split out at least one phrase from the title information of the webpage to be judged;
Meanwhile, at least one phrase is split out from the title information of the webpage to be judged. For example, if the title of the webpage to be judged is "When choosing a lipstick, there are many types of lipstick", the phrases split out include "lipstick" and "type".
Fourth matching unit 812, configured to match each phrase against the reference-standard phrases, and to count the number of successfully matched phrases under each known webpage type;
Each phrase split out from the title information of the webpage to be judged is matched against the reference-standard phrases. In the example above, matching shows that "lipstick" in the title of the webpage to be judged matches the reference-standard phrase "lipstick". The number of successfully matched phrases under each known webpage type is then counted; here the count is 2 occurrences of "lipstick".
Fourth comparing unit 813, configured to obtain, for each known webpage type, a second ratio of the number of successfully matched phrases to the number of reference-standard phrases under that known webpage type, and to compare the second ratio with a second preset ratio;
The second ratio of the number of successfully matched phrases to the number of reference-standard phrases is then obtained for each known webpage type. In the example above, the ratio for the phrase "lipstick" is 2/3. The obtained second ratio is compared with the second preset ratio, which can be set flexibly according to actual needs. Of course, the closer the second preset ratio is to the obtained second ratio, the more accurate the determined webpage type.
Third output unit 814, configured to, if the second ratio is greater than or equal to the second preset ratio, take the known webpage type corresponding to the second ratio as the webpage type of the webpage to be judged.
When the obtained second ratio is greater than or equal to the second preset ratio, the known webpage type corresponding to the second ratio is taken as the webpage type of the webpage to be judged. In the example above, the webpage type of the reference page titled "As a lipstick lover, pick a lipstick brand and try out lipstick colors" is taken as the webpage type of the webpage to be judged.
It should be noted that, in the above embodiment, if the second ratio is less than the second preset ratio, the known webpage type corresponding to the first ratio can be taken as the webpage type of the webpage to be judged.
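The two-stage decision of this embodiment, including the fallback just described, can be sketched as follows. This is a hedged sketch rather than the patented logic: the function name `two_stage_type` and the dictionary inputs are hypothetical, the ratios are assumed to have been computed already for each known webpage type, and returning `None` when no type passes the first stage is an assumption, because the source does not cover that case.

```python
from fractions import Fraction


def two_stage_type(first_ratios, second_ratios, first_preset, second_preset):
    """Pick a webpage type from per-type structure and title ratios.

    first_ratios:  known type -> first ratio (page-structure tags)
    second_ratios: known type -> second ratio (title phrases)
    """
    # Stage 1: keep the known types whose tag ratio reaches the first preset.
    structural = [t for t, r in first_ratios.items() if r >= first_preset]
    if not structural:
        return None  # assumption: this case is not described in the source

    # Stage 2: prefer a type whose title ratio reaches the second preset.
    for page_type in structural:
        if second_ratios.get(page_type, 0) >= second_preset:
            return page_type

    # Fallback: use a type selected by the first ratio alone.
    return structural[0]


first = {"forum": Fraction(5, 6), "news": Fraction(1, 3)}
second = {"forum": Fraction(2, 3)}
print(two_stage_type(first, second, Fraction(1, 2), Fraction(1, 2)))
```

With the example ratios 5/6 and 2/3 the type "forum" is returned in both the confirmed case and the fallback case; the type labels themselves are illustrative.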
The present application also provides a computer program product which, when executed on a data processing device, is adapted to execute program code initialized with the following method steps:
Obtaining the page information of the webpage to be judged;
Extracting title information from the page information;
Judging whether the title information contains a preset keyword, the preset keyword being a keyword that indicates a webpage type;
If the title information does not contain the preset keyword, obtaining the webpage type of the webpage to be judged based on the page structure information corresponding to the page information and/or the title information.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory.
The memory may include non-permanent memory in a computer-readable medium, in the form of random access memory (RAM) and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
The above are only embodiments of the present application and are not intended to limit it. For those skilled in the art, the present application may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principle of the present application shall fall within the scope of its claims.

Claims (12)

  1. A method for determining a webpage type, characterized by comprising:
    obtaining page information of a webpage to be judged;
    extracting title information from the page information;
    judging whether the title information contains a preset keyword, the preset keyword being a keyword that indicates a webpage type;
    if the title information does not contain the preset keyword, obtaining the webpage type of the webpage to be judged based on the page structure information corresponding to the page information and/or the title information.
  2. The method according to claim 1, characterized in that the method further comprises:
    if the title information contains the preset keyword, taking the webpage type corresponding to the preset keyword as the webpage type of the webpage to be judged.
  3. The method according to claim 1, characterized in that obtaining the page information of the webpage to be judged comprises:
    parsing the webpage to be judged, and extracting the domain name of the link corresponding to the webpage to be judged;
    simulating access to the Uniform Resource Locator (URL) corresponding to the domain name, and crawling the page information of the webpage to be judged.
  4. The method according to any one of claims 1-3, characterized in that obtaining the webpage type of the webpage to be judged based on the page structure information corresponding to the page information and/or the title information comprises:
    obtaining page information of several webpages serving as a reference standard under at least one known webpage type;
    extracting tag information serving as the reference standard from the page structure information corresponding to the page information of the reference-standard webpages, and counting the number of pieces of reference-standard tag information under each known webpage type;
    extracting at least one piece of tag information from the page structure information corresponding to the page information of the webpage to be judged;
    matching each piece of tag information against the reference-standard tag information, and counting the number of pieces of successfully matched tag information under each known webpage type;
    obtaining, for each known webpage type, the ratio of the number of pieces of successfully matched tag information to the number of pieces of reference-standard tag information under that known webpage type, and comparing the ratio with a preset ratio;
    if the ratio is greater than or equal to the preset ratio, taking the known webpage type corresponding to the ratio as the webpage type of the webpage to be judged.
  5. The method according to any one of claims 1-3, characterized in that obtaining the webpage type of the webpage to be judged based on the page structure information corresponding to the page information and/or the title information comprises:
    obtaining title information of several webpages serving as a reference standard under at least one known webpage type;
    splitting out phrases serving as the reference standard from the title information of the reference-standard webpages, and counting the number of reference-standard phrases under each known webpage type;
    splitting out at least one phrase from the title information of the webpage to be judged;
    matching each phrase against the reference-standard phrases, and counting the number of successfully matched phrases under each known webpage type;
    obtaining, for each known webpage type, the ratio of the number of successfully matched phrases to the number of reference-standard phrases under that known webpage type, and comparing the ratio with a preset ratio;
    if the ratio is greater than or equal to the preset ratio, taking the known webpage type corresponding to the ratio as the webpage type of the webpage to be judged.
  6. The method according to any one of claims 1-3, characterized in that obtaining the webpage type of the webpage to be judged based on the page structure information corresponding to the page information and/or the title information comprises:
    obtaining page information of several webpages serving as a reference standard under at least one known webpage type;
    extracting tag information serving as the reference standard from the page structure information corresponding to the page information of the reference-standard webpages, and counting the number of pieces of reference-standard tag information under each known webpage type;
    extracting at least one piece of tag information from the page structure information corresponding to the page information of the webpage to be judged;
    matching each piece of tag information against the reference-standard tag information, and counting the number of pieces of successfully matched tag information under each known webpage type;
    obtaining, for each known webpage type, a first ratio of the number of pieces of successfully matched tag information to the number of pieces of reference-standard tag information under that known webpage type, and comparing the first ratio with a first preset ratio;
    if the first ratio is greater than or equal to the first preset ratio, obtaining title information of several webpages serving as the reference standard under the known webpage type corresponding to the first ratio;
    splitting out phrases serving as the reference standard from the title information of the reference-standard webpages, and counting the number of reference-standard phrases under each known webpage type;
    splitting out at least one phrase from the title information of the webpage to be judged;
    matching each phrase against the reference-standard phrases, and counting the number of successfully matched phrases under each known webpage type;
    obtaining, for each known webpage type, a second ratio of the number of successfully matched phrases to the number of reference-standard phrases under that known webpage type, and comparing the second ratio with a second preset ratio;
    if the second ratio is greater than or equal to the second preset ratio, taking the known webpage type corresponding to the second ratio as the webpage type of the webpage to be judged.
  7. A device for determining a webpage type, characterized by comprising:
    an acquisition module, configured to obtain page information of a webpage to be judged;
    an extraction module, configured to extract title information from the page information;
    a judgment module, configured to judge whether the title information contains a preset keyword, the preset keyword being a keyword that indicates a webpage type;
    a processing module, configured to, if the title information does not contain the preset keyword, obtain the webpage type of the webpage to be judged based on the page structure information corresponding to the page information and/or the title information.
  8. The device according to claim 7, characterized in that the processing module is further configured to, if the title information contains the preset keyword, take the webpage type corresponding to the preset keyword as the webpage type of the webpage to be judged.
  9. The device according to claim 8, characterized in that the acquisition module comprises:
    a parsing unit, configured to parse the webpage to be judged and extract the domain name of the link corresponding to the webpage to be judged;
    a simulated access unit, configured to simulate access to the Uniform Resource Locator (URL) corresponding to the domain name and crawl the page information of the webpage to be judged.
  10. The device according to any one of claims 7-9, characterized in that the processing module comprises:
    a first acquiring unit, configured to obtain page information of several webpages serving as a reference standard under at least one known webpage type;
    a first statistic unit, configured to extract tag information serving as the reference standard from the page structure information corresponding to the page information of the reference-standard webpages, and to count the number of pieces of reference-standard tag information under each known webpage type;
    a first extraction unit, configured to extract at least one piece of tag information from the page structure information corresponding to the page information of the webpage to be judged;
    a first matching unit, configured to match each piece of tag information against the reference-standard tag information, and to count the number of pieces of successfully matched tag information under each known webpage type;
    a first comparing unit, configured to obtain, for each known webpage type, the ratio of the number of pieces of successfully matched tag information to the number of pieces of reference-standard tag information under that known webpage type, and to compare the ratio with a preset ratio;
    a first output unit, configured to, if the ratio is greater than or equal to the preset ratio, take the known webpage type corresponding to the ratio as the webpage type of the webpage to be judged.
  11. The device according to any one of claims 7-9, characterized in that the processing module comprises:
    a second acquiring unit, configured to obtain title information of several webpages serving as a reference standard under at least one known webpage type;
    a second statistic unit, configured to split out phrases serving as the reference standard from the title information of the reference-standard webpages, and to count the number of reference-standard phrases under each known webpage type;
    a first splitting unit, configured to split out at least one phrase from the title information of the webpage to be judged;
    a second matching unit, configured to match each phrase against the reference-standard phrases, and to count the number of successfully matched phrases under each known webpage type;
    a second comparing unit, configured to obtain, for each known webpage type, the ratio of the number of successfully matched phrases to the number of reference-standard phrases under that known webpage type, and to compare the ratio with a preset ratio;
    a second output unit, configured to, if the ratio is greater than or equal to the preset ratio, take the known webpage type corresponding to the ratio as the webpage type of the webpage to be judged.
  12. 12. according to the device described in any one in claim 7-9, which is characterized in that the processing module includes:
    Third acquiring unit, for obtaining the page information of several webpages serving as reference standards under at least one known webpage type;
    Third statistic unit, for extracting label information serving as a reference standard from the page structure information corresponding to the page information of the reference-standard webpages, and counting the quantity of reference-standard label information under each known webpage type;
    Second extraction unit, for extracting at least one piece of label information from the page structure information corresponding to the page information of the webpage to be judged;
    Third matching unit, for matching each piece of label information against the reference-standard label information, and counting the quantity of successfully matched label information under each known webpage type;
    Third comparing unit, for obtaining a first ratio of the quantity of successfully matched label information under each known webpage type to the quantity of reference-standard label information under that known webpage type, and comparing the first ratio with a first preset ratio;
    Fourth acquiring unit, for obtaining, if the first ratio is greater than or equal to the first preset ratio, the title information of the several reference-standard webpages under the known webpage type corresponding to the first ratio;
    Fourth statistic unit, for splitting reference-standard phrases out of the title information of the reference-standard webpages, and counting the quantity of reference-standard phrases under each known webpage type;
    Second splitting unit, for splitting at least one phrase out of the title information of the webpage to be judged;
    Fourth matching unit, for matching each phrase against the reference-standard phrases, and counting the quantity of successfully matched phrases under each known webpage type;
    Fourth comparing unit, for obtaining a second ratio of the quantity of successfully matched phrases under each known webpage type to the quantity of reference-standard phrases under that known webpage type, and comparing the second ratio with a second preset ratio;
    Third output unit, for taking, if the second ratio is greater than or equal to the second preset ratio, the known webpage type corresponding to the second ratio as the webpage type of the webpage to be judged.
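
The two-stage comparison recited in claim 12 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the function names `match_ratio` and `judge_webpage_type`, the dictionary layout of `references`, the sample labels and phrases, and the threshold values 0.6 and 0.5 are all hypothetical. Matching is assumed to be exact string equality over distinct items, and because the claim does not say how to handle several types passing both checks, the sketch simply returns all of them.

```python
def match_ratio(candidate_items, reference_items):
    """Return the share of reference items that also occur among the candidates."""
    reference = set(reference_items)
    if not reference:
        return 0.0  # no reference data for this type, so nothing can match
    matched = reference & set(candidate_items)
    return len(matched) / len(reference)


def judge_webpage_type(page_labels, title_phrases, references,
                       first_preset_ratio=0.6, second_preset_ratio=0.5):
    """Return every known webpage type that passes both ratio checks.

    `references` maps a known webpage type to a dict with two keys:
    "labels" (reference label information) and "phrases" (reference title phrases).
    Both thresholds are hypothetical defaults chosen for the example.
    """
    accepted = []
    for known_type, reference in references.items():
        # Stage 1: first ratio = matched labels / reference labels of this type.
        first_ratio = match_ratio(page_labels, reference["labels"])
        if first_ratio < first_preset_ratio:
            continue
        # Stage 2: only types that passed stage 1 have their title phrases compared.
        second_ratio = match_ratio(title_phrases, reference["phrases"])
        if second_ratio >= second_preset_ratio:
            accepted.append(known_type)
    return accepted


# Hypothetical reference data for two known webpage types.
references = {
    "news": {
        "labels": {"div.article", "h1.headline", "span.byline", "time"},
        "phrases": {"breaking", "news", "report"},
    },
    "forum": {
        "labels": {"div.thread", "div.post", "a.reply"},
        "phrases": {"forum", "thread", "reply"},
    },
}

# Hypothetical webpage to be judged.
page_labels = ["div.article", "h1.headline", "time", "div.post"]
title_phrases = ["breaking", "news", "today"]

print(judge_webpage_type(page_labels, title_phrases, references))  # prints ['news']
```

In this example the page matches 3 of the 4 "news" reference labels (first ratio 0.75) and 2 of the 3 "news" title phrases (second ratio about 0.67), so it passes both checks; it matches only 1 of the 3 "forum" labels, so that type is rejected at the first stage and its title phrases are never compared.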
CN201611270198.0A 2016-12-29 2016-12-29 Method and device for judging webpage type Active CN108255891B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611270198.0A CN108255891B (en) 2016-12-29 2016-12-29 Method and device for judging webpage type

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611270198.0A CN108255891B (en) 2016-12-29 2016-12-29 Method and device for judging webpage type

Publications (2)

Publication Number Publication Date
CN108255891A true CN108255891A (en) 2018-07-06
CN108255891B CN108255891B (en) 2020-08-28

Family

ID=62721846

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611270198.0A Active CN108255891B (en) 2016-12-29 2016-12-29 Method and device for judging webpage type

Country Status (1)

Country Link
CN (1) CN108255891B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110287409A (en) * 2019-06-05 2019-09-27 新华三信息安全技术有限公司 A kind of webpage type identification method and device
CN113297525A (en) * 2021-06-17 2021-08-24 恒安嘉新(北京)科技股份公司 Webpage classification method and device, electronic equipment and storage medium
WO2021253252A1 (en) * 2020-06-17 2021-12-23 深圳市欢太数字科技有限公司 Method and apparatus for testing webpage, and electronic device and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101727500A (en) * 2010-01-15 2010-06-09 清华大学 Text classification method of Chinese web page based on steam clustering
CN101814083A (en) * 2010-01-08 2010-08-25 上海复歌信息科技有限公司 Automatic webpage classification method and system
WO2012083874A1 (en) * 2010-12-22 2012-06-28 北大方正集团有限公司 Webpage information detection method and system
CN103309862A (en) * 2012-03-07 2013-09-18 腾讯科技(深圳)有限公司 Webpage type recognition method and system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101814083A (en) * 2010-01-08 2010-08-25 上海复歌信息科技有限公司 Automatic webpage classification method and system
CN101727500A (en) * 2010-01-15 2010-06-09 清华大学 Text classification method of Chinese web page based on steam clustering
WO2012083874A1 (en) * 2010-12-22 2012-06-28 北大方正集团有限公司 Webpage information detection method and system
CN103309862A (en) * 2012-03-07 2013-09-18 腾讯科技(深圳)有限公司 Webpage type recognition method and system

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110287409A (en) * 2019-06-05 2019-09-27 新华三信息安全技术有限公司 A kind of webpage type identification method and device
CN110287409B (en) * 2019-06-05 2022-07-22 新华三信息安全技术有限公司 Webpage type identification method and device
WO2021253252A1 (en) * 2020-06-17 2021-12-23 深圳市欢太数字科技有限公司 Method and apparatus for testing webpage, and electronic device and storage medium
CN113297525A (en) * 2021-06-17 2021-08-24 恒安嘉新(北京)科技股份公司 Webpage classification method and device, electronic equipment and storage medium
CN113297525B (en) * 2021-06-17 2023-12-12 恒安嘉新(北京)科技股份公司 Webpage classification method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN108255891B (en) 2020-08-28

Similar Documents

Publication Publication Date Title
US20220197923A1 (en) Apparatus and method for building big data on unstructured cyber threat information and method for analyzing unstructured cyber threat information
CN109145216A (en) Network public-opinion monitoring method, device and storage medium
CN108255862B (en) A kind of search method and device of judgement document
WO2014101783A1 (en) Method and server for performing cloud detection for malicious information
CN113282955B (en) Method, system, terminal and medium for extracting privacy information in privacy policy
Sarne et al. Unsupervised topic extraction from privacy policies
CN102609412A (en) RSS (Really Simple Syndication)-based multi-thread graphic information synchronization crawling control method and system
US20150100877A1 (en) Method or system for automated extraction of hyper-local events from one or more web pages
CN108255891A (en) A kind of method and device for differentiating type of webpage
Cardoso et al. An efficient language-independent method to extract content from news webpages
Tavakoli et al. Metadata analysis of open educational resources
CN112818200A (en) Data crawling and event analyzing method and system based on static website
CN105183843B (en) list page identification system and method
CN105786929B (en) A kind of information monitoring method and device
Siddiqui et al. Developing an Arabic plagiarism detection corpus
CN116108776A (en) Method for improving completeness of chip verification test plan
CN112559754A (en) Judgment result processing method and device
CN106462614B (en) Information analysis system, information analysis method, and information analysis program
CN105868346A (en) Picture extraction method and device applied to web page
Bosse et al. Web Data Mining 1: Collecting textual data from web pages using R
Alqahtani Automated Extraction of Security Concerns from Bug Reports
CN106649337A (en) Method and device for identifying webpage column
CN107220362A (en) A kind of web crawlers for network documentation extracts URL and the framework for indexing and being mapped with keyword
CN108062337A (en) A kind of method and device to label to reptile seed
CN108536688A (en) It was found that the whole network multi-language website and the method for obtaining parallel corpora

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100080 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing

Applicant after: Beijing Guoshuang Technology Co.,Ltd.

Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu District, Haidian District, Beijing

Applicant before: Beijing Guoshuang Technology Co.,Ltd.

GR01 Patent grant