CN106407217B - Navigation webpage identification method and device - Google Patents

Info

Publication number
CN106407217B
Authority
CN
China
Prior art keywords
webpage
identified
link number
target value
navigation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510463490.3A
Other languages
Chinese (zh)
Other versions
CN106407217A (en)
Inventor
孙德彬
冯鸳鹤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gridsum Technology Co Ltd filed Critical Beijing Gridsum Technology Co Ltd
Priority to CN201510463490.3A priority Critical patent/CN106407217B/en
Publication of CN106407217A publication Critical patent/CN106407217A/en
Application granted granted Critical
Publication of CN106407217B publication Critical patent/CN106407217B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/955Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9558Details of hyperlinks; Management of linked annotations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a navigation webpage identification method and device. The identification method of the navigation webpage comprises the following steps: acquiring webpage content of a webpage to be identified; analyzing the webpage content to obtain an analysis result; determining a first link number and a second link number according to the analysis result, wherein the first link number is the number of webpage links contained in the webpage to be identified, the second link number is the number of characteristic links contained in the webpage to be identified, and the characteristic links are webpage links used for expanding the link number of the webpage to be identified; and judging whether the webpage to be identified is a navigation webpage or not according to the first link number and the second link number. By the method and the device, the problem of poor accuracy in navigation webpage identification is solved.

Description

Navigation webpage identification method and device
Technical Field
The invention relates to the technical field of webpage identification, in particular to a navigation webpage identification method and device.
Background
In the internet field, applications often need to identify the category of a web page. For example, a web crawler generally checks whether the web address (URL) of a web page has already been crawled, and if so, does not crawl the page again. However, for some web pages, the content web pages they link to can be discovered by crawling them. Such web pages therefore need to be crawled repeatedly in crawler applications and are called navigation web pages (or navigation list pages).
At present, navigation web pages are generally identified by analyzing the crawled web page content to obtain the number of links it contains or its content length ratio (the ratio of the length of the content contained in links to the total content length of the web page). When the number of links or the content length ratio is greater than a preset value, the web page is considered a navigation web page. This method identifies some web pages correctly, but some content web pages contain a large number of web page links or very little web page content, and the method is highly likely to judge such content web pages to be navigation web pages, causing identification errors.
Aiming at the problem of poor accuracy of identifying navigation web pages in the related technology, no effective solution is provided at present.
Disclosure of Invention
The invention mainly aims to provide a navigation webpage identification method and a navigation webpage identification device, so as to solve the problem of poor accuracy of navigation webpage identification.
In order to achieve the above object, according to one aspect of the present invention, there is provided an identification method of a navigation web page.
The method for identifying the navigation webpage comprises the following steps: acquiring webpage content of a webpage to be identified; analyzing the webpage content to obtain an analysis result; determining a first link number and a second link number according to the analysis result, wherein the first link number is the number of webpage links contained in the webpage to be identified, the second link number is the number of characteristic links contained in the webpage to be identified, and the characteristic links are webpage links used for expanding the link number of the webpage to be identified; and judging whether the webpage to be identified is a navigation webpage or not according to the first link number and the second link number.
Further, judging whether the webpage to be identified is the navigation webpage or not according to the first link number and the second link number comprises the following steps: acquiring a first target value according to the first link number; acquiring a second target value according to the second link number; comparing the sum of the first target value and the second target value with a first preset value to obtain a first comparison result; and judging whether the webpage to be identified is a navigation webpage or not according to the first comparison result, wherein if the first comparison result is that the sum of the first target value and the second target value is greater than a first preset value, the webpage to be identified is determined to be the navigation webpage, and if the first comparison result is that the sum of the first target value and the second target value is not greater than the first preset value, the webpage to be identified is determined not to be the navigation webpage.
Further, before judging whether the web page to be identified is the navigation web page according to the first link number and the second link number, the method further comprises the following steps: determining a uniform resource locator contained in the webpage to be identified according to the analysis result; determining the level number of the uniform resource locators, and judging whether the webpage to be identified is the navigation webpage or not according to the first link number and the second link number comprises the following steps: and judging whether the webpage to be identified is a navigation webpage or not according to the first link number, the second link number and the hierarchy number.
Further, before determining whether the web page to be identified is the navigation web page according to the first link number, the second link number and the hierarchy number, the method further includes: judging whether the uniform resource locator contains first information to obtain a first judgment result, wherein the first information is information for identifying the webpage category characteristics of the webpage to be identified, and judging whether the webpage to be identified is a navigation webpage according to the first link number, the second link number and the hierarchy number comprises the following steps of: and judging whether the webpage to be identified is a navigation webpage or not according to the first link number, the second link number, the hierarchy number and the first judgment result.
Further, before determining whether the web page to be identified is the navigation web page according to the first link number, the second link number, the hierarchy number and the first determination result, the method further includes: judging whether the webpage to be identified comprises second information according to the analysis result to obtain a second judgment result, wherein the second information is information used for identifying the webpage content characteristics of the webpage to be identified, and judging whether the webpage to be identified is a navigation webpage according to the first link number, the second link number, the hierarchy number and the first judgment result comprises the following steps: and judging whether the webpage to be identified is the navigation webpage or not according to the first link number, the second link number, the hierarchy number, the first judgment result and the second judgment result.
Further, the step of judging whether the web page to be identified is the navigation web page according to the first link number, the second link number, the hierarchy number, the first judgment result and the second judgment result comprises the following steps: acquiring a first target value according to the first link number; acquiring a second target value according to the second link number; acquiring a third target value according to the number of the levels; acquiring a fourth target value according to the first judgment result; acquiring a fifth target value according to the second judgment result; comparing the sum of the first target value, the second target value, the third target value, the fourth target value and the fifth target value with a second preset value to obtain a second comparison result; and judging whether the webpage to be identified is a navigation webpage or not according to a second comparison result, wherein if the second comparison result is that the sum of the first target value, the second target value, the third target value, the fourth target value and the fifth target value is greater than a second preset value, the webpage to be identified is determined to be the navigation webpage, and if the second comparison result is that the sum of the first target value, the second target value, the third target value, the fourth target value and the fifth target value is not greater than the second preset value, the webpage to be identified is determined not to be the navigation webpage.
Further, before judging whether the web page to be identified is the navigation web page according to the first link number and the second link number, the method further comprises the following steps: determining a uniform resource locator contained in the webpage to be identified according to the analysis result; judging whether the uniform resource locator contains first information to obtain a first judgment result, wherein the first information is information for identifying the webpage category characteristics of the webpage to be identified, and judging whether the webpage to be identified is a navigation webpage according to the first link number and the second link number comprises the following steps of: and judging whether the webpage to be identified is the navigation webpage or not according to the first link number, the second link number and the first judgment result.
Further, before judging whether the web page to be identified is the navigation web page according to the first link number and the second link number, the method further comprises the following steps: judging whether the webpage to be identified comprises second information according to the analysis result to obtain a second judgment result, wherein the second information is information used for identifying the webpage content characteristics of the webpage to be identified, and judging whether the webpage to be identified is a navigation webpage according to the first link number and the second link number comprises the following steps: and judging whether the webpage to be identified is the navigation webpage or not according to the first link number, the second link number and the second judgment result.
In order to achieve the above object, according to another aspect of the present invention, a storage medium is provided, where the storage medium includes a stored program, and when the program runs, a device on which the storage medium is located is controlled to execute the above method for identifying a navigation webpage.
In order to achieve the above object, according to another aspect of the present invention, there is provided a processor for executing a program, wherein the program executes the method for identifying a navigation webpage.
In order to achieve the above object, according to another aspect of the present invention, there is provided an identification apparatus for a navigation web page.
The navigation webpage recognition device comprises: the acquisition unit is used for acquiring the webpage content of the webpage to be identified; the analysis unit is used for analyzing the webpage content to obtain an analysis result; the determining unit is used for determining a first link number and a second link number according to the analysis result, wherein the first link number is the number of the webpage links contained in the webpage to be identified, the second link number is the number of the feature links contained in the webpage to be identified, and the feature links are webpage links used for expanding the link number of the webpage to be identified; and the judging unit is used for judging whether the webpage to be identified is the navigation webpage or not according to the first link number and the second link number.
Further, the judging unit includes: the first obtaining module is used for obtaining a first target value according to the first link number; the second acquisition module is used for acquiring a second target value according to the second link number; the comparison module is used for comparing the sum of the first target value and the second target value with a first preset value to obtain a first comparison result; and the judging module is used for judging whether the webpage to be identified is the navigation webpage or not according to the first comparison result, wherein if the first comparison result is that the sum of the first target value and the second target value is greater than a first preset value, the webpage to be identified is determined to be the navigation webpage, and if the first comparison result is that the sum of the first target value and the second target value is not greater than the first preset value, the webpage to be identified is determined not to be the navigation webpage.
According to the invention, the following steps are adopted: acquiring webpage content of a webpage to be identified; analyzing the webpage content to obtain an analysis result; determining a first link number and a second link number according to the analysis result, wherein the first link number is the number of webpage links contained in the webpage to be identified, the second link number is the number of characteristic links contained in the webpage to be identified, and the characteristic links are webpage links used for expanding the link number of the webpage to be identified; and judging whether the webpage to be identified is a navigation webpage according to the first link number and the second link number. Because the judgment considers the second link number in addition to the first link number, the problem of poor accuracy in navigation webpage identification is solved and the accuracy of navigation webpage identification is improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a flowchart of an identification method of a navigation web page according to a first embodiment of the present invention;
FIG. 2 is a flowchart of a method for identifying a navigation web page according to a second embodiment of the present invention;
FIG. 3 is a diagram illustrating a feature link in an identification method of a navigation web page according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating second information in an identification method of a navigation web page according to an embodiment of the present invention; and
FIG. 5 is a schematic diagram of an identification apparatus for a navigation web page according to an embodiment of the present invention.
Detailed Description
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be used. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to an embodiment of the present invention, a method for identifying a navigation web page is provided.
FIG. 1 is a flowchart of an identification method of a navigation web page according to a first embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:
Step S102: acquiring the webpage content of the webpage to be identified.
In this step, the web page to be identified is a web page that is to be checked to determine whether it is a navigation web page. The identification result for such a web page may be a navigation web page or another kind of web page, such as a content web page. The webpage content of the webpage to be identified can be obtained by a web crawler. A web crawler usually crawls a large number of web pages at one time. To decide whether these web pages need to be crawled again later, it is necessary to identify whether they are navigation web pages, so each of them can be identified in turn as a web page to be identified.
Step S104: analyzing the webpage content to obtain an analysis result.
In this step, the web page content may be parsed by an HTML parser (for example, HTML Agility Pack). An HTML parser is a tool for analyzing hypertext markup language web pages and can obtain detailed information about a web page, including its links and the extraction and analysis of its source code.
Step S106: determining a first link number and a second link number according to the analysis result, wherein the first link number is the number of the webpage links contained in the webpage to be identified, the second link number is the number of the characteristic links contained in the webpage to be identified, and the characteristic links are the webpage links used for expanding the link number of the webpage to be identified.
In this step, the first link number is the total number of links included in the web page to be identified. A navigation web page typically includes a large number of web page links. Therefore, the number of links included in the web page to be identified can be used as an important reference index for judging whether it is a navigation web page. However, a large number of links is not a sufficient condition: a web page to be identified that contains a large number of links is not necessarily a navigation web page. For example, some content web pages also contain a large number of web page links. Therefore, in the identification method of the navigation web page according to this embodiment, the number of feature links is used as another reference index for judging whether the web page to be identified is a navigation web page.
A feature link is a web page link used for expanding the link number of the web page to be identified. In a navigation web page, because of space limits or a special page layout, some web page links are often hidden, and the hidden links can be displayed by clicking a feature link. For example, a news navigation web page shows several (but not all) news links for each news section, and the other news links can be shown by clicking the link named "more" in each section. These "more" links are the feature links in this embodiment. The present invention does not limit the specific link names of the feature links. For example, in some other navigation web pages, a feature link may be named "expand", "heat list", or the like.
It should be noted that, the method for identifying a navigation webpage according to this embodiment takes into account the characteristic link, so that the expansibility of the method is greatly improved. Statistics on new feature links can be added at any time along with the execution of the recognition method. For example, if a web page analyst finds that a new web page link for expanding the link quantity of a web page exists on some web pages, the new web page link can be used as a characteristic link, and the characteristic link can be used as a basis for counting the second link quantity in a subsequent navigation web page identification process.
Step S108: judging whether the webpage to be identified is a navigation webpage according to the first link number and the second link number.
By taking the obtained first link number and the second link number as the consideration indexes, whether the webpage to be identified is the navigation webpage or not can be judged. According to the embodiment, the second link number is added on the basis of considering the first link number, so that the judgment result is closer to the real situation, namely, the accuracy of navigation webpage identification is improved.
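Steps S104 and S106 can be sketched as follows. This is a minimal illustration, not the patented implementation: it swaps in Python's standard `html.parser` for the HTML parser named in the text, and the class `LinkCounter`, the function `count_links`, and the set `FEATURE_LINK_NAMES` are hypothetical names introduced here. Matching feature links by their visible anchor text is also an assumption.

```python
from html.parser import HTMLParser

# Assumed feature-link names; the description gives "more" and "expand" as
# examples and notes that new names can be added over time.
FEATURE_LINK_NAMES = {"more", "expand"}


class LinkCounter(HTMLParser):
    """Counts all <a href> links and those whose text is a feature-link name."""

    def __init__(self):
        super().__init__()
        self.first_link_number = 0   # all web page links
        self.second_link_number = 0  # feature links only
        self._in_link = False
        self._text = []

    def handle_starttag(self, tag, attrs):
        # Only anchors that carry an href count as web page links.
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self._in_link = True
            self._text = []
            self.first_link_number += 1

    def handle_data(self, data):
        if self._in_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            label = "".join(self._text).strip().lower()
            if label in FEATURE_LINK_NAMES:
                self.second_link_number += 1
            self._in_link = False


def count_links(html):
    """Return (first link number, second link number) for an HTML string."""
    parser = LinkCounter()
    parser.feed(html)
    parser.close()
    return parser.first_link_number, parser.second_link_number
```

A feature link is itself a web page link, so in this sketch it contributes to both counts.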
The identification method of the navigation webpage according to this embodiment comprises the following steps: acquiring webpage content of a webpage to be identified; analyzing the webpage content to obtain an analysis result; determining a first link number and a second link number according to the analysis result, wherein the first link number is the number of webpage links contained in the webpage to be identified, the second link number is the number of characteristic links contained in the webpage to be identified, and the characteristic links are webpage links used for expanding the link number of the webpage to be identified; and judging whether the webpage to be identified is a navigation webpage according to the first link number and the second link number. Because the judgment considers the second link number in addition to the first link number, the problem of poor accuracy in navigation webpage identification is solved and the accuracy of navigation webpage identification is improved.
Preferably, the step of judging whether the web page to be identified is the navigation web page according to the first link number and the second link number may include the steps of: acquiring a first target value according to the first link number; acquiring a second target value according to the second link number; comparing the sum of the first target value and the second target value with a first preset value to obtain a first comparison result; and judging whether the webpage to be identified is a navigation webpage or not according to the first comparison result, wherein if the first comparison result is that the sum of the first target value and the second target value is greater than a first preset value, the webpage to be identified is determined to be the navigation webpage, and if the first comparison result is that the sum of the first target value and the second target value is not greater than the first preset value, the webpage to be identified is determined not to be the navigation webpage.
In this embodiment, the first link number and the second link number of the web page to be identified are converted into numerical values on the same scale, their sum is compared with the first preset value, and whether the web page to be identified is a navigation web page is judged according to the comparison result. The first preset value is a preset reference value and may be obtained by collecting statistics on the sum of the first target value and the second target value over a large number of navigation web pages.
The characteristics of the navigation web pages used for the statistics determine the characteristics of the reference value (the first preset value), so the reference value may be generally applicable or applicable only to a certain type of web page. For example, the first preset value may be obtained by collecting statistics on the sum of the first target value and the second target value over a large number of news navigation web pages. The first preset value is a constant that may differ for different types of web pages.
For example, an algorithm may be preset as follows: when the number of web page links in the web page to be identified exceeds a first preset link number, the first target value is a first preset target value; and when the web page to be identified contains feature links, the second target value is the product of the number of feature links and a second preset target value. For instance, when the number of web page links in the web page to be identified exceeds 25 (the first preset link number), the first target value is 20 (the first preset target value); if the number of feature links is 5, the second target value is the product of 5 and 10 (the second preset target value).
As another example, assume that the first preset value is 60, the number of web page links in the web page to be identified is 30, and the number of feature links is 2. According to the algorithm in the above example, the first target value is 20 and the second target value is 20. Their sum, 40, is not greater than the first preset value of 60, so the web page to be identified may be considered not to be a navigation web page.
The above example may be understood as a scoring mechanism. For example, when the number of web page links in the web page to be identified exceeds 25, 20 points are added; if "more" links are present, 10 points are added to the score for each "more" link. When the total score exceeds 60 points (the first preset value), the web page to be identified is determined to be a navigation web page.
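The scoring mechanism just described can be written as a short function. This is only a sketch using the example numbers from the text (threshold 25, 20 points, 10 points per feature link, first preset value 60). The function name `score_two_factors` and its parameter names are hypothetical, and treating the first target value as 0 when the threshold is not exceeded is an assumption.

```python
def score_two_factors(first_link_number, second_link_number,
                      link_threshold=25, link_points=20,
                      points_per_feature_link=10, first_preset_value=60):
    """Return (is_navigation_page, total_score) for the two-factor example."""
    # First target value: fixed points once the link count exceeds the threshold
    # (assumed to be 0 otherwise).
    first_target = link_points if first_link_number > link_threshold else 0
    # Second target value: points for every feature link found.
    second_target = second_link_number * points_per_feature_link
    total = first_target + second_target
    # The page is a navigation page only if the sum is strictly greater than
    # the first preset value.
    return total > first_preset_value, total
```

With 30 links and 2 feature links, the score is 40 and the page is not classified as a navigation page, which matches the worked example in the text.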
It should be noted that the algorithm for calculating the first target value and the second target value is not specifically limited in the embodiments of the present invention. For example, the first target value and the second target value may be obtained by the following calculation method: when the number of the webpage links in the webpage to be identified is larger than the first preset link number and is smaller than or equal to the second preset link number, the first target value is a first preset target value, and when the number of the webpage links in the webpage to be identified is larger than the second preset link number, the first target value is a third preset target value; and when the number of the characteristic links contained in the webpage to be identified exceeds the third preset link number, the second target value is the product of the number of the characteristic links and the second preset target value.
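The tiered variant above can be sketched in the same way. The text gives no concrete numbers for this variant, so every default value below is a hypothetical placeholder, as are the function names `first_target_value` and `second_target_value`. Returning 0 when no threshold is exceeded is also an assumption.

```python
def first_target_value(link_count,
                       first_preset_links=25,    # hypothetical
                       second_preset_links=50,   # hypothetical
                       first_preset_target=20,   # hypothetical
                       third_preset_target=30):  # hypothetical
    """Tiered first target value based on the total link count."""
    if link_count > second_preset_links:
        return third_preset_target
    if link_count > first_preset_links:
        return first_preset_target
    return 0  # assumed: no points below the first threshold


def second_target_value(feature_link_count,
                        third_preset_links=1,      # hypothetical
                        second_preset_target=10):  # hypothetical
    """Second target value, awarded only above a feature-link threshold."""
    if feature_link_count > third_preset_links:
        return feature_link_count * second_preset_target
    return 0  # assumed: no points at or below the threshold
```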
Preferably, before determining whether the web page to be identified is the navigation web page according to the first link number and the second link number, the method may further include: determining a uniform resource locator contained in the webpage to be identified according to the analysis result; determining the level number of the uniform resource locators, and judging whether the webpage to be identified is the navigation webpage or not according to the first link number and the second link number comprises the following steps: and judging whether the webpage to be identified is a navigation webpage or not according to the first link number, the second link number and the hierarchy number.
For a web page, generally, the greater the number of levels that its Uniform Resource Locator (URL) contains, the lower the probability that it is a navigation web page. For example, the URL of a first web page (a content web page) is http://www.***.cn/xinwen/2015-02/01/content_*******.htm, and the URL of a second web page (a navigation web page) is http://www.***.cn/xinwen. The number of URL levels of the first web page (4) is greater than that of the second web page (1). Therefore, in the identification method of the navigation web page according to this embodiment, the number of URL levels is used as an index for judging whether the web page to be identified is a navigation web page. Because the number of URL levels is taken into consideration, the accuracy of navigation web page identification is higher.
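One way to count URL levels is to count the non-empty path segments, which gives 4 and 1 for the two example URLs. The sketch below assumes that definition. The function name `url_level_count` is hypothetical, and `www.example.cn` is a stand-in host because the host in the text is masked.

```python
from urllib.parse import urlparse


def url_level_count(url):
    """Count the non-empty path segments of a URL (assumed meaning of 'levels')."""
    path = urlparse(url).path
    return len([segment for segment in path.split("/") if segment])


# Stand-in URLs shaped like the two examples in the text.
content_url = "http://www.example.cn/xinwen/2015-02/01/content_1234567.htm"
navigation_url = "http://www.example.cn/xinwen"
print(url_level_count(content_url), url_level_count(navigation_url))
```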
For example, judging whether the web page to be identified is a navigation web page according to the first link number, the second link number and the number of levels can be realized by the following steps. First, assume that the preset algorithm is as follows: when the first link number exceeds 20, the first target value corresponding to the first link number is 20, otherwise the first target value is 0; the second target value corresponding to the feature links is the product of the number of feature links and 10; the third target value corresponding to the number of levels is the product of the number of levels and -5; and the preset value to be compared with the sum of the 3 target values is 60. Then, the determined first link number, second link number and number of levels of the webpage to be identified are substituted into the preset algorithm. Assuming that the first link number is 10, the second link number is 2, and the number of levels is 4, the three target values are 0, 20 and -20, so their sum is 0, which is less than 60. Finally, according to the comparison result, it is determined that the webpage to be identified is not a navigation webpage.
It should be noted that the present embodiment does not limit the specific preset algorithm. Different preset algorithms can be set according to different recognition accuracy requirements.
In the above embodiment, before determining whether the web page to be identified is the navigation web page according to the first number of links, the second number of links, and the number of hierarchical levels, the method may further include: judging whether the uniform resource locator contains first information to obtain a first judgment result, wherein the first information is information for identifying the webpage category characteristics of the webpage to be identified, and judging whether the webpage to be identified is a navigation webpage according to the first link number, the second link number and the hierarchy number comprises the following steps of: and judging whether the webpage to be identified is a navigation webpage or not according to the first link number, the second link number, the hierarchy number and the first judgment result.
The URL of a web page often contains a lot of information, and the URLs of some web pages contain information for identifying the web page category characteristics of the web page to be identified, i.e. the first information mentioned above. For example, the URL http://www.***.cn/xinwen/2015-02/01/content_*******.htm includes the keyword "content", indicating that the web page is likely to be a content web page. Here, "content" may be used to identify the web page as a content web page, but a web page with the keyword "content" in its URL is not necessarily a content web page; the keyword merely indicates a high probability that it is one. Therefore, the identification method of this embodiment uses whether the URL includes the first information as an index for identifying the navigation webpage, and introducing this index can increase the accuracy of the identification.
For example, judging whether the web page to be identified is a navigation web page according to the first link number, the second link number, the number of levels and the first judgment result can be realized by the following steps. First, assume that the preset algorithm is as follows: when the first link number exceeds 20, the first target value corresponding to the first link number is 20, otherwise the first target value is 0; the second target value corresponding to the feature links is the product of the number of feature links and 10; the third target value corresponding to the number of levels is the product of the number of levels and -5; when the judgment result is that "content" is contained, the fourth target value is the product of -1 and 10, and when the judgment result is that "content" is not contained, the fourth target value is the product of 1 and 10; and the preset value to be compared with the sum of the 4 target values is 30. Then, the determined first link number, second link number, number of levels and first judgment result of the webpage to be identified are substituted into the preset algorithm. Assuming that the first link number is 25, the second link number is 2, the number of levels is 2, and "content" is not included, the four target values are 20, 20, -10 and 10, so their sum is 40, which is greater than 30. Finally, according to the comparison result, it is determined that the webpage to be identified is a navigation webpage.
It should be noted that the first information is not specifically limited in the embodiment of the present invention. For example, the first information may further include: item, page, list, nav, etc. Item, page, etc. may be information for identifying a content web page (when the URL of a web page includes such information, the probability that the web page is not a navigation web page is high), and list, nav, etc. may be information for identifying a navigation web page (when the URL of a web page includes such information, the probability that the web page is a navigation web page is high).
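A minimal sketch of checking a URL for the first information is given below. The keyword sets are taken from the examples above; the tokenization rule (splitting the URL path on non-letter characters), the precedence of content keywords over navigation keywords, and the placeholder host name are assumptions made for illustration only.

```python
import re
from urllib.parse import urlsplit

CONTENT_HINTS = {"content", "item", "page"}   # suggest a content web page
NAVIGATION_HINTS = {"list", "nav"}            # suggest a navigation web page


def url_category_hint(url):
    """Return 'content', 'navigation', or None depending on keywords in the URL path."""
    path = urlsplit(url).path.lower()
    tokens = set(re.split(r"[^a-z]+", path))
    tokens.discard("")
    if tokens & CONTENT_HINTS:   # assumption: content keywords take precedence
        return "content"
    if tokens & NAVIGATION_HINTS:
        return "navigation"
    return None


print(url_category_hint("http://www.example.cn/xinwen/2015-02/01/content_123.htm"))
```

Matching whole tokens rather than substrings avoids, for example, treating a path segment such as "navy" as the keyword "nav".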
In the above embodiment, before determining whether the web page to be identified is the navigation web page according to the first number of links, the second number of links, the number of hierarchical layers, and the first determination result, the method further includes: judging whether the webpage to be identified comprises second information according to the analysis result to obtain a second judgment result, wherein the second information is information used for identifying the webpage content characteristics of the webpage to be identified, and judging whether the webpage to be identified is a navigation webpage according to the first link number, the second link number, the hierarchy number and the first judgment result comprises the following steps: and judging whether the webpage to be identified is the navigation webpage or not according to the first link number, the second link number, the hierarchy number, the first judgment result and the second judgment result.
In this embodiment, the second information is information for identifying a web page content characteristic of the web page to be identified. For example, content webpages often include keywords such as "font", "share", "source" and "edit", which indicate content features of the page such as an adjustable font, sharing and source information. The identification method of this embodiment uses whether the webpage contains the second information as an index for judging the navigation webpage, which improves the accuracy of identifying the navigation webpage.
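Detecting the second information could be sketched as a simple keyword scan over the page text. The keyword list comes from the example above; the case-insensitive substring matching is an assumption, and it is deliberately crude (for instance, "edit" would also match inside "credit"), so a real implementation would probably need a stricter rule.

```python
SECOND_INFO_KEYWORDS = ("font", "share", "source", "edit")


def find_second_information(page_text, keywords=SECOND_INFO_KEYWORDS):
    """Return the keywords (in list order) that occur in the page text."""
    lowered = page_text.lower()
    return [keyword for keyword in keywords if keyword in lowered]


# Hypothetical text extracted from a content web page.
sample = "Source: Example News | Share | Font size: A A"
print(find_second_information(sample))
```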
In the above embodiment, determining whether the web page to be identified is the navigation web page according to the first link number, the second link number, the hierarchy number, the first determination result, and the second determination result may further include: acquiring a first target value according to the first link number; acquiring a second target value according to the second link number; acquiring a third target value according to the number of the levels; acquiring a fourth target value according to the first judgment result; acquiring a fifth target value according to the second judgment result; comparing the sum of the first target value, the second target value, the third target value, the fourth target value and the fifth target value with a second preset value to obtain a second comparison result; and judging whether the webpage to be identified is a navigation webpage or not according to a second comparison result, wherein if the second comparison result is that the sum of the first target value, the second target value, the third target value, the fourth target value and the fifth target value is greater than a second preset value, the webpage to be identified is determined to be the navigation webpage, and if the second comparison result is that the sum of the first target value, the second target value, the third target value, the fourth target value and the fifth target value is not greater than the second preset value, the webpage to be identified is determined not to be the navigation webpage.
For example, judging whether the web page to be identified is a navigation web page according to the first link number, the second link number, the number of levels, the first judgment result and the second judgment result can be realized by the following steps. First, assume that the preset algorithm is as follows: when the first link number exceeds 20, the first target value corresponding to the first link number is 20, otherwise the first target value is 0; the second target value corresponding to the feature links is the product of the number of feature links and 10; the third target value corresponding to the number of levels is the product of the number of levels and -5; when the first judgment result is that "content" is contained, the fourth target value is the product of -1 and 10, and when the first judgment result is that "content" is not contained, the fourth target value is the product of 1 and 10; when the second judgment result is that "source" is contained, the fifth target value is the product of -1 and 5, and when the second judgment result is that "source" is not contained, the fifth target value is the product of 1 and 5; and the preset value to be compared with the sum of the 5 target values is 30. Assume that the first link number is 25, the second link number is 2, the number of levels is 2, the URL does not include "content", and the web content does not include "source". The five target values are then 20, 20, -10, 10 and 5, so their sum is 45, which is greater than 30. Finally, according to the comparison result, it is determined that the webpage to be identified is a navigation webpage.
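The three worked examples above (with three, four and five indexes) can be reproduced with one small function. This is a sketch of the example preset algorithm only, not a definitive implementation; the function and parameter names are hypothetical, and an index passed as None is simply left out of the sum.

```python
def navigation_score(first_links, second_links, levels=None,
                     url_has_content=None, page_has_source=None):
    """Sum the target values of the example preset algorithm."""
    total = 20 if first_links > 20 else 0          # first target value
    total += second_links * 10                     # second target value
    if levels is not None:
        total += levels * -5                       # third target value
    if url_has_content is not None:
        total += (-1 if url_has_content else 1) * 10   # fourth target value
    if page_has_source is not None:
        total += (-1 if page_has_source else 1) * 5    # fifth target value
    return total


def is_navigation_page(score, preset_value):
    """A page is judged a navigation page only if the sum exceeds the preset value."""
    return score > preset_value


three = navigation_score(10, 2, levels=4)
four = navigation_score(25, 2, levels=2, url_has_content=False)
five = navigation_score(25, 2, levels=2, url_has_content=False, page_has_source=False)
print(three, four, five)
```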
It should be noted that the present invention does not specifically limit the algorithm for calculating the first target value to the fifth target value, and meanwhile, the first information, the second information, and the feature link may also be expanded according to the actual recognition situation, so as to further improve the accuracy of the navigation web page recognition.
Preferably, before determining whether the web page to be identified is the navigation web page according to the first link number and the second link number, the method further includes: determining a uniform resource locator contained in the webpage to be identified according to the analysis result; judging whether the uniform resource locator contains first information to obtain a first judgment result, wherein the first information is information for identifying the webpage category characteristics of the webpage to be identified, and judging whether the webpage to be identified is a navigation webpage according to the first link number and the second link number comprises the following steps of: and judging whether the webpage to be identified is the navigation webpage or not according to the first link number, the second link number and the first judgment result.
As described above, the first information (information on the category characteristics of the web page) included in the URL can be used as an important index for identifying the navigation web page. In order to improve the execution efficiency of the web page identification, only the three indexes of the first link number, the second link number and the first judgment result may be considered together when the navigation web page identification is carried out. According to this embodiment, the efficiency of the navigation webpage identification is improved on the premise of ensuring its accuracy.
Preferably, before determining whether the web page to be identified is the navigation web page according to the first link number and the second link number, the method further includes: judging whether the webpage to be identified comprises second information according to the analysis result to obtain a second judgment result, wherein the second information is information used for identifying the webpage content characteristics of the webpage to be identified, and judging whether the webpage to be identified is a navigation webpage according to the first link number and the second link number comprises the following steps: and judging whether the webpage to be identified is the navigation webpage or not according to the first link number, the second link number and the second judgment result.
As described above, the second information (information on the web content characteristics) included in the web content is an important index for navigation web page identification. In order to improve the execution efficiency of the web page identification, only the three indexes of the first link number, the second link number and the second judgment result may be considered together when the navigation web page identification is carried out. According to this embodiment, the efficiency of the navigation webpage identification is improved on the premise of ensuring its accuracy.
It should be noted that the identification method of the navigation web page according to the embodiment of the present invention mainly involves the following indexes: the first link number, the second link number, the number of levels, the first judgment result (information on the web page category characteristics), and the second judgment result (information on the web page content characteristics). According to different requirements of webpage identification, the first link number and the second link number can be combined with other optional indexes to identify the navigation webpage. When different types of web pages are identified, different priority levels may be assigned to each index. For example, since "content" is generally not contained in the URL of a news navigation page but is generally present in the URL of a news content page, the three indexes of the first link number, the second link number and the first judgment result (information on the page category characteristics) may be prioritized when identifying a news navigation page. On the premise of meeting the preset accuracy requirement, the fewer indexes are considered, the higher the efficiency of the navigation webpage identification. If the identification accuracy needs to be further improved, a new index (such as the number of levels) can be added.
In addition to the embodiments specifically described above, the method for identifying a navigation webpage according to the present invention may further include the following two embodiments: 1. judging whether the webpage to be identified is a navigation webpage according to the first link number, the second link number, the first judgment result and the second judgment result; 2. judging whether the webpage to be identified is a navigation webpage according to the first link number, the second link number, the number of levels and the second judgment result. The specific indexes involved have already been introduced in the above description and are not described here again.
Fig. 2 is a flowchart of a method for identifying a navigation web page according to a second embodiment of the present invention, which can be taken as a preferred implementation of the embodiment shown in fig. 1. As shown in fig. 2, the method comprises the steps of:
Step S202, acquiring the webpage content of the webpage to be identified.
Step S204, parsing the webpage content with an HTML parser.
Step S206, acquiring the calculation indexes of the link calculation module, the URL calculation module and the content calculation module according to the analysis result.
The calculation indexes are calculation parameters that are input into the link calculation module, the URL calculation module and the content calculation module. The algorithms for calculation are stored in each calculation module in advance, and a first calculated value (corresponding to the link calculation module), a second calculated value (corresponding to the URL calculation module) and a third calculated value (corresponding to the content calculation module) can be obtained respectively by inputting the corresponding calculation indexes into each module.
The user can improve the accuracy of the navigation webpage identification by adding calculation indexes. The following is the format of the configuration file for the link calculation module, the content calculation module, or the URL calculation module:
<ModuleName> // Note: the module name, e.g., Link Module
<Item> // Note: a single calculation index, e.g., the total number of links
<Name>Name</Name>
<Type>Type</Type>
<Value>Reference value</Value>
<Score>Score</Score>
</Item>
</ModuleName>
The judgment index can be expanded according to the configuration file format, and the three calculation modules calculate corresponding values according to the configuration file.
Step S208, a first calculated value is obtained according to the calculation index of the link calculation module.
The calculation indexes of the link calculation module may include: the total number of links and the number of feature links of the webpage to be identified. Fig. 3 is a schematic diagram of a feature link in an identification method of a navigation web page according to an embodiment of the present invention. As shown in FIG. 3, the "more" link 1 is the above-mentioned feature link. Since feature links have already been described in the embodiment shown in fig. 1, they are not described here again.
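As an illustration of how the two link indexes might be obtained from a parsed page, the sketch below uses Python's standard html.parser module. It rests on assumptions not stated in the source: a link is any a element with an href attribute, and a feature link is recognized by its anchor text being exactly "more" (case-insensitive). The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class LinkCounter(HTMLParser):
    """Count all links and the links whose anchor text marks them as feature links."""

    def __init__(self, feature_texts=("more",)):
        super().__init__()
        self.feature_texts = {text.lower() for text in feature_texts}
        self.total_links = 0
        self.feature_links = 0
        self._in_anchor = False
        self._anchor_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a" and dict(attrs).get("href"):
            self.total_links += 1
            self._in_anchor = True
            self._anchor_text = []

    def handle_data(self, data):
        if self._in_anchor:
            self._anchor_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_anchor:
            text = "".join(self._anchor_text).strip().lower()
            if text in self.feature_texts:
                self.feature_links += 1
            self._in_anchor = False


def count_links(html, feature_texts=("more",)):
    """Return (total number of links, number of feature links)."""
    parser = LinkCounter(feature_texts)
    parser.feed(html)
    parser.close()
    return parser.total_links, parser.feature_links
```

In this sketch a feature link is also counted in the total number of links, since it is itself a web page link.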
The configuration file for the link calculation module contains the following content:
<links>
<link>
<Name>Count</Name>
<Type>GreaterThan</Type>
<Value>25</Value>
<Score>20</Score>
</link>
<link>
<Name>more</Name>
<Type>Exist</Type>
<Value>1</Value>
<Score>10</Score>
</link>
</links>
The above configuration file indicates that if there are more than 25 links in the web page, the first calculated value is increased by 20, and if there are "more" links, the first calculated value is increased by 10 for each "more" link.
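A hedged sketch of how a link calculation module might evaluate such a configuration is shown below, using Python's xml.etree.ElementTree. The rule semantics are assumptions inferred from the explanation above: a GreaterThan rule on Count adds its score once when the total number of links exceeds Value, and an Exist rule adds its score once per matching feature link, with Value read as the minimum number of such links required. The function name and the dictionary argument are hypothetical.

```python
import xml.etree.ElementTree as ET

LINK_CONFIG = """
<links>
  <link><Name>Count</Name><Type>GreaterThan</Type><Value>25</Value><Score>20</Score></link>
  <link><Name>more</Name><Type>Exist</Type><Value>1</Value><Score>10</Score></link>
</links>
"""


def first_calculated_value(config_xml, total_links, feature_link_counts):
    """Evaluate the link rules; feature_link_counts maps a link name to its count."""
    total = 0
    for rule in ET.fromstring(config_xml.strip()).findall("link"):
        name = rule.findtext("Name").strip()
        rule_type = rule.findtext("Type").strip()
        value = int(rule.findtext("Value"))
        score = int(rule.findtext("Score"))
        if rule_type == "GreaterThan" and name == "Count":
            if total_links > value:
                total += score
        elif rule_type == "Exist":
            count = feature_link_counts.get(name, 0)
            if count >= value:          # assumption: Value is a minimum count
                total += score * count  # one score per matching feature link
    return total


print(first_calculated_value(LINK_CONFIG, 30, {"more": 2}))
```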
It should be noted that the configuration files of the URL calculation module and the content calculation module have the same format as the configuration file of the link calculation module; only their contents (the calculation indexes involved) are different.
Step S210, obtaining a second calculation value according to the calculation index of the URL calculation module.
The calculation indexes of the URL calculation module may include: the number of levels of the URL and whether the URL includes the first information. The first information here is the same as the first information in the embodiment shown in fig. 1, and is not described here again.
Step S212, a third calculation value is obtained according to the calculation index of the content calculation module.
The calculation indexes of the content calculation module may include whether the web page to be identified includes the second information, which is the same as the second information in the embodiment shown in fig. 1. Fig. 4 is a schematic diagram of the second information in the identification method of the navigation web page according to the embodiment of the present invention. As shown in fig. 4, the keyword "font" 3, the keyword "share" 4, and the keyword "source" 5 are all second information, and may be used to identify the web page content features of the web page to be identified. The second information has already been described in the embodiment shown in fig. 1, and is not described again here.
The accuracy of the navigation webpage identification can be improved by adding the judgment on the second information, and the method has good expansibility and can add or modify configuration at any time.
The above steps S208 to S212 do not limit the specific execution sequence.
Step S214, obtaining the sum of the first calculated value, the second calculated value, and the third calculated value.
Step S216, detecting whether the sum of the first calculated value, the second calculated value and the third calculated value is greater than a preset value.
Step S218, if it is detected that the sum of the first calculated value, the second calculated value and the third calculated value is greater than the preset value, determining that the web page to be identified is a navigation web page.
The preset value in this step can be used as a standard value for judgment, when the sum of the first calculated value, the second calculated value and the third calculated value exceeds the preset value, the web page to be identified is determined as a navigation web page, and when the sum of the first calculated value, the second calculated value and the third calculated value does not exceed the preset value, the web page to be identified is determined as a content web page.
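Steps S208 to S218 can be tied together in a short sketch. The three module functions below reuse the example scoring rules from the embodiment of fig. 1 purely as assumptions; in the flow of fig. 2 the actual rules would come from each module's configuration file. All function names and the default preset value of 30 are hypothetical.

```python
def link_module(total_links, feature_links):
    """First calculated value (assumed rules: 20 if more than 20 links, 10 per feature link)."""
    return (20 if total_links > 20 else 0) + feature_links * 10


def url_module(levels, url_has_content):
    """Second calculated value (assumed rules: -5 per level, -10 or +10 for 'content')."""
    return levels * -5 + (-10 if url_has_content else 10)


def content_module(page_has_source):
    """Third calculated value (assumed rule: -5 or +5 for 'source')."""
    return -5 if page_has_source else 5


def classify(first_value, second_value, third_value, preset_value=30):
    """Steps S214 to S218: sum the three values and compare with the preset value."""
    total = first_value + second_value + third_value
    return "navigation" if total > preset_value else "content"


print(classify(link_module(25, 2), url_module(2, False), content_module(False)))
```

Because the three values are computed independently, the modules can run in any order, which matches the statement that steps S208 to S212 are not bound to a specific execution sequence.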
The method for identifying the navigation webpage in this embodiment comprises steps S202 to S218: the first calculated value, the second calculated value and the third calculated value are calculated by the link calculation module, the URL calculation module and the content calculation module respectively, and the sum of the three calculated values is then compared with the preset value to judge whether the webpage to be identified is a navigation webpage. This solves the problem of poor accuracy in identifying navigation webpages and achieves the effect of improving that accuracy.
It should be noted that the steps illustrated in the flowcharts of the figures may be executed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order different from the one presented here.
In order to achieve the above object, according to another aspect of the present invention, an embodiment of the present invention further provides a storage medium, where the storage medium includes a stored program, and when the program runs, a device on which the storage medium is located is controlled to execute the above method for identifying a navigation webpage.
In order to achieve the above object, according to another aspect of the present invention, an embodiment of the present invention further provides a processor, where the processor is configured to execute a program, where the program executes the above method for identifying a navigation webpage.
In the following, according to an embodiment of the present invention, an apparatus for identifying a navigation web page is provided.
It should be noted that the identification apparatus for a navigation web page according to the embodiment of the present invention may execute the identification method for a navigation web page according to the embodiment of the present invention, and the identification method for a navigation web page according to the embodiment of the present invention may also be executed by the identification apparatus for a navigation web page according to the embodiment of the present invention.
Fig. 5 is a schematic diagram of an apparatus for identifying a navigation web page according to an embodiment of the invention. As shown in fig. 5, the apparatus includes: an acquisition unit 10, an analysis unit 20, a determination unit 30 and a judgment unit 40.
The acquiring unit 10 is used for acquiring the web page content of the web page to be identified.
The parsing unit 20 is configured to parse the web page content to obtain a parsing result.
The determining unit 30 is configured to determine a first link number and a second link number according to the analysis result, where the first link number is the number of web page links included in the web page to be identified, the second link number is the number of feature links included in the web page to be identified, and the feature links are web page links used for expanding the number of links of the web page to be identified.
And the judging unit 40 is used for judging whether the webpage to be identified is the navigation webpage or not according to the first link number and the second link number.
In the identification device for the navigation web page according to this embodiment, the obtaining unit 10 obtains the web page content of the web page to be identified; the analysis unit 20 analyzes the web page content to obtain an analysis result; the determining unit 30 determines a first link number and a second link number according to the analysis result, where the first link number is the number of web page links included in the web page to be identified, the second link number is the number of feature links included in the web page to be identified, and the feature links are web page links used for expanding the number of links of the web page to be identified; and the judging unit 40 judges whether the webpage to be identified is a navigation webpage according to the first link number and the second link number. This solves the problem of poor accuracy in identifying navigation webpages and achieves the effect of improving that accuracy.
Preferably, the judging unit 40 includes: the first obtaining module is used for obtaining a first target value according to the first link number; the second acquisition module is used for acquiring a second target value according to the second link number; the comparison module is used for comparing the sum of the first target value and the second target value with a first preset value to obtain a first comparison result; and the judging module is used for judging whether the webpage to be identified is the navigation webpage or not according to the first comparison result, wherein if the first comparison result is that the sum of the first target value and the second target value is greater than a first preset value, the webpage to be identified is determined to be the navigation webpage, and if the first comparison result is that the sum of the first target value and the second target value is not greater than the first preset value, the webpage to be identified is determined not to be the navigation webpage.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and they may alternatively be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, or fabricated separately as individual integrated circuit modules, or fabricated as a single integrated circuit module from multiple modules or steps. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method for identifying a navigation webpage is characterized by comprising the following steps:
acquiring webpage content of a webpage to be identified;
analyzing the webpage content to obtain an analysis result;
determining a first link number and a second link number according to the analysis result, wherein the first link number is the number of web page links contained in the web page to be identified, the second link number is the number of feature links contained in the web page to be identified, the feature links are web page links for expanding the link number of the web page to be identified, and the feature links are used for displaying hidden links under the condition of receiving clicks; and
judging whether the webpage to be identified is a navigation webpage or not according to the sum of the target value corresponding to the first link number and the target value corresponding to the second link number;
wherein, judging whether the webpage to be identified is the navigation webpage according to the sum of the target value corresponding to the first link number and the target value corresponding to the second link number comprises:
acquiring a first target value according to the first link number;
acquiring a second target value according to the second link number;
comparing the sum of the first target value and the second target value with a first preset value to obtain a first comparison result; and
judging whether the webpage to be identified is the navigation webpage or not according to the first comparison result,
if the first comparison result is that the sum of the first target value and the second target value is greater than the first preset value, the webpage to be identified is determined to be the navigation webpage, and if the first comparison result is that the sum of the first target value and the second target value is not greater than the first preset value, the webpage to be identified is determined not to be the navigation webpage.
2. The method of claim 1,
before judging whether the webpage to be identified is a navigation webpage according to the first link number and the second link number, the method further comprises the following steps: determining a uniform resource locator contained in the webpage to be identified according to the analysis result; determining a number of levels of the uniform resource locator,
judging whether the webpage to be identified is a navigation webpage or not according to the first link number and the second link number comprises the following steps: and judging whether the webpage to be identified is the navigation webpage or not according to the first link number, the second link number and the hierarchy number.
3. The method of claim 2,
before determining whether the web page to be identified is the navigation web page according to the first link number, the second link number and the hierarchy number, the method further includes: judging whether the uniform resource locator contains first information or not to obtain a first judgment result, wherein the first information is information for identifying the webpage category characteristics of the webpage to be identified,
judging whether the webpage to be identified is the navigation webpage or not according to the first link number, the second link number and the hierarchy number comprises the following steps: and judging whether the webpage to be identified is the navigation webpage or not according to the first link number, the second link number, the hierarchy number and the first judgment result.
4. The method of claim 3, wherein
before judging whether the webpage to be identified is the navigation webpage according to the first link number, the second link number, the number of levels and the first judgment result, the method further comprises: judging whether the webpage to be identified contains second information according to the analysis result to obtain a second judgment result, wherein the second information is information identifying a webpage content characteristic of the webpage to be identified; and
judging whether the webpage to be identified is the navigation webpage according to the first link number, the second link number, the number of levels and the first judgment result comprises: judging whether the webpage to be identified is the navigation webpage according to the first link number, the second link number, the number of levels, the first judgment result and the second judgment result.
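The second judgment result can be sketched as a check on the visible text of the parsed page. The claim does not specify what the "second information" is; the strings in `CONTENT_MARKERS` (pagination-style phrases) are assumptions, as is the helper name `page_has_content_marker`.

```python
from html.parser import HTMLParser

# Hypothetical examples of "second information"; the claims do not enumerate any.
CONTENT_MARKERS = ("next page", "more")


class _TextCollector(HTMLParser):
    """Collect the text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def page_has_content_marker(html: str, markers=CONTENT_MARKERS) -> bool:
    """Return True if the page text contains any assumed content marker."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    text = " ".join(collector.chunks).lower()
    return any(marker in text for marker in markers)
```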
5. The method of claim 4, wherein judging whether the webpage to be identified is the navigation webpage according to the first link number, the second link number, the number of levels, the first judgment result and the second judgment result comprises:
acquiring a first target value according to the first link number;
acquiring a second target value according to the second link number;
acquiring a third target value according to the number of levels;
acquiring a fourth target value according to the first judgment result;
acquiring a fifth target value according to the second judgment result;
comparing the sum of the first target value, the second target value, the third target value, the fourth target value and the fifth target value with a second preset value to obtain a second comparison result; and
judging whether the webpage to be identified is the navigation webpage according to the second comparison result,
wherein if the second comparison result is that the sum of the first target value, the second target value, the third target value, the fourth target value and the fifth target value is greater than the second preset value, the webpage to be identified is determined to be the navigation webpage, and if the second comparison result is that the sum is not greater than the second preset value, the webpage to be identified is determined not to be the navigation webpage.
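The five-feature rule of claim 5 can be sketched as a sum of per-feature scores compared against a threshold. Every threshold, score, and the default `second_preset_value` below is an assumption for illustration; the claim only requires that each feature yields a target value and that the sum is compared with the second preset value.

```python
from dataclasses import dataclass


@dataclass
class PageFeatures:
    first_link_number: int       # number of webpage links
    second_link_number: int      # number of feature links
    level_count: int             # number of URL levels
    url_has_first_info: bool     # first judgment result
    page_has_second_info: bool   # second judgment result


def target_values(f: PageFeatures) -> list:
    """Map each feature to a target value (all thresholds and scores are hypothetical)."""
    return [
        1.0 if f.first_link_number >= 50 else 0.0,
        0.5 if f.second_link_number >= 1 else 0.0,
        0.5 if f.level_count <= 1 else 0.0,
        0.5 if f.url_has_first_info else 0.0,
        0.5 if f.page_has_second_info else 0.0,
    ]


def is_navigation_page_five(f: PageFeatures,
                            second_preset_value: float = 1.5) -> bool:
    # Strictly greater than, as stated in the claim.
    return sum(target_values(f)) > second_preset_value
```

An additive score keeps each feature independent, so a feature can be added or dropped (as in claims 2, 6 and 7) by changing only the list of target values and the preset value.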
6. The method of claim 1, wherein
before judging whether the webpage to be identified is a navigation webpage according to the first link number and the second link number, the method further comprises: determining a uniform resource locator contained in the webpage to be identified according to the analysis result; and judging whether the uniform resource locator contains first information to obtain a first judgment result, wherein the first information is information identifying a webpage category characteristic of the webpage to be identified; and
judging whether the webpage to be identified is a navigation webpage according to the first link number and the second link number comprises: judging whether the webpage to be identified is the navigation webpage according to the first link number, the second link number and the first judgment result.
7. The method of claim 1, wherein
before judging whether the webpage to be identified is a navigation webpage according to the first link number and the second link number, the method further comprises: judging whether the webpage to be identified contains second information according to the analysis result to obtain a second judgment result, wherein the second information is information identifying a webpage content characteristic of the webpage to be identified; and
judging whether the webpage to be identified is a navigation webpage according to the first link number and the second link number comprises: judging whether the webpage to be identified is the navigation webpage according to the first link number, the second link number and the second judgment result.
8. An apparatus for identifying a navigation web page, comprising:
an acquisition unit, configured to acquire webpage content of a webpage to be identified;
an analysis unit, configured to analyze the webpage content to obtain an analysis result;
a determining unit, configured to determine a first link number and a second link number according to the analysis result, wherein the first link number is the number of webpage links contained in the webpage to be identified, the second link number is the number of feature links contained in the webpage to be identified, the feature links are webpage links used for expanding the number of links of the webpage to be identified, and the feature links are used for displaying hidden links when a click is received; and
a judging unit, configured to judge whether the webpage to be identified is a navigation webpage according to the sum of the target value corresponding to the first link number and the target value corresponding to the second link number;
wherein the judging unit comprises:
a first obtaining module, configured to obtain a first target value according to the first link number;
a second obtaining module, configured to obtain a second target value according to the second link number;
a comparison module, configured to compare the sum of the first target value and the second target value with a first preset value to obtain a first comparison result; and
a judging module, configured to judge whether the webpage to be identified is the navigation webpage according to the first comparison result,
wherein if the first comparison result is that the sum of the first target value and the second target value is greater than the first preset value, the webpage to be identified is determined to be the navigation webpage, and if the first comparison result is that the sum of the first target value and the second target value is not greater than the first preset value, the webpage to be identified is determined not to be the navigation webpage.
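The units of the apparatus in claim 8 can be sketched as a small pipeline over already-acquired HTML. This is only an illustration: the acquisition unit is replaced by a string argument, a feature link is assumed to be an `<a>` element carrying an `onclick` attribute (the claim does not say how feature links are recognized), and the thresholds and scores are hypothetical.

```python
from html.parser import HTMLParser


class LinkCounter(HTMLParser):
    """Analysis and determining units: parse HTML and count the two link numbers."""

    def __init__(self):
        super().__init__()
        self.first_link_number = 0   # all webpage links
        self.second_link_number = 0  # feature links (assumed: <a> with onclick)

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        if "href" in attributes:
            self.first_link_number += 1
        # Assumption: a link that reveals hidden links has a click handler.
        if "onclick" in attributes:
            self.second_link_number += 1


def identify(html: str, link_threshold: int = 3,
             first_preset_value: float = 1.0) -> bool:
    """Judging unit: sum the two target values and compare with the preset value."""
    counter = LinkCounter()
    counter.feed(html)
    counter.close()
    first_target = 1.0 if counter.first_link_number >= link_threshold else 0.0
    second_target = 0.5 if counter.second_link_number >= 1 else 0.0
    return (first_target + second_target) > first_preset_value
```

In a fuller implementation the HTML string would come from a separate fetching step standing in for the acquisition unit.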
9. A storage medium, characterized in that the storage medium comprises a stored program, wherein when the program runs, a device in which the storage medium is located is controlled to execute the method for identifying a navigation webpage according to any one of claims 1 to 7.
10. A processor, characterized in that the processor is configured to run a program, wherein the program, when run, executes the method for identifying a navigation webpage according to any one of claims 1 to 7.
CN201510463490.3A 2015-07-31 2015-07-31 Navigation webpage identification method and device Active CN106407217B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510463490.3A CN106407217B (en) 2015-07-31 2015-07-31 Navigation webpage identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510463490.3A CN106407217B (en) 2015-07-31 2015-07-31 Navigation webpage identification method and device

Publications (2)

Publication Number Publication Date
CN106407217A CN106407217A (en) 2017-02-15
CN106407217B CN106407217B (en) 2019-12-24

Family

ID=58007750

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510463490.3A Active CN106407217B (en) 2015-07-31 2015-07-31 Navigation webpage identification method and device

Country Status (1)

Country Link
CN (1) CN106407217B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102810097B (en) * 2011-06-02 2016-03-02 高德软件有限公司 Webpage text content extracting method and device
US8990958B2 (en) * 2012-08-31 2015-03-24 Salesforce.Com, Inc. Systems and methods for content management in an on demand environment
CN103077254B (en) * 2013-02-06 2017-11-03 人民日报媒体技术股份有限公司 Webpage acquisition methods and device
CN103218452B (en) * 2013-04-27 2016-08-10 人民搜索网络股份公司 A kind of method and apparatus identifying effectively link in Hub page
CN104182482B (en) * 2014-08-06 2018-05-22 中国科学院计算技术研究所 A kind of news list page determination methods and the method for screening news list page

Also Published As

Publication number Publication date
CN106407217A (en) 2017-02-15

Similar Documents

Publication Publication Date Title
US8185530B2 (en) Method and system for web document clustering
US20030040887A1 (en) System and process for constructing and analyzing profiles for an application
US20110167063A1 (en) Techniques for categorizing web pages
US20080134015A1 (en) Web Site Structure Analysis
EP3289487B1 (en) Computer-implemented methods of website analysis
CN103617213B (en) Method and system for identifying newspage attributive characters
CN107015839B (en) Method and device for realizing front-end event agent
Pant Deriving link-context from HTML tag tree
CN104320312A (en) Network application safety test tool and fuzz test case generation method and system
CN104281629A (en) Method and device for extracting picture from webpage and client equipment
CN113918794A (en) Enterprise network public opinion benefit analysis method, system, electronic equipment and storage medium
US9749352B2 (en) Apparatus and method for collecting harmful website information
Alarte et al. Page-level main content extraction from heterogeneous webpages
CN110781497B (en) Method for detecting web page link and storage medium
EP2937801B1 (en) Harmful site collection device and method
Alarte et al. Webpage menu detection based on DOM
CN106407217B (en) Navigation webpage identification method and device
KR20120090131A (en) Method, system and computer readable recording medium for providing search results
Behfarshad et al. Hidden-web induced by client-side scripting: An empirical study
Alam et al. A data-driven score model to assess online news articles in event-based surveillance system
CN110825976B (en) Website page detection method and device, electronic equipment and medium
CN104063506A (en) Method and device for identifying repeated web pages
US20160092458A1 (en) System for automatically generating wrapper for entire websites
CA3069382C (en) Multi-document intersection acquisition method and document server
KR20150054322A (en) Apparatus for colleting of harmful sites and method thereof

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing

Applicant after: Beijing Guoshuang Technology Co.,Ltd.

Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu District, Haidian District, Beijing

Applicant before: Beijing Guoshuang Technology Co.,Ltd.

GR01 Patent grant