WO2023206988A1 - Web page processing method and apparatus for a website, electronic device, and storage medium - Google Patents

Web page processing method and apparatus for a website, electronic device, and storage medium

Info

Publication number
WO2023206988A1
WO2023206988A1 PCT/CN2022/126010 CN2022126010W WO2023206988A1 WO 2023206988 A1 WO2023206988 A1 WO 2023206988A1 CN 2022126010 W CN2022126010 W CN 2022126010W WO 2023206988 A1 WO2023206988 A1 WO 2023206988A1
Authority
WO
WIPO (PCT)
Prior art keywords
website
weight
web page
quality
page
Prior art date
Application number
PCT/CN2022/126010
Other languages
English (en)
French (fr)
Inventor
刘伟
林赛群
Original Assignee
北京百度网讯科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京百度网讯科技有限公司
Publication of WO2023206988A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/20 Software design

Definitions

  • The present disclosure relates to the field of computer technology, and in particular to a web page processing method and device for a website, an electronic device, and a storage medium in the field of big data.
  • At present, website quality is usually judged by setting a threshold and comparing it with the proportion of high-quality pages or the proportion of low-quality pages in the website's statistics.
  • This determination method is too simple, so the accuracy of website quality judgment is low.
  • The present disclosure provides a web page processing method, device, electronic device and storage medium for a website.
  • A web page processing method of a website may include: obtaining multiple web pages of the website, where the multiple web pages are used to construct the website; determining the weight of each web page based on the association between the multiple web pages, where the weight is used to characterize the contribution proportion of each web page to the website; determining the website quality of the website based on the weight of each web page and the page quality of each web page; and grading the website based on the website quality to obtain the quality level of the website.
  • A web page processing device for a website is also provided. The device may include one or more processors and one or more memories storing program units, where the program units are executed by the processors and include: an acquisition component configured to acquire multiple web pages of the website, where the multiple web pages are used to build the website; a first determination component configured to determine the weight of each web page based on the association between the multiple web pages, where the weight is used to represent the contribution proportion of each web page to the website; a second determination component configured to determine the website quality of the website based on the weight of each web page and the page quality of each web page; and a grading component configured to grade the website based on the website quality to obtain the quality level of the website.
  • An electronic device may include: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the web page processing method of the website according to the embodiment of the present disclosure.
  • A non-transitory computer-readable storage medium stores computer instructions, where the computer instructions are used to cause a computer to execute the web page processing method of the website according to the embodiment of the present disclosure.
  • A computer program product may include a computer program which, when executed by a processor, implements the web page processing method of a website according to an embodiment of the present disclosure.
  • Figure 1 is a flow chart of a web page processing method of a website according to an embodiment of the present disclosure.
  • Figure 2 is a schematic diagram of a site structure diagram according to an embodiment of the present disclosure.
  • Figure 3 is a schematic diagram of a target structure tree according to an embodiment of the present disclosure.
  • Figure 4 is a schematic diagram of a web page processing device of a website according to an embodiment of the present disclosure.
  • Figure 4A is a schematic diagram of a non-transitory computer-readable storage medium storing computer instructions according to an embodiment of the present disclosure.
  • Figure 4B is a schematic diagram of a computer program product according to an embodiment of the present disclosure.
  • Figure 5 is a block diagram of an electronic device for the web page processing method of a website according to an embodiment of the present disclosure.
  • Figure 1 is a flow chart of a web page processing method for a website according to an embodiment of the present disclosure. As shown in Figure 1, the method may include the following steps:
  • Step S102: Obtain multiple web pages of the website, where the multiple web pages are used to build the website.
  • The website can be composed of multiple web pages, which can be obtained by acquiring multiple historical web pages of the website. A web page can be, for example, a list page, an index page, or a content page, and can be further subdivided into article, video, forum, blog, download, picture, and Q&A pages. There are no specific restrictions here.
  • Step S104: Determine the weight of each web page based on the association between multiple web pages, where the weight is used to represent the contribution proportion of each web page to the website.
  • The association relationship between the multiple web pages is determined, and the weight of each web page is determined based on it. The weight can be the contribution proportion of each web page to the website, which characterizes how much the page contributes to the website in the process of building it.
  • The association between web pages can be the order in which the web pages appear, an inclusion relationship, and so on.
  • The method for determining the degree of contribution is not specifically limited here.
  • Step S106: Determine the website quality of the website based on the weight of each web page and the page quality of each web page.
  • After the weight of each web page is determined, the website quality is determined from the weight and the page quality of each web page.
  • The page quality can include cheating, low-quality, ordinary, and high-quality, and can be represented by a field value (for example, quality_info).
  • The field value can be set to a continuous value or a discrete value, for example 0, 1, 2, 3. The website quality represents how good the website is and can be expressed by a website quality field value (for example, site_info), which marks the quality of the website, for example with 0, 1, 2, 3. There are no specific restrictions on the expression method here.
  • The website quality can be determined based on the weight and page quality of all web pages of the website, or only a sample of the website's nodes can be selected, thereby saving operating costs.
  • The quality or level of a website is generally determined statistically based on the quality of historical web pages in the website. The statistic can be the proportion of high-quality pages or the proportion of low-quality pages, which is then compared against a set threshold. This judgment method is too crude and does not take into account how much different pages contribute to the site.
  • The embodiment of the present disclosure proposes a method that determines the weight of each web page based on the structural distribution of pages within the site and, by adjusting with these weights, determines the quality and grade of the site more accurately.
  • Step S108: Grade the website based on the website quality to obtain the quality level of the website.
  • After the quality of the website is determined, the website is graded based on it, thereby obtaining the quality level of the website.
  • In summary, multiple web pages of the website are obtained, where the multiple web pages are used to construct the website; the weight of each web page is determined based on the association between the multiple web pages, where the weight is used to characterize the contribution proportion of each web page to the website; the website quality is determined based on the weight of each web page and the page quality of each web page; and the website is graded based on the website quality to obtain its quality level. That is to say, the embodiment of the present disclosure determines the quality of the website jointly from the weights that different web pages contribute to the website and the page quality of those pages. This achieves the technical effect of improving the accuracy of website quality determination and solves the technical problem of low accuracy in judging website quality.
  • In step S104, determining the weight of each web page based on the association between multiple web pages includes: obtaining a target structure tree of the website, where the target structure tree is used to represent the association relationship between the multiple web pages, and the nodes of the target structure tree are used to represent web pages; and determining the weight of each web page based on the target structure tree.
  • The association relationship between the multiple web pages is determined, and the target structure tree of the website is determined based on it. The weight of each web page can then be determined based on the position of its node in the target structure tree. The target structure tree can be a site structure diagram that represents the association between the multiple web pages.
  • A web page can be regarded as a node, and the nodes of the target structure tree correspond to the web pages one-to-one.
  • The target structure tree of the website is obtained based on the association between web pages.
  • The target structure tree starts from the homepage of the website, points from the homepage to the web pages reached from the homepage, and so on.
  • Determining the weight of each web page based on the target structure tree includes: determining the weight of each web page based at least on the attribute information of each web page in the target structure tree, where the attributes of the nodes of the target structure tree are used to represent the corresponding attribute information.
  • The weight of a web page can be determined based on its attribute information, which can include the web page type, web page quality, edge type, and corresponding structural information of the web page.
  • This embodiment may determine the attribute information of a node by determining the attribute information of the web page corresponding to the node.
  • Web page types may include list pages (i.e., index pages) and content pages, and may further include articles, videos, forums, blogs, downloads, pictures, Q&A, etc. Page quality may include cheating, low-quality, normal, and high-quality, where the page quality field value (for example, quality_info) can be set to a continuous value or a discrete value to distinguish page quality.
  • This embodiment performs comprehensive statistics on nodes based on the attribute information of web pages, thereby achieving accurate scoring of web pages.
  • There are many methods for judging the quality and type of web pages. This is only an example; there are no specific restrictions on the method of judging page quality and type.
  • The edge types may include diversion edges, jump edges, adaptation edges, etc.
  • A diversion edge means that clicking on one page leads to another page. For example, if page A has a link pointing to page B, the link from page A to page B is a diversion edge.
  • A jump edge means an automatic jump from one page to another (that is, page A automatically jumps to page B), for example because of a domain name change.
  • An adaptation edge is an adaptation relationship between two pages. For example, a computer page can automatically jump to a mobile site, or a web link can automatically jump to a program.
  • This embodiment can set different weights according to the web page type, quality, edge type, and structural information. For example, assuming that the weight of the list page type is 0.5, which is higher than the weight of the content page type, then when the type of a web page is list page, the weight of the web page is 0.5. This achieves the purpose of determining the weight of each web page based on the attribute information of each web page in the target structure tree.
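  • The type-based weighting above can be sketched as a simple lookup. This is a minimal illustration, not the disclosed implementation: only the value 0.5 for list pages comes from the text, while the content-page value, the default value, and the names `TYPE_WEIGHTS` and `type_weight` are assumptions.

```python
# Hypothetical lookup table for the node type weight (w1).
# 0.5 for list pages is the example value from the text;
# 0.3 for content pages is an assumed lower value.
TYPE_WEIGHTS = {"list": 0.5, "content": 0.3}


def type_weight(pagetype_info: str, default: float = 0.1) -> float:
    """Return w1 for a page type, using an assumed default for unknown types."""
    return TYPE_WEIGHTS.get(pagetype_info, default)
```

  • Keeping the values in one table makes it easy to tune them "based on actual needs", as the text allows.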
  • Determining the weight of each web page based at least on the attribute information of each web page in the target structure tree includes: determining the first weight of each web page based on the attribute information, where the weight includes the first weight; determining the second weight of each web page based on the target association relationship between each web page and its associated web page in the target structure tree, where the weight includes the second weight and the target association relationship is used to characterize the processing sequence between the corresponding web page and the associated web page; and determining the third weight of each web page based on the depth information of each web page relative to the homepage of the website in the target structure tree, where the weight includes the third weight.
  • The first weight can include the node type weight (for example, w1) and the node quality weight (for example, w2), which can be set based on actual needs.
  • The target association relationship can be a jump relationship, a diversion relationship, etc. The second weight can be the node edge weight (for example, w3), which can be determined based on the edge type.
  • The target association relationship between each web page in the target structure tree and its associated web page can be determined, for example based on the distance between each web page and the associated web page.
  • The depth information (for example, deep_info) of each web page relative to the homepage of the website in the target structure tree can be determined, and the third weight is determined based on the depth information.
  • The third weight can be the node structure weight (for example, w4), whose value can range from 0 to 1, so that as the depth information increases, the value contributed by the node decreases.
  • For each additional level between a web page and the homepage, the value of the depth information can be increased by 1.
  • The attribute information includes the type of the corresponding web page and/or the page quality of the web page.
  • The type of the web page corresponding to each node of the target structure tree is determined, and the attribute information of the node is determined based on it, where the attribute information includes the type of the corresponding web page and/or the page quality of the web page.
  • The types of web pages may include list pages (for example, index pages) and content pages, and may further include articles, videos, forums, blogs, downloads, pictures, questions and answers, etc., where the page type may be represented by a field value.
  • The field value of the page type (for example, pagetype_info) can be a flag bit, which can be used for weight filtering.
  • The page quality of a web page can include cheating, low quality, ordinary, and high quality, and can be represented by a field value.
  • The field value of the page quality (for example, quality_info) can be set to a continuous value or a discrete value to distinguish page quality. For example, 0, 1, 2, and 3 respectively represent different page qualities.
  • There are multiple methods for judging the quality and type of web pages, and the embodiments of this disclosure do not specifically limit them.
  • The target association relationship is used to represent at least one of the following relationships between a web page and an associated web page: the web page adapts to the associated web page, the web page jumps to the associated web page, and the web page directs to the associated web page.
  • The target association relationship can be a relationship between edges within the site. The edges within the site can be represented by field values; the field value of an edge (for example, edge_info) can be a flag bit, which can be used to filter weights.
  • A web page adapting to an associated web page corresponds to an adaptation edge, meaning the two pages have an adaptation relationship; for example, a computer page automatically jumps to the mobile site, or a web page link automatically jumps to a program.
  • A web page jumping to an associated web page corresponds to a jump edge, which is an automatic jump from one page to another (for example, page A automatically jumps to page B), for example because of a domain name change.
  • A web page directing to an associated web page corresponds to a diversion edge, where clicking on one page leads to another page; for example, page A has a link pointing to page B.
  • In step S106, determining the website quality of the website based on the weight of each web page and the page quality of each web page includes: adjusting the web page quality of each web page based on the first weight and the second weight; and converting the adjusted web page quality of each web page into the website quality of the website based on the third weight and the depth information.
  • The web page quality may be represented by the information value of the node (for example, node_info), which is determined from the first weight and the second weight of each web page. The website quality can then be determined from the obtained node information.
  • Converting the adjusted web page quality of each web page into the website quality of the website based on the third weight and the depth information includes: performing an exponential operation on the third weight based on the depth information to obtain a power; obtaining the product of the power and the adjusted web page quality of each web page; and summing the multiple products corresponding to the multiple web pages to obtain the website quality.
  • $\mathrm{node\_struct\_info} = \mathrm{node\_info} \times (w4)^{\mathrm{deep\_info}}$
  • The website quality can be obtained by comprehensive statistics over the page quality of all nodes of the website, and can also be called the comprehensive statistical value (site_info):
  • $\mathrm{site\_info} = \mathrm{sigmoid}\left(\sum \mathrm{node\_struct\_info}\right)$
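  • The depth-discounted sum and sigmoid above can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the `(node_info, deep_info)` pair layout, and the default `w4 = 0.8` are illustrative and not taken from the disclosure; it also assumes node_info values are non-negative.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def site_info(nodes, w4: float = 0.8) -> float:
    """Aggregate per-node values into a site-level score.

    nodes: iterable of (node_info, deep_info) pairs, where deep_info is the
    depth of the node relative to the homepage (assumed layout).
    w4: node structure weight, assumed to lie between 0 and 1.
    """
    # node_struct_info = node_info * w4 ** deep_info, summed over all nodes
    total = sum(node_info * (w4 ** deep_info) for node_info, deep_info in nodes)
    return sigmoid(total)
```

  • With w4 below 1, a page of the same adjusted quality contributes less the deeper it sits in the target structure tree.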
  • Adjusting the web page quality of each web page based on the first weight and the second weight includes: determining the product of the first weight, the second weight, and the web page quality of each web page as the adjusted web page quality of each web page.
  • The web page quality can be represented by the information value of the node (for example, node_info), which can be the product of w1, w2, w3 and the field value of web page quality (for example, quality_info), where w1 and w2 make up the first weight and w3 is the second weight. For example:
  • $\mathrm{node\_info} = w1 \times w2 \times w3 \times \mathrm{quality\_info}$
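  • As a worked instance of the product above, the following sketch computes the adjusted page quality for one node. The function name and the sample weight values are assumptions chosen only for illustration.

```python
def node_info(w1: float, w2: float, w3: float, quality_info: float) -> float:
    """Adjusted page quality: product of type, quality and edge weights
    with the page quality field value."""
    return w1 * w2 * w3 * quality_info


# Hypothetical example: a list page (w1 = 0.5) of the highest quality level
# (quality_info = 3, w2 = 1.0) reached through an edge with w3 = 0.8.
example = node_info(0.5, 1.0, 0.8, 3)
```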
  • Obtaining the target structure tree of the website includes: building the original structure tree of the website based on the attribute information of each web page and the target association relationship between each web page and its associated web pages, where the original structure tree is used to represent all the associations between the multiple web pages, the nodes of the original structure tree are the same as the nodes of the target structure tree, and the target association relationship is used to represent the processing sequence between the corresponding web page and the associated web page; adjusting each child node that has a first number of parent nodes in the original structure tree into a child node that has a second number of parent nodes, where the second number is less than the first number; and constructing the target structure tree based on the child nodes having the second number of parent nodes and the depth information of the child nodes relative to the root node of the original structure tree.
  • The original structure tree can be a structure graph network that represents all associations between the multiple web pages.
  • Because the second number is smaller than the first number, the edges are filtered so that each node keeps only one in-degree edge and redundant edges are removed.
  • The edges within the website can be filtered to discard cyclic edges, thereby turning the cyclic graph into a one-way graph, so that child nodes with the first number of parent nodes in the original structure tree become child nodes with the second number of parent nodes.
  • For example, the homepage points to four lists: above, below, left, and right. Since the homepage already points both to the list on the left and to the list above, the edge from the list above to the list on the left can be omitted.
  • For example, if node A points to node B and node B points back to node A, the edge from node B to node A can be discarded.
  • All nodes may retain only one in-degree edge, or may retain all in-degree edges. There is no specific limitation here.
  • The traversal can be a depth-first traversal or a breadth-first traversal; there are no specific restrictions on the traversal method here.
  • The weights of all in-degree edges of a node can be compared, and the in-degree edges with higher weights are preferentially retained, thereby obtaining the structure tree after filtering the edges.
  • The high-weight in-degree edge of each node can be retained to obtain the target structure tree.
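  • One way to sketch the edge filtering described above is a breadth-first traversal from the homepage that keeps a single parent per node. This is an illustrative sketch, not the disclosed algorithm: the function name, the `(src, dst, weight)` edge layout, and the rule "pick the strongest edge among parents on the shallowest level" are assumptions.

```python
from collections import defaultdict


def build_target_tree(edges, home):
    """Reduce a site graph to a tree with one in-degree edge per node.

    edges: iterable of (src, dst, weight) tuples (assumed layout).
    home:  identifier of the homepage, used as the root.
    Returns a dict mapping each reachable node to its retained parent.
    """
    out_edges = defaultdict(list)
    for src, dst, weight in edges:
        out_edges[src].append((dst, weight))

    parent = {home: None}  # also serves as the set of already placed nodes
    frontier = [home]
    while frontier:
        best = {}  # dst -> (weight, src): strongest edge from the current level
        for src in frontier:
            for dst, weight in out_edges[src]:
                if dst in parent:
                    continue  # edge back to a placed node: cyclic or redundant
                if dst not in best or weight > best[dst][0]:
                    best[dst] = (weight, src)
        for dst, (_, src) in best.items():
            parent[dst] = src
        frontier = list(best)
    return parent
```

  • Because a node only ever receives a parent from an earlier level, the result cannot contain cycles, and an edge such as "list above to list on the left" is dropped once the homepage already reaches both lists.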
  • Based on the quality level of the website, at least one of the following processes is performed on the website: including the website, displaying the website, and adjusting the ranking weight of the web pages of the website.
  • The quality threshold range and the grade threshold range can be set according to the actual situation, and it is determined whether the quality of the website satisfies the quality threshold range, or whether the grade of the website satisfies the grade threshold range.
  • The site can be judged or segmented into grades based on the comprehensive statistical value.
  • If the site as a whole is judged to be low quality or cheating, it can be excluded from inclusion, not displayed, or ranked low.
  • For example, a quality threshold range is set; when the comprehensive statistical value falls within this range, the site is not included, not displayed, or ranked low; otherwise it is included or displayed.
  • The site quality is graded based on the comprehensive statistical value, so that the inclusion, display, and ranking weight of the site can be adjusted.
  • The ranking weight can be used to increase the weight of high-quality websites and reduce the weight of low-quality websites.
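  • The threshold-based grading and the resulting handling can be sketched as below. All names, the three grade labels, the threshold values 0.6 and 0.8, and the ranking multipliers are hypothetical; the text only states that threshold ranges are set according to the actual situation.

```python
def grade_site(site_info: float, low: float = 0.6, high: float = 0.8) -> str:
    """Map the comprehensive statistical value to an assumed three-level grade."""
    if site_info < low:
        return "low"
    if site_info < high:
        return "normal"
    return "high"


def site_actions(grade: str) -> dict:
    """Return assumed inclusion, display and ranking decisions for a grade."""
    rank_multiplier = {"low": 0.5, "normal": 1.0, "high": 1.5}[grade]
    return {
        "include": grade != "low",   # low-grade sites are not included
        "display": grade != "low",   # and not displayed
        "rank_multiplier": rank_multiplier,
    }
```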
  • The quality and grade of a website carry a high weight.
  • The quality or grade of a site is generally determined statistically from the quality of the historical web pages in the site. The statistic can be the proportion of high-quality pages or the proportion of low-quality pages, which is then compared with a set threshold.
  • This judgment method does not take into account how much different web pages contribute to the site, so it is insufficiently accurate and prone to misjudgment.
  • The embodiment of the present disclosure proposes a method of adjusting the weights in the structure tree based on the structural distribution of web pages within the website, thereby determining the quality and level of the site more accurately.
  • Figure 2 is a schematic diagram of a site structure diagram according to an embodiment of the present disclosure.
  • As shown in Figure 2, a page in the website can be used as a node, and a site structure diagram network is constructed based on the type of the page corresponding to each node.
  • The attribute information of a node is determined by determining the attribute information of the page corresponding to the node, where the attribute information of the page may include page type and page quality.
  • Page types may include list pages (for example, index pages) and content pages, and may further include articles, videos, forums, blogs, downloads, pictures, Q&A, etc., where the page type may be represented by a field value.
  • The field value of the page type (for example, pagetype_info) can be a flag bit, which can be used to filter the weight.
  • The page quality can include cheating, low quality, ordinary, and high quality.
  • The page quality can be represented by a field value (for example, quality_info), which can be set to a continuous value or a discrete value to distinguish page quality. For example, 0, 1, 2, and 3 respectively represent different page qualities.
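  • The node attributes above could be held in a small record type, sketched below. The class names, the `url` field, and in particular the assignment of 0 to 3 to the four quality categories are assumptions; the text only says that 0, 1, 2, and 3 represent different page qualities.

```python
from dataclasses import dataclass
from enum import IntEnum


class PageQuality(IntEnum):
    """Assumed discrete encoding of quality_info."""
    CHEATING = 0
    LOW = 1
    ORDINARY = 2
    HIGH = 3


@dataclass
class PageNode:
    """One node of the site structure diagram (hypothetical layout)."""
    url: str
    pagetype_info: str          # e.g. "list" or "content"
    quality_info: PageQuality


home = PageNode("https://example.com/", "list", PageQuality.HIGH)
```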
  • Figure 3 is a schematic diagram of a target structure tree according to an embodiment of the present disclosure.
  • the directed graph is triggered from the homepage, filters the edges in the website, and discards cyclic edges. , to adjust the cyclic graph into a one-way graph.
  • the home page points to the four lists up, down, left, and right. Since the home page points to the list on the left and points to the list above, the list on the home page can be omitted on the side pointing to the list on the left of the home page.
  • node A points to node B
  • node B points to node A
  • the edge from node B to node A can be discarded.
  • the edges may include guiding edges, jump edges, and adapting edges.
  • the guiding edges may be clicks from one page to enter another page. For example, if page A has a link pointing to page B, it is a guiding edge.
  • Streaming edge; jump edge can be an automatic jump from one page to another (i.e., page A automatically jumps to B), for example, domain name change; adaptation edge can be an adaptation relationship between two pages, for example, The computer page can automatically jump to the mobile site, and the web link can also automatically jump to the program, etc.
  • all nodes may retain only one in-degree edge, or may retain all in-degree edges. There is no specific limitation here.
  • the traversal can be a depth traversal or a deep traversal. Breadth traversal, there are no specific restrictions on the traversal method here.
  • the weights of all in-degree edges can be compared, and the in-degree edges with higher weights can be retained first, thereby obtaining the structure tree after filtering the edges.
  • the in-degree edges with high weight of each node can be retained to obtain the target structure tree.
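  • Retaining only the highest-weight in-degree edge of each node can be sketched as follows. The function name `keep_best_in_edge` and the tuple layout are hypothetical; the disclosure states only that in-degree edges with higher weight are retained first.

```python
def keep_best_in_edge(edges):
    """Keep, for every target node, the single in-edge with the highest weight.

    `edges` is an iterable of (source, target, weight) triples.
    Returns {target: (source, weight)}; ties keep the first edge seen.
    """
    best = {}
    for src, dst, weight in edges:
        if dst not in best or weight > best[dst][1]:
            best[dst] = (src, weight)
    return best


edges = [
    ("home", "list", 0.5),   # diversion-style edge (example weight)
    ("other", "list", 0.9),  # stronger edge into the same node
    ("list", "page", 0.5),
]
tree = keep_best_in_edge(edges)
print(tree["list"])  # the parent kept for "list"
```

  • Because every node then has exactly one parent, the result can be read as a tree rooted at the homepage, provided cyclic edges were removed beforehand.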
  • the distance between the node and the homepage can be calculated to obtain the depth information (for example, deep_info).
  • for example, if the homepage directly points to the list above the homepage, the depth information of that list can be 1; if the homepage points to the list on the right through the list above, the depth information of the list on the right can be 2.
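  • The depth information can be obtained with a breadth-first walk from the homepage. The sketch below is an assumption about one workable implementation; `compute_depths` and the `children` mapping are hypothetical names, and only the field name deep_info comes from the text.

```python
from collections import deque


def compute_depths(children, home):
    """Return {page: deep_info}, the number of edges between the page and `home`.

    `children` maps each page to the pages it points to in the structure tree.
    """
    depths = {home: 0}
    queue = deque([home])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in depths:  # first visit gives the shortest distance
                depths[child] = depths[node] + 1
                queue.append(child)
    return depths


# The homepage points to the list above, which points to the list on the right.
children = {"home": ["list_above"], "list_above": ["list_right"]}
print(compute_depths(children, "home"))
```

  • This reproduces the example in the text: the list above has depth 1 and the list on the right has depth 2.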
  • calculating the weight of a node may include calculating the node type weight (for example, w1), the node quality weight (for example, w2), the node edge weight (for example, w3), and the node structure weight (for example, w4).
  • the node type weight (for example, w1) can be determined based on different page types.
  • the weight of the list page can be set to be higher than that of the content page.
  • node quality weights may be determined based on different page quality levels.
  • the higher the quality the higher the weight; if used for mining of low quality, the opposite can be true.
  • node edge weights (for example, w3) can be determined based on different edge types.
  • the weights can be set such that the weight of the adaptation edge > the weight of the jump edge > the weight of the diversion edge, where the adaptation edge represents adaptation of the same content, the jump edge represents a strong jump relationship, and the diversion edge represents a weak diversion relationship.
  • the node structure weight (for example, w4) can be determined based on the depth information, where the value range of the weight can be 0 to 1, so that the value of the node becomes lower as the depth information increases.
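  • The orderings described above can be captured in simple lookup tables. All numeric values below are hypothetical examples; the disclosure fixes only the orderings (list page above content page, adaptation above jump above diversion, and higher quality giving a higher weight unless low-quality mining is the goal).

```python
# Hypothetical example values; only the orderings come from the text.
TYPE_WEIGHT = {"list": 1.0, "content": 0.6}                       # w1
EDGE_WEIGHT = {"adaptation": 1.0, "jump": 0.8, "diversion": 0.5}  # w3


def quality_weight(level, max_level=3, mine_low_quality=False):
    """Return w2 for a discrete quality level in 0..max_level.

    With mine_low_quality=True the ordering is inverted, so that
    lower-quality pages receive the higher weight.
    """
    if not 0 <= level <= max_level:
        raise ValueError("quality level out of range")
    if mine_low_quality:
        level = max_level - level
    return (level + 1) / (max_level + 1)


print(quality_weight(3), quality_weight(0))
print(quality_weight(0, mine_low_quality=True))
```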
  • the quality information of the website is determined based on the weight of the node.
  • the information value of the node (node_info) can be the product of w1, w2, w3, and the page quality field value, for example:
  • node_info = w1 * w2 * w3 * quality_info
  • this embodiment determines the information-structure value of the node (node_struct_info) based on the structure weight of each node and the information value of the node. For example, the structure weight w4 raised to the power of the node's depth information is multiplied by the information value of the node to obtain the information-structure value:
  • node_struct_info = node_info * (w4)^deep_info
  • the comprehensive statistical value of the website (site_info) can then be obtained by summing the information-structure values of the nodes and applying a sigmoid function:
  • site_info = sigmoid(Σ(node_struct_info))
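  • The three formulas can be worked through in a few lines. This is a minimal sketch: the field names quality_info and deep_info and the symbols w1 to w4 follow the text, while the dictionary layout and the sample values are assumptions made for illustration.

```python
import math


def node_info(w1, w2, w3, quality_info):
    """node_info = w1 * w2 * w3 * quality_info"""
    return w1 * w2 * w3 * quality_info


def node_struct_info(info, w4, deep_info):
    """node_struct_info = node_info * (w4)^deep_info"""
    return info * (w4 ** deep_info)


def site_info(nodes, w4):
    """site_info = sigmoid(sum of node_struct_info over the given nodes)."""
    total = sum(
        node_struct_info(
            node_info(n["w1"], n["w2"], n["w3"], n["quality_info"]),
            w4,
            n["deep_info"],
        )
        for n in nodes
    )
    return 1.0 / (1.0 + math.exp(-total))


# Two sample nodes (hypothetical values): one at depth 1, one at depth 2.
nodes = [
    {"w1": 1.0, "w2": 1.0, "w3": 1.0, "quality_info": 2, "deep_info": 1},
    {"w1": 0.5, "w2": 1.0, "w3": 1.0, "quality_info": 2, "deep_info": 2},
]
# With w4 = 0.5 the per-node values are 1.0 and 0.25, so the sum is 1.25.
print(site_info(nodes, w4=0.5))
```

  • Since w4 lies between 0 and 1, deeper pages contribute less to the sum, and the sigmoid maps the sum into the interval (0, 1) so that sites of different sizes can be compared against the same thresholds.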
  • the comprehensive statistical value of the website can be calculated from the information of all nodes. Alternatively, for cost reasons, only a sample of the nodes may be used in the calculation.
  • the quality threshold range can be set according to the actual situation, and the site can be judged or segmented based on comprehensive statistical values.
  • for example, when the site as a whole is deemed to be low quality or cheating, the site may be not included, not displayed, ranked low, etc.
  • a quality threshold range can be set; when the comprehensive statistical value falls within this range, the site is treated accordingly, for example not included, not displayed, or ranked low, or alternatively included or displayed.
  • the site quality is graded based on the comprehensive statistical value, so that the inclusion, display and ranking weight of the site can be adjusted.
  • the ranking weight can be used to increase the weight of high-quality websites and reduce the weight of low-quality websites.
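  • Grading by threshold ranges might look like the sketch below. The boundaries in `GRADE_BOUNDS`, the function names, and the policy fields are all hypothetical; the disclosure says only that the threshold range is set according to the actual situation.

```python
from bisect import bisect_right

# Hypothetical grade boundaries on site_info, which lies in (0, 1).
GRADE_BOUNDS = [0.6, 0.8, 0.95]


def grade_site(site_info):
    """Return a grade from 0 (lowest) to len(GRADE_BOUNDS) (highest)."""
    return bisect_right(GRADE_BOUNDS, site_info)


def site_policy(grade):
    """Map a grade to inclusion, display, and ranking-weight decisions."""
    if grade == 0:
        # Lowest grade: not included and not displayed.
        return {"include": False, "display": False, "rank_weight": 0.0}
    # Higher grades are included and receive a larger ranking weight.
    return {"include": True, "display": True, "rank_weight": 1.0 + 0.1 * (grade - 1)}


for value in (0.55, 0.7, 0.99):
    print(value, grade_site(value), site_policy(grade_site(value)))
```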
  • in this embodiment, a site structure graph is constructed based on the attribute information of the nodes corresponding to the historical web pages of the site; based on the site structure graph, the distance between each node and the homepage is determined to obtain depth information, and a structural depth tree of the site is constructed; weights are set based on node type, quality, node edge, and node structure, comprehensive statistics are computed over the nodes, and the quality is judged and graded based on the results of the comprehensive statistics, thereby achieving the technical effect of improving the accuracy of site quality and grade determination and solving the technical problem of low accuracy in determining site quality and grade.
  • Embodiments of the present disclosure also provide a web page processing device for a website that performs the web page processing method of the website in the embodiment shown in FIG. 1 .
  • the device may include one or more processors and one or more memories storing program units, wherein the program units are executed by the processor and include: an acquisition component, a first determination component, a second determination component and a grading component.
  • FIG 4 is a schematic diagram of a web page processing device of a website according to an embodiment of the present disclosure.
  • the web page processing device 40 of the website may include: an acquisition component 41, a first determination component 42, a second determination component 43, and a grading component 44.
  • the acquisition component 41 is configured to acquire multiple web pages of the website, where the multiple web pages are used to build the website.
  • the first determining component 42 is configured to determine the weight of each web page based on the association between multiple web pages, where the weight is used to represent the contribution proportion of each web page to the website.
  • the second determination component 43 is configured to determine the website quality of the website based on the weight of each web page and the page quality of each web page.
  • the grading component 44 is configured to grade the website based on the website quality to obtain the quality level of the website.
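  • The four components can be pictured as stages of one pipeline. The class below is a structural sketch only: `WebPageProcessor`, its field names, and the callable signatures are assumptions, and the lambdas in the usage example are trivial stand-ins rather than the weighting scheme of the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class WebPageProcessor:
    """Chains the acquisition, two determination, and grading components."""

    acquire: Callable[[str], List[str]]                     # component 41
    determine_weights: Callable[[List[str]], Dict[str, float]]  # component 42
    determine_quality: Callable[[Dict[str, float], Dict[str, float]], float]  # 43
    grade: Callable[[float], int]                           # component 44

    def run(self, site: str, page_quality: Dict[str, float]) -> int:
        pages = self.acquire(site)
        weights = self.determine_weights(pages)
        quality = self.determine_quality(weights, page_quality)
        return self.grade(quality)


# Stand-in components for demonstration purposes.
processor = WebPageProcessor(
    acquire=lambda site: ["a", "b"],
    determine_weights=lambda pages: {p: 1.0 for p in pages},
    determine_quality=lambda w, q: sum(w[p] * q[p] for p in w),
    grade=lambda quality: 1 if quality >= 3 else 0,
)
print(processor.run("example-site", {"a": 1, "b": 2}))
```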
  • the above-mentioned acquisition component 41, first determination component 42, second determination component 43 and grading component 44 can be run in the terminal as part of the device, and can be executed by the processor in the terminal.
  • the terminal can also be a smart phone (such as an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, a mobile Internet device (MID), a PAD, or another terminal device.
  • the first determining component 42 includes: an acquisition sub-component, configured to acquire a target structure tree of the website, where the target structure tree is used to represent the association between multiple web pages, and the nodes of the target structure tree are used to represent Web pages; the first determination subcomponent is set to determine the weight of each web page based on the target structure tree.
  • the above-mentioned obtaining sub-component and the first determining sub-component can be run in the terminal as part of the device, and the functions implemented by the above-mentioned components can be executed by the processor in the terminal.
  • the first determination sub-component is configured to determine the weight of each web page based on the target structure tree through the following step: determining the weight of each web page based on at least the attribute information of each web page in the target structure tree, wherein the attributes of the nodes of the target structure tree are used to represent the corresponding attribute information.
  • the first determination sub-component is configured to determine the weight of each web page based on at least the attribute information of each web page in the target structure tree through the following steps: determining the first weight of each web page based on the attribute information, wherein the weight includes the first weight; determining the second weight of each web page based on the target association relationship between each web page in the target structure tree and the associated web page, wherein the weight includes the second weight, and the target association relationship is used to characterize the processing sequence between the corresponding web page and the associated web page; and determining the third weight of each web page based on the depth information of the target association relationship relative to the homepage of the website in the target structure tree, wherein the weight includes the third weight.
  • the attribute information includes the type of the corresponding web page and/or the page quality of the web page.
  • the target association relationship is used to represent at least one of the following relationships between the web page and the associated web page: the web page adapts to the associated web page, the web page jumps to the associated web page, and the web page directs to the associated web page.
  • the second determination component 43 includes: a second determination sub-component, configured to adjust the web page quality of each web page based on the first weight, the second weight and the second weight; and a conversion sub-component, configured to convert, based on the third weight and the depth information, the adjusted web page quality of each web page into the website quality of the website.
  • the above-mentioned second determination sub-component and conversion sub-component can run in the terminal as part of the device, and the functions implemented by the above-mentioned components can be executed by the processor in the terminal.
  • the conversion sub-component is configured to convert the adjusted web page quality of each web page into the website quality of the website based on the third weight and the depth information through the following steps: performing an exponential operation on the third weight based on the depth information to obtain a power; obtaining the product of the power and the adjusted web page quality of each web page; and summing the multiple products corresponding to the multiple web pages to obtain the website quality.
  • the second determination sub-component is configured to adjust the web page quality of each web page based on the first weight, the second weight and the second weight through the following step: determining the product of four parameters, namely the first weight, the second weight, the second weight and the web page quality of each web page, as the adjusted web page quality of each web page.
  • the acquisition sub-component is configured to obtain the target structure tree of the website through the following steps: constructing the original structure tree of the website based on the attribute information of each web page and the target association relationship between each web page and the associated web page, where the original structure tree is used to represent the relationships between the multiple web pages, and the target association relationship is used to represent the processing sequence between the corresponding web page and the associated web page; adjusting child nodes that have a first number of parent nodes in the original structure tree into child nodes that have a second number of parent nodes, wherein the second number is less than the first number; and constructing the target structure tree based on the child nodes having the second number of parent nodes and the depth information of the child nodes relative to the root node of the original structure tree.
  • the device further includes: a processing component configured to perform at least one of the following processes on the website in response to the quality of the website meeting the quality threshold range, or the grade of the website meeting the grade threshold range: including the website, displaying the website, and sorting the weight of the web pages of the website.
  • in this embodiment, multiple web pages of the website are acquired through the acquisition component, where the multiple web pages are used to construct the website; through the first determination component, the weight of each web page is determined based on the association between the multiple web pages, where the weight is used to characterize the contribution proportion of each web page to the website; through the second determination component, the website quality of the website is determined based on the weight of each web page and the page quality of each web page; and through the grading component, the website is graded based on the website quality to obtain the quality level of the website, thereby achieving the technical effect of improving the accuracy of website quality determination and solving the technical problem of low accuracy in determining website quality.
  • the present disclosure also provides an electronic device, a non-transitory computer-readable storage medium storing computer instructions, and a computer program product.
  • Embodiments of the present disclosure provide an electronic device, which may include: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions that can be executed by the at least one processor, and the instructions Executed by at least one processor, so that at least one processor can execute the web page processing method of the website according to the embodiment of the present disclosure.
  • the above-mentioned electronic device may further include a transmission device and an input-output device, wherein the transmission device is connected to the above-mentioned processor, and the input-output device is connected to the above-mentioned processor.
  • the present disclosure also provides a non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause the computer to execute the web page processing method of the website according to the embodiment of the present disclosure.
  • FIG. 4A is a schematic diagram of a non-transitory computer-readable storage medium storing computer instructions according to an embodiment of the present disclosure. As shown in FIG. 4A, computer instructions are stored on the non-transitory computer-readable storage medium 401, and the computer instructions are used to cause a computer to execute program code for the following steps:
  • S1, obtain multiple web pages of the website, where the multiple web pages are used to build the website;
  • S2, determine the weight of each web page based on the association between the multiple web pages, where the weight is used to represent the contribution proportion of each web page to the website;
  • S3, determine the website quality of the website based on the weight of each web page and the page quality of each web page;
  • S4, grade the website based on the website quality to obtain the quality level of the website.
  • the computer instructions are also used to cause the computer to perform the following steps: determining the weight of each web page based on the association relationship between the multiple web pages includes: obtaining a target structure tree of the website, wherein the target structure tree is used to represent the association between the multiple web pages, and the nodes of the target structure tree are used to represent the web pages; and determining the weight of each web page based on the target structure tree.
  • the computer instructions are also used to cause the computer to perform the following step: determining the weight of each web page based on at least the attribute information of each web page in the target structure tree, wherein the attributes of the nodes of the target structure tree are used to represent the corresponding attribute information.
  • the computer instructions are also used to cause the computer to execute program code for the following steps: determining the first weight of each web page based on the attribute information, wherein the weight includes the first weight; determining the second weight of each web page based on the target association relationship between each web page in the target structure tree and the associated web page, wherein the weight includes the second weight, and the target association relationship is used to characterize the processing sequence between the corresponding web page and the associated web page; and determining the third weight of each web page based on the depth information of the target association relationship relative to the home page of the website in the target structure tree, wherein the weight includes the third weight.
  • the computer instructions are also used to cause the computer to perform the following steps: adjusting the web page quality of each web page based on the first weight, the second weight and the second weight; and converting, based on the third weight and the depth information, the adjusted web page quality of each web page into the website quality of the website.
  • the computer instructions are also used to cause the computer to perform the following steps: performing an exponential operation on the third weight based on the depth information to obtain a power; obtaining the product of the power and the adjusted web page quality of each web page; and summing the multiple products corresponding to the multiple web pages to obtain the website quality.
  • the computer instructions are also used to cause the computer to perform the following step: determining the product of the first weight, the second weight, the second weight and the web page quality of each web page as the adjusted web page quality of each web page.
  • the computer instructions are also used to cause the computer to perform the following steps: constructing the original structure tree of the website based on the attribute information of each web page and the target association relationship between each web page and the associated web pages, where the original structure tree is used to represent all associations between the multiple web pages, the nodes of the original structure tree are the same as the nodes of the target structure tree, and the target association relationship is used to represent the processing sequence between the corresponding web page and the associated web pages; adjusting child nodes that have a first number of parent nodes in the original structure tree into child nodes that have a second number of parent nodes, wherein the second number is less than the first number; and constructing the target structure tree based on the child nodes having the second number of parent nodes and the depth information of the child nodes relative to the root node of the original structure tree.
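  • the computer instructions are also used to cause the computer to perform the following step: in response to the quality of the website meeting the quality threshold range, or the grade of the website meeting the grade threshold range, performing at least one of the following processes on the website: including the website, displaying the website, and sorting the weight of the web pages of the website.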
  • the above-mentioned non-transitory computer-readable storage medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses or devices, or any suitable combination of the above. More specific examples of readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard drive, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the above.
  • FIG. 4B is a schematic diagram of a computer program product according to an embodiment of the present disclosure.
  • the computer program product includes a computer program for a web page processing method of a website.
  • the computer program implements the following steps when executed by the processor 402:
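  • S1, obtain multiple web pages of the website, where the multiple web pages are used to build the website;
  • S2, determine the weight of each web page based on the association between the multiple web pages, where the weight is used to represent the contribution proportion of each web page to the website;
  • S3, determine the website quality of the website based on the weight of each web page and the page quality of each web page;
  • S4, grade the website based on the website quality to obtain the quality level of the website.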
  • when executed by the processor 402, the computer program also implements the following steps: determining the weight of each web page based on the association relationship between the multiple web pages includes: obtaining the target structure tree of the website, wherein the target structure tree is used to represent the association between the multiple web pages, and the nodes of the target structure tree are used to represent the web pages; and determining the weight of each web page based on the target structure tree.
  • the computer instructions are also used to cause the computer to perform the following step: determining the weight of each web page based on at least the attribute information of each web page in the target structure tree, wherein the attributes of the nodes of the target structure tree are used to represent the corresponding attribute information.
  • the computer program when executed by the processor 402, the computer program also implements the following steps: determining a first weight of each web page based on the attribute information, where the weight includes the first weight; based on each web page in the target structure tree and the associated web page The target association relationship between them determines the second weight of each web page, where the weight includes the second weight, and the target association relationship is used to characterize the processing sequence between the corresponding web page and the associated web page; based on the target association relationship, the target structure tree The depth information relative to the home page of the website is used to determine the third weight of each web page, where the weight includes the third weight.
  • when executed by the processor 402, the computer program also implements the following steps: adjusting the web page quality of each web page based on the first weight, the second weight and the second weight; and converting, based on the third weight and the depth information, the adjusted web page quality of each web page into the website quality of the website.
  • the computer instructions are also used to cause the computer to perform the following steps: perform an exponential operation on the third weight based on the depth information to obtain a power; obtain the product between the power and the adjusted webpage quality of each webpage. ;Sum multiple products corresponding to multiple web pages to obtain website quality.
  • when executed by the processor 402, the computer program also implements the following steps: building an original structure tree of the website based on the attribute information of each web page and the target association relationship between each web page and the associated web pages, where the original structure tree is used to represent all associations between the multiple web pages, the nodes of the original structure tree are the same as the nodes of the target structure tree, and the target association relationship is used to represent the processing sequence between the corresponding web page and the associated web pages; adjusting child nodes that have a first number of parent nodes in the original structure tree into child nodes that have a second number of parent nodes, wherein the second number is less than the first number; and constructing the target structure tree based on the child nodes having the second number of parent nodes and the depth information of the child nodes relative to the root node of the original structure tree.
  • when executed by the processor 402, the computer program also implements the following step: in response to the quality of the website meeting the quality threshold range, or the grade of the website meeting the grade threshold range, performing at least one of the following processes on the website: including the website, displaying the website, and sorting the weight of the web pages of the website.
  • FIG. 5 is a block diagram of an electronic device of a web page processing method for a website according to an embodiment of the present disclosure.
  • Electronic devices are intended to refer to various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers.
  • Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices.
  • the components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit implementations of the disclosure described and/or claimed herein.
  • the device 500 includes a computing component 501 that can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 502 or loaded from a storage component 508 into a random access memory (RAM) 503. Various programs and data required for the operation of the device 500 can also be stored in the RAM 503.
  • Computing component 501, ROM 502 and RAM 503 are connected to each other via bus 504.
  • An input/output (I/O) interface 505 is also connected to bus 504.
  • Multiple components in the device 500 are connected to the I/O interface 505, including: input components 506, such as keyboards, mice, etc.; output components 507, such as various types of displays, speakers, etc.; storage components 508, such as magnetic disks, optical disks, etc.; and communication components 509, such as network cards, modems, wireless communication transceivers, etc. The communication component 509 allows the device 500 to exchange information/data with other devices over computer networks such as the Internet and/or various telecommunications networks.
  • Computing component 501 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing component 501 include, but are not limited to, central processing units (CPUs), graphics processing units (GPUs), various specialized artificial intelligence (AI) computing chips, various computing components that run machine learning model algorithms, digital signal processors (DSPs), and any appropriate processor, controller, microcontroller, etc.
  • Computing component 501 performs the various methods and processes described above, such as the data processing method.
  • For example, in some embodiments, the data processing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage component 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 500 via ROM 502 and/or communications component 509.
  • When the computer program is loaded into RAM 503 and executed by computing component 501, one or more steps of the data processing method described above may be performed.
  • Alternatively, in other embodiments, computing component 501 may be configured to perform the data processing method in any other suitable manner (e.g., by means of firmware).
  • Various implementations of the systems and techniques described above may be implemented in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof.
  • These various embodiments may include implementation in one or more computer programs executable and/or interpretable on a programmable system including at least one programmable processor. The programmable processor, which may be a special purpose or general purpose programmable processor, may receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
  • Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing device, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented.
  • the program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses or devices, or any suitable combination of the foregoing.
  • More specific examples of machine-readable storage media would include one or more wire-based electrical connections, portable computer disks, hard drives, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the above.
  • To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which the user can provide input to the computer.
  • Other kinds of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form (including acoustic input, voice input, or tactile input).
  • The systems and techniques described herein may be implemented in a computing system that includes back-end components (e.g., a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., a user's computer having a graphical user interface or web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components.
  • The components of the system may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.
  • Computer systems may include clients and servers.
  • Clients and servers are generally remote from each other and typically interact over a communications network.
  • The relationship of client and server arises from computer programs that run on the respective computers and have a client-server relationship with each other.
  • The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
  • Embodiments of the present disclosure determine the quality of a website jointly from the weights that different web pages contribute to the website and the page quality of those pages. The method is accurate and widely applicable, which achieves the technical effect of improving the accuracy of website quality determination and solves the technical problem of low accuracy in judging website quality.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

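The present disclosure provides a web page processing method and apparatus for a website, an electronic device, and a storage medium, and relates to the field of computers, in particular to the field of big data. A specific implementation is: acquiring multiple web pages of a website, where the multiple web pages are used to construct the website; determining a weight of each web page based on the association relationships among the multiple web pages, where the weight represents the proportion that each web page contributes to the website; determining the website quality of the website based on the weight of each web page and the page quality of each web page; and grading the website based on the website quality to obtain a quality level of the website.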

Description

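Web page processing method and apparatus for a website, electronic device, and storage medium
This application claims priority to Chinese patent application No. 202210467257.2, filed with the China Patent Office on April 29, 2022 and entitled "网站的网页处理方法、装置、电子设备和存储介质" (Web page processing method and apparatus for a website, electronic device, and storage medium), the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of computer technology, and in particular to a web page processing method and apparatus for a website, an electronic device, and a storage medium in the field of big data.
Background
At present, when website data is screened for quality, a threshold is usually set and a site-level statistic, such as the proportion of high-quality pages or the proportion of low-quality pages, is compared against it. This approach is too simple, so the accuracy of the website quality determination is low.
Summary
The present disclosure provides a web page processing method and apparatus for a website, an electronic device, and a storage medium.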
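According to one aspect of the present disclosure, a web page processing method for a website is provided. The method may include: acquiring multiple web pages of a website, where the multiple web pages are used to construct the website; determining a weight of each web page based on the association relationships among the multiple web pages, where the weight represents the proportion that each web page contributes to the website; determining the website quality of the website based on the weight of each web page and the page quality of each web page; and grading the website based on the website quality to obtain a quality level of the website.
According to another aspect of the present disclosure, a web page processing apparatus for a website is provided. The apparatus may include one or more processors and one or more memories storing program units, where the program units are executed by the processors and include: an acquisition component configured to acquire multiple web pages of a website, where the multiple web pages are used to construct the website; a first determination component configured to determine a weight of each web page based on the association relationships among the multiple web pages, where the weight represents the proportion that each web page contributes to the website; a second determination component configured to determine the website quality of the website based on the weight of each web page and the page quality of each web page; and a grading component configured to grade the website based on the website quality to obtain a quality level of the website.
According to another aspect of the present disclosure, an electronic device is further provided. The electronic device may include: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the web page processing method for a website of the embodiments of the present disclosure.
According to another aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is further provided, where the computer instructions cause a computer to perform the web page processing method for a website of the embodiments of the present disclosure.
According to another aspect of the present disclosure, a computer program product is further provided, which may include a computer program that, when executed by a processor, implements the web page processing method for a website of the embodiments of the present disclosure.
It should be understood that the content described in this section is not intended to identify key or important features of the embodiments of the present disclosure, nor to limit the scope of the present disclosure. Other features of the present disclosure will become easy to understand from the following description.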
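Brief Description of the Drawings
The drawings are provided for a better understanding of the solution and do not limit the present disclosure. In the drawings:
FIG. 1 is a flowchart of a web page processing method for a website according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a site structure graph according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a target structure tree according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a web page processing apparatus for a website according to an embodiment of the present disclosure;
FIG. 4A is a schematic diagram of a non-transitory computer-readable storage medium storing computer instructions according to an embodiment of the present disclosure;
FIG. 4B is a schematic diagram of a computer program product according to an embodiment of the present disclosure;
FIG. 5 is a block diagram of an electronic device for the web page processing method for a website according to an embodiment of the present disclosure.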
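Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications may be made to the embodiments described here without departing from the scope and spirit of the present disclosure. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted below.
FIG. 1 is a flowchart of a web page processing method for a website according to an embodiment of the present disclosure. As shown in FIG. 1, the method may include the following steps:
Step S102: acquire multiple web pages of a website, where the multiple web pages are used to construct the website.
In the technical solution provided by step S102, a website may consist of multiple web pages, and the multiple web pages may be obtained by acquiring multiple historical web pages within the website. A web page may be, for example, a list page, an index page or a content page, and may be further subdivided into article, video, forum, blog, download, image or Q&A pages; no specific limitation is made here.
Step S104: determine the weight of each web page based on the association relationships among the multiple web pages, where the weight represents the proportion that each web page contributes to the website.
In the technical solution provided by step S104, the association relationships among the multiple web pages are determined, and the weight of each web page is determined based on them. The weight may be the proportion that each web page contributes to the website and may represent the degree to which the page contributes to constructing the website. The association relationship between web pages may be, for example, the order in which the pages appear or a containment relationship.
Optionally, web pages of different page types may be set to contribute to the website to different degrees, so that the weight of each web page can be determined from its type. This is only an example and does not limit how the contribution of a web page is determined.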
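Step S106: determine the website quality of the website based on the weight of each web page and the page quality of each web page.
In the technical solution provided by step S106, a website contains multiple web pages; once the weight and the page quality of each web page in the website are determined, the website quality can be determined from them. The page quality may include cheating, low quality, ordinary and high quality, and may be represented by a page quality field value (for example, quality_info), which may be a continuous or discrete value such as 0, 1, 2, 3. The website quality indicates how good or bad the website is and may be represented by a website quality field value (for example, site_info), such as 0, 1, 2, 3; the representation is not specifically limited here.
Optionally, the degree to which each web page contributes to constructing the website is determined, which gives the weight of each web page; the quality of each web page is judged, which gives its page quality; the website quality is then determined based on the weight and the page quality of each web page.
Optionally, in embodiments of the present disclosure, the website quality may be determined from the weights and page qualities of all web pages of the website, or the nodes of the website may be sampled to reduce running cost.
In the related art, when data is screened for quality, the quality/level of a website is generally judged from statistics over the quality of historical web pages within the website, for example the proportion of high-quality pages or the proportion of low-quality pages, followed by comparison with a set threshold. This approach is too crude and does not take into account that different pages contribute to the site with different weights. Embodiments of the present disclosure therefore propose determining the weight of each web page based on the distribution of the page structure within the site and adjusting by these weights, so that site quality and level are judged more accurately.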
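Step S108: grade the website based on the website quality to obtain the quality level of the website.
In the technical solution provided by step S108, the website quality is determined, and websites are graded based on the determined quality of multiple websites, which yields the quality level of each website.
Optionally, the website quality is determined and the website is graded according to it. For example, when website quality is expressed as a number, websites may be sorted by that number and the first to third places taken as the first level, thereby determining the quality level of the website. This is only an example and does not limit how the quality level is determined.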
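The rank-based grading example above (the first three places form the first level) could be sketched as follows. This is only an illustration under stated assumptions: the function name `grade_sites`, the two-level scheme and the sample scores are hypothetical and do not come from the disclosure.

```python
def grade_sites(site_quality, top_n=3):
    """Rank sites by a numeric quality score (higher is better).

    The top_n sites get level 1 and all others level 2. The two-level
    scheme is an assumption made for this sketch.
    """
    ranked = sorted(site_quality, key=site_quality.get, reverse=True)
    return {site: (1 if rank < top_n else 2) for rank, site in enumerate(ranked)}


# Hypothetical site_info scores for four sites.
levels = grade_sites({"a": 0.9, "b": 0.2, "c": 0.7, "d": 0.5})
```

Other grading schemes, such as fixed score bands, fit the same step; the text leaves the choice open.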
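Through steps S102 to S108 above, multiple web pages of a website are acquired, where the multiple web pages are used to construct the website; the weight of each web page is determined based on the association relationships among the multiple web pages, where the weight represents the proportion that each web page contributes to the website; the website quality is determined based on the weight and the page quality of each web page; and the website is graded based on the website quality to obtain its quality level. In other words, embodiments of the present disclosure determine the quality of a website jointly from the weights that different web pages contribute to the website and the page quality of those pages. The method is accurate and widely applicable, which achieves the technical effect of improving the accuracy of website quality determination and solves the technical problem of low accuracy in judging website quality.
The above method of this embodiment is described in further detail below.
As an optional implementation, step S104, determining the weight of each web page based on the association relationships among the multiple web pages, includes: acquiring a target structure tree of the website, where the target structure tree represents the association relationships among the multiple web pages and the nodes of the target structure tree represent the web pages; and determining the weight of each web page based on the target structure tree.
In this embodiment, the association relationships among the multiple web pages are determined and the target structure tree of the website is determined from them. The weight of each web page may be determined based on the position of its node in the target structure tree. The target structure tree may be a site structure graph and represents the association relationships among the multiple web pages; one web page can be regarded as one node, and the nodes of the target structure tree correspond one-to-one to the web pages.
Optionally, the target structure tree of the website is obtained based on the association relationships between web pages. For example, the target structure tree starts from the home page of the website, the home page points to the next web pages, and so on, which yields the target structure tree.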
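As an optional implementation, determining the weight of each web page based on the target structure tree includes: determining the weight of each web page based at least on the attribute information of each web page in the target structure tree, where the attributes of the nodes of the target structure tree represent the corresponding attribute information.
In this embodiment, the attribute information of each web page is determined, and the weight of the web page may be determined based on it. The attribute information may include the page type, the page quality, the edge type and the structure information corresponding to the web page.
Optionally, in this embodiment the attribute information of a node may be determined from the attribute information of the web page corresponding to the node.
Optionally, the page type may include list page (that is, index page) and content page, and may further include article, video, forum, blog, download, image, Q&A and so on. The page quality may include cheating, low quality, ordinary and high quality, where the page quality field value (for example, quality_info) may be a continuous or discrete value used to distinguish page quality.
Embodiments of the present disclosure compute comprehensive statistics over the nodes based on the attribute information of the web pages, which allows web pages to be scored accurately. It should be noted that page quality and page type can be judged in many ways; the description here is only an example and does not limit how they are judged.
Optionally, the edge type may include referral edges, redirect edges, adaptation edges and so on. A referral edge means that a user can click on one page to enter another page; for example, page A contains a link that points to page B. A redirect edge means that one page automatically redirects to another page (that is, page A automatically redirects to page B), for example after a domain name change. An adaptation edge means that two pages are adapted versions of each other; for example, a desktop page can automatically jump to the mobile site, or a web link can automatically jump to an application.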
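The node attributes and edge types described above could be modeled along the following lines. This is a minimal sketch, not the data model of the disclosure: the field names `pagetype_info` and `quality_info` come from the text, while the class names, the `out_edges` layout and the mapping of 0 to 3 onto cheating/low/ordinary/high are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum


class EdgeType(Enum):
    REFERRAL = "referral"      # page A links to page B; the user clicks through
    REDIRECT = "redirect"      # page A redirects automatically to page B
    ADAPTATION = "adaptation"  # A and B are adapted versions (desktop vs. mobile)


@dataclass
class PageNode:
    url: str
    pagetype_info: str   # e.g. "list" or "content"
    quality_info: int    # assumed mapping: 0=cheating, 1=low, 2=ordinary, 3=high
    out_edges: dict = field(default_factory=dict)  # target url -> EdgeType


# A hypothetical home page that links to a list page.
home = PageNode("/", "list", 3)
home.out_edges["/news"] = EdgeType.REFERRAL
```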
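Optionally, this embodiment may set different weights according to page type, quality, edge structure and structure information. For example, if the weight of a list page is set to 0.5, higher than the weight of a content page, then when the type of a web page is list page, the weight of that web page is 0.5. In this way the weight of each web page is determined based on its attribute information in the target structure tree.
For example, if a higher weight is configured to indicate higher page quality, then the higher the weight of a node in the target structure tree, the higher the quality of the web page corresponding to that node.
As an optional implementation, determining the weight of each web page based at least on the attribute information of each web page in the target structure tree includes: determining a first weight of each web page based on the attribute information, where the weight includes the first weight; determining a second weight of each web page based on a target association relationship between the web page and an associated web page in the target structure tree, where the weight includes the second weight and the target association relationship represents the processing order between the corresponding web page and the associated web page; and determining a third weight of each web page based on depth information of the target association relationship in the target structure tree relative to the home page of the website, where the weight includes the third weight.
In this embodiment, the first weight may include a node type weight (for example, w1) and a node quality weight (for example, w2); weights for different node types, node qualities, node edges and node structures may be set according to actual needs, thereby determining the weight of a node. The target association relationship may be a redirect relationship, a referral relationship, and so on. The second weight may be a node edge weight (w3) determined from the edge type; for example, it may be set that the weight of an adaptation edge > the weight of a redirect edge > the weight of a referral edge, where an adaptation edge indicates adapted pages with the same content, a redirect edge indicates a strong relationship, and a referral edge indicates a weak relationship.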
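One simple way to realize the weights w1, w2 and w3 is a set of lookup tables. The sketch below assumes that approach; every numeric value is an illustrative assumption (only the list-page value 0.5 appears in the text), and the tables respect the orderings stated above: list page above content page, and adaptation edge > redirect edge > referral edge.

```python
# All numeric values are illustrative assumptions, not values from the disclosure.
TYPE_WEIGHT = {"list": 0.5, "content": 0.3}          # w1: node type weight
QUALITY_WEIGHT = {0: 0.1, 1: 0.4, 2: 0.7, 3: 1.0}    # w2: tuned for high-quality mining
EDGE_WEIGHT = {"adaptation": 1.0, "redirect": 0.8, "referral": 0.5}  # w3


def node_weights(pagetype_info, quality_info, edge_info):
    """Return (w1, w2, w3) for one node from its attribute field values."""
    return (
        TYPE_WEIGHT[pagetype_info],
        QUALITY_WEIGHT[quality_info],
        EDGE_WEIGHT[edge_info],
    )


w1, w2, w3 = node_weights("list", 3, "redirect")
```

For low-quality mining the text suggests the quality weights could be reversed, which in this sketch would only require a different `QUALITY_WEIGHT` table.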
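Optionally, the target association relationship between each web page and its associated web page in the target structure tree may be determined, and the depth information (for example, deep_info) of the target association relationship relative to the home page of the website may be determined from the distance between the web page and the associated web page. The third weight is determined based on this depth information and may be a node structure weight (w4). The node structure weight may take values in the interval 0 to 1, so that the value of a node in the target structure tree decreases as its depth information increases.
Optionally, each additional level (one edge) between a web page and its associated web page in the target structure tree may increase the value of the depth information by 1.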
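As an optional implementation, the attribute information includes the type of the corresponding web page and/or the page quality of the web page.
In this embodiment, the type of the web page corresponding to a node of the target structure tree is determined, and the attribute information of the node is determined based on it, where the attribute information includes the type of the corresponding web page and/or its page quality.
Optionally, the type of a web page may include list page (for example, index page) and content page, and may further include article, video, forum, blog, download, image, Q&A and so on. The page type may be represented by a field value; the page type field value (for example, pagetype_info) may be a flag bit and can therefore be used for weight selection.
Optionally, the page quality of a web page may include cheating, low quality, ordinary and high quality. The page quality may be represented by a field value; the page quality field value (quality_info) may be a continuous or discrete value used to distinguish page quality, for example 0, 1, 2 and 3 denoting different page qualities.
It should be noted that page quality and page type can be judged in many ways; embodiments of the present disclosure do not limit how they are judged.
As an optional implementation, the target association relationship represents at least one of the following relationships between a web page and an associated web page: the web page adapts to the associated web page, the web page redirects to the associated web page, and the web page refers traffic to the associated web page.
In this embodiment, the target association relationship may be the relationship of an in-site edge. An in-site edge may be represented by a field value; the edge field value (for example, edge_info) may be a flag bit and can therefore be used for weight selection.
Optionally, where the web page adapts to the associated web page, the relationship corresponds to an adaptation edge and the two pages are adapted versions of each other; for example, a desktop page can automatically jump to the mobile site, or a web link can automatically jump to an application. Where the web page redirects to the associated web page, the relationship corresponds to a redirect edge, meaning that one page automatically redirects to another (for example, page A automatically redirects to page B), such as after a domain name change. Where the web page refers traffic to the associated web page, the relationship corresponds to a referral edge, meaning that a user can click on one page to enter another; for example, page A contains a link that points to page B.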
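As an optional implementation, step S106, determining the website quality of the website based on the weight of each web page and the page quality of each web page, includes: adjusting the page quality of each web page based on the first weight and the second weight; and converting the adjusted page quality of each web page into the website quality of the website based on the third weight and the depth information.
In this embodiment, the page quality of each web page is adjusted based on its first weight and second weight, and the adjusted page quality of each web page is converted into the website quality based on the third weight and the depth information.
Optionally, the adjusted page quality may be represented by the information value of a node (for example, node_info), which may be determined based on the first weight and the second weight of each web page.
Optionally, the adjusted page quality of each web page is converted into the website quality based on the third weight and the depth information; for example, the website quality may be determined from the obtained node information.
As an optional implementation, converting the adjusted page quality of each web page into the website quality of the website based on the third weight and the depth information includes: performing an exponentiation on the third weight based on the depth information to obtain a power; obtaining the product of the power and the adjusted page quality of each web page; and summing the multiple products corresponding to the multiple web pages to obtain the website quality.
In this embodiment, the third weight is raised to the power given by the depth information; the resulting power is multiplied by the adjusted page quality of each web page; and the products for the multiple web pages are summed to obtain the website quality.
Optionally, the third weight of a node may be raised to the power of the depth information, and the resulting power multiplied by the adjusted page quality of the web page (the node information value) to obtain the information-structure value (node_struct_info), for example: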
node_struct_info=node_info*(w4)^(deep_info)
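Optionally, the website quality may be obtained by comprehensive statistics over the page quality of all nodes of the website and may also be called the comprehensive statistical value (site_info); the multiple products corresponding to the multiple web pages are summed to obtain the website quality, for example: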
site_info=sigmoid(∑(node_struct_info))
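The two formulas above can be combined into a short function. This is a sketch under the assumption that each node is given as a pair (node_info, deep_info) and that a single structure weight w4 is shared by all nodes; the function names and sample values are hypothetical.

```python
import math


def sigmoid(x):
    # Standard logistic function. Node values here are non-negative,
    # so the large negative inputs that could overflow exp() do not occur.
    return 1.0 / (1.0 + math.exp(-x))


def site_info(nodes, w4):
    """Compute sigmoid(sum(node_info * w4 ** deep_info)).

    nodes: iterable of (node_info, deep_info) pairs.
    w4:    structure weight, assumed to lie in the interval (0, 1].
    """
    total = sum(info * (w4 ** depth) for info, depth in nodes)
    return sigmoid(total)


# Three hypothetical nodes at depths 0, 1 and 2 with w4 = 0.5:
# 1.2 * 1 + 0.6 * 0.5 + 0.4 * 0.25 = 1.6, so the score is sigmoid(1.6).
score = site_info([(1.2, 0), (0.6, 1), (0.4, 2)], w4=0.5)
```

Because w4 is below 1, each extra level of depth shrinks a node's contribution, and the sigmoid maps the unbounded sum into the interval (0, 1), which makes thresholds comparable across sites of different size only to a limited extent; large sites will tend to saturate near 1.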
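As an optional implementation, adjusting the page quality of each web page based on the first weight and the second weight includes: determining the product of the first weight (the node type weight and the node quality weight), the second weight and the page quality of each web page, four factors in total, as the adjusted page quality of each web page.
In this embodiment, the product of these four factors is obtained and determined as the adjusted page quality of each web page.
Optionally, the adjusted page quality may be represented by the information value of a node (for example, node_info), where the node information value may be the product of w1, w2, w3 and the page quality field value (for example, quality_info), for example: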
node_info=w1*w2*w3*quality_info
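The product above translates directly into code. The sketch below is illustrative only; the function name mirrors the field name in the text and the sample values are assumptions.

```python
def node_info(w1, w2, w3, quality_info):
    """Adjusted page quality of one node: w1 * w2 * w3 * quality_info."""
    return w1 * w2 * w3 * quality_info


# Hypothetical list page (w1=0.5) of high quality (w2=1.0, quality_info=3)
# reached over a redirect edge (w3=0.8).
value = node_info(0.5, 1.0, 0.8, 3)
```

One consequence of the multiplicative form is worth noting: if cheating pages are encoded as quality_info = 0, their node_info is 0 regardless of the weights, so they add nothing to the site sum rather than lowering it. Whether that is desired depends on how the field values are chosen, which the text leaves open.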
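As an optional implementation, acquiring the target structure tree of the website includes: constructing an original structure tree of the website based on the attribute information of each web page and the target association relationship between each web page and its associated web page, where the original structure tree represents all association relationships among the multiple web pages, the nodes of the original structure tree are the same as the nodes of the target structure tree, and the target association relationship represents the processing order between the corresponding web page and the associated web page; adjusting child nodes that have a first number of parent nodes in the original structure tree into child nodes that have a second number of parent nodes, where the second number is smaller than the first number; and constructing the target structure tree based on the child nodes having the second number of parent nodes and the depth information of the child nodes relative to the root node of the original structure tree.
In this embodiment, the original structure tree of the website is constructed based on the attribute information of each web page and the target association relationship between each web page and its associated web page. Child nodes with multiple parent nodes may be adjusted: child nodes that have a first number of parent nodes in the original structure tree are adjusted into child nodes that have a second number of parent nodes, and the simplified structure tree is then processed to construct the target structure tree from those child nodes and their depth information relative to the root node of the original structure tree. The original structure tree may be a structure graph network that represents all association relationships among the multiple web pages. Because the second number is smaller than the first number, the edges are filtered so that a node keeps only one in-edge and redundant edges are removed.
Optionally, the edges within the website may be filtered and cyclic edges discarded, which turns the cyclic graph into a one-way graph and reduces the number of parent nodes of a child node from the first number to the second number. For example, suppose the home page points to four list pages (top, bottom, left and right); since the left list page, which the home page points to, in turn points to the top list page, the edge from the top list page back to the left list page can be omitted.
For example, if node A points to node B and node B also points to node A, and the edge from A to B is obtained first, then the edge from B back to A can be discarded.
Optionally, to remove redundancy, every node may keep only one in-edge, or all in-edges may be kept; no specific limitation is made here.
Optionally, how nodes are retained may be chosen according to the scenario. When performance matters most, the structure graph may be traversed and nodes retained on a first-come, first-served basis; the traversal may be depth-first or breadth-first and is not specifically limited here.
Optionally, when the structure of the tree matters most, the weights of all in-edges may be compared and the in-edge with the higher weight kept preferentially, which gives the structure tree after edge filtering.
Optionally, when the completeness of the nodes matters most, the high-weight in-edges of every node may be kept, which gives the target structure tree.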
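Two of the retention strategies above, first-come, first-served traversal and keeping the highest-weight in-edge, could be sketched as follows. The graph layout (a dict from URL to target URLs), the function names and the sample graph are assumptions made for illustration; breadth-first traversal is chosen here, although the text also allows depth-first.

```python
from collections import deque


def prune_first_come(graph, home):
    """Breadth-first traversal from the home page.

    For each page only the first in-edge that reaches it is kept; later
    in-edges, including edges that close a cycle, are dropped.
    graph: dict mapping a url to the list of urls it points to.
    Returns a dict mapping each reachable page to its single parent.
    """
    parent = {home: None}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, []):
            if target not in parent:
                parent[target] = page
                queue.append(target)
    return parent


def prune_by_weight(in_edges):
    """Keep, for each child, the in-edge with the highest weight.

    in_edges: dict mapping child -> list of (parent, weight) pairs.
    """
    return {child: max(edges, key=lambda e: e[1])[0]
            for child, edges in in_edges.items()}


# Hypothetical site: home -> left, top; left -> top, home; top -> left.
graph = {"home": ["left", "top"], "left": ["top", "home"], "top": ["left"]}
tree = prune_first_come(graph, "home")
```

Note that `prune_by_weight` on its own does not guarantee a connected, cycle-free result; in practice it would likely need to be combined with a traversal from the home page.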
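As an optional implementation, in response to the quality of the website satisfying a quality threshold range, or the level of the website satisfying a level threshold range, at least one of the following is performed on the website: indexing the website, displaying the website, and ranking the weights of the web pages of the website.
In this embodiment, the quality threshold range and the level threshold range may be set according to the actual situation, and it is determined whether the website quality satisfies the quality threshold range or whether the website level satisfies the level threshold range. In response to either condition being satisfied, at least one of the processing operations above is performed on the website.
Optionally, the quality threshold range may be set according to the actual situation, and the site is judged or segmented based on the comprehensive statistical value.
Optionally, judging the quality of the whole site may mean that a site considered to be low quality or cheating is not indexed, not displayed, ranked low, and so on. For example, a quality threshold range is set, and when the comprehensive statistical result falls within this range the site is not indexed, not displayed and ranked low, or conversely is indexed and displayed.
Optionally, site quality is graded based on the comprehensive statistical value, so that the indexing, display and ranking weight of the site can be adjusted; adjusting the ranking weight may mean promoting high-quality websites and demoting poor-quality websites.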
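The threshold-based handling described above might look like the sketch below. The thresholds 0.3 and 0.7, the three-band scheme, the function name and the action labels are all assumptions for illustration; the disclosure only states that threshold ranges are set according to the actual situation.

```python
def site_actions(site_info, low=0.3, high=0.7):
    """Map a site score to indexing, display and ranking decisions.

    low and high are hypothetical thresholds on a score in (0, 1).
    """
    if site_info < low:
        # Treated as low quality or cheating: not indexed, not shown, demoted.
        return {"index": False, "show": False, "rank_adjust": "demote"}
    if site_info >= high:
        # Treated as high quality: indexed, shown and promoted in ranking.
        return {"index": True, "show": True, "rank_adjust": "promote"}
    return {"index": True, "show": True, "rank_adjust": "none"}


decision = site_actions(0.83)
```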
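The technical solutions of the embodiments of the present disclosure are further illustrated below with reference to preferred embodiments.
When data is screened for quality, the quality and level of a website carry considerable weight. The quality/level of a site is generally judged from statistics over the quality of historical web pages within the site, for example the proportion of high-quality pages or the proportion of low-quality pages, followed by comparison with a set threshold. This approach does not take into account that different web pages contribute to the site with different weights, so it lacks accuracy and is prone to misjudgment.
Embodiments of the present disclosure therefore propose a method that adjusts the weights of a structure tree based on the distribution of the web page structure within the website, so that site quality and level are judged more accurately.
As an optional embodiment, FIG. 2 is a schematic diagram of a site structure graph according to an embodiment of the present disclosure. As shown in FIG. 2, each page in the website can be treated as a node, and the site structure graph network is constructed based on the types of the pages corresponding to the nodes.
Optionally, the attribute information of a node is determined from the attribute information of its page, where the attribute information of a page may include the page type and the page quality.
Optionally, the page type may include list page (for example, index page) and content page, and may further include article, video, forum, blog, download, image, Q&A and so on. The page type may be represented by a field value; the page type field value (for example, pagetype_info) may be a flag bit and can be used for weight selection.
Optionally, the page quality may include cheating, low quality, ordinary and high quality. The page quality may be represented by a field value (for example, quality_info), which may be a continuous or discrete value used to distinguish page quality, for example 0, 1, 2 and 3 denoting different page qualities.
It should be noted that page quality and page type can be judged in many ways; embodiments of the present disclosure do not limit how they are judged.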
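As an optional embodiment, FIG. 3 is a schematic diagram of a target structure tree according to an embodiment of the present disclosure. As shown in FIG. 3, the directed graph starts from the home page; the edges within the website are filtered and cyclic edges are discarded so that the cyclic graph becomes a one-way graph. For example, suppose the home page points to four list pages (top, bottom, left and right); since the left list page, which the home page points to, in turn points to the top list page, the edge from the top list page back to the left list page can be omitted.
For example, if node A points to node B and node B also points to node A, and the edge from A to B is obtained first, then the edge from B back to A can be discarded.
Optionally, edges may include referral edges, redirect edges and adaptation edges. A referral edge means that a user can click on one page to enter another page; for example, page A contains a link that points to page B. A redirect edge means that one page automatically redirects to another page (that is, page A automatically redirects to page B), for example after a domain name change. An adaptation edge means that two pages are adapted versions of each other; for example, a desktop page can automatically jump to the mobile site, or a web link can automatically jump to an application.
Optionally, to remove redundancy, every node may keep only one in-edge, or all in-edges may be kept; no specific limitation is made here.
Optionally, how nodes are retained may be chosen according to the scenario. When performance matters most, the structure graph may be traversed and nodes retained on a first-come, first-served basis; the traversal may be depth-first or breadth-first and is not specifically limited here.
Optionally, when the structure of the tree matters most, the weights of all in-edges may be compared and the in-edge with the higher weight kept preferentially, which gives the structure tree after edge filtering.
Optionally, when the completeness of the nodes matters most, the high-weight in-edges of every node may be kept, which gives the target structure tree.
Optionally, after the target structure tree is obtained, the distance from each node to the home page may be calculated to obtain the depth information (for example, deep_info). For example, as shown in FIG. 3, the home page points directly to the list page above it, so the depth information of that list page may be 1; the home page reaches the list page on the right through the list page above it, so the depth information of the right list page may be 2.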
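Computing deep_info amounts to measuring each node's distance from the home page in the target structure tree. A breadth-first sketch is shown below; the tree layout (a dict from parent to children), the function name and the page names are assumptions chosen to mirror the FIG. 3 example.

```python
from collections import deque


def compute_depth(tree, home):
    """Return deep_info for every page reachable from the home page.

    tree: dict mapping a parent url to the list of its child urls.
    The home page has depth 0 and each edge adds 1.
    """
    deep_info = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for child in tree.get(page, []):
            if child not in deep_info:
                deep_info[child] = deep_info[page] + 1
                queue.append(child)
    return deep_info


# Home page points to the top list page, which points to the right list page.
depths = compute_depth({"home": ["top"], "top": ["right"]}, "home")
```

The result matches the example in the text: the top list page has depth 1 and the right list page has depth 2.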
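As an optional embodiment, calculating the weights of a node may include calculating a node type weight (for example, w1), a node quality weight (for example, w2), a node edge weight (for example, w3) and a node structure weight (for example, w4).
Optionally, the node type weight (for example, w1) may be determined from the page type; for example, the weight of a list page may be set higher than that of a content page.
Optionally, the node quality weight (for example, w2) may be determined from the page quality level. For high-quality mining, the higher the quality, the higher the weight; for low-quality mining, the opposite may apply.
Optionally, the node edge weight (for example, w3) may be determined from the edge type. In this embodiment it may be set that the weight of an adaptation edge > the weight of a redirect edge > the weight of a referral edge, where an adaptation edge indicates adapted pages with the same content, a redirect edge indicates a strong relationship, and a referral edge indicates a weak relationship.
Optionally, the node structure weight (for example, w4) may be determined based on the depth information. The node structure weight may take values in the interval 0 to 1, so that the value of a node decreases as its depth information increases.
As an optional embodiment, the quality information of the website is determined based on the weights of the nodes.
Optionally, the information value of each node (for example, node_info) is determined based on the w1, w2, w3 and w4 corresponding to the node, where the node information value may be the product of w1, w2, w3 and the page quality field value, for example:
node_info=w1*w2*w3*quality_info
Optionally, this embodiment determines the information-structure value of a node (node_struct_info) based on the structure weight of the node and its information value: the structure weight is raised to the power of the corresponding depth information and then multiplied by the node information value, for example: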
node_struct_info=node_info*(w4)^(deep_info)
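Optionally, comprehensive statistics are computed over the information-structure values of all nodes of the website to obtain the comprehensive statistical value of the website (for example, site_info), for example: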
site_info=sigmoid(∑(node_struct_info))
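As a worked illustration of the three formulas, consider a hypothetical site with two nodes. All numbers are assumptions chosen for easy arithmetic: node A is the home page at depth 0, node B is a page at depth 1, and the structure weight is 0.5.

```latex
% Assumed values: A: w1=0.5, w2=1.0, w3=1.0, quality_info=3, deep_info=0
%                 B: w1=0.5, w2=0.4, w3=0.5, quality_info=2, deep_info=1
%                 w4 = 0.5
\begin{align*}
\text{node\_info}_A &= 0.5 \times 1.0 \times 1.0 \times 3 = 1.5 \\
\text{node\_info}_B &= 0.5 \times 0.4 \times 0.5 \times 2 = 0.2 \\
\text{node\_struct\_info}_A &= 1.5 \times 0.5^{0} = 1.5 \\
\text{node\_struct\_info}_B &= 0.2 \times 0.5^{1} = 0.1 \\
\text{site\_info} &= \operatorname{sigmoid}(1.5 + 0.1)
  = \frac{1}{1 + e^{-1.6}} \approx 0.832
\end{align*}
```

The deeper node B contributes only half of its node_info to the sum, which is the intended effect of raising w4 to the depth.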
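It should be noted that the comprehensive statistical value of the website may be computed from the information of all nodes; however, for cost reasons, it may also be computed from a sample of the nodes only.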
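Sampling could be realized as in the sketch below. The disclosure does not say how the sampled sum is used; rescaling it by the inverse sampling fraction so that it estimates the full sum is an assumption of this sketch, as are the function name, the fixed seed and the sample values.

```python
import math
import random


def sampled_site_info(nodes, w4, sample_size, seed=0):
    """Estimate site_info from a random sample of (node_info, deep_info) pairs.

    The sampled sum is scaled by len(nodes) / len(sample) to approximate
    the full sum (an assumption made for this sketch).
    """
    nodes = list(nodes)
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    sample = nodes if len(nodes) <= sample_size else rng.sample(nodes, sample_size)
    scale = len(nodes) / len(sample) if sample else 0.0
    total = scale * sum(info * (w4 ** depth) for info, depth in sample)
    return 1.0 / (1.0 + math.exp(-total))


nodes = [(1.2, 0), (0.6, 1), (0.4, 2)]
full = sampled_site_info(nodes, 0.5, sample_size=10)   # no sampling needed
estimate = sampled_site_info(nodes, 0.5, sample_size=2)
```

Because nodes near the home page carry most of the weight, a uniform sample can miss them; a sampling scheme that always includes shallow nodes may give steadier estimates, though that goes beyond what the text describes.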
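As an optional embodiment, the quality threshold range may be set according to the actual situation, and the site is judged or segmented based on the comprehensive statistical value.
Optionally, judging the quality of the whole website may mean that a site considered to be low quality or cheating is not indexed, not displayed, ranked low, and so on. For example, a quality threshold range is set, and when the comprehensive statistical result falls within this range the site is not indexed, not displayed and ranked low, or conversely is indexed and displayed.
Optionally, site quality is graded based on the comprehensive statistical value, so that the indexing, display and ranking weight of the site can be adjusted; adjusting the ranking weight may mean promoting high-quality websites and demoting poor-quality websites.
In embodiments of the present disclosure, a site structure graph is constructed based on the attribute information of the nodes corresponding to the historical web pages of the site; the distance between each node and the home page is determined from the site structure graph to obtain the depth information; a structure depth tree of the site is constructed; comprehensive statistics are computed over the nodes based on the weight settings for node type, quality, node edge and node structure; and the quality of the site is graded and judged based on the result. This achieves the technical effect of improving the accuracy of site quality and level determination and solves the technical problem of low accuracy in that determination.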
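Embodiments of the present disclosure further provide a web page processing apparatus for a website for executing the web page processing method of the embodiment shown in FIG. 1. The apparatus may include one or more processors and one or more memories storing program units, where the program units are executed by the processors and include: an acquisition component, a first determination component, a second determination component and a grading component.
FIG. 4 is a schematic diagram of a web page processing apparatus for a website according to an embodiment of the present disclosure. As shown in FIG. 4, the web page processing apparatus 40 may include: an acquisition component 41, a first determination component 42, a second determination component 43 and a grading component 44.
The acquisition component 41 is configured to acquire multiple web pages of a website, where the multiple web pages are used to construct the website.
The first determination component 42 is configured to determine the weight of each web page based on the association relationships among the multiple web pages, where the weight represents the proportion that each web page contributes to the website.
The second determination component 43 is configured to determine the website quality of the website based on the weight of each web page and the page quality of each web page.
The grading component 44 is configured to grade the website based on the website quality to obtain the quality level of the website.
It should be noted here that the acquisition component 41, the first determination component 42, the second determination component 43 and the grading component 44 may run in a terminal as part of the apparatus, and the functions implemented by these components may be executed by a processor in the terminal. The terminal may be a smartphone (such as an Android phone or an iOS phone), a tablet computer, a handheld computer, a Mobile Internet Device (MID), a PAD or another terminal device.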
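Optionally, the first determination component 42 includes: an acquisition subcomponent configured to acquire the target structure tree of the website, where the target structure tree represents the association relationships among the multiple web pages and the nodes of the target structure tree represent the web pages; and a first determination subcomponent configured to determine the weight of each web page based on the target structure tree.
It should be noted here that the acquisition subcomponent and the first determination subcomponent may run in a terminal as part of the apparatus, and the functions implemented by these components may be executed by a processor in the terminal.
Optionally, the first determination subcomponent is configured to determine the weight of each web page based on the target structure tree through the following step: determining the weight of each web page based at least on the attribute information of each web page in the target structure tree, where the attributes of the nodes of the target structure tree represent the corresponding attribute information.
Optionally, the first determination subcomponent is configured to determine the weight of each web page based at least on the attribute information through the following steps: determining a first weight of each web page based on the attribute information, where the weight includes the first weight; determining a second weight of each web page based on a target association relationship between the web page and an associated web page in the target structure tree, where the weight includes the second weight and the target association relationship represents the processing order between the corresponding web page and the associated web page; and determining a third weight of each web page based on depth information of the target association relationship in the target structure tree relative to the home page of the website, where the weight includes the third weight.
Optionally, the attribute information includes the type of the corresponding web page and/or the page quality of the web page.
Optionally, the target association relationship represents at least one of the following relationships between a web page and an associated web page: the web page adapts to the associated web page, the web page redirects to the associated web page, and the web page refers traffic to the associated web page.
Optionally, the second determination component 43 includes: a second determination subcomponent configured to adjust the page quality of each web page based on the first weight and the second weight; and a conversion subcomponent configured to convert the adjusted page quality of each web page into the website quality of the website based on the third weight and the depth information.
It should be noted here that the second determination subcomponent and the conversion subcomponent may run in a terminal as part of the apparatus, and the functions implemented by these components may be executed by a processor in the terminal.
Optionally, the conversion subcomponent is configured to perform the conversion through the following steps: performing an exponentiation on the third weight based on the depth information to obtain a power; obtaining the product of the power and the adjusted page quality of each web page; and summing the multiple products corresponding to the multiple web pages to obtain the website quality.
Optionally, the second determination subcomponent is configured to adjust the page quality of each web page through the following step: determining the product of the first weight (the node type weight and the node quality weight), the second weight and the page quality of each web page, four factors in total, as the adjusted page quality of each web page.
Optionally, the acquisition subcomponent is configured to acquire the target structure tree of the website through the following steps: constructing an original structure tree of the website based on the attribute information of each web page and the target association relationship between each web page and its associated web page, where the original structure tree represents all association relationships among the multiple web pages, the nodes of the original structure tree are the same as the nodes of the target structure tree, and the target association relationship represents the processing order between the corresponding web page and the associated web page; adjusting child nodes that have a first number of parent nodes in the original structure tree into child nodes that have a second number of parent nodes, where the second number is smaller than the first number; and constructing the target structure tree based on the child nodes having the second number of parent nodes and the depth information of the child nodes relative to the root node of the original structure tree.
Optionally, the apparatus further includes a processing component configured to, in response to the quality of the website satisfying a quality threshold range or the level of the website satisfying a level threshold range, perform at least one of the following on the website: indexing the website, displaying the website, and ranking the weights of the web pages of the website.
In the apparatus of this embodiment, the acquisition component acquires multiple web pages of a website, where the multiple web pages are used to construct the website; the first determination component determines the weight of each web page based on the association relationships among the multiple web pages, where the weight represents the proportion that each web page contributes to the website; the second determination component determines the website quality based on the weight and the page quality of each web page; and the grading component grades the website based on the website quality to obtain its quality level. This achieves the technical effect of improving the accuracy of website quality determination and solves the technical problem of low accuracy in judging website quality.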
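In the technical solutions of the present disclosure, the acquisition, storage and use of any user personal information involved comply with relevant laws and regulations and do not violate public order and good morals.
According to embodiments of the present disclosure, the present disclosure further provides an electronic device, a non-transitory computer-readable storage medium storing computer instructions, and a computer program product.
Embodiments of the present disclosure provide an electronic device, which may include: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the web page processing method for a website of the embodiments of the present disclosure.
Optionally, the electronic device may further include a transmission device and an input/output device, where the transmission device and the input/output device are each connected to the processor.
According to embodiments of the present disclosure, the present disclosure further provides a non-transitory computer-readable storage medium storing computer instructions, where the computer instructions cause a computer to perform the web page processing method for a website of the embodiments of the present disclosure.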
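FIG. 4A is a schematic diagram of a non-transitory computer-readable storage medium storing computer instructions according to an embodiment of the present disclosure. As shown in FIG. 4A, a non-transitory computer-readable storage medium 401 stores computer instructions that cause a computer to execute program code for the following steps:
S1: acquire multiple web pages of a website, where the multiple web pages are used to construct the website;
S2: determine the weight of each web page based on the association relationships among the multiple web pages, where the weight represents the proportion that each web page contributes to the website;
S3: determine the website quality of the website based on the weight of each web page and the page quality of each web page;
S4: grade the website based on the website quality to obtain the quality level of the website.
Optionally, the computer instructions further cause the computer to execute program code for the following steps: acquiring the target structure tree of the website, where the target structure tree represents the association relationships among the multiple web pages and the nodes of the target structure tree represent the web pages; and determining the weight of each web page based on the target structure tree.
Optionally, the computer instructions further cause the computer to execute program code for the following step: determining the weight of each web page based at least on the attribute information of each web page in the target structure tree, where the attributes of the nodes of the target structure tree represent the corresponding attribute information.
Optionally, the computer instructions further cause the computer to execute program code for the following steps: determining a first weight of each web page based on the attribute information, where the weight includes the first weight; determining a second weight of each web page based on a target association relationship between the web page and an associated web page in the target structure tree, where the weight includes the second weight and the target association relationship represents the processing order between the corresponding web page and the associated web page; and determining a third weight of each web page based on depth information of the target association relationship in the target structure tree relative to the home page of the website, where the weight includes the third weight.
Optionally, the computer instructions further cause the computer to execute program code for the following steps: adjusting the page quality of each web page based on the first weight and the second weight; and converting the adjusted page quality of each web page into the website quality of the website based on the third weight and the depth information.
Optionally, the computer instructions further cause the computer to execute program code for the following steps: performing an exponentiation on the third weight based on the depth information to obtain a power; obtaining the product of the power and the adjusted page quality of each web page; and summing the multiple products corresponding to the multiple web pages to obtain the website quality.
Optionally, the computer instructions further cause the computer to execute program code for the following step: determining the product of the first weight (the node type weight and the node quality weight), the second weight and the page quality of each web page, four factors in total, as the adjusted page quality of each web page.
Optionally, the computer instructions further cause the computer to execute program code for the following steps: constructing an original structure tree of the website based on the attribute information of each web page and the target association relationship between each web page and its associated web page, where the original structure tree represents all association relationships among the multiple web pages, the nodes of the original structure tree are the same as the nodes of the target structure tree, and the target association relationship represents the processing order between the corresponding web page and the associated web page; adjusting child nodes that have a first number of parent nodes in the original structure tree into child nodes that have a second number of parent nodes, where the second number is smaller than the first number; and constructing the target structure tree based on the child nodes having the second number of parent nodes and the depth information of the child nodes relative to the root node of the original structure tree.
Optionally, the computer instructions further cause the computer to execute program code for the following step:
in response to the quality of the website satisfying a quality threshold range, or the level of the website satisfying a level threshold range, performing at least one of the following on the website: indexing the website, displaying the website, and ranking the weights of the web pages of the website.
Optionally, in this embodiment, the non-transitory computer-readable storage medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing. More specific examples of readable storage media include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.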
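According to embodiments of the present disclosure, the present disclosure further provides a computer program product. FIG. 4B is a schematic diagram of a computer program product according to an embodiment of the present disclosure. As shown in FIG. 4B, the computer program product includes a computer program for the web page processing method for a website, and the computer program, when executed by a processor 402, implements the following steps:
S1: acquire multiple web pages of a website, where the multiple web pages are used to construct the website;
S2: determine the weight of each web page based on the association relationships among the multiple web pages, where the weight represents the proportion that each web page contributes to the website;
S3: determine the website quality of the website based on the weight of each web page and the page quality of each web page;
S4: grade the website based on the website quality to obtain the quality level of the website.
Optionally, the computer program, when executed by the processor 402, further implements the following steps: acquiring the target structure tree of the website, where the target structure tree represents the association relationships among the multiple web pages and the nodes of the target structure tree represent the web pages; and determining the weight of each web page based on the target structure tree.
Optionally, the computer program, when executed by the processor 402, further implements the following step: determining the weight of each web page based at least on the attribute information of each web page in the target structure tree, where the attributes of the nodes of the target structure tree represent the corresponding attribute information.
Optionally, the computer program, when executed by the processor 402, further implements the following steps: determining a first weight of each web page based on the attribute information, where the weight includes the first weight; determining a second weight of each web page based on a target association relationship between the web page and an associated web page in the target structure tree, where the weight includes the second weight and the target association relationship represents the processing order between the corresponding web page and the associated web page; and determining a third weight of each web page based on depth information of the target association relationship in the target structure tree relative to the home page of the website, where the weight includes the third weight.
Optionally, the computer program, when executed by the processor 402, further implements the following steps: adjusting the page quality of each web page based on the first weight and the second weight; and converting the adjusted page quality of each web page into the website quality of the website based on the third weight and the depth information.
Optionally, the computer program, when executed by the processor 402, further implements the following steps: performing an exponentiation on the third weight based on the depth information to obtain a power; obtaining the product of the power and the adjusted page quality of each web page; and summing the multiple products corresponding to the multiple web pages to obtain the website quality.
Optionally, the computer program, when executed by the processor 402, further implements the following step: determining the product of the first weight (the node type weight and the node quality weight), the second weight and the page quality of each web page, four factors in total, as the adjusted page quality of each web page.
Optionally, the computer program, when executed by the processor 402, further implements the following steps: constructing an original structure tree of the website based on the attribute information of each web page and the target association relationship between each web page and its associated web page, where the original structure tree represents all association relationships among the multiple web pages, the nodes of the original structure tree are the same as the nodes of the target structure tree, and the target association relationship represents the processing order between the corresponding web page and the associated web page; adjusting child nodes that have a first number of parent nodes in the original structure tree into child nodes that have a second number of parent nodes, where the second number is smaller than the first number; and constructing the target structure tree based on the child nodes having the second number of parent nodes and the depth information of the child nodes relative to the root node of the original structure tree.
Optionally, the computer program, when executed by the processor 402, further implements the following step:
in response to the quality of the website satisfying a quality threshold range, or the level of the website satisfying a level threshold range, performing at least one of the following on the website: indexing the website, displaying the website, and ranking the weights of the web pages of the website.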
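FIG. 5 is a block diagram of an electronic device for the web page processing method for a website according to an embodiment of the present disclosure. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile apparatus, such as personal digital assistants, cellular phones, smartphones, wearable devices, and other similar computing apparatus. The components shown here, their connections and relationships, and their functions are examples only and are not intended to limit the implementations of the present disclosure described and/or claimed herein.
As shown in FIG. 5, the device 500 includes a computing component 501, which can perform various appropriate actions and processing according to a computer program stored in a read-only memory (ROM) 502 or a computer program loaded from a storage component 508 into a random access memory (RAM) 503. The RAM 503 may also store various programs and data required for the operation of the device 500. The computing component 501, the ROM 502 and the RAM 503 are connected to one another through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
Multiple components of the device 500 are connected to the I/O interface 505, including: an input component 506, such as a keyboard or a mouse; an output component 507, such as various types of displays and speakers; a storage component 508, such as a magnetic disk or an optical disc; and a communication component 509, such as a network card, a modem, or a wireless communication transceiver. The communication component 509 allows the device 500 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
The computing component 501 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing component 501 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing components that run machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller, microcontroller, and so on. The computing component 501 performs the methods and processing described above, for example the web page processing method for a website. For example, in some embodiments, the method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage component 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM 502 and/or the communication component 509. When the computer program is loaded into the RAM 503 and executed by the computing component 501, one or more steps of the method described above may be performed. Alternatively, in other embodiments, the computing component 501 may be configured to perform the method in any other appropriate way (for example, by means of firmware).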
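Various implementations of the systems and techniques described above may be realized in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGA), application-specific integrated circuits (ASIC), application-specific standard products (ASSP), systems on chip (SOC), complex programmable logic devices (CPLD), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor. The programmable processor may be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device and at least one output device, and transmit data and instructions to the storage system, the at least one input device and the at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine, or entirely on a remote machine or server.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
To provide interaction with a user, the systems and techniques described here may be implemented on a computer having a display device (for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and a pointing device (for example, a mouse or a trackball) through which the user can provide input to the computer. Other kinds of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (for example, visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form (including acoustic input, voice input, or tactile input).
The systems and techniques described here may be implemented in a computing system that includes back-end components (for example, a data server), or a computing system that includes middleware components (for example, an application server), or a computing system that includes front-end components (for example, a user computer with a graphical user interface or a web browser through which the user can interact with implementations of the systems and techniques described here), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by any form or medium of digital data communication (for example, a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.
A computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises from computer programs that run on the respective computers and have a client-server relationship with each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.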
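It should be understood that the various forms of flow shown above may be used, with steps reordered, added or deleted. For example, the steps described in the present disclosure may be performed in parallel, sequentially, or in a different order, as long as the desired result of the technical solutions disclosed herein can be achieved; no limitation is imposed here.
The specific implementations above do not limit the scope of protection of the present disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations and substitutions may be made according to design requirements and other factors. Any modification, equivalent replacement or improvement made within the spirit and principles of the present disclosure shall fall within the scope of protection of the present disclosure.
Industrial Applicability
Multiple web pages of a website are acquired, where the multiple web pages are used to construct the website; the weight of each web page is determined based on the association relationships among the multiple web pages, where the weight represents the proportion that each web page contributes to the website; the website quality is determined based on the weight and the page quality of each web page; and the website is graded based on the website quality to obtain its quality level. In other words, embodiments of the present disclosure determine the quality of a website jointly from the weights that different web pages contribute to the website and the page quality of those pages. The method is accurate and widely applicable, which achieves the technical effect of improving the accuracy of website quality determination and solves the technical problem of low accuracy in judging website quality.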

Claims (15)

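  1. A web page processing method for a website, comprising:
    acquiring multiple web pages of a website, wherein the multiple web pages are used to construct the website;
    determining a weight of each of the web pages based on association relationships among the multiple web pages, wherein the weight represents the proportion that each of the web pages contributes to the website;
    determining a website quality of the website based on the weight of each of the web pages and a page quality of each of the web pages;
    grading the website based on the website quality to obtain a quality level of the website.
  2. The method according to claim 1, wherein determining the weight of each of the web pages based on the association relationships among the multiple web pages comprises:
    acquiring a target structure tree of the website, wherein the target structure tree represents the association relationships among the multiple web pages, and nodes of the target structure tree represent the web pages;
    determining the weight of each of the web pages based on the target structure tree.
  3. The method according to claim 2, wherein determining the weight of each of the web pages based on the target structure tree comprises:
    determining the weight of each of the web pages based at least on attribute information of each of the web pages in the target structure tree, wherein attributes of the nodes of the target structure tree represent the corresponding attribute information.
  4. The method according to claim 3, wherein determining the weight of each of the web pages based at least on the attribute information of each of the web pages in the target structure tree comprises:
    determining a first weight of each of the web pages based on the attribute information, wherein the weight comprises the first weight;
    determining a second weight of each of the web pages based on a target association relationship between each of the web pages and an associated web page in the target structure tree, wherein the weight comprises the second weight, and the target association relationship represents a processing order between the corresponding web page and the associated web page;
    determining a third weight of each of the web pages based on depth information of the target association relationship in the target structure tree relative to a home page of the website, wherein the weight comprises the third weight.
  5. The method according to claim 4, wherein the attribute information comprises a type of the corresponding web page and/or the page quality of the web page.
  6. The method according to claim 4, wherein the target association relationship represents at least one of the following relationships between the web page and the associated web page: the web page adapts to the associated web page, the web page redirects to the associated web page, and the web page refers traffic to the associated web page.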
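  7. The method according to claim 4, wherein determining the website quality of the website based on the weight of each of the web pages and the page quality of each of the web pages comprises:
    adjusting the page quality of each of the web pages based on the first weight and the second weight;
    converting the adjusted page quality of each of the web pages into the website quality of the website based on the third weight and the depth information.
  8. The method according to claim 7, wherein converting the adjusted page quality of each of the web pages into the website quality of the website based on the third weight and the depth information comprises:
    performing an exponentiation on the third weight based on the depth information to obtain a power;
    obtaining a product of the power and the adjusted page quality of each of the web pages;
    summing the multiple products corresponding to the multiple web pages to obtain the website quality.
  9. The method according to claim 7, wherein adjusting the page quality of each of the web pages based on the first weight and the second weight comprises:
    determining a product of the first weight, the second weight and the page quality of each of the web pages as the adjusted page quality of each of the web pages.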
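  10. The method according to claim 2, wherein acquiring the target structure tree of the website comprises:
    constructing an original structure tree of the website based on attribute information of each of the web pages and a target association relationship between each of the web pages and an associated web page, wherein the original structure tree represents all association relationships among the multiple web pages, nodes of the original structure tree are the same as the nodes of the target structure tree, and the target association relationship represents a processing order between the corresponding web page and the associated web page;
    adjusting child nodes having a first number of parent nodes in the original structure tree into child nodes having a second number of parent nodes, wherein the second number is smaller than the first number;
    constructing the target structure tree based on the child nodes having the second number of parent nodes and depth information of the child nodes relative to a root node of the original structure tree.
  11. The method according to any one of claims 1 to 10, further comprising:
    in response to the quality of the website satisfying a quality threshold range, or the level of the website satisfying a level threshold range, performing at least one of the following on the website: indexing the website, displaying the website, and ranking the weights of the web pages of the website.
  12. A web page processing apparatus for a website, comprising one or more processors and one or more memories storing program units, wherein the program units are executed by the processors and comprise:
    an acquisition component configured to acquire multiple web pages of a website, wherein the multiple web pages are used to construct the website;
    a first determination component configured to determine a weight of each of the web pages based on association relationships among the multiple web pages, wherein the weight represents the proportion that each of the web pages contributes to the website;
    a second determination component configured to determine a website quality of the website based on the weight of each of the web pages and a page quality of each of the web pages;
    a grading component configured to grade the website based on the website quality to obtain a quality level of the website.
  13. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the method according to any one of claims 1-12.
  14. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions cause a computer to perform the method according to any one of claims 1-12.
  15. A computer program product, comprising a computer program that, when executed by a processor, implements the method according to any one of claims 1-12.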
PCT/CN2022/126010 2022-04-29 2022-10-18 网站的网页处理方法、装置、电子设备和存储介质 WO2023206988A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210467257.2A CN114925308B (zh) 2022-04-29 2022-04-29 网站的网页处理方法、装置、电子设备和存储介质
CN202210467257.2 2022-04-29

Publications (1)

Publication Number Publication Date
WO2023206988A1 true WO2023206988A1 (zh) 2023-11-02

Family

ID=82806815

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/126010 WO2023206988A1 (zh) 2022-04-29 2022-10-18 网站的网页处理方法、装置、电子设备和存储介质

Country Status (2)

Country Link
CN (1) CN114925308B (zh)
WO (1) WO2023206988A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1996299A (zh) * 2006-12-12 2007-07-11 孙斌 对网页和网站评级的方法
US20170359235A1 (en) * 2016-06-14 2017-12-14 Microsoft Technology Licensing, Llc Weighted Experience Website Performance Score
CN108121741A (zh) * 2016-11-30 2018-06-05 百度在线网络技术(北京)有限公司 网站质量评估方法及装置
US20200151187A1 (en) * 2017-09-06 2020-05-14 Siteimprove A/S Website scoring system
CN114285760A (zh) * 2020-09-18 2022-04-05 华为技术有限公司 一种网页访问质量评估方法及装置
CN114297465A (zh) * 2021-12-29 2022-04-08 北京天融信网络安全技术有限公司 一种网页信息处理方法、系统、电子设备及存储介质

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101473304B (zh) * 2006-04-18 2016-04-27 新思科技有限公司 通过电路仿真对网页进行评级的方法
CN102542474B (zh) * 2010-12-07 2015-10-21 阿里巴巴集团控股有限公司 查询结果排序方法及装置
CN102541947B (zh) * 2010-12-31 2015-03-18 百度在线网络技术(北京)有限公司 一种用于基于扩展推荐事件更新网页权威值的方法与设备
US20130036039A1 (en) * 2011-08-01 2013-02-07 Rohlfs Michael B System for market hedging and related method
CN103136626B (zh) * 2011-11-29 2016-08-03 北京建龙重工集团有限公司 工程项目的在线管理方法
CN102663101B (zh) * 2012-04-13 2015-10-28 北京交通大学 一种基于新浪微博的用户等级排序算法
CN103544257B (zh) * 2013-10-15 2017-01-18 北京国双科技有限公司 网页质量检测方法和装置
CN103533367B (zh) * 2013-10-23 2015-08-19 传线网络科技(上海)有限公司 一种无参考视频质量评价方法及装置
CN104519141B (zh) * 2015-01-12 2018-07-20 张树人 社会关系网络中基于关系评价传递的量化模型与方法
CN107229631B (zh) * 2016-03-24 2020-11-03 北京京东尚科信息技术有限公司 一种抓取网站数据的方法和装置
CN106570525A (zh) * 2016-10-26 2017-04-19 昆明理工大学 一种基于贝叶斯网络的在线商品评价质量评估方法
CN108364199B (zh) * 2018-02-28 2021-08-13 北京搜狐新媒体信息技术有限公司 一种基于互联网用户评论的数据分析方法及系统
CN113239256B (zh) * 2021-05-14 2024-02-23 北京百度网讯科技有限公司 生成网站签名的方法、识别网站的方法及装置
CN113742627A (zh) * 2021-09-08 2021-12-03 北京百度网讯科技有限公司 不良网站识别方法、装置、电子设备和介质
CN113779559B (zh) * 2021-09-13 2023-10-03 北京百度网讯科技有限公司 用于识别作弊网站的方法、装置、电子设备和介质

Also Published As

Publication number Publication date
CN114925308A (zh) 2022-08-19
CN114925308B (zh) 2023-10-03

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22939808

Country of ref document: EP

Kind code of ref document: A1