CN111914201A - Network page processing method and device - Google Patents
- Publication number
- CN111914201A (application number CN202010789735.2A)
- Authority
- CN
- China
- Prior art keywords
- page
- pages
- target
- medical
- field
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The embodiments of the present application provide a method and a device for processing a web page. The method comprises: performing domain classification on pages to be processed based on their content to obtain at least one domain; determining an authority value of each page in a target field relative to the other pages in the target field, based on the association relationship among the pages in the target field; and presenting information of the pages in the target field in a web page based on the authority value corresponding to each page. Because authority values are calculated among the pages of a single field, pages that are associated within that field are displayed together, which improves the logic and hierarchy of web page pushing and the content pushing effect on the user side.
Description
Technical Field
The present application relates to the field of computer and communication technologies, and in particular, to a method and an apparatus for processing a web page.
Background
Many websites push content by recommending information about related web pages within a web page, in order to promote that information. For in-site pushing, a website commonly retrieves pages through the index of its in-site search engine and pushes them directly, so that related content is presented on the user terminal. However, because the pushed content varies in source and type, content pushed in this way is often disorganized and lacks logic and hierarchy, so the content pushing effect on the user terminal is poor.
Disclosure of Invention
Embodiments of the present application provide a method and an apparatus for processing a network page, so that the logic and hierarchy of network page pushing can be improved at least to a certain extent, and the content pushing effect on a user side is improved.
Other features and advantages of the present application will be apparent from the following detailed description, or may be learned by practice of the application.
According to an aspect of an embodiment of the present application, a method for processing a web page is provided, including: performing domain classification on pages to be processed based on their content to obtain at least one domain; determining an authority value of each page in a target field relative to the other pages in the target field, based on the association relationship among the pages in the target field; and presenting information of the pages in the target field in a web page based on the authority value corresponding to each page.
According to an aspect of the embodiments of the present application, there is provided a device for processing a web page, including: a classification unit, configured to perform domain classification on pages to be processed based on their content to obtain at least one domain; a valuing unit, configured to determine an authority value of each page in the target field relative to the other pages in the target field, based on the association relationship among the pages in the target field; and a presenting unit, configured to present information of the pages in the target field in a web page based on the authority value corresponding to each page.
In some embodiments of the present application, based on the foregoing solution, the processing device of the web page further includes: the first acquisition unit is used for acquiring the website navigation information; the second acquisition unit is used for acquiring a page in the website based on the website structure and the seed page in the website navigation information; and the relationship determining unit is used for determining the association relationship among the pages based on the link relationship among the pages.
In some embodiments of the present application, based on the foregoing scheme, the second obtaining unit is configured to crawl the information in the website based on the website structure and the seed page in the website navigation information, so as to obtain the pages in the website.
In some embodiments of the present application, based on the foregoing scheme, the classification unit includes: the extraction unit is used for extracting the text content in the page to be processed; and the input unit is used for inputting the text content into a trained page classification model to obtain a field corresponding to the to-be-processed page output by the page classification model.
In some embodiments of the present application, based on the foregoing scheme, the method for training the page classification model includes: acquiring the text content of a page sample and its corresponding field label; extracting a vocabulary sample from the text content; inputting the vocabulary sample into a page classification network to obtain a classification result output by the page classification network; and adjusting the parameters of the page classification network based on a loss function obtained from the classification result and the field label, so as to obtain the page classification model.
In some embodiments of the present application, based on the foregoing scheme, the valuing unit includes: an associated page determining unit, configured to determine the associated pages in the target field based on the association relationship among the pages in the selected target field; and an authority value determining unit, configured to determine the authority values of the associated pages relative to the other pages in the target field based on the calling relationship among the associated pages, wherein the calling relationship is positively correlated with the authority values.
In some embodiments of the present application, based on the foregoing solution, the authority value determining unit is configured to: determine an association matrix based on the calling relationship among the associated pages; determine, based on the target field and the pages in the target field other than the associated page, an authority parameter representing the relationship between the associated page and the other pages in the target field; and determine the authority value of the associated page relative to the other pages in the target field based on the association matrix, the authority parameter and the damping coefficient.
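The computation described above can be illustrated with a PageRank-style iteration. This is only a sketch under stated assumptions: the text here names a matrix built from the calling relationship, an authority parameter and a damping coefficient, but does not give the update formula, so the formula below, the uniform authority parameter, the default damping value of 0.85 and the function name `authority_values` are all hypothetical choices.

```python
def authority_values(links, field_pages, damping=0.85, iterations=50):
    """PageRank-style authority values restricted to the pages of one field.

    links: dict mapping a page id to the list of page ids it calls (links to).
    field_pages: ids of the pages classified into the target field.
    The update rule is an assumption; the source does not state the formula.
    """
    pages = sorted(field_pages)
    n = len(pages)
    index = {page: i for i, page in enumerate(pages)}

    # Matrix from the calling relationship: matrix[j][i] is the share of
    # page i's in-field links that point to page j. Links that leave the
    # field are ignored, so authority is computed within the field only.
    matrix = [[0.0] * n for _ in range(n)]
    for src in pages:
        targets = [t for t in links.get(src, []) if t in index and t != src]
        for t in targets:
            matrix[index[t]][index[src]] += 1.0 / len(targets)

    # Authority parameter: assumed uniform over the pages of the field.
    base = [1.0 / n] * n
    rank = base[:]
    for _ in range(iterations):
        rank = [
            (1.0 - damping) * base[j]
            + damping * sum(matrix[j][i] * rank[i] for i in range(n))
            for j in range(n)
        ]
    return dict(zip(pages, rank))


# Hypothetical link data: page "x" lies outside the target field.
links = {"a": ["b"], "b": ["c"], "c": ["b"], "d": ["b"], "x": ["a"]}
values = authority_values(links, {"a", "b", "c", "d"})
```

In this toy graph, page "b" is called by three other in-field pages and ends up with the largest value, which matches the stated positive correlation between the calling relationship and the authority value. Pages with no outgoing in-field links are not given special treatment in this sketch.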
In some embodiments of the present application, based on the foregoing solution, the presenting unit includes: a third acquisition unit configured to acquire a search term for the target domain; the target page determining unit is used for searching a target page corresponding to the search vocabulary entry from the page corresponding to the target field; and the page presenting unit is used for determining the display sequence of the target page based on the authority value corresponding to the target page and presenting the information of the target page in the webpage based on the display sequence.
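As a rough illustration of the presenting step, the sketch below looks up the pages of a field that match a search entry and orders them by authority value. The substring match on titles, the page records, the authority numbers and the name `order_target_pages` are hypothetical; the text does not specify how target pages are matched.

```python
def order_target_pages(search_entry, field_pages, authority):
    """Return the titles of matching pages, highest authority value first.

    Matching is a case-insensitive substring test on the title, which is an
    assumption made for this sketch only.
    """
    needle = search_entry.lower()
    matches = [page for page in field_pages if needle in page["title"].lower()]
    matches.sort(key=lambda page: authority.get(page["id"], 0.0), reverse=True)
    return [page["title"] for page in matches]


# Hypothetical pages of one field and their authority values.
pages = [
    {"id": "p1", "title": "Kidney stones: causes"},
    {"id": "p2", "title": "Kidney stones: treatment"},
    {"id": "p3", "title": "Uremia overview"},
]
authority = {"p1": 0.2, "p2": 0.5, "p3": 0.3}
ordered = order_target_pages("kidney stones", pages, authority)
```

The resulting list is the display sequence: only pages matching the entry are kept, and the page with the larger authority value is shown first.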
In some embodiments of the present application, based on the foregoing solution, the processing device of the web page further includes: a medical classification unit, configured to classify the medical pages to be processed based on the articles in those pages, so as to obtain the medical field corresponding to each medical page; a medical valuing unit, configured to determine an authority value of each medical page in the target medical field relative to the other medical pages in the target medical field, based on the association relationship among the medical pages in the selected target medical field; and a medical presenting unit, configured to present information of the medical pages in the medical field in a medical web page based on the authority value corresponding to each page.
According to an aspect of the embodiments of the present application, there is provided a computer readable medium, on which a computer program is stored, which when executed by a processor, implements the method for processing a web page as described in the above embodiments.
According to an aspect of an embodiment of the present application, there is provided an electronic device including: one or more processors; a storage device, configured to store one or more programs, which when executed by the one or more processors, cause the one or more processors to implement the method for processing a web page as described in the above embodiments.
According to an aspect of embodiments herein, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the processing method of the web page provided in the above-mentioned various alternative implementations.
In the technical solutions provided in some embodiments of the present application, each page to be processed in a website is classified to obtain the pages corresponding to each field, so that the pages in each field can be processed in a targeted manner. The authority value of each page in a target field relative to the other pages is determined according to the association relationship among the pages in that field, and the information of the associated pages in the target field is then presented in the web page based on the authority value corresponding to each page. Because authority values are calculated for the pages within a field, associated pages in that field are displayed together, which improves the logic and hierarchy of web page pushing and the content pushing effect on the user terminal.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
FIG. 1 shows a schematic diagram of an exemplary system architecture to which aspects of embodiments of the present application may be applied;
FIG. 2 schematically illustrates a flow diagram of a method of processing a web page according to one embodiment of the present application;
FIG. 3 schematically shows a diagram of semantic drift according to an embodiment of the present application;
FIG. 4 schematically shows a diagram of training a page classification model according to an embodiment of the present application;
FIG. 5 schematically shows a flow diagram for presenting information of a page in the target domain in a web page according to one embodiment of the present application;
FIG. 6 schematically shows a flow chart of a method of processing a medical network page according to an embodiment of the present application;
FIG. 7 schematically shows a schematic view of a medical field classification according to an embodiment of the present application;
FIG. 8 schematically illustrates a diagram presenting information of a medical page according to an embodiment of the present application;
FIG. 9 schematically illustrates a block diagram of a processing device of a web page according to an embodiment of the present application;
FIG. 10 schematically shows a block diagram of a processing device of a medical network page according to an embodiment of the present application;
FIG. 11 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the subject matter of the present application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the application.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision, speech processing, natural language processing, and machine learning/deep learning.
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies the theories and methods that enable effective communication between humans and computers using natural language. Natural language processing is a science that integrates linguistics, computer science and mathematics. Research in this field involves natural language, that is, the language people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering by robots, knowledge graphs, and the like.
Machine Learning (ML) is a multidisciplinary field that draws on probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specifically studies how a computer can simulate or realize human learning behavior so as to acquire new knowledge or skills and reorganize its existing knowledge structure to continuously improve its performance. Machine learning is the core of artificial intelligence and the fundamental way to give computers intelligence, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, and inductive learning.
With the research and progress of artificial intelligence technology, the artificial intelligence technology is developed and applied in a plurality of fields, such as common virtual assistants, intelligent speakers, intelligent marketing, robots, intelligent medical treatment, intelligent customer service, and the like.
The scheme provided by the embodiments of the present application relates to artificial intelligence technologies such as natural language processing and machine learning, and is explained through the following embodiments. Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of the embodiments of the present application can be applied.
As shown in fig. 1, the system architecture may include a terminal device (e.g., one or more of a smartphone 101, a tablet computer 102, and a portable computer 103 shown in fig. 1, but may also be a desktop computer, etc.), a network 104, and a server 105. The network 104 serves as a medium for providing communication links between terminal devices and the server 105. Network 104 may include various connection types, such as wired communication links, wireless communication links, and so forth.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, server 105 may be a server cluster comprised of multiple servers, or the like.
A user may use a terminal device to interact with the server 105 over the network 104 to receive or send messages or the like. The server 105 may be a server that provides various services. For example, the server 105 performs domain classification on the page to be processed based on the content in the page to be processed to obtain at least one domain, then determines authority values of each page in the target domain relative to other pages in the target domain based on the association relationship between the pages in the target domain, and finally presents the information of the page in the target domain in the web page based on the authority values corresponding to each page.
According to the scheme in this embodiment, each page to be processed in the website is classified to obtain the pages corresponding to each field, so that the pages in each field can be processed in a targeted manner. The authority value of each page in a target field relative to the other pages is determined according to the association relationship among the pages in that field, and the information of the associated pages in the target field is then presented in the web page based on the authority value corresponding to each page. Because authority values are calculated for the pages within a field, associated pages in that field are displayed together, which improves the logic and hierarchy of web page pushing and the content pushing effect on the user terminal.
It should be noted that the processing method of the web page provided in the embodiment of the present application is generally executed by the server 105, and accordingly, the processing device of the web page is generally disposed in the server 105. However, in other embodiments of the present application, the terminal device may also have a similar function as the server, so as to execute the method for processing the web page provided in the embodiments of the present application.
The implementation details of the technical solution of the embodiment of the present application are set forth in detail below:
Fig. 2 shows a flowchart of a processing method of a web page according to an embodiment of the present application, which may be performed by a server, which may be the server shown in Fig. 1. Referring to Fig. 2, the method for processing the web page at least includes steps S210 to S230, which are described in detail as follows:
In step S210, the to-be-processed page is subjected to domain classification based on the content in the to-be-processed page, so as to obtain at least one domain.
Fig. 3 is a schematic diagram of semantic drift according to an embodiment of the present application.
As shown in Fig. 3, in an embodiment of the present application, because different websites define their related-recommendation policies differently, there may be a certain semantic drift between the related recommendation links given by the system and the original web page. For example, the text area 310 of the web page in Fig. 3 contains "the reason for frequent blisters on the mouth" and the corresponding body text, whereas the related questions in the recommendation area 320 of the page include "no side effect will occur when face shaping is performed". The two have no connection with each other, which leads to drift and bias in content recommendation. To avoid this, in this embodiment the pages to be processed are classified by domain based on their content, so as to obtain the pages included in each domain.
In an embodiment of the present application, in the process of performing domain classification on the page to be processed, the page to be processed may be classified based on text content and images in the page to be processed. For example, the similarity between images in each page to be processed is identified, the pages to be processed belonging to the same field are determined based on the similarity, and the corresponding name of the field is determined based on the image content or the text content.
In one embodiment of the present application, the pages to be processed may include various pages in a website, and the content in the pages may be associated or not associated. Meanwhile, the pages may also include a next level page below one page, and the like.
In one embodiment of the present application, the domain may be used to represent different page types, the scopes to which the pages belong, and the like. The fields in this embodiment may be classified at multiple levels, yielding fields of different levels under one directory, such as first-level fields and second-level fields.
In an embodiment of the present application, the process of performing domain classification on the to-be-processed page based on the content in the to-be-processed page in step S210 to obtain at least one domain includes the following steps: extracting text content in a page to be processed; and inputting the text content into the trained page classification model to obtain the field corresponding to the page to be processed and output by the page classification model.
In an embodiment of the present application, when performing domain classification on a to-be-processed page based on content in the to-be-processed page, the domain classification may be performed based on text content in the to-be-processed page. And obtaining the corresponding field of each page to be processed by identifying the similar condition or the associated condition among the text contents in the page to be processed. In addition, the text content of the page to be processed can be input into the trained page classification model, and the field corresponding to the page to be processed and output by the page classification model can be obtained.
Specifically, in an embodiment of the present application, a method for training a page classification model includes: acquiring text content of a page sample and a corresponding field tag thereof; extracting a vocabulary sample from the text content; inputting the vocabulary sample into a page classification network to obtain a classification result output by the page classification network; and adjusting parameters in the page classification network based on the classification result and the loss function obtained by the domain label to obtain a page classification model.
Fig. 4 is a schematic diagram of a training page classification model according to an embodiment of the present application.
As shown in Fig. 4, in an embodiment of the present application, the page classification network may be constructed based on a text convolutional neural network (TextCNN). First, the text content of each page sample is labeled to determine its field label. A sequence of length n is then input, and words 0 to n-1 are extracted from it. In the input layer 410, each word is fed into the page classification network to obtain a word vector of dimension K. The K-dimensional word vectors are passed to the convolutional layer 420, where the convolution kernels may be 2 × 1, 3 × 1 and 4 × 1 in size, with 1024 layers. The output of the convolutional layer 420 is pooled in the pooling layer 430 to obtain 1024 layers of pooled data. Finally, the pooled data is connected at the fully connected layer to obtain the classification system corresponding to the page sample, and the classification label is obtained based on that classification system.
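The forward pass of such a network can be sketched in plain Python. This is a minimal illustration, not the implementation of the embodiment: the weights are random rather than trained, only 4 filters per kernel width are used instead of 1024 to keep the example small, the ReLU activation and max-over-time pooling are assumptions, and names such as `textcnn_forward` are hypothetical.

```python
import random


def conv_max_pool(vectors, kernel):
    """Slide a (width x K) kernel over the word vectors and keep the
    largest response (max-over-time pooling, assumed here)."""
    width = len(kernel)
    responses = []
    for start in range(len(vectors) - width + 1):
        total = 0.0
        for offset in range(width):
            row, vec = kernel[offset], vectors[start + offset]
            total += sum(w * x for w, x in zip(row, vec))
        responses.append(max(total, 0.0))  # ReLU activation (assumed)
    return max(responses)


def textcnn_forward(token_ids, embedding, kernels, fc_weights):
    """Embedding lookup -> convolution + pooling -> fully connected scores."""
    vectors = [embedding[t] for t in token_ids]            # n vectors of size K
    features = [conv_max_pool(vectors, k) for k in kernels]
    return [sum(w * f for w, f in zip(row, features)) for row in fc_weights]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
VOCAB, K, NUM_CLASSES, FILTERS_PER_WIDTH = 20, 8, 3, 4     # toy sizes
embedding = random_matrix(VOCAB, K, rng)
# Kernel widths 2, 3 and 4 follow the text; the filter count is reduced.
kernels = [random_matrix(width, K, rng)
           for width in (2, 3, 4) for _ in range(FILTERS_PER_WIDTH)]
fc_weights = random_matrix(NUM_CLASSES, len(kernels), rng)

scores = textcnn_forward([1, 5, 7, 2, 9, 3], embedding, kernels, fc_weights)
predicted = scores.index(max(scores))                      # index of the class
```

The input sequence must contain at least as many words as the widest kernel (4 here). Each kernel contributes one pooled feature, so the fully connected layer sees one value per filter.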
Further, after obtaining the classification label corresponding to the page sample, comparing the classification label with the set domain label, and determining a corresponding loss function according to the comparison result so as to adjust parameters in the page classification network, thereby obtaining a page classification model.
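The parameter adjustment described above can be illustrated with a softmax cross-entropy loss and one gradient step. This is a sketch under assumptions: the text does not name the loss function, so cross-entropy is a common choice rather than the confirmed one, only a linear output layer is updated, and the names `softmax`, `cross_entropy`, and `sgd_step` are hypothetical.

```python
import math

def softmax(scores):
    peak = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, label):
    """Loss comparing the predicted distribution with the domain label."""
    return -math.log(softmax(scores)[label])

def sgd_step(weights, features, label, lr=0.1):
    """One gradient step for a linear output layer (scores = W . features).
    Returns the loss measured before the update."""
    scores = [sum(w * f for w, f in zip(row, features)) for row in weights]
    probs = softmax(scores)
    for c, row in enumerate(weights):
        grad = probs[c] - (1.0 if c == label else 0.0)
        for i, f in enumerate(features):
            row[i] -= lr * grad * f
    return cross_entropy(scores, label)

weights = [[0.0, 0.0] for _ in range(3)]     # 3 domains, 2 pooled features
losses = [sgd_step(weights, [1.0, 2.0], label=1) for _ in range(5)]
```

With all-zero initial weights the first loss equals ln 3, and repeated steps on the same sample reduce it.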
Illustratively, as shown in fig. 4, in an application scenario of a medical website, a TextCNN-based classification model is trained by obtaining popular-science medical articles from the medical website and labeling the samples automatically. First, a word vector model for the medical field is trained on tens of millions of medical information articles collected in advance, and the information titles used in the subsequent training and prediction stages are then represented as vectors. Words 0 to n-1 in the leftmost data sequence of fig. 4 are the K-dimensional word vectors corresponding to each segmented word in a medical information title, and the classification system on the far right is the disease classification used in the medical website.
In an embodiment of the present application, before the process of determining authority values of each page in the target domain relative to other pages in the target domain based on the association relationship between the pages in the target domain in step S220, the following steps are included: acquiring website navigation information; acquiring a page in a website based on a website structure and a seed page in the website navigation information; and determining the association relation between the pages based on the link relation between the pages.
It should be noted that, this part of the scheme may be executed before step S220, or may be executed before step S210.
In an embodiment of the application, the website navigation information is acquired, and information in the website is crawled based on the website structure and the seed page in the website navigation information to acquire the pages in the website. The association relationship between the pages is then determined based on the link relationship between the pages.
Specifically, the website navigation information in this embodiment may include a website structure, a seed page as a root page or a home page, and the like.
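Crawling from a seed page and recording link relationships can be sketched as a breadth-first traversal. The example below is illustrative only: `SITE` is a hypothetical in-memory stand-in for a website, and `fetch_links` is an assumed callback that a real crawler would replace with page download and link extraction.

```python
from collections import deque

def crawl(seed, fetch_links):
    """Breadth-first traversal from the seed page.
    Returns {page: [pages it links to]} for every reachable page."""
    links = {}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        if page in links:          # already visited
            continue
        links[page] = list(fetch_links(page))
        queue.extend(t for t in links[page] if t not in links)
    return links

# Hypothetical site: "/" is the seed (home) page; "/orphan" is never linked to.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/b"],
    "/b": ["/", "/c"],
    "/c": [],
    "/orphan": ["/"],
}
graph = crawl("/", lambda page: SITE.get(page, []))
```

The resulting dictionary is one possible representation of the association relationship between pages: an entry `graph[p]` lists the pages that `p` links to.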
In step S220, authority values of the pages in the target domain relative to other pages in the target domain are determined based on the association relationship between the pages in the target domain.
In an embodiment of the present application, for a specified target domain, authority is determined according to the association relationship between the pages in the target domain. In this embodiment, more important pages tend to be referenced more often by other pages; that is, more of the other pages contain hyperlinks leading to them. Illustratively, a link from page A to page B is interpreted as page A voting for page B, and the rank and authority value of the voted page are determined according to the voting source, the source of that source (that is, the rank of the pages linking to page A), and the voting target.
In an embodiment of the present application, the process of determining authority values of the pages in the target domain relative to other pages in the target domain based on the association relationship between the pages in the target domain in step S220 includes the following steps S2201 to S2202:
in step S2201, the associated pages in the target domain are determined based on the association relationship between the pages in the selected target domain.
In an embodiment of the present application, association relationships exist between pages in a website; within the scope of the target domain, some pages are associated with one another, while other pages may have no association relationship. In this embodiment, based on the association relationship between the pages in the target domain, the pages having an association relationship are used as the associated pages in the target domain.
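Selecting the associated pages of a target domain can be sketched as a filter over the link graph. This is an illustration under assumptions: an "association" is taken to be a hyperlink whose source and target both belong to the target domain, and the names `associated_pages`, `links`, and `domain_of` are hypothetical.

```python
def associated_pages(links, domain_of, target):
    """Keep only links whose two ends are both in the target domain, and
    return the pages that take part in at least one such link."""
    in_domain = {page for page, domain in domain_of.items() if domain == target}
    edges = [
        (src, dst)
        for src, targets in links.items() if src in in_domain
        for dst in targets if dst in in_domain and dst != src
    ]
    pages = sorted({page for edge in edges for page in edge})
    return pages, edges

links = {"a": ["b", "x"], "b": ["a"], "c": [], "x": ["a"]}
domain_of = {"a": "nephrology", "b": "nephrology", "c": "nephrology", "x": "surgery"}
pages, edges = associated_pages(links, domain_of, "nephrology")
```

Here page `c` is in the domain but has no in-domain links, so it is not treated as an associated page, and page `x` is excluded because it belongs to another domain.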
In step S2202, authority values of the associated pages in the target domain relative to other pages in the target domain are determined based on a call relationship between the associated pages, where the call relationship and the authority values are positively correlated.
In an embodiment of the present application, since a positive correlation exists between the calling relationship and the authority value, in this embodiment, the authority value of the associated page in the target field relative to other pages is determined according to the calling relationship between the associated pages.
In an embodiment of the present application, the process in step S2202 of determining, based on the call relationship between the associated pages, the authority values of the associated pages in the target domain relative to other pages in the target domain, where the call relationship is positively correlated with the authority values, includes the following steps:
based on the calling relationship among the associated pages, determining the association matrix as:
where $p_1$ to $p_N$ denote page identifiers, N is a natural number greater than 2, $\iota(p_i, p_j)$ represents the calling relationship of page $p_i$ with respect to page $p_j$, and i and j are natural numbers smaller than N.
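The matrix itself is not reproduced in this text. One form consistent with the symbols defined above, given here only as an assumption, is an N × N matrix whose entry in row i and column j is $\iota(p_i, p_j)$; the row and column orientation is likewise assumed:

$$
M =
\begin{pmatrix}
\iota(p_1, p_1) & \iota(p_1, p_2) & \cdots & \iota(p_1, p_N) \\
\iota(p_2, p_1) & \iota(p_2, p_2) & \cdots & \iota(p_2, p_N) \\
\vdots & \vdots & \ddots & \vdots \\
\iota(p_N, p_1) & \iota(p_N, p_2) & \cdots & \iota(p_N, p_N)
\end{pmatrix}
$$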
An authority parameter s, representing the relationship between the associated pages and other pages, is determined based on the target domain and the pages outside the target domain. Here s is a vector, i.e., an inlink matrix within the same domain. Specifically, for a given domain, the kth element of s is 1 if page k belongs to the domain, and 0 otherwise. Because different pages belong to different domains, each domain has its own s, and |s| denotes the number of 1s in s; the larger this number, the more pages the domain has.
The authority values of the associated pages in the target domain relative to other pages in the target domain are determined iteratively based on the association matrix, the authority parameter, and the damping coefficient q, as follows:
the concrete expression is as follows:
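The expression is not reproduced in this text. As an assumption only, an iteration of the topic-sensitive PageRank kind would be consistent with the quantities named above (an association matrix M, the domain vector s with |s| ones, and the damping coefficient q). Writing R for the vector of authority values and t for the iteration index, both of which are notation introduced here, such an update would read:

$$
R^{(t+1)} = q \, M \, R^{(t)} + (1 - q) \, \frac{s}{|s|}
$$

Under this reading, M is normalized so that each page distributes its authority over the pages it links to, and the term $(1 - q)\, s / |s|$ returns a fixed share of authority to the pages of the target domain only.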
In one embodiment of the application, the authority parameter is determined based on the pages contained in a domain, and the authority values of the pages in that domain are then determined based on the authority parameter, which improves the comprehensiveness and accuracy of the authority calculation for the pages.
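An iterative authority calculation of this kind can be sketched as follows. This is an illustration under assumptions rather than the method of the application: it uses a topic-sensitive PageRank style update, the damping value q = 0.85 and the iteration count are conventional defaults not taken from the text, authority held by pages without outgoing links is redistributed over the domain vector, and the function name `authority_values` is hypothetical.

```python
def authority_values(pages, edges, in_domain, q=0.85, iterations=50):
    """Iterate rank <- q * (link votes) + (1 - q) * s / |s|.
    `edges` are (source, target) links; `in_domain` is the set of pages
    whose entry in the domain vector s equals 1."""
    n = len(pages)
    index = {page: i for i, page in enumerate(pages)}
    out_degree = [0] * n
    for src, _ in edges:
        out_degree[index[src]] += 1
    s = [1.0 if page in in_domain else 0.0 for page in pages]
    teleport = [value / sum(s) for value in s]      # s / |s|
    rank = teleport[:]
    for _ in range(iterations):
        nxt = [(1.0 - q) * t for t in teleport]
        for src, dst in edges:                      # each link is a vote
            i = index[src]
            nxt[index[dst]] += q * rank[i] / out_degree[i]
        # Assumption: pages with no outgoing links hand their rank back to the domain.
        dangling = sum(rank[i] for i in range(n) if out_degree[i] == 0)
        rank = [value + q * dangling * t for value, t in zip(nxt, teleport)]
    return dict(zip(pages, rank))

pages = ["a", "b", "c", "d"]
edges = [("a", "b"), ("c", "b"), ("b", "a"), ("d", "b")]
scores = authority_values(pages, edges, in_domain={"a", "b", "c"})
```

In this toy graph page `b` receives the most links and ends with the highest value, while page `d`, which is outside the domain and has no incoming links, ends with a value of zero.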
In step S230, information of the page in the target domain is presented in the web page based on the authority value corresponding to each page.
In an embodiment of the application, after authority values corresponding to pages are obtained through calculation, information of the pages with association relation in a target field is presented in a webpage based on the authority values of the pages.
In an embodiment of the present application, as shown in fig. 5, the process of presenting the information of the page in the target domain in the web page based on the authority value corresponding to each page in step S230 includes steps S2301 to S2303:
in step S2301, a search term for the target domain is acquired.
In one embodiment of the application, after the authority values are calculated, the search term for the target domain input by the user is obtained. The search term in this embodiment may be a search keyword or the like corresponding to the target domain, and may also be an image, a screenshot, or the like.
Specifically, in this embodiment, after the search term input by the user is acquired, the target field corresponding to the search term is determined in the website based on the search term. Or directly prompting the user to input the search terms aiming at the target field under the environment corresponding to the target field.
In step S2302, a target page corresponding to the search term is searched for from the pages corresponding to the target field.
In one embodiment of the application, a target page corresponding to the search term is determined from the pages corresponding to the target domain. Specifically, the target page may be found by checking whether the text content of each page corresponding to the target domain includes the search term or a similar term; if so, that page is determined to be a target page.
In step S2303, a display order of the target page is determined based on the authority value corresponding to the target page, and information of the target page is presented in the web page based on the display order.
In one embodiment of the application, after determining a target page corresponding to a search term and authority values of the target page corresponding to the field, determining a display order of the target page based on the authority values of the target page, so as to present information of the target page in a webpage based on the display order. Specifically, the target page with the highest authority value may be used as the main page, and based on the authority value, information of other target pages is presented in the recommended part below the main page.
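Steps S2301 to S2303 can be sketched as a substring search followed by a sort on authority value. The example is a simplified illustration: matching is plain case-insensitive substring matching, similar terms are supplied by hand, the authority values are made up, and the name `rank_results` is hypothetical.

```python
def rank_results(term, page_texts, authority, similar_terms=()):
    """Find pages whose text contains the search term or a similar term,
    then order them from highest to lowest authority value."""
    needles = [term.lower()] + [t.lower() for t in similar_terms]
    hits = [
        page for page, text in page_texts.items()
        if any(needle in text.lower() for needle in needles)
    ]
    hits.sort(key=lambda page: authority.get(page, 0.0), reverse=True)
    return hits

page_texts = {
    "p1": "Kidney stones: symptoms and treatment",
    "p2": "Uremia and dialysis",
    "p3": "Renal calculi prevention",
}
authority = {"p1": 0.3, "p2": 0.5, "p3": 0.4}     # illustrative values only

results = rank_results("kidney stones", page_texts, authority,
                       similar_terms=("renal calculi",))
main_page, recommended = results[0], results[1:]
```

The first result plays the role of the main page, and the remaining results correspond to the recommended part shown below it.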
In one embodiment of the present application, the information of the rendered target page may include an illustration of the target page, a summary of the target page, a date of generation of the target page, and the like.
Fig. 6 shows a flowchart of a processing method of a medical network page according to an embodiment of the present application in the medical field, where the processing method of the medical network page may be performed by a server, which may be the server shown in fig. 1. Referring to fig. 6, the processing method of the medical web page at least includes steps S610 to S630, which are described in detail as follows:
in step S610, the medical pages are classified based on the articles in the medical page to be processed, so as to obtain the medical field corresponding to the medical page.
In an embodiment of the application, the medical field corresponding to the medical page is obtained by classifying the field of the medical page based on the article in the medical page to be processed in the medical website.
Fig. 7 is a schematic diagram of a medical field classification provided in an embodiment of the present application.
As shown in fig. 7, the medical field in the present embodiment may include a primary domain, a secondary domain, and so on. The primary domain may be a department classification 710, which may include medical domains such as internal medicine, surgery, oncology, neurology, infectious diseases, otorhinolaryngology, pediatrics, and the like. The secondary domain 710 may be a domain below the primary domain, such as nephrology, gastroenterology, and endocrinology under internal medicine. The tertiary domain may be a domain below the secondary domain, for example, kidney stones, kidney deficiency, and uremia under nephrology.
In the embodiment, different medical fields are classified into different grades, so that the medical fields can be more clearly classified to obtain more clear page recommendation.
For example, in this embodiment, different domain levels may be divided for pages in the website according to different project levels, and then pages in corresponding ranges are recommended based on target domains in the different domain levels.
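A multi-level domain classification of this kind can be represented as a nested mapping, with a lookup that returns a domain together with everything below it. This is a sketch only: the `TAXONOMY` contents are a small excerpt of the examples named in the text, and the function name `descendants` is hypothetical.

```python
TAXONOMY = {
    "internal medicine": {                       # primary domain
        "nephrology": {                          # secondary domain
            "kidney stones": {},                 # tertiary domains
            "uremia": {},
        },
        "gastroenterology": {},
    },
    "surgery": {},
}

def descendants(tree, name):
    """Return the named domain plus all domains below it, or [] if absent."""
    for key, children in tree.items():
        if key == name:
            below = [d for child in children for d in descendants(children, child)]
            return [key] + below
        found = descendants(children, name)
        if found:
            return found
    return []

scope = descendants(TAXONOMY, "nephrology")
```

Choosing a target domain at a higher level widens the set of pages considered for recommendation, and choosing a lower level narrows it.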
In step S620, based on the association relationship between the medical pages in the selected target medical field, authority values of the medical pages in the target medical field relative to other medical pages in the target medical field are determined.
In one embodiment of the application, authority values of the medical pages in the target medical field relative to other medical pages in the target medical field are determined based on the association relationships among the medical pages in the selected target medical field. For the authority value determination method, refer to the description of step S220 corresponding to fig. 2; details are not described herein again.
In step S630, information of the medical page in the medical field is presented in the medical web page based on the authority values corresponding to the respective pages.
Fig. 8 is a schematic diagram illustrating a presentation of information of a medical page according to an embodiment of the present application.
As shown in FIG. 8, for the home page 810 in the current medical website, during the display process, relevant recommendations appear at the bottom of the page, including the associated pages in the same domain and associated with the home page. Each associated page has a different authority value, and in this embodiment, the summary information of the associated pages may be displayed in an order from high to low based on the authority values, such as 820, 830, and 840 in fig. 8.
In the embodiment, the authority values of the associated pages are determined in the same medical field, so that when the content of one main page is displayed, the corresponding associated page can be determined based on the content of the main page, and the display mode of the information of the associated page can be determined based on the authority values of the associated pages, thereby improving the efficiency of page pushing.
The following describes an embodiment of an apparatus of the present application, which may be used to execute a method for processing a web page in the foregoing embodiment of the present application. For details that are not disclosed in the embodiments of the apparatus of the present application, please refer to the embodiments of the method for processing web pages described above in the present application.
FIG. 9 shows a block diagram of a processing device of a web page according to an embodiment of the application.
Referring to fig. 9, a device 900 for processing a web page according to an embodiment of the present application includes: a classifying unit 910, configured to perform domain classification on the page to be processed based on content in the page to be processed, so as to obtain at least one domain; a rating unit 920, configured to determine authority values of each page in the target domain relative to other pages in the target domain based on an association relationship between pages in the target domain; a presenting unit 930, configured to present, in the web page, information of the page in the target field based on the authority value corresponding to each page.
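The division into a classification unit, a rating unit, and a presenting unit can be sketched as a small class that composes three responsibilities. This is an illustration only: the class name `PageProcessor`, its fields, and the toy classifier and authority table are hypothetical, and a real system would plug in a trained model and a computed authority table.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class PageProcessor:
    classify: Callable[[str], str]                   # classification unit: text -> domain
    rate: Callable[[List[str]], Dict[str, float]]    # rating unit: pages -> authority values

    def present(self, texts: Dict[str, str], target: str) -> List[str]:
        """Presenting unit: pages of the target domain, highest authority first."""
        in_domain = [page for page, text in texts.items()
                     if self.classify(text) == target]
        scores = self.rate(in_domain)
        return sorted(in_domain, key=lambda page: scores[page], reverse=True)

AUTHORITY = {"p1": 0.2, "p2": 0.7, "p3": 0.5}        # made-up values
processor = PageProcessor(
    classify=lambda text: "nephrology" if "kidney" in text else "other",
    rate=lambda pages: {page: AUTHORITY[page] for page in pages},
)
texts = {
    "p1": "kidney stones overview",
    "p2": "kidney dialysis guide",
    "p3": "flu season tips",
}
ordered = processor.present(texts, "nephrology")
```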
In some embodiments of the present application, based on the foregoing solution, the processing apparatus 900 for a web page further includes: the first acquisition unit is used for acquiring the website navigation information; the second acquisition unit is used for acquiring the page in the website based on the website structure and the seed page in the website navigation information; and the relationship determining unit is used for determining the association relationship between the pages based on the link relationship between the pages.
In some embodiments of the present application, based on the foregoing scheme, the second obtaining unit is configured to: and crawling the information in the website based on the website structure and the seed page in the website navigation information to obtain the page in the website.
In some embodiments of the present application, based on the foregoing scheme, the classification unit 910 includes: the extraction unit is used for extracting text contents in the page to be processed; and the input unit is used for inputting the text content into the trained page classification model to obtain the field corresponding to the page to be processed and output by the page classification model.
In some embodiments of the present application, based on the foregoing scheme, the training method of the page classification model includes: acquiring text content of a page sample and a corresponding field tag thereof; extracting a vocabulary sample from the text content; inputting the vocabulary sample into a page classification network to obtain a classification result output by the page classification network; and adjusting parameters in the page classification network based on the classification result and the loss function obtained by the domain label to obtain a page classification model.
In some embodiments of the present application, based on the foregoing scheme, the rating unit 920 includes: the related page determining unit is used for determining related pages in the target field based on the related relation among the pages in the selected target field; and the authority value determining unit is used for determining the authority values of the associated pages in the target field relative to other pages in the target field based on the calling relationship among the associated pages, wherein the calling relationship is positively correlated with the authority values.
In some embodiments of the present application, based on the foregoing scheme, the authority value determination unit is configured to: determine an association matrix based on a calling relation between the associated pages; determine an authority parameter representing the relation between the associated page and other pages based on the target field and other pages except the pages in the target field; and determine the authority value of the associated page in the target field relative to other pages in the target field based on the association matrix, the authority parameter, and the damping coefficient.
In some embodiments of the present application, based on the foregoing scheme, the presenting unit 930 includes: a third acquisition unit configured to acquire a search term for the target domain; the target page determining unit is used for searching a target page corresponding to the search vocabulary entry from the page corresponding to the target field; and the page presenting unit is used for determining the display sequence of the target page based on the authority value corresponding to the target page and presenting the information of the target page in the webpage based on the display sequence.
Fig. 10 is a block diagram of a processing apparatus of a medical network page according to an embodiment of the present application. The implementation of the processing apparatus in the medical field corresponds to the embodiment of fig. 6 and is not described herein again.
Referring to fig. 10, a processing apparatus 1000 for a medical network page according to an embodiment of the present application includes: the medical classification unit 1010 is used for classifying the medical pages based on the articles in the medical pages to be processed to obtain the medical fields corresponding to the medical pages; a medical rating unit 1020, configured to determine authority values of the medical pages in the target medical field relative to other medical pages in the target medical field based on the association relationship between the medical pages in the selected target medical field; and a medical presenting unit 1030, configured to present information of medical pages in the medical field in the medical web page based on the authority values corresponding to the respective pages.
FIG. 11 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
It should be noted that the computer system 1100 of the electronic device shown in fig. 11 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 11, a computer system 1100 includes a Central Processing Unit (CPU)1101, which can perform various appropriate actions and processes, such as performing the methods described in the above embodiments, according to a program stored in a Read-Only Memory (ROM) 1102 or a program loaded from a storage section 1108 into a Random Access Memory (RAM) 1103. In the RAM 1103, various programs and data necessary for system operation are also stored. The CPU 1101, ROM 1102, and RAM 1103 are connected to each other by a bus 1104. An Input/Output (I/O) interface 1105 is also connected to bus 1104.
The following components are connected to the I/O interface 1105: an input portion 1106 including a keyboard, mouse, and the like; an output section 1107 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, a speaker, and the like; a storage section 1108 including a hard disk and the like; and a communication section 1109 including a network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 1109 performs communication processing via a network such as the internet. A drive 1110 is also connected to the I/O interface 1105 as necessary. A removable medium 1111, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is mounted on the drive 1110 as necessary, so that a computer program read out therefrom is installed into the storage section 1108 as necessary.
In particular, according to embodiments of the application, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 1109 and/or installed from the removable medium 1111. When the computer program is executed by the Central Processing Unit (CPU) 1101, various functions defined in the system of the present application are executed.
It should be noted that the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash Memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with a computer program embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. The computer program embodied on the computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software, or may be implemented by hardware, and the described units may also be disposed in a processor. In some cases, the names of these units do not constitute a limitation on the units themselves.
According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions, the computer instructions being stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided in the various alternative implementations described above.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method described in the above embodiments.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the application. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units. Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which can be a personal computer, a server, a touch terminal, or a network device, etc.) to execute the method according to the embodiments of the present application.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.
Claims (10)
1. A method for processing a web page, comprising:
performing domain classification on the page to be processed based on the content in the page to be processed to obtain at least one domain;
determining authority values of all pages in a target field relative to other pages in the target field based on the association relationship among the pages in the target field;
and presenting the information of the pages in the target field in the webpage based on the authority values corresponding to the pages.
2. The method according to claim 1, wherein before determining authority values of the pages in the target domain relative to other pages in the target domain based on the association relationship between the pages in the target domain, the method further comprises:
acquiring website navigation information;
acquiring a page in the website based on a website structure and a seed page in the website navigation information;
and determining the association relation among the pages based on the link relation among the pages.
3. The method of claim 2, wherein obtaining the page in the website based on the website structure and the seed page in the website navigation information comprises:
and crawling the information in the website based on the website structure in the website navigation information and the seed page to obtain the page in the website.
4. The method according to claim 1, wherein performing domain classification on the page to be processed based on the content in the page to be processed to obtain at least one domain comprises:
extracting text content in the page to be processed;
and inputting the text content into a trained page classification model to obtain a field corresponding to the page to be processed and output by the page classification model.
5. The method of claim 4, wherein the method of training the page classification model comprises:
acquiring text content of a page sample and a corresponding field tag thereof;
extracting a vocabulary sample from the text content;
inputting the vocabulary sample into a page classification network to obtain a classification result output by the page classification network;
and adjusting parameters in the page classification network based on the classification result and the loss function obtained by the domain label to obtain the page classification model.
6. The method of claim 1, wherein determining authority values of each page in a target domain relative to other pages in the target domain based on the association between pages in the target domain comprises:
determining an associated page in the target field based on the association relationship between the pages in the selected target field;
and determining authority values of the associated pages in the target field relative to other pages in the target field based on the calling relation among the associated pages, wherein the calling relation is positively correlated with the authority values.
7. The method of claim 6, wherein determining authority values of the associated pages in the target domain relative to other pages in the target domain based on call relations between the associated pages comprises:
determining an association matrix based on the calling relation among the associated pages;
determining an authority parameter representing the relationship between the associated page and other pages in the target field based on the target field and the other pages except the page in the target field;
and determining the authority value of the associated page in the target field relative to other pages in the target field based on the association matrix, the authority parameter, and the damping coefficient.
8. The method according to claim 1, wherein presenting information of pages in the target field in a web page based on the authority values corresponding to the respective pages comprises:
acquiring a search entry for the target field;
searching the pages corresponding to the target field for a target page corresponding to the search entry;
and determining the display order of the target page based on the authority value corresponding to the target page, and presenting the information of the target page in the web page based on the display order.
9. The method of claim 1, further comprising:
classifying medical pages to be processed based on the articles in those medical pages, to obtain the medical field corresponding to each medical page;
determining authority values of the medical pages in a selected target medical field relative to other medical pages in the target medical field based on the association relationship among the medical pages in the target medical field;
and presenting information of the medical pages in the target medical field in a medical web page based on the authority values corresponding to the medical pages.
10. An apparatus for processing web pages, comprising:
a classification unit, configured to classify a page to be processed by field based on the content of the page to be processed, to obtain at least one field;
a value determination unit, configured to determine authority values of the pages in a target field relative to other pages in the target field based on the association relationship among the pages in the target field;
and a presentation unit, configured to present information of the pages in the target field in a web page based on the authority values corresponding to the pages.
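The authority computation of claims 6 and 7 and the ordering step of claim 8 can be illustrated with a minimal Python sketch. It assumes a standard PageRank-style power iteration as a stand-in for the combination of association matrix, authority parameter and damping coefficient; the claims do not state the exact formula, so the update rule, the default damping value of 0.85, the substring match used for the search entry, and the names `authority_values` and `order_results` are all hypothetical.

```python
from typing import Dict, List


def authority_values(calls: Dict[str, List[str]],
                     damping: float = 0.85,
                     iterations: int = 100) -> Dict[str, float]:
    """PageRank-style authority values for the pages of one target field.

    calls[p] lists the pages that page p calls (links to). Only pages that
    appear as keys are treated as belonging to the target field.
    Assumption: this update rule is the classic PageRank iteration, not a
    formula taken from the patent text.
    """
    pages = sorted(calls)
    n = len(pages)
    if n == 0:
        return {}
    # Keep only calls between pages of the field; drop self-calls and duplicates.
    out = {p: sorted({q for q in calls[p] if q in calls and q != p})
           for p in pages}
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Pages that call nothing spread their value evenly over the field.
        dangling = sum(rank[p] for p in pages if not out[p])
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n
                    for p in pages}
        for p in pages:
            for q in out[p]:
                # Each call passes on an equal share of the caller's value,
                # so more incoming calls mean a higher authority value.
                new_rank[q] += damping * rank[p] / len(out[p])
        rank = new_rank
    return rank


def order_results(search_entry: str,
                  page_text: Dict[str, str],
                  authority: Dict[str, float]) -> List[str]:
    """Return pages whose text contains the search entry, highest authority first."""
    needle = search_entry.lower()
    matches = [p for p, text in page_text.items() if needle in text.lower()]
    return sorted(matches, key=lambda p: authority.get(p, 0.0), reverse=True)
```

In this sketch a page that is called by many other pages of the same field ends up with a larger value, which matches the positive correlation stated in claim 6. A real system would replace the substring match with the search back end actually in use.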
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010789735.2A CN111914201B (en) | 2020-08-07 | 2020-08-07 | Processing method and device of network page |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111914201A (en) | 2020-11-10 |
CN111914201B (en) | 2023-11-07 |
Family
ID=73283233
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010789735.2A (Active, CN111914201B) | 2020-08-07 | 2020-08-07 | Processing method and device of network page |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111914201B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112416212A (en) * | 2020-11-25 | 2021-02-26 | 维沃移动通信有限公司 | Program access method, device, electronic equipment and readable storage medium |
Citations (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101132446A (en) * | 2006-08-23 | 2008-02-27 | 上海万纬信息技术有限公司 | Web page intelligent snapping system and method thereof |
US20080275833A1 (en) * | 2007-05-04 | 2008-11-06 | Microsoft Corporation | Link spam detection using smooth classification function |
CN101751438A (en) * | 2008-12-17 | 2010-06-23 | 中国科学院自动化研究所 | Theme webpage filter system for driving self-adaption semantics |
CN101903878A (en) * | 2007-10-11 | 2010-12-01 | 谷歌公司 | Methods and systems for classifying search results to determine page elements |
CN102567409A (en) * | 2010-12-31 | 2012-07-11 | 珠海博睿科技有限公司 | Method and device for providing retrieval associated word |
CN102859516A (en) * | 2009-04-08 | 2013-01-02 | 谷歌公司 | Generating improved document classification data using historical search results |
CN102890717A (en) * | 2012-09-29 | 2013-01-23 | 北京奇虎科技有限公司 | System and method for building webpage category knowledge base |
CN102902793A (en) * | 2012-09-29 | 2013-01-30 | 北京奇虎科技有限公司 | Creation system and method of webpage classification knowledge base |
CN102902790A (en) * | 2012-09-29 | 2013-01-30 | 北京奇虎科技有限公司 | Web page classification system and method |
CN102959545A (en) * | 2010-06-29 | 2013-03-06 | 微软公司 | Navigation to popular search results |
US20150095300A1 (en) * | 2010-06-20 | 2015-04-02 | Remeztech Ltd. | System and method for mark-up language document rank analysis |
CN104504070A (en) * | 2014-12-22 | 2015-04-08 | 北京奇虎科技有限公司 | Search method and device |
US20150302076A1 (en) * | 2014-04-17 | 2015-10-22 | Samsung Electronics Co., Ltd. | Method of storing and expressing web page in an electronic device |
CN106649823A (en) * | 2016-12-29 | 2017-05-10 | 淮海工学院 | Webpage classification recognition method based on comprehensive subject term vertical search and focused crawler |
CN106776710A (en) * | 2016-11-18 | 2017-05-31 | 广东技术师范学院 | A kind of picture and text construction of knowledge base method based on vertical search engine |
CN106874340A (en) * | 2016-12-22 | 2017-06-20 | 新华三技术有限公司 | A kind of web page address sorting technique and device |
CN107153498A (en) * | 2016-03-30 | 2017-09-12 | 阿里巴巴集团控股有限公司 | A kind of page processing method, device and intelligent terminal |
CN108694197A (en) * | 2017-04-10 | 2018-10-23 | 富士通株式会社 | Hypertext grasping means and device |
CN110209906A (en) * | 2018-02-07 | 2019-09-06 | 北京京东尚科信息技术有限公司 | Method and apparatus for extracting webpage information |
US20210377628A1 (en) * | 2018-08-31 | 2021-12-02 | Beijing Bytedance Network Technology Co., Ltd. | Method and apparatus for outputting information |
Non-Patent Citations (2)
Title |
---|
SHINE N. DAS et al.: "An Efficient Approach for Finding Near Duplicate Web pages using Minimum Weight Overlapping Method", International Journal of Electrical and Computer Engineering (IJECE), pages 187-194 * |
HE Li et al.: "Hierarchical text classification based on unlabeled Web data", CAAI Transactions on Intelligent Systems (《智能系统学报》), pages 330-335 * |
Also Published As
Publication number | Publication date |
---|---|
CN111914201B (en) | 2023-11-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10963794B2 (en) | Concept analysis operations utilizing accelerators | |
US11151177B2 (en) | Search method and apparatus based on artificial intelligence | |
CN107491534B (en) | Information processing method and device | |
CN111737476B (en) | Text processing method and device, computer readable storage medium and electronic equipment | |
CN110851713B (en) | Information processing method, recommending method and related equipment | |
CN107463704B (en) | Search method and device based on artificial intelligence | |
CN113535984B (en) | Knowledge graph relation prediction method and device based on attention mechanism | |
CN111046275B (en) | User label determining method and device based on artificial intelligence and storage medium | |
US9535980B2 (en) | NLP duration and duration range comparison methodology using similarity weighting | |
CN106776503A (en) | The determination method and device of text semantic similarity | |
CN113011172B (en) | Text processing method, device, computer equipment and storage medium | |
CN113392209A (en) | Text clustering method based on artificial intelligence, related equipment and storage medium | |
CN111813905A (en) | Corpus generation method and device, computer equipment and storage medium | |
CN113761190A (en) | Text recognition method and device, computer readable medium and electronic equipment | |
CN112464042A (en) | Task label generation method according to relation graph convolution network and related device | |
Azzam et al. | A question routing technique using deep neural network for communities of question answering | |
CN111914201B (en) | Processing method and device of network page | |
JP2023517518A (en) | Vector embedding model for relational tables with null or equivalent values | |
CN116628162A (en) | Semantic question-answering method, device, equipment and storage medium | |
WO2021223165A1 (en) | Systems and methods for object evaluation | |
CN109885647B (en) | User history verification method, device, electronic equipment and storage medium | |
CN113705692A (en) | Emotion classification method and device based on artificial intelligence, electronic equipment and medium | |
CN113656586B (en) | Emotion classification method, emotion classification device, electronic equipment and readable storage medium | |
CN114357163A (en) | Text type identification method and device, computer readable medium and electronic equipment | |
CN112528183B (en) | Webpage component layout method and device based on big data, electronic equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||