CN111914201B - Processing method and device of network page - Google Patents

Processing method and device of network page

Publication number
CN111914201B
Authority
CN
China
Prior art keywords
pages
page
target
field
domain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010789735.2A
Other languages
Chinese (zh)
Other versions
CN111914201A (en
Inventor
康战辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010789735.2A priority Critical patent/CN111914201B/en
Publication of CN111914201A publication Critical patent/CN111914201A/en
Application granted granted Critical
Publication of CN111914201B publication Critical patent/CN111914201B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/958Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiments of the application provide a method and a device for processing a network page. The method comprises the following steps: performing domain classification on pages to be processed based on the content of those pages to obtain at least one domain; determining an authority value of each page in a target domain relative to the other pages in the target domain based on the association relations between the pages in the target domain; and presenting information of the pages in the target domain in a web page based on the authority value corresponding to each page. Because authority values are calculated among the pages of a single domain, the pages displayed together are associated pages from the same domain, which improves the logic and hierarchy of network page pushing and improves the content pushing effect on the user side.

Description

Processing method and device of network page
Technical Field
The application relates to the technical field of computers and communication, in particular to a method and a device for processing a network page.
Background
In many websites, the content is pushed by recommending information of some related webpages in one page, so that the aim of information popularization is fulfilled. In the in-site page pushing process of many websites, the in-site search engine is generally used for indexing to directly push the content so as to display some related content on the user terminal. However, because of various sources, types and the like of the push content, the content pushed by the push mode is often messy, has no logic and is different in level, so that the content pushing effect on the user terminal is poor.
Disclosure of Invention
The embodiment of the application provides a processing method and a processing device for a network page, which can further improve the logic and layering of the network page pushing and improve the content pushing effect on a user side at least to a certain extent.
Other features and advantages of the application will be apparent from the following detailed description, or may be learned by the practice of the application.
According to an aspect of an embodiment of the present application, there is provided a method for processing a web page, including: performing domain classification on the page to be processed based on the content in the page to be processed to obtain at least one domain; determining authority values of all pages in the target field relative to other pages in the target field based on association relations among the pages in the target field; and based on the authority values corresponding to the pages, presenting the information of the pages in the target field in the webpage.
According to an aspect of an embodiment of the present application, there is provided a processing apparatus for a web page, including: the classifying unit is used for classifying the fields of the pages to be processed based on the content in the pages to be processed to obtain at least one field; the value setting unit is used for determining authority values of all pages in the target field relative to other pages in the target field based on the association relation between the pages in the target field; and the presentation unit is used for presenting the information of the pages in the target field in the webpage based on the authority values corresponding to the pages.
In some embodiments of the present application, based on the foregoing solution, the processing apparatus for a web page further includes: the first acquisition unit is used for acquiring website navigation information; the second acquisition unit is used for acquiring pages in the website based on the website structure and the seed pages in the website navigation information; and the relation determining unit is used for determining the association relation among the pages based on the link relation among the pages.
In some embodiments of the present application, based on the foregoing solution, the second obtaining unit is configured to: and crawling information in the website based on the website structure and the seed page in the website navigation information to acquire pages in the website.
In some embodiments of the application, based on the foregoing, the classification unit includes: the extraction unit is used for extracting text content in the page to be processed; the input unit is used for inputting the text content into the trained page classification model to obtain the field corresponding to the page to be processed output by the page classification model.
In some embodiments of the present application, the training method of the page classification model based on the foregoing scheme includes: acquiring text content of a page sample and a corresponding field label thereof; extracting a vocabulary sample from the text content; inputting the vocabulary sample into a page classification network to obtain a classification result output by the page classification network; and adjusting parameters in the page classification network based on the classification result and the loss function obtained by the domain label to obtain the page classification model.
In some embodiments of the present application, based on the foregoing scheme, the value setting unit includes: an associated page determining unit, configured to determine associated pages in a target domain based on an association relationship between pages in the selected target domain; and an authority value determining unit, configured to determine authority values of the associated pages in the target field relative to other pages in the target field based on calling relations among the associated pages, wherein the calling relations and the authority values are positively correlated.
In some embodiments of the present application, based on the foregoing, the authority value determining unit is configured to: determining an association matrix based on the calling relation between the association pages; determining authority parameters representing the relationship between the associated page and other pages based on the target domain and the other pages except the page in the target domain; and determining authority values of the associated pages in the target field relative to other pages in the target field based on the association matrix, the authority parameters and the damping coefficients.
In some embodiments of the application, based on the foregoing, the presentation unit includes: a third obtaining unit, configured to obtain a search term for the target field; the target page determining unit is used for searching a target page corresponding to the search term from pages corresponding to the target field; and the page presentation unit is used for determining the display sequence of the target page based on the authority value corresponding to the target page and presenting the information of the target page in the webpage based on the display sequence.
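The presentation flow just described (obtain a search term, find matching pages in the target field, order them by authority value) can be sketched as follows. This is a minimal illustration only: the `Page` record, the `present_pages` function, the title-substring matching rule, and the sample data are all hypothetical and are not specified by the application.

```python
from dataclasses import dataclass


@dataclass
class Page:
    # Hypothetical page record; field names are assumptions for illustration.
    url: str
    title: str
    domain: str
    authority: float  # authority value computed within the page's domain


def present_pages(pages, target_domain, search_term):
    """Return titles of matching pages in the target domain, highest authority first."""
    # Keep only pages in the target domain whose title contains the search term
    # (substring matching is an assumed, simplified notion of "corresponding").
    hits = [
        p for p in pages
        if p.domain == target_domain and search_term.lower() in p.title.lower()
    ]
    # Display order: descending authority value.
    hits.sort(key=lambda p: p.authority, reverse=True)
    return [p.title for p in hits]


pages = [
    Page("/a", "Oral ulcer causes", "stomatology", 0.20),
    Page("/b", "Oral ulcer treatment", "stomatology", 0.50),
    Page("/c", "Oral ulcer in pets", "veterinary", 0.90),
    Page("/d", "Tooth decay", "stomatology", 0.30),
]
result = present_pages(pages, "stomatology", "oral ulcer")
```

In this sketch the veterinary page is excluded even though its authority value is the highest, because authority values are only compared among pages of the same field.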
In some embodiments of the present application, based on the foregoing solution, the processing apparatus for a web page further includes: the medical treatment classification unit is used for classifying the medical treatment pages based on articles in the medical treatment pages to be processed to obtain medical treatment fields corresponding to the medical treatment pages; the medical treatment value setting unit is used for determining authority values of all medical treatment pages in the target medical treatment field relative to other medical treatment pages in the target medical treatment field based on the association relation among all medical treatment pages in the selected target medical treatment field; and the medical treatment presentation unit is used for presenting the information of the medical treatment page in the medical treatment field in a medical treatment webpage based on the authority values corresponding to the pages.
According to an aspect of the embodiments of the present application, there is provided a computer readable medium having stored thereon a computer program which, when executed by a processor, implements a method of processing a web page as described in the above embodiments.
According to an aspect of an embodiment of the present application, there is provided an electronic apparatus including: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for processing a web page as described in the above embodiments.
According to an aspect of embodiments of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the processing method of the network page provided in the above-mentioned various alternative implementations.
In the technical schemes provided by some embodiments of the present application, each page to be processed in a website is classified to obtain the pages corresponding to each field, so that the pages in each field can be processed in a targeted manner. The authority value of each page in a target field relative to the other pages in that field is determined according to the association relation among the pages in the field. Finally, the information of the associated pages in the target field is presented in the webpage based on the authority value corresponding to each page. Because the authority values are calculated among the pages of a single field, the pages displayed together are associated pages from that field, which improves the logic and hierarchy of network page pushing and improves the content pushing effect on the user side.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application. It is evident that the drawings in the following description are only some embodiments of the present application and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art. In the drawings:
FIG. 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of an embodiment of the application may be applied;
FIG. 2 schematically illustrates a flow diagram of a method of processing a web page according to one embodiment of the application;
FIG. 3 schematically illustrates a schematic diagram of semantic drift according to one embodiment of the present application;
FIG. 4 schematically illustrates a schematic diagram of a training page classification model according to an embodiment of the application;
FIG. 5 schematically illustrates a flow diagram for presenting information of pages in the target domain in a web page according to one embodiment of the application;
FIG. 6 schematically illustrates a flow chart of a method of processing a medical network page according to one embodiment of the application;
FIG. 7 schematically illustrates a schematic diagram of medical domain classification according to an embodiment of the application;
FIG. 8 schematically illustrates a schematic diagram of presenting information of a medical page according to one embodiment of the application;
FIG. 9 schematically illustrates a block diagram of a processing device of a web page according to one embodiment of the application;
FIG. 10 schematically illustrates a block diagram of a processing device of a medical network page according to one embodiment of the application;
FIG. 11 shows a schematic diagram of a computer system suitable for use in implementing an embodiment of the application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the application may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the application.
The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
Artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use the knowledge to obtain optimal results. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence also studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
Artificial intelligence technology is a comprehensive subject that covers a wide range of fields, including both hardware-level and software-level technologies. Artificial intelligence infrastructure technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Natural language processing (Natural Language Processing, NLP) is an important direction in the fields of computer science and artificial intelligence. It studies the theories and methods that enable effective communication between people and computers in natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics. Research in this field involves natural language, that is, the language people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering by robots, knowledge graph techniques, and the like.
Machine Learning (ML) is an interdisciplinary subject spanning multiple fields, involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specifically studies how a computer simulates or implements human learning behavior to acquire new knowledge or skills, and how it reorganizes existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied throughout the various areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and the like.
With research and progress of artificial intelligence technology, research and application of artificial intelligence technology are developed in various fields, such as common virtual assistants, intelligent sound boxes, intelligent marketing, robots, intelligent medical treatment, intelligent customer service and the like, and it is believed that with the development of technology, the artificial intelligence technology will be applied in more fields and become more and more important.
The scheme provided by the embodiments of the application relates to technologies such as natural language processing and machine learning in artificial intelligence, and is described in detail through the following embodiments. Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of an embodiment of the present application may be applied.
As shown in fig. 1, the system architecture may include a terminal device (such as one or more of the smartphone 101, tablet 102, and portable computer 103 shown in fig. 1, but of course, a desktop computer, etc.), a network 104, and a server 105. The network 104 is the medium used to provide communication links between the terminal devices and the server 105. The network 104 may include various connection types, such as wired communication links, wireless communication links, and the like.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, the server 105 may be a server cluster formed by a plurality of servers.
A user may interact with the server 105 via the network 104 using a terminal device to receive or send messages or the like. The server 105 may be a server providing various services. For example, the server 105 classifies the to-be-processed page according to the content in the to-be-processed page to obtain at least one domain, then determines authority values of each page in the target domain relative to other pages in the target domain according to the association relation between the pages in the target domain, and finally presents the information of the pages in the target domain in the webpage according to the authority values corresponding to each page.
According to the scheme in this embodiment, the pages to be processed in the website are classified to obtain the pages corresponding to each field, so that the pages in each field can be processed in a targeted manner. The authority value of each page in a target field relative to the other pages in that field is determined according to the association relation among the pages in the field. Finally, the information of the associated pages in the target field is presented in the webpage based on the authority value corresponding to each page. Because the authority values are calculated among the pages of a single field, the pages displayed together are associated pages from that field, which improves the logic and hierarchy of network page pushing and improves the content pushing effect on the user side.
It should be noted that, the processing method of the web page provided in the embodiment of the present application is generally executed by the server 105, and accordingly, the processing device of the web page is generally disposed in the server 105. However, in other embodiments of the present application, the terminal device may also have a similar function to the server, so as to execute the processing method of the network page provided by the embodiment of the present application.
The implementation details of the technical scheme of the embodiment of the application are described in detail below:
Fig. 2 shows a flowchart of a processing method of a web page according to an embodiment of the present application, which may be performed by a server, which may be the server shown in fig. 1. Referring to fig. 2, the processing method of the web page at least includes steps S210 to S230, and is described in detail as follows:
In step S210, the domain classification is performed on the page to be processed based on the content in the page to be processed, so as to obtain at least one domain.
Fig. 3 is a schematic diagram of semantic drift according to an embodiment of the present application.
As shown in fig. 3, in one embodiment of the present application, because different websites define their related-recommendation policies differently, the related recommendation links given by the system may still drift semantically from the original web page. For example, the text area 310 of the web page in fig. 3 contains the title "the reason that blisters often grow in the mouth" and the corresponding body text, while the related questions in the recommended area 320 of the page include "will face shaping have side effects". The two have no association with each other, and this situation causes serious drift and bias in content recommendation. To avoid this, in this embodiment, domain classification is performed on the pages to be processed based on their content, and the pages included in each domain are obtained.
In one embodiment of the application, in the process of classifying the field of the page to be processed, classification can be performed based on text content and images in the page to be processed. For example, the similarity between images in each page to be processed is identified, pages to be processed belonging to the same domain are determined based on the similarity, and the name corresponding to the domain is determined based on the image content or the text content.
In one embodiment of the application, the pages to be processed may include various pages in a website, and the content in the pages may or may not be related. Meanwhile, the pages may also include the next page under one page, and so on.
In one embodiment of the application, the field may be used to represent different page types, the scope to which the page belongs, and so on. The fields in this embodiment may be classified multiple times to obtain the fields of different levels, primary fields, secondary fields, and so on, corresponding to one directory.
In one embodiment of the present application, the process of classifying the domain of the page to be processed based on the content in the page to be processed in step S210 to obtain at least one domain includes the following steps: extracting text content in a page to be processed; inputting the text content into the page classification model obtained through training, and obtaining the field corresponding to the page to be processed output by the page classification model.
In one embodiment of the application, the field classification of the pages to be processed may be performed on the text content of those pages. The field corresponding to each page to be processed is obtained by identifying the similarity or the association between the text contents of the pages to be processed. Alternatively, the text content of a page to be processed can be input into the trained page classification model to obtain the field of that page as output by the model.
Specifically, in one embodiment of the present application, a training method for a page classification model includes: acquiring text content of a page sample and a corresponding field label thereof; extracting vocabulary samples from text content; inputting the vocabulary sample into a page classification network to obtain a classification result output by the page classification network; and adjusting parameters in the page classification network based on the classification result and the loss function obtained by the domain label to obtain a page classification model.
Fig. 4 is a schematic diagram of a training page classification model according to an embodiment of the present application.
As shown in fig. 4, in one embodiment of the application, the page classification network may be constructed based on a text convolutional neural network (Text Convolutional Neural Network, TextCNN). First, the text content of each page sample is labeled to determine its corresponding field label. A sequence of length n is then input, and words 0 to n-1 are extracted from it. At the input layer 410, each word is fed into the page classification network to obtain a word vector of dimension K. The K-dimensional word vectors are input to the convolution layer 420 for convolution; the convolution kernels may be, for example, of sizes 2×1, 3×1 and 4×1, with 1024 layers. At the pooling layer 430, the data output by the convolution layer 420 is pooled to obtain 1024 layers of pooled data. The pooled data is then combined at a fully connected layer to map the page sample onto the classification system, and the corresponding classification label is finally obtained based on the classification system.
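The layer sequence described for fig. 4 (word vectors, convolutions over windows of 2, 3 and 4 words, pooling, fully connected layer) can be illustrated with a small forward pass in plain Python. This is a sketch under stated assumptions, not the application's implementation: the ReLU activation, max pooling, softmax output, random weights, and the toy sizes `K`, `N_FILTERS` and `N_CLASSES` are all assumptions chosen for readability (the text mentions 1024 layers, which is reduced here to 4 filters per kernel size).

```python
import math
import random


def conv_maxpool(word_vectors, filters):
    """Slide each filter (h x K weights) over the word sequence, apply ReLU, max-pool."""
    pooled = []
    for weights in filters:
        h = len(weights)  # kernel height: number of consecutive words covered
        best = 0.0  # starting at 0 makes max() act as ReLU followed by max pooling
        for start in range(len(word_vectors) - h + 1):
            window = word_vectors[start:start + h]
            activation = sum(
                w * x
                for w_row, x_row in zip(weights, window)
                for w, x in zip(w_row, x_row)
            )
            best = max(best, activation)
        pooled.append(best)
    return pooled


def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def text_cnn_forward(word_vectors, conv_banks, fc_weights):
    """Concatenate pooled features of all kernel sizes, then apply the dense layer."""
    features = []
    for filters in conv_banks:
        features.extend(conv_maxpool(word_vectors, filters))
    scores = [sum(w * f for w, f in zip(row, features)) for row in fc_weights]
    return softmax(scores)


rng = random.Random(0)
K, N_FILTERS, N_CLASSES = 8, 4, 3  # toy sizes (assumptions)


def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# One bank of filters per kernel height 2, 3 and 4, as in the description.
conv_banks = [[rand_matrix(h, K) for _ in range(N_FILTERS)] for h in (2, 3, 4)]
fc_weights = rand_matrix(N_CLASSES, 3 * N_FILTERS)
title_vectors = rand_matrix(6, K)  # a 6-word title, one K-dimensional vector per word
probs = text_cnn_forward(title_vectors, conv_banks, fc_weights)
```

The output `probs` holds one probability per class in the classification system; the predicted classification label would be the class with the largest probability.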
Further, after the classification label corresponding to the page sample is obtained, it is compared with the preset field label, and a loss is computed from the comparison result through the loss function, so that the parameters in the page classification network can be adjusted to obtain the page classification model.
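The parameter adjustment described above can be sketched with one common choice of loss. The application does not name a specific loss function or optimizer; the cross-entropy loss, the plain gradient-descent update, the learning rate, and the single linear output layer used below are assumptions made to keep the example short.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def loss_and_update(weights, features, label, lr=0.1):
    """One gradient step on a linear output layer; returns the loss before the step."""
    scores = [sum(w * f for w, f in zip(row, features)) for row in weights]
    probs = softmax(scores)
    loss = -math.log(probs[label])  # cross-entropy against the field label (assumed loss)
    for c, row in enumerate(weights):
        # Gradient of the loss with respect to score c: probs[c] minus the one-hot label.
        grad = probs[c] - (1.0 if c == label else 0.0)
        for j, f in enumerate(features):
            row[j] -= lr * grad * f
    return loss


weights = [[0.0, 0.0, 0.0] for _ in range(2)]  # 2 fields, 3 pooled features (toy sizes)
features = [1.0, 0.5, -0.2]  # hypothetical pooled features of one page sample
losses = [loss_and_update(weights, features, label=1) for _ in range(20)]
```

With all weights starting at zero, both fields are equally likely at first, so the first loss equals ln 2; each update then moves the prediction toward the labeled field and the loss shrinks.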
For example, as shown in fig. 4, in the application scenario of a medical website, samples are labeled automatically by acquiring the medical science popularization articles of the medical website, and a corresponding TextCNN-based classification model is trained. First, a word vector model for the medical field is trained on ten million medical information articles collected in advance, and it is then used to produce vector representations of information titles in the subsequent training and prediction stages. The words 0 to n-1 in the leftmost data sequence in fig. 4 are the K-dimensional word vectors corresponding to the segmented words of a medical information title. The classification system on the rightmost side is the disease classification used in the medical website.
In one embodiment of the present application, before the process of determining authority values of each page in the target domain relative to other pages in the target domain based on the association relationship between the pages in the target domain in step S220, the method includes the following steps: acquiring website navigation information; acquiring pages in a website based on the website structure and the seed pages in the website navigation information; and determining the association relation among the pages based on the link relation among the pages.
It should be noted that this part of the scheme may be performed before step S220 or before step S210.
In one embodiment of the application, the website navigation information is obtained, and information in the website is crawled based on the website structure and the seed pages in the website navigation information to obtain the pages in the website. The association relation between the pages is then determined based on the link relation between the pages.
Specifically, the website navigation information in this embodiment may include a website structure, a seed page as a root page or a homepage, and the like.
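By way of illustration only, crawling from a seed page and recording the link relation between pages can be sketched as follows. This is a minimal sketch under stated assumptions: `crawl_site` and `get_links` are hypothetical names, and an in-memory dictionary stands in for the fetching and parsing of real pages.

```python
from collections import deque

def crawl_site(seed_page, get_links):
    """Breadth-first crawl starting from a seed page.

    get_links(page) is a hypothetical callable returning the pages that
    `page` links to; a real crawler would fetch and parse HTML here.
    Returns a dict mapping each discovered page to the list of its out-links.
    """
    link_graph = {}
    queue = deque([seed_page])
    while queue:
        page = queue.popleft()
        if page in link_graph:
            continue  # already visited
        out_links = list(get_links(page))
        link_graph[page] = out_links
        for target in out_links:
            if target not in link_graph:
                queue.append(target)
    return link_graph

# Toy in-memory site standing in for a real website (illustrative data only).
site = {
    "home": ["nephrology", "oncology"],
    "nephrology": ["kidney-stones", "home"],
    "oncology": [],
    "kidney-stones": ["nephrology"],
}
graph = crawl_site("home", lambda page: site.get(page, []))
```

The returned mapping records, for every crawled page, which pages it links to, which is the link relation from which the association relation is derived.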
In step S220, authority values of the respective pages in the target domain with respect to other pages in the target domain are determined based on the association relationship between the pages in the target domain.
In one embodiment of the present application, for the specified target domain, authority is derived from the association relationships between the pages in that domain: a more important page tends to be referenced more often by other pages, that is, more hyperlinks to it appear on other pages. Illustratively, a link from page A to page B is interpreted as page A casting a vote for page B, and the rank of the voted page, i.e., its authority value, is determined from the source of the vote and the sources of that source, i.e., the rank of page A and of the pages linked to page A.
In one embodiment of the present application, the process of determining authority values of each page in the target domain with respect to other pages in the target domain in step S220 based on the association relationship between pages in the target domain includes the following steps S2201 to S2202:
in step S2201, an associated page in the target domain is determined based on the association relationship between pages in the selected target domain.
In one embodiment of the application, association relationships exist among the pages of a website; within the scope of the target domain, some pages are associated with one another, while others may have no association relationship. In this embodiment, based on the association relationships between the pages in the target domain, the pages that have an association relationship are taken as the associated pages in the target domain.
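By way of illustration only, selecting the associated pages of a target domain from a link graph can be sketched as follows. This is a minimal sketch: the function name is hypothetical, and it assumes that two pages are associated when one links to the other and both belong to the target domain.

```python
def associated_pages(link_graph, domain_pages):
    """Return the pages of one domain that link to, or are linked from,
    another page of the same domain."""
    domain_pages = set(domain_pages)
    associated = set()
    for source, targets in link_graph.items():
        if source not in domain_pages:
            continue  # links originating outside the domain are ignored
        for target in targets:
            if target in domain_pages and target != source:
                associated.add(source)
                associated.add(target)
    return associated

# Toy data: page "c" is in the domain but isolated; page "d" is outside it.
links = {"a": ["b"], "b": [], "c": [], "d": ["a"]}
result = associated_pages(links, {"a", "b", "c"})
```

Here only pages "a" and "b" are returned, since "c" has no link within the domain and "d" does not belong to it.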
In step S2202, authority values of the associated pages in the target domain relative to other pages in the target domain are determined based on the calling relationships between the associated pages, wherein the calling relationships and the authority values are positively correlated.
In one embodiment of the present application, since the calling relationship and the authority value are positively correlated, the authority value of each associated page in the target domain relative to other pages is determined from the calling relationships between the associated pages.
In one embodiment of the present application, the process in step S2202 of determining authority values of the associated pages in the target domain relative to other pages in the target domain based on the calling relationships between the associated pages, where the calling relationships and the authority values are positively correlated, includes the following steps:
based on the calling relationships between the associated pages, determining the association matrix as follows:
wherein p_1 to p_N represent the identifiers of the pages, N is a natural number greater than 2, ι(p_i, p_j) represents the calling relationship of page p_i with respect to page p_j, and i and j are natural numbers less than N.
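The matrix itself is not reproduced in this text. From the definitions above, it presumably has one row and one column per page, with the entry for pages p_i and p_j (written iota (p_i, p_j) in the text) in row i and column j. A hedged reconstruction, in which the symbol M and the row and column layout are assumptions borrowed from the conventional PageRank formulation, is:

```latex
M =
\begin{bmatrix}
\iota(p_1, p_1) & \iota(p_1, p_2) & \cdots & \iota(p_1, p_N) \\
\iota(p_2, p_1) & \iota(p_2, p_2) & \cdots & \iota(p_2, p_N) \\
\vdots          & \vdots          & \ddots & \vdots          \\
\iota(p_N, p_1) & \iota(p_N, p_2) & \cdots & \iota(p_N, p_N)
\end{bmatrix}
```

In conventional PageRank, the entry in row i and column j is 0 when p_j does not link to p_i, and each column sums to 1. Whether the application normalizes the entries in this way is not stated in the text.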
Based on the target domain and the pages other than those in the target domain, the authority parameter representing the relationship between the associated pages and the other pages is determined as s, where s is a vector, namely the in-link matrix within the same domain. Specifically, for a given domain, element k of s is 1 if page k belongs to the domain and 0 otherwise. Since different pages belong to different domains, each domain has its own s; |s| denotes the number of 1s in s, and the larger |s| is, the more pages the domain has.
Based on the association matrix, the authority parameter and the damping coefficient q, the authority value of each associated page in the target domain relative to other pages in the target domain is determined iteratively as follows:
The concrete steps are as follows:
In one embodiment of the application, the authority parameter is determined from all the pages contained in a domain, and the authority value of each page in that domain is then determined from this parameter, which improves the completeness and accuracy of the page authority calculation.
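The iteration formula is not reproduced in this text. By way of illustration only, the following sketch assumes the conventional topic-sensitive PageRank update R = q * M * R + (1 - q) * s / |s|, where column j of M spreads the rank of page j evenly over the pages it links to. The function name, the default q = 0.85, the fixed iteration count and the handling of pages without out-links are all assumptions, not taken from the application.

```python
def domain_authority(link_graph, domain_pages, q=0.85, iterations=50):
    """Iterate R = q * M * R + (1 - q) * s / |s| over all crawled pages.

    M[i][j] is assumed to be 1 / outdegree(j) when page j links to page i.
    s[k] is 1 when page k belongs to the target domain, otherwise 0.
    """
    pages = sorted(link_graph)
    s = {p: 1.0 if p in domain_pages else 0.0 for p in pages}
    s_size = sum(s.values())  # |s|: number of pages in the domain
    if s_size == 0:
        raise ValueError("target domain contains no crawled pages")
    rank = {p: s[p] / s_size for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - q) * s[p] / s_size for p in pages}
        for source in pages:
            targets = [t for t in link_graph[source] if t in link_graph]
            if not targets:
                continue  # page without out-links: its rank is dropped in this sketch
            share = q * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Toy graph: "d" lies outside the target domain {"a", "b", "c"}.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
rank = domain_authority(links, {"a", "b", "c"})
```

In this toy graph, page "c" receives votes from both "a" and "b" and therefore ends with a higher authority value than "b", while "d" receives no vote and no share of s, so its value stays at zero.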
In step S230, the information of the pages in the target domain is presented in the web page based on the authority values corresponding to the respective pages.
In one embodiment of the application, after the authority value corresponding to the page is calculated, the information of the page with the association relation in the target field is presented in the webpage based on the authority value of each page.
In one embodiment of the present application, as shown in fig. 5, the process of presenting the information of the pages in the target domain in the web page based on the authority values corresponding to the respective pages in step S230 includes steps S2301 to S2303:
in step S2301, a search term for the target field is acquired.
In one embodiment of the application, after the authority values are calculated, a search term input by a user for the target field is acquired. The search term in this embodiment may be a search keyword corresponding to the target field, or may be an image, a screenshot, or the like.
Specifically, in this embodiment, after the search term input by the user is obtained, the target field corresponding to the search term is determined in the website based on the search term. Alternatively, the user can be prompted directly to input a search term for the target field within the environment corresponding to that field.
In step S2302, a target page corresponding to the search term is searched from the pages corresponding to the target field.
In one embodiment of the application, a target page corresponding to the search term is determined from the pages corresponding to the target field. Specifically, the target page may be found by checking whether the text content of each page corresponding to the target field contains the search term or similar terms; if so, that page is determined to be a target page.
In step S2303, the display order of the target page is determined based on the authority value corresponding to the target page, and the information of the target page is presented in the web page based on the display order.
In one embodiment of the application, after the target pages corresponding to the search term and their authority values in the field are determined, the display order of the target pages is determined based on those authority values, and the information of the target pages is presented in the webpage in that order. Specifically, the target page with the highest authority value can be used as the main page, and the information of the other target pages is presented, ordered by authority value, in the recommendation section below the main page.
In one embodiment of the application, the information of the target page presented may include an illustration of the target page, a summary of the target page, a date of generation of the target page, and so on.
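By way of illustration only, steps S2301 to S2303 can be sketched as follows. This is a minimal sketch: the function name and data layout are hypothetical, matching is a plain case-insensitive substring test, and the similar terms mentioned above are not handled.

```python
def search_domain(search_term, domain_pages, authority):
    """Return pages whose text contains the search term, highest authority first.

    domain_pages maps a page identifier to its text content;
    authority maps a page identifier to its authority value.
    """
    needle = search_term.lower()
    hits = [page for page, text in domain_pages.items() if needle in text.lower()]
    return sorted(hits, key=lambda page: authority.get(page, 0.0), reverse=True)

# Illustrative data only.
pages = {
    "p1": "Kidney stones: causes and symptoms",
    "p2": "Uremia overview",
    "p3": "Diet after kidney stones",
}
values = {"p1": 0.2, "p2": 0.5, "p3": 0.4}
ordered = search_domain("kidney stones", pages, values)
```

The first element of the result would serve as the main page, and the remaining elements would fill the recommendation section in order.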
Fig. 6 shows a flow chart of a method of processing a medical web page in the medical field according to an embodiment of the present application. The method may be performed by a server, which may be the server shown in fig. 1. Referring to fig. 6, the processing method of the medical network page at least includes steps S610 to S630, described in detail as follows:
in step S610, the medical pages are classified based on the articles in the medical pages to be processed, so as to obtain the medical fields corresponding to the medical pages.
In one embodiment of the application, the medical field corresponding to the medical page is obtained by classifying the fields of the medical page based on the articles in the medical page to be processed in the medical website.
Fig. 7 is a schematic diagram of medical domain classification according to an embodiment of the present application.
As shown in fig. 7, the medical field in the present embodiment may include a primary field, a secondary field, and the like. The primary field may be a department class 710, which may include: internal medicine, surgery, oncology, neurology, infectious diseases, ophthalmology and otorhinolaryngology, pediatrics, and the like. The secondary field 710 may be a field below the primary field, for example, nephrology, gastroenterology, endocrinology, etc. below internal medicine. The tertiary field may be a field below the secondary field, for example, kidney stones, kidney deficiency, uremia, etc. below nephrology.
In this embodiment, the medical fields are divided into different levels according to their classification, which makes the classification of the medical fields clearer and in turn yields clearer page recommendations.
By way of example, in this embodiment, different domain levels may be divided for pages in the website according to different project levels, and then pages in a corresponding range may be recommended based on target domains in the different domain levels.
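By way of illustration only, the three-level domain hierarchy of fig. 7 can be sketched as a nested mapping. The domain names are those given in the text; the data layout, the function name and the empty placeholder entries are assumptions.

```python
# Primary field -> secondary field -> list of tertiary fields.
taxonomy = {
    "internal medicine": {
        "nephrology": ["kidney stones", "kidney deficiency", "uremia"],
        "gastroenterology": [],  # tertiary fields not listed in the text
        "endocrinology": [],
    },
    "surgery": {},  # secondary fields not listed in the text
}

def domains_at_level(taxonomy, level):
    """List domain names at level 1 (primary), 2 (secondary) or 3 (tertiary)."""
    if level == 1:
        return list(taxonomy)
    if level == 2:
        return [sec for secs in taxonomy.values() for sec in secs]
    if level == 3:
        return [ter for secs in taxonomy.values()
                for ters in secs.values() for ter in ters]
    raise ValueError("level must be 1, 2 or 3")

secondary = domains_at_level(taxonomy, 2)
```

Choosing the target domain at a coarser or finer level changes the set of pages over which authority values are computed and recommendations are drawn.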
In step S620, authority values of the respective medical pages in the target medical field with respect to other medical pages in the target medical field are determined based on the association relationship between the respective medical pages in the selected target medical field.
In one embodiment of the application, the authority value of each medical page in the target medical domain relative to other medical pages in the target medical domain is determined based on the association between the medical pages in the selected target medical domain. The determination method of the authority value can refer to the description in step S220 corresponding to fig. 2, and will not be described herein.
In step S630, information of the medical page in the medical field is presented in the medical web page based on the authority value corresponding to each page.
Fig. 8 is a schematic diagram of information presenting a medical page according to an embodiment of the present application.
As shown in FIG. 8, for a main page 810 in the current medical website, during its display, a relevant recommendation will appear at the bottom of the page, including an associated page in the same domain as the main page and associated with the main page. Each associated page has a different authority value, and summary information of the associated pages may be displayed in this embodiment based on the order of the authority values from high to low, as shown by 820, 830, and 840 in fig. 8.
In the embodiment, the authority value of each associated page is determined in the same medical field, so that when the content of one main page is displayed, the corresponding associated page can be determined based on the content of the main page, and the display mode of the information of the associated page can be determined based on the authority value of each associated page, thereby improving the page pushing efficiency.
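By way of illustration only, the selection of related recommendations for a main page can be sketched as follows. This is a minimal sketch: the function name, the data layout and the default of three recommendations (matching the three items 820, 830 and 840 of fig. 8) are assumptions.

```python
def related_recommendations(main_page, link_graph, page_domain, authority, limit=3):
    """Pick pages linked with the main page that share its domain,
    ordered from the highest authority value to the lowest."""
    domain = page_domain[main_page]
    linked = set(link_graph.get(main_page, []))  # pages the main page links to
    linked.update(src for src, targets in link_graph.items() if main_page in targets)
    linked.discard(main_page)
    same_domain = [p for p in linked if page_domain.get(p) == domain]
    same_domain.sort(key=lambda p: authority.get(p, 0.0), reverse=True)
    return same_domain[:limit]

# Illustrative data only: "x" is linked from the main page but lies in another domain.
links = {"m": ["a", "b", "x"], "c": ["m"], "a": [], "b": [], "x": []}
domains = {"m": "nephrology", "a": "nephrology", "b": "nephrology",
           "c": "nephrology", "x": "oncology"}
values = {"a": 0.1, "b": 0.3, "c": 0.2}
recommended = related_recommendations("m", links, domains, values)
```

With this toy data the recommendations are "b", "c" and "a", in that order, and "x" is excluded because it belongs to a different domain.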
The following describes an embodiment of the apparatus of the present application, which may be used to execute the processing method of the web page in the foregoing embodiment of the present application. For details not disclosed in the embodiment of the apparatus of the present application, please refer to the embodiment of the method for processing a web page described above.
Fig. 9 shows a block diagram of a processing device of a web page according to an embodiment of the application.
Referring to fig. 9, a processing apparatus 900 of a web page according to an embodiment of the present application includes: a classification unit 910, configured to perform domain classification on the page to be processed based on the content in the page to be processed, so as to obtain at least one domain; a value setting unit 920, configured to determine authority values of each page in the target domain relative to other pages in the target domain based on association relationships between pages in the target domain; and a presenting unit 930, configured to present the information of the pages in the target domain in the web page based on the authority values corresponding to the respective pages.
In some embodiments of the present application, based on the foregoing solution, the processing apparatus 900 for web pages further includes: the first acquisition unit is used for acquiring website navigation information; the second acquisition unit is used for acquiring pages in the website based on the website structure and the seed pages in the website navigation information; and the relationship determining unit is used for determining the association relationship among the pages based on the link relationship among the pages.
In some embodiments of the present application, based on the foregoing scheme, the second obtaining unit is configured to: based on the website structure and the seed pages in the website navigation information, crawling the information in the website to acquire the pages in the website.
In some embodiments of the present application, based on the foregoing scheme, the classification unit 910 includes: the extraction unit is used for extracting text content in the page to be processed; the input unit is used for inputting the text content into the trained page classification model to obtain the field corresponding to the page to be processed output by the page classification model.
In some embodiments of the present application, based on the foregoing scheme, the training method of the page classification model includes: acquiring text content of a page sample and its corresponding domain label; extracting vocabulary samples from the text content; inputting the vocabulary samples into a page classification network to obtain a classification result output by the page classification network; and adjusting parameters in the page classification network based on a loss function obtained from the classification result and the domain label, to obtain the page classification model.
In some embodiments of the present application, based on the foregoing scheme, the value setting unit 920 includes: an associated page determining unit, configured to determine an associated page in the target domain based on an association relationship between pages in the selected target domain; and an authority value determining unit, configured to determine authority values of the associated pages in the target domain relative to other pages in the target domain based on calling relationships among the associated pages, wherein the calling relationships and the authority values are positively correlated.
In some embodiments of the present application, based on the foregoing scheme, the authority value determining unit is configured to: determining an association matrix based on the calling relation between the association pages; determining authority parameters representing the relationship between the associated page and other pages based on the target field and other pages except the pages in the target field; and determining authority values of the associated pages in the target field relative to other pages in the target field based on the association matrix, the authority parameters and the damping coefficients.
In some embodiments of the application, based on the foregoing scheme, the presentation unit 930 includes: a third obtaining unit, configured to obtain a search term for the target field; the target page determining unit is used for searching a target page corresponding to the search term from pages corresponding to the target field; and the page presentation unit is used for determining the display sequence of the target page based on the authority value corresponding to the target page and presenting the information of the target page in the webpage based on the display sequence.
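By way of illustration only, the cooperation of the three units of the processing apparatus 900 can be sketched as follows. This is a minimal sketch: the class name, method name and callable signatures are hypothetical, and simple stand-in functions replace the classification model and the authority calculation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class PageProcessingApparatus:
    classify: Callable[[str], str]                       # classification unit 910: page text -> domain
    set_values: Callable[[List[str]], Dict[str, float]]  # value setting unit 920: page ids -> authority values
    present: Callable[[Dict[str, float]], List[str]]     # presentation unit 930: authority values -> display order

    def process(self, pages: Dict[str, str], target_domain: str) -> List[str]:
        """Classify every page, keep those in the target domain, score and order them."""
        in_domain = [pid for pid, text in pages.items()
                     if self.classify(text) == target_domain]
        return self.present(self.set_values(in_domain))

# Stand-in units (illustrative only; the scores here are not real authority values).
apparatus = PageProcessingApparatus(
    classify=lambda text: "nephrology" if "kidney" in text else "other",
    set_values=lambda ids: {pid: float(len(pid)) for pid in ids},
    present=lambda values: sorted(values, key=values.get, reverse=True),
)
order = apparatus.process(
    {"a": "kidney stones", "bbb": "kidney diet", "cc": "flu"}, "nephrology"
)
```

Keeping the units as interchangeable callables mirrors the statement in the description that the division into units is not mandatory.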
Fig. 10 is a block diagram of a processing device of a medical network page according to an embodiment of the present application. The method executed by this processing device in the medical field corresponds to the embodiment of fig. 6 and is not described again here.
Referring to fig. 10, a processing apparatus 1000 of a medical network page according to an embodiment of the present application includes: a medical classification unit 1010, configured to classify medical pages based on articles in the medical pages to be processed, so as to obtain medical fields corresponding to the medical pages; a medical value setting unit 1020, configured to determine authority values of each medical page in the target medical field relative to other medical pages in the target medical field based on the association relationship between the medical pages in the selected target medical field; and a medical presentation unit 1030, configured to present information of the medical pages in the medical field in a medical web page based on the authority values corresponding to the respective pages.
Fig. 11 shows a schematic diagram of a computer system suitable for use in implementing an embodiment of the application.
It should be noted that, the computer system 1100 of the electronic device shown in fig. 11 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present application.
As shown in fig. 11, the computer system 1100 includes a central processing unit (Central Processing Unit, CPU) 1101 that can perform various appropriate actions and processes, such as performing the method described in the above embodiment, according to a program stored in a Read-Only Memory (ROM) 1102 or a program loaded from a storage section 1108 into a random access memory (Random Access Memory, RAM) 1103. In the RAM 1103, various programs and data required for system operation are also stored. The CPU 1101, ROM 1102, and RAM 1103 are connected to each other by a bus 1104. An Input/Output (I/O) interface 1105 is also connected to bus 1104.
The following components are connected to the I/O interface 1105: an input section 1106 including a keyboard, a mouse, and the like; an output section 1107 including a Cathode Ray Tube (CRT), a liquid crystal display (Liquid Crystal Display, LCD), and a speaker; a storage section 1108 including a hard disk or the like; and a communication section 1109 including a network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 1109 performs communication processing via a network such as the internet. The drive 1110 is also connected to the I/O interface 1105 as needed. A removable medium 1111, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is mounted on the drive 1110 as needed, so that a computer program read therefrom is installed into the storage section 1108 as needed.
In particular, according to embodiments of the present application, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowchart. In such an embodiment, the computer program can be downloaded and installed from a network via the communication section 1109, and/or installed from the removable medium 1111. When the computer program is executed by the Central Processing Unit (CPU) 1101, the various functions defined in the system of the present application are performed.
It should be noted that, the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-Only Memory (ROM), an erasable programmable read-Only Memory (Erasable Programmable Read Only Memory, EPROM), flash Memory, an optical fiber, a portable compact disc read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with a computer-readable computer program embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
A computer program embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present application may be implemented by software or by hardware, and the described units may also be provided in a processor. In some cases, the names of the units do not constitute a limitation on the units themselves.
According to one aspect of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions are read from the computer-readable storage medium by a processor of a computer device, and executed by the processor, cause the computer device to perform the methods provided in the various alternative implementations described above.
As another aspect, the present application also provides a computer-readable medium that may be contained in the electronic device described in the above embodiment; or may exist alone without being incorporated into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the methods described in the above embodiments.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the application. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied. From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, and includes several instructions to cause a computing device (may be a personal computer, a server, a touch terminal, or a network device, etc.) to perform the method according to the embodiments of the present application.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains.
It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (10)

1. A method for processing a web page, comprising:
performing domain classification on the page to be processed based on the content in the page to be processed to obtain at least one domain;
determining authority values of all pages in the target field relative to other pages in the target field based on association relations among the pages in the target field;
based on the authority values corresponding to the pages, information of the pages in the target field is presented in the webpage;
wherein the determining authority values of all pages in the target field relative to other pages in the target field based on association relations among the pages in the target field comprises the following steps: determining an associated page in the target domain based on an association relationship between pages in the selected target domain; determining authority values of the associated pages in the target field relative to other pages in the target field based on the calling relations among the associated pages, wherein the calling relations are positively correlated with the authority values;
the step of presenting the information of the pages in the target field in the webpage based on the authority values corresponding to the pages comprises the following steps: acquiring search terms aiming at the target field; searching a target page corresponding to the search term from pages corresponding to the target field; and determining the display sequence of the target page based on the authority value corresponding to the target page, and presenting the information of the target page in the webpage based on the display sequence.
2. The method of claim 1, wherein prior to determining authority values of each page in a target domain relative to other pages in the target domain based on association relationships between pages in the target domain, the method further comprises:
acquiring website navigation information;
acquiring pages in the website based on the website structure and the seed pages in the website navigation information;
and determining the association relation among the pages based on the link relation among the pages.
3. The method of claim 2, wherein obtaining pages in the website based on the website structure and seed pages in the website navigation information comprises:
and crawling information in the website based on the website structure and the seed page in the website navigation information to acquire pages in the website.
4. The method of claim 1, wherein the domain classification of the page to be processed based on the content in the page to be processed results in at least one domain, comprising:
extracting text content in the page to be processed;
inputting the text content into a page classification model obtained through training, and obtaining the field corresponding to the page to be processed output by the page classification model.
5. The method of claim 4, wherein the training method of the page classification model comprises:
acquiring text content of a page sample and a corresponding field label thereof;
extracting a vocabulary sample from the text content;
inputting the vocabulary sample into a page classification network to obtain a classification result output by the page classification network;
and adjusting parameters in the page classification network based on the classification result and the loss function obtained by the domain label to obtain the page classification model.
6. The method of claim 1, wherein determining an authority value of the associated page in the target domain relative to other pages in the target domain based on the calling relationship between the associated pages comprises:
determining an association matrix based on the calling relation between the association pages;
determining authority parameters representing the relationship between the associated page and other pages based on the target domain and the other pages except the page in the target domain;
and determining authority values of the associated pages in the target field relative to other pages in the target field based on the association matrix, the authority parameters and the damping coefficients.
7. The method according to claim 1, wherein the method further comprises:
classifying the medical pages based on articles in the medical pages to be processed to obtain medical fields corresponding to the medical pages;
determining authority values of all medical pages in the target medical field relative to other medical pages in the target medical field based on association relations among all medical pages in the selected target medical field;
and based on the authority values corresponding to the pages, information of the medical pages in the medical field is presented in the medical webpage.
8. A processing apparatus for web pages, comprising:
the classifying unit is used for classifying the fields of the pages to be processed based on the content in the pages to be processed to obtain at least one field;
the value setting unit is used for determining authority values of all pages in the target field relative to other pages in the target field based on the association relation between the pages in the target field;
the presentation unit is used for presenting the information of the pages in the target field in the webpage based on the authority values corresponding to the pages;
wherein the value setting unit includes: an associated page determining unit, configured to determine an associated page in a target domain based on an association relationship between pages in the selected target domain; the authority value determining unit is used for determining authority values of the associated pages in the target field relative to other pages in the target field based on calling relations among the associated pages, wherein the calling relations are positively correlated with the authority values;
the presentation unit includes: an acquisition unit configured to acquire a search term for the target field; the target page determining unit is used for searching a target page corresponding to the search term from pages corresponding to the target field; and the page presentation unit is used for determining the display sequence of the target page based on the authority value corresponding to the target page and presenting the information of the target page in the webpage based on the display sequence.
9. A computer readable medium, characterized in that it has stored thereon a computer program which, when executed by a processor, implements the method according to any of claims 1-7.
10. An electronic device, comprising: one or more processors; storage means for storing one or more programs which when executed by the one or more processors cause the one or more processors to implement the method of any of claims 1-7.
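The authority-value and presentation steps recited in the claims above can be illustrated with a minimal sketch. It assumes that the "calling relations" between pages are hyperlinks inside a single target field, and it substitutes a standard PageRank-style iteration with a damping coefficient for the claimed computation, since the exact formula is not given in this section. The function names (field_authority, rank_results), the default damping value of 0.85, the substring matching, and the sample data are all hypothetical and chosen only for illustration.

```python
def field_authority(links, damping=0.85, iterations=100):
    """Estimate authority values for the pages of ONE target field.

    links maps each page to the list of pages it links to; every page is
    assumed to belong to the same field. This is a plain PageRank-style
    power iteration, used here as a stand-in for the claimed computation.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the same baseline share (1 - damping) / n.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                # A page passes its damped value to the pages it links to.
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new_rank[q] += share
            else:
                # A page without outgoing links spreads its value evenly.
                share = damping * rank[p] / n
                for q in pages:
                    new_rank[q] += share
        rank = new_rank
    return rank


def rank_results(search_term, page_text, authority):
    """Return the pages whose text contains the search term, ordered by
    descending authority value (the display sequence)."""
    term = search_term.lower()
    hits = [p for p, text in page_text.items() if term in text.lower()]
    return sorted(hits, key=lambda p: authority.get(p, 0.0), reverse=True)


if __name__ == "__main__":
    # Hypothetical pages of one medical field and the links between them.
    links = {"a": ["c"], "b": ["c"], "c": ["a"]}
    texts = {"a": "fever care", "b": "Fever in children", "c": "cough"}
    authority = field_authority(links)
    print(authority)
    print(rank_results("fever", texts, authority))
```

In this sketch a page that is linked to by more pages of the same field ends up with a higher value, which matches the positive correlation between calling relations and authority values stated in claim 8. Because the values are computed per field, a page is compared only against other pages of that field.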
CN202010789735.2A 2020-08-07 2020-08-07 Processing method and device of network page Active CN111914201B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010789735.2A CN111914201B (en) 2020-08-07 2020-08-07 Processing method and device of network page

Publications (2)

Publication Number Publication Date
CN111914201A (en) 2020-11-10
CN111914201B (en) 2023-11-07

Family

ID=73283233

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010789735.2A Active CN111914201B (en) 2020-08-07 2020-08-07 Processing method and device of network page

Country Status (1)

Country Link
CN (1) CN111914201B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112416212B (en) * 2020-11-25 2023-05-30 维沃移动通信有限公司 Program access method, apparatus, electronic device and readable storage medium

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101132446A (en) * 2006-08-23 2008-02-27 上海万纬信息技术有限公司 Web page intelligent snapping system and method thereof
CN101751438A (en) * 2008-12-17 2010-06-23 中国科学院自动化研究所 Theme webpage filter system for driving self-adaption semantics
CN101903878A (en) * 2007-10-11 2010-12-01 谷歌公司 Methods and systems for classifying search results to determine page elements
CN102567409A (en) * 2010-12-31 2012-07-11 珠海博睿科技有限公司 Method and device for providing retrieval associated word
CN102859516A (en) * 2009-04-08 2013-01-02 谷歌公司 Generating improved document classification data using historical search results
CN102890717A (en) * 2012-09-29 2013-01-23 北京奇虎科技有限公司 System and method for building webpage category knowledge base
CN102902793A (en) * 2012-09-29 2013-01-30 北京奇虎科技有限公司 Creation system and method of webpage classification knowledge base
CN102902790A (en) * 2012-09-29 2013-01-30 北京奇虎科技有限公司 Web page classification system and method
CN102959545A (en) * 2010-06-29 2013-03-06 微软公司 Navigation to popular search results
CN104504070A (en) * 2014-12-22 2015-04-08 北京奇虎科技有限公司 Search method and device
CN106649823A (en) * 2016-12-29 2017-05-10 淮海工学院 Webpage classification recognition method based on comprehensive subject term vertical search and focused crawler
CN106776710A (en) * 2016-11-18 2017-05-31 广东技术师范学院 A kind of picture and text construction of knowledge base method based on vertical search engine
CN106874340A (en) * 2016-12-22 2017-06-20 新华三技术有限公司 A kind of web page address sorting technique and device
CN107153498A (en) * 2016-03-30 2017-09-12 阿里巴巴集团控股有限公司 A kind of page processing method, device and intelligent terminal
CN108694197A (en) * 2017-04-10 2018-10-23 富士通株式会社 Hypertext grasping means and device
CN110209906A (en) * 2018-02-07 2019-09-06 北京京东尚科信息技术有限公司 Method and apparatus for extracting webpage information

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7941391B2 (en) * 2007-05-04 2011-05-10 Microsoft Corporation Link spam detection using smooth classification function
US8782037B1 (en) * 2010-06-20 2014-07-15 Remeztech Ltd. System and method for mark-up language document rank analysis
KR102202896B1 (en) * 2014-04-17 2021-01-14 삼성전자 주식회사 Method for saving and expressing webpage
CN109271557B (en) * 2018-08-31 2022-03-22 北京字节跳动网络技术有限公司 Method and apparatus for outputting information

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"An Efficient Approach for Finding Near Duplicate Web pages using Minimum Weight Overlapping Method"; Shine N. Das et al.; International Journal of Electrical and Computer Engineering (IJECE); pp. 187-194 *
"Hierarchical text classification based on unlabeled Web data"; He Li et al.; 《智能系统学报》 (CAAI Transactions on Intelligent Systems); pp. 330-335 *

Also Published As

Publication number Publication date
CN111914201A (en) 2020-11-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant