CN108959325B - Uniform resource locator display method, information display method and related products thereof - Google Patents

Uniform resource locator display method, information display method and related products thereof

Publication number
CN108959325B
Authority
CN
China
Prior art keywords
content
url
category
information
domain name
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710385155.5A
Other languages
Chinese (zh)
Other versions
CN108959325A (en
Inventor
周显
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Office Software Inc
Zhuhai Kingsoft Office Software Co Ltd
Guangzhou Kingsoft Mobile Technology Co Ltd
Original Assignee
Beijing Kingsoft Office Software Inc
Zhuhai Kingsoft Office Software Co Ltd
Guangzhou Kingsoft Mobile Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Office Software Inc, Zhuhai Kingsoft Office Software Co Ltd, Guangzhou Kingsoft Mobile Technology Co Ltd
Priority to CN201710385155.5A
Priority to PCT/CN2018/088438 (WO2018214964A1)
Publication of CN108959325A
Application granted
Publication of CN108959325B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9558 Details of hyperlinks; Management of linked annotations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/14 Tree-structured documents
    • G06F40/143 Markup, e.g. Standard Generalized Markup Language [SGML] or Document Type Definition [DTD]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The embodiments of the invention provide a Uniform Resource Locator (URL) display method, an information expression method, a URL display device and an information expression system. The URL display method may comprise the following steps: acquiring first content, wherein the first content is at least a part of the resource content corresponding to the URL; determining a category of the first content; determining a content display form according to the category; and displaying the URL in the content display form. Determining the content display form according to the category specifically includes: determining the content display form according to the category and the domain name in the URL, and based on at least a part of second content. The embodiments thereby address the technical problem of how to improve the expressiveness of a URL: the URL is beautified, its expressiveness and that of the article are enhanced, and its appeal to users, and in turn the user click rate, can be increased.

Description

Uniform resource locator display method, information display method and related products thereof
Technical Field
The invention relates to the technical field of internet, in particular to a uniform resource locator display method, an information expression method, a uniform resource locator display device and an information expression system.
Background
Currently, authors often publish articles over the internet. When an author introduces a website, a video or a song, or refers to other articles, the author adds the corresponding URL (uniform resource locator), commonly seen as a hyperlink, to the published article. When reading the article, a user can then jump to the corresponding web page and browse it by directly clicking the URL, that is, the hyperlink.
Specifically, there are two main ways of displaying URLs in the prior art: the first mode is to directly write the character string of the URL into an article; the second way is to set the URL as a text hyperlink, and then the author describes the resource content corresponding to the URL in the form of characters according to the self understanding.
The first way merely shows the URL string itself in the article. The URL string carries no content information for the user, so it cannot clearly convey the main resource content that the URL is meant to present. This display mode therefore leaves the URL poorly expressive and unattractive. The second way is intended to aid understanding and attract clicks; however, because the textual description is the author's subjective summary, it has in essence become part of the article and does not present the URL objectively and faithfully. Moreover, textual expression has its limits and depends on the author's ability to express ideas; this display method is therefore not very expressive either.
Disclosure of Invention
The embodiment of the invention aims to provide a method for displaying a Uniform Resource Locator (URL) so as to at least solve the technical problem of how to improve the expressive force of the URL. In addition, an information expression method, a Uniform Resource Locator (URL) display device and an information expression system are also provided.
In order to achieve the above object, according to one aspect of the present invention, the following technical solutions are provided:
a method for displaying a uniform resource locator (URL), the method may comprise:
acquiring first content, wherein the first content is at least one part of resource content corresponding to the URL;
determining a category of the first content;
determining a content display form according to the category;
and displaying the URL with the content display form.
Further, the acquiring the first content may specifically include:
capturing the resource content corresponding to the URL in any one or more of the following modes: a crawler tool, an ASP web page crawling tool, a Java mode, a PHP mode, a Delphi mode, a Python mode, a Flex mode or a Ruby mode.
Further, the determining the category of the first content may specifically include:
identifying at least a portion of the first content;
extracting at least one part of second content, wherein the second content is the at least one part of first content;
classifying the at least a portion of the second content to determine a category of the first content.
Further, the extracting at least a part of the second content may specifically include:
XML serializing the at least a portion of the first content;
the extracting of the at least a portion of the second content is achieved by extracting at least a portion of the XML node information.
Further, the classifying the at least one part of the second content may specifically include:
classifying the at least a portion of the second content using a domain name and an HTML tag in the URL.
Further, the classifying the at least one part of the second content by using the domain name and the HTML tag in the URL may specifically include:
matching the domain name with a predetermined domain name list, wherein the predetermined domain name list comprises predetermined domain names and first category information thereof;
if the matching is successful, extracting the first category information;
identifying the HTML tag and determining second category information;
classifying the at least a portion of the second content according to the first category information and the second category information.
Further, the method may further include:
and if the matching fails, executing the step of identifying the HTML tag.
Further, the determining a content presentation form according to the category may specifically include:
determining the content presentation form based on the at least a portion of the second content according to the category and the domain name in the URL.
In order to achieve the above object, according to another aspect of the present invention, the following technical solutions are also provided:
a method of information presentation, the method may comprise:
the URL in the article is expressed by the URL display method.
In order to achieve the above object, according to still another aspect of the present invention, the following technical solutions are also provided:
an apparatus for Uniform Resource Locator (URL) presentation, the apparatus comprising:
an obtaining module, configured to obtain first content, where the first content is at least a part of resource content corresponding to the URL;
a first determination module for determining a category of the first content;
the second determining module is used for determining a content display form according to the category;
and the display module is used for displaying the URL in a content display mode.
Further, the acquiring module may specifically include:
a URL obtaining subunit, configured to obtain a URL;
a capturing unit, configured to capture the resource content corresponding to the URL in any one or more of the following manners, so as to obtain the first content: a crawler tool, an ASP web page crawling tool, a Java mode, a PHP mode, a Delphi mode, a Python mode, a Flex mode or a Ruby mode.
Further, the first determining module may specifically include:
an identifying unit configured to identify at least a part of the first content;
an extracting unit, configured to extract at least a part of second content, where the second content is the at least a part of the first content;
a classification unit configured to classify the at least a portion of the second content, thereby determining a category of the first content.
Further, the extraction unit may specifically include:
a serialization unit for performing XML serialization on the at least one part of the first content;
and the extraction subunit is used for realizing the extraction of at least one part of second content by extracting at least one part of XML node information.
Further, the classification unit may specifically include:
a classification subunit, configured to classify the at least part of the second content by using the domain name and the HTML tag in the URL.
Further, the classification subunit may specifically include:
the matching unit is used for matching the domain name with a predetermined domain name list, wherein the predetermined domain name list comprises predetermined domain names and first category information thereof;
the category extraction unit is used for extracting the first category information under the condition of successful matching;
the category determining unit is used for identifying the HTML tag and determining second category information;
a content classifying unit, configured to classify the at least part of the second content according to the first category information and the second category information.
Further, the apparatus may further include:
and the execution unit is used for triggering the category determination unit under the condition of failed matching.
Further, the second determining module may specifically include:
and the determining subunit is used for determining the content presentation form according to the category and the domain name in the URL and based on the at least part of the second content.
In order to achieve the above object, according to another aspect of the present invention, the following technical solutions are also provided:
an information presentation system, the system may comprise:
the URL display device;
and the expression module is used for expressing the URL in the article through the URL display device.
The embodiment of the invention provides a Uniform Resource Locator (URL) display method, an information expression method, a URL display device and an information expression system. The method for displaying the uniform resource locator URL can comprise the following steps: acquiring first content, wherein the first content is at least one part of resource content corresponding to the URL; determining a category of the first content; determining a content display form according to the category; and displaying the URL in a content display form. According to the technical scheme, the content display form is determined by judging the category of the resource content corresponding to the URL, so that the URL is displayed through the content display form. Compared with the mode of directly displaying the characters in the URL, the technical scheme provided by the embodiment of the invention beautifies the URL by displaying the URL in a content display mode, thereby improving the expressive force of the URL and enhancing the display force of the article, so that the attraction to the user can be improved, and the click rate of the user can be increased. Compared with the mode that an author subjectively summarizes resource content corresponding to the URL and then shows the URL through the text summarization, the content related to content showing in the technical scheme provided by the embodiment of the invention is objectively obtained, namely, the embodiment of the invention objectively obtains the first content, then judges the category of the first content, determines the content showing form according to the category, and finally shows the URL objectively and really by matching with the content showing form; therefore, the resource content corresponding to the URL can be accurately displayed, the expressive force of the URL is improved, the expressive force of the article is enhanced, and the attraction and click rate of the user can be improved.
Of course, it is not necessary for any product or method of practicing the invention to achieve all of the above-described advantages at the same time.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flowchart illustrating a URL presentation method according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a URL presentation method according to another embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a URL display device according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an information presentation system according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that, in the case of no conflict, the technical features in the embodiments of the present invention may be combined with each other.
Description of the terms
The content presentation form refers to any of the various ways of presenting content so that it is more easily understood, for example: video, audio, mixed image-text layout, maps, text, etc., but is by no means limited thereto.
HyperText Markup Language (HTML) is used to describe a web document. HTML marks the various portions of a web page to be displayed by means of markup symbols. The web page file itself is a text file; by adding markers to the text file, it tells the browser how to display its contents, such as how the text is processed and how the pictures are arranged and displayed.
Extensible Markup Language (XML) is a standard text format for web sites to represent structured information. XML separates data from HTML. XML data is stored in plain text format. XML data can be accessed in HTML pages.
The embodiment of the invention provides a URL display method, which comprises the steps of obtaining resource content corresponding to a URL in advance and judging the category of the content; then, determining a content display form suitable for the category according to the category; finally, the URL and the content display form are presented in the article together, so that the URL can be displayed more clearly and expressively to represent the information, and the user can be more willing to click.
In view of the above, in practical applications, in order to solve the technical problem of how to improve the expressive power of the URL, an embodiment of the present invention provides a method for displaying a uniform resource locator (URL). As shown in FIG. 1, the method may be implemented through steps S100 to S130. Wherein:
S100: Acquire first content, wherein the first content is at least a part of the resource content corresponding to the URL.
Here, URL stands for Uniform Resource Locator, commonly referred to as a web page address. A URL is a compact representation of the location of, and the method of access to, a resource available from the internet, and is the address of a standard resource on the internet. A URL corresponds to a network-wide extension of a file name. It contains information indicating the location of the file and how the browser should handle it. Therefore, any resource content on the internet (e.g., websites, files, articles, videos, etc.) can be referred to by a URL.
It should be noted that "at least a portion" referred to herein may be a part or all of the above. For example: at least a part of the resource content corresponding to the URL may be a part of the resource content corresponding to the URL, or may be all the resource content corresponding to the URL.
In some optional embodiments, step S100 may specifically include: capturing the resource content corresponding to the URL in any one or more of the following modes: a crawler tool, an ASP (Active Server Pages) web page crawling tool, a Java mode, a PHP (Hypertext Preprocessor) mode, a Delphi (a visual object-oriented development tool) mode, a Python (a scripting language combining interpretability, compilability, interactivity and object orientation) mode, a Flex (an application tool based on the Flash platform) mode or a Ruby (a cross-platform object-oriented interpreted language) mode.
The following describes a process of capturing resource content corresponding to a URL by using a crawler tool as an example. The crawler tool can be realized by Java, Python and the like.
Specifically, starting from the URLs of one or more initial web pages, the crawler obtains the URLs found on those pages, obtains the resource content corresponding to the URL of each web page through source code analysis, and, while capturing web pages, continuously extracts new URLs from the current page and puts them into a queue until all web pages have been captured. Alternatively, the resource content may be captured as follows: links unrelated to the subject are filtered out according to an existing web page analysis algorithm, and the useful links are retained and put into a URL queue waiting to be captured. The URL of the resource content to be captured next is then selected from the queue according to a search strategy, and the process is repeated until all web pages have been captured. The search strategy may give priority to important web pages. Here, the importance of a web page may be measured by link popularity, link importance and average link depth. Link popularity may be measured by how many web pages link to the page. Link importance may be measured by whether the link contains ".com" or "home" and by how few "/" characters it contains. The average link depth represents the link distance from each seed site in a set of seed sites to the web page (where the links are determined by breadth-first traversal rules).
The process of capturing the resource content corresponding to the URL is described next, taking the ASP web page crawling tool as an example. A Form base class is written in C# (an object-oriented programming language); parameters are then passed through the POST method (in asynchronous communication with the server) to capture the resource content.
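The priority-driven search strategy described above can be sketched as follows. This is a minimal illustration only: the scoring weights, the function names and the `links_of` callback (which stands in for downloading and parsing a page) are assumptions, and only the link-importance signal (".com", "home", few "/") is modelled.

```python
import heapq
from urllib.parse import urlparse


def link_importance(url: str) -> int:
    """Heuristic score (assumed weights): higher means more important."""
    parsed = urlparse(url)
    score = 0
    if (parsed.hostname or "").endswith(".com"):
        score += 2
    if "home" in url.lower():
        score += 1
    # Fewer "/" characters in the path suggest a page closer to the site root.
    score -= parsed.path.count("/")
    return score


def crawl_order(seed_urls, links_of):
    """Return URLs in the order a priority-driven crawler would visit them.

    links_of is a hypothetical callable mapping a URL to the URLs found on
    that page; a real crawler would fetch and parse the page instead.
    """
    frontier = [(-link_importance(u), u) for u in seed_urls]
    heapq.heapify(frontier)
    seen = set(seed_urls)
    visited = []
    while frontier:
        _, url = heapq.heappop(frontier)  # most important URL first
        visited.append(url)
        for link in links_of(url):
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-link_importance(link), link))
    return visited


# Toy link graph used in place of real network access.
TOY_WEB = {
    "http://example.com/": [
        "http://example.com/a/b/c/page",
        "http://example.com/home",
    ],
}
order = crawl_order(["http://example.com/"], lambda u: TOY_WEB.get(u, []))
```

In this toy graph the "home" link is visited before the deeply nested page because it scores higher under the assumed weights.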
Those skilled in the art will appreciate that a third-party tool may be used to capture the resource content when actually acquiring the content.
For example: the jsoup parser (a Java-based HTML parser) may be used to directly parse a given URL address and capture the resource content corresponding to the URL through DOM (Document Object Model) and CSS (Cascading Style Sheets) selectors.
It should be understood by those skilled in the art that the above examples are not intended to be exhaustive and that any presently existing or later-to-be-developed means for obtaining the first content, if applicable to the embodiments of the present invention, are intended to be included within the scope of the present invention and are hereby incorporated by reference.
S110: a category of the first content is determined.
In some optional embodiments, this step may be specifically implemented by step S112 to step S116. Wherein:
S112: At least a portion of the first content is identified.
As an example, this step may identify at least a portion of the first content by identifying HTML tags in the HTML source code displayed by the browser. Because HTML tags are predefined, it is possible to identify from a tag whether the content inside it is text, video, or a title. A web page may contain both pictures and text; this step can identify which part is a picture, which part is text, which part is a video, and so on.
Specifically, for example, the "<audio>" tag defines sound content, so recognizing this tag shows that the portion of content is audio. The "<canvas>" tag defines a graphic, so recognizing this tag shows that the portion of content is a graphic. The "<video>" tag defines a video, so recognizing this tag shows that the portion of content is a video. The "<img>" tag defines an image, so recognizing this tag shows that the portion of content is an image. The "<body>" tag defines the content presented by the web page. The "<head>" tag defines the title, character format, language, compatibility, keywords, description, etc.
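The tag-based identification described above might be sketched with Python's standard `html.parser` module. The `TAG_KINDS` table, the class name and the rule that non-empty text inside `<body>` counts as "text" are illustrative assumptions, not part of the described method.

```python
from html.parser import HTMLParser

# Assumed mapping from HTML tags to the kind of content they indicate.
TAG_KINDS = {"audio": "audio", "video": "video", "img": "image", "canvas": "graphic"}


class ContentKindParser(HTMLParser):
    """Collects the kinds of content found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.kinds = set()
        self._in_body = False
        self._skip = 0  # depth inside <script>/<style>, whose text is not content

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self._in_body = True
        elif tag in ("script", "style"):
            self._skip += 1
        if tag in TAG_KINDS:
            self.kinds.add(TAG_KINDS[tag])

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Visible, non-empty text inside <body> is treated as text content.
        if self._in_body and not self._skip and data.strip():
            self.kinds.add("text")


def identify_kinds(html: str) -> set:
    parser = ContentKindParser()
    parser.feed(html)
    parser.close()
    return parser.kinds
```

For a page whose body holds a paragraph, an `<img>` and a `<video>`, `identify_kinds` reports text, image and video; text inside `<head>` (such as the title) is ignored.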
It will be appreciated by those of ordinary skill in the art that the above examples are not exhaustive, and any existing or future way of identifying at least a portion of the first content, if applicable to the embodiments of the present invention, is also intended to be included within the scope of the present invention and is hereby incorporated by reference.
S114: at least a portion of the second content is extracted, wherein the second content is at least a portion of the first content.
This step may extract portions of the content according to predefined HTML tags for presentation of the URL content.
In some optional embodiments, the step of extracting at least a part of the second content may further specifically include:
S1142: XML serialization is performed on at least a portion of the first content.
Where serialization refers to the process of converting a data structure or object into a binary string.
The serialization process is described in detail below in a preferred embodiment.
Step a1: All the HTML code of the resource content is read using the OpenRead method of WebClient in C#/.NET.
Step a2: Irrelevant code is deleted.
Since scripts cannot be parsed as XML, all script code between <script> and </script> tags is deleted.
Step a3: Routinely irrelevant items in the HTML code are deleted.
For example, the content between <title> and </title>, and the content between <meta name="keywords" content=" and "/>, can be deleted.
Step a4: The tags in the HTML code are completed, and every tag is closed by adding an end tag.
Step a5: The entire HTML code is serialized to XML using Microsoft's XML serialization method to generate an XML file.
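Steps a2 to a5 can be illustrated with a small sketch. Note that it swaps in Python's standard `html.parser` and `xml.etree.ElementTree` for the C#/.NET tools named above, and it dumps the element tree as XML text rather than using Microsoft's XML serialization method. The wrapper element name `document`, the class name and the list of void tags are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# HTML elements that never have an end tag; they are closed immediately.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
DROPPED_TAGS = {"script", "title"}  # steps a2 and a3


class XmlBuilder(HTMLParser):
    """Builds a well-formed element tree from possibly sloppy HTML."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("document")  # assumed wrapper element
        self.stack = [self.root]
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in DROPPED_TAGS:
            self.skip_depth += 1
            return
        if self.skip_depth:
            return
        attributes = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attributes.get("name", "").lower() == "keywords":
            return  # step a3: drop the keywords meta item
        element = ET.SubElement(self.stack[-1], tag, attributes)
        if tag not in VOID_TAGS:
            self.stack.append(element)

    def handle_endtag(self, tag):
        if tag in DROPPED_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth:
            return
        # Step a4: closing a tag also closes any unclosed tags nested in it.
        for index in range(len(self.stack) - 1, 0, -1):
            if self.stack[index].tag == tag:
                del self.stack[index:]
                break

    def handle_data(self, data):
        if self.skip_depth or not data.strip():
            return
        current = self.stack[-1]
        if len(current):  # text that follows a child element
            current[-1].tail = (current[-1].tail or "") + data
        else:
            current.text = (current.text or "") + data


def html_to_xml(html: str) -> str:
    builder = XmlBuilder()
    builder.feed(html)
    builder.close()
    return ET.tostring(builder.root, encoding="unicode")  # step a5
```

The result can be parsed back with `ET.fromstring`, which is what makes node extraction possible in the next step. Attribute names that are not valid XML names are not handled in this sketch.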
It should be understood by those skilled in the art that the above examples are not exhaustive, and any existing or future possible way of XML serialization of at least a portion of the first content, if applicable to embodiments of the present invention, is intended to be included within the scope of the present invention and is hereby incorporated by reference.
S1144: the extraction of at least a portion of the second content is achieved by extracting at least a portion of the XML node information.
An XML file is made up of nodes. A node is the smallest unit of a valid and complete structure in an XML file. The XML DOM (XML Document Object Model) defines the standard method of accessing and manipulating XML documents. The DOM views the XML document as a tree structure, so all elements can be accessed through the DOM tree. Elements and their text, attributes are considered nodes.
In a specific implementation process, node information may be extracted from the generated XML file through a Get function, or XML node information may be extracted by using XPath (XML Path Language).
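As a brief illustration of XPath-style extraction, the sketch below uses Python's `xml.etree.ElementTree`, which supports a limited subset of XPath. The sample document and its attribute names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Invented sample of an XML file produced by the serialization step.
XML_DOC = """<document><body>
  <p>Intro text</p>
  <video src="clip.mp4" title="Demo clip" />
  <img src="cover.png" />
</body></document>"""

root = ET.fromstring(XML_DOC)

# ".//video" selects every <video> element at any depth below the root.
video_sources = [node.get("src") for node in root.findall(".//video")]

# findtext returns the text of the first matching element.
first_paragraph = root.findtext(".//p")
```

Here `video_sources` holds the `src` attribute of each video node and `first_paragraph` holds the text of the first paragraph node.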
The following example details the process of extracting XML node information (e.g., video).
Step b1: Recursive lookups are performed from node to node until the smallest nodes are found.
Here, the smallest node is the most basic unit, an Element.
Step b2: Each smallest node is parsed with an XML parser to obtain character string data.
Step b3: The string data is passed to the VideoInfo class.
Step b4: The data is localized to each video class by means of two successive layers of extraction.
Step b5: The data in each Video class is passed to the ArrayList data structure in the Search Video Info.
Step b6: The data in the ArrayList data structure is associated with the corresponding Adapter data, thereby realizing the extraction of the XML node information.
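The recursive lookup in steps b1 to b3 and the list collection in step b5 can be sketched roughly as follows. The fields of `VideoInfo` (`title`, `src`) are hypothetical, a plain Python list stands in for the ArrayList, and the two-layer extraction and Adapter binding of steps b4 and b6 are not modelled.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class VideoInfo:
    # Hypothetical fields; the source does not list the members of VideoInfo.
    title: str
    src: str


def leaf_elements(node):
    """Step b1: recurse from node to node until the smallest nodes are reached."""
    children = list(node)
    if not children:
        yield node
        return
    for child in children:
        yield from leaf_elements(child)


def collect_videos(xml_text: str) -> list:
    """Steps b2, b3 and b5: parse each leaf and gather video data into a list."""
    root = ET.fromstring(xml_text)
    videos = []
    for leaf in leaf_elements(root):
        if leaf.tag == "video":
            videos.append(VideoInfo(title=leaf.get("title", ""),
                                    src=leaf.get("src", "")))
    return videos
```

The leaves are visited in document order, so the resulting list preserves the order in which the videos appear on the page.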
It should be understood by those skilled in the art that the above examples are not exhaustive, and any existing or future possible way of extracting at least a portion of XML node information, if applicable to the embodiments of the present invention, is also intended to be included within the scope of the present invention and is hereby incorporated by reference.
S116: at least a portion of the second content is classified to determine a category of the first content.
In some optional embodiments, the step of classifying at least a part of the second content may specifically include: at least a portion of the second content is classified using the domain name and the HTML tag in the URL.
In some preferred embodiments, the step of classifying at least a part of the second content by using the domain name and the HTML tag in the URL may specifically include:
Step c1: The domain name is matched with a predetermined domain name list; if the matching is successful, step c2 is executed; if the matching fails, step c3 is executed. The predetermined domain name list comprises predetermined domain names and first category information thereof.
Step c2: The first category information is extracted.
Step c3: The HTML tag is identified and second category information is determined.
Step c4: At least a portion of the second content is classified according to the first category information and the second category information.
In a specific implementation of this step, the domain name may be identified from the URL character string and matched against the predetermined domain name list. If the matching is successful, the attribute information of the domain name is extracted from the domain name list, the HTML tag is identified, and the category of the at least a portion of the second content is finally determined according to the attribute information of the domain name and the HTML tag. If the matching fails, the category is determined using the HTML tag alone. Taking a video website as an example: the domain name in the URL character string is identified and matched against the domain name list; if the matching is successful, the URL is determined to belong to that video website, and the category information in the attribute information associated with the domain name is extracted from the predetermined domain name list. Here the predefined category information indicates that the content corresponding to the domain name is mainly video. After the video category information is extracted, the HTML tag is identified, and the identification shows that the content also contains text; the category of this portion of content may then be determined to be video combined with text.
As another example, for the Taobao website, the domain name is extracted from the URL character string and compared with the domain name list. If the matching is successful and the URL is determined to belong to the Taobao website, the predefined category information of mixed image-text layout is extracted from the attribute information of the domain name in the predetermined domain name list. The HTML tag is then identified; if the identification shows that the content also includes a video, the category of the content is determined to be mixed image-text layout combined with video.
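The combination of domain-based and tag-based category information in steps c1 to c4 could look roughly like the sketch below. The domain list entries, the tag table and the choice to represent a category as a sorted tuple of media kinds are all assumptions made for illustration.

```python
# Assumed predetermined domain name list: domain -> first category information.
DOMAIN_CATEGORIES = {
    "video.example.com": {"video"},
    "shop.example.com": {"image", "text"},  # mixed image-text layout
}

# Assumed mapping from HTML tags to second category information.
TAG_CATEGORIES = {"video": "video", "audio": "audio", "img": "image", "p": "text"}


def classify(domain: str, html_tags: list) -> tuple:
    """Return the category as a sorted tuple of media kinds."""
    # Steps c1 and c2: on a successful match, take the first category information;
    # on a failed match, fall through with no domain-based information.
    first = DOMAIN_CATEGORIES.get(domain, set())
    # Step c3: derive second category information from the identified tags.
    second = {TAG_CATEGORIES[tag] for tag in html_tags if tag in TAG_CATEGORIES}
    # Step c4: classify using both pieces of information.
    return tuple(sorted(first | second))
```

For instance, `classify("video.example.com", ["p"])` gives video combined with text, mirroring the video website example, while an unlisted domain is classified from its tags alone.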
In the above embodiment, the URL has the form: protocol://domain name:port/virtual directory/file name?parameters#anchor. An IP address may be used in place of the domain name. Taking http://www.aspxfans.com:8080/news/index.asp?boardID=5&ID=24618&page=1#name as an example, www.aspxfans.com is the domain name.
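As a small illustration of this URL structure, the components can be separated with Python's standard `urllib.parse` module. The example URL is written here in its presumed form, `http://www.aspxfans.com:8080/news/index.asp?boardID=5&ID=24618&page=1#name`; the exact query parameter names are an assumption, and the dictionary keys are merely labels chosen for this sketch.

```python
from urllib.parse import parse_qs, urlsplit

# Presumed form of the example URL (query parameter names are assumed).
url = "http://www.aspxfans.com:8080/news/index.asp?boardID=5&ID=24618&page=1#name"
parts = urlsplit(url)

components = {
    "protocol": parts.scheme,                 # "http"
    "domain": parts.hostname,                 # the domain name (or an IP address)
    "port": parts.port,                       # integer port, or None if absent
    "path": parts.path,                       # virtual directory plus file name
    "parameters": parse_qs(parts.query),      # query parameters as a dict of lists
    "anchor": parts.fragment,                 # the part after "#"
}
```

Only the `domain` component is needed for the domain name list lookup; the remaining components are shown for completeness.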
It should be noted that, in the above process of determining the category, the step of identifying the domain name and the step of determining the category of the content from the HTML tags may be performed simultaneously or sequentially. If they are performed sequentially, the embodiment of the present invention does not limit the order: the category of the content may first be identified from the HTML tags and the category information then extracted via the domain name, or the category information may first be extracted via the domain name and the category of the content then identified from the HTML tags.
Those of ordinary skill in the art will appreciate that the above examples are not exhaustive; any existing or future manner of determining the category that is applicable to the embodiments of the present invention falls within the scope of protection of the present invention and is incorporated herein by reference.
S120: a content presentation form is determined according to the category.
In some optional embodiments, this step may specifically include: determining a content presentation form according to the category and the domain name in the URL, and based on at least a part of the second content.
For example, if the category is determined to be mixed image-and-text layout, the domain name is www.taobao.com, and the obtained second content is the text content on the Taobao website, then a large-picture-with-text structure, combined with the text content on the Taobao website, can be used as the content presentation form.
As another example, if the category is determined to be mixed image-and-text layout, the domain name is www.qq.com, and the obtained second content is the text content on the Tencent website, then a small-picture-with-text structure, combined with the text content on the Tencent website, can be used as the content presentation form.
When determining the content presentation form, the embodiment of the invention may adopt any one or more of the following structural forms: a large picture with text, a small picture with text, a large window with text, a small window with text, a picture with audio, and the like. The structural form may be laid out top-to-bottom, left-to-right, or diagonally, but is by no means limited to these. Those of ordinary skill in the art will appreciate that any existing or future content presentation form applicable to the present invention falls within the scope of protection of the present invention and is incorporated herein by reference.
Those skilled in the art will understand that the above examples are not exhaustive; any existing or future manner of determining the content presentation form that is applicable to the embodiments of the present invention falls within the scope of protection of the present invention and is incorporated herein by reference.
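The selection in step S120 could, for instance, be expressed as a lookup keyed by category and domain name. The sketch below is one hedged possibility only: the rule table, the default and fallback entries, the layout values, and the names `PresentationForm` and `choose_form` are hypothetical. The specification gives only the two structures used for www.taobao.com and www.qq.com; the layouts paired with them here are assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PresentationForm:
    structure: str  # e.g. "large picture with text"
    layout: str     # e.g. "top-bottom", "left-right", "diagonal"


# Hypothetical rules keyed by (category, domain name).
RULES = {
    ("image-text", "www.taobao.com"): PresentationForm("large picture with text", "top-bottom"),
    ("image-text", "www.qq.com"): PresentationForm("small picture with text", "left-right"),
}

# Hypothetical per-category defaults, used when no domain-specific rule exists.
DEFAULTS = {
    "video": PresentationForm("small window with text", "top-bottom"),
}

# Hypothetical last-resort form.
FALLBACK = PresentationForm("small picture with text", "top-bottom")


def choose_form(category, domain):
    """Pick a presentation form from the category and the domain name."""
    return RULES.get((category, domain)) or DEFAULTS.get(category) or FALLBACK
```

A table of this kind keeps the domain-specific choices in data rather than in branching code, so a manually selected or automatically pushed form could simply override the returned value.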
S130: the URL is displayed in combination with the content presentation form.
For example, in this step the URL character string may be displayed in a given layout together with the large-picture-with-text structure. The layout may be top-to-bottom, left-to-right, or diagonal, but is by no means limited to these.
In practical applications, the content presentation form may be selected manually by the user or pushed automatically by the back end.
Compared with existing methods, which offer no selectable styles, the embodiment of the invention can intelligently recommend a variety of presentation styles, thereby helping an author improve the expressiveness of an article.
The embodiment of the invention obtains first content, the first content being at least a part of the resource content corresponding to the URL; then determines the category of the first content; determines a content presentation form according to the category; and finally displays the URL in combination with the content presentation form. The content presentation form is thus determined by judging the category of the resource content corresponding to the URL, and the URL is presented together with that form. For example, a small-window video with mixed image-and-text layout may be displayed below the URL character string; as another example, a large-window video combined with text may be displayed above the URL character string. The URL is thereby beautified, which solves the technical problem of how to improve the expressiveness of the URL, enhances the expressiveness of the article, and can increase its appeal to users and the click-through rate.
The invention is described in more detail below with reference to fig. 2, which shows a preferred embodiment.
S200: the first content is obtained by any one or several of the following: a crawler tool, an ASP web page grabbing tool, a Java mode, a PHP mode, a Delphi mode, a Python mode, a Flex mode or a Ruby mode; the first content is at least one part of resource content corresponding to the URL.
S210: at least a portion of the first content is identified.
S220: XML serialization is performed on at least a portion of the first content.
S230: extracting at least a portion of the second content by extracting at least a portion of the XML node information; wherein the second content is at least a portion of the first content.
S240: at least a portion of the second content is classified using the domain name and HTML tags in the URL to determine a category of the first content.
S250: a content presentation form is determined based on the category and the domain name in the URL and based on at least a portion of the second content.
S260: the URL is displayed in combination with the content presentation form.
Through this preferred embodiment, the URL is beautified and a reader can obtain more information without clicking the link. This improves the expressiveness of the URL, enhances the expressiveness of the article, and can increase its appeal to users and the click-through rate.
In addition, to help the author improve the expressiveness of an article, the embodiment of the invention also provides an information expression method. The method may include any of the above URL display method embodiments.
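Steps S220 and S230 (XML serialization followed by extraction of XML node information) might look roughly like the sketch below. It assumes the first content is already well-formed XHTML, which real web pages often are not; the function name `extract_second_content` and the default set of wanted tags are hypothetical choices made for this example.

```python
import xml.etree.ElementTree as ET


def extract_second_content(xhtml, wanted=("title", "p", "img", "video")):
    """Parse well-formed XHTML and collect information from selected nodes."""
    root = ET.fromstring(xhtml)
    nodes = []
    for elem in root.iter():  # walks the tree in document order
        if elem.tag in wanted:
            nodes.append({
                "tag": elem.tag,
                "text": (elem.text or "").strip(),
                "attrs": dict(elem.attrib),
            })
    return nodes


# A tiny stand-in for captured page content.
page = (
    "<html><head><title>Demo</title></head>"
    "<body><p>Hello</p><img src='a.png'/></body></html>"
)
second_content = extract_second_content(page)
```

The returned node list plays the role of the second content: it is a subset of the first content that the later classification step (S240) can inspect tag by tag.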
For the description of the present embodiment, reference may be made to the description of any one of the above embodiments of the URL display method, which is not repeated herein.
It should be noted that, the steps in the URL display method embodiment and the information expression method embodiment may be executed sequentially or in parallel, and are not limited herein.
Based on the same technical concept as the URL display method embodiment, the embodiment of the invention also provides a URL display apparatus. The apparatus embodiment may perform the method embodiment described above. As shown in fig. 3, the apparatus 30 may include: an obtaining module 32, a first determining module 34, a second determining module 36, and a presentation module 38. The obtaining module 32 is configured to obtain first content, where the first content is at least a part of the resource content corresponding to the URL. The first determining module 34 is configured to determine the category of the first content. The second determining module 36 is configured to determine the content presentation form according to the category. The presentation module 38 is configured to display the URL in combination with the content presentation form.
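The division into four modules can be pictured as a simple composition of callables, one per module. This is only a structural sketch under assumed interfaces: the class name `UrlDisplayApparatus`, the field names, and the function signatures are hypothetical, and the modules are replaced by stubs so the example stays self-contained.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class UrlDisplayApparatus:
    obtain: Callable[[str], str]                   # obtaining module 32: URL -> first content
    determine_category: Callable[[str, str], str]  # first determining module 34
    determine_form: Callable[[str], str]           # second determining module 36
    present: Callable[[str, str], str]             # presentation module 38

    def run(self, url: str) -> str:
        """Chain the four modules in the order described for fig. 3."""
        first_content = self.obtain(url)
        category = self.determine_category(url, first_content)
        form = self.determine_form(category)
        return self.present(url, form)


# Stub modules, for illustration only.
apparatus = UrlDisplayApparatus(
    obtain=lambda url: "<p>stub content</p>",
    determine_category=lambda url, content: "text",
    determine_form=lambda category: "small picture with text",
    present=lambda url, form: f"{url} [{form}]",
)
rendered = apparatus.run("http://www.example.com/")
```

Because each module is an independent callable, the modules can be split further or combined, in line with the remark that the module division is only exemplary.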
In some optional embodiments, on the basis of the embodiment shown in fig. 3, the obtaining module may specifically include a URL obtaining subunit and a capturing unit. The URL obtaining subunit is used for obtaining the URL. The capturing unit is configured to capture the resource content corresponding to the URL by any one or more of the following ways, so as to obtain the first content: a crawler tool, an ASP web page crawling tool, a Java mode, a PHP mode, a Delphi mode, a Python mode, a Flex mode or a Ruby mode.
In some optional embodiments, on the basis of the embodiment shown in fig. 3, the first determining module may specifically include an identification unit, an extraction unit, and a classification unit. The identification unit is configured to identify at least a part of the first content. The extraction unit is used for extracting at least a part of second content, where the second content is at least a part of the first content. The classification unit is used for classifying at least a part of the second content so as to determine the category of the first content.
In some optional embodiments, on the basis of the foregoing embodiments, the extraction unit may specifically include a serialization unit and an extraction subunit. The serialization unit is used for performing XML serialization on at least a part of the first content. The extraction subunit is configured to extract at least a part of the second content by extracting at least a part of the XML node information.
In some optional embodiments, on the basis of the foregoing embodiments, the classification unit may specifically include a classification subunit. The classification subunit is configured to classify at least a part of the second content using the URL and the HTML tags.
In some preferred embodiments, the classification subunit further specifically includes a matching unit, a category extraction unit, a category determination unit, and a content classification unit. The matching unit is used for matching the domain name with a preset domain name list; the preset domain name list comprises predetermined domain names and their first category information. The category extraction unit is used for extracting the first category information when the matching succeeds. The category determination unit is used for identifying the HTML tags and determining second category information. The content classification unit is used for classifying at least a part of the second content according to the first category information and the second category information.
On the basis of the above preferred embodiment, the URL display apparatus may further include an execution unit. The execution unit is used for triggering the category determination unit under the condition that the matching fails.
In some optional embodiments, on the basis of the foregoing embodiments, the second determining module may specifically include a determining subunit. The determining subunit is configured to determine a content presentation form according to the category and the domain name in the URL, and based on at least a part of the second content.
In addition, to help the author improve the expressiveness of an article, the embodiment of the invention also provides an information expression system. As shown in fig. 4, the system may include a URL display apparatus and an expression module. The expression module is used for expressing the URL in the article through the URL display apparatus.
For the description of the present embodiment, reference may be made to the description of any one of the above embodiments of the URL display method, which is not repeated herein.
Those skilled in the art will appreciate that the above-described embodiments of the URL presentation apparatus and the information presentation system may also include some well-known structures, such as a processor, a memory, a bus, and the like. Wherein the processor is connected with the memory through a bus. Processors include, but are not limited to, ARM, programmable logic devices, DSPs, and the like. The memory may be a random access memory or a read only memory. The buses may include a data bus, an address bus, and a control bus.
It should be noted that, in the above-mentioned URL display apparatus embodiment and information expression system embodiment, the modules are divided only in an exemplary manner, and those skilled in the art should understand that other manners may also be used to divide the modules, and the divided modules may also be further split or combined; furthermore, the modules may be executed sequentially or in parallel, and are not limited herein.
It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A method for showing a Uniform Resource Locator (URL), the method comprising:
acquiring first content, wherein the first content is at least one part of resource content corresponding to the URL;
identifying at least a portion of the first content; extracting at least one part of second content, wherein the second content is at least one part of the first content;
matching the domain name in the URL with a preset domain name list; if the matching is successful, extracting first category information; the preset domain name list comprises a predetermined domain name and first category information thereof, wherein the first category information is the category of content corresponding to the domain name; identifying an HTML (hypertext markup language) tag corresponding to the second content, and determining second category information, wherein the second category information is the category of the content in the webpage corresponding to the URL; classifying the at least one part of the second content according to the first category information and the second category information so as to determine the category of the first content; the category of the first content is the first category information and the second category information;
determining a content display form according to the category of the first content;
and displaying the resource content of which the category is the first category information or the second category information in the webpage corresponding to the URL by matching the URL with the content display form.
2. The method according to claim 1, wherein the obtaining the first content specifically includes:
capturing the resource content corresponding to the URL in any one or more of the following modes: a crawler tool, an ASP web page crawling tool, a Java mode, a PHP mode, a Delphi mode, a Python mode, a Flex mode or a Ruby mode.
3. The method according to claim 1, wherein the extracting at least a portion of the second content specifically comprises:
XML serializing the at least a portion of the first content;
the extracting of the at least a portion of the second content is achieved by extracting at least a portion of the XML node information.
4. The method of claim 1, further comprising:
and if the matching fails, executing the step of identifying the HTML label corresponding to the second content.
5. An information presentation method, the method comprising:
displaying the URL in the article by the URL displaying method as claimed in any one of claims 1-4.
6. An apparatus for Uniform Resource Locator (URL) presentation, the apparatus comprising:
an obtaining module, configured to obtain first content, where the first content is at least a part of resource content corresponding to the URL;
a first determination module for determining a category of the first content; the first determining module specifically includes: an identifying unit configured to identify at least a part of the first content; an extracting unit, configured to extract at least a part of second content, where the second content is at least a part of the first content; the matching unit is used for matching the domain name in the URL with a preset domain name list; the category extraction unit is used for extracting first category information under the condition of successful matching; the preset domain name list comprises a predetermined domain name and first category information thereof, wherein the first category information is the category of content corresponding to the domain name; the category determining unit is used for identifying an HTML (hypertext markup language) tag corresponding to the second content and determining second category information, wherein the second category information is the category of the content in the webpage corresponding to the URL; a content classification unit, configured to classify the at least a portion of the second content according to the first category information and the second category information, so as to determine a category of the first content; the category of the first content is the first category information and the second category information;
the second determination module is used for determining a content display form according to the category of the first content;
and the display module is used for displaying the resource content of which the category is the first category information or the category is the second category information in the webpage corresponding to the URL by matching the URL with the content display form.
7. The apparatus according to claim 6, wherein the obtaining module specifically includes:
a URL obtaining subunit, configured to obtain a URL;
a capturing unit, configured to capture the resource content corresponding to the URL in any one or more of the following manners, so as to obtain the first content: a crawler tool, an ASP web page crawling tool, a Java mode, a PHP mode, a Delphi mode, a Python mode, a Flex mode or a Ruby mode.
8. The apparatus according to claim 6, wherein the extracting unit specifically comprises:
a serialization unit for performing XML serialization on the at least one part of the first content;
and the extraction subunit is used for realizing the extraction of at least one part of second content by extracting at least one part of XML node information.
9. The apparatus of claim 6, further comprising:
and the execution unit is used for triggering the category determination unit under the condition of failed matching.
10. An information presentation system, the system comprising:
the URL presentation means as claimed in any one of claims 6-9;
and the expression module is used for displaying the URL in the article through the URL display device.

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710385155.5A CN108959325B (en) 2017-05-26 2017-05-26 Uniform resource locator display method, information display method and related products thereof
PCT/CN2018/088438 WO2018214964A1 (en) 2017-05-26 2018-05-25 Uniform resource locator display method, information expression method and related product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710385155.5A CN108959325B (en) 2017-05-26 2017-05-26 Uniform resource locator display method, information display method and related products thereof

Publications (2)

Publication Number Publication Date
CN108959325A CN108959325A (en) 2018-12-07
CN108959325B true CN108959325B (en) 2021-06-29

Family

ID=64396255

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710385155.5A Active CN108959325B (en) 2017-05-26 2017-05-26 Uniform resource locator display method, information display method and related products thereof

Country Status (2)

Country Link
CN (1) CN108959325B (en)
WO (1) WO2018214964A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GR1010585B (en) * 2022-11-10 2023-12-12 Παναγιωτης Τσαντιλας Web crawling and content summarization

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103258058A (en) * 2013-06-03 2013-08-21 贝壳网际(北京)安全技术有限公司 Page display method and system and browser
CN104468720A (en) * 2014-11-07 2015-03-25 广州市至德科技企业孵化器有限公司 Method for determining preview link and providing dynamic preview information for preview link

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544178B (en) * 2012-07-13 2019-04-12 百度在线网络技术(北京)有限公司 It is a kind of for providing the method and apparatus of reconstruction page corresponding with target pages
US9569522B2 (en) * 2014-06-04 2017-02-14 International Business Machines Corporation Classifying uniform resource locators
CN105512149A (en) * 2014-10-14 2016-04-20 阿里巴巴集团控股有限公司 Method for updating and displaying keyword information of digital reading resource and relevant devices
CN106033428B (en) * 2015-03-11 2019-08-30 北大方正集团有限公司 The selection method of uniform resource locator and the selection device of uniform resource locator
CN106095453B (en) * 2016-06-16 2019-12-24 北京金山安全软件有限公司 Information display method and device and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
New feature added to the Hope blog: "Web Page Preview" &SNAP; IT昆仑; https://blog.51cto.com/qinyezhai/113164; 2008-11-16; full text *

Also Published As

Publication number Publication date
WO2018214964A1 (en) 2018-11-29
CN108959325A (en) 2018-12-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant