WO2018214964A1

WO2018214964A1 - Uniform resource locator display method, information expression method and related product

Info

Publication number: WO2018214964A1
Application number: PCT/CN2018/088438
Authority: WO
Inventors: 周显
Original assignee: 北京金山办公软件股份有限公司; 珠海金山办公软件有限公司; 广州金山移动科技有限公司
Priority date: 2017-05-26
Filing date: 2018-05-25
Publication date: 2018-11-29
Also published as: CN108959325A; CN108959325B

Abstract

Provided are a uniform resource locator (URL) display method, an information expression method, a uniform resource locator (URL) display device, and an information expression system. The uniform resource locator (URL) display method may comprise: acquiring first content, wherein the first content is at least some resource content corresponding to a URL to be displayed; determining the category of the first content; determining a content presentation form according to the category; and displaying the URL to be displayed in the content presentation form, wherein determining the content presentation form according to the category specifically comprises: determining a content presentation form according to the category and a domain name in the URL to be displayed, and based on at least some of second content. Therefore, by means of the embodiments of the present application, the technical problem of how to improve the expressiveness of a URL is solved, and the URL is beautified, thereby improving the expressiveness of the URL and enhancing the presentation of an article, so that the appeal for a user can be improved, and the user's click rate can be increased.

Description

Uniform resource locator display method, information expression method and related products

This application claims priority to Chinese Patent Application No. 201710385155.5, filed with the Chinese Patent Office on May 26, 2017 and entitled "Uniform Resource Locator Display Method, Information Expression Method and Related Products", the entire contents of which are incorporated herein by reference.

Technical field

The present application relates to the field of Internet technologies, and in particular to a uniform resource locator display method, an information expression method, a uniform resource locator display device, and an information expression system.

Background

Currently, authors often publish articles via the Internet. When an author introduces a website, video, song, or other article in his or her own article, the corresponding URL (Uniform Resource Locator) is added to the published article; this is the hyperlink we often see. A user reading the article can then click the URL directly to jump to the corresponding web page and browse it.

Specifically, there are two main ways to display a URL. The first is to write the URL string directly into the article. The second is to set the URL as a text hyperlink, with the author describing the resource content corresponding to the URL in text according to his or her own understanding.

The first way merely displays the URL string itself. The string carries no content information for the user, so it cannot clearly convey the main resource content behind the URL that the author wants to show. As a result, a URL displayed this way is poorly expressed and unattractive. The second way does try to help the user understand the link and to attract clicks. However, because the text description is the author's subjective summary, it has in essence become part of the article and does not display the URL objectively and faithfully. Moreover, what text can express is limited and depends on the author's writing ability, so a URL displayed this way is still not very expressive.

Summary of the invention

The purpose of the embodiments of the present application is to provide a uniform resource locator (URL) display method that solves at least the technical problem of how to improve the expressiveness of a URL. An information expression method, a URL display device, and an information expression system are also provided.

In order to achieve the above object, according to an aspect of the present application, the following technical solutions are provided:

A uniform resource locator URL display method, the method may include:

Acquiring the first content, where the first content is at least a part of resource content corresponding to the URL to be displayed;

Determining a category of the first content;

Determining a content presentation form according to the category;

Displaying the URL to be displayed in the content presentation form.

Optionally, acquiring the first content may include:

Obtaining the URL to be displayed;

Capturing the resource content corresponding to the URL to be displayed by any one or more of the following: a crawler tool, an ASP web crawling tool, Java, PHP, Delphi, Python, Flex, or Ruby;

Obtaining at least a portion of the resource content as the first content.

Optionally, the determining the category of the first content may specifically include:

Identifying at least a portion of the first content;

Extracting at least a portion of the second content from the identified at least a portion of the first content;

Determining the category of the at least a portion of the second content, and determining the determined category as the category of the first content.

Optionally, extracting at least a portion of the second content from the identified at least a portion of the first content includes:

Performing XML serialization on the at least a portion of the first content;

Extracting at least a portion of the XML node information from the XML file generated by the XML serialization, to obtain the at least a portion of the second content.

Optionally, the determining the category of the at least part of the second content includes:

Determining the category of the at least a portion of the second content by using the domain name and the HTML tag in the URL to be displayed.

Optionally, determining the category of the at least a portion of the second content by using the domain name and the HTML tag in the URL to be displayed specifically includes:

Matching the domain name in the URL to be displayed with a predetermined domain name list; wherein the predetermined domain name list includes a predetermined domain name and first category information corresponding to the predetermined domain name;

If the matching is successful, extracting the first category information;

Identifying an HTML tag in the URL to be displayed, and determining second category information;

Determining the category of the at least a portion of the second content based on the first category information and the second category information.

Optionally, the method further includes:

If the matching fails, identifying an HTML tag in the URL to be displayed, and determining third category information;

Determining the category of the at least a portion of the second content based on the third category information.

Optionally, the determining the content presentation form according to the category includes:

Determining the content presentation form according to the category and the domain name in the URL to be displayed, and based on the at least a portion of the second content.

In order to achieve the above object, according to another aspect of the present application, the following technical solutions are also provided:

An information expression method, the method may include:

Obtaining a to-be-processed article, and displaying the URL in the to-be-processed article by using the URL display method described in any of the above.

In order to achieve the above object, according to still another aspect of the present application, the following technical solutions are also provided:

A uniform resource locator URL display device, the device may include:

An acquiring module, configured to acquire the first content, where the first content is at least a part of resource content corresponding to the URL to be displayed;

a first determining module, configured to determine a category of the first content;

a second determining module, configured to determine a content presentation form according to the category;

a display module, configured to display the URL to be displayed in the content presentation form.

Optionally, the acquiring module may specifically include:

a URL acquisition subunit for obtaining a URL to be displayed;

a crawling unit, configured to capture the resource content corresponding to the URL to be displayed by any one or more of the following: a crawler tool, an ASP web crawling tool, Java, PHP, Delphi, Python, Flex, or Ruby;

The first content acquisition subunit is configured to acquire at least a part of the resource content as the first content.

Optionally, the first determining module may specifically include:

An identification unit, configured to identify at least a portion of the first content;

An extracting unit, configured to extract at least a part of the second content from the identified at least part of the first content;

a classifying unit, configured to determine a category of the at least a portion of the second content, and determine the determined category as a category of the first content.

Optionally, the extracting unit specifically includes:

a serializing unit, configured to perform XML serialization on the at least a part of the first content;

an extraction subunit, configured to extract at least a part of the XML node information from the XML file generated by the XML serialization, to obtain the at least a part of the second content.

Optionally, the classifying unit specifically includes:

a classification subunit, configured to determine a category of the at least a portion of the second content by using a domain name and an HTML tag in the URL to be displayed.

Optionally, the classification subunit specifically includes:

a matching unit, configured to match a domain name in the URL to be displayed with a predetermined domain name list, where the predetermined domain name list includes a predetermined domain name and first category information corresponding to the predetermined domain name;

a class extracting unit, configured to extract the first category information if the matching is successful;

a category determining unit, configured to identify an HTML tag in the URL to be displayed, and determine second category information;

a content classification unit, configured to determine, according to the first category information and the second category information, a category of the at least a portion of the second content.

Optionally, the device further includes:

an execution unit, configured to identify an HTML tag in the URL to be displayed, determine third category information, and determine the category of the at least a portion of the second content according to the third category information.

Optionally, the second determining module specifically includes:

a determining subunit, configured to determine the content presentation form according to the category and the domain name in the URL to be displayed, and based on the at least a portion of the second content.

An information expression system, the system may include:

a URL display device according to any of the above;

an expression module, configured to obtain a to-be-processed article and display the URL in the to-be-processed article by using the URL display device.

To achieve the above objective, an embodiment of the present application further discloses an electronic device, including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus;

the memory is configured to store a computer program;

the processor is configured to implement the above uniform resource locator (URL) display method, or the above information expression method, when executing the program stored in the memory.

To achieve the above objective, an embodiment of the present application further discloses a storage medium for storing an application program, where the application program, when run, executes the foregoing uniform resource locator (URL) display method or the foregoing information expression method.

To achieve the above objective, the embodiment of the present application discloses an application program for executing the foregoing uniform resource locator URL display method or the above information expression method at runtime.

The embodiments of the present application provide a uniform resource locator (URL) display method, an information expression method, a URL display device, and an information expression system. The URL display method may include: acquiring first content, where the first content is at least a part of the resource content corresponding to the URL to be displayed; determining a category of the first content; determining a content presentation form according to the category; and displaying the URL to be displayed in the content presentation form. The technical solution determines the content presentation form by judging the category of the resource content corresponding to the URL, and then displays the URL in that form. Compared with directly displaying the characters of the URL, this enhances the expressiveness of the URL and the presentation of the article, which can increase the appeal to users and in turn the click rate. Compared with the author subjectively summarizing the resource content corresponding to the URL and displaying the URL through that text summary, the content involved in the content presentation form here is obtained objectively. That is, the embodiments of the present application objectively acquire the first content, determine its category, determine the content presentation form according to the category, and finally display the URL objectively and faithfully in that form. The resource content corresponding to the URL can therefore be displayed accurately, which improves the expressiveness of the URL, enhances the presentation of the article, and thereby increases the appeal to users and the click rate.

Of course, it is not necessary for any of the products or methods of the present application to achieve all of the advantages described above.

Brief description of the drawings

To illustrate the embodiments of the present application and the technical solutions of the prior art more clearly, the drawings used in the embodiments and in the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and a person of ordinary skill in the art may obtain other drawings from them without creative effort.

FIG. 1 is a schematic flowchart of a URL display method according to an embodiment of the present application;

FIG. 2 is a schematic flowchart of a URL display method according to another embodiment of the present application;

FIG. 3 is a schematic structural diagram of a URL display apparatus according to an embodiment of the present application;

FIG. 4 is a schematic structural diagram of an information expression system according to an embodiment of the present application;

FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed description

To make the objects, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.

It should be noted that the technical features in the embodiments of the present application may be combined with each other without conflict.

Terminology

A content presentation form refers to any way of presenting content that makes the content easier to understand, such as video, audio, combined image and text, map, or text, but is not limited thereto.

Hypertext Markup Language (HTML) is used to describe web page files. HTML specifies how the parts of a web page are displayed by adding tags to the web page file. The web page file itself is a text file, and the tags added to it tell the browser how to display the content, for example how to handle the text, how to arrange the screen, and how to display images.

Extensible Markup Language (XML) is a standard text format for representing structured information on the web. XML allows data to be separated from the HTML page that presents it. The resulting XML data is stored in plain text format and can be accessed from an HTML page.

The embodiments of the present application provide a URL display method that obtains in advance the resource content corresponding to the URL to be displayed, determines the category of that content, determines a suitable content presentation form according to the category, and finally presents the URL to be displayed in the article together with the content presentation form. The URL thus conveys its information more clearly and expressively, which in turn makes the user more willing to click.

To this end, in an actual application, in order to solve the technical problem of how to improve the expressiveness of a URL, an embodiment of the present application provides a uniform resource locator (URL) display method. As shown in FIG. 1, the method can be implemented by steps S100 to S130, where:

S100: Acquire a first content, where the first content is at least a part of resource content corresponding to the URL to be displayed.

Here, URL stands for Uniform Resource Locator, commonly known as a web page address. A URL is a concise representation of the location of a resource available on the Internet and of the method for accessing it, and is the standard address of a resource on the Internet. Every file on the Internet has a unique URL, which contains information indicating the location of the file and how the browser should handle it. Any resource content on the Internet (e.g., websites, files, articles, videos) can be referred to by a URL.
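As an illustration of how a URL encodes both an access method and a location, the following minimal Python sketch splits a URL into its parts with the standard library. The URL itself is a hypothetical example and does not come from the application.

```python
from urllib.parse import urlsplit

# Hypothetical URL used only for illustration.
url = "https://video.example.com/watch/show.html?id=42#t=10"

parts = urlsplit(url)
access_method = parts.scheme    # how the resource is accessed
domain_name = parts.hostname    # where the resource is hosted
resource_path = parts.path      # which resource on that host
query = parts.query             # extra parameters sent to the server
fragment = parts.fragment       # position inside the resource
```

The domain name obtained this way is the part of the URL that later steps match against a predetermined domain name list.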

It should be noted here that "at least a part" mentioned herein may be part or all. For example, at least a part of the resource content corresponding to the URL to be displayed may be part of the resource content corresponding to the URL to be displayed, or may be all resource content corresponding to the URL to be displayed.

In some optional embodiments, step S100 may specifically include: obtaining the URL to be displayed; capturing the resource content corresponding to the URL to be displayed by any one or more of the following: a crawler tool, an ASP (Active Server Pages) web crawling tool, Java, PHP (Hypertext Preprocessor), Delphi (a visual development environment), Python (a scripting language that combines interpreted, compiled, interactive, and object-oriented features), Flex (a Flash-based application framework), or Ruby (a cross-platform, object-oriented, interpreted language); and acquiring at least part of the resource content as the first content.

The following takes a crawler tool as an example to illustrate the process of crawling the resource content corresponding to a URL. The crawler tool can be implemented in Java, Python, or the like.

Specifically, a number of initial web pages are preset. Any one of the initial web pages is fetched, and the URLs on it are placed into a queue of URLs whose resource content is waiting to be captured. The resource content corresponding to each URL in the queue is obtained by parsing the page source code. While crawling, new URLs are continuously extracted from the current web page and placed into the queue, and the resource content corresponding to each new URL is acquired, until all pages have been crawled.
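The queue-driven crawl described above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the pages live in a hypothetical in-memory dictionary (PAGES) instead of being downloaded, and the names LinkExtractor and crawl are invented for this example.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "web": URL -> HTML source. A real crawler would
# download each page instead of reading it from this dictionary.
PAGES = {
    "http://a.example/": '<a href="http://a.example/1">1</a><a href="http://b.example/">b</a>',
    "http://a.example/1": "<p>leaf page</p>",
    "http://b.example/": '<a href="http://a.example/">back</a>',
}

class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed_urls):
    """Breadth-first crawl that returns {url: html} for every reachable page."""
    queue = deque(seed_urls)   # URLs whose content is waiting to be captured
    seen = set(seed_urls)      # avoids queueing the same URL twice
    captured = {}
    while queue:
        url = queue.popleft()
        html = PAGES.get(url)
        if html is None:
            continue           # unknown URL: nothing to capture
        captured[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return captured

result = crawl(["http://a.example/"])
```

The seen set is a design choice of this sketch: without it, pages that link back to each other would keep the queue from ever emptying.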

Alternatively, the resource content is captured as follows. According to an existing web page analysis algorithm, URLs unrelated to the topic are filtered out of a preset number of web pages, and the useful URLs are retained and placed into the queue of URLs whose resource content is waiting to be captured. Then, according to a preset search strategy, the URLs are selected from the queue in turn until the resource content corresponding to all of the URLs has been captured.

The preset search strategy may give priority to capturing the resource content corresponding to URLs on important web pages. The importance of a web page can be measured by its link popularity, link importance, and average link depth. Link popularity can be measured by whether a large number of pages link to the page. Link importance can be measured by whether the URL contains ".com" or "home" and by how few "/" characters it contains. The average link depth represents the link distance from each seed site in a seed site set to the web page, where the link distance can be determined by a breadth-first traversal rule.
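As a rough illustration of the link importance criterion only, the sketch below scores a URL from its string and orders a queue by that score. The weights, the function name link_importance, and the sample URLs are assumptions made for this example; link popularity and average link depth are not modeled.

```python
from urllib.parse import urlsplit

def link_importance(url):
    """Toy score computed from the URL string alone (weights are assumptions)."""
    parts = urlsplit(url)
    score = 0
    if ".com" in (parts.hostname or ""):
        score += 2
    if "home" in url.lower():
        score += 1
    # Fewer "/" characters in the path suggests a shallower page.
    score -= parts.path.count("/")
    return score

# Hypothetical queue of URLs waiting to be captured.
waiting = [
    "http://example.org/a/b/c/page.html",
    "http://example.com/home",
    "http://example.com/news/2017/item.html",
]
ordered = sorted(waiting, key=link_importance, reverse=True)
```

Sorting the waiting URLs by such a score is one simple way to capture important pages first.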

The following takes the ASP web crawling tool as an example to illustrate the process of crawling the resource content corresponding to a URL. Specifically, the C# language (an object-oriented programming language) is used to fill in the Form base class expression of the URL to be crawled so as to obtain the parameters of the URL to be crawled. The parameters are then passed to the server by the POST method (a manner of asynchronous communication with the server), and the server captures the resource content corresponding to the URL to be crawled based on the received parameters. In programming, a Form constitutes part of the application user interface.

Those skilled in the art should be able to understand that when content acquisition is actually performed, third party tools can also be used to implement resource content capture.

For example, the jsoup parser (a Java-based HTML parser) can be used to parse the URL to be displayed directly, and the DOM (Document Object Model) or CSS (Cascading Style Sheets) can be used to fetch the resource content corresponding to the URL to be displayed.

Those skilled in the art should understand that the above examples are not exhaustive. Any existing or future manner of obtaining the first content, if applicable to the embodiments of the present application, should also be included in the scope of protection of the present application and is incorporated herein by reference.

S110: Determine a category of the first content.

In some optional embodiments, this step may be implemented by steps S112 to S116, where:

S112: Identify at least a part of the first content.

As an example, this step may identify at least a portion of the first content by identifying the HTML tags based on the HTML source code displayed by the browser. Because HTML tags are predefined, it is possible to identify whether the content in the tag is text, video, or title based on the HTML tag. For a web page, there may be images, texts, etc. Through this step, it can be identified which part is the picture, which part is the text, or which part is the video.

Specifically, for example, the "<audio>" tag defines sound content, so identifying this tag shows that the content is audio. The "<canvas>" tag defines a graphic, so identifying this tag shows that this part is a graphic. The "<video>" tag defines a video, so identifying this tag shows that this part of the content is a video. The "<img>" tag defines an image, so identifying this tag shows that this part of the content is an image. The "<body>" tag defines what the page displays. The "<head>" tag defines the title, character format, language, compatibility, keywords, description, and so on.
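One way to picture this tag-based identification is the following minimal Python sketch built on the standard library HTML parser. The tag-to-content mapping follows the examples above. The label strings, the class name ContentIdentifier, the rule that non-blank character data inside "<body>" counts as text, and the sample page are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Mapping taken from the tag examples in the text; the label strings are
# hypothetical names chosen for this sketch.
TAG_KINDS = {"audio": "audio", "canvas": "graphic", "video": "video", "img": "image"}

class ContentIdentifier(HTMLParser):
    """Records which kinds of content a page contains, based on its tags."""

    def __init__(self):
        super().__init__()
        self.kinds = set()
        self._in_body = False

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self._in_body = True
        if tag in TAG_KINDS:
            self.kinds.add(TAG_KINDS[tag])

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False

    def handle_data(self, data):
        # Assumption: non-blank character data inside <body> counts as text.
        if self._in_body and data.strip():
            self.kinds.add("text")

def identify(html):
    parser = ContentIdentifier()
    parser.feed(html)
    parser.close()
    return parser.kinds

sample = (
    "<html><head><title>Demo</title></head><body>"
    "<p>Some words</p><video src='clip.mp4'></video><img src='a.png'>"
    "</body></html>"
)
kinds = identify(sample)
```

For the sample page, the title inside "<head>" is ignored, while the paragraph, the video, and the image are each recognized by their tags.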

Those skilled in the art should understand that the above examples are not exhaustive. Any existing or future way of identifying at least a portion of the first content, if applicable to the embodiments of the present application, should also be included in the scope of protection of the present application and is incorporated herein by reference.

S114: Extract at least a part of the second content from the identified at least part of the first content.

This step can extract part of the content according to the predefined HTML tags for the presentation of the content of the URL to be displayed.

In some optional embodiments, the step of extracting at least a portion of the second content from the at least one portion of the first content that is identified may specifically include:

S1142: Perform XML serialization on at least a part of the first content.

Here, serialization refers to the process of converting a data structure or object into a binary string.

The process of serialization is described in detail below in a preferred embodiment.

Step a1: Read the entire HTML code of the resource content using the OpenRead method of the WebClient service in C#.NET.

Here, the WebClient service allows Win32 applications to access documents on the Internet. The service extends the networking capabilities of Windows by allowing standard Win32 applications to create, read, and write files on Internet file servers by using WebDAV, a file access protocol described by XML. The OpenRead method is a way to read resources in the WebClient service.

Step a2: Delete irrelevant code.

Because scripts cannot be parsed as XML, everything between each <script> and </script> pair in the code is deleted. That is, scripts that cannot be parsed as XML are irrelevant code.

Step a3: Delete the general irrelevant items in the HTML code.

A general irrelevant item is content in the HTML code that has nothing to do with serialization. Deleting this content does not affect the serialization of the HTML code.

For example, the content between <title> and </title> and the content in <meta name="keywords" content=""/> can be deleted.

Step a4: Label all the HTML code.

Labeling the HTML code means adding a start tag at the beginning of the HTML code and an end tag at its end, so that a single tag encloses all of the HTML code.

Step a5: Use Microsoft's XML serialization method to perform XML serialization on the entire HTML code and generate an XML file.
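Steps a2 to a5 can be approximated as follows. This is a minimal sketch and not the .NET approach named above: it substitutes Python's standard re and xml.etree.ElementTree modules, reads the HTML from a string instead of performing step a1, and only works when the remaining markup is already well formed. The function name html_to_xml, the wrapper tag name "root", and the sample markup are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

def html_to_xml(html):
    """Rough analogue of steps a2 to a5, for well-formed markup only."""
    # a2: drop scripts, which an XML parser cannot handle.
    html = re.sub(r"<script\b.*?</script>", "", html, flags=re.I | re.S)
    # a3: drop general irrelevant items (the title and the keywords meta).
    html = re.sub(r"<title\b.*?</title>", "", html, flags=re.I | re.S)
    html = re.sub(r'<meta\s+name="keywords"[^>]*>', "", html, flags=re.I)
    # a4: enclose all of the code in one start tag and one end tag.
    wrapped = "<root>" + html + "</root>"
    # a5: parse the result as XML and serialize it back to an XML string.
    tree = ET.fromstring(wrapped)
    return ET.tostring(tree, encoding="unicode")

sample = (
    "<html><head><title>Demo</title>"
    '<meta name="keywords" content=""/>'
    "<script>if (a < b) { run(); }</script></head>"
    '<body><p>Hello</p><video src="clip.mp4"/></body></html>'
)
xml_text = html_to_xml(sample)
```

The sample shows why step a2 matters: the "<" inside the script would make the document invalid XML if the script were left in place.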

Those skilled in the art should understand that the above examples are not exhaustive. Any existing or future manner of performing XML serialization on at least a portion of the first content, if applicable to the embodiments of the present application, should also be included in the scope of protection of the present application and is incorporated herein by reference.

S1144: extract at least a part of the XML node information from the XML file generated after the XML serialization, to obtain at least a part of the second content.

An XML file is composed of nodes; each component of the XML file is a node. For example, the entire file is a document node, each XML tag is an element node, the text contained in an XML element is a text node, and each XML attribute is an attribute node. A node is the smallest unit with a valid and complete structure in an XML file.

Since the XML DOM (XML Document Object Model) defines a standard method for accessing and manipulating XML files, the DOM tree corresponding to the XML file can be created according to the XML DOM. The DOM tree includes a root node and multiple child nodes. That is to say, based on the XML DOM, the XML file can be viewed as a tree structure, and all nodes in the XML file can be accessed by accessing the DOM tree.

In a specific implementation, the XML node information can be extracted from the generated XML file through a Get function, or XPath (XML Path Language) can be used to extract the XML node information.
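As a small illustration of XPath-style extraction, the sketch below uses the limited XPath support in Python's xml.etree.ElementTree. The XML text is a hypothetical output of an earlier serialization step, not data from the application.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML produced by an earlier serialization step.
xml_text = (
    "<root><body>"
    "<p>Intro text</p>"
    "<video src='clip.mp4' title='Demo clip'/>"
    "<img src='cover.png'/>"
    "</body></root>"
)
root = ET.fromstring(xml_text)

# ".//video" selects every <video> element at any depth.
video_sources = [node.get("src") for node in root.findall(".//video")]
# "[@src]" keeps only the elements that carry a src attribute.
tags_with_src = [node.tag for node in root.findall(".//*[@src]")]
# A plain path walks the tree from the root element downward.
first_paragraph = root.findtext("./body/p")
```

Each expression returns element nodes, from which attribute and text information can then be read.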

The following example details the process of extracting XML node information (e.g., for a video).

Step b1: Perform a recursive lookup from the root node of the DOM tree to find the smallest node.

Here, the smallest node is the most basic unit, an Element, that is, a node without child nodes.

The recursive lookup may be performed from the root node of the DOM tree according to a preset recursive algorithm until the smallest nodes are found.

Step b2: Parse each of the smallest nodes using an XML parser to obtain string data.

Step b3: The string data is passed to the video attribute information VideoInfo class, wherein the VideoInfo class is a tool for parsing the video data.

Step b4: Locate the string data in each video (Video) class by means of two consecutive layers of extraction, that is, determine the video category to which the string data belongs.

Step b5: Pass the string data in each video (Video) class to the dynamic array (ArrayList) data structure in the search video information (Search Video Info).

Step b6: Correlate the string data in the dynamic array ArrayList data structure with the corresponding Adapter data to obtain the node information of the minimum node, thereby implementing the extraction of the XML node information.
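The recursive lookup of step b1 and the conversion of each smallest node into string data in step b2 can be sketched as follows. The VideoInfo class, the ArrayList structure, and the Adapter data of steps b3 to b6 are not modeled here. The function names smallest_nodes and node_info, the dictionary layout, and the sample XML are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def smallest_nodes(node):
    """Recursively collect the elements that have no child elements."""
    children = list(node)
    if not children:
        return [node]
    leaves = []
    for child in children:
        leaves.extend(smallest_nodes(child))
    return leaves

def node_info(node):
    """Flatten one smallest node into plain data (tag, attributes, text)."""
    return {
        "tag": node.tag,
        "attrs": dict(node.attrib),
        "text": (node.text or "").strip(),
    }

# Hypothetical XML text standing in for the serialized page.
xml_text = (
    "<root><body>"
    "<div><p>Intro</p><video src='clip.mp4'/></div>"
    "<p>Outro</p>"
    "</body></root>"
)
leaves = smallest_nodes(ET.fromstring(xml_text))
infos = [node_info(leaf) for leaf in leaves]
videos = [info for info in infos if info["tag"] == "video"]
```

Filtering the flattened entries by tag, as in the last line, is one simple stand-in for locating the data that belongs to a video.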

Those skilled in the art should understand that the above examples are not exhaustive. Any existing or future method for extracting at least a part of the XML node information, if applicable to the embodiments of the present application, should also be included in the scope of protection of the present application and is incorporated herein by reference.

S116: Determine the category of at least a part of the second content, and take the determined category as the category of the first content.

In some optional embodiments, the step of determining the category of at least a portion of the second content may specifically include: determining a category of at least a portion of the second content by using a domain name and an HTML tag in the URL to be displayed.

In some preferred embodiments, the step of determining the category of at least a portion of the second content by using the domain name and the HTML tag in the URL to be displayed may specifically include:

Step c1: Match the domain name in the URL to be displayed against the predetermined domain name list, where the predetermined domain name list includes predetermined domain names and the first category information corresponding to each predetermined domain name. If the matching succeeds, perform steps c2 to c4; if the matching fails, perform steps c5 and c6.

Step c2: Extract the first category information.

Step c3: Identify the HTML tag in the URL to be displayed, and determine the second category information.

Step c4: Determine at least a part of the category of the second content according to the first category information and the second category information.

Step c5: Identify the HTML tag in the URL to be displayed, and determine the third category information.

Step c6: Determine at least a part of the category of the second content according to the third category information.

In the specific implementation process, the domain name may be identified according to the URL string to be displayed, and the domain name is matched with the predetermined domain name list.

If the matching succeeds, the first category information is extracted from the attribute information of the domain name in the domain name list, and the HTML tag in the URL to be displayed is identified to determine the second category information. Finally, the category of at least a part of the second content is determined according to the first category information in the attribute information of the domain name and the second category information identified from the HTML tag in the URL to be displayed.

If the matching fails, the HTML tag in the URL to be displayed is identified to determine the third category information, and the category of at least a part of the second content is determined according to the third category information. Specifically, the category corresponding to the third category information may be taken as the category of at least a part of the second content. Thus, if the matching fails, the category of at least a part of the second content can be determined using only the HTML tag in the URL to be displayed.

Taking the Youku website as an example, the domain name is identified in the URL string and matched against the domain name list. If the matching succeeds, the URL is determined to belong to the Youku website, and the category information in the attribute information related to the domain name is extracted from the predetermined domain name list. The predefined category information in the attribute information corresponding to the domain name indicates that the content corresponding to the domain name is mainly video, so the video category information is extracted. The HTML tag is then identified, which shows that the content also contains text; the category of this part of the content is therefore determined to be video combined with text. If the matching fails, the HTML tag is identified, which shows that the content contains text; the category of the content is therefore determined to be text.

As another example, for the Taobao website, the domain name is extracted from the URL string and matched against the domain name list. If the matching succeeds, the URL is determined to belong to the Taobao website, and the predefined category information in the attribute information of the domain name, namely mixed image and text, is extracted from the predetermined domain name list. The HTML tag is then identified, which shows that the content further includes video; the category of the content is therefore determined to be mixed image and text combined with video. If the matching fails, the HTML tag is identified, and once the content is determined to include video, the category of the content is determined to be video.
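Steps c1 to c6 and the two website examples can be sketched roughly as follows. This is a minimal illustration, not the applicant's implementation: the contents of `DOMAIN_CATEGORIES` and `TAG_CATEGORIES`, the host name `www.youku.com`, and the function names are assumptions, and the "HTML tag in the URL to be displayed" is interpreted here as the tags found in the HTML retrieved for that URL.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical predetermined domain name list:
# domain name -> first category information (a set of media kinds).
DOMAIN_CATEGORIES = {
    "www.youku.com": {"video"},
    "www.taobao.com": {"image", "text"},
}

# Hypothetical mapping from HTML tag names to media kinds.
TAG_CATEGORIES = {"video": "video", "img": "image", "p": "text", "audio": "audio"}


class TagCollector(HTMLParser):
    """Collects the media kinds implied by the start tags it encounters."""

    def __init__(self):
        super().__init__()
        self.kinds = set()

    def handle_starttag(self, tag, attrs):
        kind = TAG_CATEGORIES.get(tag)
        if kind is not None:
            self.kinds.add(kind)


def kinds_from_html(html):
    collector = TagCollector()
    collector.feed(html)
    return collector.kinds


def classify(url, html):
    """Returns the sorted media kinds that make up the content category."""
    domain = urlsplit(url).hostname
    tag_kinds = kinds_from_html(html)       # second or third category information
    first = DOMAIN_CATEGORIES.get(domain)   # step c1: match against the list
    if first is not None:                   # steps c2 to c4: combine both sources
        return sorted(first | tag_kinds)
    return sorted(tag_kinds)                # steps c5 and c6: HTML tags only
```

With these assumed tables, a Youku page containing a paragraph yields video combined with text, while the same page under an unlisted domain yields text only, mirroring the examples above.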

In the above embodiments, a URL is composed as follows: protocol://domain name:port/virtual directory/file name?parameters#anchor. An IP address may also be used in place of the domain name. Taking http://www.aspxfans.com:8080/news/index.asp?boardID=5&ID=24618&page=1#name as an example, www.aspxfans.com is the domain name.
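For illustration, the URL components listed above can be separated with Python's standard `urllib.parse` module. The helper name `split_url` and the dictionary keys are assumptions chosen to match the wording of this description.

```python
from urllib.parse import urlsplit, parse_qs


def split_url(url):
    """Splits a URL into the components named in the description."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "domain": parts.hostname,   # may also be an IP address
        "port": parts.port,         # None when the URL gives no port
        "path": parts.path,         # virtual directory and file name
        "parameters": parse_qs(parts.query),
        "anchor": parts.fragment,
    }


components = split_url(
    "http://www.aspxfans.com:8080/news/index.asp?boardID=5&ID=24618&page=1#name"
)
```

Only the `domain` entry is needed for matching against the predetermined domain name list; the other components are shown for completeness.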

It should be noted that, when determining the category, the step of identifying the domain name and the step of determining the content category by using the HTML tag may be performed simultaneously or one after the other. If they are performed one after the other, the embodiments of the present application do not limit their order: the HTML tag may first be used to identify the category of the content and the category information then extracted through the domain name, or the category information may first be extracted through the domain name and the HTML tag then used to identify the category of the content.

It should be understood by those skilled in the art that the above examples are not exhaustive, and any existing or future method for classifying at least a portion of the second content, if applicable to the embodiments of the present application, should also fall within the protection scope of the present application and is incorporated herein by reference.

S120: Determine a content presentation form according to the category.

In some optional embodiments, the step may specifically include: determining a content presentation form according to the category and the domain name in the URL to be displayed, and based on at least a portion of the second content.

For example, if the category is determined to be mixed image and text, the domain name is www.taobao.com, and the second content obtained is the image-and-text content on the Taobao website, a structure of a large image plus text, combined with the image-and-text content of the Taobao website, may be used as the content presentation form.

As another example, if the category is determined to be mixed image and text, the domain name is www.qq.com, and the second content obtained is the image-and-text content on the Tencent website, a structure of a small image plus text, combined with the image-and-text content of the Tencent website, may be used as the content presentation form.

When determining the content presentation form, the embodiments of the present application may adopt any one or more of the following structural forms: large image with text, small image with text, large window with text, small window with text, image with audio, and so on. The layout of a structural form may be top-and-bottom, left-and-right, or diagonal, but is not limited thereto. Those skilled in the art should understand that any existing or future content presentation form that may be applied to the embodiments of the present application also falls within the protection scope of the present application and is incorporated herein by reference.
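One possible way to express the selection of a content presentation form from the category and the domain name is a simple lookup, sketched below. The two site rules follow the Taobao and Tencent examples in this description; the layouts assigned to them, the per-category defaults, the fallback, and all names (`PresentationForm`, `SITE_RULES`, `CATEGORY_DEFAULTS`, `choose_form`) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PresentationForm:
    structure: str  # e.g. "large image with text"
    layout: str     # "top-bottom", "left-right" or "diagonal"


# Rules keyed by (category, domain name). The structures follow the examples
# in the description; the layouts are assumed.
SITE_RULES = {
    ("image-text", "www.taobao.com"): PresentationForm("large image with text", "top-bottom"),
    ("image-text", "www.qq.com"): PresentationForm("small image with text", "left-right"),
}

# Assumed defaults used when no site-specific rule exists.
CATEGORY_DEFAULTS = {
    "image-text": PresentationForm("small image with text", "left-right"),
    "video": PresentationForm("small window with text", "top-bottom"),
}

FALLBACK = PresentationForm("text only", "top-bottom")


def choose_form(category, domain):
    """Picks a presentation form from the category and the domain name."""
    form = SITE_RULES.get((category, domain))
    if form is None:
        form = CATEGORY_DEFAULTS.get(category, FALLBACK)
    return form
```

Keeping the rules in data rather than in code makes it straightforward to offer several candidate forms for the user to choose from, or to let a back end adjust them.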

Those skilled in the art should understand that the above examples are not exhaustive, and any existing or future manner of determining the content presentation form according to the category, if applicable to the embodiments of the present application, should also fall within the protection scope of the present application and is incorporated herein by reference.

S130: Display the URL to be displayed according to the content presentation form.

For example, in this step, the URL string to be displayed can be displayed in a given layout according to the structure of a large image plus text. The layout may be top-and-bottom, left-and-right, or diagonal, but is not limited thereto.

In practice, the content presentation form used for display may be selected manually by the user or pushed automatically by the back end.

Compared with existing methods that offer no choice of style, the embodiments of the present application can intelligently recommend a plurality of presentation styles, thereby helping the author enhance the presentation of the article.

In the embodiments of the present application, the first content is acquired, where the first content is at least a part of the resource content corresponding to the URL to be displayed; the category of the first content is then determined; the content presentation form is determined according to the category; and finally, the URL to be displayed is displayed according to the content presentation form. In other words, the content presentation form is determined by determining the category of the resource content corresponding to the URL to be displayed, and the URL is then displayed in that form. For example, the URL string to be displayed may be shown in a content presentation form of mixed image and text combined with a small-window video; as another example, the URL to be displayed may be shown, above the URL string, in a content presentation form of a large-window video combined with text. Beautifying the URL to be displayed in this way solves the technical problem of how to improve the expressiveness of a URL and enhances the presentation of the article, thereby increasing its appeal to users and, in turn, the click rate.

The present application is described in detail below by way of a preferred embodiment with reference to FIG. 2.

S200: Obtain the first content by using one or more of the following: a crawler tool, an ASP web crawler, a Java method, a PHP method, a Delphi method, a Python method, a Flex method, or a Ruby method; where the first content is at least a part of the resource content corresponding to the URL.

S210: Identify at least a part of the first content.

S220: Perform XML serialization on at least a part of the first content.

S230: extract at least a part of the XML node information from the XML file generated after the XML serialization, to obtain at least a part of the second content.

S240: Determine the category of at least a part of the second content by using the domain name and the HTML tag in the URL to be displayed, and take the determined category as the category of the first content.

S250: Determine a content presentation form according to the category and the domain name in the URL to be displayed, and based on at least a part of the second content.

S260: Display the URL to be displayed according to the content presentation form.
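The flow from S200 to S260 can be summarized as a small pipeline. The sketch below only shows how the steps hand data to each other; the concrete fetching, extraction, classification and form selection are passed in as functions, and every name in it (`Card`, `display_url`, the parameter names) is an assumption made for illustration.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Card:
    """What is finally shown in place of the bare URL string."""
    url: str
    domain: str
    category: str
    form: str
    content: str


def display_url(url, fetch, extract, classify, choose_form):
    first_content = fetch(url)                             # S200: crawl the resource
    second_content = extract(first_content)                # S210 to S230: identify, serialize, extract
    domain = urlsplit(url).hostname
    category = classify(domain, second_content)            # S240: domain name + HTML tags
    form = choose_form(category, domain, second_content)   # S250: pick the presentation form
    return Card(url, domain, category, form, second_content)  # S260: data to render
```

Because the steps are injected, a caller can swap in any crawler or classifier without changing the pipeline itself.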

With this preferred embodiment, beautifying the URL helps the reader obtain more information without clicking the link. This improves the expressiveness of the URL and enhances the presentation of the article, which increases its appeal to users and, in turn, can increase the click rate.

In addition, in order to assist the author to enhance the presentation of the article, the embodiment of the present application further provides an information expression method. The method may include any of the foregoing URL presentation method embodiments, and may specifically include: acquiring an article to be processed, and displaying the URL in the article to be processed by using any of the URL display methods described above.

In the embodiment of the present application, the article to be processed is obtained, and the URL in the article to be processed is then displayed through the URL display method. Since the URL display method can accurately present the resource content corresponding to the URL, the expressiveness of the URL is improved and the presentation of the article is further enhanced, thereby increasing the appeal to users and the click rate.
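A minimal sketch of the information expression method is given below: it finds the URLs in an article and replaces each one with whatever a URL display function returns. The regular expression is a deliberately simple assumption (it does not strip trailing punctuation, for instance), and the names `find_urls` and `express` are hypothetical.

```python
import re

# Simplified URL pattern; an assumption for this sketch, not a full URL grammar.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def find_urls(article):
    """Returns the URLs found in the article text, in order of appearance."""
    return URL_PATTERN.findall(article)


def express(article, display):
    """Replaces every URL in the article with the output of display(url)."""
    return URL_PATTERN.sub(lambda match: display(match.group(0)), article)
```

Here `display` stands for any of the URL display methods described above; for instance, it could return the markup of a large image with text for one URL and a small-window video for another.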

For the description of the embodiment, reference may be made to the description of any of the foregoing URL display method embodiments, and details are not described herein again.

It should be noted that the steps in the foregoing embodiment of the method for displaying the URL and the method for expressing the information may be performed in sequence or in parallel, which is not limited herein.

The embodiment of the present application further provides a uniform resource locator URL display device based on the same technical concept as the URL display method embodiment. The device embodiment can perform the above method embodiments. As shown in FIG. 3, the apparatus 30 can include an acquisition module 32, a first determination module 34, a second determination module 36, and a presentation module 38. The obtaining module 32 is configured to obtain the first content, where the first content is at least a part of the resource content corresponding to the URL to be displayed. The first determining module 34 is configured to determine a category of the first content. The second determining module 36 is configured to determine a content presentation form according to the category. The presentation module 38 is configured to display the URL to be displayed in a content presentation form.

In some optional embodiments, on the basis of the embodiment shown in FIG. 3, the obtaining module may specifically include: a URL obtaining subunit, a crawling unit, and a first content acquisition subunit. The URL obtaining subunit is configured to obtain a URL to be displayed. The crawling unit is configured to capture the resource content corresponding to the URL to be displayed by using any one or more of the following: a crawler tool, an ASP web crawler, a Java method, a PHP method, a Delphi method, a Python method, a Flex method, or a Ruby method. The first content acquisition subunit is configured to acquire at least a part of the resource content as the first content.

In some optional embodiments, based on the embodiment shown in FIG. 3, the first determining module may specifically include: an identifying unit, an extracting unit, and a classifying unit. The identification unit is configured to identify at least a portion of the first content. The extracting unit is configured to extract at least a portion of the second content from the identified at least a portion of the first content. The classification unit is configured to determine the category of the at least a portion of the second content, and determine the determined category as the category of the first content.

In some optional embodiments, based on the foregoing embodiment, the extracting unit may specifically include: a serializing unit and an extracting subunit. The serialization unit is configured to perform XML serialization on at least a part of the first content. The extracting subunit is configured to extract at least a part of the XML node information from the XML file generated after the XML serialization to obtain the at least part of the second content.

In some optional embodiments, based on the foregoing embodiment, the classification unit may specifically include: a classification subunit. The classification subunit is configured to determine a category of the at least a portion of the second content by using a domain name and an HTML tag in the URL to be displayed.

In some preferred embodiments, the foregoing classification subunit further includes: a matching unit, a category extracting unit, a category determining unit, and a content classification unit. The matching unit is configured to match the domain name in the URL to be displayed against a predetermined domain name list, where the predetermined domain name list includes a predetermined domain name and first category information corresponding to the predetermined domain name. The category extracting unit is configured to extract the first category information when the matching succeeds. The category determining unit is configured to identify an HTML tag in the URL to be displayed and determine second category information. The content classification unit is configured to determine the category of the at least a portion of the second content according to the first category information and the second category information.

Based on the above preferred embodiments, the URL display device may further include an execution unit. The execution unit is configured to, when the matching fails, identify an HTML tag in the URL to be displayed, determine third category information, and determine the category of the at least a portion of the second content according to the third category information.

In some optional embodiments, based on the foregoing embodiment, the foregoing second determining module may specifically include a determining subunit. The determining subunit is configured to determine a content presentation form according to the category and the domain name in the URL to be displayed, and based on at least a portion of the second content.

In addition, in order to assist the author to enhance the presentation of the article, the embodiment of the present application further provides an information expression system. As shown in FIG. 4, the system can include a URL presentation device and an expression module. The expression module is configured to obtain an article to be processed, and display the URL in the to-be-processed article by using the URL display device.

Those skilled in the art will appreciate that the above-described URL presentation apparatus embodiments and information expression system embodiments may also include some well-known structures such as a processor, a memory, a bus, and the like. The processor is connected to the memory through a bus. Processors include, but are not limited to, ARM, programmable logic devices, DSPs, and the like. The memory can be either a random access memory or a read only memory. The bus can include a data bus, an address bus, and a control bus.

It should be noted that, in the foregoing embodiments of the URL display device and the information expression system, the division into modules is merely exemplary. Those skilled in the art should understand that the modules may be divided in other ways, and that the divided modules may be further split or combined; the modules may also be executed sequentially or in parallel, which is not limited herein.

The embodiment of the present application further provides an electronic device, as shown in FIG. 5, including: a processor 501, a communication interface 502, a memory 503, and a communication bus 504, wherein the processor 501, the communication interface 502, and the memory 503 communicate with each other through the communication bus 504;

a memory 503, configured to store a computer program;

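The processor 501 is configured to implement the steps of the foregoing uniform resource locator (URL) display method when executing the program stored in the memory 503, the method including:

Acquiring first content, where the first content is at least a part of the resource content corresponding to the URL to be displayed;

Determining a category of the first content;

Determining a content presentation form according to the category;

Displaying the URL to be displayed according to the content presentation form.

In an implementation manner of the application, the acquiring the first content specifically includes:

Obtaining the URL to be displayed;

Capturing the resource content corresponding to the URL to be displayed by any one or more of the following: a crawler tool, an ASP web crawler, a Java method, a PHP method, a Delphi method, a Python method, a Flex method, or a Ruby method;

Acquiring at least a part of the resource content as the first content.

In an implementation manner of the application, the determining the category of the first content includes:

Identifying at least a portion of the first content;

Extracting at least a portion of the second content from the identified at least a portion of the first content;

Determining the category of the at least a portion of the second content, and taking the determined category as the category of the first content.

In an implementation manner of the application, the extracting at least a portion of the second content from the identified at least a portion of the first content includes:

XML serializing the at least a portion of the first content;

Extracting at least a part of the XML node information from the XML file generated by the XML serialization to obtain the at least a portion of the second content.

In an implementation manner of the application, the determining the category of the at least a portion of the second content includes:

Determining the category of the at least a portion of the second content by using the domain name and the HTML tag in the URL to be displayed.

In an implementation manner of the application, the determining the category of the at least a portion of the second content by using the domain name and the HTML tag in the URL to be displayed specifically includes:

Matching the domain name in the URL to be displayed against a predetermined domain name list, where the predetermined domain name list includes a predetermined domain name and first category information corresponding to the predetermined domain name;

If the matching is successful, extracting the first category information;

Identifying an HTML tag in the URL to be displayed, and determining second category information;

Determining the category of the at least a portion of the second content according to the first category information and the second category information.

In an implementation manner of the application, the method further includes:

If the matching fails, identifying an HTML tag in the URL to be displayed, and determining third category information;

Determining the category of the at least a portion of the second content according to the third category information.

In an implementation manner of the application, the determining the content presentation form according to the category includes:

Determining the content presentation form according to the category and the domain name in the URL to be displayed, and based on the at least a portion of the second content.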

In the embodiments of the present application, the first content is acquired, where the first content is at least a part of the resource content corresponding to the URL to be displayed; the category of the first content is determined; the content presentation form is determined according to the category; and the URL to be displayed is displayed according to the content presentation form. The technical solution thus determines the content presentation form by judging the category of the resource content corresponding to the URL, and displays the URL in that form. Compared with directly displaying the characters of the URL, displaying the URL according to the content presentation form improves the expressiveness of the URL and enhances the presentation of the article, which can increase its appeal to users and, in turn, the click rate. Compared with having the author subjectively summarize the resource content corresponding to the URL and then display the URL as a text summary, the content involved in the content presentation form provided by this technical solution is obtained objectively. That is, the embodiments of the present application objectively acquire the first content, determine its category, determine the content presentation form according to the category, and finally display the URL objectively and faithfully according to that form. The resource content corresponding to the URL can therefore be presented accurately, which improves the expressiveness of the URL, enhances the presentation of the article, and increases the appeal to users and the click rate.

The embodiment of the present application further provides an electronic device, including: a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus;

a memory for storing a computer program;

The processor is configured to implement the steps of the information expression method when executing the program stored in the memory, the method including:

Acquiring an article to be processed, and displaying the URL in the article to be processed by using any of the URL display methods described above.

The embodiment of the present application further discloses a storage medium, where the storage medium is used to store an application, and the application is configured to execute the steps of the uniform resource locator (URL) display method at runtime, the method including:

Acquiring first content, where the first content is at least a part of the resource content corresponding to the URL to be displayed;

Determining a category of the first content;

Determining a content presentation form according to the category;

Displaying the URL to be displayed according to the content presentation form.

For the optional implementations of each step, reference may be made to the description of any of the foregoing URL display method embodiments, and details are not described herein again.

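The embodiment of the present application further discloses a storage medium, where the storage medium is used to store an application, and the application is configured to execute the steps of the information expression method at runtime, the method including:

Acquiring an article to be processed, and displaying the URL in the article to be processed by using any of the URL display methods described above.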

The embodiment of the present application discloses an application, where the application is used to execute the uniform resource locator (URL) display method at runtime, the method including:

Acquiring first content, where the first content is at least a part of the resource content corresponding to the URL to be displayed;

Determining a category of the first content;

Determining a content presentation form according to the category;

Displaying the URL to be displayed according to the content presentation form.

For the optional implementations of each step, reference may be made to the description of any of the foregoing URL display method embodiments, and details are not described herein again.

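The embodiment of the present application discloses an application program for executing the steps of the information expression method at runtime, the method including:

Acquiring an article to be processed, and displaying the URL in the article to be processed by using any of the URL display methods described above.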

It should also be noted that, in this document, relational terms such as first and second are used merely to distinguish one entity or operation from another, without necessarily requiring or implying any actual relationship or order between such entities or operations. Furthermore, the terms "comprise" and "include", or any other variant thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or device that comprises a plurality of elements includes not only those elements but also other elements, or elements that are inherent to such a process, method, article, or device. An element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that comprises the element.

The various embodiments in the present specification are described in a related manner, and the same or similar parts between the various embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and the relevant parts can be referred to the description of the method embodiment.

The above description is only the preferred embodiment of the present application, and is not intended to limit the scope of the present application. Any modifications, equivalent substitutions, improvements, etc. made within the spirit and principles of the present application are included in the scope of the present application.

Claims

A uniform resource locator URL display method, the method comprising:

Acquiring the first content, where the first content is at least a part of resource content corresponding to the URL to be displayed;

Determining a category of the first content;

Determining a content presentation form according to the category;

Displaying the URL to be displayed according to the content presentation form.
The method according to claim 1, wherein the obtaining the first content specifically comprises:

Get the URL to be displayed;

The resource content corresponding to the URL to be displayed is captured by any one or more of the following methods: a crawler tool, an ASP web crawler, a Java method, a PHP method, a Delphi method, a Python method, a Flex method, or a Ruby method;

Obtaining at least a portion of the resource content as the first content.
The method according to claim 1, wherein the determining the category of the first content comprises:

Identifying at least a portion of the first content;

Extracting at least a portion of the second content from the identified at least a portion of the first content;

Determining the category of the at least a portion of the second content, and determining the determined category as the category of the first content.
The method according to claim 3, wherein the extracting at least a portion of the second content from the identified at least a portion of the first content comprises:

XML serializing the at least a portion of the first content;

Extracting at least a portion of the XML node information from the XML file generated by the XML serialization to obtain the at least a portion of the second content.
The method according to claim 4, wherein the determining the category of the at least a portion of the second content comprises:

Determining the category of the at least a portion of the second content by using the domain name and the HTML tag in the URL to be displayed.
The method according to claim 5, wherein the determining the category of the at least a portion of the second content by using the domain name and the HTML tag in the URL to be displayed includes:

Matching the domain name in the URL to be displayed with a predetermined domain name list; wherein the predetermined domain name list includes a predetermined domain name and first category information corresponding to the predetermined domain name;

If the matching is successful, extracting the first category information;

Identifying an HTML tag in the URL to be displayed, and determining second category information;

Determining the category of the at least a portion of the second content based on the first category information and the second category information.
The method of claim 6 wherein the method further comprises:

If the matching fails, identifying an HTML tag in the URL to be displayed, and determining third category information;

Determining the category of the at least a portion of the second content based on the third category information.
The method according to claim 3, wherein the determining the content presentation form according to the category comprises:

Determining the content presentation form according to the category and the domain name in the URL to be displayed, and based on the at least a portion of the second content.
An information expression method, the method comprising:

Obtaining an article to be processed, and displaying the URL in the article to be processed by the URL display method according to any one of claims 1-8.
A uniform resource locator URL display device, characterized in that the device comprises:

An acquiring module, configured to acquire the first content, where the first content is at least a part of resource content corresponding to the URL to be displayed;

a first determining module, configured to determine a category of the first content;

a second determining module, configured to determine a content presentation form according to the category;

And a display module, configured to display the to-be-displayed URL according to the content presentation form.
The device according to claim 10, wherein the obtaining module specifically comprises:

a URL acquisition subunit for obtaining a URL to be displayed;

a crawling unit, configured to capture the resource content corresponding to the URL to be displayed by using any one or more of the following: a crawler tool, an ASP web crawler, a Java method, a PHP method, a Delphi method, a Python method, a Flex method, or a Ruby method;

The first content acquisition subunit is configured to acquire at least a part of the resource content as the first content.
The device according to claim 10, wherein the first determining module specifically comprises:

An identification unit, configured to identify at least a portion of the first content;

An extracting unit, configured to extract at least a part of the second content from the identified at least part of the first content;

a classifying unit, configured to determine a category of the at least a portion of the second content, and determine the determined category as a category of the first content.
The device according to claim 12, wherein the extracting unit specifically comprises:

a serializing unit, configured to perform XML serialization on the at least a part of the first content;

And an extraction subunit, configured to extract at least a part of the XML node information from the XML file generated after the XML serialization, to obtain the at least part of the second content.
The device according to claim 13, wherein the classifying unit specifically comprises:

a classification subunit, configured to determine a category of the at least a portion of the second content by using a domain name and an HTML tag in the URL to be displayed.
The apparatus according to claim 14, wherein the classification subunit specifically comprises:

a matching unit, configured to match a domain name in the URL to be displayed with a predetermined domain name list, where the predetermined domain name list includes a predetermined domain name and first category information corresponding to the predetermined domain name;

a class extracting unit, configured to extract the first category information if the matching is successful;

a category determining unit, configured to identify an HTML tag in the URL to be displayed, and determine second category information;

a content classification unit, configured to determine, according to the first category information and the second category information, a category of the at least a portion of the second content.
The device according to claim 15, wherein the device further comprises:

an execution unit, configured to identify an HTML tag in the to-be-displayed URL, determine third category information, and determine a category of the at least a portion of the second content according to the third category information.
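
For illustration, the matching, category extraction, and tag-based classification recited above might be sketched as follows in Python. The contents of `DOMAIN_CATEGORIES` and `TAG_CATEGORIES`, the rule for combining the first and second category information, and the use of the tag-only path when domain matching fails are all assumptions of this sketch rather than features stated in the claims.

```python
from urllib.parse import urlparse

# Hypothetical predetermined domain name list:
# predetermined domain name -> first category information.
DOMAIN_CATEGORIES = {
    "youtube.com": "video",
    "github.com": "code",
}

# Hypothetical mapping from HTML tags to category information.
TAG_CATEGORIES = {
    "video": "video",
    "audio": "audio",
    "article": "article",
    "img": "image",
}


def domain_of(url: str) -> str:
    """Return the host of `url` without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def category_from_tags(tags) -> str:
    """Return the category of the first recognized HTML tag."""
    for tag in tags:
        if tag in TAG_CATEGORIES:
            return TAG_CATEGORIES[tag]
    return "generic"


def classify(url: str, tags) -> str:
    """Determine the category from the domain name and the HTML tags."""
    first = DOMAIN_CATEGORIES.get(domain_of(url))
    if first is None:
        # Assumed fallback: no domain match, so the tags alone
        # supply the (third) category information.
        return category_from_tags(tags)
    second = category_from_tags(tags)
    # Assumed combination rule: refine the domain category with the tag
    # category only when the two differ.
    return first if second in ("generic", first) else f"{first}/{second}"
```

Under these assumptions, `classify("https://github.com/a/b", ["article"])` yields `"code/article"`, while an unlisted domain is classified from its tags only.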
The device according to claim 12, wherein the second determining module specifically comprises:

a determining subunit, configured to determine the content presentation form based on the category and the domain name in the to-be-displayed URL, and based on the at least a portion of the second content.
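
One possible reading of this determination is sketched below in Python: the category selects a layout, the domain name supplies a label, and the second content supplies the title and thumbnail. The `PresentationForm` fields and the layout names are hypothetical and serve only to make the three inputs concrete.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from category to layout name.
LAYOUT_BY_CATEGORY = {
    "video": "player_card",
    "image": "thumbnail_card",
    "article": "summary_card",
}


@dataclass
class PresentationForm:
    layout: str               # chosen from the category
    label: str                # taken from the domain name
    title: str                # taken from the second content
    thumbnail: Optional[str]  # taken from the second content, if present


def determine_presentation(category: str, domain: str, second_content: dict) -> PresentationForm:
    """Combine category, domain name, and second content into one form."""
    return PresentationForm(
        layout=LAYOUT_BY_CATEGORY.get(category, "plain_link"),
        label=domain,
        title=second_content.get("title") or domain,
        thumbnail=second_content.get("image"),
    )
```

An unknown category falls back to a plain link in this sketch, so the URL is still displayed when no richer form applies.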
An information expression system, characterized in that the system comprises:

A URL display device as claimed in any one of claims 10-17;

an expression module, configured to obtain a to-be-processed article, and display the URL in the to-be-processed article by using the URL display device.
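
As a minimal sketch, the expression module could locate each URL in the article and hand it to the URL display device, represented here by a caller-supplied `display_url` function. The regular expression, the trailing-punctuation rule, and the function names are assumptions for illustration.

```python
import re
from typing import Callable

# Assumed pattern: http(s) URLs up to the next whitespace, quote, or angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def express_article(article: str, display_url: Callable[[str], str]) -> str:
    """Replace every URL in `article` with its displayed form."""

    def _replace(match: "re.Match[str]") -> str:
        url = match.group(0)
        # Keep sentence punctuation that follows the URL outside the display.
        trimmed = url.rstrip(".,;:!?)")
        return display_url(trimmed) + url[len(trimmed):]

    return URL_PATTERN.sub(_replace, article)
```

For example, calling `express_article(text, lambda u: f"[{u}]")` wraps each URL in brackets and leaves the rest of the article unchanged.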
An electronic device, comprising: a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;

a memory for storing a computer program;

the processor, configured to implement the method of any of claims 1-8, or the method of claim 9, when executing the program stored in the memory.
A storage medium, wherein the storage medium is for storing an application, the application being configured to perform, at runtime, the method of any one of claims 1-8 or the method of claim 9.
An application, characterized in that the application is operative to perform the method of any of claims 1-8 at runtime or to implement the method of claim 9.