WO2018214964A1 - Uniform resource locator display method, information expression method and related product - Google Patents

Uniform resource locator display method, information expression method and related product

Info

Publication number
WO2018214964A1
Authority
WO
WIPO (PCT)
Prior art keywords
content
url
category
displayed
determining
Prior art date
Application number
PCT/CN2018/088438
Other languages
French (fr)
Chinese (zh)
Inventor
周显
Original Assignee
北京金山办公软件股份有限公司
珠海金山办公软件有限公司
广州金山移动科技有限公司
Priority date
Filing date
Publication date
Application filed by 北京金山办公软件股份有限公司, 珠海金山办公软件有限公司, 广州金山移动科技有限公司
Publication of WO2018214964A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9558 Details of hyperlinks; Management of linked annotations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/14 Tree-structured documents
    • G06F40/143 Markup, e.g. Standard Generalized Markup Language [SGML] or Document Type Definition [DTD]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates

Definitions

  • the present application relates to the field of Internet technologies, and in particular to a uniform resource locator display method, an information expression method, a uniform resource locator display device, and an information expression system.
  • conventionally, a URL is shown in an article in one of two ways: the first way is to write the URL string directly into the article; the second way is to set the URL as a text hyperlink, with the author describing the resource content corresponding to the URL in text form according to his or her own understanding.
  • the first way merely displays the URL string itself; the URL string carries no content information for the user, so the URL cannot be displayed expressively.
  • the second way tries to help the user understand the link and to attract the user's click; however, since the text description is the author's subjective summary, the text description has in essence become part of the article and does not display the URL objectively and realistically. Moreover, what words can express is limited and depends on the author's ability to express; therefore, this way of presenting the URL is not very expressive either.
  • the purpose of the embodiment of the present application is to provide a uniform resource locator URL display method to solve at least the technical problem of how to improve the expressiveness of a URL.
  • an information expression method, a uniform resource locator URL display device, and an information expression system are also provided.
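  • a uniform resource locator URL display method may include: acquiring first content, where the first content is at least a part of the resource content corresponding to the URL to be displayed; determining a category of the first content; determining a content presentation form according to the category; and displaying the URL to be displayed according to the content presentation form.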
  • the obtaining the first content may include:
  • the resource content corresponding to the URL to be displayed is captured by any one or more of the following methods: a crawler tool, an ASP web crawler, a Java method, a PHP method, a Delphi method, a Python method, a Flex method, or a Ruby method;
  • the determining the category of the first content may specifically include:
  • the extracting at least a part of the second content from the at least one part of the first content that is identified includes:
  • the determining the category of the at least part of the second content includes:
  • the determining, by using the domain name and the HTML tag in the URL to be displayed, the category of the at least part of the second content specifically:
  • the predetermined domain name list includes a predetermined domain name and first category information corresponding to the predetermined domain name
  • the method further includes:
  • the determining the content presentation form according to the category includes:
  • An information expression method may include:
  • a uniform resource locator URL display device may include:
  • An acquiring module configured to acquire the first content, where the first content is at least a part of resource content corresponding to the URL to be displayed;
  • a first determining module configured to determine a category of the first content
  • a second determining module configured to determine a content presentation form according to the category
  • a display module configured to display the to-be-displayed URL according to the content presentation form.
  • the acquiring module may specifically include:
  • a URL acquisition subunit for obtaining a URL to be displayed
  • the crawling unit is configured to capture the resource content corresponding to the URL to be displayed by using any one or more of the following methods: a crawler tool, an ASP web crawling tool, a Java method, a PHP method, a Delphi method, a Python method, a Flex method, or a Ruby method;
  • the first content acquisition subunit is configured to acquire at least a part of the resource content as the first content.
  • the first determining module may specifically include:
  • An identification unit configured to identify at least a portion of the first content
  • An extracting unit configured to extract at least a part of the second content from the identified at least part of the first content
  • a classifying unit configured to determine a category of the at least a portion of the second content, and determine the determined category as a category of the first content.
  • the extracting unit specifically includes:
  • a serializing unit configured to perform XML serialization on the at least a part of the first content
  • an extraction subunit configured to extract at least a part of the XML node information from the XML file generated after the XML serialization, to obtain the at least part of the second content.
  • the classifying unit specifically includes:
  • a classification subunit configured to determine a category of the at least a portion of the second content by using a domain name and an HTML tag in the URL to be displayed.
  • the classification subunit specifically includes:
  • a matching unit configured to match a domain name in the URL to be displayed with a predetermined domain name list, where the predetermined domain name list includes a predetermined domain name and first category information corresponding to the predetermined domain name;
  • a class extracting unit configured to extract the first category information if the matching is successful
  • a category determining unit configured to identify an HTML tag in the URL to be displayed, and determine second category information
  • a content classification unit configured to determine, according to the first category information and the second category information, a category of the at least a portion of the second content.
  • the device further includes:
  • an execution unit configured to identify an HTML tag in the to-be-displayed URL, determine a third category information, and determine a category of the at least a portion of the second content according to the third category information.
  • the second determining module specifically includes:
  • a determining subunit configured to determine the content presentation form based on the category and the domain name in the to-be-displayed URL, and based on the at least a portion of the second content.
  • An information expression system may include:
  • an expression module configured to obtain a to-be-processed article, and display the URL in the to-be-processed article by using the URL display device.
  • an embodiment of the present application further discloses an electronic device, including: a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory complete communication with each other through the communication bus.
  • a memory configured to store a computer program
  • a processor configured to implement the above-described uniform resource locator URL display method, or the above information expression method, when executing the program stored in the memory.
  • the embodiment of the present application further discloses a storage medium, where the storage medium is used to store an application, and the application is configured to execute, at runtime, the foregoing uniform resource locator URL display method or the foregoing information expression method.
  • the embodiment of the present application discloses an application program for executing the foregoing uniform resource locator URL display method or the above information expression method at runtime.
  • the embodiment of the present application provides a uniform resource locator URL display method, an information expression method, a uniform resource locator URL display device, and an information expression system.
  • the uniform resource locator URL display method may include: acquiring the first content, where the first content is at least a part of the resource content corresponding to the URL to be displayed; determining a category of the first content; determining a content presentation form according to the category; and displaying the URL to be displayed according to the content presentation form.
  • the technical solution determines the content presentation form by judging the category of the resource content corresponding to the URL, thereby displaying the URL through the content presentation form.
  • the technical solution provided by the embodiment of the present application enhances the expressiveness of the URL and enhances the presentation power of the article by displaying the URL according to the content presentation form. It can increase the appeal to users, which in turn can increase the user's click rate.
  • the content involved in the content presentation form provided by the technical solution of the embodiment of the present application is obtained objectively. That is, the embodiment of the present application objectively acquires the first content, then determines its category, and then determines the content presentation form according to the category.
  • the URL is then displayed objectively and realistically according to the content presentation form, so that the resource content corresponding to the URL can be displayed accurately, thereby improving the expressiveness of the URL, enhancing the presentation power of the article, and thereby increasing the appeal to the user and the user's click rate.
  • FIG. 1 is a schematic flowchart of a URL display method according to an embodiment of the present application
  • FIG. 2 is a schematic flowchart of a URL display method according to another embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a URL display apparatus according to an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of an information expression system according to an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • the content presentation form refers to the various ways of presenting content so that it is easier to understand, such as video, audio, mixed image and text, map, or text, but is not limited thereto.
  • Hypertext Markup Language (HTML) is used to describe web page files. HTML marks how the parts of a web page are to be displayed by adding tags to the web page file.
  • the webpage file itself is a text file, and the webpage file with the tags added can tell the browser how to display the content, for example: how to handle the text, how the screen is arranged, how the image is displayed, and the like.
  • Extensible Markup Language (XML) is a standard text format for representing structured information on the web. XML allows the data to be separated from the HTML page and kept as XML data. XML data is stored in plain text format and can be accessed from an HTML page.
  • the embodiment of the present application provides a URL display method in which the resource content corresponding to the URL to be displayed is obtained in advance and its category is determined; a suitable content presentation form is then determined according to the category; and finally the URL to be displayed is presented in the article together with the content presentation form, so that the URL conveys information more clearly and expressively, which in turn makes the user more willing to click.
  • the embodiment of the present application provides a uniform resource locator URL display method. As shown in FIG. 1, the method can be implemented by step S100 to step S130, as follows:
  • S100 Acquire a first content, where the first content is at least a part of resource content corresponding to the URL to be displayed.
  • URL stands for Uniform Resource Locator.
  • a URL is a concise representation of the location of, and the access method for, a resource available on the Internet, and is the address of a standard resource on the Internet. Every file on the Internet has a unique URL, which contains information indicating the location of the file and how the browser should handle it. Any resource content on the Internet (e.g., websites, files, articles, videos) can be referred to by a URL.
  • At least a part mentioned herein may be part or all.
  • at least a part of the resource content corresponding to the URL to be displayed may be part of the resource content corresponding to the URL to be displayed, or may be all resource content corresponding to the URL to be displayed.
  • step S100 may specifically include: obtaining the URL to be displayed; capturing the resource content corresponding to the URL to be displayed by using any one or more of the following: a crawler tool, an ASP (Active Server Pages) web crawler, Java, PHP (Hypertext Preprocessor), Delphi (a visual development tool), Python (an interpreted, interactive, object-oriented scripting language), Flex (a Flash-based application framework), or Ruby (a cross-platform, object-oriented interpreted language); and acquiring at least a part of the resource content as the first content.
  • the following uses the crawler tool as an example to illustrate the process of crawling the resource content corresponding to the URL.
  • the crawler tool can be implemented by means of Java, Python, and the like.
  • a plurality of initial webpages are preset. Any initial webpage is fetched, the URLs on it are placed in a queue of URLs whose resource content is waiting to be captured, and source code parsing is used to obtain the resource content corresponding to the URLs in the queue.
  • in the process of crawling a webpage, new URLs are continuously extracted from the current webpage and placed in the queue of URLs waiting to be captured, and the resource content corresponding to each new URL is acquired, until all pages have been crawled.
  • the resource content may also be captured as follows: according to an existing webpage analysis algorithm, URLs that are not related to the topic are filtered out of a preset number of webpages, and the useful URLs are retained and placed in the queue of URLs waiting to be captured. Then, according to the preset search strategy, the URLs whose resource content is to be crawled are selected from the queue in turn, until the resource content corresponding to all the URLs has been captured.
  • the preset search policy may preferentially capture the resource content corresponding to the URLs on important webpages.
  • the popularity of a link can be measured by whether a large number of pages point to it.
  • the importance of a link can be measured by whether it contains ".com" or "home" and by whether it contains fewer "/" characters.
  • the average link depth represents the link distance from each seed site in a seed site set to the webpage, where the link distance can be determined by a breadth-first traversal rule.
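  • the queue-based crawl described above can be sketched as follows. This is a minimal illustration in Python, not the implementation of the embodiment: the function names `crawl` and `is_relevant`, the `fetch` callable, and the `max_pages` limit are all assumptions introduced here, and `fetch` is passed in by the caller so that the queue logic can be exercised without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(seed_urls, fetch, is_relevant=lambda url: True, max_pages=100):
    """Breadth-first crawl; returns {url: html} for every page captured.

    fetch(url) returns the page source or None (hypothetical callable
    supplied by the caller); is_relevant(url) is the topic filter.
    """
    queue = deque(seed_urls)      # URLs waiting to have their content captured
    seen = set(seed_urls)
    captured = {}
    while queue and len(captured) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        captured[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)
            # Drop URLs unrelated to the topic; queue the useful new ones.
            if absolute not in seen and is_relevant(absolute):
                seen.add(absolute)
                queue.append(absolute)
    return captured
```

  • a first-in, first-out queue gives the breadth-first order mentioned above; a policy that prefers important webpages could be approximated by replacing the queue with a priority queue keyed on a score, which this sketch does not attempt.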
  • next, the ASP web crawling tool is taken as an example to illustrate the process of crawling the resource content corresponding to the URL.
  • using the C# language, which is an object-oriented programming language, the parameters of the URL to be crawled are submitted through a Form to the server, and the server captures the resource content corresponding to the URL to be crawled based on the received parameters, wherein, in programming, a Form constitutes a part of the application user interface.
  • third-party tools can also be used to capture resource content, for example the jsoup parser, which is a Java-based HTML parser that can extract data through DOM (Document Object Model) traversal and CSS (Cascading Style Sheets) selectors.
  • the step of determining the category of the first content may be specifically implemented by step S112 to step S116, as follows:
  • S112 Identify at least a part of the first content.
  • this step may identify at least a portion of the first content by identifying the HTML tags based on the HTML source code displayed by the browser. Because HTML tags are predefined, it is possible to identify whether the content in the tag is text, video, or title based on the HTML tag. For a web page, there may be images, texts, etc. Through this step, it can be identified which part is the picture, which part is the text, or which part is the video.
  • the "<audio>" tag defines sound content; by identifying the tag, it can be known that this part of the content is audio.
  • the "<canvas>" tag defines a graphic; by identifying the tag, it can be known that this part of the content is a graphic.
  • the "<video>" tag defines a video; by identifying the tag, it can be known that this part of the content is a video.
  • the "<img>" tag defines an image; by identifying the tag, it can be known that this part of the content is an image.
  • the "<body>" tag defines what the page displays. The "<head>" tag defines the title, character format, language, compatibility, keywords, description, and more.
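  • as a minimal sketch of the tag-based identification described above, the following Python code uses the standard library `html.parser` module to record which kinds of content a page contains. The tag-to-kind mapping follows the tags listed above; the class name `ContentKindParser`, the function `identify_kinds`, and the kind labels are assumptions introduced for illustration.

```python
from html.parser import HTMLParser

# Tag -> kind of content it marks (labels are illustrative).
TAG_KINDS = {"audio": "audio", "canvas": "graphic", "video": "video", "img": "image"}


class ContentKindParser(HTMLParser):
    """Records the kinds of content seen while parsing an HTML page."""

    def __init__(self):
        super().__init__()
        self.kinds = set()
        self._in_body = False

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self._in_body = True
        elif tag in TAG_KINDS:
            self.kinds.add(TAG_KINDS[tag])

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False

    def handle_data(self, data):
        # Non-blank character data inside <body> is treated as text content.
        if self._in_body and data.strip():
            self.kinds.add("text")


def identify_kinds(html):
    parser = ContentKindParser()
    parser.feed(html)
    return parser.kinds
```

  • the sketch treats any visible character data in the body as text; a fuller implementation would presumably also look at heading and paragraph tags to tell titles from body text.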
  • S114 Extract at least a part of the second content from the identified at least part of the first content.
  • This step can extract part of the content according to the predefined HTML tags for the presentation of the content of the URL to be displayed.
  • the step of extracting at least a portion of the second content from the at least one portion of the first content that is identified may specifically include:
  • S1142 Perform XML serialization on at least a part of the first content.
  • serialization refers to the process of converting a data structure or object into a sequence of bytes or characters that can be stored or transmitted; XML serialization produces XML text.
  • Step a1 Read the entire HTML code of the resource content using the OpenRead method of the WebClient service in C#.NET.
  • the WebClient service allows Win32 applications to access documents on the Internet.
  • the service extends the networking capabilities of Windows by allowing standard Win32 applications to create, read, and write files on Internet file servers by using WebDAV, a file access protocol described by XML.
  • the OpenRead method opens a readable stream on the resource to be read.
  • Step a2 Delete irrelevant code.
  • scripts that cannot be parsed as XML are irrelevant code.
  • Step a3 Delete the general irrelevant items in the HTML code.
  • a general irrelevant item is content in the HTML code that is not related to serialization. Deleting such content does not affect serialization of the HTML code.
  • Step a4 Wrap all the HTML code in a tag.
  • wrapping the HTML code in a tag means adding a start tag at the beginning of the HTML code and an end tag at the end of the HTML code, so that a single tag encloses all the HTML code.
  • Step a5 Using Microsoft's XML serialization method, serialize the entire HTML code as XML to generate an XML file.
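  • steps a2 to a5 can be illustrated with the sketch below. It substitutes the Python standard library (`html.parser` and `xml.etree.ElementTree`) for the C# WebClient and Microsoft XML serialization named above, so it is an analogue rather than the method of the embodiment. It assumes the HTML has already been read (step a1), and the names `HtmlToXml`, `html_to_xml`, and the root tag `document` are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# HTML elements that never have an end tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
SKIPPED_TAGS = {"script", "style"}   # step a2: code that is not useful as XML


class HtmlToXml(HTMLParser):
    """Builds an XML element tree from HTML; comments and doctype are dropped."""

    def __init__(self, root_tag="document"):
        super().__init__()
        self.root = ET.Element(root_tag)   # step a4: one tag encloses everything
        self._stack = [self.root]
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1
            return
        if self._skip_depth:
            return
        node = ET.SubElement(self._stack[-1], tag, {k: v or "" for k, v in attrs})
        if tag not in VOID_TAGS:
            self._stack.append(node)

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
            return
        if self._skip_depth:
            return
        # Close the nearest open element with this name; ignore stray end tags.
        for i in range(len(self._stack) - 1, 0, -1):
            if self._stack[i].tag == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        if self._skip_depth or not data.strip():
            return
        parent = self._stack[-1]
        if len(parent):                    # text that follows a child element
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def html_to_xml(html):
    parser = HtmlToXml()
    parser.feed(html)
    parser.close()
    return ET.tostring(parser.root, encoding="unicode")   # step a5
```

  • building a tree and serializing it, rather than editing the HTML string, means unclosed tags such as "<br>" still yield well-formed XML; attribute names that are not valid XML names are not checked in this sketch.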
  • S1144 Extract at least a part of the XML node information from the XML file generated after the XML serialization, to obtain at least a part of the second content.
  • An XML file is composed of nodes, and each component in the XML file is a node.
  • the entire file is a document node
  • each XML tag is an element node
  • the text contained in an XML element is a text node
  • each XML attribute is an attribute node.
  • a node is the smallest unit in an XML file that has a valid and complete structure.
  • the DOM tree corresponding to the XML file can be created according to the XML DOM.
  • the DOM tree includes a root node and multiple child nodes. That is to say, based on the XML DOM, the XML file can be viewed as a tree structure, and all nodes in the XML file can be accessed by accessing the DOM tree.
  • the XML node information can be extracted from the generated XML file through a Get function, or XPath (XML Path Language) can be used to extract the XML node information.
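  • as an illustration of XPath-based extraction, the sketch below uses the limited XPath support in Python's `xml.etree.ElementTree`. The sample XML, the element names, and the function `extract_nodes` are hypothetical and only stand in for whatever the serialized page actually contains.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML, standing in for the file produced by XML serialization.
XML_TEXT = """
<document>
  <body>
    <h1>Spring sale</h1>
    <img src="cover.png" alt="cover"/>
    <video src="intro.mp4" poster="poster.png"/>
    <p>All items 20% off.</p>
  </body>
</document>
"""


def extract_nodes(xml_text):
    """Pull title, image, video and paragraph information out of the XML."""
    root = ET.fromstring(xml_text.strip())
    return {
        "title": root.findtext(".//h1"),
        "images": [img.get("src") for img in root.findall(".//img")],
        "videos": [v.get("src") for v in root.findall(".//video[@src]")],
        "paragraphs": [p.text for p in root.findall(".//p")],
    }
```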
  • the following example details the process of extracting XML node information (e.g., video).
  • Step b1 Perform a recursive lookup from the root node of the DOM tree to find the smallest nodes.
  • a smallest node is the most basic unit, an Element that has no child nodes.
  • the recursive lookup may follow a preset recursive algorithm, starting from the root node of the DOM tree and continuing until the smallest nodes are found.
  • Step b2 Parse each of the smallest nodes using an XML parser to obtain string data.
  • Step b3 Pass the string data to the VideoInfo (video attribute information) class, wherein the VideoInfo class is a tool for parsing the video data.
  • Step b4 Assign the string data to the individual Video classes through two successive layers of extraction; that is, determine the video category to which the string data belongs.
  • Step b5 Pass the string data in each Video class to the ArrayList (dynamic array) data structure in Search Video Info (search video information).
  • Step b6 Associate the string data in the ArrayList data structure with the corresponding Adapter data to obtain the node information of the smallest nodes, thereby extracting the XML node information.
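  • steps b1 and b2 can be sketched as a recursive search for elements without children, followed by reading each one's string data. The function names `smallest_nodes` and `leaf_strings` are assumptions introduced here; the VideoInfo, ArrayList, and Adapter handling of steps b3 to b6 is specific to the embodiment and is not modeled.

```python
import xml.etree.ElementTree as ET


def smallest_nodes(node):
    """Recursively collect the elements that have no child elements."""
    children = list(node)
    if not children:
        return [node]
    leaves = []
    for child in children:
        leaves.extend(smallest_nodes(child))
    return leaves


def leaf_strings(root):
    """Return (tag, text) for every smallest node, in document order."""
    return [(n.tag, (n.text or "").strip()) for n in smallest_nodes(root)]
```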
  • S116 Determine the category of at least a part of the second content, and take the determined category as the category of the first content.
  • the step of determining the category of at least a part of the second content may specifically include: determining the category of at least a part of the second content by using the domain name and the HTML tag in the URL to be displayed.
  • the step of determining the category of at least a part of the second content by using the domain name and the HTML tag in the URL to be displayed may specifically include:
  • Step c1 Match the domain name in the URL to be displayed against the predetermined domain name list. If the matching succeeds, perform steps c2-c4; if the matching fails, perform steps c5-c6; wherein the predetermined domain name list includes the predetermined domain name and the first category information corresponding to the predetermined domain name.
  • Step c2 Extract the first category information.
  • Step c3 Identify the HTML tag in the URL to be displayed, and determine the second category information.
  • Step c4 Determine the category of at least a part of the second content according to the first category information and the second category information.
  • Step c5 Identify the HTML tag in the URL to be displayed, and determine the third category information.
  • Step c6 Determine the category of at least a part of the second content according to the third category information.
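  • steps c1 to c6 can be sketched as a lookup followed by a merge. In this illustration the domain list entries, the category labels, and the function `classify` are hypothetical; the kinds found via HTML tags are assumed to have been identified already from the fetched page, and combining the two sources by set union is an assumption made here, since the text does not fix a combination rule.

```python
from urllib.parse import urlsplit

# Hypothetical predetermined domain name list:
# predetermined domain name -> first category information.
DOMAIN_CATEGORIES = {
    "video.example.com": "video",
    "shop.example.com": "image-text",
}


def classify(url, tag_kinds):
    """Return the category as a sorted tuple of kinds.

    tag_kinds is the set of kinds identified from HTML tags,
    for example {"text"} or {"video", "text"}.
    """
    domain = urlsplit(url).hostname or ""
    first = DOMAIN_CATEGORIES.get(domain)             # step c1: match the domain
    if first is not None:                             # steps c2-c4: match succeeded
        return tuple(sorted({first} | set(tag_kinds)))
    return tuple(sorted(tag_kinds))                   # steps c5-c6: match failed
```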
  • the domain name may be identified from the URL string to be displayed, and the domain name is matched against the predetermined domain name list.
  • if the matching succeeds, the first category information in the attribute information of the domain name in the domain name list is extracted, and the HTML tag in the URL to be displayed is identified to determine the second category information.
  • the category of at least a part of the second content is then determined according to the first category information and the second category information.
  • if the matching fails, the HTML tag in the URL to be displayed is identified, the third category information is determined, and the category of at least a part of the second content is determined according to the third category information.
  • determining the category of at least a part of the second content according to the third category information may be: taking the category corresponding to the third category information as the category of at least a part of the second content.
  • for example, if the matching succeeds and the category information in the attribute information of the domain name indicates that the content corresponding to the domain name is mainly video, the video category information is extracted; the HTML tag is then identified, from which it can be learned that the content also contains text; it can thus be determined that the category of this part of the content is video combined with text. If the matching fails, the HTML tag is identified, from which it can be learned that the content contains text; it can thus be determined that the category of the content is the text category.
  • for another example, the domain name is extracted from the URL string and compared with the domain name list. If the matching succeeds, the URL is determined to belong to the Taobao website, and the predefined mixed image-and-text category information is extracted from the attribute information of the domain name in the predetermined domain name list; the HTML tag is then identified, from which it is found that the content also includes video, so the category of the content is determined to be mixed image and text combined with video. If the matching fails, the HTML tag is identified, and once the content is determined to include video, the category of the content is determined to be the video category.
  • the composition of the URL is: protocol://domain name:port/virtual directory/file name?parameters#anchor. An IP address may be used in place of the domain name.
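  • the URL components listed above map directly onto the fields returned by `urllib.parse.urlsplit` in the Python standard library, as the sketch below shows. The function name `split_url` and the sample URLs are illustrative only.

```python
from urllib.parse import parse_qs, urlsplit


def split_url(url):
    """Break a URL into the components named in the text."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "domain": parts.hostname,          # may also be an IP address
        "port": parts.port,                # None when the URL gives no port
        "path": parts.path,                # virtual directory and file name
        "parameters": parse_qs(parts.query),
        "anchor": parts.fragment,
    }
```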
  • the step of identifying the domain name and the step of determining the content category by using the HTML tag may be performed simultaneously or one after the other; if they are performed one after the other, the embodiment of the present application does not limit their order. That is, the HTML tag may first be used to identify the category of the content and the category information then extracted through the domain name, or the category information may first be extracted through the domain name and the HTML tag then used to identify the category of the content.
  • S120 Determine a content presentation form according to the category.
  • the step may specifically include: determining a content presentation form according to the category and the domain name in the URL to be displayed, and based on at least a portion of the second content.
  • for example, if the category is the mixed image-and-text category, the domain name is www.taobao.com, and the second content obtained is the image-and-text content on the Taobao website, then a structure of a large image plus text, combined with the image-and-text content of Taobao, can be used as the content presentation form.
  • for another example, if the category is the mixed image-and-text category, the domain name is www.qq.com, and the second content obtained is the image-and-text content on the Tencent website, then a structure of a small image plus text, combined with the image-and-text content of the Tencent website, can be used as the content presentation form.
  • the embodiment of the present application may adopt any one or several of the following structural forms: a large picture with text, a small picture with text, a large window with text, a small window with text, a picture with audio, and so on.
  • the layout of a structural form may be a top-bottom structure, a left-right structure, or a diagonal structure, but is not limited thereto.
  • for example, the URL string to be displayed can be displayed together with a large image and text in a certain layout.
  • the layout may be top-bottom, left-right, or diagonal, but is not limited thereto.
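  • one way to picture the choice of presentation form is a lookup keyed on category and domain name, sketched below. The two per-domain structures follow the Taobao and Tencent examples above; everything else (the `PresentationForm` type, the layouts chosen, the per-category defaults, and the fallback) is an assumption introduced for illustration and is not taken from the embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PresentationForm:
    structure: str   # e.g. "large image + text"
    layout: str      # "top-bottom", "left-right" or "diagonal"


# Domain-specific forms for the image-and-text category (layouts are assumed).
FORMS = {
    ("image-text", "www.taobao.com"): PresentationForm("large image + text", "top-bottom"),
    ("image-text", "www.qq.com"): PresentationForm("small image + text", "left-right"),
}
# Hypothetical per-category defaults used when the domain has no entry.
DEFAULTS = {
    "image-text": PresentationForm("small image + text", "left-right"),
    "video": PresentationForm("large window + text", "top-bottom"),
    "audio": PresentationForm("image + audio", "left-right"),
}
FALLBACK = PresentationForm("text", "top-bottom")


def choose_form(category, domain):
    """Pick a presentation form from the category and the URL's domain name."""
    return FORMS.get((category, domain)) or DEFAULTS.get(category, FALLBACK)
```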
  • which content presentation form is used for display can be selected manually by the user or pushed automatically by the back end.
  • the embodiment of the present application can intelligently recommend a plurality of presentation styles, thereby helping the author to enhance the presentation power of the article.
  • the embodiment of the present application obtains the first content, where the first content is at least a part of the resource content corresponding to the URL to be displayed; then determines the category of the first content; further determines the content presentation form according to the category; and finally displays the URL to be displayed according to the content presentation form. The content presentation form is thus determined from the category of the resource content corresponding to the URL to be displayed, and the URL to be displayed is finally displayed according to that content presentation form.
  • for example, the URL to be displayed may be displayed in a content presentation form that combines a mixed image-and-text layout with a small-window video; for another example, the URL to be displayed may be displayed in a content presentation form in which a large-window video combined with text is placed above the URL string to be displayed.
  • beautifying the URL to be displayed in this way solves the technical problem of how to improve the expressiveness of the URL and enhances the presentation power of the article, thereby increasing the appeal to the user and, in turn, the user's click rate.
  • S200 obtaining the first content by using one or more of the following methods: a crawler tool, an ASP web crawler, a Java method, a PHP method, a Delphi method, a Python method, a Flex method, or a Ruby method; wherein, the first content At least a portion of the resource content corresponding to the URL.
  • S210 Identify at least a part of the first content.
  • S220 Perform XML serialization on at least a part of the first content.
  • S230 extract at least a part of the XML node information from the XML file generated after the XML serialization, to obtain at least a part of the second content.
  • S240 Determine, by using the domain name and the HTML tag in the URL to be displayed, the category of at least a part of the second content, and take the determined category as the category of the first content.
  • S250 Determine a content presentation form according to the category and the domain name in the URL to be displayed, and based on at least a part of the second content.
  • S260 Display the URL to be displayed according to the content presentation form.
  • the beautification of the URL can help the reader to obtain more information without clicking the link, thereby improving the expressiveness of the URL and enhancing the presentation power of the article, thereby improving the appeal to the user.
  • the user's click rate can be increased.
  • the embodiment of the present application further provides an information expression method.
  • the method may include any of the foregoing URL presentation method embodiments, and may specifically include: acquiring an article to be processed, and displaying the URL in the article to be processed by using any of the URL display methods described above.
  • the article to be processed is obtained, and then the URL in the article to be processed is displayed through the URL display method. Since the URL display method can accurately display the resource content corresponding to the URL, the expressiveness of the URL is improved, and the presentation power of the article is further enhanced, thereby increasing the appeal and click rate of the user.
  • the embodiment of the present application further provides a uniform resource locator URL display device based on the same technical concept as the URL display method embodiment.
  • the device embodiment can perform the above method embodiments.
  • the apparatus 30 can include an acquisition module 32, a first determination module 34, a second determination module 36, and a presentation module 38.
  • the obtaining module 32 is configured to obtain the first content, where the first content is at least a part of the resource content corresponding to the URL to be displayed.
  • the first determining module 34 is configured to determine a category of the first content.
  • the second determining module 36 is configured to determine a content presentation form according to the category.
  • the presentation module 38 is configured to display the URL to be displayed in a content presentation form.
  • the obtaining module may specifically include: a URL obtaining subunit, a crawling unit, and a first content acquisition subunit.
  • the URL obtaining subunit is configured to obtain a URL to be displayed.
  • the crawling unit is configured to capture the resource content corresponding to the URL to be displayed by using any one or more of the following methods: a crawler tool, an ASP web crawler, a Java method, a PHP method, a Delphi method, a Python method, a Flex method, or a Ruby method.
  • the first content acquisition subunit is configured to acquire at least a part of the resource content as the first content.
  • the first determining module may specifically include: an identifying unit, an extracting unit, and a classifying unit.
  • the identification unit is configured to identify at least a portion of the first content.
  • the extracting unit is configured to extract at least a portion of the second content from the identified at least a portion of the first content.
  • the classification unit is configured to determine the category of the at least a portion of the second content, and determine the determined category as the category of the first content.
  • the extracting unit may specifically include: a serializing unit and an extracting subunit.
  • the serialization unit is configured to perform XML serialization on at least a part of the first content.
  • the extracting subunit is configured to extract at least a part of the XML node information from the XML file generated after the XML serialization to obtain the at least part of the second content.
  • the classification unit may specifically include: a classification subunit.
  • the classification subunit is configured to determine a category of the at least a portion of the second content by using a domain name and an HTML tag in the URL to be displayed.
  • the foregoing classification subunit further includes: a matching unit, a category extracting unit, a category determining unit, and a content classification unit.
  • the matching unit is configured to match the domain name in the URL to be displayed with a predetermined domain name list, where the predetermined domain name list includes a predetermined domain name and first category information corresponding to the predetermined domain name; the category extracting unit is configured to extract the first category information when the matching is successful; the category determining unit is configured to identify an HTML tag in the URL to be displayed and determine second category information; and the content classification unit is configured to determine the category of the at least a portion of the second content according to the first category information and the second category information.
  • the URL presentation device may further include an execution unit.
  • the execution unit is configured to, when the matching fails, identify an HTML tag in the URL to be displayed, determine third category information, and determine the category of the at least a portion of the second content according to the third category information.
  • the foregoing second determining module may specifically include: a determining subunit.
  • the determining subunit is configured to determine a content presentation form according to the category and the domain name in the URL to be displayed, and based on at least a portion of the second content.
  • the embodiment of the present application further provides an information expression system.
  • the system can include a URL presentation device and an expression module.
  • the expression module is configured to obtain an article to be processed, and display the URL in the to-be-processed article by using the URL display device.
  • URL presentation apparatus embodiments and information expression system embodiments may also include some well-known structures such as a processor, a memory, a bus, and the like.
  • the processor is connected to the memory through a bus.
  • Processors include, but are not limited to, ARM, programmable logic devices, DSPs, and the like.
  • the memory can be either a random access memory or a read only memory.
  • the bus can include a data bus, an address bus, and a control bus.
  • the above division into modules is only exemplary; those skilled in the art will understand that the modules may be divided in other ways, that the divided modules may be further split or combined, and that the modules may be executed sequentially or in parallel, which is not limited herein.
  • the embodiment of the present application further provides an electronic device, as shown in FIG. 5, including: a processor 501, a communication interface 502, a memory 503, and a communication bus 504, wherein the processor 501, the communication interface 502, and the memory 503 communicate with each other through the communication bus 504.
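  • the processor 501 is configured to implement the foregoing uniform resource locator (URL) display method steps when executing the program stored in the memory 503, where the method includes: acquiring first content, where the first content is at least a part of the resource content corresponding to the URL to be displayed; determining the category of the first content; determining a content presentation form according to the category; and displaying the URL to be displayed in the content presentation form.
  • acquiring the first content specifically includes: obtaining the URL to be displayed; capturing the resource content corresponding to the URL to be displayed by any one or more of the following methods: a crawler tool, an ASP web crawler, a Java method, a PHP method, a Delphi method, a Python method, a Flex method, or a Ruby method; and acquiring at least a part of the resource content as the first content.
  • determining the category of the first content includes: identifying at least a portion of the first content; extracting at least a portion of the second content from the identified portion of the first content; and determining the category of the at least a portion of the second content and taking the determined category as the category of the first content.
  • extracting at least a portion of the second content from the identified portion of the first content includes: performing XML serialization on the at least a portion of the first content; and extracting at least a part of the XML node information from the XML file generated by the XML serialization to obtain the at least a portion of the second content.
  • determining the category of the at least a portion of the second content includes: determining the category by using the domain name and the HTML tag in the URL to be displayed.
  • determining, by using the domain name and the HTML tag in the URL to be displayed, the category of the at least a portion of the second content specifically includes: matching the domain name in the URL to be displayed with a predetermined domain name list, where the predetermined domain name list includes a predetermined domain name and first category information corresponding to the predetermined domain name; if the matching succeeds, extracting the first category information; identifying the HTML tag in the URL to be displayed and determining second category information; and determining the category of the at least a portion of the second content according to the first category information and the second category information.
  • the method further includes: if the matching fails, identifying the HTML tag in the URL to be displayed, determining third category information, and determining the category of the at least a portion of the second content according to the third category information.
  • determining the content presentation form according to the category includes: determining the content presentation form according to the category and the domain name in the URL to be displayed, and based on the at least a portion of the second content.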
  • the embodiment of the present application obtains the first content, where the first content is at least a part of the resource content corresponding to the URL to be displayed; determines the category of the first content; determines the content presentation form according to the category; and displays the URL to be displayed in the content presentation form.
  • the technical solution determines the content presentation form by judging the category of the resource content corresponding to the URL, thereby displaying the URL through the content presentation form.
  • the technical solution provided by the embodiment of the present application enhances the expressiveness of the URL and enhances the presentation power of the article by displaying the URL according to the content presentation form. It can increase the appeal to users, which in turn can increase the user's click rate.
  • the embodiment of the present application objectively acquires the first content, then determines the category thereof, and then determines the content presentation form according to the category.
  • the URL is displayed objectively and realistically according to the content presentation form, so that the resource content corresponding to the URL can be displayed accurately, thereby improving the expressiveness of the URL, enhancing the presentation power of the article, and thereby increasing the appeal and click rate of the user.
  • the embodiment of the present application further provides an electronic device, including: a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory complete communication with each other through the communication bus.
  • a memory for storing a computer program
  • the processor is configured to implement the information expression method step when the program stored in the memory is executed, and the method includes:
  • the article to be processed is obtained, and then the URL in the article to be processed is displayed through the URL display method. Since the URL display method can accurately display the resource content corresponding to the URL, the expressiveness of the URL is improved, and the presentation power of the article is further enhanced, thereby increasing the appeal and click rate of the user.
  • the embodiment of the present application further discloses a storage medium, where the storage medium is used to store an application, and the application is configured to execute a uniform resource locator URL display method step at a runtime, where the method includes:
  • the URL to be displayed is displayed in the content presentation form.
  • the embodiment of the present application further discloses a storage medium, where the storage medium is used to store an application, and the application is configured to execute an information expression method step at a runtime, where the method includes:
  • the article to be processed is obtained, and then the URL in the article to be processed is displayed through the URL display method. Since the URL display method can accurately display the resource content corresponding to the URL, the expressiveness of the URL is improved, and the presentation power of the article is further enhanced, thereby increasing the appeal and click rate of the user.
  • the embodiment of the present application discloses an application, where the application is used to execute a uniform resource locator URL display method at runtime, and the method includes:
  • the URL to be displayed is displayed in the content presentation form.
  • the embodiment of the present application discloses an application program for executing an information expression method step at runtime, where the method includes:
  • the article to be processed is obtained, and then the URL in the article to be processed is displayed through the URL display method. Since the URL display method can accurately display the resource content corresponding to the URL, the expressiveness of the URL is improved, and the presentation power of the article is further enhanced, thereby increasing the appeal and click rate of the user.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

Provided are a uniform resource locator (URL) display method, an information expression method, a uniform resource locator (URL) display device, and an information expression system. The uniform resource locator (URL) display method may comprise: acquiring first content, wherein the first content is at least some resource content corresponding to a URL to be displayed; determining the category of the first content; determining a content presentation form according to the category; and displaying the URL to be displayed in the content presentation form, wherein determining the content presentation form according to the category specifically comprises: determining a content presentation form according to the category and a domain name in the URL to be displayed, and based on at least some of second content. Therefore, by means of the embodiments of the present application, the technical problem of how to improve the expressiveness of a URL is solved, and the URL is beautified, thereby improving the expressiveness of the URL and enhancing the presentation of an article, so that the appeal for a user can be improved, and the user's click rate can be increased.

Description

Uniform resource locator display method, information expression method and related products
This application claims priority to Chinese Patent Application No. 201710385155.5, filed with the Chinese Patent Office on May 26, 2017 and entitled "Uniform Resource Locator Display Method, Information Expression Method and Related Products", the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of Internet technologies, and in particular to a uniform resource locator display method, an information expression method, a uniform resource locator display device, and an information expression system.
Background
At present, authors often publish articles on the Internet. When an author introduces a website, a video, or a song, or cites other articles, the author adds the corresponding URL (Uniform Resource Locator) to the published article; this is the hyperlink we often see. A user reading the article can then click the URL directly to jump to the corresponding web page and browse it.
Specifically, there are two main existing ways of displaying a URL. The first is to write the URL string directly into the article. The second is to set the URL as a text hyperlink, with the author describing the resource content corresponding to the URL in words according to the author's own understanding.
With the first way, displaying the URL directly in the article really only shows the URL string itself. The URL string carries no content information for the user, so it cannot clearly show the main resource content that the URL is meant to present to the user. This display mode therefore gives the URL poor expressiveness and little appeal. The second way attempts to help the user understand the link and to attract clicks. However, because the text description is the author's subjective summary, the description has in essence become part of the article and does not display the URL objectively and faithfully. Moreover, textual expression has its limitations and depends on the author's ability to express, so this way of displaying a URL is not very expressive either.
Summary of the invention
The purpose of the embodiments of the present application is to provide a uniform resource locator (URL) display method that at least solves the technical problem of how to improve the expressiveness of a URL. An information expression method, a uniform resource locator (URL) display device, and an information expression system are also provided.
To achieve the above purpose, according to one aspect of the present application, the following technical solution is provided:
A uniform resource locator (URL) display method, which may include:
acquiring first content, where the first content is at least a part of the resource content corresponding to a URL to be displayed;
determining a category of the first content;
determining a content presentation form according to the category;
displaying the URL to be displayed in the content presentation form.
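The four steps above can be read as a simple data flow. The following is a minimal sketch of that flow, not the applicant's implementation: the function name `show_url` and the four injected callables (`fetch`, `classify`, `choose_form`, `render`) are hypothetical names introduced only for illustration.

```python
from typing import Callable


def show_url(
    url: str,
    fetch: Callable[[str], str],
    classify: Callable[[str, str], str],
    choose_form: Callable[[str, str, str], str],
    render: Callable[[str, str], str],
) -> str:
    """Run the four claimed steps in order (all helpers are hypothetical)."""
    # Step 1: acquire the first content (at least part of the resource content).
    first_content = fetch(url)
    # Step 2: determine the category of the first content.
    category = classify(url, first_content)
    # Step 3: determine the content presentation form according to the category.
    form = choose_form(category, url, first_content)
    # Step 4: display the URL to be displayed in that presentation form.
    return render(url, form)
```

Passing the steps in as callables keeps the sketch independent of any particular crawler or renderer, which the text deliberately leaves open.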
Optionally, acquiring the first content may specifically include:
obtaining the URL to be displayed;
capturing the resource content corresponding to the URL to be displayed in any one or more of the following ways: a crawler tool, an ASP web crawling tool, a Java approach, a PHP approach, a Delphi approach, a Python approach, a Flex approach, or a Ruby approach;
acquiring at least a part of the resource content as the first content.
Optionally, determining the category of the first content may specifically include:
identifying at least a part of the first content;
extracting at least a part of second content from the identified part of the first content;
determining the category of the at least a part of the second content, and taking the determined category as the category of the first content.
Optionally, extracting at least a part of the second content from the identified part of the first content specifically includes:
performing XML serialization on the at least a part of the first content;
extracting at least a part of the XML node information from the XML file generated by the XML serialization, to obtain the at least a part of the second content.
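The serialization and extraction steps could look roughly like the sketch below, which uses only the Python standard library. It rests on several assumptions: the first content is HTML, "XML serialization" is approximated by rebuilding the page as an `xml.etree.ElementTree` tree, and the node information of interest is the page title plus image and video sources. The names `XmlBuilder`, `to_xml`, and `extract_second_content` are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never have a closing tag.
VOID_TAGS = {"meta", "img", "br", "link", "input", "hr", "source"}


class XmlBuilder(HTMLParser):
    """Rebuild HTML as an element tree (a stand-in for XML serialization)."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating sloppy markup.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Simplification: all text is appended to the enclosing element.
        node = self.stack[-1]
        node.text = (node.text or "") + data


def to_xml(html: str) -> str:
    """Serialize the HTML first content to an XML string."""
    builder = XmlBuilder()
    builder.feed(html)
    builder.close()
    return ET.tostring(builder.root, encoding="unicode")


def extract_second_content(xml_text: str) -> dict:
    """Extract selected XML node information as the second content."""
    root = ET.fromstring(xml_text)
    title = root.find(".//title")
    return {
        "title": (title.text or "").strip() if title is not None else "",
        "images": [n.get("src") for n in root.iter("img") if n.get("src")],
        "videos": [n.get("src") for n in root.iter("video") if n.get("src")],
    }
```

Real pages are rarely well-formed XML, which is why the sketch rebuilds the tree through a tolerant HTML parser rather than parsing the page as XML directly.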
Optionally, determining the category of the at least a part of the second content specifically includes:
determining the category of the at least a part of the second content by using the domain name and the HTML tag in the URL to be displayed.
Optionally, determining the category of the at least a part of the second content by using the domain name and the HTML tag in the URL to be displayed specifically includes:
matching the domain name in the URL to be displayed with a predetermined domain name list, where the predetermined domain name list includes a predetermined domain name and first category information corresponding to the predetermined domain name;
if the matching succeeds, extracting the first category information;
identifying the HTML tag in the URL to be displayed, and determining second category information;
determining the category of the at least a part of the second content according to the first category information and the second category information.
Optionally, the method further includes:
if the matching fails, identifying the HTML tag in the URL to be displayed, and determining third category information;
determining the category of the at least a part of the second content according to the third category information.
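The matching branch and the fallback branch can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the domain list, the tag-to-category table, and the category names are invented; "HTML tag in the URL to be displayed" is interpreted here as the tags found in the fetched page; and the rule for combining the first and second category information (the page tags override the domain list only when they indicate a different media type) is an assumption, since the text does not specify one.

```python
from urllib.parse import urlparse

# Hypothetical predetermined domain name list: domain -> first category information.
DOMAIN_CATEGORIES = {
    "video.example.com": "video",
    "music.example.com": "audio",
}

# Hypothetical mapping from HTML tags to category information.
TAG_CATEGORIES = {"video": "video", "audio": "audio", "img": "image"}


def category_from_tags(tags):
    """Return the category of the first recognized media tag, else 'text'."""
    for tag in tags:
        if tag in TAG_CATEGORIES:
            return TAG_CATEGORIES[tag]
    return "text"


def determine_category(url, tags):
    """Combine domain matching with HTML tag recognition."""
    domain = urlparse(url).hostname or ""
    first = DOMAIN_CATEGORIES.get(domain)
    if first is None:
        # Matching failed: the tags alone give the third category information.
        return category_from_tags(tags)
    # Matching succeeded: the tags give the second category information.
    second = category_from_tags(tags)
    # Assumed combination rule: keep the domain category unless the page
    # itself clearly contains a different media type.
    return first if second in ("text", first) else second
```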
Optionally, determining the content presentation form according to the category specifically includes:
determining the content presentation form according to the category and the domain name in the URL to be displayed, and based on the at least a part of the second content.
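One way to picture this step is a small decision function that takes the category, the domain name, and the extracted second content and returns a presentation form. The sketch below is hypothetical throughout: the `Presentation` type, its field values, and the specific rules (video pages get a large-window video with text in a top-bottom layout, pages with images get a large image beside the text, and the domain name serves as a fallback caption) are assumptions chosen to echo the layouts mentioned elsewhere in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Presentation:
    media: str    # "large_video", "large_image", or "none"
    layout: str   # "top_bottom" or "left_right"
    caption: str  # text shown together with the URL


def choose_presentation(category, domain, second_content):
    """Pick a presentation form from category, domain, and second content."""
    # Use the extracted title as the caption; fall back to the domain name.
    title = second_content.get("title") or domain
    if category == "video" and second_content.get("videos"):
        return Presentation("large_video", "top_bottom", title)
    if second_content.get("images"):
        return Presentation("large_image", "left_right", title)
    return Presentation("none", "top_bottom", title)
```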
To achieve the above purpose, according to another aspect of the present application, the following technical solution is also provided:
An information expression method, which may include:
obtaining an article to be processed, and displaying the URLs in the article to be processed by using any of the URL display methods described above.
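Applied to a whole article, the method amounts to locating each URL and substituting its rendered form. The sketch below assumes a plain-text article and a simple regular expression for URL detection; `express_article` and the `display_url` callback are hypothetical names, and the callback stands in for whichever URL display method is used.

```python
import re

# Simplified URL pattern: http(s) scheme followed by non-space characters.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def express_article(article, display_url):
    """Replace every URL in the article with its rendered presentation."""

    def _replace(match):
        url = match.group(0)
        # Keep sentence punctuation that follows the URL out of the link.
        trimmed = url.rstrip(".,;:!?)")
        return display_url(trimmed) + url[len(trimmed):]

    return URL_PATTERN.sub(_replace, article)
```

For example, calling `express_article(text, lambda u: f"[card:{u}]")` wraps each detected URL in a placeholder card while leaving the surrounding prose untouched.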
To achieve the above purpose, according to still another aspect of the present application, the following technical solution is also provided:
A uniform resource locator (URL) display device, which may include:
an acquiring module, configured to acquire first content, where the first content is at least a part of the resource content corresponding to a URL to be displayed;
a first determining module, configured to determine a category of the first content;
a second determining module, configured to determine a content presentation form according to the category;
a display module, configured to display the URL to be displayed in the content presentation form.
Optionally, the acquiring module may specifically include:
a URL acquiring subunit, configured to obtain the URL to be displayed;
a crawling unit, configured to capture the resource content corresponding to the URL to be displayed in any one or more of the following ways: a crawler tool, an ASP web crawling tool, a Java approach, a PHP approach, a Delphi approach, a Python approach, a Flex approach, or a Ruby approach;
a first content acquiring subunit, configured to acquire at least a part of the resource content as the first content.
Optionally, the first determining module may specifically include:
an identifying unit, configured to identify at least a part of the first content;
an extracting unit, configured to extract at least a part of second content from the identified part of the first content;
a classifying unit, configured to determine the category of the at least a part of the second content and take the determined category as the category of the first content.
Optionally, the extracting unit specifically includes:
a serializing unit, configured to perform XML serialization on the at least a part of the first content;
an extracting subunit, configured to extract at least a part of the XML node information from the XML file generated by the XML serialization, to obtain the at least a part of the second content.
Optionally, the classifying unit specifically includes:
a classifying subunit, configured to determine the category of the at least a part of the second content by using the domain name and the HTML tag in the URL to be displayed.
Optionally, the classifying subunit specifically includes:
a matching unit, configured to match the domain name in the URL to be displayed with a predetermined domain name list, where the predetermined domain name list includes a predetermined domain name and first category information corresponding to the predetermined domain name;
a category extracting unit, configured to extract the first category information when the matching succeeds;
a category determining unit, configured to identify the HTML tag in the URL to be displayed and determine second category information;
a content classifying unit, configured to determine the category of the at least a part of the second content according to the first category information and the second category information.
Optionally, the device further includes:
an execution unit, configured to, when the matching fails, identify the HTML tag in the URL to be displayed, determine third category information, and determine the category of the at least a part of the second content according to the third category information.
Optionally, the second determining module specifically includes:
a determining subunit, configured to determine the content presentation form according to the category and the domain name in the URL to be displayed, and based on the at least a part of the second content.
To achieve the above objective, according to still another aspect of the present application, the following technical solution is also provided:
An information expression system, which may include:
the URL display apparatus according to any of the above;
an expression module, configured to obtain an article to be processed and to display the URLs in the article to be processed by means of the URL display apparatus.
To achieve the above objective, an embodiment of the present application further discloses an electronic device, including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement the above uniform resource locator (URL) display method, or the above information expression method, when executing the program stored in the memory.
To achieve the above objective, an embodiment of the present application further discloses a storage medium configured to store an application program, the application program being configured to perform, when run, the above URL display method or the above information expression method.
To achieve the above objective, an embodiment of the present application discloses an application program configured to perform, when run, the above URL display method or the above information expression method.
Embodiments of the present application provide a uniform resource locator (URL) display method, an information expression method, a URL display apparatus, and an information expression system. The URL display method may include: acquiring first content, where the first content is at least a portion of the resource content corresponding to a URL to be displayed; determining a category of the first content; determining a content presentation form according to the category; and displaying the URL to be displayed in the content presentation form. This technical solution determines the content presentation form by judging the category of the resource content corresponding to the URL, and then displays the URL in that content presentation form. Compared with directly displaying the characters of the URL, the technical solution provided by the embodiments of the present application displays the URL in a content presentation form and thereby beautifies it, which improves the expressiveness of the URL and the presentation of the article, and can therefore make the article more attractive to users and increase the click-through rate. Compared with an approach in which the author subjectively summarizes the resource content corresponding to the URL and then displays the URL by means of that text summary, the content involved in the display under the technical solution provided by the embodiments of the present application is obtained objectively. That is, the embodiments objectively acquire the first content, determine its category, determine the content presentation form according to that category, and finally display the URL objectively and faithfully in that content presentation form. The resource content corresponding to the URL can thus be displayed accurately, which improves the expressiveness of the URL and the presentation of the article, and in turn can increase the appeal to users and the click-through rate.
Of course, a product or method implementing the present application does not necessarily need to achieve all of the above advantages at the same time.
Brief Description of the Drawings
To describe the embodiments of the present application and the technical solutions of the prior art more clearly, the drawings required by the embodiments and the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a URL display method according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of a URL display method according to another embodiment of the present application;
FIG. 3 is a schematic structural diagram of a URL display apparatus according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an information expression system according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
It should be noted that the technical features of the embodiments of the present application may be combined with one another provided that they do not conflict.
Terminology
A content presentation form refers to any of the ways in which content is presented so that it is easier to understand, for example video, audio, mixed image-text layout, map, or text, but is by no means limited to these.
Hypertext Markup Language (HTML) is used to describe web page files. HTML marks how each part of a web page is to be displayed by adding tags to the web page file. The web page file itself is a text file; with tags added, it tells the browser how to display its content, for example how text is handled, how the screen is laid out, and how pictures are displayed.
Extensible Markup Language (XML) is a standard text format for representing structured information on websites. XML separates data from the HTML page to obtain XML data. XML data is stored in plain text format and can be accessed from an HTML page.
An embodiment of the present application provides a URL display method in which the resource content corresponding to a URL to be displayed is obtained in advance and its category is determined; a content presentation form suited to that category is then determined; and finally the URL to be displayed is presented in the article together with that content presentation form. The URL thereby conveys the information it represents more clearly and expressively, which in turn makes users more willing to click it.
Accordingly, in practical applications, to solve the technical problem of how to improve the expressiveness of a URL, an embodiment of the present application provides a uniform resource locator (URL) display method. As shown in FIG. 1, the method may be implemented through steps S100 to S130, where:
S100: Acquire first content, where the first content is at least a portion of the resource content corresponding to the URL to be displayed.
URL stands for Uniform Resource Locator, commonly known as a web page address. A URL is a concise representation of the location of, and the method of accessing, a resource available on the Internet, and is the address of a standard resource on the Internet. Every file on the Internet has a unique URL, which contains information indicating the location of the file and how the browser should handle it. Any resource content on the Internet (for example a website, file, article, or video) can be referred to by a URL.
It should be noted that "at least a portion" as used herein may be a portion or the whole. For example, the at least a portion of the resource content corresponding to the URL to be displayed may be a portion of that resource content or all of it.
In some optional embodiments, step S100 may specifically include: obtaining the URL to be displayed; crawling the resource content corresponding to the URL to be displayed in any one or more of the following ways: a crawler tool, an ASP (Active Server Pages) web crawling tool, Java, PHP (Hypertext Preprocessor), Delphi (a visual object-oriented development tool), Python (a scripting language that combines interpreted, compiled, interactive, and object-oriented characteristics), Flex (an application tool based on the Flash platform), or Ruby (a cross-platform, object-oriented interpreted language); and taking at least a portion of the resource content as the first content.
The process of crawling the resource content corresponding to a URL is described below using a crawler tool as an example. The crawler tool may be implemented in Java, Python, or the like.
Specifically, a number of initial web pages are preset. Any initial web page is crawled, and the URLs found on it are placed in a queue of URLs waiting to have their resource content crawled. The resource content corresponding to each URL in the queue is obtained by parsing the page source. During crawling, new URLs are continually extracted from the current web page and added to the queue, and the resource content corresponding to each new URL is obtained, until all web pages have been crawled.
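As a non-limiting illustration, the queue-based crawling just described might be sketched in Python as follows. The names `crawl`, `LinkCollector`, `fetch`, and `max_pages` are assumptions introduced here for illustration only; the page-fetching function is supplied by the caller so that the sketch does not depend on any particular network library.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_urls, fetch, max_pages=100):
    """Breadth-first crawl. `fetch(url)` returns HTML text, or None on failure."""
    queue = deque(seed_urls)   # URLs waiting to have their content crawled
    seen = set(seed_urls)      # avoids queueing the same URL twice
    contents = {}              # URL -> resource content
    while queue and len(contents) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        contents[url] = html
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return contents
```

The `max_pages` bound is a hypothetical safeguard; the text itself simply crawls until all web pages have been processed.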
Alternatively, the resource content is crawled as follows: according to an existing web page analysis algorithm, URLs unrelated to the topic are filtered out of a preset set of web pages, and the useful URLs are retained and placed in the queue of URLs waiting to have their resource content crawled. Then, according to a preset search strategy, the URLs whose resource content is to be crawled are selected from the queue in turn, until the resource content corresponding to all URLs has been crawled.
The preset search strategy may be to give priority to crawling the resource content corresponding to URLs on important web pages. The importance of a web page can be measured by link popularity, link importance, and average link depth. Link popularity can be measured by whether a large number of web pages point to the page. Link importance can be measured by whether the link contains ".com" or "home" and has fewer "/" characters. Average link depth represents the link distance from each seed site in a set of seed sites to the web page, where the link distance can be determined by a breadth-first traversal rule.
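The link-importance criterion above could, for instance, be turned into a simple score used to order the queue. The following Python sketch is only one possible reading: the function names and the numeric weights are assumptions made here, not values given in the present application.

```python
from urllib.parse import urlsplit


def link_importance(url):
    """Heuristic score; a higher score means the URL is crawled earlier."""
    parts = urlsplit(url)
    score = 0
    if ".com" in (parts.hostname or ""):
        score += 2                    # assumed weight for a ".com" host
    if "home" in url.lower():
        score += 1                    # assumed weight for "home" in the URL
    # Fewer "/" separators in the path suggests a page closer to the site root.
    score -= parts.path.count("/")
    return score


def order_by_importance(urls):
    """Return the URLs sorted from most to least important."""
    return sorted(urls, key=link_importance, reverse=True)
```

Link popularity and average link depth would need data about the link graph and are therefore left out of this sketch.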
The process of crawling the resource content corresponding to a URL is next described using an ASP web crawling tool as an example. Specifically, the C# language (an object-oriented programming language) is used to fill in a Form base class expression for the URL to be crawled, yielding the parameters of the URL to be crawled. The parameters are then passed to the server via the POST method (a way of communicating with the server asynchronously), and the server crawls the resource content corresponding to the URL based on the received parameters. In programming, a Form makes up part of the application user interface.
Those skilled in the art will understand that, in actual content acquisition, third-party tools may also be used to crawl the resource content.
For example, the jsoup parser (a Java-based HTML parser) may be used to parse the address of the URL to be displayed directly, and the resource content corresponding to the URL to be displayed may be crawled by means of DOM (Document Object Model) and CSS (Cascading Style Sheets).
Those skilled in the art will understand that the above examples are not exhaustive. Any existing or future way of acquiring the first content that is applicable to the embodiments of the present application shall also fall within the protection scope of the present application and is incorporated herein by reference.
S110: Determine the category of the first content.
In some optional embodiments, this step may be implemented through steps S112 to S116, where:
S112: Identify at least a portion of the first content.
As an example, this step may identify at least a portion of the first content by identifying HTML tags, based on the HTML source code displayed by the browser. Because HTML tags are predefined, the tags make it possible to tell whether the content within a tag is text, video, or a title. A web page may contain pictures, text, and so on; through this step it is possible to identify which part is a picture, which part is text, and which part is video.
Specifically, for example, the "<audio>" tag defines sound content, so identifying this tag shows that the content is audio. The "<canvas>" tag defines graphics, so identifying this tag shows that the content is a graphic. The "<video>" tag defines video, so identifying this tag shows that the content is video. The "<img>" tag defines an image, so identifying this tag shows that the content is an image. The "<body>" tag defines the content displayed by the web page. The "<head>" tag defines information such as the title, character format, language, compatibility, keywords, and description.
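By way of illustration only, tag-based identification of this kind might be sketched in Python with the standard-library HTML parser. The mapping `TAG_TYPES`, the set `SKIPPED`, and the names `ContentTypeFinder` and `identify_content_types` are assumptions introduced for this sketch; the tag meanings follow the examples given above.

```python
from html.parser import HTMLParser

# Tag -> content type, following the examples in the text.
TAG_TYPES = {"audio": "audio", "canvas": "graphic", "video": "video", "img": "image"}
# Elements whose text is not treated as displayed text in this sketch.
SKIPPED = {"script", "style", "title"}


class ContentTypeFinder(HTMLParser):
    """Records which kinds of content appear in an HTML document."""

    def __init__(self):
        super().__init__()
        self.found = set()
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self._skip_depth += 1
        elif tag in TAG_TYPES:
            self.found.add(TAG_TYPES[tag])

    def handle_endtag(self, tag):
        if tag in SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Non-blank character data outside skipped elements counts as text.
        if not self._skip_depth and data.strip():
            self.found.add("text")


def identify_content_types(html):
    finder = ContentTypeFinder()
    finder.feed(html)
    finder.close()
    return finder.found
```

A set is returned because one page may contain several kinds of content at once, as the text notes.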
Those skilled in the art will understand that the above examples are not exhaustive. Any existing or future way of identifying at least a portion of the first content that is applicable to the embodiments of the present application shall also fall within the protection scope of the present application and is incorporated herein by reference.
S114: Extract at least a portion of second content from the identified at least a portion of the first content.
This step may extract part of the content according to predefined HTML tags, for use in displaying the content of the URL to be displayed.
In some optional embodiments, the step of extracting at least a portion of the second content from the identified at least a portion of the first content may specifically include:
S1142: Perform XML serialization on at least a portion of the first content.
Here, serialization refers to the process of converting a data structure or object into a binary string.
The serialization process is described in detail below using a preferred embodiment.
Step a1: Read the entire HTML code of the resource content using the OpenRead method of the WebClient service in C#.NET.
The WebClient service allows Win32 applications to access documents on the Internet. The service extends the networking capabilities of Windows: it allows standard Win32 applications to create, read, and write files on Internet file servers by using WebDAV (a file access protocol described in XML). The OpenRead method is a method of the WebClient service for reading resources.
Step a2: Delete irrelevant code.
Because scripts cannot be parsed as XML, all scripts between <Script> and </Script> in the code are deleted. In other words, scripts that cannot be parsed as XML are the irrelevant code.
Step a3: Delete the ordinary irrelevant items in the HTML code.
Ordinary irrelevant items are content in the HTML code that is unrelated to serialization; deleting them does not affect the serialization of the HTML code.
For example, the content between <title> and </title>, and the content in <meta name="keywords" content=""/>, may be deleted.
Step a4: Tag the entire HTML code.
The process of tagging the entire HTML code is as follows: add a start tag at the beginning of the HTML code, turn all of the HTML code into tags, and then add an end tag at the end of the HTML code.
Step a5: Use Microsoft's XML serialization method to XML-serialize the entire HTML code and generate an XML file.
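Steps a2 to a5 might be approximated as in the following Python sketch. It substitutes Python's `re` and `xml.etree.ElementTree` modules for the C#.NET facilities named above, and it assumes that the HTML code has already been read (step a1) and is well formed enough to parse as XML once scripts are removed. The function name `html_to_xml` and the wrapper element `root` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def html_to_xml(html):
    """Strip irrelevant items, wrap the code in one root tag, and serialize."""
    # Step a2: delete <script>...</script> blocks, which cannot be parsed as XML.
    html = re.sub(r"<script\b[^>]*>.*?</script\s*>", "", html, flags=re.I | re.S)
    # Step a3: delete ordinary irrelevant items (title and keywords meta tag).
    html = re.sub(r"<title\b[^>]*>.*?</title\s*>", "", html, flags=re.I | re.S)
    html = re.sub(r"<meta\b[^>]*name\s*=\s*[\"']keywords[\"'][^>]*>", "", html,
                  flags=re.I)
    # Step a4: add a start tag and an end tag around the whole code.
    wrapped = "<root>" + html + "</root>"
    # Step a5: parse into a tree and serialize it back out as XML text.
    tree_root = ET.fromstring(wrapped)  # raises ParseError on malformed input
    return ET.tostring(tree_root, encoding="unicode")
```

Real-world HTML is frequently not well-formed XML (unclosed tags, a DOCTYPE declaration, unescaped characters), so a practical implementation would need a more tolerant conversion than this sketch provides.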
Those skilled in the art will understand that the above examples are not exhaustive. Any existing or future way of performing XML serialization on at least a portion of the first content that is applicable to the embodiments of the present application shall also fall within the protection scope of the present application and is incorporated herein by reference.
S1144: Extract at least a portion of XML node information from the XML file generated by the XML serialization, to obtain at least a portion of the second content.
An XML file is made up of nodes; every component of an XML file is a node. For example, the whole file is a document node, each XML tag is an element node, the text contained in an XML element is a text node, and each XML attribute is an attribute node. A node is the smallest unit of valid, complete structure in an XML file.
Because the XML DOM (XML Document Object Model) defines a standard way of accessing and manipulating XML files, a DOM tree corresponding to the XML file can be built according to the XML DOM. The DOM tree includes a root node and multiple child nodes. In other words, the XML DOM allows the XML file to be viewed as a tree structure, and all nodes in the XML file can be accessed by traversing the DOM tree.
In a specific implementation, XML node information may be extracted from the generated XML file by means of a Get function, or XPath (XML Path Language) may be used to extract the XML node information.
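As a hedged illustration of the XPath option, Python's `xml.etree.ElementTree` supports a limited subset of XPath through `findall`. The helper name `extract_nodes` and the tuple layout it returns are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET


def extract_nodes(xml_text, path):
    """Return (tag, attributes, text) for every element matching `path`."""
    root = ET.fromstring(xml_text)
    return [
        (element.tag, dict(element.attrib), (element.text or "").strip())
        for element in root.findall(path)  # matches come back in document order
    ]
```

For example, a path such as `.//video` selects every `video` element at any depth below the root.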
The process of extracting XML node information (for example, video) is described in detail below by way of example.
Step b1: Search recursively from the root node of the DOM tree to find the smallest nodes.
Here, a smallest node is the most basic unit, an Element, that is, a node with no child nodes.
The recursive search from the root node of the DOM tree may be performed as follows: according to a preset recursive algorithm, search recursively from the root node of the DOM tree to find the smallest nodes.
Step b2: Parse each smallest node with an XML parser to obtain string data.
Step b3: Pass the string data to the video attribute information class VideoInfo, where the VideoInfo class is a tool for parsing video data.
Step b4: By extracting through two successive layers, locate the string data in each video class, that is, determine the video category to which the string data belongs.
Step b5: Pass the string data in each video class to the dynamic array (ArrayList) data structure in the search video information (Search Video Info).
Step b6: Associate the string data in the ArrayList data structure with the corresponding Adapter data to obtain the node information of the smallest nodes, thereby extracting the XML node information.
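Steps b1 and b2 can be illustrated with a short recursive traversal. The Python sketch below covers only the search for nodes without children and the reading of their string data; the VideoInfo, ArrayList, and Adapter handling of steps b3 to b6 is not modelled, and the function names are assumptions.

```python
import xml.etree.ElementTree as ET


def find_leaf_nodes(node):
    """Step b1: recursively collect the elements that have no child elements."""
    children = list(node)
    if not children:
        return [node]
    leaves = []
    for child in children:
        leaves.extend(find_leaf_nodes(child))
    return leaves


def leaf_strings(xml_text):
    """Step b2: parse the XML and return (tag, text) for every leaf element."""
    root = ET.fromstring(xml_text)
    return [(leaf.tag, (leaf.text or "").strip()) for leaf in find_leaf_nodes(root)]
```

Very deep documents could exceed Python's recursion limit; an explicit stack would avoid that at the cost of a slightly longer function.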
Those skilled in the art will understand that the above examples are not exhaustive. Any existing or future way of extracting at least a portion of XML node information that is applicable to the embodiments of the present application shall also fall within the protection scope of the present application and is incorporated herein by reference.
S116: Determine the category of at least a portion of the second content, and take the determined category as the category of the first content.
In some optional embodiments, the step of determining the category of at least a portion of the second content may specifically include: determining the category of at least a portion of the second content by using the domain name and the HTML tags in the URL to be displayed.
In some preferred embodiments, the step of determining the category of at least a portion of the second content by using the domain name and the HTML tags in the URL to be displayed may specifically include:
Step c1: Match the domain name in the URL to be displayed against a predetermined domain name list. If the matching succeeds, perform steps c2 to c4; if the matching fails, perform steps c5 and c6. The predetermined domain name list includes predetermined domain names and first category information corresponding to the predetermined domain names.
Step c2: Extract the first category information.
Step c3: Identify the HTML tags in the URL to be displayed, and determine second category information.
Step c4: Determine the category of at least a portion of the second content according to the first category information and the second category information.
Step c5: Identify the HTML tags in the URL to be displayed, and determine third category information.
Step c6: Determine the category of at least a portion of the second content according to the third category information.
In a specific implementation of this step, the domain name may be identified from the string of the URL to be displayed and matched against the predetermined domain name list.
If the matching succeeds, the first category information is extracted from the attribute information of that domain name in the domain name list, and the HTML tags in the URL to be displayed are identified to determine the second category information. Finally, the category of at least a portion of the second content is determined according to the first category information taken from the attribute information of the domain name and the second category information determined by identifying the HTML tags in the URL to be displayed.
If the matching fails, the HTML tags in the URL to be displayed are identified, third category information is determined, and the category of at least a portion of the second content is determined according to the third category information. Determining the category according to the third category information may consist of taking the category corresponding to the third category information as the category of at least a portion of the second content. Thus, if the matching fails, the category of at least a portion of the second content can be judged using only the HTML tags in the URL to be displayed.
Taking the Youku website as an example, the domain name in the URL string is identified and matched against the domain name list. If the matching succeeds and the URL is determined to belong to the Youku website, the category information in the attribute information related to that domain name is extracted from the predetermined domain name list according to the domain name in the URL. The predefined category information in the attribute information corresponding to the domain name in the domain name list indicates that the content corresponding to this domain name is mainly video. The category information "video" is extracted, and the HTML tags are then identified; if identification shows that the content contains text, the category of this content can be determined to be video combined with text. If the matching fails, the HTML tags are identified; if identification shows that the content contains text, the category of this content can be determined to be text.
As another example, for the Taobao website, the domain name is extracted from the URL string and compared with the domain name list. If the matching succeeds and the URL is determined to belong to the Taobao website, the predefined category information "mixed image-text layout" is extracted from the attribute information of that domain name in the predetermined domain name list. The HTML tags are then identified; if identification shows that the content also includes video, the category of this content is determined to be mixed image-text layout combined with video. If the matching fails, the HTML tags are identified; if identification shows that the content includes video, the category of this content is determined to be video.
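The matching and fallback logic of steps c1 to c6 might be sketched as follows. The table `DOMAIN_CATEGORIES`, its two entries, and the category names are hypothetical stand-ins for the predetermined domain name list, and `tag_categories` is assumed to be the set of categories already identified from the HTML tags of the resource.

```python
from urllib.parse import urlsplit

# Hypothetical predetermined list: domain name -> first category information.
DOMAIN_CATEGORIES = {
    "www.youku.com": "video",
    "www.taobao.com": "mixed image-text",
}


def classify(url, tag_categories):
    """Return the list of categories for the content behind `url`."""
    domain = urlsplit(url).hostname                 # step c1: identify the domain
    first = DOMAIN_CATEGORIES.get(domain)
    if first is not None:
        # Steps c2 to c4: combine the domain category with the tag categories.
        return [first] + sorted(c for c in tag_categories if c != first)
    # Steps c5 and c6: matching failed, so rely on the tag categories alone.
    return sorted(tag_categories)
```

With these assumed entries, `classify("http://www.youku.com/v/1", {"text"})` yields video combined with text, mirroring the Youku example above, while an unlisted domain with the same tags yields text alone.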
In the above embodiments, a URL is composed of: protocol://domain name:port/virtual directory/file name?parameters#anchor. An IP address may be used as the domain name. Taking http://www.aspxfans.com:8080/news/index.asp?boardID=5&ID=24618&page=1#name as an example, www.aspxfans.com is the domain name.
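For illustration, the decomposition of the example URL can be reproduced with Python's standard `urllib.parse` module. The function name `url_components` and the dictionary keys are assumptions chosen to mirror the component names used above.

```python
from urllib.parse import parse_qs, urlsplit


def url_components(url):
    """Split a URL into the components named in the text."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "domain": parts.hostname,
        "port": parts.port,
        "path": parts.path,                  # virtual directory and file name
        "parameters": parse_qs(parts.query),
        "anchor": parts.fragment,
    }


example = "http://www.aspxfans.com:8080/news/index.asp?boardID=5&ID=24618&page=1#name"
components = url_components(example)
# components["domain"] is "www.aspxfans.com" and components["port"] is 8080.
```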
It should be noted that, in the category determination described above, the step of identifying the domain name and the step of determining the content category from HTML tags may be performed simultaneously or in sequence. If they are performed in sequence, the embodiments of the present application do not limit the order: the category of the content may first be identified from the HTML tags and the category information then extracted via the domain name, or the category information may first be extracted via the domain name and the category of the content then identified from the HTML tags.
Those skilled in the art will understand that the above examples are not exhaustive. Any existing or future way of classifying at least a part of the second content that is applicable to the embodiments of the present application also falls within the scope of protection of the present application and is hereby incorporated by reference.
S120: Determine a content presentation form according to the category.
In some optional embodiments, this step may specifically include: determining the content presentation form according to the category and the domain name in the URL to be displayed, and based on at least a part of the second content.
For example, if the category is determined to be mixed image and text, the domain name is www.taobao.com, and the obtained second content is image and text content from the Taobao website, then a structure of a large image plus text, combined with that image and text content, may be used as the content presentation form.
As another example, if the category is determined to be mixed image and text, the domain name is www.qq.com, and the obtained second content is image and text content from the Tencent website, then a structure of a small image plus text, combined with that image and text content, may be used as the content presentation form.
When determining the content presentation form, the embodiments of the present application may adopt any one or more of the following structural forms: a large image with text, a small image with text, a large window with text, a small window with text, an image with audio, and so on. The layout of the structural form may be top-bottom, left-right, or diagonal, but is by no means limited to these. Those skilled in the art will understand that any existing or future content presentation form that is applicable to the embodiments of the present application also falls within the scope of protection of the present application and is hereby incorporated by reference.
Those skilled in the art will understand that the above examples are not exhaustive. Any existing or future way of determining the content presentation form according to the category that is applicable to the embodiments of the present application also falls within the scope of protection of the present application and is hereby incorporated by reference.
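One possible way to express step S120 in code is a lookup keyed on the category and the domain name, as in the minimal sketch below. The `Presentation` type, the `RULES` table, the fallback value `DEFAULT_STRUCTURE`, and the function name `choose_presentation` are hypothetical; only the two rules for www.taobao.com and www.qq.com come from the examples above.

```python
from dataclasses import dataclass


@dataclass
class Presentation:
    structure: str   # e.g. "large image with text"
    layout: str      # e.g. "top-bottom", "left-right", "diagonal"
    content: dict    # the second content used to fill the structure


# (category, domain name) -> structural form, following the two examples in the text.
RULES = {
    ("image-text", "www.taobao.com"): "large image with text",
    ("image-text", "www.qq.com"): "small image with text",
}

# Assumed fallback when no rule applies; the application does not specify one.
DEFAULT_STRUCTURE = "plain URL string"


def choose_presentation(category, domain, second_content, layout="top-bottom"):
    """Pick a structural form from the category and domain, then attach the content."""
    structure = RULES.get((category, domain), DEFAULT_STRUCTURE)
    return Presentation(structure=structure, layout=layout, content=second_content)
```

A table-driven rule set is used here only because it keeps the per-domain choices easy to extend; any other decision logic would serve equally well.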
S130: Display the URL to be displayed according to the content presentation form.
For example, through this step, the URL string to be displayed can be displayed in a large-image-and-text structure with a particular layout. The layout may be top-bottom, left-right, or diagonal, but is by no means limited to these.
In practice, the content presentation form used for display may be selected manually by the user or pushed automatically by the backend.
Compared with existing methods that offer no choice of style, the embodiments of the present application can intelligently recommend multiple presentation styles, helping the author make the article more expressive.
The embodiments of the present application acquire first content, where the first content is at least a part of the resource content corresponding to the URL to be displayed; then determine the category of the first content; then determine a content presentation form according to the category; and finally display the URL to be displayed according to the content presentation form. The content presentation form is thus determined by judging the category of the resource content corresponding to the URL to be displayed, and the URL is finally displayed in that form. For example, the URL to be displayed may be shown below the URL string in a presentation form of mixed image and text combined with a small-window video; as another example, it may be shown above the URL string in a presentation form of a large-window video combined with text. This beautifies the URL to be displayed, solves the technical problem of how to improve the expressiveness of a URL, and makes the article more expressive, which can make the article more attractive to users and in turn increase the click-through rate.
The present application is described in detail below through a preferred embodiment with reference to FIG. 2.
S200: Acquire the first content in any one or more of the following ways: a crawler tool, an ASP web scraping tool, or a Java-, PHP-, Delphi-, Python-, Flex-, or Ruby-based approach; where the first content is at least a part of the resource content corresponding to the URL.
S210: Identify at least a part of the first content.
S220: Perform XML serialization on at least a part of the first content.
S230: Extract at least a part of the XML node information from the XML file generated by the XML serialization, to obtain at least a part of the second content.
S240: Using the domain name in the URL to be displayed and the HTML tags, determine the category of the at least a part of the second content, and take the determined category as the category of the first content.
S250: Determine the content presentation form according to the category and the domain name in the URL to be displayed, and based on the at least a part of the second content.
S260: Display the URL to be displayed according to the content presentation form.
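Steps S220 and S230 above may be sketched, under stated assumptions, with the Python standard library. The sketch assumes that the identified first content is available as a flat mapping of field names to strings and that each field name is a valid XML element name; the root element name `resource` and the function names `serialize_to_xml` and `extract_nodes` are hypothetical.

```python
import xml.etree.ElementTree as ET


def serialize_to_xml(first_content):
    """S220: serialize the identified first content into an XML document (as a string)."""
    root = ET.Element("resource")
    for name, value in first_content.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


def extract_nodes(xml_text, wanted):
    """S230: keep only the XML nodes of interest; the result is the second content."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root if child.tag in wanted}


# Usage: keep the title and image, drop everything else.
xml_text = serialize_to_xml({
    "title": "Demo page",
    "image": "http://example.com/a.png",
    "script": "var x;",
})
second_content = extract_nodes(xml_text, {"title", "image"})
```

The extracted `second_content` is what the later steps (S240 to S260) would classify and lay out.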
This preferred embodiment beautifies the URL and helps readers obtain more information without clicking the link. It thereby improves the expressiveness of the URL and makes the article more expressive, which can make the article more attractive to users and in turn increase the click-through rate.
In addition, to help authors make their articles more expressive, the embodiments of the present application further provide an information expression method. The method may include any of the URL display method embodiments described above, and may specifically include: acquiring an article to be processed, and displaying the URLs in the article to be processed by any of the URL display methods described above.
In the embodiments of the present application, an article to be processed is acquired, and the URLs in the article are then displayed by the URL display method. Because the URL display method can accurately present the resource content corresponding to a URL, the expressiveness of the URL is improved and the article becomes more expressive, which can make it more attractive to users and increase the click-through rate.
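A minimal sketch of the information expression method follows, assuming the article is plain text and that a caller supplies the URL display method as a function. The regular expression `URL_PATTERN` is a deliberately simple assumption and is not a complete URL grammar; `express_article` and `display_url` are hypothetical names.

```python
import re

# Simplified pattern: an http(s) URL runs until whitespace, a quote, or an angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def express_article(article, display_url):
    """Replace every URL in the article with the output of the URL display method."""
    return URL_PATTERN.sub(lambda match: display_url(match.group(0)), article)


# Usage with a stand-in display function that wraps the URL in a card marker.
result = express_article(
    "See http://www.qq.com/a for details",
    lambda url: f"[card:{url}]",
)
```

In a fuller implementation, `display_url` would run the acquisition, classification, and presentation steps described for the URL display method.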
For a description of this embodiment, reference may be made to the description of any of the URL display method embodiments above; details are not repeated here.
It should be noted that the steps in the URL display method embodiments and the information expression method embodiments above may be performed sequentially or in parallel, which is not limited here.
Based on the same technical concept as the URL display method embodiments, the embodiments of the present application further provide a uniform resource locator (URL) display device. The device embodiment can perform the method embodiments described above. As shown in FIG. 3, the device 30 may include an acquisition module 32, a first determination module 34, a second determination module 36, and a display module 38. The acquisition module 32 is configured to acquire first content, where the first content is at least a part of the resource content corresponding to the URL to be displayed. The first determination module 34 is configured to determine the category of the first content. The second determination module 36 is configured to determine a content presentation form according to the category. The display module 38 is configured to display the URL to be displayed according to the content presentation form.
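The module division of the device 30 can be pictured, purely as an illustration, as four pluggable callables wired together in sequence. The class name `UrlDisplayDevice`, the field names, and the string-based signatures are assumptions made for this sketch; the application leaves the internal interfaces of the modules open.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class UrlDisplayDevice:
    acquire: Callable[[str], str]             # acquisition module 32: URL -> first content
    determine_category: Callable[[str], str]  # first determination module 34
    determine_form: Callable[[str], str]      # second determination module 36
    display: Callable[[str, str], str]        # display module 38: (URL, form) -> rendering

    def run(self, url):
        """Run the four modules in order for one URL to be displayed."""
        first_content = self.acquire(url)
        category = self.determine_category(first_content)
        form = self.determine_form(category)
        return self.display(url, form)


# Usage with stand-in modules.
device = UrlDisplayDevice(
    acquire=lambda url: "<p>text</p>",
    determine_category=lambda content: "text",
    determine_form=lambda category: "small image with text",
    display=lambda url, form: f"{form}: {url}",
)
rendered = device.run("http://www.qq.com/")
```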
In some optional embodiments, on the basis of the embodiment shown in FIG. 3, the acquisition module may specifically include: a URL acquisition subunit, a crawling unit, and a first content acquisition subunit. The URL acquisition subunit is configured to acquire the URL to be displayed. The crawling unit is configured to crawl the resource content corresponding to the URL to be displayed in any one or more of the following ways: a crawler tool, an ASP web scraping tool, or a Java-, PHP-, Delphi-, Python-, Flex-, or Ruby-based approach. The first content acquisition subunit is configured to acquire at least a part of the resource content as the first content.
In some optional embodiments, on the basis of the embodiment shown in FIG. 3, the first determination module may specifically include: an identification unit, an extraction unit, and a classification unit. The identification unit is configured to identify at least a part of the first content. The extraction unit is configured to extract at least a part of the second content from the identified at least a part of the first content. The classification unit is configured to determine the category of the at least a part of the second content and to take the determined category as the category of the first content.
In some optional embodiments, on the basis of the above embodiment, the extraction unit may specifically include: a serialization unit and an extraction subunit. The serialization unit is configured to perform XML serialization on at least a part of the first content. The extraction subunit is configured to extract at least a part of the XML node information from the XML file generated by the XML serialization, to obtain the at least a part of the second content.
In some optional embodiments, on the basis of the above embodiment, the classification unit may specifically include a classification subunit. The classification subunit is configured to determine the category of the at least a part of the second content by using the domain name and the HTML tags in the URL to be displayed.
In some preferred embodiments, the classification subunit further specifically includes: a matching unit, a category extraction unit, a category determination unit, and a content classification unit. The matching unit is configured to match the domain name in the URL to be displayed against a predetermined domain name list, where the predetermined domain name list includes predetermined domain names and first category information corresponding to the predetermined domain names. The category extraction unit is configured to extract the first category information if the match succeeds. The category determination unit is configured to identify the HTML tags in the URL to be displayed and determine second category information. The content classification unit is configured to determine the category of the at least a part of the second content according to the first category information and the second category information.
On the basis of the above preferred embodiment, the URL display device may further include an execution unit. The execution unit is configured to, if the match fails, identify the HTML tags in the URL to be displayed, determine third category information, and determine the category of the at least a part of the second content according to the third category information.
In some optional embodiments, on the basis of the above embodiment, the second determination module may specifically include a determination subunit. The determination subunit is configured to determine the content presentation form according to the category and the domain name in the URL to be displayed, and based on at least a part of the second content.
In addition, to help authors make their articles more expressive, the embodiments of the present application further provide an information expression system. As shown in FIG. 4, the system may include a URL display device and an expression module. The expression module is configured to acquire an article to be processed and to display the URLs in the article to be processed by means of the URL display device.
For a description of this embodiment, reference may be made to the description of any of the URL display method embodiments above; details are not repeated here.
Those skilled in the art will understand that the URL display device embodiments and the information expression system embodiments above may also include well-known structures such as a processor, a memory, and a bus. The processor is connected to the memory through the bus. The processor includes, but is not limited to, an ARM processor, a programmable logic device, or a DSP. The memory may be a random access memory or a read-only memory. The bus may include a data bus, an address bus, and a control bus.
It should be noted that the module division in the URL display device embodiments and the information expression system embodiments above is merely exemplary. Those skilled in the art will understand that the modules may be divided in other ways, and that the divided modules may be further split or combined; moreover, the modules may operate sequentially or in parallel, which is not limited here.
The embodiments of the present application further provide an electronic device which, as shown in FIG. 5, includes a processor 501, a communication interface 502, a memory 503, and a communication bus 504, where the processor 501, the communication interface 502, and the memory 503 communicate with one another through the communication bus 504;
the memory 503 is configured to store a computer program;
the processor 501 is configured to implement the steps of the uniform resource locator (URL) display method described above when executing the program stored in the memory 503, the method including:
acquiring first content, where the first content is at least a part of the resource content corresponding to a URL to be displayed;
determining a category of the first content;
determining a content presentation form according to the category;
displaying the URL to be displayed according to the content presentation form.
In one implementation of the present application, acquiring the first content specifically includes:
acquiring the URL to be displayed;
crawling the resource content corresponding to the URL to be displayed in any one or more of the following ways: a crawler tool, an ASP web scraping tool, or a Java-, PHP-, Delphi-, Python-, Flex-, or Ruby-based approach;
acquiring at least a part of the resource content as the first content.
In one implementation of the present application, determining the category of the first content specifically includes:
identifying at least a part of the first content;
extracting at least a part of the second content from the identified at least a part of the first content;
determining the category of the at least a part of the second content, and taking the determined category as the category of the first content.
In one implementation of the present application, extracting at least a part of the second content from the identified at least a part of the first content specifically includes:
performing XML serialization on the at least a part of the first content;
extracting at least a part of the XML node information from the XML file generated by the XML serialization, to obtain the at least a part of the second content.
In one implementation of the present application, determining the category of the at least a part of the second content specifically includes:
determining the category of the at least a part of the second content by using the domain name and the HTML tags in the URL to be displayed.
In one implementation of the present application, determining the category of the at least a part of the second content by using the domain name and the HTML tags in the URL to be displayed specifically includes:
matching the domain name in the URL to be displayed against a predetermined domain name list, where the predetermined domain name list includes predetermined domain names and first category information corresponding to the predetermined domain names;
if the match succeeds, extracting the first category information;
identifying the HTML tags in the URL to be displayed, and determining second category information;
determining the category of the at least a part of the second content according to the first category information and the second category information.
In one implementation of the present application, the method further includes:
if the match fails, identifying the HTML tags in the URL to be displayed, and determining third category information;
determining the category of the at least a part of the second content according to the third category information.
In one implementation of the present application, determining the content presentation form according to the category specifically includes:
determining the content presentation form according to the category and the domain name in the URL to be displayed, and based on the at least a part of the second content.
The embodiments of the present application acquire first content, where the first content is at least a part of the resource content corresponding to the URL to be displayed; determine the category of the first content; determine a content presentation form according to the category; and display the URL to be displayed according to the content presentation form. This technical solution determines the content presentation form by judging the category of the resource content corresponding to the URL, and then displays the URL in that form. Compared with directly displaying the characters of the URL, the technical solution provided by the embodiments of the present application displays the URL according to a content presentation form, which beautifies the URL, improves its expressiveness, and makes the article more expressive; this can make the article more attractive to users and in turn increase the click-through rate. Compared with an approach in which the author subjectively summarizes the resource content corresponding to the URL and then displays the URL through that textual summary, the content shown by the present technical solution is obtained objectively. That is, the embodiments of the present application objectively acquire the first content, determine its category, determine the content presentation form according to that category, and finally display the URL objectively and faithfully in that form. The resource content corresponding to the URL can therefore be presented accurately, which improves the expressiveness of the URL, makes the article more expressive, and can make it more attractive to users and increase the click-through rate.
The embodiments of the present application further provide an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement the steps of the information expression method when executing the program stored in the memory, the method including:
acquiring an article to be processed, and displaying the URLs in the article to be processed by any of the URL display methods described above.
In the embodiments of the present application, an article to be processed is acquired, and the URLs in the article are then displayed by the URL display method. Because the URL display method can accurately present the resource content corresponding to a URL, the expressiveness of the URL is improved and the article becomes more expressive, which can make it more attractive to users and increase the click-through rate.
The embodiments of the present application further disclose a storage medium configured to store an application program, where the application program is configured to perform, when run, the steps of the uniform resource locator (URL) display method, the method including:
acquiring first content, where the first content is at least a part of the resource content corresponding to a URL to be displayed;
determining a category of the first content;
determining a content presentation form according to the category;
displaying the URL to be displayed according to the content presentation form.
In one implementation of the present application, acquiring the first content specifically includes:
acquiring the URL to be displayed;
crawling the resource content corresponding to the URL to be displayed in any one or more of the following ways: a crawler tool, an ASP web scraping tool, or a Java-, PHP-, Delphi-, Python-, Flex-, or Ruby-based approach;
acquiring at least a part of the resource content as the first content.
In one implementation of the present application, determining the category of the first content specifically includes:
identifying at least a part of the first content;
extracting at least a part of the second content from the identified at least a part of the first content;
determining the category of the at least a part of the second content, and taking the determined category as the category of the first content.
In one implementation of the present application, extracting at least a part of the second content from the identified at least a part of the first content specifically includes:
performing XML serialization on the at least a part of the first content;
extracting at least a part of the XML node information from the XML file generated by the XML serialization, to obtain the at least a part of the second content.
In one implementation of the present application, determining the category of the at least a part of the second content specifically includes:
determining the category of the at least a part of the second content by using the domain name and the HTML tags in the URL to be displayed.
In one implementation of the present application, determining the category of the at least a part of the second content by using the domain name and the HTML tags in the URL to be displayed specifically includes:
matching the domain name in the URL to be displayed against a predetermined domain name list, where the predetermined domain name list includes predetermined domain names and first category information corresponding to the predetermined domain names;
if the match succeeds, extracting the first category information;
identifying the HTML tags in the URL to be displayed, and determining second category information;
determining the category of the at least a part of the second content according to the first category information and the second category information.
In one implementation of the present application, the method further includes:
if the match fails, identifying the HTML tags in the URL to be displayed, and determining third category information;
determining the category of the at least a part of the second content according to the third category information.
In one implementation of the present application, determining the content presentation form according to the category specifically includes:
determining the content presentation form according to the category and the domain name in the URL to be displayed, and based on the at least a part of the second content.
The embodiments of the present application acquire first content, where the first content is at least a part of the resource content corresponding to the URL to be displayed; determine the category of the first content; determine a content presentation form according to the category; and display the URL to be displayed according to the content presentation form. This technical solution determines the content presentation form by judging the category of the resource content corresponding to the URL, and then displays the URL in that form. Compared with directly displaying the characters of the URL, the technical solution provided by the embodiments of the present application displays the URL according to a content presentation form, which beautifies the URL, improves its expressiveness, and makes the article more expressive; this can make the article more attractive to users and in turn increase the click-through rate. Compared with an approach in which the author subjectively summarizes the resource content corresponding to the URL and then displays the URL through that textual summary, the content shown by the present technical solution is obtained objectively. That is, the embodiments of the present application objectively acquire the first content, determine its category, determine the content presentation form according to that category, and finally display the URL objectively and faithfully in that form. The resource content corresponding to the URL can therefore be presented accurately, which improves the expressiveness of the URL, makes the article more expressive, and can make it more attractive to users and increase the click-through rate.
The embodiments of the present application further disclose a storage medium for storing an application program, where the application program, when run, executes the steps of an information expression method, the method including:
acquiring an article to be processed, and displaying the URLs in the article to be processed by using any of the URL display methods described above.
In the embodiments of the present application, an article to be processed is acquired, and the URLs in the article are then displayed by using the URL display method. Because the URL display method can accurately present the resource content corresponding to a URL, the expressiveness of the URL is improved and the presentation of the article is further enhanced, which can in turn increase the article's appeal to users and the click-through rate.
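The information expression step can be illustrated with a minimal Python sketch. It only shows how URLs in an article might be located and handed to a URL display routine; the regular expression, the trailing-punctuation handling, and the names `express_article` and `display_url` are assumptions made for illustration and do not come from the application.

```python
import re
from typing import Callable

# Assumed, deliberately simple pattern for http(s) URLs in plain text.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def express_article(article: str, display_url: Callable[[str], str]) -> str:
    """Replace every URL in `article` with whatever `display_url` returns for it."""

    def replace(match: "re.Match[str]") -> str:
        url = match.group(0)
        # Keep sentence punctuation that happens to follow the URL out of the URL.
        trimmed = url.rstrip(".,;:!?)")
        return display_url(trimmed) + url[len(trimmed):]

    return URL_PATTERN.sub(replace, article)
```

Here `display_url` stands in for any URL display method; a caller would pass a function that maps a URL to its rendered presentation.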
The embodiments of the present application disclose an application program that, when run, executes a uniform resource locator (URL) display method, the method including:
acquiring first content, where the first content is at least a part of the resource content corresponding to a URL to be displayed;
determining a category of the first content;
determining a content presentation form according to the category;
displaying the URL to be displayed in the content presentation form.
In an implementation of the present application, acquiring the first content specifically includes:
acquiring the URL to be displayed;
fetching the resource content corresponding to the URL to be displayed in any one or more of the following ways: a crawler tool, an ASP web-scraping tool, Java, PHP, Delphi, Python, Flex, or Ruby;
taking at least a part of the resource content as the first content.
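The acquisition steps above can be sketched in Python, one of the fetching options the text lists. This is a minimal sketch under assumptions: the function names, the `max_bytes` cut-off used to take "at least a part" of the resource, and the UTF-8 decoding are illustrative choices, not details given in the application.

```python
from typing import Callable
from urllib.request import urlopen


def fetch_resource(url: str, timeout: float = 5.0) -> bytes:
    """Download the raw resource behind `url` using the standard library."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def get_first_content(
    url: str,
    fetch: Callable[[str], bytes] = fetch_resource,
    max_bytes: int = 65536,
) -> str:
    """Return a leading part of the resource content as the 'first content'.

    `max_bytes` is an assumed limit; the source only requires that the
    first content be at least a part of the resource content.
    """
    raw = fetch(url)
    return raw[:max_bytes].decode("utf-8", errors="replace")
```

The `fetch` parameter is injectable so that a crawler tool or another fetching mechanism could be substituted without changing the rest of the flow.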
In an implementation of the present application, determining the category of the first content specifically includes:
identifying at least a portion of the first content;
extracting at least a portion of second content from the identified at least a portion of the first content;
determining a category of the at least a portion of the second content, and taking the determined category as the category of the first content.
In an implementation of the present application, extracting at least a portion of the second content from the identified at least a portion of the first content specifically includes:
performing XML serialization on the at least a portion of the first content;
extracting at least a portion of the XML node information from the XML file generated by the XML serialization, to obtain the at least a portion of the second content.
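The serialization and extraction steps can be sketched with the Python standard library. The application does not say how the XML is produced or which nodes are kept, so everything here is an assumption for illustration: the HTML is converted to an element tree with `html.parser`, serialized with `xml.etree.ElementTree`, and the title, `meta` entries, and the set of tag names are taken as the second content.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Tags that never have a closing tag in HTML (assumed short list).
VOID_TAGS = {"meta", "link", "img", "br", "hr", "input", "source"}


class _TreeBuilder(HTMLParser):
    """Builds an element tree from HTML; a sketch, not a full HTML5 parser."""

    def __init__(self) -> None:
        super().__init__()
        self.root = ET.Element("document")
        self._stack = [self.root]

    def handle_starttag(self, tag, attrs):
        element = ET.SubElement(self._stack[-1], tag, {k: v or "" for k, v in attrs})
        if tag not in VOID_TAGS:
            self._stack.append(element)

    def handle_startendtag(self, tag, attrs):
        ET.SubElement(self._stack[-1], tag, {k: v or "" for k, v in attrs})

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag, ignoring stray end tags.
        for i in range(len(self._stack) - 1, 0, -1):
            if self._stack[i].tag == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        current = self._stack[-1]
        if len(current):  # text after a child belongs to that child's tail
            current[-1].tail = (current[-1].tail or "") + data
        else:
            current.text = (current.text or "") + data


def serialize_to_xml(first_content: str) -> str:
    """Serialize (a part of) the first content as an XML string."""
    builder = _TreeBuilder()
    builder.feed(first_content)
    builder.close()
    return ET.tostring(builder.root, encoding="unicode")


def extract_second_content(xml_text: str) -> dict:
    """Pick selected node information out of the serialized XML."""
    root = ET.fromstring(xml_text)
    info = {"title": None, "meta": {}, "tags": set()}
    title = root.find(".//title")
    if title is not None and title.text:
        info["title"] = title.text.strip()
    for node in root.iter():
        info["tags"].add(node.tag)
        if node.tag == "meta":
            key = node.get("property") or node.get("name")
            if key and node.get("content"):
                info["meta"][key] = node.get("content")
    return info
```

Real pages can contain markup that does not survive a round trip through XML (for example tag names that are not valid XML names), so a production implementation would need a more tolerant parser.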
In an implementation of the present application, determining the category of the at least a portion of the second content specifically includes:
determining the category of the at least a portion of the second content by using the domain name and the HTML tags in the URL to be displayed.
In an implementation of the present application, determining the category of the at least a portion of the second content by using the domain name and the HTML tags in the URL to be displayed specifically includes:
matching the domain name in the URL to be displayed against a predetermined domain name list, where the predetermined domain name list includes predetermined domain names and first category information corresponding to the predetermined domain names;
if the matching succeeds, extracting the first category information;
identifying the HTML tags in the URL to be displayed, and determining second category information;
determining the category of the at least a portion of the second content according to the first category information and the second category information.
In an implementation of the present application, the method further includes:
if the matching fails, identifying the HTML tags in the URL to be displayed, and determining third category information;
determining the category of the at least a portion of the second content according to the third category information.
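The two branches (domain match succeeds or fails) can be sketched as follows. The domain list, the tag-to-category table, and in particular the rule for combining the first and second category information are hypothetical; the application does not specify them. The sketch also assumes that the "HTML tags" are the tag names found in the fetched resource.

```python
from typing import Optional, Set
from urllib.parse import urlparse

# Hypothetical predetermined domain name list: domain -> first category information.
DOMAIN_CATEGORIES = {
    "video.example.com": "video",
    "music.example.com": "audio",
    "news.example.com": "article",
}

# Hypothetical tag table, checked in order: HTML tag name -> category information.
TAG_CATEGORIES = [("video", "video"), ("audio", "audio"), ("img", "image")]


def category_from_tags(tags: Set[str]) -> str:
    """Derive category information from the HTML tag names that were identified."""
    for tag, category in TAG_CATEGORIES:
        if tag in tags:
            return category
    return "text"  # assumed fallback when no media tag is present


def match_domain(url: str) -> Optional[str]:
    """Return the first category information if the domain is in the list."""
    domain = (urlparse(url).hostname or "").lower()
    return DOMAIN_CATEGORIES.get(domain)


def classify(url: str, tags: Set[str]) -> str:
    first = match_domain(url)
    tag_category = category_from_tags(tags)
    if first is None:
        # Match failed: the tag-derived (third) category information decides alone.
        return tag_category
    # Match succeeded: combine first and second category information.
    # Assumed rule: a specific tag-derived category wins, otherwise use the domain's.
    return tag_category if tag_category != "text" else first
```

Other combination rules (for example, always preferring the domain list) would fit the described steps equally well.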
In an implementation of the present application, determining the content presentation form according to the category specifically includes:
determining the content presentation form according to the category and the domain name in the URL to be displayed, and based on the at least a portion of the second content.
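As a rough illustration of this step, the sketch below builds a presentation form from the three inputs the text names: the category, the domain name, and the second content. The `PresentationForm` fields, the text icons, and the choice of the page title as the label are assumptions; the application leaves the concrete form open.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical icon per category; anything else falls back to a generic link icon.
ICONS = {"video": "[video]", "audio": "[audio]", "image": "[image]"}


@dataclass
class PresentationForm:
    icon: str    # chosen from the category
    label: str   # taken from the second content (here: the page title)
    source: str  # the domain name of the URL
    url: str     # the original URL, kept as the link target

    def render(self) -> str:
        """Render the form as plain text; a real editor would draw a card or link."""
        return f"{self.icon} {self.label} ({self.source})"


def choose_presentation(url: str, category: str, second_content: dict) -> PresentationForm:
    domain = urlparse(url).hostname or ""
    label = second_content.get("title") or url  # fall back to the raw URL
    return PresentationForm(ICONS.get(category, "[link]"), label, domain, url)
```

For example, a video page titled "Demo" on `video.example.com` would render as `[video] Demo (video.example.com)` instead of the raw URL string.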
In the embodiments of the present application, first content is acquired, where the first content is at least a part of the resource content corresponding to a URL to be displayed; a category of the first content is determined; a content presentation form is determined according to the category; and the URL to be displayed is displayed in the content presentation form. The technical solution determines the content presentation form by judging the category of the resource content corresponding to the URL, and then displays the URL through that content presentation form. Compared with directly displaying the characters of the URL, the technical solution provided by the embodiments of the present application displays the URL in the content presentation form and thereby beautifies it, which improves the expressiveness of the URL, enhances the presentation of the article, makes the article more attractive to users, and can in turn increase the users' click-through rate. Compared with an approach in which the author subjectively summarizes the resource content corresponding to the URL and then displays the URL by means of that textual summary, the content involved in the presentation under this technical solution is obtained objectively. That is, the embodiments of the present application objectively acquire the first content, determine its category, determine the content presentation form according to that category, and finally display the URL objectively and faithfully in that content presentation form. The resource content corresponding to the URL can therefore be presented accurately, which improves the expressiveness of the URL, enhances the presentation of the article, and can in turn increase its appeal to users and the click-through rate.
The embodiments of the present application disclose an application program that, when run, executes the steps of an information expression method, the method including:
acquiring an article to be processed, and displaying the URLs in the article to be processed by using any of the URL display methods described above.
In the embodiments of the present application, an article to be processed is acquired, and the URLs in the article are then displayed by using the URL display method. Because the URL display method can accurately present the resource content corresponding to a URL, the expressiveness of the URL is improved and the presentation of the article is further enhanced, which can in turn increase the article's appeal to users and the click-through rate.
It should also be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes that element.
The embodiments in this specification are described in a related manner; for parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the corresponding description of the method embodiments.
The above are merely preferred embodiments of the present application and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application falls within the scope of protection of the present application.

Claims (21)

  1. A uniform resource locator (URL) display method, characterized in that the method comprises:
    acquiring first content, wherein the first content is at least a part of the resource content corresponding to a URL to be displayed;
    determining a category of the first content;
    determining a content presentation form according to the category;
    displaying the URL to be displayed in the content presentation form.
  2. The method according to claim 1, characterized in that acquiring the first content specifically comprises:
    acquiring the URL to be displayed;
    fetching the resource content corresponding to the URL to be displayed in any one or more of the following ways: a crawler tool, an ASP web-scraping tool, Java, PHP, Delphi, Python, Flex, or Ruby;
    taking at least a part of the resource content as the first content.
  3. The method according to claim 1, characterized in that determining the category of the first content specifically comprises:
    identifying at least a portion of the first content;
    extracting at least a portion of second content from the identified at least a portion of the first content;
    determining a category of the at least a portion of the second content, and taking the determined category as the category of the first content.
  4. The method according to claim 3, characterized in that extracting at least a portion of the second content from the identified at least a portion of the first content specifically comprises:
    performing XML serialization on the at least a portion of the first content;
    extracting at least a portion of the XML node information from the XML file generated by the XML serialization, to obtain the at least a portion of the second content.
  5. The method according to claim 4, characterized in that determining the category of the at least a portion of the second content specifically comprises:
    determining the category of the at least a portion of the second content by using the domain name and the HTML tags in the URL to be displayed.
  6. The method according to claim 5, characterized in that determining the category of the at least a portion of the second content by using the domain name and the HTML tags in the URL to be displayed specifically comprises:
    matching the domain name in the URL to be displayed against a predetermined domain name list, wherein the predetermined domain name list includes predetermined domain names and first category information corresponding to the predetermined domain names;
    if the matching succeeds, extracting the first category information;
    identifying the HTML tags in the URL to be displayed, and determining second category information;
    determining the category of the at least a portion of the second content according to the first category information and the second category information.
  7. The method according to claim 6, characterized in that the method further comprises:
    if the matching fails, identifying the HTML tags in the URL to be displayed, and determining third category information;
    determining the category of the at least a portion of the second content according to the third category information.
  8. The method according to claim 3, characterized in that determining the content presentation form according to the category specifically comprises:
    determining the content presentation form according to the category and the domain name in the URL to be displayed, and based on the at least a portion of the second content.
  9. An information expression method, characterized in that the method comprises:
    acquiring an article to be processed, and displaying the URLs in the article to be processed by the URL display method according to any one of claims 1-8.
  10. A uniform resource locator (URL) display device, characterized in that the device comprises:
    an acquisition module, configured to acquire first content, wherein the first content is at least a part of the resource content corresponding to a URL to be displayed;
    a first determining module, configured to determine a category of the first content;
    a second determining module, configured to determine a content presentation form according to the category;
    a display module, configured to display the URL to be displayed in the content presentation form.
  11. The device according to claim 10, characterized in that the acquisition module specifically comprises:
    a URL acquisition subunit, configured to acquire the URL to be displayed;
    a fetching unit, configured to fetch the resource content corresponding to the URL to be displayed in any one or more of the following ways: a crawler tool, an ASP web-scraping tool, Java, PHP, Delphi, Python, Flex, or Ruby;
    a first content acquisition subunit, configured to take at least a part of the resource content as the first content.
  12. The device according to claim 10, characterized in that the first determining module specifically comprises:
    an identification unit, configured to identify at least a portion of the first content;
    an extraction unit, configured to extract at least a portion of second content from the identified at least a portion of the first content;
    a classification unit, configured to determine a category of the at least a portion of the second content, and to take the determined category as the category of the first content.
  13. The device according to claim 12, characterized in that the extraction unit specifically comprises:
    a serialization unit, configured to perform XML serialization on the at least a portion of the first content;
    an extraction subunit, configured to extract at least a portion of the XML node information from the XML file generated by the XML serialization, to obtain the at least a portion of the second content.
  14. The device according to claim 13, characterized in that the classification unit specifically comprises:
    a classification subunit, configured to determine the category of the at least a portion of the second content by using the domain name and the HTML tags in the URL to be displayed.
  15. The device according to claim 14, characterized in that the classification subunit specifically comprises:
    a matching unit, configured to match the domain name in the URL to be displayed against a predetermined domain name list, wherein the predetermined domain name list includes predetermined domain names and first category information corresponding to the predetermined domain names;
    a category extraction unit, configured to extract the first category information if the matching succeeds;
    a category determining unit, configured to identify the HTML tags in the URL to be displayed and determine second category information;
    a content classification unit, configured to determine the category of the at least a portion of the second content according to the first category information and the second category information.
  16. The device according to claim 15, characterized in that the device further comprises:
    an execution unit, configured to, if the matching fails, identify the HTML tags in the URL to be displayed, determine third category information, and determine the category of the at least a portion of the second content according to the third category information.
  17. The device according to claim 12, characterized in that the second determining module specifically comprises:
    a determining subunit, configured to determine the content presentation form according to the category and the domain name in the URL to be displayed, and based on the at least a portion of the second content.
  18. An information expression system, characterized in that the system comprises:
    the URL display device according to any one of claims 10-17;
    an expression module, configured to acquire an article to be processed and to display the URLs in the article to be processed by means of the URL display device.
  19. An electronic device, characterized by comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
    the memory is configured to store a computer program;
    the processor is configured to implement the method according to any one of claims 1-8, or the method according to claim 9, when executing the program stored in the memory.
  20. A storage medium, characterized in that the storage medium is configured to store an application program, and the application program is configured to execute, at runtime, the method according to any one of claims 1-8, or to implement the method according to claim 9.
  21. An application program, characterized in that the application program is configured to execute, at runtime, the method according to any one of claims 1-8, or to implement the method according to claim 9.
PCT/CN2018/088438 2017-05-26 2018-05-25 Uniform resource locator display method, information expression method and related product WO2018214964A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710385155.5A CN108959325B (en) 2017-05-26 2017-05-26 Uniform resource locator display method, information display method and related products thereof
CN201710385155.5 2017-05-26

Publications (1)

Publication Number Publication Date
WO2018214964A1 true WO2018214964A1 (en) 2018-11-29

Family

ID=64396255

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/088438 WO2018214964A1 (en) 2017-05-26 2018-05-25 Uniform resource locator display method, information expression method and related product

Country Status (2)

Country Link
CN (1) CN108959325B (en)
WO (1) WO2018214964A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GR1010585B (en) * 2022-11-10 2023-12-12 Παναγιωτης Τσαντιλας Web crawling and content summarization

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150356196A1 (en) * 2014-06-04 2015-12-10 International Business Machines Corporation Classifying uniform resource locators
CN105512149A (en) * 2014-10-14 2016-04-20 阿里巴巴集团控股有限公司 Method for updating and displaying keyword information of digital reading resource and relevant devices
CN106033428A (en) * 2015-03-11 2016-10-19 北大方正集团有限公司 A uniform resource locator selecting method and a uniform resource locator selecting device
CN106095453A (en) * 2016-06-16 2016-11-09 北京金山安全软件有限公司 Information display method and device and electronic equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544178B (en) * 2012-07-13 2019-04-12 百度在线网络技术(北京)有限公司 It is a kind of for providing the method and apparatus of reconstruction page corresponding with target pages
CN103258058B (en) * 2013-06-03 2016-09-21 贝壳网际(北京)安全技术有限公司 Page display method and system and browser
CN104468720B (en) * 2014-11-07 2019-04-26 广州市至德科技企业孵化器有限公司 A kind of determining preview link simultaneously provides it method of dynamic previewing information

Also Published As

Publication number Publication date
CN108959325A (en) 2018-12-07
CN108959325B (en) 2021-06-29

Similar Documents

Publication Publication Date Title
CN109033358B (en) Method for associating news aggregation with intelligent entity
US9529780B2 (en) Displaying content on a mobile device
US9330179B2 (en) Configuring web crawler to extract web page information
WO2015039586A1 (en) Method, apparatus and browser for webpage loading
US9614862B2 (en) System and method for webpage analysis
US8887039B2 (en) Web page based program versioning
WO2016173200A1 (en) Malicious website detection method and system
WO2015043383A1 (en) Webpage loading method and device and browser
WO2015196954A1 (en) Webpage element display method and browser device
US9934206B2 (en) Method and apparatus for extracting web page content
CN107153716B (en) Webpage content extraction method and device
WO2011017929A1 (en) Method and apparatus for positioning effective information quickly by mobile phone browser
WO2014153457A1 (en) Merging web page style addresses
US20180239834A1 (en) Data transmission method and device
CN111339456B (en) Preloading method and device
US20130268832A1 (en) Method and system for creating digital bookmarks
JP2008197877A (en) Security operation management system, method and program
CN109558123B (en) Method for converting webpage into electronic book, electronic equipment and storage medium
CN114003835A (en) Page rendering method, device, equipment and storage medium
WO2018214964A1 (en) Uniform resource locator display method, information expression method and related product
US9521182B1 (en) Systems and methods related to identifying authorship of internet content
JPH10289250A (en) System for url registration and display for www browser
CN110764994A (en) Page element packaging method and device, electronic equipment and storage medium
Behfarshad et al. Hidden-web induced by client-side scripting: An empirical study
CN110147477B (en) Data resource modeling extraction method, device and equipment of Web system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18805106

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC , EPO FORM 1205A DATED 05.03.2020.

122 Ep: pct application non-entry in european phase

Ref document number: 18805106

Country of ref document: EP

Kind code of ref document: A1