CN108959325A - URL display method, information expression method, and related products - Google Patents

URL display method, information expression method, and related products

Info

Publication number
CN108959325A
Authority
CN
China
Prior art keywords
content
url
classification
domain name
mode
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710385155.5A
Other languages
Chinese (zh)
Other versions
CN108959325B (en)
Inventor
周显
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Office Software Inc
Zhuhai Kingsoft Office Software Co Ltd
Guangzhou Kingsoft Mobile Technology Co Ltd
Guangzhou Jinshan Mobile Technology Co Ltd
Original Assignee
Beijing Kingsoft Office Software Inc
Zhuhai Kingsoft Office Software Co Ltd
Guangzhou Jinshan Mobile Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Office Software Inc, Zhuhai Kingsoft Office Software Co Ltd, Guangzhou Jinshan Mobile Technology Co Ltd filed Critical Beijing Kingsoft Office Software Inc
Priority to CN201710385155.5A priority Critical patent/CN108959325B/en
Priority to PCT/CN2018/088438 priority patent/WO2018214964A1/en
Publication of CN108959325A publication Critical patent/CN108959325A/en
Application granted granted Critical
Publication of CN108959325B publication Critical patent/CN108959325B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9558 Details of hyperlinks; Management of linked annotations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/14 Tree-structured documents
    • G06F40/143 Markup, e.g. Standard Generalized Markup Language [SGML] or Document Type Definition [DTD]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

Embodiments of the invention provide a uniform resource locator (URL) display method, an information expression method, a URL display device, and an information expression system. The URL display method may include: obtaining first content, where the first content is at least part of the resource content corresponding to the URL; determining the category of the first content; determining a content presentation form according to the category; and displaying the URL together with the content presentation form. Determining the content presentation form according to the category specifically includes determining it according to the category and the domain name in the URL, and based on at least part of the second content. The embodiments thereby address the technical problem of how to improve the expressiveness of a URL: the URL is displayed more attractively, which improves its expressiveness and the presentation of the article, increases its appeal to users, and can in turn increase the click-through rate.

Description

URL display method, information expression method, and related products
Technical field
The present invention relates to the field of Internet technology, and in particular to a uniform resource locator (URL) display method, an information expression method, a URL display device, and an information expression system.
Background
Authors now often publish articles on the Internet. When an author introduces a website, video, or song in an article, or cites another article, the author adds the corresponding URL (uniform resource locator), commonly seen as a hyperlink, to the article. A user reading the article can then click the URL, that is, the hyperlink, to jump to the corresponding web page and browse it.
Specifically, there are two main existing ways of displaying a URL. In the first, the URL character string is written directly into the article. In the second, the URL is set as a text hyperlink, and the author describes the resource content corresponding to the URL in text, according to the author's own understanding.
The first way shows only the URL character string itself. The string carries no content information for the user, so it cannot clearly convey the main resource content that the URL is meant to show. A URL displayed this way is therefore inexpressive and unattractive. The second way tries to help users understand the link and to attract clicks. However, the text description is the author's subjective summary, so it has in effect become part of the article and does not present the URL objectively. Moreover, a text description is inherently limited and depends on the author's ability to express ideas, so this way of displaying a URL is not very expressive either.
Summary of the invention
An object of the embodiments of the present invention is to provide a uniform resource locator (URL) display method that at least solves the technical problem of how to improve the expressiveness of a URL. An information expression method, a URL display device, and an information expression system are also provided.
To achieve the above object, according to one aspect of the present invention, the following technical solution is provided:
A uniform resource locator (URL) display method, which may include:
Obtaining first content, where the first content is at least part of the resource content corresponding to the URL;
Determining the category of the first content;
Determining a content presentation form according to the category;
Displaying the URL together with the content presentation form.
Further, obtaining the first content may specifically include:
Crawling the resource content corresponding to the URL in any one or more of the following ways: a crawler tool, an ASP web page crawling tool, Java, PHP, Delphi, Python, Flex, or Ruby.
Further, determining the category of the first content may specifically include:
Identifying at least part of the first content;
Extracting at least part of second content, where the second content is at least part of the first content;
Classifying the at least part of the second content, thereby determining the category of the first content.
Further, extracting at least part of the second content may specifically include:
Performing XML serialization on at least part of the first content;
Extracting at least part of the XML node information, thereby extracting the at least part of the second content.
Further, classifying the at least part of the second content may specifically include:
Classifying the at least part of the second content using the domain name in the URL and HTML tags.
Further, classifying the at least part of the second content using the domain name in the URL and HTML tags may specifically include:
Matching the domain name against a predetermined domain name list, where the predetermined domain name list includes predetermined domain names and their first category information;
If the match succeeds, extracting the first category information;
Identifying the HTML tags and determining second category information;
Classifying the at least part of the second content according to the first category information and the second category information.
Further, the method may also include:
If the match fails, performing the step of identifying the HTML tags.
Further, determining the content presentation form according to the category may specifically include:
Determining the content presentation form according to the category and the domain name in the URL, and based on the at least part of the second content.
To achieve the above object, according to another aspect of the present invention, the following technical solution is also provided:
An information expression method, which may include:
Expressing the URL in an article by means of the above URL display method.
To achieve the above object, according to a further aspect of the present invention, the following technical solution is also provided:
A uniform resource locator (URL) display device, which may include:
An obtaining module, configured to obtain first content, where the first content is at least part of the resource content corresponding to the URL;
A first determining module, configured to determine the category of the first content;
A second determining module, configured to determine a content presentation form according to the category;
A display module, configured to display the URL together with the content presentation form.
Further, the obtaining module may specifically include:
A URL obtaining subunit, configured to obtain the URL;
A crawling unit, configured to crawl the resource content corresponding to the URL in any one or more of the following ways, thereby obtaining the first content: a crawler tool, an ASP web page crawling tool, Java, PHP, Delphi, Python, Flex, or Ruby.
Further, the first determining module may specifically include:
An identification unit, configured to identify at least part of the first content;
An extraction unit, configured to extract at least part of second content, where the second content is at least part of the first content;
A classification unit, configured to classify the at least part of the second content, thereby determining the category of the first content.
Further, the extraction unit may specifically include:
A serialization unit, configured to perform XML serialization on at least part of the first content;
An extraction subunit, configured to extract at least part of the XML node information, thereby extracting the at least part of the second content.
Further, the classification unit may specifically include:
A classification subunit, configured to classify the at least part of the second content using the domain name in the URL and HTML tags.
Further, the classification subunit may specifically include:
A matching unit, configured to match the domain name against a predetermined domain name list, where the predetermined domain name list includes predetermined domain names and their first category information;
A category extraction unit, configured to extract the first category information if the match succeeds;
A category determination unit, configured to identify the HTML tags and determine second category information;
A content classification unit, configured to classify the at least part of the second content according to the first category information and the second category information.
Further, the device may also include:
An execution unit, configured to trigger the category determination unit if the match fails.
Further, the second determining module may specifically include:
A determining subunit, configured to determine the content presentation form according to the category and the domain name in the URL, and based on the at least part of the second content.
To achieve the above object, according to yet another aspect of the present invention, the following technical solution is also provided:
An information expression system, which may include:
The above URL display device;
An expression module, configured to express the URL in an article by means of the URL display device.
Embodiments of the present invention provide a uniform resource locator (URL) display method, an information expression method, a URL display device, and an information expression system. The URL display method may include: obtaining first content, where the first content is at least part of the resource content corresponding to the URL; determining the category of the first content; determining a content presentation form according to the category; and displaying the URL together with the content presentation form. The technical solution determines the content presentation form by judging the category of the resource content corresponding to the URL, and then displays the URL by means of that form. Compared with directly showing the characters of the URL, the technical solution displays the URL together with a content presentation form, which makes the URL more attractive, improves its expressiveness and the presentation of the article, increases its appeal to users, and can in turn increase the click-through rate. Compared with having the author subjectively summarize the resource content corresponding to the URL and display the URL through that text summary, the content involved in the display is obtained objectively. That is, the embodiments objectively obtain the first content, judge its category, determine the content presentation form according to the category, and finally display the URL objectively together with that form. The resource content corresponding to the URL can thus be shown accurately, which improves the expressiveness of the URL and the presentation of the article, and in turn the appeal to users and the click-through rate.
Of course, a product or method implementing the present invention need not achieve all of the above advantages at the same time.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a URL display method according to an embodiment of the present invention;
Fig. 2 is a flowchart of a URL display method according to another embodiment of the present invention;
Fig. 3 is a structural diagram of a URL display device according to an embodiment of the present invention;
Fig. 4 is a structural diagram of an information expression system according to an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings of the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
It should be noted that, where there is no conflict, the technical features of the embodiments of the present invention can be combined with one another.
Explanation of terms
A content presentation form is any of the ways of presenting content that make the content easier to understand, for example video, audio, mixed image and text, map, or text, but is not limited to these.
Hypertext Markup Language (HTML) is used to describe web documents. HTML uses markup tags to mark the parts of a web page to be displayed. A web page file is itself a text file; adding tags to it tells the browser how to display its content, for example how text is processed, how the page is laid out, and how pictures are displayed.
Extensible Markup Language (XML) is a standard text format for representing structured information on the web. XML separates data from HTML. XML data is stored in plain text format and can be accessed from within an HTML page.
Embodiments of the invention provide a URL display method that obtains the resource content corresponding to a URL in advance and judges the category of that content, then determines a matching content presentation form according to the category, and finally presents the URL in the article together with that content presentation form. The URL thus conveys the information it represents more clearly and expressively, which also makes users more willing to click it.
To this end, in practical applications, in order to solve the technical problem of how to improve the expressiveness of a URL, an embodiment of the present invention provides a uniform resource locator (URL) display method. As shown in Fig. 1, the method can be implemented by steps S100 to S130, where:
S100: Obtain first content, where the first content is at least part of the resource content corresponding to the URL.
URL (Uniform Resource Locator) means uniform resource locator, commonly known as a web address. A URL is a concise representation of the location of a resource available on the Internet and of the method for accessing it; it is the address of a standard resource on the Internet. A URL is comparable to a file name extended to network scope. The information it contains indicates the location of the file and how the browser should handle it. Any resource content on the Internet (for example a website, file, article, or video) can therefore be referred to by a URL.
Note that "at least part" here can mean a part or the whole. For example, the at least part of the resource content corresponding to the URL can be a part of that resource content or all of it.
In some optional embodiments, step S100 may specifically include crawling the resource content corresponding to the URL in any one or more of the following ways: a crawler tool, an ASP (Active Server Pages) web page crawling tool, Java, PHP (Hypertext Preprocessor), Delphi (a visual object-oriented development tool), Python (a scripting language that is interpreted, compiled, interactive, and object-oriented), Flex (an application tool based on the Flash platform), or Ruby (a cross-platform, object-oriented interpreted language).
The process of crawling the resource content corresponding to a URL is described below, taking a crawler tool as an example. The crawler tool can be implemented in Java, Python, or similar.
Specifically, starting from the URLs of one or several initial pages, the crawler obtains the URLs on each initial page and parses the source code to obtain the resource content of each URL. While crawling, it keeps extracting new URLs from the current page and adding them to a queue until all pages have been crawled. Alternatively, resource content can be crawled as follows: links unrelated to the topic are filtered out by an existing web page analysis algorithm, and useful links are kept and placed in the queue of URLs waiting to be crawled. The URL to crawl next is then selected from the queue according to a search strategy, and the process repeats until all pages have been crawled. The search strategy may give priority to important pages. The importance of a page can be measured by link popularity, link importance, and average link depth. Link popularity can be measured by whether a large number of pages point to the page. Link importance can be measured by whether the URL contains ".com" or "home" and has fewer "/" characters. Average link depth represents the link distance from each seed site in a seed site set to the page (determined by breadth-first traversal).
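The crawl loop with an importance-first search strategy can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names (`crawl`, `link_importance`), the scoring weights, and the in-memory `FAKE_WEB` dictionary that stands in for real HTTP requests are all hypothetical, and only the link-importance heuristic (".com", "home", fewer "/") is taken from the text.

```python
import heapq
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def link_importance(url):
    """Hypothetical score: favour '.com' hosts, 'home' pages and shallow paths."""
    parts = urlsplit(url)
    score = 0
    if parts.netloc.endswith(".com"):
        score += 2
    if "home" in parts.path:
        score += 1
    score -= parts.path.count("/")  # fewer slashes means a shallower page
    return score


def crawl(seed_urls, fetch, max_pages=100):
    """Visit pages in order of importance and return {url: html}."""
    # heapq is a min-heap, so the score is negated to pop the best URL first.
    frontier = [(-link_importance(u), u) for u in seed_urls]
    heapq.heapify(frontier)
    seen = set(seed_urls)
    pages = {}
    while frontier and len(pages) < max_pages:
        _, url = heapq.heappop(frontier)
        html = fetch(url)
        if html is None:  # the page could not be fetched
            continue
        pages[url] = html
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-link_importance(link), link))
    return pages


# Usage with an in-memory "web" instead of real HTTP requests (assumption).
FAKE_WEB = {
    "http://example.com/": '<a href="/home">Home</a> <a href="/a/b/c">Deep</a>',
    "http://example.com/home": "<p>Welcome</p>",
    "http://example.com/a/b/c": '<a href="/">Back</a>',
}
pages = crawl(["http://example.com/"], FAKE_WEB.get)
print(list(pages))
```

Passing the fetch function as a parameter keeps the queue logic separate from the network layer, so the same loop could be driven by a real HTTP client.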
The process of crawling the resource content corresponding to a URL is described next, taking an ASP web page crawling tool as an example. The Form base class is filled in using C# (an object-oriented programming language), and the parameters are then passed by the POST method (a way of communicating asynchronously with the server), so as to crawl the resource content.
Those skilled in the art will understand that, when actually obtaining content, a third-party tool can also be used to crawl the resource content.
For example, the jsoup parser (a Java-based HTML parser) can be used to parse a URL address directly and to fetch the resource content corresponding to the URL through the DOM (Document Object Model) and CSS (Cascading Style Sheets).
Those skilled in the art should understand that the above examples are not exhaustive. Any existing or future way of obtaining the first content that is applicable to the embodiments of the present invention also falls within the scope of protection of the present invention and is incorporated herein by reference.
S110: Determine the category of the first content.
In some optional embodiments, this step can be implemented by steps S112 to S116, where:
S112: Identify at least part of the first content.
As an example, this step can work on the HTML source code as shown by the browser and identify at least part of the first content by identifying HTML tags. Because HTML tags are predefined, a tag indicates whether the content it encloses is text, video, or a title. A web page may contain both pictures and text, among other things; this step identifies which parts are pictures, which are text, which are video, and so on.
Specifically, for example, the "<audio>" tag defines sound content, so identifying it shows that the enclosed content is audio. The "<canvas>" tag defines graphics, so identifying it shows that the enclosed content is graphics. The "<video>" tag defines video, so identifying it shows that the enclosed content is video. The "<img>" tag defines an image, so identifying it shows that the content is an image. The "<body>" tag defines the content displayed on the web page. The "<head>" tag defines information such as the title, character format, language, compatibility, keywords, and description.
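Tag-based identification of this kind can be sketched with Python's standard `html.parser` module. The class and function names (`ContentTypeScanner`, `identify_content`) and the sample page are hypothetical; the tag-to-type mapping follows the examples in the text, and treating any non-empty text inside "<body>" as text content is an assumption of this sketch.

```python
from collections import Counter
from html.parser import HTMLParser

# Tag-to-content-type mapping taken from the examples in the text.
TAG_TYPES = {
    "audio": "audio",
    "canvas": "graphics",
    "video": "video",
    "img": "image",
}


class ContentTypeScanner(HTMLParser):
    """Counts recognised media tags and collects text found inside <body>."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()
        self.text_chunks = []
        self._in_body = False

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self._in_body = True
        elif tag in TAG_TYPES:
            self.counts[TAG_TYPES[tag]] += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False

    def handle_data(self, data):
        # Assumption: any non-blank character data in <body> counts as text.
        if self._in_body and data.strip():
            self.text_chunks.append(data.strip())


def identify_content(html):
    """Return the set of content types present in the page."""
    scanner = ContentTypeScanner()
    scanner.feed(html)
    found = set(scanner.counts)
    if scanner.text_chunks:
        found.add("text")
    return found


SAMPLE = ('<html><head><title>Demo</title></head><body><p>Hello</p>'
          '<img src="a.png"><video src="v.mp4"></video></body></html>')
print(sorted(identify_content(SAMPLE)))
```

The title in "<head>" is deliberately not counted as body text, which mirrors the distinction the text draws between "<head>" metadata and "<body>" content.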
Those skilled in the art should understand that the above examples are not exhaustive. Any existing or future way of identifying at least part of the first content that is applicable to the embodiments of the present invention also falls within the scope of protection of the present invention and is incorporated herein by reference.
S114: Extract at least part of second content, where the second content is at least part of the first content.
This step can extract part of the content according to predefined HTML tags, for use in displaying the URL's content.
In some optional embodiments, the step of extracting at least part of the second content may specifically include:
S1142: Perform XML serialization on at least part of the first content.
Here, serialization refers to the process of converting a data structure or object into a binary string.
The serialization process is described in detail below with a preferred embodiment.
Step a1: Read the entire HTML code of the resource content using the OpenRead method of WebClient in C#/.NET.
Step a2: Delete irrelevant code.
Because scripts cannot be parsed as XML, delete all scripts between <script> and </script> in the code.
Step a3: Delete the usual exceptional items in the HTML code.
For example, the content between <title> and </title>, and the content of <meta name="keywords" content=""/>, can be deleted.
Step a4: Convert all of the HTML code into tags, and add end tags so that every tag is closed.
Step a5: Use Microsoft's XML serialization method to serialize the entire HTML code as XML and generate an XML file.
Those skilled in the art should understand that the above examples are not exhaustive. Any existing or future way of performing XML serialization on at least part of the first content that is applicable to the embodiments of the present invention also falls within the scope of protection of the present invention and is incorporated herein by reference.
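Steps a2 to a5 can be approximated in a few lines. The sketch below swaps the C#/.NET tooling named in the text for Python's standard library, so it illustrates the idea rather than the described implementation. The names `XmlBuilder` and `html_to_xml`, the synthetic `<document>` root element, and the sample page are assumptions; attribute names that are not valid XML names are not handled.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# HTML elements that never take an end tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
SKIPPED_TAGS = {"script", "title"}  # removed in steps a2 and a3


class XmlBuilder(HTMLParser):
    """Builds a well-formed element tree from possibly unclosed HTML."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("document")  # synthetic single root (assumption)
        self._stack = [self.root]
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1
            return
        if self._skip_depth:
            return
        if tag == "meta" and (dict(attrs).get("name") or "").lower() == "keywords":
            return  # step a3: drop the keywords meta tag
        element = ET.SubElement(self._stack[-1], tag,
                                {k: v or "" for k, v in attrs})
        if tag not in VOID_TAGS:
            self._stack.append(element)

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
            return
        if self._skip_depth:
            return
        # Step a4: closing a tag also closes anything left open inside it.
        for index in range(len(self._stack) - 1, 0, -1):
            if self._stack[index].tag == tag:
                del self._stack[index:]
                break

    def handle_data(self, data):
        if self._skip_depth or not data.strip():
            return
        current = self._stack[-1]
        if len(current):  # text after a child element belongs to its tail
            last = current[-1]
            last.tail = (last.tail or "") + data
        else:
            current.text = (current.text or "") + data


def html_to_xml(html):
    """Step a5: serialize the cleaned tree as an XML string."""
    builder = XmlBuilder()
    builder.feed(html)
    builder.close()
    return ET.tostring(builder.root, encoding="unicode")


SAMPLE_HTML = ('<html><head><title>T</title><meta name="keywords" content="x">'
               '<script>var a = 1;</script></head>'
               '<body><p>Hi<br>there<img src="a.png"></body></html>')
xml_text = html_to_xml(SAMPLE_HTML)
print(xml_text)
```

The unclosed `<p>` in the sample is closed automatically when `</body>` is seen, which is the effect step a4 asks for.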
S1144: Extract at least part of the XML node information, thereby extracting the at least part of the second content.
An XML file is made up of nodes. A node is the smallest valid unit with a complete structure in an XML file. The XML DOM (XML Document Object Model) defines a standard way of accessing and manipulating XML documents. The DOM views an XML document as a tree structure, so all elements can be accessed through the DOM tree. Elements, together with their text and attributes, are all regarded as nodes.
In a specific implementation, node information can be extracted from the generated XML file with a Get function; alternatively, XPath (XML Path Language) can be used to extract XML node information.
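As a small illustration of XPath-style extraction, the sketch below uses the limited XPath subset supported by Python's `xml.etree.ElementTree`. The XML document and the variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML produced by the serialization step.
XML_TEXT = """
<document>
  <body>
    <video src="clip.mp4" poster="cover.jpg" />
    <p>Intro text</p>
    <img src="photo.png" />
  </body>
</document>
"""

root = ET.fromstring(XML_TEXT)

# ".//video" selects every <video> element at any depth below the root.
video_sources = [node.get("src") for node in root.findall(".//video")]

# "[@src]" keeps only the <img> elements that carry a src attribute.
image_sources = [node.get("src") for node in root.findall(".//img[@src]")]

# iter() walks the whole tree, which is an alternative to an XPath query.
paragraphs = [node.text for node in root.iter("p")]

print(video_sources, image_sources, paragraphs)
```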
The process of extracting XML node information (for example, video) is described in detail below by way of example.
Step b1: Search recursively from a node until the smallest node is found.
Here, the smallest node is the most basic unit, Element.
Step b2: Parse each smallest node with an XML parser to obtain string data.
Step b3: Pass the string data to the VideoInfo class.
Step b4: Locate the data for each video class by extracting through two successive layers.
Step b5: Pass the data in each video class to the ArrayList data structure in Search Video Info.
Step b6: Associate the data in the ArrayList data structure with the corresponding Adapter data, thereby extracting the XML node information.
Those skilled in the art should understand that the above examples are not exhaustive. Any existing or future way of extracting at least part of the XML node information that is applicable to the embodiments of the present invention also falls within the scope of protection of the present invention and is incorporated herein by reference.
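Steps b1 to b5 can be loosely sketched in Python. The text names a VideoInfo class and an ArrayList but gives no fields or XML layout, so the `title` and `url` fields, the `<videos>/<video>` structure, and the function names here are assumptions; a plain Python list stands in for the ArrayList, and the Adapter binding of step b6 is omitted.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class VideoInfo:
    """Hypothetical record for one video; the fields are assumptions."""
    title: str
    url: str


def leaf_nodes(node):
    """Recursively yield every element that has no child elements (step b1)."""
    children = list(node)
    if not children:
        yield node
        return
    for child in children:
        yield from leaf_nodes(child)


def extract_videos(xml_text):
    """Collect one VideoInfo per <video> element (steps b2 to b5)."""
    root = ET.fromstring(xml_text)
    videos = []  # plays the role of the ArrayList in the text
    for item in root.iter("video"):
        # Step b2: read each smallest node as string data.
        fields = {leaf.tag: (leaf.text or "").strip() for leaf in leaf_nodes(item)}
        # Step b3: hand the strings to the VideoInfo record.
        videos.append(VideoInfo(fields.get("title", ""), fields.get("url", "")))
    return videos


SAMPLE_XML = (
    "<videos>"
    "<video><title>Demo</title><url>http://example.com/v/1</url></video>"
    "<video><title>Second</title><url>http://example.com/v/2</url></video>"
    "</videos>"
)
for video in extract_videos(SAMPLE_XML):
    print(video.title, video.url)
```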
S116: classify to the second content of at least part, so that it is determined that the classification of first content.
In some alternative embodiments, the step of classifying to the second content of at least part can specifically include: Using the domain name and html tag in URL, to classify to the second content of at least part.
In some preferred embodiments, using the domain name and html tag in URL, to the second content of at least part The step of being classified can specifically include:
Step c1: domain name is matched with scheduled domain name list, if successful match, thens follow the steps c2;If matching Failure, thens follow the steps c3;Wherein, scheduled domain name list includes scheduled domain name and its first category information.
Step c2: first category information is extracted.
Step c3: identification html tag determines second category information.
Step c4: according to first category information and second category information, to classify to the second content of at least part.
For this step during specific implementation, this step can identify domain name according to URL character string, by the domain name and in advance Fixed domain name list is matched.If successful match, the attribute information of the domain name in domain name list is extracted, and to html tag It is identified, finally, determining the classification of the second content of at least part according to the attribute information of domain name and html tag.With excellent For cruel website, identifies the domain name in URL character string, which is matched with domain name list;If successful match, determine The URL is youku.com website, then according to the domain name in the URL, attribute relevant to the domain name is extracted from scheduled domain name list Classification information in information.If it fails to match, classification judgement is carried out using html tag.Wherein, in domain name list with domain It is mainly video that predefined classification information, which can indicate content corresponding to the domain name, in the corresponding attribute information of name;Extract view Frequently this classification information, then identifies html tag, by identification, it is known that content includes text;Thus it is possible to The classification for determining this partial content is the classification of video combination text.For another example, for Taobao website, from URL character string In extract domain name, then the domain name is compared with domain name list, if successful match, determine the URL be Taobao website, Then predefined this classification information of picture and text mixing from the attribute information for extracting the domain name in scheduled domain name list, then, Html tag is identified, by identification, judges that content further includes video, it is determined that the classification of this partial content is picture and text The classification of mixing combination video.
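Steps c1 to c4 can be sketched as a domain lookup followed by a tag-based refinement. The domain list contents, the tag-to-category table, and the choice to represent a combined category as a set of basic types are all assumptions made for this illustration; the Youku and Taobao outcomes follow the examples in the text.

```python
from urllib.parse import urlsplit

# Hypothetical predetermined domain list: domain -> first category information.
DOMAIN_CATEGORIES = {
    "youku.com": {"video"},
    "taobao.com": {"image", "text"},  # "mixed image and text"
}

# Hypothetical mapping from HTML tags to second category information.
TAG_CATEGORIES = {"video": "video", "audio": "audio", "img": "image", "p": "text"}


def first_category(url):
    """Steps c1 and c2: match the URL's domain against the predetermined list."""
    host = (urlsplit(url).hostname or "").lower()
    for domain, category in DOMAIN_CATEGORIES.items():
        if host == domain or host.endswith("." + domain):
            return category
    return None  # match failed


def second_category(tag_names):
    """Step c3: derive category information from the HTML tags that were found."""
    return {TAG_CATEGORIES[tag] for tag in tag_names if tag in TAG_CATEGORIES}


def classify(url, tag_names):
    """Step c4: combine both sources; on a failed match only the tags are used."""
    return set(first_category(url) or ()) | second_category(tag_names)


print(sorted(classify("http://v.youku.com/v_show/id_1.html", ["p"])))
print(sorted(classify("https://item.taobao.com/item.htm?id=1", ["video"])))
print(sorted(classify("http://unknown.example/page", ["img", "p"])))
```

Because the two lookups are independent, they can run in either order or in parallel before the results are merged.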
In the above embodiments, a URL is composed of: protocol://domain name:port/virtual directory/file name?parameters#anchor. An IP address can be used in place of the domain name. Taking http://www.aspxfans.com:8080/news/index.asp?boardID=5&ID=24618&page=1#name as an example, www.aspxfans.com is the domain name.
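The URL components described above can be separated with Python's standard `urllib.parse` module, shown here on the example URL from the text. The variable names are illustrative only.

```python
from urllib.parse import parse_qs, urlsplit

url = "http://www.aspxfans.com:8080/news/index.asp?boardID=5&ID=24618&page=1#name"
parts = urlsplit(url)

protocol = parts.scheme        # "http"
domain = parts.hostname        # "www.aspxfans.com"
port = parts.port              # 8080
path = parts.path              # "/news/index.asp" (virtual directory + file name)
params = parse_qs(parts.query) # each parameter maps to a list of values
anchor = parts.fragment        # "name"

print(protocol, domain, port, path, params, anchor)
```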
It should be noted that, during the above category determination, the step of identifying the domain name and the step of determining the content type using HTML tags may be performed simultaneously or in sequence. If they are performed in sequence, the embodiments of the present invention do not limit the order: the category of the content may first be identified using HTML tags and the category information then extracted via the domain name, or the category information may first be extracted via the domain name and the category of the content then identified using HTML tags.
Those skilled in the art should understand that the above examples are not exhaustive. Any existing or future way of classifying the at least part of the second content, if applicable to the embodiments of the present invention, also falls within the protection scope of the present invention and is incorporated herein by reference.
S120: Determine the content display form according to the category.
In some optional embodiments, this step may specifically include: determining the content display form according to the category and the domain name in the URL, and based on the at least part of the second content.
For example, if the category is determined to be mixed image and text, the domain name is www.taobao.com, and the obtained second content is image-and-text content on the Taobao website, then a large image with text may be used as the structure and combined with the image-and-text content from Taobao to form the content display form.
As another example, if the category is determined to be mixed image and text, the domain name is www.qq.com, and the obtained second content is image-and-text content on the Tencent website, then a small image with text may be used as the structure and combined with the image-and-text content from the Tencent website to form the content display form.
When determining the content display form, the embodiments of the present invention may use any one or more of the following structural forms: a large image with text, a small image with text, a large window with text, a small window with text, a picture with audio, and so on. The layout of the structural form may be top-bottom, left-right, or diagonal, but is not limited thereto. Those skilled in the art should understand that any existing or future content display form, if applicable to the embodiments of the present invention, also falls within the protection scope of the present invention and is incorporated herein by reference.
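One way to picture the selection of a display form from the category and the domain name is a simple lookup, sketched below. The rule tables, the layouts assigned to each rule, the "text only" fallback, and the names `DisplayForm` and `choose_display_form` are assumptions made for illustration; the text above only states that Taobao image-and-text content may use a large image with text and Tencent image-and-text content a small image with text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisplayForm:
    structure: str  # e.g. "large image with text"
    layout: str     # "top-bottom", "left-right", or "diagonal"


# Hypothetical rules keyed by (category, domain name).
RULES = {
    ("image-text", "www.taobao.com"): DisplayForm("large image with text", "top-bottom"),
    ("image-text", "www.qq.com"): DisplayForm("small image with text", "left-right"),
}

# Hypothetical per-category defaults, used when no domain-specific rule exists.
DEFAULTS = {
    "image-text": DisplayForm("small image with text", "top-bottom"),
    "video": DisplayForm("large window with text", "top-bottom"),
}

# Assumed fallback for categories with no rule at all.
FALLBACK = DisplayForm("text only", "top-bottom")


def choose_display_form(category, domain):
    """Prefer a domain-specific rule, then a category default, then the fallback."""
    return RULES.get((category, domain)) or DEFAULTS.get(category, FALLBACK)
```

Keeping the rules in data rather than in branching code makes it straightforward to add further structural forms or to let a user override the automatically chosen form.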
Those skilled in the art should understand that the above examples are not exhaustive. Any existing or future way of determining the content display form according to the category, if applicable to the embodiments of the present invention, also falls within the protection scope of the present invention and is incorporated herein by reference.
S130: Display the URL together with the content display form.
For example, in this step the URL string may be accompanied by a large-image-with-text structure and displayed in a certain layout. The layout may be top-bottom, left-right, or diagonal, but is not limited thereto.
In practical applications, the content display form to be used may be selected manually by the user, or pushed automatically by the back end.
Compared with existing methods that offer no choice of style, the embodiments of the present invention can intelligently recommend multiple display styles, thereby helping the author improve the expressiveness of the article.
The embodiments of the present invention obtain first content, where the first content is at least part of the resource content corresponding to the URL; then determine the category of the first content; then determine the content display form according to the category; and finally display the URL together with the content display form. Thus, the category of the resource content corresponding to the URL is used to determine the content display form, and the URL is ultimately displayed together with that form. For example, a content display form of mixed image and text combined with a small-window video may be placed below the URL string to display the URL; as another example, a content display form of a large-window video combined with text may be placed above the URL string to display the URL. This beautifies the URL and solves the technical problem of how to improve the expressiveness of a URL; it enhances the expressiveness of the article, which can improve its appeal to users and in turn increase the click-through rate.
The present invention is described in detail below by means of a preferred embodiment, with reference to Fig. 2.
S200: Obtain the first content by any one or more of the following: a crawler tool, an ASP web page capture tool, Java, PHP, Delphi, Python, Flex, or Ruby; the first content is at least part of the resource content corresponding to the URL.
S210: Identify at least part of the first content.
S220: Perform XML serialization on the at least part of the first content.
S230: Extract at least part of the second content by extracting at least part of the XML node information; the second content is the at least part of the first content.
S240: Classify the at least part of the second content using the domain name in the URL and the HTML tags, thereby determining the category of the first content.
S250: Determine the content display form according to the category and the domain name in the URL, and based on the at least part of the second content.
S260: Display the URL together with the content display form.
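Steps S220 and S230 can be sketched with Python's standard library as shown below. This is an illustrative sketch only: fetching the page (S200) is omitted, the choice of which nodes to extract (title, image sources, paragraph text) is an assumption, and the names `XmlBuilder`, `serialize_to_xml`, and `extract_second_content` are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# HTML elements that never have a closing tag.
VOID_TAGS = {"img", "br", "hr", "meta", "link", "input", "source"}


class XmlBuilder(HTMLParser):
    """Builds an ElementTree from HTML so that it can be serialized as XML."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        element = ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})
        if tag not in VOID_TAGS:
            self.stack.append(element)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        for index in range(len(self.stack) - 1, 0, -1):
            if self.stack[index].tag == tag:
                del self.stack[index:]
                break

    def handle_data(self, data):
        if not data.strip():
            return
        current = self.stack[-1]
        if len(current):
            # Text that follows a child element is stored as that child's tail.
            current[-1].tail = (current[-1].tail or "") + data
        else:
            current.text = (current.text or "") + data


def serialize_to_xml(html):
    """S220: turn the captured HTML into an XML string."""
    builder = XmlBuilder()
    builder.feed(html)
    builder.close()
    return ET.tostring(builder.root, encoding="unicode")


def extract_second_content(xml_text):
    """S230: extract selected node information from the XML."""
    root = ET.fromstring(xml_text)
    paragraphs = ["".join(p.itertext()).strip() for p in root.iter("p")]
    return {
        "title": root.findtext(".//title"),
        "images": [img.get("src") for img in root.iter("img") if img.get("src")],
        "text": [t for t in paragraphs if t],
    }
```

The extracted dictionary plays the role of the second content, which would then be classified in S240 and used to fill the chosen display form in S250.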
With this preferred embodiment, the URL is beautified and readers can obtain more information without clicking the link. This improves the expressiveness of the URL and of the article, which can improve its appeal to users and in turn increase the click-through rate.
In addition, to help authors improve the expressiveness of an article, the embodiments of the present invention also provide an information expression method. The method may include any of the above URL display method embodiments.
For a description of this embodiment, refer to the description of any of the above URL display method embodiments; details are not repeated here.
It should be noted that the steps in the above URL display method embodiments and information expression method embodiments may be executed sequentially or in parallel, which is not limited here.
Based on the same technical concept as the URL display method embodiments, the embodiments of the present invention also provide a uniform resource locator (URL) display device. The device embodiment can execute the above method embodiments. As shown in Fig. 3, the device 30 may include an acquisition module 32, a first determining module 34, a second determining module 36, and a display module 38. The acquisition module 32 is configured to obtain first content, where the first content is at least part of the resource content corresponding to the URL. The first determining module 34 is configured to determine the category of the first content. The second determining module 36 is configured to determine the content display form according to the category. The display module 38 is configured to display the URL together with the content display form.
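The way the four modules hand data to one another can be sketched as a small composition. The class name `UrlDisplayDevice`, the callable signatures, and the stub modules below are hypothetical and serve only to show the data flow; they are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class UrlDisplayDevice:
    """Hypothetical counterpart of device 30, built from four modules."""

    acquire: Callable[[str], str]                  # acquisition module 32
    determine_category: Callable[[str, str], str]  # first determining module 34
    determine_form: Callable[[str], str]           # second determining module 36
    display: Callable[[str, str], str]             # display module 38

    def show(self, url: str) -> str:
        first_content = self.acquire(url)
        category = self.determine_category(url, first_content)
        form = self.determine_form(category)
        return self.display(url, form)


# Stub modules, used only to show how the parts connect.
device = UrlDisplayDevice(
    acquire=lambda url: "<video></video>",
    determine_category=lambda url, content: "video" if "<video" in content else "text",
    determine_form=lambda category: f"large {category} window with text",
    display=lambda url, form: f"{url} [{form}]",
)
rendered = device.show("http://www.youku.com/")
```

Because each module is passed in as a separate callable, the modules can be split, combined, or replaced independently, in line with the statement that the module division is only exemplary.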
In some optional embodiments, on the basis of the embodiment shown in Fig. 3, the acquisition module may specifically include a URL acquisition subunit and a capture unit. The URL acquisition subunit is configured to obtain the URL. The capture unit is configured to capture the resource content corresponding to the URL by any one or more of the following, so as to obtain the first content: a crawler tool, an ASP web page capture tool, Java, PHP, Delphi, Python, Flex, or Ruby.
In some optional embodiments, on the basis of the embodiment shown in Fig. 3, the first determining module may specifically include an identification unit, an extraction unit, and a classification unit. The identification unit is configured to identify at least part of the first content. The extraction unit is configured to extract at least part of the second content, where the second content is the at least part of the first content. The classification unit is configured to classify the at least part of the second content, thereby determining the category of the first content.
In some optional embodiments, on the basis of the above embodiments, the extraction unit may specifically include a serialization unit and an extraction subunit. The serialization unit is configured to perform XML serialization on the at least part of the first content. The extraction subunit is configured to extract the at least part of the second content by extracting at least part of the XML node information.
In some optional embodiments, on the basis of the above embodiments, the classification unit may specifically include a classification subunit. The classification subunit is configured to classify the at least part of the second content using the URL and the HTML tags.
In some preferred embodiments, the classification subunit further specifically includes a matching unit, a category extraction unit, a category determination unit, and a content classification unit. The matching unit is configured to match the domain name against a predetermined domain name list, where the predetermined domain name list includes predetermined domain names and their first category information. The category extraction unit is configured to extract the first category information if the match succeeds. The category determination unit is configured to identify the HTML tags and determine second category information. The content classification unit is configured to classify the at least part of the second content according to the first category information and the second category information.
On the basis of the above preferred embodiment, the URL display device may further include an execution unit. The execution unit is configured to trigger the category determination unit if the match fails.
In some optional embodiments, on the basis of the above embodiments, the second determining module may specifically include a determining subunit. The determining subunit is configured to determine the content display form according to the category and the domain name in the URL, and based on the at least part of the second content.
In addition, to help authors improve the expressiveness of an article, the embodiments of the present invention also provide an information expression system. As shown in Fig. 4, the system may include a URL display device and an expression module. The expression module is configured to express the URL in an article by means of the URL display device.
For a description of this embodiment, refer to the description of any of the above URL display method embodiments; details are not repeated here.
Those skilled in the art will understand that the above URL display device embodiments and information expression system embodiments may also include some well-known structures, such as a processor, memory, and a bus. The processor is connected to the memory via the bus. The processor includes, but is not limited to, an ARM processor, a programmable logic device, a DSP, and so on. The memory may be random access memory or read-only memory. The bus may include a data bus, an address bus, and a control bus.
It should be noted that, in the above URL display device embodiments and information expression system embodiments, the division into modules is given only by way of example. Those skilled in the art should understand that the modules may be divided in other ways, and that the divided modules may be further split or combined; moreover, the modules may execute sequentially or in parallel, which is not limited here.
It should also be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
The embodiments in this specification are described in a related manner; for identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, since the system embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the description of the method embodiments.
The foregoing is merely a description of preferred embodiments of the present invention and is not intended to limit its protection scope. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention falls within the protection scope of the present invention.

Claims (18)

1. A uniform resource locator (URL) display method, characterized in that the method comprises:
obtaining first content, wherein the first content is at least part of the resource content corresponding to the URL;
determining the category of the first content;
determining a content display form according to the category;
displaying the URL together with the content display form.
2. The method according to claim 1, characterized in that obtaining the first content specifically comprises:
capturing the resource content corresponding to the URL by any one or more of the following: a crawler tool, an ASP web page capture tool, Java, PHP, Delphi, Python, Flex, or Ruby.
3. The method according to claim 1, characterized in that determining the category of the first content specifically comprises:
identifying at least part of the first content;
extracting at least part of second content, wherein the second content is the at least part of the first content;
classifying the at least part of the second content, thereby determining the category of the first content.
4. The method according to claim 3, characterized in that extracting the at least part of the second content specifically comprises:
performing XML serialization on the at least part of the first content;
extracting the at least part of the second content by extracting at least part of the XML node information.
5. The method according to claim 4, characterized in that classifying the at least part of the second content specifically comprises:
classifying the at least part of the second content using the domain name in the URL and HTML tags.
6. The method according to claim 5, characterized in that classifying the at least part of the second content using the domain name in the URL and the HTML tags specifically comprises:
matching the domain name against a predetermined domain name list, wherein the predetermined domain name list includes predetermined domain names and their first category information;
if the match succeeds, extracting the first category information;
identifying the HTML tags and determining second category information;
classifying the at least part of the second content according to the first category information and the second category information.
7. The method according to claim 6, characterized in that the method further comprises:
if the match fails, executing the step of identifying the HTML tags.
8. according to the method any in claim 3, which is characterized in that it is described according to the classification, determine that content shows Form specifically includes:
determining the content display form according to the category and the domain name in the URL, and based on the at least part of the second content.
9. An information expression method, characterized in that the method comprises:
expressing a URL in an article by the URL display method according to any one of claims 1-8.
10. A uniform resource locator (URL) display device, characterized in that the device comprises:
an acquisition module, configured to obtain first content, wherein the first content is at least part of the resource content corresponding to the URL;
a first determining module, configured to determine the category of the first content;
a second determining module, configured to determine a content display form according to the category;
a display module, configured to display the URL together with the content display form.
11. The device according to claim 10, characterized in that the acquisition module specifically comprises:
a URL acquisition subunit, configured to obtain the URL;
a capture unit, configured to capture the resource content corresponding to the URL by any one or more of the following, so as to obtain the first content: a crawler tool, an ASP web page capture tool, Java, PHP, Delphi, Python, Flex, or Ruby.
12. The device according to claim 10, characterized in that the first determining module specifically comprises:
an identification unit, configured to identify at least part of the first content;
an extraction unit, configured to extract at least part of second content, wherein the second content is the at least part of the first content;
a classification unit, configured to classify the at least part of the second content, thereby determining the category of the first content.
13. The device according to claim 12, characterized in that the extraction unit specifically comprises:
a serialization unit, configured to perform XML serialization on the at least part of the first content;
an extraction subunit, configured to extract the at least part of the second content by extracting at least part of the XML node information.
14. The device according to claim 13, characterized in that the classification unit specifically comprises:
a classification subunit, configured to classify the at least part of the second content using the domain name in the URL and HTML tags.
15. The device according to claim 14, characterized in that the classification subunit specifically comprises:
a matching unit, configured to match the domain name against a predetermined domain name list, wherein the predetermined domain name list includes predetermined domain names and their first category information;
a category extraction unit, configured to extract the first category information if the match succeeds;
a category determination unit, configured to identify the HTML tags and determine second category information;
a content classification unit, configured to classify the at least part of the second content according to the first category information and the second category information.
16. The device according to claim 15, characterized in that the device further comprises:
an execution unit, configured to trigger the category determination unit if the match fails.
17. any device in 2 according to claim 1, which is characterized in that second determining module specifically includes:
a determining subunit, configured to determine the content display form according to the category and the domain name in the URL, and based on the at least part of the second content.
18. An information expression system, characterized in that the system comprises:
the URL display device according to any one of claims 10-17;
an expression module, configured to express a URL in an article by means of the URL display device.
CN201710385155.5A 2017-05-26 2017-05-26 Uniform resource locator display method, information display method and related products thereof Active CN108959325B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710385155.5A CN108959325B (en) 2017-05-26 2017-05-26 Uniform resource locator display method, information display method and related products thereof
PCT/CN2018/088438 WO2018214964A1 (en) 2017-05-26 2018-05-25 Uniform resource locator display method, information expression method and related product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710385155.5A CN108959325B (en) 2017-05-26 2017-05-26 Uniform resource locator display method, information display method and related products thereof

Publications (2)

Publication Number Publication Date
CN108959325A true CN108959325A (en) 2018-12-07
CN108959325B CN108959325B (en) 2021-06-29

Family

ID=64396255

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710385155.5A Active CN108959325B (en) 2017-05-26 2017-05-26 Uniform resource locator display method, information display method and related products thereof

Country Status (2)

Country Link
CN (1) CN108959325B (en)
WO (1) WO2018214964A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GR1010585B (en) * 2022-11-10 2023-12-12 Παναγιωτης Τσαντιλας Web crawling and content summarization

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103258058A (en) * 2013-06-03 2013-08-21 贝壳网际(北京)安全技术有限公司 Page display method and system and browser
CN103544178A (en) * 2012-07-13 2014-01-29 百度在线网络技术(北京)有限公司 Method and equipment for providing reconstruction page corresponding to target page
CN104468720A (en) * 2014-11-07 2015-03-25 广州市至德科技企业孵化器有限公司 Method for determining preview link and providing dynamic preview information for preview link
US20150356196A1 (en) * 2014-06-04 2015-12-10 International Business Machines Corporation Classifying uniform resource locators
CN105512149A (en) * 2014-10-14 2016-04-20 阿里巴巴集团控股有限公司 Method for updating and displaying keyword information of digital reading resource and relevant devices
CN106033428A (en) * 2015-03-11 2016-10-19 北大方正集团有限公司 A uniform resource locator selecting method and a uniform resource locator selecting device
CN106095453A (en) * 2016-06-16 2016-11-09 北京金山安全软件有限公司 Information display method and device and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
IT昆仑: "希望博客新增加的功能——"网页预览"&SNAP", 《HTTPS://BLOG.51CTO.COM/QINYEZHAI/113164》 *
WEIXIN_33834628: "希望博客新增加的功能—"网页预览"&SNAP", 《HTTPS://BLOG.CSDN.NET/WEIXIN_33834628/ARTICLE/DETAILS/92481412》 *

Also Published As

Publication number Publication date
WO2018214964A1 (en) 2018-11-29
CN108959325B (en) 2021-06-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant