CN108959325A - Uniform resource locator display method, information expression method, and related products - Google Patents
- Publication number
- CN108959325A
- CN201710385155.5A
- Authority
- CN
- China
- Prior art keywords
- content
- url
- classification
- domain name
- mode
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
- G06F16/9558—Details of hyperlinks; Management of linked annotations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/103—Formatting, i.e. changing of presentation of documents
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/14—Tree-structured documents
- G06F40/143—Markup, e.g. Standard Generalized Markup Language [SGML] or Document Type Definition [DTD]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
- G06F40/186—Templates
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Information Transfer Between Computers (AREA)
Abstract
Embodiments of the present invention provide a uniform resource locator (URL) display method, an information expression method, a URL display device, and a knowledge representation system. The URL display method may include: obtaining first content, where the first content is at least part of the resource content corresponding to the URL; determining the category of the first content; determining a content presentation form according to the category; and displaying the URL together with the content presentation form. Determining the content presentation form according to the category specifically includes: determining the content presentation form according to the category and the domain name in the URL, and based on at least part of the second content. The embodiments thereby address the technical problem of how to improve the expressiveness of a URL: the URL is embellished, which improves its expressiveness and strengthens the presentation of the article, so that it appeals more to users and can increase the click-through rate.
Description
Technical field
The present invention relates to the field of Internet technologies, and in particular to a uniform resource locator display method, an information expression method, a uniform resource locator display device, and a knowledge representation system.
Background art
Authors now commonly publish articles on the Internet. When an author introduces a website, video, or song in an article, or cites another article, the author adds the corresponding URL (uniform resource locator), that is, the hyperlink we usually see, to the article. A user reading the article can then click the URL, i.e. the hyperlink, to jump to the corresponding web page and browse it.
Specifically, there are two main existing ways of displaying a URL. In the first, the URL character string is written directly into the article. In the second, the URL is set as a text hyperlink, and the author describes the resource content corresponding to the URL in text according to the author's own understanding.
The first way, displaying the URL directly in the article, in fact shows only the URL character string itself. The character string carries no content information for the user, so it cannot clearly convey the main resource content that the URL is meant to show. This makes the URL poorly expressive and unattractive. The second way tries to help users understand the link and to attract clicks; however, because the text description is the author's subjective summary, the description becomes, in essence, part of the article, and the URL is not displayed objectively and faithfully. Moreover, text has limited expressive power and depends on the author's ability to express, so this way of displaying a URL is not very expressive either.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a uniform resource locator (URL) display method that at least solves the technical problem of how to improve the expressiveness of a URL. An information expression method, a URL display device, and a knowledge representation system are also provided.
To achieve the above purpose, according to one aspect of the present invention, the following technical solution is provided:
A uniform resource locator (URL) display method, which may include:
Obtaining first content, where the first content is at least part of the resource content corresponding to the URL;
Determining the category of the first content;
Determining a content presentation form according to the category;
Displaying the URL together with the content presentation form.
Further, obtaining the first content may specifically include:
Crawling the resource content corresponding to the URL in any one or more of the following ways: a crawler tool, an ASP web page capture tool, Java, PHP, Delphi, Python, Flex, or Ruby.
Further, determining the category of the first content may specifically include:
Identifying at least part of the first content;
Extracting at least part of second content, where the second content is the at least part of the first content;
Classifying the at least part of the second content, thereby determining the category of the first content.
Further, extracting the at least part of the second content may specifically include:
Performing XML serialization on the at least part of the first content;
Extracting the at least part of the second content by extracting at least part of the XML node information.
Further, classifying the at least part of the second content may specifically include:
Classifying the at least part of the second content using the domain name in the URL and HTML tags.
Further, classifying the at least part of the second content using the domain name in the URL and HTML tags may specifically include:
Matching the domain name against a predetermined domain name list, where the predetermined domain name list includes predetermined domain names and their first category information;
If the match succeeds, extracting the first category information;
Identifying the HTML tags and determining second category information;
Classifying the at least part of the second content according to the first category information and the second category information.
Further, the method may also include:
If the match fails, performing the step of identifying the HTML tags.
Further, determining the content presentation form according to the category may specifically include:
Determining the content presentation form according to the category and the domain name in the URL, and based on the at least part of the second content.
To achieve the above purpose, according to another aspect of the present invention, the following technical solution is also provided:
An information expression method, which may include:
Expressing the URL in an article by means of the above URL display method.
To achieve the above purpose, according to a further aspect of the present invention, the following technical solution is also provided:
A uniform resource locator (URL) display device, which may include:
An obtaining module, configured to obtain first content, where the first content is at least part of the resource content corresponding to the URL;
A first determining module, configured to determine the category of the first content;
A second determining module, configured to determine a content presentation form according to the category;
A display module, configured to display the URL together with the content presentation form.
Further, the obtaining module may specifically include:
A URL obtaining subunit, configured to obtain the URL;
A crawling unit, configured to crawl the resource content corresponding to the URL, and thereby obtain the first content, in any one or more of the following ways: a crawler tool, an ASP web page capture tool, Java, PHP, Delphi, Python, Flex, or Ruby.
Further, the first determining module may specifically include:
An identification unit, configured to identify at least part of the first content;
An extraction unit, configured to extract at least part of second content, where the second content is the at least part of the first content;
A classification unit, configured to classify the at least part of the second content, thereby determining the category of the first content.
Further, the extraction unit may specifically include:
A serialization unit, configured to perform XML serialization on the at least part of the first content;
An extraction subunit, configured to extract the at least part of the second content by extracting at least part of the XML node information.
Further, the classification unit may specifically include:
A classification subunit, configured to classify the at least part of the second content using the domain name in the URL and HTML tags.
Further, the classification subunit may specifically include:
A matching unit, configured to match the domain name against a predetermined domain name list, where the predetermined domain name list includes predetermined domain names and their first category information;
A category extraction unit, configured to extract the first category information when the match succeeds;
A category determining unit, configured to identify the HTML tags and determine second category information;
A content classification unit, configured to classify the at least part of the second content according to the first category information and the second category information.
Further, the device may also include:
An execution unit, configured to trigger the category determining unit when the match fails.
Further, the second determining module may specifically include:
A determining subunit, configured to determine the content presentation form according to the category and the domain name in the URL, and based on the at least part of the second content.
To achieve the above purpose, according to a further aspect of the present invention, the following technical solution is also provided:
A knowledge representation system, which may include:
The above URL display device;
An expression module, configured to express the URL in an article by means of the URL display device.
Embodiments of the present invention provide a uniform resource locator (URL) display method, an information expression method, a URL display device, and a knowledge representation system. The URL display method may include: obtaining first content, where the first content is at least part of the resource content corresponding to the URL; determining the category of the first content; determining a content presentation form according to the category; and displaying the URL together with the content presentation form. This technical solution determines the content presentation form by judging the category of the resource content corresponding to the URL, and then uses the content presentation form to display the URL. Compared with directly displaying the characters of the URL, the technical solution displays the URL together with a content presentation form, which embellishes the URL, improves its expressiveness, and strengthens the presentation of the article, so that it appeals more to users and can increase the click-through rate. Compared with the author subjectively summarizing the resource content corresponding to the URL and displaying the URL through that text summary, the content involved in the display is obtained objectively: the embodiments objectively obtain the first content, judge its category, determine the content presentation form according to the category, and finally display the URL objectively and faithfully together with that form. The resource content corresponding to the URL can therefore be shown accurately, which improves the expressiveness of the URL, strengthens the presentation of the article, and in turn can improve its appeal to users and the click-through rate.
Of course, a product or method implementing the present invention does not necessarily need to achieve all of the above advantages at the same time.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow diagram of a URL display method according to an embodiment of the present invention;
Fig. 2 is a flow diagram of a URL display method according to another embodiment of the present invention;
Fig. 3 is a structural diagram of a URL display device according to an embodiment of the present invention;
Fig. 4 is a structural diagram of a knowledge representation system according to an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
It should be noted that, where there is no conflict, the technical features in the embodiments of the present invention can be combined with one another.
Explanation of terms
A content presentation form is any of the ways of presenting content that make the content easier to understand and display, for example video, audio, mixed image and text, map, or text, but is not limited to these.
Hypertext Markup Language (HTML) is used to describe web documents. HTML uses markup tags to mark the parts of a web page to be displayed. A web page file is itself a text file; adding markup tags to the text file tells the browser how to display its content, for example how text is processed, how pictures are arranged, and how pictures are displayed.
Extensible Markup Language (XML) is a standard text format for representing structured information on the Web. XML separates data from HTML. XML data is stored in plain text format and can be accessed from an HTML page.
Embodiments of the present invention provide a URL display method that obtains the resource content corresponding to the URL in advance and judges the category of that content; then determines a matching content presentation form according to the category; and finally presents the URL in the article together with the content presentation form, so that the URL conveys the information it represents more clearly and expressively, which in turn makes users more willing to click.
Accordingly, in practical applications, to solve the technical problem of how to improve the expressiveness of a URL, an embodiment of the present invention provides a uniform resource locator (URL) display method. As shown in Fig. 1, the method can be implemented by steps S100 to S130, where:
S100: Obtain first content, where the first content is at least part of the resource content corresponding to the URL.
A URL (Uniform Resource Locator), commonly called a web address, is a concise representation of the location of a resource available on the Internet and of the method for accessing it; it is the address of a standard resource on the Internet. A URL is equivalent to a file name extended to the scope of the network. The information it contains indicates the location of the file and how the browser should handle it. Any resource content on the Internet (for example a website, file, article, or video) can therefore be referred to by a URL.
It should be noted that "at least part" as used herein may mean a part or the whole. For example, the at least part of the resource content corresponding to the URL may be a part of that resource content, or all of it.
In some optional embodiments, step S100 may specifically include crawling the resource content corresponding to the URL in any one or more of the following ways: a crawler tool, an ASP (Active Server Pages) web page capture tool, Java, PHP (Hypertext Preprocessor), Delphi (a visual object-oriented design tool), Python (a scripting language that combines interpreted, compiled, interactive, and object-oriented features), Flex (an application tool based on the Flash platform), or Ruby (a cross-platform, object-oriented interpreted language).
The process of crawling the resource content corresponding to a URL is illustrated below using a crawler tool as an example. The crawler tool can be implemented in Java, Python, and so on.
Specifically, starting from the URLs of one or several initial pages, the crawler obtains the URLs on those pages and parses the page source to obtain the resource content corresponding to each URL. While crawling, it continually extracts new URLs from the current page and puts them into a queue, until all pages have been crawled. Alternatively, the resource content is crawled as follows: links unrelated to the topic are filtered out using an existing web page analysis algorithm, and useful links are kept and placed in the queue of URLs waiting to be crawled. The URL to crawl next is then selected from the queue according to a search strategy, and the process is repeated until all pages have been crawled. The search strategy may crawl important pages first. The importance of a page can be measured by link popularity, link importance, and average link depth. Link popularity can be measured by whether a large number of pages point to the page. Link importance can be measured by whether the URL contains ".com" or "home" and has fewer "/" characters. Average link depth represents the link distance from each seed site in a seed site set to the page (where the links are determined by a breadth-first traversal rule).
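The priority-driven crawl loop described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the in-memory `PAGES` dictionary stands in for real page fetches, and the scoring function `link_importance` and its weights are assumptions based only on the cues named in the text (".com", "home", and fewer "/" characters).

```python
import heapq
from urllib.parse import urlsplit

# Hypothetical in-memory "web": URL -> URLs linked from that page.
PAGES = {
    "http://a.com/home": ["http://a.com/news/2017/item", "http://b.org/x/y/z"],
    "http://a.com/news/2017/item": ["http://a.com/home"],
    "http://b.org/x/y/z": [],
}

def link_importance(url):
    """Assumed scoring: '.com' and 'home' raise the score, each '/' in the path lowers it."""
    parts = urlsplit(url)
    score = 0
    if ".com" in parts.netloc:
        score += 2
    if "home" in parts.path:
        score += 1
    score -= parts.path.count("/")
    return score

def crawl(seeds, fetch_links):
    """Visit the most important queued URL first until the queue is empty."""
    frontier = [(-link_importance(u), u) for u in seeds]  # min-heap, so negate the score
    heapq.heapify(frontier)
    seen = set(seeds)
    order = []
    while frontier:
        _, url = heapq.heappop(frontier)
        order.append(url)
        for link in fetch_links(url):  # new URLs extracted from the current page
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-link_importance(link), link))
    return order

visited = crawl(["http://a.com/home"], lambda u: PAGES.get(u, []))
```

A real crawler would replace the lambda with an HTTP fetch plus link extraction, and could fold link popularity and average link depth into the score.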
The process of crawling the resource content corresponding to a URL is illustrated next using an ASP web page capture tool as an example. The form is filled in using the Form base class in the C# language (an object-oriented programming language); parameters are then passed by the POST method (a way of communicating asynchronously with the server) to crawl the resource content.
Those skilled in the art will understand that, when content is actually obtained, third-party tools can also be used to crawl the resource content.
For example, the jsoup parser (a Java-based HTML parser) can be used to parse a URL address directly and to crawl the resource content corresponding to the URL through the DOM (Document Object Model) and CSS (Cascading Style Sheets).
Those skilled in the art should understand that the above examples are not exhaustive; any existing or future way of obtaining the first content that is applicable to the embodiments of the present invention also falls within the protection scope of the present invention and is incorporated herein by reference.
S110: Determine the category of the first content.
In some optional embodiments, this step may be implemented by steps S112 to S116, where:
S112: Identify at least part of the first content.
As an example, this step may identify at least part of the first content by identifying HTML tags in the HTML source code as displayed by a browser. Because HTML tags are predefined, a tag reveals whether the content inside it is text, video, or a title. A web page may contain both pictures and text, among other things; this step identifies which parts are pictures, which are text, which are video, and so on.
Specifically, for example, the "<audio>" tag defines sound content, so identifying this tag shows that the content is audio. The "<canvas>" tag defines graphics, so identifying this tag shows that the content is graphics. The "<video>" tag defines video, so identifying this tag shows that the content is video. The "<img>" tag defines an image, so this tag shows that the content is an image. The "<body>" tag defines the content displayed on the web page. The "<head>" tag defines information such as the title, character format, language, compatibility, keywords, and description.
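The tag-based identification in S112 can be sketched with Python's standard `html.parser` module. The mapping `TAG_KINDS` and the rule that any non-blank text inside `<body>` counts as "text" are illustrative assumptions, not details taken from the patent.

```python
from html.parser import HTMLParser

# Tag -> content kind, following the tag examples given in the text.
TAG_KINDS = {"audio": "audio", "canvas": "graphics", "video": "video", "img": "image"}

class KindDetector(HTMLParser):
    """Collects the kinds of content found in a page by looking at its tags."""

    def __init__(self):
        super().__init__()
        self.kinds = set()
        self._in_body = False

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self._in_body = True
        if tag in TAG_KINDS:
            self.kinds.add(TAG_KINDS[tag])

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False

    def handle_data(self, data):
        # Assumption: visible text is any non-blank character data inside <body>.
        if self._in_body and data.strip():
            self.kinds.add("text")

def detect_kinds(html):
    detector = KindDetector()
    detector.feed(html)
    detector.close()
    return detector.kinds

sample = ("<html><head><title>t</title></head><body><p>Hello</p>"
          "<video src='v.mp4'></video><img src='a.png'></body></html>")
kinds = detect_kinds(sample)
```

For the sample page, the detector reports text, video, and image content, and ignores the title text because it sits in `<head>`.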
Those skilled in the art should understand that the above examples are not exhaustive; any existing or future way of identifying at least part of the first content that is applicable to the embodiments of the present invention also falls within the protection scope of the present invention and is incorporated herein by reference.
S114: Extract at least part of second content, where the second content is the at least part of the first content.
This step may extract part of the content according to predefined HTML tags, for use in displaying the URL content.
In some optional embodiments, the step of extracting at least part of the second content may specifically include:
S1142: Perform XML serialization on at least part of the first content.
Here, serialization refers to the process of converting a data structure or object into a binary string.
The serialization process is described in detail below with a preferred embodiment.
Step a1: Read the whole HTML code of the resource content using the OpenRead method of WebClient in C#.NET.
Step a2: Delete irrelevant code.
Because scripts cannot be parsed as XML, all scripts between <script> and </script> in the code are deleted.
Step a3: Delete the common anomalies in the HTML code.
For example, the content between <title> and </title> and the content in <meta name="keywords" content=""/> can be deleted.
Step a4: Convert the whole HTML code into tags, and add end tags so that all tags are closed.
Step a5: Use Microsoft's XML serialization method to serialize the whole HTML code to XML and generate an XML file.
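Steps a2 to a5 can be approximated in Python as a sketch; the patent itself names C#.NET and Microsoft's XML serialization, which are swapped here for the standard `html.parser` and `xml.etree.ElementTree` modules. The class name `HtmlToXml`, the synthetic `<document>` root element, and the `VOID_TAGS` set are assumptions made for illustration, and step a1 (fetching the page) is left out.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

VOID_TAGS = {"img", "br", "hr", "meta", "link", "input"}  # HTML tags with no end tag
SKIPPED = {"script", "title"}  # steps a2/a3: these elements are dropped with their content

class HtmlToXml(HTMLParser):
    def __init__(self):
        super().__init__()
        self.builder = ET.TreeBuilder()
        self.builder.start("document", {})  # synthetic root keeps the XML single-rooted
        self.open_tags = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self.skip_depth += 1
            return
        if self.skip_depth or (tag == "meta" and dict(attrs).get("name") == "keywords"):
            return
        self.builder.start(tag, {k: v or "" for k, v in attrs})
        if tag in VOID_TAGS:
            self.builder.end(tag)  # step a4: close tags that HTML leaves open
        else:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in SKIPPED:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in self.open_tags:
            return
        while self.open_tags:  # also close anything left open inside this tag
            top = self.open_tags.pop()
            self.builder.end(top)
            if top == tag:
                break

    def handle_data(self, data):
        if not self.skip_depth:
            self.builder.data(data)

    def to_xml(self):
        while self.open_tags:
            self.builder.end(self.open_tags.pop())
        self.builder.end("document")
        return ET.tostring(self.builder.close(), encoding="unicode")  # step a5

def html_to_xml(html):
    parser = HtmlToXml()
    parser.feed(html)
    parser.close()
    return parser.to_xml()

sample_xml = html_to_xml(
    '<html><head><title>T</title><meta name="keywords" content="a">'
    '<script>var x = "<b>";</script></head>'
    '<body><p>Hi<br><img src="a.png"></body></html>'
)
```

The result is well-formed XML that `ET.fromstring` can parse again, with the script, title, and keywords elements removed and the unclosed `<p>`, `<br>`, and `<img>` tags closed.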
Those skilled in the art should understand that the above examples are not exhaustive; any existing or future way of performing XML serialization on at least part of the first content that is applicable to the embodiments of the present invention also falls within the protection scope of the present invention and is incorporated herein by reference.
S1144: Extract at least part of the second content by extracting at least part of the XML node information.
An XML file is composed of nodes. A node is the smallest valid and structurally complete unit in an XML file. The XML DOM (XML Document Object Model) defines a standard way of accessing and manipulating XML documents. The DOM views an XML document as a tree structure, so all elements can be accessed through the DOM tree. Elements, their text, and their attributes are all regarded as nodes.
In a specific implementation, node information can be extracted from the generated XML file using a Get function, or XPath (XML Path Language) can be used to extract the XML node information.
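As a small illustration of XPath-style extraction, the sketch below uses the limited XPath subset supported by Python's `xml.etree.ElementTree`. The sample XML and its element names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML produced by the serialization step.
xml_text = """<document><body>
  <video src="clip.mp4" title="Demo clip"/>
  <p>Some text</p>
  <img src="cover.png"/>
</body></document>"""

root = ET.fromstring(xml_text)

# ".//video" matches <video> elements at any depth below the root.
video_sources = [node.get("src") for node in root.findall(".//video")]
paragraph_texts = [node.text for node in root.findall(".//p")]
# "[@src]" keeps only elements that carry a src attribute.
images_with_src = root.findall(".//img[@src]")
```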
The process of extracting XML node information (for example, video) is described in detail below by way of example.
Step b1: Search recursively from a node until the smallest node is found.
Here, the smallest node is the most basic unit, Element.
Step b2: Parse each smallest node with an XML parser to obtain string data.
Step b3: Pass the string data to the VideoInfo class.
Step b4: Locate the data into each video class by extracting through two consecutive layers.
Step b5: Pass the data in each video class to the ArrayList data structure in Search Video Info.
Step b6: Associate the data in the ArrayList data structure with the corresponding Adapter data, thereby extracting the XML node information.
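Steps b1 to b3 and b5 can be sketched as a recursive walk that collects leaf elements into a list. The `VideoInfo` dataclass below is a hypothetical stand-in for the class of the same name in the text, whose real fields are not given; a plain Python list plays the role of the ArrayList, and the Adapter association of step b6 is not modeled.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class VideoInfo:
    """Hypothetical record; the patent does not specify the fields of VideoInfo."""
    tag: str
    text: str

def collect_leaves(node, out):
    """Step b1: recurse until a node has no children, then record it (steps b2-b3)."""
    children = list(node)
    if not children:
        out.append(VideoInfo(node.tag, (node.text or "").strip()))
        return
    for child in children:
        collect_leaves(child, out)

root = ET.fromstring(
    "<videos>"
    "<video><title>Intro</title><url>http://example.com/1</url></video>"
    "<video><title>Part 2</title><url>http://example.com/2</url></video>"
    "</videos>"
)
video_items = []  # plays the role of the ArrayList in step b5
collect_leaves(root, video_items)
```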
Those skilled in the art should understand that the above examples are not exhaustive; any existing or future way of extracting at least part of the XML node information that is applicable to the embodiments of the present invention also falls within the protection scope of the present invention and is incorporated herein by reference.
S116: Classify the at least part of the second content, thereby determining the category of the first content.
In some optional embodiments, the step of classifying the at least part of the second content may specifically include: classifying the at least part of the second content using the domain name in the URL and HTML tags.
In some preferred embodiments, the step of classifying the at least part of the second content using the domain name in the URL and HTML tags may specifically include:
Step c1: Match the domain name against a predetermined domain name list. If the match succeeds, perform step c2; if the match fails, perform step c3. The predetermined domain name list includes predetermined domain names and their first category information.
Step c2: Extract the first category information.
Step c3: Identify the HTML tags and determine second category information.
Step c4: Classify the at least part of the second content according to the first category information and the second category information.
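Steps c1 to c4 can be sketched as a lookup followed by a merge. The contents of `DOMAIN_CATEGORIES`, the category labels, and the "+"-joined result format are all illustrative assumptions; the patent only states that a predetermined domain name list carries first category information.

```python
from urllib.parse import urlsplit

# Hypothetical predetermined domain name list: domain -> first category information.
DOMAIN_CATEGORIES = {
    "www.youku.com": "video",
    "www.taobao.com": "image-text",
}

def classify(url, tag_kinds):
    """Combine the domain-based category (c1-c2) with the tag-based one (c3) into c4."""
    domain = urlsplit(url).hostname or ""
    first = DOMAIN_CATEGORIES.get(domain)  # c1: match; c2: extract on success
    # c3: second category information, from content kinds identified via HTML tags
    second = sorted(kind for kind in tag_kinds if kind != first)
    if first is None:
        return "+".join(second) or "unknown"  # match failed: rely on tags alone
    return "+".join([first] + second)  # c4: merge both sources

youku_category = classify("http://www.youku.com/v/1", {"text"})
taobao_category = classify("http://www.taobao.com/item", {"video"})
other_category = classify("http://example.org/", {"text", "image"})
```

The `tag_kinds` argument is assumed to come from a tag identification step such as S112.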
In a specific implementation, this step can identify the domain name from the URL character string and match it against the predetermined domain name list. If the match succeeds, the attribute information of the domain name in the list is extracted and the HTML tags are identified; finally, the category of the at least part of the second content is determined from the attribute information of the domain name and the HTML tags. Taking the Youku website as an example: the domain name in the URL character string is identified and matched against the domain name list. If the match succeeds, the URL is determined to belong to the Youku website, and the category information in the attribute information related to that domain name is extracted from the predetermined domain name list. If the match fails, the category is judged using the HTML tags. The category information predefined in the attribute information for the domain name may indicate that the content corresponding to the domain name is mainly video; the "video" category information is extracted, and the HTML tags are then identified, which shows that the content also includes text. The category of this content can therefore be determined as video combined with text. As another example, for the Taobao website, the domain name is extracted from the URL character string and compared with the domain name list. If the match succeeds, the URL is determined to belong to the Taobao website, and the predefined "mixed image and text" category information is extracted from the attribute information of the domain name in the predetermined domain name list. The HTML tags are then identified, which shows that the content also includes video, so the category of this content is determined as mixed image and text combined with video.
In the above embodiments, a URL is composed of: protocol://domain name:port/virtual directory/file name?parameters#anchor. An IP address can be used in place of the domain name. Taking http://www.aspxfans.com:8080/news/index.asp?boardID=5&ID=24618&page=1#name as an example, www.aspxfans.com is the domain name.
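The URL composition described above can be checked with Python's standard `urllib.parse` module, applied to the example URL from the text. The variable names are chosen for illustration only.

```python
from urllib.parse import urlsplit, parse_qs

url = "http://www.aspxfans.com:8080/news/index.asp?boardID=5&ID=24618&page=1#name"
parts = urlsplit(url)

protocol = parts.scheme         # protocol part
domain = parts.hostname         # domain name part
port = parts.port               # port part, as an integer
path = parts.path               # virtual directory and file name
params = parse_qs(parts.query)  # parameters, each mapped to a list of values
anchor = parts.fragment         # anchor part
```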
It should be noted that, in the above category judgment, the step of identifying the domain name and the step of judging the content category using HTML tags can be performed simultaneously or in sequence. If they are performed in sequence, the embodiments of the present invention do not limit the order: the category of the content can first be identified using the HTML tags and the category information then extracted through the domain name, or the category information can first be extracted through the domain name and the HTML tags then used to identify the category of the content.
Those skilled in the art should understand that the above examples are not exhaustive; any existing or future way of classifying at least part of the second content that is applicable to the embodiments of the present invention also falls within the protection scope of the present invention and is incorporated herein by reference.
S120: according to classification, determine that content shows form.
In some alternative embodiments, this step can specifically include: according to the domain name in classification and URL, and be based on
Second content at least partially, to determine that content shows form.
For example, if the category is judged to be mixed image and text, the domain name is www.taobao.com, and the obtained second content is image-and-text content on the Taobao website, then a structure of a large image with text, combined with the image-and-text content from Taobao, may be used as the content display form.
As another example, if the category is judged to be mixed image and text, the domain name is www.qq.com, and the obtained second content is image-and-text content on the Tencent website, then a structure of a small image with text, combined with the image-and-text content from the Tencent website, may be used as the content display form.
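The two examples above amount to a lookup keyed by category and domain name. A minimal sketch follows; the rule table, the DisplayForm type, the default value and the layout values are all assumptions made for illustration (the text only states which image size goes with which site).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisplayForm:
    structure: str  # e.g. "large image with text"
    layout: str     # e.g. "top-bottom"; the layouts chosen below are assumed


# Hypothetical rule table keyed by (category, domain name).
RULES = {
    ("mixed image and text", "www.taobao.com"): DisplayForm("large image with text", "top-bottom"),
    ("mixed image and text", "www.qq.com"): DisplayForm("small image with text", "left-right"),
}

# Assumed fallback when no rule matches.
DEFAULT_FORM = DisplayForm("small image with text", "top-bottom")


def choose_display_form(category, domain):
    """Return the display form for a category/domain pair, or the default."""
    return RULES.get((category, domain), DEFAULT_FORM)
```

Keeping the rules in a table rather than in branching code makes it straightforward to add further domains or structure types later.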
When determining the content display form, the embodiments of the present invention may use any one or more of the following structure types: a large image with text, a small image with text, a large window with text, a small window with text, a picture with audio, and so on. The structure may be laid out top-to-bottom, left-to-right, or diagonally, but is not limited to these. Those skilled in the art should understand that any existing or future content display form that can be applied to the embodiments of the present invention also falls within the protection scope of the present invention and is incorporated herein by reference.
Those skilled in the art should understand that the above examples are not exhaustive. Any existing or future way of determining the content display form according to the category that can be applied to the embodiments of the present invention also falls within the protection scope of the present invention and is incorporated herein by reference.
S130: Display the URL together with the content display form.
For example, in this step the URL character string may be paired with a structure of a large image and text and displayed in a certain layout. The layout may be top-to-bottom, left-to-right, or diagonal, but is not limited to these.
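One possible way to realise this step is to emit an HTML fragment that places a preview next to the URL string. The sketch below is an assumption throughout: the function name render_url_card, the use of CSS flexbox and the markup are not part of the embodiment, and the diagonal layout is not implemented (it falls back to top-to-bottom).

```python
from html import escape


def render_url_card(url, image_src, text, layout="top-bottom"):
    """Return an HTML fragment showing the URL with an image-and-text preview."""
    link = f'<a href="{escape(url)}">{escape(url)}</a>'
    preview = f'<img src="{escape(image_src)}" alt=""><p>{escape(text)}</p>'
    # Map the layout name to a flexbox direction; unknown layouts use "column".
    direction = {"top-bottom": "column", "left-right": "row"}.get(layout, "column")
    return f'<div style="display:flex;flex-direction:{direction}">{link}{preview}</div>'
```

Escaping the URL and text before inserting them into markup avoids broken output when the URL contains characters such as "&".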
In practical applications, the content display form to be used may be selected manually by the user or pushed automatically by the back end.
Compared with existing methods that offer no choice of style, the embodiments of the present invention can intelligently recommend a variety of display styles, thereby helping authors improve the expressiveness of their articles.
The embodiments of the present invention obtain first content, where the first content is at least part of the resource content corresponding to the URL; then determine the category of the first content; then determine the content display form according to the category; and finally display the URL together with the content display form. In other words, the category of the resource content corresponding to the URL is judged in order to determine the content display form, and the URL is then displayed together with that form. For example, a content display form of mixed image and text combined with small-window video may be placed below the URL character string; as another example, a content display form of large-window video combined with text may be placed above the URL character string. The URL is thereby beautified, which solves the technical problem of how to improve the expressiveness of a URL, enhances the expressiveness of the article, makes it more appealing to users, and can in turn increase the user click-through rate.
The present invention is described in detail again below with reference to Fig. 2, using a preferred embodiment.
S200: Obtain first content by any one or more of the following: a crawler tool, an ASP web page scraping tool, or a Java, PHP, Delphi, Python, Flex or Ruby approach; where the first content is at least part of the resource content corresponding to the URL.
S210: Identify at least part of the first content.
S220: Perform XML serialization on at least part of the first content.
S230: Extract at least part of the second content by extracting at least part of the XML node information; where the second content is at least part of the first content.
S240: Classify at least part of the second content using the domain name in the URL and the HTML tags, thereby determining the category of the first content.
S250: Determine the content display form according to the category and the domain name in the URL, and based on at least part of the second content.
S260: Display the URL together with the content display form.
With this preferred embodiment, the URL is beautified, which helps readers obtain more information without clicking the link. This improves the expressiveness of the URL and of the article, makes it more appealing to users, and can in turn increase the user click-through rate.
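Steps S200 to S260 can be strung together roughly as in the sketch below. It is a simplification under several assumptions: fetching is delegated to a caller-supplied function fetch (hypothetical), the fetched markup is assumed to be well-formed so that the standard xml.etree.ElementTree parser accepts it, and the classification and display rules are illustrative only.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit


def show_url(url, fetch):
    """Run S200-S260 for one URL; `fetch` maps a URL to markup (assumed helper)."""
    first_content = fetch(url)                         # S200: obtain first content
    root = ET.fromstring(first_content)                # S210/S220: parse as XML
    nodes = [el for el in root.iter()
             if el.tag in ("p", "img", "video")]       # S230: extract second content
    tags = {el.tag for el in nodes}
    domain = urlsplit(url).hostname

    # S240: classify (illustrative tag-based rules only).
    if "img" in tags and "p" in tags:
        category = "mixed image and text"
    elif "img" in tags:
        category = "image"
    else:
        category = "text"
    if "video" in tags:
        category += " combined with video"

    # S250: choose a display form from the category and domain (assumed rule).
    if domain == "www.taobao.com":
        structure = "large image with text"
    else:
        structure = "small image with text"

    # S260: hand everything needed for display back to the caller.
    return {"url": url, "category": category, "structure": structure}
```

Passing fetch in as a parameter keeps the sketch independent of any particular crawling tool, which matches the open list of acquisition methods in S200.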
In addition, to help authors improve the expressiveness of their articles, the embodiments of the present invention also provide an information expression method. This method may include any of the above URL display method embodiments.
For an explanation of this embodiment, refer to the description of any of the above URL display method embodiments; it is not repeated here.
It should be noted that the steps in the above URL display method embodiments and information expression method embodiments may be executed sequentially or in parallel; no limitation is imposed here.
Based on the same technical concept as the URL display method embodiments, the embodiments of the present invention also provide a uniform resource locator (URL) display device. This device embodiment can execute the above method embodiments. As shown in Fig. 3, the device 30 may include: an obtaining module 32, a first determining module 34, a second determining module 36 and a display module 38. The obtaining module 32 is used to obtain first content, where the first content is at least part of the resource content corresponding to the URL. The first determining module 34 is used to determine the category of the first content. The second determining module 36 is used to determine the content display form according to the category. The display module 38 is used to display the URL together with the content display form.
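The four-module division of the device can be mirrored in software as a small composition of callables. This is a sketch only; the class name UrlDisplayDevice and the callable signatures are assumptions, and the embodiment does not say the modules are implemented this way.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class UrlDisplayDevice:
    obtain: Callable[[str], str]                   # obtaining module 32
    determine_category: Callable[[str, str], str]  # first determining module 34
    determine_form: Callable[[str], str]           # second determining module 36
    display: Callable[[str, str], str]             # display module 38

    def run(self, url):
        """Pass one URL through the four modules in order."""
        first_content = self.obtain(url)
        category = self.determine_category(url, first_content)
        form = self.determine_form(category)
        return self.display(url, form)
```

Each module can then be swapped out or further subdivided without touching the others, in line with the later remark that the module division is only exemplary.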
In some optional embodiments, on the basis of the embodiment shown in Fig. 3, the obtaining module may specifically include: a URL obtaining subunit and a crawling unit. The URL obtaining subunit is used to obtain the URL. The crawling unit is used to crawl the resource content corresponding to the URL, so as to obtain the first content, by any one or more of the following: a crawler tool, an ASP web page scraping tool, or a Java, PHP, Delphi, Python, Flex or Ruby approach.
In some optional embodiments, on the basis of the embodiment shown in Fig. 3, the first determining module may specifically include: a recognition unit, an extraction unit and a classification unit. The recognition unit is used to identify at least part of the first content. The extraction unit is used to extract at least part of the second content, where the second content is at least part of the first content. The classification unit is used to classify at least part of the second content, thereby determining the category of the first content.
In some optional embodiments, on the basis of the above embodiments, the extraction unit may specifically include: a serialization unit and an extraction subunit. The serialization unit is used to perform XML serialization on at least part of the first content. The extraction subunit is used to extract at least part of the second content by extracting at least part of the XML node information.
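The extraction of XML node information can be sketched as follows, assuming the first content has already been serialized into well-formed XML. The function name extract_second_content, the default set of wanted tags and the returned dictionary shape are hypothetical; Python's standard xml.etree.ElementTree is used only as an example parser.

```python
import xml.etree.ElementTree as ET


def extract_second_content(first_content, wanted=("p", "img", "video")):
    """Walk the XML tree and keep the nodes whose tag is in `wanted`."""
    root = ET.fromstring(first_content)
    extracted = []
    for node in root.iter():  # depth-first, in document order
        if node.tag in wanted:
            extracted.append({
                "tag": node.tag,
                "text": (node.text or "").strip(),
                "attrs": dict(node.attrib),
            })
    return extracted
```

Real web pages are often not well-formed XML, so in practice a tolerant HTML parser or a clean-up pass would be needed before this step.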
In some optional embodiments, on the basis of the above embodiments, the classification unit may specifically include: a classification subunit. The classification subunit is used to classify at least part of the second content using the URL and the HTML tags.
In some preferred embodiments, the above classification subunit further specifically includes: a matching unit, a category extraction unit, a category determination unit and a content classification unit. The matching unit is used to match the domain name against a predetermined domain name list, where the predetermined domain name list includes predetermined domain names and their first category information. The category extraction unit is used to extract the first category information when the match succeeds. The category determination unit is used to identify the HTML tags and determine second category information. The content classification unit is used to classify at least part of the second content according to the first category information and the second category information.
On the basis of the above preferred embodiment, the URL display device may also include an execution unit. The execution unit is used to trigger the category determination unit when the match fails.
In some optional embodiments, on the basis of the above embodiments, the above second determining module may specifically include: a determining subunit. The determining subunit is used to determine the content display form according to the category and the domain name in the URL, and based on at least part of the second content.
In addition, to help authors improve the expressiveness of their articles, the embodiments of the present invention also provide an information expression system. As shown in Fig. 4, the system may include a URL display device and an expression module. The expression module is used to express the URL in an article by means of the URL display device.
For an explanation of this embodiment, refer to the description of any of the above URL display method embodiments; it is not repeated here.
Those skilled in the art will appreciate that the above URL display device embodiments and information expression system embodiments may also include some well-known structures, such as a processor, a memory and a bus. The processor is connected to the memory via the bus. The processor includes but is not limited to an ARM processor, a programmable logic device, a DSP, and the like. The memory may be random access memory or read-only memory. The bus may include a data bus, an address bus and a control bus.
It should be noted that, in the above URL display device embodiments and information expression system embodiments, the division into modules is given only by way of example. Those skilled in the art should understand that the modules may also be divided in other ways, and that the divided modules may be further split or combined; moreover, the modules may execute sequentially or in parallel, and no limitation is imposed here.
It should also be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or device that includes the element.
The embodiments in this specification are described in a related manner; for identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, since the system embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the corresponding description of the method embodiments.
The above are only preferred embodiments of the present invention and are not intended to limit its protection scope. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention fall within the protection scope of the present invention.
Claims (18)
1. A uniform resource locator (URL) display method, characterized in that the method comprises:
obtaining first content, wherein the first content is at least part of the resource content corresponding to the URL;
determining the category of the first content;
determining a content display form according to the category;
displaying the URL together with the content display form.
2. The method according to claim 1, characterized in that obtaining the first content specifically comprises:
crawling the resource content corresponding to the URL by any one or more of the following: a crawler tool, an ASP web page scraping tool, or a Java, PHP, Delphi, Python, Flex or Ruby approach.
3. The method according to claim 1, characterized in that determining the category of the first content specifically comprises:
identifying at least part of the first content;
extracting at least part of second content, wherein the second content is at least part of the first content;
classifying the at least part of the second content, thereby determining the category of the first content.
4. The method according to claim 3, characterized in that extracting at least part of the second content specifically comprises:
performing XML serialization on at least part of the first content;
extracting the at least part of the second content by extracting at least part of the XML node information.
5. The method according to claim 4, characterized in that classifying the at least part of the second content specifically comprises:
classifying the at least part of the second content using the domain name in the URL and HTML tags.
6. The method according to claim 5, characterized in that classifying the at least part of the second content using the domain name in the URL and the HTML tags specifically comprises:
matching the domain name against a predetermined domain name list, wherein the predetermined domain name list includes predetermined domain names and their first category information;
if the match succeeds, extracting the first category information;
identifying the HTML tags and determining second category information;
classifying the at least part of the second content according to the first category information and the second category information.
7. The method according to claim 6, characterized in that the method further comprises:
if the match fails, executing the step of identifying the HTML tags.
8. The method according to claim 3, characterized in that determining the content display form according to the category specifically comprises:
determining the content display form according to the category and the domain name in the URL, and based on the at least part of the second content.
9. An information expression method, characterized in that the method comprises:
expressing the URL in an article by means of the URL display method according to any one of claims 1-8.
10. A uniform resource locator (URL) display device, characterized in that the device comprises:
an obtaining module, used to obtain first content, wherein the first content is at least part of the resource content corresponding to the URL;
a first determining module, used to determine the category of the first content;
a second determining module, used to determine a content display form according to the category;
a display module, used to display the URL together with the content display form.
11. The device according to claim 10, characterized in that the obtaining module specifically comprises:
a URL obtaining subunit, used to obtain the URL;
a crawling unit, used to crawl the resource content corresponding to the URL, so as to obtain the first content, by any one or more of the following: a crawler tool, an ASP web page scraping tool, or a Java, PHP, Delphi, Python, Flex or Ruby approach.
12. The device according to claim 10, characterized in that the first determining module specifically comprises:
a recognition unit, used to identify at least part of the first content;
an extraction unit, used to extract at least part of second content, wherein the second content is at least part of the first content;
a classification unit, used to classify the at least part of the second content, thereby determining the category of the first content.
13. The device according to claim 12, characterized in that the extraction unit specifically comprises:
a serialization unit, used to perform XML serialization on at least part of the first content;
an extraction subunit, used to extract the at least part of the second content by extracting at least part of the XML node information.
14. The device according to claim 13, characterized in that the classification unit specifically comprises:
a classification subunit, used to classify the at least part of the second content using the domain name in the URL and HTML tags.
15. The device according to claim 14, characterized in that the classification subunit specifically comprises:
a matching unit, used to match the domain name against a predetermined domain name list, wherein the predetermined domain name list includes predetermined domain names and their first category information;
a category extraction unit, used to extract the first category information when the match succeeds;
a category determination unit, used to identify the HTML tags and determine second category information;
a content classification unit, used to classify the at least part of the second content according to the first category information and the second category information.
16. The device according to claim 15, characterized in that the device further comprises:
an execution unit, used to trigger the category determination unit when the match fails.
17. The device according to claim 12, characterized in that the second determining module specifically comprises:
a determining subunit, used to determine the content display form according to the category and the domain name in the URL, and based on the at least part of the second content.
18. An information expression system, characterized in that the system comprises:
the URL display device according to any one of claims 10-17;
an expression module, used to express the URL in an article by means of the URL display device.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710385155.5A CN108959325B (en) | 2017-05-26 | 2017-05-26 | Uniform resource locator display method, information display method and related products thereof |
PCT/CN2018/088438 WO2018214964A1 (en) | 2017-05-26 | 2018-05-25 | Uniform resource locator display method, information expression method and related product |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710385155.5A CN108959325B (en) | 2017-05-26 | 2017-05-26 | Uniform resource locator display method, information display method and related products thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108959325A (en) | 2018-12-07 |
CN108959325B (en) | 2021-06-29 |
Family
ID=64396255
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710385155.5A Active CN108959325B (en) | 2017-05-26 | 2017-05-26 | Uniform resource locator display method, information display method and related products thereof |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108959325B (en) |
WO (1) | WO2018214964A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GR1010585B (en) * | 2022-11-10 | 2023-12-12 | Παναγιωτης Τσαντιλας | Web crawling and content summarization |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103258058A (en) * | 2013-06-03 | 2013-08-21 | 贝壳网际(北京)安全技术有限公司 | Page display method and system and browser |
CN103544178A (en) * | 2012-07-13 | 2014-01-29 | 百度在线网络技术(北京)有限公司 | Method and equipment for providing reconstruction page corresponding to target page |
CN104468720A (en) * | 2014-11-07 | 2015-03-25 | 广州市至德科技企业孵化器有限公司 | Method for determining preview link and providing dynamic preview information for preview link |
US20150356196A1 (en) * | 2014-06-04 | 2015-12-10 | International Business Machines Corporation | Classifying uniform resource locators |
CN105512149A (en) * | 2014-10-14 | 2016-04-20 | 阿里巴巴集团控股有限公司 | Method for updating and displaying keyword information of digital reading resource and relevant devices |
CN106033428A (en) * | 2015-03-11 | 2016-10-19 | 北大方正集团有限公司 | A uniform resource locator selecting method and a uniform resource locator selecting device |
CN106095453A (en) * | 2016-06-16 | 2016-11-09 | 北京金山安全软件有限公司 | Information display method and device and electronic equipment |
Non-Patent Citations (2)
Title |
---|
IT昆仑: "希望博客新增加的功能——"网页预览"&SNAP", 《HTTPS://BLOG.51CTO.COM/QINYEZHAI/113164》 * |
WEIXIN_33834628: "希望博客新增加的功能—"网页预览"&SNAP", 《HTTPS://BLOG.CSDN.NET/WEIXIN_33834628/ARTICLE/DETAILS/92481412》 * |
Also Published As
Publication number | Publication date |
---|---|
WO2018214964A1 (en) | 2018-11-29 |
CN108959325B (en) | 2021-06-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||