CN104715064A - Method and server for marking keywords on webpage - Google Patents


Publication number
CN104715064A
Authority
CN
China
Prior art keywords
url
keyword
webpage
request
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510149902.6A
Other languages
Chinese (zh)
Other versions
CN104715064B (en)
Inventor
李月雷
王志青
贾文杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd and Qizhi Software Beijing Co Ltd
Priority to CN201510149902.6A (patent CN104715064B)
Publication of CN104715064A
Application granted
Publication of CN104715064B
Legal status: Expired - Fee Related


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G06F16/9577 Optimising the visualization of content, e.g. distillation of HTML documents

Abstract

The invention discloses a method and a server for marking keywords on a webpage. The method for marking the keywords on the webpage comprises the following steps: receiving a keyword marking request containing a webpage URL sent from a client side; obtaining one or more corresponding keywords according to the webpage URL in the keyword marking request; returning the one or more found keywords and each search script code corresponding to each keyword to the client so that the client can mark the one or more keywords contained in the webpage when the webpage corresponding to the webpage URL contained in the keyword marking request is loaded and rendered. The method enables the client to mark and highlight the one or more keywords contained in the webpage when the webpage corresponding to the webpage URL is loaded and rendered, so that a user can check and operate the keywords conveniently; as a result, the experience of the user is enhanced.

Description

A method and server for marking keywords on a webpage
Technical field
The present invention relates to Internet technology, and in particular to a method and a server for marking keywords on a webpage.
Background
With the development of Internet technology, more and more users browse webpages to obtain information. While browsing, a user may become interested in one or more keywords on a page. Keyword search is currently one of the main methods of web retrieval: the user enters a word or sentence as the query, the search engine displays the results, and the content the user entered is the keyword. In the prior art, a user who wants to learn more about a keyword on a page must manually select it, copy and paste it into the search box of a search engine, click search, and then read the related information on the results page. Obtaining keyword-related information this way is inconvenient and gives a poor user experience.
Summary of the invention
In view of the above problems, the present invention provides a method for marking keywords on a webpage, and a corresponding server, that overcome or at least partially solve these problems.
According to one aspect of the present invention, a method for marking keywords on a webpage is provided. The method comprises:
Receiving a keyword marking request, sent by a client, that contains a webpage URL;
Obtaining one or more corresponding keywords according to the webpage URL in the keyword marking request;
Returning the one or more keywords found, together with the search script code corresponding to each keyword, to the client, so that the client marks the one or more keywords contained in the webpage when it loads and renders the webpage indicated by the webpage URL in the keyword marking request.
Optionally, obtaining one or more corresponding keywords according to the webpage URL in the keyword marking request comprises:
Extracting the text information of the webpage URL, based on the webpage URL contained in the keyword request;
Finding the one or more keywords in the text information that match keywords in a keyword dictionary.
Optionally, obtaining one or more corresponding keywords according to the webpage URL in the keyword marking request comprises:
Looking up the corresponding one or more keywords in a keyword database according to the webpage URL contained in the keyword request, wherein the keyword database stores each webpage URL in correspondence with the keywords contained in the webpage that the URL indicates.
Optionally, the keyword database uses Redis for storage, with master-slave backup.
Optionally, the method further comprises:
Loading hot data, that is, data in the keyword database whose query frequency exceeds a preset value, into memory;
Querying the memory first, and then the keyword database, according to the webpage URL contained in the keyword request.
Optionally, the method further comprises a step of building the keyword database offline, which specifically comprises:
Obtaining a URL list;
Using a web spider to crawl the webpage HTML code corresponding to each URL in the URL list;
Extracting webpage-text-related information from the webpage HTML code;
Extracting keywords from the webpage-text-related information.
Optionally, obtaining the URL list comprises:
Periodically obtaining the user access logs provided by the browser side;
Obtaining the URLs visited by users from the user access logs;
Adding the obtained user-visited URLs to the URL list.
Optionally, before the obtained user-visited URLs are added to the URL list, the method further comprises:
Filtering the obtained user-visited URLs according to the page views of the corresponding webpages, and adding the filtered URLs to the URL list.
Optionally, before the obtained user-visited URLs are added to the URL list, the method further comprises:
Determining whether an obtained user-visited URL is in a URL whitelist and, if so, adding the URL to the URL list;
And/or, determining whether an obtained user-visited URL is in a URL blacklist and, if so, not adding the URL to the URL list.
Corresponding to the foregoing method for marking keywords on a webpage, the present invention also provides a server for marking keywords on a webpage. The server comprises:
A receiving unit, adapted to receive a keyword marking request, sent by a client, that contains a webpage URL;
A keyword query unit, adapted to obtain one or more corresponding keywords according to the webpage URL in the keyword marking request;
A feedback unit, adapted to return the one or more keywords found, together with the search script code corresponding to each keyword, to the client, so that the client marks the one or more keywords contained in the webpage when it loads and renders the webpage indicated by the webpage URL in the keyword marking request.
The beneficial effect of the technical solution of the present invention is as follows. A keyword marking request containing a webpage URL is received from the client; after the request is processed, the one or more keywords found and the search script code corresponding to each keyword are returned to the client, so that the client marks the one or more keywords contained in the webpage when it loads and renders the webpage indicated by the webpage URL in the request. When viewing the webpage through the client, the user therefore sees the keywords marked and highlighted, which makes it easier to view the keywords and operate on them further, and improves the user experience.
The above description is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented according to the specification, and so that the above and other objects, features and advantages of the invention become more apparent, specific embodiments of the invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art on reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be regarded as limiting the present invention. Throughout the drawings, the same reference symbols denote the same parts. In the drawings:
Fig. 1 shows a schematic flowchart of a method for marking keywords on a webpage according to an embodiment of the invention;
Fig. 2 shows a schematic flowchart of looking up keywords according to an embodiment of the invention;
Fig. 3 shows a schematic diagram of the effect of marking keywords on a webpage using the method shown in Fig. 1, according to an embodiment of the invention;
Fig. 4 shows a schematic diagram of the effect of searching with a marked keyword on the webpage shown in Fig. 3, according to an embodiment of the invention;
Fig. 5 shows a schematic flowchart of building the keyword database offline according to an embodiment of the invention; and
Fig. 6 shows a block diagram of a server for marking keywords on a webpage according to an embodiment of the invention.
Detailed description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure can be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and its scope fully conveyed to those skilled in the art.
In the prior art, keywords on a webpage are not marked when a user browses the content of a website, which makes them inconvenient to use. To address this problem, the present invention provides a technical solution for marking keywords on a webpage. With this solution, when the webpage the user is visiting is loaded and rendered, the one or more keywords contained in it are each marked and highlighted, which makes further search operations on the marked keywords convenient and thus improves the user experience.
Fig. 1 shows a schematic flowchart of a method for marking keywords on a webpage according to an embodiment of the invention. Referring to Fig. 1, the method comprises:
Step S110, receiving a keyword marking request, sent by a client, that contains a webpage URL;
Step S120, obtaining one or more corresponding keywords according to the webpage URL in the keyword marking request;
Step S130, returning the one or more keywords found, together with the search script code corresponding to each keyword, to the client, so that the client marks the one or more keywords contained in the webpage when it loads and renders the webpage indicated by the webpage URL in the keyword marking request.
The method shown in Fig. 1 marks the keywords on a webpage, so that a user visiting the corresponding page of a website sees its keywords marked and highlighted, which improves the experience of browsing the page content.
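Steps S110 to S130 can be sketched roughly as follows. This is only an illustration under stated assumptions: the request and response shapes, the in-memory `KEYWORD_STORE`, the `SEARCH_URL` endpoint and the form of the "search script code" are all hypothetical, since the patent does not specify them.

```python
import json
from urllib.parse import quote

# Hypothetical stand-in for the keyword lookup (webpage URL -> keywords).
KEYWORD_STORE = {
    "http://example.com/news/1": ["Example Mining Co", "Example Energy Ltd"],
}

# Hypothetical search endpoint; the patent does not name one.
SEARCH_URL = "https://search.example.com/s?q="


def build_search_script(keyword: str) -> str:
    """Return a script snippet that opens the search results for `keyword`."""
    return "window.open(" + json.dumps(SEARCH_URL + quote(keyword)) + ")"


def handle_mark_request(request: dict) -> dict:
    """Handle one keyword marking request (steps S110 to S130)."""
    page_url = request["url"]                     # S110: URL carried by the request
    keywords = KEYWORD_STORE.get(page_url, [])    # S120: keywords for that URL
    return {                                      # S130: keywords plus search scripts
        "url": page_url,
        "keywords": [
            {"text": kw, "script": build_search_script(kw)} for kw in keywords
        ],
    }
```

A client would send `{"url": ...}` and use each returned `script` as the click action of the corresponding marked keyword.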
In one embodiment of the invention, the method shown in Fig. 1 further comprises: loading hot data, that is, data in the keyword database whose query frequency exceeds a preset value, into memory;
Querying the memory first, and then the keyword database, according to the webpage URL contained in the keyword request.
The keyword database usually stores a large amount of data, so looking up the keywords for a keyword request directly in the database is laborious and time-consuming. To improve the server's processing speed and response efficiency, this embodiment loads the hot data whose query frequency exceeds the preset value into memory; when executing a query, the server checks the memory first and then the keyword database. Memory is much faster to read than the keyword database and holds far less data, so query time is effectively shortened. If the server does not find the webpage URL of the keyword request in memory, it goes on to query the keyword database. The preset value can be set according to the actual situation. For example, with a preset value of 100, a data item that is queried more than 100 times within a given period is loaded from the keyword database into memory. This also avoids loading large amounts of non-hot data into memory and increasing the memory burden.
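The memory-first lookup can be illustrated with a small sketch. It assumes a plain dictionary as a stand-in for the keyword database and counts queries over the whole lifetime of the object rather than within a time window; the class and attribute names are hypothetical.

```python
from collections import Counter


class KeywordLookup:
    """Memory-first lookup with promotion of frequently queried URLs."""

    def __init__(self, database: dict, hot_threshold: int = 100):
        self.database = database            # stand-in for the keyword database
        self.hot_threshold = hot_threshold  # the "preset value" (100 in the example)
        self.query_counts = Counter()       # queries seen per URL
        self.memory = {}                    # hot data held in memory

    def get(self, url: str) -> list:
        self.query_counts[url] += 1
        if url in self.memory:              # 1) check memory first
            return self.memory[url]
        keywords = self.database.get(url, [])   # 2) fall back to the database
        # Promote the entry once it has been queried more than the threshold.
        if url in self.database and self.query_counts[url] > self.hot_threshold:
            self.memory[url] = keywords
        return keywords
```

A production version would reset or decay the counters per time period, which this sketch leaves out.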
In one embodiment of the invention, step S120 of the method shown in Fig. 1 comprises: extracting the text information of the webpage URL, based on the webpage URL contained in the keyword request;
Finding the one or more keywords in the text information that match keywords in a keyword dictionary.
Fig. 2 shows a schematic flowchart of looking up keywords according to an embodiment of the invention. Referring to Fig. 2, extracting the text information of the webpage URL and finding the one or more keywords in it that match the keyword dictionary is implemented as follows:
Step S210, inputting the extracted text information of the webpage URL;
Here, the text information of the webpage URL is the text information extracted for the webpage URL contained in the keyword request received from the client.
Step S220, performing word segmentation on the text information;
The text information here is the webpage-text-related information. After segmentation, the resulting strings are matched one by one against the keyword dictionary. In this embodiment, mixed-granularity word segmentation is applied to the text information. Word segmentation is the process of recombining a continuous character sequence into a word sequence according to a given specification. Processing differs with the form of the text on the page. In English writing, spaces act as natural delimiters between words, so English text in the text information is segmented on spaces. In Chinese, only characters, sentences and paragraphs are separated by obvious delimiters; Chinese word segmentation (Chinese Word Segmentation) means cutting a sequence of Chinese characters into individual words. The specific processing includes boundary recognition (for example, identifying the boundary between a "word" and a "phrase"), disambiguation (using syntactic and semantic information to resolve ambiguity) and internal link recognition (determining the boundary between a "word" and an "internal link").
It should be noted that word segmentation is prior art; existing segmentation techniques can be used to segment the text information, and the details are not repeated here.
Step S230, matching the segmented strings against the keyword dictionary;
The keyword dictionary defines which words in the extracted text information of the webpage URL may be marked; these markable words are all words with practical meaning. In one embodiment of the invention, the keywords in the keyword dictionary come from one or more of the following sources:
Organization names;
Words in the "encyclopedia" library provided by a search organization;
Select-to-search words (literally "sliding words"), that is, words that a user selects on a webpage and searches for while browsing;
Words in the "entity library" provided by a search organization, where the entity library defines the relationships between entities. These words are entries with a complete meaning.
Organization names are words that identify specific organizations, such as the names of research institutes, universities, companies and government bodies. Select-to-search words are the words or sentences that a user selects, for example with the right mouse button, and searches for while browsing the content of a website. Because they are derived from user behavior, they reflect the preferences and interests of users when browsing webpages.
In one embodiment of the invention, one matching algorithm used when matching the segmented strings one by one against the keyword dictionary is the Trie, also known as a word lookup tree. A Trie is a tree structure typically used to count, sort and store large numbers of character strings (though not only character strings), and it is often used by search engine systems for word-frequency statistics on text. Its advantage is that it uses the common prefixes of strings to reduce query time and minimizes unnecessary string comparisons; according to this embodiment, its query efficiency is higher than that of a hash table.
Step S240, on a hit, extracting the keyword and outputting the list of text to be marked.
Through the process shown in Fig. 2, the keywords corresponding to the webpage URL sent by the client can be marked, so that the user can conveniently view them or use the marked keywords to search.
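Steps S230 and S240 can be sketched with a minimal Trie. The sketch assumes that segmentation (step S220) has already produced a list of tokens; the class, method and function names are illustrative and not taken from the patent.

```python
class Trie:
    """Minimal prefix tree holding the keyword dictionary."""

    _END = "$end"  # multi-character marker, so it cannot collide with a single character

    def __init__(self):
        self.root = {}

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[self._END] = True

    def contains(self, word: str) -> bool:
        node = self.root
        for ch in word:
            node = node.get(ch)
            if node is None:          # no dictionary word shares this prefix
                return False
        return self._END in node     # a prefix alone is not a hit


def match_tokens(tokens, trie):
    """S230/S240: keep each segmented token that hits the dictionary, in order, once."""
    hits = []
    for token in tokens:
        if trie.contains(token) and token not in hits:
            hits.append(token)
    return hits
```

For example, with "Redis" and "Hadoop" in the dictionary, the token "Red" shares a prefix with "Redis" but is not reported as a keyword.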
Fig. 3 shows a schematic diagram of marking keywords on a webpage using the method shown in Fig. 1, according to an embodiment of the invention. Referring to Fig. 3, after the method of Fig. 1 has gone through the steps shown in Fig. 2, the user visiting the page sees the organization-name keywords "mining industry Science and Technology Ltd. of Beida Jadebird", "Lian Sheng energy investment company limited" and "Liulin County Lian Sheng group" marked and highlighted.
Fig. 4 shows a schematic diagram of the effect of searching with a marked keyword on the webpage shown in Fig. 3, according to an embodiment of the invention. Referring to Fig. 4, because the keywords on the page are marked and highlighted, the user only needs to click a marked keyword to jump directly to the search engine's results page for that keyword, which presents the related information. The user no longer has to select the word of interest manually and copy and paste it into the search box of a search engine; these tedious operations are eliminated and the user experience is improved.
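The patent does not describe how the client performs the marking. As a rough illustration only, the effect shown in Fig. 3 and Fig. 4 could be produced by wrapping each keyword occurrence in a link to a search results page. The sketch below works on plain text rather than on a live page, and the `SEARCH_URL` endpoint and the `marked-keyword` class name are assumptions.

```python
import html
import re
from urllib.parse import quote

SEARCH_URL = "https://search.example.com/s?q="  # hypothetical search endpoint


def mark_keywords(text: str, keywords: list) -> str:
    """Return HTML in which every keyword occurrence is a highlighted search link."""
    keywords = [k for k in keywords if k]
    if not keywords:
        return html.escape(text)
    # Longest keywords first, so a longer keyword wins over one of its prefixes.
    ordered = sorted(keywords, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in ordered))
    parts, pos = [], 0
    for m in pattern.finditer(text):
        parts.append(html.escape(text[pos:m.start()]))   # text before the hit
        kw = m.group(0)
        href = html.escape(SEARCH_URL + quote(kw))
        parts.append('<a class="marked-keyword" href="%s">%s</a>' % (href, html.escape(kw)))
        pos = m.end()
    parts.append(html.escape(text[pos:]))                 # remaining text
    return "".join(parts)
```

In a browser the same idea would be applied to the text nodes of the rendered page, with the returned search script code attached as the click action.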
In another embodiment of the present invention, step S120 of the method shown in Fig. 1 comprises:
Looking up the corresponding one or more keywords in a keyword database according to the webpage URL contained in the keyword request, wherein the keyword database stores each webpage URL in correspondence with the keywords contained in the webpage that the URL indicates.
In one embodiment of the invention, the keyword database uses Redis for storage, with master-slave backup. Redis is an open-source, log-type key-value database written in ANSI C that supports networking and can be memory-based or persistent. Redis supports master-slave synchronization: data can be synchronized from a master server to any number of slave servers, and a slave server can itself be the master of other slave servers. According to this embodiment, Redis has the advantages of scalable read operations and reduced data redundancy.
In one embodiment of the invention, the method shown in Fig. 1 further comprises a step of building the keyword database offline. Fig. 5 shows a schematic flowchart of building the keyword database offline according to an embodiment of the invention. Referring to Fig. 5, building the keyword database offline specifically comprises:
Step S510, obtaining a URL list;
Step S520, using a web spider to crawl the webpage HTML code corresponding to each URL in the URL list;
Specifically, the task of crawling the URL list is submitted to the web spider, and the obtained URL list is placed at a specific location in the Hadoop database. What the web spider crawls is the complete HTML code of each page, which must be parsed to extract the useful information.
Step S530, extracting webpage-text-related information from the webpage HTML code;
Step S540, extracting keywords from the webpage-text-related information.
In one embodiment of the invention, obtaining the URL list in step S510 comprises: periodically obtaining the user access logs provided by the browser side;
Obtaining the URLs visited by users from the user access logs;
Adding the obtained user-visited URLs to the URL list.
The user access logs provided by the browser are obtained at hourly granularity, in preparation for mining keywords offline and building the keyword database. The most important field in a user access log is the URL of the website the user visited.
In one embodiment of the invention, before the obtained user-visited URLs are added to the URL list, the method shown in Fig. 1 further comprises:
Filtering the obtained user-visited URLs according to the page views of the corresponding webpages, and adding the filtered URLs to the URL list.
The user access logs collected in step S510 cover traffic from the whole network and are very large. Building the keyword database directly from the unprocessed URL list would take a long time, so the obtained URL list needs to be filtered to improve processing speed. One specific approach is to filter the URL list by high-frequency PV. PV here means Page View, that is, the number of times a page is viewed, which is one of the most commonly used indicators of website traffic. In this embodiment, the URL list is filtered by page views, and only URLs whose page views exceed a preset value are put into the URL list. The preset value can be set according to the actual application and is not limited here.
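The page-view filter amounts to a count and a threshold. A minimal sketch, assuming the access logs have already been reduced to a flat list of visited URLs (one entry per page view); the function name is illustrative.

```python
from collections import Counter


def filter_by_page_views(visited_urls, min_views):
    """Keep only URLs whose page views exceed the preset value `min_views`."""
    counts = Counter(visited_urls)   # page views per URL
    return [url for url, views in counts.items() if views > min_views]
```

For instance, `filter_by_page_views(["a", "b", "a", "c", "a", "b"], 1)` keeps `"a"` (3 views) and `"b"` (2 views) and drops `"c"`.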
In one embodiment of the invention, before the obtained user-visited URLs are added to the URL list, the method shown in Fig. 1 further comprises:
Determining whether an obtained user-visited URL is in a URL whitelist and, if so, adding the URL to the URL list;
And/or, determining whether an obtained user-visited URL is in a URL blacklist and, if so, not adding the URL to the URL list.
In practice, whether keywords may be marked on a page can be configured according to the website the user visits. For example, websites on which keywords should be marked are added to the whitelist; when the URL of a user-visited website is obtained, it is first checked against the URL whitelist and, if present, added to the URL list.
And/or, websites on which keywords should not be marked are added to the blacklist; when the URL of a user-visited website is obtained, it is checked against the URL blacklist and, if present, not added to the URL list.
Of course, the convention may also be reversed: URLs of websites that should not be marked may be placed in the whitelist, or URLs of websites that should be marked may be placed in the blacklist. When a user-visited URL is obtained, the list it belongs to is checked to decide whether to add it to the URL list. The blacklist or whitelist can be stored at host granularity or at URL granularity. Granularity refers to the level of refinement or aggregation of the data held in a data unit of a data warehouse: the higher the degree of refinement, the finer the granularity; the lower the degree of refinement, the coarser the granularity.
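The list checks at host or URL granularity can be sketched as follows. The function names are illustrative, and letting the blacklist take precedence when both lists are configured is an assumption; the patent only says the two checks may be used together.

```python
from urllib.parse import urlsplit


def in_list(url: str, entries: set, granularity: str = "host") -> bool:
    """Check membership at host granularity or at full-URL granularity."""
    if granularity == "host":
        return urlsplit(url).hostname in entries
    return url in entries


def should_add(url, whitelist=None, blacklist=None, granularity="host") -> bool:
    """Decide whether a user-visited URL goes into the URL list."""
    if blacklist is not None and in_list(url, blacklist, granularity):
        return False                 # blacklisted: never added (assumed precedence)
    if whitelist is not None:
        return in_list(url, whitelist, granularity)  # only whitelisted URLs pass
    return True                      # no whitelist configured: accept
```

Host granularity keeps the lists short, while URL granularity allows individual pages of a site to be included or excluded.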
In one embodiment of the invention, step S530, extracting webpage-text-related information from the webpage HTML code, comprises:
Extracting the title tag in the webpage HTML code, the content of the meta info tags and the body text, and filtering out advertisements and external links.
Advertisements, external links and similar content do not help keyword marking. Filtering them out and extracting only useful information, such as the content of the title tag, the meta info tags and the body text, simplifies the structure of the keyword database and avoids data redundancy.
The method of the present invention for marking keywords on a webpage has been described in detail above. As can be seen, by marking and highlighting the keywords on a webpage, the method makes it easier for users to view keywords, simplifies the steps of searching with a keyword, and improves the user experience.
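Step S530 can be illustrated with Python's standard `html.parser`. The sketch collects the title, the `keywords` and `description` meta tags, and the body text, and drops the text of links that point to another host. Advertisement filtering is left out because the patent does not say how advertisements are recognised; the class name and the choice of meta tags are assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class PageTextExtractor(HTMLParser):
    """Collect title, selected meta content and body text from one page."""

    def __init__(self, page_host: str):
        super().__init__()
        self.page_host = page_host
        self.title = ""
        self.meta = {}
        self.body_parts = []
        self._in_title = False
        self._in_body = False
        self._skip = 0        # > 0 while inside script/style or an external link
        self._a_stack = []    # one flag per open <a>: True if it is external

    def _is_external(self, href: str) -> bool:
        host = urlsplit(href).hostname
        return host is not None and host != self.page_host

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "body":
            self._in_body = True
        elif tag == "meta" and attrs.get("name") in ("keywords", "description"):
            self.meta[attrs["name"]] = attrs.get("content") or ""
        elif tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            external = self._is_external(attrs.get("href") or "")
            self._a_stack.append(external)
            if external:
                self._skip += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "body":
            self._in_body = False
        elif tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag == "a" and self._a_stack:
            if self._a_stack.pop():
                self._skip -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_title:
            self.title += text
        elif self._in_body and self._skip == 0:
            self.body_parts.append(text)
```

After `extractor.feed(html_code)`, the `title`, `meta` and `body_parts` attributes hold the text that would be passed on to keyword extraction in step S540.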
Corresponding to the above method for marking keywords on a webpage, the present invention also provides a server for marking keywords on a webpage. The server 600 comprises:
A receiving unit 610, adapted to receive a keyword marking request, sent by a client, that contains a webpage URL;
A keyword query unit 620, adapted to obtain one or more corresponding keywords according to the webpage URL in the keyword marking request;
A feedback unit 630, adapted to return the one or more keywords found, together with the search script code corresponding to each keyword, to the client, so that the client marks the one or more keywords contained in the webpage when it loads and renders the webpage indicated by the webpage URL in the keyword marking request.
Correspondingly, on receiving the server's response for the webpage URL, the client loads and renders the webpage that the URL indicates and presents the keywords on it as marked, so that a user viewing the page through the client sees the marked keywords and can view them conveniently.
In one embodiment of the invention, the keyword query unit 620 shown in Fig. 6 is adapted to extract the text information of the webpage URL, based on the webpage URL contained in the keyword request, and to find the one or more keywords in the text information that match keywords in a keyword dictionary.
In one embodiment of the invention, the server 600 shown in Fig. 6 further comprises: a database unit, adapted to store a keyword database;
The keyword query unit is adapted to look up the corresponding one or more keywords in the keyword database according to the webpage URL contained in the keyword request, wherein the keyword database stores each webpage URL in correspondence with the keywords contained in the webpage that the URL indicates.
In one embodiment of the invention, the database unit uses Redis to store the keyword database, with master-slave backup.
In one embodiment of the invention, the server 600 shown in Fig. 6 further comprises:
A hot word loading unit, adapted to load hot data, that is, data in the keyword database whose query frequency exceeds a preset value, into memory;
The keyword query unit is adapted to query the memory first, and then the keyword database, according to the webpage URL contained in the keyword request.
In one embodiment of the invention, the server 600 shown in Fig. 6 further comprises a device for building the keyword database offline, which specifically comprises:
A URL acquiring unit, adapted to obtain a URL list;
A webpage crawling unit, adapted to use a web spider to crawl the webpage HTML code corresponding to each URL in the URL list;
A text extracting unit, adapted to extract webpage-text-related information from the webpage HTML code;
A keyword extracting unit, adapted to extract keywords from the webpage-text-related information.
In one embodiment of the invention, the URL acquiring unit is adapted to periodically obtain the user access logs provided by the browser side, obtain the URLs visited by users from the user access logs, and add the obtained user-visited URLs to the URL list.
In one embodiment of the invention, the URL acquiring unit is adapted, before adding the obtained user-visited URLs to the URL list, to filter them according to the page views of the corresponding webpages and to add the filtered URLs to the URL list.
In one embodiment of the invention, the URL acquiring unit is adapted, before adding the obtained user-visited URLs to the URL list, to determine whether an obtained URL is in a URL whitelist and, if so, to add it to the URL list; and/or to determine whether an obtained URL is in a URL blacklist and, if so, not to add it to the URL list.
In one embodiment of the invention, the text extracting unit is adapted to extract the title tag in the webpage HTML code, the content of the meta info tags and the body text, and to filter out advertisements and external links.
In one embodiment of the invention, the keyword extracting unit is adapted to perform word segmentation on the webpage-text-related information and to match the segmented strings one by one against the keyword dictionary; a string that hits is extracted as a keyword.
In one embodiment of the invention, the server 600 shown in Fig. 6 comprises further:
Keyword dictionary generation unit, is suitable for obtaining keyword from one or more the source comprised as follows and adds in keyword dictionary:
Organization names;
Word in " encyclopaedia " storehouse that search mechanism provides;
Sliding word, namely user is when browsing webpage, and webpage is chosen the word of line search of going forward side by side;
Word in " entity storehouse " that search mechanism provides, defines the relation between entity in shown entity storehouse.
It should be noted that the server for marking keywords on a webpage in this embodiment corresponds to the aforementioned method for marking keywords on a webpage. For the specific working process of the server in this embodiment, refer to the corresponding description of the method; it is not repeated here.
In summary, the method and server of the present invention for marking keywords on a webpage can mark and highlight the keywords on the webpages of websites that users access. This makes the keywords easy for the user to see, simplifies searching with the keywords marked on the webpage, and improves the user experience.
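The server-side flow summarized above (receive a webpage URL, look up its keywords by querying memory first and then the keyword database, and return each keyword with its search script code) can be sketched as follows. This is an illustration under stated assumptions: a plain dictionary stands in for the Redis-backed keyword database, hot entries are promoted into memory on the fly once their query count exceeds a preset value, and the class `KeywordService`, the function `search_script`, the JSON response shape, and the `search.example` address are all hypothetical.

```python
import json
from collections import Counter
from typing import Dict, List
from urllib.parse import quote


def search_script(keyword: str) -> str:
    """Hypothetical search script code returned alongside a keyword."""
    return 'window.open("https://search.example/s?q=%s")' % quote(keyword)


class KeywordService:
    def __init__(self, database: Dict[str, List[str]], hot_threshold: int = 2):
        self.database = database  # stand-in for the keyword database
        self.hot_cache: Dict[str, List[str]] = {}  # hot data held in memory
        self.query_counts: Counter = Counter()
        self.hot_threshold = hot_threshold  # preset query-frequency value

    def lookup(self, url: str) -> List[str]:
        self.query_counts[url] += 1
        if url in self.hot_cache:  # query memory first
            return self.hot_cache[url]
        keywords = self.database.get(url, [])  # then the keyword database
        if self.query_counts[url] > self.hot_threshold:
            self.hot_cache[url] = keywords  # promote hot data into memory
        return keywords

    def handle_request(self, url: str) -> str:
        """Build the response for a keyword marking request."""
        items = [
            {"keyword": kw, "script": search_script(kw)}
            for kw in self.lookup(url)
        ]
        return json.dumps({"url": url, "keywords": items}, ensure_ascii=False)


service = KeywordService({"http://a.example/": ["Redis"]}, hot_threshold=2)
for _ in range(3):
    response = service.handle_request("http://a.example/")
print(response)
```

The client would then use the returned keywords to mark matching text while rendering the page, and attach the script to each marked keyword.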
It should be noted that:
The algorithms and displays provided herein are not inherently related to any particular computer, virtual device, or other equipment. Various general-purpose devices may also be used with the teachings herein. From the description above, the structure required to construct such devices is apparent. Moreover, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein can be implemented in a variety of programming languages, and that the description above of a specific language is given to disclose the best mode of the invention.
Numerous specific details are set forth in the specification provided herein. It will be understood, however, that embodiments of the invention can be practiced without these specific details. In some instances, well-known methods, structures, and techniques are not shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, the features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments of the invention. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single previously disclosed embodiment. Therefore, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will appreciate that the modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components in the embodiments can be combined into one module, unit, or component, and they can furthermore be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.
In addition, those skilled in the art will appreciate that although some embodiments described herein include certain features that are included in other embodiments and not other features, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
The various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the server according to embodiments of the invention. The invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any order; these words may be interpreted as names.

Claims (10)

1. A method for marking keywords on a webpage, wherein the method comprises:
Receiving a keyword marking request, containing a webpage URL, sent by a client;
Obtaining one or more corresponding keywords according to the webpage URL in the keyword marking request;
Returning the one or more keywords found, together with the search script code corresponding to each keyword, to the client, so that the client, when loading and rendering the webpage indicated by the webpage URL contained in the keyword marking request, marks the one or more keywords contained in that webpage.
2. The method of claim 1, wherein obtaining one or more corresponding keywords according to the webpage URL in the keyword marking request comprises:
Extracting the text information of the webpage URL based on the webpage URL contained in the keyword request;
Finding the one or more keywords in the text information that match keywords in a keyword dictionary.
3. The method of any one of claims 1-2, wherein obtaining one or more corresponding keywords according to the webpage URL in the keyword marking request comprises:
Finding the corresponding one or more keywords in a keyword database according to the webpage URL contained in the keyword request; wherein the keyword database stores each webpage URL in correspondence with the keywords contained in the webpage indicated by that webpage URL.
4. The method of any one of claims 1-3, wherein the keyword database uses Redis for storage and adopts master-slave backup.
5. The method of any one of claims 1-4, wherein the method further comprises:
Loading into memory the hot data in the keyword database whose query frequency is greater than a preset value;
According to the webpage URL contained in the keyword request, first querying the memory and then querying the keyword database.
6. The method of any one of claims 1-5, wherein the method further comprises a step of building the keyword database offline, which specifically comprises:
Obtaining a URL list;
Using a web spider to crawl the webpage HTML code corresponding to each URL in the URL list;
Extracting webpage text-related information from the webpage HTML code;
Extracting keywords from the webpage text-related information.
7. The method of any one of claims 1-6, wherein obtaining the URL list comprises:
Periodically obtaining the user access logs recommended by the browser side;
Obtaining the URLs accessed by users from the user access logs;
Adding the obtained user-accessed URLs to the URL list.
8. The method of any one of claims 1-7, wherein, before the obtained user-accessed URLs are added to the URL list, the method further comprises:
Screening the obtained user-accessed URLs according to the page views of the corresponding webpages, and adding the screened URLs to the URL list.
9. The method of any one of claims 1-8, wherein, before the obtained user-accessed URLs are added to the URL list, the method further comprises:
Judging whether an obtained user-accessed URL is in a URL whitelist and, if so, adding the URL to the URL list;
And/or judging whether an obtained user-accessed URL is in a URL blacklist and, if so, not adding the URL to the URL list.
10. A server for marking keywords on a webpage, wherein the server comprises:
A receiving unit, adapted to receive a keyword marking request, containing a webpage URL, sent by a client;
A keyword query unit, adapted to obtain one or more corresponding keywords according to the webpage URL in the keyword marking request;
A feedback unit, adapted to return the one or more keywords found, together with the search script code corresponding to each keyword, to the client, so that the client, when loading and rendering the webpage indicated by the webpage URL contained in the keyword marking request, marks the one or more keywords contained in that webpage.
CN201510149902.6A 2015-03-31 2015-03-31 Method and server for marking keywords on webpage Expired - Fee Related CN104715064B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510149902.6A CN104715064B (en) 2015-03-31 2015-03-31 Method and server for marking keywords on webpage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510149902.6A CN104715064B (en) 2015-03-31 2015-03-31 Method and server for marking keywords on webpage

Publications (2)

Publication Number Publication Date
CN104715064A true CN104715064A (en) 2015-06-17
CN104715064B CN104715064B (en) 2018-11-02

Family

ID=53414390

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510149902.6A Expired - Fee Related CN104715064B (en) 2015-03-31 2015-03-31 Method and server for marking keywords on webpage

Country Status (1)

Country Link
CN (1) CN104715064B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106202314B (en) * 2016-06-30 2020-02-14 北京奇虎科技有限公司 Method and device for searching keywords in webpage

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050050547A1 (en) * 2003-08-29 2005-03-03 Whittle Derrick W. Method and apparatus for providing desktop application functionality in a client/server architecture
CN102065145A (en) * 2010-12-31 2011-05-18 华为技术有限公司 Information issuing method, device and system
CN102135967A (en) * 2010-01-27 2011-07-27 华为技术有限公司 Webpage keywords extracting method, device and system
CN102915380A (en) * 2012-11-19 2013-02-06 北京奇虎科技有限公司 Method and system for carrying out searching on data
CN103577597A (en) * 2013-11-15 2014-02-12 北京奇虎科技有限公司 Keyword searching system based on current browse webpage
CN104199954A (en) * 2012-06-26 2014-12-10 北京奇虎科技有限公司 Recommendation system and method for search input

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104933197A (en) * 2015-07-13 2015-09-23 北京天天卓越科技有限公司 Method and terminal equipment for determining keywords
CN106407229A (en) * 2015-08-03 2017-02-15 天脉聚源(北京)科技有限公司 Webpage keyword matching method and system
CN105117498A (en) * 2015-09-28 2015-12-02 北京奇虎科技有限公司 Webpage data processing method and device
WO2017117912A1 (en) * 2016-01-04 2017-07-13 百度在线网络技术(北京)有限公司 Data acquisition method, apparatus and device, and computer storage medium
CN107203546B (en) * 2016-03-17 2021-07-16 创新先进技术有限公司 Text display method and device
CN107203546A (en) * 2016-03-17 2017-09-26 阿里巴巴集团控股有限公司 A kind of textual presentation method and apparatus
CN106021439A (en) * 2016-05-16 2016-10-12 腾讯科技(深圳)有限公司 Communication number processing method and device
CN107341267A (en) * 2017-07-24 2017-11-10 郑州云海信息技术有限公司 A kind of distributed file system access method and platform
CN108920593A (en) * 2018-06-27 2018-11-30 上海深势信息科技有限公司 Text display method, device, equipment and storage medium
CN109144503A (en) * 2018-08-29 2019-01-04 北京城市网邻信息技术有限公司 Pass through the method, apparatus, equipment and readable storage medium storing program for executing of Redux storing data
CN110309395A (en) * 2019-07-05 2019-10-08 云南电网有限责任公司电力科学研究院 A kind of professional dictionary construction method based on data acquisition technology
CN112507664A (en) * 2020-12-29 2021-03-16 医渡云(北京)技术有限公司 Webpage element labeling method and device
CN113434795A (en) * 2021-06-23 2021-09-24 杭州米络星科技(集团)有限公司 Webpage rendering method, device, equipment and storage medium
CN117131301A (en) * 2023-10-24 2023-11-28 苏州阿基米德网络科技有限公司 Webpage end browsing method of medical equipment document
CN117131301B (en) * 2023-10-24 2024-01-05 苏州阿基米德网络科技有限公司 Webpage end browsing method of medical equipment document

Also Published As

Publication number Publication date
CN104715064B (en) 2018-11-02

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20181102

Termination date: 20210331