Summary of the Invention
A primary object of the present invention is to provide a web page storage processing method and processing device for a browser, so as to solve the problem that looking up a target web page among the collected web pages of a browser is inefficient.
To achieve the above object, according to one aspect of the present invention, a web page storage processing method for a browser is provided.
The web page storage processing method for a browser according to the present invention includes: receiving a search keyword, wherein the search keyword is used to look up a web page to be browsed among the collected web pages of the browser; matching the search keyword against the collected web pages of the browser to obtain the address of a matching collected web page; and outputting the address of the matching collected web page.
Further, matching the search keyword against the collected web pages of the browser includes: obtaining the title and text content of a collected web page of the browser; and matching the title and text content of the collected web page against the search keyword, wherein if the title and text content of the collected web page match the search keyword, it is determined that the search keyword matches the collected web page, and if the title and text content of the collected web page do not match the search keyword, it is determined that the search keyword does not match the collected web page.
Further, before the title and text content of the collected web page of the browser are matched against the search keyword, the method further includes: obtaining the text content of the collected web page of the browser; obtaining the URL and title of the collected web page of the browser; and storing the text content, URL, and title of the collected web page of the browser.
Further, obtaining the text content of the collected web page of the browser includes: obtaining the address of the collected web page of the browser; accessing the collected web page according to its address; and crawling text content from the collected web page while it is being accessed, to obtain the text content of the collected web page of the browser.
Further, crawling text content from the collected web page while it is being accessed, to obtain the text content of the collected web page of the browser, includes: filtering out the Hypertext Markup Language tags of the collected web page of the browser; and crawling text content from the collected web page from which the Hypertext Markup Language tags have been filtered out, to obtain the text content of the collected web page of the browser.
Further, after text content is crawled from the collected web page while it is being accessed and the text content of the collected web page of the browser is obtained, the method further includes: extracting keywords from the text content of the collected web page to obtain the keywords of the collected web page; and storing the keywords, URL, and title of the collected web page. In this case, matching the title and text content of the collected web page against the search keyword includes: matching the keywords and title of the collected web page against the search keyword.
To achieve the above object, according to another aspect of the present invention, a web page storage processing device for a browser is provided.
The web page storage processing device for a browser according to the present invention includes: a receiving unit, configured to receive a search keyword, wherein the search keyword is used to look up a web page to be browsed among the collected web pages of the browser; a matching unit, configured to match the search keyword against the collected web pages of the browser to obtain the address of a matching collected web page; and an output unit, configured to output the address of the matching collected web page.
Further, the matching unit includes: a first acquisition module, configured to obtain the title and text content of a collected web page of the browser; and a matching module, configured to match the title and text content of the collected web page against the search keyword, wherein if the title and text content of the collected web page match the search keyword, it is determined that the search keyword matches the collected web page, and if the title and text content of the collected web page do not match the search keyword, it is determined that the search keyword does not match the collected web page.
Further, the device further includes: a first acquisition unit, configured to obtain the text content of the collected web page of the browser; a second acquisition unit, configured to obtain the URL and title of the collected web page of the browser; and a storage unit, configured to store the text content, URL, and title of the collected web page of the browser.
Further, the first acquisition unit includes: a second acquisition module, configured to obtain the address of the collected web page of the browser; an access module, configured to access the collected web page according to its address; and a crawling module, configured to crawl text content from the collected web page while it is being accessed, to obtain the text content of the collected web page of the browser.
With the present invention, the collected web page that needs to be accessed is looked up among the collected web pages of the browser by means of retrieval. This solves the problem that looking up a target web page among the collected web pages of a browser is inefficient, and thereby improves the efficiency of that lookup.
Detailed Description of the Embodiments
It should be noted that, where no conflict arises, the embodiments in this application and the features in those embodiments may be combined with one another. The present invention is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
To enable those skilled in the art to better understand the solution of this application, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings of those embodiments. Obviously, the described embodiments are only some of the embodiments of this application, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments in this application, without creative effort, shall fall within the scope of protection of this application.
It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings of this application are used to distinguish similar objects and are not used to describe a specific order or sequence. It should be understood that data so used may be interchanged where appropriate, so that the embodiments of this application described herein can be implemented. In addition, the terms "include" and "have" and any variations thereof are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
An embodiment of the present invention provides a web page storage processing method for a browser. Fig. 1 is a flowchart of the web page storage processing method for a browser according to an embodiment of the present invention.
As shown in Fig. 1, the method includes steps S102 to S106:
Step S102: Receive a search keyword, wherein the search keyword is used to look up a web page to be browsed among the collected web pages of the browser.
The search keyword may be any keyword used to look up a web page to be browsed among the collected web pages of the browser, and may be a single keyword or multiple keywords. Specifically, a search box may be provided in the area of the browser that shows the collected web pages, and the search keyword entered by the user is received through that search box.
Step S104: Match the search keyword against the collected web pages of the browser to obtain the address of a matching collected web page.
The collected web pages of a browser are usually kept in the browser's favorites, and the favorites of existing browsers store the address and title of each collected web page. Matching the search keyword against the collected web pages of the browser may consist of matching the search keyword against the titles of the collected web pages: if the search keyword appears in the title of a collected web page, that page is considered relevant to the web page the user needs to access. The collected web pages of the browser that match the search keyword are recorded. Preferably, to improve the accuracy with which the collected web page to be accessed is found by the search keyword, matching the search keyword against the collected web pages of the browser includes: obtaining the title and text content of a collected web page of the browser; and matching the title and text content of the collected web page against the search keyword, wherein if the title and text content of the collected web page match the search keyword, it is determined that the search keyword matches the collected web page, and if the title and text content of the collected web page do not match the search keyword, it is determined that the search keyword does not match the collected web page.
The content of a collected web page may be obtained by accessing the page. Alternatively, the text content of each collected web page of the browser may be stored in advance in a local database or another storage area, and the text content is then obtained from that database or storage area. The text content of a collected web page may be its full text, or keywords extracted from its full text. The title of a collected web page sometimes does not represent the content of the page, and the keywords of the content the user cares about may not appear in the title. In such cases, matching the search keyword only against the titles of the collected web pages may fail to retrieve the page the user needs to access, and the user may still fail to retrieve it even after repeating the search with several different search keywords. Matching both the title and the text content of the collected web pages against the search keyword avoids this problem. Specifically, the title of a collected web page may be matched against the search keyword first. If the title matches, the content of the page need not be matched against the search keyword; if the title does not match, the content of the page is then matched against the search keyword. This raises the probability that a collected web page matches the search keyword and further improves the accuracy with which the collected web page to be accessed is found by the search keyword.
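The title-first matching order described above can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the `CollectedPage` record, the function names, the case-insensitive substring test, and the rule that any one of several keywords is enough for a match are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CollectedPage:
    """One collected web page; the field names are illustrative assumptions."""
    url: str
    title: str
    text: str


def page_matches(page: CollectedPage, keywords: List[str]) -> bool:
    """Match the title first and fall back to the text content only if needed."""
    lowered = [k.lower() for k in keywords if k]
    if not lowered:
        return False
    title = page.title.lower()
    if any(k in title for k in lowered):
        return True  # the title matched, so the content is not examined
    text = page.text.lower()
    return any(k in text for k in lowered)


def search(pages: List[CollectedPage], keywords: List[str]) -> List[str]:
    """Return the addresses of all collected pages that match."""
    return [p.url for p in pages if page_matches(p, keywords)]
```

Calling `search(pages, ["regular"])` on a list in which only one page mentions "regular" in its text content would return that page's address even though its title does not contain the word.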
Preferably, to improve the efficiency of obtaining the title and text content of the collected web pages of the browser, before the title and text content of the collected web page are matched against the search keyword, the method further includes: obtaining the text content of the collected web page of the browser; obtaining the URL and title of the collected web page of the browser; and storing the text content, URL, and title of the collected web page of the browser.
Before the collected web pages of the browser are searched, the text content, URL, and title of each collected web page are obtained in advance and stored in a local storage area, such as a local database. Specifically, when the text content, URL, and title of a collected web page are stored, they may be associated with one another, that is, a correspondence is established among the text content, URL, and title that belong to the same collected web page. In this way, when the user searches the collected web pages of the browser, the text content and title of each collected web page can be obtained quickly and matched against the search keyword, and if a collected web page matches the search keyword, its address can be obtained quickly, which improves retrieval efficiency.
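As one possible illustration of the local storage described above, the sketch below keeps the URL, title, and text content of each collected web page in one row of an SQLite table, so that the three items are associated by construction. The table name, column names, and function names are assumptions made for this example; the description only requires some local storage area.

```python
import sqlite3
from typing import List


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the local store; one row associates URL, title and text of a page."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS collected_pages ("
        " url TEXT PRIMARY KEY, title TEXT NOT NULL, text TEXT NOT NULL)"
    )
    return conn


def save_page(conn: sqlite3.Connection, url: str, title: str, text: str) -> None:
    """Insert a page, or overwrite the stored copy if the URL already exists."""
    conn.execute(
        "INSERT OR REPLACE INTO collected_pages (url, title, text) VALUES (?, ?, ?)",
        (url, title, text),
    )
    conn.commit()


def find_addresses(conn: sqlite3.Connection, keyword: str) -> List[str]:
    """Return addresses whose stored title or text contains the keyword.

    LIKE is used for brevity; '%' and '_' inside the keyword act as wildcards.
    """
    pattern = f"%{keyword}%"
    rows = conn.execute(
        "SELECT url FROM collected_pages"
        " WHERE title LIKE ? OR text LIKE ? ORDER BY url",
        (pattern, pattern),
    )
    return [row[0] for row in rows]
```

Because the text content is read from the local table, a search in this sketch does not need to access the collected web pages over the network at query time.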
Optionally, obtaining the text content of the collected web page of the browser includes: obtaining the address of the collected web page of the browser; accessing the collected web page according to its address; and crawling text content from the collected web page while it is being accessed, to obtain the text content of the collected web page of the browser.
The URL and title of each collected web page of the browser are already stored in the browser's favorites. Specifically, the address of a collected web page, i.e. its Uniform Resource Locator (URL), can be obtained by calling an application programming interface (API) that the browser provides for obtaining the addresses of collected web pages. The collected web page can be accessed through its address, and text content is crawled from the page while it is being accessed, to obtain the text content of the collected web page. Specifically, the text content may be crawled from the collected web page by a web crawler. A web crawler is a program or script that automatically crawls information on the network according to set rules; for example, a web crawler may be configured to crawl only the text content of web pages, or only the pictures on web pages, and so on. In the embodiments of the present invention, the web crawler crawls only the text content of the collected web pages. Preferably, to improve the efficiency of crawling the text content of a collected web page, crawling text content from the collected web page while it is being accessed, to obtain the text content of the collected web page, includes: filtering out the Hypertext Markup Language tags of the collected web page; and crawling text content from the collected web page from which the Hypertext Markup Language tags have been filtered out, to obtain the text content of the collected web page.
A Hypertext Markup Language (HTML) tag is the smallest unit of HTML. HTML tags set the display format of a web page, for example the display position of the page title, keywords, and page content. Specifically, after access to the page has been requested from the server using the address of the collected web page, the content returned by the server may be matched against a preset regular expression so as to filter out the HTML tags of the collected web page. A regular expression is a single string that describes and matches a set of strings conforming to a given syntactic rule. For example, a regular expression for matching Chinese postal codes is "[1-9]\d{5}(?!\d)". If the string to be matched is "Chinabeijing100081haidian", this regular expression quickly matches the characters "100081" that represent the postal code, and the other characters are filtered out.
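The following sketch illustrates both uses of regular expressions mentioned above. It assumes the page markup has already been fetched as a string; the pattern and function names are hypothetical, and the postal code pattern is written as `[1-9]\d{5}(?!\d)`, which is an assumed reading of the expression given in the text. Stripping tags with a regular expression is a simplification, and a real HTML parser would be more robust against malformed markup.

```python
import html
import re

# Drop <script> and <style> elements together with their bodies.
SCRIPT_STYLE_PATTERN = re.compile(
    r"<(script|style)\b.*?</\1\s*>", re.IGNORECASE | re.DOTALL
)
# Any remaining tag: '<', then anything up to the next '>'.
TAG_PATTERN = re.compile(r"<[^>]+>")
# Six digits, the first non-zero, not followed by a further digit.
POSTCODE_PATTERN = re.compile(r"[1-9]\d{5}(?!\d)")


def strip_html_tags(markup: str) -> str:
    """Filter out HTML tags and return the remaining text content."""
    without_blocks = SCRIPT_STYLE_PATTERN.sub(" ", markup)
    without_tags = TAG_PATTERN.sub(" ", without_blocks)
    # Decode entities such as &amp; and collapse runs of whitespace.
    return " ".join(html.unescape(without_tags).split())


def find_postcode(text: str) -> str:
    """Return the first postal code found in the text, or an empty string."""
    match = POSTCODE_PATTERN.search(text)
    return match.group() if match else ""
```

With these definitions, `find_postcode("Chinabeijing100081haidian")` returns `"100081"`, which corresponds to the example string used in the description.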
Preferably, after text content is crawled from the collected web page while it is being accessed and the text content of the collected web page of the browser is obtained, the method further includes: extracting keywords from the text content of the collected web page to obtain the keywords of the collected web page; and storing the keywords, URL, and title of the collected web page. In this case, matching the title and text content of the collected web page against the search keyword includes: matching the keywords and title of the collected web page against the search keyword.
The keywords of a collected web page may be the words that occur most often in the text content of the page, or words located near the beginning of the text content, such as the summary of the text content. The embodiments of the present invention are described using the example in which the words that occur most often in the text content of a collected web page serve as its keywords. After the text content of the collected web page has been obtained, the text content may be segmented, that is, divided into individual words. Stop words, i.e. words without substantive meaning such as modal particles and conjunctions, may be filtered out first. The words remaining after filtering form a word set, and the repeated words in the word set and their numbers of occurrences are counted. If the number of occurrences of a repeated word is greater than a preset threshold, that word is taken as a keyword of the collected web page. After the keywords of the collected web page have been obtained, a correspondence among the keywords, URL, and title of the collected web page may likewise be established when they are stored. The text content of a collected web page may be long, so matching the search keyword against the full text content is time-consuming. It may also produce too many false matches, in which the collected web page that matches the search keyword is not the page the user needs to access. Extracting keywords from the text content of the collected web page and matching them against the search keyword both improves matching efficiency and improves the accuracy of the matching result.
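The frequency-based keyword extraction described above might look like the sketch below. It is only an illustration under stated assumptions: words are split with a simple regular expression suited to English text (the word segmentation in the description would need a proper segmenter for other languages), and the stop word list, the default threshold, and the function name are invented for the example.

```python
import re
from collections import Counter
from typing import FrozenSet, List

# A tiny illustrative stop word list; a real one would be much longer.
STOP_WORDS: FrozenSet[str] = frozenset(
    {"a", "an", "and", "the", "of", "to", "in", "is", "or", "oh"}
)


def extract_keywords(
    text: str, threshold: int = 1, stop_words: FrozenSet[str] = STOP_WORDS
) -> List[str]:
    """Return words that occur more than `threshold` times, stop words excluded."""
    # Segment the text into lower-case words.
    words = re.findall(r"[a-z0-9]+", text.lower())
    # Filter out stop words and count what remains.
    counts = Counter(w for w in words if w not in stop_words)
    # Keep only words whose occurrence count exceeds the preset threshold.
    return sorted(w for w, n in counts.items() if n > threshold)
```

The returned keywords could then be stored alongside the URL and title of the page in place of the full text content, so that a later search compares the search keyword with a short keyword list instead of the whole page.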
Step S106: Output the address of the matching collected web page.
Through the above steps, the address of the collected web page of the browser that matches the search keyword is obtained, and that address is output for the user to view.
As can be seen from the above description, the present invention achieves the following technical effect:
The embodiment of the present invention receives a search keyword, matches the search keyword against the collected web pages of the browser to obtain the address of a matching collected web page, and outputs that address, so that the collected web page to be accessed is looked up among the collected web pages of the browser by means of retrieval. Compared with the prior art, in which the user opens the collected web pages of the browser one by one to look for the target, this improves the efficiency of looking up a target web page among the collected web pages of the browser and solves the problem in the related art that such a lookup is inefficient.
It should be noted that the steps shown in the flowchart of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in an order different from the one given here.
According to another aspect of the embodiments of the present invention, a web page storage processing device for a browser is provided. The device can be used to execute the web page storage processing method for a browser of the embodiments of the present invention, and the method of the embodiments of the present invention can likewise be executed by the web page storage processing device for a browser of the embodiments of the present invention.
Fig. 2 is a schematic diagram of the web page storage processing device for a browser according to an embodiment of the present invention. As shown in Fig. 2, the device includes a receiving unit 10, a matching unit 20, and an output unit 30.
The receiving unit 10 is configured to receive a search keyword, wherein the search keyword is used to look up a web page to be browsed among the collected web pages of the browser.
The search keyword may be any keyword used to look up a web page to be browsed among the collected web pages of the browser, and may be a single keyword or multiple keywords. Specifically, a search box may be provided in the area of the browser that shows the collected web pages, and the search keyword entered by the user is received through that search box.
The matching unit 20 is configured to match the search keyword against the collected web pages of the browser to obtain the address of a matching collected web page.
The collected web pages of a browser are usually kept in the browser's favorites, and the favorites of existing browsers store the address and title of each collected web page. Matching the search keyword against the collected web pages of the browser may consist of matching the search keyword against the titles of the collected web pages: if the search keyword appears in the title of a collected web page, that page is considered relevant to the web page the user needs to access.
The output unit 30 is configured to output the address of the matching collected web page.
After the address of the collected web page of the browser that matches the search keyword has been obtained, that address is output for the user to view.
In the embodiment of the present invention, the receiving unit 10 receives a search keyword, the matching unit 20 matches the search keyword against the collected web pages of the browser to obtain the address of a matching collected web page, and the output unit 30 outputs that address. The embodiment thus looks up the collected web page to be accessed among the collected web pages of the browser by means of retrieval. Compared with the prior art, in which the user opens the collected web pages of the browser one by one to look for the target, this improves the efficiency of looking up a target web page among the collected web pages of the browser and solves the problem in the related art that such a lookup is inefficient.
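The cooperation of the three units can be pictured with the short sketch below. It is a hypothetical software rendering only: the class and method names, the whitespace splitting of the user input, the title-only substring match, and the newline-separated output are assumptions introduced for illustration, and the device itself may equally be realised in hardware.

```python
from typing import Dict, List


class ReceivingUnit:
    """Receives the user input and splits it into search keywords."""

    def receive(self, raw_input: str) -> List[str]:
        return raw_input.lower().split()


class MatchingUnit:
    """Matches keywords against stored titles and yields matching addresses."""

    def __init__(self, titles_by_address: Dict[str, str]) -> None:
        self.titles_by_address = titles_by_address

    def match(self, keywords: List[str]) -> List[str]:
        return [
            address
            for address, title in self.titles_by_address.items()
            if any(k in title.lower() for k in keywords)
        ]


class OutputUnit:
    """Formats the matching addresses for display, one per line."""

    def output(self, addresses: List[str]) -> str:
        return "\n".join(addresses)


class StorageProcessingDevice:
    """Wires the three units together in the order receive, match, output."""

    def __init__(self, titles_by_address: Dict[str, str]) -> None:
        self.receiving_unit = ReceivingUnit()
        self.matching_unit = MatchingUnit(titles_by_address)
        self.output_unit = OutputUnit()

    def handle(self, raw_input: str) -> str:
        keywords = self.receiving_unit.receive(raw_input)
        addresses = self.matching_unit.match(keywords)
        return self.output_unit.output(addresses)
```

Keeping each unit as a separate object mirrors the division in Fig. 2 and would let the matching unit later be replaced by one that also consults stored text content or keywords.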
Preferably, the matching unit 20 includes: a first acquisition module, configured to obtain the title and text content of a collected web page of the browser; and a matching module, configured to match the title and text content of the collected web page against the search keyword, wherein if the title and text content of the collected web page match the search keyword, it is determined that the search keyword matches the collected web page, and if the title and text content of the collected web page do not match the search keyword, it is determined that the search keyword does not match the collected web page.
Preferably, the device further includes: a first acquisition unit, configured to obtain the text content of the collected web page of the browser; a second acquisition unit, configured to obtain the URL and title of the collected web page of the browser; and a storage unit, configured to store the text content, URL, and title of the collected web page of the browser.
Preferably, the first acquisition unit includes: a second acquisition module, configured to obtain the address of the collected web page of the browser; an access module, configured to access the collected web page according to its address; and a crawling module, configured to crawl text content from the collected web page while it is being accessed, to obtain the text content of the collected web page of the browser.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with a general-purpose computing device. They may be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; or they may each be made into an individual integrated circuit module, or several of the modules or steps may be made into a single integrated circuit module. The present invention is therefore not limited to any specific combination of hardware and software.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. For those skilled in the art, the present invention may be modified and varied in many ways. Any modification, equivalent substitution, improvement, or the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.