CN104123366A - Search method and server - Google Patents

Info

Publication number
CN104123366A
Authority
CN
China
Prior art keywords
index
label
webpage
search
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410352627.3A
Other languages
Chinese (zh)
Inventor
谢建平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN201410352627.3A priority Critical patent/CN104123366A/en
Publication of CN104123366A publication Critical patent/CN104123366A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F16/986 Document structures and storage, e.g. HTML extensions

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a search method and a search server. The method comprises: obtaining web page information from the internet and preprocessing it to obtain the web address of the web page information; analyzing the web address, determining the classification label corresponding to the web address, and building an index in the index database corresponding to that classification label, where the classification labels comprise a website label, a web page label and a file label; obtaining search information input by a user, the search information comprising a keyword, a classification label and a classification sub-label; determining, from the classification label and the classification sub-label in the search information, the index database corresponding to the search information; and searching the determined index database by the keyword to obtain search results, sorting the search results by category according to a preset rule, and returning them to the user. The technical solution avoids SEO interference, lets the user obtain the desired content more quickly and accurately, and improves search efficiency.

Description

A search method and search server
Technical field
The present invention relates to the field of computer technology, and in particular to a search method and a search server.
Background
With the rapid development of the internet, users' demand for information keeps growing, and so do their expectations of how conveniently it can be obtained. To find the information they need quickly in the vast amount of information on the internet, people usually rely on search engines. The most widely used search engines today are full-text search engines, represented by Google and Baidu.
Search engines now index more than ten billion web pages. Together with the emergence of search engine optimization (SEO) techniques, this means that search results often contain pages of little relevance, so people cannot quickly find the information they need. For example, the invention patent with application number 201110246182.7 supports only a single horizontal search or vertical search on a keyword. Such a search method increasingly fails to meet the growing demand for convenient and authoritative access to information; search results become less and less accurate, and users cannot directly obtain the information they want.
Summary of the invention
Embodiments of the present invention provide a search method and a search server. The technical solution avoids SEO interference, lets users obtain the content they want more quickly and accurately, and improves search efficiency.
An embodiment of the present invention provides a search method, comprising:
Obtaining web page information from the internet, and preprocessing the web page information to obtain its web address;
Analyzing the web address, determining the classification label corresponding to the web address, and building an index in the index database corresponding to the classification label, the classification labels comprising: a website label, a web page label and a file label;
Obtaining search information input by a user, the search information comprising: a keyword, a classification label and a classification sub-label;
Determining, by analysis, the index database corresponding to the search information according to the classification label and the classification sub-label in the search information;
Searching the determined index database by the keyword to obtain search results, sorting the search results by category according to a preset rule, and returning the search results to the user.
Further, preprocessing the web page information specifically comprises: extracting the web address of the web page information, and merging and updating the web address.
Further, analyzing the web address, determining the classification label corresponding to the web address, and building an index in the index database corresponding to the classification label specifically comprises:
Analyzing the domain name of the website to which the web address belongs, and looking up the domain name in a preset site dictionary;
If no identical domain name is found, notifying an administrator to manually classify the website to which the web page belongs;
If an identical domain name is found, judging from the web address whether the web page is a homepage-type web page; if it is, indexing the web page in the website index database corresponding to the website label; if it is not, indexing the web page in the web page index database corresponding to the web page label;
Analyzing the content of the web page, and indexing the files in the web page that are available for browsing or download in the file index database corresponding to the file label.
Further, the website label comprises a plurality of site-group labels, and the website index database comprises a plurality of website sub-index databases, where each minimal site-group label under the website label corresponds to one website sub-index database; when an index is built, the web page is indexed in the website sub-index database corresponding to its minimal site-group label;
The web page label comprises a plurality of site-group labels, and the web page index database comprises a plurality of web page sub-index databases, where each minimal site-group label under the web page label corresponds to one web page sub-index database; when an index is built, the web page is indexed in the web page sub-index database corresponding to its minimal site-group label;
The file label comprises a plurality of site-group labels, and the file index database comprises a plurality of file sub-index databases, where each minimal site-group label under the file label corresponds to one file sub-index database; before an index is built, the files available for browsing or download are classified by file type to determine the file type of each file; when an index is built, the file is indexed in the file sub-index database corresponding to the minimal site-group label of that file type.
Further, determining, by analysis, the index database corresponding to the search information according to the classification label and the classification sub-label in the search information specifically comprises:
Querying, according to the classification label and the classification sub-label in the search information, whether an identical preset label exists; if not, presenting the user with preset labels similar to the search information so that the user can enter the search information again;
If so, determining the index database corresponding to the search information according to the classification label, and determining the sub-index database corresponding to the search information according to the classification sub-label.
Further, sorting the search results by category according to a preset rule and returning the search results to the user comprises:
Optimizing the preset rule according to the user behavior log and the user's clicks on historical search results, sorting the search results by category according to the optimized rule, and returning the search results to the user.
Correspondingly, an embodiment of the present invention provides a search server, comprising:
An information acquisition module, configured to obtain web page information from the internet and preprocess the web page information to obtain its web address;
An information processing module, configured to analyze the web address, determine the classification label corresponding to the web address, and build an index in the index database corresponding to the classification label, the classification labels comprising: a website label, a web page label and a file label;
An information search module, configured to obtain search information input by a user, the search information comprising: a keyword, a classification label and a classification sub-label; to determine, by analysis, the index database corresponding to the search information according to the classification label and the classification sub-label in the search information; and to search the determined index database by the keyword to obtain search results, sort the search results by category according to a preset rule, and return the search results to the user.
Further, the information processing module comprises:
An analysis and search module, configured to analyze the domain name of the website to which the web address belongs and look up the domain name in a preset site dictionary;
A classification determination module, configured to determine, when the analysis and search module finds an identical domain name, whether the web page corresponding to the web address is a homepage-type web page, and to analyze the content of the web page and determine the files in the web page that are available for browsing or download;
An index module, configured to index the web page in the website index database corresponding to the website label when the classification determination module determines that the web page is a homepage-type web page; to index the web page in the web page index database corresponding to the web page label when the classification determination module determines that the web page is not a homepage-type web page; and to index the files in the web page that are available for browsing or download in the file index database corresponding to the file label;
A notification module, configured to notify an administrator to manually classify the web page when the analysis and search module finds no identical domain name.
Further, the information search module comprises:
A label query module, configured to query, according to the classification label and the classification sub-label in the search information, whether an identical preset label exists;
An index database confirmation module, configured to determine, when the label query module confirms that an identical preset label exists, the index database corresponding to the search information according to the classification label, and the sub-index database corresponding to the search information according to the classification sub-label;
A label suggestion module, configured to present the user, when the label query module confirms that no identical preset label exists, with preset labels similar to the search information so that the user can enter the search information again;
A retrieval module, configured to search the determined index database by the keyword after the index database confirmation module has determined the index database;
A sorting module, configured to sort the search results by category according to a preset rule;
A result display module, configured to return the categorized and sorted search results to the user.
Further, the information search module also comprises:
A user behavior module, configured to analyze the user's search behavior and record the analysis results as a user behavior log;
The sorting module is further configured to optimize the preset rule according to the user behavior log and the user's clicks on historical search results, and to sort the search results by category according to the optimized rule.
Implementing the embodiments of the present invention therefore has the following beneficial effects:
The search method provided by the embodiments of the present invention preprocesses web page information from the internet and then classifies and organizes it with labels, the classification labels comprising: a website label, a web page label and a file label. When a user searches with the search engine, the search results are categorized and sorted by factors such as the labels, according to the search information the user entered. Compared with the prior art, which supports only a single dimension of horizontal search (web page search) or vertical search (specialized search), the technical solution of the present invention helps users locate the target web page quickly and improves search efficiency.
Further, the invention provides a file label for users to search with, so that a user can directly obtain a file available for browsing or download; this reduces the number of user operations and further improves search efficiency.
Further, when website information on the internet is labeled, the domain name of the web page is used to determine whether the website has already been classified; if it has not, an administrator is notified to classify it manually. This further improves the accuracy and professionalism of the classification.
Brief description of the drawings
Fig. 1 is a schematic flow diagram of the search method provided by the invention;
Fig. 2 is an optional schematic flow diagram of step 102 provided by the invention;
Fig. 3 is a schematic structural diagram of the search server provided by the invention;
Fig. 4 is an optional schematic structural diagram of the information acquisition module provided by the invention;
Fig. 5 is an optional schematic structural diagram of the information processing module provided by the invention;
Fig. 6 is an optional schematic structural diagram of the information search module provided by the invention;
Fig. 7 is an optional schematic structural diagram of the information search module provided by the invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
Embodiment 1
Referring to Fig. 1, which is a schematic flow diagram of a search method provided by the invention, the method comprises the following steps:
Step 101: obtain web page information from the internet, and preprocess the web page information to obtain its web address.
In this embodiment, internet information is obtained mainly by the information acquisition module. When information is acquired, the information of the web page is stored in a master link library. After the web page information has been obtained, the information acquisition module preprocesses it: it extracts the web address of the web page information, and merges and updates the web address.
In this embodiment, in addition to the master link library there is also a raw database. The raw database stores unprocessed web page information.
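The patent does not say what "merging and updating" a web address involves. One plausible reading, offered here only as a sketch, is that extracted addresses are normalized, deduplicated against the link library, and stamped with the time they were last seen. The names `normalize_url` and `merge_into_link_library`, the normalization rules, and the timestamp field are all assumptions made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lower-case scheme and host, drop the fragment, default the path to '/'."""
    parts = urlsplit(url.strip())
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def merge_into_link_library(link_library: dict, page_urls: list, fetched_at: int) -> list:
    """Merge extracted URLs into the library and return the ones that were new.

    link_library maps a normalized URL to the time it was last seen
    (an assumed schema, not one given in the patent).
    """
    new_urls = []
    for raw in page_urls:
        url = normalize_url(raw)
        if url not in link_library:
            new_urls.append(url)
        link_library[url] = fetched_at  # "update": refresh the last-seen time
    return new_urls
```

With this reading, two spellings of the same address (for example with different letter case or a trailing fragment) collapse into a single library entry.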
Step 102: analyze the web address, determine the classification label corresponding to the web address, and build an index in the index database corresponding to the classification label, the classification labels comprising: a website label, a web page label and a file label.
In this embodiment, the processed web page information is stored in an unindexed information library. The system obtains the web address of the web page from the unindexed information library, analyzes the web address, determines the classification label corresponding to it, and builds an index in the index database corresponding to the classification label.
In this embodiment, the classification labels comprise: a website label, a web page label and a file label, which correspond respectively to a website index database, a web page index database and a file index database. Each index database contains a plurality of sub-index databases, and each sub-index database corresponds to one minimal site-group label. A site-group label is a label shared by two or more websites; it indicates an attribute that those websites have in common. A minimal site-group label is a site-group label at the lowest level, which cannot be subdivided further. For example, the web page label comprises a plurality of site-group labels such as novel, video and music; the site-group label "novel" in turn comprises site-group labels such as online novel, novel forum and novel library; "online novel" is a minimal site-group label, and it corresponds to one sub-index database. Administrators can configure the site-group labels and manually designate the minimal site-group labels, which makes the classification labels more professional and more accurate.
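The label hierarchy in the example above can be pictured as a small tree whose leaves each own one sub-index. The sketch below is illustrative only: it treats every leaf as a minimal label, whereas the patent has administrators designate minimal labels by hand, and the names `LABEL_TREE`, `minimal_labels` and `SUB_INDEXES` are hypothetical.

```python
# Nested dicts model the hierarchy; an empty dict marks a label with no children.
LABEL_TREE = {
    "web page": {
        "novel": {"online novel": {}, "novel forum": {}, "novel library": {}},
        "video": {},
        "music": {},
    }
}


def minimal_labels(tree, path=()):
    """Yield the full path to every leaf label in the tree."""
    for name, children in tree.items():
        here = path + (name,)
        if children:
            yield from minimal_labels(children, here)
        else:
            yield here


# One (initially empty) sub-index per minimal label, keyed by its path.
SUB_INDEXES = {path: [] for path in minimal_labels(LABEL_TREE)}
```

Here "video" and "music" count as minimal only because the example does not subdivide them; intermediate labels such as "novel" get no sub-index of their own.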
In this embodiment, the site-group labels are managed through a site dictionary, which is an index file recording the correspondence among website names, web addresses, classification labels and index databases. From the site dictionary, the search engine can determine which smaller site-group labels a larger site-group label contains, which websites a minimal site-group label contains, which index database a given label corresponds to, and so on.
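As a rough illustration of the record described above (website name, web address, label, index database), the lookup file could be modeled as a list of entries with two simple queries over it. The field names, the example domains and the index names are invented for this sketch; the patent does not define a file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteEntry:
    name: str          # website name
    domain: str        # web address (domain part)
    label_path: tuple  # site-group labels from largest to minimal
    index_name: str    # sub-index database for the minimal label


# Hypothetical entries for illustration only.
ENTRIES = [
    SiteEntry("Example Novels", "novels.example", ("novel", "online novel"), "idx_online_novel"),
    SiteEntry("Example Forum", "forum.example", ("novel", "novel forum"), "idx_novel_forum"),
]


def sites_under(label: str) -> list:
    """Domains of all websites filed under a site-group label at any level."""
    return [e.domain for e in ENTRIES if label in e.label_path]


def index_for(minimal_label: str):
    """Name of the sub-index for a minimal label, or None if it is unknown."""
    names = {e.index_name for e in ENTRIES if e.label_path[-1] == minimal_label}
    return names.pop() if len(names) == 1 else None
```

A real deployment would likely key this structure by domain for fast lookup; a flat list keeps the sketch short.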
The specific steps of step 102 are shown in Fig. 2:
Step 201: analyze the domain name of the website to which the web address belongs, and look up the domain name in the preset site dictionary.
Step 202: check whether an identical domain name exists.
Step 203: if no identical domain name is found in the site dictionary, notify an administrator to classify the web page manually; once the website has been classified, step 201 is executed again to reclassify and index the page.
Step 204: if an identical domain name is found, judge from the web address whether the web page is a homepage-type web page.
Step 205: if the web page is a homepage-type web page, index it in the website index database corresponding to the website label.
Step 206: if the web page is not a homepage-type web page, index it in the web page index database corresponding to the web page label.
Step 207: analyze the content of the web page, and index the files in the web page that are available for browsing or download in the file index database corresponding to the file label.
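Steps 201 to 206 amount to a lookup followed by a two-way branch, which can be sketched as below. The domain lookup table is reduced to a plain dict, and the homepage test (root path, no query string) is an assumed heuristic: the patent only says the decision is made "according to the web address". All names are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical lookup table: domain -> minimal site-group label.
SITE_DICTIONARY = {"novels.example": "online novel"}


def is_homepage(url: str) -> bool:
    """Assumed heuristic: a homepage has an empty or root path and no query."""
    parts = urlsplit(url)
    return parts.path in ("", "/") and not parts.query


def classify(url: str, pending_review: list):
    """Return (classification label, minimal label), or None for unknown domains."""
    domain = urlsplit(url).hostname
    sub_label = SITE_DICTIONARY.get(domain)          # steps 201-202
    if sub_label is None:
        pending_review.append(url)                   # step 203: manual classification
        return None
    label = "website" if is_homepage(url) else "web page"  # steps 204-206
    return (label, sub_label)
```

Pages queued in `pending_review` would be classified again once an administrator has added their domain to the table.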
In this embodiment, when an index is built, the web page is indexed in the sub-index database corresponding to its minimal site-group label. Continuing the example above, if the web page to be indexed belongs to an online novel website, the corresponding site-group label (novel) is first looked up under the web page label, then the minimal site-group label (online novel) is looked up under the site-group label "novel", and finally the web page is indexed in the sub-index database corresponding to the minimal site-group label "online novel".
In this embodiment, the content of the web page is also analyzed during labeling to find the files that users can browse and download, and these files are indexed in the file index database. Before an index is built, the search engine classifies the files available for browsing or download by file type and determines the file type of each file. When an index is built, the file is indexed in the file sub-index database corresponding to the minimal site-group label of its file type. Continuing the example above, the downloadable novel texts on a novel website are classified by file format into categories such as TXT and Word. If the file to be indexed is a novel file in TXT format, it is indexed in the file sub-index database corresponding to the minimal site-group label for that file format (the TXT format label). In this way, when searching, the user only needs to select the corresponding file label to obtain directly the web page where the file is located; this saves the user from searching a second time through a large number of search results and improves search efficiency.
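A minimal sketch of the file classification step follows, assuming that the file type can be read from the extension in the file's address. The patent does not say how the type is detected, so the extension table and the names `file_type_label` and `index_files` are illustrative assumptions.

```python
import posixpath
from urllib.parse import urlsplit

# Assumed mapping from file extension to file-type label.
FILE_TYPE_LABELS = {".txt": "TXT", ".doc": "Word", ".docx": "Word"}


def file_type_label(file_url: str):
    """Return the file-type label for a file URL, or None if it is not indexed."""
    ext = posixpath.splitext(urlsplit(file_url).path)[1].lower()
    return FILE_TYPE_LABELS.get(ext)


def index_files(page_url: str, file_urls: list, file_index: dict) -> None:
    """Record, per file-type label, each file together with the page it was found on."""
    for file_url in file_urls:
        label = file_type_label(file_url)
        if label is not None:
            file_index.setdefault(label, []).append((file_url, page_url))
```

Storing the containing page next to each file matches the stated goal that selecting a file label leads the user straight to the page where the file is located.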
Step 103: obtain the search information input by the user, the search information comprising: a keyword, a classification label and a classification sub-label.
In this embodiment, after the web page information has been classified and organized, the user enters search information into the system, the search information comprising: a keyword, a classification label and a classification sub-label. The keyword is typed in directly by the user. The classification label is selected by the user from website, web page and file. Once the user has selected a classification label, the system can offer preset classification sub-labels directly, or the user can type one in.
Step 104: according to the classification label and the classification sub-label in the search information, determine by analysis the index database corresponding to the search information.
In this embodiment, step 104 is specifically: according to the classification label and the classification sub-label in the search information, query whether an identical preset label exists; if not, present the user with preset labels similar to the search information so that the user can enter the search information again. If so, determine the index database corresponding to the search information according to the classification label, and determine the sub-index database corresponding to the search information according to the classification sub-label.
In this embodiment, classification sub-labels correspond to site-group labels. When analyzing and comparing the classification sub-label entered by the user, the system determines the corresponding sub-index database from the correspondence between classification sub-labels and site-group labels.
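Step 104 can be sketched as an exact lookup with a similarity fallback. The patent does not say how "similar" preset labels are chosen; the example below uses `difflib.get_close_matches` from the Python standard library as a stand-in, and the `PRESET` table and `resolve_index` function are hypothetical.

```python
import difflib

# Hypothetical preset labels: classification label -> sub-label -> sub-index name.
PRESET = {
    "web page": {"online novel": "page_online_novel", "novel forum": "page_novel_forum"},
    "file": {"TXT": "file_txt"},
}


def resolve_index(label: str, sub_label: str):
    """Return ("ok", sub_index_name) or ("suggest", [similar preset labels])."""
    sub_labels = PRESET.get(label)
    if sub_labels is None:
        # Unknown classification label: suggest similar top-level labels.
        return ("suggest", difflib.get_close_matches(label, list(PRESET), n=3, cutoff=0.6))
    if sub_label in sub_labels:
        return ("ok", sub_labels[sub_label])
    # Unknown sub-label: suggest similar sub-labels under the chosen label.
    return ("suggest", difflib.get_close_matches(sub_label, list(sub_labels), n=3, cutoff=0.6))
```

A caller would show the suggestions to the user and run the lookup again with whichever label the user picks.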
Step 105: search the determined index database by the keyword to obtain search results, sort the search results by category according to a preset rule, and return the search results to the user.
In this embodiment, once the index database has been determined, it is searched by the keyword to obtain the corresponding search results, which are sorted by category according to a preset rule and returned to the user. At the same time, the system records the user's search behavior and optimizes the preset sorting rule according to the user behavior log and the clicks on historical search results; in the next search, the results are sorted by category according to the optimized rule and returned to the user.
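The patent leaves both the preset sorting rule and its optimization unspecified. Purely as an illustration of the feedback loop, the sketch below assumes each result carries a base relevance score and that every recorded click adds a fixed bonus; the scoring formula, the `click_weight` parameter and the function names are assumptions, not the patented rule.

```python
from collections import Counter


def record_click(click_log: Counter, url: str) -> None:
    """Append one click on a result to the behavior log."""
    click_log[url] += 1


def rank(results: list, click_log: Counter, click_weight: float = 0.1) -> list:
    """Sort (url, base_score) pairs by base score plus a per-click bonus."""
    def adjusted(item):
        url, base = item
        return base + click_weight * click_log[url]
    return sorted(results, key=adjusted, reverse=True)
```

In this toy model, a result that users keep clicking gradually overtakes one with a slightly higher base score.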
In this embodiment, if the search information entered by the user contains only a keyword and a classification label, with no classification sub-label, the system analyzes similar historical searches and their search results, and searches the corresponding index database to obtain the search results.
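How the system analyzes similar historical searches is not described. One simple possibility, sketched here under that caveat, is to reuse the sub-label that past searches with the same keyword and classification label selected most often. The history format and the function name `infer_sub_label` are assumptions.

```python
from collections import Counter


def infer_sub_label(keyword: str, label: str, history: list):
    """Pick the most frequent past sub-label for this keyword and label.

    history is a list of (keyword, label, sub_label) tuples from earlier
    searches (an assumed log format). Returns None when nothing matches.
    """
    counts = Counter(sub for kw, lab, sub in history if kw == keyword and lab == label)
    return counts.most_common(1)[0][0] if counts else None
```

When no matching history exists, a caller could fall back to searching every sub-index under the chosen classification label.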
The search method provided by this embodiment therefore preprocesses web page information from the internet and then classifies and organizes it with labels, the classification labels comprising: a website label, a web page label and a file label. When a user searches with the search engine, the search results are categorized and sorted by factors such as the labels, according to the search information the user entered. Compared with the prior art, which supports only a single dimension of horizontal search (web page search) or vertical search (specialized search), the technical solution of the present invention helps users locate the target web page quickly and improves search efficiency. Classified search and classified display also reduce or avoid interference from factors such as SEO, encouraging site owners to focus on building their website content. Finally, because websites are given multiple labels (websites are reviewed and classified manually), the harm that phishing websites and similar sites cause to internet users can be effectively reduced.
Further, the invention provides a file label for users to search with, so that a user can directly obtain a file available for browsing or download; this reduces the number of user operations and further improves search efficiency.
Further, when website information on the internet is labeled, the domain name of the web page is used to determine whether the website has already been classified; if it has not, an administrator is notified to classify it manually. This further improves the accuracy and professionalism of the classification.
Embodiment 2
Referring to Fig. 3, this embodiment provides a search server, which mainly comprises: an information acquisition module 301, an information processing module 302 and an information search module 303. The information acquisition module 301 is connected to the information processing module 302, and the information processing module 302 is connected to the information search module 303.
The information acquisition module 301 is configured to obtain web page information from the internet and preprocess the web page information to obtain its web address.
The information processing module 302 is configured to analyze the web address, determine the classification label corresponding to the web address, and build an index in the index database corresponding to the classification label, the classification labels comprising: a website label, a web page label and a file label.
The information search module 303 is configured to obtain the search information input by the user, the search information comprising: a keyword, a classification label and a classification sub-label; to determine, by analysis, the index database corresponding to the search information according to the classification label and the classification sub-label in the search information; and to search the determined index database by the keyword to obtain search results, sort the search results by category according to a preset rule, and return the search results to the user.
Referring to Fig. 4, which is an optional schematic structural diagram of the information acquisition module 301, the information acquisition module 301 comprises a crawler module 401 and a storage module 402, and the crawler module 401 is connected to the storage module 402. The crawler module 401 obtains web page information from the internet, crawls the obtained web page information, and sends the URL links obtained by crawling, that is, the web addresses, to the storage module 402. The storage module 402 comprises a raw database and a master link library: the raw database stores the unprocessed web page information obtained by the crawler module 401, and the master link library stores the processed web addresses.
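The division of labor between the crawler and its two stores can be sketched with the standard-library HTML parser: the unprocessed page goes into one store, and the links extracted from it go into the other. Fetching over the network is left out, and the class and function names, as well as the choice of a dict and a set as stores, are assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def crawl_page(url: str, html: str, raw_database: dict, link_library: set) -> list:
    """Store the unprocessed page, extract its links, and add them to the link store."""
    raw_database[url] = html
    parser = LinkExtractor(url)
    parser.feed(html)
    link_library.update(parser.links)
    return parser.links
```

Relative links are resolved against the page's own address, so the link store only ever holds absolute URLs.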
Referring to Fig. 5, Fig. 5 is the optional structural representation of message processing module 302.As shown in Figure 5, message processing module 302 comprises classified search module 501, classification determination module 502, index module 503 and notification module 504.Analyze search module 501 for analyzing the domain name of network address affiliated web site, by the default station allusion quotation of this dns search.Classification determination module 502 is for when analyzing search module 501 and retrieve identical domain name, determine whether homepage type webpage of webpage corresponding to network address, and for analyzing web page content, determining can be for the file of browsing or downloading in webpage.Index module 503, for when classification determination module 502 determines that webpage is homepage type webpage, is set up index to webpage in web index storehouse corresponding to web site tags; And for when classification determination module 502 determines that webpage is non-homepage type webpage, in web page index storehouse corresponding to webpage label, webpage is set up to index; And for to setting up index in file index storehouse corresponding to file label for the file browsing or download in webpage.Notification module 504, for when analyzing search module 501 retrievals less than identical domain name, notifies managerial personnel to carry out manual sort to webpage.
In the present embodiment, analyze search module 501 and can be, but not limited to as sorter, index module 503 can be, but not limited to as index.
In the present embodiment, message processing module 302 also comprises memory module, and memory module comprises station allusion quotation, total data bank and general index storehouse.Wherein, the allusion quotation of standing is exactly to have recorded the index file of the corresponding relation between web site name, network address, label, index database.The place of total data bank for storing through pretreated info web.General index storehouse comprises web index storehouse, web page index storehouse and file index storehouse.
Referring to Fig. 6, Fig. 6 is an optional schematic structural diagram of the information search module 303. As shown in Fig. 6, the information search module 303 comprises a label query module 601, an index library confirmation module 602, a label providing module 603, a retrieval module 604, a sorting module 605 and a result display module 606. The label query module 601 is configured to query, according to the classification label and the classification sub-label in the search information, whether an identical preset label exists. The index library confirmation module 602 is configured to confirm, when the label query module 601 finds an identical preset label, the index library corresponding to the search information according to the classification label, and the sub-index library corresponding to the search information according to the classification sub-label. The label providing module 603 is configured to offer the user preset labels similar to the search information when the label query module 601 finds no identical preset label, so that the user can enter the search information again. The retrieval module 604 is configured to search the determined index library by keyword, after the index library confirmation module 602 has determined the index library, to obtain retrieval results. The sorting module 605 is configured to sort the retrieval results by category according to a preset rule. The result display module 606 is configured to return the sorted retrieval results to the user.
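The sequence of modules 601 to 606 can be sketched as a single function. Everything concrete here is an assumption for illustration: the preset labels, the toy index contents, the use of `difflib.get_close_matches` to find similar labels, and the use of a stored score as the preset sorting rule.

```python
import difflib

# Hypothetical preset labels: classification label -> classification sub-labels.
PRESET_LABELS = {
    "website": {"news", "shopping"},
    "webpage": {"news"},
    "file": {"pdf", "video"},
}

# Hypothetical sub-index libraries: (label, sub-label) -> [(title, score)].
INDEX = {
    ("file", "pdf"): [("Python tutorial", 0.4), ("Python reference manual", 0.9)],
}

def search(keyword: str, label: str, sublabel: str) -> dict:
    # Label query: is there an identical preset label and sub-label?
    if label not in PRESET_LABELS:
        # Label providing: offer similar preset labels for re-entry.
        return {"suggestions": difflib.get_close_matches(label, sorted(PRESET_LABELS))}
    if sublabel not in PRESET_LABELS[label]:
        return {"suggestions": difflib.get_close_matches(sublabel, sorted(PRESET_LABELS[label]))}
    # Index library confirmation, then keyword retrieval in that library only.
    entries = INDEX.get((label, sublabel), [])
    hits = [(title, score) for title, score in entries if keyword.lower() in title.lower()]
    # Sorting by the assumed preset rule: higher score first.
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return {"results": [title for title, _ in hits]}
```

Because the label pair selects one sub-index library before any keyword matching happens, the keyword search touches only that library rather than the whole master index.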
Referring to Fig. 7, Fig. 7 is another optional schematic structural diagram of the information search module 303. As shown in Fig. 7, the information search module 303 further comprises a user behavior module 701, configured to analyze the user's search behavior and to record the analysis result as a user behavior log. The sorting module 705 is further configured to optimize the preset sorting rule according to the user behavior log and the user's clicks on historical search results, and to sort the retrieval results by category according to the optimized rule.
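As a rough illustration of how click history might adjust a preset ranking, the sketch below adds a click-count bonus to each result's base score. The additive formula, the `weight` parameter and the function name `rerank` are assumptions; the embodiment does not specify how the rule is optimized.

```python
from collections import Counter
from typing import Iterable, List, Tuple

def rerank(results: List[Tuple[str, float]],
           click_log: Iterable[str],
           weight: float = 0.1) -> List[Tuple[str, float]]:
    """Sort (url, base_score) pairs by base score plus a click bonus.

    click_log is an iterable of URLs the user clicked in past searches.
    """
    clicks = Counter(click_log)
    return sorted(results,
                  key=lambda item: item[1] + weight * clicks[item[0]],
                  reverse=True)
```

With an empty click log the order falls back to the base scores, so the preset rule remains the default behavior.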
Further, the information search module 303 may include, but is not limited to, an input module through which the user enters search information.
For the more detailed working principles and steps of this embodiment, reference may be made, without limitation, to the content described in Embodiment 1.
Therefore, in the search server provided by the embodiment of the present invention, the information acquisition module 301 preprocesses the web page information on the Internet, and the message processing module 302 then classifies and organizes the preprocessed information by means of labels. The classification labels comprise a website label, a web page label and a file label. When a user searches with the search engine, the information search module 303 classifies and sorts the search results according to the search information entered by the user and factors such as the labels. Compared with the prior art, which searches along a single dimension only, either horizontally (web page search) or vertically (specialized search), the technical solution of the present invention helps the user locate the target web page quickly and improves search efficiency.
Further, the present invention provides a file label for the user to search with, so that the user can directly obtain a particular file for browsing or downloading, which reduces the number of user operations and further improves search efficiency.
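A minimal sketch of how browsable or downloadable files might be found in a page and grouped by type before being indexed under the file label. The extension-to-type table and the rule that a file is an `<a href>` link whose path ends in a known extension are assumptions made for illustration.

```python
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical mapping from file extension to file type (file sub-label).
FILE_TYPES = {
    ".pdf": "document",
    ".doc": "document",
    ".mp3": "audio",
    ".mp4": "video",
    ".zip": "archive",
}

class FileLinkCollector(HTMLParser):
    """Collects (href, file_type) for links that point at known file types."""

    def __init__(self) -> None:
        super().__init__()
        self.files = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Use only the URL path so that query strings do not hide the extension.
        suffix = PurePosixPath(urlparse(href).path).suffix.lower()
        if suffix in FILE_TYPES:
            self.files.append((href, FILE_TYPES[suffix]))

def collect_files(html: str):
    collector = FileLinkCollector()
    collector.feed(html)
    return collector.files
```

Each returned pair could then be written to the file sub-index library that matches its type.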
Further, when website information on the Internet is labeled, whether a classification already exists is determined from the domain name of the web page; if none exists, an administrator is notified to classify the website manually, which further improves the accuracy and professionalism of the classification.
A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The above are preferred embodiments of the present invention. It should be noted that a person of ordinary skill in the art may make improvements and refinements without departing from the principles of the present invention, and these improvements and refinements are also regarded as falling within the protection scope of the present invention.

Claims (10)

1. A search method, characterized by comprising:
obtaining web page information on the Internet, and preprocessing the web page information to obtain the URL of the web page information;
analyzing the URL, confirming the classification label corresponding to the URL, and building an index in the index library corresponding to the classification label, the classification label comprising: a website label, a web page label and a file label;
obtaining search information input by a user, the search information comprising: a keyword, a classification label and a classification sub-label;
analyzing and determining, according to the classification label and the classification sub-label in the search information, the index library corresponding to the search information;
searching the determined index library according to the keyword to obtain retrieval results, sorting the retrieval results by category according to a preset rule, and returning the retrieval results to the user.
2. The search method according to claim 1, characterized in that preprocessing the web page information specifically comprises: extracting the URLs of the web page information, and merging and updating the URLs.
3. The search method according to claim 1, characterized in that analyzing the URL, confirming the classification label corresponding to the URL, and building an index in the index library corresponding to the classification label specifically comprises:
analyzing the domain name of the website to which the URL belongs, and searching a preset site dictionary by the domain name;
if no identical domain name is found, notifying an administrator to manually classify the website to which the web page belongs;
if an identical domain name is found, judging from the URL whether the web page is a home-page-type web page; if the web page is a home-page-type web page, indexing the web page in the website index library corresponding to the website label; if the web page is not a home-page-type web page, indexing the web page in the web page index library corresponding to the web page label;
analyzing the content of the web page, and indexing the files in the web page that can be browsed or downloaded in the file index library corresponding to the file label.
4. The search method according to claim 3, characterized in that:
the website label comprises a plurality of site-group labels, and the website index library comprises a plurality of website sub-index libraries, wherein each lowest-level site-group label in the website label corresponds to one website sub-index library, and, when an index is built, the web page is indexed in the website sub-index library corresponding to its lowest-level site-group label;
the web page label comprises a plurality of site-group labels, and the web page index library comprises a plurality of web page sub-index libraries, wherein each lowest-level site-group label in the web page label corresponds to one web page sub-index library, and, when an index is built, the web page is indexed in the web page sub-index library corresponding to its lowest-level site-group label;
the file label comprises a plurality of site-group labels, and the file index library comprises a plurality of file sub-index libraries, wherein each lowest-level site-group label in the file label corresponds to one file sub-index library; before an index is built, the files that can be browsed or downloaded are classified by file type to determine the file type to which each file belongs; and, when an index is built, the file is indexed in the file sub-index library corresponding to the lowest-level site-group label within that file type.
5. The search method according to claim 1, characterized in that analyzing and determining, according to the classification label and the classification sub-label in the search information, the index library corresponding to the search information specifically comprises:
querying, according to the classification label and the classification sub-label in the search information, whether an identical preset label exists; if not, providing the user with preset labels similar to the search information so that the user can enter the search information again;
if so, confirming the index library corresponding to the search information according to the classification label, and confirming the sub-index library corresponding to the search information according to the classification sub-label.
6. The search method according to claim 1, characterized in that sorting the retrieval results by category according to a preset rule and returning the retrieval results to the user comprises:
optimizing the preset rule according to a user behavior log and the user's clicks on historical search results, sorting the retrieval results by category according to the optimized rule, and returning the retrieval results to the user.
7. A search server, characterized by comprising:
an information acquisition module, configured to obtain web page information on the Internet, and to preprocess the web page information to obtain the URL of the web page information;
a message processing module, configured to analyze the URL, confirm the classification label corresponding to the URL, and build an index in the index library corresponding to the classification label, the classification label comprising: a website label, a web page label and a file label;
an information search module, configured to obtain search information input by a user, the search information comprising: a keyword, a classification label and a classification sub-label; to analyze and determine, according to the classification label and the classification sub-label in the search information, the index library corresponding to the search information; and to search the determined index library according to the keyword to obtain retrieval results, sort the retrieval results by category according to a preset rule, and return the retrieval results to the user.
8. The search server according to claim 7, characterized in that the information classification module comprises:
an analysis search module, configured to analyze the domain name of the website to which the URL belongs, and to search a preset site dictionary by the domain name;
a classification determination module, configured to determine, when the analysis search module finds an identical domain name, whether the web page corresponding to the URL is a home-page-type web page, and to analyze the web page content to determine the files in the web page that can be browsed or downloaded;
an index module, configured to index the web page in the website index library corresponding to the website label when the classification determination module determines that the web page is a home-page-type web page; to index the web page in the web page index library corresponding to the web page label when the classification determination module determines that the web page is not a home-page-type web page; and to index the files in the web page that can be browsed or downloaded in the file index library corresponding to the file label;
a notification module, configured to notify an administrator to classify the web page manually when the analysis search module finds no identical domain name.
9. The search server according to claim 7, characterized in that the information search module comprises:
a label query module, configured to query, according to the classification label and the classification sub-label in the search information, whether an identical preset label exists;
an index library confirmation module, configured to confirm, when the label query module finds an identical preset label, the index library corresponding to the search information according to the classification label, and the sub-index library corresponding to the search information according to the classification sub-label;
a label providing module, configured to provide the user with preset labels similar to the search information when the label query module finds no identical preset label, so that the user can enter the search information again;
a retrieval module, configured to search the determined index library according to the keyword after the index library confirmation module has determined the index library;
a sorting module, configured to sort the retrieval results by category according to a preset rule;
a result display module, configured to return the categorized and sorted retrieval results to the user.
10. The search server according to claim 7 or 9, characterized in that the information search module further comprises:
a user behavior module, configured to analyze the user's search behavior and to record the analysis result as a user behavior log;
the sorting module being further configured to optimize the preset rule according to the user behavior log and the user's clicks on historical search results, and to sort the retrieval results by category according to the optimized rule.
CN201410352627.3A 2014-07-23 2014-07-23 Search method and server Pending CN104123366A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410352627.3A CN104123366A (en) 2014-07-23 2014-07-23 Search method and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410352627.3A CN104123366A (en) 2014-07-23 2014-07-23 Search method and server

Publications (1)

Publication Number Publication Date
CN104123366A true CN104123366A (en) 2014-10-29

Family

ID=51768777

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410352627.3A Pending CN104123366A (en) 2014-07-23 2014-07-23 Search method and server

Country Status (1)

Country Link
CN (1) CN104123366A (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104778606A (en) * 2015-04-10 2015-07-15 北京京东尚科信息技术有限公司 Account structure data processing method and device
WO2016101737A1 (en) * 2014-12-22 2016-06-30 北京奇虎科技有限公司 Search query method and apparatus
CN106547869A (en) * 2016-10-25 2017-03-29 广东亿迅科技有限公司 The construction method and device of multiserver index
CN107436871A (en) * 2016-05-25 2017-12-05 北京搜狗科技发展有限公司 A kind of data search method, device and electronic equipment
CN107463711A (en) * 2017-08-22 2017-12-12 山东浪潮云服务信息科技有限公司 A kind of tag match method and device of data
CN107729486A (en) * 2017-10-17 2018-02-23 北京奇艺世纪科技有限公司 A kind of video searching method and device
CN107943893A (en) * 2017-11-16 2018-04-20 北京奇安信科技有限公司 A kind of search processing method and device based on internet
CN109165316A (en) * 2018-09-10 2019-01-08 深圳市轱辘汽车维修技术有限公司 A kind of method for processing video frequency, video index method, device and terminal device
CN109947899A (en) * 2019-02-18 2019-06-28 北京明略软件系统有限公司 A kind of keyword retrieval method, system, terminal and storage medium
CN110347920A (en) * 2019-07-02 2019-10-18 北京纵横无双科技有限公司 A kind of search matching method and device of health and fitness information
CN110399339A (en) * 2019-06-18 2019-11-01 平安科技(深圳)有限公司 File classifying method, device, equipment and the storage medium of knowledge base management system
CN111915392A (en) * 2020-06-30 2020-11-10 深圳市世强元件网络有限公司 Classified display method for search results of electronic commerce platform of components
CN111951077A (en) * 2020-08-13 2020-11-17 中国民航信息网络股份有限公司 Ticket buying scheme display method and system
CN112287205A (en) * 2020-03-23 2021-01-29 北京来也网络科技有限公司 Document retrieval method, device, equipment and storage medium combining RPA and AI

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101261629A (en) * 2008-04-21 2008-09-10 上海大学 Specific information searching method based on automatic classification technology
US20090100036A1 (en) * 2007-10-11 2009-04-16 Google Inc. Methods and Systems for Classifying Search Results to Determine Page Elements
CN101604324A (en) * 2009-07-15 2009-12-16 中国科学技术大学 A kind of searching method and system of the video service website based on unit search
CN102236719A (en) * 2011-07-25 2011-11-09 西交利物浦大学 Page search engine based on page classification and quick search method
CN102236691A (en) * 2010-05-04 2011-11-09 张文广 Precision guided searching tool system
CN102317943A (en) * 2011-07-29 2012-01-11 华为技术有限公司 Method and device for full-text search
CN102982175A (en) * 2012-12-17 2013-03-20 北京奇虎科技有限公司 Method for performing search by utilizing browser and browser

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090100036A1 (en) * 2007-10-11 2009-04-16 Google Inc. Methods and Systems for Classifying Search Results to Determine Page Elements
CN101261629A (en) * 2008-04-21 2008-09-10 上海大学 Specific information searching method based on automatic classification technology
CN101604324A (en) * 2009-07-15 2009-12-16 中国科学技术大学 A kind of searching method and system of the video service website based on unit search
CN102236691A (en) * 2010-05-04 2011-11-09 张文广 Precision guided searching tool system
CN102236719A (en) * 2011-07-25 2011-11-09 西交利物浦大学 Page search engine based on page classification and quick search method
CN102317943A (en) * 2011-07-29 2012-01-11 华为技术有限公司 Method and device for full-text search
CN102982175A (en) * 2012-12-17 2013-03-20 北京奇虎科技有限公司 Method for performing search by utilizing browser and browser

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016101737A1 (en) * 2014-12-22 2016-06-30 北京奇虎科技有限公司 Search query method and apparatus
CN104778606A (en) * 2015-04-10 2015-07-15 北京京东尚科信息技术有限公司 Account structure data processing method and device
CN107436871A (en) * 2016-05-25 2017-12-05 北京搜狗科技发展有限公司 A kind of data search method, device and electronic equipment
CN106547869A (en) * 2016-10-25 2017-03-29 广东亿迅科技有限公司 The construction method and device of multiserver index
CN107463711B (en) * 2017-08-22 2020-07-28 山东浪潮云服务信息科技有限公司 Data tag matching method and device
CN107463711A (en) * 2017-08-22 2017-12-12 山东浪潮云服务信息科技有限公司 A kind of tag match method and device of data
CN107729486A (en) * 2017-10-17 2018-02-23 北京奇艺世纪科技有限公司 A kind of video searching method and device
CN107729486B (en) * 2017-10-17 2021-02-09 北京奇艺世纪科技有限公司 Video searching method and device
CN107943893A (en) * 2017-11-16 2018-04-20 北京奇安信科技有限公司 A kind of search processing method and device based on internet
CN109165316A (en) * 2018-09-10 2019-01-08 深圳市轱辘汽车维修技术有限公司 A kind of method for processing video frequency, video index method, device and terminal device
CN109947899A (en) * 2019-02-18 2019-06-28 北京明略软件系统有限公司 A kind of keyword retrieval method, system, terminal and storage medium
CN110399339A (en) * 2019-06-18 2019-11-01 平安科技(深圳)有限公司 File classifying method, device, equipment and the storage medium of knowledge base management system
CN110347920A (en) * 2019-07-02 2019-10-18 北京纵横无双科技有限公司 A kind of search matching method and device of health and fitness information
CN112287205A (en) * 2020-03-23 2021-01-29 北京来也网络科技有限公司 Document retrieval method, device, equipment and storage medium combining RPA and AI
CN111915392A (en) * 2020-06-30 2020-11-10 深圳市世强元件网络有限公司 Classified display method for search results of electronic commerce platform of components
CN111951077A (en) * 2020-08-13 2020-11-17 中国民航信息网络股份有限公司 Ticket buying scheme display method and system

Similar Documents

Publication Publication Date Title
CN104123366A (en) Search method and server
US9317613B2 (en) Large scale entity-specific resource classification
TWI482037B (en) Search suggestion clustering and presentation
US10146862B2 (en) Context-based metadata generation and automatic annotation of electronic media in a computer network
US9928296B2 (en) Search lexicon expansion
US8380697B2 (en) Search and retrieval methods and systems of short messages utilizing messaging context and keyword frequency
CN101154224B (en) Websites navigation method and system thereof
US9864768B2 (en) Surfacing actions from social data
US20160034514A1 (en) Providing search results based on an identified user interest and relevance matching
US20130212090A1 (en) Similar document detection and electronic discovery
US8977625B2 (en) Inference indexing
US20110307432A1 (en) Relevance for name segment searches
WO2004025391A2 (en) System and method of searching data utilizing automatic categorization
CN102999625A (en) Method for realizing semantic extension on retrieval request
CN103577489A (en) Method and device of searching web browsing history
CN104715064A (en) Method and server for marking keywords on webpage
CN103577490A (en) Method and device of showing web browsing history
CN103324622A (en) Method and device for automatic generating of front page abstract
CN104428769A (en) Information providing text reader
US20130031075A1 (en) Action-based deeplinks for search results
CN105224624A (en) A kind of method and apparatus realizing down the quick merger of row chain
WO2018013400A1 (en) Contextual based image search results
US20080281811A1 (en) Method of Obtaining a Representation of a Text
KR100671077B1 (en) Server, Method and System for Providing Information Search Service by Using Sheaf of Pages
CN112035723A (en) Resource library determination method and device, storage medium and electronic device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20141029

RJ01 Rejection of invention patent application after publication