CN101025737B - Attention degree based same source information search engine aggregation display method

Info

Publication number: CN101025737B
Application number: CN2006100079057A
Authority: CN
Inventors: 王东
Original assignee: Individual
Current assignee: Individual
Priority date: 2006-02-22
Filing date: 2006-02-22
Publication date: 2011-08-17
Anticipated expiration: 2026-02-22
Also published as: WO2007095834A1; US8176029B2; CN101025737A; US20090094213A1

Abstract

The invention relates to an attention-based method and system for aggregating and displaying same-source (homologous) information in a search engine. The search engine finds all target sites that match the query as the original search results; aggregates them into a "title search result" according to content quality, the account information of purchasers of display weight, quality of service, and other factors; and shows only the title search result to the searcher as the final result, revealing the full set of results only when the searcher asks to view them. The system also uses web browsers that cooperate with a statistics server: each browser converts all of a user's operations into a PageFocus score for the web page and sends it back to the statistics server as a measure of the page's content quality, which the search engine can use to select the title search result and to order the displayed results. The invention further relates to a method that automatically determines the user's state and serves web pages with an appropriate style and content.

Description

Homologous information search engine aggregation display method based on attention rate

Technical field

The present invention relates to computer network technology, and in particular to search engine technology that uses computers to provide search services on the Internet or on an enterprise intranet. The invention also relates to a system for obtaining the attention that users pay to web pages, and to an apparatus and method for adapting the content and style of a website.

Background technology

The Internet currently contains a large number of "web pages or network services from the same (or a similar) source", for example: 1. articles, opinions and information pages written by one individual or organization and copied widely; 2. news reports gathered (or published) by one individual or organization and copied widely; 3. posts by one individual or organization that are reposted across BBS forums; 5. multimedia files produced by one individual or organization in different data formats and compression ratios; 6. executable programs, data and design documents produced by one individual or organization; 7. information content produced in other ways and copied widely. Current search engines list these pages or services one by one in their results, where they take up a great deal of space with identical content and make the results inconvenient to browse.

Existing search engines and web page ranking services measure the popularity of a web page only by click traffic and page dwell time. The main approaches are: 1) Search engines such as Google and Baidu, which calculate the popularity of a page from clicks on search results. 2) Site ranking services such as Alexa, which rely on toolbar software embedded in the browser to send the user's hyperlink clicks and page dwell time back to a server (the parameters include the current page address and the page open time) but include no other means of assessment. The working principle of Alexa is described at:

http://www.singtaonet.com/it/it?sp/t20051110?43674.html，

http://www.people.com.cn/GB/it/8219/41552/41597/3109586.html。

Existing websites fall into the following categories:

Category one: all site content has the same style and content for every user at a given moment (for example, news websites).

Category two: the site can show different styles and content according to the user's settings (for example, Google's news site).

However, these websites cannot provide different display styles and content in real time according to the user's current state.

Summary of the invention

To remedy the deficiencies described above, the invention provides a search method that aggregates search results which have identical content, and therefore the same value to the searcher, into a single record, the title search result, together with apparatus and methods for expanding the other results on demand. So that frequent clicks on the title search result do not overload and paralyze the target server, the invention also provides apparatus and methods that automatically distribute those clicks across the other search result targets. The invention further provides a system in which web browsers cooperate with a statistics server on the network: all of the user's operations are converted into a score for the web page and sent back to the statistics server as a measure of the attention paid to the page, which can then serve as a ranking method and tool for the search engine. Finally, the invention provides a method that uses whatever obtainable information helps to determine the user's environment and state, so that at the same moment the same website, and even the same page, presents different display styles and content to users in different states.

To achieve these aims, a search method for the aggregated display of homologous information by a search engine comprises the following steps:

(1) The user accesses the search engine through a web browser or application software and enters the keywords to be queried;

(2) The search engine finds all target sites that satisfy the query as the original search results;

(3) The "homologous information processing module" queries the account information of purchasers of the right to "become the title search result" and, in combination with other decision rules, selects from the original search results the object to be used as the "title search result";

(4) The search engine's web server or application server shows the user only the selected "title search result" as the search result, and provides a "button" meaning "expand to view details or other information";

(5) If the user presses that "button", the search engine then shows the original search results found in step (2).

The "homologous information processing module" consists of several per-category homologous information processing modules, for example: the "homologous web page processing module", "homologous multimedia processing module", "homologous picture processing module", "homologous document processing module", "homologous software processing module", "homologous data or database processing module", "homologous GIS information processing module", "same-value network service processing module", "same-value business information processing module", and so on.

The "homologous information processing module" performs the following steps:

(1) The "information category judgment module" first determines the category of the information received by the web searcher;

(2) Information of the same category is sent together to the homologous information processing module for that category;

(3) After processing by that module, the search information is filed in the "non-homologous result database" or the "homologous result database" for the category;

(4) The system publishes the "non-homologous result database" and the "homologous result database" to the web server for users to query. As an alternative implementation, a query service based on dynamic web pages can be provided to users directly from these two databases.

The "homologous web page processing module" processes web page information in the following steps:

(1) When the "search engine search part" receives a keyword to be queried, the "decision device for whether search results have been published on the web server" first determines whether the keyword has recently been queried by someone else. If it has, and the results are already published on the "search engine search result web server", the search results are returned directly, with the web pages in them that share a source aggregated into one search result. After clicking the "homologous web pages" button, the user sees on the "search engine search result web server" another result page containing all of the search results, and the query process ends;

(2) If, when the "search engine search part" receives the keyword, the decision device determines that the keyword has not recently been queried by anyone else and that no corresponding query result is published on the "search engine search result web server", then:

A. The "web page searcher" is started to search the "non-homologous web page result database" and the "homologous web page result database" for web page addresses that match the search keyword, and obtains the content of those pages;

B. If the "web page searcher" finds no matching web page address in either database, the result "no matching web pages" is returned to the user and the keyword is added to the next round of update tasks for the two databases. If a matching web page address is found during the update, it is stored in the "non-homologous web page result database" or the "homologous web page result database" according to whether it has homologous web pages, so that the next person who searches for the same keyword will find the result;

(3) The "web page content separator" decomposes the content and hyperlink targets of the pages found into categories such as multimedia, pictures, text and hyperlinks;

(4) Each content decision device produces its own decision result:

A. The "multimedia content decision device" produces the page's "same multimedia file degree SMS (Same Media Score)";

B. The "picture content decision device" produces the page's "same picture degree SPS (Same Photo Score)";

C. The "text content decision device" produces the page's "same text degree STS (Same Text Score)";

D. The "link content decision device" produces the page's "same hyperlink degree SHS (Same Hyperlinks Score)";

(5) The "multimedia decision weight SMP", "picture decision weight SPP", "text decision weight STP" and "link decision weight SHP" are obtained from the "homologous web page decision rule base" and multiplied respectively by the SMS, SPS, STS and SHS values produced in step (4);

(6) The products obtained in step (5) are added to give the page's "homology degree SSS (Same Source Score)": SSS = (SMS*SMP) + (SPS*SPP) + (STS*STP) + (SHS*SHP);

(7) The page's "homology degree SSS" is compared with a threshold: if it exceeds the threshold, the page is judged to be a "homologous web page" of the other pages; otherwise it is judged to be a "non-homologous web page";

(8) The "non-homologous web pages" produced in step (7) are stored in the "non-homologous web page result database" by the "non-homologous web page processing module", and the "homologous web pages" produced in step (7) are stored in the "homologous web page result database" by the "homologous web page processing module";

(9) The "search result web page publisher" dynamically generates static search result pages from the contents of the "homologous web page result database" and the "non-homologous web page result database" and publishes them to the "search engine search result web server", from which they are presented to the user through the browser;

(10) As an alternative to step (9), the results can be presented to the user through the browser directly by a "dynamic web page web server".
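The weighted sum and threshold test of steps (5) to (7) can be sketched in Python as follows. This is a minimal illustration rather than the patent's implementation: the class names, the example weights and the threshold value are assumptions, since the text gives neither the contents of the decision rule base nor a threshold.

```python
from dataclasses import dataclass


@dataclass
class SameScores:
    sms: float  # Same Media Score
    sps: float  # Same Photo Score
    sts: float  # Same Text Score
    shs: float  # Same Hyperlinks Score


@dataclass
class DecisionWeights:
    smp: float  # multimedia decision weight
    spp: float  # picture decision weight
    stp: float  # text decision weight
    shp: float  # link decision weight


def same_source_score(s: SameScores, w: DecisionWeights) -> float:
    """SSS = (SMS*SMP) + (SPS*SPP) + (STS*STP) + (SHS*SHP)."""
    return s.sms * w.smp + s.sps * w.spp + s.sts * w.stp + s.shs * w.shp


def is_homologous(s: SameScores, w: DecisionWeights, threshold: float) -> bool:
    """Step (7): the page is homologous when SSS exceeds the threshold."""
    return same_source_score(s, w) > threshold


# Hypothetical values, chosen only to exercise the formula.
scores = SameScores(sms=1.0, sps=0.5, sts=0.8, shs=0.0)
weights = DecisionWeights(smp=0.1, spp=0.2, stp=0.6, shp=0.1)
print(same_source_score(scores, weights), is_homologous(scores, weights, 0.5))
```

Keeping the weights separate from the scores mirrors the text, where the weights come from a rule base and the scores come from the individual content decision devices.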

The processing by the "homologous information processing module" may also comprise the following steps:

(1) The user's search keywords are received, and software determines from the keyword content and keyword syntax which files or network services are to be searched for;

(2) It is determined whether the content being searched for has already been published on the web server. If the search target has been published on the "search engine search result web server", the search results are returned directly: the entry points to files or network services that match the search condition and share a source are aggregated into one "title search result". After clicking the "homologous files" button, the user sees on the "search engine search result web server" another page containing all of the results that match the query, and the search process ends. If the search target has not been published on the "search engine search result web server", processing continues from step (3);

(3) The prompt "no matching results" is returned to the user;

(4) The search keyword is added to the next round of update tasks for the "homologous information index database" and the "non-homologous information index database", and the update process for the two databases is started periodically;

(5) The update process for the "homologous information index database" and the "non-homologous information index database" is:

A. The searcher finds newly appeared target files and web page or service entry points, and software follows each entry point to obtain the file or network service;

B. The "content decision device" determines whether the newly found information has the same content as an entry in the current "homologous information index database". If so, it is added to that class of the "homologous information index database" as a new element. If not, the "content decision device" determines whether it has the same content as an entry in the current "non-homologous information index database";

C. If so: a new class is created for the current information and the homologous information stored in the "non-homologous information index database", and all of it is moved to the "homologous information index database";

D. If not: a new class is created for the current information and stored in the "non-homologous information index database";

(6) The "search result web page publisher" dynamically generates static search result pages from the contents of the "homologous web page result database" and the "non-homologous web page result database" and publishes them to the "search engine search result web server", from which they are presented through the browser to users who search;

(7) As an alternative to step (6), the results can be presented to the user through the browser directly by a "dynamic web page web server".
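The index update of step (5), parts B to D, can be sketched in Python as follows. This is an illustration under stated assumptions: the two index databases are modeled as in-memory lists, `same_content` stands in for the "content decision device", and the function name and return labels are hypothetical.

```python
def update_indexes(item, homologous, non_homologous, same_content):
    """Place a newly found item into one of the two indexes.

    homologous: list of classes, each a list of items judged to share content.
    non_homologous: list of items that so far have no same-content partner.
    same_content: callable(a, b) -> bool, a stand-in for the content decision device.
    """
    # B. Same content as an existing homologous class: join that class.
    for cls in homologous:
        if same_content(item, cls[0]):
            cls.append(item)
            return "joined"
    # C. Same content as a non-homologous item: create a class and move both.
    for i, other in enumerate(non_homologous):
        if same_content(item, other):
            homologous.append([non_homologous.pop(i), item])
            return "promoted"
    # D. No match anywhere: store as a new non-homologous entry.
    non_homologous.append(item)
    return "new"


homologous, non_homologous = [], []
for doc in ["Report", "report", "Other", "REPORT"]:
    update_indexes(doc, homologous, non_homologous,
                   same_content=lambda a, b: a.lower() == b.lower())
print(homologous, non_homologous)
```

Comparing only against the first member of each class assumes that "same content" is transitive, which the text does not state.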

When the homologous information processing module handles documents, the update process for the "homologous information index database" and the "non-homologous information index database" is:

A. The "document searcher" finds newly appeared document files and web page or link entry points, and software follows each entry point to obtain the document or service;

B. The "text content decision device" and the "picture content decision device" determine whether the newly found document has the same content as an entry in the current "homologous document index database". If so, it is added to that class of the "homologous document index database" as a new element. If not, the "document content decision device" determines whether it has the same content as an entry in the current "non-homologous document index database";

C. If so: a new class is created for the current document and the homologous document stored in the "non-homologous document index database", and all of it is moved to the "homologous document index database". If not: a new class is created for the current document and stored in the "non-homologous document index database";

The content decision device modules involved perform the following steps:

(1) Receive the "judged objects": multimedia from several sources can be received, and the number of judged objects is recorded as InputQuantity;

(2) Look up one attribute of the judged objects taking part in the comparison, and record as SameQuantity the number of judged objects that have the same value for the current attribute;

(3) Input the "weight" value Power of the current attribute in the decision process;

(4) Calculate the degree of agreement of all judged objects on the current attribute: PSame = SameQuantity*Power;

(5) Return to step (1) and carry out steps (1) to (4) for the next attribute to obtain its PSame, until the PSame values of all attributes have been obtained;

(6) Calculate and return the same-content degree of the judged objects: SameMediaPower = (sum of all PSame values)/InputQuantity.
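Steps (1) to (6) can be sketched in Python as follows. The text does not say exactly how SameQuantity is counted, so this sketch assumes it is the size of the largest group of judged objects sharing one value for the attribute, and zero when no two objects agree. The function name, attribute names and weights are hypothetical.

```python
from collections import Counter


def same_content_degree(objects, attribute_weights):
    """Return SameMediaPower = sum(PSame) / InputQuantity.

    objects: list of dicts mapping attribute name -> hashable value.
    attribute_weights: dict mapping attribute name -> Power.
    """
    input_quantity = len(objects)
    if input_quantity == 0:
        return 0.0
    total = 0.0
    for attribute, power in attribute_weights.items():
        counts = Counter(obj[attribute] for obj in objects if attribute in obj)
        largest_group = max(counts.values(), default=0)
        # Assumption: a value held by only one object does not count as "same".
        same_quantity = largest_group if largest_group > 1 else 0
        total += same_quantity * power  # PSame for this attribute
    return total / input_quantity


files = [
    {"duration": 180, "codec": "mp3"},
    {"duration": 180, "codec": "wma"},
    {"duration": 180, "codec": "mp3"},
]
print(same_content_degree(files, {"duration": 0.7, "codec": 0.3}))
```

With weights that sum to 1, the result under this assumption stays between 0 and 1, which makes it easy to compare against a threshold.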

When the content decision device module is the text content decision device, it performs the following steps:

(1) Find the total length SameLenth of the parts of the text content that contain identical words or sentences;

(2) Among the input texts, find the length MinLenth of the shortest input text;

(3) Return the text similarity value SameTextPower = SameLenth/MinLenth.
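A minimal Python sketch of this ratio for two texts follows. The text does not specify how the identical parts are found; this sketch substitutes the matching blocks reported by the standard library's `difflib.SequenceMatcher`, which is an assumption, and the function name is hypothetical.

```python
from difflib import SequenceMatcher


def same_text_power(text_a: str, text_b: str) -> float:
    """Return SameLenth / MinLenth for two texts."""
    min_length = min(len(text_a), len(text_b))  # MinLenth
    if min_length == 0:
        return 0.0
    matcher = SequenceMatcher(None, text_a, text_b, autojunk=False)
    # Sum the sizes of all matching blocks; the final sentinel block has size 0.
    same_length = sum(block.size for block in matcher.get_matching_blocks())  # SameLenth
    return same_length / min_length


print(same_text_power("abcdef", "abcxef"))
```

Dividing by the shorter text means that a short excerpt copied in full from a long article scores close to 1, which suits the goal of detecting reposted content.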

When the content decision device module is the link content decision device, it performs the following steps:

(1) Receive the "judged objects": the URL addresses of several hyperlinks;

(2) Compute the similarity of the judged objects: SameURLPower = the number of target URL addresses that appear on every one of the pages pointed to by the judged hyperlinks;

(3) Return SameURLPower.
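Read as a set intersection, the link comparison can be sketched in Python as follows. The sketch assumes the outgoing links of each target page have already been fetched and parsed; the function name and the example URLs are hypothetical.

```python
def same_url_power(outlinks_by_page):
    """Count the target URLs that appear on every judged page.

    outlinks_by_page: dict mapping each judged hyperlink's URL to an
    iterable of the URLs found on the page it points to.
    """
    link_sets = [set(links) for links in outlinks_by_page.values()]
    if not link_sets:
        return 0
    return len(set.intersection(*link_sets))


pages = {
    "http://a.example/post": ["http://x.example/1", "http://x.example/2", "http://x.example/3"],
    "http://b.example/copy": ["http://x.example/2", "http://x.example/3", "http://x.example/4"],
}
print(same_url_power(pages))
```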

When the content decision device module is the business information content decision device, it performs the following steps:

(1) Compare whether the business information items being compared concern the same product or service. If not, return "inconsistent"; if so, go to step (2).

(2) Determine whether the business information being compared is sensitive to geographic location. If not, return the result "consistent"; if so, go to step (3).

(3) Determine whether the suppliers of the business information being compared are in the same city or region. If not, return the result "inconsistent"; if so, return the result "consistent".
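The three-step decision can be sketched in Python as follows. The record fields and function name are hypothetical, and treating a pair as location-sensitive when either item is flagged is an assumption that the text does not settle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessInfo:
    product: str              # product or service identifier
    location_sensitive: bool  # whether the offer depends on geographic location
    city: str                 # supplier's city or region


def business_info_consistent(a: BusinessInfo, b: BusinessInfo) -> bool:
    """Return True for "consistent", False for "inconsistent"."""
    # Step (1): different products or services are never consistent.
    if a.product != b.product:
        return False
    # Step (2): without location sensitivity, the same product is enough.
    if not (a.location_sensitive or b.location_sensitive):
        return True
    # Step (3): location-sensitive offers must share a city or region.
    return a.city == b.city


print(business_info_consistent(
    BusinessInfo("pizza delivery", True, "Beijing"),
    BusinessInfo("pizza delivery", True, "Shanghai"),
))
```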

The "title search result" is selected as follows:

(1) Calculate the probability weight PWn with which each "homologous search result" becomes the "title search result":

PWn = TP*PageFocus/(RespDelay-K)

n: the index of the search result (the n-th result)

When (RespDelay-K) is less than or equal to zero, (RespDelay-K) is taken to be 1

PageFocus: the web page attention value

RespDelay: the response delay of the web service

K: the service response constant; a value of 50 milliseconds (ms) is suggested.

TP: the title search result weight (power)

(2) Sum the probability weights PWn of all the original "homologous search results" to obtain the total probability weight PWall;

(3) Calculate the probability that each "homologous search result" becomes the "title search result": Pn = PWn/PWall;

(4) As searchers visit, dynamically select the "title search result" at random according to the probabilities Pn and present it to the searcher.
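The selection procedure above can be sketched in Python as follows. The class and function names are hypothetical, the delay is assumed to be in milliseconds to match the suggested K, and the uniform fallback for an all-zero total weight is an assumption not found in the text.

```python
import random
from dataclasses import dataclass

K_MS = 50.0  # service response constant K; the text suggests 50 ms


@dataclass
class HomologousResult:
    url: str
    tp: float             # TP
    page_focus: float     # PageFocus
    resp_delay_ms: float  # RespDelay


def probability_weight(result, k_ms=K_MS):
    """PWn = TP*PageFocus/(RespDelay-K), with a non-positive divisor taken as 1."""
    denominator = result.resp_delay_ms - k_ms
    if denominator <= 0:
        denominator = 1.0
    return result.tp * result.page_focus / denominator


def selection_probabilities(results, k_ms=K_MS):
    """Pn = PWn/PWall for every result."""
    weights = [probability_weight(r, k_ms) for r in results]
    total = sum(weights)  # PWall
    if total <= 0:
        # Assumption: fall back to a uniform choice when no result has any weight.
        return [1.0 / len(results)] * len(results) if results else []
    return [w / total for w in weights]


def pick_title_result(results, rng=random, k_ms=K_MS):
    """Randomly choose the title search result according to Pn."""
    if not results:
        raise ValueError("no homologous results to choose from")
    probabilities = selection_probabilities(results, k_ms)
    return rng.choices(results, weights=probabilities, k=1)[0]
```

Because the choice is random on each visit, clicks on the title result are spread across the homologous targets in proportion to Pn, which matches the stated aim of not overloading a single target server.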

The probability weight PWn of the "title search result" can alternatively be calculated as:

a. PWn = (TP+PageFocus)/(RespDelay-K), or

b. PWn = (TP+PageFocus)/RespDelay/K, or

c. PWn = TP*PageFocus/RespDelay/K.

The "homologous information processing module":

A. can be embedded in the search engine;

B. can be placed between the "search engine" and the "search engine search result web server";

C. can also be placed, as a preprocessing module, between the "search engine" and the searched websites.

The button meaning "expand to view details or other information" can be a hyperlink or any kind of software interface control.

A system for obtaining the attention that users pay to web pages in search results comprises a PageFocus network server, a PageFocus web browser and a web page scoring server.

The PageFocus network server comprises a PageFocus browser ID registration server, the PageFocusAccServer web page attention statistics server, a PageFocus browser online upgrade server, and a data encryption and decryption module;

The PageFocus web browser comprises a PageFocus browser ID registration module and an attention score (PageFocus) calculation module.

It works as follows:

(1) Each "PageFocus web browser" either has a globally unique ID when it is installed, or actively contacts the "PageFocus browser ID registration server" on the network during use to obtain one;

(2) The "PageFocus web browser" has the functions of a conventional web browser. In addition, it converts the user's operations on the browser and on the web page, according to their weights, into an "attention score PageFocus" for the page, forms a "PageFocus packet", and transmits it in encrypted form over a network protocol to the search engine's "PageFocusAccServer web page attention statistics server";

(3) On receiving a "PageFocus packet" from any "PageFocus web browser" in the world, the "PageFocusAccServer web page attention statistics server" adds the "attention score PageFocus" it contains to the score of the corresponding web page;

(4) The "PageFocusAccServer web page attention statistics server" thus holds the "attention score PageFocus" of web pages worldwide. Processed in various ways, this information can serve as the basis on which the search engine ranks web pages, as the basis on which it selects the "title search result" among search results with identical content, or it can be published directly as a "web page popularity ranking" service.

The PageFocusAccServer web page attention statistics server may record scores using logarithms or scientific notation.

A PageFocus packet may be formed when the browser finally closes the web page, at regular intervals, or once a certain score has accumulated.

The attention score PageFocus is formed according to the weights listed in the following table:

Note:

1. The weight values in the table are those of one embodiment; other values may also be used and fall within the scope of the invention.

The text reading speed is calculated as follows:

A. Mouse wheel scrolling: text reading speed = (display area width / set width) * number of text lines scrolled each time / interval between scrolls;

B. Keyboard page turning: text reading speed = (display area width / set width) * number of text lines per page turn / interval between page turns;

C. Window scroll bar scrolling: text reading speed = (display area width / set width) * number of text lines scrolled each time / interval between scrolls.
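All three cases share one formula, which can be sketched in Python as follows. The function and parameter names are hypothetical, and the units (pixels for widths, seconds for the interval) are assumptions, since the text does not state them.

```python
def text_reading_speed(display_width, set_width, lines_per_action, interval):
    """Reading speed = (display width / set width) * lines per action / interval.

    lines_per_action: text lines moved by one wheel scroll, page turn or
    scroll-bar movement. interval: time between two such actions.
    """
    if set_width <= 0 or interval <= 0:
        raise ValueError("set_width and interval must be positive")
    return (display_width / set_width) * lines_per_action / interval


# A window twice the set width, scrolled 3 lines every 2 seconds.
print(text_reading_speed(800, 400, 3, 2))  # 3.0 lines per second at the set width
```

Scaling by the width ratio normalizes the line count, so that a reader in a wide window and a reader in a narrow one can be compared on the same basis.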

A PageFocus packet contains the following fields: the PageFocus browser ID, the web page URL, and the web page's PageFocus score.
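The packet fields and the accumulation performed by the statistics server can be sketched in Python as follows. This is a simplified illustration: encryption, network transport and the logarithmic score storage mentioned in the text are omitted, and all class and method names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PageFocusPacket:
    browser_id: str    # PageFocus browser ID
    url: str           # web page URL
    page_focus: float  # PageFocus score reported for the page


class PageFocusAccumulator:
    """In-memory stand-in for the web page attention statistics server."""

    def __init__(self):
        self._totals = defaultdict(float)

    def receive(self, packet: PageFocusPacket) -> None:
        # Add the reported score to the running total for that page.
        self._totals[packet.url] += packet.page_focus

    def total(self, url: str) -> float:
        return self._totals.get(url, 0.0)

    def ranking(self):
        # Pages ordered from most to least attention.
        return sorted(self._totals.items(), key=lambda item: item[1], reverse=True)


server = PageFocusAccumulator()
server.receive(PageFocusPacket("browser-1", "http://a.example/", 5.0))
server.receive(PageFocusPacket("browser-2", "http://a.example/", 2.5))
server.receive(PageFocusPacket("browser-1", "http://b.example/", 4.0))
print(server.ranking())
```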

When a web page that has "homologous web pages" takes part in the page ranking provided by the search engine, the sum of the user attention PageFocus scores obtained by all of its "homologous web pages" can be used as the basis for ranking. That is: A. the "title search result" of a group of "homologous web pages" can use the sum of the user attention PageFocus obtained by every page in the group as its ranking basis when it takes part in search result ranking; B. each individual page in the group can likewise use the sum of the user attention PageFocus obtained by the pages of its group as its ranking basis.

A method for automatically determining the user's state and providing web pages with an appropriate style and content comprises the following steps:

(1) When the "website server cluster entrance" receives a user's first request for a page of the website, it first obtains the user's IP address from the access protocol or the IP-layer protocol;

(2) It looks the IP address up in the "IP address attribute database" to determine whether it is a "workplace IP address" or an "IP address of a personal or leisure location". For a workplace IP address, go to step (3); for a personal or leisure IP address, go to step (4);

(3) It obtains the geographic location of the workplace IP address and the local administrative time of that region. If the region is currently within working hours, the visit is assigned to the "work style server", which serves pages suited to use at work; otherwise, go to step (4);

(4) The visit is assigned to the "personal and leisure style server", which serves pages suited to a personal and leisure state.
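The routing decision can be sketched in Python as follows. The sketch makes several assumptions: the "IP address attribute database" is a small in-memory table holding a category and a fixed UTC offset per address, working hours are Monday to Friday from 9:00 to 18:00 as in the document's later description of working time, and unknown addresses are treated as personal. All names and addresses are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical IP attribute table: address -> (category, UTC offset in hours).
IP_ATTRIBUTES = {
    "192.0.2.10": ("workplace", 8),
    "198.51.100.7": ("personal", 8),
}


def is_working_time(local_time: datetime) -> bool:
    """Monday to Friday, from 9:00 up to (but not including) 18:00."""
    return local_time.weekday() < 5 and 9 <= local_time.hour < 18


def choose_style_server(ip: str, now_utc: datetime) -> str:
    """Return "work" or "leisure" for the request's source address."""
    # Assumption: addresses missing from the table are treated as personal.
    category, utc_offset = IP_ATTRIBUTES.get(ip, ("personal", 0))
    if category == "workplace":
        local_time = now_utc.astimezone(timezone(timedelta(hours=utc_offset)))
        if is_working_time(local_time):
            return "work"
    return "leisure"


# 02:00 UTC on Wednesday 2006-02-22 is 10:00 local time at UTC+8.
print(choose_style_server("192.0.2.10", datetime(2006, 2, 22, 2, 0, tzinfo=timezone.utc)))
```

A production system would more likely resolve the address to a named time zone so that daylight saving time is handled; the fixed offset keeps the sketch self-contained.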

With the scheme above, search results that have identical content and the same value to the searcher are aggregated into a single record, the title search result, and the other results are expanded only on demand. The design also prevents frequent clicks on the title search result from overloading and paralyzing the target server, by automatically distributing those clicks across the other search result targets. Beyond the functions of existing search engines, the invention can also search various kinds of "multimedia", "documents", "software", "hardware and software source code or design documents", "data or databases" and "information" network services, for example file sharing, FTP services and P2P services.

Web browsers that cooperate with a statistics server on the network convert all of a user's operations into a score for the web page and send it back to the statistics server as a measure of the attention paid to the page, which can then serve as a ranking tool for the search engine.

The website content and style adaptation method serves users according to the following needs:

1. 9:00 to 18:00 from Monday to Friday is working time, when people at work need to see a concise, relatively rigorous style and, as far as possible, work-related content.

2. 18:00 to 9:00 the next morning from Monday to Friday, and all day Saturday and Sunday, is leisure time, when people at leisure need to see a lively, relaxed style and content.

3. People in a workplace need to see a concise, relatively rigorous style and, as far as possible, work-related content.

4. People at home or in leisure places need to see a lively, relaxed style and content.

5. People in other environments or states need to see a style and content suited to their environment and state at the time.

Brief description of the drawings

Fig. 1 is a structural diagram of the operation of the system for the aggregated display of homologous information by a search engine;

Fig. 2 is an internal structure diagram of the homologous information processing module;

Fig. 3 is a flow chart of the homologous web page processing module;

Fig. 4 is a flow chart of the homologous multimedia processing module;

Fig. 5 is a flow chart of the homologous picture processing module;

Fig. 6 is a flow chart of the homologous document processing module;

Fig. 7 is a flow chart of the homologous software processing module;

Fig. 8 is a flow chart of the homologous data or database processing module;

Fig. 9 is a flow chart of the homologous GIS information processing module;

Fig. 10 is a flow chart of the same-value network service processing module;

Fig. 11 is a flow chart of the same-value business information processing module;

Fig. 12 is a structural diagram of the system for obtaining web page user attention;

Fig. 13 shows an existing conventional search engine website system without content and style adaptation;

Fig. 14 shows a search engine website system with the content and style adaptation of the present invention.

Embodiment

The invention is now described further with reference to the accompanying drawings.

Fig. 1 is a structural diagram of the operation of the system for the aggregated display of homologous information by a search engine. Step 1: the user accesses the search engine through a web browser or application software and enters the keywords to be queried. Step 2: the search engine finds all target sites that satisfy the query as the "original search results". Step 3: the "homologous information processing module" queries the account information of purchasers of the right to "become the title search result" and, in combination with other decision rules, selects from the "original search results" the object to be used as the "title search result". The module A. can be embedded in the search engine; B. can be placed between the "search engine" and the "search engine search result web server"; C. can also be placed, as a preprocessing module, between the "search engine" and the searched websites. Step 4: the search engine's web server or application server shows the user only the selected "title search result" as the search result, and provides a "button" (a hyperlink or any kind of software interface control) meaning "expand to view details or other information". Step 5: only when the user wishes to expand a "title search result" further and presses the corresponding "button" does the search engine show the "original search results" found in step 2.
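The aggregated display described for Fig. 1 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: it assumes the original search results have already been tagged with a same-source group identifier, and the names `aggregate_results`, `group_id` and `choose_title` are hypothetical.

```python
def aggregate_results(original_results, choose_title):
    """Collapse (group_id, url) pairs into one title result per group.

    choose_title: callable that picks the title result from a group's URLs,
    standing in for the selection rules of step 3.
    """
    groups = {}
    for group_id, url in original_results:
        groups.setdefault(group_id, []).append(url)
    aggregated = []
    for urls in groups.values():
        aggregated.append({
            "title": choose_title(urls),  # shown to the user in step 4
            "expanded": list(urls),       # shown only after the button is pressed
        })
    return aggregated


original = [
    ("g1", "http://a.example/story"),
    ("g2", "http://b.example/other"),
    ("g1", "http://c.example/copy"),
]
page = aggregate_results(original, choose_title=lambda urls: urls[0])
for entry in page:
    print(entry["title"], len(entry["expanded"]))
```

Keeping the full group alongside the title result means the expand button of step 5 needs no second search.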

Fig. 2 is the internal structure diagram of the homologous information processing module. The "homologous information processing module" is defined as follows. 1) Its main use is to judge whether the group of information nodes found for a search keyword contains one or more duplicate sites that share the same information source (such sites have the same search value or use value to the inquirer and usually need not all be shown directly), to aggregate these duplicate sites into one search result for the inquirer, and to present the other sites of equal value only when the inquirer needs them. 2) Unlike existing search engines, which concentrate mainly on web page search, the "homologous information processing module" handles not only "Html web pages" but also "multimedia", "documents", "software", "hardware and software source code or design documents", "data or databases" and "information", as well as various network services, for example file sharing, FTP services and P2P services.

The "homologous information processing module" adopts a modular structure: each of its modules can be developed and implemented step by step as required, the structure is extensible, and each module can also further improve the accuracy of its automatic judgment. The modules include:

1 "information category judge module": judges the kind of information and sends information of the same type to the processing module for that type, such as the modules below.

2 "homology web page processing module": judges and handles web pages found that belong to the same source and have equal value to the inquirer, for example Html, ASP, JSP and PHP pages and the content of BBS forums.

3 "homology multimedia processing module": judges and handles multimedia files or network services found that belong to the same source and have equal value to the inquirer, for example multimedia files such as .MP3, .AVI, .WMV, .MPEG, .WAV and .RM, and video service access interfaces based on streaming media technology.

4 "homology picture processing module": judges and handles pictures found that belong to the same source or have identical content and have equal value to the inquirer, for example .GIF, .JPG, .BMP and .PNG.

5 "homology document process module": judges and handles document files of various formats, or network services, found that belong to the same source or have identical or related content and have equal value to the inquirer, for example ".Doc", ".Txt", ".Pdf", ".XLS" and ".PPT".

6 "homology software processing module": judges and handles computer application software installers found that belong to the same software by the same author; they may be installers for the same or different operating systems and of the same or different versions.

7 "homology data or database processing module": judges and handles data files or database files of known format found that belong to the same source or have identical content and have equal value to the inquirer, for example .DAT, .XLS, .MDF and .DBF.

8 "homology GIS message processing module": judges and handles digital map files or services found that belong to the same source or have identical content and have equal value to the inquirer.

9 "same-value network service processing module": judges and handles network services found that belong to the same source or have identical content and have equal value to the inquirer, for example FTP download services for the same file, IPTV services relaying the same TV station at the same time, and mail services that each provide 1 GB of capacity.

10 "same-value business information processing module": judges and handles advertising content for commercial products or services that is published over the Web, belongs to the same source or has identical content, lies in the same geographic or administrative region, and has equal value to the inquirer, for example egg sale information offered in the same block, haircut service sale information offered in the same block, and telephone communication services usable in the same city.

"information category judge module"

The "information category judge module" is mainly used to collect information, classify its type, and deliver it to the corresponding information processing module.

The information sources handled by the "information category judge module" take three main forms:

(1) Web page form: the information comes from the web page content of a site; the page may also contain hyperlinks that point to particular file types, for example: "http://www.008.org.cn/up/the_quiet_american.mp3"

(2) Network service form: the network service entries provided by various network servers, for example FTP file download services, the seed services of various P2P (Peer to Peer) software (for example BT download and eMule download), and news server services. A network service entry can become known in two ways:

A. Network services that can be found on web pages: the entry is learned by analyzing web page content.

B. The network service provider submits its network service entry or content directly to this search engine.

(3) Data or database form: the search engine directly provides an information entry service on the network, and network users submit their own information, which is finally stored in data file or database form. When this search engine is queried, information is extracted from that store to satisfy the inquirer's request.
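The third form, in which users submit information that is later extracted on query, might look like the following minimal sketch. It assumes an in-memory SQLite table and a simple substring match; the table name, column names and function names are all hypothetical, and the text does not say how the store is actually organized.

```python
import sqlite3


def open_store():
    """Create an in-memory store for user-submitted information."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE submissions ("
        "id INTEGER PRIMARY KEY, submitter TEXT, content TEXT)"
    )
    return conn


def submit(conn, submitter, content):
    """Information entry service: a network user submits a piece of information."""
    conn.execute(
        "INSERT INTO submissions (submitter, content) VALUES (?, ?)",
        (submitter, content),
    )
    conn.commit()


def search(conn, keyword):
    """On query, extract the stored entries whose content mentions the keyword."""
    rows = conn.execute(
        "SELECT submitter, content FROM submissions "
        "WHERE content LIKE ? ORDER BY id",
        (f"%{keyword}%",),
    )
    return rows.fetchall()


store = open_store()
submit(store, "shop-a", "fresh eggs for sale in the north block")
submit(store, "shop-b", "haircut service in the north block")
print(search(store, "eggs"))
```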

The kind of "form web page" information is determined as follows:

A web page itself can be output directly as a "webpage" to the "homology web page processing module" for handling. In addition, from the "hyperlink" syntax of the web page language (for example Html, Java, JSP, ASP, ASPX, PHP and so on), the "information category judge module" can directly parse the file type that a hyperlink points to and distinguish the information type by file type; see the following table for details:

For example:

1. The web page contains the hyperlink "Http://xxx/xxx/song.mp3": its target can be judged to be "multimedia" type information.

2. The web page contains the hyperlink "Http://xxx/xxx/song.rar": after the target file is found and decompressed, it is found to contain only "song.mp3", so the target can still be judged to be "multimedia" type information.

3. The web page contains the hyperlink "Http://xxx/xxx/song.rar": after the target file is found and decompressed, the number of files it contains and the name and size of every file and directory are found to be identical to the installation disc of a known software product, so it can be judged to be "software" type information.
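The three examples can be sketched as an extension lookup plus an archive check. The extension lists below are collected from the module descriptions in this section; because the table itself is not reproduced here, the mapping, the function names and the `(name, size)` manifest format are assumptions, and decompression is left outside the sketch.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions are taken from the module descriptions in this section. The
# referenced table is not reproduced, so this mapping is an assumption.
# ".XLS" is listed under both documents and data; it is placed with documents.
CATEGORY_BY_EXTENSION = {
    "multimedia": {".mp3", ".avi", ".wmv", ".mpeg", ".wav", ".rm"},
    "picture": {".gif", ".jpg", ".bmp", ".png"},
    "document": {".doc", ".txt", ".pdf", ".xls", ".ppt"},
    "data": {".dat", ".mdf", ".dbf"},
    "compressed": {".rar"},  # must be unpacked and classified again
}


def classify_link(url):
    """Return the information category implied by a hyperlink's file extension."""
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    for category, extensions in CATEGORY_BY_EXTENSION.items():
        if suffix in extensions:
            return category
    return "webpage"  # no recognised file type: treat the target as a web page


def classify_archive(members, known_software_manifests=()):
    """Classify an already unpacked archive from its (name, size) member list."""
    manifest = frozenset(members)
    # Example 3: same file names and sizes as a known installation disc.
    if any(manifest == frozenset(known) for known in known_software_manifests):
        return "software"
    # Example 2: every member falls in one category, such as a lone song.mp3.
    categories = {classify_link(name) for name, _size in members}
    return categories.pop() if len(categories) == 1 else "unknown"


print(classify_link("http://www.008.org.cn/up/the_quiet_american.mp3"))
print(classify_archive([("song.mp3", 4000000)]))
```

Comparing whole manifests is the strictest reading of example 3; a real judge module would probably tolerate small differences, which the text does not specify.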

The kind of "network service form" information is determined as follows:

The 1st step: access the service as an ordinary user to obtain its content.

The 2nd step: classify the obtained content according to the following table.

The 3rd step: if a compressed-format file is obtained, unpack it and then classify its content according to the 2nd step.

The kind of "data or database form" information is determined as follows:

The 1st step: access the data file or database to obtain its content.

The 2nd step: if the information obtained from the data file or database is a file, go directly to the 4th step.

The 3rd step: if the information obtained from the data file or database is the location where a file is stored, access that location to obtain the target file.

The 4th step: classify the obtained content according to the following table.

The 5th step: if a compressed-format file is obtained, unpack it and then classify its content according to the 4th step.

"homology web page processing module"

Fig. 3 is the flow chart of the "homology web page processing module". Its main function is as follows: web pages found for a search keyword that share the same main content are presented to the inquirer as one "title search result", and the inquirer can see the query result listing all found pages with the same main content through a button meaning "expand". To substantially improve the performance of the system, the following techniques are adopted:

Web page publishing: a "search result web page distributor" publishes search results in advance to the "search engine search results Web server", which directly answers search requests that have been queried before. This avoids the heavy computation of generating dynamic web pages from the database on every request.

The "homologous information processing module" stores results by category in the "non-homogeneous web results database" and the "homology web results database", and the "search result web page distributor" periodically publishes them to the "search engine search results Web server", which avoids repeated computation and reduces the computation wait time.

The processing flow of the "homologous information processing module" is as follows:

The 1st step: when the "search engine search part" receives a keyword to be queried, the "decision device for whether search results have been published on the Web server" first judges whether this keyword has recently been queried by someone else. If it has, and the result is published on the "search engine search results Web server", the search result is returned directly (see mark "M1" in the figure). In this result, web pages with the same source are aggregated into one search result; after clicking the "same-source web page" button, the inquirer can see on the "search engine search results Web server" another search result page containing all the search results, which completes the query process.

The 2nd step: if, when the "search engine search part" receives a keyword to be queried, the "decision device for whether search results have been published on the Web server" judges that this keyword has not recently been queried by anyone else and that no corresponding query result is published on the "search engine search results Web server", then:

Start the "Webpage search device" to search the "non-homogeneous web results database" and the "homology web results database" for web page addresses that match the search keyword, and obtain the content of these pages.

If the "Webpage search device" finds no web page address matching the search keyword in the "non-homogeneous web results database" and the "homology web results database", return to the inquirer the result "no eligible webpage" and add this search keyword to the next round of update tasks for the two databases. If a matching web page address is found during the update, it is stored in the "non-homogeneous web results database" or the "homology web results database" according to whether it has same-source web pages, so that the next person who searches for the same keyword finds the result.

The 3rd step: the "web page content separator" decomposes the content of the found web pages and their hyperlink targets into kinds such as multimedia, picture, text and hyperlink.

The 4th step: the content decision devices each produce a judgment result:

A. The "multimedia content decision device" produces the "identical multimedia file degree SMS" (Same Media Score) of the target web page (the definition of multimedia here includes: playback services or file services of the Flash type, playback services or file services for video/audio files, real-time information such as IPTV, direct broadcast satellite, audio-video monitoring, performances and manual answering, and other multimedia services).

B. The "image content decision device" produces the "identical picture degree SPS" (Same Photo Score) of the target web page.

C. The "word content decision device" produces the "identical text degree STS" (Same Text Score) of the target web page.

D. The "linked contents decision device" produces the "identical hyperlink degree SHS" (Same Hyperlinks Score) of the target web page.

The 5th step: obtain the "multimedia judgment weight SMP", "picture judgment weight SPP", "text judgment weight STP" and "link judgment weight SHP" from the "homology web page decision rule base", and multiply them respectively by the "identical multimedia file degree SMS", "identical picture degree SPS", "identical text degree STS" and "identical hyperlink degree SHS" generated in the 4th step.

The 6th step: add the products obtained in the 5th step to obtain the "homology degree SSS" (Same Source Score) of the web page:

SSS = (SMS*SMP) + (SPS*SPP) + (STS*STP) + (SHS*SHP)

The 7th step: judge whether the "homology degree SSS" of this web page exceeds the threshold. If it does, the page is judged to be a "same-source web page" of the other web page; if it does not, the page is judged to be a "non-homogeneous webpage".

The 8th step: each "non-homogeneous webpage" produced in the 7th step is stored in the "non-homogeneous web results database" by the "non-homogeneous webpage processing module"; each "same-source web page" produced in the 7th step is stored in the "homology web results database" by the "homology web page processing module".

The 9th step: the "search result web page distributor" generates static web pages of the search results from the content of the "homology web results database" and the "non-homogeneous web results database", publishes them to the "search engine search results Web server", and they are then presented to the inquiring user through the browser (see mark "M2" in the figure).

As another implementation of the 9th step, the results can also be presented to the inquiring user directly through the browser by a "dynamic web page Web server" (see mark "M3" in the figure).
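The 5th through 7th steps amount to a weighted sum followed by a threshold test. The sketch below follows the SSS formula given above; the class names, the sample scores, the sample weights and the threshold value are all assumptions chosen for illustration, since the text does not state concrete values.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    sms: float  # identical multimedia file degree (Same Media Score)
    sps: float  # identical picture degree (Same Photo Score)
    sts: float  # identical text degree (Same Text Score)
    shs: float  # identical hyperlink degree (Same Hyperlinks Score)


@dataclass
class Weights:
    smp: float  # multimedia judgment weight
    spp: float  # picture judgment weight
    stp: float  # text judgment weight
    shp: float  # link judgment weight


def same_source_score(s, w):
    """SSS = (SMS*SMP) + (SPS*SPP) + (STS*STP) + (SHS*SHP)."""
    return s.sms * w.smp + s.sps * w.spp + s.sts * w.stp + s.shs * w.shp


def is_same_source(s, w, threshold):
    """7th step: a page is same-source when SSS exceeds the threshold."""
    return same_source_score(s, w) > threshold


# Sample values only; real weights would come from the decision rule base.
scores = Scores(sms=0.5, sps=1.0, sts=0.75, shs=0.25)
weights = Weights(smp=0.25, spp=0.25, stp=0.25, shp=0.25)
print(same_source_score(scores, weights))
print(is_same_source(scores, weights, threshold=0.5))
```

Keeping the weights in a separate structure mirrors the text, where they are read from a rule base and can be tuned without touching the content decision devices.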

" web page contents sorter " can be realized by software, direct basis " Html grammer ", " ASP/ASPX grammer ", and " PHP ", the syntax parsing that uses on the various webpages such as " JSP " goes out the type of each content.

" homology multimedia processing module "

Fig. 4 is the flow chart of the "homology multimedia processing module". Multimedia files or services that match the search condition are all offered to the inquirer by the "homology multimedia processing module" as hyperlinks in an Html web page. To substantially improve the performance of the system, the following technique is adopted:

The "homologous information processing module" stores results by category in the "non-homogeneous multimedia index database" and the "homology multimedia index database", and the "search result web page distributor" periodically publishes them to the "search engine search results Web server", which avoids repeated computation and reduces the computation wait time.

The processing flow of the "homology multimedia processing module" is as follows:

The 1st step: receive the inquirer's search keyword; software judges from the keyword content and keyword syntax whether what is sought is a multimedia file or service (for example, a keyword containing "MP3" indicates that .MP3 files are sought rather than web pages containing that text).

The 2nd step: judge "has the content to be searched been published on the Web server?" If the search target has been published on the "search engine search results Web server", return the search result directly (see mark "M1" in the figure). In this result, the multimedia access interfaces that match the search condition and have the same source are aggregated into one "title search result"; after clicking the "same-source file" button, the inquirer can see on the "search engine search results Web server" another web page containing all the search results that match the query condition, which completes the search process. If the search target has not been published on the "search engine search results Web server", continue from the 3rd step.

The 3rd step: return to the inquirer the result "no eligible multimedia".

The 4th step: add this search keyword to the next round of update tasks for the "homology multimedia index database" and the "non-homogeneous multimedia index database", and periodically start the update process of the two databases.

The 5th step: the update process of the "homology multimedia index database" and the "non-homogeneous multimedia index database":

A. The "multimedia search device" searches for newly appeared multimedia files and for web pages or service entries; software enters each entry and obtains the file or service.

B. The "multimedia content decision device" judges whether the newly found multimedia content "belongs to the same content" as content in the current "homology multimedia index database". If "yes", it is added as a new element to the matching category of the "homology multimedia index database". If "no", the "multimedia content decision device" then judges whether it "belongs to the same content" as content in the current "non-homogeneous multimedia index database".

C. If "yes": create a new category for the current multimedia item and the items homologous with it that are stored in the "non-homogeneous multimedia index database", and transfer them all to the "homology multimedia index database". If "no": create a new category for the current multimedia item and store it in the "non-homogeneous multimedia index database".

The 6th step: the "search result web page distributor" generates static web pages of the search results from the content of the "homology web results database" and the "non-homogeneous web results database", publishes them to the "search engine search results Web server", and they are then presented through the browser to inquirers who come to search (see mark "M2" in the figure).

As another implementation of the 6th step, the results can also be presented to the inquiring user directly through the browser by a "dynamic web page Web server" (see mark "M3" in the figure).
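The query side of this flow (the 2nd through 4th steps, plus the periodic publication in the 6th) can be sketched as a lookup against already published results with a pending-keyword queue. The class, its attributes and the keyword normalization are hypothetical; the text does not describe how published results are keyed.

```python
class QueryFrontEnd:
    """Answers queries from published results and queues unseen keywords."""

    def __init__(self):
        self.published = {}          # keyword -> published title search results
        self.pending_keywords = []   # keywords waiting for the next update round

    @staticmethod
    def _normalize(keyword):
        # Assumed normalization so that "Song MP3" and "song mp3" share results.
        return " ".join(keyword.lower().split())

    def query(self, keyword):
        key = self._normalize(keyword)
        if key in self.published:
            # 2nd step: already published, return directly (mark "M1").
            return self.published[key]
        # 3rd and 4th steps: report nothing found and queue the keyword.
        if key not in self.pending_keywords:
            self.pending_keywords.append(key)
        return []

    def publish(self, keyword, title_results):
        # 6th step: the distributor publishes results after an index update.
        key = self._normalize(keyword)
        self.published[key] = list(title_results)
        if key in self.pending_keywords:
            self.pending_keywords.remove(key)


front_end = QueryFrontEnd()
first = front_end.query("Song MP3")          # nothing published yet
queued = list(front_end.pending_keywords)
front_end.publish("song mp3", ["title result for song.mp3"])
second = front_end.query("song  mp3")        # now served from published results
print(first, queued, second)
```

The trade-off the text describes is visible here: the first inquirer for a keyword gets an empty answer, while later inquirers are served without any recomputation.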

" homology picture processing module "

Fig. 5 is the flow chart of the "homology picture processing module". Picture files or links that match the search condition are all offered to the inquirer by the "homology picture processing module" as hyperlinks in an Html web page. To substantially improve the performance of the system, the following technique is adopted:

The "homologous information processing module" stores results by category in the "non-homogeneous picture indices database" and the "homology picture indices database", and the "search result web page distributor" periodically publishes them to the "search engine search results Web server", which avoids repeated computation and reduces the computation wait time.

The processing flow of the "homology picture processing module" is as follows:

The 1st step: receive the inquirer's search keyword; software judges from the keyword content and keyword syntax whether what is sought is a picture file or link (for example, a keyword containing ".JPG" indicates that .JPG files are sought rather than web pages containing that text).

The 2nd step: judge "has the content to be searched been published on the Web server?" If the search target has been published on the "search engine search results Web server", return the search result directly (see mark "M1" in the figure). In this result, the access interfaces of pictures that match the search condition and have the same source are aggregated into one "title search result"; after clicking the "same-source file" button, the inquirer can see on the "search engine search results Web server" another web page containing all the search results that match the query condition, which completes the search process. If the search target has not been published on the "search engine search results Web server", continue from the 3rd step.

The 3rd step: return to the inquirer the result "no eligible picture".

The 4th step: add this search keyword to the next round of update tasks for the "homology picture indices database" and the "non-homogeneous picture indices database", and periodically start the update process of the two databases.

The 5th step: the update process of the "homology picture indices database" and the "non-homogeneous picture indices database":

A. The "picture searching device" searches for newly appeared picture files and for web pages or link entries; software enters each entry and obtains the file or service.

B. The "image content decision device" judges whether the newly found picture content "belongs to the same content" as content in the current "homology picture indices database". If "yes", it is added as a new element to the matching category of the "homology picture indices database". If "no", the "image content decision device" then judges whether it "belongs to the same content" as content in the current "non-homogeneous picture indices database".

C. If "yes": create a new category for the current picture and the pictures homologous with it that are stored in the "non-homogeneous picture indices database", and transfer them all to the "homology picture indices database". If "no": create a new category for the current picture and store it in the "non-homogeneous picture indices database".
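Parts B and C of the update process describe how an item moves between the two index databases. A minimal sketch follows, with in-memory lists standing in for the databases and a caller-supplied `same_content` predicate standing in for the content decision device; both are assumptions, as is the use of a content digest to compare pictures.

```python
class SourceIndex:
    """In-memory stand-in for a homology / non-homogeneous index pair."""

    def __init__(self, same_content):
        self.same_content = same_content   # stand-in for the content decision device
        self.homology_index = []           # categories (lists) of same-source items
        self.non_homogeneous_index = []    # items with no known same-source partner

    def add(self, item):
        # B: does the new item match an existing same-source category?
        for category in self.homology_index:
            if self.same_content(item, category[0]):
                category.append(item)
                return "joined"
        # B/C: does it match items that were unique until now?
        partners = [p for p in self.non_homogeneous_index
                    if self.same_content(item, p)]
        if partners:
            # C "yes": build a new category and move everything across.
            self.non_homogeneous_index = [
                p for p in self.non_homogeneous_index if p not in partners
            ]
            self.homology_index.append(partners + [item])
            return "promoted"
        # C "no": the item is unique so far.
        self.non_homogeneous_index.append(item)
        return "new"


# Hypothetical items: (url, content digest); equal digests mean same content.
index = SourceIndex(same_content=lambda a, b: a[1] == b[1])
outcomes = [
    index.add(("http://a.example/cat.jpg", "d1")),
    index.add(("http://b.example/dog.jpg", "d2")),
    index.add(("http://c.example/cat-copy.jpg", "d1")),
    index.add(("http://d.example/cat-again.jpg", "d1")),
]
print(outcomes)
```

The same structure applies to the multimedia, document, software, data and GIS modules, which repeat this update process with their own decision devices.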

" homology document process module "

Fig. 6 is the flow chart of the "homology document process module". The "homology document process module" supports common document formats: ".Txt", ".Doc", ".PPT", ".PDF", ".XLS" and so on. Document files or links that match the search condition are all offered to the inquirer by the "homology document process module" as hyperlinks in an Html web page. To substantially improve the performance of the system, the following technique is adopted:

The "homologous information processing module" stores results by category in the "non-homogeneous document index database" and the "homology document index database", and the "search result web page distributor" periodically publishes them to the "search engine search results Web server", which avoids repeated computation and reduces the computation wait time.

The processing flow of the "homology document process module" is as follows:

The 1st step: receive the inquirer's search keyword; software judges from the keyword content and keyword syntax whether what is sought is a document file or link (for example, a keyword containing ".PDF" indicates that .PDF files are sought rather than web pages containing that text).

The 2nd step: judge "has the content to be searched been published on the Web server?" If the search target has been published on the "search engine search results Web server", return the search result directly (see mark "M1" in the figure). In this result, the access interfaces of documents that match the search condition and have the same source are aggregated into one "title search result"; after clicking the "same-source file" button, the inquirer can see on the "search engine search results Web server" another web page containing all the search results that match the query condition, which completes the search process. If the search target has not been published on the "search engine search results Web server", continue from the 3rd step.

The 3rd step: return to the inquirer the result "no eligible document".

The 4th step: add this search keyword to the next round of update tasks for the "homology document index database" and the "non-homogeneous document index database", and periodically start the update process of the two databases.

The 5th step: the update process of the "homology document index database" and the "non-homogeneous document index database":

A. The "document searching device" searches for newly appeared document files and for web pages or link entries; software enters each entry and obtains the file or service.
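The 1st step of each module, deciding from the keyword whether a file type rather than a web page is wanted, might be sketched as follows. The token table uses only the examples named in this section (MP3, .JPG, .PDF, .EXE, .DBF); the tokenizing rule, the function name and the returned tuple shape are assumptions.

```python
import re

# File-type tokens and their target kinds, from the examples in this section.
EXTENSION_TARGETS = {
    "mp3": "multimedia",
    "jpg": "picture",
    "pdf": "document",
    "exe": "software",
    "dbf": "data",
}


def detect_target(keyword_query):
    """Return (target kind, file-type token, remaining search terms)."""
    tokens = re.findall(r"[a-z0-9]+", keyword_query.lower())
    for token in tokens:
        if token in EXTENSION_TARGETS:
            remaining = [t for t in tokens if t != token]
            return EXTENSION_TARGETS[token], token, " ".join(remaining)
    # No file-type token: the inquirer is looking for ordinary web pages.
    return "webpage", None, " ".join(tokens)


print(detect_target("quiet american .PDF"))
print(detect_target("weather today"))
```

A rule this simple misreads queries about the format itself, such as a page explaining what a PDF is, which is one reason the text refers to keyword syntax as well as keyword content.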

" homology software processing module "

Fig. 7 is the flow chart of the "homology software processing module". Software files or links that match the search condition are all offered to the inquirer by the "homology software processing module" as hyperlinks in an Html web page. To substantially improve the performance of the system, the following technique is adopted:

The "homologous information processing module" stores results by category in the "non-homogeneous software index database" and the "homology software index database", and the "search result web page distributor" periodically publishes them to the "search engine search results Web server", which avoids repeated computation and reduces the computation wait time.

The processing flow of the "homology software processing module" is as follows:

The 1st step: receive the inquirer's search keyword; software judges from the keyword content and keyword syntax whether what is sought is a software file or link (for example, a keyword containing ".EXE" indicates that .EXE files are sought rather than web pages containing that text).

The 2nd step: judge "has the content to be searched been published on the Web server?" If the search target has been published on the "search engine search results Web server", return the search result directly (see mark "M1" in the figure). In this result, the access interfaces of software that matches the search condition and has the same source are aggregated into one "title search result"; after clicking the "same-source file" button, the inquirer can see on the "search engine search results Web server" another web page containing all the search results that match the query condition, which completes the search process. If the search target has not been published on the "search engine search results Web server", continue from the 3rd step.

The 3rd step: return to the inquirer the result "no eligible software".

The 4th step: add this search keyword to the next round of update tasks for the "homology software index database" and the "non-homogeneous software index database", and periodically start the update process of the two databases.

The 5th step: the update process of the "homology software index database" and the "non-homogeneous software index database":

A. The "software search device" searches for newly appeared software files and for web pages or link entries; software enters each entry and obtains the file or service.

B. The "software content decision device" judges whether the newly found software content "belongs to the same content" as content in the current "homology software index database". If "yes", it is added as a new element to the matching category of the "homology software index database". If "no", the "software content decision device" then judges whether it "belongs to the same content" as content in the current "non-homogeneous software index database".

C. If "yes": create a new category for the current software and the software homologous with it that is stored in the "non-homogeneous software index database", and transfer them all to the "homology software index database". If "no": create a new category for the current software and store it in the "non-homogeneous software index database".

"homology data or database processing module"

Fig. 8 is the flow chart of the "homology data or database processing module". Data files or links that match the search condition are all offered to the inquirer by the "homology data processing module" as hyperlinks in an Html web page. To substantially improve the performance of the system, the following technique is adopted:

The "homologous information processing module" stores results by category in the "non-homogeneous data directory database" and the "homology data directory database", and the "search result web page distributor" periodically publishes them to the "search engine search results Web server", which avoids repeated computation and reduces the computation wait time.

The processing flow of the "homology data processing module" is as follows:

The 1st step: receive the inquirer's search keyword; software judges from the keyword content and keyword syntax whether what is sought is a data file or link (for example, a keyword containing ".DBF" indicates that .DBF files are sought rather than web pages containing that text).

The 2nd step: judge "has the content to be searched been published on the Web server?" If the search target has been published on the "search engine search results Web server", return the search result directly (see mark "M1" in the figure). In this result, the access interfaces of data that matches the search condition and has the same source are aggregated into one "title search result"; after clicking the "same-source file" button, the inquirer can see on the "search engine search results Web server" another web page containing all the search results that match the query condition, which completes the search process. If the search target has not been published on the "search engine search results Web server", continue from the 3rd step.

The 3rd step: return to the inquirer the result "no eligible data".

The 4th step: add this search keyword to the next round of update tasks for the "homology data directory database" and the "non-homogeneous data directory database", and periodically start the update process of the two databases.

The 5th step: the update process of the "homology data directory database" and the "non-homogeneous data directory database":

A. The "data search device" searches for newly appeared data files and for web pages or link entries; software enters each entry and obtains the file or service.

B. The "data content decision device" judges whether the newly found data content "belongs to the same content" as content in the current "homology data directory database". If "yes", it is added as a new element to the matching category of the "homology data directory database". If "no", the "data content decision device" then judges whether it "belongs to the same content" as content in the current "non-homogeneous data directory database".

C. If "yes": create a new category for the current data and the data homologous with it that is stored in the "non-homogeneous data directory database", and transfer them all to the "homology data directory database". If "no": create a new category for the current data and store it in the "non-homogeneous data directory database".

" homology GIS message processing module "

Fig. 9 is the flow chart of the "homology GIS message processing module". GIS information files or links that match the search condition are all offered to the inquirer by the "homology GIS message processing module" as hyperlinks in an Html web page. To substantially improve the performance of the system, the following technique is adopted:

The "homologous information processing module" stores results by category in the "non-homogeneous GIS information index database" and the "homology GIS information index database", and the "search result web page distributor" periodically publishes them to the "search engine search results Web server", which avoids repeated computation and reduces the computation wait time.

The processing flow of the "homology GIS message processing module" is as follows:

The 1st step: receive the inquirer's search keyword; software judges from the keyword content and keyword syntax whether what is sought is a GIS information file or link (for example, a keyword containing ".JPG" indicates that .JPG files are sought rather than web pages containing that text).

The 2nd step: judge "has the content to be searched been published on the Web server?" If the search target has been published on the "search engine search results Web server", return the search result directly (see mark "M1" in the figure). In this result, the access interfaces of GIS information that matches the search condition and has the same source are aggregated into one "title search result"; after clicking the "same-source file" button, the inquirer can see on the "search engine search results Web server" another web page containing all the search results that match the query condition, which completes the search process. If the search target has not been published on the "search engine search results Web server", continue from the 3rd step.

The 3rd step: return to the inquirer the result "no eligible GIS information".

The 4th step: add this search keyword to the next round of update tasks for the "homology GIS information index database" and the "non-homogeneous GIS information index database", and periodically start the update process of the two databases.

The 5th step: the update process of the "homology GIS information index database" and the "non-homogeneous GIS information index database":

A. The "GIS information searcher" searches for newly appeared GIS information files and for web pages or link entries; software enters each entry and obtains the file or service.

B. The "GIS information content decision device" judges whether the newly found GIS information content "belongs to the same content" as content in the current "homology GIS information index database". If "yes", it is added as a new element to the matching category of the "homology GIS information index database". If "no", the "GIS information content decision device" then judges whether it "belongs to the same content" as content in the current "non-homogeneous GIS information index database".

C. If "yes": create a new category for the current GIS information and the GIS information homologous with it that is stored in the "non-homogeneous GIS information index database", and transfer them all to the "homology GIS information index database". If "no": create a new category for the current GIS information and store it in the "non-homogeneous GIS information index database".

"same-value network service processing module"

Figure 10 is the flow diagram of the "with value network service processing module". Network services that meet the search condition are all presented to the inquirer as hyperlinks in an HTML web page. To substantially improve the performance of the system, the following technique is adopted:

The "with the value information processing module" stores its results, by category, in the "non-with value network service index data base" and the "with value network service index data base", and the "search result web page distributor" periodically publishes them to the "search engine search results Web server". This avoids repeated computation and reduces the computation waiting time. The processing flow of the "with value network service processing module" is as follows:

The 1st step: receive the inquirer's search keyword, and determine by software, from the keyword content and keyword syntax, whether a network service file or a link is being sought (for example, a keyword containing ".JPG" indicates that a .JPG file is wanted rather than a web page containing that text).

The 2nd step: judge "has the content to be searched been published on the Web server?" If the search target has been published on the "search engine search results Web server", return the search results directly (see the "M1" mark in the figure): the access entries of the network services that meet the search condition and have the same source are aggregated into one "title search result". After clicking the "same value document" button, the inquirer can see, on the "search engine search results Web server", another web page containing all search results that meet the query condition, and the search procedure ends. If the search target has not been published on the "search engine search results Web server", continue from the 3rd step.

The 3rd step: return to the inquirer the result "no eligible network service".

The 4th step: add this search keyword to the next round of update tasks for the "with value network service index data base" and the "non-with value network service index data base", and periodically start the update process of the two databases.

The 5th step: the update process of the "with value network service index data base" and the "non-with value network service index data base":

A. The "network service search device" searches for newly appearing network service files, web pages, or link entries, and software enters each entry to obtain the file or service.

B. The "network service content decision device" judges whether the newly found network service content "belongs to the same content" as content in the current "with value network service index data base". If "Yes", it is added to that category of the "with value network service index data base" as a new element. If "No", the "network service content decision device" judges whether it "belongs to the same content" as content in the current "non-with value network service index data base".

C. If "Yes": create a new category for the current network service together with the same-value network services already stored in the "non-with value network service index data base", and transfer them all to the "with value network service index data base". If "No": create a new category for the current network service and store it in the "non-with value network service index data base".

The 6th step: the "search result web page distributor" dynamically generates static web pages of the search results from the contents of the "with being worth the webpage result database" and the "non-with being worth the webpage result database", publishes them to the "search engine search results Web server", and presents them through the browser to the inquirer who comes to search (see the "M2" mark in the figure).

" with being worth the business information processing module "

Figure 11 is the flow diagram of the "with being worth the business information processing module". Business information that meets the search condition is all presented to the inquirer as hyperlinks in an HTML web page. To substantially improve the performance of the system, the following technique is adopted:

The "with the value information processing module" stores its results, by category, in the "non-with being worth the business information index data base" and the "with being worth the business information index data base", and the "search result web page distributor" periodically publishes them to the "search engine search results Web server". This avoids repeated computation and reduces the computation waiting time. The processing flow of the "with being worth the business information processing module" is as follows:

The 1st step: receive the inquirer's search keyword, and determine by software, from the keyword content and keyword syntax, whether a business information file or a link is being sought (for example, a keyword containing ".JPG" indicates that a .JPG file is wanted rather than a web page containing that text).

The 2nd step: judge "has the content to be searched been published on the Web server?" If the search target has been published on the "search engine search results Web server", return the search results directly (see the "M1" mark in the figure): the access entries of the business information that meets the search condition and has the same source are aggregated into one "title search result". After clicking the "same value document" button, the inquirer can see, on the "search engine search results Web server", another web page containing all search results that meet the query condition, and the search procedure ends. If the search target has not been published on the "search engine search results Web server", continue from the 3rd step.

The 3rd step: return to the inquirer the result "no eligible business information".

The 4th step: add this search keyword to the next round of update tasks for the "with being worth the business information index data base" and the "non-with being worth the business information index data base", and periodically start the update process of the two databases.

The 5th step: the update process of the "with being worth the business information index data base" and the "non-with being worth the business information index data base":

A. The "business information searcher" searches for newly appearing business information files, web pages, or link entries, and software enters each entry to obtain the file or service.

B. The "business information content decision device" judges whether the newly found business information content "belongs to the same content" as content in the current "with being worth the business information index data base". If "Yes", it is added to that category of the "with being worth the business information index data base" as a new element. If "No", the "business information content decision device" judges whether it "belongs to the same content" as content in the current "non-with being worth the business information index data base".

C. If "Yes": create a new category for the current business information together with the same-value business information already stored in the "non-with being worth the business information index data base", and transfer them all to the "with being worth the business information index data base". If "No": create a new category for the current business information and store it in the "non-with being worth the business information index data base".

The characteristic of the "with being worth the business information processing module" is that it can automatically judge, from the features of the goods or services and from the distribution of suppliers and inquirers, whether multiple business information targets have the same use value to the inquirer. This judgment serves as the basis for aggregating them into a "title search result" and as the basis for ordering the query results.

The content decision devices can be shared by the various "homology (with being worth) message processing modules".

" content decision device " specific implementation

" content of multimedia decision device " specific implementation:

1 Input: multimedia files from multiple sources (for a playback service, either record the stream to a file or obtain the media file information from the play server).

2 Processing: compare the goodness of fit of the multimedia content.

3 Return: the degree value of identical content among the input multimedia: SameMediaPower.

The specific implementation method:

The 1st step: receive the "judged objects": multimedia from multiple sources can be received. Record the number of judged objects: InputQuantity.

The 2nd step: look up in the following table an attribute of the "judged objects" that can take part in the comparison, and record the number of "judged objects" that have the same value for the current attribute: SameQuantity (for example, if 3 of 5 judged objects have the same value for an attribute, then SameQuantity=3 for this attribute).

The 3rd step: obtain the "weight" value of the current attribute in the decision process (found in the following table): Power.

The 4th step: compute the goodness of fit of all "judged objects" on the current attribute: PSame=SameQuantity*Power.

The 5th step: return to the 1st step and execute the 1st step through the 4th step for the next attribute to obtain the PSame of that attribute, until the PSame values of all attributes have been obtained.

The 6th step: compute and return the identical content degree value of the "judged objects": SameMediaPower=(sum of all PSame values)/InputQuantity.
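The six steps above can be sketched in Python as follows. This is only an illustrative sketch: the attribute names and weights in `WEIGHTS` are hypothetical (the attribute table is not reproduced in this text), and SameQuantity is assumed to mean the size of the largest group of objects that share one value, counted as 0 when no two objects agree. Attributes whose values are empty are skipped.

```python
from collections import Counter

# Hypothetical attributes and weights, for illustration only.
WEIGHTS = {"duration": 3.0, "title": 2.0, "codec": 1.0}


def same_media_power(objects, weights=WEIGHTS):
    """Return (sum of PSame over all attributes) / InputQuantity."""
    input_quantity = len(objects)
    if input_quantity == 0:
        return 0.0
    total = 0.0
    for attribute, power in weights.items():
        # Empty (None) values do not take part in the comparison.
        values = [o.get(attribute) for o in objects if o.get(attribute) is not None]
        if not values:
            continue
        # Assumption: SameQuantity is the largest group sharing one value.
        same_quantity = Counter(values).most_common(1)[0][1]
        if same_quantity < 2:
            same_quantity = 0  # no two objects agree on this attribute
        total += same_quantity * power  # PSame = SameQuantity * Power
    return total / input_quantity


if __name__ == "__main__":
    files = [
        {"duration": 120, "title": "a", "codec": "h264"},
        {"duration": 120, "title": "a", "codec": None},
        {"duration": 95, "title": "b", "codec": "vp9"},
    ]
    print(same_media_power(files))
```

The same routine would apply to the picture and software decision devices by swapping in their own attribute tables.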

Decision content for video files or playback services:

Note:

1. The invention lies in the method of using "weight" values to express the comparison importance of each attribute, not merely in the specific values listed in the table. The specific "weight" values in the table are only typical values, and changing them according to actual needs still falls within the scope of the invention.

2. Depending on actual conditions, some attribute values may be "empty (Null)". When attribute values are equal only because they are "empty", that attribute should not be considered in the computation.

Decision content for audio files:

Note:

1. The invention lies in the method of using "weight" values to express the comparison importance of each attribute, not merely in the specific values listed in the table. The specific "weight" values in the table are only typical values, and changing them according to actual needs still falls within the scope of the invention.

2. Depending on actual conditions, some attribute values may be "empty (Null)". When attribute values are equal only because they are "empty", that attribute should not be considered in the computation.

Decision content for Flash files:

Note:

" image content decision device " specific implementation

1 Input: pictures from multiple sources.

2 Processing: compare the goodness of fit of the picture content.

3 Return: the degree value of identical content among the input pictures: SamePicPower.

The specific implementation method:

The 1st step: receive the "judged objects": pictures from multiple sources can be received. Record the number of judged objects: InputQuantity.

The 6th step: compute and return the identical content degree value of the "judged objects": SamePicPower=(sum of all PSame values)/InputQuantity.

The similarity degree is judged from the various attributes of the pictures and by image recognition software.

Note:

" word content decision device " specific implementation

" word content decision device ", can realize by software:

1 Input: text from multiple sources, as the "judged objects".

2 Processing: compare the goodness of fit of the text content.

3 Return: the consistency degree value SameTextPower between the "judged objects".

Implementation method:

The 1st step: find, among the multiple input text contents, the total length of the parts that have identical words or sentences: SameLenth.

The 2nd step: find, among the multiple input text contents, the length of the shortest input text: MinLenth.

The 3rd step: return the text similarity degree value: SameTextPower=SameLenth/MinLenth.
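A minimal sketch of this ratio for the two-text case is given below. It assumes that the "identical parts" can be approximated by the matching blocks reported by Python's standard `difflib.SequenceMatcher`; the text does not say how identical words or sentences are located, so this matching strategy and the function name are illustrative choices.

```python
from difflib import SequenceMatcher


def same_text_power(text_a, text_b):
    """Return SameLenth / MinLenth for two texts (0.0 if either is empty)."""
    min_lenth = min(len(text_a), len(text_b))
    if min_lenth == 0:
        return 0.0
    matcher = SequenceMatcher(None, text_a, text_b, autojunk=False)
    # SameLenth: total length of the character runs found in both texts.
    same_lenth = sum(block.size for block in matcher.get_matching_blocks())
    return same_lenth / min_lenth


if __name__ == "__main__":
    print(same_text_power("abcdef", "abcxyz"))
```

Dividing by the shorter text means that a short page fully contained in a longer one still scores close to 1.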

Among the texts found in this way, the longest text is usually the version of the same article that is split into fewer pages or that contains a large amount of advertising and external hyperlinks, while the shortest text is usually the version of the same article that is split into more pages or that contains the least advertising and external hyperlinks.

"linked contents decision device" specific implementation

The "linked contents decision device" can be realized by software. It is used to compare whether the hyperlinks contained in multiple web pages have common features.

1 Input: the URL addresses of multiple groups of hyperlinks (each group is usually all the hyperlinks obtained from one web page).

2 Processing: compute the goodness of fit of the hyperlink URL addresses between the groups.

3 Return: the number of identical hyperlinks shared between the groups.

Implementation method:

The 1st step: receive the "judged objects": the URL addresses of multiple groups of hyperlinks.

The 2nd step: compute the similarity degree of the "judged objects": SameURLPower=the number of URL addresses that appear in every group of hyperlinks.

The 3rd step: return SameURLPower.
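Counting the URL addresses that appear in every group amounts to a set intersection. The sketch below assumes URLs are compared as exact strings (no normalization), which the text does not specify; the function name is illustrative.

```python
def same_url_power(groups):
    """Return the number of URLs that occur in every group of hyperlinks."""
    url_sets = [set(group) for group in groups]
    if not url_sets:
        return 0
    return len(set.intersection(*url_sets))


if __name__ == "__main__":
    pages = [
        ["http://a.example/", "http://b.example/", "http://c.example/"],
        ["http://b.example/", "http://c.example/", "http://d.example/"],
        ["http://c.example/", "http://b.example/"],
    ]
    print(same_url_power(pages))
```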

" software content decision device " specific implementation

" software content decision device ", whether a plurality of softwares that are used for comparing input are software of the same race.

1 Input: software from multiple sources.

2 Processing: compare the goodness of fit of the software content.

3 Return: the software content goodness of fit value.

The specific implementation method:

The 1st step: receive the "judged objects": multiple input files or directories. Record the number of judged objects: InputQuantity.

The 2nd step: look up in the following table an attribute of the "judged objects" that can be compared, and record the number of "judged objects" that have the same value for the current attribute: SameQuantity (for example, if 3 of 5 judged objects have the same value for an attribute, then SameQuantity=3 for this attribute).

The 4th step: compute the goodness of fit of all "judged objects" on the current attribute: PSame=SameQuantity*Power.

The 6th step: compute and return the identical content degree value of the "judged objects": SameSoftPower=(sum of all PSame values)/InputQuantity.

Note:

" data or data-base content decision device " specific implementation

Compare one by one whether the content of each data record in the different database files is equal, and return whether the database consistency degree value SameDBPower of the compared databases exceeds the threshold.

SameDBPower=(the number of records whose field names are identical and whose values are equal)/(the smallest record count for this field among the databases participating in the comparison).

SameDBPower reflects the ratio of the number of records with identical content to the record count of the database with the fewest records. Its value range is 0～1.

" data or data-base content decision device " specific implementation

The following steps can be adopted for data files:

The 1st step: among the multiple data files participating in the comparison, pick one file at random as the "comparison standard".

The 2nd step: make a rough consistency comparison between the other files and the "comparison standard": file size, file checksum, and file attribute information such as title, subject, version, author, category, keywords, and remarks.

The 3rd step: if they are consistent, judge them "rough consistent". This judgment result can be used directly as the output of the "data or data-base content decision device".

The 4th step: if further comparison is needed, carry out the 5th step on the input files judged "rough consistent".

The 5th step: fine comparison: compare the file attribute information and each byte in the files one by one. Files whose features are all identical can be judged "in full accord", as the output of the "data or data-base content decision device".

The following steps can be adopted for database files:

The 1st step: judge from the filename suffix and file attributes whether the input database files are of the same database format.

The 2nd step: for databases of the same format carry out the 3rd step; for databases of different formats go directly to the 4th step.

The 3rd step: rough comparison of databases of the same format: file size, file checksum, and file attribute information such as title, subject, version, author, category, keywords, and remarks. If these features do not match completely, output the judgment result "inconsistent"; for database files that match completely, carry out the 4th step.

The 4th step: fine comparison of the databases (this step allows database files of various kinds to take part in the content comparison). Extract the "database tables" one by one according to the format of each database file and judge whether the "database table" structures are consistent: if inconsistent, output "inconsistent"; database files with consistent structures carry out the 5th step.

The 5th step: compare one by one the content of every record of the database files participating in the comparison. Each time identical record content is found, add 1 to the counter SameRecNum (the number of records whose field names are identical and whose values are equal).

The 6th step: compute the database consistency degree value SameDBPower=SameRecNum/(the smallest record count for this field among the databases participating in the comparison). (SameDBPower reflects the ratio of the number of records with identical content to the record count of the database with the fewest records. Its value range is 0～1.)

The 7th step: judge whether the database consistency degree value SameDBPower exceeds the threshold. If it does, output "consistent" as the judgment result; otherwise output "inconsistent".
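The 5th to 7th steps for two database tables can be sketched as below. This assumes each table has already been extracted into a list of records (dictionaries mapping field name to a hashable value) and that a record counts toward SameRecNum only when all of its fields and values match; the default threshold of 0.8 is a hypothetical value, since the text gives none for this decision device.

```python
from collections import Counter


def same_db_power(table_a, table_b):
    """Return SameRecNum / (record count of the smaller table)."""
    if not table_a or not table_b:
        return 0.0
    # A record is identified by its complete set of (field, value) pairs.
    records_a = Counter(frozenset(r.items()) for r in table_a)
    records_b = Counter(frozenset(r.items()) for r in table_b)
    # Multiset intersection counts records present in both tables.
    same_rec_num = sum((records_a & records_b).values())
    return same_rec_num / min(len(table_a), len(table_b))


def tables_consistent(table_a, table_b, threshold=0.8):
    """7th step: "consistent" only if SameDBPower exceeds the threshold."""
    return same_db_power(table_a, table_b) > threshold


if __name__ == "__main__":
    a = [{"id": 1, "n": "x"}, {"id": 2, "n": "y"}, {"id": 3, "n": "z"}]
    b = [{"id": 1, "n": "x"}, {"id": 2, "n": "q"}]
    print(same_db_power(a, b), tables_consistent(a, b))
```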

" GIS information content decision device "

" GIS information content decision device ", can realize by software:

1 Input: digital maps from multiple sources, as the "judged objects".

2 Processing: compare the goodness of fit of the coverage of the digital maps.

3 Return: the consistency degree value SameMapPower (value 0～1) between the "judged objects".

Implementation method:

The 1st step: open the digital map files participating in the comparison according to the format of the digital maps.

The 2nd step: find the longitude and latitude of the northwest corner and the southeast corner of each digital map (another pair of diagonal corners of the map can also be used).

The 3rd step: compare the longitude and latitude of the northwest and southeast corners of the digital maps participating in the comparison, and compute the consistency value SameMapPower of the map coverage areas:

Suppose that two maps, "Fig. 1" and "Fig. 2", participate in the comparison:

Then:

SameMapPower=(area of the overlapping region of the two maps)/(area of the smaller of the two maps).

The 4th step: return the SameMapPower value.

The 5th step: judge whether SameMapPower exceeds the threshold (for example: threshold=0.8). If it does, the maps are judged to be the same map; if not, they are judged to be different maps.
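A minimal sketch of the overlap ratio follows. It assumes each map is an axis-aligned rectangle given by its northwest and southeast corners, and it measures area in squared degrees on a flat grid, which is a simplification the text neither states nor excludes. The `MapBounds` type and function names are illustrative.

```python
from typing import NamedTuple


class MapBounds(NamedTuple):
    west: float   # longitude of the northwest corner
    north: float  # latitude of the northwest corner
    east: float   # longitude of the southeast corner
    south: float  # latitude of the southeast corner


def area(m):
    return max(0.0, m.east - m.west) * max(0.0, m.north - m.south)


def same_map_power(a, b):
    """Return overlap area / area of the smaller map (0.0 if degenerate)."""
    width = min(a.east, b.east) - max(a.west, b.west)
    height = min(a.north, b.north) - max(a.south, b.south)
    overlap = max(0.0, width) * max(0.0, height)
    smallest = min(area(a), area(b))
    return overlap / smallest if smallest > 0 else 0.0


def is_same_map(a, b, threshold=0.8):
    """5th step: same map only if SameMapPower exceeds the threshold."""
    return same_map_power(a, b) > threshold


if __name__ == "__main__":
    m1 = MapBounds(west=0.0, north=10.0, east=10.0, south=0.0)
    m2 = MapBounds(west=5.0, north=10.0, east=15.0, south=0.0)
    print(same_map_power(m1, m2), is_same_map(m1, m2))
```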

" network service content decision device "

FTP service content decision of the "network service content decision device":

The 1st step: log in to the services participating in the comparison using the corresponding file transfer protocol, and obtain the files inside them.

The 2nd step: after obtaining the files of the FTP services, first judge from the filename suffix whether the file types are consistent. If they are inconsistent, return "inconsistent" as the output; if the file types are consistent, carry out the 3rd step.

The 3rd step: according to the file type, use the "content of multimedia decision device", "image content decision device", "word content decision device", "software content decision device", "data or data-base content decision device" or "GIS information content decision device" to decide whether the file contents are consistent, and return the judgment result.

Decision on the mailbox service content provided by Email websites:

The mailbox service information provided by Email websites is obtained mainly by software that searches the web pages of each website and parses, from the web page tags, information such as mailbox size, charging status, and whether the POP protocol is supported.

The 1st step: divide mailbox sizes into grades (for example: 10MB～25MB, 25MB～100MB, 100MB～300MB, 300MB～1GB, 1GB～100GB, etc.), then judge whether the mailboxes participating in the comparison are in the same grade. If "No", return "inconsistent"; if "Yes", carry out the 2nd step.

The 2nd step: compare whether the "charging status" is consistent. If "No", return "inconsistent"; if "Yes", carry out the 3rd step.

The 3rd step: compare whether support for the POP protocol is consistent. If "No", return "inconsistent"; if "Yes", return "consistent".
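The three mailbox checks can be sketched as a short chain of comparisons. The grade boundaries reuse the example ranges above with 1GB taken as 1024MB; treating a size that falls exactly on a boundary as belonging to the higher grade is an assumption, as are the `Mailbox` fields and function names.

```python
from bisect import bisect_right
from dataclasses import dataclass

# Grade boundaries in MB, from the example ranges (1GB assumed to be 1024MB).
GRADE_BOUNDS_MB = [10, 25, 100, 300, 1024, 102400]


@dataclass(frozen=True)
class Mailbox:
    size_mb: float
    is_free: bool
    supports_pop: bool


def size_grade(size_mb):
    """Return the index of the size grade that contains size_mb."""
    return bisect_right(GRADE_BOUNDS_MB, size_mb)


def mailboxes_consistent(a, b):
    if size_grade(a.size_mb) != size_grade(b.size_mb):  # 1st step
        return False
    if a.is_free != b.is_free:                          # 2nd step
        return False
    return a.supports_pop == b.supports_pop             # 3rd step


if __name__ == "__main__":
    print(mailboxes_consistent(Mailbox(50, True, True), Mailbox(80, True, True)))
```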

" business information content decision device "

This device judges whether the product or service sales information published on web pages is identical, and whether it falls within the same physical geographic range, the same administrative geographic range, or the same distance range.

The 1st step: compare whether the business information participating in the comparison concerns the same product or service. If "No", return "inconsistent"; if "Yes", go to the 2nd step.

The 2nd step: judge whether the business information participating in the comparison is sensitive to geographic position (for example, personal consumption goods and services that must be provided on site are sensitive to geographic position, such as ice cream or private tutoring). If "No", return the judgment result "consistent"; if "Yes", carry out the 3rd step.

The 3rd step: judge whether the suppliers of the business information participating in the comparison are in the same city or region. If "No", return the judgment result "inconsistent"; if "Yes", return the judgment result "consistent".
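These three steps reduce to a short decision function. In the sketch below the `Offer` fields are hypothetical, product identity is assumed to be an exact string match, and location sensitivity is assumed to be known in advance for each offer; how either is actually determined is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    product: str
    location_sensitive: bool  # e.g. ice cream, private tutoring
    city: str


def offers_consistent(a, b):
    if a.product != b.product:                               # 1st step
        return False
    if not (a.location_sensitive or b.location_sensitive):   # 2nd step
        return True
    return a.city == b.city                                  # 3rd step


if __name__ == "__main__":
    x = Offer("ice cream", True, "Beijing")
    y = Offer("ice cream", True, "Shanghai")
    print(offers_consistent(x, y))
```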

" obtain web page user attention rate subsystem "

Figure 12 is the structure diagram of the subsystem for obtaining web page user attention. The search engine works together with its companion web browser (or with third-party browsers compatible with the communication protocol between the search engine and its companion browser): the browser collects the user's degree of attention to each web page and reports it to the search engine, which uses it as the basis for ranking search results or for selecting the "title search result". This method and device can also stand apart from the search engine and independently form a Web inquiry system that provides a "webpage popular degree ranking list", which can be operated for a fee or exchanged for other benefits.

The system mainly comprises two parts: the "PageFocus webserver" and the "PageFocus web browser".

"PageFocus webserver" structure

The "PageFocus webserver" obtains, through the "PageFocus web browser", the degree of attention of users worldwide to each web page, and forms a database of the "attention score PageFocus" of each web page as the measure of the web page's popularity.

The "PageFocus webserver" consists of the following:

(1) "PageFocus browser ID registrar": distributes a globally unique ID number to each "PageFocus web browser" in use on the network.

(2) "PageFocusAccServer web page attention statistics server": receives the "PageFocus packets" sent by the "PageFocus web browsers" running worldwide, each of which contains the "attention score PageFocus" for one or more web pages. The ID number is used to distinguish different browsing users.

(3) "PageFocus browser online upgrading server": provides an online upgrade service to "PageFocus web browsers" worldwide.

(4) "data encrypting and deciphering module": used to transmit encrypted data between the "PageFocus webserver" and the "PageFocus web browser", to prevent attacks or information theft.

"PageFocus web browser" structure

The "PageFocus web browser" reports the current user's degree of attention to a given web page to the "PageFocus webserver" over the network.

The "PageFocus web browser" consists of the following:

(1) "attention score PageFocus computing module": computes the user's degree of attention to a given web page from the user's operations on the "PageFocus web browser", and forms a "PageFocus packet" to report to the "PageFocusAccServer web page attention statistics server".

(2) "PageFocus browser ID registering module": communicates with the "PageFocus browser ID registrar" to obtain a globally unique ID, as the basis for distinguishing different users.

(3) "PageFocus browser online upgrading module": communicates with the "PageFocus browser online upgrading server" to keep the "PageFocus browser" on the current user's computer at the latest version.

This device comprises the "PageFocus web browser" of the invention, the "PageFocus browser ID registrar" and the "webpage score server". The specific implementation method is as follows:

The 1st step: develop a dedicated "PageFocus web browser". Each browser either has a globally unique ID number when it is installed, or actively contacts the "PageFocus browser ID registrar" on the network during use to obtain a globally unique ID number.

The 2nd step: the "PageFocus web browser" has all the functions of a conventional web browser (for example, Microsoft's IE browser).

The 3rd step: the "PageFocus web browser" also converts the user's operations on the browser and on web pages, according to the weights listed in the following table, into the "attention score PageFocus" of the web page, forms a "PageFocus packet", and transmits it in encrypted form over a network protocol to the "PageFocusAccServer web page attention statistics server" of the search engine.

The 4th step: after receiving the "PageFocus packets" sent by the "PageFocus web browsers" worldwide, the "PageFocusAccServer web page attention statistics server" adds the "attention score PageFocus" values they contain to the corresponding web pages.

The 5th step: the "PageFocusAccServer web page attention statistics server" holds the "attention score PageFocus" of each web page worldwide. This information can be processed in various ways: it can serve as the basis on which the search engine selects the "title search result" among search results with identical content, as the basis for the search engine's web page ranking, or it can be published directly as a "webpage popular degree ranking list" service.

The method that " PageFocus web browser " calculating " is paid close attention to score value PageFocus ":

Because the repertoire that " PageFocus web browser " has generic browser, so can be when the user uses browser, gather its operation behavior according to following table, and according to " weight " of every kind of behavior this webpage is carried out " paying close attention to score value PageFocus " and score, and when browser thoroughly cuts out this webpage, form a branch value record of " paying close attention to score value PageFocus " about this webpage, issue with the form of " PageFocus packet "

" the PageFocusAccServer webpage is paid close attention to statistical server ".

Note:

1. though have erroneous judgement with these standards of grading, can obtain statistical accuracy by a large amount of operations on the network.

2. listed " weight " concrete numerical value in the table is representative value only, and the invention reside in by browser is page marking, and the change of any other " weight project " and " weight " all belongs to category of the present invention.

3. adopt the user that the mode of webpage ballot is based on abundant trust for netizen's social morality, so its " weight " to the mathematics multiplication of whole score, rather than the mathematics addition.

4. because each webpage all may obtain a large amount of PageFocus scores, may cause overflowing of software variable, so can adopt " mathematics logarithm " or " scientific notation " record score " the PageFocusAccServer webpage is paid close attention to statistical server ".

5. be other approach of this method, except when browser thoroughly cuts out this webpage, forming " PageFocus packet ", can also determine the opportunity of " PageFocus packet " with other any regular, for example: regularly, be accumulated to certain score value or the like, these methods all belong to category of the present invention.

6. Detailed calculation of the "reading speed per line of text" in the table:

A. Mouse wheel scrolling: text reading speed = (display area width / set width) * number of text lines scrolled each time / time interval between scrolls.

B. Keyboard page turning: text reading speed = (display area width / set width) * number of text lines per page turn / time interval between page turns.
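The two reading-speed formulas in note 6 share one shape and can be sketched as a single function. This is only an illustration: the function name, parameter names, and the choice of seconds as the time unit are assumptions, since the text does not fix units.

```python
def reading_speed(display_width: float, set_width: float,
                  lines_advanced: int, interval_seconds: float) -> float:
    """Text lines read per second, scaled by the display width.

    Implements: (display area width / set width) * lines advanced / interval.
    `lines_advanced` is the number of lines per wheel scroll (case A) or
    per page turn (case B). All names here are illustrative assumptions.
    """
    if set_width <= 0 or interval_seconds <= 0:
        raise ValueError("set_width and interval_seconds must be positive")
    return (display_width / set_width) * lines_advanced / interval_seconds


if __name__ == "__main__":
    # A wheel scroll of 3 lines every 1.5 s in a window at the set width.
    print(reading_speed(800, 800, 3, 1.5))
```

A window twice the set width doubles the result, reflecting that each line then holds twice as much text.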

The formation method of " PageFocus packet "

The content of " PageFocus packet ":

Note: each "PageFocus packet" may contain records for multiple web pages. Each web page record may also carry other attributes; for efficiency, the table lists only the important contents, and adding other attributes to the table also falls within the scope of the present invention.

Choosing when to send the "PageFocus packet":

To reduce the bandwidth consumed by sending "PageFocus packets" and the load placed on the server, one of the following schemes may be adopted:

Send the "PageFocus packet" when a given web page is finally closed in the browser.

Send the "PageFocus packet" when the browser itself is finally closed.

The browser keeps "PageFocus packets" on the local computer as files and sends them once a specific quantity, a specific length, or a specific time period has been reached.
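The third scheme (accumulate locally, then send in bulk) can be sketched as follows. This is a minimal in-memory illustration, not the packet format of the invention: the class name `PageFocusBuffer`, the record fields, the thresholds, and the `send` callback are all hypothetical, and a real browser would persist the records to a file as the text describes.

```python
import time
from typing import Callable, Dict, List, Optional


class PageFocusBuffer:
    """Hypothetical buffer that batches PageFocus records before sending."""

    def __init__(self, send: Callable[[List[Dict]], None],
                 max_records: int = 50, max_age_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send = send                  # assumed transport to the statistics server
        self._max_records = max_records    # "specific quantity" trigger
        self._max_age = max_age_seconds    # "specific time period" trigger
        self._clock = clock
        self._records: List[Dict] = []
        self._first_at: Optional[float] = None

    def add(self, url: str, page_focus: float) -> None:
        """Store one page record and flush if either trigger is reached."""
        if not self._records:
            self._first_at = self._clock()
        self._records.append({"url": url, "page_focus": page_focus})
        too_many = len(self._records) >= self._max_records
        too_old = self._clock() - self._first_at >= self._max_age
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        """Send all buffered records as one packet; do nothing if empty."""
        if self._records:
            self._send(list(self._records))
            self._records.clear()
            self._first_at = None
```

Calling `flush()` from the browser's page-close or exit handler would cover the first two schemes with the same object.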

" title search result " selection algorithm

This algorithm is mainly used to decide which of the "same-source search results" among the original search results serves as the "title search result". The algorithm needs to solve the following problems:

1. Judge the content quality of web pages from network user behavior and page content, and display high-quality pages first.

2. Prevent a search result from receiving excessive click traffic because it became the "title search result", which could slow down or even crash the website.

3. Prevent a search result from receiving excessive click traffic because it became the "title search result", which would slow its service response and degrade the visitor's experience.

4. Make becoming the "title search result" a right that can be offered to websites that want it and that those websites can purchase.

5. Give every original result among the "same-source search results" a chance, with a certain probability, of becoming the "title search result".

The "title search result" selection method is as follows: when selecting the "title search result" from the "same-source search results", three factors are considered together, namely "search result content quality", "weighting value", and "service response delay". That is, results with high content quality, results with weighting, and results with fast network service are displayed first. The same principle applies when ordering all "same-source search results", and the "weighting value" can be purchased from the operator of the system of the present invention. The specific implementation of "title search result" selection is as follows:

Step 1: Calculate the probability weight PWn with which each "same-source search result" becomes the "title search result" (this search result being the n-th):

PWn = TP*PageFocus/(RespDelay-K)

Note 1: when (RespDelay-K) is less than or equal to zero, (RespDelay-K) takes the value 1.

Note 2: the variables in the formula have the following meanings:

A. PageFocus, web page attention value: the "PageFocus value" obtained for this search result by the "method and apparatus for obtaining web page user attention" of the present invention.

B. RespDelay, web service response delay: the response delay of this search result when it serves the searcher's access. (The access experience depends on the website's response delay: the slower the response, the worse the experience.)

C. K, service response constant: a definable constant, with 50 milliseconds (ms) suggested. A service response delay below K is not noticed and does not affect the experience, so it can be ignored.

D. TP, title search result right: a weighting. Anyone can obtain the "TP title search result right" from the operator of the system of the present invention through various exchange conditions.

E. As other implementations of this formula, the following alternative forms are also possible:

a. PWn = (TP+PageFocus)/(RespDelay-K)

b. PWn = (TP+PageFocus)/RespDelay/K

c. PWn = TP*PageFocus/RespDelay/K

Step 2: Sum the probability weights PWn of all original "same-source search results" to obtain the total probability weight PWall.

Step 3: Calculate the probability that each "same-source search result" becomes the "title search result": Pn = PWn/PWall.

Step 4: Each time a searcher visits, dynamically and randomly select the "title search result" according to the probabilities Pn and present it to the searcher.
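Steps 1 to 4 can be sketched as follows, using the main formula PWn = TP*PageFocus/(RespDelay-K) with the Note 1 rule and the suggested K of 50 ms. The `Result` record, the function names, and the use of Python's `random.choices` for the weighted draw are illustrative assumptions; the text does not prescribe a particular random selection mechanism.

```python
import random
from dataclasses import dataclass
from typing import List

K_MS = 50.0  # service response constant K, the value suggested in the text


@dataclass
class Result:
    """One same-source search result (field names are hypothetical)."""
    url: str
    tp: float             # TP, title search result right
    page_focus: float     # PageFocus attention value
    resp_delay_ms: float  # RespDelay in milliseconds


def probability_weight(r: Result, k: float = K_MS) -> float:
    """Step 1: PWn = TP*PageFocus/(RespDelay-K), denominator 1 if <= 0."""
    denom = r.resp_delay_ms - k
    if denom <= 0:
        denom = 1.0
    return r.tp * r.page_focus / denom


def selection_probabilities(results: List[Result], k: float = K_MS) -> List[float]:
    """Steps 2 and 3: PWall is the sum of all PWn, and Pn = PWn/PWall."""
    weights = [probability_weight(r, k) for r in results]
    pw_all = sum(weights)
    if pw_all <= 0:
        raise ValueError("total probability weight must be positive")
    return [w / pw_all for w in weights]


def pick_title_result(results: List[Result], rng=random) -> Result:
    """Step 4: draw one title search result at random according to Pn."""
    probs = selection_probabilities(results)
    return rng.choices(results, weights=probs, k=1)[0]
```

Because the draw is repeated on every visit, traffic is spread across the same-source results in proportion to Pn instead of always landing on one site.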

Apparatus and method for adaptive website content style

The content of the present invention is: using whatever obtainable information helps to judge the user's environment and state, so that users in a working state or a leisure state see different styles when visiting the same page URL, without any operation, registration, setting, or Cookie configuration. This includes:

1. Use the user's IP address to determine the country or region the user is in; combined with the website's own time, the visitor's local administrative-region time can be calculated, and from that local time it can be judged whether the visitor is in a working state or a leisure state.

2. From the user's IP address, the attribute of that IP address can be looked up: home or workplace. A style and content suited to the user's environment is provided according to that location.

3. The user's geographic location can be learned from the IP address, so that when business information is queried, the suppliers nearest to the user can automatically be listed first.

For example:

At the same moment, different users visiting the same URL on this website see different content:

A. A user in a working state and environment sees a serious, concise page that contains no leisure or entertainment information.

B. A user in a leisure state and environment sees a lively page that may contain leisure and entertainment information and personal consumer advertising.

The present invention may be applied in part or in whole to website systems other than search engines; all such applications fall within the scope of the present invention.

At present, every large website uses server clusters to handle heavy traffic, and some even set up regional service subsystems to split user access. An important characteristic of current server clusters, however, is that every cluster member serves identical content. As shown in Figure 13, a visiting user passes through the "website server cluster entrance" device and, without regard to any of the user's characteristics, is assigned directly to one of the cluster member servers, all of which have identical content.

As shown in Figure 14, the apparatus of the present invention partially modifies the above structure: after the "website server cluster entrance" receives a visiting user, it judges whether the user is in a working state from the various user attribute information, such as the IP address, sent when the user accesses the website, and provides an information service of the corresponding style and content.

Method for automatically judging user state and providing an appropriate web page style and content

Step 1: First divide the server cluster into two major classes, "work style" and "individual and leisure style". Whether pages are static or dynamic, when identical content is updated on the two classes of servers, two styles are produced automatically, so that users in a working state or a leisure state see different styles when visiting the same page URL.

Step 2: When the "website server cluster entrance" receives a user's first request for a page of this website, it first obtains the user's IP address from the access protocol (or from the IP layer protocol).

Step 3: Look up the IP address in the "IP address attribute database" to determine whether it is a "workplace IP address" or an "individual or leisure place IP address". If it is a "workplace IP address", go to Step 4; if it is an "individual or leisure place IP address", go to Step 5.

Step 4: Obtain the geographic location of the "workplace IP address" and the administrative time of that geographic region. If the region is within working hours (Monday to Friday, 8:00 to 20:00), assign the visit to a "work style server" in the server cluster, which provides page service suited to the workplace; otherwise go to Step 5.

Step 5: Assign the visit to an "individual and leisure style server" in the server cluster, which provides page service suited to individual and leisure use.
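Steps 3 to 5 amount to a small routing decision, sketched below. The dictionary standing in for the "IP address attribute database", the attribute string "workplace", and the function names are assumptions for illustration; the sketch also assumes the caller has already converted the current time to the local time of the IP address's region, and it treats 20:00 itself as outside working hours.

```python
from datetime import datetime
from typing import Dict

WORK_SERVER = "work style server"
LEISURE_SERVER = "individual and leisure style server"


def is_working_time(local_time: datetime) -> bool:
    """Monday to Friday, from 8:00 up to (not including) 20:00 local time."""
    return local_time.weekday() < 5 and 8 <= local_time.hour < 20


def choose_server(ip: str, ip_attributes: Dict[str, str],
                  local_time: datetime) -> str:
    """Route a first request to a server class.

    `ip_attributes` is a stand-in for the IP address attribute database,
    mapping an IP address to "workplace" or any other value. Unknown
    addresses fall through to the leisure style (an assumption).
    """
    if ip_attributes.get(ip) == "workplace" and is_working_time(local_time):
        return WORK_SERVER      # Step 4: workplace IP during working hours
    return LEISURE_SERVER       # Step 5: everything else


if __name__ == "__main__":
    db = {"192.0.2.10": "workplace"}
    print(choose_server("192.0.2.10", db, datetime(2006, 2, 22, 10, 0)))
```

A workplace address visited on a weekend or late evening is routed to the leisure style, matching the "otherwise go to Step 5" branch of Step 4.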

Claims

1. A homologous information site search engine aggregation display method, comprising the following steps:

(3) the "homologous information processing module" queries the account information of buyers of the right to "become the title search result" and, in combination with judgment rules, chooses from the original search results the object to serve as the "title search result";

(4) the search engine Web server or application server shows the inquirer only the chosen "title search result" as the search result, and provides for the "title search result" a button meaning "expand to view details";

(5) the inquirer may also press the button corresponding to the "title search result", whereupon the search engine shows the inquirer the original search results found in (2).

2. The homologous information site search engine aggregation display method according to claim 1, characterized in that the processing flow of the "homologous information processing module" comprises the following steps:

(1) the information category judging module judges the category of the information received by the web page searcher, wherein the "homologous information processing module" includes the information category judging module;

(2) the information category judging module gathers information of the same category and sends it to the information processing module of the corresponding type;

(3) the search information processed by the information processing module of the corresponding type is filed into the "non-homologous result information library" or the "homologous result information library";

(4) the "non-homologous result information library" or the "homologous result information library" is published on the Web server;

wherein the "homologous information processing module" is made up of a plurality of "homologous information processing modules for corresponding information categories", comprising a "homologous web page processing module", "homologous multimedia processing module", "homologous picture processing module", "homologous document processing module", "homologous software processing module", "homologous data or database processing module", "homologous GIS information processing module", "same-value network service processing module", and "same-value business information processing module".

3. The homologous information site search engine aggregation display method according to claim 2, characterized in that the "homologous web page processing module" processes web page information in the following steps:

(1) when the search part of the search engine receives a keyword to be queried, the decision device for search results published on the Web server first judges whether this keyword has recently been queried by others; if it has, and the result is published on the search engine search results Web server, the search result is returned directly, with the web pages in the result that have the same source aggregated into one search result; after clicking the "same source web page" button, the search result page containing all the search results can be seen on the search engine search results Web server, and the whole query process ends;

(2) if, when the search part of the search engine receives a keyword to be queried, the decision device for search results published on the Web server judges that this keyword has not recently been queried by others, and no corresponding query result is published on the search engine search results Web server, then:

(3) the "web page content separator" decomposes the found web page content and hyperlink targets into the categories multimedia, picture, text, and hyperlink;

(4) the various content decision devices each produce a decision result:

(6) the multiplication results obtained in step (5) are added to obtain the web page's "homology degree SSS (Same Source Score)": SSS = (SMS*SMP) + (SPS*SPP) + (STS*STP) + (SHS*SHP);

(7) judge whether the "homology degree SSS" of this web page exceeds the threshold; if it does, the page is judged a "same source web page" of other web pages; if not, it is judged a "non-homologous web page";

(8) the "non-homologous web pages" produced in step (7) are stored in the "non-homologous web page result database" by the "non-homologous web page processing module"; the "same source web pages" produced in step (7) are stored in the "homologous web page result database" by the "homologous web page processing module";
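Steps (6) and (7) can be illustrated with a short sketch. The meaning of the individual factors (SMS, SMP, SPS, SPP, STS, STP, SHS, SHP) is set out in steps not reproduced in this passage, so the sketch treats them as opaque numeric pairs; the function names and the threshold value are hypothetical.

```python
from typing import Iterable, Tuple


def same_source_score(factor_pairs: Iterable[Tuple[float, float]]) -> float:
    """SSS = (SMS*SMP) + (SPS*SPP) + (STS*STP) + (SHS*SHP).

    Each tuple is one pair of factors, given in the order multimedia,
    picture, text, hyperlink. The pairs are multiplied and then summed.
    """
    return sum(a * b for a, b in factor_pairs)


def classify(sss: float, threshold: float) -> str:
    """Step (7): pages whose SSS exceeds the threshold are same-source."""
    if sss > threshold:
        return "same source web page"
    return "non-homologous web page"


if __name__ == "__main__":
    pairs = [(2.0, 0.5), (1.0, 0.25), (4.0, 0.75), (1.0, 1.0)]
    sss = same_source_score(pairs)
    print(sss, classify(sss, threshold=5.0))  # threshold is an assumed value
```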

4. The homologous information site search engine aggregation display method according to claim 2, characterized in that the processing flow of the "homologous information processing module" further comprises the following steps:

(1) receive the inquirer's search keyword, and have software judge the file or service sought from the keyword content and keyword syntax;

(2) judge "is the content to be searched published on the Web server?"; if the searched content is published on the "search engine search results Web server", return the search result directly, aggregating the multimedia acquisition interfaces in the result that meet the search condition and have the same source into one "title search result"; after clicking the "same source file" button, another web page containing all the search results can be seen on the "search engine search results Web server", so that the inquirer can see all search results meeting the query condition, and the search process ends; if the search target is not published on the "search engine search results Web server", proceed from step (3);

(3) return to the inquirer that there is no qualifying result;

A. the web page searcher searches for newly appearing target files, web pages, or service entrances, and software enters each entrance to obtain the file or service;

B. the "content decision device" judges whether the newly found information "belongs to the same content as content in the current 'homologous information index database'"; if "yes", it is added as a new element to the corresponding category of the "homologous information index database"; if "no", the "content decision device" judges whether it "belongs to the same content as content in the current 'non-homologous information index database'";

C. if "yes", then "create a new category for the current information and the information homologous with it stored in the 'non-homologous information index database', and transfer them all to the 'homologous information index database'"; if "no", then "create a new category for the current information and store it in the 'non-homologous information index database'";

5. The homologous information site search engine aggregation display method according to claim 4, characterized in that, when the "homologous information processing module" processes documents, the "homologous information index database" and the "non-homologous information index database" are updated as follows:

(1) the "document searcher" searches for newly appearing document files, web pages, or link entrances, and software enters each entrance to obtain the file or service;

(2) the "text content decision device" and the "picture content decision device" judge whether the newly found document content "belongs to the same content as content in the current 'homologous information index database'"; if "yes", it is added as a new element to the corresponding category of the "homologous information index database"; if "no", the "document content decision device" judges whether it "belongs to the same content as content in the current 'non-homologous information index database'";

(3) if "yes", then "create a new category for the current document and the documents homologous with it stored in the 'non-homologous information index database', and transfer them all to the 'homologous information index database'"; if "no", then "create a new category for the current document and store it in the 'non-homologous information index database'".

6. The homologous information site search engine aggregation display method according to any one of claims 3, 4, or 5, characterized in that the processing flow of the content decision device comprises the following steps:

(5) return to (1) and perform (1) to (4) on the next "attribute" to obtain the PSame of that attribute, until the PSame values of all attributes have been obtained;

(6) calculate and return the same-content degree value of the "judged object": SameMediaPower = sum of all PSame values / InputQuantity.

7. The homologous information site search engine aggregation display method according to any one of claims 3, 4, or 5, characterized in that, when the content decision device is a text content decision device, its processing flow comprises the following steps:

(1) find the total length SameLenth of the parts of the input text content that have identical words or sentences;

(3) return the text similarity value SameTextPower = SameLenth/MinLenth.

8. The homologous information site search engine aggregation display method according to claim 3 or 4, characterized in that, when the content decision device is a link content decision device, its processing flow comprises the following steps:

(2) count the similarity of the "judged objects": SameURLPower = the number of URL addresses that appear in all of the hyperlinks;

(3) return SameURLPower.

9. The homologous information site search engine aggregation display method according to claim 4, characterized in that, when the content decision device is a business information content decision device, its processing flow comprises the following steps:

(1) compare whether the business information being compared concerns the same product or service; if "no", return "inconsistent"; if "yes", go to step (2);

(2) judge whether the business information being compared is sensitive to geographic location; if "no", return the judgment "consistent"; if "yes", go to step (3);

10. The homologous information site search engine aggregation display method according to claim 1, characterized in that,

PWn = TP*PageFocus/(RespDelay-K)

n: this search result is the n-th

PageFocus: web page attention value

RespDelay: web service response delay

K: service response constant; a service delay below this value is not noticed,

TP: title search result right

11. The homologous information site search engine aggregation display method according to claim 10, characterized in that,

a. PWn = (TP+PageFocus)/(RespDelay-K), or

b. PWn = (TP+PageFocus)/RespDelay/K, or

c. PWn = TP*PageFocus/RespDelay/K.

12. The homologous information site search engine aggregation display method according to claim 1, characterized in that the "homologous information processing module":

(1) can be embedded in the search engine;

(2) can be placed between the "search engine" and the "search engine search results Web server";

(3) can also be placed, as a preprocessing module, between the "search engine" and the searched websites.

13. The homologous information site search engine aggregation display method according to claim 1, wherein the button meaning "expand to view details" may be a hyperlink or any of various software interface controls.