CN101025737B - Attention degree based same source information search engine aggregation display method - Google Patents

Publication number
CN101025737B
Authority
CN
China
Prior art keywords
search
content
web
information
webpage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2006100079057A
Other languages
Chinese (zh)
Other versions
CN101025737A (en)
Inventor
王东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN2006100079057A
Priority to PCT/CN2007/000370 (WO2007095834A1)
Priority to US12/279,949 (US8176029B2)
Publication of CN101025737A
Application granted
Publication of CN101025737B
Expired - Fee Related (current legal status)
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G06F16/9535: Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to an attention-based method and system for the aggregated display of same-source information in a search engine. The search engine finds all target sites that match the query as the original search results; aggregates them into a "title search result" according to content quality, the account information of purchasers of display weighting rights, quality of service, and other factors; and shows only the title search result to the inquirer as the final search result, revealing the full set of results only when the inquirer asks to view them. The system uses a web browser that cooperates with a statistics server: the browser converts all of the user's operations into a PageFocus score for the web page and sends that score back to the statistics server as a measure of the page's content quality, which the search engine can then use to select the "title search result" and to order results for display. The invention also relates to a method that automatically judges the user's state and serves a suitable web page style and content.

Description

Attention-based method for the aggregated display of homologous (same-source) information in a search engine
Technical field
The present invention relates to computer network technology, and in particular to search engine technology that uses computers to provide search services on the Internet or on an enterprise intranet. The invention also relates to a system for obtaining the attention that users pay to web pages, and to a device and method for adapting the style and content of a website.
Background technology
The Internet currently holds a large number of "web pages or network services of identical (or similar) origin", for example: 1 articles, opinions, and information pages written by the same person or organization and copied widely; 2 news reports gathered (or published) by the same person or organization and copied widely; 3 posts made by the same person or organization and reposted across BBS forums; 5 multimedia files produced by the same person or organization in different data formats and compression ratios; 6 executable programs, data, and design documents produced by the same person or organization; 7 information content produced in other ways and copied widely. Current search engines list these "web pages or network services of identical (or similar) origin" one by one in their results, where they take up a great deal of space with identical content and make the results inconvenient to browse.
Current search engines and web page ranking services all measure the popularity of a page only by click traffic and by the time visitors stay on the page. The main approaches are: 1) Search engines: popularity is calculated from clicks on search results, as in google and Baidu. 2) ALEXA website ranking: a toolbar embedded in the browser sends the user's hyperlink clicks and page dwell time back to a server (the parameters include the current page address and the time the page was opened), but no other means of assessment is included. For the working principle of Alexa, see:
http://www.singtaonet.com/it/it?sp/t20051110?43674.html
http://www.people.com.cn/GB/it/8219/41552/41597/3109586.html
Current websites fall into the following categories:
Category one: at any given moment, the website (for example, a news website) presents the same style and content to every user.
Category two: the website (for example, the news website of google) can show different styles and content according to the user's settings.
Neither kind of website, however, can offer different display styles and content according to the user's state in real time.
Summary of the invention
To remedy the above deficiencies, the invention provides a search method that aggregates search results which, because their content is identical, have the same use value to the searcher into a single record, the title search result. It also provides devices and methods for expanding that record to view the other results on demand, and devices and methods that automatically distribute clicks on the "title search result" across the other search result targets, so that frequent clicks on the "title search result" do not overload and paralyze the destination server. The invention also provides a system in which a web browser, cooperating with a statistics server on the network, converts all of the user's operations into a score for the web page and sends it back to the statistics server as a measure of the attention paid to the page, which can then serve as a ranking method and tool for the search engine. The invention further provides a method that uses whatever obtainable information helps to judge the user's environment and state in order to offer different display styles and content to users in different states, at the same moment, on the same website, and even on the same page.
To achieve these aims, a search method for the aggregated display of homologous (same-source) information in a site search engine comprises the following steps:
(1) the inquirer accesses the search engine through a Web browser or application software and enters the keywords to be queried;
(2) the search engine finds all qualifying target sites as the original search results;
(3) the "homologous information processing module" looks up the account information of purchasers of the right to "become the title search result" and, in combination with other decision rules, chooses from the original search results the object to serve as the "title search result";
(4) the search engine's Web server or application server shows the inquirer only the chosen "title search result" as the search result, and provides with it a "button" meaning "expand to view details or other information";
(5) when the inquirer presses the corresponding "button", the search engine then shows the original search results found in step (2).
The "homologous information processing module" consists of several "(per information type) homologous information processing modules", for example: "homologous web page processing module", "homologous multimedia processing module", "homologous picture processing module", "homologous document processing module", "homologous software processing module", "homologous data or database processing module", "homologous GIS information processing module", "same-value network service processing module", "same-value business information processing module", and so on.
The "homologous information processing module" operates in the following steps:
(1) the "information category judgment module" first determines the type of each piece of information received by the web searcher;
(2) information of the same type is sent together to the corresponding "(per information type) homologous information processing module";
(3) the search information processed by that module is filed into the "non-homologous (per information type) result information database" or the "homologous (per information type) result information database";
(4) the system publishes the "non-homologous (per information type) result information database" and the "homologous (per information type) result information database" to the Web server for inquirers to query. As an alternative implementation, a query service based on dynamic web pages can be provided to inquirers directly from these two databases.
The "homologous web page processing module" processes web page information in the following steps:
(1) when the "search engine search part" receives a keyword to query, the "decision device for whether search results have been published on the Web server" first determines whether the keyword has recently been queried by someone else. If it has, and the results are published on the "search engine search results Web server", the search results are returned directly, with web pages of identical origin aggregated into a single search result; clicking the "homologous web pages" button opens another results page on the "search engine search results Web server" that contains all the search results, and the query process ends;
(2) if the decision device finds that the keyword has not recently been queried by anyone else, and no corresponding results are published on the "search engine search results Web server", then:
A. the "web page searcher" searches the "non-homologous web results database" and the "homologous web results database" for web page addresses that match the keyword, and retrieves the content of those pages;
B. if the "web page searcher" finds no matching web page address in either database, the inquirer is told that "no qualifying web page was found", and the keyword is added to the next round's task of updating the "non-homologous web results database" and the "homologous web results database". If a qualifying web page address is found during the update, it is placed in the "non-homologous web results database" or the "homologous web results database" according to whether it has homologous web pages, so that the next person who searches for the same keyword will find the result;
(3) the "web page content separator" breaks the retrieved page content and hyperlink targets down into types such as multimedia, pictures, text, and hyperlinks;
(4) the content decision devices each produce a result:
A. the "multimedia content decision device" produces the "identical multimedia file degree SMS (Same Media Score)" of the target page;
B. the "picture content decision device" produces the "identical picture degree SPS (Same Photo Score)" of the target page;
C. the "text content decision device" produces the "identical text degree STS (Same Text Score)" of the target page;
D. the "link content decision device" produces the "identical hyperlink degree SHS (Same Hyperlinks Score)" of the target page;
(5) the "multimedia decision weight SMP", "picture decision weight SPP", "text decision weight STP", and "link decision weight SHP" are read from the "homologous web page decision rule base" and multiplied by the SMS, SPS, STS, and SHS produced in step (4), respectively;
(6) the products obtained in step (5) are added to give the page's "homology degree SSS (Same Source Score)": SSS=(SMS*SMP)+(SPS*SPP)+(STS*STP)+(SHS*SHP);
(7) the page's "homology degree SSS" is compared with a threshold: if it exceeds the threshold, the page is judged to be a "homologous web page" of the other pages; otherwise it is judged to be a "non-homologous web page";
(8) the "non-homologous web pages" from step (7) are stored in the "non-homologous web results database" by the "non-homologous web page processing module", and the "homologous web pages" from step (7) are stored in the "homologous web results database" by the "homologous web page processing module";
(9) the "search result web page publisher" generates static search result pages from the content of the "homologous web results database" and the "non-homologous web results database" and publishes them to the "search engine search results Web server", from which they are presented to the inquirer through the browser;
(10) as an alternative to step (9), the results can be presented to the inquirer through the browser directly by a "dynamic web page Web server".
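Steps (5) to (7) amount to a weighted sum followed by a threshold test. The following is a minimal sketch of that calculation, not the patented implementation; the class and function names, the example weights, and the example thresholds are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SameScores:
    """Outputs of the four content decision devices for one page."""
    sms: float  # Same Media Score
    sps: float  # Same Photo Score
    sts: float  # Same Text Score
    shs: float  # Same Hyperlinks Score


@dataclass(frozen=True)
class DecisionWeights:
    """Weights read from the decision rule base (values are hypothetical)."""
    smp: float  # multimedia decision weight
    spp: float  # picture decision weight
    stp: float  # text decision weight
    shp: float  # link decision weight


def same_source_score(scores: SameScores, weights: DecisionWeights) -> float:
    """SSS = SMS*SMP + SPS*SPP + STS*STP + SHS*SHP."""
    return (scores.sms * weights.smp
            + scores.sps * weights.spp
            + scores.sts * weights.stp
            + scores.shs * weights.shp)


def is_homologous(scores: SameScores, weights: DecisionWeights,
                  threshold: float) -> bool:
    """A page counts as homologous only when SSS exceeds the threshold."""
    return same_source_score(scores, weights) > threshold
```

With these definitions, a page whose text score dominates can cross the threshold even when its hyperlinks differ, which is the effect of giving STP a large weight.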
The "homologous information processing module" may also operate in the following steps:
(1) the inquirer's search keyword is received, and software determines from the keyword's content and syntax which files or network services need to be searched for;
(2) the module determines whether the content to be searched for has been published on the Web server. If the search target has been published on the "search engine search results Web server", the search results are returned directly, with the access entries of files or network services that satisfy the search condition and have an identical origin aggregated into one "title search result"; clicking the "homologous files" button opens another page on the "search engine search results Web server" that contains all the search results, so the inquirer can see every result that meets the query condition, and the search ends. If the search target has not been published on the "search engine search results Web server", processing continues from step (3);
(3) the inquirer is told that "no qualifying result was found";
(4) the keyword is added to the next round's task of updating the "homologous information index database" and the "non-homologous information index database", and the update of the two databases is started periodically;
(5) the "homologous information index database" and the "non-homologous information index database" are updated as follows:
A. the searcher searches for newly appearing target files and web pages or service entries, and software enters each entry to obtain the file or network service;
B. the "content decision device" determines whether the newly found information has the same content as an entry in the current "homologous information index database". If yes, it is added to that category of the "homologous information index database" as a new element; if no, the "content decision device" determines whether it has the same content as an entry in the current "non-homologous information index database";
C. if yes, a new category is created for the current information and the homologous information already stored in the "non-homologous information index database", and all of it is moved to the "homologous information index database";
D. if no, a new category is created for the current information, which is stored in the "non-homologous information index database";
(6) the "search result web page publisher" generates static search result pages from the content of the "homologous web results database" and the "non-homologous web results database" and publishes them to the "search engine search results Web server", from which they are presented through the browser to inquirers who come to search;
(7) as an alternative to step (6), the results can be presented to the inquirer through the browser directly by a "dynamic web page Web server".
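The index update in step (5) can be sketched as a three-way decision. This is an illustrative sketch only: the two indexes are modelled as in-memory lists, the `is_same_content` callback stands in for the "content decision device", and comparing a new item against the first member of each category is an assumption, since the text does not say which member is compared.

```python
def update_indexes(item, homologous_index, non_homologous_index,
                   is_same_content):
    """Place a newly found item following steps B, C and D.

    homologous_index: list of categories, each a list of same-source items.
    non_homologous_index: list of items with no known same-source partner.
    is_same_content: hypothetical callback standing in for the content
    decision device; returns True when two items have the same content.
    """
    # Step B: does the item match an existing homologous category?
    for category in homologous_index:
        if is_same_content(item, category[0]):
            category.append(item)
            return "joined existing category"

    # Step C: does it match items that were so far considered unique?
    matches = [other for other in non_homologous_index
               if is_same_content(item, other)]
    if matches:
        for other in matches:
            non_homologous_index.remove(other)
        homologous_index.append(matches + [item])
        return "created homologous category"

    # Step D: nothing matches, so the item is stored as non-homologous.
    non_homologous_index.append(item)
    return "stored as non-homologous"
```

Moving the earlier "unique" items in step C is what keeps the two indexes disjoint: an item stops being non-homologous the moment a second copy of its content is found.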
When the homologous information processing module handles documents, the "homologous information index database" and the "non-homologous information index database" are updated as follows:
A. the "document searcher" searches for newly appearing document files and web pages or link entries, and software enters each entry to obtain the document or service;
B. the "text content decision device" and the "picture content decision device" determine whether the newly found document has the same content as an entry in the current "homologous document index database". If yes, it is added to that category of the "homologous document index database" as a new element; if no, the "document content decision device" determines whether it has the same content as an entry in the current "non-homologous document index database";
C. if yes, a new category is created for the current document and the homologous documents already stored in the "non-homologous document index database", and all of them are moved to the "homologous document index database"; if no, a new category is created for the current document, which is stored in the "non-homologous document index database".
The content decision device module operates in the following steps:
(1) receive the "judged objects" (multimedia from several sources can be received) and record the number of judged objects, InputQuantity;
(2) look up the attributes of the "judged objects" that take part in the comparison, and record SameQuantity, the number of "judged objects" that have the same value for the current attribute;
(3) read the "weight" value Power that the current attribute carries in the decision;
(4) calculate the degree to which all "judged objects" agree on the current attribute: PSame=SameQuantity*Power;
(5) return to step (1) and repeat steps (1) to (4) for the next attribute to obtain its PSame, until the PSame values of all attributes have been obtained;
(6) calculate and return the identical-content degree of the "judged objects": SameMediaPower=(sum of all PSame values)/InputQuantity.
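The attribute loop above can be sketched as follows. The text does not say exactly how SameQuantity is counted, so this sketch assumes it is the size of the largest group of objects that share one value for the attribute, and zero when no two objects agree. The function name, the dictionary representation of objects, and the example attributes are also assumptions.

```python
from collections import Counter


def same_media_power(judged_objects, attribute_weights):
    """SameMediaPower = sum(SameQuantity * Power) / InputQuantity.

    judged_objects: list of dicts mapping attribute name to a hashable value.
    attribute_weights: dict mapping attribute name to its weight (Power).
    """
    input_quantity = len(judged_objects)
    if input_quantity == 0:
        return 0.0

    total_p_same = 0.0
    for attribute, power in attribute_weights.items():
        values = [obj[attribute] for obj in judged_objects
                  if attribute in obj]
        if not values:
            continue
        # Assumed reading of SameQuantity: the largest group of objects
        # sharing one value; a group of one means nothing matched.
        largest_group = Counter(values).most_common(1)[0][1]
        same_quantity = largest_group if largest_group > 1 else 0
        total_p_same += same_quantity * power  # PSame for this attribute

    return total_p_same / input_quantity
```

For three audio files that all last 120 seconds but use three different codecs, with a weight of 0.7 on duration and 0.3 on codec, only the duration attribute contributes to the result.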
When the content decision device module is the text content decision device, it operates in the following steps:
(1) find the total length SameLenth of the parts of the text contents that contain identical words or sentences;
(2) find MinLenth, the length of the shortest of the input text contents;
(3) return the text similarity value SameTextPower=SameLenth/MinLenth.
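One way to approximate SameLenth for two texts is to sum the matching blocks found by Python's `difflib.SequenceMatcher`. This is a swapped-in technique chosen for illustration; the patent does not name a matching algorithm. The sketch also assumes that lengths are measured in whitespace-separated words and that only two texts are compared at a time.

```python
from difflib import SequenceMatcher


def same_text_power(text_a: str, text_b: str) -> float:
    """SameTextPower = SameLenth / MinLenth, measured in words."""
    words_a = text_a.split()
    words_b = text_b.split()

    # MinLenth: length of the shorter input.
    min_length = min(len(words_a), len(words_b))
    if min_length == 0:
        return 0.0

    # SameLenth: total size of the word runs shared by both texts.
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    same_length = sum(block.size for block in matcher.get_matching_blocks())

    return same_length / min_length
```

Dividing by the shorter length means that a short excerpt copied whole into a longer page still scores close to 1, which suits the goal of detecting reposted content. Text without word separators, such as Chinese, would need a different tokenizer than `str.split`.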
When the content decision device module is the link content decision device, it operates in the following steps:
(1) receive the "judged objects": the URL addresses of several hyperlinks;
(2) compute the similarity of the "judged objects": SameURLPower=the number of target URL addresses that appear on every page pointed to by the judged hyperlinks;
(3) return SameURLPower.
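Counting the target URLs that occur on every judged page is a set intersection. The sketch below assumes the outgoing links of each page have already been fetched and are passed in as lists; the function name and the example URLs are illustrative.

```python
def same_url_power(links_per_page):
    """Return the number of target URLs that appear on every judged page.

    links_per_page: one iterable of outgoing link URLs per judged page.
    """
    link_sets = [set(links) for links in links_per_page]
    if not link_sets:
        return 0
    # URLs common to all pages are the intersection of the per-page sets.
    return len(set.intersection(*link_sets))
```

Because each page's links are reduced to a set first, a URL that is repeated several times on one page is still counted once.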
When the content decision device module is the business information content decision device, it operates in the following steps:
(1) compare whether the business information items under comparison concern the same product or service; if not, return "inconsistent"; if so, go to step (2);
(2) judge whether the business information under comparison is sensitive to geographic location; if not, return the result "consistent"; if so, go to step (3);
(3) judge whether the suppliers of the business information under comparison are in the same city or region; if not, return the result "inconsistent"; if so, return the result "consistent".
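The three checks form a short decision chain. In this sketch the `Offer` record and its field names are assumptions, and an offer pair is treated as location-sensitive if either offer is flagged as such, which the text leaves open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """One piece of business information (field names are hypothetical)."""
    product: str
    location_sensitive: bool
    region: str


def offers_consistent(a: Offer, b: Offer) -> bool:
    """Return True for "consistent", False for "inconsistent"."""
    # Step (1): different products or services never match.
    if a.product != b.product:
        return False
    # Step (2): location does not matter for this kind of offer.
    if not (a.location_sensitive or b.location_sensitive):
        return True
    # Step (3): location matters, so suppliers must share a city or region.
    return a.region == b.region
```

A downloadable software licence would pass at step (2) regardless of where the sellers are, while two restaurant delivery offers would only match when both serve the same city.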
The "title search result" is selected as follows:
(1) calculate the probability weight PWn with which each "homologous search result" becomes the "title search result":
PWn=TP*PageFocus/(RespDelay-K)
n: the index of the search result (this is the n-th result)
When (RespDelay-K) is less than or equal to zero, (RespDelay-K) takes the value 1
PageFocus: the attention score of the web page
RespDelay: the response delay of the web service
K: the service response constant; the suggested value of K is 50 milliseconds (ms).
TP: the title search result weighting right
(2) sum the probability weights PWn of all the original "homologous search results" to obtain the total probability weight PWall;
(3) calculate the probability with which each "homologous search result" becomes the "title search result": Pn=PWn/PWall;
(4) as searchers visit, select the "title search result" dynamically and at random according to the probabilities Pn, and present it to the searcher.
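The selection procedure is a weighted random draw. The following sketch uses the first formula, PWn=TP*PageFocus/(RespDelay-K), with K set to the suggested 50 ms; the `Candidate` record, the function names, and the uniform fallback when every weight is zero are assumptions added for illustration.

```python
import random
from dataclasses import dataclass

K_MS = 50.0  # service response constant suggested in the text


@dataclass(frozen=True)
class Candidate:
    """One homologous search result (field names are hypothetical)."""
    tp: float             # title search result weighting right
    page_focus: float     # attention score of the page
    resp_delay_ms: float  # response delay of the web service


def probability_weight(c: Candidate, k_ms: float = K_MS) -> float:
    """PWn = TP * PageFocus / (RespDelay - K), clamping the divisor to 1."""
    divisor = c.resp_delay_ms - k_ms
    if divisor <= 0:
        divisor = 1.0
    return c.tp * c.page_focus / divisor


def selection_probabilities(candidates):
    """Pn = PWn / PWall for each candidate."""
    weights = [probability_weight(c) for c in candidates]
    pw_all = sum(weights)
    if pw_all == 0:
        # Assumed fallback: with no usable weights, choose uniformly.
        return [1.0 / len(candidates)] * len(candidates)
    return [w / pw_all for w in weights]


def pick_title_result(candidates, rng=random):
    """Draw the index of the title search result according to Pn."""
    probabilities = selection_probabilities(candidates)
    return rng.choices(range(len(candidates)), weights=probabilities, k=1)[0]
```

Because the draw is repeated for each visit, clicks are spread across the homologous targets in proportion to Pn rather than always landing on the highest-scoring site, and slower servers (larger RespDelay) receive a smaller share.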
The probability weight PWn of the "title search result" can also be calculated as:
A. PWn=(TP+PageFocus)/(RespDelay-K), or
B. PWn=(TP+PageFocus)/RespDelay/K, or
C. PWn=TP*PageFocus/RespDelay/K.
The "homologous information processing module":
A. can be embedded in the search engine;
B. can be placed between the "search engine" and the "search engine search results Web server";
C. can also be placed, as a preprocessing module, between the "search engine" and the searched websites.
The button meaning "expand to view details or other information" can be a hyperlink or any kind of software interface control.
A system for obtaining the attention that web page users pay to search results comprises the PageFocus network servers, the PageFocus web browser, and a web page scoring server.
The PageFocus network servers comprise the PageFocus browser ID registration server, the PageFocusAccServer web page attention statistics server, the PageFocus browser online upgrade server, and a data encryption and decryption module;
The PageFocus web browser comprises a PageFocus browser ID registration module and an attention score (PageFocus) computing module.
The system works in the following steps:
(1) each "PageFocus web browser" either is given a globally unique ID when it is installed, or in use actively contacts the "PageFocus browser ID registration server" on the network to obtain a globally unique ID;
(2) the "PageFocus web browser" has the functions of a conventional web browser; in addition, it converts the user's operations on the browser and on the web page, according to weights, into an "attention score PageFocus" for the page, forms a "PageFocus packet", and sends it in encrypted form over a network protocol to the search engine's "PageFocusAccServer web page attention statistics server";
(3) on receiving a "PageFocus packet" from any "PageFocus web browser" in the world, the "PageFocusAccServer web page attention statistics server" adds the "attention score PageFocus" that the packet contains to the total of the corresponding web page;
(4) the "PageFocusAccServer web page attention statistics server" thus holds the "attention score PageFocus" of web pages worldwide. Processed in various ways, this information can serve as the basis on which the search engine ranks web pages, as the basis on which it selects the "title search result" among search results with identical content, or it can be published directly as a "web page popularity ranking" service.
The PageFocusAccServer web page attention statistics server may record scores using logarithms or scientific notation.
The PageFocus packet may be formed when the browser finally closes the web page, at regular intervals, or once the score has accumulated to a certain value.
The attention score PageFocus is formed according to the weights listed in the following table:
[Weight table reproduced only as images in the original publication: Figure G2006107905720060228D000091 and Figure G2006107905720060228D000101]
Note:
1 The weight values in the table are one embodiment; other values may also be used and fall within the scope of the invention.
The text reading speed is calculated as follows:
A. mouse wheel scrolling: text reading speed=(display area width/set width)*number of text lines per scroll/scroll time interval;
B. keyboard page turning: text reading speed=(display area width/set width)*number of text lines per page turn/page-turn time interval;
C. window scroll bar scrolling: text reading speed=(display area width/set width)*number of text lines per scroll/scroll time interval.
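All three cases share one formula, so a single function covers them. This is a minimal sketch; the parameter names and the choice of pixels and seconds as units are assumptions, since the text does not fix the units.

```python
def reading_speed(display_width: float, set_width: float,
                  lines_per_step: float, interval_seconds: float) -> float:
    """(display area width / set width) * lines per step / time interval.

    A "step" is one wheel scroll, one page turn, or one scroll bar move.
    The result is in width-normalized text lines per second.
    """
    if set_width <= 0 or interval_seconds <= 0:
        raise ValueError("set_width and interval_seconds must be positive")
    return (display_width / set_width) * lines_per_step / interval_seconds
```

The width ratio normalizes for window size: the same number of lines in a window twice as wide as the set width counts as twice as much text read.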
The PageFocus packet comprises the fields PageFocus browser ID, web page URL, and web page PageFocus score.
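The packet and the accumulation performed by the statistics server can be sketched as below. The class names are hypothetical, the store is an in-memory dictionary, and the encryption and network transport described in the text are deliberately left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PageFocusPacket:
    """The three fields named in the text (Python names are assumptions)."""
    browser_id: str  # globally unique PageFocus browser ID
    url: str         # web page URL
    score: float     # PageFocus score earned by this page


class PageFocusAccumulator:
    """Stand-in for the web page attention statistics server."""

    def __init__(self):
        self._totals = defaultdict(float)

    def receive(self, packet: PageFocusPacket) -> None:
        # Add the packet's score to the running total of its page.
        self._totals[packet.url] += packet.score

    def total(self, url: str) -> float:
        return self._totals.get(url, 0.0)

    def ranking(self):
        """Pages ordered from highest to lowest accumulated score."""
        return sorted(self._totals.items(),
                      key=lambda item: item[1], reverse=True)
```

The `ranking` method corresponds to the "web page popularity ranking" use of the accumulated scores; the same totals could feed the PageFocus term of the title result selection.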
When a web page that has "homologous web pages" takes part in the page ranking provided by the search engine, the sum of the user attention PageFocus scores obtained by all of its "homologous web pages" can be used as the ranking basis. That is: A. the "title search result" of a group of "homologous web pages" can use the sum of the user attention PageFocus scores obtained by every page in the group as its ranking basis when it takes part in search result ranking; B. each individual page in a group of "homologous web pages" can likewise use the sum of the user attention PageFocus scores obtained by the pages of its group as its ranking basis when it takes part in search result ranking.
A method for automatically judging the user's state and serving a suitable web page style and content comprises the following steps:
(1) when the "website server cluster entry" receives a user's first request for a page of the website, it first obtains the user's IP address from the access protocol or the IP layer protocol;
(2) it looks the IP address up in the "IP address attribute database" to determine whether it is a "workplace IP address" or a "personal or leisure-venue IP address"; for a "workplace IP address" it proceeds to step (3), and for a "personal or leisure-venue IP address" it proceeds to step (4);
(3) it obtains the geographic location of the "workplace IP address" and the local official time of that region; if the region is within working hours, the visit is assigned to the "work style server", which serves pages suited to workplace use; otherwise it proceeds to step (4);
(4) the visit is assigned to the "personal and leisure style server", which serves pages suited to personal and leisure use.
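The routing decision can be sketched as a lookup followed by a local-time check. Everything concrete here is an assumption: the in-memory `IP_ATTRIBUTES` table stands in for the "IP address attribute database", the addresses come from documentation-only ranges, each region is reduced to a fixed UTC offset, and working hours are taken as Monday to Friday, 9:00 to 18:00.

```python
from datetime import datetime, time, timedelta, timezone

WORK_START = time(9, 0)
WORK_END = time(18, 0)

# Hypothetical stand-in for the IP address attribute database:
# address -> (kind of venue, UTC offset of its region in hours).
IP_ATTRIBUTES = {
    "203.0.113.10": ("workplace", 8),
    "198.51.100.7": ("personal", 8),
}


def choose_style(ip: str, now_utc: datetime,
                 ip_attributes=IP_ATTRIBUTES) -> str:
    """Return "work" or "leisure" for a request from the given address.

    now_utc must be a timezone-aware datetime.
    """
    # Step (2): unknown addresses are treated as personal (an assumption).
    kind, utc_offset = ip_attributes.get(ip, ("personal", 0))
    if kind != "workplace":
        return "leisure"  # step (4)

    # Step (3): convert to the local time of the address's region.
    local = now_utc.astimezone(timezone(timedelta(hours=utc_offset)))
    is_weekday = local.weekday() < 5  # Monday is 0, Friday is 4
    in_working_hours = WORK_START <= local.time() < WORK_END
    return "work" if is_weekday and in_working_hours else "leisure"
```

In a deployment the returned label would select between the "work style server" and the "personal and leisure style server"; here it is just a string so the decision logic can be checked on its own.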
With the above scheme, search results that have identical content and the same use value to the searcher can be aggregated into a single record, the title search result, and expanded on demand to view the other results. A device is also provided that automatically distributes clicks on the "title search result" across the other search result targets, so that frequent clicks on the "title search result" do not overload and paralyze the destination server. Besides the functions of existing search engines, the invention can also search network services for various "multimedia", "documents", "software", "hardware and software source code or design documents", "data or databases", and "information", for example file sharing, FTP services, and P2P services.
A web browser that cooperates with a statistics server on the network converts all of the user's operations into a score for the web page and sends it back to the statistics server as a measure of the attention paid to the page, which can then serve as a ranking tool for the search engine.
With the website content and style adaptation method, users are served as follows:
1. 9:00 to 18:00 on Monday to Friday is working time; people at work need a concise, relatively rigorous style and, as far as possible, work-related content.
2. 18:00 to 9:00 the next morning on Monday to Friday, and all day on Saturday and Sunday, is leisure time; people at leisure need a lively, relaxed style and content.
3. People in a workplace need a concise, relatively rigorous style and, as far as possible, work-related content.
4. People at home or in a leisure venue need a lively, relaxed style and content.
5. People in other environments or states need a style and content suited to their environment and state at the time.
Brief Description Of Drawings
Fig. 1 is the system working structure diagram of the homologous information site search engine aggregation display method;
Fig. 2 is the internal structure diagram of the homologous information processing module;
Fig. 3 is the flow chart of the homologous web page processing module;
Fig. 4 is the flow chart of the homologous multimedia processing module;
Fig. 5 is the flow chart of the homologous picture processing module;
Fig. 6 is the flow chart of the homologous document processing module;
Fig. 7 is the flow chart of the homologous software processing module;
Fig. 8 is the flow chart of the homologous data or database processing module;
Fig. 9 is the flow chart of the homologous GIS information processing module;
Fig. 10 is the flow chart of the same-value network service processing module;
Fig. 11 is the flow chart of the same-value business information processing module;
Fig. 12 is the structure diagram of the system for obtaining web page user attention;
Fig. 13 is an existing conventional search engine website system without the content and style adaptation technique;
Fig. 14 is a search engine website system of the invention with the content and style adaptation technique.
Embodiment
Now the present invention is described further in conjunction with the accompanying drawings.
Fig. 1 is a structural diagram of the system operation of the homologous information site search engine aggregation display method.
The 1st step: the inquirer accesses the search engine through a Web browser or application software and enters the keyword to be queried.
The 2nd step: the search engine finds all qualifying target sites as the "original search results".
The 3rd step: the "homologous information processing module" queries the account information of buyers of the right to "become the title search result" and, in combination with other judgment rules, selects from the "original search results" the objects to be used as "title search results". A. The "homologous information processing module" can be embedded in the search engine; B. the "homologous information processing module" can be placed between the "search engine" and the "search engine search results Web server"; C. the "homologous information processing module" can also serve as a pre-processing module placed between the "search engine" and the searched websites.
The 4th step: the search engine Web server or application server shows the inquirer only the selected "title search results" as search results, and provides for each of them a "button" (including a hyperlink or any of various software interface controls) meaning "expand to view details or other information".
The 5th step: only when the inquirer wishes to expand a particular "title search result" and presses its corresponding "button" does the search engine show the inquirer the "original search results" found in the 2nd step.
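The five steps above can be sketched in Python as follows. This is a minimal illustration, not the implementation of the invention: it assumes that each original result already carries a source identifier assigned by the "homologous information processing module", and it simply takes the first result of each source as the "title search result", which stands in for the selection rule of the 3rd step. The names `TitleResult`, `aggregate` and `expand` are hypothetical.

```python
from collections import OrderedDict
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TitleResult:
    source_id: str
    title_url: str  # the single result shown to the inquirer at first
    members: List[str] = field(default_factory=list)  # all original results of this source


def aggregate(original_results: List[Tuple[str, str]]) -> List[TitleResult]:
    """Group (url, source_id) pairs by source, keeping first-seen order."""
    groups = OrderedDict()
    for url, source_id in original_results:
        if source_id not in groups:
            # Placeholder rule: the first URL seen becomes the title result.
            groups[source_id] = TitleResult(source_id, url)
        groups[source_id].members.append(url)
    return list(groups.values())


def expand(title_result: TitleResult) -> List[str]:
    """The 5th step: pressing the button reveals every original result of the source."""
    return list(title_result.members)
```

Only `title_url` is displayed in the 4th step; `expand` is called when the inquirer presses the corresponding button.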
Fig. 2 is an internal structure diagram of the homologous information processing module. The "homologous information processing module" is defined as follows. 1) It is mainly used to judge whether, within a group of information nodes found for a search keyword, several nodes are duplicate sites of one or more common information sources (such sites have the same search value or use value for the inquirer and usually need not all be shown to the inquirer directly), to aggregate these duplicate sites into one search result sent to the inquirer, and to present the remaining results only when the inquirer needs the other sites of equal value. 2) Unlike existing search engines, whose search concentrates mainly on web pages, the "homologous information processing module" handles not only "Html web pages" but also various "multimedia", "documents", "software", "hardware and software source code or design documents", "data or databases" and "information", as well as various network services, for example file sharing, FTP services and P2P services.
The "homologous information processing module" has a modular structure: its modules can be developed and implemented step by step as required, it can be extended, and each module can further improve the accuracy of its automatic judgments. The modules are:
1 "Information category judgment module": judges the category of information and sends information of the same type together to the processing module for that type, as listed below.
2 "Homologous web page processing module": judges and handles found web pages that belong to the same source and have equal value for the inquirer, for example Html, ASP, JSP and PHP pages and the content of BBS forums.
3 "Homologous multimedia processing module": judges and handles found multimedia files or network services that belong to the same source and have equal value for the inquirer, for example .MP3, .AVI, .WMV, .MPEG, .WAV and .RM files and other audio and video files, as well as access entries to various video services based on streaming media technology.
4 "Homologous picture processing module": judges and handles found pictures that belong to the same source or have identical content and have equal value for the inquirer, for example .GIF, .JPG, .BMP and .PNG.
5 "Homologous document processing module": judges and handles found document files of various formats, or network services, that belong to the same source, have identical or related content and have equal value for the inquirer, for example ".Doc", ".Txt", ".Pdf", ".XLS" and ".PPT".
6 "Homologous software processing module": judges and handles found installation programs of computer application software that belong to the same software by the same author; they may be installation programs for the same or different operating systems and of the same or different versions.
7 "Homologous data or database processing module": judges and handles found data files or database files of known format that belong to the same source or have identical content and have equal value for the inquirer, for example .DAT, .XLS, .MDF and .DBF.
8 "Homologous GIS information processing module": judges and handles found digital map files or services that belong to the same source or have identical content and have equal value for the inquirer.
9 "Equal-value network service processing module": judges and handles found network services that belong to the same source or have identical content and have equal value for the inquirer, for example FTP download services for the same file, IPTV services relaying the same TV station, and mail services that each provide 1 GB of capacity.
10 "Equal-value business information processing module": judges and handles found advertising content, published over the network for one's own commercial products or services, that belongs to the same source or has identical content, lies in the same geographical or administrative region and has equal value for the inquirer, for example egg sales information offered in the same neighborhood, haircut service sales information offered in the same neighborhood, and telephone communication services usable in the same city.
"Information category judgment module"
The "information category judgment module" is mainly used to sort collected information by type and deliver it to the corresponding information processing module.
The information sources handled by the "information category judgment module" mainly take 3 forms:
(1) Web page form: the information comes from the web page content of a website; the web page may also contain hyperlinks that point to particular file types, for example: "http://www.008.org.cn/up/the_quiet_american.mp3"
(2) Network service form: this covers the network service entries provided by various network servers, for example FTP file download services, the seed services of various P2P (Peer To Peer) software (for example BT download, eMule download) and NEWS SERVER services. A network service entry can become known in two ways:
A. Network services that can be found on web pages: the network service entry is learned by analyzing web page content.
B. The network service provider submits its network service entry or content directly to this search engine.
(3) Data or database form: the search engine directly offers an information entry service to the network, and network users submit their own information, which finally forms data files or a database. When this search engine is queried, information is extracted from them to satisfy the inquirer's request.
The category of "web page form" information is determined as follows:
The web page itself can be output directly as a "web page" to the "homologous web page processing module" for processing. In addition, from the "hyperlink" syntax of the web page language (for example Html, Java, JSP, ASP, ASPX, PHP and so on), the "information category judgment module" can directly parse the type of the file a hyperlink points to and distinguish the information type according to the file type; see the following table for details:
For example:
1. The web page contains the hyperlink "Http://xxx/xxx/song.mp3": its target can be judged to be "multimedia" type information.
2. The web page contains the hyperlink "Http://xxx/xxx/song.rar": after the target file is found and decompressed, it is found to contain only "song.mp3", so the target can still be judged to be "multimedia" type information.
3. The web page contains the hyperlink "Http://xxx/xxx/song.rar": after the target file is found and decompressed, the number of files it contains and the name and size of each file and directory are found to be identical to the installation disc of a known piece of software, so it can be judged to be "software" type information.
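Examples 1 and 2 above can be sketched in Python as follows. The extension table is an assumption assembled from the file types named in the module list of this description; it is not the classification table referred to above. Treating ".rar" as the only archive format, the function name `classify_link` and the "unknown" fallback are likewise hypothetical. Example 3 (comparison with a known installation disc) is not modelled.

```python
import posixpath
from urllib.parse import urlparse

# Assumed mapping, built from the extensions listed for each processing module.
EXTENSION_TYPES = {
    ".html": "webpage", ".htm": "webpage", ".asp": "webpage",
    ".jsp": "webpage", ".php": "webpage",
    ".mp3": "multimedia", ".avi": "multimedia", ".wmv": "multimedia",
    ".mpeg": "multimedia", ".wav": "multimedia", ".rm": "multimedia",
    ".gif": "picture", ".jpg": "picture", ".bmp": "picture", ".png": "picture",
    ".doc": "document", ".txt": "document", ".pdf": "document", ".ppt": "document",
    ".dat": "data", ".mdf": "data", ".dbf": "data",
}
ARCHIVE_EXTENSIONS = {".rar"}


def classify_link(url, archive_members=None):
    """Return the information type of a hyperlink target, judged by file extension.

    archive_members is the list of file names found after decompressing an
    archive target; it is supplied by the caller in this sketch.
    """
    ext = posixpath.splitext(urlparse(url).path)[1].lower()
    if ext in ARCHIVE_EXTENSIONS:
        if not archive_members:
            return "unknown"
        # An archive takes the type of its content only if all members agree.
        kinds = {classify_link(name) for name in archive_members}
        return kinds.pop() if len(kinds) == 1 else "unknown"
    return EXTENSION_TYPES.get(ext, "unknown")
```

A real judgment module would have to download and decompress the archive itself; here that step is left to the caller so that the sketch stays self-contained.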
The category of "network service form" information is determined as follows:
The 1st step: access the service as an ordinary user in order to obtain its content.
The 2nd step: classify the content obtained according to the following table.
Figure G2006107905720060228D000171
The 3rd step: if a compressed-format file is obtained, it must be decompressed and its content then classified according to the 2nd step.
The category of "data or database form" information is determined as follows:
The 1st step: access the data file or database in order to obtain its content.
The 2nd step: if the information obtained from the data file or database is a file, go directly to the 4th step.
The 3rd step: if the information obtained from the data file or database is the location where a file is stored, access that location to obtain the target file.
The 4th step: classify the content obtained according to the following table.
Figure G2006107905720060228D000181
The 5th step: if a compressed-format file is obtained, it must be decompressed and its content then classified according to the 4th step.
"Homologous web page processing module"
Fig. 3 is a flow chart of the "homologous web page processing module". Its main function is to present web pages that were found for the search keyword and share the same main content to the inquirer in the form of a "title search result"; through the button meaning "expand", the inquirer can see all of the found web pages that share that main content. To substantially improve the performance of the system, we have adopted the following techniques:
A web page publishing technique is used: the "search result web page distributor" publishes search results in advance to the "search engine search results Web server", so that search requests that have already been queried are answered directly, which avoids the heavy computation of dynamically generating web pages from the database on each request.
The "homologous information processing module" stores results by category in the "non-homologous web page result database" and the "homologous web page result database", and the "search result web page distributor" periodically publishes them to the "search engine search results Web server", which avoids repeated computation and reduces the waiting time for computation.
The processing flow of the "homologous information processing module" is as follows:
The 1st step: when the "search engine search part" receives a keyword to be queried, the "decision device for whether search results have been published on the Web server" first judges whether the keyword has recently been queried by someone else. If it has, and the results have been published on the "search engine search results Web server", the search results are returned directly (see mark "M1" in the figure). Web pages with the same source in these results are aggregated into one search result; after clicking the "homologous web page" button, the inquirer can see on the "search engine search results Web server" another search result page containing all of the search results, which completes the query.
The 2nd step: if, when the "search engine search part" receives a keyword to be queried, the "decision device for whether search results have been published on the Web server" judges that the keyword has not recently been queried by anyone else and no corresponding query results have been published on the "search engine search results Web server", then:
The "web page searcher" is started to search the "non-homologous web page result database" and the "homologous web page result database" for web page addresses that match the search keyword and to obtain the content of those web pages.
If the "web page searcher" finds no web page address matching the search keyword in the "non-homologous web page result database" and the "homologous web page result database", the result "no eligible web page" is returned to the inquirer, and the search keyword is added to the next round's task of updating the two databases. If a matching web page address is found during the update, it is stored in the "non-homologous web page result database" or the "homologous web page result database" according to whether it has homologous web pages, so that the result can be found the next time someone searches for the same keyword.
The 3rd step: the "web page content separator" decomposes the web page content and hyperlink targets found into categories such as multimedia, pictures, text and hyperlinks.
The 4th step: the various content decision devices each produce a decision result:
A. The "multimedia content decision device" produces the "degree of identical multimedia files SMS" (Same Media Score) of the target web page. (Multimedia here includes: Flash playback services; playback or file services for video/audio files; playback or file services for real-time information such as IPTV, live satellite broadcast, audio-video monitoring, performances and manual answering; and other multimedia services.)
B. The "picture content decision device" produces the "degree of identical pictures SPS" (Same Photo Score) of the target web page.
C. The "text content decision device" produces the "degree of identical text STS" (Same Text Score) of the target web page.
D. The "link content decision device" produces the "degree of identical hyperlinks SHS" (Same Hyperlinks Score) of the target web page.
The 5th step: obtain the "multimedia decision weight SMP", "picture decision weight SPP", "text decision weight STP" and "link decision weight SHP" from the "homologous web page decision rule base", and multiply them respectively by the "degree of identical multimedia files SMS", "degree of identical pictures SPS", "degree of identical text STS" and "degree of identical hyperlinks SHS" generated in the 4th step.
The 6th step: add up the products obtained in the 5th step to obtain the "homology degree SSS" (Same Source Score) of the web page: SSS = (SMS*SMP) + (SPS*SPP) + (STS*STP) + (SHS*SHP)
The 7th step: judge whether the "homology degree SSS" of the web page exceeds the threshold. If it does, the web page is judged to be a "homologous web page" of other web pages; if it does not, the web page is judged to be a "non-homologous web page".
The 8th step: the "non-homologous web pages" produced in the 7th step are stored in the "non-homologous web page result database" by the "non-homologous web page processing module"; the "homologous web pages" produced in the 7th step are stored in the "homologous web page result database" by the "homologous web page processing module".
The 9th step: the "search result web page distributor" dynamically generates static Web pages of the search results from the content of the "homologous web page result database" and the "non-homologous web page result database" and publishes them to the "search engine search results Web server", from which they are presented to the inquiring user through the browser (see mark "M2" in the figure).
As an alternative implementation of the 9th step, the results can also be presented directly to the inquiring user through the browser by the "dynamic web page Web server" (see mark "M3" in the figure).
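The weighted sum of the 5th and 6th steps and the threshold test of the 7th step can be written out as a short Python sketch. The formula is the one given above; the numeric scores, weights and threshold used in the usage note are hypothetical, since the description does not give concrete values or a value range.

```python
def homology_degree(sms, sps, sts, shs, smp, spp, stp, shp):
    """SSS = (SMS*SMP) + (SPS*SPP) + (STS*STP) + (SHS*SHP)."""
    return (sms * smp) + (sps * spp) + (sts * stp) + (shs * shp)


def is_homologous(sss, threshold):
    """The 7th step: a page is a homologous web page if SSS exceeds the threshold."""
    return sss > threshold
```

For instance, with assumed scores SMS=1.0, SPS=0.5, STS=1.0, SHS=0.0 and assumed weights SMP=0.25, SPP=0.25, STP=0.5, SHP=0.0, the sum is 0.25 + 0.125 + 0.5 + 0 = 0.875, which exceeds an assumed threshold of 0.8, so the page would be classed as a homologous web page.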
" web page contents sorter " can be realized by software, direct basis " Html grammer ", " ASP/ASPX grammer ", and " PHP ", the syntax parsing that uses on the various webpages such as " JSP " goes out the type of each content.
" homology multimedia processing module "
Fig. 4 is a flow chart of the "homologous multimedia processing module". For multimedia files or services that meet the search condition, the "homologous multimedia processing module" provides them to the inquirer as hyperlinks in an Html web page. To substantially improve the performance of the system, we have adopted the following techniques:
A web page publishing technique is used: the "search result web page distributor" publishes search results in advance to the "search engine search results Web server", so that search requests that have already been queried are answered directly, which avoids the heavy computation of dynamically generating web pages from the database on each request.
The "homologous information processing module" stores results by category in the "non-homologous multimedia index database" and the "homologous multimedia index database", and the "search result web page distributor" periodically publishes them to the "search engine search results Web server", which avoids repeated computation and reduces the waiting time for computation.
The processing flow of the "homologous multimedia processing module" is as follows:
The 1st step: receive the inquirer's search keyword, and have software judge from the keyword content and keyword syntax that what is sought is a multimedia file or service (for example, a keyword containing "MP3" indicates that .MP3 files are sought rather than web pages containing this text).
The 2nd step: judge whether the content to be searched has been published on the Web server. If the search target has been published on the "search engine search results Web server", the search results are returned directly (see mark "M1" in the figure): the access entries of multimedia in the results that meet the search condition and share the same source are aggregated into one "title search result"; after clicking the "homologous file" button, the inquirer can see on the "search engine search results Web server" another web page containing all of the search results that meet the query condition, and the search ends. If the search target has not been published on the "search engine search results Web server", proceed from the 3rd step.
The 3rd step: return the result "no eligible multimedia" to the inquirer.
The 4th step: add the search keyword to the next round's task of updating the "homologous multimedia index database" and the "non-homologous multimedia index database", and periodically start the update process of the two databases.
The 5th step: the update process of the "homologous multimedia index database" and the "non-homologous multimedia index database":
A. The "multimedia searcher" searches for newly appearing multimedia files and for web page or service entries, and software enters each entry to obtain the file or service.
B. The "multimedia content decision device" judges whether the newly found multimedia content is the same content as content in the current "homologous multimedia index database". If "yes", it is added as a new element to the matching category of the "homologous multimedia index database". If "no", the "multimedia content decision device" judges whether it is the same content as content in the current "non-homologous multimedia index database".
C. If "yes": a new category is created for the current multimedia and the multimedia homologous with it that is stored in the "non-homologous multimedia index database", and all of them are moved to the "homologous multimedia index database". If "no": a new category is created for the current multimedia and stored in the "non-homologous multimedia index database".
The 6th step: the "search result web page distributor" dynamically generates static Web pages of the search results from the content of the "homologous web page result database" and the "non-homologous web page result database" and publishes them to the "search engine search results Web server", from which they are presented through the browser to the inquirer who comes to search (see mark "M2" in the figure).
As an alternative implementation of the 6th step, the results can also be presented directly to the inquiring user through the browser by the "dynamic web page Web server" (see mark "M3" in the figure).
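The update process of the 5th step (parts B and C) can be sketched in Python as follows. This is an illustration under stated assumptions: the homologous index is modelled as a list of categories, the non-homologous index as a flat list of single items, and `same_file_name` is a hypothetical stand-in for the "multimedia content decision device", whose actual comparison method is not specified here.

```python
def same_file_name(a, b):
    """Hypothetical stand-in for the content decision device: compare file names only."""
    return a.rsplit("/", 1)[-1] == b.rsplit("/", 1)[-1]


def update_indexes(item, homologous, non_homologous, same_content=same_file_name):
    """Place a newly found item and report what happened.

    homologous: list of categories, each a list of items with the same content.
    non_homologous: list of items that so far have no homologous partner.
    """
    # Part B: does the item match a category already in the homologous index?
    for category in homologous:
        if same_content(item, category[0]):
            category.append(item)
            return "joined"
    # Part B, second test: does it match items in the non-homologous index?
    matches = [x for x in non_homologous if same_content(item, x)]
    if matches:
        # Part C, "yes": build a new category and move everything over.
        for x in matches:
            non_homologous.remove(x)
        homologous.append(matches + [item])
        return "promoted"
    # Part C, "no": the item stays on its own in the non-homologous index.
    non_homologous.append(item)
    return "new"
```

Calling `update_indexes` three times with the same file name from three different hosts would first store the item as "new", then "promote" the pair to the homologous index, and then let the third copy "join" the existing category.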
" homology picture processing module "
Fig. 5 is a flow chart of the homologous picture processing module. For picture files or links that meet the search condition, the "homologous picture processing module" provides them to the inquirer as hyperlinks in an Html web page. To substantially improve the performance of the system, we have adopted the following techniques:
A web page publishing technique is used: the "search result web page distributor" publishes search results in advance to the "search engine search results Web server", so that search requests that have already been queried are answered directly, which avoids the heavy computation of dynamically generating web pages from the database on each request.
The "homologous information processing module" stores results by category in the "non-homologous picture index database" and the "homologous picture index database", and the "search result web page distributor" periodically publishes them to the "search engine search results Web server", which avoids repeated computation and reduces the waiting time for computation.
The processing flow of the "homologous picture processing module" is as follows:
The 1st step: receive the inquirer's search keyword, and have software judge from the keyword content and keyword syntax that what is sought is a picture file or link (for example, a keyword containing ".JPG" indicates that .JPG files are sought rather than web pages containing this text).
The 2nd step: judge whether the content to be searched has been published on the Web server. If the search target has been published on the "search engine search results Web server", the search results are returned directly (see mark "M1" in the figure): the access entries of pictures in the results that meet the search condition and share the same source are aggregated into one "title search result"; after clicking the "homologous file" button, the inquirer can see on the "search engine search results Web server" another web page containing all of the search results that meet the query condition, and the search ends. If the search target has not been published on the "search engine search results Web server", proceed from the 3rd step.
The 3rd step: return the result "no eligible picture" to the inquirer.
The 4th step: add the search keyword to the next round's task of updating the "homologous picture index database" and the "non-homologous picture index database", and periodically start the update process of the two databases.
The 5th step: the update process of the "homologous picture index database" and the "non-homologous picture index database":
A. The "picture searcher" searches for newly appearing picture files and for web page or link entries, and software enters each entry to obtain the file or service.
B. The "picture content decision device" judges whether the newly found picture content is the same content as content in the current "homologous picture index database". If "yes", it is added as a new element to the matching category of the "homologous picture index database". If "no", the "picture content decision device" judges whether it is the same content as content in the current "non-homologous picture index database".
C. If "yes": a new category is created for the current picture and the pictures homologous with it that are stored in the "non-homologous picture index database", and all of them are moved to the "homologous picture index database". If "no": a new category is created for the current picture and stored in the "non-homologous picture index database".
The 6th step: the "search result web page distributor" dynamically generates static Web pages of the search results from the content of the "homologous web page result database" and the "non-homologous web page result database" and publishes them to the "search engine search results Web server", from which they are presented through the browser to the inquirer who comes to search (see mark "M2" in the figure).
As an alternative implementation of the 6th step, the results can also be presented directly to the inquiring user through the browser by the "dynamic web page Web server" (see mark "M3" in the figure).
"Homologous document processing module"
Fig. 6 is a flow chart of the homologous document processing module. The "homologous document processing module" supports common document formats: ".Txt", ".Doc", ".PPT", ".PDF", ".XLS" and so on. For document files or links that meet the search condition, the "homologous document processing module" provides them to the inquirer as hyperlinks in an Html web page. To substantially improve the performance of the system, we have adopted the following techniques:
A web page publishing technique is used: the "search result web page distributor" publishes search results in advance to the "search engine search results Web server", so that search requests that have already been queried are answered directly, which avoids the heavy computation of dynamically generating web pages from the database on each request.
The "homologous information processing module" stores results by category in the "non-homologous document index database" and the "homologous document index database", and the "search result web page distributor" periodically publishes them to the "search engine search results Web server", which avoids repeated computation and reduces the waiting time for computation.
The processing flow of the "homologous document processing module" is as follows:
The 1st step: receive the inquirer's search keyword, and have software judge from the keyword content and keyword syntax that what is sought is a document file or link (for example, a keyword containing ".PDF" indicates that .PDF files are sought rather than web pages containing this text).
The 2nd step: judge whether the content to be searched has been published on the Web server. If the search target has been published on the "search engine search results Web server", the search results are returned directly (see mark "M1" in the figure): the access entries of documents in the results that meet the search condition and share the same source are aggregated into one "title search result"; after clicking the "homologous file" button, the inquirer can see on the "search engine search results Web server" another web page containing all of the search results that meet the query condition, and the search ends. If the search target has not been published on the "search engine search results Web server", proceed from the 3rd step.
The 3rd step: return the result "no eligible document" to the inquirer.
The 4th step: add the search keyword to the next round's task of updating the "homologous document index database" and the "non-homologous document index database", and periodically start the update process of the two databases.
The 5th step: the update process of the "homologous document index database" and the "non-homologous document index database":
A. The "document searcher" searches for newly appearing document files and for web page or link entries, and software enters each entry to obtain the file or service.
B. The "text content decision device" and the "picture content decision device" judge whether the newly found document content is the same content as content in the current "homologous document index database". If "yes", it is added as a new element to the matching category of the "homologous document index database". If "no", the "document content decision device" judges whether it is the same content as content in the current "non-homologous document index database".
C. If "yes": a new category is created for the current document and the documents homologous with it that are stored in the "non-homologous document index database", and all of them are moved to the "homologous document index database". If "no": a new category is created for the current document and stored in the "non-homologous document index database".
The 6th step: the "search result web page distributor" dynamically generates static Web pages of the search results from the content of the "homologous web page result database" and the "non-homologous web page result database" and publishes them to the "search engine search results Web server", from which they are presented through the browser to the inquirer who comes to search (see mark "M2" in the figure).
As an alternative implementation of the 6th step, the results can also be presented directly to the inquiring user through the browser by the "dynamic web page Web server" (see mark "M3" in the figure).
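The 2nd to 4th steps of this flow (answer from the published results if possible, otherwise report that nothing qualifies and queue the keyword for the next update round) can be sketched in Python as follows. The dictionary `published` and the set `pending` are hypothetical in-memory stand-ins for the "search engine search results Web server" and the update task list; an empty list stands for the "no eligible document" result.

```python
def handle_query(keyword, published, pending):
    """Answer a query from already published results, or defer it.

    published: maps a keyword to the list of published result URLs.
    pending: set of keywords to be covered by the next index update.
    """
    results = published.get(keyword)
    if results is not None:
        # The 2nd step: the target has been published, so return it directly.
        return results
    # The 3rd and 4th steps: nothing to show yet; remember the keyword.
    pending.add(keyword)
    return []
```

After the next update round has run and its results have been published, the same keyword would be answered directly by the first branch.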
" homology software processing module "
Fig. 7 is a flow chart of the homologous software processing module. For software files or links that meet the search condition, the "homologous software processing module" provides them to the inquirer as hyperlinks in an Html web page. To substantially improve the performance of the system, we have adopted the following techniques:
A web page publishing technique is used: the "search result web page distributor" publishes search results in advance to the "search engine search results Web server", so that search requests that have already been queried are answered directly, which avoids the heavy computation of dynamically generating web pages from the database on each request.
The "homologous information processing module" stores results by category in the "non-homologous software index database" and the "homologous software index database", and the "search result web page distributor" periodically publishes them to the "search engine search results Web server", which avoids repeated computation and reduces the waiting time for computation.
The processing flow of the "homologous software processing module" is as follows:
The 1st step: receive the inquirer's search keyword, and have software judge from the keyword content and keyword syntax that what is sought is a software file or link (for example, a keyword containing ".EXE" indicates that .EXE files are sought rather than web pages containing this text).
The 2nd step: judge whether the content to be searched has been published on the Web server. If the search target has been published on the "search engine search results Web server", the search results are returned directly (see mark "M1" in the figure): the access entries of software in the results that meet the search condition and share the same source are aggregated into one "title search result"; after clicking the "homologous file" button, the inquirer can see on the "search engine search results Web server" another web page containing all of the search results that meet the query condition, and the search ends. If the search target has not been published on the "search engine search results Web server", proceed from the 3rd step.
The 3rd step: return the result "no eligible software" to the inquirer.
The 4th step: add the search keyword to the next round's task of updating the "homologous software index database" and the "non-homologous software index database", and periodically start the update process of the two databases.
The 5th step: the renewal process of " with the source software index data base " and " non-homogeneous software index data base ":
A. by emerging software document of " software search device " search and webpage or link inlet, enter this inlet by software and obtain this document or service.
B. by " software content decision device " judge new-found software content " belonging to same content? " with the content of current " with the source software index data base " if "Yes" then it is included into this classification of " with the source software index data base " as a new element; If "No" then judge that by " software content decision device " content of its " with current non-homogeneous software index data base " belongs to same content? "
If C. "Yes" then: " for current software and with it homology and be stored in software in ' non-homogeneous software index data base ', a newly-built classification is also all transferred to ' with the source software index data base ' "; If "No" then " be the current newly-built classification of software, and deposit in ' non-homogeneous software index data base ' ";
The 6th step: the static Web page that dynamically generates Search Results by the content of " search result web page distributor " basis " homology web results database " and " non-homogeneous web results database ", be published to " search engine search results Web server ", present to the inquiry's (seeing figure " M2 " mark) who comes to search for by browser again.
As the another kind of implementation method in the 6th step, also can directly present to inquiring user by " dynamic web page Web server " by browser.(seeing figure " M3 " mark).
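The update process of step 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described by the patent: the `Index` container, the `classify` function, and the `same_content` callback (standing in for the "software content decision device") are hypothetical names, and items are represented as plain strings.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Index:
    # Each inner list is one category of items judged to share the same source.
    homologous: List[List[str]] = field(default_factory=list)
    # Items for which no same-source peer has been found yet.
    non_homologous: List[str] = field(default_factory=list)


def classify(item: str, index: Index,
             same_content: Callable[[str, str], bool]) -> str:
    """Place a newly found item according to steps 5B and 5C."""
    # 5B: does the item match an existing homologous category?
    for category in index.homologous:
        if same_content(item, category[0]):
            category.append(item)
            return "joined-homologous"
    # 5B/5C: does it match items that were so far non-homologous?
    peers = [other for other in index.non_homologous
             if same_content(item, other)]
    if peers:
        # 5C "yes": build a new category and move everything over.
        index.non_homologous = [o for o in index.non_homologous
                                if o not in peers]
        index.homologous.append(peers + [item])
        return "new-homologous"
    # 5C "no": keep the item on its own in the non-homologous index.
    index.non_homologous.append(item)
    return "new-non-homologous"


# Hypothetical decision rule: same file name before the "@" means same source.
def same_name(a: str, b: str) -> bool:
    return a.split("@")[0] == b.split("@")[0]
```

In this sketch a category is compared through its first member only, which assumes the decision rule is transitive; a real decision device may need to compare against every member.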
" with source data or database processing module "
Fig. 8 is the flowchart of the homologous data or database processing module. For data files or links that match the search condition, the "homologous data processing module" provides them to the user as hyperlinks in an HTML web page. To substantially improve the performance of the system, the following techniques are adopted:
Web page publishing: the "search result web page distributor" publishes search results in advance to the "search engine search results Web server", which answers previously issued queries directly and avoids the heavy computation of generating dynamic web pages from the database on every request.
The "homologous information processing module" sorts the results by category into the "non-homologous data index database" and the "homologous data index database", and the "search result web page distributor" periodically publishes them to the "search engine search results Web server", which avoids repeated computation and reduces the waiting time. The processing flow of the "homologous data processing module" is as follows:
Step 1: Receive the user's search keywords, and let the software determine from the keyword content and keyword syntax whether the user is looking for a data file or a link (for example, a keyword containing ".DBF" means the search target is a .DBF file rather than a web page containing that text).
Step 2: Determine whether the content to be searched has already been published on the Web server. If the search target has been published on the "search engine search results Web server", return the search results directly (see mark "M1" in the figure). In these results, the access entries of data that matches the search condition and shares the same source are aggregated into one "title search result". After clicking the "same source file" button, the user sees another web page on the "search engine search results Web server" that lists all search results matching the query, and the search ends. If the search target has not been published on the "search engine search results Web server", continue from step 3.
Step 3: Return the result "no matching data" to the user.
Step 4: Add the search keywords to the next round of update tasks for the "homologous data index database" and the "non-homologous data index database", and periodically start the update process of the two databases.
Step 5: Update process of the "homologous data index database" and the "non-homologous data index database":
A. The "data searcher" searches for newly appearing data files, web pages, or link entries, and the software follows each entry to obtain the file or service.
B. The "data content decision device" determines whether the newly found data has the same content as an entry in the current "homologous data index database". If yes, it is added as a new element to that category of the "homologous data index database". If no, the "data content decision device" determines whether it has the same content as an entry in the current "non-homologous data index database".
C. If yes: create a new category for the current data and the data of the same source stored in the "non-homologous data index database", and move them all to the "homologous data index database". If no: create a new category for the current data and store it in the "non-homologous data index database".
Step 6: The "search result web page distributor" generates static web pages of search results from the contents of the "homologous web page result database" and the "non-homologous web page result database" and publishes them to the "search engine search results Web server", which then presents them to users who search through a browser (see mark "M2" in the figure).
As an alternative implementation of step 6, the results can also be presented to the user's browser directly by the "dynamic web page Web server" (see mark "M3" in the figure).
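Steps 1 to 4 of this flow (detect the wanted file type, answer from already published results, otherwise report no match and queue the keyword for the next update round) can be sketched as below. The names `wanted_extension`, `handle_query`, `published`, and `pending` are hypothetical, and the regular expression is only one assumed way of reading the "keyword syntax"; the source does not specify it.

```python
import re
from collections import deque
from typing import Deque, Dict, List, Optional


def wanted_extension(keyword: str) -> Optional[str]:
    """Step 1: a fragment such as '.DBF' is read as a request for that file type."""
    match = re.search(r"\.([A-Za-z0-9]+)\b", keyword)
    return "." + match.group(1).lower() if match else None


def handle_query(keyword: str,
                 published: Dict[str, List[str]],
                 pending: Deque[str]) -> List[str]:
    """Steps 2 to 4: serve published results or schedule an index update."""
    results = published.get(keyword)
    if results is not None:
        # Step 2: the target is already on the results Web server.
        return results
    # Step 4: remember the keyword for the next update round (no duplicates).
    if keyword not in pending:
        pending.append(keyword)
    # Step 3: an empty list stands for "no matching data".
    return []
```

Keeping the pending keywords in a queue means the periodic update job can process them in arrival order; that ordering is a design choice of this sketch, not something the source prescribes.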
" homology GIS message processing module "
Fig. 9 is the flowchart of the "homologous GIS information processing module". For GIS information files or links that match the search condition, the "homologous GIS information processing module" provides them to the user as hyperlinks in an HTML web page. To substantially improve the performance of the system, the following techniques are adopted:
Web page publishing: the "search result web page distributor" publishes search results in advance to the "search engine search results Web server", which answers previously issued queries directly and avoids the heavy computation of generating dynamic web pages from the database on every request.
The "homologous information processing module" sorts the results by category into the "non-homologous GIS information index database" and the "homologous GIS information index database", and the "search result web page distributor" periodically publishes them to the "search engine search results Web server", which avoids repeated computation and reduces the waiting time. The processing flow of the "homologous GIS information processing module" is as follows:
Step 1: Receive the user's search keywords, and let the software determine from the keyword content and keyword syntax whether the user is looking for a GIS information file or a link (for example, a keyword containing ".JPG" means the search target is a .JPG file rather than a web page containing that text).
Step 2: Determine whether the content to be searched has already been published on the Web server. If the search target has been published on the "search engine search results Web server", return the search results directly (see mark "M1" in the figure). In these results, the access entries of GIS information that matches the search condition and shares the same source are aggregated into one "title search result". After clicking the "same source file" button, the user sees another web page on the "search engine search results Web server" that lists all search results matching the query, and the search ends. If the search target has not been published on the "search engine search results Web server", continue from step 3.
Step 3: Return the result "no matching GIS information" to the user.
Step 4: Add the search keywords to the next round of update tasks for the "homologous GIS information index database" and the "non-homologous GIS information index database", and periodically start the update process of the two databases.
Step 5: Update process of the "homologous GIS information index database" and the "non-homologous GIS information index database":
A. The "GIS information searcher" searches for newly appearing GIS information files, web pages, or link entries, and the software follows each entry to obtain the file or service.
B. The "GIS information content decision device" determines whether the newly found GIS information has the same content as an entry in the current "homologous GIS information index database". If yes, it is added as a new element to that category of the "homologous GIS information index database". If no, the "GIS information content decision device" determines whether it has the same content as an entry in the current "non-homologous GIS information index database".
C. If yes: create a new category for the current GIS information and the GIS information of the same source stored in the "non-homologous GIS information index database", and move them all to the "homologous GIS information index database". If no: create a new category for the current GIS information and store it in the "non-homologous GIS information index database".
Step 6: The "search result web page distributor" generates static web pages of search results from the contents of the "homologous web page result database" and the "non-homologous web page result database" and publishes them to the "search engine search results Web server", which then presents them to users who search through a browser (see mark "M2" in the figure).
As an alternative implementation of step 6, the results can also be presented to the user's browser directly by the "dynamic web page Web server" (see mark "M3" in the figure).
"Same-value network service processing module"
Figure 10 is the flowchart of the "same-value network service processing module". For network services that match the search condition, the "same-value network service processing module" provides them to the user as hyperlinks in an HTML web page. To substantially improve the performance of the system, the following techniques are adopted:
Web page publishing: the "search result web page distributor" publishes search results in advance to the "search engine search results Web server", which answers previously issued queries directly and avoids the heavy computation of generating dynamic web pages from the database on every request.
The "same-value information processing module" sorts the results by category into the "non-same-value network service index database" and the "same-value network service index database", and the "search result web page distributor" periodically publishes them to the "search engine search results Web server", which avoids repeated computation and reduces the waiting time. The processing flow of the "same-value network service processing module" is as follows:
Step 1: Receive the user's search keywords, and let the software determine from the keyword content and keyword syntax whether the user is looking for a network service file or a link (for example, a keyword containing ".JPG" means the search target is a .JPG file rather than a web page containing that text).
Step 2: Determine whether the content to be searched has already been published on the Web server. If the search target has been published on the "search engine search results Web server", return the search results directly (see mark "M1" in the figure). In these results, the access entries of network services that match the search condition and share the same source are aggregated into one "title search result". After clicking the "same-value file" button, the user sees another web page on the "search engine search results Web server" that lists all search results matching the query, and the search ends. If the search target has not been published on the "search engine search results Web server", continue from step 3.
Step 3: Return the result "no matching network service" to the user.
Step 4: Add the search keywords to the next round of update tasks for the "same-value network service index database" and the "non-same-value network service index database", and periodically start the update process of the two databases.
Step 5: Update process of the "same-value network service index database" and the "non-same-value network service index database":
A. The "network service searcher" searches for newly appearing network service files, web pages, or link entries, and the software follows each entry to obtain the file or service.
B. The "network service content decision device" determines whether the newly found network service has the same content as an entry in the current "same-value network service index database". If yes, it is added as a new element to that category of the "same-value network service index database". If no, the "network service content decision device" determines whether it has the same content as an entry in the current "non-same-value network service index database".
C. If yes: create a new category for the current network service and the network services of the same value stored in the "non-same-value network service index database", and move them all to the "same-value network service index database". If no: create a new category for the current network service and store it in the "non-same-value network service index database".
Step 6: The "search result web page distributor" generates static web pages of search results from the contents of the "same-value web page result database" and the "non-same-value web page result database" and publishes them to the "search engine search results Web server", which then presents them to users who search through a browser (see mark "M2" in the figure).
As an alternative implementation of step 6, the results can also be presented to the user's browser directly by the "dynamic web page Web server" (see mark "M3" in the figure).
"Same-value business information processing module"
Figure 11 is the flowchart of the "same-value business information processing module". For business information that matches the search condition, the "same-value business information processing module" provides it to the user as hyperlinks in an HTML web page. To substantially improve the performance of the system, the following techniques are adopted:
Web page publishing: the "search result web page distributor" publishes search results in advance to the "search engine search results Web server", which answers previously issued queries directly and avoids the heavy computation of generating dynamic web pages from the database on every request.
The "same-value information processing module" sorts the results by category into the "non-same-value business information index database" and the "same-value business information index database", and the "search result web page distributor" periodically publishes them to the "search engine search results Web server", which avoids repeated computation and reduces the waiting time. The processing flow of the "same-value business information processing module" is as follows:
Step 1: Receive the user's search keywords, and let the software determine from the keyword content and keyword syntax whether the user is looking for a business information file or a link (for example, a keyword containing ".JPG" means the search target is a .JPG file rather than a web page containing that text).
Step 2: Determine whether the content to be searched has already been published on the Web server. If the search target has been published on the "search engine search results Web server", return the search results directly (see mark "M1" in the figure). In these results, the access entries of business information that matches the search condition and shares the same source are aggregated into one "title search result". After clicking the "same-value file" button, the user sees another web page on the "search engine search results Web server" that lists all search results matching the query, and the search ends. If the search target has not been published on the "search engine search results Web server", continue from step 3.
Step 3: Return the result "no matching business information" to the user.
Step 4: Add the search keywords to the next round of update tasks for the "same-value business information index database" and the "non-same-value business information index database", and periodically start the update process of the two databases.
Step 5: Update process of the "same-value business information index database" and the "non-same-value business information index database":
A. The "business information searcher" searches for newly appearing business information files, web pages, or link entries, and the software follows each entry to obtain the file or service.
B. The "business information content decision device" determines whether the newly found business information has the same content as an entry in the current "same-value business information index database". If yes, it is added as a new element to that category of the "same-value business information index database". If no, the "business information content decision device" determines whether it has the same content as an entry in the current "non-same-value business information index database".
C. If yes: create a new category for the current business information and the business information of the same value stored in the "non-same-value business information index database", and move them all to the "same-value business information index database". If no: create a new category for the current business information and store it in the "non-same-value business information index database".
Step 6: The "search result web page distributor" generates static web pages of search results from the contents of the "same-value web page result database" and the "non-same-value web page result database" and publishes them to the "search engine search results Web server", which then presents them to users who search through a browser (see mark "M2" in the figure).
As an alternative implementation of step 6, the results can also be presented to the user's browser directly by the "dynamic web page Web server" (see mark "M3" in the figure).
The distinguishing feature of the "same-value business information processing module" is that it can automatically judge, from the characteristics of the goods or services and from the distribution of suppliers and users, whether multiple business information targets have the same use value to the user. This judgment is the basis both for aggregating them into one "title search result" and for ranking the query results.
The content decision devices can be shared among the various "homologous (same-value) information processing modules".
Specific implementation of the "content decision device"
Specific implementation of the "multimedia content decision device":
1 Input: multimedia files from multiple sources (for a playback service, either record the stream into a file first or obtain the media file information from the playback server).
2 Processing: compare the degree of match of the multimedia content.
3 Return: the degree to which the input multimedia items share the same content: SameMediaPower.
Specific implementation method:
Step 1: Receive the "judged objects", that is, multimedia from multiple sources, and record the number of judged objects: InputQuantity.
Step 2: Look up in the table below the attributes of the "judged objects" that can take part in the comparison, and record the number of "judged objects" that have the same value for the current attribute: SameQuantity (for example, if 3 of 5 judged objects have the same value for an attribute, then SameQuantity = 3 for that attribute).
Step 3: Read the "weight" of the current attribute in the decision process (from the table below): Power.
Step 4: Compute the degree of match of all "judged objects" on the current attribute: PSame = SameQuantity * Power.
Step 5: Return to step 1 and perform steps 1 to 4 for the next attribute to obtain its PSame, until the PSame values of all attributes have been obtained.
Step 6: Compute and return the same-content degree of the "judged objects": SameMediaPower = (sum of all PSame values) / InputQuantity.
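The six steps above amount to a weighted sum of per-attribute agreement counts divided by the number of inputs. The sketch below is one possible reading, with loudly stated assumptions: `same_power` and the attribute names are hypothetical, SameQuantity is taken to be the size of the largest group of objects sharing one value, a value held by a single object is not counted as a match, and the weights are illustrative rather than the values from the patent's table. Empty (None) attribute values are skipped, following the notes to the attribute tables.

```python
from collections import Counter
from typing import Dict, List, Optional


def same_power(objects: List[Dict[str, Optional[str]]],
               weights: Dict[str, float]) -> float:
    """Return (sum of PSame over all attributes) / InputQuantity."""
    input_quantity = len(objects)            # step 1
    if input_quantity == 0:
        return 0.0
    total = 0.0
    for attribute, power in weights.items():  # steps 2 to 5, one attribute at a time
        values = [obj.get(attribute) for obj in objects]
        values = [v for v in values if v is not None]  # empty values never match
        if not values:
            continue
        # Assumption: SameQuantity is the size of the largest agreeing group.
        same_quantity = Counter(values).most_common(1)[0][1]
        if same_quantity < 2:
            same_quantity = 0                 # assumption: a lone value is no match
        total += same_quantity * power        # step 4: PSame = SameQuantity * Power
    return total / input_quantity             # step 6


# Illustrative data: 5 judged objects, hypothetical attributes and weights.
media = [
    {"duration": "120", "title": "A"},
    {"duration": "120", "title": "A"},
    {"duration": "120", "title": "A"},
    {"duration": "95", "title": "A"},
    {"duration": "60", "title": None},
]
score = same_power(media, {"duration": 2.0, "title": 1.0})
```

Here three objects agree on `duration` (PSame = 3 * 2.0) and four agree on `title` (PSame = 4 * 1.0), so the score is 10 / 5. The same routine would serve the picture and software decision devices, since they use the same steps with different attribute tables.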
Decision attributes for video files or playback services:
Figure G2006107905720060228D000321
Figure G2006107905720060228D000331
Note:
1. The invention lies in the method of using a "weight" value to express the comparative importance of each attribute, not merely in the specific values listed in the table. The specific "weight" values in the table are only typical values; changing them according to actual needs still falls within the scope of the invention.
2. Depending on actual conditions, some attribute values may be empty (Null). An empty attribute value should not be treated as a match in the computation.
Decision attributes for audio files:
Note:
1. The invention lies in the method of using a "weight" value to express the comparative importance of each attribute, not merely in the specific values listed in the table. The specific "weight" values in the table are only typical values; changing them according to actual needs still falls within the scope of the invention.
2. Depending on actual conditions, some attribute values may be empty (Null). An empty attribute value should not be treated as a match in the computation.
Decision attributes for Flash files:
Figure G2006107905720060228D000351
Note:
1. The invention lies in the method of using a "weight" value to express the comparative importance of each attribute, not merely in the specific values listed in the table. The specific "weight" values in the table are only typical values; changing them according to actual needs still falls within the scope of the invention.
2. Depending on actual conditions, some attribute values may be empty (Null). An empty attribute value should not be treated as a match in the computation.
Specific implementation of the "picture content decision device"
1 Input: pictures from multiple sources.
2 Processing: compare the degree of match of the picture content.
3 Return: the degree to which the input pictures share the same content: SamePicPower.
Specific implementation method:
Step 1: Receive the "judged objects", that is, pictures from multiple sources, and record the number of judged objects: InputQuantity.
Step 2: Look up in the table below the attributes of the "judged objects" that can take part in the comparison, and record the number of "judged objects" that have the same value for the current attribute: SameQuantity (for example, if 3 of 5 judged objects have the same value for an attribute, then SameQuantity = 3 for that attribute).
Step 3: Read the "weight" of the current attribute in the decision process (from the table below): Power.
Step 4: Compute the degree of match of all "judged objects" on the current attribute: PSame = SameQuantity * Power.
Step 5: Return to step 1 and perform steps 1 to 4 for the next attribute to obtain its PSame, until the PSame values of all attributes have been obtained.
Step 6: Compute and return the same-content degree of the "judged objects": SamePicPower = (sum of all PSame values) / InputQuantity.
The decision is based on the various attributes of the picture and on the degree of similarity reported by image recognition software.
Figure G2006107905720060228D000361
Note:
1. The invention lies in the method of using a "weight" value to express the comparative importance of each attribute, not merely in the specific values listed in the table. The specific "weight" values in the table are only typical values; changing them according to actual needs still falls within the scope of the invention.
2. Depending on actual conditions, some attribute values may be empty (Null). An empty attribute value should not be treated as a match in the computation.
Specific implementation of the "text content decision device"
The "text content decision device" can be implemented in software:
1 Input: text from multiple sources, as the "judged objects".
2 Processing: compare the degree of match of the text content.
3 Return: the consistency value between the "judged objects": SameTextPower.
Implementation method:
Step 1: Among the multiple input text contents, find the total length of the parts that have identical words or sentences: SameLenth.
Step 2: Among the multiple input text contents, find the length of the shortest input text: MinLenth.
Step 3: Return the text similarity value: SameTextPower = SameLenth / MinLenth.
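The ratio SameTextPower = SameLenth / MinLenth can be sketched with Python's standard `difflib` module. The source does not say how the identical parts are found, so several choices here are assumptions: the function name `same_text_power`, the use of `SequenceMatcher` matching blocks to measure shared length, the `min_run` parameter that ignores very short coincidental matches, and taking the lowest ratio when more than two texts are compared against the shortest one.

```python
from difflib import SequenceMatcher
from typing import List


def same_text_power(texts: List[str], min_run: int = 1) -> float:
    """Shared length divided by the length of the shortest input text."""
    if len(texts) < 2:
        raise ValueError("need at least two texts to compare")
    # Step 2: locate the shortest text and its length (MinLenth).
    shortest_index = min(range(len(texts)), key=lambda i: len(texts[i]))
    shortest = texts[shortest_index]
    min_length = len(shortest)
    if min_length == 0:
        return 0.0
    ratios = []
    for i, other in enumerate(texts):
        if i == shortest_index:
            continue
        matcher = SequenceMatcher(None, shortest, other, autojunk=False)
        # Step 1: total length of identical runs (SameLenth) for this pair.
        same_length = sum(block.size
                          for block in matcher.get_matching_blocks()
                          if block.size >= min_run)
        ratios.append(same_length / min_length)   # step 3
    # Assumption: the group is only as similar as its least similar pair.
    return min(ratios)
```

Raising `min_run` to the length of a typical word or sentence brings the sketch closer to the "identical words or sentences" wording, at the cost of missing short shared fragments.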
Among the texts matched in this way, the longest text is usually a copy of the same article that is split into fewer pages or contains a large amount of advertising and external hyperlinks, while the shortest text is usually a copy of the same article that is split into more pages or contains the least advertising and external hyperlinks.
Specific implementation of the "link content decision device"
The "link content decision device" can be implemented in software. It compares whether the hyperlinks contained in multiple web pages have common features.
1 Input: the URL addresses of multiple groups of hyperlinks (each group is usually the full set of hyperlinks obtained from one web page).
2 Processing: compute the degree of match of the hyperlink URL addresses between the groups.
3 Return: the number of identical hyperlinks shared by the groups.
Implementation method:
Step 1: Receive the "judged objects": the URL addresses of multiple groups of hyperlinks.
Step 2: Compute the similarity of the "judged objects": SameURLPower = the number of URL addresses that appear in every group of hyperlinks.
Step 3: Return SameURLPower.
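Counting the URL addresses that appear in every group is a set intersection. A minimal sketch follows; the function name `same_url_power` and the example URLs are hypothetical, and URLs are compared as exact strings without normalization, which the source does not discuss.

```python
from typing import List


def same_url_power(groups: List[List[str]]) -> int:
    """Number of URL addresses that occur in every group of hyperlinks."""
    if not groups:
        return 0
    common = set(groups[0])
    for group in groups[1:]:
        common &= set(group)   # keep only URLs also present in this group
    return len(common)


# Three hypothetical pages; only one link is present on all of them.
pages = [
    ["http://example.com/a", "http://example.com/b", "http://example.com/c"],
    ["http://example.com/b", "http://example.com/c"],
    ["http://example.com/c", "http://example.com/d"],
]
shared = same_url_power(pages)
```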
" software content decision device " specific implementation
" software content decision device ", whether a plurality of softwares that are used for comparing input are software of the same race.
1 Input: software from multiple sources.
2 Processing: compare the degree of match of the software content.
3 Return: the numerical degree of match of the software content.
Specific implementation method:
Step 1: Receive the "judged objects", that is, multiple input files or directories, and record the number of judged objects: InputQuantity.
Step 2: Look up in the table below the attributes of the "judged objects" that can be compared, and record the number of "judged objects" that have the same value for the current attribute: SameQuantity (for example, if 3 of 5 judged objects have the same value for an attribute, then SameQuantity = 3 for that attribute).
Step 3: Read the "weight" of the current attribute in the decision process (from the table below): Power.
Step 4: Compute the degree of match of all "judged objects" on the current attribute: PSame = SameQuantity * Power.
Step 5: Return to step 1 and perform steps 1 to 4 for the next attribute to obtain its PSame, until the PSame values of all attributes have been obtained.
Step 6: Compute and return the same-content degree of the "judged objects": SameSoftPower = (sum of all PSame values) / InputQuantity.
Figure G2006107905720060228D000391
Note:
1. The invention lies in the method of using a "weight" value to express the comparative importance of each attribute, not merely in the specific values listed in the table. The specific "weight" values in the table are only typical values; changing them according to actual needs still falls within the scope of the invention.
2. Depending on actual conditions, some attribute values may be empty (Null). An empty attribute value should not be treated as a match in the computation.
Specific implementation of the "data or database content decision device"
The device compares, record by record, whether the contents of the data records in the different database files are equal, and returns whether the database consistency value SameDBPower of the compared databases exceeds a threshold.
SameDBPower = (number of records whose field names are identical and whose values are equal) / (the smallest record count for this field among the compared databases).
SameDBPower reflects the ratio of the number of records with identical content to the record count of the database with the fewest records; its value ranges from 0 to 1.
The following steps can be used for data files:
Step 1: From the data files taking part in the comparison, pick one file at random as the "comparison standard".
Step 2: Roughly compare the other files with the "comparison standard" for consistency: file size, file checksum, and file attribute information such as title, subject, version, author, category, keywords, and remarks.
Step 3: If these are consistent, the files are judged "roughly consistent". This result can be used directly as the output of the "data or database content decision device".
Step 4: If further comparison is needed, perform step 5 on the input files judged "roughly consistent".
Step 5: Fine comparison: compare the file attribute information and every byte of the files one by one. Files for which all features are identical are judged "fully consistent", and this is the output of the "data or database content decision device".
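The two-stage comparison for data files (a cheap rough check, then an optional byte-by-byte check) can be sketched as follows. Several simplifications are assumptions of this sketch: file contents are passed in as `bytes`, the first input is used as the "comparison standard" instead of a random one so that the result is reproducible, CRC-32 stands in for the unspecified file checksum, attribute information such as title or author is not modeled, and files that fail the rough check are reported as "inconsistent". The names `rough_match` and `compare_files` are hypothetical.

```python
import zlib
from typing import List


def rough_match(a: bytes, b: bytes) -> bool:
    """Step 2: compare file size and checksum only."""
    return len(a) == len(b) and zlib.crc32(a) == zlib.crc32(b)


def compare_files(contents: List[bytes], fine: bool = False) -> List[str]:
    """Judge every other file against the comparison standard."""
    standard = contents[0]        # step 1 (the source picks this at random)
    verdicts = []
    for other in contents[1:]:
        if not rough_match(standard, other):
            verdicts.append("inconsistent")
        elif not fine:
            verdicts.append("roughly consistent")   # step 3
        elif other == standard:
            verdicts.append("fully consistent")     # step 5: every byte equal
        else:
            verdicts.append("inconsistent")
    return verdicts
```

The rough stage exists to avoid reading whole files byte by byte when sizes or checksums already differ; the fine stage guards against the rare case where two different files share both.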
For database files, the following steps can be used:
Step 1: Judge from the file name suffix and file attributes whether the input database files are of the same database format.
Step 2: For database files of the same format, carry out Step 3; for database files of different formats, go directly to Step 4.
Step 3: Rough comparison of same-format databases: file size, file checksum, and file attribute information such as title, subject, version, author, category, keywords, and remarks. If these features do not match completely, output "inconsistent" as the judgment result; for database files that match completely, carry out Step 4.
Step 4: Fine comparison of databases (this step lets database files of various formats take part in the content comparison). Extract the "database tables" one by one according to the format of each database file and judge whether their structures are consistent: if not, output "inconsistent"; for database files with consistent structures, carry out Step 5.
Step 5: Compare the content of every record of the database files being compared one by one. Each time identical record content is found, add 1 to the counter SameRecNum ("number of records whose field names are identical and whose values are equal").
Step 6: Calculate the database consistency value SameDBPower = SameRecNum / (record count of the compared database that has the fewest records with this field). SameDBPower reflects the ratio of the number of identical-content records to the record count of the smallest database; its value ranges from 0 to 1.
Step 7: Judge whether SameDBPower exceeds the threshold. If it does, output "consistent" as the judgment result; otherwise output "inconsistent".
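Steps 5 to 7 can be illustrated with a short sketch. It assumes that each database table has already been read into a list of records, each record being a dictionary of field name to value with hashable values; the function names and the threshold of 0.8 are assumptions made for the example (the text does not fix a threshold for databases).

```python
from collections import Counter


def same_db_power(records_a, records_b):
    """Return SameRecNum divided by the record count of the smaller database."""
    if not records_a or not records_b:
        return 0.0
    # Two records match when every field name and every value is equal.
    bag_a = Counter(frozenset(record.items()) for record in records_a)
    bag_b = Counter(frozenset(record.items()) for record in records_b)
    # Multiset intersection: a record is counted at most as often as it
    # occurs in both databases, which keeps the result within 0..1.
    same_rec_num = sum((bag_a & bag_b).values())
    return same_rec_num / min(len(records_a), len(records_b))


def judge_databases(records_a, records_b, threshold=0.8):
    """Return (SameDBPower, verdict); the threshold value is an assumption."""
    power = same_db_power(records_a, records_b)
    verdict = "consistent" if power > threshold else "inconsistent"
    return power, verdict
```

For example, a three-record table compared with a two-record table whose two records both appear in the first gives SameDBPower = 2 / 2 = 1.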
" GIS information content decision device "
" GIS information content decision device ", can realize by software:
1. Input: digital maps from multiple sources can be received as the "judged objects".
2. Processing: compare how well the coverage areas of the digital maps coincide.
3. Return: the consistency value SameMapPower (0 to 1) between the "judged objects".
Implementation method:
Step 1: Open the digital map files being compared according to the digital map format.
Step 2: Find the longitude and latitude of the northwest and southeast corners of each digital map (another pair of diagonal corners can also be used).
Step 3: Compare the longitude and latitude differences of the northwest and southeast corners of the maps being compared, and calculate the consistency value SameMapPower of the map coverage areas.
Suppose "Map 1" and "Map 2" are being compared.
Then:
SameMapPower = (area of the region where the two maps overlap) / (area of the smaller of the two maps).
Step 4: Return the SameMapPower value.
Step 5: Judge whether SameMapPower exceeds the threshold (for example, threshold = 0.8). If it does, the maps are judged identical; if not, they are judged different.
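The coverage comparison can be sketched as a bounding-box overlap calculation. The sketch assumes that each map is described only by its northwest and southeast corners, treats degrees of latitude and longitude as a flat grid, and ignores maps that cross the 180° meridian; the names MapBounds, same_map_power and same_map are hypothetical.

```python
from typing import NamedTuple


class MapBounds(NamedTuple):
    north: float  # latitude of the northwest corner
    west: float   # longitude of the northwest corner
    south: float  # latitude of the southeast corner
    east: float   # longitude of the southeast corner

    def area(self) -> float:
        """Area in square degrees; zero if the box is empty."""
        height = max(0.0, self.north - self.south)
        width = max(0.0, self.east - self.west)
        return height * width


def same_map_power(a: MapBounds, b: MapBounds) -> float:
    """Overlap area divided by the area of the smaller map, in 0..1."""
    overlap = MapBounds(
        north=min(a.north, b.north),
        west=max(a.west, b.west),
        south=max(a.south, b.south),
        east=min(a.east, b.east),
    )
    smaller = min(a.area(), b.area())
    return overlap.area() / smaller if smaller > 0 else 0.0


def same_map(a: MapBounds, b: MapBounds, threshold: float = 0.8) -> bool:
    """Apply the example threshold of 0.8 from Step 5."""
    return same_map_power(a, b) > threshold
```

Two one-degree-square maps shifted by half a degree of longitude overlap in half of their area, so SameMapPower is 0.5 and they are judged different under the example threshold.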
" network service content decision device "
FTP service content judgment by the "network service content decision device":
Step 1: Log in to the services being compared using the corresponding file transfer protocol and obtain the files they hold.
Step 2: After obtaining the files from the FTP services, first judge from the file name suffix whether the file types are consistent. If they are not, return "inconsistent" as the output; if they are, carry out Step 3.
Step 3: According to the file type, use the "multimedia content decision device", "image content decision device", "word content decision device", "software content decision device", "data or database content decision device" or "GIS information content decision device" to judge whether the file contents are consistent, and return the judgment result.
Judgment of the mailbox service content provided by Email websites:
Mailbox service information for Email websites is obtained mainly by software that searches the web pages of each website and parses, from the web page tags, information such as mailbox size, charging status, and whether the POP protocol is supported.
Step 1: Divide mailbox sizes into grades (for example: 10MB~25MB, 25MB~100MB, 100MB~300MB, 300MB~1GB, 1GB~100GB), then judge whether the mailboxes being compared fall in the same grade. If "no", return "inconsistent"; if "yes", carry out Step 2.
Step 2: Compare whether the "charging status" is consistent. If "no", return "inconsistent"; if "yes", carry out Step 3.
Step 3: Compare whether POP protocol support is consistent. If "no", return "inconsistent"; if "yes", return "consistent".
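The three mailbox checks can be sketched as below. The sketch assumes the mailbox attributes have already been parsed from the web pages; the MailboxService type, the use of 1 GB = 1024 MB, and the handling of grade boundaries (each grade includes its lower bound and excludes its upper bound) are assumptions made for the example.

```python
from dataclasses import dataclass

# Upper bounds in MB of the example size grades given in the text.
GRADE_BOUNDS_MB = [10, 25, 100, 300, 1024, 100 * 1024]


@dataclass
class MailboxService:
    size_mb: float       # advertised mailbox size in MB
    paid: bool           # charging status: True if the mailbox is charged for
    supports_pop: bool   # whether the POP protocol is supported


def size_grade(size_mb):
    """Return the index of the size grade that contains size_mb."""
    for grade, upper in enumerate(GRADE_BOUNDS_MB):
        if size_mb < upper:
            return grade
    return len(GRADE_BOUNDS_MB)


def judge_mailboxes(a, b):
    """Apply Steps 1 to 3 in order and return the verdict."""
    if size_grade(a.size_mb) != size_grade(b.size_mb):
        return "inconsistent"
    if a.paid != b.paid:
        return "inconsistent"
    if a.supports_pop != b.supports_pop:
        return "inconsistent"
    return "consistent"
```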
" business information content decision device "
Judges whether the product or service sales information published on web pages is identical and whether it falls within the same physical geographic range, the same administrative geographic range, and the same distance range.
Step 1: Compare whether the business information being compared concerns the same product or service. If "no", return "inconsistent"; if "yes", go to Step 2.
Step 2: Judge whether the business information being compared is sensitive to geographic location (for example, personal consumer goods and services that must be delivered on site, such as ice cream or private tutoring, are location-sensitive). If "no", return the judgment result "consistent"; if "yes", carry out Step 3.
Step 3: Judge whether the suppliers of the business information being compared are in the same city or region. If "no", return the judgment result "inconsistent"; if "yes", return the judgment result "consistent".
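The three-step judgment can be written down as a small decision function. This is only a sketch: the Offer type is hypothetical, product identity is reduced to comparing a normalized product string, and location sensitivity is taken as a flag that some earlier classification step is assumed to have set.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    product: str              # normalized product or service name
    location_sensitive: bool  # e.g. ice cream, private tutoring
    city: str                 # city or region of the supplier


def judge_offers(a, b):
    """Return the verdict of the business information comparison."""
    # Step 1: same product or service?
    if a.product != b.product:
        return "inconsistent"
    # Step 2: if location does not matter, identical products are consistent.
    if not (a.location_sensitive or b.location_sensitive):
        return "consistent"
    # Step 3: location-sensitive offers must come from the same city or region.
    return "consistent" if a.city == b.city else "inconsistent"
```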
" obtain web page user attention rate subsystem "
Figure 12 is the structure diagram of the subsystem for obtaining web page user attention. The search engine can work together with its companion web browser (or with third-party browsers compatible with the communication protocol between the search engine and its companion browser): the browser collects the user's degree of attention to each web page and reports it to the search engine, which uses it as a basis for ranking search results or selecting the "title search result". This method and device can also stand apart from the search engine and independently form a Web query system that provides a "web page popularity ranking list", which may be operated for a fee or exchanged for other benefits.
The system consists mainly of two parts: the "PageFocus webserver" and the "PageFocus web browser".
"PageFocus webserver" structure
The "PageFocus webserver" obtains, through the "PageFocus web browser", the degree of attention that users worldwide pay to each web page, and builds a database of the "attention score PageFocus" of each web page as a measure of the page's popularity.
The "PageFocus webserver" consists of the following:
(1) "PageFocus browser ID registrar": assigns a globally unique ID number to each "PageFocus web browser" in use on the network.
(2) "PageFocusAccServer web page attention statistics server": receives the "PageFocus packets" sent by the "PageFocus web browsers" running around the world, each of which contains the "attention score PageFocus" for one or more web pages. The ID number is used to distinguish different browsing users.
(3) "PageFocus browser online upgrading server": provides online upgrade service to "PageFocus web browsers" worldwide.
(4) "Data encryption and decryption module": transmits encrypted data between the "PageFocus webserver" and the "PageFocus web browser" to prevent attacks and information theft.
"PageFocus web browser" structure
The "PageFocus web browser" reports the current user's degree of attention to a given web page to the "PageFocus webserver" over the network.
The "PageFocus web browser" consists of the following:
(1) "Attention score PageFocus computing module": calculates the user's degree of attention to a given web page from the user's operations on the "PageFocus web browser", and forms a "PageFocus packet" to report to the "PageFocusAccServer web page attention statistics server".
(2) "PageFocus browser ID registration module": communicates with the "PageFocus browser ID registrar" to obtain a globally unique ID, which serves as the basis for distinguishing different users.
(3) "PageFocus browser online upgrading module": communicates with the "PageFocus browser online upgrading server" to keep the "PageFocus browser" on the current user's computer at the latest version.
The device comprises the "PageFocus web browser" of the invention, the "PageFocus browser ID registrar", and the "webpage score server". The specific implementation method is as follows:
Step 1: Develop a dedicated "PageFocus web browser". Each browser either has a globally unique ID number when installed, or actively looks for the "PageFocus browser ID registrar" on the network when in use to obtain one.
Step 2: The "PageFocus web browser" has all the functions of a conventional web browser (for example, Microsoft's IE browser).
Step 3: The "PageFocus web browser" also converts the user's operations on the browser and on web pages into the "attention score PageFocus" of each web page according to the weights listed in the table below, forms a "PageFocus packet", and transmits it in encrypted form over a network protocol to the "PageFocusAccServer web page attention statistics server" of the search engine.
Step 4: After receiving the "PageFocus packets" sent by "PageFocus web browsers" worldwide, the "PageFocusAccServer web page attention statistics server" adds the "attention score PageFocus" contained in each packet to the corresponding web page.
Step 5: The "attention score PageFocus" of each web page worldwide held on the "PageFocusAccServer web page attention statistics server" can be put to use through various processing methods: as the basis for the search engine's ranking of web pages among search results with identical content, as the basis for the search engine's selection of the "title search result", or published directly as a "web page popularity ranking list" service.
Method by which the "PageFocus web browser" calculates the "attention score PageFocus":
Because the "PageFocus web browser" has all the functions of an ordinary browser, it can collect the user's operations while the browser is in use, according to the table below, and score the "attention score PageFocus" of the web page according to the "weight" of each operation. When the browser completely closes the web page, it forms a score record of the "attention score PageFocus" for that page and sends it, in the form of a "PageFocus packet", to the "PageFocusAccServer web page attention statistics server".
[Table (image in the original, not reproduced here): user operations on the browser and on web pages, with the "weight" each contributes to the "attention score PageFocus"]
Note:
1. Although these scoring criteria can produce misjudgments, statistical accuracy is obtained from the large number of operations on the network.
2. The specific "weight" values listed in the table are only typical values. The invention lies in scoring pages through the browser; any other change of "weight items" or "weights" falls within the scope of the invention.
3. Letting users vote on a web page rests on substantial trust in netizens' social ethics, so the "weight" of a vote multiplies the whole score rather than being added to it.
4. Because each web page may accumulate a very large PageFocus score, which could overflow a software variable, the "PageFocusAccServer web page attention statistics server" can record the score using a "mathematical logarithm" or "scientific notation".
5. As other variants of this method, besides forming the "PageFocus packet" when the browser completely closes the web page, the time at which the "PageFocus packet" is formed can be determined by any other rule, for example at regular intervals or once a certain score has accumulated; all such methods fall within the scope of the invention.
6. Detailed calculation of the "reading speed per line of text" in the table:
A. Mouse wheel scrolling: text reading speed = (display area width / set width) * number of text lines scrolled each time / time interval between scrolls.
B. Keyboard page turning: text reading speed = (display area width / set width) * number of text lines per page turn / time interval between page turns.
C. Window scroll bar scrolling: text reading speed = (display area width / set width) * number of text lines scrolled each time / time interval between scrolls.
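The reading speed formula in note 6 has the same shape for all three input methods, so a single helper can illustrate it. The function and parameter names are hypothetical, and the units (pixels for the widths, seconds for the interval) are assumptions; the text does not state them.

```python
def reading_speed(display_width, set_width, lines_moved, interval_seconds):
    """Reading speed = (display width / set width) * lines moved / time interval.

    lines_moved is the number of text lines scrolled or paged in one action,
    and interval_seconds is the time elapsed since the previous action.
    """
    if set_width <= 0 or interval_seconds <= 0:
        raise ValueError("set_width and interval_seconds must be positive")
    return (display_width / set_width) * lines_moved / interval_seconds
```

With a display area twice the set width, a scroll of 3 lines every 2 seconds gives 2 * 3 / 2 = 3 lines per second.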
Forming the "PageFocus packet"
Content of the "PageFocus packet":
[Table (image in the original, not reproduced here): fields of the "PageFocus packet"]
Note: Each "PageFocus packet" can contain records for multiple web pages. Each web page record can also carry other attributes; for efficiency the table lists only the important items, and adding other attributes to the table also falls within the scope of the invention.
Choosing when to send the "PageFocus packet":
To reduce the bandwidth consumed by sending "PageFocus packets" and the load placed on the server, one of the following schemes can be adopted:
Send the "PageFocus packet" when the browser completely closes a given web page.
Send the "PageFocus packet" when the browser itself is completely closed.
Have the browser keep the "PageFocus packet" on the local computer as a file, and send it once a specific quantity, a specific length, or a specific time period has been reached.
" title search result " selection algorithm
This algorithm is mainly used to decide which of the original "homologous search results" is selected as the "title search result". The algorithm needs to solve the following problems:
1. Judge the content quality of a web page from network user behavior and web page content, and show high-quality results first.
2. Avoid a search result taking on too much click traffic because it has become the "title search result", which would slow the website down or even bring it down.
3. Avoid a search result taking on too much click traffic so that its service response slows down and the visitor's experience suffers, because it has become the "title search result".
4. Make becoming the "title search result" a right that can be offered to websites that need it; those websites can purchase this right.
5. Give every original result among the "homologous search results" a chance, with a certain probability, of becoming the "title search result".
The "title search result" selection method is as follows. When the "title search result" is selected from the "homologous search results", three factors are considered together: "search result content quality", "weight value" and "service response delay". That is, results with high content quality are shown first, results that carry a weighting are shown first, and results whose network service responds quickly are shown first. The same principle applies when ordering all the "homologous search results", and the "weight value" can be purchased from the operator of the system of the invention. The specific implementation of "title search result" selection is as follows:
Step 1: Calculate the probability weight PWn with which each "homologous search result" becomes the "title search result" (n is the index of the search result):
PWn=TP*PageFocus/(RespDelay-K)
Note 1: When (RespDelay-K) is less than or equal to zero, (RespDelay-K) takes the value 1.
Note 2: The variables in the formula have the following meanings:
A. PageFocus (web page attention value): the "PageFocus value" of this search result, obtained according to the "method and apparatus for obtaining web page user attention" of the invention.
B. RespDelay (web service response delay): the response delay of this search result when it serves the searcher's access. (The access experience depends on the website's response delay: the slower the response, the worse the experience.)
C. K (service response constant): a definable constant; 50 milliseconds (ms) is suggested. A service response delay below K goes unnoticed and does not affect the experience, so it can be ignored.
D. TP (title search result right): a weighting factor; anyone can obtain the "TP title search result right" from the operator of the system of the invention through various exchange conditions.
E. As other implementations of this formula, the following forms are also possible:
a.PWn=(TP+PageFocus)/(RespDelay-K)
b.PWn=(TP+PageFocus)/RespDelay/K
c.PWn=TP*PageFocus/RespDelay/K
Step 2: Sum the probability weights PWn of all original "homologous search results" to obtain the total probability weight PWall.
Step 3: Calculate the probability that each "homologous search result" becomes the "title search result": Pn=PWn/PWall.
Step 4: According to the probabilities Pn, dynamically and randomly select the "title search result" on each searcher visit and present it to the searcher.
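Steps 1 to 4 amount to a weighted random draw and can be sketched as follows. The sketch uses the main formula PWn = TP*PageFocus/(RespDelay-K) with the suggested K of 50 ms; the dictionary keys, function names and the use of Python's random.choices for the draw are assumptions made for the example.

```python
import random


def probability_weight(tp, page_focus, resp_delay_ms, k_ms=50.0):
    """PWn = TP * PageFocus / (RespDelay - K); the divisor is 1 when it is <= 0."""
    divisor = resp_delay_ms - k_ms
    if divisor <= 0:
        divisor = 1.0
    return tp * page_focus / divisor


def selection_probabilities(results, k_ms=50.0):
    """Return Pn = PWn / PWall for every homologous search result."""
    weights = [
        probability_weight(r["tp"], r["page_focus"], r["resp_delay_ms"], k_ms)
        for r in results
    ]
    total = sum(weights)  # PWall
    if total <= 0:
        raise ValueError("at least one result needs a positive weight")
    return [weight / total for weight in weights]


def pick_title_result(results, rng=random, k_ms=50.0):
    """Draw one result at random according to Pn; called once per searcher visit."""
    probabilities = selection_probabilities(results, k_ms)
    return rng.choices(results, weights=probabilities, k=1)[0]
```

Because the draw is repeated on every visit, a result with a small Pn still becomes the "title search result" occasionally, which spreads click traffic across the homologous results as the problems listed for the algorithm require.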
Apparatus and method for adaptive website content style
The content of the invention is to use whatever obtainable information helps to judge the user's environment and state, so that users in different working or leisure states see different styles when visiting the same page URL, without any operation, registration, setting, or Cookie setting. This includes:
1. Use the user's IP address to determine the user's country or region; combined with the website's own time, the visitor's local administrative time can be calculated, and from that time it can be judged whether the user is in a working state or a leisure state.
2. From the user's IP address, the attribute of that address can be looked up: home or workplace. A style and content suited to the user's environment are provided according to the place.
3. From the user's IP address, the user's geographic location can be determined; when business information is queried, the suppliers nearest to the user can automatically be listed first.
For example:
At the same moment, different users visiting the same URL on this website see different content:
A. A user in a working state and environment sees a serious, concise page that contains no leisure or entertainment information.
B. A user in a leisure state and environment sees a lively page that may contain leisure and entertainment information and personal consumer advertising.
The invention can be applied in part or in whole to website systems other than search engines; all such applications fall within the scope of the invention.
To handle heavy traffic, large websites today all use server clusters, and some even set up local service subsystems in each region to divide up user visits. An important feature of present server clusters, however, is that every cluster member provides identical content. As shown in Figure 13, a visiting user passes through the "website server cluster entrance" device and, without regard to any of the user's characteristics, is assigned directly to one of the cluster member servers, all of which hold identical content.
As shown in Figure 14, the device of the invention partially changes this structure: after the "website server cluster entrance" receives a visiting user, it judges from the user attribute information sent during the visit, such as the IP address, whether the user is in a working state, and provides information services of a different style and content accordingly.
Method for automatically judging the user's state and providing a suitable web page style and content
Step 1: First divide the server cluster into two classes, "work style" and "personal and leisure style". Whether pages are static or dynamic, when the same content is updated on both classes of server, the two styles are generated automatically, so that users in different working or leisure states see different styles when visiting the same page URL.
Step 2: When the "website server cluster entrance" receives a user's first request for a page of this website, it first obtains the user's IP address from the access protocol (or from the IP-layer protocol).
Step 3: Look up the IP address in the "IP address attribute database" to find whether it is a "workplace IP address" or an "IP address of a personal or leisure place". For a "workplace IP address", carry out Step 4; for an "IP address of a personal or leisure place", carry out Step 5.
Step 4: Obtain the geographic location of the "workplace IP address" and the administrative time of that geographic area. If the area to which the IP address belongs is within working hours (Monday to Friday, 8:00 to 20:00), assign the visit to a "work style server" in the server cluster, which provides page service suited to the workplace; otherwise carry out Step 5.
Step 5: Assign the visit to a "personal and leisure style server" in the server cluster, which provides page service suited to personal use and the leisure state.
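The routing decision in Steps 3 to 5 can be sketched as below. The "IP address attribute database" is replaced by a small hypothetical dictionary that maps an IP address to its place type and to the UTC offset of the local administrative time; the addresses are documentation-range examples, and treating unknown addresses as personal is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical stand-in for the "IP address attribute database":
# IP address -> (place type, UTC offset in hours of the local administrative time).
IP_ATTRIBUTES = {
    "203.0.113.10": ("workplace", 8),
    "198.51.100.7": ("personal", 8),
}


def choose_style_server(ip, now_utc):
    """Return the server class for a visit; now_utc must be timezone-aware."""
    # Assumption: addresses missing from the database are treated as personal.
    place, utc_offset = IP_ATTRIBUTES.get(ip, ("personal", 0))
    if place == "workplace":
        local = now_utc.astimezone(timezone(timedelta(hours=utc_offset)))
        is_weekday = local.weekday() < 5   # Monday is 0, Friday is 4
        in_hours = 8 <= local.hour < 20    # 8:00 to 20:00 local time
        if is_weekday and in_hours:
            return "work style server"
    return "personal and leisure style server"
```

A request from the example workplace address at 02:00 UTC on a Wednesday is 10:00 local time and goes to the work style server; the same address at 14:00 UTC (22:00 local time) falls through to the personal and leisure style server.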

Claims (13)

1. A homologous information site search engine aggregation display method, comprising the following steps:
(1) the inquirer accesses the search engine through a Web browser or application software and enters the keyword to be queried;
(2) the search engine finds all qualifying target sites as the original search results;
(3) the "homologous information processing module" queries the account information of purchasers of the right to "become the title search result" and, in combination with the judgment rules, selects from the original search results the object to be used as the "title search result";
(4) the search engine Web server or application server shows the inquirer only the selected "title search result" as the search result, and provides for the "title search result" a button meaning "expand to view details";
(5) the inquirer can also press the button corresponding to the "title search result", whereupon the search engine shows the inquirer the original search results found in (2).
2. The homologous information site search engine aggregation display method according to claim 1, characterized in that the processing flow of the "homologous information processing module" comprises the following steps:
(1) the information category judging module judges the category of the information received by the Webpage search device, wherein the "homologous information processing module" includes the information category judging module;
(2) the information category judging module collects information of the same category and sends it to the information processing module of the corresponding type;
(3) the search information processed by the information processing module of the corresponding type is filed into the "non-homogeneous object information storehouse" or the "homology object information storehouse";
(4) the "non-homogeneous object information storehouse" or the "homology object information storehouse" is published on the Web server;
wherein the "homologous information processing module" is made up of a plurality of "homologous information processing modules of the corresponding information category", and comprises the "with the source web page processing module", "homology multimedia processing module", "homology picture processing module", "homology document process module", "homology software processing module", "with source data or database processing module", "homology GIS message processing module", "with the value network service processing module" and "with being worth the business information processing module".
3. The homologous information site search engine aggregation display method according to claim 2, characterized in that the steps by which the "with the source web page processing module" processes web page information are as follows:
(1) when the search part of the search engine receives the keyword to be queried, the decision device for search results published on the Web server first judges whether this keyword has recently been queried by someone else; if it has, and the results have been published on the search engine search results Web server, the search results are returned directly, with the web pages in the results that have the same source aggregated into one search result; after the "same source web page" button is clicked, the search result web page containing all the search results can be viewed on the search engine search results Web server, which completes the whole query process;
(2) if, when the search part of the search engine receives the keyword to be queried, the decision device for search results published on the Web server judges that this keyword has not recently been queried by anyone else and no corresponding query result has been published on the search engine search results Web server, then:
A. the "Webpage search device" is started to search the "non-homogeneous web results database" and the "homology web results database" for web page addresses that match the search keyword, and the content of these web pages is obtained;
B. if the "Webpage search device" finds no web page address matching the search keyword in the "non-homogeneous web results database" and the "homology web results database", the result "no qualifying web page" is returned to the inquirer and the search keyword is added to the next round of tasks for updating the "non-homogeneous web results database" and the "homology web results database"; if a qualifying web page address is found during the update, it is placed in the "non-homogeneous web results database" or the "homology web results database" according to whether it has a same source web page, so that the result can be found when someone searches for the same keyword again;
(3) the "web page contents separation vessel" decomposes the content and hyperlink targets of the web pages found into the categories multimedia, picture, text and hyperlink;
(4) the various content decision devices each produce a judgment result:
A. the "multimedia content decision device" produces the "identical multimedia file degree SMS (Same Media Score)" of the target web page;
B. the "image content decision device" produces the "identical picture degree SPS (Same Photo Score)" of the target web page;
C. the "word content decision device" produces the "identical text degree STS (Same Text Score)" of the target web page;
D. the "linked contents decision device" produces the "identical hyperlink degree SHS (Same Hyperlinks Score)" of the target web page;
(5) the "multimedia judgement weight SMP", "picture judgement weight SPP", "text judgement weight STP" and "link judgement weight SHP" are obtained from the "with source web page decision rule storehouse" and multiplied respectively by the "identical multimedia file degree SMS", "identical picture degree SPS", "identical text degree STS" and "identical hyperlink degree SHS" generated in step (4);
(6) the products obtained in step (5) are added to obtain the "homology degree SSS (Same Source Score)" of the web page: homology degree SSS=(SMS*SMP)+(SPS*SPP)+(STS*STP)+(SHS*SHP);
(7) it is judged whether the "homology degree SSS" of this web page exceeds the threshold; if it does, the web page is judged to be a "same source web page" of another web page; if it does not, the web page is judged to be a "non-homogeneous webpage";
(8) the "non-homogeneous webpage" produced in step (7) is stored in the "non-homogeneous web results database" by the "non-homogeneous webpage processing module"; the "same source web page" produced in step (7) is stored in the "homology web results database" by the "with the source web page processing module";
(9) the "search result web page distributor" dynamically generates static web pages of the search results from the content of the "homology web results database" and the "non-homogeneous web results database", publishes them to the "search engine search results Web server", and they are then presented to the inquiring user through the browser;
(10) as an alternative implementation of step (9), the results can also be presented to the inquiring user through the browser directly by a "dynamic web page Web server".
4. homologous information site search engine aggregation display method according to claim 2 is characterized in that the treatment scheme of described " homologous information processing module " also comprises the steps:
(1) receiving inquiry's searching key word, and judging file or the service that needs are looked for according to key words content and keyword grammer by software;
(2) judge whether the content to be searched for has been published on the Web server: if the content has been published on the "search engine search results Web server", return the search results directly; among these results, the multimedia items that satisfy the search condition and have the same source are aggregated into one "title search result"; after clicking the "same source file" button, the inquirer can view, on the "search engine search results Web server", another web page that contains all the search results satisfying the query condition, and the search procedure ends; if the search target has not been published on the "search engine search results Web server", proceed from step (3);
(3) inform the inquirer that there is no qualifying result;
(4) add this search keyword to the next round of tasks for updating the "homologous information index database" and the "non-homogeneous information index database", and periodically start the update process of the two databases;
(5) the update process of the "homologous information index database" and the "non-homogeneous information index database" is as follows:
A. the Web page searcher searches for newly appearing target files and web pages or service entrances, and software enters each entrance to obtain the file or service;
B. the "content decision device" judges whether the newly found information belongs to the same content as the content of the current "homologous information index database"; if "yes", it is placed into the corresponding category of the "homologous information index database" as a new element; if "no", the "content decision device" judges whether it belongs to the same content as the content of the current "non-homogeneous information index database";
C. if "yes": a new category is created for the current information together with the same-source information stored in the "non-homogeneous information index database", and all of it is transferred to the "homologous information index database"; if "no": a new category is created for the current information, and it is stored in the "non-homogeneous information index database";
(6) the "search result web page distributor" dynamically generates static Web pages of the search results from the contents of the "homology web results database" and the "non-homogeneous web results database" and publishes them to the "search engine search results Web server", from which they are presented through the browser to the inquirer who comes to search;
(7) as an alternative implementation of step (6), the results may also be presented directly to the inquiring user through the browser by a "dynamic web page Web server".
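The update process of step (5) can be sketched as follows. This is only an illustration under stated assumptions: each database is modelled as a list of categories (lists of items), the "content decision device" is a caller-supplied predicate, and a new item is compared against the first element of each category. The names update_indexes and is_same_content are hypothetical and do not appear in the claims.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def update_indexes(
    item: T,
    same_source_index: List[List[T]],
    non_same_source_index: List[List[T]],
    is_same_content: Callable[[T, T], bool],
) -> str:
    """Classify one newly found item (steps B and C of the update process).

    Assumption: the first element of a category stands for the whole
    category when it is compared with the new item.
    """
    # Step B: does the item match a category that is already same-source?
    for category in same_source_index:
        if is_same_content(item, category[0]):
            category.append(item)
            return "joined_same_source"

    # Step C, "yes" branch: the item matches something stored as
    # non-same-source, so both move to the same-source index together.
    for position, category in enumerate(non_same_source_index):
        if is_same_content(item, category[0]):
            promoted = non_same_source_index.pop(position)
            promoted.append(item)
            same_source_index.append(promoted)
            return "promoted"

    # Step C, "no" branch: nothing matches, so the item starts a new
    # category in the non-same-source index.
    non_same_source_index.append([item])
    return "new_non_same_source"
```

Calling the function three times with items that the predicate treats as equal first stores the item as non-same-source, then promotes the pair to the same-source index, then appends to the existing same-source category.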
5. The homologous information site search engine aggregation display method according to claim 4, characterized in that, when the "homologous information processing module" processes documents, the update process of the "homologous information index database" and the "non-homogeneous information index database" is:
(1) the "document searcher" searches for newly appearing document files and web pages or link entrances, and software enters each entrance to obtain the document or service;
(2) the "word content decision device" and the "image content decision device" judge whether the newly found document content belongs to the same content as the content of the current "homologous information index database"; if "yes", it is placed into the corresponding category of the "homologous information index database" as a new element; if "no", the "document content decision device" judges whether it belongs to the same content as the content of the current "non-homogeneous information index database";
(3) if "yes": a new category is created for the current document together with the same-source documents stored in the "non-homogeneous information index database", and all of them are transferred to the "homologous information index database"; if "no": a new category is created for the current document, and it is stored in the "non-homogeneous information index database".
6. The homologous information site search engine aggregation display method according to any one of claims 3, 4 or 5, characterized in that the processing flow of the content decision device comprises the following steps:
(1) receive the "judged objects": multimedia from a plurality of sources can be received, and the quantity of judged objects is recorded as InputQuantity;
(2) look up an attribute of the set of "judged objects" that takes part in the comparison, and record as SameQuantity the number of "judged objects" that have the same value for the current attribute;
(3) input the "weight" value Power of the current attribute in the judging process;
(4) calculate the degree of agreement of all "judged objects" on the current attribute: PSame = SameQuantity * Power;
(5) return to step (1) and carry out steps (1)~(4) for the next "attribute" to obtain the PSame of that attribute, until the PSame values of all attributes have been obtained;
(6) calculate and return the same-content degree value of the "judged objects": SameMediaPower = (sum of all PSame values) / InputQuantity.
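The weighted attribute comparison of steps (1)~(6) can be sketched as below. The claim does not say how SameQuantity is counted when several different values repeat, so this sketch assumes it is the size of the largest group of objects sharing one value, and zero when no two objects agree. The function name same_media_power, the dictionary representation of a judged object, and the attribute names in the usage note are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List


def same_media_power(
    objects: List[Dict[str, Any]], weights: Dict[str, float]
) -> float:
    """Return SameMediaPower = sum(PSame over attributes) / InputQuantity.

    Assumptions: attribute values are hashable, and SameQuantity is the
    size of the largest group of objects that share a value.
    """
    input_quantity = len(objects)  # step (1)
    if input_quantity == 0:
        raise ValueError("at least one judged object is required")

    psame_total = 0.0
    for attribute, power in weights.items():  # steps (2), (3) and (5)
        counts = Counter(obj[attribute] for obj in objects if attribute in obj)
        largest_group = max(counts.values(), default=0)
        same_quantity = largest_group if largest_group >= 2 else 0
        psame_total += same_quantity * power  # step (4): PSame

    return psame_total / input_quantity  # step (6)
```

For three objects that all share a title (weight 2.0) and of which two share a size (weight 1.0), the sketch yields (3 * 2.0 + 2 * 1.0) / 3.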
7. The homologous information site search engine aggregation display method according to any one of claims 3, 4 or 5, characterized in that, when the content decision device is the word content decision device, its processing flow comprises the following steps:
(1) find the total length SameLenth of the parts of the input word contents that contain identical words or sentences;
(2) find the length MinLenth of the shortest input text among the plurality of input word contents;
(3) return the text similarity degree value SameTextPower = SameLenth / MinLenth.
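A minimal sketch of this ratio for two texts is given below. The claim does not specify how the identical parts are located; here Python's standard difflib.SequenceMatcher is swapped in to find matching character blocks, which is an assumption rather than the patented procedure. The function name same_text_power is hypothetical.

```python
from difflib import SequenceMatcher


def same_text_power(text_a: str, text_b: str) -> float:
    """Return SameTextPower = SameLenth / MinLenth for two texts.

    Assumption: SameLenth is the total size of the matching blocks that
    difflib.SequenceMatcher reports between the two texts.
    """
    min_lenth = min(len(text_a), len(text_b))  # step (2)
    if min_lenth == 0:
        return 0.0  # an empty text shares nothing with the other one

    matcher = SequenceMatcher(None, text_a, text_b, autojunk=False)
    # Step (1): sum the lengths of all matching blocks (the final dummy
    # block has size 0 and does not affect the sum).
    same_lenth = sum(block.size for block in matcher.get_matching_blocks())

    return same_lenth / min_lenth  # step (3)
```

Dividing by the shortest input means that a short text fully contained in a longer one scores 1.0, which fits the goal of detecting excerpts of the same source.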
8. The homologous information site search engine aggregation display method according to claim 3 or 4, characterized in that, when the content decision device is the link content decision device, its processing flow comprises the following steps:
(1) receive the "judged objects": the URL addresses of a plurality of hyperlinks;
(2) compute the similarity degree of the "judged objects": SameURLPower = the number of URL addresses that occur in every hyperlink;
(3) return SameURLPower.
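One possible reading of this count is sketched below, assuming each judged object is represented by the collection of URL addresses it links to and that SameURLPower is the number of addresses common to all of them. The claim wording is ambiguous on this point, so the interpretation, the function name same_url_power, and the example URLs in the usage are all assumptions.

```python
from typing import Iterable, List


def same_url_power(link_sets: List[Iterable[str]]) -> int:
    """Count the URL addresses that occur in every judged object.

    Assumption: URLs are compared as exact strings, with no
    normalisation of scheme, case, or trailing slashes.
    """
    if not link_sets:
        return 0
    common = set(link_sets[0])
    for urls in link_sets[1:]:
        common &= set(urls)  # keep only addresses present in all objects
    return len(common)
```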
9. The homologous information site search engine aggregation display method according to claim 4, characterized in that, when the content decision device is the business information content decision device, its processing flow comprises the following steps:
(1) compare whether the business information items taking part in the comparison concern the same product or service; if "no", return the judgment "inconsistent"; if "yes", go to step (2);
(2) judge whether the business information taking part in the comparison is sensitive to geographic position; if "no", return the judgment "consistent"; if "yes", go to step (3);
(3) judge whether the suppliers of the business information taking part in the comparison are in the same city or region; if "no", return the judgment "inconsistent"; if "yes", return the judgment "consistent".
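The three-step decision above can be sketched for a pair of business information items as follows. The BusinessInfo record and its fields are hypothetical, products are compared as exact strings, and the pair is treated as geographically sensitive if either item is marked so; none of these details are fixed by the claim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessInfo:
    """Hypothetical record for one piece of business information."""
    product: str
    location_sensitive: bool
    city: str


def judge_business_info(a: BusinessInfo, b: BusinessInfo) -> str:
    """Return "consistent" or "inconsistent" following steps (1)~(3)."""
    # Step (1): different product or service means different content.
    if a.product != b.product:
        return "inconsistent"
    # Step (2): without geographic sensitivity the product match suffices.
    # Assumption: sensitivity of either item makes the pair sensitive.
    if not (a.location_sensitive or b.location_sensitive):
        return "consistent"
    # Step (3): geographically sensitive offers must share a city or region.
    return "consistent" if a.city == b.city else "inconsistent"
```

Under this sketch a delivery service offered in two different cities is judged "inconsistent", while a downloadable product offered from two cities is judged "consistent".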
10. The homologous information site search engine aggregation display method according to claim 1, characterized in that
the "title search result" is selected as follows:
(1) calculate the probability weight PWn with which each "homology search result" becomes the "title search result":
PWn = TP * PageFocus / (RespDelay - K)
where
n: the index of the search result (this search result is the n-th entry);
(RespDelay - K): when this value is less than or equal to zero, it is taken to be 1;
PageFocus: the web page attention degree value;
RespDelay: the response delay of the web service;
K: the service response constant; a delay shorter than this value will not be noticed;
TP: the title search result weight;
(2) sum the probability weights PWn of all original "homology search results" to obtain PWall, the total probability weight;
(3) calculate the probability that each "homology search result" becomes the "title search result": Pn = PWn / PWall;
(4) according to the probabilities Pn, dynamically and randomly select the "title search result" as searchers visit, and present it to the searcher.
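The selection procedure of steps (1)~(4) can be sketched as below. Each search result is represented as a (TP, PageFocus, RespDelay) tuple, which is an assumed representation; the function names are hypothetical, and the random draw uses Python's standard random.Random.choices as a stand-in for whatever sampling mechanism an implementation would use.

```python
import random
from typing import List, Optional, Sequence, Tuple

# Assumed representation of one search result: (TP, PageFocus, RespDelay).
Result = Tuple[float, float, float]


def probability_weight(tp: float, page_focus: float,
                       resp_delay: float, k: float) -> float:
    """Step (1): PWn = TP * PageFocus / (RespDelay - K)."""
    denominator = resp_delay - k
    if denominator <= 0:
        denominator = 1  # delays shorter than K are not noticed
    return tp * page_focus / denominator


def selection_probabilities(results: Sequence[Result], k: float) -> List[float]:
    """Steps (2) and (3): Pn = PWn / PWall."""
    weights = [probability_weight(tp, pf, rd, k) for tp, pf, rd in results]
    pw_all = sum(weights)
    if pw_all <= 0:
        raise ValueError("total probability weight must be positive")
    return [w / pw_all for w in weights]


def pick_title_result(results: Sequence[Result], k: float,
                      rng: Optional[random.Random] = None) -> Result:
    """Step (4): draw one result at random according to Pn."""
    rng = rng or random.Random()
    probabilities = selection_probabilities(results, k)
    return rng.choices(list(results), weights=probabilities, k=1)[0]
```

Because the choice is random rather than a fixed ranking, every same-source page has a chance of being shown as the title result, with better-attended and faster-responding pages favoured.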
11. The homologous information site search engine aggregation display method according to claim 10, characterized in that
the probability weight PWn of the "title search result" can alternatively be calculated as:
a. PWn = (TP + PageFocus) / (RespDelay - K), or
b. PWn = (TP + PageFocus) / RespDelay / K, or
c. PWn = TP * PageFocus / RespDelay / K.
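The three alternative formulas can be written side by side as in the sketch below. Two points are assumptions: variant a reuses the rule from claim 10 that a non-positive (RespDelay - K) is taken as 1, and variants b and c require RespDelay and K to be positive, since the claim does not say how a zero divisor should be handled. The function names are hypothetical.

```python
def pw_variant_a(tp: float, page_focus: float,
                 resp_delay: float, k: float) -> float:
    """a. PWn = (TP + PageFocus) / (RespDelay - K)."""
    denominator = resp_delay - k
    if denominator <= 0:
        denominator = 1  # assumption: same rule as in claim 10
    return (tp + page_focus) / denominator


def _require_positive(resp_delay: float, k: float) -> None:
    # Assumption: the claim leaves zero divisors undefined, so reject them.
    if resp_delay <= 0 or k <= 0:
        raise ValueError("RespDelay and K must be positive")


def pw_variant_b(tp: float, page_focus: float,
                 resp_delay: float, k: float) -> float:
    """b. PWn = (TP + PageFocus) / RespDelay / K."""
    _require_positive(resp_delay, k)
    return (tp + page_focus) / resp_delay / k


def pw_variant_c(tp: float, page_focus: float,
                 resp_delay: float, k: float) -> float:
    """c. PWn = TP * PageFocus / RespDelay / K."""
    _require_positive(resp_delay, k)
    return tp * page_focus / resp_delay / k
```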
12. The homologous information site search engine aggregation display method according to claim 1, characterized in that the "homologous information processing module":
(1) can be embedded in the search engine;
(2) can be placed between the "search engine" and the "search engine search results Web server";
(3) can also be placed, as a preprocessing module, between the "search engine" and the searched website.
13. The homologous information site search engine aggregation display method according to claim 1, wherein the button for expanding and viewing details may be a hyperlink or any of various software interface controls.
CN2006100079057A 2006-02-22 2006-02-22 Attention degree based same source information search engine aggregation display method Expired - Fee Related CN101025737B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN2006100079057A CN101025737B (en) 2006-02-22 2006-02-22 Attention degree based same source information search engine aggregation display method
PCT/CN2007/000370 WO2007095834A1 (en) 2006-02-22 2007-02-02 Composite display method and system for search engine of same resource information based on degree of attention
US12/279,949 US8176029B2 (en) 2006-02-22 2007-02-02 Composite display method and system for search engine of same resource information based on degree of attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2006100079057A CN101025737B (en) 2006-02-22 2006-02-22 Attention degree based same source information search engine aggregation display method

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN 201110228853 Division CN102298621B (en) 2006-02-22 2006-02-22 System for obtaining page user focus degree PageFocus by method for aggregating and displaying same source information search engine based on focus degree

Publications (2)

Publication Number Publication Date
CN101025737A CN101025737A (en) 2007-08-29
CN101025737B true CN101025737B (en) 2011-08-17

Family

ID=38436934

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2006100079057A Expired - Fee Related CN101025737B (en) 2006-02-22 2006-02-22 Attention degree based same source information search engine aggregation display method

Country Status (3)

Country Link
US (1) US8176029B2 (en)
CN (1) CN101025737B (en)
WO (1) WO2007095834A1 (en)

Families Citing this family (70)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8166041B2 (en) * 2008-06-13 2012-04-24 Microsoft Corporation Search index format optimizations
CA2639438A1 (en) * 2008-09-08 2010-03-08 Semanti Inc. Semantically associated computer search index, and uses therefore
CN102043705A (en) * 2009-10-19 2011-05-04 阿里巴巴集团控股有限公司 Statistical method and apparatus for input behavior
KR101777347B1 (en) 2009-11-13 2017-09-11 삼성전자주식회사 Method and apparatus for adaptive streaming based on segmentation
KR101786051B1 (en) 2009-11-13 2017-10-16 삼성전자 주식회사 Method and apparatus for data providing and receiving
KR101750048B1 (en) 2009-11-13 2017-07-03 삼성전자주식회사 Method and apparatus for providing trick play service
KR101750049B1 (en) 2009-11-13 2017-06-22 삼성전자주식회사 Method and apparatus for adaptive streaming
US20110119268A1 (en) * 2009-11-13 2011-05-19 Rajaram Shyam Sundar Method and system for segmenting query urls
KR101737084B1 (en) 2009-12-07 2017-05-17 삼성전자주식회사 Method and apparatus for streaming by inserting another content to main content
KR101777348B1 (en) 2010-02-23 2017-09-11 삼성전자주식회사 Method and apparatus for transmitting and receiving of data
US8972418B2 (en) * 2010-04-07 2015-03-03 Microsoft Technology Licensing, Llc Dynamic generation of relevant items
CN101853300B (en) * 2010-05-26 2013-01-30 中国科学技术大学 Method and system for identifying and evaluating video downloading service website
KR101837687B1 (en) * 2010-06-04 2018-03-12 삼성전자주식회사 Method and apparatus for adaptive streaming based on plurality of elements determining quality of content
CN101854399A (en) * 2010-06-09 2010-10-06 宇龙计算机通信科技(深圳)有限公司 Method and device for aggregating network data
US10713312B2 (en) 2010-06-11 2020-07-14 Doat Media Ltd. System and method for context-launching of applications
US9069443B2 (en) 2010-06-11 2015-06-30 Doat Media Ltd. Method for dynamically displaying a personalized home screen on a user device
WO2011156605A2 (en) 2010-06-11 2011-12-15 Doat Media Ltd. A system and methods thereof for enhancing a user's search experience
CN102375823B (en) * 2010-08-13 2014-11-05 腾讯科技(深圳)有限公司 Searching result gathering display method and system
US9152726B2 (en) 2010-12-01 2015-10-06 Microsoft Technology Licensing, Llc Real-time personalized recommendation of location-related entities
US20130054591A1 (en) * 2011-03-03 2013-02-28 Brightedge Technologies, Inc. Search engine optimization recommendations based on social signals
US9858342B2 (en) 2011-03-28 2018-01-02 Doat Media Ltd. Method and system for searching for applications respective of a connectivity mode of a user device
US9633122B2 (en) * 2011-10-20 2017-04-25 Aol Inc. Systems and methods for web site customization based on time-of-day
CN103064852A (en) * 2011-10-20 2013-04-24 阿里巴巴集团控股有限公司 Website statistical information processing method and website statistical information processing system
US9547872B2 (en) * 2012-02-22 2017-01-17 Ebay Inc. Systems and methods for providing search results along a corridor
CN104380222B (en) * 2012-03-28 2018-03-27 泰瑞·克劳福德 Sector type is provided and browses the method and system for having recorded dialogue
CN102663048B (en) * 2012-03-29 2017-04-12 天津奇思科技有限公司 Method and device for providing search result
CN103365555A (en) * 2012-03-31 2013-10-23 国际商业机器公司 Data processing method and system and data collecting method and system
CN103389984B (en) * 2012-05-08 2018-03-23 百度在线网络技术(北京)有限公司 A kind of method and apparatus for being used to provide collection relevant information in search result
CN102880706A (en) * 2012-07-16 2013-01-16 刘二中 Method for processing link information input by search engine terminal user
CN102789508A (en) * 2012-07-27 2012-11-21 吴建辉 Distributed practical condition search engine and chat system on basis of geographical position
KR101974867B1 (en) * 2012-08-24 2019-08-23 삼성전자주식회사 Apparatas and method fof auto storage of url to calculate contents of stay value in a electronic device
CN103024055B (en) * 2012-12-18 2016-06-15 百度在线网络技术(北京)有限公司 For the Webpage compression method of mobile terminal, system and cloud server
CN103020276A (en) * 2012-12-27 2013-04-03 新浪网技术(中国)有限公司 Method and device for searching social contact objects
US9386071B2 (en) * 2013-01-15 2016-07-05 Allon Caidar System for communicating media to users over a network
CN104166659B (en) * 2013-05-20 2019-03-08 百度在线网络技术(北京)有限公司 A kind of map datum sentences the method and system of weight
US9471693B2 (en) * 2013-05-29 2016-10-18 Microsoft Technology Licensing, Llc Location awareness using local semantic scoring
CN103399957A (en) * 2013-08-21 2013-11-20 百度在线网络技术(北京)有限公司 Searching method, system and engine as well as client
CN104424261B (en) * 2013-08-29 2018-10-02 腾讯科技(深圳)有限公司 Information displaying method based on electronic map and device
CN103533399A (en) * 2013-09-30 2014-01-22 深圳创维-Rgb电子有限公司 Video-information display method and device
US10963951B2 (en) 2013-11-14 2021-03-30 Ebay Inc. Shopping trip planner
CN103646078B (en) * 2013-12-11 2017-01-25 北京启明星辰信息安全技术有限公司 Method and device for realizing internet propaganda monitoring target evaluations
US20150193804A1 (en) * 2014-01-09 2015-07-09 Microsoft Corporation Incentive mechanisms for user interaction and content consumption
JP6114707B2 (en) * 2014-02-28 2017-04-12 富士フイルム株式会社 Product search device, product search system, server system, and product search method
CN104036003B (en) * 2014-06-16 2018-12-14 北京奇虎科技有限公司 search result integration method and device
CN104504069A (en) * 2014-12-22 2015-04-08 北京奇虎科技有限公司 Building method and device for file index
CN105574061A (en) * 2015-05-24 2016-05-11 刘晓建 Method for filtering user generated content by network information acquisition tool
US10275716B2 (en) * 2015-07-30 2019-04-30 Microsoft Technology Licensing, Llc Feeds by modelling scrolling behavior
CN105069076A (en) * 2015-07-31 2015-11-18 北京奇虎科技有限公司 Method and apparatus for determining address information in home page of official website
CN105138697B (en) * 2015-09-25 2018-11-13 百度在线网络技术(北京)有限公司 A kind of search result shows method, apparatus and system
US9703689B2 (en) * 2015-11-04 2017-07-11 International Business Machines Corporation Defect detection using test cases generated from test models
WO2018005903A1 (en) * 2016-06-30 2018-01-04 Zowdow, Inc. Systems and methods for enhanced search, content, and advertisement delivery
US11232164B2 (en) * 2016-07-03 2022-01-25 Gurunavi, Inc. Information providing method, program, and device
US10210278B2 (en) * 2016-08-29 2019-02-19 Google Llc Optimized digital components
CN107959665A (en) * 2016-10-18 2018-04-24 北京视联动力国际信息技术有限公司 A kind of communication means and communication system
CN108062679A (en) * 2016-11-08 2018-05-22 北京国双科技有限公司 Determine the method and device of user's value
CN106713353A (en) * 2017-01-23 2017-05-24 浙江省测绘科学技术研究院 Intelligent seamless aggregation method and system for geographic information service
US10885118B2 (en) * 2017-05-12 2021-01-05 Futurewei Technologies, Inc. Incremental graph computations for querying large graphs
CN107169147B (en) * 2017-06-20 2020-12-01 阿里巴巴(中国)有限公司 Data processing method and device and electronic equipment
CN109377240B (en) * 2018-08-21 2023-10-20 中国平安人寿保险股份有限公司 Commercial tenant management method and device based on neural network, computer equipment and storage medium
CN110674427B (en) * 2019-09-20 2022-04-22 北京达佳互联信息技术有限公司 Method, device, equipment and storage medium for responding to webpage access request
CN110807690B (en) * 2019-10-31 2023-04-28 网易(杭州)网络有限公司 Transaction object data processing method and device
CN112988794B (en) * 2019-12-02 2024-05-03 深圳云天励飞技术有限公司 Data searching method and device capable of dynamically adjusting searching strategy and electronic equipment
CN111915392A (en) * 2020-06-30 2020-11-10 深圳市世强元件网络有限公司 Classified display method for search results of electronic commerce platform of components
CN113204639B (en) * 2021-05-21 2023-07-18 珠海金山数字网络科技有限公司 Document online playing method and device, computing equipment and readable storage medium
US11687534B2 (en) * 2021-06-17 2023-06-27 Huawei Technologies Co., Ltd. Method and system for detecting sensitive data
CN114154027A (en) * 2021-12-06 2022-03-08 深圳市大数据资源管理中心 Non-homologous inconsistent data processing method
CN115860306B (en) * 2022-03-07 2023-06-06 四川大学 Method for detecting public risk perception space-time difference of sudden public and guard event area
CN115908057B (en) * 2023-03-03 2023-08-04 山东理工职业学院 Visual travel information service system and method based on data processing
CN116301869B (en) * 2023-05-17 2023-08-15 建信金融科技有限责任公司 Front-end page configuration management and control system, method, equipment and medium
CN116595142A (en) * 2023-05-19 2023-08-15 大安健康科技(北京)有限公司 Retrieval matching method and system based on medical semantic analysis

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1254136A (en) * 1998-11-12 2000-05-24 英业达股份有限公司 Method for inquiring about index multi-media header data and its device
CN1639710A (en) * 2002-02-28 2005-07-13 皇家飞利浦电子股份有限公司 Displaying search results
CN1728134A (en) * 2004-07-30 2006-02-01 国际商业机器公司 Multi-language network information search method and system based on supertext

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2240663C (en) * 1995-12-30 2004-06-08 Timeline, Inc. Data retrieval method and apparatus with multiple source capability
JP4706143B2 (en) * 2001-08-02 2011-06-22 ソニー株式会社 Information providing method and apparatus
US20050105513A1 (en) * 2002-10-27 2005-05-19 Alan Sullivan Systems and methods for direction of communication traffic
JP3933617B2 (en) * 2003-09-22 2007-06-20 株式会社日立情報システムズ Shared information search method, shared information search program, and information sharing system
US7231405B2 (en) * 2004-05-08 2007-06-12 Doug Norman, Interchange Corp. Method and apparatus of indexing web pages of a web site for geographical searchine based on user location
US7610279B2 (en) * 2006-01-31 2009-10-27 Perfect Market, Inc. Filtering context-sensitive search results

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
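JP Laid-Open 2003-44484A 2003.02.14
JP Laid-Open 2005-99890A 2005.04.14
Full text.
Cui Jianhai et al. Personalized information retrieval technology in the Web environment. 现代图书情报技术 128. 2005, (128), 45-49.
Cui Jianhai et al. Personalized information retrieval technology in the Web environment. 现代图书情报技术 128. 2005, (128), 45-49. *
Zhao Yinchun et al. User interest mining based on combining Web browsing content and behavior. 计算机工程 31 12. 2005, 31(12), 93-94, 198.
Zhao Yinchun et al. User interest mining based on combining Web browsing content and behavior. 计算机工程 31 12. 2005, 31(12), 93-94, 198. *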

Also Published As

Publication number Publication date
WO2007095834A1 (en) 2007-08-30
US8176029B2 (en) 2012-05-08
CN101025737A (en) 2007-08-29
US20090094213A1 (en) 2009-04-09

Similar Documents

Publication Publication Date Title
CN101025737B (en) Attention degree based same source information search engine aggregation display method
TWI416344B (en) Computer-implemented method and computer-readable medium for providing access to content
Efron Information search and retrieval in microblogs
CN103221951B (en) Predictive query suggestion caching
US7987261B2 (en) Traffic predictor for network-accessible information modules
US8447640B2 (en) Device, system and method of handling user requests
CN107862553A (en) Advertisement real-time recommendation method, device, terminal device and storage medium
US20110191331A1 (en) Method of and System for Enhanced Local-Device Content Discovery
CN101568921A (en) Dynamic pricing models for digital content
CN101512586A (en) Serving locally relevant advertisements
KR20100094021A (en) Customized and intellectual symbol, icon internet information searching system utilizing a mobile communication terminal and ip-based information terminal
CN102782676A (en) Online search based on geography tagged recommendations
CN101981570A (en) Open framework for integrating, associating and interacting with content objects
JP2011524054A (en) Online reference collection and scoring
US20140136517A1 (en) Apparatus And Methods for Providing Search Results
CN102298621B (en) System for obtaining page user focus degree PageFocus by method for aggregating and displaying same source information search engine based on focus degree
CN104050243A (en) Network searching method and system combined with searching and social contact
CN105787066A (en) Digital content distribution system based on total analysis
CN108781223A (en) The data packet transfer optimization of data for content item selection
Kim et al. TwitterTrends: a spatio-temporal trend detection and related keywords recommendation scheme
CN102880622A (en) Method and system for determining user characteristics on internet
CN101788981A (en) Deep web mobile search method, server and system
Yong-hong et al. Research of data mining based on e-commerce
Liu et al. Digitalization and information management mechanism of sports events based on multisensor node cooperative perception model
Yang et al. Micro-blog friend recommendation algorithms based on content and social relationship

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20110817

Termination date: 20150222

EXPY Termination of patent right or utility model